
[DO NOT MERGE] Standalone Nexus Operations#685

Draft
bergundy wants to merge 22 commits into temporalio:master from bergundy:standalone-nexus-op


Member

@bergundy bergundy commented Dec 9, 2025

What changed?

  • Added the full API scope for standalone Nexus operations.

Server PR

  • Not implemented yet, which is why this PR is still in draft.


// The number of attempts made to start/deliver the operation request.
// This number represents a minimum bound since the attempt is incremented after the request completes.
int32 attempt = 9;
Contributor

This is the attempt count for delivering the start request. Will we support overall operation retries in the future? Would this name be confusing if we do? Maybe we should call it start_attempt so that people don't confuse it with the activity attempt, which has a different meaning.

Member Author

I want to keep this for consistency with PendingNexusOperationInfo.
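A minimal Go sketch of why the field comment calls `attempt` a minimum bound: if the counter is only incremented after a request completes, a reader observing the operation while a request is in flight sees one fewer attempt than has actually been made. The `operation` type, its methods, and the assumption that the counter starts at zero are hypothetical, for illustration only; this is not the server's implementation.

```go
package main

import "fmt"

// operation is a hypothetical model of the attempt bookkeeping described in
// the field comment. Assumption: the counter starts at zero.
type operation struct {
	attempt  int32 // value exposed to callers
	inFlight bool  // whether a start request is currently outstanding
}

// beginRequest marks a start request as outstanding. The exposed counter is
// deliberately not incremented yet.
func (o *operation) beginRequest() { o.inFlight = true }

// completeRequest records the finished request by bumping the counter.
func (o *operation) completeRequest() {
	o.inFlight = false
	o.attempt++
}

// actualAttempts counts requests made so far, including one still in flight.
func (o *operation) actualAttempts() int32 {
	if o.inFlight {
		return o.attempt + 1
	}
	return o.attempt
}

func main() {
	op := &operation{}
	op.beginRequest()
	op.completeRequest() // first attempt done: attempt == 1
	op.beginRequest()    // second attempt in flight
	// The exposed value lags the real count while a request is outstanding.
	fmt.Println(op.attempt, op.actualAttempts()) // prints "1 2"
}
```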

string request_id = 19;

// Operation token. Only set for asynchronous operations after a successful StartOperation call.
string operation_token = 20;
Contributor

I thought we said we didn't want to expose this to callers? They should only have one way of referencing their operations: their caller-side operation ID.

Member Author

It's still worth exposing this information as we do for workflow callers.

Contributor

@stephanos stephanos Feb 25, 2026

I have no horse in this race, but I'm curious, why is it useful to have?

Member Author

It's useful for debugging and can be used in the direct Nexus APIs to reattach to the same operation (future capability).

// The run ID of the operation, useful when run_id was not specified in the request.
string run_id = 1;

// Stage to wait for. The operation may be in a more advanced stage when the poll is unblocked.
Contributor

I'm confused, is this the stage the original request sent? Or does it represent the current stage of the operation?

Member Author

The current stage. Let me fix the docstring.
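A minimal Go sketch of the poll semantics discussed in this thread, assuming stages are totally ordered: the poll unblocks on the first observed stage at or beyond the requested one and reports the operation's current stage, which can be more advanced than the stage waited for. The `Stage` type, its values, and `pollUntil` are hypothetical names, not the enum defined in this PR.

```go
package main

import "fmt"

// Stage is a hypothetical, totally ordered stand-in for the operation's
// wait stage; it is not the enum defined in this PR.
type Stage int

const (
	StageUnspecified Stage = iota
	StageStarted
	StageClosed
)

// pollUntil consumes stage updates and returns the first observed stage at or
// beyond waitFor. The returned value is the operation's current stage, which
// may be more advanced than the requested one.
func pollUntil(updates <-chan Stage, waitFor Stage) (Stage, bool) {
	for s := range updates {
		if s >= waitFor {
			return s, true
		}
	}
	return StageUnspecified, false // updates ended before reaching waitFor
}

func main() {
	updates := make(chan Stage, 1)
	// The operation jumps straight to closed (e.g. a synchronous completion).
	updates <- StageClosed
	close(updates)

	current, ok := pollUntil(updates, StageStarted)
	fmt.Println(current == StageClosed, ok) // prints "true true"
}
```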

// Updated on terminal status.
int64 state_transition_count = 10;
// Updated once on scheduled and once on terminal status.
int64 state_size_bytes = 11;
Contributor

Is this intentionally a field only present in list? It was mentioned for standalone activities that everything in list was expected to be in describe.

Also, for standalone activities it was mentioned there would be a tool that would make sure everything in list was also in describe result. Can we prioritize that? It's a lot of effort for me to have to continually confirm our assertion on every PR and find these issues since we chose not to reuse types.

Member Author

Just want to call out that we don't have this guarantee for schedules or batch operations, which are much older archetypes: https://github.com/temporalio/api/blob/master/temporal/api/schedule/v1/message.proto https://github.com/temporalio/api/blob/master/temporal/api/workflowservice/v1/request_response.proto#L1715-L1751.

I don't think this guarantee needs to be high priority, but we should keep track of it because I do think it is nice to have. Ideally, the SDKs would allow the types to have completely different fields; there's no need to reuse the models here.

Contributor

@cretz cretz Dec 9, 2025

Right, but this guarantee was promised as part of the decision not to reuse models, knowing the SDK would need it. I was not expecting it to be a "nice to have" when the promise was made.

Member Author

Let's take this offline.

Member Author

Discussed offline, we will write a tool soon.

// Response to a successful UnpauseWorkflowExecution request.
message UnpauseWorkflowExecutionResponse { }

message StartNexusOperationRequest {
Contributor

Is it intentional we don't have Priority support?

Member Author

Yes. Priorities only apply to durable matching queues, and those are not used for Nexus tasks.

stephanos and others added 9 commits February 23, 2026 12:35
# Conflicts:
#	openapi/openapiv2.json
#	openapi/openapiv3.yaml
#	temporal/api/errordetails/v1/message.proto
#	temporal/api/nexus/v1/message.proto
#	temporal/api/workflowservice/v1/request_response.proto
#	temporal/api/workflowservice/v1/service.proto
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
string blocked_reason = 7;

// A reason that may be specified in the CancelNexusOperationRequest.
string reason = 8;
Contributor

This was set to 24 and I changed it; I don't see why it should be 24.

Contributor

(I wonder why we don't have a linter for this?)

Member Author

Copy pasta probably. Thanks for catching this.


// How long this operation has been running for, including all attempts and backoff between attempts.
// Elapsed time from schedule_time to now for running operations or to close_time for closed
// operations, including all attempts and backoff between attempts.
Contributor

Clarifies it works for running operations (as opposed to NexusOperationListInfo); correct me if that's wrong.

Member Author

LGTM


// Status of a standalone Nexus operation.
// The status is updated once, when the operation is originally scheduled, and again when the operation reaches a
// terminal status.
Contributor

Moved this to NexusOperationListInfo, where it seems to be applicable. Is it applicable only there?

Member Author

It's applicable here too.

Contributor

@stephanos stephanos Feb 25, 2026

Oh, I see, so UNSPECIFIED -> RUNNING -> [terminal state].
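A minimal Go sketch of the lifecycle described in this thread: the status changes once when the operation is scheduled (UNSPECIFIED -> RUNNING) and once more when it reaches a terminal status, after which it never changes. `Status` is a hypothetical mirror of NexusOperationExecutionStatus: only CANCELED = 4 and TERMINATED = 5 appear with values in this PR excerpt, and the other member names and values below are assumptions.

```go
package main

import "fmt"

// Status is a hypothetical Go mirror of NexusOperationExecutionStatus.
// Canceled (4) and Terminated (5) match the proto excerpt; the other names
// and values are assumptions for illustration.
type Status int

const (
	Unspecified Status = iota
	Running
	Completed // assumed
	Failed    // assumed
	Canceled
	Terminated
	TimedOut // assumed name; covers any of the operation's timeouts
)

// isTerminal reports whether s is a closed status.
func isTerminal(s Status) bool {
	return s >= Completed
}

// validTransition encodes UNSPECIFIED -> RUNNING -> [terminal state].
func validTransition(from, to Status) bool {
	switch from {
	case Unspecified:
		return to == Running // updated once when scheduled
	case Running:
		return isTerminal(to) // updated once more on reaching a terminal status
	default:
		return false // terminal statuses are final
	}
}

func main() {
	fmt.Println(validTransition(Unspecified, Running)) // true
	fmt.Println(validTransition(Running, Canceled))    // true
	fmt.Println(validTransition(Canceled, Running))    // false
}
```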

Member Author

@bergundy bergundy left a comment

@stephanos added a few more comments but this is almost good to merge AFAIC.

}

// A link to a standalone Nexus operation execution.
message NexusOperationExecution {
Member Author

Looks like the naming is inconsistent with BatchJob and WorkflowEvent. Neither of those use the term Execution. I'm on the fence whether this is acceptable or not.

NEXUS_OPERATION_EXECUTION_STATUS_CANCELED = 4;
// The operation was terminated. Termination happens immediately without notifying the handler.
NEXUS_OPERATION_EXECUTION_STATUS_TERMINATED = 5;
// The operation has timed out by reaching the specified schedule-to-close timeout.
Member Author

This could be any timeout now that we've added start-to-close and schedule-to-close.

// Requesting to cancel an operation does not automatically transition the operation to canceled status, depending
// on the current operation status and the cancelation type used.
NEXUS_OPERATION_EXECUTION_STATUS_CANCELED = 4;
// The operation was terminated. Termination happens immediately without notifying the handler.
Member Author

Note to future us: This comment may not be true if we add a Nexus equivalent of a "parent close policy".

// This is the only timeout settable for a Nexus operation.
// (-- api-linter: core::0140::prepositions=disabled
// aip.dev/not-precedent: "to" is used to indicate interval. --)
google.protobuf.Duration schedule_to_close_timeout = 8;
Member Author

Might as well add start_to_close_timeout and schedule_to_start_timeout already. Those are already supported by the server for workflow ops.


5 participants