Add Nexus worker service for server-to-worker commands #708

Open
rkannan82 wants to merge 14 commits into kannan/add-worker-instance-key-to-wft-complete from kannan/activity-cancel

Conversation

@rkannan82
Contributor

@rkannan82 rkannan82 commented Feb 3, 2026

Summary

Defines a Nexus service for server-to-worker communication, starting with activity cancellation support.

Design Decision

We chose a generic command API (WorkerCommandsRequest with oneof command types) instead of a cancel-specific API. This allows a future optimization that batches multiple commands (cancel, pause, etc.) into a single request and delivers them to a worker in one RPC.
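
To make the trade-off concrete, here is a minimal Python sketch of a batched request whose entries are one of several command variants. It is an illustration only, not the generated API: the class names mirror the proto messages loosely, `PauseActivity` is a hypothetical future variant that is not part of this PR, and the `describe` helper is invented for the example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class CancelActivity:
    """Hypothetical stand-in for the cancel command variant."""
    task_token: bytes


@dataclass(frozen=True)
class PauseActivity:
    """Hypothetical future variant; not defined in this PR."""
    task_token: bytes


# Plays the role of the proto `oneof`: each command is exactly one variant.
WorkerCommand = Union[CancelActivity, PauseActivity]


@dataclass
class WorkerCommandsRequest:
    """One request can carry a batch of mixed command types."""
    commands: List[WorkerCommand]


def describe(command: WorkerCommand) -> str:
    """Dispatch on the variant, as a worker might dispatch on the oneof case."""
    if isinstance(command, CancelActivity):
        return f"cancel:{command.task_token.hex()}"
    if isinstance(command, PauseActivity):
        return f"pause:{command.task_token.hex()}"
    raise TypeError(f"unknown command type: {type(command).__name__}")


request = WorkerCommandsRequest(
    commands=[CancelActivity(b"\x01"), PauseActivity(b"\x02")]
)
handled = [describe(c) for c in request.commands]
```

Adding a new command type in this shape means adding a variant and a dispatch branch, while the request envelope and the operation that carries it stay unchanged.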

Files

  • temporal/api/nexusservices/workerservice/v1/request_response.proto - Message definitions
  • temporal/api/nexusservices/workerservice/v1/service.yaml - Nexus service definition

Related

@rkannan82 rkannan82 requested review from a team as code owners February 3, 2026 21:30
@rkannan82 rkannan82 force-pushed the kannan/activity-cancel branch 3 times, most recently from 4b2e0d1 to 54edb8e Compare February 3, 2026 21:42
}

// Container for batching multiple control tasks delivered to a worker in one Nexus operation.
message WorkerControlTasks {
Contributor

IMO input to the worker Nexus service should provide the worker instance key the control call is targeting. This is a shared Nexus worker.

On a side note, I think we will want to implement a form of this end-to-end before we merge what we think the models needed will be.
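
A rough Python sketch of the routing concern raised in this comment, assuming one Nexus handler is shared by several worker instances in a process. Every name here (`ControlRequest`, `worker_instance_key` as a request field, `SharedControlHandler`) is hypothetical and only illustrates why the input would need to identify its target.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkerInstance:
    """Hypothetical worker that records the task tokens it was asked to cancel."""
    cancelled: List[bytes] = field(default_factory=list)


@dataclass(frozen=True)
class ControlRequest:
    """Hypothetical request shape that names the worker instance it targets."""
    worker_instance_key: str
    task_tokens: List[bytes]


class SharedControlHandler:
    """One handler serving every worker instance registered in this process."""

    def __init__(self) -> None:
        self._workers: Dict[str, WorkerInstance] = {}

    def register(self, key: str, worker: WorkerInstance) -> None:
        self._workers[key] = worker

    def handle(self, request: ControlRequest) -> bool:
        # Without the key in the input, the handler cannot tell which
        # registered worker the control call is meant for.
        worker = self._workers.get(request.worker_instance_key)
        if worker is None:
            return False
        worker.cancelled.extend(request.task_tokens)
        return True


worker_a, worker_b = WorkerInstance(), WorkerInstance()
handler = SharedControlHandler()
handler.register("worker-a", worker_a)
handler.register("worker-b", worker_b)
delivered = handler.handle(ControlRequest("worker-a", [b"token-1"]))
```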

Contributor

agree that we should have something in server and SDK working (in branches) with this definition before we merge

Contributor Author

@rkannan82 rkannan82 Feb 6, 2026

FYI: The WIP server side implementation is here: PR #9233.
I have a functional test that simulates SDK nexus poller and verifies it receives the cancel task.

Contributor Author

We have the server side implementation pending review in PR #9233. I have an e2e test that simulates the SDK behavior by polling on the control queue. Let me know if this suffices to unblock this PR.

@rkannan82 rkannan82 force-pushed the kannan/activity-cancel branch 2 times, most recently from afb40a2 to b920471 Compare February 4, 2026 05:43
@rkannan82 rkannan82 changed the title Add API to send control tasks from server to worker Add Nexus request and response payloads to cancel activities Feb 4, 2026
@rkannan82 rkannan82 requested a review from cretz February 4, 2026 05:50
@rkannan82 rkannan82 force-pushed the kannan/activity-cancel branch from b920471 to 84c0bac Compare February 4, 2026 05:51
@rkannan82 rkannan82 marked this pull request as draft February 4, 2026 05:55
@rkannan82 rkannan82 removed the request for review from ShahabT February 5, 2026 18:05
Contributor

@cretz cretz left a comment

LGTM, though would want to see end-to-end working including in an SDK if possible before approving

@rkannan82 rkannan82 force-pushed the kannan/activity-cancel branch from 84c0bac to 28ea1dd Compare February 11, 2026 21:20
@rkannan82 rkannan82 changed the base branch from master to kannan/add-worker-instance-key-to-wft-complete February 11, 2026 21:23
@rkannan82 rkannan82 requested review from bergundy and cretz February 12, 2026 18:56
@rkannan82 rkannan82 marked this pull request as ready for review February 12, 2026 19:39
@rkannan82 rkannan82 force-pushed the kannan/activity-cancel branch from 7d3bfb7 to e0d0049 Compare February 12, 2026 19:43
@rkannan82 rkannan82 force-pushed the kannan/add-worker-instance-key-to-wft-complete branch from 61952c9 to 98350a9 Compare February 12, 2026 19:46
Add WorkerControlTasks, WorkerControlTask, and CancelActivityTask to worker/v1/message.proto
for pushing control messages to workers via Nexus control queue.

- Move Nexus command payloads to worker_nexus_service_commands.proto
- Rename to CancelActivitiesRequestPayload/ResponsePayload
- Use task_tokens instead of workflow_execution + scheduled_event_ids
- Add file-level documentation for Nexus conventions

Adds a nexusrpc.yaml file defining the temporal.api.worker.v1.WorkerService
for server-to-worker communication via Nexus. Uses $goRef to reference
proto-generated types.
@rkannan82 rkannan82 force-pushed the kannan/activity-cancel branch from e0d0049 to 6c21af4 Compare February 12, 2026 19:47
Member

@bergundy bergundy left a comment

LGTM, don't merge this until you've generated the code and verified that this API works across SDK and server.

There will probably be a separate code generation project that will construct the YAML from protos to prevent the need to define services in separate places but this is a good start IMHO.

Contributor

@cretz cretz left a comment

This is our first Nexus service, so lots of things undecided. I added comments, but would like to hear others' opinions.

nexusrpc: 1.0.0

services:
  temporal.api.worker.v1.WorkerService:
Contributor

Not sure we have to follow our gRPC naming here. We don't have a best practice since this is the first one, but I think just WorkerControlService is fine or TemporalWorkerControlService (task queue is enough to keep things separated) and similar file name

Contributor Author

Roey recommended this: You should use temporal.api.workerservice.v1.WorkerService IMHO to be consistent with other services defined in this repo.

Contributor

@cretz cretz Feb 19, 2026

What other services defined in this repo? gRPC ones? To confirm, are we saying Nexus services have to somehow be consistent with gRPC services? Surely it's understood they are already not consistent because they are completely different protocols/contracts.

Nexus services should be consistent with Nexus services and if there are gaps in expected consistency as Nexus use grows, we remedy it, but it's not correct to pretend completely unrelated proto/gRPC's rules/approaches have any bearing on Nexus.

activities need to be cancelled (e.g., workflow cancellation or
explicit activity cancellation).
input:
  $goRef: "go.temporal.io/api/worker/v1.CancelActivitiesRequest"
Contributor

I think we should consider using JSON schema here and no protos (but if we must use protos, we should reference all languages)

Contributor Author

Updated to have ref for all languages.

@rkannan82 rkannan82 force-pushed the kannan/activity-cancel branch 2 times, most recently from 37ffb33 to 6c21af4 Compare February 27, 2026 22:19
Change from CancelActivitiesRequest to WorkerCommandsRequest with
repeated WorkerCommand using oneof type. This allows adding new
command types without changing the Nexus operation.

- Rename operation: cancel-activities -> execute-commands
- Request: WorkerCommandsRequest with WorkerCommand.CancelActivity
- Response: WorkerCommandsResponse with WorkerCommandResult.CancelActivityResult

Made-with: Cursor
…e/v1

New package: temporal.api.nexusservices.workerservice.v1

Made-with: Cursor
- Rename operation: execute-commands -> executeCommands
- Add refs for all languages: Go, Java, Python, TypeScript, .NET, Ruby

Made-with: Cursor
@rkannan82 rkannan82 changed the title Add Nexus request and response payloads to cancel activities Add Nexus worker service for server-to-worker commands Feb 27, 2026
@rkannan82 rkannan82 requested a review from bergundy February 27, 2026 22:56

// Response payload for the "executeCommands" Nexus operation.
message WorkerCommandsResponse {
  repeated WorkerCommandResult results = 1;
Member

We should document the expectation (at least, what I assume is the expectation, hence the need for a docstring) that this results list must be 1:1 with the request list: the same size and in the same order.
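
The 1:1 expectation described here can be sketched in Python as follows. This is an assumption about the intended contract rather than anything the PR defines: `Command`, `Result`, `execute_commands`, and `pair_results` are hypothetical stand-ins, with the worker side emitting one result per command in request order and the caller side matching results to commands by position.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Command:
    """Hypothetical stand-in for a single worker command."""
    name: str


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for a single command result."""
    ok: bool


def execute_commands(
    commands: List[Command], handler: Callable[[Command], bool]
) -> List[Result]:
    """Worker side: produce exactly one result per command, in request order."""
    return [Result(ok=handler(c)) for c in commands]


def pair_results(
    commands: List[Command], results: List[Result]
) -> List[Tuple[Command, Result]]:
    """Caller side: match by position, rejecting a response of the wrong size."""
    if len(results) != len(commands):
        raise ValueError(
            f"expected {len(commands)} results, got {len(results)}"
        )
    return list(zip(commands, results))


commands = [Command("a"), Command("b"), Command("c")]
results = execute_commands(commands, lambda c: c.name != "b")
pairs = pair_results(commands, results)
```

Because results carry no identifier of their own in this sketch, position is the only link back to the command, which is why the size and order guarantee would need to be stated in the docstring.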

Internal Nexus service for server-to-worker communication.
Used by the Temporal server to send commands to workers.
operations:
  executeCommands:
Member

Suggested change
- executeCommands:
+ ExecuteCommands:

message WorkerCommandsRequest {
  repeated WorkerCommand commands = 1;

  message WorkerCommand {
Member

I would pull this out to a top-level message and put it in temporal.api.worker.v1 so that it is easier to use and available in other contexts. Traditionally, we have only put requests and responses in request_response.proto files.

}
}

message CancelActivity {
Member

Same here: make this a top-level temporal.api.worker.v1.CancelActivityCommand.

message WorkerCommandsResponse {
  repeated WorkerCommandResult results = 1;

  message WorkerCommandResult {
Member

Same here.


message WorkerCommandResult {
  oneof type {
    CancelActivity cancel_activity = 1;
Member

temporal.api.worker.v1.CancelActivityResult

// --)

// Request payload for the "executeCommands" Nexus operation.
message WorkerCommandsRequest {
Member

If the API is called ExecuteCommands, the request and response should be named accordingly.

Suggested change
- message WorkerCommandsRequest {
+ message ExecuteCommandsRequest {


5 participants