☂️ Coherent update epic #638

### What you would like to be added?

Disaggregated inference splits LLM serving into distinct components that are deployed and scaled separately: prefill (context generation) and decode (token generation). This introduces a hard operational constraint during version upgrades, which are often incompatible across components. Only compatible prefill and decode instances should communicate, and the decode:prefill update ratio should be kept proportional; otherwise the update results in mismatched pools of compatible instances, which reduces the effective end-to-end serving capacity during the update.
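To make the capacity loss concrete, here is a minimal sketch of the arithmetic. It assumes a simplified model that is not part of this issue: one "serving unit" needs a fixed number of prefill and decode instances of the same version, and instances of different versions never pair. The names `Pools` and `effective_capacity` and the 1:2 prefill:decode ratio are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pools:
    """Instance counts per component and version (hypothetical model)."""
    prefill_old: int
    prefill_new: int
    decode_old: int
    decode_new: int


def effective_capacity(p: Pools, prefill_per_unit: int = 1, decode_per_unit: int = 2) -> int:
    """Count complete serving units, pairing only same-version instances."""
    old_units = min(p.prefill_old // prefill_per_unit, p.decode_old // decode_per_unit)
    new_units = min(p.prefill_new // prefill_per_unit, p.decode_new // decode_per_unit)
    return old_units + new_units


# Steady state: 4 prefill and 8 decode instances, all on the old version.
before = Pools(prefill_old=4, prefill_new=0, decode_old=8, decode_new=0)

# Uncoordinated update: half of prefill is updated, decode has not started.
skewed = Pools(prefill_old=2, prefill_new=2, decode_old=8, decode_new=0)

# Proportional update: half of each component is updated.
balanced = Pools(prefill_old=2, prefill_new=2, decode_old=4, decode_new=4)

print(effective_capacity(before), effective_capacity(skewed), effective_capacity(balanced))
```

In this toy model the skewed state strands the two new prefill instances (they have no compatible decode peers) and leaves four old decode instances without prefill peers, so capacity halves even though no instance is down.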



### Why is this needed?

The goal of the `coherent` update strategy is to maintain balanced, compatible capacity across components so that version upgrades (especially incompatible ones) do not reduce serving capacity.
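One possible reading of "balanced" is that every update step moves both components forward by the same fraction. The sketch below illustrates only that idea; the function name `coherent_steps` and its parameters are hypothetical, and it ignores surge, unavailability budgets, and readiness checks that a real controller would need.

```python
def coherent_steps(prefill_total: int, decode_total: int, steps: int) -> list[tuple[int, int]]:
    """Return cumulative (prefill_updated, decode_updated) targets per step.

    Both components advance by the same fraction i/steps, rounded up,
    so the updated pools stay roughly proportional at every step.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    plan = []
    for i in range(1, steps + 1):
        # Integer ceiling of total * i / steps, avoiding float rounding.
        prefill_done = (prefill_total * i + steps - 1) // steps
        decode_done = (decode_total * i + steps - 1) // steps
        plan.append((prefill_done, decode_done))
    return plan


# 4 prefill and 8 decode instances, updated in 4 steps.
print(coherent_steps(4, 8, 4))  # [(1, 2), (2, 4), (3, 6), (4, 8)]
```

When the totals do not divide evenly by the step count, rounding makes the ratio only approximate at intermediate steps.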
