BatchRunner: actor-pool execution mode for warm model loads #13

@korbonits

Follow-up from v0.6 scope-trimming. Deferred from the initial BatchRunner implementation (v0.6 v1 uses stateless map_batches(fn, ...) with a module-level backend cache inside worker processes).

What v0.6 v1 does

BatchRunner runs backend inference as a stateless Ray Data function. A module-level _BACKEND_CACHE inside each worker process means load() fires once per worker rather than once per batch — good enough for small/medium models.

What stateless misses

For models with a very expensive load() (FLUX ≈ 30-60s, SDXL ≈ 20-40s, GraphCast in a similar range), every new Ray worker process pays the full load cost. Under some schedules Ray Data can spin up many short-lived workers, and the load cost then dominates the total runtime.

Proposal: opt-in actor-pool mode

Add a BatchSpec.compute: Literal["tasks", "actors"] = "tasks" field (or similar).

When compute="actors":

  • BatchRunner.run() switches to ds.map_batches(_BackendActor, compute=ActorPoolStrategy(size=num_actors), ...)
  • _BackendActor.__init__ calls backend.load() once per actor
  • __call__ runs batch_predict per batch

New BatchSpec fields:

  • num_actors: int — pool size (independent of ResourceConfig.replicas, which is serving semantics)
  • Existing num_gpus / num_cpus → num_gpus_per_actor / num_cpus_per_actor
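One way the proposed fields could look, as a sketch only. The class below is a hypothetical subset of BatchSpec: the defaults and the validation rules in `__post_init__` are assumptions, not decisions made in this issue.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class BatchSpec:
    """Hypothetical subset of BatchSpec covering only the proposed fields."""

    compute: Literal["tasks", "actors"] = "tasks"
    num_actors: Optional[int] = None   # pool size; only used with compute="actors"
    num_gpus_per_actor: float = 0.0    # assumed default
    num_cpus_per_actor: float = 1.0    # assumed default

    def __post_init__(self) -> None:
        # Literal is not enforced at runtime, so check the value explicitly.
        if self.compute not in ("tasks", "actors"):
            raise ValueError(f"unknown compute mode: {self.compute!r}")
        if self.compute == "actors":
            if self.num_actors is None or self.num_actors < 1:
                raise ValueError("compute='actors' requires num_actors >= 1")
        elif self.num_actors is not None:
            raise ValueError("num_actors only applies when compute='actors'")
```

Keeping "tasks" as the default means existing specs behave as before, and the actor pool stays opt-in.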

Lifecycle wrinkles to handle

  • Cold start: the first batch on each actor blocks on load(). For FLUX that is roughly 30-60s per actor, going by the estimate above. Document this; users can pre-warm with a dummy batch if needed.
  • GPU reservation: each actor pins its num_gpus slot for its lifetime. Pool sizing needs clear docs (num_actors * num_gpus_per_actor <= cluster GPUs).
  • Shutdown hygiene: Ray Data kills actors at end of map_batches, but driver crashes can leave actors lingering. Handle with a finally-block cleanup or a periodic-sweep GC.
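The pool-sizing constraint from the GPU reservation point can be checked before the pool is created. The sketch below assumes a hypothetical `check_pool_fits` helper; the GPU count is passed in explicitly, and in a live cluster it could be read from `ray.cluster_resources().get("GPU", 0)`.

```python
def check_pool_fits(
    num_actors: int,
    num_gpus_per_actor: float,
    cluster_gpus: float,
) -> float:
    """Return the GPUs the pool would reserve, or raise if it cannot fit.

    Enforces num_actors * num_gpus_per_actor <= cluster GPUs.
    """
    required = num_actors * num_gpus_per_actor
    if required > cluster_gpus:
        raise ValueError(
            f"actor pool needs {required} GPUs "
            f"({num_actors} actors x {num_gpus_per_actor} each), "
            f"but the cluster has {cluster_gpus}"
        )
    return required
```

Failing fast here gives a clear error message up front, rather than leaving the sizing mistake to surface only once the job is running.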

When to ship

After the v1 JSONL pipeline is proven end-to-end and we've added a GPU-heavy backend to the batch test matrix (likely FLUX or SDXL smoke via Modal).

Labels: enhancement