
feat(training): add accelerator_type filtering to training shape selection#308

Open
hershalb wants to merge 1 commit into main from hb/filter-b200-shapes

hershalb (Contributor) commented Apr 7, 2026

Description

Add accelerator_type filtering to the training shape selector so jobs forced onto a specific accelerator (e.g. B200 via UseTrainingV2) fail fast when no matching shapes exist, instead of silently selecting the wrong accelerator's shapes and failing later with a cryptic 400.

Companion PR: https://github.com/fw-ai/fireworks/pull/21714 (updates training_shape_utils.py to thread accelerator_type from the job config)

Changes

  • ShapeSelectionRequest: new optional accelerator_type field (defaults to None, so existing callers are unaffected)
  • _build_latest_validated_training_shape_filter / _build_compatible_training_shape_filter: add a snapshot.accelerator_type clause to the server-side filter when accelerator_type is set
  • _compatible_training_shape_candidates: repeat the accelerator check client-side as a belt-and-suspenders safeguard
  • _select_training_shape_candidate: thread accelerator_type through to both filter builders
  • Split error reporting:
    • _format_internal_shape_error: full constraint details (base_model, accelerator_type, mode, seq_len), logged at ERROR
    • _format_user_facing_shape_error: generic message raised to users ("No training configuration available")
  • 4 new unit tests covering the server-side filter, the client-side filter, the mismatch error, and backward compatibility
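
For reviewers who want a quick picture of the two filtering layers listed above, here is a minimal sketch. It is not the code in this PR: the function names `build_shape_filter` and `filter_candidates`, the `base_model` field, the dict-shaped candidates, and the filter string syntax are all assumptions made for illustration. Only `ShapeSelectionRequest`, `accelerator_type`, and the `snapshot.accelerator_type` clause come from the change list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShapeSelectionRequest:
    # base_model is an assumed field; only accelerator_type is named in this PR.
    base_model: str
    # None means "no accelerator constraint", which preserves the old behavior.
    accelerator_type: Optional[str] = None


def build_shape_filter(request: ShapeSelectionRequest) -> str:
    """Build a server-side filter string (the syntax here is illustrative)."""
    clauses = [f'snapshot.base_model="{request.base_model}"']
    if request.accelerator_type is not None:
        # Only add the clause when the caller asked for a specific accelerator.
        clauses.append(f'snapshot.accelerator_type="{request.accelerator_type}"')
    return " AND ".join(clauses)


def filter_candidates(shapes: list[dict], request: ShapeSelectionRequest) -> list[dict]:
    """Client-side re-check in case the server ignores the accelerator clause."""
    if request.accelerator_type is None:
        return list(shapes)
    return [s for s in shapes if s.get("accelerator_type") == request.accelerator_type]
```

With `accelerator_type` left unset, both helpers behave as if the field did not exist, which is the backward-compatibility property the new tests are described as covering.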

Testing

All 12 tests in test_training_shapes.py pass.

Made with Cursor

…ction

Filter training shape candidates by accelerator_type (both server-side
and client-side) so jobs forced onto B200 fail fast when no B200 shapes
exist. Logs detailed diagnostics at ERROR level; raises a generic
user-facing message.

Made-with: Cursor
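
The commit message above describes logging detailed diagnostics at ERROR level while raising a generic message to users. A rough sketch of that split follows, under stated assumptions: `NoTrainingShapeError` and `fail_no_shape` are hypothetical names, the wording of the internal message is invented, and the helpers here only mirror the roles of `_format_internal_shape_error` and `_format_user_facing_shape_error` rather than reproduce them.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class NoTrainingShapeError(Exception):
    """Hypothetical exception type; the PR does not name the one it raises."""


def format_internal_shape_error(
    base_model: str, accelerator_type: Optional[str], mode: str, seq_len: int
) -> str:
    # Full constraint details, intended for logs only.
    return (
        f"no training shape matched: base_model={base_model!r}, "
        f"accelerator_type={accelerator_type!r}, mode={mode!r}, seq_len={seq_len!r}"
    )


def format_user_facing_shape_error() -> str:
    # Generic text quoted in the PR description; reveals no internal constraints.
    return "No training configuration available"


def fail_no_shape(
    base_model: str, accelerator_type: Optional[str], mode: str, seq_len: int
) -> None:
    """Log the detailed diagnostics, then raise the generic error."""
    logger.error(format_internal_shape_error(base_model, accelerator_type, mode, seq_len))
    raise NoTrainingShapeError(format_user_facing_shape_error())
```

Keeping the two formatters separate means the user-facing text can stay stable while the logged diagnostics grow as new constraints are added.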