feat(training): add accelerator_type filtering to training shape selection #308
Open
…ction Filter training shape candidates by accelerator_type (both server-side and client-side) so jobs forced onto B200 fail fast when no B200 shapes exist. Logs detailed diagnostics at ERROR level; raises a generic user-facing message. Made-with: Cursor
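A minimal sketch of the selection flow this commit describes: a server-side filter clause added only when `accelerator_type` is set, a client-side re-check, and a fail-fast error that logs full details but raises a generic message. The `ShapeSelectionRequest` and `TrainingShape` dataclasses here are simplified, hypothetical stand-ins. The filter-string syntax, the `"B200"`/`"H100"` labels, `build_shape_filter`, `select_shape`, `NoTrainingShapeError`, and the tie-break policy are all assumptions for illustration, not the PR's actual code.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional

logger = logging.getLogger("training_shapes")


@dataclass(frozen=True)
class ShapeSelectionRequest:
    # Hypothetical subset of the real request. accelerator_type defaults to
    # None so callers that never set it keep the old behavior.
    base_model: str
    mode: str
    seq_len: int
    accelerator_type: Optional[str] = None


@dataclass(frozen=True)
class TrainingShape:
    # Hypothetical shape record; field names are illustrative only.
    name: str
    base_model: str
    accelerator_type: str
    max_seq_len: int


class NoTrainingShapeError(RuntimeError):
    """Raised with a generic, user-safe message (hypothetical type)."""


def build_shape_filter(req: ShapeSelectionRequest) -> str:
    # Server-side filter. The clause syntax is an assumption; only the
    # snapshot.accelerator_type field name comes from the PR description.
    clauses = [f'base_model="{req.base_model}"']
    if req.accelerator_type is not None:
        clauses.append(f'snapshot.accelerator_type="{req.accelerator_type}"')
    return " AND ".join(clauses)


def select_shape(req: ShapeSelectionRequest, candidates: List[TrainingShape]) -> TrainingShape:
    # Client-side re-check (belt-and-suspenders): drop anything the server
    # returned that does not match, in case the server-side clause was ignored.
    compatible = [
        s for s in candidates
        if s.base_model == req.base_model
        and s.max_seq_len >= req.seq_len
        and (req.accelerator_type is None or s.accelerator_type == req.accelerator_type)
    ]
    if not compatible:
        # Full constraint details go to the ERROR log for operators...
        logger.error(
            "no training shape: base_model=%s accelerator_type=%s mode=%s seq_len=%d",
            req.base_model, req.accelerator_type, req.mode, req.seq_len,
        )
        # ...while the user only sees a generic message.
        raise NoTrainingShapeError("No training configuration available")
    # Hypothetical tie-break: the smallest shape that still fits seq_len.
    return min(compatible, key=lambda s: s.max_seq_len)
```

With only H100 candidates and `accelerator_type="B200"`, `select_shape` raises immediately instead of returning an H100 shape, which is the fail-fast behavior the commit is after. Leaving `accelerator_type` as `None` skips both the filter clause and the client-side check.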
Description
Add `accelerator_type` filtering to the training shape selector so jobs forced onto a specific accelerator (e.g. B200 via `UseTrainingV2`) fail fast when no matching shapes exist, instead of silently selecting the wrong accelerator's shapes and failing later with a cryptic 400.

Companion PR: https://github.com/fw-ai/fireworks/pull/21714 (updates `training_shape_utils.py` to thread `accelerator_type` from the job config)

Changes
- `ShapeSelectionRequest`: new optional `accelerator_type` field (backward-compatible default `None`)
- `_build_latest_validated_training_shape_filter` / `_build_compatible_training_shape_filter`: add a `snapshot.accelerator_type` clause to the server-side filter when set
- `_compatible_training_shape_candidates`: client-side accelerator check as a belt-and-suspenders safeguard
- `_select_training_shape_candidate`: thread `accelerator_type` through to both filter builders
- `_format_internal_shape_error`: full constraint details (`base_model`, `accelerator_type`, `mode`, `seq_len`) logged at ERROR level
- `_format_user_facing_shape_error`: generic message raised to users ("No training configuration available")

Testing
All 12 tests in `test_training_shapes.py` pass.

Made with Cursor
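The tests in `test_training_shapes.py` are not shown here. A test of the split between logged diagnostics and the raised message might look like the sketch below. `fail_no_shape`, `format_internal_shape_error`, and `format_user_facing_shape_error` are hypothetical stand-ins for `_format_internal_shape_error` and `_format_user_facing_shape_error`, and the logger name `"training_shapes"` is assumed. The real signatures are likely different.

```python
import logging
import unittest

logger = logging.getLogger("training_shapes")


def format_internal_shape_error(base_model, accelerator_type, mode, seq_len):
    # Hypothetical stand-in for _format_internal_shape_error: every constraint.
    return (
        f"no training shape: base_model={base_model} "
        f"accelerator_type={accelerator_type} mode={mode} seq_len={seq_len}"
    )


def format_user_facing_shape_error():
    # Hypothetical stand-in for _format_user_facing_shape_error: no internals.
    return "No training configuration available"


def fail_no_shape(base_model, accelerator_type, mode, seq_len):
    # Log everything for operators, raise only the generic message.
    logger.error(format_internal_shape_error(base_model, accelerator_type, mode, seq_len))
    raise ValueError(format_user_facing_shape_error())


class ShapeErrorSplitTest(unittest.TestCase):
    def test_details_logged_not_raised(self):
        with self.assertLogs("training_shapes", level="ERROR") as logs:
            with self.assertRaises(ValueError) as ctx:
                fail_no_shape("my-base-model", "B200", "lora", 4096)
        # Operators see the accelerator constraint in the ERROR log...
        self.assertIn("accelerator_type=B200", logs.output[0])
        # ...but the user-facing message does not leak it.
        self.assertEqual(str(ctx.exception), "No training configuration available")
        self.assertNotIn("B200", str(ctx.exception))
```

Run it with `python -m unittest` on the file. `assertLogs` fails the test if nothing is logged at ERROR or above, so it also guards against the diagnostics being dropped.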