Use operand format signatures for kernel selection #230
Replace KernelSpec.dtypes with format_signatures so registrations can describe role-specific operand tensor formats instead of a flat dtype set. Model each tensor role with TensorFormat and optional ScaleFormat, using storage_dtype consistently for payload and scale sidecar storage. Convert in-tree GEMM, attention, embedding, quantization, MoE, reference, and plugin registrations to the new FormatSignature API. Move quantized GEMM and MoE fused format choices out of selection traits, with MoE fused call sites passing weight_format explicitly. Update selection, numerics, benchmark, docs, and tests to resolve by full FormatSignature while preserving dtype-oriented convenience filters through primary_storage_dtype helpers. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
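A minimal sketch of why a flat dtype set is ambiguous where a role-specific signature is not. The class and helper names follow the commit message, but the fields shown, the string dtypes (used so the sketch runs without torch), and the tuple-based storage are assumptions; the real definitions in tokenspeed_kernel.signature may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical stand-ins for the classes named in the commit message.
@dataclass(frozen=True)
class ScaleFormat:
    storage_dtype: str  # dtype of the scale sidecar tensor


@dataclass(frozen=True)
class TensorFormat:
    format: str         # representation name, e.g. "dense" or "fp8"
    storage_dtype: str  # dtype of the payload tensor
    scale: Optional[ScaleFormat] = None


@dataclass(frozen=True)
class FormatSignature:
    # One TensorFormat per operand role, stored sorted by role name.
    roles: Tuple[Tuple[str, TensorFormat], ...]


def format_signature(**roles: TensorFormat) -> FormatSignature:
    """Build one concrete operand-format combination from role keywords."""
    return FormatSignature(tuple(sorted(roles.items(), key=lambda kv: kv[0])))


bf16 = TensorFormat("dense", "bfloat16")
fp8 = TensorFormat("fp8", "float8_e4m3fn", scale=ScaleFormat("float32"))

# Two different operand layouts...
sig_a = format_signature(x=bf16, weight=fp8)
sig_b = format_signature(x=fp8, weight=bf16)

# ...collapse to the same flat dtype set, which is what a dtypes field would see.
flat_a = {fmt.storage_dtype for _, fmt in sig_a.roles}
flat_b = {fmt.storage_dtype for _, fmt in sig_b.roles}
```

Here `flat_a == flat_b` while `sig_a != sig_b`, so only the signature can distinguish "BF16 activations with FP8 weights" from the reverse.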
Keep registry imports aligned with main where the imported symbol set did not change. This leaves only the functional KernelSpec import removal in numerics/inputs.py as a semantic change. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
Collapse parenthesized tokenspeed_kernel.signature imports when they import three or fewer symbols. Keep larger imports wrapped, and split the one long three-symbol import so formatter hooks do not rewrap it. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
Remove ScaleFormat.layout and let TensorFormat.format carry representation-specific names such as mxfp8, mxfp4, and nvfp4. Keep ScaleFormat focused on scale storage, granularity, and optional block shape. Use dense TensorFormat entries for scaled FP8 payloads, with ScaleFormat recording tensor or channel granularity. Update GEMM numerics generation to identify MXFP8 block scales from the tensor format instead of duplicated scale layout metadata. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
Restore fp8 as a distinct TensorFormat.format value for scaled FP8 GEMM and MoE signatures, while keeping ScaleFormat.layout removed and scale granularity normalized to tensor, channel, and block. Update test sample registrations to use register_kernel(..., signatures=...) instead of constructing KernelSpec(format_signatures=...) directly. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
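The split of responsibilities after these two commits can be sketched as follows: the format name lives on TensorFormat.format, while ScaleFormat carries only scale storage, granularity, and an optional block shape. This is an illustrative stand-in, not the in-tree code; the validation in `__post_init__`, the string dtypes, and the example block shape are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# The three normalized granularities named in the commit message.
_GRANULARITIES = ("tensor", "channel", "block")


@dataclass(frozen=True)
class ScaleFormat:
    storage_dtype: str
    granularity: str
    block_shape: Optional[Tuple[int, ...]] = None  # note: no `layout` field

    def __post_init__(self) -> None:
        # Assumed check: reject anything outside the normalized set.
        if self.granularity not in _GRANULARITIES:
            raise ValueError(f"unknown scale granularity: {self.granularity!r}")


@dataclass(frozen=True)
class TensorFormat:
    format: str  # carries representation names such as "fp8" or "mxfp8"
    storage_dtype: str
    scale: Optional[ScaleFormat] = None


# Scaled FP8 payloads differ only in scale granularity.
fp8_per_tensor = TensorFormat(
    "fp8", "float8_e4m3fn", ScaleFormat("float32", "tensor")
)
fp8_per_channel = TensorFormat(
    "fp8", "float8_e4m3fn", ScaleFormat("float32", "channel")
)

# Block-scaled formats are identified by the tensor format name; the scale dtype
# and block shape here are illustrative values only.
mxfp8 = TensorFormat(
    "mxfp8", "float8_e4m3fn", ScaleFormat("uint8", "block", block_shape=(128, 128))
)
```

Because the dataclasses are frozen and compare by value, `fp8_per_tensor` and `fp8_per_channel` are distinct formats even though their payload dtype is identical.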
Rename KernelSpec.format_signature_for_storage_dtype to format_signature_for_primary_storage_dtype so callers see that the helper matches the signature primary storage dtype used by dtype-oriented filters. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
Rename dense_format to dense_tensor_format so the helper name matches TensorFormat and the tensor_format helper. Update call sites and add examples to format_signature and format_signatures docstrings. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
Document that each FormatSignature is one concrete operand-format combination with one TensorFormat per role. Expand the helper examples to show the concrete signatures produced by format_signature and format_signatures. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
Clarify that warmup_selection without explicit ops picks one deterministic representative signature per registered operator and is not comprehensive. Direct model init code to pass exact format signatures and traits for hot-path warmup. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
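To make the "one deterministic representative per operator" caveat concrete, here is a hedged sketch. The registry shape, the simplified signature type, and the `representative_signatures` helper are hypothetical, and using `min` as the tie-break is an assumption; the actual rule used by warmup_selection is not stated in the commit message.

```python
from typing import Dict, List, Tuple

# Simplified signature: sorted (role, format name) pairs.
Signature = Tuple[Tuple[str, str], ...]


def representative_signatures(
    registry: Dict[str, List[Signature]],
) -> Dict[str, Signature]:
    """Pick one signature per operator, independent of registration order."""
    return {op: min(sigs) for op, sigs in sorted(registry.items()) if sigs}


dense_gemm: Signature = (("weight", "dense"), ("x", "dense"))
fp8_gemm: Signature = (("weight", "fp8"), ("x", "fp8"))

registry = {"gemm": [fp8_gemm, dense_gemm]}
picked = representative_signatures(registry)

# Only one signature per operator gets warmed up; the other is skipped.
skipped = [s for s in registry["gemm"] if s != picked["gemm"]]
```

In this sketch `skipped` still contains a registered signature, which is why hot-path warmup should pass the exact signatures and traits it needs rather than rely on the default.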
Remove the implicit [128, 128] fallback from GEMM numerics input generation so mxfp8 block-scaled signatures must provide block_shape metadata. Add regression coverage for missing block_shape and rename the Triton GEMM MXFP8 scale constant to emphasize that it describes block scale metadata. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
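The behavior change can be sketched as a helper that derives the scale tensor shape from explicit block metadata and refuses to guess. The function name, the 2-D block layout, and the use of ValueError are assumptions for illustration; the in-tree numerics code may structure this differently.

```python
from typing import Optional, Tuple


def mxfp8_scale_shape(
    rows: int, cols: int, block_shape: Optional[Tuple[int, int]]
) -> Tuple[int, int]:
    """Return the number of scale entries per dimension for a block-scaled tensor."""
    if block_shape is None:
        # Previously a [128, 128] default would have been substituted here.
        raise ValueError("mxfp8 block-scaled signature must provide block_shape")
    block_rows, block_cols = block_shape
    # Ceiling division so partially filled edge blocks still get a scale.
    return (-(-rows // block_rows), -(-cols // block_cols))


full = mxfp8_scale_shape(256, 512, (128, 128))
ragged = mxfp8_scale_shape(130, 128, (128, 128))
```

With an explicit block shape, a 256 x 512 tensor needs a 2 x 4 grid of scales, and a 130-row tensor rounds up to two row blocks. Passing `None` raises instead of silently assuming a shape.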
Document what each moe_fused weight_format value means, including main tensor storage, scale storage, and how dtype disambiguates uint8 activations. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
Add focused selection tests for exact mixed-operand signatures and kernels with multiple registered format signatures. Also keep optional backend placeholder helpers out of registry.__all__ while preserving explicit imports. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
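A rough sketch of what such tests exercise: selection matches on the full signature, a kernel may register several signatures, and a mixed-operand request matches only a kernel that registered exactly that combination. The `Kernel` record, the `sig` and `select` helpers, and the kernel names are hypothetical; only the NoKernelFoundError name comes from this pull request.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Simplified signature: sorted (role, format name) pairs.
Signature = Tuple[Tuple[str, str], ...]


def sig(**roles: str) -> Signature:
    return tuple(sorted(roles.items()))


class NoKernelFoundError(LookupError):
    """Stand-in for the selection error named in the review."""


@dataclass(frozen=True)
class Kernel:
    name: str
    signatures: FrozenSet[Signature]


def select(kernels: List[Kernel], requested: Signature) -> List[str]:
    """Return names of kernels that registered exactly the requested signature."""
    names = [k.name for k in kernels if requested in k.signatures]
    if not names:
        raise NoKernelFoundError(f"no kernel registered for {requested}")
    return names


kernels = [
    # One kernel, multiple registered signatures.
    Kernel("multi_gemm", frozenset({sig(x="dense", weight="dense"),
                                    sig(x="fp8", weight="fp8")})),
    # A kernel that supports only the mixed-operand combination.
    Kernel("mixed_gemm", frozenset({sig(x="dense", weight="fp8")})),
]

mixed_match = select(kernels, sig(x="dense", weight="fp8"))
fp8_match = select(kernels, sig(x="fp8", weight="fp8"))
```

Note that `multi_gemm` supports both dense and FP8 operands individually yet does not match the mixed request, because each signature is one concrete combination.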
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8d37eeb2fa
    format_signature(
        x=tensor_format("fp8", torch.float8_e4m3fn, scale=_FP8_SCALE),
        weight=tensor_format("fp8", torch.float8_e4m3fn, scale=_FP8_SCALE),
    ),
Register dense-activation FP8 MoE CUTLASS signatures
The new _CUTLASS_FUSED_FORMAT_SIGNATURES only registers FP8 weight support when x is also FP8 (x=tensor_format("fp8", ...)), but moe_fused(..., weight_format="fp8") builds a dense-activation signature whenever dtype is torch.bfloat16/torch.float16 in _moe_fused_format_signature (ops/moe/__init__.py). That means valid pre-routed FP8 calls (previously selectable via traits={"weight_dtype": "fp8"} and BF16/FP16 dtypes) no longer match any flashinfer_cutlass_fused_moe registration and will fail selection with NoKernelFoundError.
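The mismatch described in this comment can be reproduced in miniature. Everything below is a simplified model: `requested_signature` only imitates the dense-activation branch the review attributes to _moe_fused_format_signature, the tuple encoding and string dtypes are assumptions, and the "fix" merely shows the shape of the additional registration, not the actual patch.

```python
from typing import Set, Tuple

# Simplified signature: sorted (role, (format name, storage dtype)) pairs.
Signature = Tuple[Tuple[str, Tuple[str, str]], ...]

FP8 = ("fp8", "float8_e4m3fn")


def requested_signature(dtype: str, weight_format: str) -> Signature:
    """Imitate the call-site logic: BF16/FP16 dtypes yield dense activations."""
    if dtype in ("bfloat16", "float16"):
        x = ("dense", dtype)
    else:
        # Assumed fallback for this sketch: activations share the weight format.
        x = (weight_format, dtype)
    return (("weight", (weight_format, "float8_e4m3fn")), ("x", x))


# Registration as reviewed: FP8 weights only when x is also FP8.
registered: Set[Signature] = {(("weight", FP8), ("x", FP8))}

request = requested_signature("bfloat16", "fp8")
matches_before = request in registered

# Possible remedy: also register the dense-activation combinations.
registered_fixed = registered | {
    (("weight", FP8), ("x", ("dense", "bfloat16"))),
    (("weight", FP8), ("x", ("dense", "float16"))),
}
matches_after = request in registered_fixed
```

In this model `matches_before` is false, which corresponds to the NoKernelFoundError path, and `matches_after` is true once the dense-activation signatures are registered.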
@antiagainst please help fix the CI failures