Use operand format signatures for kernel selection by antiagainst · Pull Request #230 · lightseekorg/tokenspeed

antiagainst · 2026-05-23T20:06:16Z

Replace KernelSpec.dtypes with format_signatures so registrations can describe role-specific operand tensor formats instead of a flat dtype set. Model each tensor role with TensorFormat and optional ScaleFormat, using storage_dtype consistently for payload and scale sidecar storage.
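A minimal sketch of the role-based data model described above. It uses frozen dataclasses and plain string dtype names in place of torch dtypes so it runs without torch. TensorFormat, ScaleFormat, FormatSignature, storage_dtype, granularity, and block_shape come from this PR's text. The exact fields, the hypothetical `FormatSignature.of` constructor, and the sorted-role representation are illustrative assumptions, not the tokenspeed_kernel.signature API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScaleFormat:
    storage_dtype: str  # dtype of the scale sidecar, e.g. "float32"
    granularity: str  # "tensor", "channel", or "block"
    block_shape: Optional[Tuple[int, ...]] = None  # set only for block scales


@dataclass(frozen=True)
class TensorFormat:
    format: str  # representation name, e.g. "dense", "fp8", "mxfp8", "nvfp4"
    storage_dtype: str  # dtype of the payload tensor
    scale: Optional[ScaleFormat] = None


@dataclass(frozen=True)
class FormatSignature:
    # Hypothetical layout: one concrete TensorFormat per operand role, kept as
    # sorted (role, format) pairs so equal signatures compare and hash equally.
    roles: Tuple[Tuple[str, TensorFormat], ...]

    @classmethod
    def of(cls, **roles: TensorFormat) -> "FormatSignature":
        return cls(tuple(sorted(roles.items())))

    def __getitem__(self, role: str) -> TensorFormat:
        return dict(self.roles)[role]


# Example: scaled FP8 GEMM with a per-tensor activation scale and
# per-channel weight scales.
fp8_gemm = FormatSignature.of(
    x=TensorFormat("fp8", "float8_e4m3fn", ScaleFormat("float32", "tensor")),
    weight=TensorFormat("fp8", "float8_e4m3fn", ScaleFormat("float32", "channel")),
)
```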

Convert in-tree GEMM, attention, embedding, quantization, MoE, reference, and plugin registrations to the new FormatSignature API. Move quantized GEMM and MoE fused format choices out of selection traits, with MoE fused call sites passing weight_format explicitly.

Update selection, numerics, benchmark, docs, and tests to resolve by full FormatSignature while preserving dtype-oriented convenience filters through primary_storage_dtype helpers.
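To make "resolve by full FormatSignature" concrete, here is a hedged sketch of exact-signature selection plus a dtype-oriented convenience lookup. Signatures are simplified to sorted (role, (format, storage_dtype)) tuples. `select`, `signature`, and the `primary_role` field are hypothetical, `NoKernelFoundError` is a local stand-in for the error named in review, and treating the `x` payload dtype as the primary storage dtype is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# Simplified stand-in for FormatSignature: sorted (role, (format, storage_dtype)) pairs.
Signature = Tuple[Tuple[str, Tuple[str, str]], ...]


def signature(**roles: Tuple[str, str]) -> Signature:
    return tuple(sorted(roles.items()))


class NoKernelFoundError(LookupError):
    """Local stand-in for the selection error named in the review."""


@dataclass(frozen=True)
class KernelSpec:
    name: str
    format_signatures: FrozenSet[Signature]
    primary_role: str = "x"  # hypothetical: role whose dtype drives convenience filters

    def primary_storage_dtype(self, sig: Signature) -> str:
        return dict(sig)[self.primary_role][1]

    def format_signature_for_primary_storage_dtype(self, dtype: str) -> Optional[Signature]:
        # Sketch only: deterministic pick among signatures whose primary payload uses dtype.
        matches = sorted(
            s for s in self.format_signatures if self.primary_storage_dtype(s) == dtype
        )
        return matches[0] if matches else None


def select(specs: List[KernelSpec], sig: Signature) -> KernelSpec:
    # Every role must match; a single shared dtype is not enough.
    for spec in specs:
        if sig in spec.format_signatures:
            return spec
    raise NoKernelFoundError(f"no kernel registered for {sig}")


BF16 = ("dense", "bfloat16")
FP8 = ("fp8", "float8_e4m3fn")

gemm = KernelSpec(
    "example_gemm",
    frozenset({signature(x=BF16, weight=BF16), signature(x=BF16, weight=FP8)}),
)
# The mixed-operand request resolves because that exact combination is registered.
chosen = select([gemm], signature(x=BF16, weight=FP8))
```

Because the whole tuple must match, a kernel that registers BF16 x BF16 and BF16 x FP8 cannot be selected for an FP8 x FP8 request, a distinction a flat dtype set covering both BF16 and FP8 could not express.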


Keep registry imports aligned with main where the imported symbol set did not change. This leaves only the functional KernelSpec import removal in numerics/inputs.py as a semantic change. Signed-off-by: Lei Zhang <antiagainst@gmail.com>

Collapse parenthesized tokenspeed_kernel.signature imports when they import three or fewer symbols. Keep larger imports wrapped, and split the one long three-symbol import so formatter hooks do not rewrap it. Signed-off-by: Lei Zhang <antiagainst@gmail.com>

Remove ScaleFormat.layout and let TensorFormat.format carry representation-specific names such as mxfp8, mxfp4, and nvfp4. Keep ScaleFormat focused on scale storage, granularity, and optional block shape. Use dense TensorFormat entries for scaled FP8 payloads, with ScaleFormat recording tensor or channel granularity. Update GEMM numerics generation to identify MXFP8 block scales from the tensor format instead of duplicated scale layout metadata. Signed-off-by: Lei Zhang <antiagainst@gmail.com>

Restore fp8 as a distinct TensorFormat.format value for scaled FP8 GEMM and MoE signatures, while keeping ScaleFormat.layout removed and scale granularity normalized to tensor, channel, and block. Update test sample registrations to use register_kernel(..., signatures=...) instead of constructing KernelSpec(format_signatures=...) directly. Signed-off-by: Lei Zhang <antiagainst@gmail.com>

Rename KernelSpec.format_signature_for_storage_dtype to format_signature_for_primary_storage_dtype so callers see that the helper matches the signature primary storage dtype used by dtype-oriented filters. Signed-off-by: Lei Zhang <antiagainst@gmail.com>

Rename dense_format to dense_tensor_format so the helper name matches TensorFormat and the tensor_format helper. Update call sites and add examples to format_signature and format_signatures docstrings. Signed-off-by: Lei Zhang <antiagainst@gmail.com>

Document that each FormatSignature is one concrete operand-format combination with one TensorFormat per role. Expand the helper examples to show the concrete signatures produced by format_signature and format_signatures. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
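A sketch of how a `format_signatures` helper could expand per-role alternatives into concrete signatures, each with exactly one format per role. The Cartesian-product expansion, the tuple stand-ins, and the uint8 storage shown for mxfp4 are assumptions for illustration. The real helper may tie roles together rather than crossing every choice.

```python
from itertools import product
from typing import List, Sequence, Tuple

Fmt = Tuple[str, str]  # simplified stand-in for TensorFormat: (format, storage_dtype)
Signature = Tuple[Tuple[str, Fmt], ...]


def format_signature(**roles: Fmt) -> Signature:
    # One concrete operand-format combination: exactly one format per role.
    return tuple(sorted(roles.items()))


def format_signatures(**choices: Sequence[Fmt]) -> List[Signature]:
    # Assumed behavior: expand per-role alternatives into every concrete combination.
    roles = sorted(choices)
    return [
        format_signature(**dict(zip(roles, combo)))
        for combo in product(*(choices[role] for role in roles))
    ]


BF16 = ("dense", "bfloat16")
FP8 = ("fp8", "float8_e4m3fn")
MXFP4 = ("mxfp4", "uint8")  # illustrative: packed FP4 payload in uint8 storage

sigs = format_signatures(x=[BF16], weight=[FP8, MXFP4])
# -> [format_signature(x=BF16, weight=FP8), format_signature(x=BF16, weight=MXFP4)]
```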

Clarify that warmup_selection without explicit ops picks one deterministic representative signature per registered operator and is not comprehensive. Direct model init code to pass exact format signatures and traits for hot-path warmup. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
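The warmup caveat can be modeled in a few lines: without explicit ops, one deterministic signature per operator is selected and the rest stay cold, while passing exact signatures covers the hot path. `warmup`, `representative_signatures`, the `select` callback, and choosing the smallest signature as the representative are all hypothetical; traits are omitted for brevity.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Signature = Tuple[Tuple[str, Tuple[str, str]], ...]  # simplified stand-in


def representative_signatures(registry: Dict[str, Set[Signature]]) -> Dict[str, Signature]:
    # One deterministic pick per operator (the smallest in sorted order).
    # Every other registered combination is left cold.
    return {op: min(sigs) for op, sigs in sorted(registry.items()) if sigs}


def warmup(
    select: Callable[[str, Signature], object],
    registry: Dict[str, Set[Signature]],
    ops: Optional[Dict[str, List[Signature]]] = None,
) -> None:
    if ops is None:
        plan = {op: [sig] for op, sig in representative_signatures(registry).items()}
    else:
        plan = ops  # exact hot-path signatures supplied by model init
    for op, sigs in plan.items():
        for sig in sigs:
            select(op, sig)


BF16 = ("dense", "bfloat16")
FP8 = ("fp8", "float8_e4m3fn")
registry = {"gemm": {(("weight", BF16), ("x", BF16)), (("weight", FP8), ("x", BF16))}}

calls = []
warmup(lambda op, sig: calls.append((op, sig)), registry)
# Only one of the two registered gemm signatures was warmed.
```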

Remove the implicit [128, 128] fallback from GEMM numerics input generation so mxfp8 block-scaled signatures must provide block_shape metadata. Add regression coverage for missing block_shape and rename the Triton GEMM MXFP8 scale constant to emphasize that it describes block scale metadata. Signed-off-by: Lei Zhang <antiagainst@gmail.com>
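A hedged sketch of scale-shape computation for numerics inputs once the implicit [128, 128] fallback is gone: MXFP8 is recognized from `TensorFormat.format` (or block granularity), and a missing `block_shape` raises instead of guessing. The dataclass fields, the hypothetical `scale_shape` helper, the row-wise channel axis, and the (1, 32) MXFP8 block with E8M0 scale storage in the example are assumptions, not the repository's numerics code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScaleFormat:
    storage_dtype: str
    granularity: str  # "tensor", "channel", or "block"
    block_shape: Optional[Tuple[int, ...]] = None


@dataclass(frozen=True)
class TensorFormat:
    format: str
    storage_dtype: str
    scale: Optional[ScaleFormat] = None


def scale_shape(fmt: TensorFormat, shape: Tuple[int, int]) -> Optional[Tuple[int, int]]:
    """Shape of the scale sidecar for a 2-D operand, or None if unscaled."""
    if fmt.scale is None:
        return None
    rows, cols = shape
    if fmt.format == "mxfp8" or fmt.scale.granularity == "block":
        if fmt.scale.block_shape is None:
            # No implicit [128, 128] fallback: block-scaled formats must state their block.
            raise ValueError(f"{fmt.format} block scales require block_shape")
        bm, bn = fmt.scale.block_shape
        return (-(-rows // bm), -(-cols // bn))  # ceil division per dimension
    if fmt.scale.granularity == "tensor":
        return (1, 1)
    if fmt.scale.granularity == "channel":
        return (rows, 1)  # assumed: one scale per row (output channel)
    raise ValueError(f"unknown scale granularity {fmt.scale.granularity!r}")


mxfp8 = TensorFormat("mxfp8", "float8_e4m3fn", ScaleFormat("float8_e8m0fnu", "block", (1, 32)))
```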

Document what each moe_fused weight_format value means, including main tensor storage, scale storage, and how dtype disambiguates uint8 activations. Signed-off-by: Lei Zhang <antiagainst@gmail.com>

Add focused selection tests for exact mixed-operand signatures and kernels with multiple registered format signatures. Also keep optional backend placeholder helpers out of registry.__all__ while preserving explicit imports. Signed-off-by: Lei Zhang <antiagainst@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d37eeb2fa


chatgpt-codex-connector · 2026-05-24T16:46:18Z

+        format_signature(
+            x=tensor_format("fp8", torch.float8_e4m3fn, scale=_FP8_SCALE),
+            weight=tensor_format("fp8", torch.float8_e4m3fn, scale=_FP8_SCALE),
+        ),


Register dense-activation FP8 MoE CUTLASS signatures

The new _CUTLASS_FUSED_FORMAT_SIGNATURES only registers FP8 weight support when x is also FP8 (x=tensor_format("fp8", ...)), but moe_fused(..., weight_format="fp8") builds a dense-activation signature whenever dtype is torch.bfloat16/torch.float16 in _moe_fused_format_signature (ops/moe/__init__.py). That means valid pre-routed FP8 calls (previously selectable via traits={"weight_dtype": "fp8"} and BF16/FP16 dtypes) no longer match any flashinfer_cutlass_fused_moe registration and will fail selection with NoKernelFoundError.
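The failure mode in this review comment can be reproduced with a toy model of signature matching. `moe_fused_signature`, the string-encoded formats, and the two registration sets are hypothetical simplifications of `_moe_fused_format_signature` and `_CUTLASS_FUSED_FORMAT_SIGNATURES`; only the behavior (BF16/FP16 activations produce a dense-activation signature) is taken from the comment.

```python
from typing import FrozenSet, Tuple

# Simplified stand-in: (role, format) pairs, with dense formats tagged by dtype.
Sig = Tuple[Tuple[str, str], ...]
DENSE_DTYPES = ("bfloat16", "float16")


def moe_fused_signature(dtype: str, weight_format: str) -> Sig:
    # Models the call-site behavior described in the review: BF16/FP16 activations
    # yield a dense-activation signature even when weight_format is "fp8".
    x_format = f"dense/{dtype}" if dtype in DENSE_DTYPES else weight_format
    return (("weight", weight_format), ("x", x_format))


# Registration as described in the review: FP8 weights only with FP8 activations.
FP8_ONLY: FrozenSet[Sig] = frozenset({(("weight", "fp8"), ("x", "fp8"))})

# Suggested direction: also register FP8 weights with each dense activation dtype.
WITH_DENSE_X: FrozenSet[Sig] = FP8_ONLY | {
    (("weight", "fp8"), ("x", f"dense/{d}")) for d in DENSE_DTYPES
}
```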


lightseek-bot · 2026-05-24T18:54:40Z

@antiagainst please help fix the CI failures

antiagainst added 14 commits May 23, 2026 19:18

Restore registry import formatting

1c5b967


Rename dense tensor format helper

fe90cf3


Format

4faadec

Refine some comments

4dc5350

Clarify MoE fused weight formats

97c73b5


antiagainst marked this pull request as ready for review May 24, 2026 16:38

antiagainst requested a review from a team as a code owner May 24, 2026 16:38

chatgpt-codex-connector Bot reviewed May 24, 2026

antiagainst wants to merge 14 commits into main from lei/operand-precison
