[E2E Model Support] Add TileGym kernel integration for Llama 4 #97

@cogniera

Description

Overview

This issue tracks the addition of end-to-end TileGym kernel support for Meta's Llama 4 model family.

I plan to work on this and am opening this issue to avoid duplicated effort.

Approach

Llama 4 shares the core Llama architecture (RoPE, RMSNorm, SwiGLU MLP, GQA attention),
so the implementation will largely follow the existing Llama 3.1 integration pattern.
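For context, RMSNorm is one of the shared ops a kernel integration would replace. Below is a minimal pure-Python reference of the computation (not the TileGym kernel itself, which this issue does not specify), shown only to pin down the op being swapped:

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # Reference RMSNorm: y_i = x_i / sqrt(mean(x^2) + eps) * w_i
    # A fused kernel would compute the same thing in one GPU pass.
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]
```

With unit weights the output has RMS close to 1, which is a handy sanity check when validating a kernel replacement against the reference.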

Planned steps:

  1. Add apply_tilegym_kernel_to_llama4 in monkey_patch.py
  2. Register it in MODEL_TYPE_TO_APPLY_TILEGYM_FN
  3. Handle Llama 4-specific differences (e.g. MoE experts in the Maverick variant)
  4. Add E2E inference test
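Steps 1 and 2 could be sketched roughly as follows. The names `apply_tilegym_kernel_to_llama4` and `MODEL_TYPE_TO_APPLY_TILEGYM_FN` come from this issue; the function signature, flags, and dispatch helper are assumptions for illustration, not the actual TileGym API:

```python
def apply_tilegym_kernel_to_llama4(model=None, rope=True, rms_norm=True, swiglu=True):
    # Hypothetical patch function: in the real integration each enabled flag
    # would monkey-patch the corresponding HF module (RoPE, RMSNorm, SwiGLU)
    # with its TileGym kernel. Here we just report what would be patched.
    return [name for name, enabled in
            [("rope", rope), ("rms_norm", rms_norm), ("swiglu", swiglu)]
            if enabled]

# Registry mapping HF config.model_type strings to patch functions (step 2).
MODEL_TYPE_TO_APPLY_TILEGYM_FN = {
    "llama4": apply_tilegym_kernel_to_llama4,
}

def apply_tilegym_kernel(model_type, model=None, **kwargs):
    # Assumed dispatch helper: look up the patch function by model type.
    if model_type not in MODEL_TYPE_TO_APPLY_TILEGYM_FN:
        raise ValueError(f"No TileGym patch registered for {model_type!r}")
    return MODEL_TYPE_TO_APPLY_TILEGYM_FN[model_type](model, **kwargs)
```

Keeping Llama 4 behind the same registry as existing models means the E2E test (step 4) can reuse whatever harness drives the Llama 3.1 integration, just with a different `model_type` key.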

Questions / Discussion Points

  • Is there a preferred inference framework to target, or is HuggingFace Transformers the expected path?
  • Does the Maverick MoE variant require a new dispatch path, or can it reuse the existing MoE infrastructure from DeepSeek V2?
  • What is the target GPU for validation?

Happy to discuss the approach before diving in.
