[Feature] Support Speculative Decoding (MTP) #379

@wmertens

Summary

Gemma and other models support speculative decoding through multi-token prediction (MTP). It would be good to support it here.

Problem / Motivation

Token generation is slow. Speculative decoding could speed it up.

Proposed Solution

Add support for speculative decoding with MTP, as described in:
https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/
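
For anyone unfamiliar with the technique, here is a minimal sketch of greedy speculative decoding. It is not taken from the linked post and is not this project's API: `speculative_decode`, `greedy_decode`, the `NextToken` callable type, and the parameter `k` are all hypothetical names used for illustration. It assumes a cheap draft predictor and an expensive target model, each reduced to a function that returns the next token for a given sequence. With MTP the draft tokens would typically come from the model's own extra prediction heads rather than a separate model, but the accept/reject loop has the same shape.

```python
from typing import Callable, List

# Hypothetical model interface: token sequence in, greedy next token out.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; the result matches greedy_decode(target, ...)."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The draft predictor proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target verifies the proposal position by position.
        #    Assumption: a real engine scores all positions in a single
        #    batched forward pass, which is where the speedup comes from.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's token
            if expected != tok:
                break  # first mismatch: discard the remaining draft tokens
        out.extend(accepted)
    return out
```

The speedup depends on how often the draft agrees with the target: every accepted draft token is one that the target did not have to generate in its own sequential step, while a rejected token costs only the wasted draft work. The output itself never changes, since the target's token is always the one kept.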

Alternatives Considered

Platform

  • Android
  • iOS
  • Both

Additional Context

Labels: enhancement (New feature or request)
