Feature Request: Support ZAYA1-8B (Sparse MoE) and Markovian RSA Sampling #22776

@Juste-Leo2

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

ZAYA1-8B is a sparse Mixture-of-Experts (MoE) model developed by Zyphra, with 760M active parameters out of 8.4B total. It introduces a reasoning method called Markovian RSA (Rational Speech Acts) sampling, which optimizes the chain-of-thought during inference to improve logical accuracy. This request covers support for both the model architecture and the sampling method.

Motivation

Only 760M of its 8.4B parameters are active per token, so it is very sparse and well suited to local inference.
Its performance is state of the art for its size.
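
To put the sparsity claim in numbers, here is a rough back-of-the-envelope sketch using only the two parameter counts quoted in this issue. The bit widths are illustrative assumptions, not formats known to be available for this model, and the estimate covers weights only (no KV cache, activations, or quantization overhead).

```python
# Parameter counts as quoted in this issue.
TOTAL_PARAMS = 8.4e9    # all experts plus shared weights
ACTIVE_PARAMS = 760e6   # parameters used per token


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that take part in one forward pass."""
    return active / total


def weight_memory_gib(params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB at a given precision (weights only)."""
    return params * bits_per_weight / 8 / 2**30


print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")

# Hypothetical precisions, chosen only to illustrate the scaling.
for bits in (16, 8, 4):
    gib = weight_memory_gib(TOTAL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: about {gib:.1f} GiB")
```

Memory use scales with the total parameter count, since every expert has to be loaded, while per-token compute scales with the active count. That combination is what makes a model of this shape attractive on consumer hardware.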

Possible Implementation

Zyphra's GitHub fork of vLLM, at commit Zyphra/vllm@641dc7a, can serve as a reference.
Zyphra's Technical Report: https://www.zyphra.com/zaya1-8b-technical-report

Labels: enhancement (New feature or request)
