[new model] Add Zyphra/ZAYA1-8B #45862
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,63 @@ | ||
| <!--Copyright 2026 the HuggingFace Inc. team. All rights reserved. | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations under the License. | ||
|
|
||
| ⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be | ||
| rendered properly in your Markdown viewer. | ||
|
|
||
| --> | ||
| *This model was released on 2026-05-06 and added to Hugging Face Transformers on 2026-05-26.* | ||
|
|
||
| # ZAYA | ||
|
|
||
| ## Overview | ||
|
|
||
| ZAYA1 is a 760M active / 8.4B total parameter MoE language model trained by Zyphra. It combines Compressed | ||
| Convolutional Attention (CCA), a nonlinear ZAYA1 router, and residual scaling. | ||
|
|
||
| ZAYA1 uses the Gemma 3 tokenizer. For more details, see the [ZAYA1 model card](https://huggingface.co/Zyphra/ZAYA1-8B) | ||
| and Zyphra's technical reports. | ||
|
|
||
| This model was contributed by [JJJYmmm](https://github.com/JJJYmmm). | ||
|
|
||
| ## Usage examples | ||
|
|
||
| ```python | ||
| from transformers import AutoModelForCausalLM, AutoTokenizer | ||
|
|
||
| model_id = "Zyphra/ZAYA1-8B" | ||
|
Contributor
I guess we need another repo for these then since the weights need to be restructured 🤔 cc @nanduruganesh Best would be to have some other repo with the
My plan is to update "Zyphra/ZAYA1-8B" for this upstream merge and move the current checkpoint to "Zyphra/ZAYA1-8B-Legacy" to support people still on the old runtime (e.g. vLLM and llama.cpp currently still depend on the old checkpoint). |
||
| tokenizer = AutoTokenizer.from_pretrained(model_id) | ||
| model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto") | ||
|
|
||
| inputs = tokenizer.apply_chat_template( | ||
|     [{"role": "user", "content": "Write a haiku about recursion in programming."}], | ||
|     tokenize=True, | ||
|     add_generation_prompt=True, | ||
|     enable_thinking=False, | ||
|     return_dict=True, | ||
|     return_tensors="pt", | ||
| ) | ||
| inputs = inputs.to(model.device) | ||
| outputs = model.generate(**inputs, max_new_tokens=256) | ||
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) | ||
| ``` | ||
|
|
||
| ## ZayaConfig | ||
|
|
||
| [[autodoc]] ZayaConfig | ||
|
|
||
| ## ZayaModel | ||
|
|
||
| [[autodoc]] ZayaModel | ||
| - forward | ||
|
|
||
| ## ZayaForCausalLM | ||
|
|
||
| [[autodoc]] ZayaForCausalLM | ||
| - forward | ||
|
JJJYmmm marked this conversation as resolved.
|
|
|
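The Overview in the doc above describes ZAYA1 as a 760M active / 8.4B total parameter MoE, and the config later in this PR restricts `num_experts_per_tok` to 1. As a rough illustration of why only a fraction of the parameters is active per token, here is a minimal, generic top-1 routing sketch in plain Python. It is not ZAYA's nonlinear router: `top1_route`, `moe_forward`, the toy experts, and the choice to scale the expert output by its softmax gate are all assumptions made for illustration.

```python
import math


def top1_route(router_logits):
    """Return (expert_index, gate_weight) for one token from raw router logits."""
    # Numerically stable softmax over experts.
    max_logit = max(router_logits)
    exps = [math.exp(x - max_logit) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-1 routing: only the highest-probability expert runs for this token.
    expert = max(range(len(probs)), key=probs.__getitem__)
    return expert, probs[expert]


def moe_forward(token, router_logits, experts):
    """Run only the selected expert and scale its output by the gate weight."""
    expert, gate = top1_route(router_logits)
    return [gate * y for y in experts[expert](token)]


# 16 toy experts (hypothetical): expert k multiplies every feature by k + 1.
NUM_EXPERTS = 16
experts = [lambda x, k=k: [(k + 1) * v for v in x] for k in range(NUM_EXPERTS)]

logits = [0.0] * NUM_EXPERTS
logits[3] = 5.0  # make expert 3 the clear winner
expert, gate = top1_route(logits)
out = moe_forward([1.0, 2.0], logits, experts)
print(expert, round(gate, 3), out)
```

With one expert selected out of 16, the other 15 expert MLPs are never evaluated for that token, which is the general mechanism behind an "active" parameter count that is much smaller than the total.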
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| # Copyright 2026 Zyphra and The HuggingFace Inc. team. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| from typing import TYPE_CHECKING | ||
|
|
||
| from ...utils import _LazyModule | ||
| from ...utils.import_utils import define_import_structure | ||
|
|
||
|
|
||
| if TYPE_CHECKING: | ||
|     from .configuration_zaya import * | ||
|     from .modeling_zaya import * | ||
|
|
||
| else: | ||
|     import sys | ||
|
|
||
|     _file = globals()["__file__"] | ||
|     sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__) |
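The `__init__.py` above replaces the package entry in `sys.modules` with a lazy module so that the configuration and modeling submodules are only imported when one of their names is first accessed. The sketch below shows the general pattern using only the standard library. `TinyLazyModule` and the `tiny_lazy_demo` name are hypothetical stand-ins; the real `_LazyModule` and `define_import_structure` are not shown in this diff and do considerably more.

```python
import importlib
import sys
import types


class TinyLazyModule(types.ModuleType):
    """Hypothetical stand-in: resolves exported names on first attribute access."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Map each exported attribute to the module that defines it.
        self._attr_to_module = {
            attr: module for module, attrs in import_structure.items() for attr in attrs
        }
        self.__all__ = list(self._attr_to_module)

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. before the name is cached.
        if attr not in self._attr_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(self._attr_to_module[attr]), attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Register the lazy module under a demo name; nothing is resolved yet.
lazy = TinyLazyModule("tiny_lazy_demo", {"json": ["dumps"], "math": ["sqrt"]})
sys.modules["tiny_lazy_demo"] = lazy

import tiny_lazy_demo  # served straight from sys.modules

print(tiny_lazy_demo.sqrt(9.0))  # first access resolves and caches `sqrt`
```

The `if TYPE_CHECKING:` branch in the real file serves a different purpose: static type checkers see ordinary star imports, while the runtime path never executes them.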
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,129 @@ | ||
| # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨 | ||
| # This file was automatically generated from src/transformers/models/zaya/modular_zaya.py. | ||
| # Do NOT edit this file manually as any edits will be overwritten by the generation of | ||
| # the file from the modular. If any change should be done, please apply the change to the | ||
| # modular_zaya.py file directly. One of our CI enforces this. | ||
| # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨 | ||
| # Copyright 2026 Zyphra and the HuggingFace Inc. team. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| from typing import Any, Literal | ||
|
|
||
| from huggingface_hub.dataclasses import strict | ||
|
|
||
| from ...configuration_utils import PreTrainedConfig | ||
| from ...modeling_rope_utils import RopeParameters | ||
| from ...utils import auto_docstring | ||
|
|
||
|
|
||
| @auto_docstring(checkpoint="Zyphra/ZAYA1-8B") | ||
| @strict | ||
| class ZayaConfig(PreTrainedConfig): | ||
|     r""" | ||
|     lm_head_bias (`bool`, *optional*, defaults to `False`): | ||
|         Whether to add a bias to the language modeling head. | ||
|     router_hidden_size (`int`, *optional*, defaults to 256): | ||
|         Hidden size used by the ZAYA router. | ||
|     cca_time0 (`int`, *optional*, defaults to 2): | ||
|         First temporal parameter of the CCA projection. | ||
|     cca_time1 (`int`, *optional*, defaults to 2): | ||
|         Second temporal parameter of the CCA projection. | ||
|
|
||
|     ```python | ||
|     >>> from transformers import ZayaConfig, ZayaModel | ||
|
|
||
|     >>> configuration = ZayaConfig() | ||
|     >>> model = ZayaModel(configuration) | ||
|
|
||
|     >>> configuration = model.config | ||
|     ``` | ||
|     """ | ||
|
|
||
|     model_type = "zaya" | ||
|     keys_to_ignore_at_inference = ["past_key_values"] | ||
|
|
||
|     base_model_fsdp_plan = { | ||
|         "embed_tokens": "free_full_weight", | ||
|         "layers.*": "free_full_weight", | ||
|         "norm": "keep_full_weight", | ||
|     } | ||
|
|
||
|     vocab_size: int = 262272 | ||
|     hidden_size: int = 2048 | ||
|     num_hidden_layers: int = 40 | ||
|     num_attention_heads: int = 8 | ||
|     num_key_value_heads: int = 2 | ||
|     hidden_act: str = "silu" | ||
|     max_position_embeddings: int = 131072 | ||
|     initializer_range: float = 0.02 | ||
|     rms_norm_eps: float = 1e-5 | ||
|     use_cache: bool = True | ||
|     tie_word_embeddings: bool = True | ||
|     rope_parameters: RopeParameters | dict | None = None | ||
|     sliding_window: int | None = None | ||
|     attention_dropout: float | int = 0.0 | ||
|     moe_intermediate_size: int = 2048 | ||
|
|
||
|     num_experts_per_tok: int = 1 | ||
|     num_experts: int = 16 | ||
|     output_router_logits: bool = False | ||
|     layer_types: list[str] | None = None | ||
|     pad_token_id: int | None = 0 | ||
|     bos_token_id: int | None = 2 | ||
|     eos_token_id: int | list[int] | None = 106 | ||
|
|
||
|     # Zaya-specific attention | ||
|     head_dim: int = 128 | ||
|     attention_bias: bool = False | ||
|
|
||
|     lm_head_bias: bool = False | ||
|     router_hidden_size: int = 256 | ||
|     cca_time0: int = 2 | ||
|     cca_time1: int = 2 | ||
|
|
||
|     def __post_init__(self, **kwargs): | ||
|         self.layer_types = ["hybrid"] * self.num_hidden_layers if self.layer_types is None else list(self.layer_types) | ||
|
|
||
|         default_rope_params: dict[Literal["hybrid", "hybrid_sliding"], dict[str, Any]] = { | ||
|             "hybrid": { | ||
|                 "rope_type": "default", | ||
|                 "rope_theta": 5_000_000.0, | ||
|                 "partial_rotary_factor": 0.5, | ||
|             }, | ||
|             "hybrid_sliding": { | ||
|                 "rope_type": "default", | ||
|                 "rope_theta": 10_000.0, | ||
|                 "partial_rotary_factor": 0.5, | ||
|             }, | ||
|         } | ||
|         if self.rope_parameters is None: | ||
|             self.rope_parameters = default_rope_params | ||
|
|
||
|         super().__post_init__(**kwargs, ignore_keys_at_rope_validation={"hybrid", "hybrid_sliding"}) | ||
|
|
||
|     def convert_rope_params_to_dict(self, **kwargs): | ||
|         # No legacy flat RoPE format is supported here; conversion writes the nested ZAYA layer-type format directly. | ||
|         return kwargs | ||
|
|
||
|     def validate_architecture(self): | ||
|         """Part of ``@strict``-powered validation.""" | ||
|         if self.num_experts_per_tok != 1: | ||
|             raise ValueError("ZAYA currently supports `num_experts_per_tok=1` only.") | ||
|         if self.num_attention_heads % self.num_key_value_heads != 0: | ||
|             raise ValueError("`num_attention_heads` must be a multiple of `num_key_value_heads`.") | ||
|         if "hybrid_sliding" in self.layer_types and self.sliding_window is None: | ||
|             raise ValueError("`sliding_window` must be set when `layer_types` contains `hybrid_sliding`.") | ||
|
|
||
|
|
||
| __all__ = ["ZayaConfig"] |
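To make the defaulting and validation rules in `ZayaConfig` easier to review in isolation, here is a small standalone mirror that does not depend on `transformers`. `MiniZayaConfig` and `rotary_dim` are hypothetical names; the field defaults, RoPE defaults, and error messages are copied from the diff above. Two things are assumptions rather than facts from this PR: validation is called at the end of `__post_init__` here (the real class relies on the `@strict` machinery), and the rotated width is taken to be `head_dim * partial_rotary_factor`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniZayaConfig:
    """Hypothetical plain-dataclass mirror of a few ZayaConfig fields (not the real class)."""

    num_hidden_layers: int = 40
    num_attention_heads: int = 8
    num_key_value_heads: int = 2
    num_experts_per_tok: int = 1
    head_dim: int = 128
    sliding_window: Optional[int] = None
    layer_types: Optional[list] = None
    rope_parameters: Optional[dict] = None

    def __post_init__(self):
        # Same defaulting as the diff: every layer is "hybrid" unless told otherwise.
        if self.layer_types is None:
            self.layer_types = ["hybrid"] * self.num_hidden_layers
        else:
            self.layer_types = list(self.layer_types)
        # RoPE parameters are nested per layer type, with a different theta for each.
        if self.rope_parameters is None:
            self.rope_parameters = {
                "hybrid": {"rope_type": "default", "rope_theta": 5_000_000.0, "partial_rotary_factor": 0.5},
                "hybrid_sliding": {"rope_type": "default", "rope_theta": 10_000.0, "partial_rotary_factor": 0.5},
            }
        # Assumption: validate eagerly here; the real class defers to @strict.
        self.validate_architecture()

    def validate_architecture(self):
        # The three checks from the diff, with the same messages.
        if self.num_experts_per_tok != 1:
            raise ValueError("ZAYA currently supports `num_experts_per_tok=1` only.")
        if self.num_attention_heads % self.num_key_value_heads != 0:
            raise ValueError("`num_attention_heads` must be a multiple of `num_key_value_heads`.")
        if "hybrid_sliding" in self.layer_types and self.sliding_window is None:
            raise ValueError("`sliding_window` must be set when `layer_types` contains `hybrid_sliding`.")

    def rotary_dim(self, layer_type):
        # Assumption: number of rotated dims = head_dim * partial_rotary_factor.
        return int(self.head_dim * self.rope_parameters[layer_type]["partial_rotary_factor"])


config = MiniZayaConfig()
print(len(config.layer_types), config.rotary_dim("hybrid"))

try:
    # A sliding layer type without `sliding_window` should be rejected.
    MiniZayaConfig(num_hidden_layers=2, layer_types=["hybrid", "hybrid_sliding"])
except ValueError as err:
    print(err)
```

Under these assumptions, the default config rotates half of each 128-wide head, and mixing in `hybrid_sliding` layers requires an explicit `sliding_window`.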