Add CoPE (Clipped RoPE) soft-clipping for zero-shot context extension by machiabeli · Pull Request #1344 · Blaizzy/mlx-vlm

machiabeli · 2026-06-10T06:02:44Z

Add CoPE (Clipped RoPE) as an opt-in rope option

This adds the soft-clipping strategy from "CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs" (arXiv:2602.05258; reference implementation) to MRoPERotaryEmbedding. It enables zero-shot context extension past a model's native window via a config edit alone: no weight changes, and no behavior change for any existing config.

Text-side counterpart: ml-explore/mlx-lm#1387.

Method

RoPE extrapolation past the trained context fails primarily because of the lowest-frequency components — their rotation periods exceed the pre-training window, so they never complete a full rotation during training and go out-of-distribution on longer sequences. CoPE attenuates exactly those components with a raised-cosine (Hann) taper on inv_freq:

Single hook point — the taper is applied where inv_freq is computed in MRoPERotaryEmbedding.__init__, so it propagates through every mrope style (interleaved, chunked, sectioned) and the fused Metal apply path with no kernel changes. A fully clipped component is simply inv_freq=0 (identity rotation).
Auto-sized clip — every component with period 2π/inv_freq[i] greater than original_max_position_embeddings is clipped (overridable via clip_n). For the Qwen3.5/3.6 family (rope_theta=10M, 64 rotary dims, native 262144) this clips 10 of 32 components.
Smooth rolloff — the mask goes 1 → 0 across the clipped range: the boundary component is untouched, the lowest-frequency component is frozen, avoiding the spectral leakage / attention ringing of a hard cutoff. Unlike YaRN, in-distribution frequencies are untouched, so short-context behavior is preserved.
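The three points above can be sketched in plain Python. This is a minimal illustration, not the code from this PR: the helper names (rope_inv_freq, auto_clip_n, cope_taper) are hypothetical, and the exact taper formula, 0.5 * (1 + cos(pi * k / (n - 1))) over the n clipped components, is an assumption chosen to match the description (boundary component untouched, lowest-frequency component frozen).

```python
import math


def rope_inv_freq(theta, rotary_dim):
    """Standard RoPE inverse frequencies: theta ** (-2i / rotary_dim)."""
    half = rotary_dim // 2
    return [theta ** (-i / half) for i in range(half)]


def auto_clip_n(inv_freq, original_max_position_embeddings):
    """Count components whose period 2*pi/inv_freq exceeds the native window."""
    return sum(
        1 for f in inv_freq if 2 * math.pi / f > original_max_position_embeddings
    )


def cope_taper(inv_freq, clip_n):
    """Apply an assumed raised-cosine (Hann) taper to the last clip_n components.

    The mask runs from 1 (boundary component, unchanged) down to
    0 (lowest-frequency component, frozen at inv_freq = 0).
    """
    if not 0 <= clip_n <= len(inv_freq):
        raise ValueError("clip_n must be between 0 and len(inv_freq)")
    out = list(inv_freq)
    start = len(out) - clip_n
    for k in range(clip_n):
        # Assumption: a single clipped component keeps mask 1 (max() avoids 0/0).
        t = k / max(clip_n - 1, 1)
        out[start + k] *= 0.5 * (1.0 + math.cos(math.pi * t))
    return out


# Qwen3.5/3.6-style settings quoted above: rope_theta=10M, 64 rotary dims.
inv_freq = rope_inv_freq(1e7, 64)
clip_n = auto_clip_n(inv_freq, 262144)
tapered = cope_taper(inv_freq, clip_n)
print(clip_n)        # 10
print(tapered[-1])   # 0.0
```

With these settings the count reproduces the "10 of 32 components" figure: component 21 has a period of roughly 247K positions (inside the window), while component 22 has a period of roughly 408K (outside it).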

Qwen3_5RotaryEmbedding now forwards rope_parameters so the Qwen3.5/3.6 family (262K-native) picks this up directly from config.

Usage

"rope_parameters": {
  "rope_type": "cope",
  "original_max_position_embeddings": 262144
}
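For scripted setups, the same edit can be applied to a loaded config dict. The helper below is a hypothetical sketch (with_cope is not part of mlx-vlm), and where rope_parameters sits inside a particular model's config.json is an assumption: pass whichever sub-dict actually holds it. The optional clip_n key is the explicit override described in this PR.

```python
import json


def with_cope(config, original_max_position_embeddings, clip_n=None):
    """Return a copy of config with CoPE enabled in rope_parameters.

    Existing rope_parameters keys are preserved; the input dict is not mutated.
    """
    updated = dict(config)
    rope = dict(updated.get("rope_parameters") or {})
    rope["rope_type"] = "cope"
    rope["original_max_position_embeddings"] = original_max_position_embeddings
    if clip_n is not None:
        rope["clip_n"] = clip_n  # explicit override of the auto-sized clip
    updated["rope_parameters"] = rope
    return updated


# Illustrative input only; a real config carries many more fields.
base = {"rope_parameters": {"rope_theta": 10000000}}
patched = with_cope(base, 262144)
print(json.dumps(patched["rope_parameters"], indent=2))
```

Returning a copy rather than editing in place makes it easy to diff the patched config against the original before writing it back to disk.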

Testing

mlx_vlm/tests/test_cope_rope.py: auto clip sizing, head-frequency preservation, taper monotonicity, frozen tail, explicit clip_n / no-op / bounds, ValueError without sizing info, MRoPERotaryEmbedding integration (on and off), and the Qwen3_5RotaryEmbedding passthrough. All passing.
End-to-end on Qwen3.6-35B-A3B (4-bit), config-driven via rope_parameters exactly as a user would enable it: taper verified in the constructed rotary embeddings on every full-attention layer (inv_freq[-1] == 0), with coherent text and image generation through the interleaved mrope + fused Metal apply path (a zero-frequency component reduces to the identity rotation, so no kernel changes are needed). Note: the config pipeline normalizes rope_type -> type; the hook accepts both keys.
Empirical validation of the same clip on the text side (Qwen3.6-35B-A3B, native 262K): needle-in-a-haystack at 393K tokens passes where default RoPE fails outright; short-context next-token distribution preserved (KL 0.0067 at 8K); no speed cost — details in mlx-lm#1387.
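The config-handling cases listed in the tests (either key name, explicit clip_n, bounds, and the ValueError without sizing info) can be illustrated with a small resolver. This is a sketch under assumptions, not the hook from this PR: resolve_cope_clip_n is a hypothetical name, and the precedence of clip_n over auto-sizing is inferred from "overridable via clip_n".

```python
import math


def resolve_cope_clip_n(rope_parameters, inv_freq):
    """Return the number of trailing components to clip, or None if CoPE is off."""
    params = rope_parameters or {}
    # The config pipeline may normalize rope_type -> type, so accept both keys.
    kind = params.get("rope_type", params.get("type"))
    if kind != "cope":
        return None

    clip_n = params.get("clip_n")
    if clip_n is not None:
        if not 0 <= clip_n <= len(inv_freq):
            raise ValueError("clip_n must be between 0 and len(inv_freq)")
        return clip_n

    window = params.get("original_max_position_embeddings")
    if window is None:
        raise ValueError(
            "cope needs clip_n or original_max_position_embeddings to size the clip"
        )
    # Auto-size: clip every component whose period exceeds the native window.
    return sum(1 for f in inv_freq if f > 0 and 2 * math.pi / f > window)


inv_freq = [1e7 ** (-i / 32) for i in range(32)]
print(resolve_cope_clip_n({"type": "cope", "original_max_position_embeddings": 262144}, inv_freq))  # 10
```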

black clean.

Implements the soft-clipping strategy from "CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs" (arXiv:2602.05258) as an opt-in rope_parameters option.

Low-frequency components whose rotation periods exceed the pre-training context window are attenuated with a raised-cosine taper applied to inv_freq inside MRoPERotaryEmbedding, so it propagates through every mrope style (interleaved, chunked, sectioned) and the fused Metal apply path with no kernel changes.

Enabled via "rope_parameters": {"rope_type": "cope", "original_max_position_embeddings": N}, with an optional explicit "clip_n". No behavior change for any existing config.

Wired through Qwen3_5RotaryEmbedding for the Qwen3.5/3.6 family (e.g. 262K-native models extended toward 1M).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

machiabeli · 2026-06-10T07:44:24Z

Evidence grid for the shared clip math (perplexity deltas + depth × length NIAH, zero-shot on Qwen3.6-35B-A3B): ml-explore/mlx-lm#1387 (comment) — the inv_freq produced there is bit-identical to this implementation.

lucasnewman · 2026-06-11T15:20:01Z

@machiabeli Are there any models that use this in their config today? Or is this just speculative?

machiabeli · 2026-06-14T05:43:24Z

@machiabeli Are there any models that use this in their config today? Or is this just speculative?

I used it successfully with Qwen3.6-35B-A3B, extending the context window to approximately 400K tokens with near-zero loss.

machiabeli wants to merge 1 commit into Blaizzy:main from machiabeli:feat/cope-rope
