- [02/05/2026] CoPE was released on arXiv!
CoPE is a plug-and-play enhancement of RoPE that softly clips the unstable low-frequency components, delivering consistent gains both within the training context and during long-context extrapolation.
With a simple yet effective soft-clipping strategy, CoPE:
1️⃣ Eliminates severe OOD outliers: components whose periods exceed the pre-training context window are the primary cause of out-of-distribution failures during extrapolation.
2️⃣ Refines long-range semantic signals by alleviating the hidden long-term decay of semantic attention introduced by RoPE.
3️⃣ Prevents spectral leakage induced by hard frequency truncation, which otherwise causes long-range oscillatory ringing in the attention scores across relative token distances and introduces spurious correlations.
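To make the intuition above concrete, here is a minimal numerical sketch of softly clipping RoPE's low frequencies. The softplus-based smooth maximum, the `sharpness` parameter, and the exact cutoff are illustrative assumptions on our part, not the formulation from the paper; see the released code for CoPE's actual definition.

```python
import numpy as np

def rope_inv_freq(dim, base=10000.0):
    # Standard RoPE inverse frequencies: theta_i = base^(-2i/d)
    return base ** (-np.arange(0, dim, 2) / dim)

def soft_clip_inv_freq(inv_freq, train_ctx=8192, sharpness=10.0):
    # Smoothly raise every frequency below the cutoff f_min = 2*pi/train_ctx
    # toward f_min, so that no rotary component has a period longer than the
    # pre-training window. A softplus-based smooth maximum avoids the abrupt
    # spectral edge that a hard max() would create. (Illustrative form only.)
    f_min = 2 * np.pi / train_ctx
    beta = sharpness / f_min  # controls how sharp the transition is
    return f_min + np.logaddexp(0.0, beta * (inv_freq - f_min)) / beta

inv_freq = rope_inv_freq(128)           # head dim 128, as in Llama-3-8B
clipped = soft_clip_inv_freq(inv_freq)  # 8k pre-training window, as in Llama-3-8B
# High frequencies pass through almost unchanged, while sub-cutoff
# frequencies are pinned just above 2*pi/8192.
```

In this sketch, the components a hard clip would snap to the cutoff are instead bent toward it smoothly, which is the property that avoids the ringing described in point 3️⃣.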
All our models and data are released on Hugging Face, including RoPE, HardClip, and CoPE checkpoints (64k) obtained via continued pre-training and SFT, starting from Llama-3-8B (8k). [Link]
Our training code is based on ProLong. We recommend using two separate environments for training and evaluation to avoid dependency conflicts.
- Set up the training environment.
cd train/
bash setup_env.sh
- Download training data.
git clone https://huggingface.co/datasets/haoranli-ml/prolong-data-64K datasets/long-context-65536
git clone https://huggingface.co/datasets/haoranli-ml/prolong-ultrachat-64K datasets/prolong-ultrachat-64K
- Start training.
bash train_64k.sh
bash train_sft.sh
The default method during training is CoPE; you can easily switch to HardClip or vanilla RoPE by modifying CoPE/train/training/modeling_flash_llama_cope.py.
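For orientation, the three variants can be viewed as different choices of the inverse-frequency vector fed to the rotary embedding. The dispatcher below is a hypothetical illustration (the function name, the clip-to-cutoff rule for HardClip, and the softplus soft clip for CoPE are our assumptions); the actual switch lives in the modeling file above.

```python
import numpy as np

def make_inv_freq(dim, variant="cope", base=10000.0, train_ctx=8192):
    # Illustrative dispatcher for the three positional-encoding variants.
    # (Hypothetical sketch; not the repository's real implementation.)
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    f_min = 2 * np.pi / train_ctx  # slowest frequency whose full period fits in training
    if variant == "rope":          # vanilla RoPE: leave all frequencies alone
        return inv_freq
    if variant == "hardclip":      # hard truncation at the cutoff (assumed form)
        return np.maximum(inv_freq, f_min)
    if variant == "cope":          # soft clip via a softplus smooth maximum (assumed form)
        beta = 10.0 / f_min
        return f_min + np.logaddexp(0.0, beta * (inv_freq - f_min)) / beta
    raise ValueError(f"unknown variant: {variant}")
```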
We primarily conduct evaluations on the HELMET benchmark, given its diverse collection of real-world tasks, while also including results on synthetic benchmarks such as RULER and InfiniteBench. For standard short-context benchmark evaluations, we utilize lm-evaluation-harness.
- Set up the evaluation environment.
bash setup_eval_env.sh
- Replace
CoPE/transformers-4.50.0/src/transformers/models/llama/modeling_llama.py
with CoPE/modeling_cope.py, keeping the file name modeling_llama.py. Note that you must modify LlamaRotaryEmbedding in the modeling file to select CoPE, RoPE, or HardClip.
@article{li2026cope,
title={CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs},
author={Li, Haoran and Ren, Sucheng and Yuille, Alan and Wang, Feng},
journal={arXiv preprint arXiv:2602.05258},
year={2026}
}

