This is the official implementation of our paper “ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought”.
While Chain-of-Thought (CoT) significantly enhances the performance of Large Language Models (LLMs), explicit reasoning chains introduce substantial computational redundancy. Recent latent reasoning methods attempt to mitigate this by compressing reasoning processes into latent space, but they often suffer from severe performance degradation due to the lack of appropriate compression guidance. In this study, we propose Rendered CoT-Guided variational Latent Reasoning (ReGuLaR), a simple yet novel latent learning paradigm that resolves this issue. Fundamentally, we formulate latent reasoning within the Variational Auto-Encoding (VAE) framework, sampling the current latent reasoning state from a posterior distribution conditioned on the previous ones. Specifically, when learning this variational latent reasoning model, we render explicit reasoning chains as images, from which we extract dense visual-semantic representations to regularize the posterior distribution, thereby achieving efficient compression with minimal information loss. Extensive experiments demonstrate that ReGuLaR significantly outperforms existing latent reasoning methods in both computational efficiency and reasoning effectiveness, and even surpasses CoT through multi-modal reasoning, providing a new and insightful solution to latent reasoning.
Note: the LLM trained by ReGuLaR still follows the standard latent reasoning paradigm, accepting pure text inputs and imposing no extra computational cost during inference.
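To make the variational formulation above concrete, here is a minimal, self-contained sketch (not the paper's actual implementation) of one latent reasoning step: the latent state is sampled from a Gaussian posterior via the reparameterization trick, and a closed-form KL term pulls the posterior toward the rendered-CoT visual-semantic representation. All shapes, the unit-variance prior, and the random placeholder inputs are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, logvar):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_visual_prior(mu, logvar, v):
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(v, I) ): regularizes the
    # posterior toward the visual-semantic representation v extracted from
    # the rendered reasoning chain (simplified unit-variance prior assumed).
    return 0.5 * np.sum(np.exp(logvar) + (mu - v) ** 2 - 1.0 - logvar)

# Toy step: in the real model, mu and logvar would be predicted by the LLM
# conditioned on previous latent states; here they are random placeholders.
d = 8
mu, logvar = rng.standard_normal(d), np.zeros(d)
v = rng.standard_normal(d)       # pre-extracted visual-semantic vector
z = sample_latent(mu, logvar)    # next latent reasoning state
loss = kl_to_visual_prior(mu, logvar, v)
```

When the posterior mean matches the visual representation and the variance matches the prior, the KL term vanishes, which is the sense in which the rendered CoT guides the compression.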
- SOTA Performance: ReGuLaR significantly outperforms existing latent reasoning methods, achieving state-of-the-art performance with minimal reasoning length.
- Extreme Compression: Even when compressing all reasoning information into one latent reasoning state, ReGuLaR maintains superior performance across all model scales and datasets.
- Multi-Modal Reasoning: By rendering non-textual elements alongside text, ReGuLaR natively supports multi-modality within its latent reasoning processes, enabling it to surpass explicit CoT in complicated reasoning scenarios.
We have provided an env.yml file that contains the necessary environment dependencies. To set up your environment, please execute:

```
conda env create -f env.yml
conda activate ReGuLaR
```

Please download the required models from HuggingFace using the following script:
```
cd models
python model_download.py <YOUR_ACCESS_TOKEN>
```

ReGuLaR is designed to be compatible with any reasoning dataset, as long as each data sample within the dataset is formatted as the following JSON schema:
```json
{
  "image_idx": "Unique identifier for subsequent rendering",
  "question": "Problem statement",
  "steps": "Reasoning chain",
  "answer": "Final answer"
}
```

For reference, the GSM8K-Aug dataset has been provided in the ./datasets folder; please unzip it before use.
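As a quick sanity check before converting your own dataset, a sample can be validated against the schema above with a few lines of Python (the helper name and the toy sample below are illustrative, not part of the released code):

```python
import json

# The four fields required by the schema above.
REQUIRED_KEYS = {"image_idx", "question", "steps", "answer"}

def validate_sample(sample: dict) -> bool:
    # A sample is usable if it contains every required field.
    return REQUIRED_KEYS.issubset(sample)

# A toy sample in the expected format.
sample = json.loads("""{
  "image_idx": "example_0001",
  "question": "What is 2 + 2?",
  "steps": "2 + 2 = 4",
  "answer": "4"
}""")
```

Samples missing any of the four fields should be filtered out or repaired before rendering.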
Since the rendering function is predefined and the visual encoder remains frozen in our work, we pre-compute visual representations offline before training, thereby reducing computational overhead.
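The offline pre-computation described above amounts to a simple loop: render each reasoning chain to an image once, encode it with the frozen visual encoder, and cache the result keyed by `image_idx`. The sketch below illustrates this flow only; the function names, placeholder render/encode callables, and in-memory output are assumptions, not the repository's actual scripts.

```python
import io
import numpy as np

def precompute_representations(samples, render_fn, encode_fn, out_file):
    # Render each CoT once, encode it with the (frozen) visual encoder,
    # and cache the dense representation keyed by image_idx.
    cache = {}
    for s in samples:
        image = render_fn(s["steps"])            # rasterize the reasoning chain
        cache[s["image_idx"]] = encode_fn(image)
    np.savez(out_file, **cache)                  # persist for reuse at train time
    return cache

# Stand-in render/encode functions, for illustration only.
samples = [{"image_idx": "ex_0", "steps": "1 + 1 = 2"}]
buf = io.BytesIO()
cache = precompute_representations(
    samples,
    render_fn=lambda text: np.zeros((32, 32)),   # placeholder "image"
    encode_fn=lambda img: img.mean(axis=0),      # placeholder "encoder"
    out_file=buf,
)
```

Because the encoder is frozen, this cache never needs to be recomputed across training runs.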
```
cd data_precessing
python image_render.py GSM8K-Aug
python representation_extract.py GSM8K-Aug
```

To train the model, run:

```
bash train.sh GSM8K-Aug
```

To evaluate a trained checkpoint, run:

```
python run.py \
  --test_ckpt_path=/path/to/trained/model.ckpt \
  dataset_name=GSM8K-Aug
```

We extend our sincere gratitude to CoLaR, DeepSeek-OCR, and Glyph for their great work and codebases, which served as the foundation for developing ReGuLaR.
If you find ReGuLaR useful in your research, please consider citing it. 😊
```bibtex
@article{Wang2026ReGuLaR,
  title={ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought},
  author={Wang, Fanmeng and Liu, Haotian and Zhao, Guojiang and Xu, Hongteng and Gao, Zhifeng},
  journal={arXiv preprint arXiv:2601.23184},
  year={2026}
}
```


