ViLM

A research project into generative models that refine text using Vi command sequences.

ViLM is not an editor. It is a research project to design a new type of generative model.

The core idea is to move beyond standard autoregressive generation (which is forward-only) and create a model that generates text by refining it, much like a human writer. This model learns to build its output by generating a sequence of Vi commands, allowing it to go back, delete, insert, and move.

The Core Concept: Vi-based "Diffusion"

This approach is inspired by diffusion models, where an output is iteratively refined.

Standard LLM: prompt -> token 1 -> token 2 -> token 3... (Cannot go back)
ViLM: prompt -> iHello<Esc> -> Oworld<Esc> -> ggkP<Esc> -> ...

The ViLM model uses a standard transformer architecture, but it is trained as a stateful agent. It learns a policy for editing a text buffer. The "refinement steps" are Vi commands, which let the model navigate and edit its own output non-linearly.

The goal is to research a model that can refine text, fix its own mistakes, and "think" in terms of actions rather than just predicting the next token.

📁 Project Structure

This repository is organized logically across environment, model, and data:

./vi_gym/: The Gym Environment. A minimal, from-scratch Rust application simulating the Vi editor deterministically. It exposes an HTTP API (/init_session, /get_state, /act) rendering an XML-like observation of the notepad and cursor, and accepts Vi actions. This acts as our low-latency backend.
./train_utils/: The Agent & Training Scripts. Python tools for the local inference loop connecting to the Rust server (agent.py), causal dataset generation for ASCII art tasks (generate_causal_dataset.py), and Supervised Fine-Tuning.
./data/: Datasets. Stores raw and processed training outputs, like our structural ASCII learning dataset.
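
A minimal sketch of how a Python client might address the three endpoints listed above. Only the endpoint paths come from this README. The host and port, the HTTP methods, the JSON payload fields (`session_id`, `command`), and the helper names `build_request` and `call` are all assumptions made for illustration, so check the Rust server in ./vi_gym/ for the actual contract.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the README does not state a host or port for the Rust server.
BASE_URL = "http://localhost:8080"


def build_request(endpoint: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a request for /init_session, /get_state, or /act.

    Assumption: calls without a payload are GET, calls with a payload are
    POST with a JSON body. The real server may expect something else.
    """
    url = f"{BASE_URL}{endpoint}"
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(endpoint: str, payload: Optional[dict] = None, timeout: float = 5.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(endpoint, payload), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, a session might then look like `call("/init_session", {})`, followed by `call("/act", {"session_id": "...", "command": "3G"})` and `call("/get_state")`, again under the assumed payload shape.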

🚀 Current Status & Timeline

We are currently building in phases. Phase 1 (SFT) is complete.

Phase 1: Supervised Fine-Tuning (Complete)

We trained an experimental 0.8B-parameter model (ViLM-0.8b), fine-tuned from Qwen/Qwen3.5-0.8B-Base. It serves as a behavioral-cloning baseline for grid navigation on ASCII-based manipulation tasks.

Successes: The model learned the strict grammar of Vi. It can navigate (h, j, k, l), use counts (10j, 12o), and it handles the relationship between Insert mode (i, a, o) and leaving it with <Esc>. It also developed an emergent "Canvas Builder" routine, autonomously opening blank lines and padding them with spaces to scaffold drawings.
Limitations: As a 0.8B SFT model, it lacks zero-shot spatial reasoning. It often falls into "macro-loops" (e.g., trying to move up when already at the top boundary): the deterministic environment rejects the move and returns an identical state, so under greedy decoding the model sees the same input and repeats the same command. As a current mitigation, inference uses temperature sampling.
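
Because a rejected move returns an identical state, a macro-loop can be spotted by comparing consecutive observations. The sketch below is one hedged way to do that in the inference loop. `StallDetector` and its `patience` parameter are hypothetical names and are not part of agent.py.

```python
from collections import deque


class StallDetector:
    """Flags a stall when the last `patience` observations are all identical."""

    def __init__(self, patience: int = 3):
        if patience < 2:
            raise ValueError("patience must be at least 2")
        # Keep only the most recent `patience` observations.
        self.recent = deque(maxlen=patience)

    def update(self, observation: str) -> bool:
        """Record one observation and report whether the agent looks stuck."""
        self.recent.append(observation)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and len(set(self.recent)) == 1
```

When `update` returns `True`, the loop could raise the sampling temperature for the next step or end the episode early. Which reaction works best is an open question here.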

Phase 2: Reinforcement Learning (Next Steps)

To move the model beyond pure pattern imitation and teach it actual spatial reasoning and task completion, we will transition to RL (GRPO/PPO). This phase is not fully worked out yet.

🤖 Agent Specification

The model is trained on a precise, XML-based communication format.

1. Token & Vocabulary Specification

The LLM's vocabulary must be extended to include:

<BOS>: (Input only) Signals the start of a prompt.
</command>: (Output only) Signals the LLM has finished its command sequence.
<Esc>: (Output only) The token for the Esc key.
<Enter>: (Output only) The token for the Enter/Return key (used for literal newline insertion).
<Tab>: (Output only) The token for the Tab key.
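
As an illustration of how the output-side tokens could be consumed, the sketch below splits a decoded command string into individual key events and stops at `</command>`. It works on text rather than on token IDs, which is a simplification: if the model typed the literal characters `<Esc>` in Insert mode, this version could not tell them apart from the special token. The function name `split_keys` is hypothetical.

```python
import re
from typing import List

# Output-side special tokens from the vocabulary list above.
SPECIAL_TOKENS = ("</command>", "<Esc>", "<Enter>", "<Tab>")

# Try the special tokens first, then fall back to any single character.
_KEY_RE = re.compile(
    "|".join(re.escape(token) for token in SPECIAL_TOKENS) + "|.",
    re.DOTALL,
)


def split_keys(output: str) -> List[str]:
    """Split decoded model output into key events, stopping at </command>."""
    keys = []
    for match in _KEY_RE.finditer(output):
        key = match.group(0)
        if key == "</command>":
            break  # the model signalled the end of its command sequence
        keys.append(key)
    return keys
```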

2. Specification: Normal Mode

The LLM generates non-interactive Vi commands based on its knowledge of the state.

Input (from Environment → LLM):

<BOS>
<notepad>
1 |<cursor>const name = "world";
2 |
3 |function hello() {
4 |  console.log("Hello, " + name);
5 |}
6 |
7 |function goodbye() {
8 |  console.log("Goodbye, " + name);
9 |}
10|
</notepad>
<mode>Normal</mode>
<prompt>Find the `hello` function, copy the whole function, and paste it below the `goodbye` function.</prompt>
<command>

Output (from LLM → Environment):

3GV$%y9Gp</command>
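
The input format above can be reproduced with a few lines of Python, which may help when building training examples. This is a sketch inferred from the example only: the function name `render_observation`, the 0-based `row` and `col` arguments, and the two-character line-number column are assumptions, and the Rust server remains the reference for the exact layout.

```python
from typing import List


def render_observation(lines: List[str], row: int, col: int, mode: str, prompt: str) -> str:
    """Render a buffer in the XML-like observation format shown above.

    `row` and `col` are 0-based. The cursor marker is inserted before the
    character at `col`, or at the end of the line when `col == len(line)`.
    """
    out = ["<BOS>", "<notepad>"]
    for index, text in enumerate(lines):
        if index == row:
            text = text[:col] + "<cursor>" + text[col:]
        # Line numbers are left-aligned in a two-character column: "1 |", "10|".
        out.append(f"{index + 1:<2}|{text}")
    out.append("</notepad>")
    out.append(f"<mode>{mode}</mode>")
    out.append(f"<prompt>{prompt}</prompt>")
    out.append("<command>")
    return "\n".join(out)
```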

3. Specification: Insert Mode

The LLM generates literal text and must explicitly use <Esc> to return to Normal mode.

Input (from Environment → LLM):

<BOS>
<notepad>
1 |const name = "world";
2 |
3 |function hello() {
4 |  console.log("Hello, " + name);<cursor>
5 |}
...
</notepad>
<mode>Insert</mode>
<prompt>Add a new line that says 'This is an example.' and then stop.</prompt>
<command>

Output (from LLM → Environment):

<Enter>  This is an example.<Esc></command>
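
To make the Insert mode semantics concrete, here is a minimal sketch that applies a list of key events to a buffer. It is not the logic of the Rust environment. The function name `apply_insert_keys` is hypothetical, and the one-column step left on `<Esc>` follows common Vi behavior, which the environment may or may not replicate.

```python
from typing import List, Tuple


def apply_insert_keys(
    lines: List[str], row: int, col: int, keys: List[str]
) -> Tuple[List[str], int, int, str]:
    """Apply Insert mode key events and return (lines, row, col, mode)."""
    lines = list(lines)  # work on a copy so the caller's buffer is untouched
    mode = "Insert"
    for key in keys:
        if key == "<Esc>":
            mode = "Normal"
            # Assumption: like Vi, the cursor steps one column left on leaving Insert mode.
            col = max(col - 1, 0)
            break
        if key == "<Enter>":
            # Split the current line at the cursor and move to the new line.
            tail = lines[row][col:]
            lines[row] = lines[row][:col]
            row += 1
            lines.insert(row, tail)
            col = 0
        else:
            text = "\t" if key == "<Tab>" else key
            lines[row] = lines[row][:col] + text + lines[row][col:]
            col += len(text)
    return lines, row, col, mode
```

Applied to the example above, with the cursor at the end of line 4, the keys `<Enter>`, two spaces, the sentence, and `<Esc>` produce a new line 5 containing the indented sentence and leave the buffer in Normal mode.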

🎮 Running the Local Agent

If you are looking to test the model dynamically, you need both the Rust backend server and the Python inference loop running.

Start the Environment (Rust Server):

```
cd vi_gym
cargo run --release -- --serve
```

Run the Agent (Python): (Note: Ensure you have uv installed, as we use uv.lock for dependency management.)
```
cd train_utils
uv run agent.py
```

This setup hooks the LLM to the deterministic Vi environment, letting you observe it taking actions in real time!

If you expect something even vaguely useful right now, be ready to be disappointed. The current SFT model is very much a proof of concept and struggles with basic instructions, even ones very close to the training data. The next phase, RL training, is where we hope to see significant improvements in the model's ability to reason spatially and complete tasks.
