Predictive Coding Tendencies in the GPT-2 Family

Multi-Method Empirical Evidence from Residual Stream Convergence, Activation Patching, MLP Transform Analysis, Zero-Ablation, and Logit Lens

📄 Paper: [arXiv link] (to be updated after submission)
👤 Author: Jong-O Yun, Independent Researcher, South Korea
📧 gallam.research@gmail.com
🔗 https://github.com/gallam-research-dev/pc-transformer-interpretability

Overview

This repository contains code and data for the paper:

“Predictive Coding Tendencies in the GPT-2 Family: Multi-Method Empirical Evidence from Residual Stream Convergence, Activation Patching, MLP Transform Analysis, Zero-Ablation, and Logit Lens”

I investigate whether GPT-2 family language models implicitly exhibit computational tendencies consistent with Predictive Coding (PC) without explicit design, using GPT-Neo 125M and Pythia 160M/410M as architectural controls.

Direct answer: I find no mechanistic proof that transformers implement PC, but consistent multi-method evidence that the GPT-2 family exhibits PC-like tendencies.

Models

Model	Params	Layers	Training Data	Tokenizer
GPT-2 Small	117M	12	WebText	GPT-2 BPE
GPT-2 Medium	345M	24	WebText	GPT-2 BPE
GPT-2 Large	774M	36	WebText	GPT-2 BPE
GPT-Neo 125M	125M	12	The Pile	GPT-2 BPE
Pythia 160M	160M	12	The Pile	NeoX
Pythia 410M	410M	24	The Pile	NeoX

Key Findings

Training-configuration-dependent convergence: English ranks first in mid-layer cosine similarity in all GPT-2 models; this pattern weakens in GPT-Neo and Pythia, implicating training configuration over architecture.
Mid-layer semantic specialization: Activation patching identifies normalized position 0.25–0.33 as the peak causal locus across all 6 models (range: 0.11–0.21).
W-shaped MLP magnitude: A spike–convergence–spike structure in GPT-2 is scale-invariant; GPT-Neo and Pythia show a dual-spike variant.
Convergence zone functional primacy: The lowest-magnitude zone causes the largest performance drop when removed. All GPT-2 95% CIs exclude zero: [0.25, 0.44], [0.36, 0.75], [0.53, 1.15].
Monotonic prediction refinement: Logit lens confirms Early Spike rank reduction rates 17–110× faster than the Convergence zone across GPT-2 models.
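To compare the 12-, 24-, and 36-layer models, it can help to map a normalized position back to concrete layer indices. The sketch below assumes that normalized position means the layer index divided by the number of layers, rounded to two decimals. The paper may use a different convention (for example, dividing by the number of layers minus one), so the helper names `normalized_position` and `layers_in_band` are illustrative and not taken from `run_experiments.py`. Layer counts come from the Models table.

```python
# Layer counts from the Models table, keyed by Hugging Face model name.
N_LAYERS = {
    "gpt2": 12,
    "gpt2-medium": 24,
    "gpt2-large": 36,
    "EleutherAI/gpt-neo-125M": 12,
    "EleutherAI/pythia-160m": 12,
    "EleutherAI/pythia-410m": 24,
}


def normalized_position(layer, n_layers):
    # ASSUMPTION: position = layer index / layer count, rounded to 2 decimals.
    return round(layer / n_layers, 2)


def layers_in_band(n_layers, lo=0.25, hi=0.33):
    # Collect every layer whose normalized position falls inside [lo, hi].
    return [
        layer
        for layer in range(n_layers)
        if lo <= normalized_position(layer, n_layers) <= hi
    ]


if __name__ == "__main__":
    for name, n_layers in N_LAYERS.items():
        print(name, layers_in_band(n_layers))
```

Under this assumed convention, the 0.25–0.33 band covers layers 3–4 in a 12-layer model, 6–8 in a 24-layer model, and 9–12 in a 36-layer model.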

Setup

git clone https://github.com/gallam-research-dev/pc-transformer-interpretability
cd pc-transformer-interpretability
pip install -r requirements.txt

All experiments were run on Google Colab T4 GPU.

Reproducing the Paper

⚠️ Run one model at a time. T4 GPU RAM (15 GB) cannot hold all 6 models in a single session. Results accumulate across sessions via JSON.

Step-by-step

1. Open Google Colab with T4 GPU

Runtime → Change runtime type → T4 GPU

2. Upload files

run_experiments.py
multi_model_results_v2.json   ← only needed when continuing a previous session

3. Set the target model — edit line 37 of run_experiments.py:

RUN_MODEL = "gpt2"   # change this each run

Model	Approx. RAM	Approx. time
`gpt2`	~6 GB	~15 min
`EleutherAI/gpt-neo-125M`	~6 GB	~15 min
`EleutherAI/pythia-160m`	~6 GB	~15 min
`gpt2-medium`	~9 GB	~25 min
`EleutherAI/pythia-410m`	~9 GB	~25 min
`gpt2-large`	~13 GB	~40 min

4. Run all cells

Results save to multi_model_results_v2.json. All 6 figures are generated and downloaded automatically.

5. Reset runtime between models

Runtime → Factory reset runtime

6. Repeat for each model

Re-upload run_experiments.py and the saved multi_model_results_v2.json, change RUN_MODEL, and run again.
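The cross-session accumulation described in these steps can be pictured with a small sketch. It assumes the JSON file holds one top-level entry per model name. That schema is a guess, and the helper names (`load_results`, `save_model_results`, `pending_models`) are hypothetical rather than functions from `run_experiments.py`.

```python
import json
import os

# Model identifiers from the table in step 3.
ALL_MODELS = [
    "gpt2",
    "EleutherAI/gpt-neo-125M",
    "EleutherAI/pythia-160m",
    "gpt2-medium",
    "EleutherAI/pythia-410m",
    "gpt2-large",
]


def load_results(path):
    """Return the saved results dict, or an empty dict on the first session."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_model_results(path, model_name, model_results):
    """Merge one model's results into the file without dropping earlier models."""
    results = load_results(path)
    # ASSUMPTION: results are stored under a top-level key per model name.
    results[model_name] = model_results
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
    # Swap the finished file into place so a crash mid-write keeps the old file.
    os.replace(tmp_path, path)
    return results


def pending_models(path, all_models=ALL_MODELS):
    """List the models that have no entry in the results file yet."""
    done = load_results(path)
    return [m for m in all_models if m not in done]
```

Calling `pending_models("multi_model_results_v2.json")` at the start of a session would then show which values of RUN_MODEL are still left to run, provided the file really is keyed by model name.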

Tip: Google Drive (recommended)

from google.colab import drive
drive.mount('/content/drive')
# Set in run_experiments.py:
OUTPUT_JSON = "/content/drive/MyDrive/pc_results/multi_model_results_v2.json"
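If the `pc_results` folder does not exist in Drive yet, writing to that path may fail. Whether `run_experiments.py` already creates the folder is not stated here, so the following is a hedged sketch with a hypothetical helper, `ensure_parent_dir`, that can be run once after mounting.

```python
import os

OUTPUT_JSON = "/content/drive/MyDrive/pc_results/multi_model_results_v2.json"


def ensure_parent_dir(path):
    """Create the folder that will contain `path` if it is missing; return it."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return parent


# In Colab, after drive.mount('/content/drive'), one would call:
# ensure_parent_dir(OUTPUT_JSON)
```

The call is left commented out because the `/content/drive` path only exists inside a Colab session with Drive mounted.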

Repository Structure

pc-transformer-interpretability/
├── run_experiments.py        # All 5 experiments + all 6 figures (single script)
├── requirements.txt
├── README.md
├── LICENSE
├── figures/                  # All figures used in the paper
│   ├── figure1.png           # Mid-layer cosine similarity (6 models)
│   ├── figure2.png           # MLP transform magnitude by family
│   ├── figure3.png           # Activation patching heatmap
│   ├── figure4_delta_p.png   # delta_p heatmap (Appendix)
│   ├── figure5_ablation_improved.png  # Zero-ablation (clipped y-axis)
│   └── figure6_logit_lens.png         # Logit lens (split panels)
└── result/
    └── multi_model_results_v2.json    # All experiment results (Exp 1–5)
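The saved results can be explored without rerunning any experiment. Since the JSON schema is not documented in this README, the sketch below makes no assumption about key names: it only prints an outline of keys and value types down to a chosen depth. The function name `describe_json` is hypothetical.

```python
import json


def describe_json(path, max_depth=2):
    """Return a nested outline of keys and value types, up to `max_depth` levels."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)

    def outline(value, depth):
        # Recurse into dicts until the depth limit, then report the type only.
        if isinstance(value, dict) and depth < max_depth:
            return {k: outline(v, depth + 1) for k, v in value.items()}
        if isinstance(value, list):
            return f"list[{len(value)}]"
        return type(value).__name__

    return outline(data, 0)


if __name__ == "__main__":
    print(json.dumps(describe_json("result/multi_model_results_v2.json"), indent=2))
```

Raising `max_depth` shows more of the structure. The default of 2 keeps the output short enough to read in a notebook cell.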

Citation

@article{yun2026predictive,
  title={Predictive Coding Tendencies in the GPT-2 Family:
         Multi-Method Empirical Evidence from Residual Stream Convergence,
         Activation Patching, MLP Transform Analysis, Zero-Ablation,
         and Logit Lens},
  author={Yun, Jong-O},
  journal={arXiv preprint},
  year={2026}
}

(arXiv ID to be updated after submission)

Acknowledgements

This work uses TransformerLens by Neel Nanda and Joseph Bloom. Experiments were conducted on Google Colab.

License

MIT License
