Deciphering LLM Internal Representations Through Systematic Mechanistic Interpretability
By Fabrice Fils-Aimé | Independent Researcher | May 2026
In 1822, Jean-François Champollion deciphered Egyptian hieroglyphs, using the Rosetta Stone as a bilingual key.
This project applies the same logic to large language models. We treat known input-output pairs as our Rosetta Stone and hidden-state vectors as the hieroglyphs, then use established interpretability techniques to decode, layer by layer, what a transformer computes.
The result: a six-step framework that unifies logit lens analysis, linear probes, and activation patching under a single coherent methodology.

| Metric | Value |
|---|---|
| Held-out classification accuracy | 95% |
| Statistical significance | p < 0.0001 (permutation test, 10,000 shuffles) |
| Model | GPT-2 Medium (345M parameters) |
| Compute cost | $0 (Google Colab free tier, T4 GPU) |
| Prompts verified | 44/60 (73% clean-pass rate) |
| Best probe layer | Layer 9 of 24 (mid-network) |
| Causal layers identified | Layers 18-23 (activation patching) |
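
The probe and significance rows in the table above can be illustrated with a minimal numpy sketch. It is not the notebook's code, and several choices are assumptions: synthetic feature vectors stand in for GPT-2 Medium hidden states, a closed-form one-vs-rest ridge classifier stands in for whatever linear probe the notebook trains, folds are plain shuffled K-fold rather than stratified, and the permutation test shuffles held-out labels against fixed predictions (a common variant; the notebook may instead shuffle training labels and refit the probe). All function names are hypothetical.

```python
import numpy as np


def fit_ridge_probe(X, y, n_classes, lam=1.0):
    """Closed-form one-vs-rest ridge classifier, used here as the linear probe."""
    Xb = np.hstack([X, np.ones((len(X), 1))])      # append a bias column
    Y = 2.0 * np.eye(n_classes)[y] - 1.0           # one-vs-rest targets in {-1, +1}
    reg = lam * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0                              # leave the bias unregularised
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ Y)


def predict(W, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.argmax(Xb @ W, axis=1)


def cv_accuracy(X, y, n_classes, k=5, seed=0):
    """Mean accuracy over k shuffled folds (plain, not stratified, K-fold)."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(X)), k)
    scores = []
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        W = fit_ridge_probe(X[train], y[train], n_classes)
        scores.append(np.mean(predict(W, X[folds[i]]) == y[folds[i]]))
    return float(np.mean(scores))


def permutation_p_value(y_true, y_pred, n_shuffles=10_000, seed=0):
    """Share of label shuffles whose accuracy matches or beats the observed one."""
    rng = np.random.default_rng(seed)
    observed = np.mean(y_true == y_pred)
    hits = 0
    for _ in range(n_shuffles):
        if np.mean(rng.permutation(y_true) == y_pred) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)           # +1 so p is never exactly zero


# Synthetic stand-in for per-layer hidden states: 3 prompt categories and
# 6 "layers" whose class-signal strength is chosen by hand (not real data).
rng = np.random.default_rng(42)
n_classes, d, n_dev, n_test = 3, 32, 150, 60
signal = [0.0, 0.1, 0.2, 1.0, 0.3, 0.1]
class_means = rng.normal(size=(n_classes, d))
y_all = rng.integers(0, n_classes, size=n_dev + n_test)
layers = [s * class_means[y_all] + rng.normal(size=(len(y_all), d)) for s in signal]
y_dev, y_test = y_all[:n_dev], y_all[n_dev:]

# Pick the layer with the best 5-fold CV accuracy, then score it on held-out prompts.
cv = [cv_accuracy(X[:n_dev], y_dev, n_classes) for X in layers]
best = int(np.argmax(cv))
W = fit_ridge_probe(layers[best][:n_dev], y_dev, n_classes)
test_pred = predict(W, layers[best][n_dev:])
test_acc = float(np.mean(test_pred == y_test))
p = permutation_p_value(y_test, test_pred)
print(f"best layer {best}, CV {cv[best]:.2f}, held-out {test_acc:.2f}, p = {p:.5f}")
```

With 10,000 shuffles and the `(hits + 1) / (n_shuffles + 1)` estimator, the smallest attainable p-value is 1/10,001, just under 0.0001; a result at that floor means no shuffle matched the observed accuracy.
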

Each step of the protocol mirrors a stage of Champollion's decipherment:

| Step | Champollion (1822) | This Protocol |
|---|---|---|
| 1 | Find the Rosetta Stone | Create controlled prompts with known outputs |
| 2 | Locate cartouches | Extract hidden-state vectors at every layer |
| 3 | Cross-compare scripts | Apply logit lens to track prediction emergence |
| 4 | Build a partial alphabet | Train linear probes (5-fold CV) |
| 5 | Validate with Coptic | Test on held-out prompts |
| 6 | Discover the mixed system | Activation patching for causal discovery |
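
Step 3 can be sketched as a model-agnostic function: take each layer's residual stream at the final position, apply the final LayerNorm, multiply by the unembedding matrix, and see where the known answer token ranks. This numpy sketch assumes the LayerNorm's learned scale and bias are already folded into `W_U` and `b_U`, as TransformerLens does by default when loading GPT-2 (`fold_ln=True`). With TransformerLens, the Step 2 inputs would plausibly come from `logits, cache = model.run_with_cache(prompt)`, taking `cache["resid_post", l][0, -1]` for each layer `l` (converted to numpy), with `model.W_U` and `model.b_U` as the unembedding. That wiring, the function names, and the synthetic demo data are assumptions, not the notebook's code.

```python
import numpy as np


def final_layer_norm(x, eps=1e-5):
    """Centre and rescale each vector; the learned LayerNorm scale and bias are
    assumed to be folded into W_U and b_U already."""
    x = x - x.mean(axis=-1, keepdims=True)
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)


def logit_lens(resid_by_layer, W_U, b_U, target_id):
    """Decode each layer's final-position residual through the final LayerNorm
    and unembedding. Returns the target token's rank (0 = top-1) and probability."""
    logits = final_layer_norm(resid_by_layer) @ W_U + b_U    # (n_layers, d_vocab)
    logits = logits - logits.max(axis=-1, keepdims=True)    # numerically stable softmax
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    ranks = (logits > logits[:, [target_id]]).sum(axis=-1)
    return ranks.tolist(), probs[:, target_id].tolist()


# Synthetic demo (not GPT-2): the target token's unembedding direction is
# mixed into the residual stream with growing strength across 6 layers.
rng = np.random.default_rng(0)
d_model, d_vocab, target_id = 64, 20, 7
W_U = rng.normal(size=(d_model, d_vocab))
b_U = np.zeros(d_vocab)
strength = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0])
resid_by_layer = rng.normal(size=(6, d_model)) + strength[:, None] * W_U[:, target_id]

ranks, probs = logit_lens(resid_by_layer, W_U, b_U, target_id)
for layer, (r, p) in enumerate(zip(ranks, probs)):
    print(f"layer {layer}: target rank {r}, p = {p:.3f}")
```

The rank column shows the first layer at which the target becomes the lens's top-1 prediction, which is the "prediction emergence" that Step 3 tracks.
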

The full experiment runs in a single Colab notebook. No paid APIs. No institutional access required.
- Open the notebook in Google Colab
- Run Cell 0 (installs dependencies, restarts runtime)
- Reconnect and run all remaining cells
- Total runtime: approximately 15 minutes on a free T4 GPU

The Champollion Protocol is a starting point. Several research directions remain open:
- Scale to larger models. GPT-2 Medium (345M) served as our testbed. Applying the same six steps to GPT-2 XL, Pythia, or LLaMA models would test whether the framework generalizes across architectures and parameter counts.
- Expand category coverage. The current Rosetta Stone uses three semantic categories (capitals, translations, scientific facts). Adding syntactic, logical, or multi-hop reasoning categories would probe deeper computational structures.
- Replace linear probes with nonlinear classifiers. Linear probes test for linearly separable representations. Nonlinear probes (MLPs, kernel methods) could reveal structure that the current approach misses.
- Integrate with the tuned lens. Belrose et al. (2023) showed that a learned affine transformation per layer, trained so that each layer's decoded predictions match the model's final output distribution, is more reliable than the raw logit lens. Combining the tuned lens with the Champollion framework could sharpen the layer-by-layer decoding.
- Automate the pipeline. The six steps are currently manual. Wrapping them into a single Python package with a CLI would lower the barrier for other researchers.
- Cross-model activation transfer. Use activation patching across different models (not just within one) to test whether similar representations emerge in independently trained transformers.
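
Step 6 and the cross-model direction above both rest on activation patching: cache an activation from a clean run, write it into a corrupted run, and measure how much of the clean output returns. The sketch below is a hypothetical two-position toy model in numpy (a "subject" position and a "final" position with random weights), not GPT-2 and not the notebook's code; the weight names, `READ_LAYERS`, and the recovery metric are illustrative assumptions. In this toy, only layers 0 to 2 move information from the subject position to the final position.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N_LAYERS, READ_LAYERS = 16, 6, 3   # toy sizes; only layers 0-2 read from the subject

# Hypothetical toy weights: W_read acts as "attention" from the final position to the
# subject, W_mlp_s / W_mlp_f are per-position MLPs, readout plays the role of a logit difference.
W_read = rng.normal(size=(N_LAYERS, D, D)) / np.sqrt(D)
W_mlp_s = rng.normal(size=(N_LAYERS, D, D)) / np.sqrt(D)
W_mlp_f = rng.normal(size=(N_LAYERS, D, D)) / np.sqrt(D)
readout = rng.normal(size=D)


def run(x_subj, x_final, patch_layer=None, patch_value=None):
    """Run the toy model. Returns (metric, subject residuals entering each layer).
    If patch_layer is set, the subject residual entering that layer is overwritten."""
    h_s, h_f = x_subj.copy(), x_final.copy()
    cache = []
    for l in range(N_LAYERS):
        if l == patch_layer:
            h_s = patch_value.copy()
        cache.append(h_s.copy())
        if l < READ_LAYERS:
            h_f = h_f + W_read[l] @ h_s           # final position reads the subject
        h_s = h_s + np.tanh(W_mlp_s[l] @ h_s)     # subject-position MLP
        h_f = h_f + np.tanh(W_mlp_f[l] @ h_f)     # final-position MLP
    return float(h_f @ readout), cache


x_final = rng.normal(size=D)                                   # shared final-token input
x_clean, x_corrupt = rng.normal(size=D), rng.normal(size=D)    # only the subject differs

m_clean, clean_cache = run(x_clean, x_final)
m_corrupt, _ = run(x_corrupt, x_final)

# Patch the clean subject residual into the corrupted run, one layer at a time, and measure
# the fraction of the clean-vs-corrupted gap it restores (1 = full recovery, 0 = no effect).
recovery = []
for l in range(N_LAYERS):
    m_patched, _ = run(x_corrupt, x_final, patch_layer=l, patch_value=clean_cache[l])
    recovery.append((m_patched - m_corrupt) / (m_clean - m_corrupt))
print([round(r, 3) for r in recovery])
```

Patching before layer 0 restores the clean output exactly, while patching after the last layer that reads the subject (layer 2 here) has no effect; that contrast is the kind of layer localisation activation patching is used to find.
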

Contributions, issues, and pull requests are welcome.

This framework builds on foundational interpretability research:
- nostalgebraist (2020). Interpreting GPT: the logit lens. LessWrong.
- Belrose et al. (2023). Eliciting Latent Predictions from Transformers with the Tuned Lens.
- Olsson et al. (2022). In-context learning and induction heads. Transformer Circuits Thread.
- Nanda et al. (2023). TransformerLens library for mechanistic interpretability.
- Vig et al. (2020). Causal mediation analysis for transformers. NeurIPS.
- Anthropic (2025). Tracing the thoughts of a large language model.

The Champollion Protocol does not introduce new techniques. It contributes a unifying framework and an archaeological metaphor that make the methodology coherent and accessible.

Related projects:
- AQLES - Probing quality-evaluative geometry in transformer hidden states
- Verifeye Forensic Suite - AI forensic tool for deepfake detection (96.33% accuracy)
- SmartReview AI - From TF-IDF to fine-tuned DistilBERT for sentiment analysis

Model internals: TransformerLens (open-source library). Compute: Google Colab free tier (T4 GPU). Documentation assistance: Claude (Anthropic).

@misc{filsaime2026champollion,
title={The Champollion Protocol: Deciphering LLM Internal Representations},
author={Fils-Aim\'{e}, Fabrice},
year={2026},
url={https://huggingface.co/fabthebest/champollion-protocol}
}

- GitHub: github.com/fabthebest
- HuggingFace: huggingface.co/fabthebest/champollion-protocol

License: Apache 2.0