Mechanistic interpretability · Human visual attention · Independent researcher
对大模型使用逆向工程以探索可提升能力的未知途径.
继往开来.
Pinned Loading
-
amr_wtf
amr_wtf Public一种通过替换模型自答为教师回答来提取和施加行为向量的工具包。 GHOST: Ghost-Host Output Steering Toolkit — Extract and apply behavior vectors from language models by swapping their own answers with teacher answers.
Python
-
-
amr_honesty
amr_honesty Public一种模型内部自我自信度量化技术. A 100K-parameter prefix module that lets a frozen Qwen express its own internal uncertainty (read-out, not retraining).
Python 2
-
AMR_ReplaceNeuron
AMR_ReplaceNeuron Public神经元编辑和回收 Two ways to "replace" a neuron's behaviour in a frozen Gemma 4 E2B-it — catgirl single-column rewrite + LeetCode 233 LoRA/delta-W.
Python 1
-
-
Something went wrong, please refresh the page to try again.
If the problem persists, check the GitHub status page or contact support.
If the problem persists, check the GitHub status page or contact support.