goodfire-ai/memorization_kfac

K-FAC curvature edit (minimal)

Two commands to reproduce the K-FAC treatment used in our paper.

TL;DR

Compute A = E[aa^T] (second moment of the layer's inputs) and G = E[gg^T] (second moment of the gradients at its pre-activation outputs) for each MLP projection, eigendecompose both, and keep only the top curvature mass when editing each weight W. This suppresses rote recitation while preserving shared structure.
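As a rough illustration of the idea, here is a minimal NumPy sketch of a curvature-truncating edit. The function name and details are hypothetical; the repo's actual PyTorch implementation may differ in how it ranks and masks components.

```python
import numpy as np

def kfac_edit(W, A, G, keep_mass=0.8):
    """Keep only the coefficients of W carrying the top `keep_mass`
    fraction of K-FAC curvature.

    A: (d_in, d_in) input second moment E[aa^T]
    G: (d_out, d_out) output-gradient second moment E[gg^T]
    The K-FAC curvature of W factorizes as G (kron) A, so the rank-one
    directions u_G u_A^T form an eigenbasis with eigenvalue lam_G * lam_A.
    """
    lam_A, U_A = np.linalg.eigh(A)          # ascending eigenvalues
    lam_G, U_G = np.linalg.eigh(G)
    C = U_G.T @ W @ U_A                      # W in the Kronecker eigenbasis
    curv = np.outer(lam_G, lam_A)            # curvature of each coefficient
    # Rank coefficients by curvature and keep the top `keep_mass` fraction.
    order = np.argsort(curv, axis=None)[::-1]
    cum = np.cumsum(np.sort(curv, axis=None)[::-1])
    n_keep = int(np.searchsorted(cum, keep_mass * cum[-1]) + 1)
    mask = np.zeros(C.size, dtype=bool)
    mask[order[:n_keep]] = True
    C_kept = np.where(mask.reshape(C.shape), C, 0.0)
    return U_G @ C_kept @ U_A.T              # back to the original basis
```

With keep_mass=1.0 this is the identity (all coefficients survive); lowering it zeroes the low-curvature directions, which is where rote memorization is hypothesized to live.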

Requirements

  • Python 3.10+
  • PyTorch (CUDA recommended)
  • NumPy
  • transformers (if loading Hugging Face models)

Install your environment as usual, or add a requirements.txt and run `uv pip install -r requirements.txt`.

Usage

1. Collect K-FAC factors

python data/collect_kfac_multilayer.py \
  --model-size 7b \
  --layers 28,29,30,31 \
  --projections gate,up,down \
  --out data/kfac_factors/olmo2-7b

Streams text through the model and saves A = E[aa^T] and G = E[gg^T] for each requested MLP projection.
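In PyTorch this is typically done by registering a forward hook on each projection to capture its inputs a, and a backward hook to capture the gradients g at its outputs. The accumulation itself reduces to a running second moment, sketched here in NumPy (class name hypothetical; the repo's collection script may batch and normalize differently):

```python
import numpy as np

class SecondMoment:
    """Running estimate of E[x x^T] over streamed batches of vectors."""

    def __init__(self, dim):
        self.sum = np.zeros((dim, dim))
        self.count = 0

    def update(self, X):
        # X: (n_tokens, dim) -- activations or gradients from one batch
        self.sum += X.T @ X
        self.count += X.shape[0]

    def value(self):
        return self.sum / max(self.count, 1)
```

One such accumulator per factor (A from inputs, G from output gradients) per projection yields the saved K-FAC factors.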

2. Apply edit & evaluate

python evaluations/eval_mem_kfac.py \
  --model-size 7b \
  --layers-json '{"31": {"gate": 0.8, "up": 0.8, "down": 0.8}}' \
  --use-cache

Each keep-mass value in --layers-json lies in [0, 1] and controls the fraction of curvature mass to retain for that projection.
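For intuition, a given keep-mass fraction maps to a number of retained eigendirections via the cumulative eigenvalue spectrum. A small hypothetical helper (not part of the repo's scripts):

```python
import numpy as np

def n_components_for_mass(eigvals, keep_mass):
    """Smallest number of top eigenvalues whose sum reaches
    `keep_mass` of the total spectrum mass."""
    s = np.sort(np.asarray(eigvals))[::-1]   # descending
    cum = np.cumsum(s)
    return int(np.searchsorted(cum, keep_mass * cum[-1]) + 1)
```

For a spectrum [4, 3, 2, 1], keep_mass=0.7 needs only the top 2 eigenvalues (4 + 3 = 7 out of 10), showing how a flat keep-mass can still discard many low-curvature directions.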

Outputs

  • Printed/saved metrics from the evaluator (e.g., perplexity and any configured memorization metrics).
  • Optionally, an edited state dict / checkpoint depending on script flags.

Citation

If this code helps your work, please cite the paper:
OpenReview: https://openreview.net/pdf?id=MzRDxPUmgK
