research notes for repurtum

core idea:

the core idea that this research lab is working on is to do interpretability research using type theory.

core basis for how are going about it :

phase1:

going through these research papers to understand the core way that this field is moving forward:
- Validating Mech Interpretations: An Axiomatic Approach
  - Core Idea: Inspired by the notion of abstract interpretation from the program analysis literature that aims to develop approximate semantics for programs, we give a set of axioms that formally characterize a mechanistic interpretation as a description that approximately captures the semantics of the neural network under analysis in a compositional manner
Moving on how typed chain of thought is moving forward:
- Typed Chain-of-Thought : Using Curry Howard Framework for Verifying LLM Reasoning
  - Core Idea: Represent methods to extract and map the informal , natural language steps of CoT into a formal, typed proof structure.
  - If we convert a CoT trace into a well-typed proof , it will serve as a strong, verifiable certificate of its computational faithfulness , moving away from heuristics based interpretability towards a more formal verification based approach.
Then after we have done the mechanisitic interpretability to compositional interepretability
- From Mechanistic to Compositional Interpretability
  - Core Idea: Introduces the notion of compositional interpretability, using category theory framework grounded in the principles of compositionally and minimum description length.
  - These are pairs of syntactic and semantic mappings that must commute to enforce consistency between a model's decomposition and its observed behaviour.
  - Then deconstruct explanation quality into measuring the faithfulness and complexity to cast interpretability as a constrained optimization problem, and then introduce compressive refinement to systematically restructure the models into simpler parts without altering their function.
  - At the end prove tha a parsimony criterion under which synactic compression theoretically gurantees more concise , human-aligned explanations.
Move into improving model performance via compact proofs
- Compact Proofs of Model Performance via Mechanistic Interpretability
  - Core Idea: Formally prove the accuracy lower bounds for a small transformer trained on Max-of-k, validating proof transferability across 151 randoom seeds and 4 values of K.
  - Using quantitative metrics , find that shorter proofs seem to require and provide more mechanistic understanding.
  - Find that more faithful mechanisirtic undertstanding leads to tighter performance bounds.
  - Confirm these connections by qualitatively examining a subset of our proofs.
  - Identify compounding structureless errors as a key challenge for using mech interp to generate compact proofs on model performance.
Mental Model corresponding from Type Theory to Mech Interp:
- Program -> Trained Model
- Type -> Behaviour / property
- Proof -> Circuit / Explanation
- Type Checking -> Faithfulness validation
- Abstract Interpretation -> Axiomatic mech Interp
- Curry Howard -> Typed CoT
- Categorical Semantics -> Compositional Interpretability
- Normal Form / Compression -> MDL-Based explanation quality
- Compositionality -> Circuit Composition / layer-wise refinement

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.obsidian		.obsidian
assets		assets
papers		papers
project-azul		project-azul
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

research notes for repurtum

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

research notes for repurtum

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages