Interpretable Latent Representations of Neural Activity via Sparse Autoencoders

Project repository for: Interpretable Latent Representations of Neural Activity via Sparse Autoencoders

A detailed written report is available in PDF format in this repository.

This project explores post-hoc interpretability techniques for neural embeddings derived from population recordings of hippocampal activity. We apply sparse autoencoders (SAEs) to embeddings generated using CEBRA, and evaluate how sparsification affects attribution to neural and behavioral variables without sacrificing decoding performance.

We find that sparsified representations:

  • Activate fewer latent dimensions per sample,
  • Yield more concentrated correlations with individual neurons and behaviors,
  • Preserve or improve position and direction decoding accuracy.

Motivation

Latent variable models like CEBRA yield powerful embeddings of high-dimensional neural activity, but they are often hard to interpret. This project investigates whether sparsity-enforcing autoencoders can recover latent structure that is more attributable to observable neural and behavioral factors.

Methods

  • Embedding generation: We use CEBRA in supervised mode (labels = position + direction) to create latent embeddings with varying dimensionality (8, 16, 32, 64).
  • Sparsification: We train SAEs (standard, Top-K, JumpReLU) across a hyperparameter grid to transform dense embeddings into sparse codes (see the sketch after this list).
  • Evaluation (a sketch of these metrics appears after Key Results):
    • Behavioral attribution: Correlation between latent dimensions and position/direction.
    • Neuron attribution: Correlation between latents and raw neural channels.
    • Decoding: Linear decoding of position ($R^2$) and direction (accuracy).
    • Sparsity: Percent of active latent dimensions per sample.
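
The snippet below is a minimal sketch of the embedding-plus-sparsification pipeline, assuming the public scikit-learn-style CEBRA API and a Top-K SAE written in PyTorch. The file names, hyperparameters, and SAE width are illustrative placeholders, not the exact configuration used in the report.

```python
import numpy as np
import torch
import torch.nn as nn
from cebra import CEBRA

# --- 1. Supervised CEBRA embedding (labels = position + direction) ---
# The file names below are hypothetical placeholders for the hippocampal dataset.
neural_data = np.load("neural_activity.npy")    # (n_samples, n_neurons)
position = np.load("position.npy")              # (n_samples,) continuous position
direction = np.load("direction.npy")            # (n_samples,) running direction
labels = np.column_stack([position, direction])

cebra_model = CEBRA(
    model_architecture="offset10-model",
    output_dimension=8,          # swept over {8, 16, 32, 64} in the report
    max_iterations=10_000,
    batch_size=512,
    device="cuda_if_available",
)
cebra_model.fit(neural_data, labels)
embedding = cebra_model.transform(neural_data)  # dense latent codes

# --- 2. Top-K sparse autoencoder over the dense embedding ---
class TopKSAE(nn.Module):
    """Keeps only the K largest (post-ReLU) latent activations per sample."""

    def __init__(self, d_in: int, d_latent: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_in, d_latent)
        self.decoder = nn.Linear(d_latent, d_in)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))
        topk = torch.topk(z, self.k, dim=-1)
        mask = torch.zeros_like(z).scatter_(-1, topk.indices, 1.0)
        z_sparse = z * mask                      # hard Top-K gating
        return self.decoder(z_sparse), z_sparse

x = torch.tensor(embedding, dtype=torch.float32)
sae = TopKSAE(d_in=x.shape[1], d_latent=4 * x.shape[1], k=4)
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-3)

for _ in range(2_000):
    optimizer.zero_grad()
    x_hat, z_sparse = sae(x)
    loss = nn.functional.mse_loss(x_hat, x)      # reconstruction-only objective
    loss.backward()
    optimizer.step()
```

The standard and JumpReLU variants listed above differ mainly in how sparsity is enforced (an L1 penalty on the latent activations, or a learned per-latent threshold, respectively); the report specifies the exact hyperparameter grids.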

Key Results

  • SAE latents show improved neuron and behavior attribution.
  • Decoding accuracy is preserved across most configurations.
  • Higher Top-K values and smaller embedding sizes improve behavioral interpretability; larger embeddings improve neural interpretability.

More visualizations and interpretation are included in the report.
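
To make the evaluation criteria from the Methods section concrete, the sketch below shows one way the attribution, sparsity, and decoding metrics could be computed with NumPy and scikit-learn. It reuses the illustrative variables from the previous snippet (z_sparse, position, direction, neural_data), assumes direction is a discrete label, and is not the exact analysis code behind the reported numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import r2_score, accuracy_score
from sklearn.model_selection import train_test_split

latents = z_sparse.detach().cpu().numpy()            # (n_samples, d_latent)

# Sparsity: percent of active latent dimensions per sample
percent_active = 100 * (latents > 0).mean(axis=1)
print(f"mean % active latents per sample: {percent_active.mean():.1f}")

# Behavioral attribution: |correlation| between each latent and position
# (dead latents have zero variance and yield NaN, ignored via nanmax below)
position_corr = np.abs(np.array([
    np.corrcoef(latents[:, j], position)[0, 1] for j in range(latents.shape[1])
]))
print(f"max |latent-position correlation|: {np.nanmax(position_corr):.2f}")

# Neuron attribution: |correlation| between each latent and each raw channel
cross = np.corrcoef(latents.T, neural_data.T)         # stacked correlation matrix
neuron_corr = np.abs(cross[:latents.shape[1], latents.shape[1]:])

# Linear decoding: position (R^2) and direction (accuracy)
X_tr, X_te, p_tr, p_te, d_tr, d_te = train_test_split(
    latents, position, direction, test_size=0.2, random_state=0
)
pos_r2 = r2_score(p_te, LinearRegression().fit(X_tr, p_tr).predict(X_te))
dir_acc = accuracy_score(
    d_te, LogisticRegression(max_iter=1000).fit(X_tr, d_tr).predict(X_te)
)
print(f"position R^2 = {pos_r2:.3f}, direction accuracy = {dir_acc:.3f}")
```

The "more concentrated correlations" finding above refers to how this correlation mass distributes across latent dimensions; the report contains the exact summary statistics and visualizations.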

Contact

If you're interested in collaborating, using this framework, or adapting it to other domains, feel free to reach out:

Arjun Naik
arjunsn@uw.edu
