
mool32/clonal-crystallization-aging


License: MIT License: CC-BY 4.0 Preprint Companion arXiv

Clonal crystallization as a shared signature of bone-marrow aging and neural-network training

A cross-domain (LLM ↔ biology) application of a single two-metric framework — Gini and effective N — to cell-type compositional aging and to attention-head differentiation in transformer training

Theodor Spiro | ORCID 0009-0004-5382-9346 | tspiro@vaika.org

📄 Preprint: paper/draft_v2.pdf (manuscript draft, in preparation)

🧮 Main analysis script: scripts/07_pythia_overlay.py (the cross-domain overlay)

📦 Companion paper (arXiv): Universal statistical signatures of evolution in artificial intelligence architectures, Spiro 2026, arXiv:2604.10571. It develops the broader DFE-universality hypothesis that this paper instantiates in a specific substrate pair.


Brief Summary

Aging of cell-type composition in biological tissues and functional differentiation of trained neural networks have been studied in disjoint literatures. We apply a single two-metric framework — the Gini coefficient and the effective number of contributing components (Hill-1 number, eff_N) — to mouse cell-type proportion distributions and to Pythia-410M head-importance distributions, and demonstrate:

  1. Pythia training and bone-marrow aging move in the same quadrant of the (ΔGini, Δeff_N) plane. Pythia: (+0.145, −45.5) over 143k training steps, concentrating function into fewer heads. Mouse bone marrow on Smart-seq2 FACS: (+0.088, −2.19); 10x Droplet: (+0.038, −0.54), driven by myeloid skewing and loss of B-lineage differentiation intermediates.
  2. Kidney and Limb Muscle also crystallize at coarse cell-type granularity; Lung and Spleen fall in the opposite (dispersion) quadrant, driven by immune-compartment expansion. The dispersion direction for Lung and Spleen replicates on the independent Kimmel et al. (2019) cohort at all five Leiden clustering resolutions tested.
  3. Granularity is a primary parameter, not a nuisance. Kidney direction flips with clustering granularity within both cohorts (crystallization at coarse, dispersion at fine). A within-cell-type Leiden analysis of TMS Kidney resolves the apparent paradox: podocytes crystallize (p = 0.016 on both metrics) while macrophages disperse (p = 0.004). The tissue-level result is the net of opposing sub-type dynamics.
  4. Caloric restriction in rat bone marrow reverses the aging shift. On the Calico atlas (Zou et al. 2022), CR rescues 64% of the Gini drift and 57% of the eff_N drift; cell-level bootstrap places P(rescue > 0) = 1.000 on both metrics. Biological replication remains limited (n = 2 per condition); treat as proof-of-concept.
  5. The correspondence is narrower than full DFE universality. This work demonstrates a specific substrate-independent compositional signature, not a claim of causal equivalence between transformer training and bone-marrow aging. It is a proof-of-concept instance of the broader hypothesis developed in the companion arXiv preprint.
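The quantities quoted in the summary above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the function names `gini`, `eff_n`, and `rescue_fraction` are hypothetical, the Gini estimator shown is the standard sorted-rank form (the scripts may use a different normalization), and the rescue fraction is assumed to mean the share of the young-to-old drift that the treated old group recovers. The input vectors are toy values, not data from the paper.

```python
import math


def gini(weights):
    """Gini coefficient of a non-negative weight vector.

    0 means perfectly even; (n - 1) / n means one component holds everything.
    """
    xs = sorted(weights)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive weight")
    # Sorted-rank form: G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n
    ranked = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * ranked / (n * total) - (n + 1) / n


def eff_n(weights):
    """Hill number of order 1: exp of the Shannon entropy of the proportions."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("need at least one positive weight")
    # Zero-weight components contribute nothing (0 * log 0 is taken as 0).
    entropy = -sum((w / total) * math.log(w / total) for w in weights if w > 0)
    return math.exp(entropy)


def rescue_fraction(young, old, old_treated):
    """Assumed definition: share of the young-to-old drift undone by treatment."""
    drift = old - young
    if drift == 0:
        raise ValueError("no drift to rescue")
    return (old - old_treated) / drift


# Toy proportions only: an even composition versus a skewed one.
young = [0.25, 0.25, 0.25, 0.25]
old = [0.55, 0.25, 0.15, 0.05]

d_gini = gini(old) - gini(young)
d_eff_n = eff_n(old) - eff_n(young)
print(f"dGini={d_gini:+.3f}, d_eff_N={d_eff_n:+.3f}")
```

Under these definitions, a positive ΔGini together with a negative Δeff_N corresponds to the crystallization quadrant, and the opposite signs correspond to dispersion.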

Datasets

| Dataset | Source | Use |
|---|---|---|
| Tabula Muris Senis (FACS + Droplet) | figshare doi:10.6084/m9.figshare.12654728 (Schaum et al. 2020) | Primary cohort; 4 ages × 23 tissues × 2 platforms |
| Kimmel et al. 2019 | GEO GSE132901 | Independent cohort for direction validation across 5 Leiden resolutions |
| Calico rat aging atlas | GEO GSE141784 (Zou et al. 2022) | Caloric-restriction rescue test on bone marrow |
| Pythia 410M ablation CSV | Companion repo mool32/functional-differentiation-dfe | Head-importance trajectory across 8 checkpoints |

Repository structure

├── paper/
│   ├── draft_v2.md          # Manuscript source (Markdown)
│   └── draft_v2.pdf         # Compiled PDF (built by scripts/15_build_pdf.py)
├── scripts/                 # 16 numbered pipeline scripts (00-15) + utils.py
├── data/                    # 24 intermediate CSVs (committed; reproduce every paper number)
├── figures/                 # 13 publication PNGs at 300 DPI
├── component_analysis.md    # Cell-type drivers of tissue-level deltas
├── kimmel_validation_summary.md
├── pythia_overlay_summary.md
├── results_summary.md
├── results_addendum.md
└── substate_summary.md

Pipeline scripts (numbered execution order)

| Step | Script | Purpose |
|---|---|---|
| 0 | 00_explore.py | Structural summary of TMS FACS and Droplet arms |
| 1 | 01_compute_proportions.py | Per (mouse, tissue) cell-type proportions |
| 2 | 02_compute_metrics.py | Per (mouse, tissue) Gini / eff_N / count |
| 3 | 03_make_figures.py | Tissue-level metrics vs age |
| 4 | 04_summary.py | Per-tissue young-vs-old Mann-Whitney + BH-FDR |
| 5 | 05_platform_concordance.py | FACS-vs-Droplet concordance |
| 6 | 06_component_analysis.py | Cell-type drivers of tissue-level deltas |
| 7 | 07_pythia_overlay.py | The cross-domain overlay: Pythia trajectory in the (Gini, eff_N) plane |
| 8 | 08_substate_analysis.py | Within-cell-type Leiden granularity test |
| 9 | 09_substate_figure.py | Scale-invariance figure (Kidney vs Spleen) |
| 10 | 10_kimmel_validation.py | Independent cohort (Kimmel 2019), Leiden 0.8 |
| 11 | 11_kimmel_robustness.py | Kimmel at 5 Leiden resolutions |
| 12 | 12_calico_cr_marrow.py | CR rescue test on Calico rat bone marrow |
| 13 | 13_tms_kidney_leiden.py | TMS Kidney methodological symmetry test |
| 14 | 14_overlay_with_ci.py | Pythia overlay with per-mouse bootstrap CI |
| 15 | 15_build_pdf.py | Compile draft_v2.md → draft_v2.pdf with embedded figures |
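Step 4 combines a per-tissue two-sample test with a Benjamini-Hochberg correction across tissues. The sketch below shows only the correction step, written in plain Python so the logic is visible. It is an illustration under assumptions, not the code in 04_summary.py: the function name `bh_adjust` is hypothetical, and the p-values are toy numbers rather than results from the paper. In practice the raw p-values would come from a test such as `scipy.stats.mannwhitneyu` applied to young and old per-mouse metric values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy raw p-values for four hypothetical tissues (not paper results).
raw = {"tissue_a": 0.01, "tissue_b": 0.04, "tissue_c": 0.03, "tissue_d": 0.005}
q_values = dict(zip(raw, bh_adjust(list(raw.values()))))
for tissue, q in q_values.items():
    print(f"{tissue}: q = {q:.3f}")
```

Adjusting across all tissues at once, rather than per tissue, is what controls the false discovery rate over the whole family of comparisons.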

Figures (300 DPI PNG)

| Figure | File | Paper ref |
|---|---|---|
| 1 | fig_pythia_overlay_v2.png | Pythia + biology in (Gini, eff_N) plane with bootstrap CI |
| 2 | fig_platform_concordance.png | FACS vs Droplet per-tissue medians |
| 3 | fig_substate_scale.png | Sub-cell-type Leiden, Kidney vs Spleen |
| 4 | fig_kimmel_validation.png | Independent cohort validation |
| 5 | fig_calico_cr_rescue.png | CR rescue of marrow crystallization |
| S1 | fig_kimmel_robustness.png | Kimmel direction vs clustering resolution |
| S2 | fig_direction_heatmap.png | Cross-platform direction heat map |
| S3 | fig_kidney_symmetry.png | TMS kidney at matching Leiden resolutions |

Reproducing the analysis

Data dependencies (download separately)

The repository commits all intermediate CSVs, so all paper numbers reproduce from data/ without re-downloading the upstream data. To re-run from scratch you need:

  • Tabula Muris Senis FACS + Droplet .h5ad (figshare doi:10.6084/m9.figshare.12654728)
  • Kimmel 2019 preprocessed .h5ad (GEO GSE132901, see github.com/mjibanezsole/aging_pipeline for preprocessing)
  • Calico rat aging atlas .h5ad (GEO GSE141784)
  • Pythia 410M ablation CSV (companion repo mool32/functional-differentiation-dfe)

Update scripts/utils.py and scripts/07_pythia_overlay.py to point at your local data paths.
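One way to avoid hand-editing paths in several places is to resolve them from a single environment variable. The following is a sketch of that idea only. Everything named here is hypothetical: the variable `CRYSTALLIZATION_DATA_DIR`, the default directory, the dataset keys, and the file names are placeholders, and the repository's scripts/utils.py may organize paths quite differently.

```python
import os
from pathlib import Path

# Hypothetical environment variable and default location (assumptions).
DATA_DIR_ENV = "CRYSTALLIZATION_DATA_DIR"
DEFAULT_DATA_DIR = "~/data/clonal-crystallization"

# Placeholder file names; substitute whatever your downloads are called.
DATASET_FILES = {
    "tms_facs": "tms_facs.h5ad",
    "tms_droplet": "tms_droplet.h5ad",
    "kimmel": "kimmel_2019.h5ad",
    "calico_rat": "calico_rat.h5ad",
    "pythia_ablation": "pythia_410m_ablation.csv",
}


def data_dir():
    """Return the data directory from the environment, or the default."""
    return Path(os.environ.get(DATA_DIR_ENV, DEFAULT_DATA_DIR)).expanduser()


def dataset_path(key):
    """Return the full path for a known dataset key."""
    if key not in DATASET_FILES:
        raise KeyError(f"unknown dataset key: {key!r}")
    return data_dir() / DATASET_FILES[key]
```

With this arrangement, setting the environment variable once before running the pipeline would be enough to redirect every script that imports `dataset_path`.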

Environment

git clone https://github.com/mool32/clonal-crystallization-aging.git
cd clonal-crystallization-aging
pip install scanpy anndata numpy pandas scipy matplotlib leidenalg fpdf2

Run

# Numbered scripts execute in order; each is self-contained.
python scripts/01_compute_proportions.py
python scripts/02_compute_metrics.py
python scripts/03_make_figures.py
python scripts/04_summary.py
python scripts/05_platform_concordance.py
python scripts/06_component_analysis.py
python scripts/07_pythia_overlay.py        # cross-domain overlay
python scripts/08_substate_analysis.py
python scripts/09_substate_figure.py
python scripts/10_kimmel_validation.py
python scripts/11_kimmel_robustness.py
python scripts/12_calico_cr_marrow.py
python scripts/13_tms_kidney_leiden.py
python scripts/14_overlay_with_ci.py
python scripts/15_build_pdf.py             # compiles paper/draft_v2.pdf
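The command list above can also be driven from a small Python helper that discovers the numbered scripts and stops at the first failure. This is a hedged sketch, not something shipped in the repository: the helper names `numbered_scripts` and `run_pipeline` are hypothetical, and it assumes each script is named with a two-digit prefix and can be launched from the repository root with no arguments.

```python
import re
import subprocess
import sys
from pathlib import Path

# Matches file names such as "07_pythia_overlay.py"; utils.py is ignored.
SCRIPT_PATTERN = re.compile(r"^(\d{2})_.+\.py$")


def numbered_scripts(script_dir, first=1, last=15):
    """Return numbered scripts in [first, last], sorted by numeric prefix."""
    found = []
    for path in Path(script_dir).iterdir():
        match = SCRIPT_PATTERN.match(path.name)
        if match and first <= int(match.group(1)) <= last:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


def run_pipeline(script_dir="scripts", first=1, last=15):
    """Run each script with the current interpreter; abort on the first error."""
    for script in numbered_scripts(script_dir, first, last):
        print(f"running {script.name}")
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run([sys.executable, str(script)], check=True)
```

Calling `run_pipeline()` from the repository root would mirror the list above, and `run_pipeline(first=10, last=11)` would re-run only the Kimmel steps.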

Total runtime ≈ 2 hours on a single workstation; peak memory ≈ 16 GB when the Droplet .h5ad (7.7 GB) is loaded.

Citation

@article{spiro2026crystallization,
  author  = {Spiro, Theodor},
  title   = {Clonal crystallization as a shared signature of bone-marrow aging and neural-network training},
  journal = {bioRxiv},
  year    = {2026},
  note    = {Manuscript in preparation. Companion paper: arXiv:2604.10571}
}

Please also cite the underlying data sources (TMS, Kimmel, Calico rat, Pythia) per their own citation policies.

Contact

Theodor Spiro — tspiro@vaika.org

License

  • Code (scripts/, utils.py): MIT (see LICENSE)
  • Data (data/*.csv): CC-BY 4.0, with upstream citation requirements honored for TMS, Kimmel, and Calico datasets
  • Figures (figures/*.png): CC-BY 4.0
  • Manuscript (paper/draft_v2.md and .pdf): CC-BY 4.0
