Add int8 shared-block bigram+cache prototype #704

Open

sauravtom wants to merge 1 commit into openai:main from sauravtom:codex/int8-bigram-cache

Conversation

@sauravtom

Summary

  • add shared-block transformer support (12 logical layers drawn from a bank of 2 physical blocks) plus LSQ-lite QAT hooks in train_gpt.py (see the first sketch after this list)
  • add bigram + short-cache logit fusion, label smoothing, EMA/SWA options, and int8 export extras for priors/config (see the second sketch after this list)
  • add submission folder records/track_10min_16mb/2026-03-25_Int8_Bigram_Cache_Proto with a README, submission.json, a train.log placeholder, and a runnable copy of the train script
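
The diff itself isn't shown above, so the sketch below is only an illustration of the shared-block idea under stated assumptions: `Block`, `SharedBlockStack`, and `LSQLiteFakeQuant` are hypothetical names, and the actual hooks in train_gpt.py may differ. The core trick is that the forward pass walks 12 logical layers while only 2 physical blocks hold parameters; the LSQ-lite module fake-quantizes weights to int8 with a learned step size and a straight-through estimator.

```python
# Illustrative sketch, not the PR's code: 12 logical layers from a
# 2-block bank, plus a minimal LSQ-style int8 fake-quant module.
import torch
import torch.nn as nn


class Block(nn.Module):
    """Stand-in pre-norm transformer block (causal mask omitted for brevity)."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, need_weights=False)
        x = x + a
        return x + self.mlp(self.ln2(x))


class SharedBlockStack(nn.Module):
    """12 logical layers that reuse the parameters of 2 physical blocks."""
    def __init__(self, dim=256, n_heads=4, n_physical=2, n_logical=12):
        super().__init__()
        self.bank = nn.ModuleList(Block(dim, n_heads) for _ in range(n_physical))
        # Alternate through the bank: 0,1,0,1,... for the 12 logical layers.
        self.share_map = [i % n_physical for i in range(n_logical)]

    def forward(self, x):
        for idx in self.share_map:
            x = self.bank[idx](x)
        return x


class LSQLiteFakeQuant(nn.Module):
    """Int8 fake quantization with a learned step size ("LSQ-lite": the
    gradient scaling from the full LSQ paper is omitted here)."""
    def __init__(self, init_step=0.05, qmax=127):
        super().__init__()
        self.step = nn.Parameter(torch.tensor(init_step))
        self.qmax = qmax

    def forward(self, w):
        q = torch.clamp(w / self.step, -self.qmax, self.qmax)
        # Straight-through estimator: round in forward, identity in backward.
        q = q + (q.round() - q).detach()
        return q * self.step
```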

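The PR also doesn't spell out the fusion math, so this second sketch is a guess at one standard formulation: mix the model's distribution with a bigram prior over the previous token and an exponentially decayed cache of recently seen tokens, in probability space. `fuse_logits`, the mixing weights, and the decay constant are all assumptions, not values from the diff.

```python
# Illustrative probability-space fusion of model logits with a bigram
# prior and a short decayed-count cache. Names and weights are assumed.
import torch
import torch.nn.functional as F


def fuse_logits(model_logits, prev_tokens, bigram_table,
                w_bigram=0.1, w_cache=0.05, decay=0.99):
    """
    model_logits: (B, T, V) raw transformer logits
    prev_tokens:  (B, T) id of the previous token at each position
    bigram_table: (V, V) rows are P(next | prev) probability vectors
    """
    B, T, V = model_logits.shape
    p_model = F.softmax(model_logits, dim=-1)
    p_bigram = bigram_table[prev_tokens]                # (B, T, V) lookup

    # Short cache: exponentially decayed counts of recently seen tokens,
    # normalized to a distribution at each step.
    counts = torch.full((B, V), 1e-3, device=model_logits.device)
    p_cache = torch.empty_like(p_model)
    for t in range(T):
        p_cache[:, t] = counts / counts.sum(dim=-1, keepdim=True)
        counts.mul_(decay)
        counts.scatter_add_(1, prev_tokens[:, t:t + 1],
                            torch.ones(B, 1, device=counts.device))

    p = ((1.0 - w_bigram - w_cache) * p_model
         + w_bigram * p_bigram
         + w_cache * p_cache)
    return torch.log(p + 1e-9)                          # fused log-probs
```

An int8 export of the priors/config, as the second bullet mentions, would then only need to serialize `bigram_table` (quantized) and the scalar fusion weights alongside the model weights.
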
Testing

  • python3 -m py_compile train_gpt.py

Notes

  • This PR is a prototype submission package; compute-backed logs from a full 8xH100 run are still pending.
