Training Recipes
Every model we trained for the demo was trained on a single RTX 5060 Ti (16 GB) in pure PyTorch with bf16 autocast. The recipes below are the exact commands used to produce the production checkpoints.
All scripts live under training/.
The face-recognition models use the standard MobileFaceNet (Chen et al. 2018) topology, width-scaled to four sizes, with an ArcFace head that uses the numerically stable angle-addition form of the margin.
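The angle-addition form computes the ArcFace target logit cos(θ + m) from cos θ alone, using cos(θ + m) = cos θ·cos m − sin θ·sin m, so θ never has to be recovered explicitly with `acos`. The sketch below is a plain-Python illustration for a single sample, not the project's `train.py` head. The helper name `arcface_logits`, the defaults s = 64 and m = 0.5, and the linear fallback past θ = π − m (the common insightface-style guard) are assumptions.

```python
import math

def arcface_logits(cosines, target, s=64.0, m=0.5):
    """Scaled ArcFace logits for one sample (hypothetical helper).

    cosines: cosine similarity between the embedding and each class centre.
    target:  index of the ground-truth class.
    """
    cos_m, sin_m = math.cos(m), math.sin(m)
    # Past theta = pi - m, cos(theta + m) is no longer decreasing in theta,
    # so fall back to a linear penalty there (assumed insightface-style guard).
    th = math.cos(math.pi - m)
    mm = math.sin(math.pi - m) * m
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # clamp rounding noise into [-1, 1]
        if j == target:
            sin_t = math.sqrt(max(0.0, 1.0 - c * c))
            # Angle addition: cos(theta + m) = cos(theta)cos(m) - sin(theta)sin(m)
            phi = c * cos_m - sin_t * sin_m if c > th else c - mm
            logits.append(s * phi)
        else:
            logits.append(s * c)  # non-target classes keep the plain cosine
    return logits

if __name__ == "__main__":
    print(arcface_logits([0.8, 0.1, -0.2], target=0))
```

Only the target class gets the margin, which is what pushes same-identity embeddings into a tighter angular cluster.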
```bash
cd training/scripts
# Prepare MS1M-RefineV2 (already pre-decoded into a raw blob; see
# prepare_lfw.py and the dataset.py loader)
python train.py --arch nano --epochs 25 --batch 512 --lr 1e-3
python train.py --arch tiny --epochs 25 --batch 384 --lr 1e-3
python train.py --arch standard --epochs 25 --batch 256 --lr 1e-3
python train.py --arch xs --epochs 25 --batch 192 --lr 1e-3
```

| Variant | Params | LFW accuracy (after YuNet 5-pt alignment) | Training time |
|---|---|---|---|
| nano | 0.20 M | 95.62% | ~6 h |
| tiny | 0.45 M | 96.85% | ~8 h |
| standard | 0.93 M | 98.25% | ~10 h |
| xs | 2.07 M | 99.07% | ~14 h |
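The LFW numbers above follow the 6,000-pair verification protocol, which splits the pairs into 10 folds of 600: a similarity threshold is picked on nine folds and accuracy is measured on the held-out one. A minimal sketch of that procedure is below. It assumes each pair already has a similarity score (e.g. cosine similarity of the two embeddings) and a boolean "same person" label in `pairs.txt` order; the helper names are hypothetical and this is not the project's `lfw_eval.py`.

```python
def best_threshold(scores, labels):
    """Similarity threshold that maximises accuracy on (score, same?) pairs."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate
        acc = sum((s >= t) == l for s, l in zip(scores, labels)) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def lfw_accuracy(scores, labels, folds=10):
    """Mean held-out accuracy over contiguous folds (600 pairs each on LFW)."""
    size = len(scores) // folds
    accs = []
    for k in range(folds):
        lo, hi = k * size, (k + 1) * size
        # Tune the threshold on the other nine folds only.
        t = best_threshold(scores[:lo] + scores[hi:], labels[:lo] + labels[hi:])
        held_out = zip(scores[lo:hi], labels[lo:hi])
        accs.append(sum((s >= t) == l for s, l in held_out) / size)
    return sum(accs) / folds
```

Keeping the threshold search off the test fold is what makes the reported percentage an honest held-out number.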
Evaluate against the LFW 6,000-pair protocol after each epoch:

```bash
python lfw_eval.py --ckpt runs/xs/best.pt --pairs lfw_pairs.txt
```

```bash
cd training/face_detect/scripts
python prepare_wider.py   # downloads WIDER_train + WIDER_val (2.5 GB)
python train.py --epochs 80 --batch 32 --input-size 320
python export.py          # → wasm/facex_detect.onnx (~400 KB)
```

The face detector has 100 K params and FCOS-style anchor-free heads at strides 8/16/32. The production checkpoint reaches a recall of 0.275 on the full WIDER FACE val set, but that metric counts faces as small as 4 px, which a 320×320 input simply can't resolve. In webcam use, the practical recall is ~95%.
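"Anchor-free" here means each feature-map cell predicts its own box as distances (left, top, right, bottom) from the cell's centre in the input image. For a 320×320 input, strides 8/16/32 give 40×40, 20×20 and 10×10 grids, i.e. 2,100 candidate locations. The sketch below enumerates those locations and decodes one box. It assumes the distances are predicted in stride units and multiplied back by the stride; that parameterisation and the helper names are illustrative assumptions, not the detector's actual export code.

```python
def grid_points(input_size=320, strides=(8, 16, 32)):
    """Centre (x, y) of every feature-map cell in input pixels, with its stride."""
    pts = []
    for s in strides:
        n = input_size // s  # cells per side at this stride
        for i in range(n):
            for j in range(n):
                pts.append((j * s + s // 2, i * s + s // 2, s))
    return pts

def decode_box(x, y, stride, ltrb):
    """Turn FCOS-style (l, t, r, b) distances into an (x1, y1, x2, y2) box.

    Assumes the head predicts distances in stride units (hypothetical scaling).
    """
    l, t, r, b = (d * stride for d in ltrb)
    return (x - l, y - t, x + r, y + b)

if __name__ == "__main__":
    pts = grid_points()
    print(len(pts), pts[0])
    print(decode_box(4, 4, 8, (1, 1, 2, 2)))
```

The smallest stride (8) sets the lower bound on detectable face size, which is why the tiny WIDER FACE faces drag the full-val recall down.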
```bash
cd training/landmark/scripts
python prepare_wflw.py
python train.py --epochs 60 --batch 64
python export_lm.py
```

The 98-point landmark model has 1.15 M params and a MobileFaceNet-style backbone with a dense head. NME is ~4.85% on the WFLW test split.
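NME (normalized mean error) is the mean point-to-point landmark error divided by a face-size normaliser. For WFLW that normaliser is commonly the inter-ocular distance between the outer eye corners, indices 60 and 72 in the 98-point layout. The sketch below assumes that convention; the helper `nme` is hypothetical and not taken from the project's scripts.

```python
import math

def nme(pred, gt, left_idx=60, right_idx=72):
    """Normalized mean error for one face.

    pred, gt: sequences of (x, y) landmarks in the same order.
    left_idx, right_idx: landmarks whose distance normalises the error
    (assumed WFLW outer eye corners by default).
    """
    norm = math.dist(gt[left_idx], gt[right_idx])
    mean_err = sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)
    return mean_err / norm

if __name__ == "__main__":
    print(nme([(1.0, 0.0), (10.0, 0.0)], [(0.0, 0.0), (10.0, 0.0)], 0, 1))
```

A dataset-level figure such as the ~4.85% above is then the average of per-face NME values.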
```bash
cd training/landmark3d/scripts
python pre_decode.py   # pre-render labels via mediapipe
python train.py --epochs 40 --batch 128
```

The 3D landmark model's final error on held-out val is 0.54 px in xy and 0.51 (normalized) in z. For rendering, the demo uses the WFLW 98-pt model to drive a thin-plate-spline (TPS) warp over the 478 MediaPipe points, which yields the 576 visible mesh points in the demo.
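A thin-plate spline is the smoothest 2-D mapping that sends a set of control points exactly onto their targets. It is the sum of an affine part and radial terms U(r) = r² log r², solved from one linear system. The dependency-free sketch below fits such a warp and applies it to arbitrary points, which is how sparse landmarks can drag a denser point set along. The helper names and the plain Gaussian-elimination solver are illustrative assumptions, not the demo's rendering code.

```python
import math

def _u(r2):
    # TPS radial basis U = r^2 log r^2, with U(0) = 0.
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)

def _solve(a, b):
    """Solve a dense square system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_tps(src, dst):
    """Fit a TPS mapping src control points onto dst (needs >= 3 non-collinear points)."""
    n = len(src)
    a = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            a[i][j] = _u((xi - xj) ** 2 + (yi - yj) ** 2)
        for k, v in enumerate((1.0, xi, yi)):  # affine block P and P^T
            a[i][n + k] = v
            a[n + k][i] = v
    # One system per output coordinate (x', then y').
    coeffs = [_solve(a, [p[d] for p in dst] + [0.0, 0.0, 0.0]) for d in range(2)]
    return src, coeffs

def apply_tps(model, pts):
    """Warp arbitrary points with a fitted TPS."""
    src, coeffs = model
    n = len(src)
    out = []
    for x, y in pts:
        res = []
        for c in coeffs:
            val = c[n] + c[n + 1] * x + c[n + 2] * y  # affine part
            for i, (xi, yi) in enumerate(src):
                val += c[i] * _u((x - xi) ** 2 + (y - yi) ** 2)
            res.append(val)
        out.append(tuple(res))
    return out
```

A useful sanity check: a TPS reproduces any affine motion exactly, so shifting every control point by (2, 3) shifts every warped point by (2, 3) too.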
We don't retrain the anti-spoofing models. We convert the upstream MinivisionAI weights (Apache 2.0) and run them through our own nn2 inference path, which is ~2× faster than ONNX Runtime.
```bash
cd training/Silent-Face-Anti-Spoofing
python convert_to_onnx.py   # → wasm/minifasnet_v2_27.onnx + v1se_40.onnx
cd ../nn2/tools
python export_minifasnet.py --variant v2 --ckpt ... --output ../weights/minifasnet_v2_27.bin
python export_minifasnet.py --variant v1se --ckpt ... --output ../weights/minifasnet_v1se_40.bin
# Build the nn2 antispoof binary and benchmark
cd .. && bash build_antispoof.sh
./nn2_antispoof.exe v2 weights/minifasnet_v2_27.bin tests/test_27.bin 200
```

The next recipe trains a smile classifier, but it applies to any tiny binary face attribute: smile, glasses, hat, etc.
```bash
cd training/smile/scripts
# Scrape ~500-1500 positives (Bing/DDGS image search via ddgs)
python scrape_positives.py
# Build dataset: positives + 3000 MS1M faces as negatives
python build_dataset.py
# Train TongueNet (47 K params, MobileNetV2-lite)
python train.py --epochs 30 --batch 128 --lr 1e-3
python export.py   # → wasm/facex_smile.onnx (187 KB)
```

Expected: F1 of 0.93–0.96 after 20 epochs (~10 min on a 5060 Ti). Don't use class weights: they bias the model toward the rare class and produce constant 100% predictions on neutral webcam frames.
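F1 is the harmonic mean of precision and recall on the positive class, so it exposes exactly the failure described above: a model that fires on every neutral frame keeps recall at 1.0 but its precision collapses. The sketch below computes F1 from boolean predictions and labels; the helper name and the toy 1-smile/9-neutral example are hypothetical, not output from the project's `train.py`.

```python
def f1_score(preds, labels):
    """F1 for the positive class from parallel lists of booleans."""
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum((not p) and l for p, l in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # A "constant 100%" classifier on 1 smiling + 9 neutral frames.
    labels = [True] + [False] * 9
    print(f1_score([True] * 10, labels))
```

Here the constant classifier scores about 0.18, far below the 0.93–0.96 range a healthy model reaches.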
```bash
cd wasm
python tools/encrypt_models.py   # AES-256-GCM: all .onnx → .enc
cp *.enc ../docs/demo/           # GitHub Pages serves from /docs
```

The key is written to .model_key at the repo root (gitignored). To rotate it, delete the file and re-encrypt; clients will then need the new key bytes (see Encrypted Weights).
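AES-256-GCM is authenticated encryption: decrypting with the wrong key or a tampered file fails outright rather than producing garbage weights. The round-trip sketch below uses the third-party `cryptography` package's `AESGCM`. The `nonce || ciphertext+tag` layout, the 12-byte nonce, and the helper names are assumptions for illustration; the real `.enc` format written by `encrypt_models.py` may differ.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the standard size for AES-GCM

def encrypt_blob(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt with AES-256-GCM; returns nonce || ciphertext || 16-byte tag (assumed layout)."""
    nonce = os.urandom(NONCE_LEN)  # must never repeat under the same key
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

def decrypt_blob(key: bytes, blob: bytes) -> bytes:
    """Inverse of encrypt_blob; raises InvalidTag on a wrong key or corrupted data."""
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

if __name__ == "__main__":
    key = AESGCM.generate_key(bit_length=256)  # 32 random bytes, like a key file
    blob = encrypt_blob(key, b"fake onnx bytes")
    print(len(blob), decrypt_blob(key, blob))
```

Because the nonce is random per file, re-encrypting after a key rotation changes every `.enc` byte, even for unchanged models.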