
Community contributions: M1-M4 compat, security fixes, docs, benchmarks, and community dashboard #25

Closed
dev-erik wants to merge 14 commits into maderix:main from dev-erik:main

Conversation


@dev-erik dev-erik commented Mar 3, 2026

Summary

This PR contributes a set of improvements built on top of the original ANE training codebase: bug fixes, security hardening, developer documentation, and a community benchmark submission system with a public dashboard.

Bug Fixes (from upstream PRs)

New Features

  • Community benchmark system: standardized JSON output format, aggregation script, M4 Max reference result
  • Benchmark runner (scripts/run_benchmarks.sh): full test suite runner with --training-only, --probes-only, --benchmarks-only flags
  • mlpackage generator (scripts/gen_mlpackages.py): generates CoreML models for sram/inmem benchmarks
  • Community benchmark script (scripts/run_community_benchmark.sh): runs benchmarks, generates standardized JSON, optionally submits to a shared dashboard
  • ANE ChainingRequest API prototype: baseline measurement for multi-kernel pipelining
  • Configurable paths: ANE_MODEL_PATH, ANE_DATA_PATH, ANE_CKPT_PATH env vars (addresses #4: tiny stories download link and hardcoded paths in train_large), combined with CLI flags (--model, --ckpt)
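A minimal sketch of the override precedence this implies (CLI flag first, then env var, then a built-in default). The function name and the default path here are illustrative, not taken from the PR:

```python
# Sketch of path resolution: --model flag wins, then ANE_MODEL_PATH,
# then a fallback default ("./models" is an assumption, not the PR's value).
import argparse
import os

def resolve_model_path(cli_value=None):
    if cli_value:
        return cli_value
    return os.environ.get("ANE_MODEL_PATH", "./models")

parser = argparse.ArgumentParser()
parser.add_argument("--model", default=None)
args = parser.parse_args([])  # no flags given: falls through to env var / default
print(resolve_model_path(args.model))
```

The same pattern would apply to ANE_DATA_PATH / ANE_CKPT_PATH with the --ckpt flag.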

Documentation

  • docs/ARCHITECTURE.md — system architecture with mermaid swim-lane diagrams
  • docs/API_REFERENCE.md — complete function index by file
  • docs/BENCHMARKS.md — benchmark guide with build/run commands
  • docs/BENCHMARK_RESULT — M4 Max benchmark results
  • Security audit report

Community Infrastructure

  • Rewritten README with quick start, env var table, community benchmarks section
  • CONTRIBUTING.md with guidelines for benchmark submissions and code contributions
  • GitHub issue templates (benchmark submission, bug report, feature request)
  • Comprehensive .gitignore

Community Benchmark Dashboard

The benchmark script optionally submits results to a public dashboard at https://web-lac-sigma-61.vercel.app. The URL is configurable via the ANE_DASHBOARD_URL env var.

The dashboard infrastructure (Vercel + Neon Postgres) is hosted and maintained by @dev-erik. No credentials or dashboard source code are included in this PR. Submissions go through a public API endpoint protected by rate limiting (5/hr per IP), schema validation, size limits, and duplicate detection. If you'd like admin access to the dashboard or database, feel free to reach out via DM.
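A rough sketch of what a submission could look like. The endpoint path (/api/submit) and the payload fields are assumptions for illustration; the PR does not publish the actual schema:

```python
# Hypothetical dashboard submission: build a standardized result payload
# and POST it to the configurable dashboard URL. Field names and the
# /api/submit path are assumed, not the PR's real schema.
import json
import os
import urllib.request

def build_payload(chip, benchmark, ms_per_step):
    return {"chip": chip, "benchmark": benchmark, "ms_per_step": ms_per_step}

def submit(payload):
    base = os.environ.get("ANE_DASHBOARD_URL", "https://web-lac-sigma-61.vercel.app")
    req = urllib.request.Request(
        base + "/api/submit",  # hypothetical endpoint path
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Server-side protections apply: rate limit (5/hr per IP), schema
    # validation, size limits, duplicate detection.
    return urllib.request.urlopen(req)

payload = build_payload("M4 Max", "sram", 110.0)
print(json.dumps(payload))
```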

Test Plan

  • make train_large compiles cleanly
  • make train_large_ane compiles cleanly
  • training_dynamic/ compiles cleanly
  • Benchmark runner tested on M4 Max
  • Community benchmark JSON submitted and verified in dashboard
  • Env var and CLI flag path overrides tested

dev-erik added 13 commits March 3, 2026 14:21
…program(1.0), ios16 target, tensor types across 18 files
…size_t wraparound on short datasets in both train_large variants
…ing when password is required for powermetrics
…rotector-strong, format-security flags, NULL guards on ane_compile/fread/fopen, tokenize.py input validation
…lti-kernel pipelining without recompile overhead
…eference, benchmark guide, M4 Max results, security audit report
…r full test suite, gen_mlpackages.py for CoreML model generation
…mit to dashboard, aggregation script, M4 Max reference result
…tignore: rewritten README with quickstart, env vars, benchmark instructions, dashboard link
…timized training (train_opt), double-buffered async ANE training (train_double_buffer), Qwen2.5-0.5B LLM inference (inference/). Added get_path() env var support and SEC_FLAGS to all new targets. Skipped PR maderix#22 (binary blob risk).
… (stdin loop + Unix socket server). Subsequent queries respond in ~0.5s instead of ~6s. run.py auto-connects to socket server when available.
…ut, skip unused ANE compilation, add round-trip benchmark timing, pure C HTTP API with tokenizer
dev-erik force-pushed the main branch 2 times, most recently from 6940b54 to 57d7a3e on March 3, 2026 at 23:51
dev-erik force-pushed the main branch 2 times, most recently from 860ebe3 to be96079 on March 3, 2026 at 23:53
maderix (Owner) commented Mar 4, 2026

Thanks for the effort here — there's clearly a lot of work. However, this PR bundles too many unrelated changes together (58 files, +13K lines), which makes it very difficult to review safely. Several of the fixes here have already been merged via focused PRs (#17, #20, #29, #31, #34).

Could you split the remaining work into separate PRs? Here's what we'd be interested in reviewing individually:

1. .gitignore improvements — Small, easy merge. Just the .gitignore file.

2. test_chaining.m (ANE ChainingRequest API) — This is genuinely interesting research. Submit it standalone with a brief description of what you found.

3. stories_cpu_ops_opt.h — If this has real perf gains over the current CPU ops, submit it with before/after timing numbers.

4. MIL syntax / M1-M3 compat changes — PR #6 already covers this, and the core issue is that downgrading to program(1.0)/ios16 globally hurts M4. This needs chip detection at runtime rather than a blanket downgrade. Coordinate with PR #6 author or submit a new approach with runtime gating.

Things we'd prefer not to merge:

  • inference/ directory (~5K lines, Metal shaders, HTTP server, Qwen inference) — This is a separate project, not a patch to the training codebase. Consider making it its own repo or proposing it as a tracked feature first.
  • train_double_buffer.m / train_opt.m — We already have static + dynamic pipelines. Adding more variants without justifying why they're better than the dynamic pipeline (110ms/step, no recompile) creates maintenance burden.
  • External dashboard submission to vercel.app — Sending data to a third-party URL needs discussion before merging.
  • PROBE_RESULTS.md, extensive docs, issue templates — We prefer to keep the repo lean. A focused README update is fine, but 1500+ lines of docs/templates is too much.

The individual PRs that were already merged show the pattern we prefer — small, focused, one concern per PR. Looking forward to seeing the split!

dev-erik (Author) commented Mar 4, 2026

Thanks for the detailed feedback — totally fair. Split into 3 focused PRs:

  1. .gitignore — Add .gitignore for build artifacts and temp files (#37)
  2. test_chaining.m (ANE ChainingRequest API prototype) — ANE private API research: chaining, E5 runtime, custom MIL compilation (#40)
  3. stories_cpu_ops_opt.h — Add cache-optimized embedding ops (~12x lookup speedup) (#39)

Keeping the inference directory, dashboard, docs, and extra training variants in our fork. Closing this PR now.

@dev-erik dev-erik closed this Mar 4, 2026