Community contributions: M1-M4 compat, security fixes, docs, benchmarks, and community dashboard #25
dev-erik wants to merge 14 commits into maderix:main
Conversation
…program(1.0), ios16 target, tensor types across 18 files
…size_t wraparound on short datasets in both train_large variants
…ing when password is required for powermetrics
…rotector-strong, format-security flags, NULL guards on ane_compile/fread/fopen, tokenize.py input validation
…lti-kernel pipelining without recompile overhead
…eference, benchmark guide, M4 Max results, security audit report
…r full test suite, gen_mlpackages.py for CoreML model generation
…mit to dashboard, aggregation script, M4 Max reference result
…tignore: rewritten README with quickstart, env vars, benchmark instructions, dashboard link
…timized training (train_opt), double-buffered async ANE training (train_double_buffer), Qwen2.5-0.5B LLM inference (inference/). Added get_path() env var support and SEC_FLAGS to all new targets. Skipped PR maderix#22 (binary blob risk).
… (stdin loop + Unix socket server). Subsequent queries respond in ~0.5s instead of ~6s. run.py auto-connects to socket server when available.
…ut, skip unused ANE compilation, add round-trip benchmark timing, pure C HTTP API with tokenizer
dev-erik force-pushed from 6940b54 to 57d7a3e
dev-erik force-pushed from 860ebe3 to be96079
Thanks for the effort here — there's clearly a lot of work. However, this PR bundles too many unrelated changes together (58 files, +13K lines), which makes it very difficult to review safely. Several of the fixes here have already been merged via focused PRs (#17, #20, #29, #31, #34). Could you split the remaining work into separate PRs?

Here's what we'd be interested in reviewing individually:

1.
2.
3.
4.

MIL syntax / M1-M3 compat changes — PR #6 already covers this, and the core issue is that downgrading to

Things we'd prefer not to merge:
The individual PRs that were already merged show the pattern we prefer — small, focused, one concern per PR. Looking forward to seeing the split!
Thanks for the detailed feedback — totally fair. Split into 3 focused PRs:
Keeping the inference directory, dashboard, docs, and extra training variants in our fork. Closing this PR now.
Summary
This PR contributes a set of improvements built on top of the original ANE training codebase: bug fixes, security hardening, developer documentation, and a community benchmark submission system with a public dashboard.
Bug Fixes (from upstream PRs)
- `program(1.0)`, `ios16` target, canonical tensor types across 18 MIL files
- `size_t` wraparound crash on short datasets
- `-fstack-protector-strong`, `-Wformat-security` flags, NULL guards on `ane_compile`/`fread`/`fopen`, `tokenize.py` input validation

New Features
- `scripts/run_benchmarks.sh`: full test suite runner with `--training-only`, `--probes-only`, `--benchmarks-only` flags
- `scripts/gen_mlpackages.py`: generates CoreML models for sram/inmem benchmarks
- `scripts/run_community_benchmark.sh`: runs benchmarks, generates standardized JSON, optionally submits to a shared dashboard
- `ANE_MODEL_PATH`, `ANE_DATA_PATH`, `ANE_CKPT_PATH` env vars (addresses #4, "Link to get the correct version of tiny stories (and also paths seem to be hardcoded in train_large)"), combined with CLI flags (`--model`, `--ckpt`)

Documentation
- `docs/ARCHITECTURE.md` — system architecture with mermaid swim-lane diagrams
- `docs/API_REFERENCE.md` — complete function index by file
- `docs/BENCHMARKS.md` — benchmark guide with build/run commands

Community Infrastructure
- `CONTRIBUTING.md` with guidelines for benchmark submissions and code contributions
- `.gitignore`

Community Benchmark Dashboard
The benchmark script optionally submits results to a public dashboard at https://web-lac-sigma-61.vercel.app. The URL is configurable via the `ANE_DASHBOARD_URL` env var.

The dashboard infrastructure (Vercel + Neon Postgres) is hosted and maintained by @dev-erik. No credentials or dashboard source code are included in this PR. Submissions go through a public API endpoint protected by rate limiting (5/hr per IP), schema validation, size limits, and duplicate detection. If you'd like admin access to the dashboard or database, feel free to reach out via DM.
Addresses
Test Plan
- `make train_large` compiles cleanly
- `make train_large_ane` compiles cleanly
- `training_dynamic/` compiles cleanly