
Community contributions: M1-M4 compat, security fixes, docs, benchmarks, and community dashboard #25

Closed
dev-erik wants to merge 14 commits into maderix:main from dev-erik:main

Conversation


@dev-erik dev-erik commented Mar 3, 2026

Summary

This PR contributes a set of improvements built on top of the original ANE training codebase: bug fixes, security hardening, developer documentation, and a community benchmark submission system with a public dashboard.

Bug Fixes (from upstream PRs)

New Features

  • Community benchmark system: standardized JSON output format, aggregation script, M4 Max reference result
  • Benchmark runner (scripts/run_benchmarks.sh): full test suite runner with --training-only, --probes-only, --benchmarks-only flags
  • mlpackage generator (scripts/gen_mlpackages.py): generates CoreML models for sram/inmem benchmarks
  • Community benchmark script (scripts/run_community_benchmark.sh): runs benchmarks, generates standardized JSON, optionally submits to a shared dashboard
  • ANE ChainingRequest API prototype: baseline measurement for multi-kernel pipelining
  • Configurable paths: ANE_MODEL_PATH, ANE_DATA_PATH, ANE_CKPT_PATH env vars (addresses #4: tiny stories download link and hardcoded paths in train_large), combined with CLI flags (--model, --ckpt)
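A minimal sketch of the override precedence this implies (CLI flag first, then env var, then a built-in default). The function name and the default path here are illustrative, not taken from the PR:

```python
# Sketch of path resolution: --model flag wins, then ANE_MODEL_PATH,
# then a fallback default ("./models" is an assumption, not the PR's value).
import argparse
import os

def resolve_model_path(cli_value=None):
    if cli_value:
        return cli_value
    return os.environ.get("ANE_MODEL_PATH", "./models")

parser = argparse.ArgumentParser()
parser.add_argument("--model", default=None)
args = parser.parse_args([])  # no flags given: falls through to env var / default
print(resolve_model_path(args.model))
```

The same pattern would apply to ANE_DATA_PATH / ANE_CKPT_PATH with the --ckpt flag.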

Documentation

  • docs/ARCHITECTURE.md — system architecture with mermaid swim-lane diagrams
  • docs/API_REFERENCE.md — complete function index by file
  • docs/BENCHMARKS.md — benchmark guide with build/run commands
  • docs/BENCHMARK_RESULT — M4 Max benchmark results
  • Security audit report

Community Infrastructure

  • Rewritten README with quick start, env var table, community benchmarks section
  • CONTRIBUTING.md with guidelines for benchmark submissions and code contributions
  • GitHub issue templates (benchmark submission, bug report, feature request)
  • Comprehensive .gitignore

Community Benchmark Dashboard

The benchmark script optionally submits results to a public dashboard at https://web-lac-sigma-61.vercel.app. The URL is configurable via the ANE_DASHBOARD_URL env var.

The dashboard infrastructure (Vercel + Neon Postgres) is hosted and maintained by @dev-erik. No credentials or dashboard source code are included in this PR. Submissions go through a public API endpoint protected by rate limiting (5/hr per IP), schema validation, size limits, and duplicate detection. If you'd like admin access to the dashboard or database, feel free to reach out via DM.
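A rough sketch of what a submission could look like. The endpoint path (/api/submit) and the payload fields are assumptions for illustration; the PR does not publish the actual schema:

```python
# Hypothetical dashboard submission: build a standardized result payload
# and POST it to the configurable dashboard URL. Field names and the
# /api/submit path are assumed, not the PR's real schema.
import json
import os
import urllib.request

def build_payload(chip, benchmark, ms_per_step):
    return {"chip": chip, "benchmark": benchmark, "ms_per_step": ms_per_step}

def submit(payload):
    base = os.environ.get("ANE_DASHBOARD_URL", "https://web-lac-sigma-61.vercel.app")
    req = urllib.request.Request(
        base + "/api/submit",  # hypothetical endpoint path
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Server-side protections apply: rate limit (5/hr per IP), schema
    # validation, size limits, duplicate detection.
    return urllib.request.urlopen(req)

payload = build_payload("M4 Max", "sram", 110.0)
print(json.dumps(payload))
```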

Test Plan

  • make train_large compiles cleanly
  • make train_large_ane compiles cleanly
  • training_dynamic/ compiles cleanly
  • Benchmark runner tested on M4 Max
  • Community benchmark JSON submitted and verified in dashboard
  • Env var and CLI flag path overrides tested

dev-erik added 13 commits March 3, 2026 14:21
…program(1.0), ios16 target, tensor types across 18 files
…size_t wraparound on short datasets in both train_large variants
…ing when password is required for powermetrics
…rotector-strong, format-security flags, NULL guards on ane_compile/fread/fopen, tokenize.py input validation
…lti-kernel pipelining without recompile overhead
…eference, benchmark guide, M4 Max results, security audit report
…r full test suite, gen_mlpackages.py for CoreML model generation
…mit to dashboard, aggregation script, M4 Max reference result
…tignore: rewritten README with quickstart, env vars, benchmark instructions, dashboard link
…timized training (train_opt), double-buffered async ANE training (train_double_buffer), Qwen2.5-0.5B LLM inference (inference/). Added get_path() env var support and SEC_FLAGS to all new targets. Skipped PR maderix#22 (binary blob risk).
… (stdin loop + Unix socket server). Subsequent queries respond in ~0.5s instead of ~6s. run.py auto-connects to socket server when available.
…ut, skip unused ANE compilation, add round-trip benchmark timing, pure C HTTP API with tokenizer
dev-erik force-pushed the main branch 2 times, most recently from 6940b54 to 57d7a3e on March 3, 2026 at 23:51
dev-erik force-pushed the main branch 2 times, most recently from 860ebe3 to be96079 on March 3, 2026 at 23:53
maderix (Owner) commented Mar 4, 2026

Thanks for the effort here — there's clearly a lot of work. However, this PR bundles too many unrelated changes together (58 files, +13K lines), which makes it very difficult to review safely. Several of the fixes here have already been merged via focused PRs (#17, #20, #29, #31, #34).

Could you split the remaining work into separate PRs? Here's what we'd be interested in reviewing individually:

1. .gitignore improvements — Small, easy merge. Just the .gitignore file.

2. test_chaining.m (ANE ChainingRequest API) — This is genuinely interesting research. Submit it standalone with a brief description of what you found.

3. stories_cpu_ops_opt.h — If this has real perf gains over the current CPU ops, submit it with before/after timing numbers.

4. MIL syntax / M1-M3 compat changes — PR #6 already covers this, and the core issue is that downgrading to program(1.0)/ios16 globally hurts M4. This needs chip detection at runtime rather than a blanket downgrade. Coordinate with PR #6 author or submit a new approach with runtime gating.

Things we'd prefer not to merge:

  • inference/ directory (~5K lines, Metal shaders, HTTP server, Qwen inference) — This is a separate project, not a patch to the training codebase. Consider making it its own repo or proposing it as a tracked feature first.
  • train_double_buffer.m / train_opt.m — We already have static + dynamic pipelines. Adding more variants without justifying why they're better than the dynamic pipeline (110ms/step, no recompile) creates maintenance burden.
  • External dashboard submission to vercel.app — Sending data to a third-party URL needs discussion before merging.
  • PROBE_RESULTS.md, extensive docs, issue templates — We prefer to keep the repo lean. A focused README update is fine, but 1500+ lines of docs/templates is too much.

The individual PRs that were already merged show the pattern we prefer — small, focused, one concern per PR. Looking forward to seeing the split!

dev-erik (Author) commented Mar 4, 2026

Thanks for the detailed feedback — totally fair. Split into 3 focused PRs:

  1. .gitignore — Add .gitignore for build artifacts and temp files (#37)
  2. test_chaining.m (ANE ChainingRequest API prototype) — ANE private API research: chaining, E5 runtime, custom MIL compilation (#40)
  3. stories_cpu_ops_opt.h — Add cache-optimized embedding ops (~12x lookup speedup) (#39)

Keeping the inference directory, dashboard, docs, and extra training variants in our fork. Closing this PR now.

@dev-erik dev-erik closed this Mar 4, 2026