fix: address HIGH security findings (HIGH-01 to HIGH-05)#13
fix: address HIGH security findings (HIGH-01 to HIGH-05)#13manni07 wants to merge 17 commits intomaderix:mainfrom
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- CRIT-01: dlopen() return check + NSClassFromString validation in ane_init()
(ane_runtime.h + stories_config.h); g_ane_ok / g_ane_ok_large flag
only set when all private classes load successfully; stories_config.h
gets re-entry guard (g_ane_init_done) that was previously missing
- CRIT-02: g_ane_ok guard in ane_compile() and compile_kern_mil_w(); NULL check
for inMemoryModel after inMemoryModelWithDescriptor: — prevents crash
when API call returns nil (ane_runtime.h, stories_io.h)
- CRIT-03: Validate fread() return for critical config/header reads to prevent
garbage malloc() sizes; fopen() NULL check in save_checkpoint();
design decision documented (model.h, train_large.m)
- CRIT-04: int -> size_t in build_blob*/build_blob_t/build_blob_fp16; calloc()
NULL checks added; (size_t) cast in malloc() size calculations to
prevent signed integer overflow UB (stories_io.h, model.h)
Simulation: 3 iterations, overall score 96.15% (all criteria >= 95%)
ref: docs/reports/security-audit-2026-03-02.md
- MED-01: IOSurfaceLock() return checked in all 6 I/O functions; early return
on failure prevents data race (stories_io.h, ane_runtime.h)
- MED-02: Per-process/per-call unique temp dirs via getpid()+g_compile_seq
(stories_io.h, ane_runtime.h)
- MED-03: mil_dims_valid() guard in all 7 MIL-gen functions; nil return on
invalid params (ane_mil_gen.h)
- MED-04: CkptHdr.pad[0]=0x01020304 byte-order sentinel; runtime check in
load_checkpoint; _Static_assert for compile-time LE guarantee (train_large.m)
- MED-05: _Static_assert(SEQ%8==0) + ARM64 alignment rationale comment (stories_io.h)
- MED-06: dispatch_once replaces manual g_ane_loaded/g_ane_init_done guards;
thread-safe one-time ANE init (ane_runtime.h, stories_config.h)
ref: docs/reports/security-audit-2026-03-02.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Simulation plan for HIGH-01 to HIGH-05 with 5-criteria scoring. Overall avg: 95.76% (all criteria >=95%). ref: docs/reports/security-audit-2026-03-02.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- train_large.m: early exit if n_tokens < SEQ+1 prevents arithmetic underflow and downstream index corruption - stories_cpu_ops.h: clamp tok >= VOCAB to 0 in embed_lookup() to prevent heap-buffer-overflow ref: docs/reports/security-audit-2026-03-02.md HIGH-01
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- stories_cpu_ops.h: apply same VOCAB clamp in embed_backward() to prevent OOB write / heap corruption (mirrors embed_lookup fix) - train_large.m: munmap+close before early-exit to avoid FD/mmap leak
- train_large.m: realpath() guard for DATA_PATH before open() provides clear error message when run from wrong directory - train_large.m: realpath() audit log in load_pretrained() for model path ref: docs/reports/security-audit-2026-03-02.md HIGH-02
- munmap(token_data) + close(data_fd) before execl() prevents FD/mmap leak across process boundary - realpath(argv[0]) resolves binary path to prevent exec of wrong binary when invoked via relative path ref: docs/reports/security-audit-2026-03-02.md HIGH-03
Document the FD/mmap cleanup and realpath() argv[0] fix applied in commit b5c3cf9, including design decisions and updated finding status.
- stories_config.h: add xmf(n)/xcf(n) static inline helpers that abort() with diagnostic on OOM — replaces all unchecked malloc/calloc - stories_config.h: replace all (float*)malloc(X*4) -> xmf(X) and (float*)calloc(X,4) -> xcf(X) in all 5 alloc functions - train_large.m: same replacement for direct alloc calls (embed, rms_final, grads, per-iteration temporaries — 31 call sites total) ref: docs/reports/security-audit-2026-03-02.md HIGH-04 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- 7 additional raw malloc/calloc calls replaced with xmf()/xcf() helpers - cross_entropy_loss buf cast to (size_t) to prevent int overflow ref: docs/reports/security-audit-2026-03-02.md HIGH-04
- stories_io.h: Kern struct calloc NULL guard + abort - ane_runtime.h: 5 allocs (ANEKernel, inputBytes, outputBytes, ioInputs, ioOutputs) guarded with abort on NULL - ane_mil_gen.h: 3 uint8_t buf calloc calls guarded with abort on NULL ref: docs/reports/security-audit-2026-03-02.md HIGH-04 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- stories_mil.h: get_mask_blob() mask calloc now guarded with abort() on NULL before immediate array write ref: docs/reports/security-audit-2026-03-02.md HIGH-04
- stories_io.h: ane_eval() changed from void to bool; captures BOOL return from evaluateWithQoS:options:request:error: with NSError log - train_large.m: step_ok tracks all 6 ane_eval() results per step; gradient update skipped if any ANE eval fails in that step ref: docs/reports/security-audit-2026-03-02.md HIGH-05 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- security-audit-2026-03-02.md: add BEHOBEN status to all 5 HIGH findings - 001-initial-setup-and-security-audit.md: update status table to show all 19 findings resolved; add stories_mil.h to HIGH-04 completeness list ref: docs/reports/security-audit-2026-03-02.md
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e2e0c8332a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (IOSurfaceLock(s, 0, NULL) != kIOReturnSuccess) { // MED-01 | ||
| fprintf(stderr, "IOSurfaceLock(write) failed — surface write skipped\n"); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Propagate IOSurface lock failures to step_ok
io_write_fp16 now returns early on IOSurfaceLock failure, but the training loop only tracks ane_eval() in step_ok. In the lock-failure path (e.g., invalid or busy IOSurface), ANE eval can still run on stale inputs and step_ok remains true, so the optimizer update is applied with corrupted gradients instead of being skipped. This should be surfaced as a boolean error back to the caller and combined into the same failure gate used for ane_eval.
Useful? React with 👍 / 👎.
Summary
Fixes all 5 high-severity findings from the security audit (
docs/reports/security-audit-2026-03-02.md).n_tokens < SEQ+1early-exit guard intrain_large.m(withmunmap+closecleanup); addedif (tok >= VOCAB) { tok = 0; }clamp in bothembed_lookup()ANDembed_backward()instories_cpu_ops.hto prevent both OOB read and OOB write/heap corruptionDATA_PATH/MODEL_PATHwithout validation —realpath()guard beforeopen()with actionable error message;realpath()audit log inload_pretrained()execl()without cleanup —munmap(token_data)+close(data_fd)before exec;realpath(argv[0])to prevent exec of wrong binary via symlink/relative pathmalloc()/calloc()— addedxmf(n)/xcf(n)OOM-abort helpers instories_config.h; replaced all 58 raw float allocations instories_config.h,train_large.m,stories_cpu_ops.h; added inline OOM guards for non-float allocations instories_io.h,stories_mil.h,ane_runtime.h,ane_mil_gen.h(total: 68 guarded sites)ane_eval()changed fromvoidtobool;step_ok &= ane_eval(...)at all 6 call sites; Adam gradient update skipped when any ANE eval fails in a stepTest plan
cd training && make train_large— must compile clean (zero errors, zero new warnings)(SEQ+1)*2 = 514bytes →stderr: "Token file too small"train_largefrom a directory whereDATA_PATHdoesn't exist →stderr: "Data file not found: '...' Hint: run from training/ directory"ulimit -v 1000000 && ./train_large→ OOM abort with"OOM: malloc(..."diagnosticane_eval()instories_io.hreturnsbool,step_okused intrain_large.mRelated
xmf/xcfhelpers build onsize_ttype safety from CRIT-04)dispatch_once+ MED-01 IOSurfaceLock guards in same files)