fix: address LOW security findings (LOW-01 to LOW-04)#5
fix: address LOW security findings (LOW-01 to LOW-04)#5manni07 wants to merge 1 commit intomaderix:mainfrom
Conversation
- LOW-01: Add SEC_FLAGS with -fstack-protector-strong and -Wformat-security to CFLAGS; add CFLAGS_DEBUG with -fsanitize=address,undefined; add verify-flags Make target to inspect active build flags. Note: -D_FORTIFY_SOURCE=2 is implicit at -O2 on macOS Apple LLVM. - LOW-02: Extract -Wno-deprecated-declarations into ANE_COMPAT variable with explanatory comment (intentional for private _ANE* APIs via objc_msgSend). Add check-deprecated Make target to periodically inspect suppressed warnings without breaking the build. - LOW-03: Add input validation to tokenize.py: (1) ZIP file existence check with helpful error message (2) Configurable size limit (default 10GB, MAX_ZIP_BYTES env var) (3) Validate data00.bin present in ZIP before extraction (4) Guard struct.unpack against files smaller than 20 bytes (5) Token ID range validation (must be < VOCAB_SIZE=32000) - LOW-04: Create .gitignore covering macOS metadata (.DS_Store), log files (*.log), compiled binaries (training/train etc.), training data and checkpoints (training/*.bin), ANE artifacts (.mlmodelc, .mlpackage), and external assets directory. Add security audit report and development diary (initial session). Simulation: 3 iteration rounds, final score 96.35% across 5 criteria (Fix Completeness, Backward Compatibility, Code Quality, Testability, Project Consistency) — all >= 95%. ref: docs/reports/security-audit-2026-03-02.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7990a2b876
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if not os.path.exists(ZIP_PATH): | ||
| print(f"ERROR: ZIP-Datei nicht gefunden: {ZIP_PATH}", file=sys.stderr) | ||
| print(f" Erwartet: ~/tiny_stories_data_pretokenized.zip", file=sys.stderr) | ||
| sys.exit(1) |
There was a problem hiding this comment.
Preserve existing-output fast path before ZIP checks
The new ZIP_PATH existence/size validation runs before the existing OUTPUT_PATH early-return, so tokenize.py now fails when tinystories_data00.bin is already present but the original ZIP has been deleted (a common post-extraction cleanup case). This is a behavioral regression from the previous idempotent flow and breaks reruns of make tokenize in environments that keep only the extracted .bin.
Useful? React with 👍 / 👎.
…rotector-strong, format-security flags, NULL guards on ane_compile/fread/fopen, tokenize.py input validation
Summary
-fstack-protector-strong,-Wformat-security) and aCFLAGS_DEBUGvariant with AddressSanitizer. Note:-D_FORTIFY_SOURCE=2is implicit at-O2on Apple LLVM and intentionally omitted to avoid macro redefinition warnings.-Wno-deprecated-declarationsinto a namedANE_COMPATvariable with an explanatory comment (intentional suppression for private_ANE*APIs viaobjc_msgSend). Addmake check-deprecatedtarget to periodically inspect suppressed warnings without breaking the build.tokenize.py: ZIP file existence check, configurable size limit (default 10 GB viaMAX_ZIP_BYTESenv var),data00.binpresence in ZIP,struct.unpackguard for small files, and token ID range validation (< VOCAB_SIZE=32000)..gitignorecovering macOS metadata, log files, compiled binaries, training data / checkpoints (training/*.bin), ANE artifacts (.mlmodelc,.mlpackage), and external assets.Changed files
training/MakefileSEC_FLAGS,ANE_COMPAT,CFLAGS_DEBUG,verify-flags,check-deprecatedtargetstraining/tokenize.py.gitignoredocs/reports/security-audit-2026-03-02.mddocs/diaries/001-initial-setup-and-security-audit.mdTest plan
make verify-flags→ shows-fstack-protector-strong -Wformat-securityin CFLAGSmake check-deprecated→ informational output, build does not failpython3 training/tokenize.pywith missing ZIP → exits withERROR:message and code 1git check-ignore -v firebase-debug.log .DS_Store training/train→ all three matchedmake train/make train_large→ no new warnings introduced