Add sanitized parse-error report mode to log_aggregator (#1) #10

Open
leo202000 wants to merge 1 commit into 9904099:main from leo202000:bounty/parse-error-report-9904099-1

Conversation

@leo202000

Summary

Closes #1. Adds a --parse-error-report PATH mode to tools/log_aggregator.py that writes a JSON summary of parse failures by file and line number. Malformed JSON records (which previously fell through to the text parser and were silently swallowed) are now surfaced as a concise, sanitized artifact for operators, without leaking raw log payloads or secret-looking values.

Changes

  • tools/log_aggregator.py
    • _SECRET_PATTERNS + _sanitize_text(): redacts API keys, AWS keys, GitHub tokens, and Bearer values; truncates messages to 200 characters.
    • LogAggregator.parse_errors: new list tracking each failure's file, line, parser type, and sanitized message.
    • _record_parse_error() + extended _parse_line(): lines that look like JSON ({/[) but fail json.loads are recorded as json parse failures. The line still falls through to the text parser, so existing entry counts and CSV/JSON/HTML outputs stay backward compatible.
    • process_file(): now tracks line numbers (via enumerate) and passes file + line to _parse_line.
    • export_parse_error_report(): writes {total_parse_errors, errors_by_file: {file: {count, failures: [{line, parser, error}]}}}. Never includes raw log line contents.
    • New --parse-error-report PATH CLI flag; report is written only when the flag is supplied.
  • tools/smoke_parse_error_report.py: focused smoke script covering valid JSON logs, malformed JSON logs, text logs, sanitized report output (no raw payload/secrets), backward compatibility of existing outputs, and the CLI end-to-end.
  • Committed a real (non-stub) diagnostic build log: diagnostic/build-2b54872c.logd + matching JSON metadata.
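
The sanitize, record, and report steps listed above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the regexes, the `[REDACTED]` marker, and the function names `sanitize_text`, `collect_parse_errors`, and `build_report` are assumptions chosen for the example. Only the 200-character clamp and the report shape come from the description.

```python
import json
import re

# Hypothetical patterns; the PR's actual _SECRET_PATTERNS are not shown here.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                  # AWS access key ID
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),        # GitHub token
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+"),  # Bearer value
    re.compile(r"(?i)api[_-]?key\s*[=:]\s*\S+"),      # api_key=... style pairs
]
MAX_MESSAGE_LEN = 200  # clamp stated in the PR description


def sanitize_text(text):
    """Redact secret-looking substrings, then truncate the message."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text[:MAX_MESSAGE_LEN]


def collect_parse_errors(file_name, lines):
    """Record a failure for each line that looks like JSON but does not parse."""
    errors = []
    for line_no, line in enumerate(lines, start=1):
        if not line.lstrip().startswith(("{", "[")):
            continue  # not JSON-looking; left to the text parser
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            # exc.msg is the decoder's message only; the raw line is never stored.
            errors.append({
                "file": file_name,
                "line": line_no,
                "parser": "json",
                "error": sanitize_text(exc.msg),
            })
    return errors


def build_report(errors):
    """Group failures into the shape described for export_parse_error_report()."""
    by_file = {}
    for err in errors:
        bucket = by_file.setdefault(err["file"], {"count": 0, "failures": []})
        bucket["count"] += 1
        bucket["failures"].append(
            {"line": err["line"], "parser": err["parser"], "error": err["error"]}
        )
    return {"total_parse_errors": len(errors), "errors_by_file": by_file}
```

Storing only the decoder's message, rather than the offending line, is what keeps raw payloads out of the report in this sketch; the redaction pass is a second layer in case a message ever echoes input.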

Testing

# 1. Smoke test (13 checks): valid logs, malformed logs, sanitized report, backward compat, CLI
python3 tools/smoke_parse_error_report.py
# -> SMOKE TEST PASSED: all checks ok (exit 0)

# 2. Generated a real diagnostic by running the build
python3 build.py
# -> produced diagnostic/build-2b54872c.logd + build-2b54872c.json (real, not the stub)

# 3. Manual CLI check
python3 tools/log_aggregator.py --input sample.log --output out.json --parse-error-report errors.json

The diagnostic JSON metadata shows the build ran across all 10 modules. Module-level build failures are due to language toolchains (cargo/npm/go/gcc/javac/ghc, etc.) not being installed in this environment and are unrelated to this change, which only touches tools/log_aggregator.py. Backward compatibility is verified: when --parse-error-report is not used, export_json/export_csv/generate_html_report and entry counts are unchanged.
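
The opt-in behavior behind that backward-compatibility claim could look something like the sketch below. It assumes an argparse-based CLI; `build_parser` and `maybe_write_report` are hypothetical names for illustration, and only the `--input`, `--output`, and `--parse-error-report` flags are taken from this PR.

```python
import argparse
import json


def build_parser():
    """Hypothetical CLI parser mirroring the flags used in the manual check."""
    parser = argparse.ArgumentParser(description="log aggregator (sketch)")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    # Defaults to None so existing invocations behave exactly as before.
    parser.add_argument("--parse-error-report", metavar="PATH", default=None)
    return parser


def maybe_write_report(args, report):
    """Write the report only when the flag was supplied; return True if written."""
    if args.parse_error_report is None:
        return False
    with open(args.parse_error_report, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
    return True
```

Because the flag defaults to `None`, a run without it never touches the report path, which is one simple way to keep the existing outputs untouched.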

Encrypted diagnostic for review: diagnostic/build-2b54872c.logd (decrypt with encryptly unpack diagnostic/build-2b54872c.logd <outdir> --password de0d0d5a4a1b633eacf6).

Checklist

  • Relevant modules affected by these changes build locally
  • Tests pass locally
  • Diagnostic build log is committed in this PR
  • Documentation has been updated, if applicable
  • Configuration or schema changes are documented, if applicable
  • No generated build artifacts are committed, except the required diagnostic build log
  • Changes are scoped to the PR purpose and avoid unrelated cleanup
  • Security, privacy, and error-handling implications have been considered

  • I would like to request that my diagnostic build log is removed before merging

@leo202000 leo202000 force-pushed the bounty/parse-error-report-9904099-1 branch from caf2b83 to 2043594 Compare June 21, 2026 02:29
Nexussyn added a commit to Nexussyn/zeroeye that referenced this pull request Jun 22, 2026

Development

Successfully merging this pull request may close these issues.

[$40 BOUNTY] [Python] Add sanitized parse-error reports for log aggregation