Skip to content

[ BOUNTY] Add JSONL output to log_aggregator (#3)#7

Open
xcapselx wants to merge 3 commits into
thanhle74:mainfrom
xcapselx:feat/jsonl-output-thanhle74
Open

[ BOUNTY] Add JSONL output to log_aggregator (#3)#7
xcapselx wants to merge 3 commits into
thanhle74:mainfrom
xcapselx:feat/jsonl-output-thanhle74

Conversation

@xcapselx

@xcapselx xcapselx commented Jun 18, 2026

Copy link
Copy Markdown

Summary

Add --format text|jsonl output support to tools/log_aggregator.py so downstream tooling can parse aggregated logs without scraping terminal text. text is the new default; jsonl emits one JSON record per line.

Changes

  • tools/log_aggregator.py:

    • Changed --format choices from json|csv|html to text|jsonl|json|csv|html with text as default
    • Added export_jsonl() method that writes one JSON record per line with timestamp, level, source, message, and metadata fields
    • Added _sorted_entries_for_jsonl() to sort entries by timestamp (oldest first); entries without timestamps appear after timestamped ones
    • Added _entry_to_jsonl_record() to map internal entry fields to the JSONL schema (ISO 8601 timestamps, service mapped to source)
    • Track unparseable lines in unparseable_lines list; JSONL output emits a warning record for each
    • Fixed string timestamp handling in _parse_line() — JSON logs with ISO 8601 string timestamps now convert to Unix timestamps instead of crashing
    • Made TextLogParser require at least a timestamp or a log level keyword; lines with no recognizable structure are now treated as unparseable
  • tests/test_log_aggregator_jsonl.py (new file):

    • test_jsonl_output_with_json_logs — JSONL output from JSON-format logs, verifies all required fields
    • test_jsonl_output_with_text_logs — JSONL output from text-format logs, verifies all required fields
    • test_jsonl_sorted_by_timestamp — verifies chronological ordering
    • test_unparseable_lines_produce_warning_records — verifies warning records for unparseable lines
    • test_text_format_is_default — verifies default format is text
    • test_jsonl_format_choice_exists — verifies jsonl is a valid format choice
  • docs/OPERATIONS.md:

    • Added "Log Aggregator JSONL Output" section documenting the JSONL record schema, ordering rules, warning record format, and supported input formats

Testing

python -m unittest tests.test_log_aggregator_jsonl -v

Result:

test_jsonl_format_choice_exists ... ok
test_jsonl_output_with_json_logs ... ok
test_jsonl_output_with_text_logs ... ok
test_jsonl_sorted_by_timestamp ... ok
test_text_format_is_default ... ok
test_unparseable_lines_produce_warning_records ... ok

Ran 6 tests in 0.011s
OK

Build diagnostic: diagnostic/build-338f0f7a.json (encryptly .logd unavailable on Windows; JSON metadata included).

Checklist

  • Relevant modules affected by these changes build locally
  • Tests pass locally
  • Diagnostic build log is committed in this PR
  • Documentation has been updated (docs/OPERATIONS.md)
  • Changes are scoped to the PR purpose and avoid unrelated cleanup
  • Security, privacy, and error-handling implications have been considered

Closes #3

Summary by CodeRabbit

  • New Features
    • Added JSONL export support to the log aggregator for machine-readable downstream consumption.
    • Expanded the CLI --format options and changed the default output to text.
  • Bug Fixes
    • Improve parsing behavior by skipping lines without detectable timestamps/valid levels.
    • JSONL export now includes warning records for unparseable lines and outputs records in timestamp order.
  • Documentation
    • Updated operations guide with JSONL schema, usage, and supported input formats.
  • Tests
    • Added unit tests covering JSONL export, sorting, warnings, and CLI format defaults.

LE-VAI added 2 commits June 18, 2026 19:12
- Add --format text|jsonl with text as default (was json)
- JSONL records include timestamp, level, source, message, metadata
- Sort entries by timestamp when available
- Warning records for unparseable lines
- Fix string timestamp handling in _parse_line
- Make TextLogParser require timestamp or level keyword
- Add tests covering JSON and text log formats
- Document JSONL schema in docs/OPERATIONS.md
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 007eb786-e5cd-4cba-bee5-2e58d9c42f7e

📥 Commits

Reviewing files that changed from the base of the PR and between 3edf7c9 and 8385927.

📒 Files selected for processing (2)
  • tests/test_log_aggregator_jsonl.py
  • tools/log_aggregator.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tools/log_aggregator.py
  • tests/test_log_aggregator_jsonl.py

📝 Walkthrough

Walkthrough

Adds JSONL export capability to LogAggregator: TextLogParser.parse() gains a pre-filter to skip lines with neither timestamp nor level; _parse_line normalizes ISO timestamps to epoch and tracks unparseable lines; export_jsonl() writes ordered JSONL records plus warning entries. CLI --format gains jsonl/text choices with text as the new default. A test suite and OPERATIONS.md section document the feature. A diagnostic JSON report for a failed encryptly-preflight build is also included.

Changes

JSONL Export Feature

Layer / File(s) Summary
Parser pre-filter and aggregator internals
tools/log_aggregator.py
TextLogParser.parse() returns None when a line has neither timestamp nor recognizable level. LogAggregator.__init__ adds unparseable_lines list. _parse_line converts ISO-8601 string timestamps to epoch integers for hourly bucketing and appends stripped non-parsed lines to unparseable_lines.
export_jsonl() method and CLI wiring
tools/log_aggregator.py
export_jsonl() writes timestamp-ordered JSONL records for parsed entries and appends warning records with raw_line for each unparseable item. parse_args() expands --format to ["text", "jsonl", "json", "csv", "html"] with text as the new default. main() adds explicit jsonl/json branches and replaces the old JSON fallback with pass.
Tests and documentation
tests/test_log_aggregator_jsonl.py, docs/OPERATIONS.md
TestJSONLOutput covers JSON log export, text log export, timestamp-sorted output, warning records for unparseable lines, default format, and --format jsonl CLI choice. OPERATIONS.md documents the JSONL record schema, ordering rules, warning record structure, and supported input formats.

Diagnostic Build Report

Layer / File(s) Summary
Diagnostic JSON build report
diagnostic/build-338f0f7a.json
New report records the encryptly-preflight module FAIL status due to a missing encryptly binary on windows-x64 for commit 338f0f7a, with pass/fail counts and a PR note about the absent .logd artifact.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as parse_args / main()
    participant LogAggregator
    participant TextLogParser
    participant OutputFile

    User->>CLI: python3 log_aggregator.py --format jsonl
    CLI->>LogAggregator: process_file(input_path)
    loop each line
        LogAggregator->>TextLogParser: parse(line)
        TextLogParser-->>LogAggregator: entry or None
        alt parsed entry
            LogAggregator->>LogAggregator: normalize ISO timestamp → epoch
        else unparseable
            LogAggregator->>LogAggregator: append to unparseable_lines
        end
    end
    CLI->>LogAggregator: export_jsonl(output_path)
    LogAggregator->>LogAggregator: sort entries by timestamp
    loop parsed entries
        LogAggregator->>OutputFile: write JSONL record
    end
    loop unparseable_lines
        LogAggregator->>OutputFile: write warning JSONL record with raw_line
    end
    OutputFile-->>User: .jsonl file
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 Hoppity-hop, I parsed the logs today,
JSON lines sorted, warnings on display!
Unparseable crumbs? I warned them with care,
A --format jsonl flag floats in the air.
The diagnostic said "no binary found" —
But the JSONL export is safe and sound! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately describes the main change: adding JSONL output functionality to the log_aggregator tool, which is the primary focus of this changeset.
Description check ✅ Passed The PR description comprehensively covers all required template sections including Summary, Changes, Testing, and a completed Checklist with all relevant items marked. Documentation updates and diagnostic build logs are included as required.
Linked Issues check ✅ Passed All acceptance criteria from issue #3 are met: JSONL feature fully implemented with export_jsonl() method and supporting functions, existing tests pass plus 6 new tests added and passing, code follows conventions, and diagnostic artifacts included.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the PR purpose of adding JSONL output support. Changes include the log_aggregator implementation, comprehensive tests, documentation updates, and build diagnostics—no unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@diagnostic/build-338f0f7a.json`:
- Line 5: The diagnostic JSON file contains a non-null diagnostic_logd_error
field indicating the encryptly binary is missing on windows-x64, which will
trigger a CI validation failure in the diagnostic-build-log.yml workflow that
explicitly rejects files with this error. To fix this, either re-run the python3
build.py script on a platform where the encryptly binary is available (such as
linux-x64 or macos-x64) to generate a valid diagnostic file with the encrypted
.logd file present, or remove the diagnostic file entirely from the commit if
the JSONL feature does not require a passing build diagnostic.

In `@tests/test_log_aggregator_jsonl.py`:
- Line 58: Replace all four instances of tempfile.mktemp(suffix=".jsonl") with a
secure temporary file API. Instead of using mktemp(), use
NamedTemporaryFile(delete=False) to create a named temporary file with the
.jsonl suffix, then extract the file path from the returned object. This applies
to lines 58, 77, 96, and 112 in the test file. Ensure that you properly handle
the file object returned by NamedTemporaryFile and close it if needed before
using the path.

In `@tools/log_aggregator.py`:
- Around line 254-263: In the timestamp parsing logic within the try-except
block, when catching ValueError or TypeError exceptions, set both ts = None and
entry['timestamp'] = None to ensure the timestamp is properly normalized to None
instead of remaining as an invalid string. Additionally, change the conditional
check from if ts and isinstance(ts, (int, float)): to if ts is not None and
isinstance(ts, (int, float)): to properly handle valid epoch 0 timestamps, which
are falsy but should not be skipped.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 52b35326-daa8-4faf-b8ec-8e41ad73dd16

📥 Commits

Reviewing files that changed from the base of the PR and between 94e0fb0 and 3edf7c9.

📒 Files selected for processing (4)
  • diagnostic/build-338f0f7a.json
  • docs/OPERATIONS.md
  • tests/test_log_aggregator_jsonl.py
  • tools/log_aggregator.py

"generated_at": "2026-06-18T23:12:41.974269+00:00",
"commit": "338f0f7a",
"diagnostic_logd": null,
"diagnostic_logd_error": "encryptly binary not found (detected windows-x64; available: linux-arm64, linux-x64, macos-arm64, macos-x64, windows-arm64, windows-x64); cannot create diagnostic\\build-338f0f7a.logd",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

CI validation blocker: diagnostic_logd_error will cause PR to fail CI checks.

The diagnostic report includes diagnostic_logd_error (line 5) and message_blocker (line 6), indicating the encryptly preflight build failed due to a missing encryptly binary on windows-x64. However, the CI workflow (.github/workflows/diagnostic-build-log.yml lines 160–162) explicitly rejects any diagnostic JSON file with a non-null diagnostic_logd_error:

if diagnostic_logd_error:
    failures.append((json_path, f"Build script reported diagnostic_logd_error: {diagnostic_logd_error}"))
    continue

This will cause the PR's CI validation to fail, blocking merge. The diagnostic artifact requirements mandate that a valid encrypted .logd file be present in the commit. To proceed, either:

  1. Re-run python3 build.py on a platform where the encryptly binary is available (linux-x64, macos-x64, etc.), or
  2. Remove this diagnostic file from the PR commit if the JSONL feature itself does not require a passing build diagnostic.

Also applies to: 6-6

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@diagnostic/build-338f0f7a.json` at line 5, The diagnostic JSON file contains
a non-null diagnostic_logd_error field indicating the encryptly binary is
missing on windows-x64, which will trigger a CI validation failure in the
diagnostic-build-log.yml workflow that explicitly rejects files with this error.
To fix this, either re-run the python3 build.py script on a platform where the
encryptly binary is available (such as linux-x64 or macos-x64) to generate a
valid diagnostic file with the encrypted .logd file present, or remove the
diagnostic file entirely from the commit if the JSONL feature does not require a
passing build diagnostic.

Comment thread tests/test_log_aggregator_jsonl.py Outdated
Comment thread tools/log_aggregator.py
- Replace tempfile.mktemp() with mkstemp-based _temp_output_path helper
- Use 'is not None' instead of truthy check for timestamp (handles epoch 0)
- Normalize invalid string timestamps to None instead of leaving as string
- All 6 tests still passing
thanhle74 pushed a commit that referenced this pull request Jun 20, 2026
* Add benchmark rate limiter bypass flag

* Add benchmark diagnostic report

* Remove incomplete benchmark diagnostic report

* Add benchmark build diagnostics
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[$25 BOUNTY] [Python] feat: Add JSONL output to log_aggregator.py

2 participants