
feat: argus gcwhy — narrate the worst recent GC pause#155

Merged
rlaope merged 1 commit into master from feat/argus-gcwhy on Apr 17, 2026


@rlaope rlaope commented Apr 17, 2026

Closes #151.

Summary

argus gcwhy <gc-log-file> [--last=5m] picks the single worst pause in the lookback window and emits up to three plain-English "why" bullets plus a related-counters block.

Sample output:

┌─ GC Why ────────────────────────────────────────────────────┐
│  Worst pause: 912 ms at t=14:02:11  (Mixed / G1 Evacuation) │
│                                                             │
│  Why this happened:                                         │
│   • Allocation rate surged roughly 6.8x the recent baseline │
│     in the events leading up to this pause — a burst of     │
│     allocation pressure.                                    │
│   • Heap was 94% occupied at pause start — very little      │
│     headroom left for new allocation.                       │
│   • Pause was 4.1x the recent average (912 ms vs 220 ms) —  │
│     this is an outlier.                                     │
│                                                             │
│  Related counters:                                          │
│    pause-ms:              912                               │
│    type:                  Mixed                             │
│    cause:                 G1 Evacuation Pause               │
│    heap-used-pct:         94.0                              │
│    prior-avg-pause-ms:    220                               │
└─────────────────────────────────────────────────────────────┘
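The selection step described in the summary (keep only events inside the lookback window, then take the longest pause) could be sketched roughly as follows. This is an illustration, not the code in GcWhyAnalyzer: the GcEvent record is a hypothetical stand-in for whatever GcLogParser actually returns, and measuring the window back from the latest event in the log (rather than from wall-clock time) is an assumption.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class WorstPauseSketch {
    // Hypothetical event shape; the real parser's event type may differ.
    public record GcEvent(double timestampSec, double pauseMs, String type) {}

    // Returns the longest pause among events within windowSec of the latest event.
    // Assumption: the window is anchored at the newest timestamp in the log.
    public static Optional<GcEvent> worstInWindow(List<GcEvent> events, double windowSec) {
        if (events.isEmpty()) {
            return Optional.empty();
        }
        double latest = events.stream()
                .mapToDouble(GcEvent::timestampSec)
                .max()
                .getAsDouble();
        double cutoff = latest - windowSec;
        return events.stream()
                .filter(e -> e.timestampSec() >= cutoff)
                .max(Comparator.comparingDouble(GcEvent::pauseMs));
    }

    public static void main(String[] args) {
        List<GcEvent> events = List.of(
                new GcEvent(10.0, 1500.0, "Full"),   // longer, but outside a 300 s window
                new GcEvent(400.0, 220.0, "Young"),
                new GcEvent(500.0, 912.0, "Mixed"),
                new GcEvent(600.0, 180.0, "Young"));
        System.out.println(worstInWindow(events, 300.0));
    }
}
```

With these made-up events, the 1500 ms Full GC at t=10 s is ignored because it falls outside the 300 s window, so the 912 ms Mixed pause is selected.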

Rule engine (ordered)

  1. Explicit System.gc() trigger
  2. G1 humongous allocation
  3. Metaspace pressure (Metadata GC Threshold)
  4. Full GC fallback (concurrent-mode / evacuation failure)
  5. Allocation burst vs recent baseline (heap-before jump)
  6. High heap occupancy at pause start (≥ 90%)
  7. Outlier pause vs recent average (≥ 3× prior avg)
  8. Fallback neutral one-liner when nothing fires

Emits at most three bullets: the first three rules that fire, in the order listed above. The fallback (rule 8) guarantees at least one bullet.
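A minimal sketch of how an ordered rule list with a three-bullet cap and a fallback might look. It covers only a simplified subset of the rules above (explicit System.gc(), Full GC, occupancy ≥ 90%, pause ≥ 3× prior average). The Pause record, the Rule interface, the string matching on cause and type, and the bullet wording are all assumptions for illustration and are not taken from GcWhyAnalyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class RuleEngineSketch {
    // Hypothetical snapshot of the target pause; field names are illustrative.
    public record Pause(double pauseMs, String type, String cause,
                        double heapUsedPct, double priorAvgPauseMs) {}

    public interface Rule {
        Optional<String> check(Pause p);
    }

    // Small helper: produce a bullet only when the condition holds.
    static Optional<String> when(boolean fired, String bullet) {
        return fired ? Optional.of(bullet) : Optional.empty();
    }

    // Rules are evaluated in list order; earlier entries take priority.
    static final List<Rule> RULES = List.of(
            p -> when(p.cause().contains("System.gc()"),
                    "Explicit System.gc() call triggered this collection."),
            p -> when(p.type().equals("Full"),
                    "Collector fell back to a Full GC."),
            p -> when(p.heapUsedPct() >= 90.0,
                    String.format(Locale.ROOT,
                            "Heap was %.0f%% occupied at pause start.", p.heapUsedPct())),
            p -> when(p.priorAvgPauseMs() > 0 && p.pauseMs() >= 3.0 * p.priorAvgPauseMs(),
                    String.format(Locale.ROOT,
                            "Pause was %.1fx the recent average (%.0f ms vs %.0f ms).",
                            p.pauseMs() / p.priorAvgPauseMs(), p.pauseMs(), p.priorAvgPauseMs())));

    // Collects bullets from the first three rules that fire; never returns an empty list.
    public static List<String> explain(Pause p) {
        List<String> bullets = new ArrayList<>();
        for (Rule rule : RULES) {
            if (bullets.size() == 3) {
                break;
            }
            rule.check(p).ifPresent(bullets::add);
        }
        if (bullets.isEmpty()) {
            bullets.add("No anomalous signal detected for this pause.");
        }
        return bullets;
    }

    public static void main(String[] args) {
        Pause sample = new Pause(912, "Mixed", "G1 Evacuation Pause", 94.0, 220);
        explain(sample).forEach(b -> System.out.println(" - " + b));
    }
}
```

Keeping the rules in a plain ordered list makes the priority explicit and lets the cap and the fallback live in one place instead of being repeated inside each rule.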

Scope

  • Log-based: reuses GcLogParser and keeps only events inside the --last window
  • --format=json for automation
  • --last accepts Ns, Nm, Nh or a bare number in seconds; default 5m
  • i18n (en, ko, ja, zh), shell completions (bash, zsh, fish, ps1)
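The --last grammar listed above (Ns, Nm, Nh, or a bare number of seconds, defaulting to 5m) could be parsed along these lines. The class and method names are hypothetical, and the error handling and case-insensitive suffix matching are assumptions rather than documented behaviour of the command.

```java
import java.time.Duration;
import java.util.Locale;

public class LastWindowSketch {
    // Parses values such as "30s", "5m", "2h", or "90" (bare seconds).
    // A null or blank value falls back to the documented default of 5 minutes.
    public static Duration parseLast(String raw) {
        if (raw == null || raw.isBlank()) {
            return Duration.ofMinutes(5);
        }
        String s = raw.trim().toLowerCase(Locale.ROOT);
        char unit = s.charAt(s.length() - 1);
        if (Character.isDigit(unit)) {
            // No suffix: interpret the whole value as seconds.
            return Duration.ofSeconds(Long.parseLong(s));
        }
        long n = Long.parseLong(s.substring(0, s.length() - 1));
        return switch (unit) {
            case 's' -> Duration.ofSeconds(n);
            case 'm' -> Duration.ofMinutes(n);
            case 'h' -> Duration.ofHours(n);
            default -> throw new IllegalArgumentException("Unsupported --last value: " + raw);
        };
    }

    public static void main(String[] args) {
        System.out.println(parseLast("5m").toSeconds());
        System.out.println(parseLast("90").toSeconds());
    }
}
```

Malformed numbers surface as NumberFormatException, which is a subclass of IllegalArgumentException, so a caller can handle both failure modes with a single catch.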

Non-goals in this PR

  • Live PID / target mode — deferred to a follow-up; requires per-event GC histograms over /prometheus that are not yet exposed.

References (design lineage)

  • Censum-style reasoning ("why" not just "what")
  • JDK Mission Control GC log narrative

Test plan

  • ./gradlew :argus-cli:test --tests io.argus.cli.gcwhy.* — passes
  • ./gradlew :argus-cli:test full suite — passes
  • Unit tests cover: System.gc, humongous, allocation burst, high occupancy, Full GC fallback, empty events, window filter picking the right in-window event

New command: argus gcwhy <gc-log-file> [--last=5m] [--format=json].
Picks the single worst pause inside the lookback window and emits up to
three natural-language "why" bullets plus a related-counters block.

Rule engine applies ordered checks over the target event and preceding
events in the window: explicit System.gc(), G1 humongous allocation,
Metaspace pressure, Full GC fallback (concurrent-mode / evacuation
failure), allocation burst vs recent baseline, high heap occupancy,
and outlier-pause detection. Falls back to a neutral one-line summary
when nothing anomalous is detected.

Log-based in v1, matching gcscore. Live PID/target mode is deferred
to a follow-up once the server exposes per-event histograms over /prometheus.

New files:
  - argus-cli/src/main/java/io/argus/cli/gcwhy/GcWhyResult.java
  - argus-cli/src/main/java/io/argus/cli/gcwhy/GcWhyAnalyzer.java
  - argus-cli/src/main/java/io/argus/cli/command/GcWhyCommand.java
  - argus-cli/src/test/java/io/argus/cli/gcwhy/GcWhyAnalyzerTest.java

Updated:
  - ArgusCli command registration
  - i18n (en, ko, ja, zh) cmd.gcwhy.desc / header.gcwhy / desc.gcwhy
  - Shell completions (bash, zsh, fish, ps1)

Closes #151.

Signed-off-by: rlaope <piyrw9754@gmail.com>
@rlaope rlaope merged commit b14b2f2 into master Apr 17, 2026
7 checks passed
@rlaope rlaope deleted the feat/argus-gcwhy branch April 17, 2026 14:24
Linked issue

feat: argus gcwhy — explain why the last GC spike happened