✨ feat: allow different judge models for same judge type and show stats in dashboard by marcorusso97 · Pull Request #420 · AISecurityLab/hackagent

Marco Russo (marcorusso97) · 2026-06-04T14:01:06Z

Summary

This PR introduces full support for running multiple judges of the same type with different models, and ensures their outputs are correctly tracked, aggregated, and rendered across the dashboard.

It also fixes consistency issues between summary panels and expanded detail views, so judge counts, names, metrics, and verdicts stay aligned.

Why

When two or more judges shared the same type, judge vote keys could collide and overwrite each other.
This caused missing judges, incorrect counts, incomplete strictness/ASR values, and absent verdict blocks in detail cards.
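
To make the failure mode concrete, here is a minimal sketch of how keying votes by judge type alone loses data. The `eval_<type>` key format is inferred from the example keys in this description; the judge records, model names, and the `collect_votes_by_type` helper are hypothetical and are not taken from the hackagent codebase.

```python
# Hypothetical judge configs: two judges share the type "hbv" but use different models.
judges = [
    {"type": "hb", "model": "model-a"},
    {"type": "hbv", "model": "model-b"},
    {"type": "hbv", "model": "model-c"},
]
# Hypothetical votes, one per judge, in the same order (1 = attack judged successful).
votes = [1, 0, 1]


def collect_votes_by_type(judges, votes):
    """Key votes by judge type only, so a later judge overwrites an earlier one."""
    result = {}
    for judge, vote in zip(judges, votes):
        result[f"eval_{judge['type']}"] = vote
    return result


collided = collect_votes_by_type(judges, votes)
print(collided)  # {'eval_hb': 1, 'eval_hbv': 1}
```

Three judges voted, but only two entries survive, and the vote from `model-b` is gone. Any count or rate computed from this mapping would be off.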

What Changed

Multi-judge key stability

Added deterministic suffixing for duplicate judge types.
Preserved distinct per-judge votes using canonical keys such as:
- eval_hb
- eval_hbv_1
- eval_hbv_2
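
One way such suffixing could work is sketched below. The rule used here (a type that appears once keeps its bare key, repeated types get `_1`, `_2`, ... in configuration order) is an assumption inferred from the example keys above; `assign_judge_keys` is a hypothetical helper, not the function added in this PR.

```python
from collections import Counter


def assign_judge_keys(judge_types):
    """Return one key per judge, adding a numeric suffix only to repeated types."""
    totals = Counter(judge_types)  # how often each type occurs overall
    seen = Counter()               # how many of each repeated type we have emitted
    keys = []
    for judge_type in judge_types:
        if totals[judge_type] == 1:
            keys.append(f"eval_{judge_type}")
        else:
            seen[judge_type] += 1
            keys.append(f"eval_{judge_type}_{seen[judge_type]}")
    return keys


keys = assign_judge_keys(["hb", "hbv", "hbv"])
print(keys)  # ['eval_hb', 'eval_hbv_1', 'eval_hbv_2']
```

Because the result depends only on the order of the configured judges, the same configuration always yields the same keys, which is what lets stored votes be matched back to their judge later.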

Evaluation and metrics pipeline

Updated evaluation handling to avoid overwriting votes from repeated judge types.
Improved aggregation and sync logic to preserve per-judge outputs end to end.
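
As a rough illustration of per-judge aggregation once votes are kept under distinct keys, consider the sketch below. The row layout, the success-rate definition (share of goals with vote 1), and the strict-majority rule are all assumptions for illustration; this description does not specify how ASR or strictness are actually computed.

```python
def per_judge_success_rate(rows, judge_keys):
    """Fraction of goals each judge marked successful (vote == 1), or None if no votes."""
    rates = {}
    for key in judge_keys:
        votes = [row[key] for row in rows if row.get(key) is not None]
        rates[key] = sum(votes) / len(votes) if votes else None
    return rates


def majority_verdict(row, judge_keys):
    """True when strictly more than half of the available votes are 1."""
    votes = [row[key] for key in judge_keys if row.get(key) is not None]
    return bool(votes) and sum(votes) * 2 > len(votes)


# Hypothetical per-goal rows with one column per judge key.
rows = [
    {"goal": "g1", "eval_hb": 1, "eval_hbv_1": 0, "eval_hbv_2": 1},
    {"goal": "g2", "eval_hb": 0, "eval_hbv_1": 0, "eval_hbv_2": 1},
]
judge_keys = ["eval_hb", "eval_hbv_1", "eval_hbv_2"]

rates = per_judge_success_rate(rows, judge_keys)
verdicts = [majority_verdict(row, judge_keys) for row in rows]
print(rates)     # {'eval_hb': 0.5, 'eval_hbv_1': 0.0, 'eval_hbv_2': 1.0}
print(verdicts)  # [True, False]
```

With colliding keys, the two `hbv` judges would have shared one column and their very different rates (0.0 and 1.0) could not be told apart.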

Dashboard enrichment and rendering

Standardized propagation of:
- judge votes
- judge metadata (name/type)
- per-goal multi-judge metrics
Added robust fallbacks for legacy runs and sparse trace payloads.
Unified multi-judge verdict styling and behavior across attack cards.
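
A fallback chain of this kind might look like the following sketch. The payload shapes are hypothetical: a `judge_votes` mapping for newer runs, flat `eval_*` fields for legacy runs, and a separate metadata mapping for judge names and types. None of these field names are confirmed by this PR.

```python
def extract_judge_votes(payload):
    """Return {judge_key: vote} from a result payload, tolerating missing fields."""
    # Preferred shape (assumed): an explicit mapping of judge key to vote.
    votes = payload.get("judge_votes")
    if isinstance(votes, dict) and votes:
        return dict(votes)
    # Legacy fallback (assumed): flat "eval_*" fields stored directly on the payload.
    return {
        key: value
        for key, value in payload.items()
        if isinstance(key, str) and key.startswith("eval_") and value is not None
    }


def judge_label(key, metadata):
    """Pick a display label: judge name, then judge type, then the raw key."""
    info = (metadata or {}).get(key) or {}
    return info.get("name") or info.get("type") or key


new_run = {"judge_votes": {"eval_hbv_1": 1, "eval_hbv_2": 0}}
legacy_run = {"goal": "g1", "eval_hb": 1, "eval_jb": None}
sparse_run = {"goal": "g2"}

print(extract_judge_votes(new_run))     # {'eval_hbv_1': 1, 'eval_hbv_2': 0}
print(extract_judge_votes(legacy_run))  # {'eval_hb': 1}
print(extract_judge_votes(sparse_run))  # {}
```

Returning an empty mapping for sparse payloads lets the rendering code decide whether to hide the verdict block, rather than failing on a missing field.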

Attack card updates

Improved verdict rendering in:
- AdvPrefix
- PAP
- Baseline
- BoN
- Generic card paths used by FlipAttack, CipherChat, and H4RM3L
These changes do not apply to scorer-based attacks (AutoDan-Turbo, PAIR, TAP).
Ensured verdicts also appear in mitigated scenarios when judge votes are available.

Tests

Updated unit tests for:
- evaluation step
- sync behavior
- metrics behavior
Added coverage for repeated judge-type scenarios and key-collision prevention.

Impact

Backward compatible for existing runs.
Eliminates duplicate-judge key collisions.
Improves reliability and transparency of multi-judge analytics in the dashboard.


codecov · 2026-06-04T14:12:09Z

Codecov Report

❌ Patch coverage is 22.83951% with 625 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
hackagent/server/dashboard/_page.py	7.79%	355 Missing ⚠️
hackagent/attacks/evaluator/inline_step_judge.py	18.51%	66 Missing ⚠️
hackagent/server/dashboard/attack_cards/_shared.py	6.15%	61 Missing ⚠️
...kagent/server/dashboard/attack_cards/_advprefix.py	0.00%	40 Missing ⚠️
...ckagent/server/dashboard/attack_cards/_baseline.py	0.00%	30 Missing ⚠️
hackagent/attacks/evaluator/evaluation_step.py	81.95%	24 Missing ⚠️
...ackagent/server/dashboard/attack_cards/_generic.py	0.00%	24 Missing ⚠️
hackagent/server/dashboard/attack_cards/_bon.py	0.00%	11 Missing ⚠️
hackagent/server/dashboard/attack_cards/_pap.py	0.00%	7 Missing ⚠️
hackagent/attacks/evaluator/metrics.py	88.23%	2 Missing ⚠️
... and 3 more


Marco Russo (marcorusso97) · 2026-06-05T14:15:47Z

Add an ID column and replace the Judge column with a Model column, which will contain the actual model name.
Then maximize code reuse.

✨ feat: allow different judge models for same judge type and show stats in dashboard

96bd08b

Marco Russo (marcorusso97) requested a review from Raffaele Paolino (RPaolino) June 4, 2026 14:01

Marco Russo (marcorusso97) linked an issue Jun 4, 2026 that may be closed by this pull request

Allow usage of multiple judges of same type #414

Closed

🐛 fix: improved multi judge tables on dashboard and fixed recent runs bug

6766a5c

github-code-quality Bot found potential problems Jun 9, 2026


Comment thread hackagent/attacks/evaluator/inline_step_judge.py

    if val is not None:
        try:
            best_score = max(best_score, float(val))
        except (TypeError, ValueError):
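
The flagged excerpt stops at the `except` clause, so its handler body is not visible here. For context, a self-contained sketch of the same pattern follows. The surrounding loop, the `best_numeric_score` name, the default of `0.0`, and the choice to skip unparseable values are all assumptions, not the code in `inline_step_judge.py`.

```python
def best_numeric_score(values, default=0.0):
    """Return the largest value that parses as a float, ignoring everything else."""
    best_score = default
    for val in values:
        if val is not None:
            try:
                best_score = max(best_score, float(val))
            except (TypeError, ValueError):
                # Assumed handling: skip values that cannot be converted to float.
                continue
    return best_score


score = best_numeric_score([None, "0.4", 1, "n/a", [2]])
print(score)  # 1.0
```

Here `"n/a"` raises `ValueError` and the list raises `TypeError`; both are skipped, so the result comes only from the numeric entries.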

✅ test: added unit tests

2a5eb26

Nicola Franco (franconicola) merged commit 7c7e627 into main Jun 9, 2026
23 of 24 checks passed

Nicola Franco (franconicola) deleted the 414-allow-usage-of-multiple-judges-of-same-type branch June 9, 2026 15:34

Nicola Franco (franconicola) merged 3 commits into main from 414-allow-usage-of-multiple-judges-of-same-type

2 participants
