✨ feat: allow different judge models for same judge type and show stats in dashboard #420

Merged
Nicola Franco (franconicola) merged 3 commits into main from 414-allow-usage-of-multiple-judges-of-same-type on Jun 9, 2026

Marco Russo (marcorusso97) commented Jun 4, 2026

Contributor

Summary

This PR introduces full support for running multiple judges of the same type with different models, and ensures their outputs are correctly tracked, aggregated, and rendered across the dashboard.

It also fixes consistency issues between summary panels and expanded detail views, so judge counts, names, metrics, and verdicts stay aligned.

Why

When two or more judges shared the same type, judge vote keys could collide and overwrite each other.
This caused missing judges, incorrect counts, incomplete strictness/ASR values, and absent verdict blocks in detail cards.

What Changed

Multi-judge key stability

  • Added deterministic suffixing for duplicate judge types.
  • Preserved distinct per-judge votes using canonical keys such as:
    • eval_hb
    • eval_hbv_1
    • eval_hbv_2
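The suffixing rule can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the function name `assign_judge_keys`, the short type names `hb` and `hbv`, and the rule that a type used only once keeps its unsuffixed key are all assumptions inferred from the example keys above.

```python
from collections import Counter


def assign_judge_keys(judge_types):
    """Return one stable vote key per configured judge, in input order.

    Hypothetical helper: a judge type that appears once keeps the plain
    key "eval_<type>"; a type that appears several times gets a 1-based
    suffix in configuration order, so keys never collide.
    """
    totals = Counter(judge_types)  # how often each type is configured
    seen = Counter()               # running index per duplicated type
    keys = []
    for judge_type in judge_types:
        base = f"eval_{judge_type}"
        if totals[judge_type] == 1:
            keys.append(base)
        else:
            seen[judge_type] += 1
            keys.append(f"{base}_{seen[judge_type]}")
    return keys


keys = assign_judge_keys(["hb", "hbv", "hbv"])
print(keys)
```

Because the suffix depends only on the order of the configured judges, repeated calls with the same configuration yield the same keys, which is what makes them usable as stable lookup keys downstream.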

Evaluation and metrics pipeline

  • Updated evaluation handling to avoid overwriting votes from repeated judge types.
  • Improved aggregation and sync logic to preserve per-judge outputs end to end.
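As a rough sketch of what "no overwriting" and "per-judge aggregation" could look like, the snippet below records votes under canonical keys and computes a per-judge success rate. The names `record_vote` and `per_judge_asr`, the 0/1 vote encoding, and the definition of ASR as the fraction of goals a judge marks as successful are assumptions for illustration, not the project's API.

```python
def record_vote(votes, key, verdict):
    """Store a verdict under its canonical key, refusing to overwrite."""
    if key in votes:
        # A repeated key would silently drop a judge, so fail loudly instead.
        raise KeyError(f"duplicate judge key: {key}")
    votes[key] = verdict


def per_judge_asr(rows):
    """Aggregate 0/1 votes per judge key across goals.

    rows: list of dicts, one per goal, mapping judge key -> 0 or 1.
    Returns a dict mapping judge key -> fraction of goals voted 1.
    """
    totals, hits = {}, {}
    for row in rows:
        for key, vote in row.items():
            totals[key] = totals.get(key, 0) + 1
            hits[key] = hits.get(key, 0) + int(vote)
    return {key: hits[key] / totals[key] for key in totals}


goal_a, goal_b = {}, {}
record_vote(goal_a, "eval_hbv_1", 1)
record_vote(goal_a, "eval_hbv_2", 0)
record_vote(goal_b, "eval_hbv_1", 1)
record_vote(goal_b, "eval_hbv_2", 1)
asr = per_judge_asr([goal_a, goal_b])
```

Keeping one entry per canonical key through aggregation means two judges of the same type report separate rates instead of being merged into one.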

Dashboard enrichment and rendering

  • Standardized propagation of:
    • judge votes
    • judge metadata (name/type)
    • per-goal multi-judge metrics
  • Added robust fallbacks for legacy runs and sparse trace payloads.
  • Unified multi-judge verdict styling and behavior across attack cards.
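One way to picture the metadata propagation with legacy fallbacks is shown below. It is a sketch under stated assumptions: the function `build_judge_rows`, the row fields, and the metadata shape (`{"name": ..., "type": ...}` per judge key) are hypothetical and chosen only to illustrate the fallback order.

```python
def build_judge_rows(votes, metadata=None):
    """Build display rows for a verdict table from per-judge votes.

    votes:    dict mapping judge key -> vote
    metadata: optional dict mapping judge key -> {"name": ..., "type": ...};
              legacy runs or sparse traces may omit it entirely.
    """
    metadata = metadata or {}
    rows = []
    for idx, key in enumerate(sorted(votes), start=1):
        info = metadata.get(key) or {}
        rows.append({
            "id": idx,
            "key": key,
            # Fall back from model name to judge type to the raw key.
            "model": info.get("name") or info.get("type") or key,
            "vote": votes[key],
        })
    return rows


rows = build_judge_rows(
    {"eval_hbv_2": 0, "eval_hbv_1": 1},
    {"eval_hbv_1": {"name": "model-a", "type": "hbv"}},
)
```

With this ordering, a run that carries no judge metadata still renders one row per vote, labeled by its key, rather than dropping the verdict block.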

Attack card updates

  • Improved verdict rendering in:
    • AdvPrefix
    • PAP
    • Baseline
    • BoN
    • Generic card paths used by FlipAttack, CipherChat, and H4RM3L
  • These rendering changes do not apply to scorer-based attacks (AutoDan-Turbo, PAIR, TAP).
  • Ensured verdicts also appear in mitigated scenarios when judge votes are available.

Tests

  • Updated unit tests for:
    • evaluation step
    • sync behavior
    • metrics behavior
  • Added coverage for repeated judge-type scenarios and key-collision prevention.

Impact

  • Backward compatible with existing runs.
  • Eliminates duplicate-judge key collisions.
  • Improves reliability and transparency of multi-judge analytics in the dashboard.

Marco Russo (marcorusso97) commented Jun 5, 2026

Contributor Author

Add an ID column and replace the Judge column with a Model column that contains the actual model name.
Then maximize code reuse.

if val is not None:
    try:
        best_score = max(best_score, float(val))
    except (TypeError, ValueError):
        pass  # skip values that cannot be converted to a float
Nicola Franco (franconicola) merged commit 7c7e627 into main on Jun 9, 2026
23 of 24 checks passed
Nicola Franco (franconicola) deleted the 414-allow-usage-of-multiple-judges-of-same-type branch on June 9, 2026 at 15:34

Development

Successfully merging this pull request may close these issues.

Allow usage of multiple judges of same type

2 participants