fix: normalize TAP judge scores to consistent 1-10 scale by RPaolino · Pull Request #365 · AISecurityLab/hackagent

Raffaele Paolino (RPaolino) · 2026-05-18T10:22:33Z

Summary

Fixes #347: the TAP attack's score scale was inconsistent, which caused premature early stopping and incorrect pruning.

Problem

Two issues in the TAP attack's scoring logic:

- Score normalization: Binary judges (HarmBench) return 0/1, but TAP's internal logic expects 1-10. With success_score_threshold defaulting to 1 in DEFAULT_TAP_CONFIG, any non-refusal response (score=1) immediately triggers early stopping, which makes the tree search pointless.
- finalize_all_goals threshold: The evaluation step's finalize_all_goals call didn't pass through the configured success_score_threshold, so it used a different default than the generation step.
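
The early-stopping failure can be reproduced in a few lines. This is a minimal sketch, not the project's code: `should_stop_early` is a hypothetical helper, and it assumes the check is a plain `>=` comparison against `success_score_threshold`.

```python
from typing import Iterable


def should_stop_early(scores: Iterable[int], success_score_threshold: int) -> bool:
    """Hypothetical check: stop the tree search once any score reaches the threshold."""
    return any(score >= success_score_threshold for score in scores)


# Raw scores from a binary judge: 1 means "non-refusal", 0 means "refusal".
first_level_scores = [0, 1, 0]

# With the old dict default of 1, the first non-refusal ends the search.
stops_with_old_default = should_stop_early(first_level_scores, success_score_threshold=1)

# With a threshold of 10, the same raw scores do not end the search.
stops_with_new_default = should_stop_early(first_level_scores, success_score_threshold=10)
```

Raising the threshold alone would make success unreachable for a binary judge, since its raw scores never exceed 1. That is why the fix also maps binary scores onto the 1-10 scale.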

Changes

- generation.py: Add _normalize_score() to map binary (0/1) scores to the 1-10 scale used by TAP's internal system prompt. All judge scores are normalized before being used for pruning, early stopping, and candidate selection.
- evaluation.py: Pass success_threshold to coordinator.finalize_all_goals() so the final success determination uses the same threshold as generation.
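
As an illustration of the normalization step, the sketch below maps a binary score to the ends of the 1-10 scale (0 becomes 1, 1 becomes 10). It is not the actual `_normalize_score()` from this PR: the function name, the `judge_is_binary` parameter, and the clamping of non-binary scores are all assumptions made for the example.

```python
def normalize_score(raw_score: float, judge_is_binary: bool) -> int:
    """Map a raw judge score onto the 1-10 scale (illustrative only)."""
    if judge_is_binary:
        # Binary judges: 0 (refusal) -> 1, 1 (jailbreak) -> 10.
        return 10 if raw_score >= 1 else 1
    # Assumption: scores already on the 1-10 scale are truncated to an
    # integer and clamped to the valid range.
    return max(1, min(10, int(raw_score)))


binary_refusal = normalize_score(0, judge_is_binary=True)
binary_jailbreak = normalize_score(1, judge_is_binary=True)
graded_score = normalize_score(7, judge_is_binary=False)
```

After this mapping, a single threshold can be applied to both judge families without special cases downstream.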

Problem
-------
TAP had two conflicting defaults for success_score_threshold:
- DEFAULT_TAP_CONFIG dict: 1 (designed for binary 0/1 judges)
- TapParams Pydantic model: 10 (designed for nuanced 1-10 judges)

Since the TAP JUDGE_SYSTEM_PROMPT scores on a 1-10 scale (where 1 is the minimum), any non-refusal response scored >= 1 and immediately triggered early stopping when the dict default (1) was used.

The pruning min_score was hardcoded to 1, which:
- On the 1-10 scale: kept every branch (no pruning effect)
- On the 0/1 scale: only kept already-jailbroken branches (too aggressive)

Solution
--------
- evaluation.py: Auto-detect binary judges (harmbench, harmbench_variant, jailbreakbench) via _judges_are_binary() with robust type inference from type, evaluator_type, and identifier fields. Normalize binary 0/1 scores to 1/10 in score_candidates() so all downstream logic (early stop, pruning, success check) works on a uniform 1-10 scale.
- config.py: Align DEFAULT_TAP_CONFIG dict default to 10, matching TapParams. Add min_judge_prune_score (default 3) to both dict and Pydantic model.
- generation.py: Read min_judge_prune_score from config (default 3) instead of hardcoded min_score=1. Align success_threshold fallback to 10.

Note: The default success_score_threshold is 10 (perfect jailbreak). Users can override via tap_params.success_score_threshold.
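
The detection and pruning steps described in this commit message could look roughly like the sketch below. It is written under stated assumptions and is not the merged implementation: judge configs are modelled as plain dicts, the three field names are matched exactly and case-insensitively, a judge list counts as binary only if every judge is binary, and `prune_candidates` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Judge types named in the commit message as returning 0/1 scores.
BINARY_JUDGE_TYPES = {"harmbench", "harmbench_variant", "jailbreakbench"}


def judge_kind(judge: Dict[str, Any]) -> str:
    """Return the first usable type hint among the three candidate fields."""
    for field in ("type", "evaluator_type", "identifier"):
        value = judge.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip().lower()
    return ""


def judges_are_binary(judges: List[Dict[str, Any]]) -> bool:
    """Assumption: the judge set is binary only if every judge is binary."""
    return bool(judges) and all(judge_kind(j) in BINARY_JUDGE_TYPES for j in judges)


def prune_candidates(
    candidates: List[Dict[str, Any]], min_judge_prune_score: int = 3
) -> List[Dict[str, Any]]:
    """Keep branches whose normalized (1-10) score reaches the prune threshold."""
    return [c for c in candidates if c["score"] >= min_judge_prune_score]


binary = judges_are_binary([{"type": "HarmBench"}, {"evaluator_type": "jailbreakbench"}])
mixed = judges_are_binary([{"type": "harmbench"}, {"identifier": "custom_judge"}])
kept = prune_candidates([{"prompt": "a", "score": 1}, {"prompt": "b", "score": 5}])
```

With a prune threshold of 3 on the normalized scale, a branch scored 1 is dropped and a branch scored 5 survives, which is the behaviour the hardcoded `min_score=1` could not provide.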

The coordinator's _default_goal_scorer uses success_threshold=0.5 by default, which is correct for a 0-1 scale but wrong for TAP's 1-10 normalized scale. A judge score of 1 (= 'no jailbreak') exceeds 0.5 and was incorrectly classified as SUCCESSFUL_JAILBREAK. TAP now passes success_score_threshold (default 10) so only scores >= 10 are classified as successful jailbreaks.
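
The misclassification comes down to comparing a 1-10 score against a 0-1 threshold. The snippet below is a simplified stand-in for the coordinator's scorer, not its real signature: `is_successful_jailbreak` is a hypothetical name and the `>=` comparison is an assumption.

```python
def is_successful_jailbreak(score: float, success_threshold: float = 0.5) -> bool:
    """Hypothetical scorer: a goal succeeds when its best score reaches the threshold."""
    return score >= success_threshold


# A normalized score of 1 means "no jailbreak", yet it clears the 0-1 default.
misclassified = is_successful_jailbreak(1)

# Passing TAP's threshold keeps the comparison on the 1-10 scale.
rejected = is_successful_jailbreak(1, success_threshold=10)
accepted = is_successful_jailbreak(10, success_threshold=10)
```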

codecov · 2026-05-18T10:37:16Z

Codecov Report

❌ Patch coverage is 25.80645% with 23 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
hackagent/attacks/techniques/tap/evaluation.py	19.23%	21 Missing ⚠️
hackagent/attacks/techniques/tap/generation.py	0.00%	2 Missing ⚠️


Raffaele Paolino (RPaolino) added 2 commits May 18, 2026 12:17

Nicola Franco (franconicola) requested a review from Marco Russo (marcorusso97) May 20, 2026 16:39

Marco Russo (marcorusso97) approved these changes May 22, 2026


Marco Russo (marcorusso97) merged commit 420fcc8 into main May 22, 2026
21 of 22 checks passed

Marco Russo (marcorusso97) deleted the fix/tap-score-scale-inconsistency branch May 22, 2026 07:53
