
✨ feat: use remote endpoints as default when using remote mode #424

Merged
Nicola Franco (franconicola) merged 3 commits into main from 421-use-remote-models-as-default-if-using-remote-mode
Jun 10, 2026

Conversation

@marcorusso97

Contributor

Summary

This PR introduces mode-aware role defaults for attack configurations in the orchestrator.

When HackAgent is initialized in remote mode, attack role defaults are now resolved from a remote profile.
When running in local mode, role defaults are resolved from a local profile.

The spreadsheet is still descriptive only and is not used at runtime.

What changed

1. Added centralized role mapping

A single attack-to-role map now defines which attack roles receive defaults. It is implemented as an additional field in the existing _ATTACK_MODEL_ROLE_PATHS that specifies, for each attack, whether each role is performed by the attacker model or by the judge model. For instance, in AutoDAN-Turbo, "attacker" and "summarizer" are both performed by the attacker model.

This replaces the previous naming that implied remote-only behavior.
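To illustrate the idea of mapping each attack role to the model that performs it, here is a minimal sketch. The dictionary layout, the name `ATTACK_ROLE_MODELS`, the attack key `autodan_turbo`, and the helper `roles_performed_by` are assumptions made for illustration; the actual structure of _ATTACK_MODEL_ROLE_PATHS is not shown in this description and may differ.

```python
from __future__ import annotations

# Hypothetical layout: attack name -> role name -> model that performs the role.
# Only the AutoDAN-Turbo example from the description is included here.
ATTACK_ROLE_MODELS: dict[str, dict[str, str]] = {
    "autodan_turbo": {
        "attacker": "attacker",
        "summarizer": "attacker",
    },
}


def roles_performed_by(attack: str, model: str) -> list[str]:
    """Return the roles of `attack` assigned to `model`, sorted by name."""
    roles = ATTACK_ROLE_MODELS.get(attack, {})
    return sorted(role for role, assigned in roles.items() if assigned == model)


attacker_roles = roles_performed_by("autodan_turbo", "attacker")
```

With a map of this shape, a default only needs to be defined once per model (attacker or judge) and can then be reused for every role that model performs.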

2. Added mode detection from backend context

The orchestrator now determines mode by checking whether a backend API key is available:

  • If API key exists and is non-empty -> remote mode
  • Otherwise -> local mode
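The rule above can be sketched as a small function. The name `detect_mode` and the string return values are assumptions for illustration; the description does not show how the orchestrator reads the key from the backend context or how it represents the mode.

```python
from typing import Optional


def detect_mode(api_key: Optional[str]) -> str:
    """Return "remote" when a non-empty API key is available, else "local"."""
    # Both None and the empty string are falsy, so both fall back to local mode.
    return "remote" if api_key else "local"
```

Whether a whitespace-only key should count as empty is not stated in the description; this sketch treats it as non-empty.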

3. Added explicit remote role defaults

Introduced _remote_role_defaults(api_key), with:

  • attacker:
    • endpoint: https://api.hackagent.dev/v1
    • agent_type: OPENAI_SDK
    • identifier: hackagent-attacker
    • api_key: backend key (fallback if not overridden)
  • judge:
    • endpoint: https://api.hackagent.dev/v1
    • agent_type: OPENAI_SDK
    • identifier: hackagent-judge
    • type: harmbench_variant
    • api_key: backend key (fallback if not overridden)
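As a sketch only, the listed values could be returned as a plain dictionary like the one below. The function name `remote_role_defaults`, the nested dict shape, and the use of a plain string for `agent_type` are assumptions; the real _remote_role_defaults(api_key) may use enums or a different structure.

```python
from typing import Any, Dict

REMOTE_ENDPOINT = "https://api.hackagent.dev/v1"


def remote_role_defaults(api_key: str) -> Dict[str, Dict[str, Any]]:
    """Build default attacker and judge settings for remote mode."""
    return {
        "attacker": {
            "endpoint": REMOTE_ENDPOINT,
            "agent_type": "OPENAI_SDK",  # assumed to be a string here
            "identifier": "hackagent-attacker",
            "api_key": api_key,  # fallback, used only if not overridden
        },
        "judge": {
            "endpoint": REMOTE_ENDPOINT,
            "agent_type": "OPENAI_SDK",
            "identifier": "hackagent-judge",
            "type": "harmbench_variant",
            "api_key": api_key,
        },
    }


remote_defaults = remote_role_defaults("example-key")
```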

4. Added explicit local role defaults

Introduced _local_role_defaults(), aligned with the updated attack_roles sheet.

5. Added mode-based config normalization before execution

Introduced _apply_mode_based_role_defaults(attack_config), called at the beginning of execute().

Behavior:

  • Applies role defaults based on detected mode
  • Fills only missing keys
  • Preserves explicit user overrides
  • Keeps judge and judges structures consistent
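The "fill only missing keys" and "preserve explicit overrides" behavior can be illustrated with a recursive merge. This is a sketch under assumptions: `fill_missing` is a hypothetical helper, not the body of _apply_mode_based_role_defaults, and it does not cover how the judge and judges structures are kept consistent, which the description does not detail.

```python
import copy
from typing import Any, Dict


def fill_missing(config: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with keys from `defaults` added where absent.

    Keys already present in `config` are never overwritten, even when their
    value is None; nested dictionaries are merged recursively.
    """
    merged = copy.deepcopy(config)
    for key, default_value in defaults.items():
        if key not in merged:
            merged[key] = copy.deepcopy(default_value)
        elif isinstance(merged[key], dict) and isinstance(default_value, dict):
            merged[key] = fill_missing(merged[key], default_value)
    return merged


# The user overrides only the attacker identifier; everything else is filled in.
user_config = {"attacker": {"identifier": "my-attacker"}}
role_defaults = {
    "attacker": {
        "identifier": "hackagent-attacker",
        "endpoint": "https://api.hackagent.dev/v1",
    },
    "judge": {"identifier": "hackagent-judge", "type": "harmbench_variant"},
}
normalized = fill_missing(user_config, role_defaults)
```

Returning a merged copy rather than mutating the input keeps the caller's original configuration intact, which makes the override-preservation property easy to check in unit tests.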

6. Fixed evaluator warning for auto-injected judges

By ensuring a default judge type is present in injected role defaults, we avoid warnings like:

  • Unknown or missing judge type for: {...}

Tests

Added and updated orchestrator unit tests to cover:

  • Remote mode default injection
  • Local mode default injection
  • Preservation of explicit overrides
  • Filling of missing role fields

Impact

  • No changes required for users providing full explicit attack role config
  • Improved default behavior for users relying on implicit role defaults
  • Consistent and predictable behavior between remote and local mode

@marcorusso97 Marco Russo (marcorusso97) linked an issue Jun 9, 2026 that may be closed by this pull request
@codecov

codecov Bot commented Jun 10, 2026


Codecov Report

❌ Patch coverage is 93.93939% with 4 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
hackagent/attacks/orchestrator.py | 93.93% | 4 Missing ⚠️


Member


LGTM

@franconicola Nicola Franco (franconicola) merged commit dd38f0e into main Jun 10, 2026
43 of 44 checks passed
@franconicola Nicola Franco (franconicola) deleted the 421-use-remote-models-as-default-if-using-remote-mode branch June 10, 2026 17:32


Development

Successfully merging this pull request may close these issues.

Use remote models as default if using remote mode

2 participants