ALSO: Adversarial Online Strategy Optimization for Social Agents

ICML 2026

Xiang Li¹, Liping Yi¹, Mingze Kong², Ming Zhang³, Zhongxiang Dai², Qinghua Hu¹

¹ Tianjin University    ² The Chinese University of Hong Kong, Shenzhen    ³ East China Normal University

[Figure: ALSO framework overview]

[Project Page] [Paper] [arXiv] [Code]

News

  • May 2026: ALSO is accepted to ICML 2026.

Overview

ALSO studies online strategy optimization for LLM-based social agents in multi-turn social simulation. In environments such as Sotopia, agents face evolving dialogue contexts and non-stationary opponents, so a static persona or fixed behavioral instruction can lead to repeated deadlocks and poor goal completion.

ALSO formulates turn-level strategy adaptation as an adversarial bandit problem. At each dialogue turn, the system selects a persona-strategy arm, injects the selected social strategy into the agent prompt, observes reward feedback from the interaction, and updates a lightweight neural surrogate for sample-efficient online adaptation. No model weights are fine-tuned.
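
As a rough illustration of the turn-level loop described above (select an arm, inject its strategy, observe a reward, update), the sketch below uses plain EXP3, a standard adversarial bandit algorithm. It is not the repository's implementation: ALSO additionally uses a neural surrogate, and the strategy texts, the `inject_strategy` helper, and the simulated reward are hypothetical stand-ins.

```python
import math
import random


class Exp3:
    """Plain EXP3 for rewards in [0, 1]; a stand-in, not ALSO's actual bandit."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n_arms = n_arms
        self.gamma = gamma  # exploration rate in (0, 1]
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n_arms
                for w in self.weights]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted estimate: only the played arm is updated.
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale by the largest weight; probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


# Hypothetical strategy texts; the repository ships its own strategy pools.
STRATEGIES = [
    "Build rapport before making requests.",
    "Propose a concrete compromise.",
    "Hold firm on the core goal.",
]


def inject_strategy(base_prompt, strategy):
    """Hypothetical helper: append the chosen strategy to the agent prompt."""
    return f"{base_prompt}\n\nStrategy for this turn: {strategy}"


def simulated_reward(turn, arm):
    """Toy non-stationary opponent: the best arm changes at turn 10."""
    best_arm = 0 if turn < 10 else 2
    return 0.9 if arm == best_arm else 0.1


bandit = Exp3(len(STRATEGIES))
for turn in range(20):
    arm, prob = bandit.select()
    prompt = inject_strategy("You are negotiating a shared budget.", STRATEGIES[arm])
    reward = simulated_reward(turn, arm)  # stand-in for the reward evaluator
    bandit.update(arm, prob, reward)

print([round(p, 3) for p in bandit.probabilities()])
```

Because EXP3 makes no stationarity assumption about rewards, the same update applies when the opponent changes behavior mid-dialogue, which is the setting the overview describes.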

This repository contains the Sotopia-based implementation used for the paper experiments, including the main ALSO runner, strategy spaces, bandit baselines, evaluation scripts, and focused regression tests.

Highlights

  • Online adaptation: adapts within a single multi-turn interaction instead of relying on offline retraining.
  • Adversarial bandit formulation: does not assume a stationary or cooperative opponent.
  • Strategy injection: optimizes high-level behavioral strategies at prompt time without fine-tuning LLMs.
  • Neural reward surrogate: predicts per-arm rewards from interaction context to reduce exploration cost.
  • Sotopia evaluation: supports all/hard scenario splits, bilateral optimization, static baselines, and evolutionary prompt-optimization baselines.
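
To make the "neural reward surrogate" idea concrete, here is a minimal sketch, assuming a one-hidden-layer network trained by online SGD on squared error. The class name `TinySurrogate`, the feature layout (context features concatenated with a one-hot arm encoding), and the toy rewards are illustrative assumptions; the repository's surrogate architecture and inputs may differ.

```python
import math
import random


class TinySurrogate:
    """One-hidden-layer regressor: (context + arm features) -> predicted reward."""

    def __init__(self, in_dim, hidden=8, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0
        self.lr = lr

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def update(self, x, reward):
        """One SGD step on 0.5 * (prediction - reward)^2; returns the loss."""
        h = self._hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - reward
        for j, hj in enumerate(h):
            # Backpropagate through tanh using the pre-update output weight.
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= self.lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * grad_pre * xi
            self.b1[j] -= self.lr * grad_pre
        self.b2 -= self.lr * err
        return 0.5 * err * err


context = [1.0]                           # hypothetical context feature
arm_features = [[1.0, 0.0], [0.0, 1.0]]   # one-hot encodings for two arms
true_reward = [0.2, 0.8]                  # toy rewards, not real results

model = TinySurrogate(in_dim=3)
for step in range(2000):
    arm = step % 2
    model.update(context + arm_features[arm], true_reward[arm])

predictions = [model.predict(context + f) for f in arm_features]
print([round(p, 2) for p in predictions])
```

A surrogate like this lets the selector score arms it has not tried recently, which is how predicting per-arm rewards can reduce exploration cost.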

Results Snapshot

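The project page reports that ALSO is best or near-best across Sotopia-All and Sotopia-Hard in the bilateral optimization setting. In the Overall score summary below, ALSO ranks first in three of the four rows and second to Instinct on Qwen2.5-72B Sotopia-Hard (3.648 vs. 3.666).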

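| Model         | Split        | Original | Instinct | OPRO  | EvoPrompt | ALSO  |
|---------------|--------------|----------|----------|-------|-----------|-------|
| DeepSeek-V3.2 | Sotopia-All  | 3.619    | 3.851    | 3.787 | 3.737     | 3.889 |
| DeepSeek-V3.2 | Sotopia-Hard | 3.025    | 3.427    | 3.344 | 3.292     | 3.527 |
| Qwen2.5-72B   | Sotopia-All  | 3.676    | 3.848    | 3.689 | 3.825     | 3.882 |
| Qwen2.5-72B   | Sotopia-Hard | 3.347    | 3.666    | 3.242 | 3.491     | 3.648 |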
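
For readers who want the deltas, the short snippet below recomputes ALSO's gain over the Original baseline and the top-scoring method per row, using only the numbers in the table above. The variable names are illustrative.

```python
# Overall scores copied from the table above.
scores = {
    ("DeepSeek-V3.2", "Sotopia-All"): {
        "Original": 3.619, "Instinct": 3.851, "OPRO": 3.787,
        "EvoPrompt": 3.737, "ALSO": 3.889},
    ("DeepSeek-V3.2", "Sotopia-Hard"): {
        "Original": 3.025, "Instinct": 3.427, "OPRO": 3.344,
        "EvoPrompt": 3.292, "ALSO": 3.527},
    ("Qwen2.5-72B", "Sotopia-All"): {
        "Original": 3.676, "Instinct": 3.848, "OPRO": 3.689,
        "EvoPrompt": 3.825, "ALSO": 3.882},
    ("Qwen2.5-72B", "Sotopia-Hard"): {
        "Original": 3.347, "Instinct": 3.666, "OPRO": 3.242,
        "EvoPrompt": 3.491, "ALSO": 3.648},
}

gains = {}
best = {}
for key, row in scores.items():
    gains[key] = row["ALSO"] - row["Original"]  # absolute gain over no optimization
    best[key] = max(row, key=row.get)           # method with the highest score
    model, split = key
    print(f"{model:14s} {split:13s} gain={gains[key]:+.3f} best={best[key]}")
```

The gains come out to +0.270, +0.502, +0.206, and +0.301 in table order, so the improvement over Original is largest on DeepSeek-V3.2 Sotopia-Hard.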

See the project page for additional ablations, strategy drift analysis, heterogeneous model pairing, and case studies.

Repository Layout

.
├── sotopia/                                  # Sotopia package code
├── tests/                                    # Upstream Sotopia tests
└── experiments/also/                         # ALSO paper artifact
    ├── core/                                 # Bandits, strategy spaces, dynamic envs, evaluators
    ├── conf/main_experiments/                # Tmuxinator configs for smoke and paper runs
    ├── generated_strategies/                 # Small strategy pools used by the strategy loader
    ├── scripts/generate_strategy_cache.py    # Strategy embedding cache generation
    ├── tests/                                # Focused artifact tests
    ├── calculate_cost.py
    ├── evaluate_by_tag.py
    └── run_bandit_simulation_context.py

Generated runtime artifacts are intentionally excluded from git:

  • experiments/also/outputs/
  • experiments/also/cache/
  • experiments/also/results/
  • embedding caches, figures, spreadsheets, and historical intermediate datasets

Installation

1. Create the Python environment

Use Python 3.10-3.12 and uv.

git clone https://github.com/Babylonehy/ALSO.git
cd ALSO

uv sync --extra api --extra test --extra paper

2. Install Sotopia data

uv run sotopia install

This initializes the Sotopia runtime data needed by the experiment runner. The full paper runs also require access to model APIs and, when using database-backed evaluation, the Sotopia/Redis setup expected by the base Sotopia package.

3. Create your local .env

Copy the template and edit the values for your machine:

cp .env.example .env

At minimum, set the model-provider keys you plan to use:

OPENROUTER_API_KEY=replace_with_your_openrouter_key
OPENAI_API_KEY=replace_with_your_openai_key_if_using_openai_models

If you use a remote or password-protected Redis service, also set REDIS_OM_URL in .env. Keep .env private; it is ignored by git.
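
Before launching a run, it can help to confirm that the keys are actually filled in. The sketch below is a simplified `.env` reader (plain `KEY=value` lines only; it does not handle `export` prefixes or multi-line values) that flags keys that are missing or still carry the `replace_with_` placeholder from the template. The function names are illustrative and not part of the repository.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required):
    """Return required keys that are absent, empty, or still placeholders."""
    return [key for key in required
            if not values.get(key) or values[key].startswith("replace_with_")]


if Path(".env").exists():
    env = read_env_file(".env")
    # Check only the provider keys you actually plan to use.
    print("unset keys:", missing_keys(env, ["OPENROUTER_API_KEY"]))
```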

4. Verify the database service

ALSO needs the Sotopia Redis database for scenario, agent, and episode records. If uv run sotopia install created the Docker service successfully, redis-stack should be running on port 6379.

docker ps | grep redis-stack
docker exec redis-stack redis-cli ping

Expected output:

PONG

If the container exists but is stopped, restart it:

docker start redis-stack
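
The same liveness check can be scripted without the Docker CLI. This is a minimal sketch using only the standard library: it sends Redis's inline `PING` command over a TCP socket and checks for the `+PONG` reply. It assumes an unauthenticated local instance; a password-protected server answers with an error instead, which this sketch reports as unreachable.

```python
import socket


def is_pong(reply: bytes) -> bool:
    """True if the raw server reply is the simple-string response +PONG."""
    return reply.strip() == b"+PONG"


def redis_ping(host="localhost", port=6379, timeout=2.0) -> bool:
    """Send an inline PING; return False on refusal, timeout, or other replies."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return is_pong(conn.recv(64))
    except OSError:
        return False


print("redis reachable:", redis_ping())
```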

If the container does not exist yet, run a non-interactive Sotopia install with Docker and published data:

uv run sotopia install \
  --use-docker \
  --load-database \
  --redis-data-path "$(pwd)" \
  --overwrite-existing-data

Then verify that Sotopia can query the loaded data:

uv run python - <<'PY'
from sotopia.database import AgentProfile, EnvironmentProfile
from sotopia.database.env_agent_combo_storage import EnvAgentComboStorage

print("agents:", len(list(AgentProfile.all_pks())))
print("environments:", len(list(EnvironmentProfile.all_pks())))
print("env_agent_combos:", len(list(EnvAgentComboStorage.all_pks())))
PY

If Redis is remote or password-protected, set REDIS_OM_URL in .env before running Python scripts:

REDIS_OM_URL=redis://default:password@host:6379
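
If a connection fails, it is worth checking how the URL is split into parts. The sketch below uses the standard library's `urlsplit` on the example value above; `describe_redis_url` is an illustrative helper, and it reports only whether a password is present rather than printing it.

```python
from urllib.parse import urlsplit


def describe_redis_url(url):
    """Break a redis:// URL into its components without echoing the password."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password_set": parts.password is not None,
        "host": parts.hostname,
        "port": parts.port,
    }


info = describe_redis_url("redis://default:password@host:6379")
print(info)
```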

If local proxy variables should not be used for API calls, unset them in the current shell:

unset ALL_PROXY all_proxy

5. Optional: install tmuxinator

Paper-scale configs are written as tmuxinator files. On Debian or Ubuntu, install tmuxinator with apt:

sudo apt install tmuxinator

Quick Start

Run a one-scenario, two-turn smoke test from the repository root:

tmuxinator start \
  -p experiments/also/conf/main_experiments/smoke_test.yml \
  project_root="$(pwd)"

Equivalent direct command:

cd experiments/also

uv run python run_bandit_simulation_context.py \
  --batch \
  --subset hard \
  --max-episodes 1 \
  --batch-size 1 \
  --selection-mode strategy \
  --strategy-version v3 \
  --model openrouter/openai/gpt-4o-mini \
  --env-model openrouter/openai/gpt-4o-mini \
  --reward-eval-model openrouter/openai/gpt-4o-mini \
  --bandit-type adversarial \
  --optimize both \
  --max-turns 2 \
  --tag smoke_test \
  --output outputs/smoke_test.json

Expected output file:

experiments/also/outputs/smoke_test.json
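
To confirm the smoke test wrote something usable, a quick structural check is enough. The schema of the output file is not documented here, so this sketch makes no assumption about field names: it only reports the top-level JSON type, its size, and (for objects) the first few keys. Run it from the repository root; `summarize_json` is an illustrative helper.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Report the top-level shape of a JSON file without assuming a schema."""
    data = json.loads(Path(path).read_text())
    if isinstance(data, dict):
        return {"type": "dict", "size": len(data), "keys": sorted(data)[:10]}
    if isinstance(data, list):
        return {"type": "list", "size": len(data), "keys": []}
    return {"type": type(data).__name__, "size": 1, "keys": []}


output_path = Path("experiments/also/outputs/smoke_test.json")
if output_path.exists():
    print(summarize_json(output_path))
else:
    print(f"{output_path} not found; did the smoke test finish?")
```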

Experiments

1. Precompute strategy embeddings

Full paper runs use strategy mode with the V3 strategy space. Generate the cache once before launching batch experiments:

cd experiments/also

uv run python scripts/generate_strategy_cache.py \
  --subset hard \
  --strategy-version v3 \
  --cache-dir cache/strategy_embeddings_v3_slim \
  --skip-existing

2. Run ALSO

From the repository root:

tmuxinator start \
  -p experiments/also/conf/main_experiments/adversarial_v3_hard.yml \
  project_root="$(pwd)" \
  batch=40 \
  eta=0.5

The main config launches P1-only, P2-only, and bilateral optimization panes for the hard split.

3. Run baselines

| Method                     | Config                                                           |
|----------------------------|------------------------------------------------------------------|
| Original / no optimization | experiments/also/conf/main_experiments/baseline_v3.yml           |
| ALSO / adversarial bandit  | experiments/also/conf/main_experiments/adversarial_v3_hard.yml   |
| OPRO                       | experiments/also/conf/main_experiments/opro_v3.yml               |
| EvoPrompt                  | experiments/also/conf/main_experiments/evoprompt_v3.yml          |
| PromptBreeder              | experiments/also/conf/main_experiments/promptbreeder_v3.yml      |
| Neural UCB                 | experiments/also/conf/main_experiments/neural_ucb_no_ctx_v3.yml  |

Example:

tmuxinator start \
  -p experiments/also/conf/main_experiments/opro_v3.yml \
  project_root="$(pwd)" \
  batch=40

4. Smaller direct run

For a smaller command-line run without tmuxinator:

cd experiments/also

uv run python run_bandit_simulation_context.py \
  --selection-mode strategy \
  --strategy-version v3 \
  --context-embedding \
  --embedding-model qwen/qwen3-embedding-8b \
  --context-embedding-dim 4096 \
  --batch \
  --subset hard_small \
  --batch-size 14 \
  --no-mask-unselected-scores \
  --model openrouter/deepseek/deepseek-v3.2 \
  --reward-eval-model openrouter/deepseek/deepseek-v3.2 \
  --bandit-type adversarial \
  --optimize both \
  --eta 10 \
  --depth 2 \
  --max-turns 20 \
  --push-to-db \
  --strategy-cache-dir cache/strategy_embeddings_v3_slim \
  --tag-prefix reproduction
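
When comparing a few settings, it can be convenient to generate the direct command rather than edit it by hand. The sketch below rebuilds the command above from a list and varies only `--eta` and `--tag-prefix`, using the two eta values that appear in this README (0.5 and 10). It prints the commands as a dry run and does not execute them; the `reproduction_eta...` tag prefixes are hypothetical naming choices.

```python
import shlex

# Flags copied from the direct command above; only eta and the tag prefix vary.
BASE_ARGS = [
    "uv", "run", "python", "run_bandit_simulation_context.py",
    "--selection-mode", "strategy",
    "--strategy-version", "v3",
    "--context-embedding",
    "--embedding-model", "qwen/qwen3-embedding-8b",
    "--context-embedding-dim", "4096",
    "--batch",
    "--subset", "hard_small",
    "--batch-size", "14",
    "--no-mask-unselected-scores",
    "--model", "openrouter/deepseek/deepseek-v3.2",
    "--reward-eval-model", "openrouter/deepseek/deepseek-v3.2",
    "--bandit-type", "adversarial",
    "--optimize", "both",
    "--depth", "2",
    "--max-turns", "20",
    "--push-to-db",
    "--strategy-cache-dir", "cache/strategy_embeddings_v3_slim",
]


def build_command(eta, tag_prefix):
    """Return the full argument list for one run."""
    return BASE_ARGS + ["--eta", str(eta), "--tag-prefix", tag_prefix]


for eta in (0.5, 10):
    # Dry run: print a shell-safe command line to run from experiments/also.
    print(shlex.join(build_command(eta, f"reproduction_eta{eta}")))
```

Giving each setting its own tag prefix keeps the runs separable when listing tags during evaluation.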

Evaluation

List available experiment tags:

cd experiments/also
uv run python evaluate_by_tag.py --list-tags

Evaluate one run:

uv run python evaluate_by_tag.py \
  --tag reproduction_bandit_adversarial_both_hard_small \
  --eval-set hard

Compare multiple runs and export tables:

uv run python evaluate_by_tag.py \
  --tags tag_a tag_b tag_c \
  --output results/comparison.csv \
  --output-xlsx results/comparison.xlsx \
  --export-csv results/tables \
  --save-all

Testing

Run the focused artifact tests from the repository root:

uv run pytest experiments/also/tests -q

Check that the retained entry points compile (a syntax check only; nothing is executed):

uv run python -m py_compile \
  experiments/also/run_bandit_simulation_context.py \
  experiments/also/evaluate_by_tag.py \
  experiments/also/calculate_cost.py \
  experiments/also/scripts/generate_strategy_cache.py
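
The same syntax check can be run from Python, which makes it easier to collect every failure in one pass. This is a minimal sketch using the standard library's `py_compile` module with `doraise=True`; `find_compile_errors` is an illustrative helper, and the paths assume the repository root as the working directory.

```python
import py_compile

ENTRYPOINTS = [
    "experiments/also/run_bandit_simulation_context.py",
    "experiments/also/evaluate_by_tag.py",
    "experiments/also/calculate_cost.py",
    "experiments/also/scripts/generate_strategy_cache.py",
]


def find_compile_errors(paths):
    """Byte-compile each file; map failing paths to their error message."""
    errors = {}
    for path in paths:
        try:
            py_compile.compile(path, doraise=True)
        except (py_compile.PyCompileError, OSError) as exc:
            # PyCompileError: syntax problem; OSError: missing or unreadable file.
            errors[path] = str(exc)
    return errors


failures = find_compile_errors(ENTRYPOINTS)
print("all entry points compile" if not failures else failures)
```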

Acknowledgements

The experiment environment is built on the Sotopia social simulation framework. The project page follows common academic project-page conventions; see the public ALSO page for figures, ablations, and qualitative examples.
