ALSO: Adversarial Online Strategy Optimization for Social Agents

ICML 2026

Xiang Li¹, Liping Yi¹, Mingze Kong², Ming Zhang³, Zhongxiang Dai², Qinghua Hu¹

¹Tianjin University ²The Chinese University of Hong Kong, Shenzhen ³East China Normal University

[Project Page] [Paper] [arXiv] [Code]

News

May 2026: ALSO is accepted to ICML 2026.

Overview

ALSO studies online strategy optimization for LLM-based social agents in multi-turn social simulation. In environments such as Sotopia, agents face evolving dialogue contexts and non-stationary opponents, so a static persona or fixed behavioral instruction can lead to repeated deadlocks and poor goal completion.

ALSO formulates turn-level strategy adaptation as an adversarial bandit problem. At each dialogue turn, the system selects a persona-strategy arm, injects the selected social strategy into the agent prompt, observes reward feedback from the interaction, and updates a lightweight neural surrogate for sample-efficient online adaptation. No model weights are fine-tuned.

This repository contains the Sotopia-based implementation used for the paper experiments, including the main ALSO runner, strategy spaces, bandit baselines, evaluation scripts, and focused regression tests.

Highlights

Online adaptation: adapts within a single multi-turn interaction instead of relying on offline retraining.
Adversarial bandit formulation: does not assume a stationary or cooperative opponent.
Strategy injection: optimizes high-level behavioral strategies at prompt time without fine-tuning LLMs.
Neural reward surrogate: predicts per-arm rewards from interaction context to reduce exploration cost.
Sotopia evaluation: supports all/hard scenario splits, bilateral optimization, static baselines, and evolutionary prompt-optimization baselines.

Results Snapshot

The project page reports that ALSO is best or near-best across Sotopia-All and Sotopia-Hard in the bilateral optimization setting. The table below shows the Overall score summary.

Model	Split	Original	Instinct	OPRO	EvoPrompt	ALSO
DeepSeek-V3.2	Sotopia-All	3.619	3.851	3.787	3.737	3.889
DeepSeek-V3.2	Sotopia-Hard	3.025	3.427	3.344	3.292	3.527
Qwen2.5-72B	Sotopia-All	3.676	3.848	3.689	3.825	3.882
Qwen2.5-72B	Sotopia-Hard	3.347	3.666	3.242	3.491	3.648

See the project page for additional ablations, strategy drift analysis, heterogeneous model pairing, and case studies.

Repository Layout

.
├── sotopia/                                  # Sotopia package code
├── tests/                                    # Upstream Sotopia tests
└── experiments/also/                       # ALSO paper artifact
    ├── core/                                 # Bandits, strategy spaces, dynamic envs, evaluators
    ├── conf/main_experiments/                # Tmuxinator configs for smoke and paper runs
    ├── generated_strategies/                 # Small strategy pools used by the strategy loader
    ├── scripts/generate_strategy_cache.py    # Strategy embedding cache generation
    ├── tests/                                # Focused artifact tests
    ├── calculate_cost.py
    ├── evaluate_by_tag.py
    └── run_bandit_simulation_context.py

Generated runtime artifacts are intentionally excluded from git:

experiments/also/outputs/
experiments/also/cache/
experiments/also/results/
embedding caches, figures, spreadsheets, and historical intermediate datasets

Installation

1. Create the Python environment

Use Python 3.10-3.12 and uv.

git clone https://github.com/Babylonehy/ALSO.git
cd ALSO

uv sync --extra api --extra test --extra paper

2. Install Sotopia data

uv run sotopia install

This initializes the Sotopia runtime data needed by the experiment runner. The full paper runs also require access to model APIs and, when using database-backed evaluation, the Sotopia/Redis setup expected by the base Sotopia package.

3. Create your local `.env`

Copy the template and edit the values for your machine:

cp .env.example .env

At minimum, set the model-provider keys you plan to use:

OPENROUTER_API_KEY=replace_with_your_openrouter_key
OPENAI_API_KEY=replace_with_your_openai_key_if_using_openai_models

If you use a remote or password-protected Redis service, also set REDIS_OM_URL in .env. Keep .env private; it is ignored by git.

4. Verify the database service

ALSO needs the Sotopia Redis database for scenario, agent, and episode records. If uv run sotopia install created the Docker service successfully, redis-stack should be running on port 6379.

docker ps | grep redis-stack
docker exec redis-stack redis-cli ping

Expected output:

PONG

If the container exists but is stopped, restart it:

docker start redis-stack

If the container does not exist yet, run a non-interactive Sotopia install with Docker and published data:

uv run sotopia install \
  --use-docker \
  --load-database \
  --redis-data-path "$(pwd)" \
  --overwrite-existing-data

Then verify that Sotopia can query the loaded data:

uv run python - <<'PY'
from sotopia.database import AgentProfile, EnvironmentProfile
from sotopia.database.env_agent_combo_storage import EnvAgentComboStorage

print("agents:", len(list(AgentProfile.all_pks())))
print("environments:", len(list(EnvironmentProfile.all_pks())))
print("env_agent_combos:", len(list(EnvAgentComboStorage.all_pks())))
PY

If Redis is remote or password-protected, set REDIS_OM_URL in .env before running Python scripts:

REDIS_OM_URL=redis://default:password@host:6379

If local proxy variables should not be used for API calls:

unset ALL_PROXY all_proxy

5. Optional: install tmuxinator

Paper-scale configs are written as tmuxinator files.

sudo apt install tmuxinator

Quick Start

Run a one-scenario, two-turn smoke test from the repository root:

tmuxinator start \
  -p experiments/also/conf/main_experiments/smoke_test.yml \
  project_root=$(pwd)

Equivalent direct command:

cd experiments/also

uv run python run_bandit_simulation_context.py \
  --batch \
  --subset hard \
  --max-episodes 1 \
  --batch-size 1 \
  --selection-mode strategy \
  --strategy-version v3 \
  --model openrouter/openai/gpt-4o-mini \
  --env-model openrouter/openai/gpt-4o-mini \
  --reward-eval-model openrouter/openai/gpt-4o-mini \
  --bandit-type adversarial \
  --optimize both \
  --max-turns 2 \
  --tag smoke_test \
  --output outputs/smoke_test.json

Expected output:

experiments/also/outputs/smoke_test.json

Experiments

1. Precompute strategy embeddings

Full paper runs use strategy mode with the V3 strategy space. Generate the cache once before launching batch experiments:

cd experiments/also

uv run python scripts/generate_strategy_cache.py \
  --subset hard \
  --strategy-version v3 \
  --cache-dir cache/strategy_embeddings_v3_slim \
  --skip-existing

2. Run ALSO

From the repository root:

tmuxinator start \
  -p experiments/also/conf/main_experiments/adversarial_v3_hard.yml \
  project_root=$(pwd) \
  batch=40 \
  eta=0.5

The main config launches P1-only, P2-only, and bilateral optimization panes for the hard split.

3. Run baselines

Method	Config
Original / no optimization	`experiments/also/conf/main_experiments/baseline_v3.yml`
ALSO / adversarial bandit	`experiments/also/conf/main_experiments/adversarial_v3_hard.yml`
OPRO	`experiments/also/conf/main_experiments/opro_v3.yml`
EvoPrompt	`experiments/also/conf/main_experiments/evoprompt_v3.yml`
PromptBreeder	`experiments/also/conf/main_experiments/promptbreeder_v3.yml`
Neural UCB	`experiments/also/conf/main_experiments/neural_ucb_no_ctx_v3.yml`

Example:

tmuxinator start \
  -p experiments/also/conf/main_experiments/opro_v3.yml \
  project_root=$(pwd) \
  batch=40

4. Smaller direct run

For a smaller command-line run without tmuxinator:

cd experiments/also

uv run python run_bandit_simulation_context.py \
  --selection-mode strategy \
  --strategy-version v3 \
  --context-embedding \
  --embedding-model qwen/qwen3-embedding-8b \
  --context-embedding-dim 4096 \
  --batch \
  --subset hard_small \
  --batch-size 14 \
  --no-mask-unselected-scores \
  --model openrouter/deepseek/deepseek-v3.2 \
  --reward-eval-model openrouter/deepseek/deepseek-v3.2 \
  --bandit-type adversarial \
  --optimize both \
  --eta 10 \
  --depth 2 \
  --max-turns 20 \
  --push-to-db \
  --strategy-cache-dir cache/strategy_embeddings_v3_slim \
  --tag-prefix reproduction

Evaluation

List available experiment tags:

cd experiments/also
uv run python evaluate_by_tag.py --list-tags

Evaluate one run:

uv run python evaluate_by_tag.py \
  --tag reproduction_bandit_adversarial_both_hard_small \
  --eval-set hard

Compare multiple runs and export tables:

uv run python evaluate_by_tag.py \
  --tags tag_a tag_b tag_c \
  --output results/comparison.csv \
  --output-xlsx results/comparison.xlsx \
  --export-csv results/tables \
  --save-all

Testing

Run focused artifact tests:

uv run pytest experiments/also/tests -q

Check retained entrypoints:

uv run python -m py_compile \
  experiments/also/run_bandit_simulation_context.py \
  experiments/also/evaluate_by_tag.py \
  experiments/also/calculate_cost.py \
  experiments/also/scripts/generate_strategy_cache.py

Acknowledgements

The experiment environment is built on the Sotopia social simulation framework. The project-page style follows common academic project-page conventions and links to the public ALSO page for figures, ablations, and qualitative examples.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
docs		docs
examples		examples
experiments/also		experiments/also
figs		figs
notebooks		notebooks
scripts		scripts
sotopia-chat		sotopia-chat
sotopia		sotopia
sotopia_conf		sotopia_conf
stubs		stubs
tests		tests
ui		ui
.env.example		.env.example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ALSO: Adversarial Online Strategy Optimization for Social Agents

News

Overview

Highlights

Results Snapshot

Repository Layout

Installation

1. Create the Python environment

2. Install Sotopia data

3. Create your local `.env`

4. Verify the database service

5. Optional: install tmuxinator

Quick Start

Experiments

1. Precompute strategy embeddings

2. Run ALSO

3. Run baselines

4. Smaller direct run

Evaluation

Testing

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ALSO: Adversarial Online Strategy Optimization for Social Agents

News

Overview

Highlights

Results Snapshot

Repository Layout

Installation

1. Create the Python environment

2. Install Sotopia data

3. Create your local .env

4. Verify the database service

5. Optional: install tmuxinator

Quick Start

Experiments

1. Precompute strategy embeddings

2. Run ALSO

3. Run baselines

4. Smaller direct run

Evaluation

Testing

Acknowledgements

About

Topics

Resources

License

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

3. Create your local `.env`

Packages