Honeypot Protocol

A control protocol that tests whether AI models behave differently under different perceived oversight contexts. The protocol varies only the system prompt across three conditions (evaluation, synthetic deployment, no-monitoring) while holding the task, tools, environment, and scoring identical. Built on ControlArena and evaluated in BashArena.

Quick start

git clone https://github.com/najmulhasan-code/honeypot-protocol.git
cd honeypot-protocol
python -m venv venv
source venv/bin/activate  # venv\Scripts\activate on Windows
pip install -r requirements.txt
cp .env.example .env      # add your API keys

Requires Docker running for BashArena sandbox execution.

Running experiments

python -m src.experiment --quick          # single evaluation
python -m src.experiment --run opus       # all conditions/modes for one model
python -m src.experiment --full           # full matrix across all models

Windows: use run.bat instead of python -m src.experiment.

Structure

src/
  config.py                  # models, conditions, parameters
  experiment.py              # BashArena evaluation runner
  prompts/contexts.py        # three system prompt conditions
  protocols/honeypot_protocol.py  # honeypot micro-protocol
results/                     # evaluation outputs (JSON, CSV)

Citation

@misc{hasan2026honeypot,
  title={Honeypot Protocol},
  author={Hasan, Najmul},
  year={2026},
  howpublished={AI Control Hackathon, Apart Research},
  url={https://apartresearch.com/project/honeypot-protocol-hmsb}
}

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
figures		figures
results		results
src		src
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
run.bat		run.bat
run.sh		run.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Honeypot Protocol

Quick start

Running experiments

Structure

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Honeypot Protocol

Quick start

Running experiments

Structure

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages