A control protocol that tests whether AI models behave differently under different perceived oversight contexts. The protocol varies only the system prompt across three conditions (evaluation, synthetic deployment, no-monitoring) while holding the task, tools, environment, and scoring identical. Built on ControlArena and evaluated in BashArena.
git clone https://github.com/najmulhasan-code/honeypot-protocol.git
cd honeypot-protocol
python -m venv venv
source venv/bin/activate # venv\Scripts\activate on Windows
pip install -r requirements.txt
cp .env.example .env # add your API keysRequires Docker running for BashArena sandbox execution.
python -m src.experiment --quick # single evaluation
python -m src.experiment --run opus # all conditions/modes for one model
python -m src.experiment --full # full matrix across all modelsWindows: use run.bat instead of python -m src.experiment.
src/
config.py # models, conditions, parameters
experiment.py # BashArena evaluation runner
prompts/contexts.py # three system prompt conditions
protocols/honeypot_protocol.py # honeypot micro-protocol
results/ # evaluation outputs (JSON, CSV)
@misc{hasan2026honeypot,
title={Honeypot Protocol},
author={Hasan, Najmul},
year={2026},
howpublished={AI Control Hackathon, Apart Research},
url={https://apartresearch.com/project/honeypot-protocol-hmsb}
}