Skip to content

najmulhasan-code/honeypot-protocol

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Honeypot Protocol

A control protocol that tests whether AI models behave differently under different perceived oversight contexts. The protocol varies only the system prompt across three conditions (evaluation, synthetic deployment, no-monitoring) while holding the task, tools, environment, and scoring identical. Built on ControlArena and evaluated in BashArena.

Protocol overview

Quick start

git clone https://github.com/najmulhasan-code/honeypot-protocol.git
cd honeypot-protocol
python -m venv venv
source venv/bin/activate  # venv\Scripts\activate on Windows
pip install -r requirements.txt
cp .env.example .env      # add your API keys

Requires Docker running for BashArena sandbox execution.

Running experiments

python -m src.experiment --quick          # single evaluation
python -m src.experiment --run opus       # all conditions/modes for one model
python -m src.experiment --full           # full matrix across all models

Windows: use run.bat instead of python -m src.experiment.

Structure

src/
  config.py                  # models, conditions, parameters
  experiment.py              # BashArena evaluation runner
  prompts/contexts.py        # three system prompt conditions
  protocols/honeypot_protocol.py  # honeypot micro-protocol
results/                     # evaluation outputs (JSON, CSV)

Citation

@misc{hasan2026honeypot,
  title={Honeypot Protocol},
  author={Hasan, Najmul},
  year={2026},
  howpublished={AI Control Hackathon, Apart Research},
  url={https://apartresearch.com/project/honeypot-protocol-hmsb}
}

About

Honeypot Protocol

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages