lciric/eval-awareness-detection: Mechanistic detection of eval-awareness in language models via representation engineering

extraction
probing
prompts
steering
utils
.gitignore
main.py
requirements.txt

About

Mechanistic detection of eval-awareness in language models via representation engineering

Languages

Python 100.0%