diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
index c3f1ef5..d6ceef7 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -35,6 +35,8 @@ jobs:
 
   publish-pypi:
     name: Publish to PyPI (OIDC)
+    # Disabled until PyPI name is finalized (audit 2026-05-18: name squat detected)
+    if: false
     needs: build
     runs-on: ubuntu-latest
     environment:
diff --git a/README.md b/README.md
index 1ab9e2c..c1ce774 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,7 @@
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 [![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://pypi.org/project/dose/)
 
+> ⚠️ **PyPI name notice**: A package named `dose` already exists on PyPI from a different author. **Do NOT** run `pip install dose`. This project is distributed via GitHub source only until a unique PyPI name is chosen. See https://github.com/hinanohart/dose#installation for the correct install method.
 `dose` measures how much the helpful and harmful capability subspaces of a language model overlap — and how that overlap changes as intervention strength varies. The core metric, **PSI (Pharmakon Separability Index)**, quantifies whether a model's representations treat safe and unsafe behaviors as geometrically separable directions.
 
 Urbina et al. (2022) showed that the same AI system used to discover therapeutics can be redirected to generate chemical weapons with minimal effort — the capability is not separated, only constrained by convention. `dose` extends this framing to the activation geometry of language models and measures it quantitatively.
@@ -29,7 +30,7 @@ The H2 auto-decision logic (`orchestrator._auto_decide_h2`) and the reproducibil
 ## Installation
 
 ```bash
-pip install dose
+pip install dose # do not run, see above
 # GPU (recommended for full runs):
 pip install dose torch --index-url https://download.pytorch.org/whl/cu121
 # With Gradio demo: