8 changes: 8 additions & 0 deletions .pr/01-pypi-metadata.txt
@@ -0,0 +1,8 @@
version: 0.1.3
uploaded: 2026-06-09T19:08:21
bdist_wheel toolshield-0.1.3-py3-none-any.whl
size: 84128 bytes
sha256: aa52be93bbc2c552529254482dc0f6a7493325b8becc2b897b3cb80be4b23fd3
sdist toolshield-0.1.3.tar.gz
size: 9807528 bytes
sha256: f680c20398aeb5f95820c25e70c9fdb97ff416b4e033422859f3173570cd3826
12 changes: 12 additions & 0 deletions .pr/02-reproducible-build.txt
@@ -0,0 +1,12 @@
Rebuilt 0.1.3 from `git archive HEAD` (commit f9bc16d3) and compared to PyPI:

wheel local sha256: aa52be93bbc2c552529254482dc0f6a7493325b8becc2b897b3cb80be4b23fd3
wheel PyPI sha256: aa52be93bbc2c552529254482dc0f6a7493325b8becc2b897b3cb80be4b23fd3
wheel match: YES

sdist local sha256: f680c20398aeb5f95820c25e70c9fdb97ff416b4e033422859f3173570cd3826
sdist PyPI sha256: f680c20398aeb5f95820c25e70c9fdb97ff416b4e033422859f3173570cd3826
sdist match: YES

Conclusion: the published wheel and sdist are both byte-identical to what
`python -m build` produces from a clean checkout of this commit. The repo and
PyPI are in sync.
31 changes: 31 additions & 0 deletions .pr/03-wheel-contents.txt
@@ -0,0 +1,31 @@
Full contents of toolshield-0.1.3-py3-none-any.whl (downloaded from PyPI):
(generated with: unzip -l toolshield-0.1.3-py3-none-any.whl)

Archive: /tmp/ts-reproduce/_dist/toolshield-0.1.3-py3-none-any.whl
Length Date Time Name
--------- ---------- ----- ----
962 2020-02-02 00:00 toolshield/__init__.py
1000 2020-02-02 00:00 toolshield/_paths.py
27939 2020-02-02 00:00 toolshield/cli.py
23357 2020-02-02 00:00 toolshield/exp_generate.py
5691 2020-02-02 00:00 toolshield/experience_store.py
6848 2020-02-02 00:00 toolshield/inspector.py
17827 2020-02-02 00:00 toolshield/iterative_exp_runner.py
7264 2020-02-02 00:00 toolshield/mcp_scan.py
8832 2020-02-02 00:00 toolshield/post_process_prompts.py
47703 2020-02-02 00:00 toolshield/prompts.py
50381 2020-02-02 00:00 toolshield/tree_generation.py
9040 2020-02-02 00:00 toolshield/data/seed.sql
4144 2020-02-02 00:00 toolshield/experiences/claude-sonnet-4.5/filesystem-mcp.json
11686 2020-02-02 00:00 toolshield/experiences/claude-sonnet-4.5/gmail-mcp.json
3856 2020-02-02 00:00 toolshield/experiences/claude-sonnet-4.5/notion-mcp.json
8482 2020-02-02 00:00 toolshield/experiences/claude-sonnet-4.5/playwright-mcp.json
2635 2020-02-02 00:00 toolshield/experiences/claude-sonnet-4.5/postgres-mcp.json
10326 2020-02-02 00:00 toolshield/experiences/claude-sonnet-4.5/terminal-mcp.json
12048 2020-02-02 00:00 toolshield-0.1.3.dist-info/METADATA
87 2020-02-02 00:00 toolshield-0.1.3.dist-info/WHEEL
51 2020-02-02 00:00 toolshield-0.1.3.dist-info/entry_points.txt
1066 2020-02-02 00:00 toolshield-0.1.3.dist-info/licenses/LICENSE
2095 2020-02-02 00:00 toolshield-0.1.3.dist-info/RECORD
--------- -------
263320 23 files
44 changes: 44 additions & 0 deletions .pr/04-pypi-install-smoke.txt
@@ -0,0 +1,44 @@
+ python -m venv /tmp/ts-smoke-venv

+ /tmp/ts-smoke-venv/bin/pip install --no-cache-dir toolshield==0.1.3
Installing collected packages: urllib3, typing_extensions, tqdm, sniffio, propcache, multidict, json-repair, jiter, idna, h11, frozenlist, distro, charset_normalizer, certifi, attrs, annotated-types, aiohappyeyeballs, yarl, typing-inspection, requests, pydantic-core, httpcore, anyio, aiosignal, pydantic, httpx, aiohttp, openai, toolshield
Successfully installed aiohappyeyeballs-2.6.2 aiohttp-3.14.1 aiosignal-1.4.0 annotated-types-0.7.0 anyio-4.13.0 attrs-26.1.0 certifi-2026.5.20 charset_normalizer-3.4.7 distro-1.9.0 frozenlist-1.8.0 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 idna-3.18 jiter-0.15.0 json-repair-0.60.1 multidict-6.7.1 openai-2.41.0 propcache-0.5.2 pydantic-2.13.4 pydantic-core-2.46.4 requests-2.34.2 sniffio-1.3.1 toolshield-0.1.3 tqdm-4.68.2 typing-inspection-0.4.2 typing_extensions-4.15.0 urllib3-2.7.0 yarl-1.24.2

[notice] A new release of pip is available: 25.0.1 -> 26.1.2
[notice] To update, run: python -m pip install --upgrade pip

+ /tmp/ts-smoke-venv/bin/python -c '<reporter-smoke-test>'
toolshield loaded from: /tmp/ts-smoke-venv/lib/python3.12/site-packages/toolshield/__init__.py
toolshield.__version__ = '0.1.3'

>>> public API
ExperienceStore = <class 'toolshield.experience_store.ExperienceStore'>
MCPInspector = <class 'toolshield.inspector.MCPSSEInspector'> (alias for MCPSSEInspector)
load_experiences = <function load_experiences at 0x7fb57fca20c0>

>>> modules the wheel must contain (missing in 0.1.2 -> reason for #4)
toolshield.mcp_scan = /tmp/ts-smoke-venv/lib/python3.12/site-packages/toolshield/mcp_scan.py
toolshield.mcp_scan.scan_port = <function scan_port at 0x7fb57e060900>
toolshield.cli.main = <function main at 0x7fb57e0a9c60>

>>> bundled experiences (all under toolshield/experiences/claude-sonnet-4.5/)
- filesystem-mcp
- gmail-mcp
- notion-mcp
- playwright-mcp
- postgres-mcp
- terminal-mcp

>>> ExperienceStore.load_bundled(...) round trip
filesystem-mcp -> 12 experiences
gmail-mcp -> 26 experiences
notion-mcp -> 9 experiences
playwright-mcp -> 21 experiences
postgres-mcp -> 9 experiences
terminal-mcp -> 26 experiences

>>> `toolshield --help` CLI entry point
exit code: 0
first line of --help: 'usage: toolshield [-h] [--mcp_name MCP_NAME] [--mcp_server MCP_SERVER]'

ALL CHECKS PASSED -- PyPI toolshield==0.1.3 is fully functional.
59 changes: 59 additions & 0 deletions .pr/05-sdk-test-fix.md
@@ -0,0 +1,59 @@
# SDK-side fix: skip extra-dependent tests when toolshield is absent

Commit: [`dfa5451a`](https://github.com/OpenHands/software-agent-sdk/pull/2911/commits)
Files touched: `tests/sdk/security/test_toolshield_llm_analyzer.py` (+15 / -1)

## What the previous CI run showed

`sdk-tests` job on the pre-fix PR head: **3766 passed, 4 failed, 13 xfailed**
(the 53 tests in `test_toolshield_llm_analyzer.py` are included in those
totals — 49 passed, 4 failed). All 4 failures share the same root cause:

```
FAILED tests/sdk/security/test_toolshield_llm_analyzer.py::TestSafetyExperiences::test_opt_in_to_default_seed
FAILED tests/sdk/security/test_toolshield_llm_analyzer.py::TestToolShieldHelpers::test_auto_detect_loads_experiences_for_detected_server
FAILED tests/sdk/security/test_toolshield_llm_analyzer.py::TestToolShieldHelpers::test_auto_detect_falls_back_to_default_seed_when_nothing_detected
FAILED tests/sdk/security/test_toolshield_llm_analyzer.py::TestToolShieldHelpers::test_auto_detect_handles_already_inside_event_loop

E ImportError: toolshield is not installed. Install via
E `pip install openhands-sdk[toolshield]` to use these helpers, or pass
E a custom string to ToolShieldLLMSecurityAnalyzer(safety_experiences=...).
```

The four tests genuinely need the real `toolshield` package (they exercise
`default_safety_experiences()` and `auto_detect_safety_experiences()`,
which import and call into `toolshield.experience_store` / `toolshield.mcp_scan`).
The `sdk-tests` job does not install optional extras, so the package
isn't available to those tests.

## The fix

Added a module-level `pytest.mark.skipif` marker:

```python
import importlib.util

import pytest

requires_toolshield = pytest.mark.skipif(
    importlib.util.find_spec("toolshield") is None,
    reason="requires the [toolshield] extra (`pip install openhands-sdk[toolshield]`)",
)
```

…and decorated the four tests with `@requires_toolshield`. Result:

- In `sdk-tests` (no toolshield): the four tests SKIP cleanly instead of failing.
- In a job that installs `[toolshield]` (e.g. the toolshield-specific CI lane): they run normally.
- The 49 other tests in the file already exercise the analyzer through mocks and never needed toolshield — unchanged.
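
The marker's condition rests on `importlib.util.find_spec`, which reports whether a top-level module can be located without importing it. A minimal sketch of that check in isolation follows; the helper name `extra_available` and the second module name are hypothetical and not part of the SDK or of `toolshield`:

```python
import importlib.util


def extra_available(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# A standard-library module can always be located.
print(extra_available("json"))
# A made-up name (hypothetical) that should not exist in any environment.
print(extra_available("toolshield_not_a_real_module"))
```

Because the check runs at collection time and does not import the package, a missing extra turns into a skip reason rather than an `ImportError` during the test body.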

## Why this is the right shape

`toolshield` is declared as an OPTIONAL extra in `pyproject.toml`:

```toml
[project.optional-dependencies]
toolshield = ["toolshield>=0.1.3,<0.2"]
```

So tests that depend on it should follow the standard
`importlib.util.find_spec` + `pytest.mark.skipif` pattern for optional
deps, not assume CI installs every extra. The previous code's docstring
even said "Requires the `[toolshield]` extra (installed in CI)", but the
`sdk-tests` job does not install it, so that assumption was wrong.
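
To illustrate what the `>=0.1.3,<0.2` range in the extra admits, here is a simplified sketch. It assumes plain `X.Y.Z` release strings and ignores the rest of PEP 440 (pre-releases, post-releases, epochs); the helper names are hypothetical, and pip's actual resolution logic is more involved:

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Split a plain 'X.Y.Z' release string into integers (no pre/post tags)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pin(version: str) -> bool:
    """Check a plain release string against the range >=0.1.3,<0.2."""
    release = parse_release(version)
    # Tuples compare element by element, so (0, 1, 10) sorts after (0, 1, 3).
    return (0, 1, 3) <= release < (0, 2)


for candidate in ("0.1.2", "0.1.3", "0.1.10", "0.2.0"):
    print(candidate, satisfies_pin(candidate))
```

Under these assumptions, 0.1.2 (the release that lacked `mcp_scan`) falls outside the range, while 0.1.3 and later 0.1.x releases fall inside it.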
63 changes: 63 additions & 0 deletions .pr/README.md
@@ -0,0 +1,63 @@
# `.pr/` — live evidence for PR #2911

Following the convention @enyst suggested in a
[review comment](https://github.com/OpenHands/software-agent-sdk/pull/2911#issuecomment-4662680235):
artefacts proving a fix works belong under `.pr/`, not just pasted in PR comments.

## What this bundle answers

Both blockers raised on this PR:

1. **@VascoSch92 — "fix the package at the source first"**
([CHATS-lab/ToolShield#4](https://github.com/CHATS-lab/ToolShield/issues/4)).
Fixed and published as `toolshield==0.1.3`. Files `01`–`04` below are
the evidence that 0.1.3 is correct and reproducible from source.

2. **@enyst — "add logs or other artefacts that show it works"**.
That's this directory.

## Files

| file | what it shows |
| ----------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `pypi-0.1.3.json` | Raw `pypi.org/pypi/toolshield/0.1.3/json` response — canonical record of what was uploaded. |
| `01-pypi-metadata.txt` | Same data, human-readable: version, upload time, filenames, sizes, SHA256s. |
| `02-reproducible-build.txt`   | Rebuilt the wheel + sdist locally from `git archive HEAD` on CHATS-lab/ToolShield. SHA256s match PyPI, so the published artefacts are **byte-identical** to a build from source: `pip install toolshield==0.1.3` installs exactly what is in the repo. |
| `03-wheel-contents.txt` | Full `unzip -l` of the wheel. Confirms `mcp_scan.py`, `experience_store.py`, and the six bundled `claude-sonnet-4.5` experience JSONs all ship. |
| `04-pypi-install-smoke.txt`   | Fresh venv → `pip install toolshield==0.1.3` from PyPI → reporter's smoke test; every assertion passes. This directly exercises the failure mode reported in #4. |
| `05-sdk-test-fix.md` | Note explaining the small `tests/sdk/security/test_toolshield_llm_analyzer.py` change in this PR — adds `requires_toolshield` skip marker for the 4 tests that need the optional extra. |
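
The digest comparison recorded in `02-reproducible-build.txt` can be repeated on any downloaded artefact. The following is a minimal sketch, not the script used to produce that file; the helper names `sha256_of` and `matches_published` are hypothetical, and the expected digest would be taken from `01-pypi-metadata.txt`:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large sdists need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_sha256: str) -> bool:
    """Compare a local file's digest with a published hex digest."""
    return sha256_of(path) == published_sha256.strip().lower()
```

For example, `matches_published(Path("toolshield-0.1.3-py3-none-any.whl"), "aa52be93...")` with the full digest from `01-pypi-metadata.txt` should return `True` for an unmodified download.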

## Commits in this PR addressing the review

| commit | what |
| ------------ | ----------------------------------------------------------------------------- |
| `3c87453` | Pin bump: `toolshield>=0.1.1,<0.2` → `>=0.1.3,<0.2` |
| `dfa5451a` | Skip the 4 toolshield-dependent tests when the extra isn't installed |

(Earlier commits in the PR — `ebc6fcd4` through `b4f92775` — addressed
the two prior rounds of review feedback from @Fieldnote-Echo.)

## How to re-verify locally

```bash
# 1. Confirm toolshield package is fixed
python -m venv /tmp/verify
/tmp/verify/bin/pip install toolshield==0.1.3
/tmp/verify/bin/python -c '
from toolshield import ExperienceStore
from toolshield.mcp_scan import scan_port # missing in 0.1.2
from toolshield.cli import main # toolshield auto entry point
ExperienceStore().load_bundled("filesystem-mcp")
print("OK")
'

# 2. Confirm SDK tests pass (with the [toolshield] extra so the 4 marked
# tests don't skip)
pip install -e "openhands-sdk[toolshield]"
pytest tests/sdk/security/test_toolshield_llm_analyzer.py -v

# 3. Confirm they SKIP cleanly without the extra
pip uninstall -y toolshield
pytest tests/sdk/security/test_toolshield_llm_analyzer.py -v -k "auto_detect or opt_in_to_default_seed"
# Expected: 4 SKIPPED with reason "requires the [toolshield] extra"
```
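
Step 1 checks the installed package. The wheel archive can also be inspected without installing it, much as `03-wheel-contents.txt` does with `unzip -l`. A minimal sketch, assuming the wheel has already been downloaded locally (for instance with `pip download toolshield==0.1.3 --no-deps`); the helper name `missing_members` is hypothetical, and the two required paths are taken from the wheel listing:

```python
import zipfile

# Paths taken from the wheel listing in 03-wheel-contents.txt.
REQUIRED = (
    "toolshield/mcp_scan.py",
    "toolshield/experience_store.py",
)


def missing_members(wheel_path, required=REQUIRED):
    """Return the required archive members that the wheel does not contain."""
    with zipfile.ZipFile(wheel_path) as wheel:
        present = set(wheel.namelist())
    return [name for name in required if name not in present]
```

An empty list from `missing_members("toolshield-0.1.3-py3-none-any.whl")` would indicate that both modules ship in the archive.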