The "Deep Audit" Release (Patched some vulnerabilities) #6
Harshit-J004
announced in
Announcements
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
The v3.0.0 release gave you the execution firewall. Today, v3.1.0 stress-tests it against the real world.
We ran ToolGuard's automated fuzzer against actual tools from major AI agent ecosystems and validated native integration with all 7 supported frameworks. Then we audited every line of ToolGuard's own codebase and shipped 11 critical stability patches.
1. 🛡️ Real-World Tool Fuzzing
We fuzzed actual, shipping tools from major AI agent ecosystems to prove ToolGuard catches what the frameworks don't.
LangChain —
WikipediaQueryRun(real community tool):We imported the actual
WikipediaQueryRuntool fromlangchain-communityand hit it with 40 hallucinated LLM payloads. LangChain tools accept complexstr | dict | ToolCallunions. When the fuzzer sent invalid types, the native pipeline threw massive unhandled Pydantic tracebacks. ToolGuard'sguard_langchain_toolintercepted 39 of 40 crashes, converting them to cleanSchemaValidationErrormessages the LLM can self-correct from.CrewAI —
ScrapeWebsiteTool(real community tool):We imported the actual
ScrapeWebsiteToolfromcrewai-toolsand hit it with 44 hallucinated payloads. CrewAI unpacks inputs via**kwargsand throws hardcodedValueErrortraps when required fields are missing. ToolGuard'sguard_crewai_toolintercepted all 44 crashes gracefully — zero Python tracebacks escaped.2. 🔌 Native Framework Integration Validation
Beyond real-tool fuzzing, we validated that ToolGuard's adapters work natively with every supported framework's tool interface. These tests confirm that ToolGuard adds the input validation layer that these frameworks intentionally leave to the developer:
• Microsoft AutoGen (
FunctionTool): AutoGen passes raw JSON dicts to user functions with zero validation.guard_autogen_toolcatchesTypeErrorandSchemaValidationErrorbefore they crash your agent loop.• LlamaIndex (
FunctionTool): LlamaIndex relies on Pydantic internally but doesn't gracefully handle hallucinated edge-cases likenullor type-mismatched inputs.guard_llamaindex_toolabsorbs theValidationErrortracebacks.• OpenAI Swarm (
Agent.functions): Swarm exposes plain Python functions with no validation at all.guard_swarm_agentextracts every function, wraps each with Pydantic schema enforcement, and even detected 3 Prompt Injection Vulnerabilities via reflected payload scanning.• FastAPI (Middleware): When FastAPI endpoints are exposed as agent tools, the HTTP 422 safety net disappears.
as_fastapi_toolrestores Pydantic validation at the function level.• AutoGPT (
web_search): AutoGPT blindly trusts LLM inputs.@create_toolcatches null and type-mismatch payloads before they reach the DuckDuckGo scraper.All 7 framework adapters verified. Zero ToolGuard internal crashes.
3. 🛠️ Deep Codebase Audit & 11 Critical Patches
We executed a line-by-line codebase audit and shipped 11 stabilization patches:
• CrewAI Native Extraction Bug: Fixed logic flaw in
guard_crewai_toolwhere non-callable Pydantic subclass instances threw TypeErrors during initialization.• Bare Decorator Support: Fixed DX bug where
@create_tool(without parentheses) caused a runtime crash.• Fuzzer Base Inference:
test_chainnow infers base inputs from wrapped tool signatures whenbase_inputis empty.• Clean Exception Handlers: Stopped raw Python tracebacks from leaking when
self._sig.bind()failed.• LangChain / CrewAI Inheritance Checks: Repaired
NotImplementedErrorbypass logic for legacy_runmethods.• Dynamic Scoring Stability: Fixed
KeyErrorin Console Reporter when tools dynamically mutated names during__init__.• Pruned Tech Debt: Removed duplicated comments, fixed type annotations (
callable→typing.Callable), and dead variable paths.ToolGuard v3.1.0: battle-tested against real community tools, natively integrated with 7 frameworks, and hardened with 11 stability patches. Install the update and guard your agents.
This discussion was created from the release The "Deep Audit" Release (Patched some vulnerabilities).
Beta Was this translation helpful? Give feedback.
All reactions