feat: confidential data procurement skill — full pipeline with live E2E tests#10
Merged
BuyerPolicy, SupplierSubmission, DatasetMetrics, ProcurementResult with Pydantic validators. Output key sets enforce budget leak prevention (quality_score buyer-only). Score weights and base_price validated on init.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ingest.py: CSV parse with size/row limits, JSON metadata, format stubs (PDF/DOCX/Excel), in-memory DataFrame store, procurement_upload_handler
- core/skill_card.py: upload_handler + respond_handler optional fields
- api/routes.py: generic POST /upload and POST /respond routes that delegate to skill-owned handlers; no skill-specific logic in shared infra
- All 57 existing tests pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
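A minimal sketch of a CSV parse guarded by size and row limits, as the ingest.py bullet describes. It uses the standard-library csv module rather than a DataFrame to stay dependency-free; the function name and the limit values are hypothetical, not taken from the PR.

```python
import csv
import io
from typing import Dict, List

# Hypothetical limits; the PR does not state the actual values.
MAX_BYTES = 5 * 1024 * 1024
MAX_ROWS = 100_000


def parse_csv_limited(raw: bytes,
                      max_bytes: int = MAX_BYTES,
                      max_rows: int = MAX_ROWS) -> List[Dict[str, str]]:
    """Parse CSV bytes into row dicts, rejecting oversized uploads early."""
    # Check the byte size before decoding so large payloads are refused cheaply.
    if len(raw) > max_bytes:
        raise ValueError(f"upload exceeds {max_bytes} bytes")
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8"), newline=""))
    rows: List[Dict[str, str]] = []
    for row in reader:
        # Stop as soon as the row limit would be exceeded.
        if len(rows) >= max_rows:
            raise ValueError(f"upload exceeds {max_rows} rows")
        rows.append(row)
    return rows
```

Checking the byte length first means an oversized file is never decoded or parsed at all.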
…l check Implements the pure-math quality evaluation pipeline for the dataset procurement skill: null/duplicate/label metrics, component scoring, weighted quality score, price formula (P = base + spread * S), and deal condition (R ≤ P ≤ B). 38 unit tests cover all paths including critical failures, partial quality, and edge cases.
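The price formula and deal condition can be sketched as follows. This assumes that S is the quality score in [0, 1], the spread is max_budget minus base_price, R is the supplier's reserve price, and B is the buyer's max budget; the function and class names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    proposed_payment: float
    deal: bool


def propose_payment(base_price: float, max_budget: float,
                    reserve_price: float, quality_score: float) -> Offer:
    """Compute P = base + spread * S and test the deal condition R <= P <= B."""
    if not 0.0 <= quality_score <= 1.0:
        raise ValueError("quality_score must be in [0, 1]")
    if not 0.0 <= base_price <= max_budget:
        raise ValueError("base_price must be in [0, max_budget]")
    spread = max_budget - base_price          # assumed definition of "spread"
    payment = base_price + spread * quality_score
    # A deal exists only if the payment meets the reserve and stays in budget.
    deal = reserve_price <= payment <= max_budget
    return Offer(proposed_payment=payment, deal=deal)
```

For example, with base_price=0, max_budget=1000 and quality_score=0.5 the payment is 500, so a reserve of 400 yields a deal and a reserve of 600 does not.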
ProcurementFilter withholds quality_score and hard_constraints_pass from the supplier role to prevent max_budget reverse-engineering (P/S = budget). validate_tool_output blocks raw row dumps, high-cardinality lists, and oversized blobs before they reach the agent. 15 new tests, 53 total.
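A minimal sketch of the role-based withholding described above. With base_price = 0 the formula reduces to P = max_budget × S, so a supplier who saw both P and S could recover the budget by division. The function name, role strings, and dict-based result are assumptions; the real ProcurementFilter interface is not shown in this PR text.

```python
from typing import Any, Dict

# Keys the commit message says are withheld from the supplier role.
BUYER_ONLY_KEYS = frozenset({"quality_score", "hard_constraints_pass"})


def filter_for_role(result: Dict[str, Any], role: str) -> Dict[str, Any]:
    """Return a copy of the result containing only keys visible to `role`."""
    if role == "buyer":
        return dict(result)
    if role == "supplier":
        # Drop buyer-only keys so P / S cannot be computed on the supplier side.
        return {k: v for k, v in result.items() if k not in BUYER_ONLY_KEYS}
    raise ValueError(f"unknown role: {role!r}")
```

Unknown roles raise instead of falling back to a default view, so a typo in a role name cannot leak buyer-only fields.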
…onversation Guided multi-turn conversation collects required columns, row/quality thresholds, budget range, and optional label/forbidden-column constraints. LLM returns JSON when policy is complete; handler validates and constructs BuyerPolicy. 13 new tests (5 parse, 8 handler), 66 total.
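One way the "LLM returns JSON when policy is complete" step might look is sketched below. The field names in REQUIRED_FIELDS and the function name are hypothetical, and the real handler constructs a BuyerPolicy rather than returning a dict.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical field names; the PR text does not list the actual schema.
REQUIRED_FIELDS = ("required_columns", "min_rows", "max_budget")


def parse_policy_reply(reply: str) -> Optional[Dict[str, Any]]:
    """Return the policy dict if the reply is a complete JSON object.

    Returns None when the reply is ordinary conversation text, meaning the
    guided conversation should continue with another turn.
    """
    try:
        data = json.loads(reply.strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"policy JSON is missing fields: {missing}")
    return data
```

Treating non-JSON text as "keep talking" and incomplete JSON as an error keeps the two failure modes distinct for the handler.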
Wires up the full pipeline: deterministic evaluation → guardrails → SkillResponse. Agent layer stubbed (schema=0.5, claim_veracity=1.0 placeholders until Commit 7). Registers the new skill in routes.py alongside hackathon_novelty. SkillCard declares instant trigger mode, role-aware output keys, upload_handler, and user_display hints.
A single evaluate_node uses 3 aggregate-only tools (schema_summary, column_stats, value_distribution) to do fuzzy column matching and seller claim verification. The agent output (schema_score, claim_veracity_score) replaces the deterministic placeholders; quality_score and proposed_payment are then recomputed before guardrails run.
procurement_respond_handler implements the full resolution matrix: both-accept issues release_token, any-reject terminates, accept+renegotiate resolves at proposed_payment, double-renegotiate checks if revised_budget >= revised_reserve. One round enforced via renegotiation_used flag. Wired into skill_card.
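The resolution matrix above can be sketched as a single function. The status strings other than "authorized", the error raised on a second renegotiation, and the token format are assumptions; the source also does not say what price a successful double renegotiation settles at, so the sketch leaves that out.

```python
import secrets
from typing import Any, Dict, Optional

VALID_ACTIONS = {"accept", "reject", "renegotiate"}


def resolve(buyer_action: str, supplier_action: str, proposed_payment: float,
            revised_budget: Optional[float] = None,
            revised_reserve: Optional[float] = None,
            renegotiation_used: bool = False) -> Dict[str, Any]:
    """Resolve one round of buyer/supplier responses."""
    actions = {buyer_action, supplier_action}
    if not actions <= VALID_ACTIONS:
        raise ValueError(f"invalid action in {sorted(actions)}")
    if "reject" in actions:                      # any reject terminates
        return {"status": "terminated"}
    if actions == {"accept"}:                    # both accept: issue token
        return {"status": "authorized", "payment": proposed_payment,
                "release_token": secrets.token_hex(16)}
    # Every remaining case involves a renegotiation; only one round is allowed.
    if renegotiation_used:
        raise RuntimeError("renegotiation already used")  # assumed behavior
    if actions == {"accept", "renegotiate"}:     # resolves at proposed_payment
        return {"status": "resolved", "payment": proposed_payment}
    # Both renegotiate: compare the revised private numbers.
    if revised_budget is None or revised_reserve is None:
        raise ValueError("double renegotiation needs both revised values")
    if revised_budget >= revised_reserve:
        return {"status": "resolved"}            # settlement price unspecified
    return {"status": "terminated"}
```

Checking reject first mirrors the "any-reject terminates" rule: a rejection wins regardless of what the other party chose.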
…ation matrix Covers happy path (init→upload→submit→accept→authorized), critical reject, role-filtered results (buyer vs supplier), double-renegotiate success/failure, mixed accept+renegotiate, second-renegotiation guard, token enforcement, and skill registration. Requires python-multipart for multipart form upload.
- agent.py: wrap evaluate_node in StateGraph for LangSmith trace visibility
- tests/conftest.py: @pytest.mark.live marker, base_df fixture, matrix printer + demo JSON output
- tests/test_live_integration.py: 30 tests (deterministic, agent, pipeline, renegotiation)
- base_price=0 on all demo policies; critical failures and reserve-not-met → $0 or rejected
- .gitignore: exclude tests/fixtures/ and tests/demo_matrix.json
- ci.yml: add skills/dataset-procurement branch
- requirements.txt: add python-multipart
…mpts, renegotiation matrix
Implements the confidential data procurement skill end-to-end inside a simulated TEE.
What it does:
- …BuyerPolicy
- P = base_price + (max_budget - base_price) × quality_score: neither party sees the other's private number

Tests:
- …@pytest.mark.live) run against real HuggingFace transaction data with a real LLM; skipped in CI, run locally with source .env
- tests/demo_matrix.json included: shows actual pipeline output across 3 evaluation scenarios and 2 renegotiation outcomes