
feat: confidential data procurement skill — full pipeline with live E2E tests #10

Merged
Par-t merged 14 commits into main from skills/dataset-procurement on Mar 23, 2026

Par-t commented Mar 23, 2026

Implements the confidential data procurement skill end-to-end inside a simulated TEE.

What it does:

  • Buyer describes dataset requirements in natural language → LLM extracts a BuyerPolicy
  • Seller uploads a CSV + metadata with column definitions and claims
  • Enclave runs deterministic checks (critical rejections: forbidden columns, >50% duplicates) then an LLM agent (schema matching, claim verification, quality scoring)
  • Payment formula: P = base_price + (max_budget - base_price) × quality_score. Neither party sees the other's private number (the buyer's max_budget or the supplier's reserve price)
  • One round of renegotiation allowed, with four possible outcomes: both accept; one accepts and one renegotiates; both renegotiate and the revised offers overlap; both renegotiate and they do not overlap
  • Role-aware output: buyer sees quality score, supplier doesn't (prevents budget reverse-engineering)
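
The payment formula above, together with the deal condition R ≤ P ≤ B quoted in the commit messages, can be sketched in a few lines. This is a minimal illustration, not the code from this PR: the function names are hypothetical, and it assumes quality_score lies in [0, 1], that R is the supplier's reserve price, and that B is the buyer's max_budget.

```python
def proposed_payment(base_price: float, max_budget: float, quality_score: float) -> float:
    """P = base_price + (max_budget - base_price) * quality_score."""
    # Assumption: the score is normalised to [0, 1], so P stays within [base_price, max_budget].
    if not 0.0 <= quality_score <= 1.0:
        raise ValueError("quality_score must be in [0, 1]")
    if not 0.0 <= base_price <= max_budget:
        raise ValueError("expected 0 <= base_price <= max_budget")
    return base_price + (max_budget - base_price) * quality_score


def deal_possible(payment: float, reserve_price: float, max_budget: float) -> bool:
    """Deal condition R <= P <= B (R: supplier reserve, B: buyer budget; both assumed)."""
    return reserve_price <= payment <= max_budget


if __name__ == "__main__":
    p = proposed_payment(base_price=100.0, max_budget=500.0, quality_score=0.75)
    print(p)                                   # 100 + 400 * 0.75 = 400.0
    print(deal_possible(p, reserve_price=350.0, max_budget=500.0))
```

A score of 0 pays exactly base_price and a score of 1 pays the full max_budget, which is why the score itself is sensitive information.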

Tests:

  • 167 unit + E2E tests run in CI (no API key needed)
  • 8 live tests (@pytest.mark.live) run against real HuggingFace transaction data with a real LLM — skipped in CI, run locally with source .env
  • tests/demo_matrix.json included — shows actual pipeline output across 3 evaluation scenarios and 2 renegotiation outcomes
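
One common way to keep @pytest.mark.live tests out of CI is to skip them at collection time when no API key is present. The conftest.py sketch below shows that pattern under assumptions: the environment variable name LLM_API_KEY and the helper live_tests_enabled are hypothetical, and the PR's actual conftest.py may gate the tests differently.

```python
import os

import pytest

LIVE_ENV_VAR = "LLM_API_KEY"  # hypothetical name; use whatever key .env actually exports


def live_tests_enabled(environ=None) -> bool:
    """Return True when the (assumed) API key variable is set and non-empty."""
    env = os.environ if environ is None else environ
    return bool(env.get(LIVE_ENV_VAR))


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "live: needs a real LLM and network access")


def pytest_collection_modifyitems(config, items):
    # With a key available (for example after `source .env`), run everything.
    if live_tests_enabled():
        return
    skip_live = pytest.mark.skip(reason=f"{LIVE_ENV_VAR} not set; live tests skipped")
    for item in items:
        if "live" in item.keywords:
            item.add_marker(skip_live)
```

With this in place, `pytest -m live` selects only the live tests locally, while a CI run without the key reports them as skipped rather than failed.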

Par-t and others added 14 commits March 22, 2026 14:56
BuyerPolicy, SupplierSubmission, DatasetMetrics, ProcurementResult with
Pydantic validators. Output key sets enforce budget leak prevention
(quality_score buyer-only). Score weights and base_price validated on init.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ingest.py: CSV parse with size/row limits, JSON metadata, format stubs
  (PDF/DOCX/Excel), in-memory DataFrame store, procurement_upload_handler
- core/skill_card.py: upload_handler + respond_handler optional fields
- api/routes.py: generic POST /upload and POST /respond routes that
  delegate to skill-owned handlers — no skill-specific logic in shared infra
- All 57 existing tests pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l check

Implements the pure-math quality evaluation pipeline for the dataset
procurement skill: null/duplicate/label metrics, component scoring,
weighted quality score, price formula (P = base + spread * S), and
deal condition (R ≤ P ≤ B). 38 unit tests cover all paths including
critical failures, partial quality, and edge cases.
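
The deterministic stage described in this commit can be illustrated with plain Python. This is a sketch under stated assumptions rather than the PR's implementation: the function names are hypothetical, a duplicate is taken to mean any row beyond the first occurrence of an identical row, and the weights are assumed to sum to 1.

```python
from collections import Counter


def duplicate_ratio(rows):
    """Fraction of rows that repeat an earlier identical row (assumed definition)."""
    if not rows:
        return 0.0
    counts = Counter(tuple(row) for row in rows)
    duplicates = sum(count - 1 for count in counts.values())
    return duplicates / len(rows)


def critical_failures(columns, rows, forbidden_columns, max_duplicate_ratio=0.5):
    """Return reasons for a critical rejection; an empty list means the checks pass."""
    failures = []
    present = sorted(set(columns) & set(forbidden_columns))
    if present:
        failures.append("forbidden columns present: " + ", ".join(present))
    ratio = duplicate_ratio(rows)
    if ratio > max_duplicate_ratio:  # the PR description rejects at >50% duplicates
        failures.append(f"duplicate ratio {ratio:.2f} exceeds {max_duplicate_ratio:.2f}")
    return failures


def weighted_quality_score(components, weights):
    """Weighted sum of component scores; weights are assumed to sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * components[name] for name in weights)


if __name__ == "__main__":
    rows = [(1, "a")] * 4 + [(2, "b")]
    print(critical_failures(["id", "label", "ssn"], rows, forbidden_columns=["ssn"]))
    print(weighted_quality_score({"schema": 0.5, "claims": 1.0},
                                 {"schema": 0.5, "claims": 0.5}))
```

Keeping these checks free of any LLM call means a critical rejection is reproducible and can be unit-tested without an API key.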
ProcurementFilter withholds quality_score and hard_constraints_pass from
the supplier role to prevent max_budget reverse-engineering (with base_price = 0, P/S = max_budget).
validate_tool_output blocks raw row dumps, high-cardinality lists, and
oversized blobs before they reach the agent. 15 new tests, 53 total.
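
To see why the supplier must not receive quality_score, note that the payment formula can be inverted: max_budget = base_price + (P - base_price) / S, which reduces to P / S when base_price is 0. The sketch below shows the inversion and a role-based key filter. The key sets, the "deal" key, and both function names are illustrative assumptions, not the actual ProcurementFilter.

```python
BUYER_KEYS = frozenset({"deal", "proposed_payment", "quality_score", "hard_constraints_pass"})
# The supplier view drops the two fields named in the commit message.
SUPPLIER_KEYS = BUYER_KEYS - {"quality_score", "hard_constraints_pass"}


def filter_result(result: dict, role: str) -> dict:
    """Keep only the keys the given role is allowed to see."""
    allowed = {"buyer": BUYER_KEYS, "supplier": SUPPLIER_KEYS}
    if role not in allowed:
        raise ValueError(f"unknown role: {role}")
    return {key: value for key, value in result.items() if key in allowed[role]}


def recover_max_budget(payment: float, quality_score: float, base_price: float = 0.0) -> float:
    """What a supplier could compute if quality_score leaked (requires a score > 0)."""
    return base_price + (payment - base_price) / quality_score


if __name__ == "__main__":
    result = {"deal": True, "proposed_payment": 400.0,
              "quality_score": 0.75, "hard_constraints_pass": True}
    print(filter_result(result, "supplier"))       # no quality_score in the output
    print(recover_max_budget(400.0, 0.75, 100.0))  # 100 + 300 / 0.75 = 500.0
```

An allow-list per role fails closed: a field added to the result later stays hidden from both roles until it is explicitly listed.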
…onversation

Guided multi-turn conversation collects required columns, row/quality thresholds,
budget range, and optional label/forbidden-column constraints. LLM returns JSON
when policy is complete; handler validates and constructs BuyerPolicy. 13 new
tests (5 parse, 8 handler), 66 total.
Wires up the full pipeline: deterministic evaluation → guardrails → SkillResponse.
Agent layer stubbed (schema=0.5, claim_veracity=1.0 placeholders until Commit 7).
Registers the new skill in routes.py alongside hackathon_novelty. SkillCard
declares instant trigger mode, role-aware output keys, upload_handler, and
user_display hints.
Single evaluate_node uses 3 aggregate-only tools (schema_summary, column_stats,
value_distribution) to do fuzzy column matching and seller claim verification.
Agent output (schema_score, claim_veracity_score) replaces the deterministic placeholders;
quality_score and proposed_payment are recomputed before guardrails.
procurement_respond_handler implements the full resolution matrix: both-accept
issues release_token, any-reject terminates, accept+renegotiate resolves at
proposed_payment, double-renegotiate checks if revised_budget >= revised_reserve.
One round enforced via renegotiation_used flag. Wired into skill_card.
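
The resolution matrix in this commit message can be written as a small pure function. The sketch below is an illustration under assumptions: the Response type, the status strings, and the handling of a second renegotiation attempt are hypothetical. The settlement price when revised offers overlap is left out because the commit message does not state it.

```python
from dataclasses import dataclass
from typing import Optional

VALID_ACTIONS = {"accept", "reject", "renegotiate"}


@dataclass
class Response:
    action: str
    # revised_budget for the buyer, revised_reserve for the supplier
    revised_amount: Optional[float] = None


def resolve(buyer: Response, supplier: Response,
            proposed_payment: float, renegotiation_used: bool) -> dict:
    actions = {buyer.action, supplier.action}
    if not actions <= VALID_ACTIONS:
        raise ValueError(f"unknown action in {actions}")
    if "reject" in actions:                  # any reject terminates
        return {"status": "terminated"}
    if actions == {"accept"}:                # both accept: release token issued
        return {"status": "authorized", "payment": proposed_payment}
    if renegotiation_used:                   # only one round is allowed
        return {"status": "error", "reason": "renegotiation already used"}
    if "accept" in actions:                  # accept + renegotiate settles at P
        return {"status": "authorized", "payment": proposed_payment}
    # Both renegotiate: a deal exists only if the revised offers overlap.
    if buyer.revised_amount is None or supplier.revised_amount is None:
        raise ValueError("renegotiate requires a revised amount")
    if buyer.revised_amount >= supplier.revised_amount:
        return {"status": "authorized"}      # settlement price not specified in the source
    return {"status": "terminated"}


if __name__ == "__main__":
    print(resolve(Response("accept"), Response("renegotiate", 450.0), 400.0, False))
    print(resolve(Response("renegotiate", 420.0), Response("renegotiate", 450.0), 400.0, False))
```

Checking for a reject first keeps the rule "any reject terminates" independent of the renegotiation_used flag.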
…ation matrix

Covers happy path (init→upload→submit→accept→authorized), critical reject,
role-filtered results (buyer vs supplier), double-renegotiate success/failure,
mixed accept+renegotiate, second-renegotiation guard, token enforcement,
and skill registration. Requires python-multipart for multipart form upload.
- agent.py: wrap evaluate_node in StateGraph for LangSmith trace visibility
- tests/conftest.py: @pytest.mark.live marker, base_df fixture, matrix printer + demo JSON output
- tests/test_live_integration.py: 30 tests — deterministic, agent, pipeline, renegotiation
- base_price=0 on all demo policies; critical failures and reserve-not-met → $0 or rejected
- .gitignore: exclude tests/fixtures/ and tests/demo_matrix.json
- ci.yml: add skills/dataset-procurement branch; requirements.txt: add python-multipart

Par-t merged commit eb5c7ae into main Mar 23, 2026
4 checks passed