feat: confidential data procurement skill — full pipeline with live E2E tests by Par-t · Pull Request #10 · prakhar728/conclave

Par-t · 2026-03-23T00:05:04Z

Implements the confidential data procurement skill end-to-end inside a simulated TEE.

What it does:

- Buyer describes dataset requirements in natural language → LLM extracts a BuyerPolicy
- Seller uploads a CSV + metadata with column definitions and claims
- Enclave runs deterministic checks (critical rejections: forbidden columns, >50% duplicates), then an LLM agent (schema matching, claim verification, quality scoring)
- Payment formula: P = base_price + (max_budget - base_price) × quality_score; neither party sees the other's private number
- One round of renegotiation allowed, with three outcomes: both accept, one accepts and one renegotiates, or both renegotiate (with or without overlap)
- Role-aware output: buyer sees quality score, supplier doesn't (prevents budget reverse-engineering)
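The payment rule can be sketched as two small Python functions. This is a minimal illustration, not the skill's actual code. The function names are hypothetical, and the reserve price R in the deal check (R ≤ P ≤ B, the deal condition named in the evaluation commit) is assumed to be the seller's private floor.

```python
def proposed_payment(base_price: float, max_budget: float, quality_score: float) -> float:
    """P = base_price + (max_budget - base_price) * quality_score (hypothetical helper)."""
    if not 0.0 <= quality_score <= 1.0:
        raise ValueError("quality_score must lie in [0, 1]")
    if base_price > max_budget:
        raise ValueError("base_price must not exceed max_budget")
    # Linear interpolation: S = 0 pays base_price, S = 1 pays the full max_budget.
    return base_price + (max_budget - base_price) * quality_score


def deal_possible(reserve_price: float, payment: float, max_budget: float) -> bool:
    """Deal condition R <= P <= B; R is assumed to be the seller's reserve price."""
    return reserve_price <= payment <= max_budget
```

Take base_price = 0, max_budget = 1000 and quality_score = 0.8. The buyer then pays 800. Anyone who also saw S could divide 800 by 0.8 and recover the budget, which is why the supplier view omits the score.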

Tests:

- 167 unit + E2E tests run in CI (no API key needed)
- 8 live tests (@pytest.mark.live) run against real HuggingFace transaction data with a real LLM; they are skipped in CI and run locally after source .env
- tests/demo_matrix.json included, showing actual pipeline output across 3 evaluation scenarios and 2 renegotiation outcomes

Defines BuyerPolicy, SupplierSubmission, DatasetMetrics, and ProcurementResult with Pydantic validators. Output key sets enforce budget-leak prevention (quality_score is buyer-only). Score weights and base_price are validated on init.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
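The init-time checks can be approximated with a standard-library dataclass. This sketch deliberately swaps Pydantic for __post_init__ so that it runs without dependencies. Two groups of names are hypothetical:

- the field names other than max_budget and base_price;
- the weight keys (completeness, uniqueness, schema, claim_veracity).

```python
import math
from dataclasses import dataclass, field

# Hypothetical component names; the real policy may weight different components.
DEFAULT_WEIGHTS = {"completeness": 0.25, "uniqueness": 0.25, "schema": 0.25, "claim_veracity": 0.25}


@dataclass(frozen=True)
class BuyerPolicy:
    required_columns: list[str]
    min_rows: int
    max_budget: float
    base_price: float = 0.0
    forbidden_columns: list[str] = field(default_factory=list)
    score_weights: dict[str, float] = field(default_factory=lambda: dict(DEFAULT_WEIGHTS))

    def __post_init__(self) -> None:
        # Stand-in for Pydantic validators: fail fast when the policy is inconsistent.
        if not 0.0 <= self.base_price <= self.max_budget:
            raise ValueError("base_price must satisfy 0 <= base_price <= max_budget")
        if any(w < 0 for w in self.score_weights.values()):
            raise ValueError("score weights must be non-negative")
        if not math.isclose(sum(self.score_weights.values()), 1.0):
            raise ValueError("score weights must sum to 1")
```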

…ement

- ingest.py: CSV parse with size/row limits, JSON metadata, format stubs (PDF/DOCX/Excel), in-memory DataFrame store, procurement_upload_handler
- core/skill_card.py: upload_handler + respond_handler optional fields
- api/routes.py: generic POST /upload and POST /respond routes that delegate to skill-owned handlers; no skill-specific logic in shared infra
- All 57 existing tests pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…l check

Implements the pure-math quality evaluation pipeline for the dataset procurement skill: null/duplicate/label metrics, component scoring, weighted quality score, price formula (P = base + spread × S, where spread = max_budget − base), and deal condition (R ≤ P ≤ B, with R the seller's reserve and B the buyer's max_budget). 38 unit tests cover all paths, including critical failures, partial quality, and edge cases.
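The evaluation steps (null and duplicate metrics, critical checks, a weighted quality score) can be sketched on plain lists of row dicts. This sketch rests on several assumptions:

- A critical failure forces the score to 0.
- The 50% duplicate threshold is taken from the PR description.
- The component names are hypothetical.
- The schema and claim-veracity defaults of 0.5 and 1.0 mirror the placeholders the pipeline used before the agent layer was added.

```python
DUPLICATE_LIMIT = 0.5  # more than 50% duplicate rows is a critical rejection


def null_rate(rows: list[dict], columns: list[str]) -> float:
    """Fraction of cells that are missing or empty."""
    cells = len(rows) * len(columns)
    if cells == 0:
        return 0.0
    nulls = sum(1 for row in rows for col in columns if row.get(col) in (None, ""))
    return nulls / cells


def duplicate_rate(rows: list[dict], columns: list[str]) -> float:
    """Fraction of rows that exactly repeat an earlier row."""
    if not rows:
        return 0.0
    keys = [tuple(row.get(col) for col in columns) for row in rows]
    return 1.0 - len(set(keys)) / len(keys)


def evaluate(rows, columns, forbidden, weights, schema_score=0.5, claim_veracity_score=1.0):
    """Return (quality_score, critical_failures)."""
    failures = []
    present = sorted(set(columns) & set(forbidden))
    if present:
        failures.append(f"forbidden columns present: {present}")
    dup = duplicate_rate(rows, columns)
    if dup > DUPLICATE_LIMIT:
        failures.append(f"duplicate rate {dup:.0%} exceeds {DUPLICATE_LIMIT:.0%}")
    if failures:
        return 0.0, failures  # critical failure: no partial credit
    components = {
        "completeness": 1.0 - null_rate(rows, columns),
        "uniqueness": 1.0 - dup,
        "schema": schema_score,
        "claim_veracity": claim_veracity_score,
    }
    # Weighted sum; weights are assumed non-negative and summing to 1.
    return sum(weights[name] * components[name] for name in weights), []
```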

ProcurementFilter withholds quality_score and hard_constraints_pass from the supplier role to prevent max_budget reverse-engineering (with base_price = 0, P = max_budget × S, so P/S reveals the budget). validate_tool_output blocks raw row dumps, high-cardinality lists, and oversized blobs before they reach the agent. 15 new tests, 53 total.
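The two guardrails can be sketched as follows. The hidden-key set comes straight from the commit message. The role strings are hypothetical, as are the thresholds in validate_tool_output (list length, serialized size, and what counts as a row dump).

```python
import json

SUPPLIER_HIDDEN_KEYS = frozenset({"quality_score", "hard_constraints_pass"})
MAX_LIST_ITEMS = 50  # hypothetical: longer lists look like high-cardinality dumps
MAX_RECORDS = 5  # hypothetical: more dict records than this looks like raw rows
MAX_CHARS = 4_000  # hypothetical cap on serialized tool output


def filter_for_role(result: dict, role: str) -> dict:
    """Return the view of an evaluation result that the given role may see."""
    if role == "buyer":
        return dict(result)
    if role == "supplier":
        return {k: v for k, v in result.items() if k not in SUPPLIER_HIDDEN_KEYS}
    raise ValueError(f"unknown role: {role!r}")


def _check(value) -> None:
    # Walk nested lists/dicts looking for oversized or row-like structures.
    if isinstance(value, list):
        if len(value) > MAX_LIST_ITEMS:
            raise ValueError(f"list of {len(value)} items exceeds {MAX_LIST_ITEMS}")
        if sum(isinstance(v, dict) for v in value) > MAX_RECORDS:
            raise ValueError("output looks like a dump of raw rows")
        for v in value:
            _check(v)
    elif isinstance(value, dict):
        for v in value.values():
            _check(v)


def validate_tool_output(output) -> None:
    """Raise ValueError if a tool result is too large or looks like raw data."""
    if len(json.dumps(output, default=str)) > MAX_CHARS:
        raise ValueError("tool output is too large")
    _check(output)
```

Take base_price = 0, where a supplier holding P = 800 and S = 0.8 could compute 800 / 0.8 = 1000, the buyer's max_budget. Dropping S from the supplier view closes that channel.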

…onversation

Guided multi-turn conversation collects required columns, row/quality thresholds, budget range, and optional label/forbidden-column constraints. LLM returns JSON when policy is complete; handler validates and constructs BuyerPolicy. 13 new tests (5 parse, 8 handler), 66 total.
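One way to implement the "JSON when complete" hand-off is to scan each LLM reply for a JSON object. The conversation counts as finished only when every required key is present. The key set and function name below are hypothetical. The real handler then validates the result into a BuyerPolicy.

```python
import json
import re
from typing import Optional

# Hypothetical minimum keys; the PR lists columns, row/quality thresholds and a budget range.
REQUIRED_KEYS = frozenset({"required_columns", "min_rows", "max_budget"})
_JSON_OBJECT = re.compile(r"\{.*\}", re.DOTALL)


def parse_policy_reply(reply: str) -> Optional[dict]:
    """Return the policy dict if the reply contains a complete one, else None (keep asking)."""
    match = _JSON_OBJECT.search(reply)
    if match is None:
        return None  # plain follow-up question, not a policy
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None  # JSON present but the policy is still incomplete
    return data
```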

Wires up the full pipeline: deterministic evaluation → guardrails → SkillResponse. Agent layer stubbed (schema=0.5, claim_veracity=1.0 placeholders until Commit 7). Registers the new skill in routes.py alongside hackathon_novelty. SkillCard declares instant trigger mode, role-aware output keys, upload_handler, and user_display hints.

A single evaluate_node uses 3 aggregate-only tools (schema_summary, column_stats, value_distribution) for fuzzy column matching and seller-claim verification. Agent output (schema_score, claim_veracity_score) replaces the deterministic placeholders; quality_score and proposed_payment are then recomputed before guardrails.
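The aggregate-only idea can be illustrated with stdlib versions of two of the tools. Each returns counts and summary statistics about a column, never the rows themselves. The tool names come from the commit message. These signatures and return shapes are assumptions, and they operate on lists of row dicts rather than a DataFrame.

```python
from collections import Counter


def _non_null(rows: list[dict], column: str) -> list:
    return [row.get(column) for row in rows if row.get(column) not in (None, "")]


def column_stats(rows: list[dict], column: str) -> dict:
    """Counts, plus min/max/mean when every non-null value parses as a number."""
    values = _non_null(rows, column)
    stats = {
        "column": column,
        "count": len(rows),
        "null_count": len(rows) - len(values),
        "distinct_count": len(set(values)),
    }
    try:
        numbers = [float(v) for v in values]
    except (TypeError, ValueError):
        return stats  # non-numeric column: counts only
    if numbers:
        stats.update(min=min(numbers), max=max(numbers), mean=sum(numbers) / len(numbers))
    return stats


def value_distribution(rows: list[dict], column: str, top_k: int = 5) -> dict:
    """Frequencies of the top_k most common values; the rest are lumped into other_count."""
    counts = Counter(_non_null(rows, column))
    top = counts.most_common(top_k)
    return {
        "column": column,
        "top": top,
        "other_count": sum(counts.values()) - sum(n for _, n in top),
    }
```

The agent sees enough to judge whether a column matches a requested field or whether a seller's claim about value ranges holds. It never receives an individual record.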

procurement_respond_handler implements the full resolution matrix: both-accept issues release_token, any-reject terminates, accept+renegotiate resolves at proposed_payment, and double-renegotiate checks whether revised_budget >= revised_reserve. Only one round is allowed, enforced via the renegotiation_used flag. Wired into skill_card.
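The resolution matrix can be written as a small state transition. This is a sketch under several assumptions:

- Negotiation and resolve() are hypothetical names.
- A release token is issued on every authorized outcome, although the commit message only states this for both-accept.
- The notes do not say what price applies when both sides renegotiate with overlapping numbers, so the sketch leaves final_payment unset in that case.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

VALID_ACTIONS = frozenset({"accept", "reject", "renegotiate"})


@dataclass
class Negotiation:
    proposed_payment: float
    renegotiation_used: bool = False
    status: str = "pending"
    final_payment: Optional[float] = None
    release_token: Optional[str] = None


def _authorize(neg: Negotiation, payment: Optional[float]) -> None:
    neg.status = "authorized"
    neg.final_payment = payment
    neg.release_token = secrets.token_hex(16)  # opaque token; what it unlocks is out of scope here


def resolve(neg, buyer_action, supplier_action, revised_budget=None, revised_reserve=None):
    if neg.status != "pending":
        raise ValueError(f"negotiation is already {neg.status}")
    actions = {buyer_action, supplier_action}
    if not actions <= VALID_ACTIONS:
        raise ValueError(f"unknown action(s): {sorted(actions - VALID_ACTIONS)}")
    if "reject" in actions:
        neg.status = "terminated"  # any reject ends the deal
        return neg
    both_renegotiate = actions == {"renegotiate"}
    if both_renegotiate and (revised_budget is None or revised_reserve is None):
        raise ValueError("double renegotiation needs revised_budget and revised_reserve")
    if "renegotiate" in actions:
        if neg.renegotiation_used:
            raise ValueError("only one round of renegotiation is allowed")
        neg.renegotiation_used = True
    if not both_renegotiate:
        _authorize(neg, neg.proposed_payment)  # both accept, or accept + renegotiate
    elif revised_budget >= revised_reserve:
        _authorize(neg, None)  # overlap; settlement price not specified in the PR notes
    else:
        neg.status = "terminated"  # no overlap: revised budget below revised reserve
    return neg
```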

…ation matrix

Covers happy path (init→upload→submit→accept→authorized), critical reject, role-filtered results (buyer vs supplier), double-renegotiate success/failure, mixed accept+renegotiate, second-renegotiation guard, token enforcement, and skill registration. Requires python-multipart for multipart form upload.

- agent.py: wrap evaluate_node in StateGraph for LangSmith trace visibility
- tests/conftest.py: @pytest.mark.live marker, base_df fixture, matrix printer + demo JSON output
- tests/test_live_integration.py: 30 tests (deterministic, agent, pipeline, renegotiation)
- base_price=0 on all demo policies; critical failures and reserve-not-met → $0 or rejected
- .gitignore: exclude tests/fixtures/ and tests/demo_matrix.json
- ci.yml: add skills/dataset-procurement branch
- requirements.txt: add python-multipart


vercel · 2026-03-23T00:05:09Z

The latest updates on your projects.

Project	Deployment	Updated (UTC)
conclave	Ready	Mar 23, 2026 0:05am

Par-t and others added 14 commits March 22, 2026 14:56

Merge remote-tracking branch 'origin/main' into skills/dataset-procurement

5d0691f

chore: untrack test fixtures (gitignored)

55b37d9

feat: endpoint-driven live E2E tests with real data, buyer/seller prompts, renegotiation matrix

c1fd634

fix: mark run_skill tests that call real LLM as live — skip in CI without API key

513575c

Par-t merged commit eb5c7ae into main Mar 23, 2026
4 checks passed

Par-t merged 14 commits into main from skills/dataset-procurement
