Feat/agent refine v2 by Par-t · Pull Request #8 · prakhar728/conclave

Par-t · 2026-03-22T16:52:43Z

What this branch is

A refactor of the hackathon_novelty skill pipeline. Three broken/missing pieces from v2 fixed, one new node added. Same graph topology: triage → router → flag/score → finalize.

What changed

Embedding model (deterministic.py)
Swapped all-MiniLM-L6-v2 → all-mpnet-base-v2 (768d). Better semantic similarity quality, making duplicate detection viable at a reasonable threshold.

Duplicate detection (agent.py, config.py, init.py)
Threshold dropped 0.95 → 0.7. Near-duplicate pairs are pre-computed and passed into the triage context explicitly — the triage LLM sees which pairs are flagged and confirms whether they’re truly the same concept. Only the later submission in a pair is classified as duplicate; the earlier proceeds to scoring.

Alignment judgment (agent.py)
aligned is now judged by the triage LLM — it reads each submission’s idea text inline alongside the operator guidelines and outputs true/false per submission. Replaces broken MiniLM cosine similarity to a reference text. relevance_score field removed everywhere; aligned (binary) is the replacement.

Ingestion node (ingest.py — new file)
Agentic node that runs before the deterministic layer. Normalizes submission text from plain text, markdown, and docx. Summarizes anything over 300 words.

Role-based output (config.py, frontend)
USER_OUTPUT_KEYS = {submission_id, novelty_score, aligned} — participants only see these three. Admins see the full set: criteria_scores, status, analysis_depth, duplicate_of.

Guardrails (guardrails.py)
Key whitelist prevents any unlisted fields from reaching API responses. Score bounds clamp out-of-range values. Leakage detection flags any result that contains a substring of raw submission input — tested against prompt injection attempts where adversarial text inside submissions tries to surface itself in outputs.

…ze uses Qwen3.5

…e cleanup - use idea_text-only embeddings with relevance_score and aligned flag - expose {submission_id, novelty_score, aligned} via SkillCard.user_output_keys - decouple routes via card.user_output_keys (no skill-internal imports) - fix init greeting template and ready confirmation - add 20 eval submissions and stabilize two-turn eval pipeline - all 55 tests passing

…e detection - Swap all-MiniLM-L6-v2 → all-mpnet-base-v2 (768d, better similarity quality) - Remove compute_relevance_scores() — replaced by LLM-judged aligned (binary) - Triage node now reads idea text inline, judges aligned (true/false) per submission - Duplicate detection: near-duplicate pairs (sim > 0.7) surfaced to triage LLM for confirmation - Only the later submission in a duplicate pair is flagged; safety net prevents all-flagged edge case - Add nudge retry if triage returns flat format without aligned field - SIMILARITY_DUPLICATE_THRESHOLD: 0.95 → 0.7 - Remove relevance_score from all outputs, models, guardrails, frontend types - Add agentic ingest.py (text normalization node) - Fix SCORE_MODEL: openai/gpt-4o → deepseek-ai/DeepSeek-V3.1 - 57 unit tests + 15 e2e tests pass

Par-t added 4 commits March 21, 2026 16:47

feat: per-node NearAI model routing — triage/quick use GPT-OSS, analy…

f8c5429

…ze uses Qwen3.5

feat: fix flat scoring — robust parser, empty content nudge, rubric v3

b510595

prakhar728 merged commit 9d697a6 into main Mar 22, 2026
1 check failed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/agent refine v2#8

Feat/agent refine v2#8
prakhar728 merged 4 commits intomainfrom
feat/agent-refine-v2

Par-t commented Mar 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Par-t commented Mar 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants