
Feat/agent refine v2 #8

Merged
prakhar728 merged 4 commits into main from feat/agent-refine-v2
Mar 22, 2026


Par-t (Collaborator) commented Mar 22, 2026

What this branch is

A refactor of the hackathon_novelty skill pipeline. It fixes three broken or missing pieces from v2 and adds one new node. The graph topology is unchanged: triage → router → flag/score → finalize.


What changed

Embedding model (deterministic.py)
Swapped all-MiniLM-L6-v2 → all-mpnet-base-v2 (768d). Better semantic similarity quality, making duplicate detection viable at a reasonable threshold.

Duplicate detection (agent.py, config.py, init.py)
SIMILARITY_DUPLICATE_THRESHOLD dropped 0.95 → 0.7. Near-duplicate pairs (similarity > 0.7) are pre-computed and passed into the triage context explicitly: the triage LLM sees which pairs are flagged and confirms whether they're truly the same concept. Only the later submission in a pair is classified as a duplicate; the earlier one proceeds to scoring.
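A minimal sketch of the pair pre-computation described above, not the code in agent.py. It assumes each submission is a dict with hypothetical keys `submission_id`, `submitted_at`, and `embedding`; only the 0.7 threshold and the "later submission is the duplicate candidate" rule come from this PR.

```python
import math
from itertools import combinations

SIMILARITY_DUPLICATE_THRESHOLD = 0.7  # value stated in this PR


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def near_duplicate_pairs(submissions, threshold=SIMILARITY_DUPLICATE_THRESHOLD):
    """Return (earlier_id, later_id, similarity) for each pair above the threshold.

    The dict keys used here are hypothetical, chosen for illustration.
    """
    # Sort by submission time so each pair comes out as (earlier, later).
    ordered = sorted(submissions, key=lambda s: s["submitted_at"])
    pairs = []
    for earlier, later in combinations(ordered, 2):
        sim = cosine(earlier["embedding"], later["embedding"])
        if sim > threshold:
            pairs.append((earlier["submission_id"], later["submission_id"], round(sim, 3)))
    return pairs
```

The returned pairs would then be listed in the triage context; the second id in each tuple is the one eligible to be classified as a duplicate if the LLM confirms the match.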

Alignment judgment (agent.py)
aligned is now judged by the triage LLM: it reads each submission's idea text inline alongside the operator guidelines and outputs true/false per submission. This replaces the broken MiniLM cosine similarity against a reference text. The relevance_score field is removed everywhere; aligned (binary) is the replacement.
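As an illustration of consuming the per-submission judgment, the sketch below parses a triage reply and reports which submissions are missing a boolean `aligned`, which is the case where a retry would be needed. The JSON shape (`{submission_id: {"aligned": bool}}`) and the function name are assumptions, not the format used in agent.py.

```python
import json


def parse_triage(raw, submission_ids):
    """Split a triage reply into aligned flags and ids lacking a boolean `aligned`.

    Assumes a hypothetical reply shape: {"<submission_id>": {"aligned": true, ...}}.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        data = {}  # unexpected top-level type: treat every submission as missing
    aligned, missing = {}, []
    for sid in submission_ids:
        entry = data.get(sid)
        if isinstance(entry, dict) and isinstance(entry.get("aligned"), bool):
            aligned[sid] = entry["aligned"]
        else:
            missing.append(sid)  # flat or incomplete entry
    return aligned, missing
```

A non-empty `missing` list is the signal a caller could use to re-prompt the model for the structured format.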

Ingestion node (ingest.py — new file)
Agentic node that runs before the deterministic layer. Normalizes submission text from plain text, markdown, and docx. Summarizes anything over 300 words.
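A rough sketch of the markdown normalization and the 300-word check, under the assumption that normalization means stripping markup down to plain prose. The function names and regexes are illustrative and not taken from ingest.py; docx extraction and the LLM summarization step are left out.

```python
import re

SUMMARY_WORD_LIMIT = 300  # limit stated in this PR


def normalize_markdown(text):
    """Reduce markdown to plain text using a few simple regex passes."""
    text = re.sub(r"`{3}.*?`{3}", " ", text, flags=re.DOTALL)   # drop fenced code blocks
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)            # drop images
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)         # keep link text only
    # Strip heading, list, and blockquote markers at the start of a line.
    text = re.sub(r"^[ \t]{0,3}(#{1,6}|[-*+]|>)[ \t]+", "", text, flags=re.MULTILINE)
    text = re.sub(r"[*`]+", "", text)                            # emphasis and inline-code markers
    return re.sub(r"\s+", " ", text).strip()                     # collapse whitespace


def needs_summary(text, limit=SUMMARY_WORD_LIMIT):
    """True when the whitespace-delimited word count exceeds the limit."""
    return len(text.split()) > limit
```

Underscores are deliberately left alone so identifiers such as `novelty_score` in a submission survive normalization.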

Role-based output (config.py, frontend)
USER_OUTPUT_KEYS = {submission_id, novelty_score, aligned}: participants see only these three fields. Admins see the full set, which adds criteria_scores, status, analysis_depth, and duplicate_of.
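The role split can be sketched as a key filter over a result dict. `USER_OUTPUT_KEYS` and the field names come from this PR; `ADMIN_ONLY_KEYS`, `view_for`, and the `"admin"` role string are hypothetical names, and the sketch assumes admins see the user keys plus the admin-only ones.

```python
USER_OUTPUT_KEYS = {"submission_id", "novelty_score", "aligned"}
# Hypothetical name for the extra fields listed for admins in this PR.
ADMIN_ONLY_KEYS = {"criteria_scores", "status", "analysis_depth", "duplicate_of"}


def view_for(result, role):
    """Return only the fields the given role is allowed to see."""
    allowed = (USER_OUTPUT_KEYS | ADMIN_ONLY_KEYS) if role == "admin" else USER_OUTPUT_KEYS
    # Anything not in the allowed set is dropped, including unknown keys.
    return {k: v for k, v in result.items() if k in allowed}
```

Because the filter is allow-list based, a field added to the pipeline later stays hidden from both roles until it is listed explicitly.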

Guardrails (guardrails.py)
Key whitelist prevents any unlisted fields from reaching API responses. Score bounds clamp out-of-range values. Leakage detection flags any result that contains a substring of raw submission input — tested against prompt injection attempts where adversarial text inside submissions tries to surface itself in outputs.
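The score clamp and leakage check might look roughly like the sketch below. The 0 to 10 score range and the 20-character minimum match length are assumptions made for illustration; the PR does not state the actual bounds or how long a copied substring must be before it counts as leakage.

```python
def clamp_score(value, low=0.0, high=10.0):
    """Clamp a score into [low, high]; the bounds here are assumed, not from the PR."""
    return max(low, min(high, value))


def leaks_input(result, raw_input, min_len=20):
    """True if any string value in `result` contains `min_len` consecutive
    characters copied from `raw_input`. `min_len` is a hypothetical parameter."""
    # Every window of min_len characters from the raw submission text.
    windows = {raw_input[i:i + min_len] for i in range(len(raw_input) - min_len + 1)}
    return any(
        isinstance(value, str) and any(w in value for w in windows)
        for value in result.values()
    )
```

A minimum length keeps short, common fragments from triggering the flag, while an injected instruction echoed back in an output field would still be caught.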

Par-t added 4 commits March 21, 2026 16:47
…e cleanup

- use idea_text-only embeddings with relevance_score and aligned flag
- expose {submission_id, novelty_score, aligned} via SkillCard.user_output_keys
- decouple routes via card.user_output_keys (no skill-internal imports)
- fix init greeting template and ready confirmation
- add 20 eval submissions and stabilize two-turn eval pipeline
- all 55 tests passing
…e detection

- Swap all-MiniLM-L6-v2 → all-mpnet-base-v2 (768d, better similarity quality)
- Remove compute_relevance_scores() — replaced by LLM-judged aligned (binary)
- Triage node now reads idea text inline, judges aligned (true/false) per submission
- Duplicate detection: near-duplicate pairs (sim > 0.7) surfaced to triage LLM for confirmation
- Only the later submission in a duplicate pair is flagged; safety net prevents all-flagged edge case
- Add nudge retry if triage returns flat format without aligned field
- SIMILARITY_DUPLICATE_THRESHOLD: 0.95 → 0.7
- Remove relevance_score from all outputs, models, guardrails, frontend types
- Add agentic ingest.py (text normalization node)
- Fix SCORE_MODEL: openai/gpt-4o → deepseek-ai/DeepSeek-V3.1
- 57 unit tests + 15 e2e tests pass
@prakhar728 prakhar728 merged commit 9d697a6 into main Mar 22, 2026
1 check failed