Binary evidence-sufficiency dissociation in reasoning-model hidden states for fixed-question, changed-context multi-hop QA.
Updated Apr 18, 2026 - Python
Python library for evidence sufficiency scoring in governance assessments under delayed ground truth, drift, and decision-readiness constraints.
Benchmark dataset and evaluation harness for comparing governance evidence feasibility across rule-based, hybrid ML, streaming, and agentic AI decision systems.
Python toolkit for label-free monitoring of governance evidence degradation in delayed-label risk decision systems using proxy drift monitors and response chains.
DEMM-Bench: Decision Evidence Maturity Benchmark for agent-runtime decisions across eight evidence regimes. Accompanies a research paper.