Novum

From Bacon's Novum Organum (1620) — the "new instrument" that codified the scientific method.

Automated ML Research with Mechanical Constraint Enforcement and Structured Iteration

A Claude Code extension that runs the full research pipeline — literature survey, SOTA discovery, hypothesis generation, experiment execution, and paper draft writing — with code-level guards designed to prevent result fabrication.


Case Study

A single /research command ran autonomously for 30 hours — literature survey, hypothesis generation, experiment execution, and paper draft writing — with no human intervention.

[Image: Novum running for 30 hours]

| Metric | Value |
|---|---|
| Duration | 30 hours (1 day 6 hours) |
| GPU used | 14.3h of 40h budget |
| Hypotheses tested | 10 (7 failed, 3 competed, 1 champion) |
| Iteration cycles | 4 (automatic regression + constraint accumulation) |

The draft and results are preliminary and have not been independently validated; please treat them as a starting point and verify with your own runs.

Quick Start

System dependencies (Ubuntu/Debian):

sudo apt install -y poppler-utils git-lfs curl wget
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/euanai/novum.git
cd novum
bash scripts/install.sh

This installs hooks, agents, commands, skills, and rules into ~/.claude/.

Requirements: Claude Code CLI (Opus 4.6 recommended, Max 20x plan for long runs), Python 3.10+, Node.js 18+, NVIDIA GPU 8GB+ VRAM, pdftotext (from poppler-utils).

Warning

Fully autonomous runs require Claude Code's --dangerously-skip-permissions flag, which bypasses all tool approval prompts. The agent can execute arbitrary commands and modify files without confirmation. Use at your own risk.

# Scout: find low-cost opportunities from a conference
/research --scout "CVPR 2025" --budget=8h

# Full pipeline: literature → experiments → paper draft
/research "efficient visual language model inference"

# Survey only (Phase 1-3.5, no experiments)
/research "parameter-efficient fine-tuning" --depth=survey

# Resume interrupted run
/research --resume

# Check progress
/research --status

Why Novum Exists

AI research agents fabricate results. MLR-Bench (2025) found that coding agents produced fabricated or invalidated experimental results in roughly 80% of cases. Existing tools rely on prompt-level instructions ("please don't fabricate"), which LLMs often ignore under pressure, especially after consecutive failures.

Novum enforces constraints mechanically:

| Layer | Reliability | How it works |
|---|---|---|
| Hook (mechanical) | Deterministic | PreToolUse hook denies the tool call before execution |
| Prompt (instructional) | Variable | "Never fabricate results" in agent prompt |
| LLM (behavioral) | Low | Hope the model complies on its own |

If a constraint matters, enforce it with a hook — not a prompt.
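As a rough illustration of the hook layer, here is a minimal sketch of a guard's decision logic, written in Python for brevity (Novum's own hooks are JavaScript files such as research-guard.js, and their contents are not reproduced here). It assumes the PreToolUse event arrives as JSON with `tool_name` and `tool_input.file_path`, and that a `permissionDecision` of `deny` blocks the call. The protected filename patterns are hypothetical.

```python
import fnmatch
import json
import os
import sys
from typing import Optional

# Hypothetical patterns; the real guard's protected list is not shown in this README.
PROTECTED_PATTERNS = ["results.json", "metrics*.json", "*_results.csv"]
GUARDED_TOOLS = {"Write", "Edit"}


def decide(event: dict) -> Optional[dict]:
    """Return a deny decision for writes to protected files, else None (allow)."""
    if event.get("tool_name") not in GUARDED_TOOLS:
        return None
    path = event.get("tool_input", {}).get("file_path", "")
    name = os.path.basename(path)
    if any(fnmatch.fnmatch(name, pattern) for pattern in PROTECTED_PATTERNS):
        return {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "deny",
                "permissionDecisionReason": f"{name} is a protected results file",
            }
        }
    return None


def main() -> None:
    """Hook entry point: read the event from stdin, print a decision if denying."""
    decision = decide(json.load(sys.stdin))
    if decision is not None:
        print(json.dumps(decision))
```

The check runs before the tool call executes, so it does not depend on the model following instructions. Because the match is on filename, a results file with a non-standard name would pass through unless its pattern is added, which is the limitation listed under Known Limitations.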

What Novum Does

  • Anti-Fabrication Guard — Blocks Write/Edit to protected results files at the API level before execution.
  • Phase Gate Guard — Prevents phase advancement unless prerequisites exist and pass quality thresholds.
  • Hypothesis Tournament — Tests multiple ideas via Successive Halving (15%→30%→55% budget), not just the first one.
  • Automatic Iteration — 3-level failure loops with 5-Whys diagnosis. Constraints accumulate across cycles.
  • Conference Scanner — Scan conference proceedings, keyword-filter candidates, PDF-analyze top opportunities.
  • Cross-Project Knowledge Base — Learned constraints persist across projects to reduce repeated failures.
  • Independent Audit — Post-run reviewer scores the draft and can trigger automatic regression if below venue threshold.
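The Hypothesis Tournament budget split can be sketched as follows. This is an illustration under stated assumptions, not Novum's implementation: the keep-the-top-half rule, the `evaluate` callback, and the function names are all hypothetical. Only the 15/30/55 split comes from this README.

```python
import math
from typing import Callable, Sequence


def round_budgets(total_gpu_hours: float, split: Sequence[int] = (15, 30, 55)) -> list:
    """Convert percentage shares into GPU-hours per tournament round."""
    if sum(split) != 100:
        raise ValueError("split percentages must sum to 100")
    return [total_gpu_hours * pct / 100 for pct in split]


def tournament(
    hypotheses: Sequence,
    evaluate: Callable[[object, float], float],
    total_gpu_hours: float,
    split: Sequence[int] = (15, 30, 55),
    keep: float = 0.5,  # assumed survival fraction per round
):
    """Run successive halving and return the top-scoring survivor."""
    survivors = list(hypotheses)
    for budget in round_budgets(total_gpu_hours, split):
        # Each surviving hypothesis gets an equal share of this round's budget.
        per_hypothesis = budget / len(survivors)
        ranked = sorted(survivors, key=lambda h: evaluate(h, per_hypothesis), reverse=True)
        survivors = ranked[: max(1, math.ceil(len(ranked) * keep))]
    return survivors[0]
```

Under these assumptions, a 40h budget with 10 hypotheses gives each hypothesis 0.6 GPU-hours in the first round, while the few that reach the last round share 22 GPU-hours. Most of the compute therefore goes to ideas that have already shown signal.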

Pipeline

| Phase | Stage | What happens |
|---|---|---|
| 0 | Scout | Conference scanning for low-cost research opportunities |
| 1 | Literature | Systematic review via Semantic Scholar + OpenReview APIs |
| 2 | SOTA | Codebase discovery, ranking, freshness check, smoke tests |
| 2.5 | Profile | Run SOTA code to build quantitative intuition |
| 3 | Ideas | Hypothesis generation (dual-track explore/exploit) |
| 3.5 | Quick Validation | Fast signal detection before full investment |
| 4 | Design | Experiment plan for all hypotheses + tournament budget |
| 5 | Baseline | Environment setup + baseline reproduction |
| 6 | Experiments | Tournament-based hypothesis testing |
| 7 | Analysis | Results analysis + publishability verdict |
| 8 | Draft Writing | Paper draft with experiment gap discovery |
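Moving between these phases is gated on prerequisites. A minimal sketch of such a check is shown below, assuming each phase leaves artifacts on disk. The artifact names and the non-empty-file test are hypothetical stand-ins; the real rules live in phase-gate-guard.js and are not shown in this README.

```python
from pathlib import Path

# Hypothetical artifact names, keyed by the phase being entered.
PREREQUISITES = {
    2: ["literature_review.md"],
    6: ["experiment_plan.md", "baseline_results.json"],
}


def missing_prerequisites(project_dir: str, target_phase: float) -> list:
    """List required artifacts that are absent or empty for the target phase."""
    missing = []
    for name in PREREQUISITES.get(target_phase, []):
        artifact = Path(project_dir) / name
        # An empty file counts as missing; a real gate would apply a quality threshold here.
        if not artifact.is_file() or artifact.stat().st_size == 0:
            missing.append(name)
    return missing


def may_advance(project_dir: str, target_phase: float) -> bool:
    """Allow the transition only when nothing is missing."""
    return not missing_prerequisites(project_dir, target_phase)
```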

When experiments fail, the pipeline automatically regresses: small loop (tune hyperparameters) → medium loop (redesign experiments) → big loop (regenerate hypotheses with 5-Whys constraints).
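The escalation from small to medium to big loop can be pictured as a small state machine. The sketch below is illustrative only. The retry limit, the phase each loop returns to (inferred from the pipeline table), and the choice to record a constraint on every failure are assumptions.

```python
from dataclasses import dataclass, field

LOOPS = ("small", "medium", "big")
# Assumed regression targets: tune in Experiments (6), redesign in Design (4),
# regenerate hypotheses in Ideas (3).
REGRESS_TO = {"small": 6, "medium": 4, "big": 3}
MAX_RETRIES_PER_LEVEL = 2  # hypothetical limit before escalating


@dataclass
class IterationState:
    level: int = 0                # index into LOOPS
    failures_at_level: int = 0
    constraints: list = field(default_factory=list)


def on_failure(state: IterationState, diagnosis: str) -> tuple:
    """Record a failure and return (loop name, phase to regress to)."""
    # Diagnoses accumulate so later cycles can avoid repeating the same mistake.
    state.constraints.append(diagnosis)
    state.failures_at_level += 1
    if state.failures_at_level > MAX_RETRIES_PER_LEVEL and state.level < len(LOOPS) - 1:
        state.level += 1
        state.failures_at_level = 1
    loop = LOOPS[state.level]
    return loop, REGRESS_TO[loop]
```

With a limit of two retries per level, five consecutive failures walk through small, small, medium, medium, big, and every later failure stays in the big loop.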

Architecture

                     User: /research "topic"
                              |
                              v
                +--------------------------+
                |      Master Agent        |
                |  (commands/research.md)  |
                |                          |
                |  Orchestrates, reviews,  |
                |  iterates, diagnoses     |
                +----------+---------------+
                           |
              Task tool    |   dispatch
         +---------+-------+--------+-----------+
         |         |       |        |           |
         v         v       v        v           v
     sota-     env-    experiment- opportunity- pipeline-
     finder    setup   runner     scorer       reviewer
     (opus)   (sonnet) (opus)    (sonnet)     (opus)

=========================================================
              4 PreToolUse Hooks (Mechanical)
=========================================================
 research-guard.js       Anti-fabrication file protection
 phase-gate-guard.js     Phase transition prerequisites
 prompt-quality-guard.js Worker dispatch validation
 download-guard.js       Network/proxy safety

Related Projects

  • AI Scientist v2 / v1 — End-to-end paper generation via agentic tree search (Sakana AI)
  • AI-Researcher — Multi-agent research automation with Gradio GUI (HKU)
  • Agent Laboratory — LLM research assistant with optional human-in-the-loop (JHU)
  • PaperQA2 — RAG-based scientific literature QA (Future House)
  • GPT Researcher — Web research report generation
  • Coscientist — Chemistry lab automation via robotic APIs (CMU, Nature 2023)
  • Claude Scholar — Claude Code skills, hooks, and commands for academic workflows (writing, reviewing, coding)

Novum's focus is on making agentic research reliable rather than merely capable — through mechanical constraint enforcement and structured iteration protocols.

Parameters

| Parameter | Description | Default |
|---|---|---|
| --depth | survey (Phase 1-3.5), reproduce (Phase 1-5), or full (Phase 1-8) | full |
| --budget | GPU time budget (e.g., 8h, 24h) | 8h |
| --target | Quality threshold for publishability verdict: oral, poster, workshop | poster |
| --budget-split | Tournament round allocation (e.g., 15,30,55) | 15,30,55 |
| --explore-ratio | Fraction of hypotheses from EXPLORE track | 0.3 |
| --venue | Conference to calibrate review standards (e.g., CVPR, NeurIPS) | CVPR |
| --resume | Resume from last checkpoint | |
| --status | Show current progress | |
| --review | Run pipeline-reviewer audit on completed run | |
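To make the value formats concrete, here is a small sketch of how --depth, --budget, and --budget-split could be interpreted. Novum reads these from the slash command inside the master prompt, so these helper functions are hypothetical and only illustrate the semantics in the table. The requirement that the split sums to 100 is an assumption.

```python
# Phases reachable via --depth (Phase 0 is only used with --scout).
PHASES = [1, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8]
LAST_PHASE = {"survey": 3.5, "reproduce": 5, "full": 8}


def phases_for_depth(depth: str = "full") -> list:
    """Return the phases that a given --depth value covers."""
    return [p for p in PHASES if p <= LAST_PHASE[depth]]


def parse_budget(text: str = "8h") -> float:
    """Parse a GPU time budget such as '8h' or '24h' into hours."""
    if not text.endswith("h"):
        raise ValueError(f"expected a value like '8h', got {text!r}")
    return float(text[:-1])


def parse_split(text: str = "15,30,55") -> tuple:
    """Parse a tournament round allocation such as '15,30,55'."""
    parts = tuple(int(p) for p in text.split(","))
    if sum(parts) != 100:
        raise ValueError("round allocation must sum to 100")
    return parts
```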

Project Structure

novum/
├── commands/research.md         # Master Agent prompt (~2800 lines)
├── agents/                      # 5 Worker Agents
│   ├── sota-finder.md           #   SOTA codebase discovery (Opus)
│   ├── env-setup.md             #   Environment setup (Sonnet)
│   ├── experiment-runner.md     #   Experiment execution (Opus)
│   ├── opportunity-scorer.md    #   Conference paper scorer (Sonnet)
│   └── pipeline-reviewer.md     #   Post-run auditor (Opus)
├── hooks/                       # 4 Mechanical Guards + 1 Logger
│   ├── research-guard.js        #   Anti-fabrication enforcement
│   ├── phase-gate-guard.js      #   Phase transition prerequisites
│   ├── prompt-quality-guard.js  #   Dispatch prompt validation
│   ├── download-guard.js        #   Network/proxy safety
│   └── structured-logger.js     #   Async event logging
├── scripts/
│   ├── install.sh               #   Installer
│   └── lib/research_utils.py    #   State, API clients, utilities
├── skills/                      # Domain knowledge + methodology
├── rules/research-agents.md     #   Agent orchestration rules
├── config.example.yaml          #   Configuration template
├── CONTRIBUTING.md
└── LICENSE                      # Apache-2.0

Known Limitations

  1. Single node — Tested on single-GPU setups; multi-GPU on one machine should work but is untested; multi-node distributed training is not supported
  2. CV keywords only — NLP/systems/theory domains need new keyword JSON files (see CONTRIBUTING.md)
  3. Unix-only — Uses os.killpg, signal.SIGTERM, nohup
  4. No paper cache cleanup (paper-cache/txt/ grows over time)
  5. Anti-fabrication by filename — Custom output filenames need to be added to the guard
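The Unix-only limitation comes from process-group signalling. The following sketch shows the general pattern of starting a job in its own process group and stopping the whole group. It is not taken from Novum's code, and the function names and grace period are hypothetical.

```python
import os
import signal
import subprocess


def launch(cmd: list) -> subprocess.Popen:
    """Start a job in its own session, and therefore its own process group."""
    return subprocess.Popen(cmd, start_new_session=True)


def stop(proc: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Send SIGTERM to the job's whole process group, escalating to SIGKILL."""
    pgid = os.getpgid(proc.pid)
    # killpg reaches child processes (e.g. data-loader workers) as well as the parent.
    os.killpg(pgid, signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)
        return proc.wait()
```

`os.killpg` and `os.getpgid` are not available on Windows, which is why a port would need a different termination strategy.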

Note

Novum was developed and tested with limited compute and bandwidth. If you have more compute, more time, or better ideas — fork it, improve it, and share what you find. This is an early step toward a future where a single /research command democratizes scientific discovery. We're not there yet, but the direction is clear.

Position Paper

Researching the research system is among the highest-leverage intellectual activities of our time.

📄 Democratizing Discovery: How Automated Research Pipelines Make Scientific Innovation Universally Accessible

Vision

Anyone should be able to run a full research pipeline — without owning a GPU or configuring an environment — to solve the problems they actually face.

Join the waitlist →

Contributing

See CONTRIBUTING.md for how to add new agents, hooks, and domain keywords.

Contact

euanai@proton.me

License

Apache-2.0. See LICENSE for details.

Citation

If you find Novum useful:

@software{novum2026,
  author={Euan},
  title={Novum: Automated ML Research with Mechanical Constraint Enforcement and Structured Iteration},
  year={2026},
  url={https://github.com/euanai/novum}
}

If you find the paper interesting:

@article{euan2026democratizing,
  author={Euan},
  title={Democratizing Discovery: How Automated Research Pipelines Make Scientific Innovation Universally Accessible},
  year={2026},
  doi={10.5281/zenodo.18848462}
}
