Persistent project-scoped store for deep research findings, with progressive disclosure and contrarian-pass investigation.
A Claude Code and Codex skill that gives you a persistent, project-scoped store for deep research findings.
Stop re-researching the same topics across sessions. Stop polluting conversation context with raw web search dumps. The skill maintains a structured local knowledge base under <project>/.research/, looks it up before fetching the web, and uses progressive disclosure to load only what's actually needed.
Long Claude Code sessions run out of context. /compact summarizes older turns and drops the rest, so findings from a deep research thread evaporate and the next question re-triggers the same web searches.
This skill makes the data layer outlive the chat. Research written today survives /compact, /clear, IDE restarts, and machine moves. The next session reads INDEX.md first (a tiny dispatcher), matches the topic, and pulls only the matched entry's ## Summary section into context. The full body stays on disk until you actually need it.
Loading tiers, cheapest first:
| Tier | Loads | Approx tokens | When |
|---|---|---|---|
| 1 | INDEX.md |
100 to 500 | Every retrieval |
| 2 | Entry's ## Summary only |
50 to 200 | When INDEX shows a match |
| 3 | Full FINDINGS.md body |
500 to 3000 | When the summary doesn't cover it |
Heavy research artifacts become cheap to recall: you only pay for the tier you need.
- Project-scoped, not global. Each repo has its own research store, kept private (gitignored by default).
- Progressive disclosure. Index, then summary, then full body, in that order. Most lookups never load the full entry.
- Conflict-handling history. When findings change, old claims move to a
## Discarded approachestable with reasons; never silently overwritten. Prevents re-trying refuted approaches. - Subagent-isolated investigation. Heavy web research can run in a separate subagent: Opus 4.7 in Claude Code, or GPT-5.5 xhigh in Codex when subagents are explicitly authorized. Your main context stays clean.
- Async where supported. In Claude Code, the investigation subagent runs in background mode (
run_in_background: true) so the conversation stays interactive while research happens. In Codex, the plugin investigates inline unless the user explicitly authorizes subagents. - Cognitive phases. Decompose, Gather, Validate, Contrarian pass, Synthesize. The contrarian pass actively searches for "why this is wrong" rather than confirming. It earns its keep.
When a Claude Code Investigation subagent finishes, its full structured return (Summary, Findings, contrarian objection, sources) is injected into the main agent's context as a task-notification message. No file round-trip, no tail-the-log polling. The main agent parses the return directly and writes the data layer. In Codex, the same structured shape is used, either from an explicitly authorized subagent or from inline investigation.
Why this matters:
- No raw web-search dump pollution. The main agent only sees the agent's clean synthesized output, never the raw web search results or fetch responses. Those live in a separate transcript file the main agent is forbidden to read.
- Storage is deterministic. The required output format maps 1:1 to the FINDINGS.md schema. Parsing is mechanical, not interpretive.
- Conversation stays interactive. Claude Code uses background mode (
run_in_background: true) for subagent investigation. Codex defaults to inline investigation unless the user explicitly asks for subagents.
Four install routes, all global. No registration, approval, or login required.
npx skills add hec-ovi/research-skill/plugin marketplace add hec-ovi/research-skill
/plugin install research@research-skill
/reload-plugins
This uses Claude Code's built-in marketplace mechanism to install the plugin from the maintainer's GitHub repo. It is not Anthropic's first-party catalog.
codex plugin marketplace add hec-ovi/research-skillThis uses the Codex plugin metadata at .agents/plugins/marketplace.json and plugins/research-codex/.codex-plugin/plugin.json. The Codex plugin lives in its own plugins/research-codex/ root with its own skill copy at plugins/research-codex/skills/research/SKILL.md, so the existing npx skills add and Claude Code plugin paths stay untouched.
Restart Codex after adding the marketplace if the current session does not pick up /research immediately. Then invoke it inside Codex:
/research compare durable local memory patterns for code agents
# Personal (across all your projects)
git clone https://github.com/hec-ovi/research-skill ~/.claude/skills/research
# Or project-only
git clone https://github.com/hec-ovi/research-skill <your-project>/.claude/skills/researchClaude Code picks up direct-cloned skills live, no restart needed.
The skill auto-activates when you ask a research-style question. In Claude Code and the Codex plugin route, you can also invoke it explicitly:
/research <topic>
Examples:
- "What's the latest TypeScript ORM for edge runtime in 2026?"
- "Compare Bun vs Node cold-starts for serverless"
- "/research drizzle-type-generation"
The skill writes to your project, not your home dir:
<project>/.research/
├── INDEX.md # dispatcher: topic table, scanned first
└── <topic-slug>/
├── FINDINGS.md # entry: frontmatter + summary + findings + history
└── raw/ # optional: pasted PDFs, whitepapers, etc.
INDEX.md is the dispatcher, equivalent to RESOLVER.md in the GBrain pattern. The agent reads it first, then loads only the matched entry's ## Summary section. Full entries and raw documents only load on demand.
.research/ is a top-level project directory (sibling of .claude/, not nested inside it). It's colocated with the project, gitignored by default, and easy to find by name. Reads and writes go through the host's normal permission system.
- Plan-stage notes
- Small facts or one-line preferences
- Code-level decisions tied to one file
- Casual lookups answerable from a single source
- A substitute for a single WebSearch / WebFetch
If the question fits in one search plus 1 to 2 sentences, you don't need this skill.
Built explicitly on three open patterns; credit where due.
Frontmatter and folder layout follow the open Agent Skills specification (Apache 2.0 / CC-BY-4.0). Portable across Claude Code and any other code CLI that implements the SKILL.md format.
The Investigation phase walks a 5-step cognitive workflow (Decompose, Gather, Validate, Contrarian pass, Synthesize) adapted from xAI's published Multi-Agent architecture and the DeepSearch announcement. xAI ships 4 specialized agents (Captain, Harper, Benjamin, Lucas) on a shared backbone; this skill condenses those into cognitive phases a single subagent walks, since the Claude Code harness does not currently expose subagent continuation (SendMessage unavailable as of April 2026).
The Contrarian pass (phase 4) is the standout borrowed element: actively searching for "why this is wrong" rather than confirming. In an A/B test on a celebrity-fronted AI tool legitimacy question, the contrarian pass surfaced significant controversy that a minimal-brief baseline missed.
INDEX.md acts as a dispatcher in the same role as RESOLVER.md in GBrain. The INDEX is scanned first; full entries load only on match. Progressive disclosure tiers borrow GBrain's "thin harness, fat skills" philosophy (THIN_HARNESS_FAT_SKILLS.md).
Every entry's FINDINGS.md has structured frontmatter (topic, created, last_verified, status, related, sources, raw) and a body with ## Summary, ## Findings, ## Discarded approaches, ## Open questions, ## Timeline. See SKILL.md for the full schema and rules.
- A code CLI that implements the SKILL.md format (Claude Code, Codex, or any other compatible client)
- For Codex plugin installation: Codex CLI with
codex plugin marketplace add - For Claude Code Investigation: an Opus-class model accessible to the spawning agent
- For Codex Investigation with subagents: GPT-5.5 with
reasoning_effort: "xhigh"
The Investigation phase needs reasoning depth. The skill spawns subagents with model: "opus", but the calling agent has to actually pass that parameter on every spawn. To make it systematic across all your Claude Code sessions, configure your environment to default subagents to Opus.
Two practical approaches:
- Hook (strongest): add a
PreToolUsehook on theAgenttool in~/.claude/settings.jsonthat blocks any spawn whosemodelfield is notopus. The hook runs before the tool dispatches, so a non-opus spawn never reaches the API. - Convention (lightest): add a one-line note to your
~/.claude/CLAUDE.md: "Every Agent tool call MUST passmodel: \"opus\"." Claude reads CLAUDE.md every session.
Smaller models work fine for the main conversation. The contrarian pass and synthesis steps in Investigation specifically depend on Opus-class reasoning depth; smaller models tend to skip the contrarian phase or produce shallow syntheses.
The Codex plugin uses a separate Codex-specific skill file at plugins/research-codex/skills/research/SKILL.md. That copy tells Codex to use model: "gpt-5.5" with reasoning_effort: "xhigh" when the user explicitly authorizes subagents. If subagents are not authorized, the Codex skill runs the same investigation phases inline.
When the skill activates, the full SKILL.md body loads into the main agent's context. As of v0.2.7 the activation cost is approximately 4,500 to 5,500 tokens (depending on tokenizer). The skill registration metadata (frontmatter only, always loaded) is a separate ~130 tokens.
Comparison points:
- Most Vercel
skills.shreference skills sit at 500 to 2,000 tokens - Anthropic reference skills typically run 1,500 to 3,000 tokens
- This skill is roughly double that
The reason is that the Investigation phase is a substantive procedure (5 cognitive phases verbatim, brief checklist, citation rules, required output format, gap handling) and the Storage phase has its own validation rules. The didactic content is real; it earns its keep when Investigation actually runs. But on Retrieval-only calls (the most common path), the agent loads all of it just to get to the loading-hierarchy and lookup-procedure sections.
The clean architectural answer is to apply progressive disclosure recursively, the same pattern the skill already applies to research data. Specifically: split the single SKILL.md into a thin dispatcher plus phase-specific procedure files that load only when their phase is active.
Target structure:
research/
├── SKILL.md # thin dispatcher: when, where, which procedure to load
└── procedures/
├── retrieval.md # loading hierarchy, lookup procedure, INDEX patterns
├── investigation.md # cognitive phases, brief checklist, citation rules, output format
└── storage.md # FINDINGS schema, Review-before-storing, conflict handling
How it would work mechanically:
- The agent activates the skill and reads
SKILL.md(cheap, ~1,500 to 2,000 tokens). SKILL.mdnames which procedure file to load for each phase: "For Retrieval, readprocedures/retrieval.md. For Investigation, readprocedures/investigation.mdAFTER deciding mode in Retrieval phase 5. For Storage, readprocedures/storage.mdafter Investigation returns."- The agent uses the
Readtool to pull only the procedure file relevant to the current phase. A pure Retrieval call (read INDEX, sed Summary, answer) never touchesinvestigation.mdand never pays for it.
This is the same "thin harness, fat skills" pattern GBrain uses (RESOLVER.md as the dispatcher, individual skill files loaded on demand). Applied here, it's a natural fit because the three phases (Retrieval, Investigation, Storage) are already cleanly separated in the workflow, and the heaviest procedure (Investigation) is also the least-frequent path. Most calls are Retrieval-only.
Expected post-refactor footprint:
SKILL.md(always loaded on activation): ~1,500 to 2,000 tokensprocedures/retrieval.md(loaded on every research-style question): ~800 to 1,200 tokensprocedures/investigation.md(loaded only when fresh research is needed): ~1,500 to 2,000 tokensprocedures/storage.md(loaded only when actually writing): ~800 to 1,200 tokens
A typical Retrieval-only call would pay ~2,500 to 3,200 tokens (SKILL.md + retrieval.md), down from the current ~5,000. An Investigation call would pay roughly the same as today (all phases involved), but at least the cost would be honest: you pay for what you use.
The skill is working, the size is heavy but not blocking, and there has been no demand from users yet. The repo just launched on 2026-04-25; the only confirmed user is the maintainer, and the maintainer has not hit context-budget pressure on this skill in real workflows. The refactor is a real restructure: probably half a day of careful editing, plus end-to-end testing on every phase (Retrieval new entry, Retrieval merge, Investigation new entry mode, Investigation merge mode, Storage paste path), plus updating the duplicated skill paths and plugin descriptors so subdirectory loading is honored, plus rewriting CHANGELOG and bumping to a minor version (likely v0.3.0).
The refactor will be triggered when any of these signals lands:
- A user reports the activation cost as a real friction in their context budget
- A new feature pushes
SKILL.mdpast 6,000 tokens - The procedure content grows organically to the point where the dispatcher spine already feels redundant
- Someone files an issue or PR proposing the split
Until one of those triggers, the single-file structure is the right call: everything is in one place, the file is readable end-to-end, and the load-bearing optimization (progressive disclosure of the actual research data via INDEX, then Summary, then full body) already works as designed. Optimizing the SKILL.md itself before there is felt friction is premature engineering.
If you want to discuss the refactor or volunteer feedback on activation cost in your own usage, open an issue at github.com/hec-ovi/research-skill/issues.
MIT. Free to use, modify, fork, distribute. Attribution appreciated, not required.