A source-backed personal research wiki built with Codex + Obsidian.
Turn papers, PDFs, web captures, and research questions into a durable Markdown knowledge base instead of letting them disappear into temporary chats.
Most AI-assisted reading workflows have the same failure mode: useful ideas are generated, but they remain trapped inside ephemeral conversations.
This project is designed to make research knowledge:
- source-backed: raw material stays separate from synthesis;
- maintainable: topic and entity pages are updated over time instead of being duplicated;
- queryable: questions can be answered from accumulated notes;
- auditable: operational updates are tracked in index.md and log.md;
- portable: everything lives in plain Markdown inside an Obsidian-friendly vault.
This repository is especially useful for people who:
- read papers regularly;
- use Obsidian for long-term note organization;
- want AI to help maintain knowledge structures, not just generate one-off summaries;
- care about evidence boundaries and reusable synthesis.
```bash
# 1. Put a new source into inbox/
# 2. Optionally normalize the filename
python scripts/new_source.py inbox/some-paper.pdf --move
# 3. Ask Codex to ingest it and update the vault
# 4. Run a structural check when needed
python scripts/lint_wiki.py
```

A typical end-to-end loop looks like this:
- Drop a source into inbox/
  Example: a new paper PDF or an abstract snapshot.
- Normalize and import the source
  Use scripts/new_source.py if you want predictable filenames inside raw/sources/.
- Ask Codex to ingest it
  Codex creates one source-note, updates related topic/entity/overview pages, then refreshes index.md and log.md.
- Ask questions against the vault
  Instead of re-reading everything from scratch, query the accumulated wiki.
- Promote durable answers into wiki/analyses/
  If a comparison, synthesis, or conclusion is likely to matter again, store it as a permanent analysis page.
- Run lint to keep the vault healthy
  Use scripts/lint_wiki.py to catch missing metadata, broken links, and structural issues.
In short:
inbox/ -> raw/sources/ -> source-note -> topic/entity pages -> analysis -> ongoing maintenance
This repository separates:
- raw source material;
- agent-maintained wiki pages;
- operational logs, templates, and navigation pages.
The result is a personal knowledge base that can:
- absorb new papers incrementally;
- preserve evidence boundaries;
- support question answering from accumulated notes;
- keep long-term topic pages updated instead of rewriting everything from scratch;
- remain readable and editable in plain Markdown.
The workflow is built on three layers:
- Sources — immutable imported material such as PDFs, Markdown exports, notes, or HTML snapshots.
- Wiki — curated pages that distill sources into topic pages, entity pages, source notes, and durable analyses.
- Operations — index, log, templates, and scripts that make the system maintainable over time.
Codex acts as the maintenance layer:
- ingesting new material,
- updating affected pages,
- keeping links and metadata consistent,
- answering questions from the wiki,
- and writing durable analysis pages when useful.
Obsidian acts as the human-facing layer:
- browsing,
- linking,
- reviewing edits,
- and maintaining a navigable long-term knowledge graph.
```text
.
├── AGENTS.md # Operational contract for Codex / agent behavior
├── README.md # English project overview
├── README.zh-CN.md # Chinese companion README
├── index.md # Primary navigation page
├── log.md # Append-only ingest / query / lint / refactor log
├── inbox/ # Drop zone for newly added materials
├── raw/
│   ├── sources/ # Imported source files (treated as immutable)
│   └── assets/ # Local images / attachments tied to sources
├── scripts/
│   ├── new_source.py # Source normalization helper
│   └── lint_wiki.py # Structural wiki checks
├── templates/ # Templates for source-note / topic / entity / analysis / glossary
└── wiki/
    ├── overview/ # Research scope and thesis pages
    ├── topics/ # Topic pages
    ├── entities/ # Methods / datasets / people / organizations / artifacts
    ├── source-notes/ # Per-source evidence pages
    ├── analyses/ # Durable question-driven synthesis pages
    └── glossaries/ # Terminology pages
```
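If you want to recreate this layout in a fresh vault, the folder list above can be turned into a small scaffolding helper. This is a minimal sketch: `scaffold_vault`, `VAULT_DIRS`, and `ROOT_FILES` are hypothetical names and are not part of the repository's scripts.

```python
from pathlib import Path

# Folders taken from the layout above; the helper itself is hypothetical.
VAULT_DIRS = [
    "inbox",
    "raw/sources",
    "raw/assets",
    "scripts",
    "templates",
    "wiki/overview",
    "wiki/topics",
    "wiki/entities",
    "wiki/source-notes",
    "wiki/analyses",
    "wiki/glossaries",
]

# Root navigation and log files, created empty if missing.
ROOT_FILES = ["index.md", "log.md"]


def scaffold_vault(root: Path) -> list[Path]:
    """Create any missing vault folders and root files; return the paths created."""
    created: list[Path] = []
    for rel in VAULT_DIRS:
        path = root / rel
        if not path.exists():
            path.mkdir(parents=True)  # also creates parents such as raw/ and wiki/
            created.append(path)
    for name in ROOT_FILES:
        path = root / name
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

In this sketch the helper is idempotent: calling `scaffold_vault(Path("my-vault"))` a second time creates nothing and returns an empty list, so it never overwrites existing notes.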
Raw files are the source of truth. Synthesis pages should cite source-note pages rather than citing raw files directly.
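A lint-style check for this citation rule might flag links in synthesis pages that point directly into raw/. The sketch below is illustrative only: `direct_raw_citations` is a hypothetical helper, and treating any link target that begins with `raw/` as a violation is an assumed heuristic, not the rule as implemented in scripts/lint_wiki.py.

```python
import re

# Target part of Obsidian wikilinks: [[target]], [[target|alias]], [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
# Target part of standard Markdown links: [text](target).
MD_LINK = re.compile(r"\]\(([^)\s]+)\)")


def direct_raw_citations(page_text: str) -> list[str]:
    """Return link targets in a synthesis page that point straight at raw/ files.

    Assumed heuristic: after ignoring leading dots and slashes (e.g. "../"),
    a target starting with "raw/" counts as a direct raw citation.
    """
    targets = WIKILINK.findall(page_text) + MD_LINK.findall(page_text)
    return [t.strip() for t in targets if t.strip().lstrip("./").startswith("raw/")]
```

A page that cites `[[source-notes/attention-paper]]` passes, while `[[raw/sources/attention.pdf]]` would be reported and could be rewritten to point at the corresponding source-note.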
If a page is based only on an abstract, metadata snapshot, or partial reading, it should say so explicitly.
The goal is to accumulate knowledge into stable topic/entity pages, not to create a new disconnected note for every interaction.
If a question produces a useful answer that will matter again, it should be written into wiki/analyses/ instead of living only in chat history.
Everything should remain inspectable in plain text, git-friendly, and Obsidian-compatible.
Put a new file into inbox/.
Examples:
- a paper PDF;
- an arXiv abstract snapshot;
- a manually saved web page;
- a Markdown export from another tool.
Use the helper script if needed:
```bash
python scripts/new_source.py inbox/some-paper.pdf --move
```

This suggests a normalized filename and, with --move, moves the file into raw/sources/.
A typical ingest task should:
- create exactly one source-note;
- update the relevant topic, entity, and overview pages;
- update index.md;
- append an ingest entry to log.md.
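The append-only log can be maintained by opening log.md in append mode, so earlier entries are never rewritten. The entry format (`## [date] operation | summary`) and the `append_log` helper below are assumptions for illustration; the actual convention is defined in AGENTS.md. The operation names come from the log.md description in the repository layout.

```python
from datetime import date
from pathlib import Path
from typing import Optional

# Operation names from the log.md description; the entry format is hypothetical.
OPERATIONS = {"ingest", "query", "lint", "refactor"}


def append_log(log_path: Path, operation: str, summary: str, when: Optional[date] = None) -> str:
    """Append one entry to log.md and return it; existing entries are left untouched."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    when = when or date.today()
    entry = f"## [{when.isoformat()}] {operation} | {summary}\n"
    # Mode "a" only writes at the end of the file (creating it if needed).
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry
```

Rejecting unknown operations keeps the log easy to filter later, for example by grepping for `] ingest |`.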
Example questions to ask against the vault:
- What is the core claim of this paper?
- How does this method differ from prior work already in the vault?
- What open questions remain across the current topic pages?
The preferred behavior is to answer from the wiki first, then expand into source notes as needed.
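The wiki-first preference can be sketched as a two-tier lookup: search curated pages first and fall back to source notes only when nothing matches. `find_pages` is a hypothetical helper using plain case-insensitive substring matching; it illustrates the ordering only and is not how Codex actually answers questions.

```python
from pathlib import Path

# Hypothetical tiers: curated wiki folders first, per-source evidence as a fallback.
CURATED_DIRS = ["overview", "topics", "entities", "analyses", "glossaries"]
EVIDENCE_DIRS = ["source-notes"]


def _search(wiki_root: Path, dirs: list[str], needle: str) -> list[Path]:
    """Return Markdown pages in `dirs` whose lowercased text contains `needle`."""
    hits: list[Path] = []
    for name in dirs:
        folder = wiki_root / name
        if not folder.is_dir():
            continue  # a folder may not exist yet in a young vault
        for page in sorted(folder.glob("*.md")):
            if needle in page.read_text(encoding="utf-8").lower():
                hits.append(page)
    return hits


def find_pages(wiki_root: Path, term: str) -> list[Path]:
    """Search curated pages first; consult source notes only if nothing curated matches."""
    needle = term.lower()
    return _search(wiki_root, CURATED_DIRS, needle) or _search(wiki_root, EVIDENCE_DIRS, needle)
```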
```bash
python scripts/lint_wiki.py
```

This checks for:
- missing frontmatter keys,
- missing section headings,
- obvious missing citations,
- broken wikilinks,
- orphan pages with no inbound links.
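Broken-wikilink detection roughly amounts to extracting link targets and checking them against the set of existing page names. This sketch assumes pages are keyed by filename without `.md` and resolves links by their last path segment, which is a simplification of Obsidian's link resolution; `broken_wikilinks` is a hypothetical name, not a function in scripts/lint_wiki.py.

```python
import re

# Captures the target of [[target]], [[target|alias]] and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def broken_wikilinks(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each page name to the wikilink targets that match no page in `pages`.

    `pages` maps a page name (filename without .md) to its Markdown text.
    """
    known = set(pages)
    broken: dict[str, list[str]] = {}
    for name, text in pages.items():
        missing = [
            target.strip()
            for target in WIKILINK.findall(text)
            # Ignore folder prefixes: "entities/colbert" resolves to "colbert".
            if target.strip().split("/")[-1] not in known
        ]
        if missing:
            broken[name] = missing
    return broken
```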
Purpose of scripts/new_source.py:
- derive a normalized title and slug;
- suggest the final file path in raw/sources/;
- suggest the corresponding source-note path;
- optionally move the file into place.
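The title-and-slug step might look like the following. The slug rules (ASCII, lowercase, hyphen-separated) and the path layout are assumptions chosen for illustration; scripts/new_source.py may normalize names differently, and `slugify` and `suggest_paths` are hypothetical names.

```python
import re
import unicodedata
from pathlib import Path
from typing import Optional


def slugify(title: str) -> str:
    """Lowercase ASCII slug: accents dropped, runs of other characters collapsed to '-'."""
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def suggest_paths(inbox_file: Path, title: Optional[str] = None) -> dict[str, Path]:
    """Suggest raw/ and source-note paths for an inbox file (hypothetical layout)."""
    # Fall back to the filename stem when no explicit title is given.
    raw_title = title or inbox_file.stem.replace("_", " ")
    slug = slugify(raw_title)
    return {
        "raw": Path("raw/sources") / f"{slug}{inbox_file.suffix.lower()}",
        "source_note": Path("wiki/source-notes") / f"{slug}.md",
    }
```

Sharing one slug between the raw file and its source-note keeps the pair easy to match up, both for people browsing in Obsidian and for lint checks.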
Purpose of scripts/lint_wiki.py:
- verify minimum frontmatter requirements;
- catch broken or missing internal links;
- identify pages that look uncited;
- detect orphan pages.
It is intentionally lightweight. It does not replace human review.
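Orphan detection is the inverse of link checking: count inbound links per page and report pages with none. In this hypothetical sketch, `index` and `log` are treated as navigation roots that may have no inbound links, self-links are not counted, and links are resolved by their last path segment.

```python
import re

# Target part of [[target]], [[target|alias]] or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

# Hypothetical navigation roots that are allowed to have no inbound links.
ROOTS = frozenset({"index", "log"})


def orphan_pages(pages: dict[str, str], roots: frozenset[str] = ROOTS) -> list[str]:
    """Return sorted names of pages that no other page links to.

    `pages` maps a page name (filename without .md) to its Markdown text.
    """
    inbound = {name: 0 for name in pages}
    for source, text in pages.items():
        # A set, so repeated links from one page to the same target count once.
        targets = {t.strip().split("/")[-1] for t in WIKILINK.findall(text)}
        for target in targets:
            if target in inbound and target != source:
                inbound[target] += 1
    return sorted(n for n, count in inbound.items() if count == 0 and n not in roots)
```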
The vault distinguishes several page types:
- overview — high-level scope and thesis pages;
- topic — conceptual or problem-domain pages;
- entity — concrete methods, models, datasets, authors, or organizations;
- source-note — evidence pages tied to a single source;
- analysis — durable question-driven synthesis pages;
- glossary — terminology pages.
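Because each page type has its own folder under wiki/, a page's type can be inferred from its path. This sketch relies only on the repository layout shown earlier; `PageType` and `page_type_for` are hypothetical names.

```python
from enum import Enum
from pathlib import PurePosixPath
from typing import Optional


class PageType(Enum):
    """Page types, each valued by its folder name under wiki/."""
    OVERVIEW = "overview"
    TOPIC = "topics"
    ENTITY = "entities"
    SOURCE_NOTE = "source-notes"
    ANALYSIS = "analyses"
    GLOSSARY = "glossaries"


def page_type_for(path: str) -> Optional[PageType]:
    """Infer a page's type from its folder, e.g. wiki/topics/x.md -> PageType.TOPIC."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3 or parts[0] != "wiki":
        return None
    try:
        return PageType(parts[1])  # look up the enum member by folder name
    except ValueError:
        return None  # a folder under wiki/ that is not a known page type
```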
Each wiki page is expected to carry YAML frontmatter and follow a standard section structure. See AGENTS.md for the full contract.
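A minimal frontmatter check can read the leading `---` block and compare its top-level keys against a required set. The keys in `REQUIRED_KEYS` are placeholders, since the real contract lives in AGENTS.md, and the parser deliberately handles only flat `key: value` lines rather than full YAML.

```python
# Placeholder required keys; the authoritative list is in AGENTS.md.
REQUIRED_KEYS = frozenset({"type", "title", "updated"})


def frontmatter_keys(text: str) -> set[str]:
    """Return top-level keys of a leading '---' block (flat keys only, no YAML library)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys: set[str] = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Top-level keys start at column 0; indented and list lines belong to an earlier key.
        if line and not line[0].isspace() and not line.startswith(("-", "#")) and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return set()  # no closing '---', so treat the page as having no frontmatter


def missing_frontmatter(text: str, required: frozenset[str] = REQUIRED_KEYS) -> set[str]:
    """Return the required keys that the page's frontmatter lacks."""
    return set(required - frontmatter_keys(text))
```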
Examples of useful prompts when working with Codex:
- Please ingest the new source in inbox/ and update the affected pages.
- Answer this question from the wiki first, then inspect source-notes only if needed.
- Turn this answer into a durable analysis page.
- Run a lint-style review and tell me which pages are too thin or weakly supported.
- Check whether this topic already has a canonical page before creating a new one.
This repository is optimized for knowledge maintenance, not just knowledge capture.
That means it emphasizes:
- stable canonical pages,
- explicit source provenance,
- append-only operational history,
- agent-readable rules,
- and long-term reuse of prior synthesis.
In short: this is closer to a personal research wiki or external memory system than a notebook.
The repository already contains a small but real research corpus with:
- topic pages,
- entity pages,
- source notes,
- a research overview,
- an operational log,
- and helper scripts.
It is still an early-stage knowledge base, but the structure is intended to scale as more sources are ingested.
If you publish or open-source a vault built from this scaffold, review it carefully before pushing:
- remove private notes;
- remove local-only instructions you do not want to publish;
- verify that imported source files can be shared publicly;
- check whether any raw PDFs or proprietary materials should remain local-only.
No explicit license is included by default. Add one before public distribution if needed.
- AGENTS.md: operating contract for Codex / agent maintenance
- index.md: main entry point for navigation and answering
- log.md: append-only operational history
- templates/: page templates
- scripts/: local helper scripts
If you want a Chinese explanation of the project, see README.zh-CN.md.

