LLM Personal Knowledge Base

A source-backed personal research wiki built with Codex + Obsidian.

Turn papers, PDFs, web captures, and research questions into a durable Markdown knowledge base instead of letting them disappear into temporary chats.

Why this exists

Most AI-assisted reading workflows have the same failure mode: useful ideas are generated, but they remain trapped inside ephemeral conversations.

This project is designed to make research knowledge:

source-backed — raw material stays separate from synthesis;
maintainable — topic and entity pages get updated over time instead of duplicated;
queryable — questions can be answered from accumulated notes;
auditable — operational updates are tracked in index.md and log.md;
portable — everything lives in plain Markdown inside an Obsidian-friendly vault.

Who this is for

This repository is especially useful for people who:

read papers regularly;
use Obsidian for long-term note organization;
want AI to help maintain knowledge structures, not just generate one-off summaries;
care about evidence boundaries and reusable synthesis.

Quick start

# 1. Put a new source into inbox/
# 2. Optionally normalize the filename
python scripts/new_source.py inbox/some-paper.pdf --move

# 3. Ask Codex to ingest it and update the vault
# 4. Run a structural check when needed
python scripts/lint_wiki.py

What it looks like

Example workflow

A typical end-to-end loop looks like this:

1. Drop a source into inbox/. Example: a new paper PDF or an abstract snapshot.
2. Normalize and import the source. Use scripts/new_source.py if you want predictable filenames inside raw/sources/.
3. Ask Codex to ingest it. Codex creates one source-note, updates related topic / entity / overview pages, then refreshes index.md and log.md.
4. Ask questions against the vault. Instead of re-reading everything from scratch, query the accumulated wiki.
5. Promote durable answers into wiki/analyses/. If a comparison, synthesis, or conclusion is likely to matter again, store it as a permanent analysis page.
6. Run lint to keep the vault healthy. Use scripts/lint_wiki.py to catch missing metadata, broken links, and structural issues.

In short:

inbox/ -> raw/sources/ -> source-note -> topic/entity pages -> analysis -> ongoing maintenance

What you get

This repository separates:

raw source material;
agent-maintained wiki pages;
operational logs, templates, and navigation pages.

The result is a personal knowledge base that can:

absorb new papers incrementally;
preserve evidence boundaries;
support question answering from accumulated notes;
keep long-term topic pages updated instead of rewriting everything from scratch;
remain readable and editable in plain Markdown.

Core idea

The workflow is built on three layers:

Sources — immutable imported material such as PDFs, Markdown exports, notes, or HTML snapshots.
Wiki — curated pages that distill sources into topic pages, entity pages, source notes, and durable analyses.
Operations — index, log, templates, and scripts that make the system maintainable over time.

Codex acts as the maintenance layer:

ingesting new material,
updating affected pages,
keeping links and metadata consistent,
answering questions from the wiki,
and writing durable analysis pages when useful.

Obsidian acts as the human-facing layer:

browsing,
linking,
reviewing edits,
and maintaining a navigable long-term knowledge graph.

Repository structure

.
├── AGENTS.md                 # Operational contract for Codex / agent behavior
├── README.md                 # English project overview
├── README.zh-CN.md           # Chinese companion README
├── index.md                  # Primary navigation page
├── log.md                    # Append-only ingest / query / lint / refactor log
├── inbox/                    # Drop zone for newly added materials
├── raw/
│   ├── sources/              # Imported source files (treated as immutable)
│   └── assets/               # Local images / attachments tied to sources
├── scripts/
│   ├── new_source.py         # Source normalization helper
│   └── lint_wiki.py          # Structural wiki checks
├── templates/                # Templates for source-note / topic / entity / analysis / glossary
└── wiki/
    ├── overview/             # Research scope and thesis pages
    ├── topics/               # Topic pages
    ├── entities/             # Methods / datasets / people / organizations / artifacts
    ├── source-notes/         # Per-source evidence pages
    ├── analyses/             # Durable question-driven synthesis pages
    └── glossaries/           # Terminology pages

Design principles

1. Sources stay separate from synthesis

Raw files are the source of truth. Synthesis pages should cite source-note pages rather than citing raw files directly.

2. Evidence boundaries must remain visible

If a page is based only on an abstract, metadata snapshot, or partial reading, it should say so explicitly.

3. Canonical pages beat duplicated summaries

The goal is to accumulate knowledge into stable topic/entity pages, not to create a new disconnected note for every interaction.

4. Durable answers belong in the vault

If a question produces a useful answer that will matter again, it should be written into wiki/analyses/ instead of living only in chat history.

5. Markdown first

Everything should remain inspectable in plain text, git-friendly, and Obsidian-compatible.

Typical workflow

1. Add a new source

Put a new file into inbox/.

Examples:

a paper PDF;
an arXiv abstract snapshot;
a manually saved web page;
a Markdown export from another tool.

2. Prepare or normalize the source

Use the helper script if needed:

python scripts/new_source.py inbox/some-paper.pdf --move

The script suggests a normalized filename and, with --move, also moves the file into raw/sources/.

3. Ask Codex to ingest it

A typical ingest task should:

create exactly one source-note;
update the relevant topic, entity, and overview pages;
update index.md;
append an ingest entry to log.md.
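
The last step of the ingest checklist, appending to log.md, can be sketched in a few lines of Python. This is only an illustration: the function name `append_log_entry` and the entry format (`- date [kind] summary`) are assumptions, not the format this repository prescribes. AGENTS.md remains the authority on what a log entry should contain.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def append_log_entry(log_path: Path, kind: str, summary: str,
                     today: Optional[date] = None) -> str:
    """Append one entry to the log and return the line that was written.

    The "- YYYY-MM-DD [kind] summary" layout is a hypothetical format.
    """
    stamp = (today or date.today()).isoformat()
    entry = f"- {stamp} [{kind}] {summary}\n"
    # Mode "a" only ever writes at the end of the file, so existing
    # entries are never rewritten and the log stays append-only.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

Opening the file in append mode is the design choice that matters here: earlier history cannot be overwritten by accident, which is what makes the log useful for auditing.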

4. Ask questions against the wiki

Examples:

What is the core claim of this paper?
How does this method differ from prior work already in the vault?
What open questions remain across the current topic pages?

The preferred behavior is to answer from the wiki first, then expand into source notes as needed.

5. Run lint periodically

python scripts/lint_wiki.py

This checks for:

missing frontmatter keys,
missing section headings,
obvious missing citations,
broken wikilinks,
orphan pages with no inbound links.
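
To make the last two checks concrete, here is a minimal sketch of how broken wikilinks and orphan pages could be detected. It is not the code in scripts/lint_wiki.py. It assumes that a page is identified by its file stem, that links use Obsidian's `[[target]]`, `[[target|alias]]`, or `[[target#heading]]` forms, and the name `scan_links` is hypothetical.

```python
import re
from pathlib import Path

# Matches [[target]], [[target|alias]] and [[target#heading]];
# only the target part is captured.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def scan_links(wiki_root: Path):
    """Return (broken, orphans) for all Markdown pages under wiki_root.

    broken  -- list of (page, missing_target) pairs
    orphans -- sorted list of pages with no inbound link from another page
    """
    pages = {p.stem: p for p in wiki_root.rglob("*.md")}
    inbound = {name: 0 for name in pages}
    broken = []
    for name, path in sorted(pages.items()):
        text = path.read_text(encoding="utf-8")
        for target in WIKILINK.findall(text):
            target = target.strip().split("/")[-1]  # tolerate path-style links
            if target not in pages:
                broken.append((name, target))
            elif target != name:  # self-links do not count as inbound
                inbound[target] += 1
    orphans = sorted(name for name, count in inbound.items() if count == 0)
    return broken, orphans
```

Because this sketch only scans one directory tree, a page that is linked solely from index.md at the repository root would be reported as an orphan. A fuller check would include the navigation page in the scan.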

Included local helpers

`scripts/new_source.py`

Purpose:

derive a normalized title and slug;
suggest the final file path in raw/sources/;
suggest the corresponding source-note path;
optionally move the file into place.
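
The slug and path suggestions can be illustrated with a short sketch. It is not the actual implementation of scripts/new_source.py. The function names `slugify` and `suggest_paths` and the exact normalization rules (ASCII-only, lowercase, hyphen-separated) are assumptions made for the example.

```python
import re
import unicodedata
from pathlib import Path


def slugify(title: str) -> str:
    """Reduce a title to a lowercase, hyphen-separated ASCII slug."""
    # Decompose accented characters, then drop anything outside ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def suggest_paths(source: Path):
    """Suggest the raw file path and the matching source-note path."""
    slug = slugify(source.stem)
    raw_path = Path("raw/sources") / f"{slug}{source.suffix.lower()}"
    note_path = Path("wiki/source-notes") / f"{slug}.md"
    return raw_path, note_path


raw_path, note_path = suggest_paths(Path("inbox/Some Paper (v2).PDF"))
print(raw_path.as_posix())   # raw/sources/some-paper-v2.pdf
print(note_path.as_posix())  # wiki/source-notes/some-paper-v2.md
```

Deriving both paths from the same slug keeps the raw file and its source-note easy to match by name.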

`scripts/lint_wiki.py`

Purpose:

verify minimum frontmatter requirements;
catch broken or missing internal links;
identify pages that look uncited;
detect orphan pages.

It is intentionally lightweight. It does not replace human review.

Page model

The vault distinguishes several page types:

overview — high-level scope and thesis pages;
topic — conceptual or problem-domain pages;
entity — concrete methods, models, datasets, authors, or organizations;
source-note — evidence pages tied to a single source;
analysis — durable question-driven synthesis pages;
glossary — terminology pages.

Each wiki page is expected to carry YAML frontmatter and follow a standard section structure. See AGENTS.md for the full contract.
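
As a rough illustration of what a minimum frontmatter check might involve, the sketch below validates a page's frontmatter against the six page types listed above. The required keys (`title`, `type`, `updated`) are hypothetical; the real set is defined in AGENTS.md. The parser is deliberately naive and reads only top-level `key: value` lines rather than full YAML.

```python
# Hypothetical required keys; AGENTS.md defines the real contract.
REQUIRED_KEYS = {"title", "type", "updated"}
PAGE_TYPES = {"overview", "topic", "entity", "source-note", "analysis", "glossary"}


def frontmatter_problems(text: str) -> list:
    """Return a list of human-readable problems; empty means the page passes."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing frontmatter block"]
    # Find the closing '---' delimiter after the opening one.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return ["unterminated frontmatter block"]
    fields = {}
    for line in lines[1:end]:
        # Only top-level "key: value" lines; nested YAML is ignored on purpose.
        if ":" in line and not line.startswith((" ", "\t", "#", "-")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("'\"")
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - fields.keys())]
    if "type" in fields and fields["type"] not in PAGE_TYPES:
        problems.append(f"unknown page type: {fields['type']}")
    return problems
```

A page that starts with a block containing `title`, `type: topic`, and `updated` would pass, while a page with no leading `---` line would be reported as missing its frontmatter block.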

Recommended prompt patterns

Examples of useful prompts when working with Codex:

Please ingest the new source in inbox/ and update the affected pages.
Answer this question from the wiki first, then inspect source-notes only if needed.
Turn this answer into a durable analysis page.
Run a lint-style review and tell me which pages are too thin or weakly supported.
Check whether this topic already has a canonical page before creating a new one.

What makes this different from ordinary note-taking

This repository is optimized for knowledge maintenance, not just knowledge capture.

That means it emphasizes:

stable canonical pages,
explicit source provenance,
append-only operational history,
agent-readable rules,
and long-term reuse of prior synthesis.

In short: this is closer to a personal research wiki or external memory system than a notebook.

Current status

The repository already contains a small but real research corpus with:

topic pages,
entity pages,
source notes,
a research overview,
an operational log,
and helper scripts.

It is still an early-stage knowledge base, but the structure is intended to scale as more sources are ingested.

Public-sharing note

If you publish or open-source a vault built from this scaffold, review it carefully before pushing:

remove private notes;
remove local-only instructions you do not want to publish;
verify that imported source files can be shared publicly;
check whether any raw PDFs or proprietary materials should remain local-only.

License / usage

No explicit license is included by default. Add one before public distribution if needed.

Related files

AGENTS.md — operating contract for Codex / agent maintenance
index.md — main entry point for navigation and answering
log.md — append-only operational history
templates/ — page templates
scripts/ — local helper scripts

If you want a Chinese explanation of the project, see README.zh-CN.md.
