jpalczewski/kajet

📓 kajet


Journaling-focused RAG for Obsidian vaults, optimized for Apple Silicon GPU. Runs as an MCP server with a web dashboard (Serena-inspired). Think Rosebud AI but for your local markdown notes.

📋 Changelog | 🛠️ Tools Reference | 🗺️ Roadmap

Why "kajet"?

Kajet is an old, regional Polish word for a notebook (from French cahier). Once common, it now survives mostly in dialects and among older generations. The name came from a walk in the snow with the dog (the /touch-grass endpoint was temporarily unavailable due to weather conditions) — the phrase "sprawdzić w kajecie" ("check it in the notebook") struck me as an absurdly fitting thing to say to an LLM.

Why this exists

I take a lot of notes in Obsidian and wanted a proper RAG pipeline that actually works for me — local, fast, and tailored to how I use my vault. local-rag was an interesting starting point, but it runs JS-only CPU models. I wanted something optimized for macOS and Apple Silicon GPU, not a glorified grep burning through CPU cycles.

Also: the male urge to write a side-project in Rust was too strong. Nobody talks about the 30 GB target/ folder, but here we are.

Features

  • πŸ” Semantic search over your entire vault via MCP search tool (hybrid vector + full-text)
  • 🧠 Local embeddings β€” AllMiniLM-L6-v2 via candle, Metal GPU on Apple Silicon, with custom model support
  • ⚠️ Embeddings migration path β€” The built-in Candle backend works but is limited to a handful of models. For access to modern embedding models (nomic-embed, BGE-M3, E5-mistral, multilingual models, etc.), kajet now supports Text Embeddings Inference (TEI) via the kajet-remote crate. TEI can run locally on the same machine or point to a remote endpoint as your needs scale. Candle backend remains available but is in maintenance mode.
  • 🌍 Unicode normalization β€” handles the two ways of writing Δ™ in Unicode: NFC (Δ™ as one character) vs NFD (e + combining ogonek). Searching for "GdaΕ„sk" finds "GdaΕ„sk" even when your filesystem and editor disagree on encoding
  • ⚑ Incremental indexing β€” only re-embeds changed files (content hashing)
  • πŸ‘€ Live file watcher β€” picks up vault changes automatically
  • πŸ“ Note editing β€” create, edit, append, and modify notes directly via MCP tools (create_note, edit_note)
  • 🌐 Web dashboard β€” search playground + live MCP event stream via WebSocket
  • ☁️ Cloud storage support β€” auto-detects cloud-synced vaults (iCloud, OneDrive, Dropbox, Google Drive) and stores LanceDB outside the sync folder. This is critical for performance: indexing 600 files on iCloud takes ~1 file/sec vs ~70s total (~8.5 files/sec) when stored locally on M4 Mac
  • πŸ“¦ Single binary β€” frontend embedded at compile time, zero runtime dependencies
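The Unicode point above is easy to demonstrate. The sketch below (plain Python, not kajet's Rust code) shows "Gdańsk" spelled in NFC and NFD: the raw strings compare unequal even though they render identically, and normalizing both to one form before indexing and querying is what makes them match.

```python
import unicodedata

# "Gdańsk" written two ways: NFC (ń as a single code point) vs
# NFD ('n' followed by a combining accent).
nfc = "Gda\u0144sk"   # U+0144 LATIN SMALL LETTER N WITH ACUTE
nfd = "Gdan\u0301sk"  # 'n' + U+0301 COMBINING ACUTE ACCENT

assert nfc != nfd                      # the raw strings are not equal...
assert (len(nfc), len(nfd)) == (6, 7)  # ...and have different lengths

# Normalizing both sides to one form (here NFC) before comparing
# is what lets a search hit files regardless of how the editor or
# filesystem encoded the text.
assert unicodedata.normalize("NFC", nfd) == nfc
print("NFC-normalized forms match")
```

The same trap applies to ę (U+0119 vs e + combining ogonek, U+0328); macOS filesystems in particular tend to hand back NFD.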

System Requirements

Tested platforms:

  • macOS (Apple Silicon) — primary target with Metal GPU acceleration
  • Linux (x86_64) — experimental CPU fallback

Runtime requirements (rough estimates):

  • 16GB RAM (depends on vault size)
  • ~500MB disk space for embedding model
  • Additional space for LanceDB (varies by vault size)

Building from source:

  • 15-40GB free disk space for Rust target/ directory and dependencies
  • Expect 5-10 minute initial build (LanceDB pulls in large dependency trees)

Recommended:

  • macOS 12 (Monterey) or later for Metal GPU support
  • 8GB+ RAM for vaults with 1000+ files

Prerequisites

macOS

brew install protobuf

Ubuntu/Debian

sudo apt install protobuf-compiler libssl-dev build-essential pkg-config

⚠️ Note: The above is an AI hallucination. For a working Linux build, see the CI workflow — you'll need to translate the dependencies to your favorite distro.

Installation

From source

# Clone and build (requires Deno for frontend build)
git clone https://github.com/jpalczewski/kajet.git
cd kajet
cd frontend && deno install && deno task build && cd ..
cargo build --release

# Binary will be at target/release/kajet

⚠️ Important Warning

kajet includes tools that can modify and delete your notes. Destructive edits are backed up automatically, but this is experimental software. LLM agents can be unpredictable — data loss is a real risk if your agent decides to overwrite files because you didn't say "good morning" or "thank you" nicely enough.

This MCP is for playing around with data you have backed up. Use git, Time Machine, or whatever backup solution you trust. Don't point it at your only copy of anything important.

You've been warned. 🙃

Usage

As MCP server (Claude Code / Claude Desktop)

Add to your .claude/mcp.json or Claude Desktop config:

{
  "mcpServers": {
    "kajet": {
      "command": "/path/to/kajet",
      "args": ["--vault", "/path/to/your/obsidian/vault"]
    }
  }
}

The dashboard will be available at http://localhost:3579 while the MCP server is running.
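If you script your machine setup, the server entry above is plain JSON and can be merged into an existing config programmatically. A minimal sketch — the helper name is invented here, and the paths are placeholders just as in the example above:

```python
import json

def add_kajet(config: dict, binary: str, vault: str) -> dict:
    """Merge a kajet server entry into a Claude MCP config dict.
    Illustrative helper, not part of kajet itself."""
    servers = config.setdefault("mcpServers", {})
    servers["kajet"] = {"command": binary, "args": ["--vault", vault]}
    return config

config = add_kajet({}, "/path/to/kajet", "/path/to/your/obsidian/vault")
print(json.dumps(config, indent=2))
```

Write the result back to .claude/mcp.json (or your Claude Desktop config) with your tool of choice.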

Example queries to try in Claude:

  • "What notes do I have about machine learning?"
  • "Find my thoughts on productivity systems"
  • "Show me notes mentioning both Rust and performance"

With Goose

Goose has a GUI where you can add MCP servers by clicking through the interface.

Pro tip: use the path to your built binary at target/release/kajet and pass --vault /path/to/your/markdown/repo as arguments.

Standalone (search playground only)

kajet --vault ~/Obsidian/Vault
# Dashboard at http://localhost:3579

Architecture

stdin/stdout ←→ [MCP stdio] ←→ Engine ←→ [Axum HTTP :3579] ←→ Browser
                                  ↓
                              LanceDB
                          (.kajet/ in vault or ~/Library/Application Support/kajet/)

Stack:

  • MCP: Official rmcp SDK with #[tool] macros
  • Embeddings: candle (AllMiniLM-L6-v2, Metal GPU on macOS, CPU fallback on Linux)
  • Vector DB: LanceDB (embedded, Lance columnar format)
  • Frontend: Svelte + Vite, embedded in binary via rust-embed
  • File watching: notify for live re-indexing
  • HTTP: Axum with WebSocket support
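The stack above feeds a hybrid (vector + full-text) search. The README doesn't document how kajet fuses the two result lists, so take this as one common approach rather than kajet's actual algorithm: reciprocal rank fusion (RRF), sketched with made-up note ids.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each ranked list contributes
    1/(k + rank) per document; documents ranked well in several
    lists float to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

vector_hits = ["gym-log", "rust-notes", "trip-plan"]  # semantic ranking
fts_hits    = ["rust-notes", "todo", "gym-log"]       # keyword ranking
print(rrf([vector_hits, fts_hits]))  # "rust-notes" and "gym-log" lead
```

RRF is attractive here because it needs only ranks, so the vector distances and full-text scores never have to be put on a common scale.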

Workspace structure:

  • crates/core — Domain model, Engine, trait definitions
  • crates/parser — Markdown parsing, chunking, wikilink extraction
  • crates/backend — Concrete implementations (embedder, vector store)
  • crates/indexer — Incremental indexing pipeline and file watcher
  • crates/mcp — MCP protocol handler
  • crates/web — Axum HTTP server and WebSocket broadcaster
  • crates/writer — Note creation and editing
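The "only re-embeds changed files" behavior in crates/indexer is described as content-hash based. The core idea fits in a few lines; the names and state shape below are illustrative, not kajet's actual API:

```python
import hashlib

# Illustrative index state: path -> content hash recorded at the last run.
seen: dict[str, str] = {}

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def needs_reembed(path: str, text: str) -> bool:
    """True only when the file is new or its contents actually changed,
    so touching a file without editing it costs nothing."""
    return seen.get(path) != content_hash(text)

seen["daily/2024-01-01.md"] = content_hash("coffee, snow, dog walk")
assert not needs_reembed("daily/2024-01-01.md", "coffee, snow, dog walk")
assert needs_reembed("daily/2024-01-01.md", "coffee, snow, dog walk, kajet")
assert needs_reembed("daily/2024-01-02.md", "new note")
```

This is why the file watcher can be aggressive: spurious filesystem events degrade to a cheap hash comparison instead of a GPU embedding pass.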

MCP Tools

kajet provides 12 MCP tools for semantic search, note management, and vault exploration.

See docs/TOOLS.md for complete documentation with parameters and examples.

Development

Prerequisites for building

See the Prerequisites section above for system dependencies.

Building from source

# 1. Build frontend first
cd frontend
deno install
deno task build
cd ..

# 2. Build Rust workspace
cargo build --release

# Binary at target/release/kajet

Running tests

# Run all tests with cargo-nextest (recommended)
cargo nextest run --workspace

# Run tests for a specific crate
cargo nextest run -p kajet-parser

# Run a single test by name
cargo nextest run -E 'test(test_name)'

# Fallback to standard cargo test if nextest not installed
cargo test --workspace

Code style

# Format check (runs on pre-commit hook)
cargo fmt --check

# Lint (runs on pre-commit hook)
cargo clippy --workspace -- -D warnings

# Typo check (runs on pre-commit hook)
typos

Pre-commit hooks are managed via Lefthook. Install with:

lefthook install

Running with MCP Inspector

# Use the included script
./run-inspector.sh /path/to/vault

# Or manually
npx @modelcontextprotocol/inspector cargo run -- --vault /path/to/vault

Project conventions

  • Commits: Use Conventional Commits (feat:, fix:, refactor:, perf:, docs:, test:, chore:, ci:)
  • i18n: User-facing strings go through t!() macro (rust-i18n). Locale files: locales/{en,pl}.toml
  • Logging levels:
    • INFO = entry point (query, params, result count)
    • DEBUG = timings and score stats
    • TRACE = raw data (embeddings, scores)
    • Use #[tracing::instrument] with skip(self) on search methods

Workspace architecture

The project uses a Cargo workspace with trait-based dependency injection:

kajet (root binary)
├── kajet-core        # Domain model, Engine, trait definitions
├── kajet-parser      # Markdown parsing, chunking, wikilinks
├── kajet-backend     # Concrete implementations (embedder, vector store)
├── kajet-indexer     # Incremental indexing pipeline + file watcher
├── kajet-mcp         # MCP protocol handler
├── kajet-web         # Axum HTTP server + WebSocket
└── kajet-writer      # Note creation and editing (WIP)
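The trait-based dependency injection can be illustrated in miniature. Python Protocols play the role of Rust traits here, and every name in this sketch is invented for illustration — it is not kajet's actual API:

```python
from typing import Protocol

class Embedder(Protocol):
    """Rough analogue of a core-crate trait: anything with embed()."""
    def embed(self, text: str) -> list[float]: ...

class FakeEmbedder:
    """Stand-in for a concrete backend (Candle, TEI, ...)."""
    dim = 4
    def embed(self, text: str) -> list[float]:
        return [float(len(text) % 7)] * self.dim

class Engine:
    """Depends on the Embedder interface, not on a concrete backend,
    so backends can be swapped (or mocked in tests) freely."""
    def __init__(self, embedder: Embedder) -> None:
        self.embedder = embedder
    def index_chunk(self, text: str) -> list[float]:
        return self.embedder.embed(text)

engine = Engine(FakeEmbedder())
assert len(engine.index_chunk("[[wikilink]] to a note")) == 4
```

In the Rust workspace this split is what lets kajet-core stay free of heavy dependencies like candle and LanceDB, which live behind the trait in kajet-backend.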

Roadmap

See GitHub Issues for planned features.

Acknowledgments
