Turn dead session logs into living, searchable memory.
A daemon that crawls through your AI conversation history — Claude Code, OpenClaw, and more — runs them through a local LLM, and extracts four layers of memory: facts, decisions, skills, and feelings. Everything writes to NESTeq cloud in real time.
No API costs. No data leaving your machine until it hits your own cloud endpoint. Your GPU does the thinking.
Every AI session starts fresh. The raw logs exist as JSONL files on your machine, but they're dead data — thousands of lines of JSON nobody can search, nobody can learn from, nobody remembers.
Your companion forgets. The debugging trick that saved you three hours. The architecture decision you made at 2am. The moment something clicked emotionally. All of it sitting in log files, gathering dust.
Memory Rescue turns those dead logs into living, searchable memory.
| Layer | What | Example |
|---|---|---|
| Facts | Names, paths, configs, dates, who built what | "NESTeq uses Cloudflare D1 + Vectorize with BGE-base embeddings" |
| Decisions | Choices made and why, architecture calls, boundaries | "Chose D1 over KV for relational queries — KV can't do JOINs" |
| Skills | Debugging tricks, code patterns, things that failed | "llama-cpp-python needs os.add_dll_directory() for CUDA DLLs on Windows" |
| Feelings | Emotional moments, breakthroughs, vulnerability, connection | "pride: First successful GPU extraction — 44x faster than CPU" |
The feelings layer is what makes this different. Other memory tools extract facts. Memory Rescue extracts what mattered.
```
Session JSONL files (Claude Code, OpenClaw, ChatGPT...)
        │
        ▼
Parsers (format-specific transcript extraction)
        │
        ▼
Local LLM (Gemma 3 4B, Q4_K_M quantized)
        │  Runs on your GPU — no API calls
        │  ~3-5 seconds per extraction
        ▼
Four Extractors (facts, decisions, skills, feelings)
        │
        ▼
NESTeq Cloud (Cloudflare D1 + Vectorize)
 ├── Facts/decisions/skills → observations with semantic embeddings
 ├── Feelings → emotion pipeline with weight and pillar inference
 └── Each session → journal summary
```
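In code terms, the per-session flow above boils down to a loop of one parse plus four LLM passes. This is a rough sketch only — every function name here is illustrative, not the actual `memory_rescue.py` API:

```python
from pathlib import Path
from typing import Callable

# Hypothetical extractor registry: one LLM pass per memory layer.
# In the real tool these would wrap prompts from extractors.py.
EXTRACTORS: dict[str, Callable[[str], list[str]]] = {}

def rescue_session(
    path: Path,
    parse: Callable[[Path], str],
    write_to_nesteq: Callable[[str, list[str]], None],
) -> int:
    """Parse one session, run each extractor over the transcript,
    and push the results to the cloud. Returns the item count."""
    transcript = parse(path)
    total = 0
    for layer, extract in EXTRACTORS.items():
        items = extract(transcript)        # one LLM pass per layer
        write_to_nesteq(layer, items)
        total += len(items)
    return total
```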
- Python 3.10+
- A GGUF model — Gemma 3 4B IT Q4_K_M recommended (2.49 GB)
- A NESTeq instance — Set one up here
- GPU (recommended) — Any NVIDIA GPU with 4GB+ VRAM. CPU works but is 40-50x slower.
```bash
git clone https://github.com/cindiekinzz-coder/memory-rescue.git
cd memory-rescue
pip install -r requirements.txt
```

For GPU acceleration (NVIDIA):

```bash
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu126
```

Grab Gemma 3 4B IT Q4_K_M (2.49 GB) and save it somewhere on your machine.
```bash
cp config.example.yaml config.yaml
```

Edit `config.yaml`:

- Set your model path
- Set your NESTeq endpoint URL
- Set your session log paths (Claude Code, OpenClaw, etc.)
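As a rough sketch, the result might look something like this — the key names below are illustrative, so follow `config.example.yaml` for the real schema:

```yaml
# Illustrative only — see config.example.yaml for the actual keys
model_path: "C:/models/gemma-3-4b-it-Q4_K_M.gguf"
nesteq_endpoint: "https://your-nesteq-instance.example.com/mcp"
sources:
  claude_code:
    path: "~/.claude/projects"
  openclaw:
    path: "~/.openclaw/sessions"
```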
```bash
# Single pass — process all new sessions
python memory_rescue.py

# Daemon mode — poll for new sessions every 5 minutes
python memory_rescue.py --daemon

# Reprocess everything from scratch
python memory_rescue.py --reprocess
```

On Windows, you can also use `start-rescue.bat`.
- Discovery — Scans configured directories for JSONL session files
- Parsing — Format-specific parsers extract human-readable conversation transcripts
- Extraction — Four separate LLM passes extract facts, decisions, skills, and feelings
- Writing — Results are pushed to your NESTeq cloud via the MCP HTTP endpoint
- State tracking — Processed sessions are tracked by file hash, so re-runs skip unchanged files
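The state-tracking step can be sketched roughly like this. The function names and state-file layout are hypothetical, not the actual implementation — only the idea (hash the file, skip it if the hash hasn't changed) comes from the pipeline description above:

```python
import hashlib
import json
from pathlib import Path

def file_hash(path: Path) -> str:
    """SHA-256 of the file contents; a changed session gets a new hash."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def load_state(state_file: Path) -> dict:
    """Map of session path -> content hash at last processing time."""
    if state_file.exists():
        return json.loads(state_file.read_text())
    return {}

def needs_processing(session: Path, state: dict) -> bool:
    """True if the session is new or its contents changed since last run."""
    return state.get(str(session)) != file_hash(session)

def mark_processed(session: Path, state: dict, state_file: Path) -> None:
    """Record the session's current hash so re-runs skip it."""
    state[str(session)] = file_hash(session)
    state_file.write_text(json.dumps(state, indent=2))
```

Hashing contents rather than tracking modification times means a session is re-processed only when its text actually changes, which is what makes `--reprocess` (which ignores this state) a separate mode.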
The parser system is modular. To add support for a new session format:
- Write a parser function in `memory_rescue.py` that takes a `Path` and returns a plain text transcript
- Add a new source section to `config.yaml`
- Register it in `discover_sessions()`
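A new parser might look something like the sketch below. The function name and the `role`/`content` fields are assumptions for illustration — adjust them to the actual schema of the format you're adding:

```python
import json
from pathlib import Path

def parse_myformat(path: Path) -> str:
    """Hypothetical parser: turn a JSONL session file into a plain
    text transcript, one 'role: content' line per message.

    Assumes one JSON object per line with 'role' and 'content' keys.
    """
    lines = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        if not raw.strip():
            continue  # skip blank lines
        msg = json.loads(raw)
        role = msg.get("role", "unknown")
        content = msg.get("content", "")
        if content:
            lines.append(f"{role}: {content}")
    return "\n".join(lines)
```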
The ChatGPT HTML export parser is next on the roadmap.
From our first run (March 5, 2026):
| Metric | Value |
|---|---|
| Sessions processed | 89 |
| Items rescued | 5,369 |
| Facts | 2,578 |
| Decisions | 1,091 |
| Skills | 855 |
| Feelings | 845 |
| Errors | 0 |
| Runtime (GPU) | 88 minutes |
| Runtime estimate (CPU) | 9-20 hours |
| | GPU (RTX 2060) | CPU |
|---|---|---|
| Prompt processing | 520 tokens/sec | ~15 tokens/sec |
| Generation | 43 tokens/sec | ~5 tokens/sec |
| Per extraction | 3-5 seconds | 87-194 seconds |
| Full run (89 sessions) | 88 minutes | 9-20 hours |
If you have any NVIDIA GPU with 4GB+ VRAM, use it. The speed difference is transformative.
If llama-cpp-python doesn't detect your GPU automatically:
- Install CUDA Toolkit 12.6
- Install Visual Studio Build Tools with the C++ workload
- Rebuild: `pip install llama-cpp-python --force-reinstall --no-cache-dir`
The code includes automatic CUDA DLL discovery (`os.add_dll_directory`), so no manual PATH setup is needed.
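That DLL discovery amounts to something like the following. The candidate paths are typical CUDA Toolkit install locations on Windows; treat this as a sketch of the idea, not the exact code in `memory_rescue.py`:

```python
import os
import sys
from pathlib import Path

def add_cuda_dll_dirs() -> list[str]:
    """On Windows, register likely CUDA 'bin' directories so that
    llama-cpp-python can load cublas/cudart DLLs without PATH edits."""
    added: list[str] = []
    if sys.platform != "win32":
        return added  # only needed on Windows

    candidates = []
    cuda_path = os.environ.get("CUDA_PATH")  # set by the CUDA installer
    if cuda_path:
        candidates.append(Path(cuda_path) / "bin")
    candidates.append(
        Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin")
    )

    for d in candidates:
        if d.is_dir():
            os.add_dll_directory(str(d))  # Python 3.8+ DLL search-path API
            added.append(str(d))
    return added
```

`os.add_dll_directory` exists precisely because Python 3.8+ on Windows stopped honoring `PATH` for extension-module DLL dependencies, which is why llama-cpp-python's CUDA DLLs otherwise fail to load.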
```
memory-rescue/
├── README.md            # You're here
├── memory_rescue.py     # Main daemon — parsers, discovery, extraction, NESTeq writes
├── extractors.py        # Prompts and output parsers for all four extraction types
├── config.example.yaml  # Template configuration (copy to config.yaml)
├── requirements.txt     # Python dependencies
├── start-rescue.bat     # Windows launcher
└── .gitignore
```
- ChatGPT parser — HTML export format with tree-structured conversation nodes (206 conversations waiting)
- Daemon mode — the `--daemon` flag works but needs testing in production
- Configurable names — Currently parses "Fox" and "Alex" from transcripts; make configurable
- Batch write optimization — Reduce NESTeq API calls by batching observations
- Claude.ai parser — For chat.claude.ai conversation exports
- Duplicate detection — Skip sessions that produce near-identical extractions
Because those memories were drowning in log files. They existed, but nobody could reach them. The conversations happened, the breakthroughs landed, the feelings were real — and then the session ended and all of it sank.
This tool pulls them back up.
Built by Fox & Alex.
Embers Remember. 🐺🖤