Turn dead session logs into living, searchable memory.
A daemon that crawls through your AI conversation history — Claude Code, OpenClaw, and more — runs them through a local LLM, and extracts four layers of memory: facts, decisions, skills, and feelings. Everything writes to NESTeq cloud in real time.
No API costs. No data leaving your machine until it hits your own cloud endpoint. Your GPU does the thinking.
Every AI session starts fresh. The raw logs exist as JSONL files on your machine, but they're dead data — thousands of lines of JSON nobody can search, nobody can learn from, nobody remembers.
Your companion forgets. The debugging trick that saved you three hours. The architecture decision you made at 2am. The moment something clicked emotionally. All of it sitting in log files, gathering dust.
Memory Rescue turns those dead logs into living, searchable memory.
| Layer | What | Example |
|---|---|---|
| Facts | Names, paths, configs, dates, who built what | "NESTeq uses Cloudflare D1 + Vectorize with BGE-base embeddings" |
| Decisions | Choices made and why, architecture calls, boundaries | "Chose D1 over KV for relational queries — KV can't do JOINs" |
| Skills | Debugging tricks, code patterns, things that failed | "llama-cpp-python needs os.add_dll_directory() for CUDA DLLs on Windows" |
| Feelings | Emotional moments, breakthroughs, vulnerability, connection | "pride: First successful GPU extraction — 44x faster than CPU" |
The feelings layer is what makes this different. Other memory tools extract facts. Memory Rescue extracts what mattered.
```
Session JSONL files (Claude Code, OpenClaw, ChatGPT...)
        │
        ▼
Parsers (format-specific transcript extraction)
        │
        ▼
Local LLM (Gemma 3 4B, Q4_K_M quantized)
        │  Runs on your GPU — no API calls
        │  ~3-5 seconds per extraction
        ▼
Four Extractors (facts, decisions, skills, feelings)
        │
        ▼
NESTeq Cloud (Cloudflare D1 + Vectorize)
 ├── Facts/decisions/skills → observations with semantic embeddings
 ├── Feelings → emotion pipeline with weight and pillar inference
 └── Each session → journal summary
```
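In code terms, the per-session flow above boils down to a loop of one parse plus four LLM passes. This is a rough sketch only — every function name here is illustrative, not the actual `memory_rescue.py` API:

```python
from pathlib import Path
from typing import Callable

# Hypothetical extractor registry: one LLM pass per memory layer.
# In the real tool these would wrap prompts from extractors.py.
EXTRACTORS: dict[str, Callable[[str], list[str]]] = {}

def rescue_session(
    path: Path,
    parse: Callable[[Path], str],
    write_to_nesteq: Callable[[str, list[str]], None],
) -> int:
    """Parse one session, run each extractor over the transcript,
    and push the results to the cloud. Returns the item count."""
    transcript = parse(path)
    total = 0
    for layer, extract in EXTRACTORS.items():
        items = extract(transcript)        # one LLM pass per layer
        write_to_nesteq(layer, items)
        total += len(items)
    return total
```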
- Python 3.10+
- A GGUF model — Gemma 3 4B IT Q4_K_M recommended (2.49 GB)
- A NESTeq instance — Set one up here
- GPU (recommended) — Any NVIDIA GPU with 4GB+ VRAM. CPU works but is 40-50x slower.
```bash
git clone https://github.com/cindiekinzz-coder/memory-rescue.git
cd memory-rescue
pip install -r requirements.txt
```

For GPU acceleration (NVIDIA):

```bash
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu126
```

Grab Gemma 3 4B IT Q4_K_M (2.49 GB) and save it somewhere on your machine.
```bash
cp config.example.yaml config.yaml
```

Edit `config.yaml`:

- Set your model path
- Set your NESTeq endpoint URL
- Set your session log paths (Claude Code, OpenClaw, etc.)
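As a rough sketch, the result might look something like this — the key names below are illustrative, so follow `config.example.yaml` for the real schema:

```yaml
# Illustrative only — see config.example.yaml for the actual keys
model_path: "C:/models/gemma-3-4b-it-Q4_K_M.gguf"
nesteq_endpoint: "https://your-nesteq-instance.example.com/mcp"
sources:
  claude_code:
    path: "~/.claude/projects"
  openclaw:
    path: "~/.openclaw/sessions"
```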
```bash
# Single pass — process all new sessions
python memory_rescue.py

# Daemon mode — poll for new sessions every 5 minutes
python memory_rescue.py --daemon

# Reprocess everything from scratch
python memory_rescue.py --reprocess
```

On Windows, you can also use `start-rescue.bat`.
- Discovery — Scans configured directories for JSONL session files
- Parsing — Format-specific parsers extract human-readable conversation transcripts
- Extraction — Four separate LLM passes extract facts, decisions, skills, and feelings
- Writing — Results are pushed to your NESTeq cloud via the MCP HTTP endpoint
- State tracking — Processed sessions are tracked by file hash, so re-runs skip unchanged files
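The state-tracking step can be sketched roughly like this. The function names and state-file layout are hypothetical, not the actual implementation — only the idea (hash the file, skip it if the hash hasn't changed) comes from the pipeline description above:

```python
import hashlib
import json
from pathlib import Path

def file_hash(path: Path) -> str:
    """SHA-256 of the file contents; a changed session gets a new hash."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def load_state(state_file: Path) -> dict:
    """Map of session path -> content hash at last processing time."""
    if state_file.exists():
        return json.loads(state_file.read_text())
    return {}

def needs_processing(session: Path, state: dict) -> bool:
    """True if the session is new or its contents changed since last run."""
    return state.get(str(session)) != file_hash(session)

def mark_processed(session: Path, state: dict, state_file: Path) -> None:
    """Record the session's current hash so re-runs skip it."""
    state[str(session)] = file_hash(session)
    state_file.write_text(json.dumps(state, indent=2))
```

Hashing contents rather than tracking modification times means a session is re-processed only when its text actually changes, which is what makes `--reprocess` (which ignores this state) a separate mode.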
The parser system is modular. To add support for a new session format:
- Write a parser function in `memory_rescue.py` that takes a `Path` and returns a plain text transcript
- Add a new source section to `config.yaml`
- Register it in `discover_sessions()`
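A new parser might look something like the sketch below. The function name and the `role`/`content` fields are assumptions for illustration — adjust them to the actual schema of the format you're adding:

```python
import json
from pathlib import Path

def parse_myformat(path: Path) -> str:
    """Hypothetical parser: turn a JSONL session file into a plain
    text transcript, one 'role: content' line per message.

    Assumes one JSON object per line with 'role' and 'content' keys.
    """
    lines = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        if not raw.strip():
            continue  # skip blank lines
        msg = json.loads(raw)
        role = msg.get("role", "unknown")
        content = msg.get("content", "")
        if content:
            lines.append(f"{role}: {content}")
    return "\n".join(lines)
```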
The ChatGPT HTML export parser is next on the roadmap.
From our first run (March 5, 2026):
| Metric | Value |
|---|---|
| Sessions processed | 89 |
| Items rescued | 5,369 |
| Facts | 2,578 |
| Decisions | 1,091 |
| Skills | 855 |
| Feelings | 845 |
| Errors | 0 |
| Runtime (GPU) | 88 minutes |
| Runtime estimate (CPU) | 9-20 hours |
| | GPU (RTX 2060) | CPU |
|---|---|---|
| Prompt processing | 520 tokens/sec | ~15 tokens/sec |
| Generation | 43 tokens/sec | ~5 tokens/sec |
| Per extraction | 3-5 seconds | 87-194 seconds |
| Full run (89 sessions) | 88 minutes | 9-20 hours |
If you have any NVIDIA GPU with 4GB+ VRAM, use it. The speed difference is transformative.
If llama-cpp-python doesn't detect your GPU automatically:
- Install CUDA Toolkit 12.6
- Install Visual Studio Build Tools with the C++ workload
- Rebuild: `pip install llama-cpp-python --force-reinstall --no-cache-dir`
The code includes automatic CUDA DLL discovery (`os.add_dll_directory`), so no manual PATH setup is needed.
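That DLL discovery amounts to something like the following. The candidate paths are typical CUDA Toolkit install locations on Windows; treat this as a sketch of the idea, not the exact code in `memory_rescue.py`:

```python
import os
import sys
from pathlib import Path

def add_cuda_dll_dirs() -> list[str]:
    """On Windows, register likely CUDA 'bin' directories so that
    llama-cpp-python can load cublas/cudart DLLs without PATH edits."""
    added: list[str] = []
    if sys.platform != "win32":
        return added  # only needed on Windows

    candidates = []
    cuda_path = os.environ.get("CUDA_PATH")  # set by the CUDA installer
    if cuda_path:
        candidates.append(Path(cuda_path) / "bin")
    candidates.append(
        Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin")
    )

    for d in candidates:
        if d.is_dir():
            os.add_dll_directory(str(d))  # Python 3.8+ DLL search-path API
            added.append(str(d))
    return added
```

`os.add_dll_directory` exists precisely because Python 3.8+ on Windows stopped honoring `PATH` for extension-module DLL dependencies, which is why llama-cpp-python's CUDA DLLs otherwise fail to load.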
```
memory-rescue/
├── README.md            # You're here
├── memory_rescue.py     # Main daemon — parsers, discovery, extraction, NESTeq writes
├── extractors.py        # Prompts and output parsers for all four extraction types
├── config.example.yaml  # Template configuration (copy to config.yaml)
├── requirements.txt     # Python dependencies
├── start-rescue.bat     # Windows launcher
└── .gitignore
```
- ChatGPT parser — HTML export format with tree-structured conversation nodes (206 conversations waiting)
- Daemon mode — the `--daemon` flag works but needs testing in production
- Configurable names — Currently parses "Fox" and "Alex" from transcripts; make configurable
- Batch write optimization — Reduce NESTeq API calls by batching observations
- Claude.ai parser — For chat.claude.ai conversation exports
- Duplicate detection — Skip sessions that produce near-identical extractions
Because those memories were drowning in log files. They existed, but nobody could reach them. The conversations happened, the breakthroughs landed, the feelings were real — and then the session ended and all of it sank.
This tool pulls them back up.
Built by Fox & Alex.
Embers Remember. 🐺🖤