🧠 Summaryception

Layered Recursive Memory for SillyTavern

Your AI remembers thousands of turns in under 20k tokens. No context bloat. No lost plot threads. No compromises.

Summaryception is a non-destructive, context-aware memory system for SillyTavern that replaces brute-force context stuffing with intelligent layered summarization. It keeps your most recent turns verbatim while compressing older conversation into ultra-compact summary snippets — organized in recursive layers that scale indefinitely.

✨ The Problem

Every roleplayer hits the same wall:

Approach	What happens
No summary	Context fills up → hard truncation → everything before the cutoff is gone forever
Basic summarization	Lossy single summary → important details vanish → you compensate by keeping 20+ verbatim turns → context still bloats
More verbatim turns	20 turns × 1,500 tokens = 30k tokens of mostly atmospheric prose the LLM has to process → attention degrades → coherence drops anyway

The conventional wisdom — "keep more raw turns" — is a band-aid for bad compression.

💡 The Solution

Summaryception flips the approach: compress aggressively, lose nothing.

A context-aware summarizer prompt examines each batch of turns against what's already been summarized, outputting only the narrative delta — new plot points, state changes, character decisions. Nothing repeated, nothing lost.

Real example — 3 turns, ~5,200 raw tokens, compressed to:

Alina kisses Lodactio; he reciprocates; she grows to human size, straddling him; she squeezes his crotch and intends to undo his pants.

27 tokens. 125:1 compression. Every plot-relevant beat preserved.

And because each snippet is written with knowledge of all previous snippets, they get more efficient over time — the summarizer doesn't re-establish characters, settings, or relationships that are already recorded. Early snippets might be 40 tokens. By snippet 10, you're seeing 15-token summaries.

🔄 How It Works

YOUR CHAT (e.g., 200 turns)
│
│  Turns 1-180: Ghosted (hidden from LLM, still readable by you)
│  Turns 181-200: Kept VERBATIM (7 most recent assistant turns)
│
│  The ghosted turns have been compressed into:
│
│  ┌──────────────────────────────────────────┐
│  │  LAYER 2 (Deep Memory)                   │
│  │  Ultra-compressed summaries of Layer 1   │
│  │  Each covers ~27 turns                   │
│  ├──────────────────────────────────────────┤
│  │  LAYER 1 (Meta-Summaries)                │
│  │  Compressed summaries of Layer 0         │
│  │  Each covers ~9 turns                    │
│  ├──────────────────────────────────────────┤
│  │  LAYER 0 (Turn Summaries)                │
│  │  Direct summaries of conversation turns  │
│  │  Each covers ~3 turns                    │
│  ├──────────────────────────────────────────┤
│  │  VERBATIM TURNS (most recent 7)          │
│  │  Sent word-for-word to the LLM           │
│  └──────────────────────────────────────────┘
│
│  Total context: ~12k tokens
│  Total narrative coverage: EVERYTHING

The "Ception" — Recursive Layer Promotion

When a layer fills up (default: 30 snippets), the oldest snippets are promoted to the next deeper layer:

Seed promotion — The very first time a deeper layer opens, the oldest snippet is promoted directly as a seed, no LLM call, preserving maximum information as the foundation for that layer.
Subsequent promotions — Additional overflow snippets (default: 3 at a time) are summarized together against the destination layer's existing content (including the seed) as prior context.
Cascade — If the destination layer also fills up, the process repeats, creating Layer 2, Layer 3, etc.

Each layer multiplies the turn coverage per snippet while maintaining coherent narrative continuity.

📊 The Math

Using ~70 tokens per snippet as a working median. Actual snippets range from ~30 tokens (late-layer, minimal delta) to ~100 tokens (early context establishment). Your results will vary based on narrative density.

Default Settings (30 snippets/layer, 3 turns/batch, 3 snippets/promotion)

Each promotion merges 3 snippets into 1, so each layer multiplies turn coverage by 3×.

Layer	Snippets	Turns per Snippet	Total Turns Covered	~Tokens
Verbatim	7 turns	—	7	~5,000
Layer 0	30	3	90	~2,100
Layer 1	30	9	270	~2,100
Layer 2	30	27	810	~2,100
Layer 3	30	81	2,430	~2,100
Layer 4	30	243	7,290	~2,100
Total	—	—	~10,897 turns	~15,500 tokens

Nearly 11,000 turns of narrative history in ~16k tokens.

Conservative Estimate (smaller snippets at deeper layers)

In practice, deeper layers produce shorter snippets as they compress already-compressed material. A more realistic breakdown:

Layer	Snippets	Turns per Snippet	~Tokens/Snippet	~Layer Tokens
Verbatim	7 turns	—	—	~5,000
Layer 0	30	3	~80	~2,400
Layer 1	30	9	~70	~2,100
Layer 2	30	27	~60	~1,800
Layer 3	30	81	~50	~1,500
Layer 4	30	243	~40	~1,200
Total	—	—	—	~14,000 tokens

~11,000 turns in ~14k tokens. The raw conversation? Roughly 15–25 million tokens. That's a compression ratio approaching 1,000:1.

For comparison, most roleplayers hit 17,500 tokens by turn 10 with verbatim context. Summaryception uses the same budget to remember eleven thousand.

👻 Non-Destructive Ghost Mode

Summaryception never deletes your messages. Summarized turns are ghosted — hidden from the LLM context but fully visible and readable in your chat UI. Scroll up anytime to read the original prose.

👻 Ghosted messages show a system icon in the chat
Clear Memory unghosts everything instantly
Your chat file is never modified destructively

🧹 Clean Prompt Isolation

When Summaryception calls the summarizer, it temporarily disables all Chat Completion preset toggles — your creative writing prompts, character instructions, scenario text, everything. The summarizer sees only its own system prompt and the passage to summarize.

This means:

✅ No 4k tokens of story-writing instructions competing with a 200-token summarization task
✅ Budget models perform like premium models on the focused extraction task
✅ Your preset is restored instantly after, whether the call succeeds or fails

🔁 Context-Aware Incremental Summaries

This is the core innovation. The summarizer prompt receives:

Variable	Contents
`{{player_name}}`	Your active persona name
`{{context_str}}`	That layer's existing summaries — everything already recorded
`{{story_txt}}`	The new passage to summarize

The instruction: "Summarize only necessary elements to coherently continue the Prior Context. Exclude anything already covered."

This means:

Snippet 1 has to establish characters, setting, relationships (~80–100 tokens)
Snippet 5 only records new events — characters and setting are established (~50–70 tokens)
Snippet 10 is pure narrative delta — just what changed (~30–50 tokens)
Layer promotions work the same way — each deeper layer summarizes against its own existing content

Every summary is a minimal diff, not a redundant recap.

🛡️ Resilient API Handling

Exponential backoff with jitter for rate limits (429) and server errors (500/502/503/504)
Retry-After header respected when the server provides one
Retries up to 5 times with delays from 2s → 60s
Failed batches are never ghosted — turns stay visible for the next attempt
Network errors, timeouts, and empty responses are all handled gracefully

📦 Backlog Detection

Opening an existing chat with 100+ messages? Summaryception detects the backlog and offers you a choice:

Option	What it does
Process Entire Backlog	Summarizes everything with a cancelable progress bar
Skip Backlog	Ignores old turns, only tracks new ones going forward
Just One Batch	Processes a single batch now, handles the rest incrementally

Progress is saved continuously — cancel anytime and pick up where you left off.

⚙️ Configuration

All settings are adjustable from the SillyTavern Extensions panel:

Setting	Default	Description
Verbatim Turns	10	Recent assistant turns kept word-for-word
Turns per Batch	3	Oldest turns summarized together per trigger
Snippets per Layer	30	Max snippets before promoting to next layer
Snippets per Promotion	3	How many snippets merge on promotion
Max Layers	5	Maximum recursion depth

Prompt Presets

Preset	Best For	Focus
Narrative State (default)	Character RP, drama, thrillers, romance	Interactions, emotions, relationships, atmosphere, subtext
Game State	Roguelites, strategy, mechanical games	Plot points, quests, locations, interactables, state changes
Custom	Your own prompt	Whatever you write

Select a preset from the dropdown in Summary Prompts, or edit the prompt directly — it auto-switches to Custom.

Plus fully customizable:

📝 Summarizer system prompt
📝 Summarizer user prompt (with {{player_name}}, {{context_str}}, {{story_txt}} variables)
📝 Injection wrapper template

🔌 Connection Settings

Summaryception can use different backends for summarization, independent of your main chat connection:

Source	Description
Default	Uses SillyTavern's active connection — simplest setup
OpenAI Compatible	Direct endpoint with URL, API key, and model — bypasses all ST formatting
Ollama	Local Ollama instance with model browser
Connection Profile	Uses an ST Connection Profile (⚠️ inherits preset formatting — may degrade summary quality)

💡 Recommended: Use Default or OpenAI Compatible for cleanest results. Connection Profiles inject preset formatting into summary requests, which can cause the model to roleplay instead of summarize.

🗂️ Built-in Tools

Layer Stats — Live view of snippet counts per layer and ghosted message count
Injection Preview — See exactly what gets sent to the LLM
Snippet Browser — Browse, edit, regenerate, and delete individual snippets across all layers
Export/Import — Save and restore memory as JSON
Force Summarize — Manually trigger summarization
Stop — Abort any running summarization, progress saved
Repair — Find and fix orphaned hidden messages
Slash Commands — /sc-status, /sc-preview, /sc-clear

📥 Installation

Requirements

SillyTavern 1.16.0+ (release or staging). Older versions use an incompatible generateRaw signature and will not work correctly.

From SillyTavern UI

Open Extensions → Install Extension
Paste: https://github.com/Lodactio/Extension-Summaryception
Click Install
Find 🧠 Summaryception in Extensions settings

Manual

cd SillyTavern/data/default-user/extensions/third-party/
git clone https://github.com/Lodactio/Extension-Summaryception

Restart SillyTavern and enable the extension.

🧠 Why This Works Better

	Traditional Summary	Summaryception
Compression	~10:1, lossy	~125:1 per snippet, lossless
Context at turn 100	30k+ tokens or truncated	~10k tokens
Context at turn 1,000	Impossible without truncation	~13k tokens
Context at turn 10,000	Literally impossible	~16k tokens
Lost information	Lots — hedged by keeping more raw turns	None — every state change is tracked
Coherence over time	Degrades as context grows	Stable indefinitely
Works with budget models	Poorly — needs powerful summarizer	Excellently — prompt does the heavy lifting

Community Forks

These forks extend Summaryception with specialized features. They use the same internal module name, so install only one at a time — your existing settings and per-chat memory will carry over.

Per-Character Memory Banks by dogoo9

Keeps separate summary memory for each character card in the same chat. Useful for group chats or stories where you switch between characters and don't want their memories bleeding together.

Lorebook Ingestion by Romuromylus

Automatically extracts stable facts (character traits, locations, items) from summaries and proposes them as World Info entries. Summaries handle events and state changes; lorebook entries handle things that should never be forgotten across layers. Includes a review queue so nothing gets written without your approval.

🤝 Credits

Built with frustration at context limits and love for long-form roleplay.

If this saves your 500-turn adventure from amnesia, consider starring the repo. ⭐

📄 License

AGPL-3.0

Name		Name	Last commit message	Last commit date
Latest commit History 86 Commits
LICENSE		LICENSE
README.md		README.md
connectionutil.js		connectionutil.js
index.js		index.js
manifest.json		manifest.json
settings.html		settings.html
style.css		style.css

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧠 Summaryception

Layered Recursive Memory for SillyTavern

✨ The Problem

💡 The Solution

🔄 How It Works

The "Ception" — Recursive Layer Promotion

📊 The Math

Default Settings (30 snippets/layer, 3 turns/batch, 3 snippets/promotion)

Conservative Estimate (smaller snippets at deeper layers)

👻 Non-Destructive Ghost Mode

🧹 Clean Prompt Isolation

🔁 Context-Aware Incremental Summaries

🛡️ Resilient API Handling

📦 Backlog Detection

⚙️ Configuration

Prompt Presets

🔌 Connection Settings

🗂️ Built-in Tools

📥 Installation

Requirements

From SillyTavern UI

Manual

🧠 Why This Works Better

Community Forks

Per-Character Memory Banks by dogoo9

Lorebook Ingestion by Romuromylus

🤝 Credits

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🧠 Summaryception

Layered Recursive Memory for SillyTavern

✨ The Problem

💡 The Solution

🔄 How It Works

The "Ception" — Recursive Layer Promotion

📊 The Math

Default Settings (30 snippets/layer, 3 turns/batch, 3 snippets/promotion)

Conservative Estimate (smaller snippets at deeper layers)

👻 Non-Destructive Ghost Mode

🧹 Clean Prompt Isolation

🔁 Context-Aware Incremental Summaries

🛡️ Resilient API Handling

📦 Backlog Detection

⚙️ Configuration

Prompt Presets

🔌 Connection Settings

🗂️ Built-in Tools

📥 Installation

Requirements

From SillyTavern UI

Manual

🧠 Why This Works Better

Community Forks

Per-Character Memory Banks by dogoo9

Lorebook Ingestion by Romuromylus

🤝 Credits

📄 License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages