This is the reference for the Python surface — every class, every method, every parameter. If you arrived here while writing consumer code, you're in the right place. If you're trying to start using TardigradeDB, the Quickstart is friendlier; this page assumes you already know what you're looking for and want the signature.
The surface has two main classes. TardigradeClient is the high-level facade you'll use for most things — it bundles the engine, file ingestion, multi-view consolidation, and the query path behind one object so you don't have to wire those pieces together yourself. KnowledgePackStore is the lower-level consumer for HuggingFace direct injection — use it when you want zero-token KV cache injection into model.generate() rather than the convenience of the facade.
The high-level entry point. Bundles Engine, FileIngestor, MemoryConsolidator, and the query path behind a unified API — so you can write client.store(...) and client.query(...) without instantiating the engine or the ingestion machinery yourself.
```python
TardigradeClient(
    db_path,
    *,
    tokenizer=None,
    owner=1,
    kv_capture_fn=None,
    vamana_threshold=9999,
)
```

- `db_path`: `str | Path`. Directory for persistent storage; the engine is created internally and lives here for the client's lifetime.
- `tokenizer`: a tokenizer with `.encode()` / `.decode()` methods. Required for real KV capture; omit it only for the random-stub testing path described under `kv_capture_fn`.
- `owner`: owner id for memory isolation across agents or tenants (default: `1`). One client always operates under one owner; create separate clients for separate owners.
- `kv_capture_fn`: `(chunk_text, tokenizer) -> (key, layer_payloads)`. The client calls this function to turn a chunk of text into a retrieval key (the vector used for similarity scoring) plus the per-layer KV tensors that get persisted as the pack. If you pass `None`, the client falls back to a random-vector stub. That is fine for smoke-testing the API shape but produces near-random retrieval. For real use, supply a function that drives a forward pass on your model; see `knowledge-pack-store.md` for the canonical HuggingFace bridge.
- `vamana_threshold`: pack count at which the engine starts using its Vamana ANN graph instead of brute-force search. The default of `9999` keeps brute-force search on for small workloads, where it's faster anyway.
| Method | Returns | Description |
|---|---|---|
| `store(fact_text, *, salience=80.0)` | `int` (pack_id) | Store a single fact as a KV pack |
| `query(query_text, *, k=5)` | `list[dict]` | Retrieve top-k packs |
| `ingest_text(text, *, document_id=None, chunk_size=512)` | `IngestResult` | Chunk and ingest a text document |
| `ingest_file(path, *, document_id=None, chunk_size=512)` | `IngestResult` | Read a file and ingest it |
| `consolidate(pack_id)` | `int` | Attach multi-view retrieval keys to one pack; returns the number of views attached |
| `consolidate_all()` | `dict[int, int]` | Consolidate all eligible packs; returns `{pack_id: views_attached}` |
| `list_packs()` | `list[dict]` | All packs for this owner |
| `pack_count()` | `int` | Number of packs for this owner |
| `engine` | `Engine` | Direct access to the underlying `tardigrade_db.Engine` |
Full end-to-end injection via HuggingFace models. Use when you need KV cache injection directly into model.generate().
```python
KnowledgePackStore(engine, model, tokenizer, owner=1, query_layer=None)
```

- `engine`: a pre-built `tardigrade_db.Engine` instance.
- `model`: a HuggingFace causal LM (`AutoModelForCausalLM` or compatible).
- `tokenizer`: the matching tokenizer, including its chat template. The store path wraps text in `tokenizer.apply_chat_template(...)` before the forward pass.
- `owner`: owner id for memory isolation (default: `1`).
- `query_layer`: which transformer layer's hidden states to read for the retrieval key. The default is roughly two-thirds of the way through the model (`int(num_hidden_layers * 0.67)`). This heuristic works well for most uniform-softmax models because the middle-to-late layers carry the most semantic signal: earlier layers are too lexical, and the final layers are too output-shaped. For hybrid-attention models the heuristic doesn't apply; use `CalibrationRegistry` instead.
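The default `query_layer` heuristic is plain arithmetic. Below is a minimal sketch that assumes the default is exactly `int(num_hidden_layers * 0.67)`, as stated above. `default_query_layer` is a hypothetical helper, not part of `tardigrade_hooks`. With a HuggingFace model you would normally feed it `model.config.num_hidden_layers`.

```python
def default_query_layer(num_hidden_layers: int) -> int:
    """Hypothetical helper mirroring the documented default:
    roughly two-thirds of the way through the transformer stack."""
    if num_hidden_layers <= 0:
        raise ValueError("num_hidden_layers must be positive")
    # int() truncates toward zero, so 28 layers -> int(18.76) -> layer 18
    return int(num_hidden_layers * 0.67)


for layers in (24, 28, 32):
    print(layers, "->", default_query_layer(layers))
```

Passing an explicit `query_layer` bypasses this arithmetic entirely, which is what the `CalibrationRegistry` route is for on hybrid-attention models.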
Store a fact as a KV cache pack. Returns the assigned `pack_id`.

```python
pack_id = kps.store("User prefers morning meetings before 10am")
```

Store a fact and link it to an existing memory.

```python
existing = kps.store("Went to bookstore in Pilsen")
kps.store_and_link("The bookstore is called Casa Azul", existing)
```

Store multiple related facts and link them all to each other.

```python
kps.store_linked([
    "Lucia's instructor is Tomoko",
    "Tomoko drives a Honda Civic",
])
```

Delete a memory permanently.
Retrieve the best matching memory, inject its KV cache, and generate a response.
Returns `(generated_text, prompt_tokens, had_memory)`.
Retrieve with trace-boosted scoring, follow trace links, compose multiple packs, inject, and generate.
Lower-level: retrieve and build a `DynamicCache` without generating.
Returns `(cache, query_ids, attention_mask)` or `(None, query_ids, None)`.
Retrieve k packs and compose them, without following trace links.
⚠️ Validation status. The 2026-05-14 bench audit found that every RLS mode (keyword / multiphrasing / embedding / generative / agent) underperforms the no-RLS baseline on clean LoCoMo; the DeepSeek agent reformulator loses 12.7pp. The API is documented below for completeness, but RLS is not the recommended retrieval path today. See `docs/guide/concepts.md` § Reflective Latent Search and the bench audit before reaching for it.
Agentic retrieval loop that reformulates queries when the initial retrieval is not confident.
```python
ReflectiveLatentSearch(
    engine,
    model,
    tokenizer,
    query_layer,
    hidden_size,
    owner=1,
    k=5,
    strategies=None,  # default: [KeywordExpansionStrategy()]
    confidence_threshold=1.10,
    max_attempts=2,
)
```

The loop works as follows. RLS retrieves and evaluates confidence. If confidence is low, it reformulates the query, re-retrieves for each variant, and fuses the rankings via Reciprocal Rank Fusion (RRF).
Confidence is computed as the score of the top-ranked match divided by the score of the second-ranked match — score[0] / score[1]. If that ratio is at or above confidence_threshold (default 1.10), the top result is clearly stronger than the alternatives and RLS returns immediately. If it's below, the retrieval is ambiguous and RLS iterates through the configured strategies, fusing the resulting ranked lists with RRF.
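The confidence test reduces to a ratio check on the ranked scores. The sketch below assumes scores arrive sorted in descending order and are strictly positive, since the ratio is not meaningful otherwise. `is_confident` is a hypothetical name. Treating a result list with fewer than two hits as confident is also an assumption, not documented RLS behavior.

```python
from typing import Sequence

DEFAULT_CONFIDENCE_THRESHOLD = 1.10  # mirrors RLS_DEFAULT_CONFIDENCE_THRESHOLD


def is_confident(scores: Sequence[float],
                 threshold: float = DEFAULT_CONFIDENCE_THRESHOLD) -> bool:
    """Return True when the top hit clearly beats the runner-up.

    Assumes `scores` is sorted descending and every score is > 0.
    """
    if len(scores) < 2:
        # Assumption: with 0 or 1 candidates there is nothing to disambiguate.
        return True
    top, runner_up = scores[0], scores[1]
    return top / runner_up >= threshold


print(is_confident([12.0, 10.0, 4.0]))  # ratio 1.2 -> True: return immediately
print(is_confident([10.5, 10.0, 9.8]))  # ratio 1.05 -> False: reformulate
```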
See concepts.md before reaching for this — RLS is documented but does not currently improve over the no-RLS baseline on clean benchmark data.
Run the full RLS loop. Returns a list of MemoryCellHandle — each handle is a lightweight reference to a stored memory (analogous to a pack id, but the type used through the RLS API specifically; see the MemoryCellHandle entry under tardigrade_hooks below).
```python
from tardigrade_hooks.rls import ReflectiveLatentSearch, KeywordExpansionStrategy

rls = ReflectiveLatentSearch(
    engine, model, tokenizer,
    query_layer=16, hidden_size=1024,
    strategies=[KeywordExpansionStrategy()],
)
handles = rls.query("What outdoor activities does this person enjoy?")
```

All strategies implement `ReformulationStrategy.reformulate(query_text) -> list[str]`.
| Class | Requires | Description |
|---|---|---|
| `KeywordExpansionStrategy()` | nothing | Extract content words, expand via synonym table |
| `MultiPhrasingStrategy()` | nothing | Template-based variants (keyword-only + WH-question form) |
| `EmbeddingExpansionStrategy(tokenizer, embed_weights, top_k=10)` | embedding table | Nearest-neighbor lookup in the model's own vocabulary |
| `GenerativeReformulationStrategy(model, tokenizer, max_new_tokens=40)` | local LLM | Small model rephrases the query (e.g. Qwen2.5-3B) |
| `LLMAgentReformulationStrategy(api_key, model=None)` | external API | Calls DeepSeek (or any OpenAI-compatible API) for vocabulary-bridged reformulations |
Cost order (lowest → highest): keyword < multiphrasing < embedding < generative < agent.
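Every strategy exposes the same `reformulate(query_text) -> list[str]` method, so a custom one only needs that method. The sketch below is duck-typed. `StopwordStripStrategy` and its stopword list are hypothetical. It is an assumption that RLS accepts any object with a matching method rather than requiring a `ReformulationStrategy` subclass; check `tardigrade_hooks.rls` to confirm. The strategy emits a keyword-only variant, similar in spirit to one of the `MultiPhrasingStrategy` templates, at essentially zero cost.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = frozenset({
    "a", "an", "the", "what", "which", "who", "does", "do", "did",
    "is", "are", "was", "this", "that", "to", "of",
})


class StopwordStripStrategy:
    """Hypothetical strategy: emit one keyword-only variant of the query."""

    def reformulate(self, query_text: str) -> list[str]:
        words = re.findall(r"[a-z0-9']+", query_text.lower())
        content = [w for w in words if w not in STOPWORDS]
        variant = " ".join(content)
        # Return nothing when stripping leaves no words or changes nothing,
        # so the caller does not re-run an identical retrieval.
        if not variant or variant == " ".join(words):
            return []
        return [variant]


print(StopwordStripStrategy().reformulate(
    "What outdoor activities does this person enjoy?"
))
```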
```python
from tardigrade_hooks.rls import rrf_fuse_handles

fused = rrf_fuse_handles(handle_lists, k=60)
```

Fuses multiple `MemoryCellHandle` lists via Reciprocal Rank Fusion, deduplicating by `cell_id`.
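Reciprocal Rank Fusion scores each item by summing `1 / (k + rank)` over every list it appears in. Items ranked consistently high across lists therefore rise to the top. The sketch below implements that formula in pure Python, assuming 1-based ranks (the common convention) and the same `k=60` default. `rrf_fuse` is a hypothetical stand-in, not the source of `rrf_fuse_handles`. Its `key` argument plays the role of deduplicating by `cell_id`.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")


def rrf_fuse(ranked_lists: Iterable[list[T]], k: int = 60,
             key: Callable[[T], Hashable] = lambda item: item) -> list[T]:
    """Fuse ranked lists with Reciprocal Rank Fusion.

    Items sharing the same key (e.g. the same cell_id) are merged; the first
    occurrence is kept as the representative object.
    """
    scores: dict = defaultdict(float)
    representative: dict = {}
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):  # 1-based ranks
            item_key = key(item)
            scores[item_key] += 1.0 / (k + rank)
            representative.setdefault(item_key, item)
    # sorted() stays stable with reverse=True, so ties keep first-seen order.
    ordered = sorted(scores, key=scores.__getitem__, reverse=True)
    return [representative[item_key] for item_key in ordered]


print(rrf_fuse([[1, 2, 3], [2, 3, 4]]))  # [2, 3, 1, 4]
```

Item 2 wins even though it tops only one list, because it also ranks first-or-second in the other. That consensus effect is why RLS uses RRF to merge per-variant rankings.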
| Constant | Value | Meaning |
|---|---|---|
| `RLS_DEFAULT_CONFIDENCE_THRESHOLD` | `1.10` | Min score ratio before RLS re-retrieves |
| `RLS_DEFAULT_MAX_ATTEMPTS` | `2` | Max reformulation iterations |
| `RLS_MODE_NONE` | `"none"` | No reformulation |
| `RLS_MODE_KEYWORD` | `"keyword"` | `KeywordExpansionStrategy` |
| `RLS_MODE_MULTIPHRASING` | `"multiphrasing"` | `MultiPhrasingStrategy` |
| `RLS_MODE_EMBEDDING` | `"embedding"` | `EmbeddingExpansionStrategy` |
| `RLS_MODE_GENERATIVE` | `"generative"` | `GenerativeReformulationStrategy` |
| `RLS_MODE_AGENT` | `"agent"` | `LLMAgentReformulationStrategy` |
Stage-2 re-ranking over text-bearing candidates using a cross-encoder model.
```python
from tardigrade_hooks.reranker import CrossEncoderReranker

reranker = CrossEncoderReranker(
    model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",  # 22M params
)
reranked = reranker.rerank(
    query_text="What does Zara do for work?",
    candidates=handles,
    get_text=lambda h: engine.pack_text(h.cell_id),
)
```

Requires candidates to have associated text (stored via `mem_write_pack(..., text=...)` or `set_pack_text()`). Adds ~30% latency over retrieval alone (~86 ms vs ~67 ms p95 on Qwen3-0.6B).
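The stage-2 pattern has three steps: drop candidates with no stored text, score each `(query, text)` pair, and sort by that score. The sketch below shows that control flow. It assumes `get_text` returns `None` for packs without text, which the engine's `pack_text` is documented to do. `rerank_by` and the toy word-overlap scorer are hypothetical stand-ins for the cross-encoder, which scores each pair jointly with a transformer rather than by counting shared words.

```python
import re
from typing import Callable, Optional, Sequence, TypeVar

H = TypeVar("H")


def rerank_by(query_text: str,
              candidates: Sequence[H],
              get_text: Callable[[H], Optional[str]],
              score_pair: Callable[[str, str], float]) -> list[H]:
    """Hypothetical stage-2 reranker: textless candidates are dropped."""
    scored = []
    for candidate in candidates:
        text = get_text(candidate)
        if text is None:
            continue  # nothing for a pair scorer to read
        scored.append((score_pair(query_text, text), candidate))
    # Sort on the score only, so candidates themselves need not be comparable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]


def _words(s: str) -> set:
    return set(re.findall(r"[a-z]+", s.lower()))


def word_overlap(query: str, text: str) -> float:
    """Toy scorer standing in for a cross-encoder."""
    return float(len(_words(query) & _words(text)))


texts = {1: "Zara likes hiking", 2: None, 3: "Zara does work as a nurse"}
print(rerank_by("What does Zara do for work?", [1, 2, 3], texts.get, word_overlap))  # [3, 1]
```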
Token-bounded chunker with configurable overlap.
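Token-bounded chunking with overlap is a sliding window over token ids. Each window holds at most `max_tokens` tokens, and consecutive windows share `overlap_tokens` tokens. The sketch below works over a plain list of token ids and uses the documented defaults of 512 and 64. It is not `TextChunker`'s implementation: `window_token_ids` is hypothetical, and the sketch does not model `min_tokens` or the character offsets (`start_char` / `end_char`) that `Chunk` carries.

```python
def window_token_ids(token_ids: list, max_tokens: int = 512,
                     overlap_tokens: int = 64) -> list:
    """Return (start, end) token offsets of overlapping windows.

    Hypothetical helper: min_tokens and character offsets are not modeled.
    """
    if not 0 <= overlap_tokens < max_tokens:
        raise ValueError("need 0 <= overlap_tokens < max_tokens")
    n = len(token_ids)
    windows = []
    start = 0
    while start < n:
        end = min(start + max_tokens, n)
        windows.append((start, end))
        if end == n:
            break
        # Step back by the overlap so neighbouring chunks share context.
        start = end - overlap_tokens
    return windows


print(window_token_ids(list(range(1000))))  # [(0, 512), (448, 960), (896, 1000)]
```

The effective stride is `max_tokens - overlap_tokens` (448 with the defaults). A larger overlap buys more shared context at the cost of more chunks.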
```python
from tardigrade_hooks.chunker import TextChunker

chunker = TextChunker(
    tokenizer,
    max_tokens=512,  # DEFAULT_CHUNK_TOKENS
    overlap_tokens=64,  # CHUNK_OVERLAP_TOKENS
    min_tokens=32,  # MIN_CHUNK_TOKENS
)
chunks = chunker.chunk(text)  # list of Chunk(text, token_count, start_char, end_char)
```

Ingests a text document as sequential KV memory packs. Consecutive chunks are linked via Supports edges.
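Linking consecutive chunks means N chunks yield N - 1 edges. That is why the example `IngestResult` below reports `chunk_count=3, edge_count=2`. The sketch below shows that bookkeeping and is not the `FileIngestor` implementation. It assumes edges point from each chunk to the next, since the direction is not stated here. The injected callables are hypothetical stand-ins for `engine.mem_write_pack` and `engine.add_pack_edge`, and `ingest_chunks` is a hypothetical name. The edge type value 3 matches `EDGE_SUPPORTS` in `tardigrade_hooks.constants`.

```python
from typing import Callable

EDGE_SUPPORTS = 3  # value listed in tardigrade_hooks.constants


def ingest_chunks(chunks: list,
                  write_pack: Callable[[str], int],
                  add_edge: Callable[[int, int, int], None]) -> tuple:
    """Hypothetical sketch: write one pack per chunk, then link neighbours."""
    pack_ids = [write_pack(chunk) for chunk in chunks]
    edge_count = 0
    # Assumed direction: chunk i -> chunk i + 1.
    for earlier, later in zip(pack_ids, pack_ids[1:]):
        add_edge(earlier, later, EDGE_SUPPORTS)
        edge_count += 1
    return pack_ids, edge_count


# In-memory fakes so the sketch runs without an engine.
written = []
edges = []


def fake_write(chunk: str) -> int:
    written.append(chunk)
    return len(written)  # pack ids 1, 2, 3, ...


def fake_edge(a: int, b: int, edge_type: int) -> None:
    edges.append((a, b, edge_type))


ids, n_edges = ingest_chunks(["intro", "body", "outro"], fake_write, fake_edge)
print(ids, n_edges, edges)  # [1, 2, 3] 2 [(1, 2, 3), (2, 3, 3)]
```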
```python
from tardigrade_hooks.file_ingestor import FileIngestor

ingestor = FileIngestor(
    engine,
    tokenizer=tokenizer,
    owner=1,
    chunker=chunker,  # optional; uses TextChunker(512) by default
    salience=70.0,  # DEFAULT_FILE_INGEST_SALIENCE
    kv_capture_fn=fn,  # (chunk_text, tokenizer) -> (key, layer_payloads)
)
result = ingestor.ingest(text, document_id="readme")
# IngestResult(pack_ids=[1, 2, 3], chunk_count=3, edge_count=2, document_id="readme")
result = ingestor.ingest_file("/path/to/doc.txt")
```

```python
@dataclass
class IngestResult:
    pack_ids: list[int]
    chunk_count: int
    edge_count: int
    document_id: str | None
```

The parent-document pattern: the canonical pack stores the KV tensor; views are additional retrieval surfaces on the same fact, stored as linked packs.
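Multi-view retrieval lets a query match any of a fact's surfaces (the canonical key or one of its views) while still resolving to the canonical pack. The sketch below shows that resolution step. Cosine similarity, max-over-views scoring, the toy 2-D keys, and the helper name `best_packs` are all illustrative assumptions, not TardigradeDB's scoring rule.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def best_packs(query: list, surfaces: dict, k: int = 1) -> list:
    """Score each canonical pack by its best-matching surface (key or view)."""
    pack_scores = {
        pack_id: max(cosine(query, key) for key in keys)
        for pack_id, keys in surfaces.items()
    }
    return sorted(pack_scores, key=pack_scores.__getitem__, reverse=True)[:k]


# Pack 1: canonical key plus a "question" view; pack 2: canonical key only.
surfaces = {
    1: [[1.0, 0.0], [0.0, 1.0]],
    2: [[0.7, 0.7]],
}
print(best_packs([0.1, 1.0], surfaces))  # [1]: matched through its view
```

Without the second surface, pack 2 would win the same query. The extra view is what makes pack 1 reachable from this phrasing.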
```python
from tardigrade_hooks.view_generator import ViewGenerator

gen = ViewGenerator(
    # All keyword-only (constructor uses `*` after `self`)
    model=None,  # required for mode="llm"
    tokenizer=None,  # required for mode="llm"
    framings=("summary", "question", "paraphrase"),  # DEFAULT_VIEW_FRAMINGS
    mode="rule",  # "rule" (no model) or "llm" (HyPE-style LLM questions)
)
views = gen.generate("Tomoko Nishida teaches swimming at the Pilsen aquatic center")
# Returns a list of view strings, one per framing, generated by the
# rule-based strategies (summary / question / paraphrase). Output shape and
# wording depend on the input text and the active framing set.
```

Available framing names: `"summary"`, `"question"`, `"paraphrase"`, `"llm_question"`.
Tier-gated, idempotent multi-view attachment. Only consolidates packs at or above the configured minimum tier.
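Tier gating plus idempotency comes down to two checks per pack: skip anything below `min_tier`, and skip anything that already has views. The sketch below rests on several assumptions. Tiers are taken to be ordered integers (Draft = 0, Validated = 1, Core = 2), consistent with `min_tier=1` meaning Validated in the example below. "Already consolidated" is taken to mean the pack has at least one view. Reporting 0 for already-consolidated packs is also a guess about the return value. `consolidate_eligible` and its inputs are hypothetical.

```python
from typing import Callable

DRAFT, VALIDATED, CORE = 0, 1, 2  # assumed ordering of the tiers


def consolidate_eligible(pack_tiers: dict,
                         view_counts: dict,
                         generate_views: Callable[[int], list],
                         min_tier: int = VALIDATED) -> dict:
    """Hypothetical sketch: attach views to eligible packs exactly once.

    Returns {pack_id: views attached by this call}; view_counts is updated
    in place so a second call attaches nothing.
    """
    attached = {}
    for pack_id, tier in pack_tiers.items():
        if tier < min_tier:
            continue  # tier gate: packs below min_tier are never consolidated
        if view_counts.get(pack_id, 0) > 0:
            attached[pack_id] = 0  # idempotent: views already present
            continue
        views = generate_views(pack_id)
        view_counts[pack_id] = len(views)
        attached[pack_id] = len(views)
    return attached


def three_views(pack_id: int) -> list:
    return ["summary", "question", "paraphrase"]


tiers = {10: DRAFT, 11: VALIDATED, 12: CORE}
counts = {}
print(consolidate_eligible(tiers, counts, three_views))  # {11: 3, 12: 3}
print(consolidate_eligible(tiers, counts, three_views))  # {11: 0, 12: 0}
```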
```python
from tardigrade_hooks.consolidator import MemoryConsolidator

consolidator = MemoryConsolidator(
    engine,
    owner=1,
    view_generator=gen,
    min_tier=1,  # CONSOLIDATION_MIN_TIER: Validated tier
)
n = consolidator.consolidate(pack_id)  # int: views attached to this pack
all_n = consolidator.consolidate_all(owner=1)  # dict[int, int]
```

Background Active Object daemon that runs consolidation sweeps automatically.
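An Active Object here means the sweep owns its own thread and is driven only through `start()`, `stop()`, and a read-only counter. The sketch below shows that shape using the standard `threading` module. It assumes the sweep calls a `consolidate_all()`-style function on a fixed interval and sums the returned per-pack counts. `SweepSketch` is hypothetical, not the `ConsolidationSweepThread` source. It wraps a `Thread` rather than subclassing one, which keeps the worker loop private.

```python
import threading
from typing import Callable, Dict, Optional


class SweepSketch:
    """Hypothetical Active Object: runs consolidate_all() every interval."""

    def __init__(self, consolidate_all: Callable[[], Dict[int, int]],
                 interval_seconds: float = 60.0) -> None:
        self._consolidate_all = consolidate_all
        self._interval = interval_seconds
        self._stop_event = threading.Event()
        self._count_lock = threading.Lock()
        self._views_attached = 0
        self._thread = threading.Thread(target=self._run, daemon=True)

    @property
    def views_attached(self) -> int:
        with self._count_lock:
            return self._views_attached

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: Optional[float] = None) -> None:
        self._stop_event.set()  # wakes the worker out of its wait()
        self._thread.join(timeout)

    def _run(self) -> None:
        while True:
            attached = self._consolidate_all()
            with self._count_lock:
                self._views_attached += sum(attached.values())
            # wait() returns True as soon as stop() sets the event.
            if self._stop_event.wait(self._interval):
                break


sweep = SweepSketch(lambda: {1: 2, 2: 1}, interval_seconds=60)
sweep.start()
sweep.stop()  # returns promptly: one sweep runs, then wait() is interrupted
print(sweep.views_attached)  # 3
```

Using `Event.wait(interval)` instead of `time.sleep` is what lets `stop()` return promptly instead of blocking for up to a full interval.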
```python
from tardigrade_hooks.consolidation_sweep import ConsolidationSweepThread

sweep = ConsolidationSweepThread(consolidator, interval_seconds=60)
sweep.start()
# ...
sweep.stop()
print(f"Total views attached: {sweep.views_attached}")
```

| Method | Description |
|---|---|
| `engine.add_view_keys(pack_id, keys)` | Attach additional retrieval keys to an existing canonical pack |
| `engine.view_count(pack_id)` | Number of views currently attached to a pack |
The low-level Rust engine exposed via PyO3.
```python
import tardigrade_db

engine = tardigrade_db.Engine("/path/to/storage")
```

| Method | Description |
|---|---|
| `mem_write(owner, layer, key, value, salience, parent_cell_id)` | Write a single cell |
| `mem_read(query_key, k, owner)` | Read top-k cells |
| `mem_write_pack(owner, retrieval_key, layer_payloads, salience, text=None)` | Write a multi-layer KV pack with optional fact text |
| `mem_read_pack(query_key, k, owner)` | Read top-k packs |
| `mem_read_pack_with_trace_boost(query_key, k, owner, boost_factor)` | Read with trace-boosted scoring |
| `mem_read_tokens(tokens, k, owner)` | Direct token-level retrieval. `tokens`: `np.ndarray` of shape `(n_tokens, d_model)`, `float32`. `k`: number of top results. `owner`: optional owner filter. Returns the same `ReadResult` as `mem_read_pack`, but skips the Python encode/parse round-trip that `mem_read_pack` uses. |
| Method | Description |
|---|---|
| `load_pack_by_id(pack_id)` | Load a pack directly by ID |
| `add_pack_link(pack_id_1, pack_id_2)` | Create a durable trace link between packs |
| `add_pack_edge(pack_id_1, pack_id_2, edge_type)` | Create a typed edge (use constants: `EDGE_SUPPORTS`, `EDGE_CONTRADICTS`, etc.) |
| `pack_supports(pack_id)` / `pack_contradicts(pack_id)` | Query semantic edges |
| `pack_links(pack_id)` | All packs linked to a given pack |
| `pack_count()` | Total number of packs stored |
| `pack_importance(pack_id)` | Current importance score |
| `pack_text(pack_id)` | Get stored fact text (`None` if not stored) |
| `set_pack_text(pack_id, text)` | Set or update fact text |
| `delete_pack(pack_id)` | Permanently delete a pack |
| `add_view_keys(pack_id, keys)` | Attach additional retrieval keys (multi-view v2) |
| `view_count(pack_id)` | Views attached to a pack |
| Method | Description |
|---|---|
| `cell_importance(cell_id)` | Current importance score |
| `cell_tier(cell_id)` | Current tier (Draft/Validated/Core) |
| `advance_days(days)` | Simulate time passage for decay |
| `evict_draft_packs(owner)` | Remove all Draft-tier packs for an owner |
| Method | Description |
|---|---|
| `cell_count()` | Total cells in engine |
| `trace_ancestors(cell_id)` | Get causal parent chain |
| `has_vamana()` | Whether ANN index is active |
| `status()` | Engine health + metrics dict |
| `compact()` | Trigger segment compaction: a mark-sweep GC that walks the segment files, drops cells that have been deleted, and rewrites the live segments to reclaim disk space. Safe to call at any time; runs incrementally and won't block reads. |
| `refresh()` | Reload WAL + rebuild derived state |
| `set_refinement_mode(mode, **kwargs)` | Configure query-side refinement. `mode` is `"none"` (raw retrieval), `"centered"` (subtract the corpus mean from query and keys before scoring), or `"prf"` (Rocchio-style pseudo-relevance feedback in K-space). See `docs/experiments/vague_queries/results.md`. |
```python
from tardigrade_hooks.constants import (
    EDGE_CAUSED_BY,  # 0
    EDGE_FOLLOWS,  # 1
    EDGE_CONTRADICTS,  # 2
    EDGE_SUPPORTS,  # 3
)
```