Python API Reference

This is the reference for the Python surface — every class, every method, every parameter. If you arrived here while writing consumer code, you're in the right place. If you're trying to start using TardigradeDB, the Quickstart is friendlier; this page assumes you already know what you're looking for and want the signature.

The surface has two main classes. TardigradeClient is the high-level facade you'll use for most things — it bundles the engine, file ingestion, multi-view consolidation, and the query path behind one object so you don't have to wire those pieces together yourself. KnowledgePackStore is the lower-level consumer for HuggingFace direct injection — use it when you want zero-token KV cache injection into model.generate() rather than the convenience of the facade.

TardigradeClient

The high-level entry point. Bundles Engine, FileIngestor, MemoryConsolidator, and the query path behind a unified API — so you can write client.store(...) and client.query(...) without instantiating the engine or the ingestion machinery yourself.

Constructor

TardigradeClient(
    db_path,
    *,
    tokenizer=None,
    owner=1,
    kv_capture_fn=None,
    vamana_threshold=9999,
)
  • db_path: str | Path. Directory for persistent storage; the engine is created internally and lives here for the client's lifetime.
  • tokenizer — a tokenizer with .encode() / .decode() methods. Required for real KV capture; omit it only for the random-stub testing path below.
  • owner — owner id for memory isolation across agents or tenants (default: 1). One client always operates under one owner; create separate clients for separate owners.
  • kv_capture_fn(chunk_text, tokenizer) -> (key, layer_payloads). The function the client calls to turn a chunk of text into a retrieval key (the vector used for similarity scoring) plus the per-layer KV tensors that get persisted as the pack. If you pass None, the client falls back to a random-vector stub — fine for smoke-testing the API shape but produces near-random retrieval. For real use, supply a function that drives a forward pass on your model; see knowledge-pack-store.md for the canonical HuggingFace bridge.
  • vamana_threshold — pack count at which the engine starts using its Vamana ANN graph instead of brute-force search. The default of 9999 keeps brute-force on for small workloads where it's faster anyway.

Methods

| Method | Returns | Description |
| --- | --- | --- |
| store(fact_text, *, salience=80.0) | int (pack_id) | Store a single fact as a KV pack |
| query(query_text, *, k=5) | list[dict] | Retrieve top-k packs |
| ingest_text(text, *, document_id=None, chunk_size=512) | IngestResult | Chunk and ingest a text document |
| ingest_file(path, *, document_id=None, chunk_size=512) | IngestResult | Read a file and ingest it |
| consolidate(pack_id) | int | Attach multi-view retrieval keys to one pack; returns views attached |
| consolidate_all() | dict[int, int] | Consolidate all eligible packs; returns {pack_id: views_attached} |
| list_packs() | list[dict] | All packs for this owner |
| pack_count() | int | Number of packs for this owner |
| engine | Engine | Direct access to the underlying tardigrade_db.Engine |

KnowledgePackStore

Full end-to-end injection via HuggingFace models. Use when you need KV cache injection directly into model.generate().

Constructor

KnowledgePackStore(engine, model, tokenizer, owner=1, query_layer=None)
  • engine — pre-built tardigrade_db.Engine instance.
  • model — HuggingFace causal LM (AutoModelForCausalLM or compatible).
  • tokenizer — the matching tokenizer, including its chat template — the store path wraps text in tokenizer.apply_chat_template(...) before the forward pass.
  • owner — owner id for memory isolation (default: 1).
  • query_layer — which transformer layer's hidden states to read for the retrieval key. Defaults to roughly two-thirds of the way through the model (int(num_hidden_layers × 0.67)), a heuristic that works well for most uniform-softmax models because the middle-to-late layers carry the most semantic signal — earlier layers are too lexical, the final layers are too output-shaped. For hybrid-attention models the heuristic doesn't apply; use CalibrationRegistry instead.
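To make the default concrete, here is a minimal sketch of the two-thirds heuristic quoted above. The helper name default_query_layer is hypothetical and exists only for this illustration; the sketch mirrors the int(num_hidden_layers × 0.67) formula and is not the library's own code.

```python
def default_query_layer(num_hidden_layers: int, fraction: float = 0.67) -> int:
    """Hypothetical helper mirroring the documented heuristic."""
    if num_hidden_layers <= 0:
        raise ValueError("num_hidden_layers must be positive")
    # int() truncates toward zero, so the result stays below num_hidden_layers.
    return int(num_hidden_layers * fraction)


# A few common depths and the layer the heuristic would pick for each.
for n_layers in (24, 28, 32):
    print(n_layers, "->", default_query_layer(n_layers))
```

For a 24-layer model this picks layer 16, for 28 layers it picks 18, and for 32 layers it picks 21. Pass query_layer explicitly whenever you want a different layer.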

Storage Methods

store(fact_text, salience=80.0)

Store a fact as a KV cache pack. Returns the assigned pack_id.

pack_id = kps.store("User prefers morning meetings before 10am")

store_and_link(fact_text, related_pack_id, salience=80.0)

Store a fact and link it to an existing memory.

existing = kps.store("Went to bookstore in Pilsen")
kps.store_and_link("The bookstore is called Casa Azul", existing)

store_linked(facts, salience=80.0)

Store multiple related facts and link them all to each other.

kps.store_linked([
    "Lucia's instructor is Tomoko",
    "Tomoko drives a Honda Civic",
])

forget(pack_id)

Delete a memory permanently.

Retrieval + Injection Methods

generate(query_text, **gen_kwargs)

Retrieve the best matching memory, inject its KV cache, generate a response.

Returns (generated_text, prompt_tokens, had_memory).

generate_with_trace(query_text, k=1, composer=None, boost_factor=0.3, **gen_kwargs)

Retrieve with trace-boosted scoring, follow trace links, compose multiple packs, inject, generate.

retrieve_and_inject(query_text)

Lower-level: retrieve and build DynamicCache without generating.

Returns (cache, query_ids, attention_mask) when a memory is retrieved, or (None, query_ids, None) when none is.

generate_multi(query_text, k=3, composer=None, **gen_kwargs)

Retrieve k packs and compose them. No trace link following.


Reflective Latent Search (RLS)

⚠️ Validation status. The 2026-05-14 bench audit found that every RLS mode (keyword / multiphrasing / embedding / generative / agent) underperforms the no-RLS baseline on clean LoCoMo; the DeepSeek agent reformulator loses 12.7pp. The API is documented below for completeness, but RLS is not the recommended retrieval path today. See docs/guide/concepts.md § Reflective Latent Search and the bench audit before reaching for it.

Agentic retrieval loop that reformulates queries when the initial retrieval is not confident.

ReflectiveLatentSearch

ReflectiveLatentSearch(
    engine,
    model,
    tokenizer,
    query_layer,
    hidden_size,
    owner=1,
    k=5,
    strategies=None,         # default: [KeywordExpansionStrategy()]
    confidence_threshold=1.10,
    max_attempts=2,
)

The loop: retrieve, evaluate confidence, and if confidence is low, reformulate the query, re-retrieve for each variant, and fuse the rankings via Reciprocal Rank Fusion (RRF).

Confidence is computed as the score of the top-ranked match divided by the score of the second-ranked match — score[0] / score[1]. If that ratio is at or above confidence_threshold (default 1.10), the top result is clearly stronger than the alternatives and RLS returns immediately. If it's below, the retrieval is ambiguous and RLS iterates through the configured strategies, fusing the resulting ranked lists with RRF.
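The confidence gate described above can be sketched in a few lines. This is an illustration under stated assumptions, not the RLS implementation: the function name is_confident is hypothetical, scores are assumed to be positive and sorted in descending order, and the handling of fewer than two results or a non-positive runner-up is a guess at sensible behavior.

```python
def is_confident(scores: list[float], threshold: float = 1.10) -> bool:
    """Return True when the top match clearly beats the runner-up.

    `scores` is assumed to be sorted in descending order.
    """
    if len(scores) < 2:
        # Assumption: a single candidate has nothing to be ambiguous with.
        return len(scores) == 1
    top, runner_up = scores[0], scores[1]
    if runner_up <= 0:
        # Assumption: avoid dividing by zero; confident only if top is positive.
        return top > 0
    return top / runner_up >= threshold


print(is_confident([0.92, 0.70]))  # ratio is about 1.31
print(is_confident([0.81, 0.80]))  # ratio is about 1.01
```

The first call clears the 1.10 threshold, so a loop built on this gate would return immediately; the second does not, so it would go on to reformulate.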

See concepts.md before reaching for this — RLS is documented but does not currently improve over the no-RLS baseline on clean benchmark data.

query(question, top_k=None) → list[MemoryCellHandle]

Run the full RLS loop. Returns a list of MemoryCellHandle — each handle is a lightweight reference to a stored memory (analogous to a pack id, but the type used through the RLS API specifically; see the MemoryCellHandle entry under tardigrade_hooks below).

from tardigrade_hooks.rls import ReflectiveLatentSearch, KeywordExpansionStrategy

rls = ReflectiveLatentSearch(
    engine, model, tokenizer,
    query_layer=16, hidden_size=1024,
    strategies=[KeywordExpansionStrategy()],
)
handles = rls.query("What outdoor activities does this person enjoy?")

Reformulation Strategies

All strategies implement ReformulationStrategy.reformulate(query_text) -> list[str].

| Class | Requires | Description |
| --- | --- | --- |
| KeywordExpansionStrategy() | nothing | Extract content words, expand via synonym table |
| MultiPhrasingStrategy() | nothing | Template-based variants (keyword-only + WH-question form) |
| EmbeddingExpansionStrategy(tokenizer, embed_weights, top_k=10) | embedding table | Nearest-neighbor lookup in the model's own vocabulary |
| GenerativeReformulationStrategy(model, tokenizer, max_new_tokens=40) | local LLM | Small model rephrases the query (e.g. Qwen2.5-3B) |
| LLMAgentReformulationStrategy(api_key, model=None) | external API | Calls DeepSeek (or any OpenAI-compatible API) for vocabulary-bridged reformulations |

Cost order (lowest → highest): keyword < multiphrasing < embedding < generative < agent.

RRF Fusion

from tardigrade_hooks.rls import rrf_fuse_handles

fused = rrf_fuse_handles(handle_lists, k=60)

Fuses multiple MemoryCellHandle lists via Reciprocal Rank Fusion, deduplicating by cell_id.
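If you want to see what Reciprocal Rank Fusion does to a set of rankings, the following is a minimal sketch of the standard formula, where each item earns 1 / (k + rank) per list. It works on plain integer ids rather than MemoryCellHandle objects, the name rrf_fuse is hypothetical, and the tie-breaking rule is an assumption; rrf_fuse_handles may differ in those details.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[int]], k: int = 60) -> list[int]:
    """Fuse ranked lists of ids with Reciprocal Rank Fusion.

    Each id earns 1 / (k + rank) per list it appears in (rank is 1-based);
    ids are then ordered by their summed score, highest first.
    """
    scores: dict[int, float] = defaultdict(float)
    for ranking in ranked_lists:
        seen: set[int] = set()
        for rank, cell_id in enumerate(ranking, start=1):
            if cell_id in seen:
                continue  # count each id at most once per list
            seen.add(cell_id)
            scores[cell_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id for a deterministic order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


fused = rrf_fuse([[7, 3, 9], [3, 7, 4], [3, 9, 7]])
print(fused)  # [3, 7, 9, 4]
```

Id 3 wins because it ranks first in two of the three lists, even though id 7 also appears in all three. Larger k values flatten the difference between ranks.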

Constants (tardigrade_hooks.constants)

| Constant | Value | Meaning |
| --- | --- | --- |
| RLS_DEFAULT_CONFIDENCE_THRESHOLD | 1.10 | Top-1 / top-2 score ratio below which RLS reformulates and re-retrieves |
| RLS_DEFAULT_MAX_ATTEMPTS | 2 | Max reformulation iterations |
| RLS_MODE_NONE | "none" | No reformulation |
| RLS_MODE_KEYWORD | "keyword" | KeywordExpansionStrategy |
| RLS_MODE_MULTIPHRASING | "multiphrasing" | MultiPhrasingStrategy |
| RLS_MODE_EMBEDDING | "embedding" | EmbeddingExpansionStrategy |
| RLS_MODE_GENERATIVE | "generative" | GenerativeReformulationStrategy |
| RLS_MODE_AGENT | "agent" | LLMAgentReformulationStrategy |

CrossEncoderReranker

Stage-2 re-ranking over text-bearing candidates using a cross-encoder model.

from tardigrade_hooks.reranker import CrossEncoderReranker

reranker = CrossEncoderReranker(
    model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",  # 22M params
)

reranked = reranker.rerank(
    query_text="What does Zara do for work?",
    candidates=handles,
    get_text=lambda h: engine.pack_text(h.cell_id),
)

Requires candidates to have associated text (stored via mem_write_pack(..., text=...) or set_pack_text()). ~30% latency overhead vs retrieval alone (~86ms vs ~67ms p95 on Qwen3-0.6B).


File Ingestion

TextChunker

Token-bounded chunker with configurable overlap.

from tardigrade_hooks.chunker import TextChunker

chunker = TextChunker(
    tokenizer,
    max_tokens=512,    # DEFAULT_CHUNK_TOKENS
    overlap_tokens=64, # CHUNK_OVERLAP_TOKENS
    min_tokens=32,     # MIN_CHUNK_TOKENS
)

chunks = chunker.chunk(text)  # list of Chunk(text, token_count, start_char, end_char)
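To show how max_tokens and overlap_tokens interact, here is a rough sketch of overlapping windows over a token-id sequence. It is not TextChunker's implementation: the function name window_token_ids is hypothetical, it works on a plain list of ids rather than a tokenizer, and it does not model min_tokens, whose exact handling this page does not specify.

```python
def window_token_ids(token_ids, max_tokens=512, overlap_tokens=64):
    """Split a token-id sequence into overlapping windows."""
    if not 0 <= overlap_tokens < max_tokens:
        raise ValueError("overlap_tokens must be in [0, max_tokens)")
    # Each new window starts this many tokens after the previous one.
    stride = max_tokens - overlap_tokens
    windows = []
    start = 0
    while start < len(token_ids):
        windows.append(token_ids[start:start + max_tokens])
        if start + max_tokens >= len(token_ids):
            break  # this window already reaches the end of the input
        start += stride
    return windows


ids = list(range(1000))  # stand-in for tokenizer.encode(text)
windows = window_token_ids(ids)
print([(w[0], w[-1], len(w)) for w in windows])
```

With the defaults, 1000 tokens yield three windows starting at offsets 0, 448, and 896, and each window repeats the last 64 tokens of the one before it.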

FileIngestor

Ingests a text document as sequential KV memory packs. Consecutive chunks are linked via Supports edges.

from tardigrade_hooks.file_ingestor import FileIngestor

ingestor = FileIngestor(
    engine,
    tokenizer=tokenizer,
    owner=1,
    chunker=chunker,       # optional; uses TextChunker(512) by default
    salience=70.0,         # DEFAULT_FILE_INGEST_SALIENCE
    kv_capture_fn=fn,      # (chunk_text, tokenizer) -> (key, layer_payloads)
)

result = ingestor.ingest(text, document_id="readme")
# IngestResult(pack_ids=[1, 2, 3], chunk_count=3, edge_count=2, document_id="readme")

result = ingestor.ingest_file("/path/to/doc.txt")
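The relationship between chunk_count and edge_count in the example result follows from linking only consecutive chunks. A small sketch, assuming one edge per adjacent pair; the helper chain_edges is hypothetical, and the direction of each Supports edge is not specified on this page.

```python
def chain_edges(pack_ids):
    """Pair each pack id with its successor."""
    return list(zip(pack_ids, pack_ids[1:]))


edges = chain_edges([1, 2, 3])
print(edges)       # [(1, 2), (2, 3)]
print(len(edges))  # 2
```

Three chunks give two edges, matching edge_count=2 in the IngestResult shown above; a single-chunk document gives none.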

IngestResult

@dataclass
class IngestResult:
    pack_ids: list[int]
    chunk_count: int
    edge_count: int
    document_id: str | None

Multi-view Consolidation v2

The parent-document pattern: the canonical pack stores the KV tensor; views are additional retrieval surfaces on the same fact, stored as linked packs.

ViewGenerator

from tardigrade_hooks.view_generator import ViewGenerator

gen = ViewGenerator(
    # All keyword-only (constructor uses `*` after `self`)
    model=None,                    # required for mode="llm"
    tokenizer=None,                # required for mode="llm"
    framings=("summary", "question", "paraphrase"),  # DEFAULT_VIEW_FRAMINGS
    mode="rule",                   # "rule" (no model) or "llm" (HyPE-style LLM questions)
)

views = gen.generate("Tomoko Nishida teaches swimming at the Pilsen aquatic center")
# Returns a list of view strings — one per framing — generated by the
# rule-based strategies (summary / question / paraphrase). Output shape and
# wording depend on the input text and the active framing set.

Available framing names: "summary", "question", "paraphrase", "llm_question".

MemoryConsolidator

Tier-gated, idempotent multi-view attachment. Only consolidates packs at or above the configured minimum tier.

from tardigrade_hooks.consolidator import MemoryConsolidator

consolidator = MemoryConsolidator(
    engine,
    owner=1,
    view_generator=gen,
    min_tier=1,    # CONSOLIDATION_MIN_TIER: Validated tier
)

n = consolidator.consolidate(pack_id)          # int: views attached to this pack
all_n = consolidator.consolidate_all(owner=1)  # dict[int, int]
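"Tier-gated" and "idempotent" can be illustrated with a toy model. Everything here is a sketch under assumptions: ToyConsolidator is hypothetical, tiers are assumed to be numbered Draft=0, Validated=1, Core=2 (only Validated=1 is implied by the min_tier comment above), and whether consolidate_all reports skipped packs is a guess.

```python
DRAFT, VALIDATED, CORE = 0, 1, 2  # assumed numeric ordering of tiers


class ToyConsolidator:
    """In-memory stand-in showing tier gating and idempotent attachment."""

    def __init__(self, tiers, make_views, min_tier=VALIDATED):
        self.tiers = tiers            # {pack_id: tier}
        self.make_views = make_views  # callable: pack_id -> list[str]
        self.min_tier = min_tier
        self.attached = {}            # {pack_id: set of views already attached}

    def consolidate(self, pack_id):
        if self.tiers[pack_id] < self.min_tier:
            return 0  # tier gate: packs below min_tier are skipped
        done = self.attached.setdefault(pack_id, set())
        # Idempotence: only views not attached before count as new.
        new_views = [v for v in self.make_views(pack_id) if v not in done]
        done.update(new_views)
        return len(new_views)

    def consolidate_all(self):
        # Assumption: only eligible packs appear in the result.
        return {
            pid: self.consolidate(pid)
            for pid, tier in self.tiers.items()
            if tier >= self.min_tier
        }


toy = ToyConsolidator(
    tiers={1: DRAFT, 2: VALIDATED},
    make_views=lambda pid: ["summary", "question", "paraphrase"],
)
first = toy.consolidate(2)
second = toy.consolidate(2)
print(toy.consolidate(1), first, second)  # 0 3 0
```

The Draft pack is skipped, the Validated pack gets three views on the first call, and the second call attaches nothing because those views already exist.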

ConsolidationSweepThread

Background Active Object daemon that runs consolidation sweeps automatically.

from tardigrade_hooks.consolidation_sweep import ConsolidationSweepThread

sweep = ConsolidationSweepThread(consolidator, interval_seconds=60)
sweep.start()
# ...
sweep.stop()
print(f"Total views attached: {sweep.views_attached}")

Engine methods (multi-view)

| Method | Description |
| --- | --- |
| engine.add_view_keys(pack_id, keys) | Attach additional retrieval keys to an existing canonical pack |
| engine.view_count(pack_id) | Number of views currently attached to a pack |

Engine (Rust)

The low-level Rust engine exposed via PyO3.

import tardigrade_db

engine = tardigrade_db.Engine("/path/to/storage")

Core Methods

| Method | Description |
| --- | --- |
| mem_write(owner, layer, key, value, salience, parent_cell_id) | Write a single cell |
| mem_read(query_key, k, owner) | Read top-k cells |
| mem_write_pack(owner, retrieval_key, layer_payloads, salience, text=None) | Write a multi-layer KV pack with optional fact text |
| mem_read_pack(query_key, k, owner) | Read top-k packs |
| mem_read_pack_with_trace_boost(query_key, k, owner, boost_factor) | Read with trace-boosted scoring |
| mem_read_tokens(tokens, k, owner) | Direct token-level retrieval. tokens: np.ndarray of shape (n_tokens, d_model) float32. k: top-k results. owner: optional owner filter. Returns the same ReadResult as mem_read_pack. Skips the Python encode/parse round-trip used by mem_read_pack. |

Pack Management

| Method | Description |
| --- | --- |
| load_pack_by_id(pack_id) | Load a pack directly by ID |
| add_pack_link(pack_id_1, pack_id_2) | Create durable trace link between packs |
| add_pack_edge(pack_id_1, pack_id_2, edge_type) | Create a typed edge (use constants: EDGE_SUPPORTS, EDGE_CONTRADICTS, etc.) |
| pack_supports(pack_id) / pack_contradicts(pack_id) | Query semantic edges |
| pack_links(pack_id) | All packs linked to a given pack |
| pack_count() | Total number of packs stored |
| pack_importance(pack_id) | Current importance score |
| pack_text(pack_id) | Get stored fact text (None if not stored) |
| set_pack_text(pack_id, text) | Set or update fact text |
| delete_pack(pack_id) | Permanently delete a pack |
| add_view_keys(pack_id, keys) | Attach additional retrieval keys (multi-view v2) |
| view_count(pack_id) | Views attached to a pack |

Governance

| Method | Description |
| --- | --- |
| cell_importance(cell_id) | Current importance score |
| cell_tier(cell_id) | Current tier (Draft/Validated/Core) |
| advance_days(days) | Simulate time passage for decay |
| evict_draft_packs(owner) | Remove all Draft-tier packs for an owner |

Other

| Method | Description |
| --- | --- |
| cell_count() | Total cells in engine |
| trace_ancestors(cell_id) | Get causal parent chain |
| has_vamana() | Whether ANN index is active |
| status() | Engine health + metrics dict |
| compact() | Trigger segment compaction: a mark-sweep GC that walks the segment files, drops cells that have been deleted, and rewrites the live segments to reclaim disk space. Safe to call at any time; runs incrementally and won't block reads. |
| refresh() | Reload WAL + rebuild derived state |
| set_refinement_mode(mode, **kwargs) | Configure query-side refinement. mode is "none" (raw retrieval), "centered" (subtract corpus mean from query/keys before scoring), or "prf" (Rocchio-style pseudo-relevance feedback in K-space). See docs/experiments/vague_queries/results.md. |
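The "centered" and "prf" refinement modes named above can be sketched in plain Python to show the underlying idea. This is an illustration, not the engine's Rust implementation: the function names, dot-product scoring, and the top_n, alpha, and beta parameters are all assumptions chosen for the example, and the Rocchio update here uses only the positive-feedback term.

```python
def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centered_scores(query, keys):
    """Score after subtracting the corpus mean from the query and every key."""
    mu = mean_vector(keys)
    q = [x - m for x, m in zip(query, mu)]
    return [dot(q, [x - m for x, m in zip(key, mu)]) for key in keys]


def prf_query(query, keys, top_n=2, alpha=1.0, beta=0.5):
    """Rocchio-style update: pull the query toward the centroid of the
    top_n keys from a first retrieval pass."""
    ranked = sorted(keys, key=lambda key: dot(query, key), reverse=True)
    centroid = mean_vector(ranked[:top_n])
    return [alpha * q + beta * c for q, c in zip(query, centroid)]


keys = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
query = [1.0, 0.1]
centered = centered_scores(query, keys)
refined = prf_query(query, keys)
print(centered)
print(refined)
```

Centering removes the direction every key shares, so scores reflect how a key differs from the corpus average. The feedback step instead moves the query toward what the first pass already found, which can help vague queries but also reinforces a wrong first pass.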

Edge type constants (tardigrade_hooks.constants)

from tardigrade_hooks.constants import (
    EDGE_CAUSED_BY,   # 0
    EDGE_FOLLOWS,     # 1
    EDGE_CONTRADICTS, # 2
    EDGE_SUPPORTS,    # 3
)