TreeIndex is a vectorless semantic indexing SDK that converts large text into searchable knowledge trees.
It is inspired by PageIndex, with a simple npm-first developer workflow and a bring-your-own-LLM setup.
- Provide large source text.
- TreeIndex incrementally builds semantic nodes (topics + subtopics).
- For a query, it retrieves relevant node IDs.
- It gathers grounded source snippets from those nodes.
- It generates an answer from the retrieved context.
```shell
npm install treeindex
```

```typescript
import { TreeIndex } from "treeindex";

const treeIndex = new TreeIndex({
  baseURL: "https://api.openai.com/v1",
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-5.1",
});
```

Use this flow when indexing a document for the first time.
```typescript
// Load source text
treeIndex.loadData(largeText);

// Build tree from loaded text
const tree = await treeIndex.generateTree();

// Persist the tree in your DB/storage layer.
// TreeIndex does not store trees for you.
await saveTreeToDatabase(tree);

// Load the same tree into SDK state for retrieval/answering
treeIndex.loadTree(tree);

// Generate answer
const answer = await treeIndex.generateAnswer("What are assets vs liabilities?");
console.log(answer);
```

Use this flow when the tree already exists in your storage.
```typescript
const storedTree = await fetchTreeFromDatabase(documentId);

// loadData is still needed for node text slicing in findNodes()/generateAnswer()
treeIndex.loadData(largeText);
treeIndex.loadTree(storedTree);

const answer = await treeIndex.generateAnswer("What are assets vs liabilities?");
console.log(answer);
```

If you want full control over answer generation logic:
```typescript
treeIndex.loadData(largeText);
treeIndex.loadTree(storedTree);

const relevantNodeIds = await treeIndex.retrieveRelevantNodes("What are assets vs liabilities?");
const foundNodes = treeIndex.findNodes(relevantNodeIds);
const context = foundNodes.map((n) => n.data).join("\n");

// Your own generation call (any model/provider)
const answer = await myCustomGenerator({
  query: "What are assets vs liabilities?",
  context,
});
console.log(answer);
```

PageIndex-style workflows hit practical friction:
- tree generation can fail on weakly structured input
- local setup can feel heavy
- provider customization can be harder than expected
- supports only PDF input
TreeIndex focuses on a simpler developer experience:
- install with npm and start quickly
- designed to still produce a tree even when source text is poorly structured (with potential accuracy trade-offs)
- bring your own API key and model
- straightforward JavaScript and TypeScript integration
TreeIndex works with OpenAI-compatible chat endpoints by supplying the provider base URL and model.
| Provider | Typical baseURL |
|---|---|
| OpenAI | https://api.openai.com/v1 |
| Gemini | https://generativelanguage.googleapis.com/v1beta/openai/ |
| Anthropic | https://api.anthropic.com/v1 |
| Grok (xAI) | https://api.x.ai/v1 |
| Ollama | http://localhost:11434/v1 |
| OpenRouter | https://openrouter.ai/api/v1 |
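For example, a configuration for a local Ollama server might look like the sketch below. The model name is an assumption (use any chat model you have pulled); Ollama's OpenAI-compatible endpoint ignores the API key, but the option is still required.

```typescript
// Hypothetical options for a local Ollama server.
// "llama3.1" is an example model name, not a requirement.
const ollamaOptions = {
  baseURL: "http://localhost:11434/v1",
  apiKey: "ollama", // ignored by Ollama, but the field must be set
  model: "llama3.1",
};

// Then construct the client as usual:
// const treeIndex = new TreeIndex(ollamaOptions);
```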
```typescript
type TreeIndexOptions = {
  baseURL: string;
  apiKey: string;
  model: string;
};
```

`new TreeIndex(options)` — Creates a TreeIndex instance backed by your chosen provider/model.
`loadData(text)` — Loads source text to index.

`generateTree()` — Builds or extends the semantic knowledge tree from loaded data.
```typescript
type TreeNode = {
  nodeId: string;
  title: string;
  summary: string;
  stringSubset: [number, number];
  nodes: TreeNode[];
};
```

`loadTree(tree)` — Loads an existing tree (for reuse or persisted state).
`retrieveRelevantNodes(query)` — Returns node IDs that are semantically relevant to the query.

`findNodes(nodeIds)` — Returns matched nodes with extracted source snippets.
```typescript
type FoundNode = {
  nodeId: string;
  title: string;
  summary: string;
  data: string; // extracted from loaded data using stringSubset
};
```

`generateAnswer(query)` — Generates an answer grounded in retrieved node data.
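To make the `stringSubset` mechanic concrete: a node's `data` is simply the loaded source text sliced by that character range. The text and range below are made up for illustration:

```typescript
// Illustrative only — real ranges come from the generated tree.
const largeText =
  "Assets are resources owned. Liabilities are obligations owed.";

// A node covering the first sentence of the source text.
const stringSubset: [number, number] = [0, 27];

// `data` on a FoundNode corresponds to this slice of the loaded text.
const data = largeText.slice(...stringSubset);
// → "Assets are resources owned."
```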
TreeIndex intentionally avoids embedding-first infrastructure:
- more accurate retrieval on long, complex documents where embeddings may struggle to capture nuance
- no embedding pipeline setup required
- no vector database hosting cost
- semantic tree remains human-readable and inspectable
- chunks that are similar in embedding space are not always relevant to the query
Known limitations:
- quality depends heavily on model capability; only strong reasoning models work well
- long documents may require multiple recursive indexing passes and significantly more time
- malformed model JSON responses can reduce retrieval quality
- provider/model feature support may vary
- tree persistence is not handled by TreeIndex; you must store and load trees in your own database/storage
- project is early stage and API surface may evolve
Issues and pull requests are welcome.
Please open an issue first so implementation details can be aligned early.
MIT