featcat

AI-powered Feature Catalog with CLI, TUI, REST API, and Web UI

featcat is a lightweight Feature Catalog for data teams. It is not a Feature Store (no online serving) — it's a metadata management tool with an AI layer for searching, documenting, monitoring, and tracing the lineage of features sitting in Parquet files on disk, S3, or MinIO.

The Problem

Features scattered everywhere: Parquet across local disks, S3, and MinIO — nobody knows what exists
Missing documentation: Columns have no descriptions; new team members don't know what avg_session_duration means
Hard to find the right features: Starting a new project with no idea which features are already available
No lineage: Derived features lose track of where their inputs came from
Undetected drift: Feature distributions change silently until model performance degrades

Key Features

Module	Description
Catalog	Register data sources, scan Parquet to auto-extract schema + stats; SQLite or PostgreSQL backend
Agentic Chat	Tool-calling AI agent with intent classifier, conversation memory across turns, and Vietnamese/English support
Discovery	Describe a use case → AI recommends relevant features and suggests new ones
Auto-doc	LLM-generated documentation for each feature, with batch generation jobs
FTS5 Search	SQLite full-text search with BM25 ranking and Vietnamese diacritic folding
Lineage	Track parent/child relationships between features; auto-detect from SQL definitions
Similarity	TF-IDF + embedding-backed feature similarity matrix and graph; duplicate detection
Monitoring	PSI / KL-divergence / Wasserstein drift metrics, null spikes, range violations, scheduled checks
Web UI	React SPA with a sticky top-bar global search (autocomplete + keyboard nav on every route): dashboard, feature browser, chat, lineage graph, similarity matrix, audit log
TUI	Terminal UI with dashboard, feature browser, AI chat
REST API	FastAPI server; every CLI/TUI/Web operation goes through the same endpoints
S3 / MinIO	Read Parquet directly from S3 — metadata only, never copies data locally
Scheduler	APScheduler-driven jobs for refresh, monitoring, doc generation

Four Interfaces, One Backend

Interface	Use case
`featcat <cmd>`	Scripted ops, CI, terminal users
`featcat ui`	Full-screen TUI for quick browsing
`featcat serve`	FastAPI server at `:8000`, JSON REST + SSE chat
Web UI	React SPA bundled into the server, accessible at `:8000` once `serve` is running

All four call into the same CatalogBackend abstraction (featcat/catalog/backend.py), so local-vs-remote (FEATCAT_SERVER_URL) is a one-env-var switch.

Screenshots

_{Features browser — filter by source, tags, doc status; FTS5 search with Vietnamese diacritic folding}	_{Feature detail — schema, stats, AI-generated docs, lineage, recent usage}
_{Agentic chat — native tool-calling AI with conversation memory and bilingual (EN/VI) responses}	_{Feature groups — bundle related features for projects and downstream use cases}
_{Similarity matrix — TF-IDF + embedding-backed; spot duplicates at a glance}	_{Lineage graph — trace derivation chains parsed from SQL definitions}
_{Audit log — actionable issues across the catalog (missing docs, drift, broken lineage)}

Quick Start

# 1. Clone and install (no venv activation needed — uv handles it)
git clone https://github.com/codepawl/featcat.git && cd featcat
make install

# 2. Initialize catalog
uv run featcat init

# 3. Register and scan a data source
uv run featcat source add device_perf /data/features/device_performance.parquet
uv run featcat source scan device_perf

# 4. Browse features (CLI)
uv run featcat feature list
uv run featcat feature info device_perf.cpu_usage

# 5. (Optional) Enable AI — requires llama.cpp running at :8080
#    The repo ships a docker-compose for a Gemma GGUF backend; see deploy/.
docker compose -f deploy/docker-compose.yml up -d llama

uv run featcat discover "customer churn prediction"
uv run featcat ask "features related to user engagement"

# 6. Start the server (REST API + Web UI at http://localhost:8000)
uv run featcat serve

The bundled ./dev.sh script does the full local stack (LLM container + backend + Vite dev server) in one go.

TUI

featcat ui

Keybindings: D Dashboard · F Features · M Monitor · C Chat · Q Quit · ? Help

System Health Check

featcat doctor

[x] Python 3.10+
[x] SQLite catalog exists (catalog.db)
[x] llama.cpp running at localhost:8080
[x] Model gemma-4-E2B-it-Q4_K_M loaded
[x] 14 features registered
[x] 10 features have docs (71.4%)
[ ] 2 features have drift warnings

Tech Stack

Backend: Python 3.10+ · FastAPI · SQLAlchemy (SQLite default, PostgreSQL supported) · APScheduler · Pydantic
AI: llama.cpp via OpenAI-compatible HTTP · native tool calling · response caching · FTS5 search
Web: React 19 · TypeScript · Vite · Tailwind CSS · TanStack tooling
Data: PyArrow · s3fs (S3/MinIO) · pgvector (optional, for embedding-backed similarity)
CLI/TUI: Typer · Rich · Textual

Project Structure

featcat/
├── catalog/        # Models, backends (Local/Remote), scanner, similarity, search
├── ai/             # Agentic chat, tool executor, intent classifier, session memory
├── llm/            # LLM abstraction (llama.cpp + cached wrapper)
├── plugins/        # Discovery, Autodoc, Monitoring, NL Query
├── server/         # FastAPI app, routes, scheduler, static assets
├── lineage/        # SQL-based lineage detection
├── db/             # SQLAlchemy engine + ORM models (SQLite + PostgreSQL)
├── tasks/          # Celery jobs (optional async batch work)
├── tui/            # Textual TUI (screens, widgets)
├── utils/          # Prompts, language detection, statistics, cache
├── config.py       # Pydantic settings (env > project > user > defaults)
└── cli.py          # Typer CLI entry point

web/                # React SPA — Vite build outputs to featcat/server/static/
deploy/             # Docker compose for llama.cpp + featcat
audits/             # Internal design + verification docs

Testing

Backend: make test (pytest, 730+ tests covering catalog, server, AI agent, plugins, lineage, S3 storage)
Type-check: make type-check (mypy, all 100+ source files)
Lint: make lint (ruff)
Web UI E2E: cd web && bun run test:e2e — Playwright suite across 12 user journeys against an isolated backend with all AI endpoints mocked. See web/tests/e2e/README.md.
Pre-commit gate: make check runs lint + type-check + test in sequence.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 430 Commits
.agents/docs/internal		.agents/docs/internal
.github		.github
artifacts		artifacts
assets		assets
audits		audits
deploy		deploy
docs		docs
featcat		featcat
packages/client		packages/client
scripts		scripts
slides/screenshots/featcat-demo-2026-05-13		slides/screenshots/featcat-demo-2026-05-13
tests		tests
web		web
.claudeignore		.claudeignore
.dockerignore		.dockerignore
.editorconfig		.editorconfig
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
alembic.ini		alembic.ini
dev.sh		dev.sh
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

featcat

The Problem

Key Features

Four Interfaces, One Backend

Screenshots

Quick Start

TUI

System Health Check

Tech Stack

Project Structure

Testing

License

About

Uh oh!

Releases 8

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

featcat

The Problem

Key Features

Four Interfaces, One Backend

Screenshots

Quick Start

TUI

System Health Check

Tech Stack

Project Structure

Testing

License

About

Topics

Resources

License

Code of conduct

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 8

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages