lakehouse-insight

AI chatbot for the data lakehouse. Connects to Marina via OAuth, answers questions with inline tables and Plotly charts.

The package is deliberately self-contained: it depends only on environment variables and a small set of third-party libraries. It can be embedded in a host application as a FastAPI sub-app or run standalone via the included main.py.

Modes

Standalone

docker build -t lakehouse-insight .
docker run -p 8000:8000 \
  -e MARINA_URL=https://marina.example.com \
  -e LLM_BASE_URL=https://mindrouter.example.com/v1 \
  -e LLM_API_KEY=... \
  -e LLM_MODEL=gpt-4o-mini \
  -v /local/state:/var/lib/insight \
  lakehouse-insight

Open http://localhost:8000/ and configure the Marina client_id + private key on first run.
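The credentials entered on first run are persisted to the file named by INSIGHT_CREDENTIALS_FILE with mode 0600 (see Configuration). The following is a minimal sketch of such a file-backed store, not the code in credentials.py: the function names `save_credentials` and `load_credentials` and the JSON field names are assumptions made for illustration.

```python
import json
import os
from typing import Optional


def save_credentials(path: str, client_id: str, private_key_pem: str) -> None:
    """Write client_id + private key as JSON, readable by the owner only."""
    payload = json.dumps({"client_id": client_id, "private_key": private_key_pem})
    # Create the file with mode 0600 so the key is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(payload)
    # If the file already existed with looser permissions, tighten them.
    os.chmod(path, 0o600)


def load_credentials(path: str) -> Optional[dict]:
    """Return the stored credentials, or None if nothing has been saved yet."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return None
```

A `None` result from `load_credentials` would correspond to the unconfigured first-run state, in which the UI asks for the client_id and private key.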

Embedded (sub-app)

from fastapi import FastAPI
import insight

app = FastAPI()  # the host application
app.mount("/insight", insight.build_app())

The host app gets all routes under /insight/*. Templates and static assets resolve relative to the package directory.

Configuration

| Variable | Required | Default | Purpose |
|---|---|---|---|
| `MARINA_URL` | yes | -- | Base URL of the Marina deployment |
| `LLM_BASE_URL` | yes | -- | OpenAI-compatible LLM endpoint |
| `LLM_API_KEY` | yes | -- | API key for the LLM endpoint |
| `LLM_MODEL` | yes | -- | Model identifier passed to the LLM |
| `INSIGHT_CREDENTIALS_FILE` | no | `/var/lib/insight/insight_credentials.json` | Where client_id + private key are stored, mode 0600 |
| `AUTH_TOKEN_AUDIENCE` | no | `marina` | Expected `aud` claim on assertion JWTs |
| `PORT` | no | `8000` | Listen port (standalone only) |

See .env.example for a starting template.
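As an illustration of how the variables above might be read, here is a minimal sketch that fails fast on missing required values and applies the documented defaults. The `Settings` dataclass and `load_settings` function are hypothetical names, not part of the package's documented API.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variables marked "yes" in the Required column.
REQUIRED = ("MARINA_URL", "LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL")


@dataclass(frozen=True)
class Settings:
    marina_url: str
    llm_base_url: str
    llm_api_key: str
    llm_model: str
    credentials_file: str
    token_audience: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, raising if a required variable is unset."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return Settings(
        marina_url=env["MARINA_URL"],
        llm_base_url=env["LLM_BASE_URL"],
        llm_api_key=env["LLM_API_KEY"],
        llm_model=env["LLM_MODEL"],
        # Optional variables fall back to the documented defaults.
        credentials_file=env.get(
            "INSIGHT_CREDENTIALS_FILE", "/var/lib/insight/insight_credentials.json"
        ),
        token_audience=env.get("AUTH_TOKEN_AUDIENCE", "marina"),
        port=int(env.get("PORT", "8000")),
    )
```

Passing the mapping as a parameter keeps the sketch easy to exercise in tests without touching the real process environment.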

Auth flow

Insight is a Marina client. On first run the user pastes their client_id and a private key generated by Shipyard. Before calling Marina, Insight mints a short-lived bearer token via POST /auth/token (RFC 7523 client_assertion), caches it in process until near expiry, and reuses it for /query, /files, and any future SQL gateway calls.
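The caching behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the code in auth.py: `BearerCache`, `token_request_form`, the 30-second refresh margin, and the `fetch_token` callable (which would sign the assertion and call POST /auth/token) are all hypothetical. The form field names follow RFC 7523 section 2.2; whether Marina expects exactly this request shape is an assumption.

```python
import threading
import time
from typing import Callable, Optional, Tuple

# Hypothetical callable: performs POST /auth/token and returns
# (access_token, lifetime_in_seconds).
FetchToken = Callable[[], Tuple[str, float]]


def token_request_form(assertion_jwt: str) -> dict:
    """Form fields for a client_credentials request authenticated by a JWT."""
    return {
        "grant_type": "client_credentials",
        "client_assertion_type": (
            "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"
        ),
        "client_assertion": assertion_jwt,
    }


class BearerCache:
    """Keep one bearer token in process and refresh it shortly before expiry."""

    def __init__(
        self,
        fetch_token: FetchToken,
        refresh_margin: float = 30.0,  # assumed margin, in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_token
        self._margin = refresh_margin
        self._clock = clock
        self._lock = threading.Lock()
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        with self._lock:
            now = self._clock()
            # Refresh when no token is cached or the cached one is near expiry.
            if self._token is None or now >= self._expires_at - self._margin:
                token, lifetime = self._fetch()
                self._token = token
                self._expires_at = now + lifetime
            return self._token
```

A caller would use `cache.get()` to build the `Authorization: Bearer ...` header for each Marina request, so that only the first call, or a call near expiry, reaches /auth/token.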

Routes

GET / -- chat UI
GET /llm-status -- probe LLM connectivity
GET /tables -- list views the configured client can read
POST /config -- save credentials
POST /chat -- streaming chat (SSE)
GET /preview/{file_hash} -- proxy a file from Marina for preview rendering
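Because POST /chat streams its response as server-sent events, a client has to split the stream into events. The sketch below parses the standard SSE line format (`event:` and `data:` fields, blank line as event terminator, `:` comment lines). The function name `iter_sse_events` is hypothetical, and the event names and payloads that /chat actually emits are not documented here, so the `token` event in the usage example is invented for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event = "message"  # default event type when no "event:" field is sent
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Usage with an in-memory stream; a real client would iterate over the
# lines of the HTTP response body instead.
stream = ["event: token\n", "data: Hel\n", "\n", ": keep-alive\n", "data: lo\n", "\n"]
events = list(iter_sse_events(stream))
```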

Layout

insight/                  # the package itself
  __init__.py             # build_app()
  auth.py                 # OAuth client_credentials + bearer cache
  credentials.py          # file-backed client_id + private key
  llm.py                  # LLM streaming agent + tool definitions
  marina_client.py        # Marina REST wrappers
  routes.py               # FastAPI routes
  templates/              # insight.html + _layout.html
main.py                   # standalone entry point
Dockerfile
requirements.txt
.env.example

Origin

Extracted from the data-lakehouse monorepo (issue #222). The package is functionally identical to the embedded version that ships with Shipyard; this repo is the source of truth going forward.
