AI chatbot for the data lakehouse. Connects to Marina via OAuth, answers questions with inline tables and Plotly charts.
The package is deliberately self-contained: it depends only on environment variables and a small set of third-party libraries. It can be embedded in a host application as a FastAPI sub-app or run standalone via the included main.py.
    docker build -t lakehouse-insight .
    docker run -p 8000:8000 \
      -e MARINA_URL=https://marina.example.com \
      -e LLM_BASE_URL=https://mindrouter.example.com/v1 \
      -e LLM_API_KEY=... \
      -e LLM_MODEL=gpt-4o-mini \
      -v /local/state:/var/lib/insight \
      lakehouse-insight

Open http://localhost:8000/ and configure the Marina client_id + private key on first run.
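    from fastapi import FastAPI
    import insight

    app = FastAPI()  # the host application; use your existing app object here
    app.mount("/insight", insight.build_app())

The host app gets all routes under /insight/*. Templates and static assets resolve relative to the package directory.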
| Variable | Required | Default | Purpose |
|---|---|---|---|
| MARINA_URL | yes | -- | Base URL of the Marina deployment |
| LLM_BASE_URL | yes | -- | OpenAI-compatible LLM endpoint |
| LLM_API_KEY | yes | -- | API key for the LLM endpoint |
| LLM_MODEL | yes | -- | Model identifier passed to the LLM |
| INSIGHT_CREDENTIALS_FILE | no | /var/lib/insight/insight_credentials.json | Where client_id + private key are stored, mode 0600 |
| AUTH_TOKEN_AUDIENCE | no | marina | Expected aud claim on assertion JWTs |
| PORT | no | 8000 | Listen port (standalone only) |

See .env.example for a starting template.
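
As an illustration of how the table above maps onto code, here is a minimal sketch of reading these variables with their documented defaults. The `Settings` class and `load_settings` function are hypothetical names for this example and are not taken from the package; the actual loading logic may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variables the table marks as required.
REQUIRED = ("MARINA_URL", "LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL")


@dataclass(frozen=True)
class Settings:  # hypothetical container, not part of the package
    marina_url: str
    llm_base_url: str
    llm_api_key: str
    llm_model: str
    credentials_file: str
    token_audience: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from a mapping, failing fast on missing variables."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return Settings(
        marina_url=env["MARINA_URL"],
        llm_base_url=env["LLM_BASE_URL"],
        llm_api_key=env["LLM_API_KEY"],
        llm_model=env["LLM_MODEL"],
        # Optional variables fall back to the defaults listed in the table.
        credentials_file=env.get(
            "INSIGHT_CREDENTIALS_FILE",
            "/var/lib/insight/insight_credentials.json",
        ),
        token_audience=env.get("AUTH_TOKEN_AUDIENCE", "marina"),
        port=int(env.get("PORT", "8000")),
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests without touching the process environment.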
Insight is a Marina client. On first run the user pastes their client_id and a private key generated by Shipyard. For each Marina request, Insight uses a short-lived bearer token minted via POST /auth/token (RFC 7523 client_assertion); the token is cached in process until near expiry and reused for /query, /files, and any future SQL gateway calls.
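
The token flow described above can be sketched roughly as follows. This is not the code in auth.py: `BearerCache`, `token_request_form`, the 30-second refresh margin, and the exact form fields Marina expects are assumptions for illustration. The `client_assertion_type` URN is the one defined by RFC 7523. Signing the assertion JWT is left out; the `fetch` callable stands in for "sign an assertion, POST it to /auth/token, return the token and its lifetime".

```python
import time
from typing import Callable, Optional, Tuple

# Assertion type URN defined by RFC 7523 for JWT client authentication.
ASSERTION_TYPE = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"


def token_request_form(signed_assertion: str) -> dict:
    """Form fields for a client_credentials request authenticated by a JWT.

    The exact fields Marina's POST /auth/token accepts are an assumption.
    """
    return {
        "grant_type": "client_credentials",
        "client_assertion_type": ASSERTION_TYPE,
        "client_assertion": signed_assertion,
    }


class BearerCache:  # hypothetical name, for illustration only
    """Keep one bearer token in memory and refresh it shortly before expiry."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],  # returns (token, lifetime in seconds)
        skew: float = 30.0,                      # assumed refresh margin
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._skew = skew
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Refresh when there is no token yet or it is within `skew` of expiring.
        if self._token is None or now >= self._expires_at - self._skew:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Callers would then send `Authorization: Bearer <cache.get()>` on each Marina request, so a new token is minted only when the cached one is close to expiring.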
- GET / -- chat UI
- GET /llm-status -- probe LLM connectivity
- GET /tables -- list views the configured client can read
- POST /config -- save credentials
- POST /chat -- streaming chat (SSE)
- GET /preview/{file_hash} -- proxy a file from Marina for preview rendering
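
Because POST /chat streams its reply as server-sent events, a client has to split the response into events before using it. The helper below is a generic sketch of SSE `data:` parsing; `iter_sse_data` is a hypothetical name, and the request body and event payload format used by /chat are not documented here, so the sketch only extracts the raw data strings.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            # Comment line, commonly used as a keep-alive.
            continue
        elif line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # An event not terminated by a blank line is discarded at end of stream.
```

The `lines` argument can be any iterable of text lines, for example the line iterator of a streaming HTTP response.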
    insight/               # the package itself
      __init__.py          # build_app()
      auth.py              # OAuth client_credentials + bearer cache
      credentials.py       # file-backed client_id + private key
      llm.py               # LLM streaming agent + tool definitions
      marina_client.py     # Marina REST wrappers
      routes.py            # FastAPI routes
      templates/           # insight.html + _layout.html
    main.py                # standalone entry point
    Dockerfile
    requirements.txt
    .env.example

Extracted from the data-lakehouse monorepo (issue #222). The package is functionally identical to the embedded version that ships with Shipyard; this repo is the source of truth going forward.