AI4RA/lakehouse-insight

lakehouse-insight

AI chatbot for the data lakehouse. Connects to Marina via OAuth, answers questions with inline tables and Plotly charts.

The package is deliberately self-contained: it depends only on environment variables and a small set of third-party libraries. It can be embedded in a host application as a FastAPI sub-app or run standalone via the included main.py.

Modes

Standalone

docker build -t lakehouse-insight .
docker run -p 8000:8000 \
  -e MARINA_URL=https://marina.example.com \
  -e LLM_BASE_URL=https://mindrouter.example.com/v1 \
  -e LLM_API_KEY=... \
  -e LLM_MODEL=gpt-4o-mini \
  -v /local/state:/var/lib/insight \
  lakehouse-insight

Open http://localhost:8000/ and configure the Marina client_id + private key on first run.

Embedded (sub-app)

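from fastapi import FastAPI
import insight
app = FastAPI()  # the host application (assumed here to be a FastAPI app)
app.mount("/insight", insight.build_app())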

The host app gets all routes under /insight/*. Templates and static assets resolve relative to the package directory.

Configuration

Variable                  Required  Default                                    Purpose
MARINA_URL                yes       --                                         Base URL of the Marina deployment
LLM_BASE_URL              yes       --                                         OpenAI-compatible LLM endpoint
LLM_API_KEY               yes       --                                         API key for the LLM endpoint
LLM_MODEL                 yes       --                                         Model identifier passed to the LLM
INSIGHT_CREDENTIALS_FILE  no        /var/lib/insight/insight_credentials.json  Where client_id + private key are stored, mode 0600
AUTH_TOKEN_AUDIENCE       no        marina                                     Expected aud claim on assertion JWTs
PORT                      no        8000                                       Listen port (standalone only)

See .env.example for a starting template.
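
As a rough illustration of how the variables above could be read, here is a minimal sketch. The Settings class and load_settings function are hypothetical names, not part of the package; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variables the table marks as required.
REQUIRED = ("MARINA_URL", "LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented configuration values."""
    marina_url: str
    llm_base_url: str
    llm_api_key: str
    llm_model: str
    credentials_file: str
    token_audience: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, failing fast if a required one is unset."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return Settings(
        marina_url=env["MARINA_URL"],
        llm_base_url=env["LLM_BASE_URL"],
        llm_api_key=env["LLM_API_KEY"],
        llm_model=env["LLM_MODEL"],
        # Optional variables fall back to the defaults listed in the table.
        credentials_file=env.get(
            "INSIGHT_CREDENTIALS_FILE", "/var/lib/insight/insight_credentials.json"
        ),
        token_audience=env.get("AUTH_TOKEN_AUDIENCE", "marina"),
        port=int(env.get("PORT", "8000")),
    )
```

Passing the mapping as an argument keeps the loader easy to exercise without touching the real process environment.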

Auth flow

Insight is a Marina client. On startup the user pastes their client_id and a private key generated by Shipyard. Before calling Marina, Insight checks for a cached bearer token; if none is held or the cached one is close to expiry, it mints a new short-lived token via POST /auth/token (RFC 7523 client_assertion). The token is cached in process memory and reused for /query, /files, and any future SQL gateway calls.
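
The mint-cache-reuse cycle can be sketched roughly as follows. This is not the code in auth.py: assertion_claims, BearerCache, the 60-second assertion lifetime, and the 30-second refresh margin are all assumptions made for illustration. Signing the assertion and the HTTP call to /auth/token are left to the injected mint callable.

```python
import time
import uuid
from typing import Callable, Optional, Tuple


def assertion_claims(client_id: str, audience: str, now: float, ttl: int = 60) -> dict:
    """Claims for an RFC 7523 client assertion; signing is left to a JWT library."""
    return {
        "iss": client_id,         # for client authentication the issuer is the client
        "sub": client_id,         # and so is the subject
        "aud": audience,          # e.g. the AUTH_TOKEN_AUDIENCE value
        "iat": int(now),
        "exp": int(now) + ttl,    # assumed lifetime; keeps the assertion short-lived
        "jti": uuid.uuid4().hex,  # unique ID so the server can detect replays
    }


class BearerCache:
    """Hold one bearer token in memory and refresh it shortly before expiry."""

    def __init__(
        self,
        mint: Callable[[], Tuple[str, float]],  # returns (token, lifetime in seconds)
        skew: float = 30.0,                     # assumed safety margin before expiry
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._mint = mint
        self._skew = skew
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, minting a new one if missing or near expiry."""
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            self._token, lifetime = self._mint()
            self._expires_at = self._clock() + lifetime
        return self._token
```

In a real client, mint would sign the claims with the stored private key, POST the assertion to /auth/token, and return the token together with its lifetime. The sketch is single-threaded; a concurrent server would likely guard get() with a lock.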

Routes

  • GET / -- chat UI
  • GET /llm-status -- probe LLM connectivity
  • GET /tables -- list views the configured client can read
  • POST /config -- save credentials
  • POST /chat -- streaming chat (SSE)
  • GET /preview/{file_hash} -- proxy a file from Marina for preview rendering
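
Because POST /chat streams Server-Sent Events, a client has to split the response into events itself. The sketch below parses the generic SSE wire format only. The event names and payloads that /chat actually emits are not documented here, so iter_sse is a hypothetical helper and any event name used with it is an assumption.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from the decoded lines of an SSE stream."""
    event = "message"            # default event type in the SSE format
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:             # a blank line terminates the current event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue             # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the format strips one leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)  # multiple data lines are joined with newlines
```

The function accepts any iterable of text lines, so it can be fed from whatever HTTP client the caller uses to read the streaming response.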

Layout

insight/                  # the package itself
  __init__.py             # build_app()
  auth.py                 # OAuth client_credentials + bearer cache
  credentials.py          # file-backed client_id + private key
  llm.py                  # LLM streaming agent + tool definitions
  marina_client.py        # Marina REST wrappers
  routes.py               # FastAPI routes
  templates/              # insight.html + _layout.html
main.py                   # standalone entry point
Dockerfile
requirements.txt
.env.example
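
For the file-backed storage that credentials.py provides, the following sketch shows one way to write a JSON file with mode 0600 on a POSIX system. The function names and the JSON keys (client_id, private_key) are assumptions for illustration, not the package's actual schema.

```python
import json
import os
from pathlib import Path
from typing import Optional


def save_credentials(path: str, client_id: str, private_key_pem: str) -> None:
    """Write both values as JSON to a file only the owner can read or write."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # The mode passed to os.open only applies when the file is newly created.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"client_id": client_id, "private_key": private_key_pem}, fh)
    os.chmod(target, 0o600)  # also tighten permissions on a pre-existing file


def load_credentials(path: str) -> Optional[dict]:
    """Return the stored credentials, or None if nothing has been saved yet."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return None
```

Creating the file with restrictive permissions from the start avoids a window in which the private key is readable by other users.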

Origin

Extracted from the data-lakehouse monorepo (issue #222). The package is functionally identical to the embedded version that ships with Shipyard; this repo is the source of truth going forward.
