Stormvino

OpenAI-compatible LLM server for Intel Arc GPUs. It runs local inference via OpenVINO and speaks the OpenAI API, so any client that accepts a base_url can point at it. No NVIDIA required.

Hardware compatibility

| GPU | VRAM | Status | Notes |
| --- | --- | --- | --- |
| Arc B60 | 24 GB | ✅ Production | EnvyStorm reference machine |
| Arc B50 | 16 GB | 🔜 Testing | TinyB (install in progress) |
| Arc B65 | TBD | 🔜 Planned | Next after B50 confirmed |
| Arc B70 | TBD | 🔜 Planned | |
| Other Arc | any | ⚙️ Auto-tuned | VRAM detected at runtime |

OS: Linux Mint 22.x / Ubuntu 24.04 (Noble). Kernel: linux-oem-24.04 required for Battlemage (B-series) GPUs. System RAM: 16 GB minimum. Disk: 50 GB+ for a useful model set.
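
The RAM and disk minimums above can be checked before starting an install. The following is a minimal preflight sketch, not part of Stormvino: the helper names (`mem_total_gb`, `free_disk_gb`, `preflight`) and the 1 GB RAM tolerance are assumptions made for illustration. The tolerance exists because a machine sold as 16 GB reports slightly less in `/proc/meminfo`.

```python
import shutil

MIN_RAM_GB = 16   # minimum system RAM stated above
MIN_DISK_GB = 50  # minimum disk stated above


def mem_total_gb(meminfo_text):
    """Parse MemTotal (reported in kB) from /proc/meminfo text and return GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024 ** 2)
    raise ValueError("MemTotal not found")


def free_disk_gb(path="/"):
    """Free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def preflight(meminfo_text, disk_free_gb, ram_tolerance_gb=1.0):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    ram = mem_total_gb(meminfo_text)
    # Firmware and kernel reservations mean a 16 GB machine reports a bit less,
    # hence the (assumed) tolerance.
    if ram < MIN_RAM_GB - ram_tolerance_gb:
        problems.append(f"RAM {ram:.1f} GB is below the {MIN_RAM_GB} GB minimum")
    if disk_free_gb < MIN_DISK_GB:
        problems.append(f"free disk {disk_free_gb:.1f} GB is below {MIN_DISK_GB} GB")
    return problems


# Made-up sample input, not captured from a real machine.
print(preflight("MemTotal:       16265904 kB\n", disk_free_gb=120.0))
```

On a real machine the call would look like `preflight(open("/proc/meminfo").read(), free_disk_gb("/opt"))`. The kernel and GPU requirements are not covered here.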

Install paths — pick one

🤖 Claude Code (recommended for single machine)

Fully automated. Claude Code (CC) asks three questions and handles everything else, including the mandatory kernel reboot. You watch.

Step 1 — Install Claude Code if you haven't:

    npm install -g @anthropic-ai/claude-code

Step 2 — Clone the repo and start CC in it:

    git clone https://github.com/Jermalk/stormvino.git /opt/ov_server
    cd /opt/ov_server
    claude

Step 3 — In the CC chat, type exactly:

    Run the Stormvino installation runbook. @CC_INSTALL.md

The @CC_INSTALL.md mention loads the runbook directly — no file dragging needed. CC reads it and takes over. Answer the 3 questions it asks, then watch.

→ See CC_INSTALL.md for what CC does at each phase.

⚙️ Ansible (recommended for multiple machines / repeatable deploys)

One command installs on any number of Arc machines simultaneously. Detects GPU VRAM at runtime and tunes config automatically. Fully headless — handles reboots without human intervention.

    git clone https://github.com/Jermalk/stormvino.git
    cd stormvino
    # edit vars/main.yml (3 lines), then:
    ansible-playbook -i hosts.yml stormvino.yml

→ See ANSIBLE.md for the full plan and current implementation status.

📖 Manual (full control, learn every step)

Step-by-step guide with a verification test between every phase. Covers kernel, drivers, Python env, PostgreSQL, models, and systemd services.

    git clone https://github.com/Jermalk/stormvino.git
    cd stormvino
    ./install.sh    # detects hardware, routes to the right path

→ See INSTALL.md.

What you get

| Endpoint | Description |
| --- | --- |
| `POST /v1/chat/completions` | OpenAI-compatible chat, streaming supported |
| `POST /v1/embeddings` | Sentence embeddings (multilingual-e5-large) |
| `GET /v1/models` | List discovered models |
| `POST /v1/images/generations` | Image generation (SDXL, optional) |
| `POST /v1/audio/transcriptions` | Speech-to-text (Whisper, optional) |
| `POST /v1/audio/speech` | Text-to-speech (Kokoro / Piper, optional) |
| `GET /health` | Server health + loaded models + VRAM stats |
| `GET /monitor` | Web dashboard: live VRAM, throughput, request log |

Default port: 11435. Accessible over LAN.
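
The chat endpoint supports streaming. Assuming it follows the OpenAI server-sent-events convention (`data: {json}` lines terminated by `data: [DONE]`), which this README does not spell out, a client could assemble the reply roughly as follows. `collect_stream_text` is a hypothetical helper, and the sample lines are made up for illustration rather than captured from a server.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the content deltas from OpenAI-style SSE lines into one string."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI format
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Illustrative input in the assumed format.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))
```

In practice the lines would come from the HTTP response body of a request sent with `"stream": true`, decoded line by line.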

Tested models (B60 / 24 GB VRAM)

| Model | VRAM | Role |
| --- | --- | --- |
| `qwen3-14b-int4-ov` | 9.1 GB | Default: reasoning, coding, chat |
| `qwen3-8b-int4-ov` | 4.6 GB | Agent turns, fast responses |
| `multilingual-e5-large-int8` | 563 MB | Embeddings + task routing |
| `whisper-large-v3-int8-ov` | ~2 GB | Speech-to-text |
| `qwen2.5-vl-7b-int4-ov` | ~5 GB | Vision: image understanding |

→ See MODELS.md for conversion instructions and VRAM budget tables.
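
As a rough illustration of budgeting VRAM with the figures in the table, the sketch below sums model sizes and checks them against a card's capacity. The `kv_cache_gb` allowance and the 1 GB `headroom_gb` are assumptions for the example, not values taken from MODELS.md, and the `~` entries in the table are approximate to begin with.

```python
# VRAM figures copied from the table above (GB); "~" entries are approximate.
MODEL_VRAM_GB = {
    "qwen3-14b-int4-ov": 9.1,
    "qwen3-8b-int4-ov": 4.6,
    "multilingual-e5-large-int8": 0.563,
    "whisper-large-v3-int8-ov": 2.0,
    "qwen2.5-vl-7b-int4-ov": 5.0,
}


def vram_needed_gb(models, kv_cache_gb=0.0):
    """Sum the listed model sizes plus a total KV cache allowance (assumed)."""
    return sum(MODEL_VRAM_GB[name] for name in models) + kv_cache_gb


def fits(models, total_vram_gb, kv_cache_gb=0.0, headroom_gb=1.0):
    """True if the models, cache, and a safety margin fit in total_vram_gb."""
    return vram_needed_gb(models, kv_cache_gb) + headroom_gb <= total_vram_gb


everything = list(MODEL_VRAM_GB)
print(round(vram_needed_gb(everything), 3), "GB for all five models")
print("fits on 24 GB:", fits(everything, 24))
print("fits on 16 GB:", fits(everything, 16))
```

By this arithmetic the full set needs a 24 GB card, while a 16 GB card has to run a subset.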

Quick health check

    curl -s http://localhost:11435/health | python3 -m json.tool

    curl -s http://localhost:11435/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model":"qwen3-8b-int4-ov","messages":[{"role":"user","content":"Hello"}]}'
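
The same two checks can be run from Python with only the standard library. This is a sketch, not a client shipped with Stormvino: `build_chat_request`, `fetch_json`, and `main` are names invented here, and reading the reply from `choices[0].message.content` assumes the standard OpenAI response shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11435"  # default port from this README


def build_chat_request(prompt, model="qwen3-8b-int4-ov", base_url=BASE_URL):
    """Build the POST request that mirrors the curl chat example."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request_or_url, timeout=60):
    """Send the request (or GET the URL) and decode the JSON response."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as resp:
        return json.load(resp)


def main():
    # Needs a running server; not called automatically.
    print(json.dumps(fetch_json(f"{BASE_URL}/health"), indent=2))
    reply = fetch_json(build_chat_request("Hello"))
    # Assumes the OpenAI chat completion response layout.
    print(reply["choices"][0]["message"]["content"])
```

Call `main()` once the server is up. For a machine elsewhere on the LAN, pass a different `base_url`.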

Libraries stack

Inference (server runtime)

| Library | Version |
| --- | --- |
| openvino | 2026.1.0 |
| openvino-genai | 2026.1.0.0 |
| openvino-tokenizers | 2026.1.0.0 |
| optimum-intel | 1.27.0 |
| optimum | 2.1.0 |
| transformers | 4.57.6 |
| tokenizers | 0.22.2 |

Model conversion (offline, via optimum-cli)

| Library | Version |
| --- | --- |
| nncf | 3.1.0 |
| onnx | 1.21.0 |
| onnxruntime | 1.25.0 |
| safetensors | 0.7.0 |
| huggingface_hub | 0.36.2 |
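
To compare an existing Python environment against the inference versions listed above, a small check with `importlib.metadata` is enough. This is an illustrative sketch: `find_mismatches` is a hypothetical helper, and whether the project requires exact matches is not stated here.

```python
from importlib import metadata

# Inference-stack versions copied from the table above.
PINNED = {
    "openvino": "2026.1.0",
    "openvino-genai": "2026.1.0.0",
    "openvino-tokenizers": "2026.1.0.0",
    "optimum-intel": "1.27.0",
    "optimum": "2.1.0",
    "transformers": "4.57.6",
    "tokenizers": "0.22.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


for name, (expected, found) in find_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Run it inside the server's Python environment. No output means every listed package matches.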

Configuration

Runtime settings live in config.json. The installers patch the key settings automatically, based on detected GPU VRAM:

| Key | Description |
| --- | --- |
| `device` | OpenVINO device, auto-detected (e.g. `GPU.1`) |
| `kv_cache_size_gb` | KV cache per model, tuned to VRAM tier |
| `max_loaded_models` | Models held in VRAM simultaneously |
| `default_model` | Model used when the client doesn't specify one |
| `embedding_model` | Embedding model directory name |
| `postgres_dsn` | Observability database connection string |

Full reference: INSTALL.md § Phase 7.
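
To make the idea of VRAM-tier tuning concrete, here is a hypothetical sketch of how an installer could choose `kv_cache_size_gb` and `max_loaded_models`. The tier thresholds and values are invented for illustration. The real installers' numbers are in INSTALL.md and the install scripts, not here.

```python
import json

# HYPOTHETICAL tiers: (minimum VRAM in GB, settings). Not Stormvino's real values.
TIERS = [
    (24, {"kv_cache_size_gb": 4, "max_loaded_models": 3}),
    (16, {"kv_cache_size_gb": 2, "max_loaded_models": 2}),
    (0, {"kv_cache_size_gb": 1, "max_loaded_models": 1}),
]


def settings_for_vram(vram_gb):
    """Pick the first tier whose minimum the detected VRAM satisfies."""
    for min_gb, settings in TIERS:
        if vram_gb >= min_gb:
            return dict(settings)
    raise ValueError("vram_gb must be non-negative")


def patch_config(config, vram_gb):
    """Return a copy of config with the tier settings applied."""
    patched = dict(config)
    patched.update(settings_for_vram(vram_gb))
    return patched


example = {"device": "GPU.1", "default_model": "qwen3-14b-int4-ov"}
print(json.dumps(patch_config(example, 16), indent=2))
```

Returning a patched copy rather than editing in place makes it easy to diff the result against the existing config.json before writing it back.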

Architecture

| Layer | Component |
| --- | --- |
| HTTP | FastAPI + Uvicorn, single worker |
| LLM inference | `openvino_genai.LLMPipeline`, executor-offloaded |
| VLM inference | `openvino_genai.VLMPipeline` |
| Embeddings | `OVModelForFeatureExtraction` (optimum-intel) |
| Task routing | Embedding similarity + signal detection |
| STT | `openvino_genai.WhisperPipeline` |
| TTS | Kokoro-ONNX (EN) + Piper (PL) |
| Observability | PostgreSQL 16 + pgvector |
| Monitor UI | Svelte + uPlot |
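
"Executor-offloaded" in the table refers to running the blocking generation call off the event loop, so a single-worker async server stays responsive. The sketch below shows the general pattern with `asyncio` and a thread pool. It is not Stormvino's code: `blocking_generate` is a stand-in that sleeps instead of calling a model, and the one-thread executor is an assumption.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# One worker thread so generation calls run one at a time (an assumption).
_executor = ThreadPoolExecutor(max_workers=1)


def blocking_generate(prompt):
    """Stand-in for a blocking pipeline call; sleeps instead of running a model."""
    time.sleep(0.05)
    return f"echo: {prompt}"


async def generate(prompt):
    """Run the blocking call in the executor so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, blocking_generate, prompt)


async def demo():
    """Show that another coroutine keeps running while generation blocks."""
    ticks = 0

    async def heartbeat():
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    task = asyncio.create_task(heartbeat())
    result = await generate("Hello")
    task.cancel()
    return result, ticks


print(asyncio.run(demo()))
```

The heartbeat count is above zero because the loop kept scheduling it while the worker thread was busy. In a server, the same mechanism lets endpoints such as a health check answer during a long generation.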

Hardware reports welcome

Tested Stormvino on a GPU not in the compatibility table? Open a hardware report issue with the GPU model, VRAM, kernel version, and tokens/sec. Each report helps build the matrix for everyone.

Origin

Stormvino grew out of Shangri-Lab, a personal lab built by an IT architect from Silesia who had no Python background, a pair of Intel Arc GPUs, and a firm belief that local inference shouldn't require NVIDIA hardware or magic frameworks.

The philosophy is unchanged: build the simplest thing that gives full visibility first, tune quality only after you can observe it.

Built with Claude Code.
