diff --git a/docs.json b/docs.json
index 41802d6..de7b0c6 100644
--- a/docs.json
+++ b/docs.json
@@ -206,6 +206,7 @@
"icon": "robot",
"pages": [
"langflow-ollama",
+ "examples/ai-agents/claude-code-byom",
"examples/ai-agents/browsesafe",
"examples/ai-agents/overnight-ralph-loop"
]
diff --git a/examples/ai-agents/claude-code-byom.mdx b/examples/ai-agents/claude-code-byom.mdx
new file mode 100644
index 0000000..45e334e
--- /dev/null
+++ b/examples/ai-agents/claude-code-byom.mdx
@@ -0,0 +1,361 @@
+---
+title: "BYOM: Bring Your Own Vast Hosted Model to Claude"
+slug: claude-code-byom-vast
+createdAt: Thu Mar 06 2026 00:00:00 GMT+0000 (Coordinated Universal Time)
+updatedAt: Thu Mar 06 2026 00:00:00 GMT+0000 (Coordinated Universal Time)
+---
+
+
+
+Claude Code supports Bring Your Own Model (BYOM) — you can point it at any API that speaks the [Anthropic Messages format](https://docs.anthropic.com/en/api/messages) (`/v1/messages`). [Ollama](https://ollama.com/) serves this API natively, so you can deploy an open-source model on a Vast.ai GPU instance and connect Claude Code directly to it. No proxy, no API translation layer, no Anthropic account required.
+
+This guide covers deploying two models and connecting Claude Code to them:
+
+| Model | Parameters | VRAM Used | Best For |
+|-------|-----------|-----------|----------|
+| [Qwen3-Coder-Next](https://ollama.com/library/qwen3-coder-next) | 80B MoE (3B active) | ~57 GB | State-of-the-art coding, tool calling |
+| [GPT-OSS-20B](https://ollama.com/library/gpt-oss:20b) | 20B (4-bit quantized) | ~14 GB | Lightweight, fast responses, fine-tuned for Claude Code |
+
+Qwen3-Coder-Next is a Mixture of Experts model from Alibaba — 80 billion total parameters but only 3 billion active per token, giving strong coding ability at efficient inference cost. GPT-OSS-20B is fine-tuned specifically for Claude Code's tool-calling format.
+
+## Prerequisites
+
+- A [Vast.ai](https://vast.ai/) account with credits ([console](https://cloud.vast.ai/))
+- [Vast.ai CLI](https://vast.ai/docs/cli/commands) installed
+- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) installed locally
+- `curl` and `jq` for testing the endpoint
+
+## Hardware Requirements
+
+| Model | Min GPU VRAM | Recommended GPU | Disk |
+|-------|-------------|-----------------|------|
+| Qwen3-Coder-Next | 80 GB | A100 80GB, H100 | 200 GB |
+| GPT-OSS-20B | 16 GB | RTX 3090, RTX 4090 | 100 GB |
+
+In Ollama's default Q4 quantization, Qwen3-Coder-Next uses ~48 GB for model weights and ~8 GB for KV cache, roughly 57 GB total once runtime overhead is included, so it needs an 80 GB GPU like the A100 or H100. GPT-OSS-20B uses ~12 GB for weights and ~1 GB for KV cache. Disk space is needed for the Ollama image plus model downloads.
+
+## Step 1: Install the Vast.ai CLI
+
+Install the CLI and set your API key. You can find your API key in the [Vast.ai console](https://cloud.vast.ai/) under Account → API Key:
+
+```bash
+pip install vastai
+vastai set api-key
+```
+
+## Step 2: Choose a Model, Find a GPU, and Deploy
+
+### Qwen3-Coder-Next
+
+Search for a GPU with at least 80 GB VRAM. Look for an A100 or H100 in the results — these offer the best performance for this model:
+
+```bash
+vastai search offers \
+  'gpu_ram>=80 num_gpus=1 reliability>0.9 disk_space>=200 inet_down>200 dph<2.0' \
+  -o 'dph'
+```
+
+Pick an offer ID from the first column, then create the instance:
+
+```bash
+vastai create instance <OFFER_ID> \
+  --image ollama/ollama:latest \
+  --env "-p 11434:11434" \
+  --disk 200 \
+  --onstart-cmd "ollama serve & sleep 5 && ollama pull qwen3-coder-next"
+```
+
+### GPT-OSS-20B
+
+This model fits on smaller GPUs. Search for instances with at least 16 GB VRAM:
+
+```bash
+vastai search offers \
+  'gpu_ram>=16 num_gpus=1 reliability>0.9 disk_space>=100 inet_down>200 dph<1.0' \
+  -o 'dph'
+```
+
+Pick an offer ID from the first column, then create the instance:
+
+```bash
+vastai create instance <OFFER_ID> \
+  --image ollama/ollama:latest \
+  --env "-p 11434:11434" \
+  --disk 100 \
+  --onstart-cmd "ollama serve & sleep 5 && ollama pull gpt-oss:20b"
+```
+
+The command starts the Ollama server, waits for it to initialize, then downloads the model weights. Save the instance ID from the output — you'll need it in the next steps.
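
If the ID scrolls past, you can recover it later from `vastai show instances --raw`, which returns your instances as JSON. A sketch of the filter, run here against an illustrative stand-in payload instead of a live call:

```shell
# `vastai show instances --raw` emits a JSON array of your running instances;
# pull out just the IDs with jq (sample payload shown in place of a live call)
echo '[{"id": 1234567, "gpu_name": "A100_SXM4", "actual_status": "running"}]' \
  | jq -r '.[].id'
```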
+
+### What the flags do
+
+| Flag | Purpose |
+|------|---------|
+| `--image ollama/ollama:latest` | Official Ollama Docker image with GPU support |
+| `-p 11434:11434` | Exposes Ollama's default port to the internet |
+| `--disk 200` | Allocates enough disk for the Docker image plus model weights |
+| `ollama serve &` | Starts the Ollama server in the background |
+| `ollama pull <model>` | Downloads the model weights (runs once on first boot) |
+
+## Step 3: Wait for the Model to Download
+
+Monitor the instance logs to track the download progress:
+
+```bash
+vastai logs <INSTANCE_ID> --tail 20
+```
+
+Look for `success` in the output, which confirms the model finished downloading:
+
+```text
+pulling 30e51a7cb1cf: 100% ▏████████████████████ 51 GB
+verifying sha256 digest
+writing manifest
+success
+```
+
+## Step 4: Get Your Endpoint
+
+Retrieve the public IP and mapped port for your instance:
+
+```bash
+vastai show instance <INSTANCE_ID> --raw | \
+ jq -r '"\(.public_ipaddr):\(.ports["11434/tcp"][0].HostPort)"'
+```
+
+This outputs your endpoint as `IP:PORT`. Save it; you'll use it to verify the deployment and connect Claude Code.
+
+## Step 5: Verify the Endpoint
+
+Before connecting Claude Code, confirm the model is running and responding correctly.
+
+### Check model availability
+
+List the models loaded in Ollama:
+
+```bash
+curl -s http://<INSTANCE_IP>:<PORT>/v1/models \
+ -H "x-api-key: ollama" | jq .
+```
+
+You should see your model listed in the response.
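
To list just the model names, you can filter the response with jq. This assumes the OpenAI-style list shape Ollama returns from `/v1/models` (a top-level `data` array of objects with `id` fields); the inline payload below stands in for a live call:

```shell
# Filter a /v1/models response down to model IDs (sample payload inline)
echo '{"object": "list", "data": [{"id": "qwen3-coder-next", "object": "model"}]}' \
  | jq -r '.data[].id'
```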
+
+### Test basic chat
+
+Send a simple message using the Anthropic Messages API format:
+
+```bash
+curl -s http://<INSTANCE_IP>:<PORT>/v1/messages \
+ -H "content-type: application/json" \
+ -H "anthropic-version: 2023-06-01" \
+ -H "x-api-key: ollama" \
+ -d '{
+ "model": "qwen3-coder-next",
+ "max_tokens": 256,
+ "messages": [{"role": "user", "content": "Say hello in one sentence"}]
+ }' | jq .
+```
+
+For GPT-OSS-20B, replace the model name with `gpt-oss:20b`.
+
+Expected output:
+
+```json
+{
+ "id": "msg_f2419f865f0ab7866135d9f2",
+ "type": "message",
+ "role": "assistant",
+ "model": "qwen3-coder-next",
+ "content": [
+ {
+ "type": "text",
+ "text": "Hello!"
+ }
+ ],
+ "stop_reason": "end_turn",
+ "usage": {
+ "input_tokens": 13,
+ "output_tokens": 3
+ }
+}
+```
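
When scripting against the endpoint, jq can pull just the assistant text out of a response. Shown here against an inline copy of the reply above:

```shell
# Extract the assistant's text from an Anthropic Messages-format response
response='{"content": [{"type": "text", "text": "Hello!"}], "stop_reason": "end_turn"}'
echo "$response" | jq -r '.content[0].text'
```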
+
+### Test tool calling
+
+Claude Code relies on tool calling to edit files, run commands, and navigate your codebase. Verify the model handles tool calls correctly:
+
+```bash
+curl -s http://<INSTANCE_IP>:<PORT>/v1/messages \
+ -H "content-type: application/json" \
+ -H "anthropic-version: 2023-06-01" \
+ -H "x-api-key: ollama" \
+ -d '{
+ "model": "qwen3-coder-next",
+ "max_tokens": 1024,
+ "tools": [
+ {
+ "name": "Write",
+ "description": "Write content to a file",
+ "input_schema": {
+ "type": "object",
+ "properties": {
+ "file_path": {"type": "string"},
+ "content": {"type": "string"}
+ },
+ "required": ["file_path", "content"]
+ }
+ }
+ ],
+ "messages": [{"role": "user", "content": "Create hello.py that prints hello world"}]
+ }' | jq .
+```
+
+A successful response includes `"stop_reason": "tool_use"` and a `tool_use` content block with the file path and content:
+
+```json
+{
+ "id": "msg_00789b0ea0df023942763847",
+ "type": "message",
+ "role": "assistant",
+ "model": "qwen3-coder-next",
+ "content": [
+ {
+ "type": "tool_use",
+ "id": "call_spd57315",
+ "name": "Write",
+ "input": {
+ "file_path": "hello.py",
+ "content": "print(\"hello world\")\n"
+ }
+ }
+ ],
+ "stop_reason": "tool_use",
+ "usage": {
+ "input_tokens": 306,
+ "output_tokens": 36
+ }
+}
+```
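
The same approach works for inspecting tool calls. This filter summarizes each `tool_use` block's name and target file, run against an inline copy of the response above:

```shell
# Summarize tool_use blocks as "name -> file_path"
response='{"content": [{"type": "tool_use", "id": "call_spd57315", "name": "Write", "input": {"file_path": "hello.py", "content": "print(\"hello world\")\n"}}], "stop_reason": "tool_use"}'
echo "$response" | jq -r '.content[] | select(.type == "tool_use") | "\(.name) -> \(.input.file_path)"'
```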
+
+## Step 6: Connect Claude Code
+
+Set the environment variables that tell Claude Code to use your self-hosted model instead of Anthropic's API. Replace `<INSTANCE_IP>:<PORT>` with the endpoint from step 4.
+
+### Qwen3-Coder-Next
+
+```bash
+export ANTHROPIC_BASE_URL="http://<INSTANCE_IP>:<PORT>"
+export ANTHROPIC_API_KEY="ollama"
+export ANTHROPIC_AUTH_TOKEN="ollama"
+export ANTHROPIC_MODEL="qwen3-coder-next"
+claude --model qwen3-coder-next
+```
+
+### GPT-OSS-20B
+
+```bash
+export ANTHROPIC_BASE_URL="http://<INSTANCE_IP>:<PORT>"
+export ANTHROPIC_API_KEY="ollama"
+export ANTHROPIC_AUTH_TOKEN="ollama"
+export ANTHROPIC_MODEL="gpt-oss:20b"
+claude --model "gpt-oss:20b"
+```
+
+Claude Code launches and connects to your model. Try asking it to create a file, edit code, or run a command to confirm tool calling works end-to-end.
+
+### What the environment variables do
+
+| Variable | Purpose |
+|----------|---------|
+| `ANTHROPIC_BASE_URL` | Points Claude Code at your Ollama instance instead of `api.anthropic.com` |
+| `ANTHROPIC_API_KEY` | Required by Claude Code but can be any value — Ollama doesn't enforce auth |
+| `ANTHROPIC_AUTH_TOKEN` | Same as above — set to any non-empty string |
+| `ANTHROPIC_MODEL` | The model name to request from Ollama |
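
If you switch endpoints often, the exports can be bundled into a small helper script. Everything here (the file name and the default endpoint) is illustrative, not part of Claude Code itself:

```shell
#!/usr/bin/env bash
# byom-env.sh (hypothetical): export the BYOM settings for the current shell.
# Pass your real IP:PORT as the first argument; the default below is a placeholder.
ENDPOINT="${1:-203.0.113.10:11434}"
MODEL="${2:-qwen3-coder-next}"
export ANTHROPIC_BASE_URL="http://$ENDPOINT"
export ANTHROPIC_API_KEY="ollama"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_MODEL="$MODEL"
echo "Claude Code will use $ANTHROPIC_MODEL at $ANTHROPIC_BASE_URL"
```

Source it (`source byom-env.sh 203.0.113.10:40123 gpt-oss:20b`), then launch `claude --model "$ANTHROPIC_MODEL"`.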
+
+### Persistent Configuration (optional)
+
+To avoid setting environment variables every time, add the configuration to `~/.claude/settings.json`:
+
+```json
+{
+  "env": {
+    "ANTHROPIC_BASE_URL": "http://<INSTANCE_IP>:<PORT>",
+ "ANTHROPIC_API_KEY": "ollama",
+ "ANTHROPIC_AUTH_TOKEN": "ollama"
+ }
+}
+```
+
+Then launch with:
+
+```bash
+claude --model qwen3-coder-next
+```
+
+
+The `settings.json` approach stores your endpoint persistently. If you destroy the Vast.ai instance, you'll need to update the IP and port or remove the configuration to use Anthropic's API again.
+
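
If you want to update the stored endpoint in place rather than edit the file by hand, jq can rewrite it. A hypothetical helper, demonstrated on a temporary copy so it runs safely; point `SETTINGS` at `~/.claude/settings.json` for the real file (addresses are placeholders):

```shell
# Rewrite env.ANTHROPIC_BASE_URL in a settings.json-shaped file with jq.
# SETTINGS is a throwaway demo file here; swap in ~/.claude/settings.json.
SETTINGS="$(mktemp)"
printf '%s' '{"env": {"ANTHROPIC_BASE_URL": "http://198.51.100.7:11434", "ANTHROPIC_API_KEY": "ollama", "ANTHROPIC_AUTH_TOKEN": "ollama"}}' > "$SETTINGS"
tmp="$(mktemp)"
jq --arg url "http://203.0.113.10:40123" '.env.ANTHROPIC_BASE_URL = $url' "$SETTINGS" > "$tmp" && mv "$tmp" "$SETTINGS"
jq -r '.env.ANTHROPIC_BASE_URL' "$SETTINGS"
```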
+
+## Cleanup
+
+Destroy your instance when you're done to stop billing:
+
+```bash
+vastai destroy instance <INSTANCE_ID>
+```
+
+## Next Steps
+
+- **Try other models**: Ollama supports [hundreds of models](https://ollama.com/search). Any model with tool-calling support works with Claude Code — try `qwen3-coder` (30B) for a middle ground between the two options above.
+- **Secure your endpoint**: The default setup has no authentication. For production use, add a reverse proxy with TLS and API key validation.
+- **Scale up**: An H100 offers faster inference than an A100 for Qwen3-Coder-Next, with more headroom for longer context windows and concurrent requests.
+
+## Resources
+
+- [Claude Code documentation](https://docs.anthropic.com/en/docs/claude-code)
+- [Ollama](https://ollama.com/)
+- [Qwen3-Coder-Next on Ollama](https://ollama.com/library/qwen3-coder-next)
+- [GPT-OSS-20B on Ollama](https://ollama.com/library/gpt-oss:20b)
+- [Vast.ai CLI documentation](https://vast.ai/docs/cli/commands)