pi-llama-server

Note: This is how I use pi.dev + llama.cpp on my local machine. I created a plugin so that I can update my setup quickly.

A Pi extension that integrates a running llama-server instance with the Pi Coding Agent. It lists the server's models with live status and can load or unload them through the llama-server API.

Prerequisites

- A running llama-server instance (from llama.cpp) in router mode (the default when it is started without -m)
- Pi Coding Agent installed (@mariozechner/pi-coding-agent)

Install

pi install npm:pi-llama-server

Or from git:

pi install git:github.com/user/pi-llama-server

Pi auto-discovers the extension via pi.extensions in package.json. No additional setup needed.

Configuration

The llama-server URL is resolved in this order:

1. Per-project config: create .pi/llama-server.json in your project root:
   ```
   { "url": "http://10.0.0.5:9090" }
   ```

2. Environment variable: set it globally:

   export LLAMA_SERVER_URL=http://10.0.0.5:9090

3. Default: falls back to http://127.0.0.1:8080
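
The lookup order above can be sketched as a small pure function. This is an illustration only, not the extension's actual code: the name `resolveLlamaServerUrl`, its parameters, and the choice to skip a malformed config file are all assumptions.

```typescript
// Hypothetical helper mirroring the documented order:
// project config first, then LLAMA_SERVER_URL, then the default.
const DEFAULT_URL = "http://127.0.0.1:8080";

type Env = Record<string, string | undefined>;

function resolveLlamaServerUrl(
  projectConfigText: string | undefined, // contents of .pi/llama-server.json, if present
  env: Env,
): string {
  // 1. Per-project config.
  if (projectConfigText !== undefined) {
    try {
      const parsed: unknown = JSON.parse(projectConfigText);
      if (typeof parsed === "object" && parsed !== null) {
        const url = (parsed as { url?: unknown }).url;
        if (typeof url === "string" && url.length > 0) return url;
      }
    } catch {
      // Assumption: a malformed config file is skipped, not treated as fatal.
    }
  }
  // 2. Environment variable.
  const fromEnv = env["LLAMA_SERVER_URL"];
  if (fromEnv !== undefined && fromEnv.length > 0) return fromEnv;
  // 3. Built-in default.
  return DEFAULT_URL;
}
```

A caller would read .pi/llama-server.json (passing `undefined` when the file is missing) and hand in `process.env` as the second argument.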

Usage

Browse and manage models

Run the /models slash command inside Pi to see all models on the llama-server with live status:

| Status | Meaning |
| --- | --- |
| 🟢 `loaded` | Model is loaded and ready |
| 🟡 `loading` | Model is being loaded |
| 🔴 `failed` | Model failed to load |
| ⚪ other | Unknown state |

Select a model to load, unload, or switch to it.
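
As a rough illustration of the status table, a mapping from status string to label might look like the following. The function name `statusBadge` is hypothetical; only the status strings and icons are taken from the table.

```typescript
// Hypothetical rendering helper: maps a model status to the icon shown in the table.
function statusBadge(status: string): string {
  switch (status) {
    case "loaded":
      return "🟢 loaded";
    case "loading":
      return "🟡 loading";
    case "failed":
      return "🔴 failed";
    default:
      // Any state the table does not name falls into the "other" row.
      return `⚪ ${status}`;
  }
}
```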

Switch models

Use Ctrl+P (or /model) in Pi to select any llama-server model for inference. The extension will automatically tell llama-server to load the chosen model.

How it works

When Pi starts, the extension:

1. Resolves the llama-server URL from config/env/default
2. Queries GET /models to discover available GGUF models
3. Registers each model as an OpenAI-compatible provider under {url}/v1
4. Listens for model switch events and calls POST /models/load on the server
5. Provides the /models interactive command for managing models
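
The HTTP side of these steps could be sketched roughly as below. This is not the extension's source. The helper names, the `{ data: [{ id }] }` response shape, and the `{ "model": ... }` request body are assumptions to check against your llama-server version; the Pi-side registration and event wiring are omitted.

```typescript
// Minimal request/response shapes so the sketch needs no DOM or Node typings.
interface HttpRequest {
  url: string;
  method: "GET" | "POST";
  headers?: Record<string, string>;
  body?: string;
}
type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type FetchLike = (url: string, init: HttpRequest) => Promise<HttpResponse>;

function stripTrailingSlash(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

// Base URL under which the models are exposed as an OpenAI-compatible provider.
function openAiBaseUrl(baseUrl: string): string {
  return `${stripTrailingSlash(baseUrl)}/v1`;
}

function listModelsRequest(baseUrl: string): HttpRequest {
  return { url: `${stripTrailingSlash(baseUrl)}/models`, method: "GET" };
}

function loadModelRequest(baseUrl: string, modelId: string): HttpRequest {
  return {
    url: `${stripTrailingSlash(baseUrl)}/models/load`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Assumption: the server expects the model name in a "model" field.
    body: JSON.stringify({ model: modelId }),
  };
}

// Assumption: the list response looks like { data: [{ id: "..." }, ...] }.
function extractModelIds(payload: unknown): string[] {
  if (typeof payload !== "object" || payload === null) return [];
  const data = (payload as { data?: unknown }).data;
  if (!Array.isArray(data)) return [];
  const ids: string[] = [];
  for (const entry of data) {
    if (typeof entry === "object" && entry !== null) {
      const id = (entry as { id?: unknown }).id;
      if (typeof id === "string") ids.push(id);
    }
  }
  return ids;
}

// Sends a request and fails loudly on non-2xx responses.
async function send(fetchImpl: FetchLike, req: HttpRequest): Promise<unknown> {
  const res = await fetchImpl(req.url, req);
  if (!res.ok) throw new Error(`${req.method} ${req.url} failed with HTTP ${res.status}`);
  return res.json();
}

async function discoverModels(fetchImpl: FetchLike, baseUrl: string): Promise<string[]> {
  return extractModelIds(await send(fetchImpl, listModelsRequest(baseUrl)));
}
```

Passing the fetch implementation as a parameter keeps the request-building functions easy to test without a running server.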

llama-server endpoints used

| Endpoint | Method | Purpose |
| --- | --- | --- |
| `/models` | GET | List all models |
| `/models/load` | POST | Load a model |
| `/models/unload` | POST | Unload a model |
| `/v1/...` | POST | OpenAI-compatible completions (via Pi provider) |
