Unified async LLM client library. One API, multiple providers, minimal dependencies (tokio + reqwest + small crypto/encoding utilities — see Cargo.toml; nothing else).
The code is part generated, part hand-written, built with the help of AI: the typed provider matrix and the typed-builder API surface are generated from a single source of truth, while request building, transport, streaming, caching, batching, and tool-loop behavior are hand-coded on top.
Shares a code-generation pipeline with the Go, TypeScript, and Python SDKs.
[dependencies]
llmkit = "1.0"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }

use llmkit::builders::anthropic;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let c = anthropic(std::env::var("ANTHROPIC_API_KEY")?);
let resp = c.text()
.system("Be concise.")
.temperature(0.3)
.prompt("Why is the sky blue?")
.await?;
println!("{}", resp.text);
println!("{} input tokens", resp.usage.input);
Ok(())
}

The typed builder is the only public surface as of v1.0.0. One mental model — client.<capability>().<chain>.<terminal> — across every capability.
Per-provider factory functions in llmkit::builders:
ai21 anthropic azure bedrock cerebras cohere deepseek
doubao ernie fireworks google grok groq lmstudio
minimax mistral moonshot ollama openai openrouter
perplexity qwen sambanova together vllm yi zhipu
Or use the generic new_client(ProviderName::OpenAI, key). 27 providers, 4 API shapes (OpenAI-compatible, Anthropic Messages, Google Generative AI, AWS Bedrock Converse). Bedrock auth uses SigV4; other providers use API-key auth.
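When the provider is only known at runtime, the generic constructor replaces a match over 27 factory functions. A minimal sketch; the import path for new_client and ProviderName is an assumption (adjust it to wherever your version exports them):

```rust
// Assumed import path; the constructor and enum names come from the text above.
use llmkit::{new_client, ProviderName};

let openai_client = new_client(ProviderName::OpenAI, std::env::var("OPENAI_API_KEY")?);
```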
let resp = c.text()
.system("You are helpful")
.temperature(0.7)
.max_tokens(200)
.prompt("What is 2+2?")
.await?;
println!("{}", resp.text); // "4"
println!("{}", resp.usage.input); // prompt tokens
println!("{}", resp.usage.output); // completion tokens
println!("{}", resp.usage.cache_read); // tokens served from cache
println!("{}", resp.usage.cache_write); // tokens written to cache (Anthropic explicit)
println!("{}", resp.usage.reasoning); // internal reasoning tokens (OpenAI o-series, Gemini 2.5+)

Capability-scoped fields (cache_read, cache_write, reasoning) are zero when the provider doesn't report them separately.
Rust's stream surface is callback-based. The callback fires for each chunk; the awaited terminal returns the final Response with token counts.
let resp = c.text()
.system("Be brief")
.stream("Tell me a joke", |chunk| print!("{}", chunk))
.await?;
println!("\nUsage: {:?}", resp.usage);

The callback shape is the trailing-handle pattern from the other SDKs expressed in callback form: the callback receives chunks (≡ the iterator), and the returned Result<Response> is the trailing handle (≡ stream.response() in TS/Python). An impl Stream<Item = ...> surface via the futures crate would mirror the other SDKs more literally, but it adds a third-party dependency the project's stdlib-first rule disallows.
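Because chunks arrive through the callback and the totals arrive on the awaited Response, collecting the full text just means capturing a buffer in the closure. A sketch, assuming the callback may be a move closure and that chunks are printable as in the example above (exact trait bounds may differ):

```rust
use std::sync::{Arc, Mutex};

// Accumulate streamed chunks while still getting the final Response with usage totals.
let buf = Arc::new(Mutex::new(String::new()));
let sink = Arc::clone(&buf);
let resp = c.text()
    .stream("Tell me a joke", move |chunk| {
        // The Mutex keeps this valid even if the callback only gets shared access.
        sink.lock().unwrap().push_str(&format!("{chunk}"));
    })
    .await?;
println!("collected {} bytes, {} output tokens", buf.lock().unwrap().len(), resp.usage.output);
```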
use llmkit::Tool;
let add = Tool::new(
"add",
"Add two numbers",
serde_json::json!({
"type": "object",
"properties": {
"a": {"type": "number"},
"b": {"type": "number"},
},
}),
|args| Ok((args["a"].as_f64().unwrap() + args["b"].as_f64().unwrap()).to_string()),
);
let mut bot = c.agent()
.system("You are a calculator.")
.tool(add)
.max_tool_iterations(5);
let resp = bot.prompt("What is 2+3?").await?;
println!("{}", resp.text);

Agent is stateful — repeated bot.prompt(...) calls accumulate history. Chain methods (.system(...), .tool(...)) consume self and produce a fresh-state clone, so a forked builder gets a fresh conversation. bot.reset() clears state without dropping chained config.
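For example, continuing the conversation started above and then wiping it (a short sketch of the behavior just described):

```rust
// The follow-up prompt sees the earlier exchange because the agent accumulates history.
let follow_up = bot.prompt("Now add 10 to that result.").await?;
println!("{}", follow_up.text);

// Keeps the .system(...) / .tool(...) configuration, clears the conversation.
bot.reset();
```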
Tool dispatch covers Anthropic tool_use, OpenAI tool_calls, Google functionCall, and Bedrock Converse toolUse.
Supports Google's Nano Banana 2 (gemini-3.1-flash-image-preview) and Pro (gemini-3-pro-image-preview); OpenAI's gpt-image-2, gpt-image-1.5, gpt-image-1, and gpt-image-1-mini; xAI's grok-imagine-image-quality; Google Cloud Vertex AI's Imagen 3 / Imagen 4 (imagen-3.0-generate-002, imagen-3.0-fast-generate-001, imagen-4.0-generate-preview-06-06).
use llmkit::builders::google;
let c = google(std::env::var("GOOGLE_API_KEY")?);
let img = c.image()
.model("gemini-3.1-flash-image-preview")
.aspect_ratio("16:9")
.image_size("2K")
.generate("A nano banana dish, studio lighting")
.await?;
std::fs::write("out.png", &img.images[0].data)?;

For compositional editing, chain .text(...) and .image(mime, bytes) to interleave references with descriptions:
c.image()
.model("gemini-3.1-flash-image-preview")
.text("Person:")
.image("image/png", person_bytes)
.text("Outfit:")
.image("image/png", outfit_bytes)
.generate("Generate the person wearing the outfit.")
.await?;

Aspect ratios and sizes validate against a per-model whitelist before the HTTP request. Empty whitelists mean "no client-side check; pass through" — providers like OpenAI accept arbitrary sizes within documented bounds (max edge ≤3840, both edges multiples of 16, ratio ≤3:1, total pixels 655K–8.3M), so the SDK trusts the API boundary instead of carrying a stale list.
For OpenAI, the chain dispatches automatically: with no image parts the request hits /v1/images/generations (JSON); with one or more image parts it hits /v1/images/edits (multipart/form-data, one image[] field per reference, in caller order).
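A minimal edit call that exercises that dispatch (a sketch; it assumes .image(...) accepts owned bytes, as in the compositional-editing example above, and that gpt-image-1 is available to your key):

```rust
use llmkit::builders::openai;

// One reference image switches the request to /v1/images/edits; without it, the same
// chain would hit /v1/images/generations.
let oc = openai(std::env::var("OPENAI_API_KEY")?);
let edited = oc.image()
    .model("gpt-image-1")
    .image("image/png", std::fs::read("input.png")?)
    .generate("Replace the background with a sunset")
    .await?;
std::fs::write("edited.png", &edited.images[0].data)?;
```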
Provider knobs are typed chain methods on the Image builder:
| Method | Provider support | Wire field |
|---|---|---|
| .quality(s) | OpenAI gpt-image-* | quality |
| .output_format(s) | OpenAI gpt-image-* | output_format |
| .background(s) | OpenAI gpt-image-* | background |
| .count(n) | OpenAI + xAI Grok | n |
| .mask(mime, bytes) | OpenAI gpt-image-* (edits) | multipart mask |
The chain validates per provider — calling .quality(...) on a Google or xAI builder returns Err(Validation { ... }) immediately, no HTTP round-trip. Knobs without typed methods (OpenAI: output_compression, moderation) remain reachable via .extra_fields(...), which is unvalidated and freeform.
use llmkit::builders::openai;
let c = openai(std::env::var("OPENAI_API_KEY")?);
let resp = c.image()
.model("gpt-image-2")
.image_size("1024x1024")
.quality("high")
.count(4)
.generate("A red circle on a white background")
.await?;

OpenAI gpt-image-* models require organization verification — see platform.openai.com/docs/guides/your-data#organization-verification.
Up to 14 reference images per Google request, 16 per OpenAI request.
Vertex Imagen uses the :predict endpoint family and OAuth bearer auth instead of API keys. The SDK takes a bearer token (string); caller manages OAuth refresh externally (e.g. gcloud auth print-access-token, service-account JSON, or workload identity).
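For local development, one common source for that token is the gcloud CLI mentioned above. A sketch that shells out to it (assumes gcloud is installed and authenticated, and an error-boxing main like the quickstart's):

```rust
use std::process::Command;
use llmkit::builders::vertex;

// The SDK only ever sees the resulting bearer string; refreshing it is the caller's job.
let out = Command::new("gcloud").args(["auth", "print-access-token"]).output()?;
let token = String::from_utf8(out.stdout)?.trim().to_string();
let vc = vertex(token);
```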
use llmkit::builders::vertex;
// Substitute your own project ID and location into the URL (here: my-gcp-project, us-central1).
let base_url = "https://us-central1-aiplatform.googleapis.com\
/v1/projects/my-gcp-project/locations/us-central1/publishers/google/models";
let c = vertex(std::env::var("VERTEX_BEARER_TOKEN")?).with_base_url(base_url);
let resp = c
.image()
.model("imagen-3.0-generate-002")
.aspect_ratio("16:9")
.count(2)
.generate("A red circle")
.await?;

Edit mode (a single image into instances[0].image) and inpainting (.mask(mime, bytes) into instances[0].mask.image) work the same way. Imagen-specific knobs like negativePrompt and safetySetting are reachable through .extra_fields(...) — they spread into the request's parameters block. Vertex's :predict response does not carry token counts; resp.tokens stays zero.
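For instance, a negative prompt through the untyped escape hatch (a sketch; the exact argument type of .extra_fields(...) isn't shown above, so a JSON object is assumed):

```rust
// Spread into the Imagen request's parameters block, per the note above.
let resp = c
    .image()
    .model("imagen-3.0-generate-002")
    .extra_fields(serde_json::json!({ "negativePrompt": "text, watermarks, logos" }))
    .generate("A red circle")
    .await?;
```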
Control content filtering for Gemini providers. safety_settings applies to text
generation, streaming, agents, and Gemini image generation. safety_filter applies
to Vertex Imagen only.
use llmkit::builders::{google, vertex};
use llmkit::types::{
SafetySetting,
HARM_CATEGORY_DANGEROUS_CONTENT,
HARM_CATEGORY_HARASSMENT,
HARM_BLOCK_THRESHOLD_NONE,
HARM_BLOCK_THRESHOLD_HIGH_ONLY,
IMAGE_SAFETY_FILTER_BLOCK_FEW,
};
// Gemini text or agent
let c = google(std::env::var("GOOGLE_API_KEY")?);
let resp = c
.text()
.safety_settings(vec![
SafetySetting { category: HARM_CATEGORY_DANGEROUS_CONTENT.into(), threshold: HARM_BLOCK_THRESHOLD_NONE.into() },
SafetySetting { category: HARM_CATEGORY_HARASSMENT.into(), threshold: HARM_BLOCK_THRESHOLD_HIGH_ONLY.into() },
])
.prompt("Write a story")
.await?;
// Vertex Imagen
let vc = vertex(std::env::var("VERTEX_BEARER_TOKEN")?);
let img = vc
.image()
.model("imagen-3.0-generate-002")
.safety_filter(IMAGE_SAFETY_FILTER_BLOCK_FEW)
.generate("A landscape")
.await?;

safety_settings on Vertex Imagen and safety_filter on non-Imagen providers return Error::Validation. The HARM_CATEGORY_*, HARM_BLOCK_THRESHOLD_*, and IMAGE_SAFETY_FILTER_* constants cover all documented values; raw strings also work.
use llmkit::builders::openai;
let c = openai(std::env::var("OPENAI_API_KEY")?);
// from a path
let file = c.upload().path("./data.pdf").run().await?;
// from bytes (filename required)
let file2 = c.upload()
.bytes(buf)
.filename("report.pdf")
.mime_type("application/pdf")
.run()
.await?;

use llmkit::builders::BatchHandleExt;
let results = c.text()
.system("Be brief")
.batch(vec!["Translate hello to French".into(), "Translate hello to Spanish".into()])
.await?;
for r in &results { println!("{}", r.text); }
// Or split:
let handle = c.text().submit_batch(prompts).await?;
let results = handle.wait().await?;

Both inline (Anthropic) and file-reference (OpenAI two-hop) flows are handled internally. Import the BatchHandleExt trait to call .wait() on the returned handle.
// Anthropic — explicit cache_control wrap of the system prompt.
c.text().system(long_sys_prompt).caching().prompt("...").await?;
// OpenAI — automatic server-side caching (caching() is a hint; reads
// surface in resp.usage.cache_read regardless).
c.text().system(long_sys_prompt).caching().prompt("...").await?;
// Google — pre-flight POST creates a cachedContents resource, then
// the main call references it. Google requires ~1k+ tokens of system
// prompt:
c.text().system(big_sys_prompt).caching().prompt("...").await?;

The mode is provider-specific and inferred from the provider config. The default TTL comes from src/providers/generated/caching.rs (Google: 3600s).
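Whether the cache actually hits is visible in the usage fields described earlier; a quick check, reusing the calls above:

```rust
// The first call writes the cache entry (where the provider reports writes explicitly),
// the second should show reads. Field names are the ones from the usage section.
let cold = c.text().system(long_sys_prompt).caching().prompt("...").await?;
let warm = c.text().system(long_sys_prompt).caching().prompt("...").await?;
println!("cache_write: {}  cache_read: {}", cold.usage.cache_write, warm.usage.cache_read);
```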
Across every *Text / *Agent builder:
| Concept | Method |
|---|---|
| System prompt | .system(s) |
| Model override | .model(name) |
| Sampling | .temperature(t) |
| Token cap | .max_tokens(n) |
| Caching | .caching() |
| Conversation history | .history(msgs) |
| Structured output | .schema(json) |
| Middleware hooks | .middleware(fns) |
| Reasoning effort | .reasoning_effort(l) |
| Thinking budget | .thinking_budget(n) |
Sampling hyperparameters (.top_p, .top_k, .seed, .frequency_penalty, .presence_penalty, .stop_sequences) are validated per provider; unsupported options return Error::Validation rather than silently dropping.
The Image builder has a narrower set: .model, .aspect_ratio, .image_size, .include_text, .text, .image, .middleware. Upload: .path, .bytes, .filename, .mime_type, .middleware.
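On the text side, these options compose the same way as .system(...) and .temperature(...) in the earlier examples. As one sketch, structured output via .schema(json) from the table above (that the constrained result comes back as JSON in resp.text is an assumption):

```rust
let resp = c.text()
    .schema(serde_json::json!({
        "type": "object",
        "properties": { "answer": { "type": "string" } },
        "required": ["answer"]
    }))
    .prompt("What is 2+2?")
    .await?;

// Assumption: the schema-constrained output arrives as JSON text.
let parsed: serde_json::Value = serde_json::from_str(&resp.text)?;
println!("{}", parsed["answer"]);
```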
use llmkit::builders::openai;
let c = openai("anything").with_base_url("http://localhost:8080/v1");

Works for any OpenAI-compatible server (vLLM, LM Studio, Ollama, corporate gateways).
Register pre/post hooks around LLM requests, tool calls, image generation, cache creation, uploads, and batch submits. Pre-phase middleware can veto by returning Some(error); post-phase return values are discarded.
use std::sync::Arc;
use llmkit::builders::anthropic;
use llmkit::middleware::{Event, MiddlewareFn, MiddlewareOp, MiddlewarePhase};
// Observation: log token usage after every LLM request.
let log_usage: MiddlewareFn = Arc::new(|e: &Event| {
if e.op == MiddlewareOp::LlmRequest && e.phase == MiddlewarePhase::Post {
if let Some(u) = &e.usage {
let ms = e.duration.map(|d| d.as_millis()).unwrap_or(0);
println!(
"{}/{}: {} in, {} out, {ms} ms",
e.provider, e.model, u.input, u.output,
);
}
}
None
});
// Veto: abort if a daily budget is exceeded (pre-phase).
let limit = 5.00_f64;
let spent = Arc::new(std::sync::Mutex::new(0.0_f64));
let spent_for_gate = Arc::clone(&spent);
let budget_gate: MiddlewareFn = Arc::new(move |e: &Event| {
if e.op == MiddlewareOp::LlmRequest
&& e.phase == MiddlewarePhase::Pre
&& *spent_for_gate.lock().unwrap() >= limit
{
let msg = format!("daily budget ${:.2} exceeded", limit);
return Some(Box::<dyn std::error::Error + Send + Sync>::from(msg));
}
None
});
let c = anthropic("…");
let resp = c
.text()
.middleware(vec![budget_gate, log_usage])
.prompt("…")
.await?;

A pre-phase veto surfaces as llmkit::Error::MiddlewareVeto(String) carrying the formatted cause, so callers can discriminate it from transport or provider errors via match err { Error::MiddlewareVeto(msg) => … }. Middlewares fire in registration order; the first Some(_) pre-phase return aborts.
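Handled explicitly instead of bubbling the veto up with ?, that looks like the sketch below (it reuses the budget gate defined above; clone or re-create the Arc if the original binding has already been moved into the earlier call):

```rust
use llmkit::Error;

match c.text().middleware(vec![budget_gate]).prompt("…").await {
    Ok(resp) => println!("{}", resp.text),
    // The veto carries the formatted cause from the middleware's Some(error) return.
    Err(Error::MiddlewareVeto(reason)) => eprintln!("vetoed: {reason}"),
    // Error is #[non_exhaustive]; keep a catch-all arm.
    Err(other) => eprintln!("failed: {other}"),
}
```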
Wired at six sites: Text.prompt / Agent::chat LLM call (op=LlmRequest), Agent tool execution (op=ToolCall), Image.generate (op=ImageGeneration), Upload.run (op=Upload), Text.submit_batch (op=BatchSubmit), Google resource caching pre-flight (op=CacheCreate).
- Generated (src/providers/generated/*.rs, src/builders/mod.rs) — per-provider config + the typed-builder API surface. Pure data and struct skeletons, no business logic.
- Hand-coded (src/{lib,types,error,http,transforms,middleware,caching,batch,agent,sigv4,paths,request,response,stream,uploads,image,options}.rs, plus src/builders/{text,agent,image,stream,batch,upload}.rs and internal_tests.rs) — HTTP, request shaping, SSE consumer, agent tool loop, SigV4 signing, caching, batch lifecycle, multipart upload, middleware fanout, builder terminals.
Transforms dispatch on config fields (system_placement, wraps_options_in, auth_scheme), not provider names.
The Error enum is #[non_exhaustive] and builder structs are #[non_exhaustive] with pub(crate) fields — the chain methods are the intended interface, and we can add fields in 1.0.x without a SemVer break.
This repo is a read-only mirror of a private monorepo. File issues here; code patches should target the private source via christian@aktagon.com.
MIT