unillm is a unified LLM interface for edge computing. It provides a consistent, type-safe API across multiple LLM providers with minimal dependencies and optimized memory usage for edge environments.
- 🚀 Edge-First: ~50KB bundle size, ~10ms cold start, optimized for edge runtimes
- 🔄 Unified Interface: Single API for Anthropic, OpenAI, Groq, Gemini, Cloudflare, and more
- 🌊 Streaming Native: Built on Web Streams API with nagare integration
- 🎯 Type-Safe: Full TypeScript support with Zod schema validation
- 📦 Minimal Dependencies: Only Zod (~11KB) required
- ⚡ Memory Optimized: Automatic chunking and backpressure handling
npm install @aid-on/unillm
yarn add @aid-on/unillm
pnpm add @aid-on/unillm

import { unillm } from "@aid-on/unillm";
// Fluent API with type safety
const response = await unillm()
.model("openai:gpt-4o-mini")
.credentials({ openaiApiKey: process.env.OPENAI_API_KEY })
.temperature(0.7)
.generate("Explain quantum computing in simple terms");
console.log(response.text);

unillm returns an `@aid-on/nagare` `Stream<T>` for reactive stream processing:
import { unillm } from "@aid-on/unillm";
import type { Stream } from "@aid-on/nagare";
const stream: Stream<string> = await unillm()
.model("groq:llama-3.3-70b-versatile")
.credentials({ groqApiKey: "..." })
.stream("Write a story about AI");
// Use nagare's reactive operators
const enhanced = stream
.map(chunk => chunk.trim())
.filter(chunk => chunk.length > 0)
.throttle(16) // ~60fps for UI updates
.tap(chunk => console.log(chunk))
.toSSE(); // Convert to Server-Sent Events

Generate type-safe structured data with Zod schemas:
import { z } from "zod";
const PersonSchema = z.object({
name: z.string(),
age: z.number(),
skills: z.array(z.string())
});
const result = await unillm()
.model("groq:llama-3.1-8b-instant")
.credentials({ groqApiKey: "..." })
.schema(PersonSchema)
.generate("Generate a software engineer profile");
// Type-safe access
console.log(result.object.name); // string
console.log(result.object.skills); // string[]

Ultra-concise syntax for common models:
import { anthropic, openai, groq, gemini, cloudflare } from "@aid-on/unillm";
// One-liners for quick prototyping
await anthropic.sonnet("sk-ant-...").generate("Hello");
await openai.mini("sk-...").generate("Hello");
await groq.instant("gsk_...").generate("Hello");
await gemini.flash("AIza...").generate("Hello");
await cloudflare.llama({ accountId: "...", apiToken: "..." }).generate("Hello");

Anthropic:
- `anthropic:claude-opus-4-5-20251101` - Claude Opus 4.5 (Most Intelligent)
- `anthropic:claude-haiku-4-5-20251001` - Claude Haiku 4.5 (Ultra Fast)
- `anthropic:claude-sonnet-4-5-20250929` - Claude Sonnet 4.5 (Best for Coding)
- `anthropic:claude-opus-4-1-20250805` - Claude Opus 4.1
- `anthropic:claude-opus-4-20250514` - Claude Opus 4
- `anthropic:claude-sonnet-4-20250514` - Claude Sonnet 4
- `anthropic:claude-3-5-haiku-20241022` - Claude 3.5 Haiku
- `anthropic:claude-3-haiku-20240307` - Claude 3 Haiku
OpenAI:
- `openai:gpt-4o` - GPT-4o (Latest, fastest GPT-4)
- `openai:gpt-4o-mini` - GPT-4o Mini (Cost-effective)
- `openai:gpt-4o-2024-11-20` - GPT-4o November snapshot
- `openai:gpt-4o-2024-08-06` - GPT-4o August snapshot
- `openai:gpt-4-turbo` - GPT-4 Turbo (High capability)
- `openai:gpt-4-turbo-preview` - GPT-4 Turbo Preview
- `openai:gpt-4` - GPT-4 (Original)
- `openai:gpt-3.5-turbo` - GPT-3.5 Turbo (Fast & cheap)
- `openai:gpt-3.5-turbo-0125` - GPT-3.5 Turbo Latest
Groq:
- `groq:llama-3.3-70b-versatile` - Llama 3.3 70B Versatile
- `groq:llama-3.1-8b-instant` - Llama 3.1 8B Instant
- `groq:meta-llama/llama-guard-4-12b` - Llama Guard 4 12B
- `groq:openai/gpt-oss-120b` - GPT-OSS 120B
- `groq:openai/gpt-oss-20b` - GPT-OSS 20B
- `groq:groq/compound` - Groq Compound
- `groq:groq/compound-mini` - Groq Compound Mini
Gemini:
- `gemini:gemini-3-pro-preview` - Gemini 3 Pro Preview
- `gemini:gemini-3-flash-preview` - Gemini 3 Flash Preview
- `gemini:gemini-2.5-pro` - Gemini 2.5 Pro
- `gemini:gemini-2.5-flash` - Gemini 2.5 Flash
- `gemini:gemini-2.0-flash` - Gemini 2.0 Flash
- `gemini:gemini-2.0-flash-lite` - Gemini 2.0 Flash Lite
- `gemini:gemini-1.5-pro-002` - Gemini 1.5 Pro 002
- `gemini:gemini-1.5-flash-002` - Gemini 1.5 Flash 002
Cloudflare:
- `cloudflare:@cf/meta/llama-4-scout-17b-16e-instruct` - Llama 4 Scout
- `cloudflare:@cf/meta/llama-3.3-70b-instruct-fp8-fast` - Llama 3.3 70B FP8
- `cloudflare:@cf/meta/llama-3.1-70b-instruct` - Llama 3.1 70B
- `cloudflare:@cf/meta/llama-3.1-8b-instruct-fast` - Llama 3.1 8B Fast
- `cloudflare:@cf/meta/llama-3.1-8b-instruct` - Llama 3.1 8B
- `cloudflare:@cf/openai/gpt-oss-120b` - GPT-OSS 120B
- `cloudflare:@cf/openai/gpt-oss-20b` - GPT-OSS 20B
- `cloudflare:@cf/ibm/granite-4.0-h-micro` - IBM Granite 4.0
- `cloudflare:@cf/mistralai/mistral-small-3.1-24b-instruct` - Mistral Small 3.1
- `cloudflare:@cf/mistralai/mistral-7b-instruct-v0.2` - Mistral 7B
- `cloudflare:@cf/google/gemma-3-12b-it` - Gemma 3 12B
- `cloudflare:@cf/qwen/qwq-32b` - QwQ 32B
- `cloudflare:@cf/qwen/qwen2.5-coder-32b-instruct` - Qwen 2.5 Coder
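Because every provider sits behind the same builder, switching models usually means changing only the `provider:model` ID and the matching credentials. Below is a minimal sketch, assuming the credential keys shown elsewhere in this README (`groqApiKey`, `openaiApiKey`); other providers may expect different keys:

```typescript
import { unillm } from "@aid-on/unillm";

// Same prompt, two providers - only the model ID and the credentials change.
const prompt = "Summarize the benefits of edge computing in one sentence";

const viaGroq = await unillm()
  .model("groq:llama-3.1-8b-instant")
  .credentials({ groqApiKey: process.env.GROQ_API_KEY })
  .generate(prompt);

const viaOpenAI = await unillm()
  .model("openai:gpt-4o-mini")
  .credentials({ openaiApiKey: process.env.OPENAI_API_KEY })
  .generate(prompt);

console.log(viaGroq.text);
console.log(viaOpenAI.text);
```

For longer conversations and finer control, the full builder chains the remaining options into a reusable configuration: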
const builder = unillm()
.model("groq:llama-3.3-70b-versatile")
.credentials({ groqApiKey: "..." })
.temperature(0.7)
.maxTokens(1000)
.topP(0.9)
.system("You are a helpful assistant")
.messages([
{ role: "user", content: "Previous question..." },
{ role: "assistant", content: "Previous answer..." }
]);
// Reusable configuration
const response1 = await builder.generate("New question");
const response2 = await builder.stream("Another question");

Automatic memory management for edge environments:
import { createMemoryOptimizedStream } from "@aid-on/unillm";
const stream = await createMemoryOptimizedStream(
largeResponse,
{
maxMemory: 1024 * 1024, // 1MB limit
chunkSize: 512 // Optimal chunk size
}
);

import { UnillmError, RateLimitError } from "@aid-on/unillm";
try {
const response = await unillm()
.model("groq:llama-3.3-70b-versatile")
.credentials({ groqApiKey: "..." })
.generate("Hello");
} catch (error) {
if (error instanceof RateLimitError) {
console.log(`Rate limited. Retry after ${error.retryAfter}ms`);
} else if (error instanceof UnillmError) {
console.log(`LLM error: ${error.message}`);
}
}

import { useState } from "react";
import { unillm } from "@aid-on/unillm";
export default function ChatComponent() {
const [response, setResponse] = useState("");
const [loading, setLoading] = useState(false);
const handleGenerate = async () => {
setLoading(true);
const stream = await unillm()
.model("groq:llama-3.1-8b-instant")
.credentials({ groqApiKey: import.meta.env.VITE_GROQ_API_KEY })
.stream("Write a haiku");
for await (const chunk of stream) {
setResponse(prev => prev + chunk);
}
setLoading(false);
};
return (
<div>
<button onClick={handleGenerate} disabled={loading}>
{loading ? "Generating..." : "Generate"}
</button>
<p>{response}</p>
</div>
);
}

export default {
async fetch(request: Request, env: Env) {
const stream = await unillm()
.model("cloudflare:@cf/meta/llama-3.1-8b-instruct")
.credentials({
accountId: env.CF_ACCOUNT_ID,
apiToken: env.CF_API_TOKEN
})
.stream("Hello from the edge!");
return new Response(stream.toReadableStream(), {
headers: { "Content-Type": "text/event-stream" }
});
}
};

| Method | Description | Example |
|---|---|---|
| `model(id)` | Set the model ID | `model("groq:llama-3.3-70b-versatile")` |
| `credentials(creds)` | Set API credentials | `credentials({ groqApiKey: "..." })` |
| `temperature(n)` | Set temperature (0-1) | `temperature(0.7)` |
| `maxTokens(n)` | Set max tokens | `maxTokens(1000)` |
| `topP(n)` | Set top-p sampling | `topP(0.9)` |
| `schema(zod)` | Set output schema | `schema(PersonSchema)` |
| `system(text)` | Set system prompt | `system("You are...")` |
| `messages(msgs)` | Set message history | `messages([...])` |
| `generate(prompt)` | Generate response | `await generate("Hello")` |
| `stream(prompt)` | Stream response | `await stream("Hello")` |
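As a recap of the methods above, here is a minimal sketch that combines several of them behind one function; the `summarize` helper and its single-retry fallback are illustrative, not part of the library:

```typescript
import { z } from "zod";
import { unillm, UnillmError, RateLimitError } from "@aid-on/unillm";

const SummarySchema = z.object({
  title: z.string(),
  bulletPoints: z.array(z.string())
});

// Illustrative wrapper: typed output via schema(), one retry on rate limiting.
async function summarize(text: string) {
  const builder = unillm()
    .model("groq:llama-3.1-8b-instant")
    .credentials({ groqApiKey: process.env.GROQ_API_KEY })
    .system("You produce terse, factual summaries")
    .schema(SummarySchema)
    .maxTokens(500);

  try {
    return (await builder.generate(text)).object;
  } catch (error) {
    if (error instanceof RateLimitError) {
      // retryAfter is reported in milliseconds (see the error handling example above)
      await new Promise(resolve => setTimeout(resolve, error.retryAfter));
      return (await builder.generate(text)).object;
    }
    if (error instanceof UnillmError) {
      console.error(`LLM error: ${error.message}`);
    }
    throw error;
  }
}
```

`await summarize(someText)` then resolves to an object matching `SummarySchema`, with `bulletPoints` typed as `string[]`.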
MIT License