@aid-on/unillm


unillm is a unified LLM interface built for edge computing. It provides a consistent, type-safe API across multiple LLM providers, with minimal dependencies and memory usage optimized for edge runtimes.

Japanese | English

Features

  • 🚀 Edge-First: ~50KB bundle size, ~10ms cold start, optimized for edge runtimes
  • 🔄 Unified Interface: Single API for Anthropic, OpenAI, Groq, Gemini, Cloudflare, and more
  • 🌊 Streaming Native: Built on Web Streams API with nagare integration
  • 🎯 Type-Safe: Full TypeScript support with Zod schema validation
  • 📦 Minimal Dependencies: Only Zod (~11KB) required
  • ⚡ Memory Optimized: Automatic chunking and backpressure handling

Installation

npm install @aid-on/unillm
yarn add @aid-on/unillm
pnpm add @aid-on/unillm

Quick Start

import { unillm } from "@aid-on/unillm";

// Fluent API with type safety
const response = await unillm()
  .model("openai:gpt-4o-mini")
  .credentials({ openaiApiKey: process.env.OPENAI_API_KEY })
  .temperature(0.7)
  .generate("Explain quantum computing in simple terms");

console.log(response.text);

Streaming with nagare

The stream() method returns an @aid-on/nagare Stream<T> for reactive stream processing:

import { unillm } from "@aid-on/unillm";
import type { Stream } from "@aid-on/nagare";

const stream: Stream<string> = await unillm()
  .model("groq:llama-3.3-70b-versatile")
  .credentials({ groqApiKey: "..." })
  .stream("Write a story about AI");

// Use nagare's reactive operators
const enhanced = stream
  .map(chunk => chunk.trim())
  .filter(chunk => chunk.length > 0)
  .throttle(16)  // ~60fps for UI updates
  .tap(chunk => console.log(chunk))
  .toSSE();      // Convert to Server-Sent Events
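
Stream<string> is also async-iterable (the React example below relies on this), so a plain loop works when you don't need the reactive operators:

// Alternatively, collect the full text from the stream created above.
let text = "";
for await (const chunk of stream) {
  text += chunk;
}
console.log(text);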

Structured Output

Generate type-safe structured data with Zod schemas:

import { z } from "zod";

const PersonSchema = z.object({
  name: z.string(),
  age: z.number(),
  skills: z.array(z.string())
});

const result = await unillm()
  .model("groq:llama-3.1-8b-instant")
  .credentials({ groqApiKey: "..." })
  .schema(PersonSchema)
  .generate("Generate a software engineer profile");

// Type-safe access
console.log(result.object.name);     // string
console.log(result.object.skills);   // string[]

Provider Shortcuts

Ultra-concise syntax for common models:

import { anthropic, openai, groq, gemini, cloudflare } from "@aid-on/unillm";

// One-liners for quick prototyping
await anthropic.sonnet("sk-ant-...").generate("Hello");
await openai.mini("sk-...").generate("Hello");
await groq.instant("gsk_...").generate("Hello");
await gemini.flash("AIza...").generate("Hello");
await cloudflare.llama({ accountId: "...", apiToken: "..." }).generate("Hello");

Supported Models (45 Models)

Anthropic (8 models) - v0.4.0

  • anthropic:claude-opus-4-5-20251101 - Claude Opus 4.5 (Most Intelligent)
  • anthropic:claude-haiku-4-5-20251001 - Claude Haiku 4.5 (Ultra Fast)
  • anthropic:claude-sonnet-4-5-20250929 - Claude Sonnet 4.5 (Best for Coding)
  • anthropic:claude-opus-4-1-20250805 - Claude Opus 4.1
  • anthropic:claude-opus-4-20250514 - Claude Opus 4
  • anthropic:claude-sonnet-4-20250514 - Claude Sonnet 4
  • anthropic:claude-3-5-haiku-20241022 - Claude 3.5 Haiku
  • anthropic:claude-3-haiku-20240307 - Claude 3 Haiku

OpenAI (9 models)

  • openai:gpt-4o - GPT-4o (Latest, fastest GPT-4)
  • openai:gpt-4o-mini - GPT-4o Mini (Cost-effective)
  • openai:gpt-4o-2024-11-20 - GPT-4o November snapshot
  • openai:gpt-4o-2024-08-06 - GPT-4o August snapshot
  • openai:gpt-4-turbo - GPT-4 Turbo (High capability)
  • openai:gpt-4-turbo-preview - GPT-4 Turbo Preview
  • openai:gpt-4 - GPT-4 (Original)
  • openai:gpt-3.5-turbo - GPT-3.5 Turbo (Fast & cheap)
  • openai:gpt-3.5-turbo-0125 - GPT-3.5 Turbo (0125 snapshot)

Groq (7 models)

  • groq:llama-3.3-70b-versatile - Llama 3.3 70B Versatile
  • groq:llama-3.1-8b-instant - Llama 3.1 8B Instant
  • groq:meta-llama/llama-guard-4-12b - Llama Guard 4 12B
  • groq:openai/gpt-oss-120b - GPT-OSS 120B
  • groq:openai/gpt-oss-20b - GPT-OSS 20B
  • groq:groq/compound - Groq Compound
  • groq:groq/compound-mini - Groq Compound Mini

Google Gemini (8 models)

  • gemini:gemini-3-pro-preview - Gemini 3 Pro Preview
  • gemini:gemini-3-flash-preview - Gemini 3 Flash Preview
  • gemini:gemini-2.5-pro - Gemini 2.5 Pro
  • gemini:gemini-2.5-flash - Gemini 2.5 Flash
  • gemini:gemini-2.0-flash - Gemini 2.0 Flash
  • gemini:gemini-2.0-flash-lite - Gemini 2.0 Flash Lite
  • gemini:gemini-1.5-pro-002 - Gemini 1.5 Pro 002
  • gemini:gemini-1.5-flash-002 - Gemini 1.5 Flash 002

Cloudflare Workers AI (13 models)

  • cloudflare:@cf/meta/llama-4-scout-17b-16e-instruct - Llama 4 Scout
  • cloudflare:@cf/meta/llama-3.3-70b-instruct-fp8-fast - Llama 3.3 70B FP8
  • cloudflare:@cf/meta/llama-3.1-70b-instruct - Llama 3.1 70B
  • cloudflare:@cf/meta/llama-3.1-8b-instruct-fast - Llama 3.1 8B Fast
  • cloudflare:@cf/meta/llama-3.1-8b-instruct - Llama 3.1 8B
  • cloudflare:@cf/openai/gpt-oss-120b - GPT-OSS 120B
  • cloudflare:@cf/openai/gpt-oss-20b - GPT-OSS 20B
  • cloudflare:@cf/ibm/granite-4.0-h-micro - IBM Granite 4.0
  • cloudflare:@cf/mistralai/mistral-small-3.1-24b-instruct - Mistral Small 3.1
  • cloudflare:@cf/mistralai/mistral-7b-instruct-v0.2 - Mistral 7B
  • cloudflare:@cf/google/gemma-3-12b-it - Gemma 3 12B
  • cloudflare:@cf/qwen/qwq-32b - QwQ 32B
  • cloudflare:@cf/qwen/qwen2.5-coder-32b-instruct - Qwen 2.5 Coder
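
Every ID follows the provider:model convention, so swapping providers is a one-line change. A minimal sketch (the anthropicApiKey credential key is an assumption, by analogy with openaiApiKey and groqApiKey above):

import { unillm } from "@aid-on/unillm";

// Same builder, different provider: only the model ID and credential key change.
// anthropicApiKey is assumed here by analogy with the other providers.
const response = await unillm()
  .model("anthropic:claude-sonnet-4-5-20250929")
  .credentials({ anthropicApiKey: "sk-ant-..." })
  .generate("Hello");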

Advanced Usage

Fluent Builder Pattern

const builder = unillm()
  .model("groq:llama-3.3-70b-versatile")
  .credentials({ groqApiKey: "..." })
  .temperature(0.7)
  .maxTokens(1000)
  .topP(0.9)
  .system("You are a helpful assistant")
  .messages([
    { role: "user", content: "Previous question..." },
    { role: "assistant", content: "Previous answer..." }
  ]);

// Reusable configuration
const response1 = await builder.generate("New question");
const response2 = await builder.stream("Another question");

Memory Optimization

Automatic memory management for edge environments:

import { createMemoryOptimizedStream } from "@aid-on/unillm";

const stream = await createMemoryOptimizedStream(
  largeResponse,
  { 
    maxMemory: 1024 * 1024,  // 1MB limit
    chunkSize: 512           // Optimal chunk size
  }
);

Error Handling

import { UnillmError, RateLimitError } from "@aid-on/unillm";

try {
  const response = await unillm()
    .model("groq:llama-3.3-70b-versatile")
    .credentials({ groqApiKey: "..." })
    .generate("Hello");
} catch (error) {
  if (error instanceof RateLimitError) {
    console.log(`Rate limited. Retry after ${error.retryAfter}ms`);
  } else if (error instanceof UnillmError) {
    console.log(`LLM error: ${error.message}`);
  }
}
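
The retryAfter field makes a backoff wrapper straightforward. A minimal sketch (withRetry is a hypothetical helper, not part of the library):

import { unillm, RateLimitError } from "@aid-on/unillm";

// Hypothetical helper: retries a call after the server-suggested delay
// whenever a RateLimitError is thrown, up to `attempts` tries.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (error) {
      if (error instanceof RateLimitError && i < attempts - 1) {
        await new Promise(resolve => setTimeout(resolve, error.retryAfter));
        continue;
      }
      throw error;
    }
  }
}

const response = await withRetry(() =>
  unillm()
    .model("groq:llama-3.3-70b-versatile")
    .credentials({ groqApiKey: "..." })
    .generate("Hello")
);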

Integration Examples

With React

import { useState } from "react";
import { unillm } from "@aid-on/unillm";

export default function ChatComponent() {
  const [response, setResponse] = useState("");
  const [loading, setLoading] = useState(false);
  
  const handleGenerate = async () => {
    setLoading(true);
    const stream = await unillm()
      .model("groq:llama-3.1-8b-instant")
      .credentials({ groqApiKey: import.meta.env.VITE_GROQ_API_KEY })
      .stream("Write a haiku");
    
    for await (const chunk of stream) {
      setResponse(prev => prev + chunk);
    }
    setLoading(false);
  };
  
  return (
    <div>
      <button onClick={handleGenerate} disabled={loading}>
        {loading ? "Generating..." : "Generate"}
      </button>
      <p>{response}</p>
    </div>
  );
}

With Cloudflare Workers

export default {
  async fetch(request: Request, env: Env) {
    const stream = await unillm()
      .model("cloudflare:@cf/meta/llama-3.1-8b-instruct")
      .credentials({
        accountId: env.CF_ACCOUNT_ID,
        apiToken: env.CF_API_TOKEN
      })
      .stream("Hello from the edge!");
    
    return new Response(stream.toReadableStream(), {
      headers: { "Content-Type": "text/event-stream" }
    });
  }
};

API Reference

unillm() Builder Methods

| Method | Description | Example |
| --- | --- | --- |
| model(id) | Set the model ID | model("groq:llama-3.3-70b-versatile") |
| credentials(creds) | Set API credentials | credentials({ groqApiKey: "..." }) |
| temperature(n) | Set temperature (0-1) | temperature(0.7) |
| maxTokens(n) | Set max tokens | maxTokens(1000) |
| topP(n) | Set top-p sampling | topP(0.9) |
| schema(zod) | Set output schema | schema(PersonSchema) |
| system(text) | Set system prompt | system("You are...") |
| messages(msgs) | Set message history | messages([...]) |
| generate(prompt) | Generate a response | await generate("Hello") |
| stream(prompt) | Stream a response | await stream("Hello") |

License

MIT