Joolines! Ever wanted to whip up an AI inference endpoint faster than a tumbleweed in a tornado? If you're a JS/TypeScript wrangler familiar with frameworks like Express.js, this here project is your golden Michelada! We're gonna show you how to quickly get an AI-powered Hono app up and running on Cloudflare Workers, leveraging their nifty Workers AI features.
This guide will walk you through the essentials, so you can spend less time head-scratchin' and more time buildin' awesome stuff!
The main goal here, folks, is to give JavaScript and TypeScript developers a super-fast walkthrough for setting up an inference endpoint. We're talkin' cutting-edge AI capabilities, courtesy of Cloudflare AI, all served up slick and quick with the Hono framework. Think of this as your "get 'er done" blueprint for AI on the edge. No fuss, no muss!
Before we dive headfirst into the code, make sure you've got these tools in your saddlebag:
- A Cloudflare Account: Can't ride a Cloudflare Worker without being part of the ranch, right? If you don't have one, sign up on the Cloudflare website; it's free to get started!
- Node.js: This is the engine that powers our local development. Make sure you've got a recent version installed. You can grab it from nodejs.org.
- A Package Manager (npm/yarn/pnpm): We'll be using pnpm in this project 'cause it's lean and mean, but npm or yarn will also get the job done. If you're new to pnpm, it's worth checkin' out!
Alright, time to roll up our sleeves and get our hands dirty!
- Scaffold a New Worker Project: Fire up your terminal and let's create a brand-spankin'-new Cloudflare Worker project. Replace <name-of-the-project> with whatever cool name you've cooked up!

```shell
pnpm create cloudflare@latest <name-of-the-project>
```

Follow the prompts (the exact wording varies between create-cloudflare versions). For the type of application, choose the "Hello World" Worker, since we're building our Hono app from scratch, and pick TypeScript as the language. When it asks "Do you want to deploy your application?", you can say no for now; we'll do that later.
- Install Hono and Friends: Navigate into your new project directory (cd <name-of-the-project>) and let's add Hono and a couple of handy sidekicks:
  - hono: The star of our show, a small, simple, and ultrafast web framework.
  - zod: Awesome for schema declaration and validation. It's highly recommended for robust apps, and the /query route below uses it to validate the request body.
  - eventsource-parser: This little critter will be key for handling AI streams, as we'll see later.

```shell
pnpm add hono zod eventsource-parser
```
- Create Your Entry Point: The "Hello World" template already created src/index.ts inside your project; open it and replace its contents with the code below. This is where our Hono application logic will live.

```typescript
// src/index.ts
import { Hono } from 'hono';
import { streamText } from 'hono/streaming';
import { EventSourceParserStream } from 'eventsource-parser/stream';
import { z } from 'zod';

// Define Cloudflare Bindings, including AI.
// This helps TypeScript understand the environment you're running in.
// `Ai` is a global type from @cloudflare/workers-types (or the file generated by `wrangler types`).
export interface Env {
  AI: Ai; // This line is crucial for Cloudflare AI
}

// Expected request body: { "query": "..." }
const querySchema = z.object({
  query: z.string().min(1),
});

// Turn a ZodError into a short, human-readable message
const formatZodError = (error: z.ZodError): string =>
  error.issues.map((issue) => `${issue.path.join('.') || 'body'}: ${issue.message}`).join('; ');

const app = new Hono<{ Bindings: Env }>();

app.get('/', (c) => {
  return c.text('Howdy, Hono and Cloudflare AI!');
});

// A streaming AI route
app.post('/query', async (c): Promise<Response> => {
  // Malformed JSON becomes null and fails validation (400) instead of throwing a 500
  const body = await c.req.json().catch(() => null);
  const { success, data, error } = querySchema.safeParse(body);

  if (!success) {
    return c.json({ error: formatZodError(error), ok: false }, 400);
  }

  const { query: userQuery } = data;

  const messages: { role: 'system' | 'user'; content: string }[] = [
    {
      role: 'system',
      content: `You are a helpful assistant and always try to respond respectfully. If you don't know the answer, responding with "I don't know" is perfectly fine.`,
    },
    { role: 'user', content: userQuery },
  ];

  // Call the DeepSeek R1 (distilled Qwen 32B) model with streaming enabled
  const eventSourceStream = (await c.env.AI.run('@cf/deepseek-ai/deepseek-r1-distill-qwen-32b', {
    messages,
    temperature: 0.1,
    top_p: 0.95,
    max_tokens: 500,
    stream: true,
  })) as ReadableStream<Uint8Array>;

  if (eventSourceStream === undefined) {
    return c.json({ error: 'Error in AI model', ok: false }, 500);
  }

  // Bytes -> text -> parsed SSE events, so each `msg.data` is one complete JSON payload
  const tokenStream = eventSourceStream
    .pipeThrough(new TextDecoderStream())
    .pipeThrough(new EventSourceParserStream());

  // Re-emit only the generated text as a plain text stream
  const textResponse = streamText(c, async (stream) => {
    for await (const msg of tokenStream) {
      if (msg.data === '[DONE]') break; // end of the model's output; Hono closes the stream

      const data = JSON.parse(msg.data);
      const hasUsage = data?.usage && Object.keys(data.usage).length > 0;

      if (hasUsage) {
        // Final chunk: write its text once, followed by the token count
        const { total_tokens } = data.usage;
        await stream.write(`${data?.response ?? ''}. <<Total Tokens: ${total_tokens}>>`);
      } else {
        await stream.write(data?.response ?? '');
      }
    }
  });

  return new Response(textResponse.body, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
      'Transfer-Encoding': 'chunked',
      'X-Accel-Buffering': 'no', // Nginx specific
      'X-Content-Type-Options': 'nosniff',
      'Access-Control-Allow-Origin': '*',
      'X-Frame-Options': 'DENY',
      'X-XSS-Protection': '1; mode=block',
      'Referrer-Policy': 'no-referrer',
    },
  });
});

export default app;
```
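The per-event logic inside that for await loop can be pulled out into a small pure function, which makes it easy to unit-test without calling the model at all. This is a sketch: the WorkersAiChunk type and the chunkToText name are hypothetical, and the chunk shape (a response string plus an optional usage object with total_tokens) is assumed from how the handler reads each event. Each chunk's text is returned exactly once, with the token count appended only to the chunk that carries usage data.

```typescript
// Hypothetical shape of one parsed Workers AI streaming event (assumed from the handler)
interface WorkersAiChunk {
  response?: string;
  usage?: { total_tokens?: number };
}

// Returns the text to write for one SSE `data` payload, or null for the [DONE] sentinel
function chunkToText(rawData: string): string | null {
  if (rawData === '[DONE]') return null;

  const data = JSON.parse(rawData) as WorkersAiChunk;
  const text = data.response ?? '';
  const hasUsage = data.usage !== undefined && Object.keys(data.usage).length > 0;

  // The usage note is appended only to the chunk that carries token counts
  return hasUsage ? `${text}. <<Total Tokens: ${data.usage?.total_tokens}>>` : text;
}

// Usage
console.log(chunkToText('{"response":"Howdy"}'));
console.log(chunkToText('{"response":"","usage":{"total_tokens":42}}'));
console.log(chunkToText('[DONE]'));
```

Inside the route, the loop body would then shrink to: compute chunkToText(msg.data), break on null, otherwise await stream.write(...) with the result.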
Don't forget to take a gander at the Hono docs anytime you need a refresher or want to explore its other cool features. They've done a bang-up job over there!
This project ain't just about slinging code; it's about learnin' a few tricks of the trade. Here are the key takeaways that'll save you some headaches down the line:
If you've ever danced with Express.js, you'll find Hono's syntax to be pretty darn straightforward. The routing, middleware, and request/response handling will feel like comin' home. This makes the learning curve smooth as butter, letting you focus on the Cloudflare AI specifics.
This is a biggie! To actually use Cloudflare's AI models within your Worker, you gotta tell Cloudflare about it. You do this by adding an "AI binding" to your wrangler.jsonc (or wrangler.toml) configuration file.
Open up your wrangler.jsonc file (it should be in the root of your project) and add the ai binding like so:
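A minimal sketch of that binding, assuming the binding name AI so that it matches the AI property on the Env interface and the c.env.AI calls in src/index.ts. Leave the other fields the scaffolder generated (name, main, compatibility_date, and so on) as they are. If your project uses wrangler.toml instead, the equivalent is an [ai] table containing binding = "AI".

```jsonc
// wrangler.jsonc (other generated fields omitted)
{
  "ai": {
    // Must match the `AI` property on Env, which the code reads as c.env.AI
    "binding": "AI"
  }
}
```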
Streaming Like a Pro with AI.run()

Many AI models, especially Large Language Models (LLMs), can generate responses token by token. Instead of waiting for the whole shebang, you can stream the response back to the client. This makes your app feel way more responsive, like getting a play-by-play instead of waiting 'til the end of the game! Cloudflare's AI.run() method supports streaming out of the box. When you call a model that supports it, you can set stream: true in the options:
```typescript
// Inside an async Hono route handler
// ...
const eventSourceStream = (await c.env.AI.run('@cf/deepseek-ai/deepseek-r1-distill-qwen-32b', {
  messages,
  temperature: 0.1,
  top_p: 0.95,
  max_tokens: 500,
  stream: true, // <--- Yeehaw! Enable streaming!
})) as ReadableStream;
// ...
```

The eventSourceStream variable is a ReadableStream of bytes formatted as Server-Sent Events. Now, the trick is how to send this to your client effectively.

Taming the Stream with EventSourceParserStream

Now, this was the real kicker for me! When I first tried to consume the stream directly from AI.run() using just TextDecoder(), I ran into some gnarly issues. It wasn't always parsing the chunks correctly, leading to garbled text or incomplete messages. The root cause: network chunks don't line up with SSE events. One chunk can carry half of a data: line, or several events at once, so calling JSON.parse on each decoded chunk breaks.
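The secret sauce here is to leverage the EventSourceParserStream from the eventsource-parser package we installed earlier (imported from eventsource-parser/stream). This utility is designed to handle Server-Sent Events (SSE) formatted streams, which is what AI.run() returns when stream: true is set. It buffers the incoming text and emits one complete event at a time, so each msg.data is a whole JSON payload (or the final [DONE] marker).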
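To see why that buffering matters, here is a deliberately simplified, hand-rolled stand-in for the job EventSourceParserStream does. It is not the eventsource-parser implementation: simpleSseDataStream is a hypothetical name, and it ignores SSE details such as \r\n line endings, event:/id: fields, and comment lines. The demo splits one JSON payload across two chunks and packs a second event into the same chunk, which is exactly the situation that trips up a plain TextDecoder approach.

```typescript
// Simplified stand-in for EventSourceParserStream (NOT the eventsource-parser code).
// Buffers text until a blank line ("\n\n") ends an event, then emits that event's data.
function simpleSseDataStream(): TransformStream<string, string> {
  let buffer = '';
  return new TransformStream<string, string>({
    transform(chunk, controller) {
      buffer += chunk;
      let boundary: number;
      // An event is complete only once its terminating blank line has arrived
      while ((boundary = buffer.indexOf('\n\n')) !== -1) {
        const rawEvent = buffer.slice(0, boundary);
        buffer = buffer.slice(boundary + 2);
        const data = rawEvent
          .split('\n')
          .filter((line) => line.startsWith('data:'))
          .map((line) => line.slice(5).replace(/^ /, '')) // drop one optional leading space
          .join('\n');
        if (data) controller.enqueue(data);
      }
      // Anything left in `buffer` is a partial event; wait for the next chunk
    },
  });
}

async function demo(): Promise<string[]> {
  // One JSON payload split across two chunks, plus a second event in the same chunk
  const chunks = [
    'data: {"respon',
    'se":"Howdy"}\n\ndata: {"response":" partner"}\n\n',
    'data: [DONE]\n\n',
  ];
  const source = new ReadableStream<string>({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(chunk);
      controller.close();
    },
  });

  const events: string[] = [];
  const reader = source.pipeThrough(simpleSseDataStream()).getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    events.push(value);
  }
  return events;
}

demo().then((events) => console.log(events));
```

The demo yields three whole payloads, each safe to JSON.parse (apart from the [DONE] marker), even though the first one arrived in two pieces. The real package handles the full SSE spec, which is why the route uses it instead of something hand-rolled like this.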
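Blast Off!

And there you have it, folks! A quick rundown on how to get your Hono app chattin' with Cloudflare AI, complete with streaming goodness. To run this locally, you'd typically use:

```shell
pnpm dev
```

And to deploy to Cloudflare (use pnpm run here, because a bare pnpm deploy invokes pnpm's own built-in deploy command instead of your package.json script):

```shell
pnpm run deploy
```

Testing using httpie against the local dev server (port 8787 is the wrangler dev default):

```shell
http --stream POST :8787/query query="Tell me a brief joke"
```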
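If you'd rather watch the stream from TypeScript than from httpie, a minimal sketch using the standard fetch and Web Streams APIs (Node 18+ or a browser) could look like this. readStreamedText and queryLocalWorker are hypothetical helper names, and the URL assumes the local dev server on port 8787 with the same { "query": ... } JSON body the httpie command sends.

```typescript
// Reads a streamed text body chunk by chunk, calling onChunk as text arrives,
// and resolves with the full text once the stream ends.
async function readStreamedText(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void = () => {},
): Promise<string> {
  // TextDecoderStream correctly handles multi-byte characters split across chunks
  const reader = body.pipeThrough(new TextDecoderStream()).getReader();
  let full = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    full += value;
    onChunk(value);
  }
  return full;
}

// Hypothetical helper: POSTs a query to the local dev server and streams the answer
async function queryLocalWorker(query: string, onChunk: (text: string) => void): Promise<string> {
  const res = await fetch('http://localhost:8787/query', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });
  if (!res.ok || !res.body) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return readStreamedText(res.body, onChunk);
}
```

With pnpm dev running, calling queryLocalWorker('Tell me a brief joke', (text) => process.stdout.write(text)) from a Node script should print the answer as it streams in, much like httpie's --stream flag.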