renato1010/streaming-cloudflare-hono

Hono AI Quickstart on Cloudflare Workers 🚀🧠🔥

Joolines! 👋 Ever wanted to whip up an AI inference endpoint faster than a tumbleweed in a tornado? If you're a JS/TypeScript wrangler familiar with frameworks like Express.js, this here project is your golden Michelada! We're gonna show you how to quickly get an AI-powered Hono app up and running on Cloudflare Workers, leveraging their nifty AI features.

This guide will walk you through the essentials, so you can spend less time head-scratchin' and more time buildin' awesome stuff!

🎯 What's the Big Idea?

The main goal here, folks, is to give JavaScript and TypeScript developers a super-fast walkthrough for setting up an inference endpoint. We're talkin' cutting-edge AI capabilities, courtesy of Cloudflare AI, all served up slick and quick with the Hono framework. Think of this as your "get 'er done" blueprint for AI on the edge. No fuss, no muss!

πŸ› οΈ Gettin' Your Ducks in a Row (Prerequisites)

Before we dive headfirst into the code, make sure you've got these tools in your saddlebag:

  1. A Cloudflare Account: Can't ride a Cloudflare Worker without being part of the ranch, right? If you don't have one, sign up here; it's free to get started! ☁️
  2. Node.js: This is the engine that powers our local development. Make sure you've got a recent version installed. You can grab it from nodejs.org. 🟢
  3. A Package Manager (npm/yarn/pnpm): We'll be using pnpm in this project 'cause it's lean and mean, but npm or yarn will also get the job done. If you're new to pnpm, it's worth checkin' out! 📦

πŸ—οΈ Let's Build This Thing! (Creating a New Worker Project)

Alright, time to roll up our sleeves and get our hands dirty!

  1. Scaffold a New Worker Project: Fire up your terminal and let's create a brand-spankin'-new Cloudflare Worker project. Replace <name-of-the-project> with whatever cool name you've cooked up!

    pnpm create cloudflare@latest <name-of-the-project>

    Follow the prompts. When it asks "Do you want to use TypeScript?", say yes! When it asks "Do you want to deploy your application?", you can say no for now, we'll do that later. For the type of application, choose the "Hello World" worker, as we're building our Hono app from scratch.

  2. Install Hono and Friends: Navigate into your new project directory (cd <name-of-the-project>) and let's add Hono and a couple of handy sidekicks:

    • hono: The star of our show, a small, simple, and ultrafast web framework.
    • zod: Awesome for schema declaration and validation (optional but highly recommended for robust apps).
    • eventsource-parser: This little critter will be key for handling AI streams, as we'll see later.

    pnpm add hono zod eventsource-parser
  3. Create Your Entry Point: Inside the src directory of your project, create a new file named index.ts. This is where our Hono application logic will live.

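    // src/index.ts
    import { Hono } from 'hono';
    import { streamText } from 'hono/streaming';
    import { z } from 'zod';
    import { EventSourceParserStream } from 'eventsource-parser/stream';
    
    // Assumed minimal versions of the helpers the /query route relies on
    // (not shown in the original snippet); adjust them to your own needs.
    const querySchema = z.object({ query: z.string().min(1) });
    const formatZodError = (error: z.ZodError): string =>
    	error.issues.map((issue) => `${issue.path.join('.')}: ${issue.message}`).join('; ');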
    
    // Define Cloudflare Bindings, including AI
    // This helps TypeScript understand the environment you're running in.
    export interface Env {
    	AI: Ai; // This line is crucial for Cloudflare AI
    }
    
    const app = new Hono<{ Bindings: Env }>();
    
    app.get('/', (c) => {
    	return c.text('Howdy, Hono and Cloudflare AI! 🤠');
    });
    
    // Example: A simple AI route (we'll expand on this concept)
    app.post('/query', async (c): Promise<Response> => {
    	const body = await c.req.json<z.infer<typeof querySchema>>();
    	const { success, data, error } = querySchema.safeParse(body);
    	if (!success) {
    		return c.json({ error: formatZodError(error), ok: false }, 400);
    	}
    	const { query: userQuery } = data;
    
    	const messages = [
    		{
    			role: 'system',
    			content: `You are a helpful assistant and always try to respond respectfully. 
    			If you don't know the answer, responding with "I don't know" is perfectly fine.`,
    		},
    		{ role: 'user', content: userQuery },
    	];
    	// call the DeepSeek R1 model
    	const eventSourceStream = (await c.env.AI.run('@cf/deepseek-ai/deepseek-r1-distill-qwen-32b', {
    		messages,
    		temperature: 0.1,
    		top_p: 0.95,
    		max_tokens: 500,
    		stream: true,
    	})) as ReadableStream;
    
    	if (eventSourceStream === undefined) {
    		return c.json({ error: 'Error in AI model' }, 500);
    	}
    	// AI.run() hands back SSE-formatted bytes; decode them to text, then parse them into events so we can stream plain text
    	const tokenStream = eventSourceStream.pipeThrough(new TextDecoderStream()).pipeThrough(new EventSourceParserStream());
    
    	const textResponse = streamText(c, async (stream) => {
    		for await (const msg of tokenStream) {
    			// '[DONE]' marks the end of the model output; streamText closes the
    			// stream once this callback returns
    			if (msg.data === '[DONE]') {
    				break;
    			}
    			const data = JSON.parse(msg.data);
    			const hasUsage = data?.usage && Object.keys(data.usage).length > 0;
    			if (hasUsage) {
    				// the chunk carrying usage info gets the token count appended
    				const { total_tokens } = data.usage;
    				await stream.write(`${data?.response ?? ''}. <<Total Tokens: ${total_tokens}>>`);
    			} else {
    				// write each token exactly once
    				await stream.write(data?.response ?? '');
    			}
    		}
    	});
    	return new Response(textResponse.body, {
    		headers: {
    			'Content-Type': 'text/event-stream',
    			'Cache-Control': 'no-cache',
    			Connection: 'keep-alive',
    			'Transfer-Encoding': 'chunked',
    			'X-Accel-Buffering': 'no', // Nginx specific
    			'X-Content-Type-Options': 'nosniff',
    			'Access-Control-Allow-Origin': '*',
    			'X-Frame-Options': 'DENY',
    			'X-XSS-Protection': '1; mode=block',
    			'Referrer-Policy': 'no-referrer',
    		},
    	});
    });
    
    export default app;

    Don't forget to take a gander at the Hono docs anytime you need a refresher or want to explore its other cool features. They've done a bang-up job over there!
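The per-event logic inside the streaming loop can be pulled out into a small pure function, which makes it easy to test without a Worker runtime. The sketch below is one way to do that, not the project's actual code: the AiChunk shape and the chunkToText name are assumptions based on the fields the handler above reads (response and usage.total_tokens).

```typescript
// Assumed shape of one parsed SSE payload, inferred from the fields the handler reads.
interface AiChunk {
	response?: string;
	usage?: { total_tokens?: number };
}

// Hypothetical helper: returns the text to forward to the client,
// or null when the '[DONE]' sentinel signals the end of the stream.
function chunkToText(rawData: string): string | null {
	if (rawData === '[DONE]') return null;
	const chunk = JSON.parse(rawData) as AiChunk;
	const text = chunk.response ?? '';
	const total = chunk.usage?.total_tokens;
	// Only the chunk that carries usage info gets the token count appended.
	return total !== undefined ? `${text}. <<Total Tokens: ${total}>>` : text;
}
```

Inside the loop you'd then write whatever chunkToText returns and stop as soon as it hands back null.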

💡 The Real Nuggets (Main Takeaways)

This project ain't just about slinging code; it's about learnin' a few tricks of the trade. Here are the key takeaways that'll save you some headaches down the line:

Hono: Familiar Territory 🏞️

If you've ever danced with Express.js, you'll find Hono's syntax to be pretty darn straightforward. The routing, middleware, and request/response handling will feel like comin' home. This makes the learning curve smooth as butter, letting you focus on the Cloudflare AI specifics.

Binding AI in wrangler.jsonc 🔗

This is a biggie! To actually use Cloudflare's AI models within your Worker, you gotta tell Cloudflare about it. You do this by adding an "AI binding" to your wrangler.jsonc (or wrangler.toml) configuration file.

Open up your wrangler.jsonc file (it should be in the root of your project) and add the ai binding like so:

// wrangler.jsonc
{
	"name": "<name-of-the-project>",
	"main": "src/index.ts",
	"compatibility_date": "YYYY-MM-DD", // Use the date generated for you
	"ai": {
		// <-- THIS IS THE MAGIC!
		"binding": "AI" // This makes `env.AI` available in your worker
	},
	"vars": {
		// You can add other environment variables here
	}
	// ... other configurations
}
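The "binding" value in the config and the property name on your Env interface have to match, or env.AI will simply be undefined at runtime. Here's a rough sketch of a guard that fails fast with a readable message. AiLike is a deliberately tiny stand-in for Cloudflare's Ai type and requireAi is a made-up helper name; neither comes from the project.

```typescript
// Minimal stand-in for the part of the AI binding used here (an assumption,
// not Cloudflare's full `Ai` type).
interface AiLike {
	run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
	AI: AiLike; // property name must equal the "binding" value in wrangler.jsonc
}

// Hypothetical helper: throws a descriptive error when the binding is missing.
function requireAi(env: Partial<Env>): AiLike {
	if (!env.AI) {
		throw new Error('AI binding not found: check the "ai" block in wrangler.jsonc');
	}
	return env.AI;
}
```

In a Hono handler you could call requireAi(c.env) before running the model, so a misconfigured binding shows up as a clear error rather than a cryptic "cannot read properties of undefined".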

Streaming Like a Pro with AI.run() 🌊

Many AI models, especially Large Language Models (LLMs), can generate responses token by token. Instead of waiting for the whole shebang, you can stream the response back to the client. This makes your app feel way more responsive, like getting a play-by-play instead of waiting 'til the end of the game!

Cloudflare's AI.run() method supports streaming out of the box. When you call a model that supports it, you can set stream: true in the options:

// Inside an async Hono route handler
// ...
const eventSourceStream = (await c.env.AI.run('@cf/deepseek-ai/deepseek-r1-distill-qwen-32b', {
	messages,
	temperature: 0.1,
	top_p: 0.95,
	max_tokens: 500,
	stream: true, // <--- Yeehaw! Enable streaming!
})) as ReadableStream;
// ...

The eventSourceStream variable is a ReadableStream. Now, the trick is how to send this to your client effectively.

Taming the Stream with EventSourceParserStream 🌪️➡️🌬️

Now, this was the real kicker for me! When I first tried to consume the stream directly from AI.run() using just a TextDecoder, I ran into some gnarly issues. It wasn't always parsing the chunks correctly, leading to garbled text or incomplete messages. 😫

The secret sauce 🌶️ here is to leverage the EventSourceParserStream from the eventsource-parser package we installed earlier. This utility is designed to handle Server-Sent Events (SSE) formatted streams, which is what AI.run() returns when stream: true is set.
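A likely reason the plain-decoder approach stumbles is that a chunk can end in the middle of an event, so parsing each chunk on its own breaks. The toy class below illustrates the buffering idea only. It is not how eventsource-parser is implemented, it handles just single-line data: fields, and it assumes events are separated by a blank line ("\n\n"). SseBuffer is a made-up name.

```typescript
// Illustration only: a tiny subset of SSE parsing with buffering across chunks.
class SseBuffer {
	private buffer = '';

	// Accepts one decoded text chunk and returns the `data` payloads
	// of every event that is complete so far.
	feed(chunk: string): string[] {
		this.buffer += chunk;
		const events = this.buffer.split('\n\n');
		// The last piece is either empty or an unfinished event; keep it for the next chunk.
		this.buffer = events.pop() ?? '';
		return events
			.map((event) =>
				event
					.split('\n')
					.filter((line) => line.startsWith('data:'))
					.map((line) => line.slice(5).replace(/^ /, ''))
					.join('\n'),
			)
			.filter((data) => data.length > 0);
	}
}

const parser = new SseBuffer();
// The first chunk stops mid-event, so nothing is emitted yet.
const first = parser.feed('data: {"response":"Hel');
// The second chunk completes that event and delivers the end sentinel.
const second = parser.feed('lo"}\n\ndata: [DONE]\n\n');
```

Here first comes back empty and second holds two payloads, the completed JSON string and '[DONE]'. For real traffic, stick with EventSourceParserStream, which covers the parts of the SSE format this toy skips.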

🚀 Blast Off!

And there you have it, folks! A quick rundown on how to get your Hono app chattin' with Cloudflare AI, complete with streaming goodness. To run this locally, you'd typically use:

pnpm dev

And to deploy to Cloudflare:

pnpm run deploy

Local test

Testing using httpie

http --stream POST :8787/query query="Tell me a brief joke"
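If you'd rather poke the endpoint from TypeScript than from httpie, a client can read the body chunk by chunk with the standard fetch and streams APIs. The sketch below assumes the local dev server is listening on port 8787 (as in the command above) and that the route accepts a JSON body with a query field; readTextChunks and askLocalWorker are hypothetical names.

```typescript
// Reads a streamed body chunk by chunk and yields decoded strings.
async function* readTextChunks(body: ReadableStream<Uint8Array>): AsyncGenerator<string> {
	const reader = body.getReader();
	const decoder = new TextDecoder();
	try {
		while (true) {
			const { done, value } = await reader.read();
			if (done) break;
			// `stream: true` keeps multi-byte characters intact across chunk boundaries
			yield decoder.decode(value, { stream: true });
		}
	} finally {
		reader.releaseLock();
	}
}

// Hypothetical usage against the local dev server (URL and port are assumptions).
async function askLocalWorker(query: string): Promise<string> {
	const res = await fetch('http://localhost:8787/query', {
		method: 'POST',
		headers: { 'Content-Type': 'application/json' },
		body: JSON.stringify({ query }),
	});
	if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);
	let answer = '';
	for await (const chunk of readTextChunks(res.body)) answer += chunk;
	return answer;
}
```

In a UI you'd append each chunk to the page as it arrives instead of collecting the whole answer first.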
