57 changes: 29 additions & 28 deletions README.md
@@ -8,8 +8,8 @@ This repo contains a modern fullstack cookbook app showcasing the agentic AI cap

## Requirements

- **Python** 3.11 or higher; we recommend [uv 0.7+](https://github.com/astral-sh/uv) for working with Python
- **Node.js** 22.x or higher; we recommend [pnpm 10.17+](https://pnpm.io/installation) for working with Node.js

## Quick Start

@@ -44,6 +44,7 @@ COOKBOOK_ENDPOINTS='[
cd backend && uv sync
cd ..
cd frontend && npm install
cd ..
```

### Run the app
@@ -131,21 +132,21 @@ backend/

#### Backend Features & Technologies

- FastAPI - Modern Python web framework
- uvicorn - ASGI server
- uv - Fast Python package manager
- openai - OpenAI Python client for LLM proxying

#### Backend Routes

- `GET /api/health` - Health check
- `GET /api/recipes` - List available recipe slugs
- `GET /api/endpoints` - List configured LLM endpoints
- `GET /api/models?endpointId=xxx` - List models for endpoint
- `POST /api/recipes/multiturn-chat` - Multi-turn chat endpoint
- `POST /api/recipes/batch-text-classification` - Text Classification endpoint
- `POST /api/recipes/image-captioning` - Image captioning endpoint
- `GET /api/recipes/{slug}/code` - Get recipe backend source code
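The routes above can be exercised with any HTTP client. A minimal sketch — the base URL and the `local-max` endpoint ID are assumptions (they depend on your dev-server address and `COOKBOOK_ENDPOINTS` configuration):

```python
from urllib.parse import urlencode

# Assumed dev-server address; adjust for your setup.
BASE_URL = "http://localhost:8000"


def route(path: str, **query) -> str:
    """Build a full URL for one of the backend routes listed above."""
    url = f"{BASE_URL}{path}"
    if query:
        url += "?" + urlencode(query)
    return url


# e.g. list the models available on a configured endpoint:
models_url = route("/api/models", endpointId="local-max")
# then fetch it with any HTTP client, for example:
#   import httpx
#   models = httpx.get(models_url).json()
```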

### React Frontend

@@ -166,26 +167,26 @@ frontend/

#### Frontend Features & Technologies

- **React 18 + TypeScript** - Type-safe component development
- **Vite** - Lightning-fast dev server and optimized production builds
- **React Router v7** - Auto-generated routing with lazy loading
- **Mantine v7** - Comprehensive UI component library with dark/light themes
- **SWR** - Lightweight data fetching with automatic caching
- **Vercel AI SDK** - Streaming chat UI with token-by-token responses
- **MDX** - Markdown documentation with JSX support
- **Recipe Registry** - Single source of truth for all recipes (pure data + React components)

#### Frontend Routes

- `/` - Recipe index
- `/:slug` - Recipe demo (interactive UI)
- `/:slug/readme` - Recipe documentation
- `/:slug/code` - Recipe source code view

## Documentation

- [Contributing Guide](docs/contributing.md) - Architecture, patterns, and how to add recipes
- [Docker Deployment Guide](docs/docker.md) - Container deployment with MAX

## License

3 changes: 2 additions & 1 deletion backend/src/main.py
@@ -8,7 +8,7 @@
from fastapi.staticfiles import StaticFiles

from src.core import endpoints, models
from src.recipes import batch_text_classification, image_captioning, image_generation, multiturn_chat

# Load environment variables from .env.local
env_path = Path(__file__).parent.parent / ".env.local"
@@ -38,6 +38,7 @@
app.include_router(batch_text_classification.router)
app.include_router(multiturn_chat.router)
app.include_router(image_captioning.router)
app.include_router(image_generation.router)


@app.get("/api/health")
225 changes: 225 additions & 0 deletions backend/src/recipes/image_generation.py
@@ -0,0 +1,225 @@
"""
Image Generation with Text-to-Image Diffusion Models

This recipe demonstrates how to generate images from text prompts using
OpenAI-compatible endpoints with Modular MAX's FLUX.2 diffusion models.
Users provide a text description and optional generation parameters, and
receive a generated image with performance metrics.

Key features:
- Text-to-image generation: Create images from natural language descriptions
- Configurable parameters: Resolution, inference steps, guidance scale
- Performance metrics: Total generation duration tracking
- Negative prompts: Specify content to avoid in generated images
- OpenAI-compatible: Works with any endpoint supporting the images API

Architecture:
- FastAPI endpoint: Receives generation requests with prompt and parameters
- AsyncOpenAI client: Handles image generation via client.responses.create()
- MAX-specific parameters: Passed via extra_body.provider_options.image
- Performance tracking: Measures total generation time in milliseconds

Request Format:
- endpointId: Which LLM endpoint to use
- modelName: Which model to use (e.g., "flux2-dev-fp4")
- prompt: Text description of the image to generate
- width/height: Output image dimensions (default 1024x1024)
- steps: Number of denoising iterations (default 28)
- guidance_scale: Prompt adherence strength (default 3.5)
- negative_prompt: Content to avoid in the generated image

Response Format:
- JSON object with base64-encoded image data and generation metrics
- Fields: image_b64, width, height, duration
"""

import time

import httpx
from fastapi import APIRouter, HTTPException
from fastapi.responses import Response
from pydantic import BaseModel

from ..core.endpoints import get_cached_endpoint
from ..core.code_reader import read_source_file

router = APIRouter(prefix="/api/recipes", tags=["recipes"])


# ============================================================================
# Types and Models
# ============================================================================


class ImageGenerationRequest(BaseModel):
"""
Request body for image generation.

The frontend sends the endpoint ID, model name, a text prompt, and
optional generation parameters. The backend looks up the actual API
credentials from the endpoint ID and generates the image.
"""
endpointId: str
modelName: str
prompt: str
width: int = 1024
height: int = 1024
steps: int = 28
guidance_scale: float = 3.5
negative_prompt: str = ""


class ImageGenerationResult(BaseModel):
"""
Result of generating an image from a text prompt.

Contains the base64-encoded image data along with the dimensions
and performance metrics (duration in milliseconds).
"""
image_b64: str
width: int
height: int
duration: int


# ============================================================================
# API Endpoints
# ============================================================================


@router.post("/image-generation")
async def image_generation(request: ImageGenerationRequest) -> ImageGenerationResult:
"""
Image generation endpoint using OpenAI-compatible images API.

Accepts a text prompt and generation parameters, then returns a
base64-encoded image along with performance metrics.

The endpoint uses client.responses.create() from the OpenAI SDK,
which maps to the /v1/responses API (Modular Open Responses standard).
MAX-specific parameters are passed via extra_body.provider_options.image.

Args:
request: ImageGenerationRequest with prompt and generation parameters

Returns:
ImageGenerationResult with base64 image data, dimensions, and duration

Raises:
HTTPException: If endpoint not found, invalid configuration, or
upstream API failure
"""
# Get endpoint configuration from cache. The endpoint ID comes from the
# frontend and maps to a full endpoint configuration (baseUrl, apiKey)
# stored in .env.local. This keeps API keys secure on the server side.
endpoint = get_cached_endpoint(request.endpointId)
if not endpoint:
raise HTTPException(
status_code=400,
detail=f"Endpoint not found: {request.endpointId}"
)

base_url = endpoint.get("baseUrl")
api_key = endpoint.get("apiKey")

if not base_url or not api_key:
raise HTTPException(
status_code=500,
detail="Invalid endpoint configuration: missing baseUrl or apiKey"
)

# Build provider_options for MAX-specific generation parameters.
# The Modular MAX API uses the Open Responses standard (/v1/responses)
# with image parameters nested under provider_options.image.
image_options: dict = {
"width": request.width,
"height": request.height,
"steps": request.steps,
"guidance_scale": request.guidance_scale,
}
if request.negative_prompt:
image_options["negative_prompt"] = request.negative_prompt

payload = {
"model": request.modelName,
"input": request.prompt,
"provider_options": {"image": image_options},
}

try:
# Start timing for duration measurement
start_time = time.time()

# Use httpx directly to avoid OpenAI SDK response parsing incompatibility
# with the Modular Open Responses API (/v1/responses).
async with httpx.AsyncClient() as http_client:
resp = await http_client.post(
f"{base_url.rstrip('/')}/responses",
json=payload,
headers={"Authorization": f"Bearer {api_key}"},
timeout=300,
)

if resp.status_code != 200:
raise HTTPException(
status_code=502,
detail=f"Upstream error {resp.status_code}: {resp.text}",
)

# Calculate total generation duration in milliseconds
duration_ms = int((time.time() - start_time) * 1000)

# Extract base64 image data from output[0].content[0].image_data
data = resp.json()
try:
image_b64 = data["output"][0]["content"][0]["image_data"]
except (KeyError, IndexError, TypeError):
raise HTTPException(
status_code=502,
detail="No image data in response from upstream endpoint"
)

if not image_b64:
raise HTTPException(
status_code=502,
detail="No image data in response from upstream endpoint"
)

return ImageGenerationResult(
image_b64=image_b64,
width=request.width,
height=request.height,
duration=duration_ms,
)

except HTTPException:
# Re-raise our own HTTP exceptions (like the empty response check above)
raise
except Exception as error:
# Catch upstream API errors (connection failures, rate limits, etc.)
# and return a 502 to indicate the upstream service failed.
raise HTTPException(
status_code=502,
detail=f"Image generation failed: {str(error)}"
)


@router.get("/image-generation/code")
async def get_image_generation_code():
"""
Get the source code for the image generation recipe.

Returns the Python source code of this file as plain text.
This enables the frontend's "Code" view to display the backend implementation.
"""
try:
# Use __file__ to get the path to this source file, then read it.
# This allows the frontend to display the actual backend code for
# educational purposes.
code_data = read_source_file(__file__)
return Response(content=code_data, media_type="text/plain")
except Exception as e:
raise HTTPException(
status_code=500,
detail=f"Error reading source code: {str(e)}"
)
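A client for the new recipe endpoint might look like the sketch below. The server address, endpoint ID, and model name are placeholders — they depend on your dev-server setup and `COOKBOOK_ENDPOINTS` configuration — and the network call is shown only as a comment:

```python
import base64

# Request body matching ImageGenerationRequest. The endpointId and
# modelName values here are hypothetical placeholders.
payload = {
    "endpointId": "local-max",
    "modelName": "flux2-dev-fp4",
    "prompt": "a watercolor fox in a snowy forest",
    "width": 512,
    "height": 512,
    "steps": 28,
    "guidance_scale": 3.5,
}


def save_image(result: dict, path: str) -> int:
    """Decode the base64 image from an ImageGenerationResult payload,
    write it to disk, and return the number of bytes written."""
    raw = base64.b64decode(result["image_b64"])
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)


# With a running backend you would send the payload, e.g.:
#   import httpx
#   result = httpx.post(
#       "http://localhost:8000/api/recipes/image-generation",
#       json=payload, timeout=300,
#   ).json()
#   save_image(result, "out.png")
```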