rafael-vaz/vision-caption

Vision Caption

Generate image captions and audio descriptions with AI, in a fast and intuitive way.

Deploy: Vercel · React · Vite · TypeScript · Gemini

Layout · Features · Getting Started · Useful Scripts · Supported Images · How to Use · Common Issues · Preview

🎨 Layout / Figma

Project prototype:

⚡ Features

  • Upload an image or paste an image URL
  • Caption generation with Gemini (Google GenAI)
  • Easy copy to clipboard
  • Audio description (voice)
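
This README does not show how the app talks to Gemini, so the following is only a sketch of what a caption request body could look like. It assumes the Gemini API's REST `generateContent` body shape (`contents` → `parts` with `inline_data` and `text`). The function name `buildCaptionRequest` and the default prompt are hypothetical, and the real app may use the Google GenAI SDK instead.

```typescript
// Hypothetical types mirroring the assumed REST request body.
interface GeminiPart {
  text?: string;
  inline_data?: { mime_type: string; data: string };
}

interface GeminiRequest {
  contents: { parts: GeminiPart[] }[];
}

// Builds a request body from a base64-encoded image and a prompt.
// Accepts either raw base64 or a data URL ("data:image/png;base64,....").
function buildCaptionRequest(
  base64Image: string,
  mimeType: string,
  prompt: string = "Write a short caption and an audio description for this image."
): GeminiRequest {
  // Strip a leading data-URL header if one is present.
  const data = base64Image.replace(/^data:[^,]*,/, "");
  return {
    contents: [
      {
        parts: [{ inline_data: { mime_type: mimeType, data } }, { text: prompt }],
      },
    ],
  };
}
```

The resulting object would be serialized with `JSON.stringify` and sent as the POST body. Check the current Gemini API documentation for the exact endpoint and model name before relying on this shape.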

🛠️ Getting Started

  1. Install dependencies:
     npm install
  2. Create a .env file in the project root:
     VITE_GOOGLE_GENAI_API_KEY=your_key_here

     (Get your key from Google AI Studio)

  3. Start in dev mode:
     npm run dev
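
Vite exposes variables prefixed with `VITE_` to client code through `import.meta.env`. As a rough sketch, the key could be read and checked like this. The helper name `requireApiKey` is hypothetical and not taken from the project's source.

```typescript
// Reads the API key from an env-like object and fails early if it is unset.
// `env` is typed loosely so that Vite's `import.meta.env` can be passed in.
function requireApiKey(env: Record<string, unknown>): string {
  const raw = env["VITE_GOOGLE_GENAI_API_KEY"];
  const key = typeof raw === "string" ? raw.trim() : "";
  // Treat the placeholder from the example .env as "not configured".
  if (key === "" || key === "your_key_here") {
    throw new Error(
      "VITE_GOOGLE_GENAI_API_KEY is missing. Add it to .env in the project root."
    );
  }
  return key;
}

// Assumed usage inside the Vite app:
//   const apiKey = requireApiKey(import.meta.env);
```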

📦 Useful Scripts

  • npm run dev — start local server
  • npm run build — production build
  • npm run preview — preview production build
  • npm run lint — code linting

🖼️ Supported Images

  • JPG, PNG, GIF, WEBP, BMP
  • Up to 5 MB
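
The limits above could be checked before sending anything to the API. The sketch below is illustrative only: `validateImage` is a hypothetical name, the MIME types are the standard ones for the listed formats, and treating 5 MB as 5 × 1024 × 1024 bytes is an assumption (the app may use 5,000,000).

```typescript
// Standard MIME types for JPG, PNG, GIF, WEBP, and BMP.
const SUPPORTED_MIME_TYPES = new Set<string>([
  "image/jpeg",
  "image/png",
  "image/gif",
  "image/webp",
  "image/bmp",
]);

// Assumption: "5 MB" means 5 * 1024 * 1024 bytes.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024;

// Minimal shape shared by the browser's File and Blob objects.
interface ImageLike {
  type: string;
  size: number;
}

// Returns an error message, or null when the image is acceptable.
function validateImage(file: ImageLike): string | null {
  if (!SUPPORTED_MIME_TYPES.has(file.type.toLowerCase())) {
    return "Unsupported format. Use JPG, PNG, GIF, WEBP, or BMP.";
  }
  if (file.size > MAX_IMAGE_BYTES) {
    return "Image is larger than 5 MB.";
  }
  return null;
}
```

A `File` from an `<input type="file">` element satisfies `ImageLike`, so it can be passed in directly.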

🚀 How to Use

  1. Enter the image URL or upload a file
  2. Click "Generate descriptions"
  3. Copy the text or listen to the audio description
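
For the copy step, browsers provide `navigator.clipboard.writeText`, which is only available in secure contexts (HTTPS or localhost). A minimal sketch of how copying could work follows. `copyCaption` and the `ClipboardScope` type are hypothetical, and the scope parameter exists only so the function can be exercised outside a browser.

```typescript
// Narrow view of the global object: just the clipboard call used here.
interface ClipboardScope {
  navigator?: { clipboard?: { writeText(text: string): Promise<void> } };
}

// Copies text to the clipboard. Resolves to false when the Clipboard API
// is unavailable or the browser rejects the write.
async function copyCaption(
  text: string,
  scope: ClipboardScope = globalThis as unknown as ClipboardScope
): Promise<boolean> {
  const clipboard = scope.navigator?.clipboard;
  if (!clipboard) {
    return false; // e.g. insecure context or an older browser
  }
  try {
    await clipboard.writeText(text);
    return true;
  } catch {
    return false; // e.g. permission denied or document not focused
  }
}
```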

❓ Common Issues

  • Error generating caption: check that VITE_GOOGLE_GENAI_API_KEY is set correctly in .env
  • Image not loading: make sure the URL is publicly accessible
  • Audio not playing: use a modern browser that supports the Web Speech API
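
To illustrate the audio issue, speech synthesis in the Web Speech API goes through `speechSynthesis` and `SpeechSynthesisUtterance`, and both can be feature-detected before use. The sketch below shows one way to do that. `speakCaption`, the `SpeechScope` type, and the `en-US` default are assumptions rather than the project's actual code.

```typescript
// Narrow view of the global object: just the speech members used here.
interface SpeechScope {
  speechSynthesis?: { cancel(): void; speak(utterance: unknown): void };
  SpeechSynthesisUtterance?: new (text: string) => { lang: string };
}

// Reads the text aloud. Returns false when the browser lacks speech synthesis.
function speakCaption(
  text: string,
  lang: string = "en-US",
  scope: SpeechScope = globalThis as unknown as SpeechScope
): boolean {
  const synth = scope.speechSynthesis;
  const Utterance = scope.SpeechSynthesisUtterance;
  if (!synth || !Utterance) {
    return false; // Web Speech API not supported in this environment
  }
  const utterance = new Utterance(text);
  utterance.lang = lang;
  synth.cancel(); // stop anything still being spoken
  synth.speak(utterance);
  return true;
}
```

When this returns `false`, the UI could hide or disable the audio button instead of failing silently.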

🔗 Preview

Access the live app here
