GitHub - rafael-vaz/vision-caption: Web app that generates AI-powered image captions and audio descriptions using Google Gemini, with support for image upload or URL input.

Generate image captions and audio descriptions with AI, in a fast and intuitive way.

Layout • Features • Getting Started • Useful Scripts • Supported Images • How to Use • Common Issues • Preview

🎨 Layout / Figma

Project prototype:

Vision Caption layout (Figma)

⚡ Features

Image upload or image URL input
Caption generation with Gemini (Google GenAI)
Easy copy to clipboard
Audio description (voice)

🛠️ Getting Started

Install dependencies:

npm install

Create a .env file in the project root:

VITE_GOOGLE_GENAI_API_KEY=your_key_here

(Get your key from Google AI Studio)
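
Vite only exposes variables prefixed with VITE_ to client code, through import.meta.env. As a minimal sketch (the helper name readApiKey and the EnvLike type are hypothetical, not taken from this repo), the key could be read and checked once at startup like this:

```typescript
// Hypothetical helper, not part of this repo: reads the key from an env-like object.
// In a Vite app you would call it as readApiKey(import.meta.env).
type EnvLike = Record<string, string | undefined>;

function readApiKey(env: EnvLike): string {
  const key = env["VITE_GOOGLE_GENAI_API_KEY"]?.trim();
  // Reject a missing value and the placeholder copied from the example above.
  if (!key || key === "your_key_here") {
    throw new Error(
      "VITE_GOOGLE_GENAI_API_KEY is missing or still the placeholder; check your .env file."
    );
  }
  return key;
}
```

Failing early with a clear message makes a missing key easier to spot than a generic caption error later on.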

Start in dev mode:

npm run dev

📦 Useful Scripts

npm run dev — start local server
npm run build — production build
npm run preview — preview production build
npm run lint — code linting

🖼️ Supported Images

JPG, PNG, GIF, WEBP, BMP
Up to 5 MB
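
The limits above could be enforced before an image is sent for captioning. The following is only a sketch: validateImage, FileLike, and the constants are hypothetical names, and it assumes 1 MB means 1024 * 1024 bytes, which the app may define differently.

```typescript
// Assumption: 5 MB is counted as 5 * 1024 * 1024 bytes.
const MAX_BYTES = 5 * 1024 * 1024;

// Standard MIME types for JPG, PNG, GIF, WEBP, and BMP.
const ALLOWED_TYPES = ["image/jpeg", "image/png", "image/gif", "image/webp", "image/bmp"];

// A browser File object satisfies this shape structurally.
interface FileLike {
  type: string;
  size: number;
}

// Returns an error message, or null when the file is acceptable.
function validateImage(file: FileLike): string | null {
  if (!ALLOWED_TYPES.includes(file.type)) {
    return "Unsupported format. Use JPG, PNG, GIF, WEBP, or BMP.";
  }
  if (file.size > MAX_BYTES) {
    return "Image is larger than 5 MB.";
  }
  return null;
}
```

With a file input, the selected File can be passed directly, for example validateImage(input.files[0]).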

🚀 How to Use

Enter the image URL or upload a file
Click "Generate descriptions"
Copy the text or listen to the audio description

❓ Common Issues

Error generating caption: check that VITE_GOOGLE_GENAI_API_KEY is set correctly in .env
Image not loading: make sure the URL is publicly accessible
Audio not playing: use a modern browser that supports the Web Speech API
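
For the audio issue, a feature check before speaking avoids a silent failure in browsers without the Web Speech API. This is a hedged sketch rather than the app's actual code: speakCaption and SpeechHost are hypothetical names, and the host object is passed in so the function can also run outside a browser.

```typescript
// Minimal structural types so the sketch compiles without DOM typings.
interface SpeechHost {
  speechSynthesis?: { cancel(): void; speak(utterance: unknown): void };
  SpeechSynthesisUtterance?: new (text: string) => unknown;
}

// Speaks the text if the Web Speech API is available on the host.
// Returns false when it is not, so the UI can show a fallback message.
function speakCaption(text: string, host: SpeechHost): boolean {
  const synth = host.speechSynthesis;
  const Utterance = host.SpeechSynthesisUtterance;
  if (!synth || !Utterance) {
    return false;
  }
  synth.cancel(); // stop any description that is still playing
  synth.speak(new Utterance(text));
  return true;
}
```

In a browser the host would be window, which provides both speechSynthesis and SpeechSynthesisUtterance where the API is supported.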

🔗 Preview

Access the live app here

