yoshi-4/cardsense

🔍 CardSense — Next-Gen Business Intelligence Dashboard (Prototype)

AI-powered business card scanner that instantly delivers company intelligence, strategic insights, and contact profiles, all from a single photo.

Built with: React · Python · Gemini · FastAPI · Tailwind CSS


✨ What It Does

CardSense transforms a business card photo into a full intelligence briefing in seconds. Designed for sales professionals, recruiters, and business developers who need instant context before meetings.

Demo Flow

  1. 📸 Scan — Snap a photo of a business card (or enter details manually)
  2. 🤖 AI Analysis — Three specialized agents research the person and company in real-time
  3. 📊 Dashboard — View structured intelligence across multiple dimensions

🏗️ Architecture

┌─────────────────────────────────────────────────────┐
│                    Frontend (React 18)               │
│         Babel Standalone + Tailwind CSS              │
│                  Mobile-First SPA                    │
└──────────────────┬──────────────────────────────────┘
                   │ REST API
┌──────────────────▼──────────────────────────────────┐
│                Backend (FastAPI + Uvicorn)            │
│                                                      │
│  ┌────────────┐  ┌────────────────────────────────┐  │
│  │  VLM Scan  │  │     Agent Orchestrator         │  │
│  │  (OCR)     │  │                                │  │
│  │            │  │  ┌──────┐ ┌────────┐ ┌──────┐  │  │
│  │ Image →    │  │  │Macro │ │Strategy│ │Person│  │  │
│  │ Structured │  │  │Agent │ │Agent   │ │Agent │  │  │
│  │ Data       │  │  └──┬───┘ └───┬────┘ └──┬───┘  │  │
│  └────────────┘  │     │         │         │      │  │
│                  └─────┼─────────┼─────────┼──────┘  │
│                        │         │         │         │
└────────────────────────┼─────────┼─────────┼─────────┘
                         ▼         ▼         ▼
               Google Gemini API + Search Grounding

VLM Business Card Scanner

The Vision Language Model (VLM) module extracts structured data from business card images:

  • Company name, person name (Kanji + Romaji), title, department
  • Contact details (email, phone, URL, address)
  • "Vibe" analysis — Classifies card design aesthetics (e.g., "Innovative & Ambitious", "Traditional & Solid") to infer corporate culture
  • Confidence scoring (0.0–1.0) based on image clarity
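
As a rough illustration of the extracted fields listed above, the scan result could be held in a structure like the one below. This is a minimal sketch, not the project's actual `models.py` (which uses Pydantic): the class name `ScannedCard`, the field names, the helper `needs_manual_review`, and the 0.6 threshold are all assumptions made for this example, and a standard-library dataclass is used so the snippet runs without extra dependencies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScannedCard:
    """Hypothetical container for the fields the VLM scanner extracts."""

    company_name: str
    person_name_kanji: Optional[str] = None
    person_name_romaji: Optional[str] = None
    title: Optional[str] = None
    department: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    url: Optional[str] = None
    address: Optional[str] = None
    vibe: Optional[str] = None   # e.g. "Traditional & Solid"
    confidence: float = 0.0      # 0.0 to 1.0, based on image clarity

    def __post_init__(self) -> None:
        # Reject scores outside the documented 0.0-1.0 range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


def needs_manual_review(card: ScannedCard, threshold: float = 0.6) -> bool:
    """Return True when the scan is too uncertain to use as-is.

    The threshold value is an assumption chosen for illustration.
    """
    return card.confidence < threshold
```

A low-confidence scan could then fall back to the manual entry path described in the demo flow.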

Three-Agent Intelligence System

Each agent leverages Google Search Grounding for real-time web research:

| Agent | Role | Output |
|---|---|---|
| 🏢 Macro Agent | Company IR Analyst | Company overview, financials, latest news (last 3 months) |
| 🔧 Strategy Agent | Tech Strategy Consultant | Tech stack analysis, hiring pattern insights, growth direction |
| 👤 Person Agent | Executive Headhunter | Professional profile, career timeline, social media links |

Identity Lock — Hallucination Prevention

The Person Agent implements strict identity verification to prevent AI hallucination:

Identity Lock Protocol:
1. Search queries MUST include "{name}" AND "{company}"
2. Cross-reference temporal consistency (career timeline)
3. Verify domain expertise alignment
4. If uncertain → explicitly state "limited public information found"
5. Never fabricate or guess — absent data = "not found"
6. Past affiliations are noted as "previously at {company}" with caveats

This is designed to keep the AI from confusing the target person with someone who shares the same name, and from filling gaps with invented information.
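
Rules 1 and 5 of the protocol can also be enforced in code around the prompt. The sketch below is illustrative only: the helper names `build_identity_query` and `report_field` are hypothetical and are not taken from `person_agent.py`, and the remaining rules (temporal consistency, domain alignment) are assumed to stay in the model prompt.

```python
from typing import Optional

NOT_FOUND = "not found"


def build_identity_query(name: str, company: str, topic: str = "") -> str:
    """Build a search query that always pins both name and company (rule 1)."""
    name, company = name.strip(), company.strip()
    if not name or not company:
        raise ValueError("both name and company are required for an identity-locked query")
    # Quoting each term asks the search engine for an exact phrase match.
    parts = [f'"{name}"', f'"{company}"']
    if topic.strip():
        parts.append(topic.strip())
    return " ".join(parts)


def report_field(value: Optional[str]) -> str:
    """Return the value if present, otherwise the literal "not found" (rule 5)."""
    if value is None or not value.strip():
        return NOT_FOUND
    return value.strip()
```

Refusing to build a query without a company name means a bare name search, the main source of same-name mix-ups, cannot be issued by accident.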

🛠️ Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 18 + TypeScript | Single-page application (transpiled via Babel Standalone) |
| Styling | Tailwind CSS 3.x | Utility-first CSS with dark mode, glassmorphism effects |
| Backend | Python 3.11+ / FastAPI | REST API server with async support |
| AI Model | Google Gemini 2.0 Flash | VLM scanning + agent reasoning (configurable via .env) |
| Search | Google Search Grounding | Real-time web research for each agent |
| Server | Uvicorn | ASGI server with hot-reload |

🚀 Getting Started

Prerequisites

  • Python 3.11+
  • Node.js (for optional tunnel access)
  • Google Gemini API key (available from Google AI Studio)

Installation

# Clone the repository
git clone https://github.com/YOUR_USERNAME/cardsense.git
cd cardsense
# Install Python dependencies
cd backend
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY

Running

# Start the server
cd backend
python -m uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Open http://localhost:8000 in your browser.

Mobile Testing (Optional)

To test on a mobile device with camera access (requires HTTPS):

npx localtunnel --port 8000
# Prints a public HTTPS URL you can open on your phone

⚙️ Configuration

All model settings are managed via backend/.env:

GEMINI_API_KEY=your_api_key_here
MODEL_ID=gemini-2.0-flash           # Main model for all agents
DEEP_RESEARCH_MODEL_ID=gemini-2.0-flash  # Model for deep research feature

You can swap models without changing any code: edit .env and restart the server. Uvicorn's --reload watches Python files by default, so a change to .env alone may not trigger a reload.
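
For illustration, the three settings could be read on the Python side roughly as follows. This is a sketch under assumptions: the `Settings` class and `load_settings` function are hypothetical names, the fallback defaults are chosen for the example, and the values in backend/.env are assumed to have already been loaded into the process environment (this README does not say which loader the project uses).

```python
import os
from dataclasses import dataclass
from typing import Mapping

DEFAULT_MODEL_ID = "gemini-2.0-flash"


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object mirroring the keys in backend/.env."""

    gemini_api_key: str
    model_id: str
    deep_research_model_id: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read model settings from an environment mapping, failing fast without a key."""
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to backend/.env")
    model_id = env.get("MODEL_ID", DEFAULT_MODEL_ID)
    # Assumption: the deep research model falls back to the main model when unset.
    deep_model_id = env.get("DEEP_RESEARCH_MODEL_ID", model_id)
    return Settings(api_key, model_id, deep_model_id)
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.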

📁 Project Structure

cardsense/
├── frontend/
│   ├── index.html          # Entry point (loads React + Babel via CDN)
│   └── index.tsx           # Full SPA (components, state, routing)
├── backend/
│   ├── main.py             # FastAPI app, routes, static file serving
│   ├── models.py           # Pydantic data models
│   ├── vlm.py              # Vision Language Model — business card OCR
│   ├── requirements.txt    # Python dependencies
│   ├── .env.example        # Environment template
│   └── agents/
│       ├── __init__.py     # Agent module exports
│       ├── macro_agent.py  # Company overview + news agent
│       ├── strategy_agent.py # Tech & hiring strategy agent
│       └── person_agent.py # Person profile + career agent

📝 Design Decisions

  • No build step required — Frontend uses Babel Standalone for in-browser TypeScript transpilation, making it instantly deployable without npm/webpack configuration
  • Sequential agent execution — Agents run one at a time with configurable delays to respect API rate limits on free-tier plans
  • Mobile-first UI — Large camera button, touch-friendly interface optimized for on-the-go use at business events
  • Graceful degradation — Each agent handles errors independently; if one fails, others still display results
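
The sequential execution and graceful degradation decisions above can be sketched together in a few lines. This is not the project's orchestrator: `run_agents_sequentially`, the result dictionary shape, and the 2-second default delay are assumptions for illustration, and the sketch is synchronous for brevity even though the backend is described as async.

```python
import time
from typing import Any, Callable, Dict


def run_agents_sequentially(
    agents: Dict[str, Callable[[], Any]],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Dict[str, Any]]:
    """Run agents one at a time, pausing between calls and isolating failures."""
    results: Dict[str, Dict[str, Any]] = {}
    for index, (name, agent) in enumerate(agents.items()):
        if index > 0:
            # Pause between calls to stay under free-tier rate limits.
            sleep(delay_seconds)
        try:
            results[name] = {"ok": True, "data": agent()}
        except Exception as exc:
            # One failing agent must not prevent the others from reporting.
            results[name] = {"ok": False, "error": str(exc)}
    return results
```

The dashboard can then render each entry independently, showing an error state only for the agent that failed.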

📄 License

MIT License — feel free to use, modify, and distribute.
