Skip to content

upadhyayraman22/ShoeScribe-AI

Repository files navigation

πŸ‘Ÿ ShoeScribe AI

Product descriptions that don't just describe β€” they sell.

A sequential multi-agent AI system that researches, writes, and evaluates high-converting e-commerce product descriptions for footwear β€” fully automated, end-to-end.

πŸ”— Live Demo: web-production-c9ce1.up.railway.app


πŸ“‹ Table of Contents

  1. Problem Statement
  2. Why an Agentic Approach
  3. System Architecture
  4. Agent Workflow
  5. Task Decomposition
  6. LLM-as-Judge
  7. Technology Stack
  8. Key Features
  9. Setup & Installation
  10. Deployment
  11. Example Output
  12. Project Structure
  13. Demo Video
  14. Team

πŸ“Œ Problem Statement

Writing compelling product descriptions for footwear is time-consuming, inconsistent, and requires both domain knowledge and copywriting expertise. E-commerce sellers and brands often resort to generic, uninspiring copy that fails to convert browsers into buyers.

ShoeScribe AI solves this by deploying a pipeline of four specialized AI agents that:

  1. Research real competitor data and product features from the web
  2. Extract marketing insights β€” features, pain points, benefits, and USPs
  3. Write polished, conversion-optimized product descriptions
  4. Evaluate the output across five quality dimensions using an LLM-as-Judge

🧠 Why an Agentic Approach

A single LLM prompt cannot reliably accomplish this task because the workflow involves four fundamentally different reasoning capabilities that must execute in a strict dependency chain:

  • Web Retrieval β€” fetching real, grounded product data from the internet (requires an external tool)
  • Structured Extraction β€” converting raw web text into actionable marketing categories
  • Creative Generation β€” producing persuasive, audience-specific product copy from structured data
  • Critical Evaluation β€” scoring output quality across professional dimensions with specific justifications

Each agent depends entirely on the output of its predecessor. This sequential dependency makes a pipeline architecture the only viable design. Additionally, an agentic approach allows each agent to be independently prompted, validated, and improved β€” a modularity that a monolithic prompt cannot provide.


πŸ—οΈ System Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    USER  (Streamlit UI)                  β”‚
β”‚              Inputs: product_name + category             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚  run_pipeline(name, category)
                         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              ORCHESTRATOR  (orchestrator.py)             β”‚
β”‚          Coordinates sequential agent execution          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚         AGENT 1             │◄──── πŸ”Ž Tavily Search API
          β”‚      Research Agent         β”‚          tavily_tool.py
          β”‚    research_agent.py        β”‚
          β”‚  Queries web for product    β”‚
          β”‚  features & competitors     β”‚
          β”‚                             β”‚
          β”‚  OUTPUT:                    β”‚
          β”‚  { query: string,           β”‚
          β”‚    summary: string[] }      β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚ { query, summary[] }
                         β–Ό
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚         AGENT 2              │◄──── πŸ€– Groq API
          β”‚       Insight Agent          β”‚    llama-3.1-8b-instant
          β”‚      insight_agent.py        β”‚
          β”‚  Extracts features, pain     β”‚
          β”‚  points, benefits, USPs      β”‚
          β”‚                              β”‚
          β”‚  OUTPUT:                     β”‚
          β”‚  { features: string[],       β”‚
          β”‚    pain_points: string[],    β”‚
          β”‚    benefits: string[],       β”‚
          β”‚    usp_ideas: string[] }     β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚ { features, pain_points... }
                         β–Ό
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚         AGENT 3              │◄──── πŸ€– Groq API
          β”‚     Copywriting Agent        β”‚    llama-3.1-8b-instant
          β”‚   copywriting_agent.py       β”‚
          β”‚  Generates short desc,       β”‚
          β”‚  long desc, USP bullets      β”‚
          β”‚                              β”‚
          β”‚  OUTPUT:                     β”‚
          β”‚  { short_description,        β”‚
          β”‚    long_description,         β”‚
          β”‚    usp_bullets: string[] }   β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚ { short_desc, long_desc... }
                         β–Ό
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚         AGENT 4              │◄──── πŸ€– Groq API
          β”‚  Judge Agent (LLM-as-Judge)  β”‚    llama-3.1-8b-instant
          β”‚      judge_agent.py          β”‚
          β”‚  Scores copy on 5 dimensions β”‚
          β”‚  with 1–5 integer rubric     β”‚
          β”‚                              β”‚
          β”‚  OUTPUT:                     β”‚
          β”‚  { scores: {                 β”‚
          β”‚    clarity: {score, reason}, β”‚
          β”‚    persuasiveness: {...},    β”‚
          β”‚    differentiation: {...},   β”‚
          β”‚    feature_relevance: {...}, β”‚
          β”‚    conversion_potential:{...}β”‚
          β”‚  }, overall_feedback }       β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚ Final result object
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚               STREAMLIT UI β€” Final Rendered Output        β”‚
β”‚   Market Insights  β€’  Product Description (short + long  β”‚
β”‚   + USP bullets)  β€’  Quality Evaluation (scores +        β”‚
β”‚   per-dimension reasons + overall feedback)              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“ Full visual diagram: Submission_Files/Architecture_diagram.png


πŸ”„ Agent Workflow

Step Agent Input Tool Output
1 Research Agent product_name, category Tavily Search API { query, summary[] }
2 Insight Agent Research summary Groq LLM { features[], pain_points[], benefits[], usp_ideas[] }
3 Copywriting Agent Insight JSON Groq LLM { short_description, long_description, usp_bullets[] }
4 Judge Agent Copy JSON Groq LLM { scores: { dimension: {score, reason} }, overall_feedback }

πŸ“ Task Decomposition

Each agent has a clearly defined, independent responsibility:

Agent 1 β€” Research Agent (research_agent.py)

  • Constructs a targeted search query from product name and category
  • Calls Tavily Search API and retrieves up to 5 real web results
  • Returns structured { query, summary[] } for downstream use

Agent 2 β€” Insight Agent (insight_agent.py)

  • Transforms raw web text into structured marketing intelligence
  • Enforces schema: minimum 5 items per category with category-aware fallbacks
  • Strips markdown code fences from LLM output before JSON parsing
  • Extracts: features, pain_points, benefits, usp_ideas

Agent 3 β€” Copywriting Agent (copywriting_agent.py)

  • Generates short description (max 30 words), long description (max 100 words), and 4–6 USP bullets
  • Persona-primed as an expert e-commerce copywriter
  • Grounded only in Insight Agent output β€” no hallucinated claims
  • Robust JSON parsing with schema validation and safe defaults on failure

Agent 4 β€” Judge Agent (judge_agent.py)

  • Evaluates generated copy across 5 professional dimensions
  • Returns integer score (1–5) + one-sentence reason per dimension
  • Provides overall actionable feedback for improvement

πŸ“„ Full specs: task_decomposition_specifications.md


βš–οΈ LLM-as-Judge

The Judge Agent acts as an automated quality reviewer, evaluating copy without any human involvement:

Dimension What It Measures
Clarity Readability and absence of jargon for a general buyer
Persuasiveness Emotional hooks and benefit emphasis that drive consideration
Differentiation Uniqueness vs. generic phrases β€” does this stand out?
Feature Relevance Importance of highlighted features to footwear buyers
Conversion Potential CTA strength, urgency, and clarity of value proposition

Scoring Rubric: 1 = Poor Β· 2 = Weak Β· 3 = Adequate Β· 4 = Good Β· 5 = Excellent

The judge is strictly prompted to cite evidence from the copy in each reason, preventing generic evaluations.


πŸ› οΈ Technology Stack

Layer Technology
LLM Inference Groq API (llama-3.1-8b-instant)
Web Search Tool Tavily Search API
UI Framework Streamlit
Deployment Railway
Config Management python-dotenv
HTTP Client requests
Language Python 3.10+

✨ Key Features

  • 4-agent sequential pipeline β€” each agent specialised for one reasoning task
  • Real web research via Tavily β€” no hallucinated product claims
  • Structured JSON outputs with schema validation and safe fallbacks at every stage
  • LLM-as-Judge β€” automated 5-dimension quality evaluation with actionable feedback
  • Live pipeline tracker β€” Streamlit UI shows Research β†’ Insights β†’ Writing β†’ Evaluate in real time
  • Regenerate button β€” re-run the full pipeline with one click
  • Deployment-ready β€” live on Railway with a public URL

πŸš€ Setup & Installation

Prerequisites

1. Clone the repository

git clone https://github.com/upadhyayraman22/ShoeScribe-AI.git
cd ShoeScribe-AI

2. Install dependencies

pip install -r requirements.txt

3. Set up environment variables

Create a .env file in the project root:

GROQ_API_KEY=your_groq_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here

4. Run the app

streamlit run app.py

Open http://localhost:8501 in your browser.

How to Use:

  1. Enter a Product Name (e.g., Puma Softride Pro Echo Consonance)
  2. Enter the Category (e.g., Walking Shoes)
  3. Click Generate
  4. Watch the 4-stage pipeline execute in real time
  5. Get your ready-to-publish product description + quality scores

🌐 Deployment

The app is deployed on Railway and accessible at:

πŸ”— web-production-c9ce1.up.railway.app

Environment Variables (set in Railway dashboard)

Variable Description
GROQ_API_KEY Your Groq API key for LLM inference
TAVILY_API_KEY Your Tavily API key for web search

⚠️ Never commit your .env file. It is listed in .gitignore.


πŸ“Š Example Output

Input: Puma Softride Premier GlideKnit | Walking Shoes

Market Insights extracted:

  • Features: Lightweight cushioning, Engineered knit upper, One-piece construction, Slip-on closure, Soft cushioning
  • Pain Points: Foot pain, Sweating, Poor fit, Lack of cushioning, Discomfort
  • Benefits: All-day comfort, Reduces fatigue, Improves performance, Enhanced comfort, Provides support
  • USP Ideas: Cutting-edge cushioning tech, Environmentally-friendly materials, Ergonomic shoe design for better fit

Short Description:

"Experience unparalleled comfort with the Puma Softride Premier GlideKnit. Enjoy all-day comfort and support on your feet."

Key Selling Points:

  • Lightweight Cushioning β€” Reduces fatigue and improves performance
  • Engineered Knit Upper β€” Provides a secure, easy fit
  • One-Piece Construction β€” Offers enhanced comfort and support
  • Slip-On Closure β€” Easy to put on and take off
  • Soft Cushioning β€” Provides comfort and support all day
  • Cutting-edge Cushioning Tech β€” Improves overall comfort and performance

Quality Evaluation:

Dimension Score Reason
Clarity 4/5 Simple language and logical flow, easy to understand
Persuasiveness 4/5 Effectively emphasizes comfort and support with emotional hooks
Differentiation 3/5 Highlights unique features but some language is generic
Feature Relevance 5/5 Lightweight cushioning and slip-on closure are highly relevant
Conversion Potential 4/5 Clear CTA tone but lacks urgency or specific value mention
Overall 4.0 / 5

Feedback: Consider adding more nuanced language to differentiate the product and including specific values or benefits to reinforce the CTA.


πŸ“ Project Structure

ShoeScribe-AI/
β”œβ”€β”€ app.py                                  # Streamlit frontend
β”œβ”€β”€ orchestrator.py                         # Pipeline coordinator
β”œβ”€β”€ config.py                               # Groq client setup
β”œβ”€β”€ style.css                               # Custom UI styling
β”œβ”€β”€ requirements.txt                        # Python dependencies
β”œβ”€β”€ Procfile                                # Railway deployment config
β”œβ”€β”€ task_decomposition_specifications.md    # Full agent specs & design doc
β”œβ”€β”€ agents/
β”‚   β”œβ”€β”€ research_agent.py                   # Agent 1: Tavily web search
β”‚   β”œβ”€β”€ insight_agent.py                    # Agent 2: Feature extraction
β”‚   β”œβ”€β”€ copywriting_agent.py                # Agent 3: Copy generation
β”‚   └── judge_agent.py                      # Agent 4: LLM-as-Judge
β”œβ”€β”€ tools/
β”‚   └── tavily_tool.py                      # Tavily API wrapper
└── Submission_Files/
    β”œβ”€β”€ README.md                           # Demo video link
    β”œβ”€β”€ Problem_Statement_ShoeScribe_AI.docx
    β”œβ”€β”€ task_decomposition_specifications.md
    └── Architecture_Diagram.excalidraw

πŸŽ₯ Demo Video

πŸ“Ή Loom walkthrough: https://loom.com/share/5527f294cfe8456c88af9710d69766cc

The video covers:

  • Problem statement and motivation
  • End-to-end demo with a real product
  • LLM-as-Judge evaluation in action

πŸ‘₯ Team

Role Member Roll No.
Role A β€” Architect & Integrator S. Devanshu Murthy 11
Role B β€” Builder & Deployer Raman Upadhyay 10

Semester: IV Β· B.Tech ECE-B Department: Electronics and Communication Engineering Date: 24/04/2026


πŸ“„ Documentation


Built for AI Agent Systems Design course project.

About

A multi-agent AI pipeline that researches, writes, and evaluates conversion-optimised product descriptions for footwear e-commerce. Built with Groq, Tavily, and Streamlit.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors