A sequential multi-agent AI system that researches, writes, and evaluates high-converting e-commerce product descriptions for footwear β fully automated, end-to-end.
π Live Demo: web-production-c9ce1.up.railway.app
- Problem Statement
- Why an Agentic Approach
- System Architecture
- Agent Workflow
- Task Decomposition
- LLM-as-Judge
- Technology Stack
- Key Features
- Setup & Installation
- Deployment
- Example Output
- Project Structure
- Demo Video
- Team
Writing compelling product descriptions for footwear is time-consuming, inconsistent, and requires both domain knowledge and copywriting expertise. E-commerce sellers and brands often resort to generic, uninspiring copy that fails to convert browsers into buyers.
ShoeScribe AI solves this by deploying a pipeline of four specialized AI agents that:
- Research real competitor data and product features from the web
- Extract marketing insights β features, pain points, benefits, and USPs
- Write polished, conversion-optimized product descriptions
- Evaluate the output across five quality dimensions using an LLM-as-Judge
A single LLM prompt cannot reliably accomplish this task because the workflow involves four fundamentally different reasoning capabilities that must execute in a strict dependency chain:
- Web Retrieval β fetching real, grounded product data from the internet (requires an external tool)
- Structured Extraction β converting raw web text into actionable marketing categories
- Creative Generation β producing persuasive, audience-specific product copy from structured data
- Critical Evaluation β scoring output quality across professional dimensions with specific justifications
Each agent depends entirely on the output of its predecessor. This sequential dependency makes a pipeline architecture the only viable design. Additionally, an agentic approach allows each agent to be independently prompted, validated, and improved β a modularity that a monolithic prompt cannot provide.
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β USER (Streamlit UI) β
β Inputs: product_name + category β
ββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββ
β run_pipeline(name, category)
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β ORCHESTRATOR (orchestrator.py) β
β Coordinates sequential agent execution β
ββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββΌβββββββββββββββ
β AGENT 1 ββββββ π Tavily Search API
β Research Agent β tavily_tool.py
β research_agent.py β
β Queries web for product β
β features & competitors β
β β
β OUTPUT: β
β { query: string, β
β summary: string[] } β
ββββββββββββββββ¬βββββββββββββββ
β { query, summary[] }
βΌ
ββββββββββββββββββββββββββββββββ
β AGENT 2 ββββββ π€ Groq API
β Insight Agent β llama-3.1-8b-instant
β insight_agent.py β
β Extracts features, pain β
β points, benefits, USPs β
β β
β OUTPUT: β
β { features: string[], β
β pain_points: string[], β
β benefits: string[], β
β usp_ideas: string[] } β
ββββββββββββββββ¬ββββββββββββββββ
β { features, pain_points... }
βΌ
ββββββββββββββββββββββββββββββββ
β AGENT 3 ββββββ π€ Groq API
β Copywriting Agent β llama-3.1-8b-instant
β copywriting_agent.py β
β Generates short desc, β
β long desc, USP bullets β
β β
β OUTPUT: β
β { short_description, β
β long_description, β
β usp_bullets: string[] } β
ββββββββββββββββ¬ββββββββββββββββ
β { short_desc, long_desc... }
βΌ
ββββββββββββββββββββββββββββββββ
β AGENT 4 ββββββ π€ Groq API
β Judge Agent (LLM-as-Judge) β llama-3.1-8b-instant
β judge_agent.py β
β Scores copy on 5 dimensions β
β with 1β5 integer rubric β
β β
β OUTPUT: β
β { scores: { β
β clarity: {score, reason}, β
β persuasiveness: {...}, β
β differentiation: {...}, β
β feature_relevance: {...}, β
β conversion_potential:{...}β
β }, overall_feedback } β
ββββββββββββββββ¬ββββββββββββββββ
β Final result object
ββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββ
β STREAMLIT UI β Final Rendered Output β
β Market Insights β’ Product Description (short + long β
β + USP bullets) β’ Quality Evaluation (scores + β
β per-dimension reasons + overall feedback) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
π Full visual diagram:
Submission_Files/Architecture_diagram.png
| Step | Agent | Input | Tool | Output |
|---|---|---|---|---|
| 1 | Research Agent | product_name, category |
Tavily Search API | { query, summary[] } |
| 2 | Insight Agent | Research summary | Groq LLM | { features[], pain_points[], benefits[], usp_ideas[] } |
| 3 | Copywriting Agent | Insight JSON | Groq LLM | { short_description, long_description, usp_bullets[] } |
| 4 | Judge Agent | Copy JSON | Groq LLM | { scores: { dimension: {score, reason} }, overall_feedback } |
Each agent has a clearly defined, independent responsibility:
Agent 1 β Research Agent (research_agent.py)
- Constructs a targeted search query from product name and category
- Calls Tavily Search API and retrieves up to 5 real web results
- Returns structured
{ query, summary[] }for downstream use
Agent 2 β Insight Agent (insight_agent.py)
- Transforms raw web text into structured marketing intelligence
- Enforces schema: minimum 5 items per category with category-aware fallbacks
- Strips markdown code fences from LLM output before JSON parsing
- Extracts:
features,pain_points,benefits,usp_ideas
Agent 3 β Copywriting Agent (copywriting_agent.py)
- Generates short description (max 30 words), long description (max 100 words), and 4β6 USP bullets
- Persona-primed as an expert e-commerce copywriter
- Grounded only in Insight Agent output β no hallucinated claims
- Robust JSON parsing with schema validation and safe defaults on failure
Agent 4 β Judge Agent (judge_agent.py)
- Evaluates generated copy across 5 professional dimensions
- Returns integer score (1β5) + one-sentence reason per dimension
- Provides overall actionable feedback for improvement
π Full specs:
task_decomposition_specifications.md
The Judge Agent acts as an automated quality reviewer, evaluating copy without any human involvement:
| Dimension | What It Measures |
|---|---|
| Clarity | Readability and absence of jargon for a general buyer |
| Persuasiveness | Emotional hooks and benefit emphasis that drive consideration |
| Differentiation | Uniqueness vs. generic phrases β does this stand out? |
| Feature Relevance | Importance of highlighted features to footwear buyers |
| Conversion Potential | CTA strength, urgency, and clarity of value proposition |
Scoring Rubric: 1 = Poor Β· 2 = Weak Β· 3 = Adequate Β· 4 = Good Β· 5 = Excellent
The judge is strictly prompted to cite evidence from the copy in each reason, preventing generic evaluations.
| Layer | Technology |
|---|---|
| LLM Inference | Groq API (llama-3.1-8b-instant) |
| Web Search Tool | Tavily Search API |
| UI Framework | Streamlit |
| Deployment | Railway |
| Config Management | python-dotenv |
| HTTP Client | requests |
| Language | Python 3.10+ |
- 4-agent sequential pipeline β each agent specialised for one reasoning task
- Real web research via Tavily β no hallucinated product claims
- Structured JSON outputs with schema validation and safe fallbacks at every stage
- LLM-as-Judge β automated 5-dimension quality evaluation with actionable feedback
- Live pipeline tracker β Streamlit UI shows Research β Insights β Writing β Evaluate in real time
- Regenerate button β re-run the full pipeline with one click
- Deployment-ready β live on Railway with a public URL
- Python 3.10+
- A Groq API key
- A Tavily API key
git clone https://github.com/upadhyayraman22/ShoeScribe-AI.git
cd ShoeScribe-AIpip install -r requirements.txtCreate a .env file in the project root:
GROQ_API_KEY=your_groq_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
streamlit run app.pyOpen http://localhost:8501 in your browser.
How to Use:
- Enter a Product Name (e.g.,
Puma Softride Pro Echo Consonance) - Enter the Category (e.g.,
Walking Shoes) - Click Generate
- Watch the 4-stage pipeline execute in real time
- Get your ready-to-publish product description + quality scores
The app is deployed on Railway and accessible at:
π web-production-c9ce1.up.railway.app
| Variable | Description |
|---|---|
GROQ_API_KEY |
Your Groq API key for LLM inference |
TAVILY_API_KEY |
Your Tavily API key for web search |
β οΈ Never commit your.envfile. It is listed in.gitignore.
Input: Puma Softride Premier GlideKnit | Walking Shoes
Market Insights extracted:
- Features: Lightweight cushioning, Engineered knit upper, One-piece construction, Slip-on closure, Soft cushioning
- Pain Points: Foot pain, Sweating, Poor fit, Lack of cushioning, Discomfort
- Benefits: All-day comfort, Reduces fatigue, Improves performance, Enhanced comfort, Provides support
- USP Ideas: Cutting-edge cushioning tech, Environmentally-friendly materials, Ergonomic shoe design for better fit
Short Description:
"Experience unparalleled comfort with the Puma Softride Premier GlideKnit. Enjoy all-day comfort and support on your feet."
Key Selling Points:
- Lightweight Cushioning β Reduces fatigue and improves performance
- Engineered Knit Upper β Provides a secure, easy fit
- One-Piece Construction β Offers enhanced comfort and support
- Slip-On Closure β Easy to put on and take off
- Soft Cushioning β Provides comfort and support all day
- Cutting-edge Cushioning Tech β Improves overall comfort and performance
Quality Evaluation:
| Dimension | Score | Reason |
|---|---|---|
| Clarity | 4/5 | Simple language and logical flow, easy to understand |
| Persuasiveness | 4/5 | Effectively emphasizes comfort and support with emotional hooks |
| Differentiation | 3/5 | Highlights unique features but some language is generic |
| Feature Relevance | 5/5 | Lightweight cushioning and slip-on closure are highly relevant |
| Conversion Potential | 4/5 | Clear CTA tone but lacks urgency or specific value mention |
| Overall | 4.0 / 5 |
Feedback: Consider adding more nuanced language to differentiate the product and including specific values or benefits to reinforce the CTA.
ShoeScribe-AI/
βββ app.py # Streamlit frontend
βββ orchestrator.py # Pipeline coordinator
βββ config.py # Groq client setup
βββ style.css # Custom UI styling
βββ requirements.txt # Python dependencies
βββ Procfile # Railway deployment config
βββ task_decomposition_specifications.md # Full agent specs & design doc
βββ agents/
β βββ research_agent.py # Agent 1: Tavily web search
β βββ insight_agent.py # Agent 2: Feature extraction
β βββ copywriting_agent.py # Agent 3: Copy generation
β βββ judge_agent.py # Agent 4: LLM-as-Judge
βββ tools/
β βββ tavily_tool.py # Tavily API wrapper
βββ Submission_Files/
βββ README.md # Demo video link
βββ Problem_Statement_ShoeScribe_AI.docx
βββ task_decomposition_specifications.md
βββ Architecture_Diagram.excalidraw
πΉ Loom walkthrough: https://loom.com/share/5527f294cfe8456c88af9710d69766cc
The video covers:
- Problem statement and motivation
- End-to-end demo with a real product
- LLM-as-Judge evaluation in action
| Role | Member | Roll No. |
|---|---|---|
| Role A β Architect & Integrator | S. Devanshu Murthy | 11 |
| Role B β Builder & Deployer | Raman Upadhyay | 10 |
Semester: IV Β· B.Tech ECE-B Department: Electronics and Communication Engineering Date: 24/04/2026
- Task Decomposition & Specifications β Full agent specs, input/output schemas, error handling, and design decisions.
- Submission Files β Architecture diagram, problem statement, and demo video.
Built for AI Agent Systems Design course project.