Shiken Jirei - AI Test Case Generator

A RAG-powered Streamlit application that generates test cases from user stories in six languages, using hosted LLMs (via the Hugging Face and OpenRouter APIs) and vector embeddings stored in LanceDB.

Features

Generate test cases from user stories using AI
Multi-language support: English, Hindi, Tamil, Telugu, Malayalam, Kannada
RAG (Retrieval-Augmented Generation): Context-aware generation using stored embeddings
LanceDB integration: Vector store for semantic search and retrieval
Professional UI: Built with Streamlit for ease of use
Export options: Download test cases as JSON or Excel (XLSX)
Batch processing: Configurable batch sizes for test case generation
Data ingestion pipeline: Extract text from PDFs, DOCX, images, CSV, XLSX files

Project Structure

.
├── streamlit/
│   └── streamlit_app.py          # Main Streamlit web application
├── llm/
│   ├── run_app.py                # Startup script with environment checks
│   ├── datapipeline.py           # Data ingestion & LanceDB embedding pipeline
│   └── check_database.py         # Utility to inspect LanceDB table contents
├── lancedb_data/
│   └── testcases.lance/          # LanceDB vector index (RAG data)
├── requirements.txt              # Python package dependencies
├── run_app.py                    # Convenience wrapper to launch the app
├── .env.example                  # Environment variables template
├── .gitignore
└── README.md

File Descriptions

Core Application

streamlit/streamlit_app.py

Purpose: Main web UI for the test case generator
Key functions:
- initialize_components(): Load SentenceTransformer and connect to LanceDB
- retrieve_context(): RAG function to fetch relevant examples from LanceDB
- generate_testcases_rag(): Generate test cases using LLM with RAG context
- create_excel_file(): Export test cases to XLSX format
- parse_structured_testcase(): Parse raw LLM output into structured test cases
Dependencies: streamlit, lancedb, sentence_transformers, requests, pandas, openpyxl

Backend & Utilities

llm/run_app.py

Purpose: Application launcher with dependency and environment checks
Checks:
- Verifies API keys (HF_TOKEN1, opcode) are set
- Validates Python packages are installed
- Locates and runs streamlit/streamlit_app.py
Usage: python run_app.py from repo root
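
The checks listed above could look roughly like the following. This is a minimal sketch, not the launcher's actual code: the function names and the package list are assumptions chosen for illustration, and only the variable names HF_TOKEN1 and opcode come from this README.

```python
import importlib.util

# Variable names are taken from this README; the package list is an
# illustrative subset, not the launcher's real list.
REQUIRED_ENV_VARS = ("HF_TOKEN1", "opcode")
REQUIRED_PACKAGES = ("streamlit", "lancedb", "sentence_transformers")


def missing_env_vars(env, required=REQUIRED_ENV_VARS):
    """Return the required variables that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the top-level import names that cannot be found."""
    return [name for name in packages if importlib.util.find_spec(name) is None]
```

Calling missing_env_vars(os.environ) after importing os would check the live environment; passing a plain dictionary keeps the function easy to test.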

llm/datapipeline.py

Purpose: Data ingestion pipeline for building the RAG database
Processes:
- Extracts text from multiple file types (PDF, DOCX, TXT, images, CSV, XLSX)
- Splits text into overlapping segments (500-word chunks with a 50-word overlap)
- Generates embeddings using SentenceTransformer (all-mpnet-base-v2)
- Stores embeddings in LanceDB testcases table
Folder handling: input/, success/, failure/ for file organization
Usage: python llm/datapipeline.py (requires files in input/ folder)
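
The chunking step can be illustrated with a short sketch. It assumes words are separated by whitespace and uses a hypothetical function name, chunk_words; the pipeline's real implementation may differ.

```python
def chunk_words(text, chunk_size=500, overlap=50):
    """Split text into word-based chunks; each chunk repeats the last
    `overlap` words of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, a 1,000-word document yields three chunks starting at words 0, 450, and 900. The overlap keeps a sentence that straddles a boundary available in both neighbouring chunks.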

llm/check_database.py

Purpose: Quick diagnostics for LanceDB
Shows:
- Number of rows in testcases table
- Column names and data types
- NaN and empty value counts
Usage: python llm/check_database.py
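
The kind of report described above can be sketched on plain Python data. This example assumes the table rows have already been loaded as a list of dictionaries; the function name summarize_rows and the sample column names are hypothetical, and the real script reads from LanceDB instead.

```python
import math


def summarize_rows(rows):
    """Report the row count and, per column, the value types seen plus
    counts of missing (None/NaN) and empty-string values."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            stats = columns.setdefault(name, {"types": set(), "nan": 0, "empty": 0})
            if value is None or (isinstance(value, float) and math.isnan(value)):
                stats["nan"] += 1
            elif isinstance(value, str) and not value.strip():
                stats["empty"] += 1
            else:
                stats["types"].add(type(value).__name__)
    return {"row_count": len(rows), "columns": columns}


# Hypothetical sample rows, for illustration only.
sample = [
    {"text": "Login test", "score": 0.9},
    {"text": "  ", "score": float("nan")},
    {"text": None, "score": 0.1},
]
report = summarize_rows(sample)
print(report["row_count"], report["columns"]["text"]["empty"])
```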

Configuration & Dependencies

requirements.txt

Lists all Python package dependencies
Key packages:
- streamlit: Web UI framework
- lancedb: Vector database
- sentence-transformers: Embeddings model
- transformers: LLM model libraries
- requests: HTTP client for API calls
- pandas, openpyxl: Data processing & Excel export
- PyPDF2, python-docx, Pillow: File parsing

run_app.py

Convenience wrapper that calls main() in llm/run_app.py
Simplifies startup: python run_app.py

.env.example

Template for required environment variables

Must be copied to .env and filled with real values:

HF_TOKEN1=your_huggingface_api_key
opcode=your_openrouter_api_key

lancedb_data/testcases.lance/

LanceDB vector index containing stored test case embeddings
Used by RAG retrieval to find contextually relevant examples
Pre-populated with sample test cases for immediate use
Can be extended by running llm/datapipeline.py

Quick Start

1. Clone the Repository

git clone https://github.com/Carol-here/Testcase-Generation-using-RAG-based-AI.git
cd Testcase-Generation-using-RAG-based-AI

2. Create Virtual Environment

python -m venv .venv
# Windows
.\.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Set Up Environment Variables

copy .env.example .env
# On macOS/Linux, use: cp .env.example .env
# Edit .env and add your API keys:
#   HF_TOKEN1 = Your Hugging Face API token (https://huggingface.co/settings/tokens)
#   opcode = Your OpenRouter API key (https://openrouter.ai/keys)

5. Run the Application

Option A (Recommended):

python run_app.py

Option B (Direct):

streamlit run streamlit/streamlit_app.py

The app will open at http://localhost:8501

How to Use

1. Select Language: Choose from English, Hindi, Tamil, Telugu, Malayalam, Kannada
2. Configure Generation: Set number of test cases (5-50) and batch size (1-10)
3. Enter User Story: Paste a user story or feature description in the text area
4. Generate: Click "Generate Test Cases" to start AI-powered generation
5. Export: Download results as JSON or Excel
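
To show how the two generation settings relate, here is a small sketch of splitting a requested number of test cases into batches. The function name plan_batches is an assumption, and the app may divide the work differently.

```python
def plan_batches(total_cases, batch_size):
    """Return how many test cases to request in each LLM call."""
    if total_cases < 1 or batch_size < 1:
        raise ValueError("total_cases and batch_size must be positive")
    full, remainder = divmod(total_cases, batch_size)
    # Full batches first, then one smaller batch for any leftover cases.
    return [batch_size] * full + ([remainder] if remainder else [])
```

For example, 12 test cases with a batch size of 5 would be requested as batches of 5, 5, and 2. Smaller batches mean more API calls, each with a shorter response.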

RAG (Retrieval-Augmented Generation) Workflow

1. The user enters a user story
2. retrieve_context() encodes the story and searches LanceDB for similar test cases
3. The top 5 relevant test cases are retrieved from the vector index
4. The LLM is prompted with both the user story and the retrieved context
5. The LLM generates new, context-aware test cases
6. Results are parsed, formatted, and displayed
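
The retrieval and prompting steps can be sketched without any external services. This example is an illustration under stated assumptions: it ranks in-memory records by cosine similarity instead of querying LanceDB, uses toy two-dimensional vectors in place of SentenceTransformer embeddings, and the function names and prompt wording are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vector, records, k=5):
    """Return the texts of the k records most similar to the query vector."""
    ranked = sorted(
        records,
        key=lambda record: cosine_similarity(query_vector, record["vector"]),
        reverse=True,
    )
    return [record["text"] for record in ranked[:k]]


def build_prompt(user_story, examples):
    """Combine the user story and the retrieved examples into one prompt."""
    context = "\n".join(f"- {example}" for example in examples)
    return (
        "Reference test cases:\n"
        f"{context}\n\n"
        f"User story:\n{user_story}\n\n"
        "Write new test cases for this user story."
    )


# Toy data: in the app, the vectors would be sentence embeddings.
records = [
    {"text": "Verify login with valid credentials", "vector": [1.0, 0.0]},
    {"text": "Verify cart total updates", "vector": [0.0, 1.0]},
    {"text": "Verify login lockout after failures", "vector": [0.9, 0.1]},
]
top = retrieve_top_k([1.0, 0.0], records, k=2)
prompt = build_prompt("As a user, I want to log in", top)
```

Only the two login-related examples reach the prompt here, which is the point of retrieval: the LLM sees existing test cases that resemble the new story rather than the whole database.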

Tech Stack

Core Framework

Streamlit (v1.28.1): Web application framework
Python 3.10+: Programming language

Machine Learning & NLP

Transformers (v4.56.0): Large language models (HuggingFace)
Sentence Transformers (v5.1.0): Text embeddings (all-mpnet-base-v2)
PyTorch (v2.8.0): Deep learning backend
Scikit-learn (v1.7.1): ML utilities
NumPy (v2.2.6): Numerical computing

Vector Database

LanceDB (v0.24.3): Vector index for RAG

Data Processing

Pandas (v2.0.3): Data manipulation
PyArrow (v21.0.0): Data serialization
OpenPyXL (v3.1.2): Excel file handling
PyPDF2 (v3.0.1): PDF parsing
python-docx (v1.2.0): DOCX parsing
Pillow (v11.3.0): Image processing
pytesseract: OCR for image text extraction

APIs & External Services

Hugging Face API: English test case generation (Llama 3.2)
OpenRouter API: Multi-language LLM access

Development & Utilities

python-dotenv (v1.0.0): Environment variable management
Requests (v2.32.5): HTTP client for API calls

Future Enhancements (Roadmap)

Near-term (v1.1)

Custom embeddings model selection: Allow users to choose embedding models
Database persistence UI: Add admin panel to view/manage LanceDB
Test case templates: Provide domain-specific templates (Web, Mobile, API, etc.)
Batch upload: Support uploading multiple user stories at once
Test case evaluation: Add scoring/rating of generated test cases

Medium-term (v1.2)

Multi-user collaboration: Shared workspaces and version control
Custom LLM models: Support for local or self-hosted LLMs
Test case validation: Automated checks for duplicate/conflicting test cases
Analytics dashboard: Track generation metrics, success rates, language performance
API endpoint: REST API for programmatic access

Long-term (v2.0)

Test execution integration: Connect to testing frameworks (Selenium, Cypress, pytest)
Automated test generation: AI-driven test code generation (Gherkin/BDD)
Requirements traceability: Link test cases to requirements documents
ML model fine-tuning: Domain-specific model training for improved quality
Multi-modal support: Accept diagrams, screenshots, and video as input

Configuration & Environment Variables

Required environment variables (set in .env):

# Hugging Face API (for English test case generation)
HF_TOKEN1=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxx

# OpenRouter API (for multi-language support)
opcode=sk-or-v1-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Optional environment variables:

# LanceDB path (default: lancedb_data)
LANCEDB_PATH=lancedb_data

# Streamlit port (default: 8501)
STREAMLIT_SERVER_PORT=8501
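
Reading the optional variables with their documented defaults might look like this. It is a sketch only: the function name load_settings and the dictionary keys are assumptions, and the app may read these values in another way.

```python
import os


def load_settings(env=None):
    """Read the optional settings, falling back to the defaults listed above."""
    env = os.environ if env is None else env
    return {
        "lancedb_path": env.get("LANCEDB_PATH", "lancedb_data"),
        "port": int(env.get("STREAMLIT_SERVER_PORT", "8501")),
    }
```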

Troubleshooting

App won't start

Check all dependencies: pip install -r requirements.txt
Verify .env file exists with valid API keys
Ensure you're running from the repo root directory

LanceDB connection errors

Verify lancedb_data/testcases.lance/ directory exists
Run python llm/check_database.py to diagnose
If the index is corrupted, run python llm/datapipeline.py to rebuild it (this requires source files in the input/ folder)

API errors (Hugging Face / OpenRouter)

Verify API keys in .env are correct and active
Check internet connection
Review API quota/rate limits

Test case generation is slow

Reduce batch size to 1-3 to make API calls smaller
Reduce number of test cases to 5-10
Check internet connection speed

License

This project is provided as-is for educational and development purposes.

Support

For issues or questions, please contact me.

Built for intelligent test case generation
