A lightweight, local AI-powered assistant that combines Retrieval-Augmented Generation (RAG) with the Gemini API and GitHub repository integration. Upload reference documents, connect to your GitHub repositories, and get intelligent responses for process documentation, SOX compliance, MLOps workflows, DevOps pipelines, and more.
- RAG-Powered Responses: Upload documents (.txt, .pdf, .docx) to create a knowledge base
- Gemini AI Integration: Leverages Google's Gemini Pro for intelligent responses
- GitHub Repository Connection: Access PRs, issues, workflow runs, and repository files
- GitHub Actions Control: Manually trigger workflows directly from the interface
- Word Document Generation: Create professionally formatted process documentation
- Browser-Based UI: Clean, professional interface with a light blue and white theme
- Lightweight & Local: Runs entirely on your machine with minimal resource usage
- ChromaDB Vector Storage: Efficient document embedding and retrieval
- Secure Configuration: Environment-based secrets management
- Multi-Template Support: SOX audits, MLOps workflows, DevOps pipelines, and generic documentation
- Document internal controls and procedures
- Generate 5-section SOX control analysis reports
- Track testing procedures and results
- Create audit-ready Word documents
- Document machine learning pipelines
- Track model training and validation
- Generate deployment documentation
- Monitor ML workflow processes
- Document CI/CD pipelines
- Track build and deployment processes
- Generate pipeline documentation
- Monitor infrastructure changes
- Create structured process documentation
- Generate professional Word reports
- Track project workflows
- Document best practices and procedures
- Python 3.8+
- Gemini API Key (get one from Google AI Studio)
- GitHub Personal Access Token (optional, for GitHub features)
- Git (for cloning the repository)
```bash
git clone <your-repo-url>
cd github-process-manager
```

```bash
# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```

```bash
pip install -r requirements.txt
```

Copy the template and edit with your credentials:

```bash
# Windows
copy .env.template .env

# macOS/Linux
cp .env.template .env
```

Edit the .env file:
```bash
# Required: Gemini API Key
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: GitHub Integration
GITHUB_TOKEN=your_github_personal_access_token_here
GITHUB_REPO_URL=https://github.com/username/repository

# Flask Configuration
FLASK_SECRET_KEY=your_secret_key_here
FLASK_DEBUG=True
```

Getting Your API Keys:
- Gemini API Key: Visit Google AI Studio
- GitHub Token: Go to GitHub Settings → Developer Settings → Personal Access Tokens → Generate new token
  - Required scopes: `repo`, `workflow` (for triggering actions)
```bash
python app.py
```

The application will be available at: http://localhost:5000
For a consistent, isolated environment, use Docker:
```bash
# 1. Configure environment
cp .env.template .env
# Edit .env with your API keys

# 2. Start the application
docker-compose up -d

# 3. View logs
docker-compose logs -f app

# 4. Access at http://localhost:5000
```

- Install the Remote - Containers extension
- Open project in VS Code
- Press F1 → "Remote-Containers: Reopen in Container"
- The environment is automatically configured with all dependencies
```bash
# Stop the application
docker-compose down

# Rebuild after changes
docker-compose up -d --build

# Production mode
docker-compose -f docker-compose.prod.yml up -d

# View container shell
docker-compose exec app /bin/bash
```

For detailed Docker setup, see README.docker.md.
- Navigate to the main Chat page
- Click "Choose File" in the upload section
- Select a document (.txt, .pdf, or .docx)
- Click "Upload" to process the document
- The document will be chunked, embedded, and stored in ChromaDB
- Go to the Settings page
- Enter your GitHub repository URL (e.g., https://github.com/username/repo)
- Click "Connect Repository"
- Once connected, the chatbot can access PRs, issues, and workflows
- Type your question in the chat input
- The chatbot will:
- Retrieve relevant document chunks from your uploaded files
- Fetch related GitHub repository data (if connected)
- Generate a response using Gemini AI with all context
- Responses cite sources from documents and GitHub data
- Go to Settings → GitHub Actions
- Click "Load Workflows"
- Click "Trigger" on any workflow to manually start it
The application supports customizable system prompts to tailor AI responses to your needs:
- Go to Settings → AI System Prompt Configuration
- Select a template from the dropdown:
- Default - Balanced assistant for general queries
- Technical Expert - Deep technical explanations with code examples
- Security Auditor - Security-focused analysis and compliance
- Developer Assistant - Code-heavy responses with best practices
- Data Analyst - Structured analysis with metrics and insights
- Technical Educator - Clear explanations for learning purposes
- Click "Update Prompt" to apply (changes last for your session)
- See the preview to verify the selected template
- Go to Settings → AI System Prompt Configuration
- Select "Custom Prompt" from the dropdown
- Write your own system instruction in the text editor
- Click "Update Prompt" to apply
- Example custom prompt:
You are a helpful assistant specializing in cloud infrastructure. Focus on AWS best practices, security, and cost optimization. Provide actionable recommendations with specific service names.
For persistent customization across server restarts:
- Edit your .env file
- Set one of these variables:

```bash
# Use a pre-defined template
SYSTEM_PROMPT_TEMPLATE=technical_expert

# Or set a custom prompt
CUSTOM_SYSTEM_PROMPT="Your custom system instruction here"
```
- Restart the application
Available Templates: default, technical_expert, security_auditor, developer_assistant, data_analyst, technical_educator
Note: Session-based changes (via UI) take priority over .env settings until the server restarts.
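That precedence rule (session override first, then .env, then the default template) could be expressed as a small resolver. This is a hedged sketch for illustration; `resolve_system_prompt` and the dictionary shapes are hypothetical names, not the application's actual code:

```python
def resolve_system_prompt(session, env, templates, default="default"):
    """Pick the active system prompt: session override wins,
    then .env settings, then the default template."""
    # 1. A prompt set via the UI lives in the session and takes priority
    if session.get("custom_prompt"):
        return session["custom_prompt"]
    # 2. A custom prompt from .env beats a template name from .env
    if env.get("CUSTOM_SYSTEM_PROMPT"):
        return env["CUSTOM_SYSTEM_PROMPT"]
    # 3. Fall back to a named template, then to the default
    template_name = env.get("SYSTEM_PROMPT_TEMPLATE", default)
    return templates.get(template_name, templates[default])
```

Because the session store is cleared when the server restarts, step 1 naturally stops applying after a restart, matching the behavior described above.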
The application supports configurable Word document templates with custom branding:
- SOX Audit - 5-section compliance reports (Control Objective, Risks, Testing, Results, Conclusion)
- MLOps Workflow - ML pipeline documentation (Model Overview, Data Pipeline, Training, Validation, Deployment)
- DevOps Pipeline - CI/CD documentation (Pipeline Overview, Build Steps, Quality Gates, Deployment, Monitoring)
- Generic - General purpose documentation (Overview, Components, Procedures, Results, Recommendations)
Edit your .env file to personalize generated documents:
```bash
# Project name for document headers
PROJECT_NAME=GitHub Process Manager

# Optional: Add company name to headers
COMPANY_NAME=Your Company Name

# Brand color (hex format #RRGGBB)
BRAND_COLOR=#4A90E2

# Optional: Add logo to document headers (.png, .jpg, .jpeg)
DOCUMENT_LOGO_PATH=/path/to/your/logo.png

# Default template type
DEFAULT_TEMPLATE_TYPE=generic
```

Modify document_templates.json to add new templates:
```json
{
  "templates": {
    "your_template": {
      "name": "Your Template Name",
      "report_title": "Your Report Title",
      "sections": [
        {"number": 1, "title": "Section 1", "key": "Section 1"},
        {"number": 2, "title": "Section 2", "key": "Section 2"}
      ],
      "keywords": ["keyword1", "keyword2"]
    }
  }
}
```

Template Features:
- Custom section structures (3-7 sections recommended)
- Keyword-based auto-detection
- Configurable headers and colors
- Optional logo support
- Professional formatting (Calibri, proper spacing, page numbers)
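Keyword-based auto-detection could work along these lines: score each template in document_templates.json by how many of its keywords appear in the query, and fall back to the generic template when nothing matches. This is a hedged sketch; `detect_template` and its scoring rule are assumptions, not the project's actual implementation:

```python
import json

def detect_template(query, templates_path="document_templates.json", default="generic"):
    """Pick the template whose keywords appear most often in the query."""
    with open(templates_path) as f:
        templates = json.load(f)["templates"]
    query_lower = query.lower()
    best, best_hits = default, 0
    for name, spec in templates.items():
        # Count keyword matches; ties keep the earlier/default choice
        hits = sum(1 for kw in spec.get("keywords", []) if kw.lower() in query_lower)
        if hits > best_hits:
            best, best_hits = name, hits
    return best
```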
The application includes specialized MLOps templates and workflows for managing machine learning operations.
Located in templates/mlops/, these guides provide comprehensive MLOps best practices:
- mlops_guide.md - Complete MLOps lifecycle guide covering:
- Model development and version control
- Experiment tracking (MLflow, Weights & Biases)
- Training best practices and reproducibility
- Model validation strategies
- Deployment strategies (Blue-Green, Canary, Shadow)
- Monitoring and drift detection
- Model retraining triggers
- model_validation_template.md - Structured validation report template:
- Model overview and business context
- Validation methodology (unit, integration, performance, regression)
- Performance metrics and comparison with baseline
- Bias and fairness analysis
- Failure pattern analysis
- Deployment recommendations
- deployment_checklist.md - Comprehensive pre-deployment checklist:
- Model readiness verification
- Security and compliance checks
- Monitoring and observability setup
- Testing requirements (functional, performance, integration)
- Deployment strategy selection
- Rollback procedures
- monitoring_guide.md - Production monitoring strategies:
- Performance metrics tracking
- Data drift detection methods
- Infrastructure monitoring
- Alert configuration
- Incident response procedures
- Navigate to the Chat page
- Upload MLOps template files from templates/mlops/
- Ask questions about ML workflows:
- "What metrics should I track for a classification model?"
- "How do I implement canary deployment for my model?"
- "What are the best practices for detecting data drift?"
- "Create a validation checklist for my model deployment"
The workflows in .github/workflows/mlops/ can be triggered to generate documentation automatically:
Model Validation Report (mlops-model-validation.yml):
- Navigate to Settings → GitHub Actions
- Select "MLOps Model Validation Report"
- Provide inputs:
- Model Name: Your model identifier
- Model Version: Semantic version (e.g., 1.2.0)
- Validation Type: unit, integration, performance, or regression
- Metrics JSON: `{"accuracy": 0.95, "f1": 0.93, "precision": 0.94}`
- Click Trigger to generate a validation report document
- Download from GitHub Actions artifacts
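The Metrics JSON input can be sanity-checked locally before triggering the workflow. Below is a minimal sketch of the kind of check `/api/mlops/validate-metrics` might perform; the rules shown (numeric values, rate-style metrics bounded to [0, 1]) are assumptions, not the endpoint's documented schema:

```python
def validate_metrics(metrics):
    """Return a list of validation errors for an ML metrics mapping
    (empty list means the metrics look valid)."""
    if not isinstance(metrics, dict) or not metrics:
        return ["metrics must be a non-empty JSON object"]
    errors = []
    rate_metrics = {"accuracy", "precision", "recall", "f1"}  # assumed bounded metrics
    for name, value in metrics.items():
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            errors.append(f"{name}: value must be numeric")
        elif name in rate_metrics and not 0.0 <= value <= 1.0:
            errors.append(f"{name}: expected a value between 0 and 1")
    return errors
```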
Deployment Documentation (mlops-deployment-doc.yml):
- Select "MLOps Deployment Documentation"
- Provide inputs:
- Model Name: Model to deploy
- Model Version: Version number
- Deployment Target: staging, production, canary, or development
- Deployment Strategy: blue-green, canary, rolling, or shadow
- Generated document includes deployment plan and rollback procedures
Try these queries with MLOps templates uploaded:
Model Training:
"Document the training process for a fraud detection model with 95% accuracy"
Deployment Planning:
"Create a deployment checklist for deploying a recommendation model to production"
Monitoring Setup:
"What alerts should I configure for monitoring a prediction model in production?"
Validation Reporting:
"Generate a validation report for model version 2.1.0 with accuracy 94.2%, precision 93.8%, recall 94.5%"
The MLOps templates include guidance for integrating with popular ML platforms:
- MLflow: Experiment tracking, model registry, deployment
- Weights & Biases: Real-time metrics visualization
- TensorBoard: TensorFlow/PyTorch metrics
- Kubeflow: Kubernetes-native ML workflows
- AWS SageMaker, Google Vertex AI, Azure ML: Cloud ML platforms
Export metrics from these tools and use the GitHub Actions workflows to generate documentation with your actual performance data.
- Version Everything: Code, data, models, configurations
- Track All Experiments: Log hyperparameters, metrics, and artifacts
- Validate Before Deploying: Run all tests (unit, integration, performance)
- Monitor Continuously: Set up drift detection and performance alerts
- Document Thoroughly: Use templates for consistency
- Plan Rollbacks: Always have a tested rollback strategy
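For the "Monitor Continuously" point, the Population Stability Index (PSI) is one common drift signal: it compares the distribution of a feature or score in production against a reference sample. A self-contained sketch, where the bin count and the epsilon guarding empty bins are arbitrary choices:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample (expected)
    and a production sample (actual). Higher values indicate more drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant samples

    def distribution(values):
        counts = [0] * bins
        for v in values:
            # clamp out-of-range production values into the edge bins
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        total = len(values)
        # small floor avoids log(0) for empty bins
        return [max(c / total, 1e-6) for c in counts]

    e, a = distribution(expected), distribution(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift; where you set the alert threshold depends on the model.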
```
github-process-manager/
├── app.py                      # Main Flask application
├── config.py                   # Configuration management
├── logger.py                   # Logging setup
├── rag_engine.py               # RAG document processing
├── gemini_client.py            # Gemini API integration
├── github_client.py            # GitHub API integration
├── word_generator.py           # Word document generation
├── requirements.txt            # Python dependencies
├── .env.template               # Environment variable template
├── .gitignore                  # Git ignore rules
├── document_templates.json     # Document template configuration
├── templates/
│   ├── base.html               # Base template
│   ├── index.html              # Chat interface
│   └── settings.html           # Settings page
├── static/
│   └── css/
│       └── style.css           # Application styling
├── .github/
│   └── workflows/
│       ├── process-analysis-doc.yml  # Generic process workflow
│       └── sox-analysis-doc.yml      # SOX-specific workflow (legacy)
├── chroma_db/                  # ChromaDB storage (auto-created)
├── uploads/                    # Temporary upload folder (auto-created)
├── generated_reports/          # Generated Word documents (auto-created)
└── README.md                   # This file
```
Edit config.py or set environment variables:
| Variable | Description | Default |
|---|---|---|
| `GEMINI_API_KEY` | Google Gemini API key | Required |
| `GEMINI_TEMPERATURE` | AI response randomness (0.0-1.0) | 0.7 |
| `GEMINI_MAX_TOKENS` | Maximum response length | 2048 |
| `SYSTEM_PROMPT_TEMPLATE` | Pre-defined prompt template | default |
| `CUSTOM_SYSTEM_PROMPT` | Custom system instruction | None |
| `COMPANY_NAME` | Company name for documents | None |
| `BRAND_COLOR` | Document brand color (hex) | #4A90E2 |
| `DOCUMENT_LOGO_PATH` | Path to logo for documents | None |
| `DEFAULT_TEMPLATE_TYPE` | Default document template | generic |
| `DOCUMENT_TEMPLATES_PATH` | Template config file path | document_templates.json |
| `GITHUB_REPO_URL` | GitHub repository URL | Optional |
| `FLASK_SECRET_KEY` | Flask session secret | Auto-generated |
| `CHROMA_DB_PATH` | ChromaDB storage location | ./chroma_db |
| `CHUNK_SIZE` | Characters per document chunk | 800 |
| `CHUNK_OVERLAP` | Overlap between chunks | 200 |
| `TOP_K_RESULTS` | RAG chunks to retrieve | 3 |
| `MLOPS_FEATURES_ENABLED` | Enable MLOps features | false |
| `MLOPS_TEMPLATES_DIR` | MLOps templates directory | templates/mlops |
| `MLOPS_WORKFLOWS_DIR` | MLOps workflows directory | .github/workflows/mlops |
- `POST /api/chat` - Send query and get AI response
- `POST /api/upload` - Upload document for RAG
- `GET /api/rag/stats` - Get RAG database statistics
- `POST /api/rag/clear` - Clear all documents

- `POST /api/github/connect` - Connect to repository
- `GET /api/github/info` - Get repository info
- `GET /api/github/workflows` - List workflows
- `POST /api/github/workflow/trigger` - Trigger workflow
- `GET /api/github/pulls` - Get pull requests
- `GET /api/github/issues` - Get issues

- `GET /api/prompts/templates` - Get available prompt templates
- `GET /api/prompts/current` - Get current active prompt
- `POST /api/prompts/update` - Update system prompt (session-based)
- `POST /api/prompts/reset` - Reset to default prompt

- `GET /api/mlops/status` - Check MLOps feature availability and configuration
- `POST /api/mlops/parse-metrics` - Parse and format ML metrics JSON
- `POST /api/mlops/validate-metrics` - Validate ML metrics against schema
- `GET /api/mlops/templates` - List available MLOps documentation templates

- `GET /health` - Health check endpoint
- Make sure you've created a .env file from .env.template
- Add your Gemini API key to the .env file
- Restart the application
- Check file format (.txt, .pdf, .docx only)
- Ensure file size is under 16MB
- Check app.log for detailed error messages
- Verify your GitHub token has the correct permissions (`repo`, `workflow`)
- Check that the repository URL is correct
- Ensure the token hasn't expired
- Delete the chroma_db/ folder and restart the application
- This will clear all uploaded documents
- Automatically chunks documents into manageable pieces
- Generates embeddings using Gemini Embedding API
- Stores vectors in ChromaDB for fast similarity search
- Retrieves top-K most relevant chunks for each query
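The chunking step can be sketched as a sliding window using the CHUNK_SIZE and CHUNK_OVERLAP defaults from the configuration table (800 and 200 characters); this is an illustration, not rag_engine.py's actual code:

```python
def chunk_text(text, chunk_size=800, overlap=200):
    """Split text into fixed-size chunks where consecutive chunks
    share `overlap` characters, preserving context across boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # advance by the non-overlapping stride
    return chunks
```

Each chunk is then embedded and stored in ChromaDB, and the overlap keeps sentences that straddle a boundary retrievable from either side.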
- Uses Gemini Pro for natural language understanding
- Combines RAG context with GitHub data in prompts
- Configurable temperature and token limits
- Robust error handling and retries
- Read repository metadata
- List and search pull requests and issues
- Access workflow run history
- Trigger workflows with custom inputs
- Retrieve repository files and structure
This is a personal project, but suggestions and improvements are welcome!
This project is provided as-is for educational and personal use.
- Google Gemini API for AI capabilities
- ChromaDB for vector storage
- PyGithub for GitHub integration
- Flask for the web framework
For issues or questions, please check the logs in app.log or review the troubleshooting section above.
Built with ❤️ using Python, Flask, and ChromaDB