
asturwebs/RAG-Converter-Tool


RAG Converter Tool

Convert Office documents to Markdown ready for RAG pipelines.

English | Español | 简体中文 | 한국어

License: MIT | PowerShell 7+ | Windows | Release v2.0.0


Converts .doc, .docx and .pptx files into structured Markdown optimized for RAG systems, with AI-powered image analysis, quality validation and certification report generation.

Originally built as an internal tool and validated in production, it is now released as open source so the community can benefit from it.


What it does

| Capability | Description |
|---|---|
| Conversion | Office to Markdown with hierarchical structure, table of contents and anchors |
| AI Vision | Analysis of embedded images: OCR, spatial analysis, pedagogical value |
| Automatic QA | Batch validation with `NORM_OK` or `NORM_WITH_ERRORS` status |
| Reports | Generation of commercial and technical reports with real metrics |
| Multi-client | Independent configuration per client with `.env.<client>.<environment>` |
| Idempotent | Skips already processed files; `-Reprocess` forces reprocessing |
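The idempotent behavior in the last row can be illustrated with a simple freshness check. This is a minimal sketch, not the logic of `Convert-OfficeToRAG.ps1`: it assumes, hypothetically, that each document is converted to a `.md` file with the same base name in the same folder, and that a file counts as already processed when that Markdown output exists and is newer than the source. `Test-NeedsConversion` is an illustrative name.

```powershell
# Hypothetical helper (not part of the repository): decides whether a source
# document must be (re)converted.
# Assumption: the output is "<name>.md" in the same folder as the source file.
function Test-NeedsConversion {
    param(
        [Parameter(Mandatory)] [string] $SourcePath,
        [switch] $Reprocess   # mirrors the tool's -Reprocess switch
    )

    # -Reprocess always forces a new conversion.
    if ($Reprocess) { return $true }

    $outputPath = [System.IO.Path]::ChangeExtension($SourcePath, '.md')

    # No Markdown output yet: the file has never been processed.
    if (-not (Test-Path -LiteralPath $outputPath)) { return $true }

    # Output exists: reconvert only if the source changed after it was written.
    $sourceTime = (Get-Item -LiteralPath $SourcePath).LastWriteTimeUtc
    $outputTime = (Get-Item -LiteralPath $outputPath).LastWriteTimeUtc
    return $sourceTime -gt $outputTime
}

# Usage: list the Office files in a folder that still need work.
# Get-ChildItem -Path 'C:\Path\Documents' -File -Include *.doc, *.docx, *.pptx -Recurse |
#     Where-Object { Test-NeedsConversion -SourcePath $_.FullName }
```

Comparing timestamps (rather than only checking for existence) means an edited source document is picked up again without needing `-Reprocess`; whether the real tool does this is not stated here.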

Current limitations

  • Windows with Microsoft Word and PowerPoint installed (COM automation)
  • PowerShell 7+ required
  • Requires an API key from a vision model provider (OpenRouter, OpenAI, etc.)

The roadmap includes plans for cross-platform support (Python, Docker) and more formats (PDF, XLSX, images).


Structure

RAG-Converter-Tool/
├── Convert-OfficeToRAG.ps1  # Main conversion and QA engine
├── Run-RAG.ps1              # Launcher with .env support
├── Enable-RagAlias.ps1      # Session aliases (rag, rr, rag-report)
├── Gen-Report.ps1           # Report generator
├── .env.example             # Configuration template
├── DEV_GUIDE.md             # Full technical guide
├── ROADMAP.md               # Project roadmap
├── LICENSE                  # MIT
├── NOTICE.md                # Attribution for commercial use
├── CITATION.cff             # Academic citation
└── docs/                    # Additional documentation

Installation

No installation required. Clone the repository and configure your API key:

git clone https://github.com/asturwebs/RAG-Converter-Tool.git
cd RAG-Converter-Tool
Copy-Item ".env.example" ".env"

Edit .env and add your OPENROUTER_API_KEY.
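The `.env` file is a plain list of `KEY=VALUE` lines. The following is a minimal sketch of how a launcher such as `Run-RAG.ps1` could load it into the current session; it is not the script's actual code, and `Import-DotEnv` is a hypothetical name. Only `OPENROUTER_API_KEY` comes from this README.

```powershell
# Hypothetical helper (not part of the repository): loads KEY=VALUE pairs from a
# .env file into the current process environment.
function Import-DotEnv {
    param([Parameter(Mandatory)] [string] $Path)

    $quoteChars = [char[]]@('"', "'")

    foreach ($line in Get-Content -LiteralPath $Path) {
        $trimmed = $line.Trim()

        # Skip blank lines and comments.
        if ($trimmed -eq '' -or $trimmed.StartsWith('#')) { continue }

        # Split on the first '=' only, so values may themselves contain '='.
        $parts = $trimmed -split '=', 2
        if ($parts.Count -ne 2 -or $parts[0].Trim() -eq '') { continue }

        $name  = $parts[0].Trim()
        # Remove surrounding single or double quotes from the value.
        $value = $parts[1].Trim().Trim($quoteChars)

        [Environment]::SetEnvironmentVariable($name, $value, 'Process')
    }
}

# Usage: load the shared .env and fail early if the key is missing.
# Import-DotEnv -Path '.env'
# if (-not $env:OPENROUTER_API_KEY) { throw 'OPENROUTER_API_KEY is not set in .env' }
```

Scoping the variables to `Process` keeps the key out of the user and machine environment, so it only lives as long as the PowerShell session.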

Quick start

# Load aliases in the current session
. ".\Enable-RagAlias.ps1"

# Convert all documents in a folder
rag -Target "C:\Path\Documents"

# Convert a specific file, forcing reprocessing even if it was already converted
rag -Target "C:\Path\Report.docx" -Reprocess

# Generate certification reports (commercial and technical)
rag-report -Modo comercial -Cliente "Acme Corp"
rag-report -Modo tecnico -Cliente "Acme Corp"

Multi-client

Manage multiple clients with independent environment files:

# Create one configuration file per client and environment
Copy-Item ".env.example" ".env.acme.prod"
Copy-Item ".env.example" ".env.contoso.staging"

# Run per client
rag -EnvFile ".env.acme.prod" -Target "C:\Path\Documents"
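The `.env.<client>.<environment>` convention can be wrapped in a small resolver so a typo in the client or environment name fails before any conversion starts. This is a hypothetical sketch: `Resolve-RagEnvFile` is not part of the repository, and the usage line assumes that `-EnvFile` also accepts a full path.

```powershell
# Hypothetical helper (not part of the repository): maps a client and an
# environment to the ".env.<client>.<environment>" naming convention.
function Resolve-RagEnvFile {
    param(
        [string] $Client,
        [string] $Environment,
        [string] $Root = (Get-Location).Path
    )

    if (-not $Client) {
        # No client given: use the shared default configuration.
        $fileName = '.env'
    }
    elseif (-not $Environment) {
        throw 'An environment (for example prod or staging) is required when a client is given.'
    }
    else {
        $fileName = '.env.{0}.{1}' -f $Client, $Environment
    }

    $path = Join-Path $Root $fileName
    if (-not (Test-Path -LiteralPath $path)) {
        throw "Configuration file not found: $path"
    }
    return $path
}

# Usage: pick the Acme production configuration and run the converter with it.
# rag -EnvFile (Resolve-RagEnvFile -Client 'acme' -Environment 'prod') -Target "C:\Path\Documents"
```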

Certification reports

The tool automatically generates reports from real execution metrics:

  • Commercial: Executive summary for client delivery
  • Technical: Forensic audit with detailed metrics

Both modes include the processed files, analyzed images, QA status, timings, and the signature of the person responsible.
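This README does not describe how these metrics are gathered. As a hypothetical sketch, the summary behind either report could be built from per-file results as follows. The result shape (`File`, `ImageCount`, `QaPassed`, `Seconds`), the name `New-RagReportSummary`, and the rule that a single failing file turns the batch into `NORM_WITH_ERRORS` are assumptions; only the two status labels come from the capability list.

```powershell
# Hypothetical sketch (not part of the repository): summarizes per-file results
# into report metrics. Each result is assumed to expose the properties
# File, ImageCount, QaPassed and Seconds.
function New-RagReportSummary {
    param([Parameter(Mandatory)] [object[]] $Results)

    $failed = @($Results | Where-Object { -not $_.QaPassed })

    # Assumption: a single failing file marks the whole batch as failed.
    $status = if ($failed.Count -eq 0) { 'NORM_OK' } else { 'NORM_WITH_ERRORS' }

    [pscustomobject]@{
        ProcessedFiles = $Results.Count
        AnalyzedImages = ($Results | Measure-Object -Property ImageCount -Sum).Sum
        TotalSeconds   = ($Results | Measure-Object -Property Seconds -Sum).Sum
        QaStatus       = $status
        FailedFiles    = @($failed | ForEach-Object File)
    }
}

# Usage sketch: time each conversion with a Stopwatch and record the outcome.
# $results = @()
# $sw = [System.Diagnostics.Stopwatch]::StartNew()
# ... convert and validate one file ...
# $sw.Stop()
# $results += [pscustomobject]@{ File = 'Report.docx'; ImageCount = 3; QaPassed = $true; Seconds = $sw.Elapsed.TotalSeconds }
# New-RagReportSummary -Results $results
```

Keeping the summary derived from recorded per-file results, rather than typed in by hand, is one way to make sure the reported figures reflect the actual run.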

Profiles

Three predefined profiles, each with a tuned model configuration:

| Profile | Use case |
|---|---|
| default | Development and testing |
| staging | Pre-production with conservative parameters |
| prod | Production with maximum analysis quality |

License

MIT. See LICENSE.

Commercial use: visible attribution to the author is appreciated. See NOTICE.md.

Author

Pedro Luis Cuevas Villarrubia — Innovation Practitioner & AI Agent Architect

About

Office-to-Markdown converter for RAG pipelines with multimodal image analysis, QA controls, and certification reporting. Currently Windows-only; cross-platform support (macOS, Linux) is on the roadmap.
