This pipeline uses a local LMStudio model to collate project documents from many multi-format source files. The system can be run to generate the doc, or can be set to monitor for changes. When the script detects a saved change to a source file it will automatically generate a new version of the collated doc and accompanying knowledge graph.
The knowledge graph is a companion view of the doc: it shows which components of the source files contributed to which sections of the generated document.
- Download LMStudio
- Use LMStudio to download an LLM
- Load your LLM
- Start the LLM Server inside LMStudio
- Copy the server's IP address and port into your config.yaml
- Add the source files to the /sources directory, or point to desired source directory in config.yaml
- Edit the initial_request.txt with your doc creation specifications
- Create a virtual environment:

```bash
python -m venv .venv
```

- Install the requirements and the package:

```bash
pip install -r requirements.txt
pip install -e .
```

- Now you can run:
```bash
# Create the doc and start watching:
rag_runner --init
# Or:
python -m rag_runner.run --init
# Run only the watcher:
rag-runner
```

The rag-runner system automates creation and continuous maintenance of technical manuals using a Retrieval-Augmented Generation (RAG) pipeline connected to a local LLM (e.g., LM Studio or Ollama). It monitors a directory of source files (Word docs, spreadsheets, CSVs, etc.), extracts their content, embeds it into a vector store, and generates an updated Markdown manual whenever those sources change.
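The setup steps above write the LM Studio server address into config.yaml. A minimal sketch of what such a file might contain (the key names below are illustrative assumptions, not the project's actual schema — check the shipped config.yaml for the real keys):

```yaml
# Hypothetical layout; consult the repo's config.yaml for the real schema.
llm:
  base_url: "http://localhost:1234/v1"   # LM Studio / Ollama server address
  model: "your-model-name"
paths:
  sources: "./sources"                   # watched source directory
  manuals: "./manuals"                   # generated output
watch:
  interval_seconds: 5                    # polling/update interval
```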
| Module | Purpose |
|---|---|
| run.py | The orchestrator. Manages startup, initialization (--init), watching for source updates, regenerating manuals, and invoking downstream modules. |
| parsers.py | Extracts structured text “facts” from supported file types (.docx, .csv, .xlsx, .txt). Each extracted fact becomes a chunk stored in the vector database. |
| embed_store.py | Manages embeddings and vector storage using ChromaDB. Uses LM Studio / Ollama HTTP API or local SentenceTransformer fallback for embeddings. |
| template_gen.py | Generates the initial manual template by prompting the LLM based on the user’s initial_request.txt and available source files. |
| prompts.py | Holds prompt templates for various phases: RAG system, section updates, changelog generation, and template creation. |
| regenerate.py | Selectively updates sections of the manual using retrieved context. Handles diffing, citation maps, and changelog entries. |
| knowledge_graph.py | Builds a visual knowledge graph (knowledge_graph.html) linking source documents to the manual sections that reference them. Also generates an annotated HTML manual with clickable source highlights. |
| config.yaml | Configuration file specifying paths, model settings, watch directories, and update intervals. |
| initial_request.txt | The user’s instruction file for what kind of manual or summary to create (“technical specification,” “summary report,” etc.). |
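As the table notes, embed_store.py talks to an OpenAI-compatible embeddings endpoint (LM Studio or Ollama). A rough sketch of how such a call could look — the endpoint path, default port, and helper names are assumptions, not the module's real API:

```python
import json
from urllib import request

def parse_embedding_response(body):
    """Pull vectors out of an OpenAI-style response, preserving input order."""
    return [item["embedding"] for item in body["data"]]

def embed_texts(texts, base_url="http://localhost:1234/v1",
                model="text-embedding-model"):
    """Request embeddings for a list of strings from an
    OpenAI-compatible /embeddings endpoint (LM Studio / Ollama)."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    req = request.Request(f"{base_url}/embeddings", data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return parse_embedding_response(json.load(resp))
```

The SentenceTransformer fallback mentioned in the table would slot in where `embed_texts` fails to reach the server.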
**File Watcher**
- watchdog observes the configured sources/ directory.
- On any file creation, modification, or deletion it:
  - extracts and re-embeds the updated content,
  - diffs it against the previous version,
  - triggers selective regeneration for the affected manual sections.
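The event filtering above can be sketched as a small helper that skips unsupported files and collapses bursts of editor save events; the function name and debounce policy are assumptions (the real handler wires this into watchdog callbacks):

```python
import time

SUPPORTED = {".docx", ".csv", ".xlsx", ".txt"}

def should_reprocess(path, last_seen, debounce_s=2.0, now=None):
    """Decide whether a filesystem event warrants re-embedding.

    Ignores unsupported extensions and debounces rapid repeat events
    (editors often fire several writes per save).
    """
    now = time.time() if now is None else now
    ext = "." + path.rsplit(".", 1)[-1].lower() if "." in path else ""
    if ext not in SUPPORTED:
        return False
    if now - last_seen.get(path, 0.0) < debounce_s:
        return False
    last_seen[path] = now
    return True
```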
**Selective Update**
- regenerate.py identifies which manual sections cite the changed source.
- It queries the updated embeddings for relevant context.
- It sends a structured update prompt (UPDATE_SECTIONS_PROMPT) to the LLM.
- The LLM rewrites only the affected sections while preserving citations and unchanged text.
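The section-lookup step can be sketched as a reverse lookup over a citation map; the `{section: [sources]}` shape is an assumption, and regenerate.py's real data structure may differ:

```python
def affected_sections(citation_map, changed_source):
    """Return the manual sections that cite the changed source file.

    citation_map: {section_title: [source_path, ...]}  (assumed shape)
    """
    return sorted(
        section for section, sources in citation_map.items()
        if changed_source in sources
    )
```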
**Changelog Generation**
- Every update produces a structured changelog (CHANGELOG_SUMMARY_PROMPT) containing:
  - the changed sources and their versions,
  - the sections touched,
  - the equations altered,
  - a summary of the impact.
- The changelog is appended to a JSON or Markdown file in /logs/.
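Appending a structured entry to a JSON changelog could look like the sketch below; the field names mirror the bullet list above but are assumptions, not the project's real schema:

```python
import json
import time
from pathlib import Path

def append_changelog(log_path, changed_sources, sections, summary):
    """Append one structured changelog entry to a JSON list on disk."""
    path = Path(log_path)
    entries = json.loads(path.read_text()) if path.exists() else []
    entries.append({
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "changed_sources": changed_sources,
        "sections_touched": sections,
        "summary": summary,
    })
    path.write_text(json.dumps(entries, indent=2))
    return entries
```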
**Knowledge Graph Refresh**
- After any manual regeneration, the graph files are rebuilt so the visualization stays current.
- Clicking a source node highlights the affected sections in the manual viewer.
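Rebuilding the graph amounts to turning the citation map into node and edge lists; the shape below mirrors what a D3.js force layout typically consumes, but the field names are assumptions rather than knowledge_graph.py's actual output:

```python
def build_graph(citation_map):
    """Build node/edge lists from a {section: [sources]} citation map."""
    sections = sorted(citation_map)
    sources = sorted({s for srcs in citation_map.values() for s in srcs})
    nodes = ([{"id": s, "kind": "section"} for s in sections]
             + [{"id": s, "kind": "source"} for s in sources])
    edges = [{"source": src, "target": sec}
             for sec in sections for src in sorted(citation_map[sec])]
    return nodes, edges
```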
┌────────────────────┐
│ Source Files │ (.docx, .csv, .xlsx)
└────────┬───────────┘
│
[Extract Facts]
│
parsers.py
▼
[Embeddings]
│
embed_store.py ──► chroma vector DB
│
▼
[Query Context & Write Manual]
regenerate.py / run.py
│
▼
[Manual Markdown] ───► manuals/
│
├──► knowledge_graph.py ─► HTML Graph
│
└──► changelog.json
**Key Design Features**
- RAG-based Contextualization: a vector search per section gives precise retrieval.
- Source Traceability: inline citations ([^refN]) link every fact to its originating file and version.
- Automatic Changelog: each regeneration logs what changed and why.
- Visual Provenance: the D3.js graph shows dependencies and lets users inspect which manual sections rely on each source.
- Classification Filtering: template generation can skip classified sources (PUBLIC/INTERNAL/SECRET).
- Model-Agnostic: works with LM Studio, Ollama, or local Hugging Face models via the OpenAI API format.
- Local-Only Operation: no cloud calls required; all embeddings, models, and storage can run offline.
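The [^refN] citation convention above can be parsed with a short helper, shown here as a sketch (regenerate.py's actual citation handling may differ):

```python
import re

REF_PATTERN = re.compile(r"\[\^ref(\d+)\]")

def extract_citations(markdown_text):
    """Return the ordered, de-duplicated [^refN] numbers in a section."""
    seen, order = set(), []
    for match in REF_PATTERN.finditer(markdown_text):
        n = int(match.group(1))
        if n not in seen:
            seen.add(n)
            order.append(n)
    return order
```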