Generate LLM-friendly documentation files from any documentation website.
This tool downloads a documentation site, extracts the useful content, and produces a clean text file optimized for AI agents and LLMs.
This project follows the concept proposed by the llms.txt initiative, a simple standard for providing LLM-friendly context files for AI systems.
Learn more: https://llmstxt.org/
The process follows a simple pipeline:
Documentation Website
↓
Mirror site (HTTrack)
↓
Extract clean content (Trafilatura)
↓
Build LLM-friendly documentation file
The output can be used as context for AI agents, copilots, and chat assistants.
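The pipeline above can be sketched in Python roughly as follows. This is a minimal illustration, not the actual contents of run.sh: the function names, the `# Source:` header written before each page, and the choice to process only `*.html` files are assumptions made for the example. It relies on the HTTrack command-line option `-O` (output directory) and on `trafilatura.extract`, which returns `None` when no main content is found.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def mirror_site(url: str, site_dir: Path) -> None:
    """Mirror the site with the HTTrack CLI (-O sets the output directory)."""
    site_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(["httrack", url, "-O", str(site_dir)], check=True)


def extract_text(html: str) -> Optional[str]:
    """Extract the main content with Trafilatura; None if nothing usable is found."""
    import trafilatura  # imported lazily so the other helpers work without it

    return trafilatura.extract(html)


def build_llm_file(
    site_dir: Path,
    out_file: Path,
    extractor: Callable[[str], Optional[str]] = extract_text,
) -> int:
    """Concatenate the extracted text of every mirrored HTML page into one file.

    Returns the number of pages that produced text.
    """
    sections = []
    for page in sorted(site_dir.rglob("*.html")):
        html = page.read_text(encoding="utf-8", errors="ignore")
        text = extractor(html)
        if text:
            # Hypothetical per-page header; the real output format may differ.
            source = page.relative_to(site_dir).as_posix()
            sections.append(f"# Source: {source}\n\n{text.strip()}\n")
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text("\n".join(sections), encoding="utf-8")
    return len(sections)
```

Passing the extractor as a parameter keeps the file-building step testable without Trafilatura installed; in normal use the default `extract_text` would be left in place.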
Install the following dependencies.
- HTTrack
- Python 3.10+
- pip
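A quick preflight check for these dependencies, together with the Python packages the tool uses (trafilatura and tqdm), might look like the sketch below. The function name `missing_dependencies` is hypothetical, and the check assumes the HTTrack executable is installed under the name `httrack` on your PATH.

```python
import importlib.util
import shutil
import sys
from typing import List


def missing_dependencies() -> List[str]:
    """Return the names of required tools and packages that were not found."""
    missing = []
    if sys.version_info < (3, 10):
        missing.append("Python 3.10+")
    # Assumes the HTTrack binary is called "httrack" on this system.
    if shutil.which("httrack") is None:
        missing.append("HTTrack")
    # find_spec returns None when a top-level package is not installed.
    for package in ("pip", "trafilatura", "tqdm"):
        if importlib.util.find_spec(package) is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("All dependencies found." if not problems else f"Missing: {', '.join(problems)}")
```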
Then install the Python packages:
pip install trafilatura tqdm
Clone the repository:
git clone https://github.com/vreoo/llms-dot-txt.git
cd llms-dot-txt
Make the script executable:
chmod +x run.sh
Run the script with:
./run.sh --name <project-name> --url <docs-url>
Example:
./run.sh --name rabbitmq --url https://rabbitmq.com/docs
After the process finishes, the following structure will be created:
llm-docs/
└── rabbitmq/
    ├── site/         # mirrored documentation website
    ├── extracted/    # extracted text content
    └── docs/
        ├── rabbitmq-llm.txt
        └── rabbitmq-rag.jsonl
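If you want to locate these outputs from your own tooling, the layout can be expressed as a small helper. This is a sketch that assumes the `<name>-llm.txt` and `<name>-rag.jsonl` naming seen in the rabbitmq example holds for any project name; `project_layout` and its dictionary keys are hypothetical names.

```python
from pathlib import Path
from typing import Dict


def project_layout(name: str, root: Path = Path("llm-docs")) -> Dict[str, Path]:
    """Return the expected output paths for a project, following the tree above."""
    base = root / name
    docs = base / "docs"
    return {
        "site": base / "site",            # mirrored documentation website
        "extracted": base / "extracted",  # extracted text content
        "docs": docs,
        "llm_file": docs / f"{name}-llm.txt",    # assumed naming pattern
        "rag_file": docs / f"{name}-rag.jsonl",  # assumed naming pattern
    }


if __name__ == "__main__":
    for key, path in project_layout("rabbitmq").items():
        print(f"{key}: {path}")
```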
The generated file, rabbitmq-llm.txt, contains clean documentation formatted for LLM consumption.
Example prompt:
Context:
docs/rabbitmq-llm.txt
Task:
Create a Python producer that sends messages to a RabbitMQ queue.
This allows AI agents to reason using official documentation as context.
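One way to assemble such a prompt programmatically is sketched below. It assumes the file's contents are inlined under the `Context:` heading rather than passed as a path, which may not suit very large documentation sets; `build_prompt` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path


def build_prompt(context_file: Path, task: str) -> str:
    """Combine the generated documentation and a task into a single prompt string."""
    context = context_file.read_text(encoding="utf-8")
    return f"Context:\n{context.strip()}\n\nTask:\n{task.strip()}\n"


if __name__ == "__main__":
    prompt = build_prompt(
        Path("llm-docs/rabbitmq/docs/rabbitmq-llm.txt"),
        "Create a Python producer that sends messages to a RabbitMQ queue.",
    )
    print(prompt[:500])  # preview the start of the prompt
```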
Generated documentation is derived from the original documentation website and follows the license of the source documentation.
Scripts in this repository are licensed under the MIT License.