
## 🌟 Features

- **Vision-First Approach**: Documents processed as images using PyMuPDF, preserving visual information and formatting.
- **No Vector Database Required**: Eliminates the complexity of embeddings and vector storage.
- **Adaptive RAG Agent**: An intelligent agent that dynamically plans and executes tasks to answer your queries.
- **Multi-Provider Support**: Works with OpenAI GPT-4V, Anthropic Claude, and any model supported by OpenRouter.
- **Modern CLI Interface**: A beautiful and intuitive terminal UI built with Textual.
- **Conversation Aware**: Maintains context across multiple queries for a natural chat experience.
- **Pluggable Storage**: Supports local filesystem and in-memory storage backends.
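"Pluggable" storage usually means that calling code depends on a small interface that every backend implements. The sketch below illustrates that idea. It is hypothetical and is not DocPixie's storage API: the names `StorageBackend`, `InMemoryStorage`, `LocalStorage`, and `roundtrip`, the method signatures, and the one-JSON-file-per-document layout are all assumptions.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Protocol


class StorageBackend(Protocol):
    """Hypothetical storage interface; DocPixie's real API may differ."""

    def save(self, doc_id: str, data: dict) -> None: ...
    def load(self, doc_id: str) -> Optional[dict]: ...
    def delete(self, doc_id: str) -> None: ...


class InMemoryStorage:
    """Keeps documents in a dict; contents are lost when the process exits."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def save(self, doc_id: str, data: dict) -> None:
        self._docs[doc_id] = data

    def load(self, doc_id: str) -> Optional[dict]:
        return self._docs.get(doc_id)

    def delete(self, doc_id: str) -> None:
        self._docs.pop(doc_id, None)


class LocalStorage:
    """Writes one JSON file per document under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, doc_id: str) -> Path:
        return self.root / f"{doc_id}.json"

    def save(self, doc_id: str, data: dict) -> None:
        self._path(doc_id).write_text(json.dumps(data))

    def load(self, doc_id: str) -> Optional[dict]:
        path = self._path(doc_id)
        return json.loads(path.read_text()) if path.exists() else None

    def delete(self, doc_id: str) -> None:
        self._path(doc_id).unlink(missing_ok=True)


def roundtrip(storage: StorageBackend) -> Optional[dict]:
    # Depends only on the interface, so either backend can be passed in.
    storage.save("doc-1", {"name": "report.pdf", "pages": 3})
    return storage.load("doc-1")


if __name__ == "__main__":
    print(roundtrip(InMemoryStorage()))
    with tempfile.TemporaryDirectory() as tmp:
        print(roundtrip(LocalStorage(tmp)))
```

Structural typing (`Protocol`) lets a new backend plug in without inheriting from a base class. The backend only needs to provide the same three methods.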

## 🚀 Quick Start

### Installation

```bash
# Using uv (recommended)
uv pip install docpixie

# Using pip
pip install docpixie
```

Then, launch the CLI:
```bash
docpixie
```

### Basic Usage (as a library)

```python
import asyncio
from docpixie import DocPixie

async def main():
    # Initialize DocPixie; API keys are read from environment variables.
    # Set the key for your chosen provider, e.g. OPENAI_API_KEY,
    # ANTHROPIC_API_KEY, or OPENROUTER_API_KEY (see Configuration below).
    docpixie = DocPixie()

    # Add a document to the system.
    document = await docpixie.add_document("path/to/your/document.pdf")
    print(f"Added document: {document.name}")

    # Query the document with a question.
    result = await docpixie.query("What are the key findings?")
    print(f"Answer: {result.answer}")
    print(f"Pages used: {result.page_numbers}")

# Run the asynchronous main function.
asyncio.run(main())
```

### Using the CLI

Start the interactive terminal interface with a single command:

```bash
docpixie
```

The CLI provides a rich user experience with:
- Interactive document chat
- Document management (indexing, deletion)
- Conversation history
- Model configuration
- A command palette with shortcuts for all major actions

## 🛠️ Configuration

DocPixie can be configured via environment variables or directly in code.

### Environment Variables

```bash
# For OpenAI (default)
export OPENAI_API_KEY="your-openai-key"

# For Anthropic Claude
export ANTHROPIC_API_KEY="your-anthropic-key"

# For OpenRouter (recommended for access to a wide range of models)
export OPENROUTER_API_KEY="your-openrouter-key"
```

### In Code

You can also specify the provider and models directly when initializing `DocPixie`:

```python
from docpixie import DocPixie, DocPixieConfig

# Example configuration for Anthropic's Claude 3 Opus
config = DocPixieConfig(
    provider="anthropic",  # or "openai", "openrouter"
    model="claude-3-opus-20240229",
    vision_model="claude-3-opus-20240229",
)

# Initialize DocPixie with the custom configuration
docpixie = DocPixie(config=config)
```

## 📚 Supported File Types

- **PDF files** (`.pdf`): Full multipage support.
- **Image files** (`.jpg`, `.jpeg`, `.png`, `.webp`): Each image is treated as a single-page document.
- More file types are coming soon!
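The supported types amount to routing on file extension. PDFs are split into pages, and each image becomes a one-page document. `classify_document` is a hypothetical helper for illustration only. It checks the suffix case-insensitively and does not inspect file contents, which a real loader might do.

```python
from pathlib import Path

# Extensions from the "Supported File Types" list.
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def classify_document(path: str) -> str:
    """Return "pdf" or "image" for a supported file, else raise ValueError.

    Hypothetical helper: matches on the lowercased suffix only.
    """
    suffix = Path(path).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return "pdf"    # multipage: each page is rendered to an image
    if suffix in IMAGE_EXTENSIONS:
        return "image"  # treated as a single-page document
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```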

## 🏗️ Architecture

DocPixie is built on a clean, modular architecture:

```
📁 Core Components
├── 🧠 Adaptive RAG Agent: Dynamically plans and executes tasks.
├── 👁️ Vision Processing: Converts documents to images using PyMuPDF.
├── 🔌 Provider System: A unified interface for different AI providers.
├── 💾 Storage Backends: Pluggable storage for local or in-memory data.
└── 🖥️ CLI Interface: A modern terminal UI powered by Textual.

📁 Processing Flow
1. Document → Images (via PyMuPDF for PDFs)
2. Vision-based summarization of the document.
3. Adaptive query processing by the RAG agent.
4. Intelligent page selection using vision models.
5. Synthesis of the final response from task results.
```
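Steps 4 and 5 of the processing flow can be sketched offline. The names `Page`, `select_pages`, and `synthesize` are hypothetical. Page selection here uses simple word overlap with a page summary as a stand-in for the vision-model selection that DocPixie performs. Steps 1 to 3 (rendering, summarization, query planning) are out of scope for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    summary: str  # in DocPixie this would come from vision-based summarization


def select_pages(pages: list[Page], query: str, k: int = 2) -> list[Page]:
    """Stand-in for step 4: rank pages by word overlap with the query.

    DocPixie uses vision models here; keyword overlap is used only so
    this sketch runs without an API key.
    """
    terms = set(query.lower().split())

    def score(page: Page) -> int:
        return len(terms & set(page.summary.lower().split()))

    ranked = sorted(pages, key=score, reverse=True)
    # Keep at most k pages, and drop any that share no words with the query.
    return [p for p in ranked[:k] if score(p) > 0]


def synthesize(query: str, selected: list[Page]) -> dict:
    """Stand-in for step 5: combine findings and record which pages were used."""
    return {
        "query": query,
        "answer": " ".join(p.summary for p in selected),
        "page_numbers": sorted(p.number for p in selected),
    }
```

Suppose three pages are summarized as "introduction and motivation", "key findings revenue grew", and "appendix tables". The query "What are the key findings" then selects only page 2, because the other pages share no words with it. The `answer` and `page_numbers` keys mirror the fields printed in Basic Usage, but this dict is not DocPixie's result type.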

### Key Design Principles

- **Provider-Agnostic**: A generic model configuration works across all providers.
- **Image-Based Processing**: All documents are converted to images, preserving visual context.
- **Business Logic Separation**: Raw API operations are kept separate from workflow logic.
- **Adaptive Intelligence**: A single-agent mode dynamically adjusts based on its findings.
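The **Provider-Agnostic** and **Business Logic Separation** principles can be illustrated with a small protocol. A raw provider layer only sends a prompt and images, while workflow code decides what to ask. The names `VisionProvider`, `process_images`, `EchoProvider`, and `summarize_page` are hypothetical. `EchoProvider` is an offline stand-in, not a real OpenAI, Anthropic, or OpenRouter client.

```python
from typing import Protocol


class VisionProvider(Protocol):
    """Hypothetical raw-API layer: one call, no workflow logic."""

    def process_images(self, prompt: str, images: list[bytes]) -> str: ...


class EchoProvider:
    """Offline stand-in for a real provider client."""

    def __init__(self, model: str) -> None:
        self.model = model  # generic model name, same field for every provider

    def process_images(self, prompt: str, images: list[bytes]) -> str:
        return f"[{self.model}] {prompt} ({len(images)} image(s))"


def summarize_page(provider: VisionProvider, image: bytes) -> str:
    # Workflow logic decides *what* to ask; the provider handles *how*.
    return provider.process_images("Summarize this page.", [image])
```

Switching providers then only means passing a different object that implements `process_images`. Workflow functions like `summarize_page` stay unchanged.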

## 🎯 Use Cases

- **Research & Analysis**: Query academic papers, reports, and research documents.
- **Document Q&A**: Interactively question PDFs, contracts, and manuals.
- **Content Discovery**: Find specific information across large document collections.
- **Visual Document Processing**: Handle documents with charts, diagrams, and complex layouts that traditional text-based RAG systems struggle with.

## 🌍 Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `OPENAI_API_KEY` | Your OpenAI API key. | None |
| `ANTHROPIC_API_KEY` | Your Anthropic API key. | None |
| `OPENROUTER_API_KEY` | Your OpenRouter API key. | None |
| `DOCPIXIE_PROVIDER` | The AI provider to use. | `openai` |
| `DOCPIXIE_STORAGE_PATH` | The directory for local storage. | `./docpixie_data` |
| `DOCPIXIE_JPEG_QUALITY` | The image quality for JPEG conversion (1-100). | `90` |
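The defaults in the table can be applied with a small loader. This is a minimal sketch under assumptions. `load_settings` and `EnvSettings` are hypothetical names, and the table does not say whether DocPixie itself rejects a missing API key or an out-of-range JPEG quality at startup.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Provider name -> API key variable, as listed in the table.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


@dataclass
class EnvSettings:
    provider: str
    api_key: str
    storage_path: str
    jpeg_quality: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> EnvSettings:
    """Hypothetical loader applying the documented defaults."""
    env = os.environ if env is None else env
    provider = env.get("DOCPIXIE_PROVIDER", "openai").lower()
    if provider not in API_KEY_VARS:
        raise ValueError(f"Unknown provider: {provider}")
    # Only the key for the selected provider is required.
    api_key = env.get(API_KEY_VARS[provider])
    if not api_key:
        raise ValueError(f"{API_KEY_VARS[provider]} is not set")
    quality = int(env.get("DOCPIXIE_JPEG_QUALITY", "90"))
    if not 1 <= quality <= 100:
        raise ValueError("DOCPIXIE_JPEG_QUALITY must be between 1 and 100")
    return EnvSettings(
        provider=provider,
        api_key=api_key,
        storage_path=env.get("DOCPIXIE_STORAGE_PATH", "./docpixie_data"),
        jpeg_quality=quality,
    )
```

Passing a plain dict as `env` makes the loader easy to test without touching the real process environment.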

## 📖 Documentation

For more detailed information, please refer to the docstrings within the source code.

## 🤝 Contributing

We welcome contributions! Please follow these steps:

1. Fork the repository.
2. Create a feature branch (`git checkout -b feature/your-amazing-feature`).
3. Commit your changes (`git commit -m 'Add your amazing feature'`).
4. Push to the branch (`git push origin feature/your-amazing-feature`).
5. Open a Pull Request.

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Built with [PyMuPDF](https://pymupdf.readthedocs.io/) for high-performance PDF processing.
- The beautiful CLI is powered by [Textual](https://textual.textualize.io/).
- Supports APIs from [OpenAI](https://openai.com/), [Anthropic](https://www.anthropic.com/), and [OpenRouter](https://openrouter.ai/).

---