19 changes: 17 additions & 2 deletions README.md
@@ -28,9 +28,17 @@ pip install -r requirements.txt

Install these extras when you need richer file handling:

- `PyPDF2` – enables text extraction from PDF attachments so their contents can be sent to the selected backend.
- `python-docx` – parses DOCX files and pulls paragraph text for inclusion in prompts.
- `PyPDF2` – used with `--extract-text` to pull PDF contents into prompts before sending them to the selected backend.
- `python-docx` – used with `--extract-text` to parse DOCX files and include paragraph text inline.
- `opencv-python` – extracts representative PNG frames from video files when using `--frame-by-frame` processing.
- `pypdfium2` + `Pillow` – render PDFs into PNG previews that are uploaded to vision-capable backends when `--extract-text` is **not** supplied.
- `Pillow` (alone) – renders DOCX snapshots so word-processing files can be viewed as images when skipping text extraction.

Install the preview toolchain (for PDFs and DOCX files) with:

```bash
pip install pillow pypdfium2 python-docx
```
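
To see which of the extras listed above are already present in the current environment, a small check like the following may help. It is a sketch and not part of the CLI: the `EXTRAS` table and the `missing_extras` helper are hypothetical names, and the only assumption carried over from the packages themselves is the module name each one installs (`docx` for `python-docx`, `cv2` for `opencv-python`, `PIL` for `Pillow`).

```python
from importlib.util import find_spec

# pip package name -> importable top-level module name.
EXTRAS = {
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
    "opencv-python": "cv2",
    "pypdfium2": "pypdfium2",
    "Pillow": "PIL",
}


def missing_extras(is_installed=None):
    """Return the pip names of optional packages that cannot be imported.

    `is_installed` is an optional callable (module name -> bool); it exists
    only so the check can be exercised without touching the environment.
    """
    if is_installed is None:
        # find_spec returns None when a top-level module is not installed.
        is_installed = lambda module: find_spec(module) is not None
    return [pkg for pkg, module in EXTRAS.items() if not is_installed(module)]
```

Calling `missing_extras()` with no arguments inspects the active interpreter; an empty list means every optional feature described here should be usable.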

## Usage

@@ -68,3 +76,10 @@ send extracted video frames individually and concatenate the responses.
Use `-o`/`--output` to save the model response to a file. When no
filename is supplied, the first attached file name with `.txt` appended is
used; if no files are attached, `response.txt` is created.
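
The fallback naming rule can be sketched as below. This is an illustration of the behaviour described above, not the CLI's actual code: `default_output_name` is a hypothetical helper, and it assumes the attachment name is used exactly as given on the command line.

```python
def default_output_name(attachments):
    """Pick the output file name used when -o/--output has no value.

    `attachments` is the list of attached file names in the order given.
    """
    if attachments:
        # First attached file name with ".txt" appended, extension kept.
        return f"{attachments[0]}.txt"
    # No attachments: fall back to the fixed name.
    return "response.txt"
```

Under these assumptions, attaching `report.pdf` and `notes.docx` would produce `report.pdf.txt`, and a run with no attachments would produce `response.txt`.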

By default the CLI renders PDF and DOCX files into PNG preview images and sends
those to vision-capable backends (such as Ollama multimodal models). When the
preview toolchain is unavailable, the binary document is attached instead, along
with a note explaining how to enable previews. Pass `--extract-text` to run
local text extraction instead (PyPDF2 for PDF, python-docx for DOCX) and embed
the extracted text in the prompt.
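
The order of precedence described in the paragraph above can be summarised in a few lines. This is a sketch only: `choose_document_mode` and the three mode labels are hypothetical names introduced for illustration, and `preview_ready` stands in for whatever check the CLI uses to decide that the preview toolchain is installed.

```python
def choose_document_mode(extract_text, preview_ready):
    """Decide how a PDF or DOCX attachment is handed to the backend.

    extract_text  -- True when --extract-text was passed
    preview_ready -- True when the preview toolchain can be imported
    """
    if extract_text:
        # Explicit opt-in: run local extraction and inline the text.
        return "extract-text"
    if preview_ready:
        # Default path: render PNG previews for vision-capable backends.
        return "png-preview"
    # Fallback: attach the binary document with a note about previews.
    return "attach-binary"
```

Checking the flag first reflects the text above, where `--extract-text` is an explicit preference and previews are only the default when it is absent.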