6 changes: 1 addition & 5 deletions README.md
@@ -49,11 +49,7 @@ Install *spacypdfreader* using pip:
pip install spacypdfreader
```

To install with the required pytesseract dependencies:

```bash
pip install 'spacypdfreader[pytesseract]'
```
For details on how to install and use *spacypdfreader* with pytesseract, see the docs: [https://samedwardes.github.io/spacypdfreader/parsers/#pytesseract](https://samedwardes.github.io/spacypdfreader/parsers/#pytesseract).

## Usage

29 changes: 28 additions & 1 deletion docs/parsers.md
@@ -69,7 +69,34 @@ You can install most of the dependencies by pip installing *spacypdfreader* with
pip install 'spacypdfreader[pytesseract]'
```

Unfortunately this will not always install all of the dependencies because some of them are non-python related. I find that installing pytesseract can be a little bit tricky for beginners. Please refer to [https://github.com/madmaze/pytesseract#installation](https://github.com/madmaze/pytesseract#installation) for details on how to install *pytesseract* if the above does not work.
The pytesseract parser also needs system tools that pip does not install: poppler and tesseract. Install them with the commands for your platform below. If that does not work, refer to [https://github.com/madmaze/pytesseract#installation](https://github.com/madmaze/pytesseract#installation) for details on how to install *pytesseract*.
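
Before troubleshooting the system tools, it can help to confirm that the pip step itself succeeded. The following is a minimal sketch and not part of *spacypdfreader*: the helper name `missing_packages` is hypothetical, and it assumes the packages are importable under the names `spacypdfreader` and `pytesseract`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the two Python packages discussed here.
    missing = missing_packages(["spacypdfreader", "pytesseract"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("Python packages found.")
```

This only checks the Python side. It says nothing about whether poppler or tesseract are installed on the system.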

### Linux

```bash
sudo apt-get install poppler-utils
sudo apt-get install tesseract-ocr
sudo apt-get install libtesseract-dev
```

### Mac

```bash
brew install poppler
brew install tesseract
```

### Windows

To install poppler, follow the instructions at [https://stackoverflow.com/a/53960829](https://stackoverflow.com/a/53960829).

Then install tesseract with Scoop:

```bash
scoop install tesseract
```

Alternatively, install tesseract with the Windows installer by following the instructions at [https://github.com/UB-Mannheim/tesseract/wiki](https://github.com/UB-Mannheim/tesseract/wiki).
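
Whichever platform you are on, a quick way to see whether the system tools ended up on your `PATH` is to look for their executables. This is a rough sketch under stated assumptions, not something *spacypdfreader* provides: the helper `find_missing_tools` is hypothetical, and it assumes the executables are named `tesseract` and `pdftoppm` (one of the command-line programs that ships with poppler).

```python
import shutil

# Assumed executable names mapped to the tool they belong to.
REQUIRED_TOOLS = {
    "tesseract": "tesseract",
    "pdftoppm": "poppler",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools whose executable is not on PATH."""
    # shutil.which returns None when the executable cannot be found.
    return [tool for exe, tool in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("tesseract and poppler executables found.")
```

On Windows, a tool can be installed but still reported as missing if its folder was not added to `PATH`, so a "not found" result is a hint to check `PATH` as well as the installation.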

**Usage**
