From b9107d38199c86b5cdfe7a866f4d338d525c3b38 Mon Sep 17 00:00:00 2001
From: Sam Edwardes
Date: Mon, 24 Jan 2022 15:01:11 -0800
Subject: [PATCH 1/2] More install instructions for pytesseract

---
 docs/parsers.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/docs/parsers.md b/docs/parsers.md
index bcbd152..907e2d3 100644
--- a/docs/parsers.md
+++ b/docs/parsers.md
@@ -69,6 +69,28 @@ You can install most of the dependencies by pip installing *spacypdfreader* with
 pip install 'spacypdfreader[pytesseract]'
 ```
 
+For pytesseract to work you have to install some additional tools.
+
+### Linux
+
+```bash
+sudo apt-get install poppler-utils
+sudo apt install tesseract-ocr
+sudo apt install libtesseract-dev
+```
+
+### Mac
+
+```bash
+# in progress
+```
+
+### Windows
+
+```bash
+# in progress
+```
+
 Unfortunately this will not always install all of the dependencies because some of them are non-python related. I find that installing pytesseract can be a little bit tricky for beginners. Please refer to [https://github.com/madmaze/pytesseract#installation](https://github.com/madmaze/pytesseract#installation) for details on how to install *pytesseract* if the above does not work.
 
 **Usage**

From a954d37a79144cbb21fde9ecd838236c8790b786 Mon Sep 17 00:00:00 2001
From: Sam Edwardes
Date: Mon, 24 Jan 2022 15:41:35 -0800
Subject: [PATCH 2/2] Add description to install with windows, mac, and linux

---
 README.md       |  6 +-----
 docs/parsers.md | 13 +++++++++----
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index 00f9ec2..8e97f81 100644
--- a/README.md
+++ b/README.md
@@ -49,11 +49,7 @@ Install *spacypdfreader* using pip:
 pip install spacypdfreader
 ```
 
-To install with the required pytesseract dependencies:
-
-```bash
-pip install 'spacypdfreader[pytesseract]'
-```
+For details on how to install and use *spacypdfreader* with pytesseract see the docs: [https://samedwardes.github.io/spacypdfreader/parsers/#pytesseract](https://samedwardes.github.io/spacypdfreader/parsers/#pytesseract).
 
 ## Usage
 
diff --git a/docs/parsers.md b/docs/parsers.md
index 907e2d3..0d90cd2 100644
--- a/docs/parsers.md
+++ b/docs/parsers.md
@@ -69,7 +69,7 @@ You can install most of the dependencies by pip installing *spacypdfreader* with
 pip install 'spacypdfreader[pytesseract]'
 ```
 
-For pytesseract to work you have to install some additional tools.
+For pytesseract to work you have to install some additional tools. Installing pytesseract can be a little bit tricky for beginners. Please refer to [https://github.com/madmaze/pytesseract#installation](https://github.com/madmaze/pytesseract#installation) for details on how to install *pytesseract* if the above does not work.
 
 ### Linux
 
@@ -82,16 +82,21 @@ sudo apt install libtesseract-dev
 ### Mac
 
 ```bash
-# in progress
+brew install poppler
+brew install tesseract
 ```
 
 ### Windows
 
+To install poppler see the instructions here [https://stackoverflow.com/a/53960829](https://stackoverflow.com/a/53960829).
+
+Then install tesseract with:
+
 ```bash
-# in progress
+scoop install tesseract
 ```
 
-Unfortunately this will not always install all of the dependencies because some of them are non-python related. I find that installing pytesseract can be a little bit tricky for beginners. Please refer to [https://github.com/madmaze/pytesseract#installation](https://github.com/madmaze/pytesseract#installation) for details on how to install *pytesseract* if the above does not work.
+Or you can follow the instructions here to install tesseract using the windows installer: [https://github.com/UB-Mannheim/tesseract/wiki](https://github.com/UB-Mannheim/tesseract/wiki).
 
 **Usage**
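
Note (not part of the patch): because the system tools above are installed outside of pip, it can help to confirm they are reachable before running the pytesseract parser. The following is a minimal sketch, not something *spacypdfreader* provides. The helper name `find_missing_tools` is hypothetical, and it assumes the executables are named `tesseract` (from the tesseract packages) and `pdftoppm` (shipped with poppler) and that they are expected on `PATH`.

```python
import shutil

# Assumed executable names: `tesseract` comes from the tesseract packages,
# `pdftoppm` is one of the command-line tools shipped with poppler.
REQUIRED_TOOLS = {"tesseract": "tesseract", "poppler": "pdftoppm"}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools whose executable cannot be found on PATH.

    `which` is injectable so the check can be exercised without the
    real tools being installed.
    """
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing system tools: " + ", ".join(missing))
    else:
        print("tesseract and poppler were found on PATH")
```

If a tool is reported missing right after installation, opening a new terminal (so that `PATH` is reloaded) is often enough, particularly on Windows.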