Skip to content

ability to use different ocrengine as enum, update of example.py to include ocrengine, clean up of README.md, include .vscode/ in .gitignore, update setup.py to require requests#13

Open
NegassaB wants to merge 28 commits into
ErikBoesen:masterfrom
NegassaB:master

Conversation

@NegassaB

@NegassaB NegassaB commented Jan 3, 2022

Copy link
Copy Markdown

ability to use different ocrengine as enum, update of example.py to include ocrengine, clean up of README.md, include .vscode/ in .gitignore, update setup.py to require requests

.gitignore
- update to include .vscode/ files

main.py
- import Enum and unique from enum module to declare for the new
OCREngine_VAL enum class
- declare OCREngine_VAL enum class and insert two values, engine_1 which
is equal to 1 and engine_2 which is equal to 2
- use the declared enum class in API.__init__, make engine_1 default
- check if the passed value for ocrengine is an instance of
OCREngine_VAL (to enforce usage of the enum class) and the value is no
different from 1 or 2.
- use the ocrengine variable inside payload

example.py
- update to use the OCREngine_VAL enum class when wanting to change the
ocrengine of the api

README.md
- clean up the README.md file with spaces and well defined programming
languages for the code blocks
- insert a description on how to change the ocrengine with example
main.py
- correct the typo when checking if it's an instance of OCREngine_VAL
main.py
- give better description when throwing exception

example.py
- include example using api_with_engine_two
main.py
- change the error raised for isinstance() failure from ValueError
TypeError
- change the error raise for wrong value from Exception() to
ValueError()
setup.py
- include the dependecy needed for the lib inside setup()
README.md
- update README to describe getting apikey from Free OCR API.

main.py
- refactor processing of files into it's own method named query_api().
- pass timeout parameter to request.post() inside query_api() to
specifiy for how to wait before raise a Timeout error instead of hanging
the process for undefined amount of time.
- call query_api() in all the ocr processors.
Comment thread ocrspace/main.py Outdated
Comment thread ocrspace/main.py Outdated
Comment thread ocrspace/main.py Outdated
Comment thread ocrspace/main.py Outdated
Comment thread ocrspace/main.py Outdated
Comment thread ocrspace/main.py Outdated
Comment thread ocrspace/main.py Outdated
Comment thread ocrspace/main.py Outdated
Comment thread ocrspace/main.py Outdated
Comment thread requirements.txt Outdated
main.py
- rename OCREngine_VAL to Engine
- capitalize the values inside class Engine
- rename parameter ocrengine inside __init__() to engine
- simplify raised TypeError's message
- rename parameter pic_file to image_file in query_api()

example.py
- update example.py in accordance with main.py
main.py
- rename OCREngine_VAL to Engine
- capitalize the values inside class Engine
- rename parameter ocrengine inside __init__() to engine
- simplify raised TypeError's message
- rename parameter pic_file to image_file in query_api()

example.py
- update example.py in accordance with main.py
requirements.txt
- remove unnecessarily listed dependencies
- remove specific versioning of requests

main.py
- remove unnecessary except blocks
updated in par with 2nd review
updated in par with 2nd review
requirements.txt
- remove unnecessarily listed dependencies
- remove specific versioning of requests

main.py
- remove unnecessary except blocks
requirements.txt
- remove unnecessarily listed dependencies
- remove specific versioning of requests

main.py
- remove unnecessary except blocks
@NegassaB NegassaB requested a review from ErikBoesen January 5, 2022 15:54
main.py
- remove key api_key from payload dict
- insert new self attribute named api_key
- add a headers parameter in requests.post() and place api_key in
headers

This was done in accordance with the documentation and examples provided by Free OCR API https://ocr.space/ocrapi#postman
Comment thread ocrspace/main.py Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread ocrspace/main.py Outdated
Comment thread ocrspace/main.py Outdated
Comment thread ocrspace/main.py Outdated


def ocr_file(self, fp):
def query_api(self, url_data=None, image_file=None):

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think you can just leave url_data. I shy away from using data in names, but here I think it's appropriate.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, hold on—is there anything preventing us from separating this into two parameters, one for base64 image data and one for a URL?

@NegassaB NegassaB Jan 5, 2022

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, it wouldn't make sense to separate them cause the query they conduct is handled by the same elif statement inside query_api()

elif url_data:
    r = requests.post(
        self.endpoint,
        data=url_data,
        timeout=30
    )

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but we could very easily recombine them; this would just provide syntactic sugar.

@NegassaB NegassaB Jan 6, 2022

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea but separating something just to recombine it later on...seems kinda clunky as well as inefficient

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I it's well worth it to have the meaning of the parameter be clear. url_data is pretty ambiguous and could mean a lot of things. One or statement to allow the meaning of both these parameters to be self-documenting is a worthy compromise.

Comment thread ocrspace/main.py
@NegassaB NegassaB requested a review from ErikBoesen January 5, 2022 16:45
main.py
- remove Engine: from docstring for Engine
- rename ocreengine to engine in docstring for API
- remove import & from ValueError message
- update query_api() to if/elif/else statement when querying OCR api

README.md
- use clear & concise wording in the Use section
- rename OCREngine_VAL to Engine
- rename OCRENGINE_VAL.val_1 & OCRENGINE_VAL.val_2 to Engine.ENGINE_2 &
ENGINE.ENGINE_1
* It might solve the TimeoutError

main.py
- remove key api_key from payload dict
- insert new self attribute named api_key
- add a headers parameter in requests.post() and place api_key in
    headers

N.B. This was done in accordance with the documentation and examples provided by Free OCR API https://ocr.space/ocrapi#postman

@ErikBoesen ErikBoesen left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost ready to go!

Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread ocrspace/main.py Outdated


def ocr_file(self, fp):
def query_api(self, url_data=None, image_file=None):

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but we could very easily recombine them; this would just provide syntactic sugar.

Comment thread README.md Outdated
Comment thread example.py
@NegassaB NegassaB requested a review from ErikBoesen January 6, 2022 16:28
README.md
- use single quotes instead of triple codes for describing the apikey in the Use section
- remove description and import of Engine, use instead full package name
- update example code to use full package name for Engine

@ErikBoesen ErikBoesen left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for your help! Just a few more things to fix.

Comment thread ocrspace/main.py Outdated


def ocr_file(self, fp):
def query_api(self, url_data=None, image_file=None):

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I it's well worth it to have the meaning of the parameter be clear. url_data is pretty ambiguous and could mean a lot of things. One or statement to allow the meaning of both these parameters to be self-documenting is a worthy compromise.

Comment thread LICENSE Outdated
Comment thread README.md Outdated
Comment thread ocrspace/main.py Outdated
Comment thread ocrspace/main.py Outdated
Comment thread ocrspace/main.py Outdated
Comment thread ocrspace/main.py Outdated
Comment thread setup.py Outdated
Comment thread LICENSE Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread ocrspace/main.py
:param fp: A path or pointer to your file
:return: Result in JSON format
"""
with (open(fp, 'rb') if type(fp) == str else fp) as f:

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of dense, can we reorganize to be a bit clearer?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you clarify on this. Do you want to reorganize the code or the documentation? and if so how do you want to reorganize the code or documentation? Dense doesn't really communicate what you want to do

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code. As in, split the ternary if/else into a full if/else statement across multiple lines so that we don't have a ton of logic in a single line making it difficult to read and understand at a glance

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you clarify on this. Do you want to reorganize the code or the documentation? and if so how do you want to reorganize the code or documentation? Dense doesn't really communicate what you want to do

The code could be made more readable (or more verbose).

Comment thread setup.py Outdated
Comment thread ocrspace/main.py Outdated
LICENSE
- remove h1 tag from header

README.md
- remove duplicates of Engine
- remove spaces from the authors' names & urls

setup.py
- bump version to 2.4.0
- restore author's email

main.py
- remove unique and paranthesis from enum import
- replace Enum import by IntEnum
- replace double quotes with single quotes
- rename url_data to image_url with a better description in the docs
@NegassaB NegassaB requested a review from ErikBoesen May 21, 2022 21:53

@ErikBoesen ErikBoesen left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, wrote these comments up a long time ago and then never submitted the review. Looks like I was only looking for some small changes so should be straightforward

Comment thread LICENSE
@@ -1,7 +1,8 @@
MIT License
MIT License

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for leading space either

Comment thread ocrspace/main.py
"""
if not isinstance(engine, Engine):
raise TypeError('engine must be an instance of Engine')
if engine.value != 1 and engine.value != 2:

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this check is needed; it's impossible to make an IntEnum class with an invalid value as far as I know

Comment thread ocrspace/main.py
:param fp: A path or pointer to your file
:return: Result in JSON format
"""
with (open(fp, 'rb') if type(fp) == str else fp) as f:

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code. As in, split the ternary if/else into a full if/else statement across multiple lines so that we don't have a ton of logic in a single line making it difficult to read and understand at a glance

Comment thread ocrspace/main.py
Process image from a local path.
:param fp: A path or pointer to your file
Process the provided parameter.
:param image_url: An Image url or base64image encoded string

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may have forgotten to use this parameter in the function

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants