Standardize spacy #347

Draft
AbinayaM02 wants to merge 15 commits into GEM-benchmark:main from
AbinayaM02:standardize_spacy
Conversation

@AbinayaM02
Collaborator

Fixes #339.



def initialize_models():
def initialize_models(model: str = "spacy", lang: str = "en"):
Collaborator

If we are making this generic, it would be good to create an enum for all the heavy models we want to load, since the list may grow in the future.

Something like:

LoadOnceModel.SPACY,
LoadOnceModel.GLOVE
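
A minimal sketch of what such an enum could look like. The member names `SPACY` and `GLOVE` come from the comment above; the string values and the `parse_model` helper are assumptions added for illustration and are not part of this PR.

```python
from enum import Enum


class LoadOnceModel(Enum):
    """Heavy models that should be loaded only once."""

    # The string values are an assumption: they mirror the model names
    # passed to initialize_models() in this PR.
    SPACY = "spacy"
    GLOVE = "glove"


def parse_model(name: str) -> LoadOnceModel:
    """Hypothetical helper: turn a plain string into an enum member.

    Raises ValueError if the name does not match any member.
    """
    return LoadOnceModel(name.lower())
```

With this in place, callers could pass `LoadOnceModel.SPACY` instead of the bare string `"spacy"`, and a typo in the model name would fail immediately rather than fall through silently.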

elif model == "glove":
# load glove
glove = vocab.GloVe(name="6B", dim="100")

Collaborator

@aadesh11 aadesh11 Oct 20, 2021

We should also have an 'else' block that raises an exception with an "unsupported model" message when the name doesn't match any known model.
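
One way the suggested 'else' branch might look, as a sketch only. The function name `select_loader`, the `SUPPORTED_MODELS` tuple, and the returned labels are hypothetical placeholders; the real branches would load the models as in the diff.

```python
# Assumption: these are the only two model names handled in this PR.
SUPPORTED_MODELS = ("spacy", "glove")


def select_loader(model: str) -> str:
    """Hypothetical dispatch skeleton showing the final 'else' branch."""
    if model == "spacy":
        return "spacy-loader"  # placeholder for the real spaCy loading code
    elif model == "glove":
        return "glove-loader"  # placeholder for the real GloVe loading code
    else:
        # Fail loudly instead of silently loading nothing.
        raise ValueError(
            f"Unsupported model '{model}'. "
            f"Supported models: {', '.join(SUPPORTED_MODELS)}"
        )
```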

# load glove
glove = vocab.GloVe(name = "6B", dim = "100")
if model == "spacy":
if lang == "en":
Collaborator

Cosmetic change (lines 36-45): it would be better to create a map from 'lang' to the spaCy model name, which would eliminate several lines of code.
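
A sketch of the suggested map, assuming the three languages that appear in this thread. The model names are taken from the diff and the review comments; `SPACY_MODEL_NAMES`, `spacy_model_name`, and `load_spacy` are hypothetical names.

```python
# Language code -> spaCy model name (names as they appear in this PR thread).
SPACY_MODEL_NAMES = {
    "en": "en_core_web_sm",
    "es": "es_core_news_sm",
    "zh": "zh_core_web_sm",
}


def spacy_model_name(lang: str) -> str:
    """Look up the spaCy model name for a language code."""
    try:
        return SPACY_MODEL_NAMES[lang]
    except KeyError:
        raise ValueError(f"No spaCy model configured for language '{lang}'") from None


def load_spacy(lang: str = "en"):
    """Load the spaCy pipeline for `lang` (the model must already be installed)."""
    import spacy  # imported lazily so the lookup above works without spaCy

    return spacy.load(spacy_model_name(lang))
```

Adding a language then becomes a one-line change to the dictionary rather than a new `elif` branch.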

spacy_nlp = spacy.load("en_core_web_sm")
elif lang == "es":
spacy_nlp = spacy.load("es_core_news_sm")
elif lang == "zh":
Collaborator

To make this more informative, can we log which model is being loaded, since there are several?
Something like:
"Loading zh_core_web_sm model of spacy......."
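
A possible shape for that log message, using the standard `logging` module. The helper name `log_model_loading` and its `library` parameter are assumptions made for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def log_model_loading(model_name: str, library: str = "spacy") -> str:
    """Hypothetical helper: log which model is about to be loaded.

    Returns the message so callers (and tests) can inspect it.
    """
    message = f"Loading {model_name} model of {library}..."
    logger.info(message)
    return message
```

Calling `log_model_loading("zh_core_web_sm")` just before the load emits the message at INFO level, so it shows up only when the application has configured logging at that level or lower.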

@AbinayaM02
Collaborator Author

Sure @aadesh11. Will address the comments. This PR might take a while to complete :)

@kaustubhdhole
Collaborator

@AbinayaM02 should we close this PR?

@AbinayaM02
Collaborator Author

> @AbinayaM02 should we close this PR?

Hi @kaustubhdhole: I wanted to standardize the loading of all spacy models in a single place, but I haven't found the time to finish it. Let's keep it as a draft, and I'll try to close it once the first release is done (hopefully soon)! :)

Development

Successfully merging this pull request may close these issues.

Standardize loading of different spacy models

3 participants