Conversation
Hi @pithysr, I will leave some comments in the code. Correctness: all checks have failed, please fix this (maybe fetching upstream is enough).
class FontChange(SentenceOperation):
    tasks = [TaskType.TEXT_CLASSIFICATION]
    languages = ["en"]
Why just English? I believe that this transformation works well for many other languages
It can work with other languages in its current form, but the dictionary I used (referenced in the code and in the README file) does not cover accented letters or characters outside the English alphabet. If such a resource is added, we can use this for other languages without skipping those characters.
for ttc in tokens_to_change:
    while True:
        font = random.sample(list(fonts.keys()), 1)[0]
        if font != "normal":
Why not just remove "normal" from the dictionary in the JSON file?
It was by choice. I did not want any discrepancy between my dictionary and the original resource. Of course, it can be removed.
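A middle ground between the two options above is to leave the JSON resource untouched and filter out "normal" at selection time, which also avoids the retry loop. The snippet below is a minimal sketch under stated assumptions: the `fonts` dictionary is a tiny hypothetical stand-in for the one loaded from the JSON file, and `pick_font` is an illustrative helper name, not something from the PR.

```python
import random

# Hypothetical stand-in for the fonts dictionary loaded from the JSON file;
# the real resource maps each font name to a full character table.
fonts = {
    "normal": {"a": "a"},
    "bold": {"a": "\U0001d41a"},
    "italic": {"a": "\U0001d44e"},
}


def pick_font(fonts, rng=random):
    """Return a random font name other than "normal" (illustrative helper)."""
    # Filter once instead of sampling repeatedly until a non-normal font appears.
    candidates = [name for name in fonts if name != "normal"]
    return rng.choice(candidates)


font = pick_font(fonts, random.Random(0))
```

This keeps the dictionary identical to the original resource while guaranteeing that a single draw never returns the unchanged font.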
Hello @marco-digio
from tasks.TaskTypes import TaskType

nltkdl("stopwords")
nltkdl("punkt")
This could be moved inside the constructor, I think.
nltkdl("maxent_ne_chunker")
nltkdl("punkt")
nltkdl("averaged_perceptron_tagger")
nltkdl("stopwords")
This could also go inside the constructor.
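For illustration, moving the downloads into the constructor might look roughly like the sketch below, so that importing the module has no side effects. The class name `ExampleTransformation`, the `REQUIRED_NLTK_PACKAGES` tuple, and the `downloader` parameter are assumptions made for this example; the sketch also assumes `nltkdl` in the diff is an alias for `nltk.download`.

```python
# Package names taken from the diff; the tuple itself is a hypothetical helper.
REQUIRED_NLTK_PACKAGES = ("stopwords", "punkt")


class ExampleTransformation:
    """Hypothetical stand-in for one of the PR's SentenceOperation subclasses."""

    def __init__(self, seed=666, max_outputs=1, downloader=None):
        self.seed = seed
        self.max_outputs = max_outputs
        if downloader is None:
            # Imported lazily so that merely importing this module
            # neither requires NLTK nor triggers any download.
            from nltk import download as downloader
        for package in REQUIRED_NLTK_PACKAGES:
            # quiet=True suppresses NLTK's progress output.
            downloader(package, quiet=True)
```

The optional `downloader` argument is only there so the behaviour can be exercised without network access; in the real class the constructor would presumably call the downloader directly.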
    def __init__(self, seed=666, max_outputs=1):
        super().__init__(seed, max_outputs=max_outputs)
        self.nlp = spacy.load("en_core_web_sm")
Instead of loading a new spaCy model, please use the global one, like below:
self.nlp = spacy_nlp if spacy_nlp else spacy.load("en_core_web_sm")
        return perturbed_texts


class HashtagifyTransformation(SentenceOperation):
Please add a docstring for all the classes and functions with a proper description of params and return type.
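As a rough illustration of what is being asked for, a documented class could look like the sketch below. The class `HashtagifyExample` is a hypothetical stand-in, and the `generate(sentence)` method returning a list of strings is an assumption about the interface, inferred only from the `return perturbed_texts` line in the diff; the body is a placeholder, not the PR's logic.

```python
from typing import List


class HashtagifyExample:
    """Hypothetical stand-in showing the requested docstring layout.

    Args:
        seed: Seed for the random number generator.
        max_outputs: Maximum number of perturbed sentences to return.
    """

    def __init__(self, seed: int = 666, max_outputs: int = 1) -> None:
        self.seed = seed
        self.max_outputs = max_outputs

    def generate(self, sentence: str) -> List[str]:
        """Return perturbed versions of the input sentence.

        Args:
            sentence: The input text to transform.

        Returns:
            A list of at most ``max_outputs`` perturbed strings.
        """
        # Placeholder body: the real transformation logic is not reproduced here.
        return [sentence][: self.max_outputs]
```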
        return perturbed_texts


class FontChange(SentenceOperation):
Same here, add docstrings for classes and functions.
        TaskType.TEXT_CLASSIFICATION,
        TaskType.TEXT_TAGGING,
    ]
    languages = [
As the list is very long, it would be better to create a separate file for it and import it here.
The conflicts are with my other submission that is already approved. Should I open a new PR?
Hi @pithysr: You can continue fixing the conflict in this same PR. Thanks!
## Target Tasks

This transformation can be used for data augmentation in text classification tasks.
You should add a Data And Code Provenance section to point out the correct source of all the files.
@kaustubhdhole Hi, I had this information in the README file but I added a separate section. However, now I cannot commit due to an error (No module named 'torchtext'). I did not have this error previously when submitting the code. Can I submit with git commit --no-verify?
Inspired by social media posts, this transformation adds noise to an input sentence by randomly changing the font of words in the sentence.
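The idea can be sketched as follows. This is not the PR's implementation: the `BOLD` table is built here from the Unicode "Mathematical Bold" block purely for illustration (the PR loads its tables from a JSON resource), and `change_font`, `perturb`, and the `prob` parameter are hypothetical names. Characters missing from the table, such as accented letters, digits, and punctuation, pass through unchanged, which mirrors the skipping behaviour discussed in the review.

```python
import random
import string

# Illustrative table only: ASCII letters mapped to Unicode "Mathematical Bold".
BOLD = {
    **{c: chr(0x1D400 + i) for i, c in enumerate(string.ascii_uppercase)},
    **{c: chr(0x1D41A + i) for i, c in enumerate(string.ascii_lowercase)},
}


def change_font(word, table):
    """Map each character through the table; uncovered characters are kept."""
    return "".join(table.get(ch, ch) for ch in word)


def perturb(sentence, table, prob=0.5, seed=0):
    """Change the font of each whitespace-separated word with probability prob."""
    rng = random.Random(seed)
    return " ".join(
        change_font(word, table) if rng.random() < prob else word
        for word in sentence.split()
    )


print(perturb("Fonts add noise to text", BOLD, prob=0.5, seed=0))
```

Splitting on whitespace is a simplification; a real tokenizer would handle punctuation attached to words more carefully.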