Font change by pithysr · Pull Request #262 · GEM-benchmark/NL-Augmenter

pithysr · 2021-09-01T01:26:45Z

Inspired by social media posts, this transformation adds noise to an input sentence by randomly changing the font of words in the sentence.

marco-digio · 2021-09-14T14:20:50Z

Hi @pithysr
I was assigned as one of your reviewers.
Thanks for this contribution!
I really like this idea and I believe it will be really helpful.

I will leave some comments in the code.

Correctness: All checks have failed; please fix this (fetching upstream may be enough).
Interface: I think you have chosen the correct interface(s).
Applicable Tasks & Keywords: I think you have chosen a relevant task, but why not also TaskType.TEXT_TO_TEXT_GENERATION and TaskType.TEXT_TAGGING? Please also insert keywords as explained here
Specificity: The change is very general.
Novelty: This transformation is not yet implemented in NL-Augmenter.
Adding New Libraries: It requires nltk==3.6.2, as listed in requirements.txt.
Description: The README is clear and explains what the transformation is aiming to do.
Data and code source: Missing. Please add references in the README.
Test Cases: 6 test cases have been added.
Evaluating Robustness: No robustness evaluation has been performed.
Languages other than English: I believe the transformation also works for other languages, even though the author specifies only English.

marco-digio · 2021-09-14T14:23:20Z

transformations/font_change/transformation.py

+
+class FontChange(SentenceOperation):
+    tasks = [TaskType.TEXT_CLASSIFICATION]
+    languages = ["en"]


Why just English? I believe that this transformation works well for many other languages

It can work with other languages in its current form but the dictionary I used (reference in the code and in the README file) does not cover the accented letters or characters outside the English alphabet. If such a resource is added, we can use this for other languages without skipping those characters.
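The skipping behaviour described in this reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: the `BOLD` table is built here from the Unicode Mathematical Bold block as a stand-in for the font dictionary, and `apply_font` is a hypothetical helper name.

```python
import string

# Stand-in font table (assumption): ASCII letters mapped to the Unicode
# Mathematical Bold block, which starts at U+1D400 (A) and U+1D41A (a).
BOLD = {c: chr(0x1D400 + i) for i, c in enumerate(string.ascii_uppercase)}
BOLD.update({c: chr(0x1D41A + i) for i, c in enumerate(string.ascii_lowercase)})


def apply_font(word, table):
    """Map each character through `table`, leaving unmapped ones unchanged."""
    return "".join(table.get(ch, ch) for ch in word)


# "caf\u00e9" is "café": c, a and f are restyled, the accented letter is kept.
styled = apply_font("caf\u00e9", BOLD)
```

Extending coverage to another language would then only require adding entries to the table; characters without an entry simply pass through.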

marco-digio · 2021-09-14T14:26:05Z

transformations/font_change/transformation.py

+        for ttc in tokens_to_change:
+            while True:
+                font = random.sample(list(fonts.keys()), 1)[0]
+                if font != "normal":


Why not just remove "normal" from the dictionary in the JSON file?

It was by choice. I did not want any discrepancy between my dictionary and the original resource. Of course, it can be removed.
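One possible middle ground, sketched here as an assumption rather than as what the PR does: keep "normal" in the JSON file so it still matches the original resource, and filter it out at sampling time instead of retrying in a `while True` loop. The `fonts` dictionary below is a hypothetical miniature, and `pick_font` is an illustrative name.

```python
import random

# Hypothetical miniature of the fonts dictionary; the real JSON resource
# has more entries, each with a full character map.
fonts = {
    "normal": {},
    "bold": {},
    "italic": {},
}


def pick_font(fonts, rng):
    """Return a random font name other than "normal"."""
    candidates = [name for name in fonts if name != "normal"]
    return rng.choice(candidates)


rng = random.Random(0)  # seeded generator for reproducible choices
font = pick_font(fonts, rng)
```

The dictionary itself is left untouched, so it stays identical to the upstream resource.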

pithysr · 2021-09-15T05:32:45Z

Hello @marco-digio
Thank you very much for your review.
I agree that we can add the other two tasks. Should I add them and commit again?

kaustubhdhole · 2021-09-20T16:52:56Z

transformations/font_change/transformation.py

+from tasks.TaskTypes import TaskType
+
+nltkdl("stopwords")
+nltkdl("punkt")


This could be moved inside the constructor I think..
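A rough sketch of what this suggestion could look like, assuming `nltkdl` in the diff is an alias for `nltk.download`. The class name and the `downloader` parameter are hypothetical, added only so the sketch can be exercised without network access.

```python
class FontChangeSketch:
    """Illustrative stand-in for the transformation class (not the PR's code)."""

    # Resource names taken from the diff above.
    NLTK_RESOURCES = ("stopwords", "punkt")

    def __init__(self, seed=0, max_outputs=1, downloader=None):
        self.seed = seed
        self.max_outputs = max_outputs
        if downloader is None:
            # Import lazily so that importing this module has no side effects.
            import nltk

            def downloader(name):
                return nltk.download(name, quiet=True)

        # Downloads now happen when an instance is created, not at import time.
        for name in self.NLTK_RESOURCES:
            downloader(name)
```

With this layout, merely importing the module no longer triggers any downloads; they run only when the transformation is instantiated.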

kaustubhdhole · 2021-09-20T16:54:19Z

transformations/hashtagify/transformation.py

+nltkdl("maxent_ne_chunker")
+nltkdl("punkt")
+nltkdl("averaged_perceptron_tagger")
+nltkdl("stopwords")


Even this could go inside the constructor..

aadesh11 · 2021-09-30T06:19:09Z

transformations/hashtagify/transformation.py

+
+    def __init__(self, seed=666, max_outputs=1):
+        super().__init__(seed, max_outputs=max_outputs)
+        self.nlp = spacy.load("en_core_web_sm")


Instead of loading a new spaCy model, please reuse the global one, as below:

self.nlp = spacy_nlp if spacy_nlp else spacy.load("en_core_web_sm")
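In context, the suggested line might sit in the constructor roughly as below. This is only a sketch: where `spacy_nlp` is defined and how it gets populated is not shown in this thread, so the module-level variable here is a placeholder, and the `loader` parameter is a hypothetical addition that lets the fallback path be tried without spaCy installed.

```python
# Placeholder for the shared handle (assumption): None until some shared
# setup code assigns an already-loaded pipeline to it.
spacy_nlp = None


class HashtagifySketch:
    """Illustrative stand-in for the transformation class (not the PR's code)."""

    def __init__(self, seed=666, max_outputs=1, loader=None):
        self.seed = seed
        self.max_outputs = max_outputs
        if loader is None:
            def loader():
                import spacy  # imported only when a fresh load is needed
                return spacy.load("en_core_web_sm")
        # Reuse the shared pipeline when present; otherwise load one.
        self.nlp = spacy_nlp if spacy_nlp else loader()
```

The point of the pattern is that several transformations can share one loaded model instead of each paying the load cost.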

aadesh11 · 2021-09-30T06:21:23Z

transformations/hashtagify/transformation.py

+    return perturbed_texts
+
+
+class HashtagifyTransformation(SentenceOperation):


Please add a docstring for all the classes and functions with a proper description of params and return type.

aadesh11 · 2021-09-30T06:26:24Z

transformations/font_change/transformation.py

+    return perturbed_texts
+
+
+class FontChange(SentenceOperation):


Same here, add docstrings for classes and functions.

aadesh11 · 2021-09-30T06:28:02Z

transformations/font_change/transformation.py

+        TaskType.TEXT_CLASSIFICATION,
+        TaskType.TEXT_TAGGING,
+    ]
+    languages = [


As the list is very long, better to create a separate file for this and import it here.

pithysr · 2021-10-04T16:12:13Z

The conflicts are with my other submission that is already approved. Should I open a new PR?

AbinayaM02 · 2021-10-07T12:38:50Z

The conflicts are with my other submission that is already approved. Should I open a new PR?

Hi @pithysr: You can continue fixing the conflict in this same PR. Thanks!

kaustubhdhole · 2021-10-28T20:47:23Z

transformations/font_change/README.md

+## Target Tasks
+
+This transformation can be used for data augmentation in text classification tasks.
+


You should add a Data And Code Provenance section to point out the correct source of all the files.

@kaustubhdhole Hi, I had this information in the README file but I added a separate section. However, now I cannot commit due to an error (No module named 'torchtext'). I did not have this error previously when submitting the code. Can I submit with git commit --no-verify?

pithysr added 5 commits August 31, 2021 20:51

font change transformation added post-opt

8758186

readme and class name changed

5f951e0

further changes in README

0c3cca6

further changes in README

cd61e02

Even further changes in README

599284e

kaustubhdhole added the transformation label Sep 3, 2021

marco-digio reviewed Sep 14, 2021

Merge branch 'GEM-benchmark:main' into font_change

0f29142

pithysr added 2 commits September 18, 2021 05:06

changes after the reviews

2b2f60c

Merge branch 'font_change' of https://github.com/pithysr/NL-Augmenter …

b7cbe62

…into font_change

kaustubhdhole reviewed Sep 20, 2021

constructor updated

a51022b

aadesh11 reviewed Sep 30, 2021

pithysr and others added 2 commits October 1, 2021 02:15

docstring added, list of languages added

92b3f78

Merge branch 'GEM-benchmark:main' into font_change

e74c46f

Merge branch 'main' into font_change

74c5711

kaustubhdhole reviewed Oct 28, 2021

data and code section added to README

9c82836

5 participants