-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added the stopword removal transformation #268
base: main
Are you sure you want to change the base?
Conversation
@@ -0,0 +1,14 @@ | |||
# Stopword Removal | |||
Removes stopwords from a piece of text. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @juanyiloke please add your name, email and affiliation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done! @kaustubhdhole
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
"sentence": "OMG!!! jUSTin is AmAZEballs!!!" | ||
}, | ||
"outputs": [{ | ||
"sentence": "OMG!!! jUSTin is AmAZEballs!!!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, why isn't the stopword "is" removed in this particular example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed!
|
||
|
||
def stopword_remove(text, max_outputs=1): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the max_outputs
argument isn't used anywhere in the stopword_remove
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
super().__init__(seed, max_outputs=max_outputs) | ||
|
||
def generate(self, raw_text: str): | ||
pertubed_text = stopword_remove( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor cosmetic change: pertubed_text -> perturbed_text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed!
TaskType.TEXT_TO_TEXT_GENERATION, | ||
] | ||
languages = ["en"] | ||
heavy = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can mark this transformation as light i.e heavy = False, as we are only using nltk package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
TaskType.TEXT_TO_TEXT_GENERATION, | ||
] | ||
languages = ["en"] | ||
heavy = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to add the keywords
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seconded
Author: Juan Yi Loke | ||
Email: [email protected] | ||
Affliation: University of Toronto | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you would need to add the Robustness Evaluation as per the instructions in the email :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the transformation is well-implemented, however, I think it needs for being well-motivated description in order to show the usefulness of this transformation for the project.
Okay, this transformation looks great. I do have a suggestion though: first, if you remove all stopwords, that can be dangerous: example the shakespeare sentence that you added in the README gives a clear example. It might be better you provide a parameter to control the amount of change that should be permitted in a sentence, eg. how many stopwords can be removed at a single time. This way you might be able to generate multiple sentences too with little loss in meaning. Besides, please add appropriate keywords and a robustness evaluation to make this PR stronger. Also, you might want to mention any work relating to the influence of stopword removal. (Might also give better insights). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add your transformation name in test/mapper.py so that the test job can run your test cases.
Thanks so much for the reviews everyone, I'll get to them this weekend. It's midterms season for me but that should be over soon. |
No description provided.