Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added the stopword removal transformation #268

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

juanyiloke
Copy link

No description provided.

@@ -0,0 +1,14 @@
# Stopword Removal
Removes stopwords from a piece of text.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @juanyiloke please add your name, email and affiliation.

Copy link
Author

@juanyiloke juanyiloke Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

"sentence": "OMG!!! jUSTin is AmAZEballs!!!"
},
"outputs": [{
"sentence": "OMG!!! jUSTin is AmAZEballs!!!"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why isn't the stopword "is" removed in this particular example?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed!



def stopword_remove(text, max_outputs=1):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the max_outputs argument isn't used anywhere in the stopword_remove function?

Copy link
Author

@juanyiloke juanyiloke Sep 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

super().__init__(seed, max_outputs=max_outputs)

def generate(self, raw_text: str):
pertubed_text = stopword_remove(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor cosmetic change: pertubed_text -> perturbed_text

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed!

TaskType.TEXT_TO_TEXT_GENERATION,
]
languages = ["en"]
heavy = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can mark this transformation as light i.e heavy = False, as we are only using nltk package.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@juanyiloke juanyiloke requested a review from aadesh11 September 18, 2021 20:00
TaskType.TEXT_TO_TEXT_GENERATION,
]
languages = ["en"]
heavy = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add the keywords

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seconded

Author: Juan Yi Loke
Email: [email protected]
Affliation: University of Toronto

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you would need to add the Robustness Evaluation as per the instructions in the email :)

Copy link
Contributor

@msobrevillac msobrevillac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the transformation is well-implemented, however, I think it needs for being well-motivated description in order to show the usefulness of this transformation for the project.

@kaustubhdhole
Copy link
Collaborator

Okay, this transformation looks great. I do have a suggestion though: first, if you remove all stopwords, that can be dangerous: example the shakespeare sentence that you added in the README gives a clear example. It might be better you provide a parameter to control the amount of change that should be permitted in a sentence, eg. how many stopwords can be removed at a single time. This way you might be able to generate multiple sentences too with little loss in meaning. Besides, please add appropriate keywords and a robustness evaluation to make this PR stronger. Also, you might want to mention any work relating to the influence of stopword removal. (Might also give better insights).

Copy link
Collaborator

@aadesh11 aadesh11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add your transformation name in test/mapper.py so that the test job can run your test cases.

@juanyiloke
Copy link
Author

Thanks so much for the reviews everyone, I'll get to them this weekend. It's midterms season for me but that should be over soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants