TEXT Cleaning #49

Open
mgg-new opened this issue Jan 12, 2023 · 5 comments
Comments

@mgg-new

mgg-new commented Jan 12, 2023

module 'emoji' has no attribute 'get_emoji_regexp'

@andrewtavis
Owner

Hey @mgg-new! Thanks for letting me know :) This is likely something to do with a new version of emoji. kwx is set up with version 1.2.0, and they're now on 2.2.0. Would you have interest in helping with this? It should actually be an easy fix where we just figure out what the new name for get_emoji_regexp is and do the update :)

Thanks again!

@mgg-new
Author

mgg-new commented Jan 13, 2023

https://carpedm20.github.io/emoji/docs/
The function get_emoji_regexp() was removed in 2.0.0. Internally the module no longer uses a regular expression when scanning for emoji in a string (e.g. in demojize()).
The regular expression was slow in Python 3 and it failed to correctly find certain combinations of long emoji (emoji consisting of multiple Unicode codepoints).
If you used the regular expression to remove emoji from strings, you can use replace_emoji() as shown in the examples above.
If you want to extract emoji from strings, you can use emoji_list() as a replacement.
If you want to keep using a regular expression despite its problems, you can create the expression yourself.
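
For anyone who still wants a regex, here is a minimal sketch of the "create the expression yourself" idea from the docs quoted above. It is not the kwx fix. The names `build_emoji_regexp`, `strip_emoji`, and `SAMPLE_EMOJI` are made up for illustration, and the tiny sample vocabulary is an assumption so the snippet runs without the emoji package installed. With emoji >= 2.0, passing `emoji.EMOJI_DATA` (a dict keyed by emoji strings, as far as I can tell) should cover the full set.

```python
import re
from typing import Iterable


def build_emoji_regexp(emoji_strings: Iterable[str]) -> "re.Pattern[str]":
    """Compile one alternation pattern matching any of the given emoji."""
    # Longest first, so multi-codepoint emoji are tried before the
    # shorter emoji they start with.
    ordered = sorted(set(emoji_strings), key=len, reverse=True)
    return re.compile("(" + "|".join(re.escape(e) for e in ordered) + ")")


# Hypothetical sample vocabulary, written as escapes to keep it unambiguous.
SAMPLE_EMOJI = [
    "\U0001F950",            # croissant
    "\U0001F642",            # slightly smiling face
    "\U0001F44D",            # thumbs up
    "\U0001F44D\U0001F3FD",  # thumbs up + medium skin tone (two codepoints)
]
SAMPLE_REGEXP = build_emoji_regexp(SAMPLE_EMOJI)


def strip_emoji(text: str, pattern: "re.Pattern[str]" = SAMPLE_REGEXP) -> str:
    """Remove every match of the pattern from the text."""
    return pattern.sub("", text)
```

If the regex is only there to strip emoji, the docs quoted above point to `emoji.replace_emoji(text, replace="")` instead, which avoids the regex altogether.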

@andrewtavis
Owner

Thanks, @mgg-new! Appreciate you taking the time to detail it all. I think I'll have a bit more bandwidth in about a week or so to look into all this :) I'll be in touch! 😊

andrewtavis added a commit that referenced this issue Jan 28, 2023
andrewtavis added a commit that referenced this issue Jan 28, 2023
@andrewtavis
Owner

@mgg-new, I updated emoji in the dependencies and changed the spot where get_emoji_regexp was used to the new method. Thanks for the research you put into this :) I just released v1.0.2 to account for this shift. There was an error in the tests for the PR, but the local ones passed, so as of now I'm not really going to worry about it.

@andrewtavis
Owner

Will leave this issue open in case problems come up later with that failed test or anything else related 🙂
