Port metrics to select number of LDA topics to Python from R #1275
Comments
Do we make use of the rpy2 bridge or write the code in Python from scratch?
It is easier to write it in Python from scratch.
@menshikh-iv There is an implementation of the Arun metric here: https://github.com/AdrienGuille/TOM/blob/master/tom_lib/nlp/topic_model.py#L63 Would it be fine to use that implementation for Gensim?
The repo https://github.com/WZBSocialScienceCenter/tmtoolkit implements the Griffiths, Cao Juan and Arun metrics, but currently only the Cao Juan metric can be used with Gensim. We can mention this somewhere in the docs and close the issue.
@souravsingh Topic model evaluation typically isn't trivial; for this reason, we want to see it as part of gensim (in the library itself OR in a notebook, depending on how difficult the metric is to calculate).
Hi, has this been done yet?
The R package ldatuning implements four metrics for selecting the best number of topics.
The metrics are quite easy to implement.
An ipynb with graphs implementing those metrics would be great.
Also see MDL for LSI ticket #28
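To give a sense of how small such a port can be, here is a minimal sketch of one of the metrics, the Cao Juan (2009) measure, as I understand it: the average pairwise cosine similarity between topic-word distributions, where a lower value suggests better-separated topics. The function names `cosine_similarity` and `cao_juan_2009` are my own, not part of Gensim or ldatuning, and the input is assumed to be a topics x vocabulary matrix of word probabilities (for a Gensim model, something like the output of `LdaModel.get_topics()` should fit, though I have not checked this against the R results).

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(x, y):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def cao_juan_2009(topic_word):
    """Average pairwise cosine similarity between topics (lower is better).

    topic_word: sequence of rows, one per topic, each row holding that
    topic's word probabilities over the same vocabulary (assumed layout).
    """
    rows = [list(row) for row in topic_word]
    if len(rows) < 2:
        raise ValueError("need at least two topics to compare")
    # One similarity per unordered pair of topics: k * (k - 1) / 2 pairs.
    sims = [cosine_similarity(x, y) for x, y in combinations(rows, 2)]
    return sum(sims) / len(sims)
```

To choose the number of topics with this, one would train a model for each candidate value, compute the score on each model's topic-word matrix, and prefer the candidate with the lowest score.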