
Port metrics to select number of LDA topics to Python from R #1275

Open
tmylk opened this issue Apr 11, 2017 · 6 comments
Labels
difficulty medium Medium issue: required good gensim understanding & python skills wishlist Feature request

Comments

@tmylk
Contributor

tmylk commented Apr 11, 2017

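The R package ldatuning implements four metrics for selecting the best number of topics (Griffiths2004, CaoJuan2009, Arun2010 and Deveaud2014).
The metrics are fairly easy to implement.
A Jupyter notebook (.ipynb) that implements them and plots each metric against the number of topics would be great.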
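As a rough illustration of how small these metrics are, here is a minimal NumPy sketch of the two that only need the topic-word matrix: CaoJuan2009 (average pairwise cosine similarity between topics, lower is better) and Deveaud2014 (average pairwise divergence between topics, higher is better). It assumes a `(num_topics, vocab_size)` array whose rows are topic-word distributions, such as the array returned by gensim's `LdaModel.get_topics()`. The function names are hypothetical, and the Deveaud sketch uses the standard Jensen-Shannon divergence, so its absolute values may not match ldatuning's output.

```python
import itertools

import numpy as np


def cao_juan_2009(topic_word) -> float:
    """Mean pairwise cosine similarity between topics (lower is better).

    Hypothetical helper; topic_word has shape (num_topics, vocab_size).
    """
    topic_word = np.asarray(topic_word, dtype=float)
    k = topic_word.shape[0]
    if k < 2:
        raise ValueError("need at least two topics")
    # L2-normalise rows so that a dot product equals cosine similarity
    unit = topic_word / np.linalg.norm(topic_word, axis=1, keepdims=True)
    sims = [float(unit[i] @ unit[j]) for i, j in itertools.combinations(range(k), 2)]
    # average over the k * (k - 1) / 2 topic pairs
    return sum(sims) / len(sims)


def deveaud_2014(topic_word, eps: float = 1e-12) -> float:
    """Mean pairwise Jensen-Shannon divergence between topics (higher is better).

    Hypothetical helper; uses the standard JSD, which may differ from ldatuning.
    """
    topic_word = np.asarray(topic_word, dtype=float)
    k = topic_word.shape[0]
    if k < 2:
        raise ValueError("need at least two topics")
    # smooth and renormalise so log() never sees an exact zero
    p = topic_word + eps
    p = p / p.sum(axis=1, keepdims=True)
    divs = []
    for i, j in itertools.combinations(range(k), 2):
        m = 0.5 * (p[i] + p[j])  # midpoint distribution
        jsd = 0.5 * np.sum(p[i] * np.log(p[i] / m)) + 0.5 * np.sum(p[j] * np.log(p[j] / m))
        divs.append(float(jsd))
    return sum(divs) / len(divs)
```

In a notebook one would compute these for models trained with different `num_topics` values and plot the scores against the number of topics.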

See also the MDL for LSI ticket, #28.

@tmylk tmylk added difficulty medium Medium issue: required good gensim understanding & python skills wishlist Feature request labels Apr 11, 2017
@souravsingh
Contributor

Do we make use of the rpy2 bridge or write the code in Python from scratch?

@tmylk
Contributor Author

tmylk commented Apr 13, 2017

It is easier to write it in Python from scratch.

@souravsingh
Contributor

souravsingh commented Aug 3, 2017

@menshikh-iv There is an implementation of the Arun metric here: https://github.com/AdrienGuille/TOM/blob/master/tom_lib/nlp/topic_model.py#L63

Would it be fine to use that implementation in Gensim?
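For reference, Arun2010 compares two "spectra" of a fitted model with a symmetric Kullback-Leibler divergence: the singular values of the topic-word matrix, and the document-topic matrix weighted by document length. Below is a minimal NumPy sketch (lower is better). The function name and argument layout are assumptions, and the normalisation step is one common choice that other implementations may handle differently.

```python
import numpy as np


def arun_2010(topic_word, doc_topic, doc_lengths, eps: float = 1e-12) -> float:
    """Symmetric KL divergence between two topic 'spectra' (lower is better).

    Hypothetical helper.
    topic_word:  (K, V) topic-word probabilities, rows summing to 1
    doc_topic:   (D, K) document-topic proportions, rows summing to 1
    doc_lengths: (D,)   number of tokens in each document
    """
    topic_word = np.asarray(topic_word, dtype=float)
    doc_topic = np.asarray(doc_topic, dtype=float)
    doc_lengths = np.asarray(doc_lengths, dtype=float)
    if topic_word.shape[0] > topic_word.shape[1]:
        raise ValueError("expects no more topics than vocabulary terms")
    # CM1: singular values of the topic-word matrix (K of them when K <= V)
    cm1 = np.linalg.svd(topic_word, compute_uv=False)
    # CM2: total topic mass, weighting each document by its length
    cm2 = doc_lengths @ doc_topic
    # turn both into smoothed probability vectors (normalisation is an assumption)
    cm1 = (cm1 + eps) / np.sum(cm1 + eps)
    cm2 = (cm2 + eps) / np.sum(cm2 + eps)
    # KL(cm1 || cm2) + KL(cm2 || cm1)
    return float(np.sum(cm1 * np.log(cm1 / cm2)) + np.sum(cm2 * np.log(cm2 / cm1)))
```

With gensim, the document-topic rows could be filled in from `get_document_topics` for each bag-of-words document, and the document lengths from the token counts in each bag-of-words.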

@souravsingh
Contributor

souravsingh commented Nov 11, 2017

The repo https://github.com/WZBSocialScienceCenter/tmtoolkit implements the Griffiths, Cao Juan and Arun metrics, but currently only the Cao Juan metric can be used with Gensim.

We can mention this somewhere in the docs and close the issue.
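Griffiths2004 is defined for Gibbs sampling: it takes the log-likelihoods log p(w | z) recorded over the sampler's iterations (after burn-in) and estimates log p(w | K) with their harmonic mean. Gensim's `LdaModel` is trained with online variational Bayes rather than Gibbs sampling, so this metric does not carry over directly. Assuming such a list of log-likelihoods is available from some sampler, the estimate itself is short; the sketch below (hypothetical function name, higher is better) uses a log-sum-exp so that very negative log-likelihoods do not underflow.

```python
import math
from typing import Sequence


def griffiths_2004(log_likelihoods: Sequence[float]) -> float:
    """Harmonic-mean estimate of log p(w | K) from Gibbs-sample log-likelihoods.

    Hypothetical helper; higher is better when comparing values of K.
    """
    if len(log_likelihoods) == 0:
        raise ValueError("need at least one log-likelihood sample")
    neg = [-ll for ll in log_likelihoods]
    shift = max(neg)
    # log(sum(exp(-ll))) via the log-sum-exp trick to avoid overflow
    log_sum = shift + math.log(math.fsum(math.exp(x - shift) for x in neg))
    # harmonic mean S / sum(1 / p_s)  ->  log S - log sum(exp(-ll_s))
    return math.log(len(neg)) - log_sum
```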

@menshikh-iv
Contributor

@souravsingh TM evaluation typically isn't trivial; for this reason, we want to see it as part of gensim (either in the library itself or in a notebook, depending on how difficult the metric is to calculate).

@moyid
Copy link

moyid commented Nov 7, 2020

Hi, has this been done yet?

Development

No branches or pull requests

4 participants