Online LDA with infinite vocabulary #213
Comments
There's also some Python code here: https://github.com/kzhai/InfVocLDA
Nice. I am actually working on some research that definitely needs this, and was just about to do a literature search on the topic (this afternoon, too!). I would definitely be interested in working on a branch for bringing this to Gensim.
How can this be done on a single machine?
I run out of memory with the existing LDA in gensim when I have a huge vocabulary. Would this resolve that issue?
@gauravkoradiya no, not related. Please stop hijacking unrelated issues. If you have a question, articulate it properly and use the mailing list.
Potentially quite a biggie, this one, and I'm fully expecting a "patches welcome" response, but: when doing true online learning over document streams, it's quite nice not to have to fix the vocabulary upfront. It's also nice, if you want to model the long tail of the vocabulary, to have a model whose update steps aren't linear in the vocabulary size.
There's a recent paper, "Online Latent Dirichlet Allocation with Infinite Vocabulary", which extends the online variational inference approach from gensim's LdaModel to work in this setting, and could be a good starting point.
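To make the two wishes above concrete, here is a minimal sketch in plain Python. It is not the algorithm from the paper and not gensim's API: `GrowingVocabulary` and `update_topic_counts` are hypothetical names invented for illustration. The first assigns ids to tokens as they first appear in the stream, so nothing is fixed upfront; the second keeps per-topic statistics in sparse dicts, so one update touches only the words in the current document rather than the whole vocabulary.

```python
from collections import defaultdict


class GrowingVocabulary:
    """Hypothetical helper: assigns an integer id to each token on first sight."""

    def __init__(self):
        self.token2id = {}

    def doc2bow(self, tokens):
        """Return sorted (token_id, count) pairs, growing the vocabulary as needed."""
        counts = defaultdict(int)
        for tok in tokens:
            # A new token gets the next free id; a known token keeps its id.
            idx = self.token2id.setdefault(tok, len(self.token2id))
            counts[idx] += 1
        return sorted(counts.items())

    def __len__(self):
        return len(self.token2id)


def update_topic_counts(topic_counts, bow, topic_weights):
    """Add one document's weighted counts to sparse per-topic statistics.

    topic_counts is a list with one dict per topic (word id -> weight).
    The cost is proportional to len(bow) * number of topics, independent
    of how large the vocabulary has grown. The weights are assumed to be
    given; a real model would infer them per document.
    """
    for word_id, count in bow:
        for k, weight in enumerate(topic_weights):
            topic_counts[k][word_id] = topic_counts[k].get(word_id, 0.0) + weight * count


vocab = GrowingVocabulary()
topic_counts = [{}, {}]  # two topics, both empty to start with

for doc in (["lda", "topic", "lda"], ["stream", "lda"]):
    bow = vocab.doc2bow(doc)
    update_topic_counts(topic_counts, bow, topic_weights=[0.5, 0.5])
```

A real implementation would also need a prior over unseen words and some way to prune or re-rank the vocabulary, which this sketch leaves out entirely.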