LDA model: incorrect treatment of damping factor when update() has multiple passes #298
Comments
Let me see if I follow you correctly: passes effectively repeats the corpus given to update(). This seems desirable to me; each pass through the corpus is damped more and more. However, you are right that num_updates should not keep growing with every pass. What would be a better way to approach this? Ensure that after a corpus has been through all of its passes, the next update is not damped more than it should be?
Chris,

As for a better way, I think that rho(t) should be left for its intended purposes and another decaying variable rho(p), where p is for passes, should be introduced. Then for the next "corpus" rho(t) will take the value prescribed by Algo 2, while rho(p) is reset to its initial value. This seems like a nice compromise and an improvement. Do you agree?
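A minimal sketch of that idea (the names rho_t and rho_p, the decay constants, and the way the two factors are combined are assumptions here, not gensim code): rho_t keeps decaying across update() calls as Algorithm 2 prescribes, while rho_p restarts at 1.0 for every new corpus.

```python
def decay(step, offset=1.0, kappa=0.5):
    # Hoffman-style learning rate: (offset + step) ** -kappa
    return (offset + step) ** -kappa

num_updates = 0                                 # advances with the data stream (Algorithm 2)
for corpus_no in range(3):                      # three separate update() calls
    for pass_ in range(5):                      # repeated passes over the same corpus
        rho_t = decay(num_updates)              # damps new chunks over time, as intended
        rho_p = decay(pass_)                    # damps repeated passes, reset per corpus
        print(corpus_no, pass_, rho_t * rho_p)  # one possible way to combine the two
    num_updates += 1                            # rho_t keeps decaying across corpora
```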
Ping @wbogorad, @piskvorky. Let me know what you think about this implementation. I haven't tested it on Wikipedia (yet), but the rho(t) resets as expected when passes are involved. Below are the rho values at the beginning of the passes loop, before the update.

3 updates with a single pass (the expected values):
PASS 1: 1.0

3 updates with five passes (the test values):
update 1:
PASS 1: 1.0
update 2:
PASS 1: 0.316227766017
update 3:
PASS 1: 0.229415733871

Hence, for each update, the rho value resets, while increasing passes dampens the value for each pass.
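A standalone sketch of that schedule (an illustration, not the actual gensim method; it assumes the default offset=1.0 and decay=0.5, and that each test update covered 9 chunks, which happens to reproduce the pass-1 values above):

```python
def rho(pass_, num_updates, offset=1.0, decay=0.5):
    # the pass count is *added* to the update count, so passes only damp the current call
    return (offset + pass_ + num_updates) ** -decay

chunks_per_update = 9                  # assumed size of the test corpus, in chunks
num_updates = 0
for update_no in range(1, 4):          # three calls to update()
    for pass_ in range(1, 6):          # five passes per call
        print("update %d, PASS %d: %f" % (update_no, pass_, rho(pass_ - 1, num_updates)))
    num_updates += chunks_per_update   # chunks are counted once, not once per pass
```

The pass-1 value of each later update then matches the single-pass schedule, which is the reset described above.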
- Fixes update_alpha problem when calling rho(); pass_ wasn't in scope
- Change rho:
  - to not rely on num_updates/chunksize: when updating from a corpus of size < chunksize, rho remains 1.0 for most updates
  - to *add* the pass count to the number of updates
- Change num_updates to actually be the number of updates, rather than the number of documents seen. Matches the paper now; this relates to the chunksize problem.
- Fixes update_alpha problem when calling rho(); pass_ wasn't in scope
- Change rho to *add* the pass count to the number of updates
- On update, set chunksize to be the corpus length if len(corpus) < chunksize and no chunksize was specified
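The chunksize change in that last item amounts to something like the following (a sketch, not the literal commit diff; the fallback of 2000 is gensim's documented default chunk size):

```python
def effective_chunksize(corpus, chunksize=None, default_chunksize=2000):
    # if the caller gave no chunksize and the corpus is smaller than the default,
    # shrink the chunk to the corpus length so rho does not stay at 1.0 for most updates
    if chunksize is None:
        return min(default_chunksize, len(corpus))
    return chunksize
```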
Squashed the previous commits on the branch into more appropriate ones. The one specific to this issue is commit edc3ce5.
@wbogorad let us know if this works better for you. I have no preference here. Cheers!
Sorry guys for my slow replies.
Hi Chris and Radim,

I have reviewed the code and I agree with Chris's changes. To summarize, num_updates gets increased only during pass_=0, so rho is no longer suppressed by the extra passes. The relevant lines are in his code, where docs correspond to mini-batches roughly similar to chunks.

Now I would like to add a few words about the differences in how passes are handled. In gensim, each pass sweeps the whole corpus again. I think that if one is OK with going through the data several times, it is better to repeat the passes over each chunk instead. Please note that we keep the same gamma matrix in the inner loop. This algorithm is a truly mini-batch gradient descent, and the learning rate decays once per mini-batch.
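A rough sketch of that loop order (the helpers do_estep/do_mstep are placeholders, not gensim's API; the point is only the structure: passes become the inner loop over each mini-batch, gamma is reused across those inner passes, and rho advances once per mini-batch):

```python
def rho(t, offset=1.0, decay=0.5):
    # one learning-rate step per mini-batch, as in Hoffman's Algorithm 2
    return (offset + t) ** -decay

def do_estep(chunk, gamma):
    # placeholder E-step: a real implementation would refine gamma for this chunk
    return gamma if gamma is not None else [0.0] * len(chunk)

def do_mstep(chunk, gamma, rho_t):
    # placeholder M-step: a real implementation would blend sufficient statistics
    print("M-step on %d docs with rho = %.3f" % (len(chunk), rho_t))

def train(chunks, inner_passes=5):
    for t, chunk in enumerate(chunks):
        gamma = None                        # fresh variational gamma for this chunk
        for _ in range(inner_passes):       # repeated passes over the *same* chunk,
            gamma = do_estep(chunk, gamma)  # reusing the same gamma matrix
        do_mstep(chunk, gamma, rho(t))      # a single damped update per mini-batch
```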
Great, I've merged in the changes thus far. Leaving this issue open to revisit the chunk/pass switch when time allows :)
@wbogorad the multipass is a strict extension of the original single-pass algorithm. If you want Hoffman's original version, set passes=1.
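For example (toy corpus and illustrative parameters):

```python
from gensim import corpora, models

texts = [["human", "interface", "computer"],
         ["survey", "user", "computer", "system"],
         ["eps", "user", "interface", "system"]]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# passes=1 recovers the original single-pass online algorithm
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=1)
```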
@cscorley Do you plan to continue working on the chunk/pass switch?
I had worked on an implementation, but never got around to finishing it, for a couple of reasons. Right now, I know scikit-learn also takes the same approach; however, it's only a part of their batch mode, and I'm not sure it was a move beyond doing what gensim does. I think swapping them (especially in the online case) makes sense, as @wbogorad explains. Batch mode remains the same either way. But I'm also not sure whether that sort of switch is going to be a big deal to end users or not. I'm guessing that it would be.
Radim,
I think that rhot gets changed more frequently in ldamodel.update() than it should in online mode.
rhot is only supposed to damp each new chunk of documents over time, according to Hoffman's Algorithm 2.
However, in the case of multiple passes it also damps the updates progressively with each pass.
I am not sure that this is correct.
This is because do_mstep() is called during each pass, and it increases the self.num_updates variable that drives rhot.
As another consequence, new chunks get suppressed more than they should be.
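A minimal standalone illustration of the effect (not gensim internals; the chunk counts, offset=1.0, and decay=0.5 are assumptions): because the counter advances on every pass, the next corpus starts from a much smaller rhot than Algorithm 2 prescribes.

```python
def rhot(updates_seen, offset=1.0, decay=0.5):
    return (offset + updates_seen) ** -decay

chunks_per_corpus, passes = 9, 5

# counter advancing once per chunk, as Algorithm 2 intends
print(rhot(chunks_per_corpus))            # ~0.316
# counter advancing on every pass, as described above
print(rhot(chunks_per_corpus * passes))   # ~0.147, next corpus is over-suppressed
```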
By the way, I appreciate that you introduced multiple passes; Hoffman does not have them, and in my experiments they do help achieve more robust convergence.
What do you think?