[WIP] Adagrad solver for arbitrary orders. #8
base: master
Conversation
Returns
-------
Gram matrix : array of shape (n_samples_1, n_samples_2)
"""
if degree == 2:
    ...
if degree > 3 or method == 'dp':
What about `degree == 3`?
The cases `degree in (2, 3)` are dealt with using the old closed-form approach. My benchmarks show this is faster in batch settings like this.
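For context, a minimal NumPy sketch of the trade-off (my own illustration, not the PR's code): degree 2 has a cheap closed form, while arbitrary degrees use the dynamic-programming recursion from the HOFM paper cited in the PR description.

```python
import numpy as np

def anova2_closed_form(p, x):
    # Degree-2 ANOVA kernel: 1/2 * ((<p, x>)^2 - <p^2, x^2>)
    dot = np.dot(p, x)
    return 0.5 * (dot ** 2 - np.dot(p ** 2, x ** 2))

def anova_dp(p, x, m):
    # a[t] holds the degree-t ANOVA kernel over the features seen so far;
    # iterating t downwards makes the in-place update correct.
    a = np.zeros(m + 1)
    a[0] = 1.0
    for pj, xj in zip(p, x):
        for t in range(m, 0, -1):
            a[t] += pj * xj * a[t - 1]
    return a[m]

rng = np.random.RandomState(0)
p, x = rng.randn(5), rng.randn(5)
assert np.isclose(anova2_closed_form(p, x), anova_dp(p, x, 2))
```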
cdef Py_ssize_t t, j, jj
...
for jj in range(nnz + 1):
Cython's numpy should support low-level vectorized `=` operations, right?
You mean `A[:, 0] = 1`? I think that works only for numpy arrays, not sure if it's implemented for generic memoryviews. I'll give it a try.
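For the record, the two variants under discussion would look roughly like this (assuming the loop body sets `A[jj, 0] = 1`, which the suggestion implies; whether the slice form compiles for generic memoryviews is exactly what's being checked here):

```cython
# Explicit loop over a typed memoryview; always supported:
for jj in range(nnz + 1):
    A[jj, 0] = 1.0

# Scalar broadcast into a memoryview slice; recent Cython accepts this,
# if I'm not mistaken, though it goes through a small slicing helper:
A[:nnz + 1, 0] = 1.0
```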
@@ -0,0 +1,67 @@
cdef inline double _fast_anova_kernel(double[::1, :] A,
This file needs the cython directives for efficiency, too. I added them but didn't get to commit yet, because I'm in the middle of some profiling: the adagrad solver is at the moment slower than I expected.
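For reference, "the cython directives" presumably means something like the following (a sketch; the exact set the author added isn't shown in this thread):

```cython
# cython: boundscheck=False, wraparound=False, cdivision=True
# File-level directives: skip bounds checks, negative-index wraparound,
# and Python division semantics for the whole module.

cimport cython

# The same flags can also be applied per function:
@cython.boundscheck(False)
@cython.wraparound(False)
cdef inline double _sum(double[::1] x):
    cdef Py_ssize_t j
    cdef double s = 0.0
    for j in range(x.shape[0]):
        s += x[j]
    return s
```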
Is there anything I can do to help complete this PR?
I'm a bit busy these days, but I plan to focus on this project and get a release out by the end of the year. The current problem with this PR is that the adagrad solver seems to take quite a bit more time per epoch than CD. If you manage to pinpoint the issue, it would be great. ~~Let me try to push the work-in-progress benchmark code that I have around.~~ edit: it was already up
@vene Sounds good - I'm happy to take a look and see if I can improve the performance.
Making P and P-like arrays (especially the gradient output) fortran-ordered makes this ~1.5x faster, but coordinate descent solvers are still faster per iteration, it seems. It's also weird that degree 3 is faster than degree 2; I set a very low tolerance to prevent it from converging early, but I should check why that happens. I just realized it might be because the alpha and beta regularization parameters have different meanings for the two solvers unless that is accounted for.
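This matches the `double[::1, :]` declaration in the diff above, which is the Fortran-contiguous memoryview spelling. A small illustration of why layout matters (my own sketch, assuming the inner loops scan down columns of P):

```python
import numpy as np

rng = np.random.RandomState(0)
P_c = rng.randn(10, 1000)      # C order: rows are contiguous
P_f = np.asfortranarray(P_c)   # F order: columns are contiguous

# If the hot loop walks down a column (fixed feature, varying component),
# Fortran order keeps those reads adjacent in memory, a plausible source
# of the ~1.5x speedup:
col = P_f[:, 42]
assert col.flags['C_CONTIGUOUS']   # 1-D contiguous view
```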
Indeed, the regularization terms are not the same, because of the 1/n factor in front of the loss when using stochastic gradient algorithms.
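In other words (my notation, not the library's exact objective): if CD minimizes `sum_i loss_i + alpha * ||P||^2` while the stochastic solver minimizes `(1/n) * sum_i loss_i + alpha * ||P||^2`, the penalties must be rescaled for the minimizers to coincide:

```python
def matching_sg_alpha(alpha_cd, n_samples):
    # Multiply the SG objective by n: the loss terms then match, and the
    # penalty becomes n * alpha_sg * ||P||^2, so alpha_sg = alpha_cd / n.
    return alpha_cd / n_samples
```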
Regularization scaling is now ON by default. I think this is sensible, because it keeps the choice of regularization strength independent of the data split. Adagrad seems very sensitive to the initial norm of P, so I changed the init to have unit variance rather than a standard deviation of 0.01. This makes the benchmark more reasonable, but the norms are still weird. Finicky tests (fm warm starts) had to be updated, but most things behave well.
Here's the performance after a bunch of tweaking and after making the problem easier. I'm printing out the norm of P to emphasize the big inconsistency between the solutions, even after correctly setting the regularization terms. This is weird: when initializing P with standard deviation 0.01, adagrad sets it to zero very quickly (especially with lower learning rates).
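For reference when reasoning about the norm of P, a plain AdaGrad step with the L2 term folded into the gradient looks like this (a sketch only; the PR's solver may use a proximal/lazy variant of the update):

```python
import numpy as np

def adagrad_step(w, grad_loss, G, lr, alpha, eps=1e-6):
    g = grad_loss + 2 * alpha * w       # regularized gradient
    G += g ** 2                         # per-coordinate accumulator
    w -= lr * g / (np.sqrt(G) + eps)    # effective step shrinks as G grows
    return w, G
```

With a tiny init the loss gradient is small, so `g` can be dominated by the regularization term; the per-coordinate normalization then makes the shrinkage roughly scale-independent, which might be part of the collapse observed here.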
Appveyor crash is not a test failure, for some reason I get
Now supports explicit fitting of lower orders. No performance degradation, but the code is a bit unrolled and could be written more clearly. I'm a bit confused by the way adagrad reacts to the learning rate, especially in the example, and by why it seems to shrink P to 0 faster with lower learning rates. But the tests, at least, suggest things are sensible.
On second thought, the Windows crash is not a fluke. The exit code -1073740940, in hex, is 0xC0000374, which apparently means heap corruption. It seems that my last commit fixed it.
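The conversion checks out; a quick sanity check in Python:

```python
# Windows exit codes are unsigned 32-bit NTSTATUS values; reinterpret
# the signed exit code to recover the NTSTATUS constant:
assert hex(-1073740940 & 0xFFFFFFFF) == '0xc0000374'  # STATUS_HEAP_CORRUPTION
```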
Implements the SG algorithm from Higher-order Factorization Machines.
Mathieu Blondel, Akinori Fujino, Naonori Ueda, Masakazu Ishihata.
In Proceedings of Neural Information Processing Systems (NIPS), December 2016.
Todos