final submission for issue #104 #158
Hi, please find below a review submitted by one of the reviewers:
Score: 8
[Code]
[Communication with original authors]
[Hyperparameter Search]
[Ablation Study]
[Discussion on results]
[Recommendations for reproducibility]
[Overall organization and clarity]
Score: 8 (accept)
Hi, please find below a review submitted by one of the reviewers:
Score: 7
The reproducers seem to have understood the paper, but their introduction somewhat obscures the problem that the authors solved, and is slightly misleading. The original paper makes it clear early on that it is concerned with the dynamic adjustment of learning rates during learning. I had to read the report's introduction side by side with the original paper, which is less than ideal. The proposed contributions and the new algorithms for choosing the LR are relatively poorly explained, but the experiments are well explained.
[Code]
The authors re-implemented the idea from scratch, so this is an attempt at true reproduction rather than repeatability. The code is readable, well commented, and easy to follow. I have not attempted to check it for detailed correctness.
[Communication with original authors]
The reproducers have not communicated with the original authors, or at least have not reported doing so.
[Hyperparameter Search]
The reproducers attempted to recreate the authors' plots using the same hyperparameters, and encountered problems. This raises a few questions about the authors' work, but is not automatically a red flag. Unfortunately, the reproducers did not contact the authors to try to resolve the discrepancy, so we do not know whether there is a software bug (which a more careful review of the code might find) or whether the paper is genuinely irreproducible.
[Ablation studies]
Ablation studies are not truly relevant to the present paper and report. The paper studies a complete, self-contained set of LR adaptation policies, presenting theoretical backing for them and then using them.
[Discussion of results]
The report describes a fair attempt at reproducing the work. For the cases the original authors drew most attention to (improved performance at risky high initial LRs), the reproducers found that the proposed LR adaptation policy does not perform as well as claimed. The reproducers do not suggest how the paper could be salvaged, but it is not clear that they could. The authors specifically designed their algorithms to perform better than SGD at higher learning rates (presumably to reduce time to convergence), and the reproducers find that they must use lower LRs than the authors claimed. Absent a bug in the reproducers' code, this suggests that the authors may have had more luck with, or put more effort into, tuning their own algorithms than SGD.
[Overall organization and clarity]
The mathematical typography leaves something to be desired, and the English is somewhat unidiomatic, especially near the beginning of the report. This makes it harder to read and less polished than it could be, but does not detract from its findings.
Hi, please find below a review submitted by one of the reviewers:
Score: 7
Regarding the results and discussion, the report excels in concise, to-the-point discussion. However, some of the claims do not seem to be supported by experimental results. In Section 4.1, for example, the report mentions that the reproducers tried fixing the initial learning rate for the first few rounds, without any supporting results that decisively show that it indeed didn't work. It is also unclear what the authors mean by "bound values"; they are probably referring to gradient clipping. If so, the clipping values used and the result corresponding to each clipping value should be reported. In short, there was room for more hyperparameter search. In Section 4.2, it would be helpful if the learning rate were reported alongside each of the convergence times being compared. The report also does not provide any suggestions for the original authors to help with the reproducibility effort. The overall organization is good except for a few grammatical errors.
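To make the request about clipping values concrete, here is a minimal sketch of how one result per (learning rate, clipping threshold) pair could be tabulated. It assumes that "bound values" means clipping the gradient by its L2 norm, which the report does not confirm. The toy quadratic loss, plain SGD, and the names `clip_by_norm`, `run_sgd`, and `sweep` are all hypothetical illustrations, not the reproducers' code or the original paper's LR adaptation policy.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def run_sgd(lr, max_norm, steps=200):
    """Plain SGD on the toy loss f(w) = 0.5 * sum(c_i * w_i^2) (assumed problem)."""
    curvature = [1.0, 10.0]  # hypothetical per-coordinate curvature
    w = [5.0, 5.0]           # hypothetical starting point
    for _ in range(steps):
        grad = [c * x for c, x in zip(curvature, w)]
        grad = clip_by_norm(grad, max_norm)
        w = [x - lr * g for x, g in zip(w, grad)]
    # Return the final loss so that every configuration yields one number.
    return 0.5 * sum(c * x * x for c, x in zip(curvature, w))


def sweep(lrs, clips):
    """Map every (lr, clip) pair to its final loss."""
    return {(lr, clip): run_sgd(lr, clip) for lr in lrs for clip in clips}


if __name__ == "__main__":
    # float("inf") disables clipping and serves as the unclipped baseline.
    results = sweep([0.01, 0.05, 0.1], [0.5, 1.0, 5.0, float("inf")])
    for (lr, clip), loss in sorted(results.items()):
        print(f"lr={lr:<5} clip={clip:<5} final_loss={loss:.3e}")
```

Reporting the full grid, including the unclipped baseline, would let a reader see whether a given clipping value helps or merely hides a learning rate that is too high.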
#104