Submission for #97 #141
Hi, where can I see the review?
Hi @JACKHAHA363, reviewers have just been assigned to all the projects. The reviews will be posted by our bot @reproducibility-org in the respective Pull Requests, and you will have the opportunity to correct your submission and respond to the reviewers.
Hi, please find below a review submitted by one of the reviewers: Score:
General questions:
Some questions related to implementation:
* Edit: Reviewer updated score based on author feedback
Hi, please find below a review submitted by one of the reviewers: Score: 7

In light of this, the authors restrict their analysis to the policy gradient baseline used in the original paper. This seems like a sensible scope for a reproduction study, as it focuses on one aspect of the problem that is frequently difficult to reproduce. However, this choice also limits the potential usefulness of the reproduction: the main result of the original paper is not that policy gradients produce the phenomenon of language drift (which was anticipated based on results elsewhere in the literature on language learning using self-play), but that the proposed grounding method reliably solves it. Results of this nature would most likely strengthen the impact of the reproduction. That being said, the authors very clearly specify the scope of the reproduction they are attempting, so I don't feel this unduly limits the usefulness of this reproduction.

Code
Communication with original authors
Hyperparameter search
Ablation Study
Discussion of results
Recommendations for reproducibility
Overall organization and clarity
Hi, please find below a review submitted by one of the reviewers: Score: 7
Response to reviewer 3
I rechecked on OpenReview and realized that the original comment was set to private. It is public now, and the reviewer should be able to see the exchange.
Using a more advanced policy gradient method like PPO or TRPO is definitely worth trying, but we think it is beyond the scope of this report because we want to stay close to the original paper. In addition, this policy gradient method (REINFORCE with a learnt value baseline) is widely employed in current self-play/RL work in the NLP community [1, 2], so we think confirming the language drift of this method should be representative. That being said, we are aware of this, and we have also implemented PPO here: https://github.com/JACKHAHA363/language_drift/blob/master/ld_research/training/finetune_ppo.py
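For concreteness, below is a minimal sketch of REINFORCE with a learnt value baseline; the helper name, tensor shapes, and reward handling are illustrative and not taken from our repository:

```python
import torch.nn.functional as F

def reinforce_with_baseline_loss(log_probs, values, rewards):
    # Hypothetical helper, not the function from our repository.
    # log_probs: (T,) log-probabilities of the sampled tokens under the policy
    # values:    (T,) value-head predictions V(s_t), i.e. the learnt baseline
    # rewards:   (T,) per-step rewards (e.g. a terminal BLEU broadcast to all steps)
    advantages = rewards - values.detach()          # baseline only reduces variance
    policy_loss = -(advantages * log_probs).mean()  # REINFORCE objective
    value_loss = F.mse_loss(values, rewards)        # regress the baseline onto the reward
    return policy_loss + value_loss
```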
Yes. We also tried a linearly decaying learning rate, but it converged to a suboptimal solution. The hyperparameters of Agent B do not make much difference, although we may not have performed a thorough enough search on them. The motivation for our focus on the learning rate and \alpha_ent is that 1) policy gradient is known to be very sensitive to the learning rate, which is the motivation for methods like TRPO, and 2) the effect of \alpha_ent on the fine-tuning results is discussed by one of the reviewers and the authors. We have updated the paper to include more details on hyperparameter optimization.
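To illustrate how \alpha_ent enters the objective, here is a small entropy-regularization sketch; the helper name and the default coefficient are illustrative only, not our actual settings:

```python
import torch

def entropy_term(logits, alpha_ent=0.01):
    # Hypothetical helper; the alpha_ent value is illustrative.
    # logits: (T, vocab) pre-softmax scores of the policy at each decoding step.
    log_probs = torch.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return alpha_ent * entropy

# total loss: policy_loss + value_loss - entropy_term(logits, alpha_ent)
```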
We think it could be caused by the learning rate schedule. In the original paper, the authors do not articulate these details, even if they seem to perform , and we use the default one from OpenNMT. We think it could be worthwhile to reproduce the pretrained results, but the main focus of this report is to confirm the language drift from pre-training to policy gradient fine-tuning on a new corpus.
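For reference, the linearly decaying schedule we tried can be expressed with PyTorch's LambdaLR; the model, step count, and learning rate below are placeholders, not our actual configuration:

```python
import torch
from torch import nn, optim

# Illustrative setup only; numbers are made up.
model = nn.Linear(8, 8)
optimizer = optim.Adam(model.parameters(), lr=1e-4)
total_steps = 10000
scheduler = optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: max(0.0, 1.0 - step / total_steps))

for step in range(3):                         # stand-in for the real training loop
    loss = model(torch.randn(4, 8)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                          # decays the learning rate linearly
```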
Response to reviewer 1
Yes, we agree that we did not reproduce the main claim of the authors, but we have some ongoing results of fine-tuning with a language model, implemented here: https://github.com/JACKHAHA363/language_drift/blob/master/ld_research/training/finetune.py#L531 (a rough sketch of the idea appears after the reference below). We chose to restrict ourselves so that we could have a more thorough discussion, which you also kindly noted in your review.

Reference:
[2] Bahdanau, Dzmitry, et al. "An actor-critic algorithm for sequence prediction." arXiv preprint arXiv:1607.07086 (2016).
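A rough sketch of the language-model fine-tuning idea; the helper name and the weight alpha_lm are hypothetical and do not correspond to our implementation:

```python
import torch

def lm_regularized_reward(task_reward, message_log_probs, alpha_lm=0.1):
    # Hypothetical helper sketching the idea only; names and weight are illustrative.
    # task_reward:       scalar tensor, e.g. BLEU of Agent B's final translation
    # message_log_probs: (T,) per-token log-probabilities of Agent A's message
    #                    under a fixed pretrained language model
    # Rewards messages that remain likely under natural language, discouraging drift.
    return task_reward + alpha_lm * message_log_probs.mean()
```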
To all reviewers: Thank you for your time! We have just updated our paper (7d522be) to include a section highlighting the potential pitfalls and challenges during our attempt to reproduce. @reproducibility-org
@koustuvsinha Will the bot update my response?
@reproducibility-org Any updates?
Updates? |