Shouldn't manager_vf be function of x_t? #2
That's correct, we caught that late in this development cycle. We still
couldn't get anything to converge with that value function, but you're
correct that it should be built from the visual input.
On Mon, Sep 11, 2017 at 2:17 AM, imbalu007 ***@***.***> wrote:
Right after eq. (7) in the paper <https://arxiv.org/pdf/1703.01161.pdf>,
the authors define V_t as a function of x_t. However, in the code it is a
function of g_hat (feudal_policy.py -> _build_manager()):
self.manager_vf = self._build_value(g_hat)
Shouldn't it be a function of x_t?
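To make the difference concrete, here is a minimal NumPy sketch, not the repository's TensorFlow code. Only `manager_vf`, `g_hat`, and `x_t` come from the thread; `build_value`, `z` (a stand-in for features computed from the visual input x_t), the weights, and all dimensions are hypothetical and chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_value(features, w, b):
    """Hypothetical scalar value head: one linear unit per batch row."""
    return (features @ w + b)[:, 0]

batch, z_dim, g_dim = 4, 8, 6  # illustrative sizes, not from the repository

# z stands in for features computed from the visual input x_t (assumption).
z = rng.normal(size=(batch, z_dim))
# g_hat stands in for the manager's unnormalized goal output (assumption).
g_hat = rng.normal(size=(batch, g_dim))

w_x, b_x = rng.normal(size=(z_dim, 1)), np.zeros(1)
w_g, b_g = rng.normal(size=(g_dim, 1)), np.zeros(1)

# Variant quoted in the issue: value head built on the goal.
manager_vf_from_g = build_value(g_hat, w_g, b_g)
# Variant the issue asks for: value head built on the input features.
manager_vf_from_x = build_value(z, w_x, b_x)

# Perturbing the goal moves the first estimate but leaves the second untouched.
g_hat_perturbed = g_hat + 1.0
vf_g_after = build_value(g_hat_perturbed, w_g, b_g)
vf_x_after = build_value(z, w_x, b_x)
```

In the second variant the value estimate depends only on what the agent observes, so it does not change when the manager changes its goal.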
|
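Hi, I didn't understand your answer. Are you still trying to implement it, or did you abandon this repository? Thanks |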
I'll make a formal post about it as I haven't looked at this in a bit (been
busy elsewhere), but as of now I've talked with some researchers and we
have reason to believe that FeUdal networks will not converge as described
in the original paper.
In the coming weeks I'll try to consolidate the code-base and clean it up
as well as possible in case (1) new details are published on how to
actually train these networks or (2) someone else can figure out a magic
bullet.
On Wed, Sep 13, 2017 at 9:25 AM, Lior Shani ***@***.***> wrote:
Hi,
I didn't understand your answer - Are you still trying to implement it or
did you abandon this repository?
Thanks
|
Thanks for the quick and detailed response. If I can offer any help, I haven't seen an implementation of the "dilated lstm" in your code (or maybe I missed it?).
|
I do have a dilated LSTM implemented; it's in a branch (see dlstm_fix).
Hence the need for consolidating! You can find it in models/models.py
On Wed, Sep 13, 2017 at 2:28 PM, Lior Shani ***@***.***> wrote:
Thanks for the quick and detailed response.
If I can offer any help, I haven't seen an implementation of the "dilated
lstm" in your code (or maybe I missed it?).
I think it's a core idea in this paper for two reasons:
1. If the model had worked without it, I don't believe it would be in the paper.
2. Without it, there is no mechanism that controls the time interval
c, so the goal objective is ill-defined. I believe it is supposed to act as
some kind of finite-state machine.
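As a rough illustration of how a dilated LSTM can tie the recurrence to a fixed interval, here is a minimal NumPy sketch. It is not the code from the dlstm_fix branch. It assumes one common reading of the idea: the state is split into r slots, only slot t % r is updated at step t, and the output pools all slots. The class name `DilatedLSTM`, the helper `lstm_step`, the sum pooling, and all sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One standard LSTM step; W maps [x, h] to the four gate pre-activations."""
    gates = np.concatenate([x, h]) @ W + b
    i, f, o, g = np.split(gates, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

class DilatedLSTM:
    """Hypothetical sketch: r state slots share one set of LSTM weights."""

    def __init__(self, input_dim, hidden_dim, r, seed=0):
        rng = np.random.default_rng(seed)
        self.r = r
        self.W = rng.normal(scale=0.1, size=(input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.h = np.zeros((r, hidden_dim))  # one hidden state per slot
        self.c = np.zeros((r, hidden_dim))  # one cell state per slot
        self.t = 0

    def step(self, x):
        k = self.t % self.r  # only this slot is updated at the current step
        self.h[k], self.c[k] = lstm_step(x, self.h[k], self.c[k], self.W, self.b)
        self.t += 1
        return self.h.sum(axis=0)  # pool over all slots (sum is an assumption)

dlstm = DilatedLSTM(input_dim=3, hidden_dim=5, r=4)
x = np.ones(3)
first_output = dlstm.step(x)          # updates slot 0 only
for _ in range(3):
    dlstm.step(x)                     # updates slots 1, 2, 3
snapshot = dlstm.h.copy()
dlstm.step(x)                         # step 4 wraps around to slot 0 again
```

Each slot sees an input only every r steps, so its memory spans a longer horizon than a plain LSTM with the same number of updates, while the pooled output still changes at every step.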
|
Ohh. I didn't check other branches. Anyhow, that's truly frustrating because this model sounds really tempting (and I guess you put a lot of work into it). |
@dmakian You mentioned a formal post about convergence problems and the idea that feudal networks may not converge as described in the paper. Any update there? |
Sorry, this has been off my mind for a while. I doubt a post is coming. In short: I believe that feudal networks can work; I just think that the implementation is fragile. If DeepMind released their code I'm sure it would function, but as with a lot of deep RL, small differences in code can lead to wildly different performance. |