Wrong derivative of first layer in get_grad_and_error? #2

Open
simNN7 opened this issue Mar 7, 2015 · 3 comments

simNN7 commented Mar 7, 2015

Hi Lars,

I have a question about the derivative you use during backpropagation.
I haven't quite understood why you use the normalized data when evaluating the gradient for the weights of the input layer.
Shouldn't it just be the unnormalized x values?

In the get_grad_and_error function used for fine-tuning, you calculate:

    x[:, :-1] = get_norm_x(x[:, :-1])   # word counts are normalized here
    ...
    for i in range(number_of_weights - 1, -1, -1):
        if i == number_of_weights - 1:
            ...
        elif i == 0:
            ...
            grad = dot(x.T, delta)  # <--- unnormalized inputs here?
        else:
            ...

Shouldn't it instead be:

    D = x[:, :-1].sum(axis=1)
    D = D[:, numpy.newaxis]
    ...
    for i in range(number_of_weights - 1, -1, -1):
        if i == number_of_weights - 1:
            ...
        elif i == 0:
            ...
            grad = numpy.dot(numpy.append(x[:, :-1], D, axis=1).T, delta)
        else:
            ...
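For concreteness, here is a small self-contained toy comparison of the two expressions above (the behaviour of get_norm_x, the bias column, and all array shapes are my assumptions for illustration only, not taken from the repository):

    import numpy

    def get_norm_x(counts):
        # assumed behaviour: divide each word-count row by its row sum
        return counts / counts.sum(axis=1)[:, numpy.newaxis]

    counts = numpy.array([[2., 0., 1., 1.],   # 2 documents, vocabulary of 4 words
                          [0., 3., 0., 1.]])
    bias = numpy.ones((2, 1))                 # assumed bias column appended to x
    delta = numpy.random.randn(2, 3)          # back-propagated error for 3 hidden units

    # current implementation: normalized inputs (plus bias column)
    x = numpy.append(get_norm_x(counts), bias, axis=1)
    grad_normalized = numpy.dot(x.T, delta)

    # proposed alternative: raw counts with the document lengths D appended
    D = counts.sum(axis=1)[:, numpy.newaxis]
    grad_counts = numpy.dot(numpy.append(counts, D, axis=1).T, delta)

    print(grad_normalized.shape, grad_counts.shape)   # both (5, 3)

Both give a gradient of the same shape; the question is only which version of the inputs should multiply delta.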
larsmaaloee (Owner) commented

Hello,

Sorry for the late reply. The reason for this is to make sure that the log in the cross-entropy cost function doesn't complain. It has no effect on the performance of the model, but it ensures that you don't run into the divide-by-zero warning from the log:

    >>> np.log(0)
    __main__:1: RuntimeWarning: divide by zero encountered in log
    -inf

Please let me know if you have any other questions.

Thanks.

Best regards


Lars Maaløe
PhD Student
DTU Compute
Technical University of Denmark (DTU)

Email: [email protected], [email protected]
Phone: 0045 2229 1010
Skype: lars.maaloe
LinkedIn http://dk.linkedin.com/in/larsmaaloe
DTU Orbit http://orbit.dtu.dk/en/persons/lars-maaloee(0ba00555-e860-4036-9d7b-01ec1d76f96d).html

simNN7 (Author) commented Mar 12, 2015

Hi,

Yes, I see that you need to avoid division by zero. But my question is not about why you use the probability of words in the cross-entropy error function. It is rather: why do you use the probability-of-words array instead of the word-count array when evaluating the gradient of the first layer (i == 0)?

larsmaaloee (Owner) commented

Hello again,

It is a common trick to compare the probabilities to the normalised word counts to avoid sampling from the multinomial distribution.
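To illustrate what is meant, here is a minimal sketch (the array values and shapes are mine, not from the repository): instead of drawing word counts from a multinomial during training, the counts are simply normalised to probabilities and used directly.

    import numpy

    rng = numpy.random.RandomState(0)

    counts = numpy.array([[2., 0., 1., 1.],   # 2 documents, vocabulary of 4 words
                          [0., 3., 0., 1.]])

    # Normalised word counts: each row becomes a probability distribution.
    probs = counts / counts.sum(axis=1)[:, numpy.newaxis]

    # The alternative would be to sample actual word counts from a
    # multinomial with those probabilities, which adds sampling noise:
    sampled = numpy.array([rng.multinomial(int(n), p)
                           for n, p in zip(counts.sum(axis=1), probs)])

    print(probs)     # deterministic, each row sums to 1
    print(sampled)   # noisy integer counts, each row sums to the document length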

Best regards

Lars Maaløe
