Wrong derivative of first layer in get_grad_and_error? #2
Hello, sorry for the late reply. The reason for this is to make sure that the log in the cross-entropy cost function doesn't complain. It has no effect on the performance of the model, but it ensures that you get no division-by-zero error.
Please let me know if you have any other questions. Thanks.
Best regards,
Lars Maaløe
Email: [email protected], [email protected]
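For illustration only, a minimal NumPy sketch of the kind of epsilon guard described above (the function and variable names are hypothetical, not the repository's actual code): a tiny constant inside the log keeps the cross-entropy finite when a predicted probability is exactly zero.

```python
import numpy as np

def cross_entropy(p_pred, x_norm, eps=1e-12):
    """Cross-entropy between normalised word counts `x_norm` and predicted
    word probabilities `p_pred`. The small `eps` inside the log prevents
    log(0) when a probability is exactly zero.
    (Hypothetical helper for illustration, not the repo's code.)"""
    return -np.sum(x_norm * np.log(p_pred + eps))
```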
Hi, yes, I see that you need to avoid division by zero. My question is not why you are using the probability of words in the cross-entropy error function, but rather: why are you using the probability-of-words array instead of the word-count array when evaluating the gradient of the first layer (i=0)?
Hello again,
It is a common trick to compare the probabilities to the normalised word counts to avoid sampling from the multinomial distribution.
Best regards,
Lars Maaløe
Email: [email protected], [email protected]
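As a rough illustration of this trick (hypothetical names, not the repository's code): the output-layer error term compares the predicted word probabilities against the normalised counts x / N, i.e. the expected value of the multinomial divided by N, rather than against a sampled word-count vector, which removes the sampling noise.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical illustration: one document with raw word counts `x`
# and model activations over the same vocabulary.
x = np.array([[3., 0., 1., 2.]])                 # raw word counts
n = x.sum(axis=1, keepdims=True)                 # document length N
p = softmax(np.array([[0.2, -1.0, 0.5, 0.1]]))   # predicted word probabilities

# Compare probabilities to the normalised counts x / N instead of
# drawing a count vector from Multinomial(N, p) and comparing to that.
delta_out = p - x / n
```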
Hi Lars,
I have a question about the derivative you use during backpropagation.
I haven't quite understood why you are using the normalized data in the gradient evaluation for the weights of the input layer.
Shouldn't it be just the unnormalized x values?
In the get_grad_and_error function of the fine-tuning, you calculate:
Is it not:
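For concreteness, a hedged sketch of the two variants being contrasted (the names x, n, delta1 and the gradient variables are illustrative, not the repository's get_grad_and_error code): the first uses the normalised counts x / N as the first-layer input when forming the gradient, the second uses the raw counts x as the question proposes.

```python
import numpy as np

# Hypothetical sketch of the two first-layer gradients under discussion.
x = np.array([[3., 0., 1., 2.]])        # raw word counts, one row per document
n = x.sum(axis=1, keepdims=True)        # document lengths N
delta1 = np.array([[0.1, -0.2, 0.05]])  # back-propagated error at the first hidden layer

grad_w0_normalised = (x / n).T @ delta1  # using the normalised input x / N
grad_w0_raw        = x.T @ delta1        # using the raw counts x
```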