
Any info for tweaking training settings for those with little background in LSTMs? #196

Open
broccolus opened this issue Jun 2, 2017 · 2 comments


@broccolus

Hi All,

Not sure if this is the right place to post, but I'm looking for a little extra info on how to choose parameters for model training. I have very little background in neural networks, or in programming generally, but I find this software fascinating and wanted to try a little experiment. I'm tech-savvy enough that I installed everything successfully and started training with the default settings. My data set is about 3,000,000 characters. Right now I seem to have hit a point of diminishing returns: the model is consistently underfitting, and the loss value barely changes between checkpoints. By underfitting I mean that the output consistently contains many gibberish words and erratic sentence structures, despite the structured nature of the data set. A few questions:

  1. How many epochs does training generally need to produce good results? I made it to about 13/50, and it goes quite slowly (CPU mode on a crap computer; getting this far took >48 hrs of constant running). Am I just being impatient? Could the loss value start to change again even after a perceived plateau? And is loss the be-all and end-all for evaluating a training run, or could the model still be improving even when the loss value doesn't change?

  2. If I'm faced with underfitting, which model parameters should I change first to improve it: -rnn_size, -num_layers, -batch_size, or something else?

  3. Does anybody have any beginner-friendly resources that explain the theory behind neural networks, so I can build my understanding and eventually answer these questions myself?
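For reference, the parameters in question 2 are command-line flags on train.lua (assuming this project is jcjohnson/torch-rnn or similar). A minimal sketch of an invocation with more capacity than the defaults; the file paths and values here are illustrative assumptions, not recommendations:

```shell
# Hypothetical torch-rnn training run (paths are placeholders).
# -rnn_size and -num_layers control model capacity;
# -gpu -1 forces CPU mode.
th train.lua \
  -input_h5 data/my_dataset.h5 \
  -input_json data/my_dataset.json \
  -rnn_size 256 \
  -num_layers 2 \
  -batch_size 50 \
  -gpu -1
```

Larger -rnn_size and -num_layers give the model more capacity to fit structured data, at the cost of slower training, which matters on CPU.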

Thanks all
J

@antihutka
Contributor

I'd try increasing -rnn_size and/or -num_layers first. Also check whether the loss steps down every lr_decay_every epochs, when the learning rate is decayed.
CPU training shouldn't take that long for a small network; there are some tricks that might help:
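On the lr_decay_every point: in torch-rnn the learning rate is multiplied by -lr_decay_factor every -lr_decay_every epochs, so the loss can plateau and then step down when a decay fires. A hedged sketch of making that schedule more aggressive (flag names assume jcjohnson/torch-rnn; the values are illustrative, and `th train.lua -help` lists the actual defaults):

```shell
# Hypothetical run that halves the learning rate every 3 epochs
# (paths are placeholders).
th train.lua \
  -input_h5 data/my_dataset.h5 \
  -input_json data/my_dataset.json \
  -lr_decay_every 3 \
  -lr_decay_factor 0.5
```

A flat loss curve right before a scheduled decay doesn't necessarily mean training is done, which bears on question 1 above.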

@broccolus
Author

Fantastic info! I'll set up another run tonight with these considerations.
