Use tf.Server directly to test port availability #20

Open
jdlesage opened this issue Oct 13, 2018 · 3 comments

Comments

@jdlesage
Contributor

Instead of testing ports by opening a socket, launch a tf.Server directly, which does the same check. This avoids having to reconnect to the socket (and all the bugs related to that...).

That's the approach dask-tensorflow uses to create a TensorFlow cluster.
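For context, a minimal sketch of the socket-based check that this issue proposes replacing, assuming the free port is found by binding a throwaway socket to port 0 and releasing it before the server starts. The helper name `reserve_free_port` is hypothetical and not necessarily how the project implements it; the point is the window between closing the socket and the server re-binding the port.

```python
import socket


def reserve_free_port(host: str = "127.0.0.1") -> int:
    """Hypothetical helper: ask the OS for an unused port, then release it.

    The port is only known to be free at the moment of the check. Another
    process may grab it before the TensorFlow server binds it, which is the
    race that starting the server directly would avoid.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0: let the OS pick an unused port
        return sock.getsockname()[1]  # socket is closed on leaving the block
```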

@superbobry
Contributor

superbobry commented Oct 13, 2018

I have been considering this, but I am afraid it is not straightforward:

  • dask-tensorflow uses a hardcoded port range [2222, ....) and assumes that all of the ports are free. If this is not the case, it just crashes. A simple fix would be to add a try-except and a while loop. However, for each failed attempt, tf.train.Server would emit a message on stderr, confusing the user.
  • An alternative to enumerating a hardcoded range of ports is to bind the server to port 0, but I am not sure this is possible with tf.train.Server.
  • I am also not sure if the cluster spec can be altered after the server has been created (this is needed for the current acquire-broadcast-start scheme).
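The try-except loop from the first point can be sketched as follows. This is a hedged illustration, not dask-tensorflow's code: `start_server` is a hypothetical factory (for example, a wrapper that builds a tf.train.Server whose cluster spec lists this task at the given port), and because the exact exception tf.train.Server raises on a busy port is an assumption here, the exception types are passed in via `errors`.

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def start_on_first_free_port(
    start_server: Callable[[int], T],
    ports: Iterable[int],
    errors: Tuple[type, ...] = (Exception,),
) -> Tuple[int, T]:
    """Try each candidate port in order until start_server succeeds.

    start_server is a hypothetical factory; errors lists the exception
    types treated as "port already in use" (an assumption for TensorFlow).
    """
    last_error = None
    for port in ports:
        try:
            return port, start_server(port)
        except errors as exc:  # port taken: remember the error, try the next one
            last_error = exc
    raise RuntimeError("no free port in the candidate range") from last_error
```

Usage would look like `port, server = start_on_first_free_port(make_server, range(2222, 2322))`. Note that this only moves the failure handling into a loop: as pointed out above, each failed tf.train.Server attempt would still log to stderr, and the chosen port still has to end up in the cluster spec shared with the other tasks.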

@fhoering
Contributor

Related TensorFlow issue opened by @superbobry:
tensorflow/tensorflow#21492

@fhoering
Contributor

fhoering commented Apr 3, 2020

Discussed again in
tensorflow/tensorflow#35383
