You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Instead of testing ports by opening a socket. Launch directly a tf.server that will do the same. It avoids to reconnect to the socket (and all bugs related to that...)
That's the solution used by dask.tensorflow to create a tensorflow cluster.
The text was updated successfully, but these errors were encountered:
I have been considering this, but I am afraid it is not straightforward:
dask-tensorflow uses hardcoded port range [2222, ....) and assumes that all of the ports are free. If this is not the case, it would just crash. A simple fix would be to add a try-except and a while loop. However, for each failed attempt tf.train.Server would emit a message on stderr confusing the user.
An alternative to enumerating a hardcoded range of ports is to bind the server to port 0 but I am not sure it is possible with tf.train.Server.
I am also not sure if the cluster spec can be altered after the server has been created (this is needed for the current acquire-broadcast-start scheme).
Instead of testing ports by opening a socket. Launch directly a tf.server that will do the same. It avoids to reconnect to the socket (and all bugs related to that...)
That's the solution used by dask.tensorflow to create a tensorflow cluster.
The text was updated successfully, but these errors were encountered: