Using a distribution strategy for evaluation doesn't seem to work well.
We could split up the train_and_evaluate function: run distributed training with the cluster_spec, and do evaluation separately by calling estimator.evaluate.
Currently, train_and_evaluate runs evaluation in a thread that always reads the latest checkpoint:
https://github.com/tensorflow/estimator/blob/master/tensorflow_estimator/python/estimator/training.py#L798
Splitting evaluation out would let us spawn many evaluators with skein, give each a different checkpoint path, and evaluate different checkpoints in parallel.
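A minimal sketch of that split, for illustration only: the toy model_fn and input_fn stand in for the real ones, and the model_dir, step counts, checkpoint path, and MirroredStrategy choice are placeholders rather than anything prescribed by this issue.

```python
import tensorflow as tf

def model_fn(features, labels, mode):
    """Toy model_fn, only here to make the sketch self-contained."""
    logits = tf.compat.v1.layers.dense(features["x"], 1)
    loss = tf.compat.v1.losses.mean_squared_error(labels, logits)
    if mode == tf.estimator.ModeKeys.TRAIN:
        train_op = tf.compat.v1.train.GradientDescentOptimizer(0.01).minimize(
            loss, global_step=tf.compat.v1.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
    return tf.estimator.EstimatorSpec(mode, loss=loss)

def input_fn():
    """Toy input pipeline standing in for the real train/eval input_fns."""
    ds = tf.data.Dataset.from_tensors(({"x": [[1.0]]}, [[2.0]]))
    return ds.repeat(1000)

# Training side: launched once per cluster role (driven by TF_CONFIG /
# the cluster_spec), without the in-process evaluation thread that
# train_and_evaluate would start.
config = tf.estimator.RunConfig(
    model_dir="/tmp/estimator_model",  # placeholder; an HDFS path on YARN
    train_distribute=tf.distribute.MirroredStrategy(),  # illustrative choice
)
estimator = tf.estimator.Estimator(model_fn=model_fn, config=config)
estimator.train(input_fn=input_fn, max_steps=100)

# Evaluation side: one skein container per checkpoint. An explicit
# checkpoint_path pins each evaluator to one checkpoint instead of
# whatever is latest, so several checkpoints can be scored in parallel.
metrics = estimator.evaluate(
    input_fn=input_fn,
    checkpoint_path="/tmp/estimator_model/model.ckpt-100",  # placeholder
)
print(metrics)
```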