Missing temporary file in train step #33

johannes-hoeffer · 2023-09-13T14:11:22Z

Hi guys, I'm trying out MLflow Recipes for the first time in an Azure Databricks environment. Until yesterday, everything worked fine from ingestion to prediction. Today, however, I'm running into an error saying that MLflow can't find what looks to me like a temporary file while training my model. I don't do anything fancy in any of these steps and just want to use an LGBMClassifier for training. The MLflow version is 2.7.0, but it doesn't seem to work on any other version I tried either.

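import mlflow
from mlflow.recipes import Recipe

experiment_name = "experiment_name"

# Create the experiment on first use, then make it the active one in either case.
if not mlflow.get_experiment_by_name(experiment_name):
    mlflow.create_experiment(name=experiment_name)
mlflow.set_experiment(experiment_name)
experiment = mlflow.get_experiment_by_name(experiment_name)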

r = Recipe(profile="databricks")
r.clean()
r.inspect()
r.run("ingest")
r.run("split")
r.run("transform")
r.run("train")

Here's what my estimator function looks like in train.py. estimator_params are defined in recipe.yaml.

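from typing import Any, Dict, Optional

def estimator_fn(estimator_params: Optional[Dict[str, Any]] = None):
    # Local import: lightgbm is only needed when this function is called.
    from lightgbm import LGBMClassifier

    # Use LightGBM's defaults when no parameters are passed in.
    if estimator_params is None:
        estimator_params = {}

    return LGBMClassifier(**estimator_params)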

As I said, the same code worked fine for me yesterday, but today I'm running into this error:

Run MLFlow Recipe step: train
2023/09/13 11:09:36 INFO mlflow.recipes.step: Running step train...
2023/09/13 11:09:38 INFO mlflow.recipes.steps.train: Class imbalance of 0.50 is better than 0.3, no need to rebalance
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f235e133-8940-41eb-b389-d9cf570c187a/lib/python3.10/site-packages/mlflow/recipes/step.py", line 132, in run
    self.step_card = self._run(output_directory=output_directory)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f235e133-8940-41eb-b389-d9cf570c187a/lib/python3.10/site-packages/mlflow/recipes/steps/train.py", line 373, in _run
    logged_estimator = self._log_estimator_to_mlflow(fitted_estimator, X_train)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f235e133-8940-41eb-b389-d9cf570c187a/lib/python3.10/site-packages/mlflow/recipes/steps/train.py", line 1270, in _log_estimator_to_mlflow
    return mlflow.sklearn.log_model(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f235e133-8940-41eb-b389-d9cf570c187a/lib/python3.10/site-packages/mlflow/sklearn/__init__.py", line 408, in log_model
    return Model.log(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f235e133-8940-41eb-b389-d9cf570c187a/lib/python3.10/site-packages/mlflow/models/model.py", line 568, in log
    with TempDir() as tmp:
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f235e133-8940-41eb-b389-d9cf570c187a/lib/python3.10/site-packages/mlflow/utils/file_utils.py", line 383, in __enter__
    self._path = os.path.abspath(create_tmp_dir())
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f235e133-8940-41eb-b389-d9cf570c187a/lib/python3.10/site-packages/mlflow/utils/file_utils.py", line 830, in create_tmp_dir
    return tempfile.mkdtemp(dir=repl_local_tmp_dir)
  File "/usr/lib/python3.10/tempfile.py", line 507, in mkdtemp
    _os.mkdir(file, 0o700)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/repl_tmp_data/ReplId-68395-9c373-e0490-3/tmpuyeyu8co'
make: *** [Makefile:40: steps/train/outputs/model] Error 1

I really don't know what to do, since the stack trace seems to point to an MLflow-internal error. Any help would be appreciated.
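
One observation from the last frames of the traceback: `tempfile.mkdtemp(dir=repl_local_tmp_dir)` fails inside `os.mkdir`, which usually means the parent directory passed as `dir` is missing, not the temporary directory itself. The sketch below reproduces that behaviour outside MLflow. It rests on assumptions: `repl_tmp_dir` is a hypothetical stand-in for the `/tmp/repl_tmp_data/ReplId-...` directory, and `try_mkdtemp` is an illustrative helper, not part of MLflow or Databricks. Whether recreating the real directory on the cluster is a safe workaround is untested here.

```python
import os
import shutil
import tempfile

# Hypothetical stand-in for the per-REPL directory seen in the traceback;
# the real directory is managed by Databricks, not by this script.
base = tempfile.mkdtemp()
repl_tmp_dir = os.path.join(base, "repl_tmp_data", "ReplId-example")


def try_mkdtemp(parent: str) -> str:
    """Return the created path, or the exception name if the parent is missing."""
    try:
        return tempfile.mkdtemp(dir=parent)
    except FileNotFoundError as exc:
        return type(exc).__name__


# The parent directory does not exist yet, so mkdtemp cannot create a child in it.
missing_result = try_mkdtemp(repl_tmp_dir)

# Once the parent exists, the same call succeeds.
os.makedirs(repl_tmp_dir, exist_ok=True)
created_path = try_mkdtemp(repl_tmp_dir)

print(missing_result)
print(created_path.startswith(repl_tmp_dir))

# Clean up everything this sketch created.
shutil.rmtree(base)
```

If the same holds on the cluster, checking `os.path.isdir` on the parent directory named in the error message right before `r.run("train")` would show whether that directory has disappeared.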
