Some issues getting this to work on Linux #2

Open
samr opened this issue Nov 17, 2024 · 0 comments

Overall, I really liked the write-up on your blog. Thank you.

I ran into a few different issues that may or may not be related to getting this to work on Linux.

  1. sklearn.datasets.make_classification appears to return NaN values for the Result column when running dataset.py to create the dataset. (This is likely not related to Linux.)
  2. The mlflow experiment's artifact_location defaults to "/C:", which is very Windows-specific.

My fixes for these were:

diff --git a/steps/clean.py b/steps/clean.py
index ecbcc4b..64d755f 100644
--- a/steps/clean.py
+++ b/steps/clean.py
@@ -24,5 +24,9 @@ class Cleaner:
         IQR = Q3 - Q1
         upper_bound = Q3 + 1.5 * IQR
         data = data[data['AnnualPremium'] <= upper_bound]
+
+        data['Result'] = data['Result'].fillna(0.0)
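
For anyone who wants to check the NaN problem outside the pipeline, here is a minimal sketch of the same fill step. The helper name `fill_missing_result` and the toy frame are hypothetical, not part of the repository; only the `Result` and `AnnualPremium` column names come from the diff above. It works on a copy, which should also avoid pandas' chained-assignment warning after the outlier filter.

```python
import numpy as np
import pandas as pd


def fill_missing_result(data: pd.DataFrame, column: str = "Result",
                        fill_value: float = 0.0) -> pd.DataFrame:
    """Return a copy of `data` with NaNs in `column` replaced by `fill_value`.

    Hypothetical helper: it also reports how many rows were affected, so the
    size of the problem is visible before it gets papered over.
    """
    missing = int(data[column].isna().sum())
    print(f"{missing} of {len(data)} rows have NaN in {column!r}")

    # Work on a copy so a filtered view of the original frame is not mutated.
    cleaned = data.copy()
    cleaned[column] = cleaned[column].fillna(fill_value)
    return cleaned


# Toy data standing in for the generated dataset (values are made up).
sample = pd.DataFrame({
    "AnnualPremium": [100.0, 250.0, 90.0],
    "Result": [1.0, np.nan, 0.0],
})
cleaned = fill_missing_result(sample)
```

Filling with 0.0 labels every missing row as the negative class, so if the count printed here is large it is probably worth fixing dataset.py instead.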

diff --git a/main.py b/main.py
index 71b6ce1..d8936ad 100644
--- a/main.py
+++ b/main.py
@@ -1,3 +1,4 @@
+import os
 import logging
 import yaml
 import mlflow
@@ -49,9 +50,16 @@ def train_with_mlflow():
     with open('config.yml', 'r') as file:
         config = yaml.safe_load(file)
 
-    mlflow.set_experiment("Model Training Experiment")
-    
-    with mlflow.start_run() as run:
+    experiment_name = "Model Training Experiment #1"
+    try:
+        experiment = mlflow.get_experiment_by_name(experiment_name)
+        experiment_id = experiment.experiment_id
+    except AttributeError:
+        print(f"Creating experiment: {experiment_name}")
+        artifact_path = os.path.join(os.path.dirname(__file__), "mlruns")
+        experiment_id = mlflow.create_experiment(experiment_name, artifact_location=artifact_path)
+
+    with mlflow.start_run(experiment_id=experiment_id) as run:
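
The same get-or-create logic can also be written with an explicit `None` check instead of catching `AttributeError`, since `mlflow.get_experiment_by_name` returns `None` when the experiment does not exist. This is only a sketch of an alternative, not what the repository does. The helper names `default_artifact_uri` and `get_or_create_experiment` are hypothetical, and passing a `file://` URI as the artifact location is my assumption about a portable choice. The `tracking` argument is meant to be the `mlflow` module itself; it is passed in so the function can be tried without a tracking store.

```python
from pathlib import Path


def default_artifact_uri(base_dir) -> str:
    """Build a file:// URI for an `mlruns` folder under `base_dir`.

    Assumption: a file URI is used here so the location carries no
    drive-letter prefix such as "/C:" on non-Windows systems.
    """
    return (Path(base_dir).resolve() / "mlruns").as_uri()


def get_or_create_experiment(tracking, name: str, artifact_location: str) -> str:
    """Return the ID of experiment `name`, creating it if it is missing.

    `tracking` is expected to be the `mlflow` module, or any object that
    exposes `get_experiment_by_name` and `create_experiment`.
    """
    experiment = tracking.get_experiment_by_name(name)
    if experiment is not None:
        # An existing experiment is reused as-is; `artifact_location`
        # is only applied when a new experiment is created.
        return experiment.experiment_id
    print(f"Creating experiment: {name}")
    return tracking.create_experiment(name, artifact_location=artifact_location)


# Intended use inside main.py (commented out so the sketch runs without mlflow):
# import mlflow
# artifact_uri = default_artifact_uri(Path(__file__).parent)
# experiment_id = get_or_create_experiment(
#     mlflow, "Model Training Experiment #1", artifact_uri)
# with mlflow.start_run(experiment_id=experiment_id) as run:
#     ...
```

As far as I can tell, an experiment keeps the artifact location it was created with, so one that was already created with the "/C:" default would need a new name (as in the diff above) or to be deleted first.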