The JVM binding lets us retrieve a summary of the training and evaluation sets after training an XGBoost Spark model. This functionality is not currently available in the PySpark XGBoost binding.
In the JVM package, this is defined through this class. We obtain the summary while training the model, which first passes through this method, and in the end reaches this line. After that, when creating the Spark[Classification|Regression|Ranker]Model, we pass through a constructor like this one.
I followed a similar approach to expose and retrieve these metrics in the PySpark XGBoost binding, and I will be submitting a review soon. Any feedback is welcome.
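To make the idea concrete, here is a minimal sketch of what such a summary object could look like on the Python side. It is not the code from the review: the class name `TrainingSummary`, its field names, and the `"training"` / `"validation"` keys are all hypothetical. It only assumes that per-iteration metrics are available in the nested-dict shape that `xgboost.train` writes into `evals_result` (`{dataset_name: {metric_name: [values...]}}`).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Mapping, Sequence

# Shape assumed for the input: {dataset_name: {metric_name: [value per iteration]}}
EvalsResult = Mapping[str, Mapping[str, Sequence[float]]]


@dataclass(frozen=True)
class TrainingSummary:
    """Hypothetical container for per-iteration metrics (illustrative only)."""

    # Metric name -> value at each boosting round, for the training set.
    train_objective_history: Dict[str, List[float]] = field(default_factory=dict)
    # Same layout for the validation set; empty if no validation data was used.
    validation_objective_history: Dict[str, List[float]] = field(default_factory=dict)

    @staticmethod
    def from_evals_result(
        evals_result: EvalsResult,
        train_key: str = "training",        # assumed dataset name
        validation_key: str = "validation",  # assumed dataset name
    ) -> "TrainingSummary":
        """Copy the metric histories out of an evals_result-style dict."""

        def copy_history(key: str) -> Dict[str, List[float]]:
            return {
                metric: list(values)
                for metric, values in evals_result.get(key, {}).items()
            }

        return TrainingSummary(
            train_objective_history=copy_history(train_key),
            validation_objective_history=copy_history(validation_key),
        )
```

A model object could then carry an instance of this class and expose it through an accessor, in the same spirit as the JVM summary described above.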
Thank you for the PR. Do you plan to work on the summary class? I think the JVM package mimics the Spark ML summary structure; there are similar classes in PySpark ML.
Hello Jiaming, thanks for your quick response. I implemented the summary class and pushed the review.
Indeed! In PySpark, they implement a wrapper around the JVM one. Here, I implemented everything from scratch because the PySpark XGBoost binding is not a wrapper around the XGBoost JVM package binding.