diff --git a/doc/jvm/index.rst b/doc/jvm/index.rst
index 0a2e947ea586..be14a563e71a 100644
--- a/doc/jvm/index.rst
+++ b/doc/jvm/index.rst
@@ -38,6 +38,7 @@ Contents
   XGBoost4J-Spark-GPU Tutorial
   Code Examples
   API docs
+  How to migrate to XGBoost Spark 3.x
 
 .. note::
 
diff --git a/doc/jvm/xgboost_spark_migration.rst b/doc/jvm/xgboost_spark_migration.rst
new file mode 100644
index 000000000000..6c5a2b313362
--- /dev/null
+++ b/doc/jvm/xgboost_spark_migration.rst
@@ -0,0 +1,140 @@
+####################################################
+Migration Guide: How to migrate to XGBoost Spark 3.x
+####################################################
+
+XGBoost Spark changed significantly in version 3.0,
+and some of those changes break existing user code.
+
+This guide walks you through updating your code so that
+it is compatible with XGBoost Spark 3.0 and later versions.
+
+**********************
+XGBoost Spark Packages
+**********************
+
+XGBoost Spark 3.0 introduced a single uber package, ``xgboost-spark_2.12-3.0.0.jar``, which bundles
+both xgboost4j and xgboost4j-spark. Your application now needs only the ``xgboost-spark`` dependency.
+
+.. code-block:: xml
+
+  <dependency>
+    <groupId>ml.dmlc</groupId>
+    <artifactId>xgboost-spark_${scala.binary.version}</artifactId>
+    <version>3.0.0</version>
+  </dependency>
+
+When submitting the XGBoost application to the Spark cluster, you only need to specify the single ``xgboost-spark`` package.
+
+.. code-block:: bash
+
+  spark-submit \
+    --jars xgboost-spark_2.12-3.0.0.jar \
+    ... \
+
+***************
+XGBoost Ranking
+***************
+
+XGBoostRegressor no longer supports ranking problems.
+Use the new XGBoostRanker instead, which is designed specifically
+for ranking algorithms.
+
+.. code-block:: scala
+
+  // before 3.0
+  val regressor = new XGBoostRegressor().setObjective("rank:ndcg")
+
+  // after 3.0
+  val ranker = new XGBoostRanker()
+
+******************************
+XGBoost Constructor Parameters
+******************************
+
+XGBoost Spark now categorizes parameters into two groups: XGBoost-Spark parameters and XGBoost parameters.
+The estimator constructor accepts only XGBoost parameters. XGBoost-Spark
+parameters must be set through the estimator's setter methods. Note that
+`XGBoost Parameters `_
+can be set both during construction and through the estimator's setter methods.
+
+.. code-block:: scala
+
+  // before 3.0
+  val xgboost_paras = Map(
+    "eta" -> "1",
+    "max_depth" -> "6",
+    "objective" -> "binary:logistic",
+    "num_round" -> 5,
+    "num_workers" -> 1,
+    "features" -> "feature_column",
+    "label" -> "label_column",
+  )
+  val classifier = new XGBoostClassifier(xgboost_paras)
+
+
+  // after 3.0
+  val xgboost_paras = Map(
+    "eta" -> "1",
+    "max_depth" -> "6",
+    "objective" -> "binary:logistic",
+  )
+  val classifier = new XGBoostClassifier(xgboost_paras)
+    .setNumRound(5)
+    .setNumWorkers(1)
+    .setFeaturesCol("feature_column")
+    .setLabelCol("label_column")
+
+  // Alternatively, set every parameter through setters
+  val classifier = new XGBoostClassifier()
+    .setNumRound(5)
+    .setNumWorkers(1)
+    .setFeaturesCol("feature_column")
+    .setLabelCol("label_column")
+    .setEta(1)
+    .setMaxDepth(6)
+    .setObjective("binary:logistic")
+
+*****************
+Unused Parameters
+*****************
+
+Starting from 3.0, the following parameters are no longer used.
+
+- cacheTrainingSet
+
+  To cache the training dataset, cache it in your own code
+  before fitting the estimator.
+
+  .. code-block:: scala
+
+    val df = input.cache()
+    val model = new XGBoostClassifier().fit(df)
+
+- trainTestRatio
+
+  Split the dataset yourself and pass the evaluation set to the estimator.
+
+  .. code-block:: scala
+
+    val Array(train, eval) = trainDf.randomSplit(Array(0.7, 0.3))
+    val classifier = new XGBoostClassifier().setEvalDataset(eval)
+    val model = classifier.fit(train)
+
+- tracker_conf
+
+  Configure RabitTracker through the estimator's setter methods.
+
+  .. code-block:: scala
+
+    val classifier = new XGBoostClassifier()
+      .setRabitTrackerTimeout(100)
+      .setRabitTrackerHostIp("192.168.0.2")
+      .setRabitTrackerPort(19203)
+
+- rabitRingReduceThreshold
+- rabitTimeout
+- rabitConnectRetry
+- singlePrecisionHistogram
+- lambdaBias
+- interactionConstraints
+- objectiveType