Showing 1 changed file with 1 addition and 1 deletion.
@@ -4,7 +4,7 @@
- If the task contains many categorical features and most of them are sparse, you can filter out features whose frequency of occurrence is below "feature.filter_threshold", or provide a custom feature dictionary by setting "model.dict_path" and "model.need_dict".
- If your feature dimension is huge, training will proceed very slowly. You can use feature hashing to reduce the dimension at a fraction of the cost by setting "feature.feature_hash"; this is not supported for gbdt.
- If the classification task has a seriously imbalanced dataset, you can set "data.y_sampling" in the model configuration to reduce the number of samples. For instance, CTR prediction usually has a large proportion of negative samples; "y_sampling : [0@0.1]" keeps 10% of the negative samples at random, and the kept negative samples are given 10x weight as compensation.
-- Complex models (fm, ffm, gbst) have more parameters to optimize, so before training them, you can use a linear model with l1 regularization to perform feature selection.
+- Complex models (fm, ffm, gbst) have more parameters to optimize, so before training them, you can use a linear model with l1 regularization to perform feature selection; then set "model.dict_path" to the linear model's dict (saved in the same directory as the model file).
- In GBDT, if the number of features is large, use feature-parallel training; if the amount of data is large, use data-parallel training. In most cases with data-parallel training, the smaller the number of feature bins, the faster the training, but the less accurate the model.
### Model Accuracy
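The tips in the hunk above all point at configuration keys ("feature.filter_threshold", "feature.feature_hash", "data.y_sampling", "model.dict_path", "model.need_dict"). Below is a minimal sketch of how these keys might sit together in a model config; the nesting, example values, and path are illustrative assumptions based on the key names quoted above, not verbatim ytk-learn syntax.

```
# Illustrative fragment only; key names come from the tips above,
# everything else (structure, values, path) is an assumption.
feature {
    # drop features that occur fewer than 4 times (threshold value is hypothetical)
    filter_threshold : 4
    # feature_hash : ...   # enable feature hashing to shrink a huge dimension (not for gbdt);
                           # exact sub-options are not shown here
},
data {
    # "label@rate": keep 10% of negative (label 0) samples at random;
    # kept negative samples are up-weighted 10x as compensation
    y_sampling : ["0@0.1"]
},
model {
    # reuse the dict written by an l1-regularized linear model for feature selection
    dict_path : "linear_out/model_dict",   # hypothetical path
    need_dict : true
}
```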