Commit

Merge pull request dmlc#288 from pommedeterresautee/master
small changes in RMarkdown
hetong007 committed May 5, 2015
2 parents 937a75b + 8aa739d commit 3b46977
Showing 1 changed file with 11 additions and 2 deletions: demo/kaggle-otto/understandingXGBoostModel.Rmd
@@ -71,7 +71,7 @@ train[1:6, ncol(train), with = F]
nameLastCol <- names(train)[ncol(train)]
```

The classes are provided as character strings in the `ncol(train)`th column, called `nameLastCol`. As you may know, **XGBoost** doesn't support anything other than numbers. So we will convert the classes to integers. Moreover, according to the documentation, they should start at 0.
The classes are provided as character strings in the **`r ncol(train)`**th column, called **`r nameLastCol`**. As you may know, **XGBoost** doesn't support anything other than numbers. So we will convert the classes to integers. Moreover, according to the documentation, they should start at 0.

For that purpose, we will:

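A minimal sketch of such a conversion, assuming the **OTTO** target labels are strings of the form `Class_1` … `Class_9` and that `train` is the `data.table` loaded above (the vignette's own steps may differ):

```r
# Extract the target column, strip the "Class_" prefix,
# and shift to the 0-based integer encoding XGBoost expects.
classes <- train[[nameLastCol]]                    # e.g. "Class_2"
y <- as.integer(gsub("Class_", "", classes)) - 1   # 0, 1, ..., 8
```
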
@@ -138,7 +138,7 @@ Model understanding
Feature importance
------------------

So far, we have built a model made of `nround` trees.
So far, we have built a model made of **`r nround`** trees.

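For context, a multiclass model like this is typically trained with a call along these lines; the parameter values and the objects `trainMatrix`, `y`, and `nround` are assumptions here rather than the vignette's exact code:

```r
library(xgboost)

numberOfClasses <- max(y) + 1   # 9 classes for the OTTO data

# "multi:softprob" grows one tree per class at each boosting round
# and returns a probability for each class; hyper-parameters are illustrative.
bst <- xgboost(data = trainMatrix, label = y,
               objective = "multi:softprob", num_class = numberOfClasses,
               nrounds = nround, eta = 0.3, max.depth = 6)
```
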
To build a tree, the dataset is divided recursively several times. At the end of the process, you get groups of observations (here, each observation is an **OTTO** product described by its properties).

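A sketch of how such an importance matrix is usually obtained with the **xgboost** R package; the feature names and the fitted booster `bst` are assumptions carried over from earlier in the vignette:

```r
# Per-feature importance (gain, cover, frequency) from the trained booster.
importance_matrix <- xgb.importance(feature_names = names(train)[-ncol(train)],
                                    model = bst)

# Bar plot of the most important features
# (xgb.plot.importance may require the Ckmeans.1d.dp package).
xgb.plot.importance(importance_matrix)
```
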
@@ -212,3 +212,12 @@ We are just displaying the first two trees here.

On simple models, the first two trees may be enough. Here, it might not be the case: we can see from the size of the trees that the interaction between features is complicated.
Besides, XGBoost generates `k` trees at each round for a `k`-class classification problem. Therefore, the two trees illustrated here are each trying to classify data into a different class.
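For reference, a hedged sketch of the plotting call behind such a figure (argument names follow the xgboost R API of that period; `bst` and the feature names are again assumptions):

```r
# Render only the first two trees of the boosted ensemble
# (plotting relies on the DiagrammeR package).
xgb.plot.tree(feature_names = names(train)[-ncol(train)],
              model = bst, n_first_tree = 2)
```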

Going deeper
============

There are a few resources you may want to check to go deeper:

* [xgboostPresentation.Rmd](https://github.com/dmlc/xgboost/blob/master/R-package/vignettes/xgboostPresentation.Rmd): general presentation
* [discoverYourData.Rmd](https://github.com/dmlc/xgboost/blob/master/R-package/vignettes/discoverYourData.Rmd): explaining feature analysis
* [Feature Importance Analysis with XGBoost in Tax audit](http://fr.slideshare.net/MichaelBENESTY/feature-importance-analysis-with-xgboost-in-tax-audit): use case
