fixing typo #43

Open
wants to merge 1 commit into master
15 changes: 12 additions & 3 deletions .ipynb_checkpoints/classification-checkpoint.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Explore Categorical Variables to Classify Company Status"
"# Exploratory Analysis #6: Explore Categorical Variables to Classify Company Status"
]
},
{
@@ -1606,7 +1606,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Remember I picked the hyperparameters of this model based on the AUC score, so let's look at our model performance bu looking at the ROC curve and AUC statistic."
"Remember I picked the hyperparameters of this model based on the AUC score, so let's look at our model performance but looking at the ROC curve and AUC statistic."
]
},
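A minimal sketch of the ROC/AUC evaluation this cell points to, assuming a fitted binary classifier `model` and held-out test data `X_test` / `y_test` (assumed names, not taken from this diff):

```python
# Sketch of an ROC/AUC evaluation cell; `model`, `X_test`, and
# `y_test` are assumed names, not taken from this notebook.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

y_score = model.predict_proba(X_test)[:, 1]   # probability of the positive class
fpr, tpr, _ = roc_curve(y_test, y_score)      # ROC curve points
auc = roc_auc_score(y_test, y_score)          # area under the ROC curve

plt.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```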
{
@@ -1693,7 +1693,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"I spent a lot of time reading sklearn documentation for this project on models I was familiar with, and several time I just theme suggest reading the ExtraTreesClassifier. I had not heard of an Extra Trees model before, so I did some research and read some of [this](http://www.montefiore.ulg.ac.be/~ernst/uploads/news/id63/extremely-randomized-trees.pdf) paper from 2006 introducing Extremeley Randomized Trees, or in sklearn speak ExtraTrees. \n",
"I spent a lot of time reading sklearn documentation for this project on models I was familiar with, and several times I came across something called an ExtraTreesClassifier. I had not heard of an Extra Trees model before, so I did some research and read some of [this](http://www.montefiore.ulg.ac.be/~ernst/uploads/news/id63/extremely-randomized-trees.pdf) paper from 2006 introducing Extremeley Randomized Trees, or in sklearn speak ExtraTrees. \n",
"\n",
"Extremeley Randomized Trees are very similar to Random Forests, and sklearn sets up the user input up in a very similar way. Extremeley Randomized Trees are similar to Random Forests in that they take a random rubsample of features, but drops the idea of bootstraping many trees samples in order to find optimal cut off points for feature node splits, and instead randomizes the picks a decision boundary at random for these node splits. This is why they are \"extremeley\" random, and as far as the bias-variance tradeoff is concerned, the model's increase in randomness seeks to further lower the variance of a model. Based on what I read, the performance of Extremley Randomized Trees can be similar, if not usually better, than that of a Random Forest."
]
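To make the comparison above concrete, here is a minimal, self-contained sketch of the two sklearn estimators side by side; the synthetic dataset and the hyperparameter values are illustrative assumptions, not values from the notebook:

```python
# Sketch: Random Forest vs. Extremely Randomized Trees in sklearn.
# Both subsample candidate features at each node split; ExtraTrees also
# draws split thresholds at random instead of searching for the best one,
# and by default skips bootstrap resampling of the training rows.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)  # bootstrap=True by default
et = ExtraTreesClassifier(n_estimators=100, random_state=0)    # bootstrap=False by default

print("Random Forest AUC:", cross_val_score(rf, X, y, cv=5, scoring="roc_auc").mean())
print("Extra Trees   AUC:", cross_val_score(et, X, y, cv=5, scoring="roc_auc").mean())
```

With all else equal, the extra randomness typically trades a little bias for lower variance, which is the effect the notebook is after.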
@@ -1705,6 +1705,15 @@
"I am going to use GridSearchCV to tune the hyperparameters of the model again, however, this time I am not tuning the number of trees, or n_estimators, of the model. There are several reasons for this, for one, in general as the number of trees increases, generally model accurately increases at the sake of increases runtime, and I this notebook already has several cells that can take a few minutes to run. In addition, as the number of estimators increases, so does generally the chance of overfitting, and I am purposely using ExtraTrees for its ability to reduce model variance. "
]
},
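A rough sketch of the tuning setup described above, holding n_estimators fixed while searching other hyperparameters; the parameter grid is an illustrative assumption, not the notebook's actual grid:

```python
# Sketch: tune an ExtraTreesClassifier with GridSearchCV while holding
# n_estimators fixed; the grid values are illustrative assumptions.
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "max_depth": [None, 5, 10],
    "max_features": ["sqrt", "log2"],
    "min_samples_split": [2, 5, 10],
}

search = GridSearchCV(
    ExtraTreesClassifier(n_estimators=100, random_state=0),  # tree count fixed
    param_grid,
    scoring="roc_auc",  # consistent with the AUC-based selection above
    cv=5,
)
# search.fit(X_train, y_train)  # X_train / y_train are assumed names
# print(search.best_params_, search.best_score_)
```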
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": []
+ },
{
"cell_type": "code",
"execution_count": 85,
13 changes: 11 additions & 2 deletions classification.ipynb
@@ -1606,7 +1606,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Remember I picked the hyperparameters of this model based on the AUC score, so let's look at our model performance bu looking at the ROC curve and AUC statistic."
"Remember I picked the hyperparameters of this model based on the AUC score, so let's look at our model performance but looking at the ROC curve and AUC statistic."
]
},
{
@@ -1693,7 +1693,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"I spent a lot of time reading sklearn documentation for this project on models I was familiar with, and several time I just theme suggest reading the ExtraTreesClassifier. I had not heard of an Extra Trees model before, so I did some research and read some of [this](http://www.montefiore.ulg.ac.be/~ernst/uploads/news/id63/extremely-randomized-trees.pdf) paper from 2006 introducing Extremeley Randomized Trees, or in sklearn speak ExtraTrees. \n",
"I spent a lot of time reading sklearn documentation for this project on models I was familiar with, and several times I came across something called an ExtraTreesClassifier. I had not heard of an Extra Trees model before, so I did some research and read some of [this](http://www.montefiore.ulg.ac.be/~ernst/uploads/news/id63/extremely-randomized-trees.pdf) paper from 2006 introducing Extremeley Randomized Trees, or in sklearn speak ExtraTrees. \n",
"\n",
"Extremeley Randomized Trees are very similar to Random Forests, and sklearn sets up the user input up in a very similar way. Extremeley Randomized Trees are similar to Random Forests in that they take a random rubsample of features, but drops the idea of bootstraping many trees samples in order to find optimal cut off points for feature node splits, and instead randomizes the picks a decision boundary at random for these node splits. This is why they are \"extremeley\" random, and as far as the bias-variance tradeoff is concerned, the model's increase in randomness seeks to further lower the variance of a model. Based on what I read, the performance of Extremley Randomized Trees can be similar, if not usually better, than that of a Random Forest."
]
@@ -1705,6 +1705,15 @@
"I am going to use GridSearchCV to tune the hyperparameters of the model again, however, this time I am not tuning the number of trees, or n_estimators, of the model. There are several reasons for this, for one, in general as the number of trees increases, generally model accurately increases at the sake of increases runtime, and I this notebook already has several cells that can take a few minutes to run. In addition, as the number of estimators increases, so does generally the chance of overfitting, and I am purposely using ExtraTrees for its ability to reduce model variance. "
]
},
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": []
+ },
{
"cell_type": "code",
"execution_count": 85,