docs(examples): add intro and install instructions
deepyaman committed May 30, 2024
1 parent b0e9ab0 commit 54eecde
Showing 1 changed file with 27 additions and 11 deletions.
38 changes: 27 additions & 11 deletions examples/Preprocess your data with recipes.ipynb
@@ -7,7 +7,33 @@
"source": [
"## Introduction\n",
"\n",
"..."
"In this article, we'll explore [`Recipe`](/reference/core.html#ibis_ml.Recipe)s, which are designed to help you preprocess your data before training your model. Recipes are built as a series of preprocessing steps, such as:\n",
"\n",
"- converting qualitative predictors to indicator variables (also known as dummy variables),\n",
"\n",
"- transforming data to be on a different scale (e.g., taking the logarithm of a variable),\n",
"\n",
"- transforming whole groups of predictors together,\n",
"\n",
"- extracting key features from raw variables (e.g., getting the day of the week out of a date variable),\n",
"\n",
"and so on. If you are familiar with [scikit-learn's dataset transformations](https://scikit-learn.org/stable/data_transforms.html), much of this will sound like what a transformer already does. Recipes can do many of the same things, but they can also scale your workloads on any [Ibis](https://ibis-project.org/)-supported backend. This article shows how to use recipes for modeling.\n",
"\n",
"To use the code in this article, you will need to install the following packages: Ibis, IbisML, and your modeling library.\n",
"\n",
"```bash\n",
"pip install 'ibis-framework[duckdb,examples]' ibis-ml [scikit-learn | 'xgboost[scikit-learn]' | skorch torch]\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "4377a22e-f751-4a12-b07a-c6b31e3c48d9",
"metadata": {},
"source": [
"## The New York City flight data\n",
"\n",
"Let's use the [nycflights13 data](https://github.com/hadley/nycflights13) to predict whether a plane arrives more than 30 minutes late. This data set contains information on 325,819 flights departing near New York City in 2013. Let's start by loading the data and making a few changes to the variables:"
]
},
{
@@ -203,16 +229,6 @@
"weather"
]
},
{
"cell_type": "markdown",
"id": "4377a22e-f751-4a12-b07a-c6b31e3c48d9",
"metadata": {},
"source": [
"## The New York City flight data\n",
"\n",
"Let's use the [nycflights13 data](https://github.com/hadley/nycflights13) to predict whether a plane arrives more than 30 minutes late. This data set contains information on 325,819 flights departing near New York City in 2013. Let's start by loading the data and making a few changes to the variables:"
]
},
{
"cell_type": "code",
"execution_count": 6,