Commit

docs: acknowledge original source for the tutorial (#120)
deepyaman authored Jun 20, 2024
1 parent 26b704a commit f5eff58
Showing 5 changed files with 29 additions and 18 deletions.
3 changes: 3 additions & 0 deletions docs/tutorial/_acknowledgments.md
@@ -0,0 +1,3 @@
+## Acknowledgments
+
+This tutorial is derived from the [tidymodels article of the same name](https://www.tidymodels.org/start/recipes/). The transformation logic is very similar, and much of the text is copied verbatim.
6 changes: 4 additions & 2 deletions docs/tutorial/pytorch.qmd
@@ -28,7 +28,7 @@ pip install 'ibis-framework[duckdb,examples]' ibis-ml skorch torch

## The New York City flight data

-Let's use the [nycflights13 data](https://github.com/hadley/nycflights13) to predict whether a plane arrives more than 30 minutes late. This data set contains information on 325,819 flights departing near New York City in 2013. Let's start by loading the data and making a few changes to the variables:
+Let's use the [nycflights13 data](https://github.com/hadley/nycflights13) to predict whether a plane arrives more than 30 minutes late. This dataset contains information on 325,819 flights departing near New York City in 2013. Let's start by loading the data and making a few changes to the variables:

```{python}
#| output: false
@@ -107,7 +107,7 @@ flight_data = (
flight_data
```

-We can see that about 16% of the flights in this data set arrived more than 30 minutes late.
+We can see that about 16% of the flights in this dataset arrived more than 30 minutes late.

```{python}
flight_data.arr_delay.value_counts().rename(n="arr_delay_count").mutate(
@@ -240,3 +240,5 @@ X_test = test_data.drop("arr_delay")
y_test = test_data.arr_delay
pipe.score(X_test, y_test)
```

+{{< include _acknowledgments.md >}}
12 changes: 5 additions & 7 deletions docs/tutorial/scikit-learn.qmd
@@ -28,7 +28,7 @@ pip install 'ibis-framework[duckdb,examples]' ibis-ml scikit-learn

## The New York City flight data

-Let's use the [nycflights13 data](https://github.com/hadley/nycflights13) to predict whether a plane arrives more than 30 minutes late. This data set contains information on 325,819 flights departing near New York City in 2013. Let's start by loading the data and making a few changes to the variables:
+Let's use the [nycflights13 data](https://github.com/hadley/nycflights13) to predict whether a plane arrives more than 30 minutes late. This dataset contains information on 325,819 flights departing near New York City in 2013. Let's start by loading the data and making a few changes to the variables:

```{python}
#| output: false
@@ -81,8 +81,7 @@ weather
flight_data = (
flights.mutate(
# Convert the arrival delay to a factor
-# By default, PyTorch expects the target to have a Long datatype
-arr_delay=ibis.ifelse(flights.arr_delay >= 30, 1, 0).cast("int64"),
+arr_delay=ibis.ifelse(flights.arr_delay >= 30, 1, 0),
# We will use the date (not date-time) in the recipe below
date=flights.time_hour.date(),
)
@@ -107,7 +106,7 @@ flight_data = (
flight_data
```

-We can see that about 16% of the flights in this data set arrived more than 30 minutes late.
+We can see that about 16% of the flights in this dataset arrived more than 30 minutes late.

```{python}
flight_data.arr_delay.value_counts().rename(n="arr_delay_count").mutate(
@@ -167,9 +166,6 @@ flights_rec = ml.Recipe(
ml.DropZeroVariance(ml.everything()),
ml.MutateAt("dep_time", ibis._.hour() * 60 + ibis._.minute()),
ml.MutateAt(ml.timestamp(), ibis._.epoch_seconds()),
-# By default, PyTorch requires that the type of `X` is `np.float32`.
-# https://discuss.pytorch.org/t/mat1-and-mat2-must-have-the-same-dtype-but-got-double-and-float/197555/2
-ml.Cast(ml.numeric(), "float32"),
)
```

@@ -211,3 +207,5 @@ X_test = test_data.drop("arr_delay")
y_test = test_data.arr_delay
pipe.score(X_test, y_test)
```

+{{< include _acknowledgments.md >}}
12 changes: 5 additions & 7 deletions docs/tutorial/xgboost.qmd
@@ -28,7 +28,7 @@ pip install 'ibis-framework[duckdb,examples]' ibis-ml 'xgboost[scikit-learn]'

## The New York City flight data

-Let's use the [nycflights13 data](https://github.com/hadley/nycflights13) to predict whether a plane arrives more than 30 minutes late. This data set contains information on 325,819 flights departing near New York City in 2013. Let's start by loading the data and making a few changes to the variables:
+Let's use the [nycflights13 data](https://github.com/hadley/nycflights13) to predict whether a plane arrives more than 30 minutes late. This dataset contains information on 325,819 flights departing near New York City in 2013. Let's start by loading the data and making a few changes to the variables:

```{python}
#| output: false
@@ -81,8 +81,7 @@ weather
flight_data = (
flights.mutate(
# Convert the arrival delay to a factor
-# By default, PyTorch expects the target to have a Long datatype
-arr_delay=ibis.ifelse(flights.arr_delay >= 30, 1, 0).cast("int64"),
+arr_delay=ibis.ifelse(flights.arr_delay >= 30, 1, 0),
# We will use the date (not date-time) in the recipe below
date=flights.time_hour.date(),
)
@@ -107,7 +106,7 @@ flight_data = (
flight_data
```

-We can see that about 16% of the flights in this data set arrived more than 30 minutes late.
+We can see that about 16% of the flights in this dataset arrived more than 30 minutes late.

```{python}
flight_data.arr_delay.value_counts().rename(n="arr_delay_count").mutate(
@@ -167,9 +166,6 @@ flights_rec = ml.Recipe(
ml.DropZeroVariance(ml.everything()),
ml.MutateAt("dep_time", ibis._.hour() * 60 + ibis._.minute()),
ml.MutateAt(ml.timestamp(), ibis._.epoch_seconds()),
-# By default, PyTorch requires that the type of `X` is `np.float32`.
-# https://discuss.pytorch.org/t/mat1-and-mat2-must-have-the-same-dtype-but-got-double-and-float/197555/2
-ml.Cast(ml.numeric(), "float32"),
)
```

@@ -211,3 +207,5 @@ X_test = test_data.drop("arr_delay")
y_test = test_data.arr_delay
pipe.score(X_test, y_test)
```

+{{< include _acknowledgments.md >}}
14 changes: 12 additions & 2 deletions examples/Preprocess your data with recipes.ipynb
@@ -33,7 +33,7 @@
"source": [
"## The New York City flight data\n",
"\n",
-"Let's use the [nycflights13 data](https://github.com/hadley/nycflights13) to predict whether a plane arrives more than 30 minutes late. This data set contains information on 325,819 flights departing near New York City in 2013. Let's start by loading the data and making a few changes to the variables:"
+"Let's use the [nycflights13 data](https://github.com/hadley/nycflights13) to predict whether a plane arrives more than 30 minutes late. This dataset contains information on 325,819 flights departing near New York City in 2013. Let's start by loading the data and making a few changes to the variables:"
]
},
{
@@ -317,7 +317,7 @@
"id": "722b2213-3b84-4f03-9006-59bf72591613",
"metadata": {},
"source": [
-"We can see that about 16% of the flights in this data set arrived more than 30 minutes late."
+"We can see that about 16% of the flights in this dataset arrived more than 30 minutes late."
]
},
{
@@ -1221,6 +1221,16 @@
"y_test = test_data.arr_delay\n",
"pipe.score(X_test, y_test)"
]
+},
+{
+"cell_type": "markdown",
+"id": "cc21b842-b85c-4ed9-af03-1feace909172",
+"metadata": {},
+"source": [
+"## Acknowledgments\n",
+"\n",
+"This tutorial is derived from the [tidymodels article of the same name](https://www.tidymodels.org/start/recipes/). The transformation logic is very similar, and much of the text is copied verbatim."
+]
}
],
"metadata": {
