
Commit

Merge pull request #1113 from DilumAluthge/dpa/repo-transfer
Updates now that MLJ.jl has been moved to the JuliaAI GitHub organization
ablaom authored May 5, 2024
2 parents f0ddfd9 + 5e02d5a commit ec5af95
Showing 9 changed files with 833 additions and 707 deletions.
347 changes: 237 additions & 110 deletions examples/lightning_tour/lightning_tour.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion examples/lightning_tour/lightning_tour.jl
@@ -1,7 +1,7 @@
# # Lightning tour of MLJ

# *For a more elementary introduction to MLJ, see [Getting
-# Started](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/).*
+# Started](https://juliaai.github.io/MLJ.jl/dev/getting_started/).*

# **Note.** Be sure this file has not been separated from the
# accompanying Project.toml and Manifest.toml files, which should not
24 changes: 12 additions & 12 deletions examples/telco/notebook.ipynb
@@ -12,7 +12,7 @@
"metadata": {},
"source": [
"An application of the [MLJ\n",
-"toolbox](https://alan-turing-institute.github.io/MLJ.jl/dev/) to the\n",
+"toolbox](https://juliaai.github.io/MLJ.jl/dev/) to the\n",
"Telco Customer Churn dataset, aimed at practicing data scientists\n",
"new to MLJ (Machine Learning in Julia). This tutorial does not\n",
"cover exploratory data analysis."
@@ -31,9 +31,9 @@
"metadata": {},
"source": [
"For other MLJ learning resources see the [Learning\n",
-"MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_mlj/)\n",
+"MLJ](https://juliaai.github.io/MLJ.jl/dev/learning_mlj/)\n",
"section of the\n",
-"[manual](https://alan-turing-institute.github.io/MLJ.jl/dev/)."
+"[manual](https://juliaai.github.io/MLJ.jl/dev/)."
]
},
{
@@ -132,7 +132,7 @@
"the notebook, package instantiation and pre-compilation may take a\n",
"minute or so to complete. **This step will fail** if the [correct\n",
"Manifest.toml and Project.toml\n",
-"files](https://github.com/alan-turing-institute/MLJ.jl/tree/dev/examples/telco)\n",
+"files](https://github.com/JuliaAI/MLJ.jl/tree/dev/examples/telco)\n",
"are not in the same directory as this notebook."
]
},
@@ -203,7 +203,7 @@
"metadata": {},
"source": [
"This section is a condensed adaptation of the [Getting Started\n",
-"example](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)\n",
+"example](https://juliaai.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)\n",
"in the MLJ documentation."
]
},
@@ -448,7 +448,7 @@
"metadata": {},
"source": [
"A machine stores some other information enabling [warm\n",
-"restart](https://alan-turing-institute.github.io/MLJ.jl/dev/machines/#Warm-restarts)\n",
+"restart](https://juliaai.github.io/MLJ.jl/dev/machines/#Warm-restarts)\n",
"for some models, but we won't go into that here. You are allowed to\n",
"access and mutate the `model` parameter:"
]
@@ -1140,7 +1140,7 @@
"metadata": {},
"source": [
"For tools helping us to identify suitable models, see the [Model\n",
-"Search](https://alan-turing-institute.github.io/MLJ.jl/dev/model_search/#model_search)\n",
+"Search](https://juliaai.github.io/MLJ.jl/dev/model_search/#model_search)\n",
"section of the manual. We will build a gradient tree-boosting model,\n",
"a popular first choice for structured data like we have here. Model\n",
"code is contained in a third-party package called\n",
@@ -1379,7 +1379,7 @@
"source": [
"Note that the component models appear as hyper-parameters of\n",
"`pipe`. Pipelines are an implementation of a more general [model\n",
-"composition](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/#Composing-Models)\n",
+"composition](https://juliaai.github.io/MLJ.jl/dev/composing_models/#Composing-Models)\n",
"interface provided by MLJ that advanced users may want to learn about."
]
},
@@ -2152,7 +2152,7 @@
"metadata": {},
"source": [
"We choose a `StratifiedCV` resampling strategy; the complete list of options is\n",
-"[here](https://alan-turing-institute.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies)."
+"[here](https://juliaai.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies)."
]
},
{
@@ -2393,7 +2393,7 @@
"metadata": {},
"source": [
"First, we select appropriate controls from [this\n",
-"list](https://alan-turing-institute.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):"
+"list](https://juliaai.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):"
]
},
{
@@ -2559,7 +2559,7 @@
"wanting to visualize the effect of changes to a *single*\n",
"hyper-parameter (which could be an iteration parameter). See, for\n",
"example, [this section of the\n",
-"manual](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_curves/)\n",
+"manual](https://juliaai.github.io/MLJ.jl/dev/learning_curves/)\n",
"or [this\n",
"tutorial](https://github.com/ablaom/MLJTutorial.jl/blob/dev/notebooks/04_tuning/notebook.ipynb)."
]
@@ -2689,7 +2689,7 @@
"metadata": {},
"source": [
"Next, we choose an optimization strategy from [this\n",
-"list](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):"
+"list](https://juliaai.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):"
]
},
{
25 changes: 12 additions & 13 deletions examples/telco/notebook.jl
@@ -1,7 +1,7 @@
# # MLJ for Data Scientists in Two Hours

# An application of the [MLJ
-# toolbox](https://alan-turing-institute.github.io/MLJ.jl/dev/) to the
+# toolbox](https://juliaai.github.io/MLJ.jl/dev/) to the
# Telco Customer Churn dataset, aimed at practicing data scientists
# new to MLJ (Machine Learning in Julia). This tutorial does not
# cover exploratory data analysis.
@@ -10,9 +10,9 @@
# deep-learning).

# For other MLJ learning resources see the [Learning
-# MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_mlj/)
+# MLJ](https://juliaai.github.io/MLJ.jl/dev/learning_mlj/)
# section of the
-# [manual](https://alan-turing-institute.github.io/MLJ.jl/dev/).
+# [manual](https://juliaai.github.io/MLJ.jl/dev/).

# **Topics covered**: Grabbing and preparing a dataset, basic
# fit/predict workflow, constructing a pipeline to include data
@@ -78,7 +78,7 @@
# the notebook, package instantiation and pre-compilation may take a
# minute or so to complete. **This step will fail** if the [correct
# Manifest.toml and Project.toml
-# files](https://github.com/alan-turing-institute/MLJ.jl/tree/dev/examples/telco)
+# files](https://github.com/JuliaAI/MLJ.jl/tree/dev/examples/telco)
# are not in the same directory as this notebook.

using Pkg
@@ -94,7 +94,7 @@ Pkg.instantiate()
# don't fully grasp should become clearer in the Telco study.

# This section is a condensed adaptation of the [Getting Started
-# example](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)
+# example](https://juliaai.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)
# in the MLJ documentation.
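
# (Aside, not part of this commit: a minimal sketch of the fit/predict
# workflow this section condenses, assuming MLJ and DecisionTree.jl are
# installed; the model choice is illustrative.)

X, y = @load_iris;                                    # built-in toy dataset
Tree = @load DecisionTreeClassifier pkg=DecisionTree  # load model code
mach = machine(Tree(), X, y)                          # bind model to data
train_rows, test_rows = partition(eachindex(y), 0.7, rng=123)
fit!(mach, rows=train_rows)
yhat = predict(mach, rows=test_rows)                  # probabilistic predictions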

# First, using the built-in iris dataset, we load and inspect the features
@@ -137,7 +137,7 @@ fit!(mach, rows=train_rows)
fitted_params(mach)

# A machine stores some other information enabling [warm
-# restart](https://alan-turing-institute.github.io/MLJ.jl/dev/machines/#Warm-restarts)
+# restart](https://juliaai.github.io/MLJ.jl/dev/machines/#Warm-restarts)
# for some models, but we won't go into that here. You are allowed to
# access and mutate the `model` parameter:
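
# (Aside, not part of this commit: a sketch of the mutate-and-refit pattern,
# continuing the illustrative iris machine above; `max_depth` is a
# DecisionTreeClassifier hyperparameter.)

mach.model.max_depth = 3     # mutate a hyperparameter of the bound model
fit!(mach, rows=train_rows)  # refit; a warm restart where the model supports it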

@@ -292,7 +292,7 @@ const ytest, Xtest = unpack(df_test, ==(:Churn), !=(:customerID));
# > Introduces: `@load`, `input_scitype`, `target_scitype`

# For tools helping us to identify suitable models, see the [Model
-# Search](https://alan-turing-institute.github.io/MLJ.jl/dev/model_search/#model_search)
+# Search](https://juliaai.github.io/MLJ.jl/dev/model_search/#model_search)
# section of the manual. We will build a gradient tree-boosting model,
# a popular first choice for structured data like we have here. Model
# code is contained in a third-party package called
@@ -340,7 +340,7 @@ pipe = ContinuousEncoder() |> booster

# Note that the component models appear as hyperparameters of
# `pipe`. Pipelines are an implementation of a more general [model
-# composition](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/#Composing-Models)
+# composition](https://juliaai.github.io/MLJ.jl/dev/composing_models/#Composing-Models)
# interface provided by MLJ that advanced users may want to learn about.
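
# (Aside, not part of this commit: a sketch of how component hyperparameters
# surface, given `pipe = ContinuousEncoder() |> booster` with the
# EvoTreeClassifier booster used in this tutorial.)

pipe.evo_tree_classifier.max_depth = 3  # set a nested hyperparameter
pipe.continuous_encoder                 # the encoder is a component, too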

# From the above display, we see that component model hyperparameters
@@ -464,7 +464,7 @@ plot!([0, 1], [0, 1], linewidth=2, linestyle=:dash, color=:black)
# `acceleration=CPUThreads()` to parallelize the computation.

# We choose a `StratifiedCV` resampling strategy; the complete list of options is
-# [here](https://alan-turing-institute.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies).
+# [here](https://juliaai.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies).

e_pipe = evaluate(pipe, X, y,
resampling=StratifiedCV(nfolds=6, rng=123),
@@ -535,7 +535,7 @@ pipe2 = ContinuousEncoder() |>
# [MLJFlux.jl](https://github.com/FluxML/MLJFlux.jl).

# First, we select appropriate controls from [this
-# list](https://alan-turing-institute.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):
+# list](https://juliaai.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):

controls = [
Step(1), # to increment iteration parameter (`pipe.nrounds`)
@@ -580,7 +580,7 @@ fit!(mach_iterated_pipe);
# wanting to visualize the effect of changes to a *single*
# hyperparameter (which could be an iteration parameter). See, for
# example, [this section of the
-# manual](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_curves/)
+# manual](https://juliaai.github.io/MLJ.jl/dev/learning_curves/)
# or [this
# tutorial](https://github.com/ablaom/MLJTutorial.jl/blob/dev/notebooks/04_tuning/notebook.ipynb).
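
# (Aside, not part of this commit: a sketch of typical `learning_curve`
# usage; the range over the booster's `nrounds` is an assumption.)

r = range(pipe, :(evo_tree_classifier.nrounds), lower=10, upper=200)
curve = learning_curve(machine(pipe, X, y),
                       range=r,
                       resampling=Holdout(fraction_train=0.7),
                       measure=brier_loss)
# plotting curve.parameter_values against curve.measurements gives the curve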

@@ -618,7 +618,7 @@ r2 = range(iterated_pipe, p2, lower=2, upper=6)
# and `upper`.

# Next, we choose an optimization strategy from [this
-# list](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):
+# list](https://juliaai.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):

tuning = RandomSearch(rng=123)
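
# (Aside, not part of this commit: a sketch of wiring the strategy into a
# self-tuning wrapper; the keyword values here are assumptions.)

tuned_pipe = TunedModel(model=iterated_pipe,
                        tuning=tuning,
                        range=[r1, r2],
                        resampling=StratifiedCV(nfolds=6, rng=123),
                        measure=brier_loss,
                        n=40)
mach_tuned = machine(tuned_pipe, X, y)  # fit!(mach_tuned) runs the search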

Expand Down Expand Up @@ -755,4 +755,3 @@ ŷ_basic = predict(mach_basic, Xtest);
auc(ŷ_basic, ytest),
accuracy(mode.(ŷ_basic), ytest)
)

36 changes: 18 additions & 18 deletions examples/telco/notebook.pluto.jl
@@ -10,7 +10,7 @@ md"# MLJ for Data Scientists in Two Hours"
# ╔═╡ 8a6670b8-96a8-4a5d-b795-033f6f2a0674
md"""
An application of the [MLJ
-toolbox](https://alan-turing-institute.github.io/MLJ.jl/dev/) to the
+toolbox](https://juliaai.github.io/MLJ.jl/dev/) to the
Telco Customer Churn dataset, aimed at practicing data scientists
new to MLJ (Machine Learning in Julia). This tutorial does not
cover exploratory data analysis.
@@ -25,9 +25,9 @@ deep-learning).
# ╔═╡ b04c4790-59e0-42a3-af2a-25235e544a31
md"""
For other MLJ learning resources see the [Learning
-MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_mlj/)
+MLJ](https://juliaai.github.io/MLJ.jl/dev/learning_mlj/)
section of the
-[manual](https://alan-turing-institute.github.io/MLJ.jl/dev/).
+[manual](https://juliaai.github.io/MLJ.jl/dev/).
"""

# ╔═╡ 4eb8dff4-c23a-4b41-8af5-148d95ea2900
@@ -106,7 +106,7 @@ used to develop this tutorial. If this is your first time running
the notebook, package instantiation and pre-compilation may take a
minute or so to complete. **This step will fail** if the [correct
Manifest.toml and Project.toml
-files](https://github.com/alan-turing-institute/MLJ.jl/tree/dev/examples/telco)
+files](https://github.com/JuliaAI/MLJ.jl/tree/dev/examples/telco)
are not in the same directory as this notebook.
"""

@@ -131,7 +131,7 @@ don't fully grasp should become clearer in the Telco study.
# ╔═╡ 33ca287e-8cba-47d1-a0de-1721c1bc2df2
md"""
This section is a condensed adaptation of the [Getting Started
-example](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)
+example](https://juliaai.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)
in the MLJ documentation.
"""

@@ -197,7 +197,7 @@ end
# ╔═╡ 0f978839-cc95-4c3a-8a29-32f11452654a
md"""
A machine stores some other information enabling [warm
-restart](https://alan-turing-institute.github.io/MLJ.jl/dev/machines/#Warm-restarts)
+restart](https://juliaai.github.io/MLJ.jl/dev/machines/#Warm-restarts)
for some models, but we won't go into that here. You are allowed to
access and mutate the `model` parameter:
"""
@@ -324,7 +324,7 @@ begin
return x
end
end

df0.TotalCharges = fix_blanks(df0.TotalCharges);
end

@@ -424,7 +424,7 @@ md"> Introduces: `@load`, `input_scitype`, `target_scitype`"
# ╔═╡ f97969e2-c15c-42cf-a6fa-eaf14df5d44b
md"""
For tools helping us to identify suitable models, see the [Model
-Search](https://alan-turing-institute.github.io/MLJ.jl/dev/model_search/#model_search)
+Search](https://juliaai.github.io/MLJ.jl/dev/model_search/#model_search)
section of the manual. We will build a gradient tree-boosting model,
a popular first choice for structured data like we have here. Model
code is contained in a third-party package called
@@ -497,7 +497,7 @@ pipe = ContinuousEncoder() |> booster
md"""
Note that the component models appear as hyperparameters of
`pipe`. Pipelines are an implementation of a more general [model
-composition](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/#Composing-Models)
+composition](https://juliaai.github.io/MLJ.jl/dev/composing_models/#Composing-Models)
interface provided by MLJ that advanced users may want to learn about.
"""

@@ -693,7 +693,7 @@ observation space, for a total of 18 folds) and set
# ╔═╡ 562887bb-b7fb-430f-b61c-748aec38e674
md"""
We choose a `StratifiedCV` resampling strategy; the complete list of options is
-[here](https://alan-turing-institute.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies).
+[here](https://juliaai.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies).
"""

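# (Aside, not part of this commit: a sketch of the kind of `evaluate` call
# made in the collapsed cell below; the measure list is an assumption.)

e_pipe = evaluate(pipe, X, y,
                  resampling=StratifiedCV(nfolds=6, rng=123),
                  measures=[brier_loss, auc],
                  acceleration=CPUThreads())
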
# ╔═╡ f9be989e-2604-44c2-9727-ed822e4fd85d
@@ -734,7 +734,7 @@ begin
table = (measure=measure, measurement=measurement)
return DataFrames.DataFrame(table)
end

const confidence_intervals_basic_model = confidence_intervals(e_pipe)
end

@@ -753,7 +753,7 @@ with low feature importance, to speed up later optimization:
# ╔═╡ cdfe840d-4e87-467f-b582-dfcbeb05bcc5
begin
unimportant_features = filter(:importance => <(0.005), feature_importance_table).feature

pipe2 = ContinuousEncoder() |>
FeatureSelector(features=unimportant_features, ignore=true) |> booster
end
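
# (Note, not part of this commit: with `ignore=true` the `FeatureSelector`
# drops the listed features rather than keeping them.)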
@@ -790,7 +790,7 @@ eg, the neural network models provided by
# ╔═╡ 8fc99d35-d8cc-455f-806e-1bc580dc349d
md"""
First, we select appropriate controls from [this
-list](https://alan-turing-institute.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):
+list](https://juliaai.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):
"""

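# (Aside, not part of this commit: a sketch of a plausible control list;
# apart from `Step`, which appears in the visible diff, the choices are
# assumptions.)

controls = [
    Step(1),             # train one more iteration at a time
    NumberSinceBest(4),  # stop after 4 steps without improvement
    TimeLimit(2/3600),   # stop after about two seconds
    InvalidValue(),      # stop if the loss becomes NaN or infinite
]
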
# ╔═╡ 29f33708-4a82-4acc-9703-288eae064e2a
@@ -857,7 +857,7 @@ here is the `learning_curve` function, which can be useful when
wanting to visualize the effect of changes to a *single*
hyperparameter (which could be an iteration parameter). See, for
example, [this section of the
-manual](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_curves/)
+manual](https://juliaai.github.io/MLJ.jl/dev/learning_curves/)
or [this
tutorial](https://github.com/ablaom/MLJTutorial.jl/blob/dev/notebooks/04_tuning/notebook.ipynb).
"""
@@ -898,7 +898,7 @@ show(iterated_pipe, 2)
begin
p1 = :(model.evo_tree_classifier.η)
p2 = :(model.evo_tree_classifier.max_depth)

r1 = range(iterated_pipe, p1, lower=-2, upper=-0.5, scale=x->10^x)
r2 = range(iterated_pipe, p2, lower=2, upper=6)
end
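
# (Note, not part of this commit: with `scale=x->10^x` the range is sampled
# in the exponent, so η is effectively drawn between 10^-2 = 0.01 and
# 10^-0.5 ≈ 0.32.)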
@@ -912,7 +912,7 @@ and `upper`.
# ╔═╡ af3023e6-920f-478d-af76-60dddeecbe6c
md"""
Next, we choose an optimization strategy from [this
-list](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):
+list](https://juliaai.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):
"""

# ╔═╡ 93c17a9b-b49c-4780-9074-c069a0e97d7e
@@ -1105,9 +1105,9 @@ md"For comparison, here's the performance for the basic pipeline model"
begin
mach_basic = machine(pipe, X, y)
fit!(mach_basic, verbosity=0)

ŷ_basic = predict(mach_basic, Xtest);

@info("Basic model measurements on test set:",
brier_loss(ŷ_basic, ytest) |> mean,
auc(ŷ_basic, ytest),
