Merge pull request #767 from alan-turing-institute/dev
For a 0.16.1 release
ablaom authored Apr 1, 2021
2 parents 092522f + e258afb commit f416f06
Showing 16 changed files with 1,472 additions and 88 deletions.
45 changes: 30 additions & 15 deletions ORGANIZATION.md
@@ -41,20 +41,28 @@ its conventional use, are marked with a ⟂ symbol:
detailed description of MLJBase's contents.

* [MLJModels.jl](https://github.com/alan-turing-institute/MLJModels.jl)
-hosts the MLJ **registry**, which contains metadata on all the models
-the MLJ user can search and load from MLJ. Moreover, it provides
-the functionality for **loading model code** from MLJ on
-demand. Finally, it furnishes **model interfaces** for a number of third
-party model providers not implementing interfaces natively, such as
-[DecisionTree.jl](https://github.com/bensadeghi/DecisionTree.jl),
-[ScikitLearn.jl](https://github.com/cstjean/ScikitLearn.jl) or
-[XGBoost.jl](https://github.com/dmlc/XGBoost.jl). These packages are
-*not* imported by MLJModels and are not dependencies from the
-point-of-view of current package management.
+hosts the MLJ **registry**, which contains metadata on all the
+models the MLJ user can search and load from MLJ. Moreover, it
+provides the functionality for **loading model code** from MLJ on
+demand. Finally, it furnishes some commonly used transformers for
+data pre-processing, such as `ContinuousEncoder` and `Standardizer`.

* [MLJTuning.jl](https://github.com/alan-turing-institute/MLJTuning.jl)
-provides MLJ's interface for hyper-parameter tuning strategies, and
-selected implementations, such as grid search.
+provides MLJ's `TunedModel` wrapper for hyper-parameter
+optimization, including the extendable API for tuning strategies,
+and selected in-house implementations, such as `Grid` and
+`RandomSearch`.

+* [MLJIteration.jl](https://github.com/JuliaAI/MLJIteration.jl)
+provides the `IteratedModel` wrapper for controlling iterative
+models (snapshots, early stopping criteria, etc.)
+
+* [MLJSerialization.jl](https://github.com/JuliaAI/MLJSerialization.jl)
+provides functionality for saving MLJ machines to file
+
+* [MLJOpenML.jl](https://github.com/JuliaAI/MLJOpenML.jl) provides
+integration with the [OpenML](https://www.openml.org) data science
+exchange platform

* (⟂)
[MLJLinearModels.jl](https://github.com/alan-turing-institute/MLJLinearModels.jl)
@@ -68,7 +76,7 @@ its conventional use, are marked with a ⟂ symbol:

* (⟂)
[ScientificTypes.jl](https://github.com/alan-turing-institute/ScientificTypes.jl)
-is a tiny, zero-dependency package providing "scientific" types,
+is an ultra lightweight package providing "scientific" types,
such as `Continuous`, `OrderedFactor`, `Image` and `Table`. Its
purpose is to formalize conventions around the scientific
interpretation of ordinary machine types, such as `Float32` and
@@ -78,6 +86,13 @@ its conventional use, are marked with a ⟂ symbol:
[MLJScientificTypes.jl](https://github.com/alan-turing-institute/MLJScientificTypes.jl)
articulates MLJ's own convention for the scientific interpretation of
data.

+* (⟂)
+[StatisticalTraits.jl](https://github.com/alan-turing-institute/StatisticalTraits.jl)
+is an ultra lightweight package defining fall-back implementations for
+a collection of traits possessed by statistical objects.
+
-* [MLJTutorials](https://github.com/alan-turing-institute/MLJTutorials)
-collects tutorials on how to use MLJ.
+* (⟂)
+[DataScienceTutorials](https://github.com/alan-turing-institute/DataScienceTutorials.jl)
+collects tutorials on how to use MLJ, which are deployed
+[here](https://alan-turing-institute.github.io/DataScienceTutorials.jl/)
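To make the division of labor above concrete, here is a minimal sketch combining registry loading, an MLJModels transformer, and an MLJTuning `TunedModel`, assuming MLJ 0.16.1 with DecisionTree.jl installed (the model choice and hyper-parameter range are illustrative, not taken from the diff):

```julia
using MLJ

# MLJModels registry: search metadata and load third-party model code on demand
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0

# one of the commonly used transformers furnished by MLJModels
X, y = @load_iris
stand = fit!(machine(Standardizer(), X))
Xstd = transform(stand, X)

# MLJTuning's `TunedModel` wrapper with the in-house `Grid` strategy
tree = Tree()
r = range(tree, :max_depth, lower=1, upper=6)
tuned_tree = TunedModel(model=tree, tuning=Grid(), range=r, measure=log_loss)
fit!(machine(tuned_tree, Xstd, y))
```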
14 changes: 10 additions & 4 deletions Project.toml
@@ -1,7 +1,7 @@
name = "MLJ"
uuid = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
authors = ["Anthony D. Blaom <[email protected]>"]
version = "0.16.0"
version = "0.16.1"

[deps]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
@@ -10,8 +10,11 @@ Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
MLJIteration = "614be32b-d00c-4edb-bd02-1eb411ab5e55"
MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
MLJOpenML = "cbea4545-8c96-4583-ad3a-44078d60d369"
MLJScientificTypes = "2e2323e0-db8b-457b-ae0d-bdfb3bc63afd"
MLJSerialization = "17bed46d-0ab5-4cd4-b792-a5c4b8547c6d"
MLJTuning = "03970b2e-30c4-11ea-3135-d1576263f10f"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
@@ -24,19 +27,22 @@ Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
CategoricalArrays = "^0.8,^0.9"
ComputationalResources = "^0.3"
Distributions = "^0.21,^0.22,^0.23, 0.24"
MLJBase = "^0.17"
MLJBase = "^0.18"
MLJIteration = "^0.2"
MLJModels = "^0.14"
MLJOpenML = "^1"
MLJScientificTypes = "^0.4.1"
MLJSerialization = "^1.1"
MLJTuning = "^0.6"
ProgressMeter = "^1.1"
StatsBase = "^0.32,^0.33"
Tables = "^0.2,^1.0"
julia = "^1.1"

[extras]
-NearestNeighbors = "b8a86587-4115-5ab1-83bc-aa920d37bbce"
+NearestNeighborModels = "636a865e-7cf4-491e-846c-de09b730eb36"
StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["NearestNeighbors", "StableRNGs", "Test"]
test = ["NearestNeighborModels", "StableRNGs", "Test"]
5 changes: 3 additions & 2 deletions docs/make.jl
@@ -37,14 +37,15 @@ pages = [
"Tuning Models" => "tuning_models.md",
"Learning Curves" => "learning_curves.md",
"Transformers and other unsupervised models" => "transformers.md",
"Controlling Iterative Models" => "controlling_iterative_models.md",
"Composing Models" => "composing_models.md",
"Controlling Iterative Models" => "controlling_iterative_models.md",
"Homogeneous Ensembles" => "homogeneous_ensembles.md",
"Generating Synthetic Data" => "generating_synthetic_data.md",
"OpenML Integration" => "openml_integration.md",
"Acceleration and Parallelism" => "acceleration_and_parallelism.md",
"Simple User Defined Models" => "simple_user_defined_models.md",
"Quick-Start Guide to Adding Models" => "quick_start_guide_to_adding_models.md",
"Quick-Start Guide to Adding Models" =>
"quick_start_guide_to_adding_models.md",
"Adding Models for General Use" => "adding_models_for_general_use.md",
"Benchmarking" => "benchmarking.md",
"Internals" => "internals.md",
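For context, the `pages` vector being edited above is what Documenter.jl consumes; a minimal sketch of the surrounding call (the `sitename` and `modules` values are assumptions, not shown in this diff, and the one-entry `pages` stands in for the full list above):

```julia
using Documenter, MLJ

pages = ["Tuning Models" => "tuning_models.md"]  # stand-in for the full list

makedocs(sitename="MLJ",
         modules=[MLJ],
         pages=pages)
```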
25 changes: 14 additions & 11 deletions docs/src/controlling_iterative_models.md
@@ -6,11 +6,13 @@ criterion, such as `k` consecutive deteriorations of the performance
(see [`Patience`](@ref EarlyStopping.Patience) below). A more
sophisticated kind of control might dynamically mutate parameters,
such as a learning rate, in response to the behavior of these
-estimates. Some iterative model implementations enable some form of
-automated control, with the method and options for doing so varying
-from model to model. But sometimes it is up to the user to arrange
-control, which in the crudest case reduces to manually experimenting
-with the iteration parameter.
+estimates.
+
+Some iterative model implementations enable some form of automated
+control, with the method and options for doing so varying from model
+to model. But sometimes it is up to the user to arrange control, which
+in the crudest case reduces to manually experimenting with the
+iteration parameter.

In response to this ad hoc state of affairs, MLJ provides a uniform
and feature-rich interface for controlling any iterative model that
@@ -34,7 +36,6 @@ iterations from the controlled training phase:

```@example gree
using MLJ
-using MLJIteration
X, y = make_moons(1000, rng=123)
EvoTreeClassifier = @load EvoTreeClassifier verbosity=0
@@ -110,7 +111,7 @@ control | description
[`WithIterationsDo`](@ref MLJIteration.WithIterationsDo)`(f=i->@info("num iterations: $i"))` | Call `f(i)`, where `i` is total number of iterations | yes
[`WithLossDo`](@ref IterationControl.WithLossDo)`(f=x->@info("loss: $x"))` | Call `f(loss)` where `loss` is the current loss | yes
[`WithTrainingLossesDo`](@ref IterationControl.WithTrainingLossesDo)`(f=v->@info(v))` | Call `f(v)` where `v` is the current batch of training losses | yes
-[`Save`](@ref MLJIteration.Save)`(filename="machine.jlso")` | Save current machine to `machine1.jlso`, `machine2.jlso`, etc | yes
+[`Save`](@ref MLJIteration.Save)`(filename="machine.jlso")` | * Save current machine to `machine1.jlso`, `machine2.jlso`, etc | yes

> Table 1. Atomic controls. Some advanced options omitted.
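As a sketch of how the tabulated controls combine in practice, borrowing the `EvoTreeClassifier`/`make_moons` setup from the example earlier on this page (assumes EvoTrees.jl is installed; the particular controls and numbers are illustrative):

```julia
using MLJ

X, y = make_moons(1000, rng=123)
EvoTreeClassifier = @load EvoTreeClassifier verbosity=0

iterated_model = IteratedModel(model=EvoTreeClassifier(),
                               resampling=Holdout(),
                               measure=log_loss,
                               controls=[Step(10),     # train 10 iterations per application
                                         Patience(3),  # stop after 3 consecutive loss increases
                                         WithLossDo(f=x -> @info("loss: $x")),
                                         Save(filename="machine.jlso")])

mach = machine(iterated_model, X, y)
fit!(mach)
```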
@@ -119,6 +120,9 @@ control | description
"Early Stopping - But When?", in *Neural Networks: Tricks of the
Trade*, ed. G. Orr, Springer.

+* If using `MLJIteration` without `MLJ`, then `Save` is not available
+unless one is also using `MLJSerialization`.
+
**Stopping option.** All the following controls trigger a stop if the
provided function `f` returns `true` and `stop_if_true=true` is
specified in the constructor: `Callback`, `WithNumberDo`,
@@ -177,7 +181,6 @@ since the previous best cross-validation loss reaches 20.

```@example gree
using MLJ
-using MLJIteration
X, y = @load_boston;
RidgeRegressor = @load RidgeRegressor pkg=MLJLinearModels verbosity=0
@@ -226,7 +229,7 @@ state being external to the control `struct`) and one for all
subsequent control applications, which generally updates state
also. There are two optional methods: `done`, for specifying
conditions triggering a stop, and `takedown` for specifying actions to
-perform at the end of all training.
+perform at the end of controlled training.

We summarize the training algorithm, as it relates to controls, after
giving a simple example.
@@ -236,7 +239,7 @@ giving a simple example.

Below we define a control, `IterateFromList(list)`, to train, on each
application of the control, until the iteration count reaches the next
-value in a user-specified list, triggering a stop when the list is
+value in a user-specified `list`, triggering a stop when the `list` is
exhausted. For example, to train on iteration counts on a log scale,
one might use `IterateFromList([round(Int, 10^x) for x in range(1, 2,
length=10)])`.
@@ -246,7 +249,7 @@ In the code, `wrapper` is an object that wraps the training machine

```julia

-import IterationControl # or MLJIteration.IterationControl
+import IterationControl # or MLJ.IterationControl

struct IterateFromList
list::Vector{<:Int} # list of iteration parameter values
end
```
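The diff view truncates the remainder of this example. Based on the interface just described (a state-free `update!` on the first application, a stateful one thereafter, and an optional `done`), a minimal sketch might look as follows. The call `IterationControl.train!(wrapper, Δi)`, assumed here to train the wrapped machine for `Δi` further iterations, is the one ingredient not visible in the excerpt:

```julia
import IterationControl

# first application: no state yet; train up to the first count in the list
function IterationControl.update!(control::IterateFromList, wrapper, verbosity)
    Δi = control.list[1]
    verbosity > 1 && @info "Training $Δi more iterations."
    IterationControl.train!(wrapper, Δi)  # assumption: trains Δi further iterations
    return (index = 1,)                   # state tracks our position in the list
end

# subsequent applications: train up to the next count in the list
function IterationControl.update!(control::IterateFromList, wrapper, verbosity, state)
    index = state.index + 1
    Δi = control.list[index] - control.list[index - 1]
    IterationControl.train!(wrapper, Δi)
    return (index = index,)
end

# optional: trigger a stop once the list is exhausted
IterationControl.done(control::IterateFromList, state) =
    state.index == length(control.list)
```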