Merge pull request #767 from alan-turing-institute/dev
For a 0.16.1 release
ablaom authored Apr 1, 2021
2 parents 092522f + e258afb commit f416f06
Showing 16 changed files with 1,472 additions and 88 deletions.
45 changes: 30 additions & 15 deletions ORGANIZATION.md
@@ -41,20 +41,28 @@ its conventional use, are marked with a ⟂ symbol:
detailed description of MLJBase's contents.

* [MLJModels.jl](https://github.com/alan-turing-institute/MLJModels.jl)
-hosts the MLJ **registry**, which contains metadata on all the models
-the MLJ user can search and load from MLJ. Moreover, it provides
-the functionality for **loading model code** from MLJ on
-demand. Finally, it furnishes **model interfaces** for a number of third
-party model providers not implementing interfaces natively, such as
-[DecisionTree.jl](https://github.com/bensadeghi/DecisionTree.jl),
-[ScikitLearn.jl](https://github.com/cstjean/ScikitLearn.jl) or
-[XGBoost.jl](https://github.com/dmlc/XGBoost.jl). These packages are
-*not* imported by MLJModels and are not dependencies from the
-point-of-view of current package management.
+hosts the MLJ **registry**, which contains metadata on all the
+models the MLJ user can search and load from MLJ. Moreover, it
+provides the functionality for **loading model code** from MLJ on
+demand. Finally, it furnishes some commonly used transformers for
+data pre-processing, such as `ContinuousEncoder` and `Standardizer`.

* [MLJTuning.jl](https://github.com/alan-turing-institute/MLJTuning.jl)
-provides MLJ's interface for hyper-parameter tuning strategies, and
-selected implementations, such as grid search.
+provides MLJ's `TunedModel` wrapper for hyper-parameter
+optimization, including the extendable API for tuning strategies,
+and selected in-house implementations, such as `Grid` and
+`RandomSearch`.

+* [MLJIteration.jl](https://github.com/JuliaAI/MLJIteration.jl)
+provides the `IteratedModel` wrapper for controlling iterative
+models (snapshots, early stopping criteria, etc.)
+
+* [MLJSerialization.jl](https://github.com/JuliaAI/MLJSerialization.jl)
+provides functionality for saving MLJ machines to file
+
+* [MLJOpenML.jl](https://github.com/JuliaAI/MLJOpenML.jl) provides
+integration with the [OpenML](https://www.openml.org) data science
+exchange platform

* (⟂)
[MLJLinearModels.jl](https://github.com/alan-turing-institute/MLJLinearModels.jl)
@@ -68,7 +76,7 @@ its conventional use, are marked with a ⟂ symbol:

* (⟂)
[ScientificTypes.jl](https://github.com/alan-turing-institute/ScientificTypes.jl)
-is a tiny, zero-dependency package providing "scientific" types,
+is an ultra lightweight package providing "scientific" types,
such as `Continuous`, `OrderedFactor`, `Image` and `Table`. Its
purpose is to formalize conventions around the scientific
interpretation of ordinary machine types, such as `Float32` and
@@ -78,6 +86,13 @@ its conventional use, are marked with a ⟂ symbol:
[MLJScientificTypes.jl](https://github.com/alan-turing-institute/MLJScientificTypes.jl)
articulates MLJ's own convention for the scientific interpretation of
data.

+* (⟂)
+[StatisticalTraits.jl](https://github.com/alan-turing-institute/StatisticalTraits.jl)
+is an ultra lightweight package defining fall-back implementations for
+a collection of traits possessed by statistical objects.
+
-* [MLJTutorials](https://github.com/alan-turing-institute/MLJTutorials)
-collects tutorials on how to use MLJ.
+* (⟂)
+[DataScienceTutorials](https://github.com/alan-turing-institute/DataScienceTutorials.jl)
+collects tutorials on how to use MLJ, which are deployed
+[here](https://alan-turing-institute.github.io/DataScienceTutorials.jl/)
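To make the division of labor above concrete, here is a minimal sketch combining registry loading, an MLJModels transformer, and an MLJTuning `TunedModel`, assuming MLJ 0.16.1 with DecisionTree.jl installed (the model choice and hyper-parameter range are illustrative, not taken from the diff):

```julia
using MLJ

# MLJModels registry: search metadata and load third-party model code on demand
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0

# one of the commonly used transformers furnished by MLJModels
X, y = @load_iris
stand = fit!(machine(Standardizer(), X))
Xstd = transform(stand, X)

# MLJTuning's `TunedModel` wrapper with the in-house `Grid` strategy
tree = Tree()
r = range(tree, :max_depth, lower=1, upper=6)
tuned_tree = TunedModel(model=tree, tuning=Grid(), range=r, measure=log_loss)
fit!(machine(tuned_tree, Xstd, y))
```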
14 changes: 10 additions & 4 deletions Project.toml
@@ -1,7 +1,7 @@
name = "MLJ"
uuid = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
authors = ["Anthony D. Blaom <[email protected]>"]
version = "0.16.0"
version = "0.16.1"

[deps]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
@@ -10,8 +10,11 @@ Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
MLJIteration = "614be32b-d00c-4edb-bd02-1eb411ab5e55"
MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
MLJOpenML = "cbea4545-8c96-4583-ad3a-44078d60d369"
MLJScientificTypes = "2e2323e0-db8b-457b-ae0d-bdfb3bc63afd"
MLJSerialization = "17bed46d-0ab5-4cd4-b792-a5c4b8547c6d"
MLJTuning = "03970b2e-30c4-11ea-3135-d1576263f10f"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
@@ -24,19 +27,22 @@ Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
CategoricalArrays = "^0.8,^0.9"
ComputationalResources = "^0.3"
Distributions = "^0.21,^0.22,^0.23, 0.24"
MLJBase = "^0.17"
MLJBase = "^0.18"
MLJIteration = "^0.2"
MLJModels = "^0.14"
MLJOpenML = "^1"
MLJScientificTypes = "^0.4.1"
MLJSerialization = "^1.1"
MLJTuning = "^0.6"
ProgressMeter = "^1.1"
StatsBase = "^0.32,^0.33"
Tables = "^0.2,^1.0"
julia = "^1.1"

[extras]
-NearestNeighbors = "b8a86587-4115-5ab1-83bc-aa920d37bbce"
+NearestNeighborModels = "636a865e-7cf4-491e-846c-de09b730eb36"
StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["NearestNeighbors", "StableRNGs", "Test"]
test = ["NearestNeighborModels", "StableRNGs", "Test"]
5 changes: 3 additions & 2 deletions docs/make.jl
@@ -37,14 +37,15 @@ pages = [
"Tuning Models" => "tuning_models.md",
"Learning Curves" => "learning_curves.md",
"Transformers and other unsupervised models" => "transformers.md",
"Controlling Iterative Models" => "controlling_iterative_models.md",
"Composing Models" => "composing_models.md",
"Controlling Iterative Models" => "controlling_iterative_models.md",
"Homogeneous Ensembles" => "homogeneous_ensembles.md",
"Generating Synthetic Data" => "generating_synthetic_data.md",
"OpenML Integration" => "openml_integration.md",
"Acceleration and Parallelism" => "acceleration_and_parallelism.md",
"Simple User Defined Models" => "simple_user_defined_models.md",
"Quick-Start Guide to Adding Models" => "quick_start_guide_to_adding_models.md",
"Quick-Start Guide to Adding Models" =>
"quick_start_guide_to_adding_models.md",
"Adding Models for General Use" => "adding_models_for_general_use.md",
"Benchmarking" => "benchmarking.md",
"Internals" => "internals.md",
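For context, the `pages` vector being edited above is what Documenter.jl consumes; a minimal sketch of the surrounding call (the `sitename` and `modules` values are assumptions, not shown in this diff, and the one-entry `pages` stands in for the full list above):

```julia
using Documenter, MLJ

pages = ["Tuning Models" => "tuning_models.md"]  # stand-in for the full list

makedocs(sitename="MLJ",
         modules=[MLJ],
         pages=pages)
```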
25 changes: 14 additions & 11 deletions docs/src/controlling_iterative_models.md
@@ -6,11 +6,13 @@ criterion, such as `k` consecutive deteriorations of the performance
(see [`Patience`](@ref EarlyStopping.Patience) below). A more
sophisticated kind of control might dynamically mutate parameters,
such as a learning rate, in response to the behavior of these
-estimates. Some iterative model implementations enable some form of
-automated control, with the method and options for doing so varying
-from model to model. But sometimes it is up to the user to arrange
-control, which in the crudest case reduces to manually experimenting
-with the iteration parameter.
+estimates.
+
+Some iterative model implementations enable some form of automated
+control, with the method and options for doing so varying from model
+to model. But sometimes it is up to the user to arrange control, which
+in the crudest case reduces to manually experimenting with the
+iteration parameter.

In response to this ad hoc state of affairs, MLJ provides a uniform
and feature-rich interface for controlling any iterative model that
@@ -34,7 +36,6 @@ iterations from the controlled training phase:

```@example gree
using MLJ
-using MLJIteration
X, y = make_moons(1000, rng=123)
EvoTreeClassifier = @load EvoTreeClassifier verbosity=0
@@ -110,7 +111,7 @@ control | description
[`WithIterationsDo`](@ref MLJIteration.WithIterationsDo)`(f=i->@info("num iterations: $i"))` | Call `f(i)`, where `i` is total number of iterations | yes
[`WithLossDo`](@ref IterationControl.WithLossDo)`(f=x->@info("loss: $x"))` | Call `f(loss)` where `loss` is the current loss | yes
[`WithTrainingLossesDo`](@ref IterationControl.WithTrainingLossesDo)`(f=v->@info(v))` | Call `f(v)` where `v` is the current batch of training losses | yes
-[`Save`](@ref MLJIteration.Save)`(filename="machine.jlso")` | Save current machine to `machine1.jlso`, `machine2.jlso`, etc | yes
+[`Save`](@ref MLJIteration.Save)`(filename="machine.jlso")` | * Save current machine to `machine1.jlso`, `machine2.jlso`, etc | yes

> Table 1. Atomic controls. Some advanced options omitted.
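As a sketch of how the tabulated controls combine in practice, borrowing the `EvoTreeClassifier`/`make_moons` setup from the example earlier on this page (assumes EvoTrees.jl is installed; the particular controls and numbers are illustrative):

```julia
using MLJ

X, y = make_moons(1000, rng=123)
EvoTreeClassifier = @load EvoTreeClassifier verbosity=0

iterated_model = IteratedModel(model=EvoTreeClassifier(),
                               resampling=Holdout(),
                               measure=log_loss,
                               controls=[Step(10),     # train 10 iterations per application
                                         Patience(3),  # stop after 3 consecutive loss increases
                                         WithLossDo(f=x -> @info("loss: $x")),
                                         Save(filename="machine.jlso")])

mach = machine(iterated_model, X, y)
fit!(mach)
```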
@@ -119,6 +120,9 @@ control | description
"Early Stopping - But When?", in *Neural Networks: Tricks of the
Trade*, ed. G. Orr, Springer.

+* If using `MLJIteration` without `MLJ`, then `Save` is not available
+unless one is also using `MLJSerialization`.
+
**Stopping option.** All the following controls trigger a stop if the
provided function `f` returns `true` and `stop_if_true=true` is
specified in the constructor: `Callback`, `WithNumberDo`,
@@ -177,7 +181,6 @@ since the previous best cross-validation loss reaches 20.

```@example gree
using MLJ
-using MLJIteration
X, y = @load_boston;
RidgeRegressor = @load RidgeRegressor pkg=MLJLinearModels verbosity=0
@@ -226,7 +229,7 @@ state being external to the control `struct`) and one for all
subsequent control applications, which generally updates state
also. There are two optional methods: `done`, for specifying
conditions triggering a stop, and `takedown` for specifying actions to
-perform at the end of all training.
+perform at the end of controlled training.

We summarize the training algorithm, as it relates to controls, after
giving a simple example.
@@ -236,7 +239,7 @@ giving a simple example.

Below we define a control, `IterateFromList(list)`, to train, on each
application of the control, until the iteration count reaches the next
-value in a user-specified list, triggering a stop when the list is
+value in a user-specified `list`, triggering a stop when the `list` is
exhausted. For example, to train on iteration counts on a log scale,
one might use `IterateFromList([round(Int, 10^x) for x in range(1, 2,
length=10)])`.
@@ -246,7 +249,7 @@ In the code, `wrapper` is an object that wraps the training machine

```julia

-import IterationControl # or MLJIteration.IterationControl
+import IterationControl # or MLJ.IterationControl

struct IterateFromList
list::Vector{<:Int} # list of iteration parameter values
end
```
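The diff view truncates the remainder of this example. Based on the interface just described (a state-free `update!` on the first application, a stateful one thereafter, and an optional `done`), a minimal sketch might look as follows. The call `IterationControl.train!(wrapper, Δi)`, assumed here to train the wrapped machine for `Δi` further iterations, is the one ingredient not visible in the excerpt:

```julia
import IterationControl

# first application: no state yet; train up to the first count in the list
function IterationControl.update!(control::IterateFromList, wrapper, verbosity)
    Δi = control.list[1]
    verbosity > 1 && @info "Training $Δi more iterations."
    IterationControl.train!(wrapper, Δi)  # assumption: trains Δi further iterations
    return (index = 1,)                   # state tracks our position in the list
end

# subsequent applications: train up to the next count in the list
function IterationControl.update!(control::IterateFromList, wrapper, verbosity, state)
    index = state.index + 1
    Δi = control.list[index] - control.list[index - 1]
    IterationControl.train!(wrapper, Δi)
    return (index = index,)
end

# optional: trigger a stop once the list is exhausted
IterationControl.done(control::IterateFromList, state) =
    state.index == length(control.list)
```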