Merge pull request #1120 from JuliaAI/dev
For a 0.20.4 release
ablaom authored May 20, 2024
2 parents 1a1d10f + 6a57430 commit 61f12f9
Showing 42 changed files with 451 additions and 9,621 deletions.
143 changes: 63 additions & 80 deletions ORGANIZATION.md
@@ -8,102 +8,85 @@ connections do not currently exist but are planned/proposed.*
Repositories of some possible interest outside of MLJ, or beyond
its conventional use, are marked with a ⟂ symbol:

* [MLJ.jl](https://github.com/JuliaAI/MLJ.jl) is the
general user's point-of-entry for choosing, loading, composing,
evaluating and tuning machine learning models. It pulls in most code
from other repositories described below. MLJ also hosts the [MLJ
manual](src/docs) which documents functionality across the
repositories, with the exception of ScientificTypesBase, and
MLJScientific types which host their own documentation. (The MLJ
manual and MLJTutorials do provide overviews of scientific types.)

* [MLJModelInterface.jl](https://github.com/JuliaAI/MLJModelInterface.jl)
is a lightweight package imported by packages implementing MLJ's
interface for their machine learning models. It's only dependencies
are ScientificTypesBase.jl (which depends only on the standard
library module `Random`) and
[StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl)
(which depends only on ScientificTypesBase.jl).
* [MLJ.jl](https://github.com/JuliaAI/MLJ.jl) is the general user's point-of-entry for
choosing, loading, composing, evaluating and tuning machine learning models. It pulls in
most code from other repositories described below. MLJ also hosts the [MLJ
manual](src/docs) which documents functionality across the repositories, although some
pages point to documentation hosted locally by a particular package.


* [MLJModelInterface.jl](https://github.com/JuliaAI/MLJModelInterface.jl) is a lightweight
package imported by packages implementing MLJ's interface for their machine learning
models. Its only dependencies are ScientificTypesBase.jl (which depends only on the
standard library module `Random`) and
[StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl) (which depends
only on ScientificTypesBase.jl).

* (⟂)
[MLJBase.jl](https://github.com/JuliaAI/MLJBase.jl) is
a large repository with two main purposes: (i) to give "dummy"
methods defined in MLJModelInterface their intended functionality
(which depends on third party packages, such as
* (⟂) [MLJBase.jl](https://github.com/JuliaAI/MLJBase.jl) is a large repository with two
main purposes: (i) to give "dummy" methods defined in MLJModelInterface their intended
functionality (which depends on third party packages, such as
[Tables.jl](https://github.com/JuliaData/Tables.jl),
[Distributions.jl](https://github.com/JuliaStats/Distributions.jl)
and
[CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl));
and (ii) provide functionality essential to the MLJ user that has
not been relegated to its own "satellite" repository for some
reason. See the [MLJBase.jl
readme](https://github.com/JuliaAI/MLJBase.jl) for a
detailed description of MLJBase's contents.
[Distributions.jl](https://github.com/JuliaStats/Distributions.jl) and
[CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl)); and (ii)
provide functionality essential to the MLJ user that has not been relegated to its own
"satellite" repository for some reason. See the [MLJBase.jl
readme](https://github.com/JuliaAI/MLJBase.jl) for a detailed description of MLJBase's
contents.

* [StatisticalMeasures.jl](https://github.com/JuliaAI/StatisticalMeasures.jl) provifes
* [StatisticalMeasures.jl](https://github.com/JuliaAI/StatisticalMeasures.jl) provides
performance measures (metrics) such as losses and scores.

* [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl)
hosts the *MLJ model registry*, which contains metadata on all the
models the MLJ user can search and load from MLJ. Moreover, it
provides the functionality for **loading model code** from MLJ on
demand. Finally, it furnishes some commonly used transformers for
data pre-processing, such as `ContinuousEncoder` and `Standardizer`.
* [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl) hosts the *MLJ model registry*,
which contains metadata on all the models the MLJ user can search and load from
MLJ. Moreover, it provides the functionality for **loading model code** from MLJ on
demand. Finally, it furnishes some commonly used transformers for data pre-processing,
such as `ContinuousEncoder` and `Standardizer`.

* [MLJTuning.jl](https://github.com/JuliaAI/MLJTuning.jl)
provides MLJ's `TunedModel` wrapper for hyper-parameter
optimization, including the extendable API for tuning strategies,
and selected in-house implementations, such as `Grid` and
`RandomSearch`.
* [MLJTuning.jl](https://github.com/JuliaAI/MLJTuning.jl) provides MLJ's `TunedModel`
wrapper for hyper-parameter optimization, including the extendable API for tuning
strategies, and selected in-house implementations, such as `Grid` and `RandomSearch`.

* [MLJEnsembles.jl](https://github.com/JuliaAI/MLJEnsembles.jl)
provides MLJ's `EnsembleModel` wrapper, for creating homogenous
model ensembles.
* [MLJEnsembles.jl](https://github.com/JuliaAI/MLJEnsembles.jl) provides MLJ's
`EnsembleModel` wrapper, for creating homogeneous model ensembles.

* [MLJIteration.jl](https://github.com/JuliaAI/MLJIteration.jl)
provides the `IteratedModel` wrapper for controlling iterative
models (snapshots, early stopping criteria, etc)
* [MLJIteration.jl](https://github.com/JuliaAI/MLJIteration.jl) provides the
`IteratedModel` wrapper for controlling iterative models (snapshots, early stopping
criteria, etc)

* (⟂)
[OpenML.jl](https://github.com/JuliaAI/OpenML.jl) provides
integration with the [OpenML](https://www.openml.org) data science
exchange platform
* [MLJFlow.jl](https://github.com/JuliaAI/MLJFlow.jl) provides integration with the
platform-agnostic machine learning tracking tool [MLflow](https://mlflow.org).

* (⟂)
[MLJLinearModels.jl](https://github.com/JuliaAI/MLJLinearModels.jl)
is an experimental package for a wide range of julia-native penalized linear models
such as Lasso, Elastic-Net, Robust regression, LAD regression,
etc.
* (⟂) [OpenML.jl](https://github.com/JuliaAI/OpenML.jl) provides integration with the
[OpenML](https://www.openml.org) data science exchange platform

* (⟂) [MLJLinearModels.jl](https://github.com/JuliaAI/MLJLinearModels.jl) provides a wide
range of julia-native penalized linear models such as Lasso, Elastic-Net, Robust
regression, LAD regression, etc.

* [MLJFlux.jl](https://github.com/FluxML/MLJFlux.jl) an experimental
package for gradient-descent models, such as traditional
neural-networks, built with
* [MLJFlux.jl](https://github.com/FluxML/MLJFlux.jl) is an experimental package for
gradient-descent models, such as traditional neural-networks, built with
[Flux.jl](https://github.com/FluxML/Flux.jl), in MLJ.

* (⟂)
[ScientificTypesBase.jl](https://github.com/JuliaAI/ScientificTypesBase.jl)
is an ultra lightweight package providing "scientific" types,
such as `Continuous`, `OrderedFactor`, `Image` and `Table`. It's
purpose is to formalize conventions around the scientific
interpretation of ordinary machine types, such as `Float32` and
* (⟂) [ScientificTypesBase.jl](https://github.com/JuliaAI/ScientificTypesBase.jl) is an
ultra lightweight package providing "scientific" types, such as `Continuous`,
`OrderedFactor`, `Image` and `Table`. Its purpose is to formalize conventions around
the scientific interpretation of ordinary machine types, such as `Float32` and
`DataFrame`.

* (⟂)
[ScientificTypes.jl](https://github.com/JuliaAI/ScientificTypes.jl)
articulates the particular convention for the scientific interpretation of
data that MLJ adopts
* (⟂) [ScientificTypes.jl](https://github.com/JuliaAI/ScientificTypes.jl) articulates the
particular convention for the scientific interpretation of data that MLJ adopts

* (⟂)
[StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl)
An ultra lightweight package defining fall-back implementations for
a collection of traits possessed by statistical objects, principally
models and measures (metrics).
* (⟂) [StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl) is an ultra
lightweight package defining fall-back implementations for a collection of traits
possessed by statistical objects, principally models and measures (metrics).

* (⟂)
[DataScienceTutorials](https://github.com/JuliaAI/DataScienceTutorials.jl)
collects tutorials on how to use MLJ, which are deployed
* (⟂) [DataScienceTutorials](https://github.com/JuliaAI/DataScienceTutorials.jl) collects
tutorials on how to use MLJ, which are deployed
[here](https://JuliaAI.github.io/DataScienceTutorials.jl/)

* [MLJTestIntegration](https://github.com/JuliaAI/MLJTestIntegration.jl)
provides tests for implementations of the MLJ model interface, and
integration tests for the entire MLJ ecosystem
* [MLJTestInterface](https://github.com/JuliaAI/MLJTestInterface.jl) provides tests for
implementations of the MLJ model interface

* [MLJTestIntegration](https://github.com/JuliaAI/MLJTestIntegration.jl) provides tests
for the entire MLJ ecosystem. (Called when you run `ENV["MLJ_TEST_INTEGRATION"]="true";
Pkg.test("MLJ")`.)
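
The integration-test invocation mentioned in the last item can be written out as a short script. This is a minimal sketch, assuming MLJ is installed in the active environment; the environment variable name and the `Pkg.test` call are taken from the text above, and everything else is ordinary Pkg usage.

```julia
using Pkg

# Opt in to the integration tests provided by MLJTestIntegration
# (the variable name is the one quoted in ORGANIZATION.md).
ENV["MLJ_TEST_INTEGRATION"] = "true"

# Run MLJ's test suite; the test process inherits the variable set above.
Pkg.test("MLJ")
```

Leaving the variable unset presumably runs only the regular test suite.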
7 changes: 4 additions & 3 deletions Project.toml
@@ -1,7 +1,7 @@
name = "MLJ"
uuid = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
authors = ["Anthony D. Blaom <[email protected]>"]
version = "0.20.3"
version = "0.20.4"

[deps]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
@@ -34,7 +34,7 @@ Distributions = "0.21,0.22,0.23, 0.24, 0.25"
MLJBalancing = "0.1"
MLJBase = "1"
MLJEnsembles = "0.4"
MLJFlow = "0.4"
MLJFlow = "0.4.2"
MLJIteration = "0.6"
MLJModels = "0.16"
MLJTestIntegration = "0.5.0"
@@ -84,8 +84,9 @@ PartitionedLS = "19f41c5e-8610-11e9-2f2a-0d67e7c5027f"
SIRUS = "cdeec39e-fb35-4959-aadb-a1dd5dede958"
SelfOrganizingMaps = "ba4b7379-301a-4be0-bee6-171e4e152787"
StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
Suppressor = "fd094767-a336-5f1f-9728-57cf17d0bbfb"
SymbolicRegression = "8254be44-1295-4e6a-a16d-46603ac705cb"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["BetaML", "CatBoost", "EvoLinear", "EvoTrees", "Imbalance", "InteractiveUtils", "LightGBM", "MLJClusteringInterface", "MLJDecisionTreeInterface", "MLJFlux", "MLJGLMInterface", "MLJLIBSVMInterface", "MLJLinearModels", "MLJMultivariateStatsInterface", "MLJNaiveBayesInterface", "MLJScikitLearnInterface", "MLJTSVDInterface", "MLJTestInterface", "MLJTestIntegration", "MLJText", "MLJXGBoostInterface", "Markdown", "NearestNeighborModels", "OneRule", "OutlierDetectionNeighbors", "OutlierDetectionPython", "ParallelKMeans", "PartialLeastSquaresRegressor", "PartitionedLS", "SelfOrganizingMaps", "SIRUS", "SymbolicRegression", "StableRNGs", "Test"]
test = ["BetaML", "CatBoost", "EvoLinear", "EvoTrees", "Imbalance", "InteractiveUtils", "LightGBM", "MLJClusteringInterface", "MLJDecisionTreeInterface", "MLJFlux", "MLJGLMInterface", "MLJLIBSVMInterface", "MLJLinearModels", "MLJMultivariateStatsInterface", "MLJNaiveBayesInterface", "MLJScikitLearnInterface", "MLJTSVDInterface", "MLJTestInterface", "MLJTestIntegration", "MLJText", "MLJXGBoostInterface", "Markdown", "NearestNeighborModels", "OneRule", "OutlierDetectionNeighbors", "OutlierDetectionPython", "ParallelKMeans", "PartialLeastSquaresRegressor", "PartitionedLS", "SelfOrganizingMaps", "SIRUS", "SymbolicRegression", "StableRNGs", "Suppressor","Test"]
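
The compat change from `MLJFlow = "0.4"` to `MLJFlow = "0.4.2"` raises the lower bound while keeping the same upper bound under Julia's default caret semantics. The sketch below checks this; note that `Pkg.Types.semver_spec` is an internal (non-public) Pkg helper, used here only for illustration and possibly subject to change between Julia versions.

```julia
using Pkg

# Parse the old and new [compat] entries with Pkg's internal helper
# (assumption: `Pkg.Types.semver_spec` is available in your Julia version).
old_spec = Pkg.Types.semver_spec("0.4")    # 0.4.0 up to, not including, 0.5.0
new_spec = Pkg.Types.semver_spec("0.4.2")  # 0.4.2 up to, not including, 0.5.0

# 0.4.1 satisfied the old entry but not the new one.
@assert v"0.4.1" in old_spec
@assert !(v"0.4.1" in new_spec)

# 0.4.2 and later 0.4.x releases satisfy both; 0.5.0 satisfies neither.
@assert v"0.4.2" in new_spec
@assert v"0.4.9" in old_spec && v"0.4.9" in new_spec
@assert !(v"0.5.0" in old_spec) && !(v"0.5.0" in new_spec)
```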
3 changes: 2 additions & 1 deletion README.md
@@ -42,14 +42,15 @@ framework?** Start [here](https://JuliaAI.github.io/MLJ.jl/dev/quick_start_guide

MLJ was initially created as a Tools, Practices and Systems project at
the [Alan Turing Institute](https://www.turing.ac.uk/)
in 2019. Current funding is provided by a [New Zealand Strategic
in 2019. Funding has also been provided by a [New Zealand Strategic
Science Investment
Fund](https://www.mbie.govt.nz/science-and-technology/science-and-innovation/funding-information-and-opportunities/investment-funds/strategic-science-investment-fund/ssif-funded-programmes/university-of-auckland/)
awarded to the University of Auckland.

MLJ has been developed with the support of the following organizations:

<div align="center">
<img src="material/DFKI.png" width = 100/>
<img src="material/Turing_logo.png" width = 100/>
<img src="material/UoA_logo.png" width = 100/>
<img src="material/IQVIA_logo.png" width = 100/>
24 changes: 11 additions & 13 deletions docs/src/about_mlj.md
100755 → 100644
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# About MLJ

MLJ (Machine Learning in Julia) is a toolbox written in Julia
MLJ (Machine Learning in Julia) is a toolbox written in Julia
providing a common interface and meta-algorithms for selecting,
tuning, evaluating, composing and comparing [over 180 machine learning
models](@ref model_list) written in Julia and other languages. In
@@ -22,8 +22,7 @@ The first code snippet below creates a new Julia environment
[Installation](@ref) for more on creating a Julia environment for use
with MLJ.

Julia installation instructions are
[here](https://julialang.org/downloads/).
Julia installation instructions are [here](https://julialang.org/downloads/).

```julia
using Pkg
@@ -44,7 +43,7 @@ Loading and instantiating a gradient tree-boosting model:
using MLJ
Booster = @load EvoTreeRegressor # loads code defining a model type
booster = Booster(max_depth=2) # specify hyper-parameter at construction
booster.nrounds=50 # or mutate afterwards
booster.nrounds = 50 # or mutate afterwards
```

This model is an example of an iterative model. As it stands, the
@@ -92,7 +91,7 @@ it "self-tuning":
```julia
self_tuning_pipe = TunedModel(model=pipe,
tuning=RandomSearch(),
ranges = max_depth_range,
ranges=max_depth_range,
resampling=CV(nfolds=3, rng=456),
measure=l1,
acceleration=CPUThreads(),
@@ -105,12 +104,12 @@ Loading a selection of features and labels from the Ames
House Price dataset:

```julia
X, y = @load_reduced_ames;
X, y = @load_reduced_ames
```
Evaluating the "self-tuning" pipeline model's performance using 5-fold
cross-validation (implies multiple layers of nested resampling):

```julia
```julia-repl
julia> evaluate(self_tuning_pipe, X, y,
measures=[l1, l2],
resampling=CV(nfolds=5, rng=123),
@@ -155,8 +154,7 @@ Extract:

* Consistent interface to handle probabilistic predictions.

* Extensible [tuning
interface](https://github.com/JuliaAI/MLJTuning.jl),
* Extensible [tuning interface](https://github.com/JuliaAI/MLJTuning.jl),
to support a growing number of optimization strategies, and designed
to play well with model composition.

@@ -229,19 +227,19 @@ installed in a new
[environment](https://julialang.github.io/Pkg.jl/v1/environments/) to
avoid package conflicts. You can do this with

```julia
```julia-repl
julia> using Pkg; Pkg.activate("my_MLJ_env", shared=true)
```

Installing MLJ is also done with the package manager:

```julia
```julia-repl
julia> Pkg.add("MLJ")
```

**Optional:** To test your installation, run

```julia
```julia-repl
julia> Pkg.test("MLJ")
```

@@ -252,7 +250,7 @@ environment to make model-specific code available. This
happens automatically when you use MLJ's interactive load command
`@iload`, as in

```julia
```julia-repl
julia> Tree = @iload DecisionTreeClassifier # load type
julia> tree = Tree() # instance
```
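
For scripts, where an interactive prompt is unwanted, the model-providing package can be added explicitly and the type loaded with `@load`. A minimal sketch, assuming the `DecisionTreeClassifier` model is provided by the MLJDecisionTreeInterface package (which appears in the test targets of Project.toml) and that `pkg=DecisionTree` is the right disambiguator:

```julia
using Pkg
Pkg.add("MLJDecisionTreeInterface")  # assumed provider of the model code

using MLJ
Tree = @load DecisionTreeClassifier pkg=DecisionTree  # load type, no prompt
tree = Tree()                                         # instance with default hyper-parameters
```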
2 changes: 1 addition & 1 deletion docs/src/adding_models_for_general_use.md
100755 → 100644
@@ -5,4 +5,4 @@ suitable for addition to the MLJ Model Registry, consult the [MLJModelInterface.
documentation](https://juliaai.github.io/MLJModelInterface.jl/dev/).

For quick-and-dirty user-defined models see [Simple User Defined
Models](simple_user_defined_models.md).
Models](simple_user_defined_models.md).
Empty file modified docs/src/api.md
100755 → 100644
Empty file.