Merge pull request #814 from alan-turing-institute/dev
For a 0.16.7 release
ablaom authored Jul 1, 2021
2 parents 1db4f14 + a23515a commit 974cd90
Showing 33 changed files with 294 additions and 289 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
*ipynb linguist-vendored
36 changes: 36 additions & 0 deletions BIBLIOGRAPHY.md
@@ -0,0 +1,36 @@
# Citing MLJ

An overview of MLJ design:


[![DOI](https://joss.theoj.org/papers/10.21105/joss.02704/status.svg)](https://doi.org/10.21105/joss.02704)

```bibtex
@article{Blaom2020,
doi = {10.21105/joss.02704},
url = {https://doi.org/10.21105/joss.02704},
year = {2020},
publisher = {The Open Journal},
volume = {5},
number = {55},
pages = {2704},
author = {Anthony D. Blaom and Franz Kiraly and Thibaut Lienart and Yiannis Simillides and Diego Arenas and Sebastian J. Vollmer},
title = {{MLJ}: A Julia package for composable machine learning},
journal = {Journal of Open Source Software}
}
```

An in-depth view of MLJ's model composition design:

[![arXiv](https://img.shields.io/badge/arXiv-2012.15505-<COLOR>.svg)](https://arxiv.org/abs/2012.15505)

```bibtex
@misc{blaom2020flexible,
title={Flexible model composition in machine learning and its implementation in {MLJ}},
author={Anthony D. Blaom and Sebastian J. Vollmer},
year={2020},
eprint={2012.15505},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
37 changes: 20 additions & 17 deletions ORGANIZATION.md
@@ -13,18 +13,20 @@ its conventional use, are marked with a ⟂ symbol:
evaluating and tuning machine learning models. It pulls in most code
from other repositories described below. MLJ also hosts the [MLJ
manual](src/docs) which documents functionality across the
repositories, with the exception of ScientificTypes, and
repositories, with the exception of ScientificTypesBase, and
MLJScientificTypes, which host their own documentation. (The MLJ
manual and MLJTutorials do provide overviews of scientific types.)

* [MLJModelInterface.jl](https://github.com/alan-turing-institute/MLJModelInterface.jl)
is a lightweight package imported by packages implementing
MLJ's interface for their machine learning models. Its *sole*
dependency is ScientificTypes, which is a tiny package with *no*
dependencies.
* [MLJModelInterface.jl](https://github.com/JuliaAI/MLJModelInterface.jl)
is a lightweight package imported by packages implementing MLJ's
interface for their machine learning models. Its only dependencies
are ScientificTypesBase.jl (which depends only on the standard
library module `Random`) and
[StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl)
(which depends only on ScientificTypesBase.jl).

* (⟂)
[MLJBase.jl](https://github.com/alan-turing-institute/MLJBase.jl) is
[MLJBase.jl](https://github.com/JuliaAI/MLJBase.jl) is
a large repository with two main purposes: (i) to give "dummy"
methods defined in MLJModelInterface their intended functionality
(which depends on third party packages, such as
@@ -35,17 +37,17 @@ its conventional use, are marked with a ⟂ symbol:
and (ii) provide functionality essential to the MLJ user that has
not been relegated to its own "satellite" repository for some
reason. See the [MLJBase.jl
readme](https://github.com/alan-turing-institute/MLJBase.jl) for a
readme](https://github.com/JuliaAI/MLJBase.jl) for a
detailed description of MLJBase's contents.

* [MLJModels.jl](https://github.com/alan-turing-institute/MLJModels.jl)
hosts the MLJ **registry**, which contains metadata on all the
* [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl)
hosts the *MLJ model registry*, which contains metadata on all the
models the MLJ user can search and load from MLJ. Moreover, it
provides the functionality for **loading model code** from MLJ on
demand. Finally, it furnishes some commonly used transformers for
data pre-processing, such as `ContinuousEncoder` and `Standardizer`.

* [MLJTuning.jl](https://github.com/alan-turing-institute/MLJTuning.jl)
* [MLJTuning.jl](https://github.com/JuliaAI/MLJTuning.jl)
provides MLJ's `TunedModel` wrapper for hyper-parameter
optimization, including the extendable API for tuning strategies,
and selected in-house implementations, such as `Grid` and
@@ -67,30 +69,31 @@ its conventional use, are marked with a ⟂ symbol:
exchange platform

* (⟂)
[MLJLinearModels.jl](https://github.com/alan-turing-institute/MLJLinearModels.jl)
[MLJLinearModels.jl](https://github.com/JuliaAI/MLJLinearModels.jl)
is an experimental package for a wide range of Julia-native penalized linear models
such as Lasso, Elastic-Net, Robust regression, LAD regression,
etc.

* [MLJFlux.jl](https://github.com/alan-turing-institute/MLJFlux.jl) an
experimental package for using **neural-network models**, built with
* [MLJFlux.jl](https://github.com/JuliaAI/MLJFlux.jl) is an experimental
package for gradient-descent models, such as traditional
neural networks, built with
[Flux.jl](https://github.com/FluxML/Flux.jl), in MLJ.

* (⟂)
[ScientificTypes.jl](https://github.com/alan-turing-institute/ScientificTypes.jl)
[ScientificTypesBase.jl](https://github.com/JuliaAI/ScientificTypesBase.jl)
is an ultra-lightweight package providing "scientific" types,
such as `Continuous`, `OrderedFactor`, `Image` and `Table`. Its
purpose is to formalize conventions around the scientific
interpretation of ordinary machine types, such as `Float32` and
`DataFrame`.

* (⟂)
[MLJScientificTypes.jl](https://github.com/alan-turing-institute/MLJScientificTypes.jl)
[ScientificTypes.jl](https://github.com/JuliaAI/ScientificTypes.jl)
articulates MLJ's own convention for the scientific interpretation of
data.

* (⟂)
[StatisticalTraits.jl](https://github.com/alan-turing-institute/StatisticalTraits.jl)
[StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl)
is an ultra-lightweight package defining fall-back implementations for
a collection of traits possessed by statistical objects.
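
To make the scientific types and built-in transformers described above concrete, here is a minimal sketch, assuming a working MLJ installation (the table and column names are illustrative only):

```julia
using MLJ

# A column table in which `gender` is stored as `String` (its machine type)
# but is really a categorical variable (scientific type `Multiclass`):
X = (height = [1.85, 1.67, 1.50],
     gender = ["male", "female", "female"])

schema(X)                       # inspect machine types and scientific types

# Coerce the data so its scientific interpretation matches intent:
Xfixed = coerce(X, :gender => Multiclass)

# One of the transformers furnished by MLJModels:
stand = Standardizer()          # standardizes `Continuous` features only
mach = machine(stand, Xfixed)
fit!(mach)
transform(mach, Xfixed)
```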

6 changes: 3 additions & 3 deletions Project.toml
@@ -1,7 +1,7 @@
name = "MLJ"
uuid = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
authors = ["Anthony D. Blaom <[email protected]>"]
version = "0.16.6"
version = "0.16.7"

[deps]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
@@ -14,12 +14,12 @@ MLJEnsembles = "50ed68f4-41fd-4504-931a-ed422449fee0"
MLJIteration = "614be32b-d00c-4edb-bd02-1eb411ab5e55"
MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
MLJOpenML = "cbea4545-8c96-4583-ad3a-44078d60d369"
MLJScientificTypes = "2e2323e0-db8b-457b-ae0d-bdfb3bc63afd"
MLJSerialization = "17bed46d-0ab5-4cd4-b792-a5c4b8547c6d"
MLJTuning = "03970b2e-30c4-11ea-3135-d1576263f10f"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
ScientificTypes = "321657f4-b219-11e9-178b-2701a2544e81"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
@@ -33,10 +33,10 @@ MLJEnsembles = "0.1"
MLJIteration = "0.3"
MLJModels = "0.14"
MLJOpenML = "1"
MLJScientificTypes = "0.4.1"
MLJSerialization = "1.1"
MLJTuning = "0.6"
ProgressMeter = "1.1"
ScientificTypes = "2"
StatsBase = "0.32,0.33"
Tables = "0.2,1.0"
julia = "1.3"
178 changes: 39 additions & 139 deletions README.md
@@ -12,87 +12,56 @@
<img src="https://img.shields.io/badge/docs-stable-blue.svg"
alt="Documentation">
</a>
</a>
<!-- <a href="https://doi.org/10.5281/zenodo.3541506"> -->
<!-- <img src="https://zenodo.org/badge/DOI/10.5281/zenodo.3541506.svg" -->
<!-- alt="Cite MLJ"> -->
<!-- </a> -->
<a href="https://mybinder.org/v2/gh/alan-turing-institute/MLJ.jl/master?filepath=binder%2FMLJ_demo.ipynb">
<img src="https://mybinder.org/badge_logo.svg"
alt="Binder">
</a>
<a href="https://doi.org/10.21105/joss.02704">
<img src="https://joss.theoj.org/papers/10.21105/joss.02704/status.svg"
alt="DOI">
<a href="https://opensource.org/licenses/MIT">
<img src="https://img.shields.io/badge/License-MIT-yelllow"
alt="bibtex">
</a>
<a href="BIBLIOGRAPHY.md">
<img src="https://img.shields.io/badge/cite-BibTeX-blue"
alt="bibtex">
</a>

</p>
</h2>


**New to MLJ? Start [here](https://alan-turing-institute.github.io/MLJ.jl/dev/)**.

**Wanting to integrate an existing machine learning model into the MLJ
framework? Start
[here](https://alan-turing-institute.github.io/MLJ.jl/dev/quick_start_guide_to_adding_models/)**.

The remaining information on this page will be of interest primarily
to developers interested in contributing to core packages in the MLJ
ecosystem, whose organization is described further below.

MLJ (Machine Learning in Julia) is a toolbox written in Julia
providing a common interface and meta-algorithms for selecting,
tuning, evaluating, composing and comparing over [150 machine
learning
tuning, evaluating, composing and comparing over [160 machine learning
models](https://alan-turing-institute.github.io/MLJ.jl/dev/list_of_supported_models/)
written in Julia and other languages. MLJ is released under the MIT
license and sponsored by the [Alan Turing
Institute](https://www.turing.ac.uk/).
written in Julia and other languages.
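
A minimal sketch of that common interface, assuming MLJ and the DecisionTree.jl model interface are installed (the dataset, model and hyper-parameter range are illustrative only):

```julia
using MLJ

X, y = @load_iris                                  # a small built-in dataset

# Load a model type from the registry; its code lives in DecisionTree.jl:
Tree = @load DecisionTreeClassifier pkg=DecisionTree
tree = Tree()

# Estimate out-of-sample performance by cross-validation:
evaluate(tree, X, y, resampling=CV(nfolds=5), measure=cross_entropy)

# Wrap the model in a grid-search tuning strategy (provided by MLJTuning):
r = range(tree, :max_depth, lower=1, upper=5)
tuned_tree = TunedModel(model=tree, tuning=Grid(), resampling=CV(nfolds=3),
                        range=r, measure=cross_entropy)
mach = machine(tuned_tree, X, y)
fit!(mach)
fitted_params(mach).best_model
```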

<br>
<p align="center">
<a href="#the-mlj-universe">MLJ Universe</a> &nbsp;&nbsp;
<a href="#known-issues">Known Issues</a> &nbsp;&nbsp;
<a href="#customizing-behavior">Customizing Behavior</a> &nbsp;&nbsp;
<a href="#citing-mlj">Citing MLJ</a>
</p>
</br>
**New to MLJ?** Start [here](https://alan-turing-institute.github.io/MLJ.jl/dev/).

**Integrating an existing machine learning model into the MLJ
framework?** Start [here](https://alan-turing-institute.github.io/MLJ.jl/dev/quick_start_guide_to_adding_models/).

MLJ was initially created as a Tools,
Practices and Systems project at the [Alan Turing
Institute](https://www.turing.ac.uk/) in 2019. Current funding is
provided by a [New Zealand Strategic Science Investment
Fund](https://www.mbie.govt.nz/science-and-technology/science-and-innovation/funding-information-and-opportunities/investment-funds/strategic-science-investment-fund/ssif-funded-programmes/university-of-auckland/).

MLJ has been developed with the support of the following organizations:

<div align="center">
<img src="material/Turing_logo.png" width = 100/>
<img src="material/UoA_logo.png" width = 100/>
<img src="material/IQVIA_logo.png" width = 100/>
<img src="material/warwick.png" width = 100/>
<img src="material/julia.png" width = 100/>
</div>


### The MLJ Universe

The functionality of MLJ is distributed over a number of repositories
illustrated in the dependency chart below.

<br>
<p align="center">
<a href="CONTRIBUTING.md">Contributing</a> &nbsp;&nbsp;
<a href="ORGANIZATION.md">Code Organization</a> &nbsp;&nbsp;
<a href="ROADMAP.md">Road Map</a>
</br>
<br>
<a href="https://github.com/alan-turing-institute/MLJ">MLJ</a> &nbsp;&nbsp;
<a href="https://github.com/alan-turing-institute/MLJBase.jl">MLJBase</a> &nbsp;&nbsp;
<a href="https://github.com/alan-turing-institute/MLJModelInterface.jl">MLJModelInterface</a> &nbsp;&nbsp;
<a href="https://github.com/alan-turing-institute/MLJModels.jl">MLJModels</a> &nbsp;&nbsp;
<a href="https://github.com/alan-turing-institute/MLJTuning.jl">MLJTuning</a> &nbsp;&nbsp;
<a href="https://github.com/alan-turing-institute/MLJLinearModels.jl">MLJLinearModels</a> &nbsp;&nbsp;
<a href="https://github.com/FluxML/MLJFlux.jl">MLJFlux</a>
</br>
<br>
<a href="https://github.com/alan-turing-institute/MLJTutorials">MLJTutorials</a> &nbsp;&nbsp;
<a href="https://github.com/JuliaAI/MLJEnsembles.jl">MLJEnsembles</a> &nbsp;&nbsp;
<a href="https://github.com/JuliaAI/MLJIteration.jl">MLJIteration</a> &nbsp;&nbsp;
<a href="https://github.com/JuliaAI/MLJOpenML.jl">MLJOpenML</a> &nbsp;&nbsp;
<a href="https://github.com/JuliaAI/MLJSerialization.jl">MLJSerialization</a>
</br>
<br>
<a href="https://github.com/JuliaAI/MLJScientificTypes.jl">MLJScientificTypes</a> &nbsp;&nbsp;
<a href="https://github.com/JuliaAI/ScientificTypes.jl">ScientificTypes</a>
</p>
<p></p>
<br>
<p></p>
illustrated in the dependency chart below. These repositories live under
the [JuliaAI](https://github.com/JuliaAI) umbrella organization.

<div align="center">
<img src="material/MLJ_stack.svg" alt="Dependency Chart">
@@ -101,89 +70,20 @@
*Dependency chart for MLJ repositories. Repositories with dashed
connections do not currently exist but are planned/proposed.*


### Known Issues

#### ScikitLearn/MKL issue

For Mac OS users running Julia 1.3 or higher, using ScikitLearn
models can lead to unexpected MKL errors due to an issue unrelated
to MLJ. See
[this Julia Discourse discussion](https://discourse.julialang.org/t/julia-1-3-1-4-on-macos-and-intel-mkl-error/36469/2)
and
[this issue](https://github.com/JuliaPackaging/BinaryBuilder.jl/issues/700)
for context.

A temporary workaround for this issue is to force the installation of
an older version of the `OpenSpecFun_jll` library. To install an
appropriate version, activate your MLJ environment and run

```julia
using Pkg;
Pkg.add(PackageSpec(url="https://github.com/tlienart/OpenSpecFun_jll.jl"))
```

#### Serialization of composite models whose component models use custom serialization

See
[here](https://github.com/alan-turing-institute/MLJ.jl/issues/678). Workaround:
instead of `XGBoost` models (the chief known case), use models from the
pure-Julia package `EvoTrees`.


### Customizing behavior

To customize the behaviour of MLJ you will need to clone the relevant
component package (e.g., MLJBase.jl), or a fork thereof, and modify
your local Julia environment to use your local clone in place of the
official release. For example, you might proceed as follows:

```julia
using Pkg
Pkg.activate("my_MLJ_enf", shared=true)
Pkg.develop("path/to/my/local/MLJBase")
```

To test your local clone, do

```julia
Pkg.test("MLJBase")
```

For more on package management, see https://julialang.github.io/Pkg.jl/v1/.



### Citing MLJ


[![DOI](https://joss.theoj.org/papers/10.21105/joss.02704/status.svg)](https://doi.org/10.21105/joss.02704)

```bibtex
@article{Blaom2020,
doi = {10.21105/joss.02704},
url = {https://doi.org/10.21105/joss.02704},
year = {2020},
publisher = {The Open Journal},
volume = {5},
number = {55},
pages = {2704},
author = {Anthony D. Blaom and Franz Kiraly and Thibaut Lienart and Yiannis Simillides and Diego Arenas and Sebastian J. Vollmer},
title = {{MLJ}: A Julia package for composable machine learning},
journal = {Journal of Open Source Software}
}
```
<br>
<p align="center">
<a href="CONTRIBUTING.md">Contributing</a> &nbsp;&nbsp;
<a href="ORGANIZATION.md">Code Organization</a> &nbsp;&nbsp;
<a href="ROADMAP.md">Road Map</a>
</br>

#### Contributors

*Core design*: A. Blaom, F. Kiraly, S. Vollmer

*Active maintainers*: A. Blaom, T. Lienart, S. Okon
*Lead contributor*: A. Blaom

*Active collaborators*: D. Arenas, D. Buchaca, J. Hoffimann, S. Okon, J. Samaroo, S. Vollmer
*Active maintainers*: A. Blaom, S. Okon, T. Lienart, D. Aluthge

*Past collaborators*: D. Aluthge, E. Barp, G. Bohner, M. K. Borregaard, V. Churavy, H. Devereux, M. Giordano, M. Innes, F. Kiraly, M. Nook, Z. Nugent, P. Oleśkiewicz, A. Shridar, Y. Simillides, A. Sengupta, A. Stechemesser.

#### License

MLJ is supported by the Alan Turing Institute and released under the MIT "Expat" License.