Merge pull request #29 from gamma-opt/icnns: ICNNs

Showing 34 changed files with 1,325 additions and 59 deletions.
# Input convex neural networks (ICNNs)

In input convex neural networks, the neuron weights are constrained to be nonnegative and weighted skip connections are added from the input layer to each layer. More details can be found in [Amos et al. (2017)](literature.md). These changes make the network output convex with respect to the inputs. A convex piecewise linear function can be formulated as a linear programming (LP) problem, which is much more computationally efficient to solve than the MILP formulations of "regular" neural networks. This is the motivation for implementing ICNN functionality in this package. ICNNs are a viable option when the data or function being modeled is approximately convex and/or some prediction accuracy can be sacrificed for computational performance.
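
For intuition, the following is a minimal sketch of what an ICNN computes; the type `ICNNLayer` and the function `icnn_forward` are illustrative and not part of the package API. Because each `W` is nonnegative and $ReLU$ is convex and nondecreasing, every layer's activations, and therefore the output, are convex in the input `x`.

```julia
# Minimal ICNN forward-pass sketch (illustration only, not part of Gogeta).
# Each layer computes relu(W*z + A*x + b), where W (weights on the previous
# layer's activations) must be nonnegative and A holds the unconstrained
# skip-connection weights from the input.
relu(v) = max.(v, 0.0)

struct ICNNLayer
    W::Matrix{Float64}  # nonnegative layer-to-layer weights
    A::Matrix{Float64}  # skip-connection weights from the input
    b::Vector{Float64}  # bias
end

function icnn_forward(layers::Vector{ICNNLayer}, x::Vector{Float64})
    z = zeros(size(first(layers).W, 2))  # no previous activations at the first layer
    for (i, layer) in enumerate(layers)
        a = layer.W * z + layer.A * x + layer.b
        z = i == length(layers) ? a : relu(a)  # linear output layer
    end
    return only(z)  # a single output is assumed
end
```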

## Training

The [Flux.jl](https://fluxml.ai/Flux.jl/stable/) interface doesn't allow for a simple implementation of ICNNs, as far as we know. However, ICNN models can easily be implemented and trained using, for example, TensorFlow with Python. The model parameters can then be exported as a JSON file and imported into Julia to create the LP formulation. An example of how to build the ICNN, train it, and export the parameters using the high-level TensorFlow interface can be found in the `examples/` folder of the [package repository](https://github.com/gamma-opt/Gogeta.jl). The requirements for the JSON file structure are listed in the function description of [`ICNN_incorporate!`](@ref).
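
After exporting, the parameter file can be sanity-checked in Julia before building the formulation. The sketch below assumes a hypothetical layout in which each layer stores its layer-to-layer weights under a `"W"` key; the actual required structure is documented in [`ICNN_incorporate!`](@ref).

```julia
using JSON

# Illustration only: the field name "W" is hypothetical; the actual JSON
# structure required by Gogeta is documented in ICNN_incorporate!.
params = JSON.parsefile("model_weights.json")

for (name, layer) in params
    # Input convexity requires the layer-to-layer weights to be nonnegative;
    # the skip-connection weights from the input may take any sign.
    W = reduce(hcat, [Float64.(col) for col in layer["W"]])
    any(W .< 0) && @warn "Layer $name has negative layer-to-layer weights."
end
```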

## Formulation

The interface for formulating ICNNs as LPs has been designed to make incorporating them into a larger optimization problem, e.g. as surrogates, as easy as possible. The `JuMP` model is first built, and the variables, constraints, and objective are added. The function [`ICNN_incorporate!`](@ref) takes as arguments the `JuMP` model, the relative filepath of the JSON file containing the ICNN parameters, the output variable, and finally the input variables (in order). Currently, only models with one output variable are supported.

Build the model.

```julia
using JuMP, Gurobi

jump_model = Model(Gurobi.Optimizer)

@variable(jump_model, -1 <= x <= 1)
@variable(jump_model, -1 <= y <= 1)
@variable(jump_model, z)

@constraint(jump_model, y >= 1 - x)

@objective(jump_model, Min, x + y)
```

Include the input convex neural network as a part of the larger optimization problem. The JSON file containing the ICNN parameters is called "model_weights.json". The variables `x` and `y` are linked to the output variable `z` by the ICNN.

```julia
ICNN_incorporate!(jump_model, "model_weights.json", z, x, y)

optimize!(jump_model)
solution_summary(jump_model)

# inspect the optimal solution
value(x)
value(y)
value(z)
```

The problem is very fast to solve since no binary variables are added.

If one wants to use ICNNs "by themselves", e.g. for global optimization, the same steps can be followed without adding any extra variables or constraints.

Add the input and output variables of the ICNN to the `JuMP` model and minimize the ICNN output as the objective.

```julia
jump_model = Model(Gurobi.Optimizer)

@variable(jump_model, -1 <= x <= 1)
@variable(jump_model, -1 <= y <= 1)
@variable(jump_model, z)

@objective(jump_model, Min, z)

ICNN_incorporate!(jump_model, "model_weights.json", z, x, y)

optimize!(jump_model)
solution_summary(jump_model)
```

## Considerations

### Feasibility

In an optimization problem where an ICNN has been incorporated as a surrogate, infeasibility might go undetected. This is because the ICNN is formulated as an epigraph, with a penalty term added to the objective function that minimizes the ICNN output variable. This penalty doesn't prevent the solver from returning solutions that lie "above" the ICNN function hypersurface. Therefore, the structure of the optimization problem should be studied carefully to reason about whether such pseudo-feasible solutions are possible. If the problem structure is too complex to analyze, the optimal solution returned by the solver can be checked using the [`check_ICNN`](@ref) function. If the check fails, the optimization problem is likely infeasible, but since this is not a proof, the problem should be investigated more thoroughly.
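
To see why such pseudo-feasible solutions can exist, note that, schematically, a convex piecewise linear function can be written as a pointwise maximum of affine functions, $f(x) = \max_k \{a_k^\top x + b_k\}$, and its epigraph formulation only enforces

```math
z \geq a_k^\top x + b_k \quad \forall k.
```

Every point with $z > f(x)$ also satisfies these constraints; the equality $z = f(x)$ is achieved only through the penalty term, not by the constraints themselves.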

### Penalty domination

The second important consideration is the objective function of the optimization problem into which the ICNN has been incorporated as a surrogate. As stated in the previous paragraph, the ICNN LP formulation relies on a penalty term that is added to the objective function. Thus, if the objective function already includes a term which is linked to, or is itself, the ICNN output variable, the penalty term added by [`ICNN_incorporate!`](@ref) will not have the desired effect of guaranteeing that the ICNN output is attained. We have not found a way around this issue, so if the penalty term is "dominated" in the objective, ICNN surrogates are probably not suitable for the given optimization problem.
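
As a hypothetical illustration of this pitfall, suppose the surrogate output is rewarded in the objective. The sketch below reuses the variables from the example above; it shows the failure mode rather than a working recipe.

```julia
# Hypothetical sketch of penalty domination (illustration only).
# Maximizing z rewards values above the true ICNN output, so the penalty
# term that ICNN_incorporate! adds to pin z down to the ICNN value can be
# overpowered, and the solver may return a pseudo-feasible solution.
@objective(jump_model, Max, z)

ICNN_incorporate!(jump_model, "model_weights.json", z, x, y)
optimize!(jump_model)

# Any solution obtained this way should be verified, e.g. with check_ICNN.
```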
# Gogeta.jl

[Gogeta](https://gamma-opt.github.io/Gogeta.jl/) is a package that enables the user to formulate trained machine learning (ML) models as mathematical optimization problems. This approach can be utilized in the global optimization of ML models, or when using ML models as surrogates in larger optimization problems.

Currently supported models are $ReLU$-activated neural networks (dense and convolutional), input convex neural networks (ICNNs), and tree ensemble models.

## Installation

The latest official version can be installed from the Julia General registry.

```julia-repl
julia> using Pkg

julia> Pkg.add("Gogeta")
```

Some experimental features may only be available in the GitHub repository. The latest development version can be installed by specifying the repository URL and the branch name as follows:

```julia-repl
julia> Pkg.add(url="https://github.com/gamma-opt/Gogeta.jl.git", rev="<branch-name>")
```

Replace `<branch-name>` with the name of the branch you want to add.

!!! warning

    The code on some of the Git branches might be experimental and not work as expected.

## How can this package be used?

Formulating trained machine learning (ML) models as mixed-integer linear programming (MILP) problems opens up multiple possibilities. Firstly, it allows for global optimization: finding the input that provably maximizes or minimizes the ML model output. Secondly, changing the objective function in the MILP formulation and/or adding constraints makes it possible to solve problems related to the ML model, such as finding adversarial inputs. Lastly, the MILP formulation of an ML model can be incorporated into a larger optimization problem. This is useful in a surrogate modeling context where an ML model can be trained to approximate a complex function that itself cannot be used in an optimization problem.

Despite its usefulness, modeling ML models as MILP problems has significant limitations. The biggest limitation is the capability of MILP solvers, which restricts the size of the ML models that can be handled. With neural networks, for example, only models with at most hundreds of neurons can be effectively formulated as MILPs and then optimized. In practice, formulating and optimizing large modern ML models, such as convolutional neural networks and transformer networks, is computationally infeasible. However, if small neural networks are sufficient for the specific application, the methods implemented in this package can be useful. Secondly, only piecewise linear ML models can be formulated as MILP problems. For example, with neural networks this entails using activation functions such as $ReLU$.

Input convex neural networks (ICNNs) are a special type of machine learning model that can be formulated as linear programming (LP) problems. The convexity limits the expressiveness of the ICNN, but the LP formulation enables fast optimization of even very large ICNNs. If the data or the function being modeled is approximately convex, ICNNs can provide accuracy similar to regular neural networks. In the use cases mentioned above, ICNNs can thus be used instead of regular neural networks without the computational limitations of MILP models.

## Getting started

The following sections, [Tree ensembles](tree_ensembles.md), [Neural networks](neural_networks.md), [Neural networks in larger optimization problems](nns_in_larger.md), and [Input convex neural networks](icnns.md), give simple demonstrations of how to use the package. Examples of the multiprocessing features, as well as more detailed code, can be found in the `examples/` folder of the [package repository](https://github.com/gamma-opt/Gogeta.jl).