Merge pull request #29 from gamma-opt/icnns
Icnns
EetuReijonen authored Jul 29, 2024
2 parents 1d8a089 + 87b1977 commit 65eb120
Showing 34 changed files with 1,325 additions and 59 deletions.
3 changes: 2 additions & 1 deletion Project.toml
@@ -7,6 +7,7 @@ version = "0.2.0"
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
EvoTrees = "f6006082-12f8-11e9-0c9c-0d5d367ab1e5"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
JuMP = "4076af6c-e467-56ae-b986-b466b2749572"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"

@@ -20,9 +21,9 @@ julia = "1"
BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
GLPK = "60bf3e95-4087-53dc-ae20-288a0d20c6a6"
HiGHS = "87dc4568-4c63-4d18-b0c0-bb2238e4078b"
QuasiMonteCarlo = "8a4e6c94-4038-4cdc-81c3-7e6ffdb2a71b"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
QuasiMonteCarlo = "8a4e6c94-4038-4cdc-81c3-7e6ffdb2a71b"

[targets]
test = ["Test", "Random", "BSON", "GLPK", "HiGHS", "QuasiMonteCarlo"]
2 changes: 1 addition & 1 deletion README.md
@@ -8,4 +8,4 @@

Gogeta.jl (pronounced "Go-gee-ta") enables the user to represent trained machine learning models with mathematical programming, more specifically as mixed-integer optimization problems. This, in turn, allows for "fusing" the capabilities of mathematical optimization solvers and machine learning models to solve problems that neither could solve on their own.

Currently supported models are tree ensembles and neural networks and convolutional neural networks using ReLU activation.
Currently supported models are tree ensembles, input convex neural networks, and ReLU-activated dense and convolutional neural networks.
2 changes: 2 additions & 0 deletions docs/make.jl
@@ -10,6 +10,8 @@ makedocs(
"Introduction" => "index.md",
"Tree ensembles" => "tree_ensembles.md",
"Neural networks" => "neural_networks.md",
"Neural networks in larger optimization problems" => "nns_in_larger.md",
"Input convex neural networks" => "icnns.md",
"Public API" => "api.md",
"Literature" => "literature.md",
"Reference" => "reference.md",
14 changes: 14 additions & 0 deletions docs/src/api.md
@@ -24,17 +24,31 @@ These are all of the functions and data structures that the user needs to know i

### MIP formulation
* [`NN_formulate!`](@ref) - formulate a `JuMP` model, perform simultaneous bound tightening and possibly compression
* [`NN_incorporate!`](@ref) - formulate a neural network MILP to be part of a larger `JuMP` model by linking the input and output variables

### Compression
* [`NN_compress`](@ref) - compress a neural network using precomputed activation bounds

### Forward pass
* [`forward_pass!`](@ref) - fix the input variables and optimize the model to get the output
* [`forward_pass_NN!`](@ref) - forward pass in a model with anonymous variables, with the input and output variables given as arguments

### Sampling-based optimization
* [`optimize_by_sampling!`](@ref) - optimize the JuMP model by using a sampling-based approach
* [`optimize_by_walking!`](@ref) - optimize the JuMP model by using a more sophisticated sampling-based approach

## Input convex neural networks

### LP formulation
* [`ICNN_incorporate!`](@ref) - formulate an ICNN LP to be part of a larger `JuMP` model by linking the input and output variables

### Forward pass
* [`forward_pass_ICNN!`](@ref) - fix the input variables and optimize the model to get the output

### Feasibility
* [`check_ICNN`](@ref) - check whether the given inputs and outputs satisfy the ICNN


## Convolutional neural networks

### Data structures
72 changes: 72 additions & 0 deletions docs/src/icnns.md
@@ -0,0 +1,72 @@
# Input convex neural networks (ICNNs)

In input convex neural networks, the neuron weights are constrained to be nonnegative and weighted skip connections are added from the input layer to each layer. More details can be found in [Amos et al. (2017)](literature.md). These changes make the network output convex with respect to the inputs. A convex piecewise linear function can be formulated as a linear programming problem (LP), which is much more computationally efficient than the MILP formulations of "regular" neural networks. This is the reason for including ICNN functionality in this package. ICNNs are a viable option when the data or function being modeled is approximately convex, or when some prediction accuracy can be sacrificed for computational performance.
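
As a sketch in symbols, following Amos et al. (2017), a fully input convex network computes

```math
z_{i+1} = g_i\left(W_i^{(z)} z_i + W_i^{(x)} x + b_i\right), \qquad f(x) = z_k,
```

where the weights ``W_i^{(z)}`` are elementwise nonnegative (with ``W_0^{(z)} \equiv 0``) and each activation ``g_i`` is convex and nondecreasing, e.g. $ReLU$; together these conditions make the output ``f(x)`` convex in the input ``x``.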

## Training

The [Flux.jl](https://fluxml.ai/Flux.jl/stable/) interface doesn't allow for simple implementation of ICNNs as far as we know. However, ICNN models can easily be implemented and trained using TensorFlow in Python, for example. The model parameters can then be exported as a JSON file and imported into Julia to create the LP formulation. An example of how to build the ICNN, train it, and export the parameters using the high-level TensorFlow interface can be found in the `examples/`-folder of the [package repository](https://github.com/gamma-opt/Gogeta.jl). The requirements for the JSON file structure are listed in the function description of [`ICNN_incorporate!`](@ref).

## Formulation

The interface for formulating ICNNs as LPs has been designed to make incorporating them into a larger optimization problem, e.g. as surrogates, as easy as possible. The `JuMP` model is first built and the variables, constraints, and objective are added. The function [`ICNN_incorporate!`](@ref) takes as arguments the `JuMP` model, the relative filepath of the JSON file containing the ICNN parameters, the output variable, and finally the input variables (in order). Currently only models with one output variable are supported.

Build the model.

```julia
jump_model = Model(Gurobi.Optimizer)

@variable(jump_model, -1 <= x <= 1)
@variable(jump_model, -1 <= y <= 1)
@variable(jump_model, z)

@constraint(jump_model, y >= 1-x)

@objective(jump_model, Min, x+y)
```

Include the input convex neural network as part of the larger optimization problem.
The JSON file containing the ICNN parameters is called "model_weights.json".
The variables `x` and `y` are linked to the variable `z` by the ICNN.

```julia
ICNN_incorporate!(jump_model, "model_weights.json", z, x, y)

optimize!(jump_model)
solution_summary(jump_model)

# see optimal solution
value(x)
value(y)
value(z)
```

The problem is very fast to solve since no binary variables are added.

If one wants to use ICNNs "by themselves" for global optimization, for example, the same steps can be followed but without adding any extra variables or constraints.

Add the input and output variables of the ICNN to the `JuMP` model and minimize the ICNN output as the objective.

```julia
jump_model = Model(Gurobi.Optimizer)

@variable(jump_model, -1 <= x <= 1)
@variable(jump_model, -1 <= y <= 1)
@variable(jump_model, z)

@objective(jump_model, Min, z)

ICNN_incorporate!(jump_model, "model_weights.json", z, x, y)

optimize!(jump_model)
solution_summary(jump_model)
```

## Considerations

### Feasibility

In an optimization problem where an ICNN has been incorporated as a surrogate, infeasibility might go undetected. This is because the ICNN is formulated as an epigraph, with a penalty term added to the objective function to minimize the ICNN output. However, this penalty term doesn't prevent the solver from returning solutions that lie "above" the ICNN function hypersurface. Therefore, the structure of the optimization problem should be studied carefully to reason about whether such pseudo-feasible solutions are possible. If studying the problem structure is too complex, the optimal solution returned by the solver can be checked using the [`check_ICNN`](@ref) function. If the check fails, the optimization problem is likely infeasible, but since this has not been proven, the problem should be investigated more thoroughly.
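
As a minimal post-solve sketch (the exact argument order of [`check_ICNN`](@ref) is given in its docstring; the signature below, with the parameter file, the proposed output value, and then the input values, is only an assumption for illustration):

```julia
optimize!(jump_model)

# Assumed argument order for illustration: parameter file, proposed
# output value, then the input values in the same order as given to
# ICNN_incorporate!. Check the docstring for the exact signature.
if !check_ICNN("model_weights.json", value(z), value(x), value(y))
    @warn "Solution does not satisfy the ICNN; the problem may be infeasible."
end
```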

### Penalty domination

The second important consideration is the objective function of the optimization problem where the ICNN has been incorporated as a surrogate. As stated in the previous paragraph, the ICNN LP formulation relies on a penalty term that is added to the objective function. Thus, if the objective function already includes a term which is linked to, or is itself, the ICNN output variable, the penalty term added by [`ICNN_incorporate!`](@ref) will not have the desired effect of guaranteeing that the ICNN is satisfied. We have not found a way around this issue, so if the penalty term is "dominated" in the objective, ICNN surrogates are probably not suitable for the given optimization problem.
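
As a sketch of the failure mode, an objective that rewards a large ICNN output directly opposes the internal penalty:

```julia
# Sketch of a dominated penalty: maximizing z pushes the solution
# "above" the ICNN hypersurface, so the epigraph formulation added by
# ICNN_incorporate! is no longer guaranteed to be tight.
@objective(jump_model, Max, z)

ICNN_incorporate!(jump_model, "model_weights.json", z, x, y)
```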
29 changes: 23 additions & 6 deletions docs/src/index.md
@@ -1,21 +1,38 @@
# Gogeta.jl

[Gogeta](https://gamma-opt.github.io/Gogeta.jl/) is a package that enables the user to formulate trained machine learning models as mathematical optimization problems.
[Gogeta](https://gamma-opt.github.io/Gogeta.jl/) is a package that enables the user to formulate trained machine learning (ML) models as mathematical optimization problems. This approach can be utilized for the global optimization of ML models, or for using them as surrogates in larger optimization problems.

Currently supported models are `Flux.Chain` ReLU-activated neural networks (dense and convolutional) and `EvoTrees` tree ensemble models.
Currently supported models are $ReLU$-activated neural networks (dense and convolutional), input convex neural networks (ICNNs), and tree ensemble models.

## Installation

The latest official version can be installed from the Julia General repository.

```julia-repl
julia> Pkg.add("Gogeta")
```

Some experimental features may only be available in the GitHub repository. The latest development version can be accessed by adding the GitHub HTTPS URL and the branch name as follows:

```julia-repl
julia> Pkg.add(url="https://github.com/gamma-opt/Gogeta.jl.git", rev="<branch-name>")
```

Replace `<branch-name>` with the name of the branch you want to add.

!!! warning

The code on some of the Git branches might be experimental and not work as expected.

## How can this package be used?

Formulating trained machine learning (ML) models as mixed-integer programming (MIP) problems opens up multiple possibilities. Firstly, it allows for global optimization - finding the input that provably maximizes or minimizes the ML model output. Secondly, changing the objective function in the MIP formulation and/or adding additional constraints makes it possible to solve problems related to the ML model, such as finding adversarial inputs. Lastly, the MIP formulation of a ML model can be included into a larger optimization problem. This is useful in surrogate contexts where an ML model can be trained to approximate a complicated function that itself cannot be used in an optimization problem.
Formulating trained machine learning (ML) models as mixed-integer linear programming (MILP) problems opens up multiple possibilities. Firstly, it allows for global optimization - finding the input that provably maximizes or minimizes the ML model output. Secondly, changing the objective function in the MILP formulation and/or adding additional constraints makes it possible to solve problems related to the ML model, such as finding adversarial inputs. Lastly, the MILP formulation of a ML model can be incorporated into a larger optimization problem. This is useful in a surrogate modeling context where an ML model can be trained to approximate a complex function that itself cannot be used in an optimization problem.

Despite its usefulness, modeling ML models as MILP problems has significant limitations. The biggest limitation is the capability of MILP solvers, which restricts the ML model size. With neural networks, for example, only models with at most hundreds of neurons can be effectively formulated as MILPs and then optimized. In practice, formulating and optimizing large modern ML models such as convolutional neural networks and transformer networks is computationally infeasible. However, if small neural networks are all that is required for the specific application, the methods implemented in this package can be useful. Secondly, only piecewise linear ML models can be formulated as MILP problems. For example, with neural networks this entails using activation functions such as $ReLU$.

Despite its usefulness, modeling ML models as MIP problems has significant limitations. The biggest limitation is the capability of MIP solvers which limits the ML model size. With neural networks, for example, only models with at most hundreds of neurons can be effectively tackled. In practice, formulating into MIPs and optimizing all large modern models such as convolutional neural networks and transformer networks is computationally infeasible. However, if small neural networks are all that is required for the specific application, the techniques implemented in this package can be useful. Secondly, only piecewise linear ML models can be formulated as MIP problems. For example, with neural networks this entails using only ReLU as the activation function.
Input convex neural networks (ICNNs) are a special type of machine learning model that can be formulated as linear optimization problems (LPs). The convexity limits the expressiveness of the ICNN, but the LP formulation enables fast optimization of even very large ICNNs. If the data or the function being modeled is approximately convex, ICNNs can provide accuracy similar to regular neural networks. In the contexts mentioned in the first paragraph, ICNNs can therefore be used instead of regular neural networks without the computational limitations of the MILP formulations.

## Getting started

The following sections [Tree ensembles](tree_ensembles.md) and [Neural networks](neural_networks.md) give a very simple demonstration on how to use the package.
Multiprocessing examples and more detailed code can be found in the `examples/`-folder of the [package repository](https://github.com/gamma-opt/Gogeta.jl).
The following sections [Tree ensembles](tree_ensembles.md), [Neural networks](neural_networks.md), [Neural networks in larger optimization problems](nns_in_larger.md), and [Input convex neural networks](icnns.md) give simple demonstrations of how to use the package.
Examples on multiprocessing features as well as more detailed code can be found in the `examples/`-folder of the [package repository](https://github.com/gamma-opt/Gogeta.jl).
8 changes: 6 additions & 2 deletions docs/src/literature.md
@@ -1,6 +1,6 @@
# Literature

The mathematical optimization methods implemented in this package are based on the work of many brilliant researchers.
The mathematical optimization methods and algorithms implemented in this package are based on the work of many brilliant researchers.
The most important papers for our work are listed here. In these works, more in-depth information about the formulations and various algorithms we use can also be found.

* **(Convolutional) neural network formulation:**
@@ -25,4 +25,8 @@ The most important papers for our work are listed here. In these works, more in-

* **Tree ensembles:**

* *Mišić, V. V. (2020). Optimization of tree ensembles. Operations Research, 68(5), 1605-1624.*
* *Mišić, V. V. (2020). Optimization of tree ensembles. Operations Research, 68(5), 1605-1624.*

* **Input convex neural networks:**

* *Amos, B., Xu, L., & Kolter, J. Z. (2017, July). Input convex neural networks. In International conference on machine learning (pp. 146-155). PMLR.*
22 changes: 15 additions & 7 deletions docs/src/neural_networks.md
@@ -2,11 +2,15 @@

With neural networks, the hidden layers must use the $ReLU$ activation function, and the output layer must use the identity activation.

A neural networks satifying these requirements can be formulated into a mixed-integer optimization problem.
A neural network satisfying these requirements can be formulated as a mixed-integer linear optimization problem (MILP).
Along with the formulation, the neuron activation bounds can be calculated, which improves computational performance and enables compression.

The network is compressed by pruning neurons that are either stably active or inactive. The activation bounds are used to identify these neurons.
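
The identification rule can be sketched as follows (illustrative only, not the package's internal code): for a neuron with precomputed pre-activation bounds `lb` and `ub`,

```julia
# Illustrative only: with ReLU, a neuron whose pre-activation upper
# bound is nonpositive always outputs zero (prune it), while one whose
# lower bound is nonnegative always passes its input through (its ReLU
# can be replaced by the identity and merged into the next layer).
stably_inactive(lb, ub) = ub <= 0
stably_active(lb, ub) = lb >= 0
```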

!!! note

This section describes how to formulate a `Flux.Chain` neural network model as a MILP. Further constraints can be added and the objective function can be changed, but if one is using neural networks as surrogate models in a larger optimization problem, [this section](nns_in_larger.md) has a guide on how to accomplish this effectively and formulate the neural network with anonymous variables.

## Formulation

First, create a neural network model satisfying the requirements:
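
A minimal sketch of such a network (the layer sizes here are arbitrary), with $ReLU$ on the hidden layers and the identity activation on the output layer:

```julia
using Flux

# Hidden layers use relu; the final Dense layer has no activation
# argument, so it defaults to the identity.
model = Chain(
    Dense(2 => 16, relu),
    Dense(16 => 16, relu),
    Dense(16 => 1)
)
```
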
@@ -123,7 +127,11 @@ A detailed discussion on bound tightening techniques can be found in [Grimstad a

## Sampling

Instead of just solving the MIP, the neural network can be optimized (finding the output maximizing/minimizing input) by using a sampling approach. Note that these features are experimental and cannot be guranteed to find the global optimum.
Instead of just solving the MILP, the neural network can be optimized (finding the input that maximizes or minimizes the output) using a sampling approach. Note that these features are experimental and cannot be guaranteed to find the global optimum.

!!! note

Much more effective algorithms for finding the optimum of a trained neural network exist, such as projected gradient descent. The sampling-based optimization algorithms implemented in this package are best intended for satisfying one's curiosity and understanding the problem structure better.

```julia
using QuasiMonteCarlo
```

@@ -142,7 +150,7 @@ x_opt, optimum = optimize_by_sampling!(jump_model, samples);
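
For context, the samples can be generated with QuasiMonteCarlo.jl along these lines (the bounds and sample count here are arbitrary):

```julia
using QuasiMonteCarlo

# Sample 1000 points from the input box [-1, 1]^2 (arbitrary bounds).
lb, ub = [-1.0, -1.0], [1.0, 1.0]
samples = QuasiMonteCarlo.sample(1000, lb, ub, LatinHypercubeSample())

x_opt, optimum = optimize_by_sampling!(jump_model, samples);
```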

### Relaxing walk algorithm

Another method for heuristically optimizing the `JuMP` model is the so-called relaxing walk algorithm. It is based on a sampling approach that utilizes LP relaxations of the original problem and a pseudo-gradient-descent algorithm.
Another method for heuristically optimizing the `JuMP` model is the so-called relaxing walk algorithm. It is based on a sampling approach that utilizes LP relaxations of the original problem and a pseudo gradient descent -algorithm.

```julia
jump_model = Model(Gurobi.Optimizer)
```

@@ -171,9 +179,9 @@ The choice of the best neural network bound tightening and compression procedure
Based on some limited computational tests of our own as well as knowledge from the field, we can make the following general recommendations:

* Wide but shallow neural networks should be preferred. The bound tightening gets exponentially harder with deeper layers.
* For small neural network models, using the "fast" bound tightening option is probably the best, since the resulting formulations are easy to solve even with loose bounds.
* For larger neural networks, "standard" bound tightening will produce tighter bounds but take more time. However, when using the `JuMP` model, the tighter bounds might make it more computationally feasible.
* For large neural networks where the output bounds are known, "output" bound tightening can be used. This bound tightening is very slow but might be necessary to increase the computational feasibility of the resulting `JuMP` model.
* If the model has many so-called "dead" neurons, creating the JuMP model by using compression is beneficial, since the formulation will have fewer constraints and the bound tightening will be faster, reducing total formulation time.
* For small neural network models, using the `fast` bound tightening option is probably the best, since the resulting formulations are easy to solve even with loose bounds.
* For larger neural networks, `standard` bound tightening will produce tighter bounds but take more time. However, when using the `JuMP` model, the tighter bounds might make it more computationally feasible.
* For large neural networks where the output bounds are known, `output` bound tightening can be used. This bound tightening is very slow but might be necessary to increase the computational feasibility of the resulting `JuMP` model.
* If the model has many so-called "dead" neurons, creating the `JuMP` model by using compression is beneficial, since the formulation will have fewer constraints and the bound tightening will be faster, reducing total formulation time.

These are only general recommendations based on limited evidence, and the user should validate the performance of each bound tightening and compression procedure in relation to her own work.