Merge pull request #20 from gamma-opt/add-cnn

Convolutional neural network (CNN) functionality

EetuReijonen authored Feb 29, 2024
2 parents 17d7049 + 9c79b76 commit 013edfa

Showing 10 changed files with 482 additions and 4 deletions.
5 changes: 4 additions & 1 deletion README.md
@@ -8,7 +8,7 @@

Gogeta.jl (pronounced "Go-gee-ta") enables the user to represent machine-learning models with mathematical programming, more specifically as mixed-integer optimization problems. This, in turn, allows for "fusing" the capabilities of mathematical optimization solvers and machine learning models to solve problems that neither could solve on their own.

- Currently supported models include neural networks using ReLU activation and tree ensembles.
+ Currently supported models include neural networks and convolutional neural networks using ReLU activation, as well as tree ensembles.

## Package features

@@ -21,3 +21,6 @@ Currently supported models include neural networks using ReLU activation and tre
* **bound tightening** - improve computational feasibility by tightening bounds in the formulation according to input/output bounds
* **neural network compression** - reduce network size by removing inactive or stably active neurons
* **neural network optimization** - find the input that maximizes the neural network output

### Convolutional neural networks
* **convolutional neural network to MIP conversion** - formulate a mixed-integer programming problem from a convolutional neural network
16 changes: 15 additions & 1 deletion docs/src/api.md
@@ -34,4 +34,18 @@ These are all of the functions and data structures that the user needs to know i
* [`compress_with_precomputed`](@ref) - perform compression with precomputed neuron activation bounds

### Forward pass
- * [`forward_pass!`](@ref) - fixes the input variables and optimizes the model to get the ouput
+ * [`forward_pass!`](@ref) - fix the input variables and optimize the model to get the output

## Convolutional neural networks

### Data structures
* [`CNNStructure`](@ref) - container for the layer structure of a convolutional neural network model

### Parameter extraction
* [`get_structure`](@ref) - get layer structure from a convolutional neural network model

### MIP formulation
* [`create_MIP_from_CNN!`](@ref) - formulate a `JuMP` model from the CNN

### Forward pass
* [`image_pass!`](@ref) - fix the input variables and optimize the model to get the output
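
Taken together, a typical workflow calls these in sequence. Below is a minimal sketch; the toy model, the input size, and the choice of `HiGHS` as the solver are illustrative assumptions, not requirements:

```julia
using Flux, JuMP, HiGHS, Gogeta

input = rand(Float32, 16, 16, 1, 1)   # a hypothetical 16x16 grayscale image

CNN_model = Flux.Chain(
    Conv((3, 3), 1 => 2, relu),       # ReLU-activated convolution
    Flux.flatten,
    Dense(392 => 1)                   # identity-activated output layer
)

jump = Model(HiGHS.Optimizer)
cnns = get_structure(CNN_model, input)      # extract the layer structure
create_MIP_from_CNN!(jump, CNN_model, cnns) # formulate the MIP
image_pass!(jump, input)                    # forward pass through the MIP
```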
44 changes: 43 additions & 1 deletion docs/src/index.md
@@ -2,7 +2,7 @@

[Gogeta](https://gamma-opt.github.io/Gogeta.jl/) is a package that enables the user to formulate machine-learning models as mathematical programming problems.

- Currently supported models are `Flux.Chain` ReLU-activated neural networks and `EvoTrees` tree ensemble models.
+ Currently supported models are `Flux.Chain` ReLU-activated neural networks (dense and convolutional) and `EvoTrees` tree ensemble models.

## Installation
```julia-repl
@@ -112,6 +112,48 @@
Use the `JuMP` model to calculate a forward pass through the network (input at the first layer).

```julia
forward_pass!(jump_model, [-1.0, 0.0])
```

#### Convolutional neural networks

The convolutional neural network requirements can be found in the [`create_MIP_from_CNN!`](@ref) documentation.

First, create an input image (or load one from your computer).

```julia
input = rand(Float32, 70, 50, 1, 1) # a random 70x50 grayscale image
```

Then, create a convolutional neural network model satisfying the requirements:

```julia
using Flux

CNN_model = Flux.Chain(
Conv((4,3), 1 => 10, pad=(2, 1), stride=(3, 2), relu),
MeanPool((5,3), pad=(3, 2), stride=(2, 2)),
MaxPool((3,4), pad=(1, 3), stride=(3, 2)),
Conv((4,3), 10 => 5, pad=(2, 1), stride=(3, 2), relu),
MaxPool((3,4), pad=(1, 3), stride=(3, 2)),
Flux.flatten,
Dense(20 => 100, relu),
Dense(100 => 1)
)
```

Then, create an empty `JuMP` model, extract the layer structure of the CNN model and finally formulate the MIP.

```julia
jump = Model(Gurobi.Optimizer)
set_silent(jump)
cnns = get_structure(CNN_model, input);
create_MIP_from_CNN!(jump, CNN_model, cnns)
```

Check that the `JuMP` model produces the same outputs as the `Flux.Chain`.

```julia
vec(CNN_model(input)) ≈ image_pass!(jump, input)
```
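
Beyond verification, the same model can be optimized over the network input. The following is a hedged sketch, assuming (as in `src/neural_networks/CNN_convert.jl`) that the pixel variables are registered as `c` with layer index 0 and the dense-layer activations as `x`:

```julia
last_dense = maximum(cnns.dense_inds)  # index of the output layer

# the formulation bounds pixels from below only, so also bound them from above
@constraint(jump, [i=1:70, j=1:50], jump[:c][0, i, j, 1] <= 1.0)

@objective(jump, Max, jump[:x][last_dense, 1])  # maximize the network output
optimize!(jump)

# pixel values of the maximizing input image
best_input = [value(jump[:c][0, i, j, 1]) for i in 1:70, j in 1:50]
```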

## How to use?

Using the tree ensemble optimization from this package is quite straightforward. The only parameter the user can change is the solution method: with initial constraints or with lazy constraints.
3 changes: 3 additions & 0 deletions examples/Project.toml
@@ -1,10 +1,13 @@
[deps]
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
EvoTrees = "f6006082-12f8-11e9-0c9c-0d5d367ab1e5"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
GLPK = "60bf3e95-4087-53dc-ae20-288a0d20c6a6"
Gogeta = "8b0c908c-0e18-4590-8d3d-9c2483fd76bf"
Gurobi = "2e9cd046-0924-5485-92f1-d5272153d98b"
HiGHS = "87dc4568-4c63-4d18-b0c0-bb2238e4078b"
Images = "916415d5-f1e6-5110-898d-aaa5f9f070e0"
JuMP = "4076af6c-e467-56ae-b986-b466b2749572"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
74 changes: 74 additions & 0 deletions examples/conv_neural_networks.jl
@@ -0,0 +1,74 @@
using Flux
using Plots
using JuMP
using Gurobi
using Random
using Gogeta

using Images
using FileIO
image = load("/Users/eetureijonen/Pictures/IMG_0195.JPG"); # swap in your own image
downscaled_image = imresize(image, (70, 50));

input = reshape(Float32.(channelview(Gray.(downscaled_image))), 70, 50, 1, 1);
input = input[end:-1:1, :, :, :]; # flip the rows so heatmap displays the image upright
Random.seed!(1234)
CNN_model = Flux.Chain(
Conv((4,3), 1 => 10, pad=(2, 1), stride=(3, 2), relu),
MeanPool((5,3), pad=(3, 2), stride=(2, 2)),
MaxPool((3,4), pad=(1, 3), stride=(3, 2)),
Conv((4,3), 10 => 5, pad=(2, 1), stride=(3, 2), relu),
MaxPool((3,4), pad=(1, 3), stride=(3, 2)),
Flux.flatten,
Dense(20 => 100, relu),
Dense(100 => 1)
)

size(CNN_model[1:6](input)) # the flattened size must match the first Dense layer input: (20, 1)

# input image
heatmap(input[:, :, 1, 1], background=false, legend=false, color=:inferno, aspect_ratio=:equal, axis=([], false))

# convolution layer outputs
outputs = [CNN_model[1](input)[:, :, channel, 1] for channel in 1:10];
display.(heatmap.(outputs, background=false, legend=false, color = :inferno, aspect_ratio=:equal, axis=([], false)));

# meanpool outputs
outputs = [CNN_model[1:2](input)[:, :, channel, 1] for channel in 1:10];
display.(heatmap.(outputs, background=false, legend=false, color = :inferno, aspect_ratio=:equal, axis=([], false)));

# maxpool
outputs = [CNN_model[1:3](input)[:, :, channel, 1] for channel in 1:10];
display.(heatmap.(outputs, background=false, legend=false, color = :inferno, aspect_ratio=:equal, axis=([], false)));

# new conv
outputs = [CNN_model[1:4](input)[:, :, channel, 1] for channel in 1:5];
display.(heatmap.(outputs, background=false, legend=false, color = :inferno, aspect_ratio=:equal, axis=([], false)));

# last maxpool
outputs = [CNN_model[1:5](input)[:, :, channel, 1] for channel in 1:5];
display.(heatmap.(outputs, background=false, legend=false, color = :inferno, aspect_ratio=:equal, axis=([], false)));

# create jump model from cnn
jump = Model(Gurobi.Optimizer)
set_silent(jump)
cnns = get_structure(CNN_model, input);
create_MIP_from_CNN!(jump, CNN_model, cnns)

# Test that the jump model produces the same outputs for all layers as the CNN
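# (the last argument of image_pass! selects the layer up to which the network is evaluated)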
@time CNN_model[1](input)[:, :, :, 1] ≈ image_pass!(jump, input, cnns, 1)
@time CNN_model[1:2](input)[:, :, :, 1] ≈ image_pass!(jump, input, cnns, 2)
@time CNN_model[1:3](input)[:, :, :, 1] ≈ image_pass!(jump, input, cnns, 3)
@time CNN_model[1:4](input)[:, :, :, 1] ≈ image_pass!(jump, input, cnns, 4)
@time CNN_model[1:5](input)[:, :, :, 1] ≈ image_pass!(jump, input, cnns, 5)
@time vec(CNN_model[1:6](input)) ≈ image_pass!(jump, input, cnns, 6)
@time vec(CNN_model[1:7](input)) ≈ image_pass!(jump, input, cnns, 7)
@time vec(CNN_model[1:8](input)) ≈ image_pass!(jump, input, cnns, 8)
@time vec(CNN_model(input)) ≈ image_pass!(jump, input)

# Plot true model meanpool fifth channel
heatmap(CNN_model[1:2](input)[:, :, 5, 1], background=false, legend=false, color=:inferno, aspect_ratio=:equal, axis=([], false))

# Plot jump model meanpool fifth channel
heatmap(image_pass!(jump, input, cnns, 2)[:, :, 5], background=false, legend=false, color=:inferno, aspect_ratio=:equal, axis=([], false))

6 changes: 6 additions & 0 deletions src/Gogeta.jl
@@ -28,6 +28,12 @@ include("neural_networks/interface.jl")
export NN_to_MIP_with_precomputed, NN_to_MIP_with_bound_tightening, compress_with_precomputed, compress_with_bound_tightening
export forward_pass!, SolverParams

include("neural_networks/CNN_util.jl")
export CNNStructure, get_structure, image_pass!

include("neural_networks/CNN_convert.jl")
export create_MIP_from_CNN!

include("tree_ensembles/types.jl")
export TEModel, extract_evotrees_info

146 changes: 146 additions & 0 deletions src/neural_networks/CNN_convert.jl
@@ -0,0 +1,146 @@
"""
function create_MIP_from_CNN!(jump_model::JuMP.Model, CNN_model::Flux.Chain, cnnstruct::CNNStructure)
Creates a mixed-integer optimization problem from a `Flux.Chain` convolutional neural network model.
The optimization formulation is saved in the `JuMP.Model` given as an input.
The convolutional neural network must follow a certain structure:
- It must consist of (in order) convolutional and pooling layers, a `Flux.flatten` layer and finally dense layers
- I.e. allowed layer types: `Conv`, `MaxPool`, `MeanPool`, `Flux.flatten`, `Dense`
- The activation function for all of the convolutional layers and the dense layers must be `ReLU`
- The last dense layer must use the `identity` activation function
- Input size, filter size, stride and padding can be chosen freely
# Parameters
- `jump_model`: an empty optimization model where the formulation will be saved
- `CNN_model`: `Flux.Chain` containing the CNN
- `cnnstruct`: holds the layer structure of the CNN
"""
function create_MIP_from_CNN!(jump_model::JuMP.Model, CNN_model::Flux.Chain, cnnstruct::CNNStructure)

channels = cnnstruct.channels
dims = cnnstruct.dims
dense_lengths = cnnstruct.dense_lengths
conv_inds = cnnstruct.conv_inds
maxpool_inds = cnnstruct.maxpool_inds
meanpool_inds = cnnstruct.meanpool_inds
flatten_ind = cnnstruct.flatten_ind
dense_inds = cnnstruct.dense_inds

# 2d layers
@variable(jump_model, c[layer=union(0, conv_inds, maxpool_inds, meanpool_inds), 1:dims[layer][1], 1:dims[layer][2], 1:channels[layer]] >= 0) # input is always between 0 and 1
@variable(jump_model, cs[layer=conv_inds, 1:dims[layer][1], 1:dims[layer][2], 1:channels[layer]] >= 0)
@variable(jump_model, cz[layer=conv_inds, 1:dims[layer][1], 1:dims[layer][2], 1:channels[layer]], Bin)

# dense layers
@variable(jump_model, x[layer=union(flatten_ind, dense_inds), 1:dense_lengths[layer]])
@variable(jump_model, s[layer=dense_inds[1:end-1], 1:dense_lengths[layer]] >= 0)
@variable(jump_model, z[layer=dense_inds[1:end-1], 1:dense_lengths[layer]], Bin)

U_bounds_dense = Dict{Int, Vector}()
L_bounds_dense = Dict{Int, Vector}()

pixel_or_pad(layer, row, col, channel) = if haskey(c, (layer, row, col, channel)) c[layer, row, col, channel] else 0.0 end
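# (indices outside the image resolve to 0.0, implementing zero padding)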

for (layer_index, layer_data) in enumerate(CNN_model)

if layer_index in conv_inds

filters = [Flux.params(layer_data)[1][:, :, in_channel, out_channel] for in_channel in 1:channels[layer_index-1], out_channel in 1:channels[layer_index]]
biases = Flux.params(layer_data)[2]

f_height, f_width = size(filters[1, 1])

for row in 1:dims[layer_index][1], col in 1:dims[layer_index][2]

pos = (layer_data.stride[1]*(row-1) + 1 - layer_data.pad[1], layer_data.stride[2]*(col-1) + 1 - layer_data.pad[2])

for out_channel in 1:channels[layer_index]
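# Flux's Conv flips its kernel (true convolution, unlike CrossCor), hence the reversed filter indexing below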

convolution = @expression(jump_model,
sum([filters[in_channel, out_channel][f_height-i, f_width-j] * pixel_or_pad(layer_index-1, pos[1]+i, pos[2]+j, in_channel)
for i in 0:f_height-1,
j in 0:f_width-1,
in_channel in 1:channels[layer_index-1]])
)
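# ReLU big-M pair: c is the positive part, cs the slack; M = 1.0 assumes each pre-activation lies in [-1, 1]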

@constraint(jump_model, c[layer_index, row, col, out_channel] - cs[layer_index, row, col, out_channel] == convolution + biases[out_channel])
@constraint(jump_model, c[layer_index, row, col, out_channel] <= 1.0 * (1-cz[layer_index, row, col, out_channel]))
@constraint(jump_model, cs[layer_index, row, col, out_channel] <= 1.0 * cz[layer_index, row, col, out_channel])
end
end

elseif layer_index in maxpool_inds

p_height = layer_data.k[1]
p_width = layer_data.k[2]
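# max-pool encoding: the output upper-bounds every pixel in the window, and the
# one-hot binaries pz force it to equal one of them (necessarily the maximum)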

for row in 1:dims[layer_index][1], col in 1:dims[layer_index][2]

pos = (layer_data.stride[1]*(row-1) - layer_data.pad[1], layer_data.stride[2]*(col-1) - layer_data.pad[2])

for channel in 1:channels[layer_index-1]

@constraint(jump_model, [i in 1:p_height, j in 1:p_width], c[layer_index, row, col, channel] >= pixel_or_pad(layer_index-1, pos[1]+i, pos[2]+j, channel))

pz = @variable(jump_model, [1:p_height, 1:p_width], Bin)
@constraint(jump_model, sum([pz[i, j] for i in 1:p_height, j in 1:p_width]) == 1)

@constraint(jump_model, [i in 1:p_height, j in 1:p_width], c[layer_index, row, col, channel] <= pixel_or_pad(layer_index-1, pos[1]+i, pos[2]+j, channel) + (1-pz[i, j]))
end
end

elseif layer_index in meanpool_inds
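# average pooling is affine, so it needs only equality constraints and no binaries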

p_height = layer_data.k[1]
p_width = layer_data.k[2]

for row in 1:dims[layer_index][1], col in 1:dims[layer_index][2]

pos = (layer_data.stride[1]*(row-1) - layer_data.pad[1], layer_data.stride[2]*(col-1) - layer_data.pad[2])

for channel in 1:channels[layer_index-1]
@constraint(jump_model, c[layer_index, row, col, channel] == 1/(p_height*p_width) * sum(pixel_or_pad(layer_index-1, pos[1]+i, pos[2]+j, channel)
for i in 1:p_height,
j in 1:p_width)
)
end
end

elseif layer_index == flatten_ind
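# Flux.flatten vectorizes column-major (row fastest, then column, then channel), which the index arithmetic below reproduces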

@constraint(jump_model, [channel in 1:channels[layer_index-1], row in dims[layer_index-1][1]:-1:1, col in 1:dims[layer_index-1][2]],
x[flatten_ind, row + (col-1)*dims[layer_index-1][1] + (channel-1)*prod(dims[layer_index-1])] == c[layer_index-1, row, col, channel]
)

elseif layer_index in dense_inds

weights = Flux.params(layer_data)[1]
biases = Flux.params(layer_data)[2]

n_neurons = length(biases)
n_previous = length(x[layer_index-1, :])

# compute big-M bounds for the dense layers with interval arithmetic, propagating the previous layer's bounds (flatten-layer activations are assumed to lie in [0, 1])
if layer_index == minimum(dense_inds)
U_bounds_dense[layer_index] = [sum(max(weights[neuron, previous] * 1.0, 0.0) for previous in 1:n_previous) + biases[neuron] for neuron in 1:n_neurons]
L_bounds_dense[layer_index] = [sum(min(weights[neuron, previous] * 1.0, 0.0) for previous in 1:n_previous) + biases[neuron] for neuron in 1:n_neurons]
else
U_bounds_dense[layer_index] = [sum(max(weights[neuron, previous] * max(0, U_bounds_dense[layer_index-1][previous]), weights[neuron, previous] * max(0, L_bounds_dense[layer_index-1][previous])) for previous in 1:n_previous) + biases[neuron] for neuron in 1:n_neurons]
L_bounds_dense[layer_index] = [sum(min(weights[neuron, previous] * max(0, U_bounds_dense[layer_index-1][previous]), weights[neuron, previous] * max(0, L_bounds_dense[layer_index-1][previous])) for previous in 1:n_previous) + biases[neuron] for neuron in 1:n_neurons]
end

if layer_data.σ == relu
for neuron in 1:n_neurons
@constraint(jump_model, x[layer_index, neuron] >= 0)
@constraint(jump_model, x[layer_index, neuron] <= U_bounds_dense[layer_index][neuron] * (1 - z[layer_index, neuron]))
@constraint(jump_model, s[layer_index, neuron] <= -L_bounds_dense[layer_index][neuron] * z[layer_index, neuron])
@constraint(jump_model, x[layer_index, neuron] - s[layer_index, neuron] == biases[neuron] + sum(weights[neuron, i] * x[layer_index-1, i] for i in 1:n_previous))
end
elseif layer_data.σ == identity
@constraint(jump_model, [neuron in 1:n_neurons], x[layer_index, neuron] == biases[neuron] + sum(weights[neuron, i] * x[layer_index-1, i] for i in 1:n_previous))
end
end
end
end