Merge pull request #20 from gamma-opt/add-cnn

Convolutional neural network (CNN) functionality

EetuReijonen authored Feb 29, 2024
2 parents 17d7049 + 9c79b76 commit 013edfa

Showing 10 changed files with 482 additions and 4 deletions.
5 changes: 4 additions & 1 deletion README.md
@@ -8,7 +8,7 @@

Gogeta.jl (pronounced "Go-gee-ta") enables the user to represent machine-learning models with mathematical programming, more specifically as mixed-integer optimization problems. This, in turn, allows for "fusing" the capabilities of mathematical optimization solvers and machine learning models to solve problems that neither could solve on their own.

- Currently supported models include neural networks using ReLU activation and tree ensembles.
+ Currently supported models include neural networks and convolutional neural networks using ReLU activation, as well as tree ensembles.

## Package features

@@ -21,3 +21,6 @@ Currently supported models include neural networks using ReLU activation and tre
* **bound tightening** - improve computational feasibility by tightening bounds in the formulation according to input/output bounds
* **neural network compression** - reduce network size by removing inactive or stably active neurons
* **neural network optimization** - find the input that maximizes the neural network output

### Convolutional neural networks
* **convolutional neural network to MIP conversion** - formulate a mixed-integer programming problem from a convolutional neural network
16 changes: 15 additions & 1 deletion docs/src/api.md
@@ -34,4 +34,18 @@ These are all of the functions and data structures that the user needs to know i
* [`compress_with_precomputed`](@ref) - perform compression with precomputed neuron activation bounds

### Forward pass
- * [`forward_pass!`](@ref) - fixes the input variables and optimizes the model to get the ouput
+ * [`forward_pass!`](@ref) - fix the input variables and optimize the model to get the output

## Convolutional neural networks

### Data structures
* [`CNNStructure`](@ref) - container for the layer structure of a convolutional neural network model

### Parameter extraction
* [`get_structure`](@ref) - get layer structure from a convolutional neural network model

### MIP formulation
* [`create_MIP_from_CNN!`](@ref) - formulate a `JuMP` model from the CNN

### Forward pass
* [`image_pass!`](@ref) - fix the input variables and optimize the model to get the output
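
Taken together, a typical workflow calls these in sequence. Below is a minimal sketch; the toy model, the input size, and the choice of `HiGHS` as the solver are illustrative assumptions, not requirements:

```julia
using Flux, JuMP, HiGHS, Gogeta

input = rand(Float32, 16, 16, 1, 1)   # a hypothetical 16x16 grayscale image

CNN_model = Flux.Chain(
    Conv((3, 3), 1 => 2, relu),       # ReLU-activated convolution
    Flux.flatten,
    Dense(392 => 1)                   # identity-activated output layer
)

jump = Model(HiGHS.Optimizer)
cnns = get_structure(CNN_model, input)      # extract the layer structure
create_MIP_from_CNN!(jump, CNN_model, cnns) # formulate the MIP
image_pass!(jump, input)                    # forward pass through the MIP
```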
44 changes: 43 additions & 1 deletion docs/src/index.md
@@ -2,7 +2,7 @@

[Gogeta](https://gamma-opt.github.io/Gogeta.jl/) is a package that enables the user to formulate machine-learning models as mathematical programming problems.

- Currently supported models are `Flux.Chain` ReLU-activated neural networks and `EvoTrees` tree ensemble models.
+ Currently supported models are `Flux.Chain` ReLU-activated neural networks (dense and convolutional) and `EvoTrees` tree ensemble models.

## Installation
```julia-repl
@@ -112,6 +112,48 @@
Use the `JuMP` model to calculate a forward pass through the network (input at the first layer).

```julia
forward_pass!(jump_model, [-1.0, 0.0])
```

#### Convolutional neural networks

The convolutional neural network requirements can be found in the [`create_MIP_from_CNN!`](@ref) documentation.

First, create an input image (or load one from your computer).

```julia
input = rand(Float32, 70, 50, 1, 1) # a random 70x50 grayscale image
```

Then, create a convolutional neural network model satisfying the requirements:

```julia
using Flux

CNN_model = Flux.Chain(
Conv((4,3), 1 => 10, pad=(2, 1), stride=(3, 2), relu),
MeanPool((5,3), pad=(3, 2), stride=(2, 2)),
MaxPool((3,4), pad=(1, 3), stride=(3, 2)),
Conv((4,3), 10 => 5, pad=(2, 1), stride=(3, 2), relu),
MaxPool((3,4), pad=(1, 3), stride=(3, 2)),
Flux.flatten,
Dense(20 => 100, relu),
Dense(100 => 1)
)
```

Then, create an empty `JuMP` model, extract the layer structure of the CNN model and finally formulate the MIP.

```julia
jump = Model(Gurobi.Optimizer)
set_silent(jump)
cnns = get_structure(CNN_model, input);
create_MIP_from_CNN!(jump, CNN_model, cnns)
```

Check that the `JuMP` model produces the same outputs as the `Flux.Chain`.

```julia
vec(CNN_model(input)) ≈ image_pass!(jump, input)
```
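
Beyond verification, the same model can be optimized over the network input. The following is a hedged sketch, assuming (as in `src/neural_networks/CNN_convert.jl`) that the pixel variables are registered as `c` with layer index 0 and the dense-layer activations as `x`:

```julia
last_dense = maximum(cnns.dense_inds)  # index of the output layer

# the formulation bounds pixels from below only, so also bound them from above
@constraint(jump, [i=1:70, j=1:50], jump[:c][0, i, j, 1] <= 1.0)

@objective(jump, Max, jump[:x][last_dense, 1])  # maximize the network output
optimize!(jump)

# pixel values of the maximizing input image
best_input = [value(jump[:c][0, i, j, 1]) for i in 1:70, j in 1:50]
```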

## How to use?

Using the tree ensemble optimization from this package is quite straightforward. The only parameter the user can change is the solution method: with initial constraints or with lazy constraints.
3 changes: 3 additions & 0 deletions examples/Project.toml
@@ -1,10 +1,13 @@
[deps]
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
EvoTrees = "f6006082-12f8-11e9-0c9c-0d5d367ab1e5"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
GLPK = "60bf3e95-4087-53dc-ae20-288a0d20c6a6"
Gogeta = "8b0c908c-0e18-4590-8d3d-9c2483fd76bf"
Gurobi = "2e9cd046-0924-5485-92f1-d5272153d98b"
HiGHS = "87dc4568-4c63-4d18-b0c0-bb2238e4078b"
Images = "916415d5-f1e6-5110-898d-aaa5f9f070e0"
JuMP = "4076af6c-e467-56ae-b986-b466b2749572"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
74 changes: 74 additions & 0 deletions examples/conv_neural_networks.jl
@@ -0,0 +1,74 @@
using Flux
using Plots
using JuMP
using Gurobi
using Random
using Gogeta

using Images
using FileIO
image = load("/Users/eetureijonen/Pictures/IMG_0195.JPG"); # swap in your own image
downscaled_image = imresize(image, (70, 50));

input = reshape(Float32.(channelview(Gray.(downscaled_image))), 70, 50, 1, 1);
input = input[end:-1:1, :, :, :]; # flip the rows so heatmap displays the image upright
Random.seed!(1234)
CNN_model = Flux.Chain(
Conv((4,3), 1 => 10, pad=(2, 1), stride=(3, 2), relu),
MeanPool((5,3), pad=(3, 2), stride=(2, 2)),
MaxPool((3,4), pad=(1, 3), stride=(3, 2)),
Conv((4,3), 10 => 5, pad=(2, 1), stride=(3, 2), relu),
MaxPool((3,4), pad=(1, 3), stride=(3, 2)),
Flux.flatten,
Dense(20 => 100, relu),
Dense(100 => 1)
)

size(CNN_model[1:6](input)) # the flattened size must match the first Dense layer input: (20, 1)

# input image
heatmap(input[:, :, 1, 1], background=false, legend=false, color=:inferno, aspect_ratio=:equal, axis=([], false))

# convolution layer outputs
outputs = [CNN_model[1](input)[:, :, channel, 1] for channel in 1:10];
display.(heatmap.(outputs, background=false, legend=false, color = :inferno, aspect_ratio=:equal, axis=([], false)));

# meanpool outputs
outputs = [CNN_model[1:2](input)[:, :, channel, 1] for channel in 1:10];
display.(heatmap.(outputs, background=false, legend=false, color = :inferno, aspect_ratio=:equal, axis=([], false)));

# maxpool
outputs = [CNN_model[1:3](input)[:, :, channel, 1] for channel in 1:10];
display.(heatmap.(outputs, background=false, legend=false, color = :inferno, aspect_ratio=:equal, axis=([], false)));

# new conv
outputs = [CNN_model[1:4](input)[:, :, channel, 1] for channel in 1:5];
display.(heatmap.(outputs, background=false, legend=false, color = :inferno, aspect_ratio=:equal, axis=([], false)));

# last maxpool
outputs = [CNN_model[1:5](input)[:, :, channel, 1] for channel in 1:5];
display.(heatmap.(outputs, background=false, legend=false, color = :inferno, aspect_ratio=:equal, axis=([], false)));

# create jump model from cnn
jump = Model(Gurobi.Optimizer)
set_silent(jump)
cnns = get_structure(CNN_model, input);
create_MIP_from_CNN!(jump, CNN_model, cnns)

# Test that the jump model produces the same outputs for all layers as the CNN
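# (the last argument of image_pass! selects the layer up to which the network is evaluated)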
@time CNN_model[1](input)[:, :, :, 1] ≈ image_pass!(jump, input, cnns, 1)
@time CNN_model[1:2](input)[:, :, :, 1] ≈ image_pass!(jump, input, cnns, 2)
@time CNN_model[1:3](input)[:, :, :, 1] ≈ image_pass!(jump, input, cnns, 3)
@time CNN_model[1:4](input)[:, :, :, 1] ≈ image_pass!(jump, input, cnns, 4)
@time CNN_model[1:5](input)[:, :, :, 1] ≈ image_pass!(jump, input, cnns, 5)
@time vec(CNN_model[1:6](input)) ≈ image_pass!(jump, input, cnns, 6)
@time vec(CNN_model[1:7](input)) ≈ image_pass!(jump, input, cnns, 7)
@time vec(CNN_model[1:8](input)) ≈ image_pass!(jump, input, cnns, 8)
@time vec(CNN_model(input)) ≈ image_pass!(jump, input)

# Plot true model meanpool fifth channel
heatmap(CNN_model[1:2](input)[:, :, 5, 1], background=false, legend=false, color=:inferno, aspect_ratio=:equal, axis=([], false))

# Plot jump model meanpool fifth channel
heatmap(image_pass!(jump, input, cnns, 2)[:, :, 5], background=false, legend=false, color=:inferno, aspect_ratio=:equal, axis=([], false))

6 changes: 6 additions & 0 deletions src/Gogeta.jl
@@ -28,6 +28,12 @@ include("neural_networks/interface.jl")
export NN_to_MIP_with_precomputed, NN_to_MIP_with_bound_tightening, compress_with_precomputed, compress_with_bound_tightening
export forward_pass!, SolverParams

include("neural_networks/CNN_util.jl")
export CNNStructure, get_structure, image_pass!

include("neural_networks/CNN_convert.jl")
export create_MIP_from_CNN!

include("tree_ensembles/types.jl")
export TEModel, extract_evotrees_info

146 changes: 146 additions & 0 deletions src/neural_networks/CNN_convert.jl
@@ -0,0 +1,146 @@
"""
function create_MIP_from_CNN!(jump_model::JuMP.Model, CNN_model::Flux.Chain, cnnstruct::CNNStructure)
Creates a mixed-integer optimization problem from a `Flux.Chain` convolutional neural network model.
The optimization formulation is saved in the `JuMP.Model` given as an input.
The convolutional neural network must follow a certain structure:
- It must consist of (in order) convolutional and pooling layers, a `Flux.flatten` layer and finally dense layers
- I.e. allowed layer types: `Conv`, `MaxPool`, `MeanPool`, `Flux.flatten`, `Dense`
- The activation function for all of the convolutional layers and the dense layers must be `ReLU`
- The last dense layer must use the `identity` activation function
- Input size, filter size, stride and padding can be chosen freely
# Parameters
- `jump_model`: an empty optimization model where the formulation will be saved
- `CNN_model`: `Flux.Chain` containing the CNN
- `cnnstruct`: holds the layer structure of the CNN
"""
function create_MIP_from_CNN!(jump_model::JuMP.Model, CNN_model::Flux.Chain, cnnstruct::CNNStructure)

channels = cnnstruct.channels
dims = cnnstruct.dims
dense_lengths = cnnstruct.dense_lengths
conv_inds = cnnstruct.conv_inds
maxpool_inds = cnnstruct.maxpool_inds
meanpool_inds = cnnstruct.meanpool_inds
flatten_ind = cnnstruct.flatten_ind
dense_inds = cnnstruct.dense_inds

# 2d layers
@variable(jump_model, c[layer=union(0, conv_inds, maxpool_inds, meanpool_inds), 1:dims[layer][1], 1:dims[layer][2], 1:channels[layer]] >= 0) # input is always between 0 and 1
@variable(jump_model, cs[layer=conv_inds, 1:dims[layer][1], 1:dims[layer][2], 1:channels[layer]] >= 0)
@variable(jump_model, cz[layer=conv_inds, 1:dims[layer][1], 1:dims[layer][2], 1:channels[layer]], Bin)

# dense layers
@variable(jump_model, x[layer=union(flatten_ind, dense_inds), 1:dense_lengths[layer]])
@variable(jump_model, s[layer=dense_inds[1:end-1], 1:dense_lengths[layer]] >= 0)
@variable(jump_model, z[layer=dense_inds[1:end-1], 1:dense_lengths[layer]], Bin)

U_bounds_dense = Dict{Int, Vector}()
L_bounds_dense = Dict{Int, Vector}()

pixel_or_pad(layer, row, col, channel) = if haskey(c, (layer, row, col, channel)) c[layer, row, col, channel] else 0.0 end
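# (indices outside the image resolve to 0.0, implementing zero padding)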

for (layer_index, layer_data) in enumerate(CNN_model)

if layer_index in conv_inds

filters = [Flux.params(layer_data)[1][:, :, in_channel, out_channel] for in_channel in 1:channels[layer_index-1], out_channel in 1:channels[layer_index]]
biases = Flux.params(layer_data)[2]

f_height, f_width = size(filters[1, 1])

for row in 1:dims[layer_index][1], col in 1:dims[layer_index][2]

pos = (layer_data.stride[1]*(row-1) + 1 - layer_data.pad[1], layer_data.stride[2]*(col-1) + 1 - layer_data.pad[2])

for out_channel in 1:channels[layer_index]
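# Flux's Conv flips its kernel (true convolution, unlike CrossCor), hence the reversed filter indexing below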

convolution = @expression(jump_model,
sum([filters[in_channel, out_channel][f_height-i, f_width-j] * pixel_or_pad(layer_index-1, pos[1]+i, pos[2]+j, in_channel)
for i in 0:f_height-1,
j in 0:f_width-1,
in_channel in 1:channels[layer_index-1]])
)
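# ReLU big-M pair: c is the positive part, cs the slack; M = 1.0 assumes each pre-activation lies in [-1, 1]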

@constraint(jump_model, c[layer_index, row, col, out_channel] - cs[layer_index, row, col, out_channel] == convolution + biases[out_channel])
@constraint(jump_model, c[layer_index, row, col, out_channel] <= 1.0 * (1-cz[layer_index, row, col, out_channel]))
@constraint(jump_model, cs[layer_index, row, col, out_channel] <= 1.0 * cz[layer_index, row, col, out_channel])
end
end

elseif layer_index in maxpool_inds

p_height = layer_data.k[1]
p_width = layer_data.k[2]
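# max-pool encoding: the output upper-bounds every pixel in the window, and the
# one-hot binaries pz force it to equal one of them (necessarily the maximum)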

for row in 1:dims[layer_index][1], col in 1:dims[layer_index][2]

pos = (layer_data.stride[1]*(row-1) - layer_data.pad[1], layer_data.stride[2]*(col-1) - layer_data.pad[2])

for channel in 1:channels[layer_index-1]

@constraint(jump_model, [i in 1:p_height, j in 1:p_width], c[layer_index, row, col, channel] >= pixel_or_pad(layer_index-1, pos[1]+i, pos[2]+j, channel))

pz = @variable(jump_model, [1:p_height, 1:p_width], Bin)
@constraint(jump_model, sum([pz[i, j] for i in 1:p_height, j in 1:p_width]) == 1)

@constraint(jump_model, [i in 1:p_height, j in 1:p_width], c[layer_index, row, col, channel] <= pixel_or_pad(layer_index-1, pos[1]+i, pos[2]+j, channel) + (1-pz[i, j]))
end
end

elseif layer_index in meanpool_inds
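# average pooling is affine, so it needs only equality constraints and no binaries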

p_height = layer_data.k[1]
p_width = layer_data.k[2]

for row in 1:dims[layer_index][1], col in 1:dims[layer_index][2]

pos = (layer_data.stride[1]*(row-1) - layer_data.pad[1], layer_data.stride[2]*(col-1) - layer_data.pad[2])

for channel in 1:channels[layer_index-1]
@constraint(jump_model, c[layer_index, row, col, channel] == 1/(p_height*p_width) * sum(pixel_or_pad(layer_index-1, pos[1]+i, pos[2]+j, channel)
for i in 1:p_height,
j in 1:p_width)
)
end
end

elseif layer_index == flatten_ind
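# Flux.flatten vectorizes column-major (row fastest, then column, then channel), which the index arithmetic below reproduces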

@constraint(jump_model, [channel in 1:channels[layer_index-1], row in dims[layer_index-1][1]:-1:1, col in 1:dims[layer_index-1][2]],
x[flatten_ind, row + (col-1)*dims[layer_index-1][1] + (channel-1)*prod(dims[layer_index-1])] == c[layer_index-1, row, col, channel]
)

elseif layer_index in dense_inds

weights = Flux.params(layer_data)[1]
biases = Flux.params(layer_data)[2]

n_neurons = length(biases)
n_previous = length(x[layer_index-1, :])

# compute big-M bounds for the dense layers with interval arithmetic, propagating the previous layer's bounds (flatten-layer activations are assumed to lie in [0, 1])
if layer_index == minimum(dense_inds)
U_bounds_dense[layer_index] = [sum(max(weights[neuron, previous] * 1.0, 0.0) for previous in 1:n_previous) + biases[neuron] for neuron in 1:n_neurons]
L_bounds_dense[layer_index] = [sum(min(weights[neuron, previous] * 1.0, 0.0) for previous in 1:n_previous) + biases[neuron] for neuron in 1:n_neurons]
else
U_bounds_dense[layer_index] = [sum(max(weights[neuron, previous] * max(0, U_bounds_dense[layer_index-1][previous]), weights[neuron, previous] * max(0, L_bounds_dense[layer_index-1][previous])) for previous in 1:n_previous) + biases[neuron] for neuron in 1:n_neurons]
L_bounds_dense[layer_index] = [sum(min(weights[neuron, previous] * max(0, U_bounds_dense[layer_index-1][previous]), weights[neuron, previous] * max(0, L_bounds_dense[layer_index-1][previous])) for previous in 1:n_previous) + biases[neuron] for neuron in 1:n_neurons]
end

if layer_data.σ == relu
for neuron in 1:n_neurons
@constraint(jump_model, x[layer_index, neuron] >= 0)
@constraint(jump_model, x[layer_index, neuron] <= U_bounds_dense[layer_index][neuron] * (1 - z[layer_index, neuron]))
@constraint(jump_model, s[layer_index, neuron] <= -L_bounds_dense[layer_index][neuron] * z[layer_index, neuron])
@constraint(jump_model, x[layer_index, neuron] - s[layer_index, neuron] == biases[neuron] + sum(weights[neuron, i] * x[layer_index-1, i] for i in 1:n_previous))
end
elseif layer_data.σ == identity
@constraint(jump_model, [neuron in 1:n_neurons], x[layer_index, neuron] == biases[neuron] + sum(weights[neuron, i] * x[layer_index-1, i] for i in 1:n_previous))
end
end
end
end