Skip to content

Commit

Permalink
Merge pull request #3 from isaacsas/fitting_tutorial
Browse files Browse the repository at this point in the history
Fitting tutorial
  • Loading branch information
isaacsas authored Sep 10, 2024
2 parents 8b86dc0 + dedc947 commit 5a240de
Show file tree
Hide file tree
Showing 5 changed files with 121 additions and 10 deletions.
1 change: 1 addition & 0 deletions docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ makedocs(
"Home" => "index.md",
"Forward Model" => "forward_simulation.md",
"Surrogate Construction" => "surrogate_construction.md",
"Fitting Data" => "fitting.md",
"API" => "SPRFitting_api.md"
],
warnonly = [:missing_docs]
Expand Down
105 changes: 105 additions & 0 deletions docs/src/fitting.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
# Fitting to SPR Data
In this tutorial we will illustrate how to fit the surrogate model to SPR data to estimate $k_{\text{on}}$, $k_{\text{off}}$, $k_{\on,b}
serial on any computer, and our workflow for building large surrogates on
clusters (specific to the Grid Engine queuing system based cluster we use).

## Setup
We begin by installing the packages we need in a clean environment:
```julia
using Pkg

# create a new environment for the surrogate constructions
Pkg.activate("fitting_env")
Pkg.add(url="https://github.com/isaacsas/SPRFittingPaper2023.jl.git")
```

## Estimating Best Fit Parameters to an SPR Dataset
We wil work with the same SPR dataset as in the [Forward Model Simulation](@ref)
tutorial. As described in [1], our general approach will be to use the XNES
optimizer from BlackBoxOptim.jl to minimize the difference between the SPR data
and the surrogate's predicted SPR traces. XNES is a stochastic natural evolution
optimizer that we have found often gives the best fits among many optimizer
choices. We use this optimizer via Optimization.jl, which allows for the easy
selection of different optimizer choices. As XNES is stochastic, we will run the
parameter estimation process `nfits` times, and select parameters from the run
with the minimal loss as our final estimates.

We'll use the full surrogate from [1], which can be downloaded
[here](https://doi.org/10.6084/m9.figshare.26936854).

We being by defining some variables related to the fitting process:
```julia
nfits = 100 # how many fits to run and then take the minimum over
nsims = 250 # number of simulations to average when plotting
logCP_optrange = (1.0, 5.0) # allowable log-space range of CP values
```

Next we set the location of the surrogate:
```julia
surfile = "PATH_TO_DOWNLOADED_SURROGATE.jld"
```

We are now load the surrogate and SPR data:
```julia
# load the surrogate
surrogate = Surrogate(surfile)
sps = surrogate.surpars

# range of CP values to use
optpar_ranges = [logCP_optrange]

# load the aligned SPR data from input CSV file
datadir = joinpath(@__DIR__, "..", "..", "data")
aligned_data_fname = joinpath(datadir, "Data_FC4_10-05-22_Protein07_FD-11A_RBD-13.8_aligned.csv")
aligneddat = get_aligned_data(aligned_data_fname)
```
Finally, we now run the optimizer `nfits` times, taking as our parameter estimates the fit with the mimimal loss:
```julia
optsol, best_pars = fit_spr_data(surrogate, aligneddat, optpar_ranges)
for i in 2:nfits
optsol_new, best_pars_new = fit_spr_data(surrogate, aligneddat, optpar_ranges)
if optsol_new.minimum < optsol.minimum
optsol = optsol_new
best_pars = best_pars_new
end
end
```
We find the best fit parameters, `[kon, koff, konb, reach, CP]`, are
```julia
best_pars = [5.2859956892261876e-5, 0.04086857480653136, 0.7801815024260655, 31.898843844047246, 128.56923492479402]
```
We can now plot (and output) a figure comparing our fits to the SPR data
```julia
# number of simulations to average over in making the plot
simpars.nsims = nsims

# save a figure showing the fit to the SPR data
OUTDIR = tempdir()
figfile = joinpath(OUTDIR, "fit_curves.png")
visualisefit(optsol, aligneddat, surrogate, simpars, figfile)
```
Which gives

![spr_fit](./fitting_data/fit_curves.png)

Here the SPR data is shown in black and the average predicted responses in
color. Finally, we can save a spreadsheet with our fits and parameter estimates
via
```julia
curvefile = joinpath(OUTDIR, "parameters.xlsx")
savefit(optsol, aligneddat, surrogate, simpars, curvefile)
```
For this example the file [here](./fitting_data/parameters.xlsx_fit.xlsx) shows
the resulting spreadsheet. Note that the second sheet within it shows the
parameter estimates.

## General Workflow
A more detailed workflow that processes multiple SPR inputs, includes monovalent
fits, and systematically writes the output is presented in GIVE_LOCATION.


## Bibliography
1. A. Huhn, D. Nissley, ..., C. M. Deane, S. A. Isaacson, and O. Dushek,
*Analysis of emergent bivalent antibody binding identifies the molecular
reach as a critical determinant of SARS-CoV-2 neutralisation potency*, in
review, [available on bioRxiv](https://www.biorxiv.org/content/10.1101/2023.09.06.556503v2) (2024).
Binary file added docs/src/fitting_data/fit_curves.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/src/fitting_data/parameters.xlsx_fit.xlsx
Binary file not shown.
25 changes: 15 additions & 10 deletions docs/src/forward_simulation.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ using Pkg
Pkg.activate("fwd_sim_env")
Pkg.add(url="https://github.com/isaacsas/SPRFittingPaper2023.jl.git")
Pkg.add("Plots")
Pkg.add("CSV""
Pkg.add("CSV")
```

## Running Forward Simulations
Expand Down Expand Up @@ -59,8 +59,10 @@ simpars = SimParams(; antigenconcen = biopars.antigenconcen, tstop, dt, N,
```
See the [`SimParams`](@ref) documentation for more on what the various arguments here mean.

Finally, for each antibody concentration we will run one forward simulation and
plot the ratio of the number of bound antibodies to the number of antigen, scaled by the fitted coefficient of proportionality, `CP`:
Finally, for each antibody concentration we will run `simpars.nsims` forward
simulations and plot the ratio of the average number of bound antibodies to the
number of antigen, scaled by the fitted coefficient of proportionality, `CP`.
Since we set no value for it, by default `simpars.nsims = 1000` are averaged:
```@example fwdsim
plt = plot(; xlabel = "times (sec)",
ylabel = "SPR Response")
Expand All @@ -75,15 +77,18 @@ end
plt
```
Note that here we only ran one simulation for each parameter set, and hence
`means` just returns the `CP` scaled number of antibodies bound to the surface at
Here `means` finalizes calculating the average number of bound antibodies across
the 1000 simulations that were run for each antibody concentration. It then
returns the `CP` scaled average number of antibodies bound to the surface at
each time in `tsave` divided by the number of antigen in the system (i.e. `N =
1000` here). If we had wanted to run and average multiple samples we could have
passed [`run_spr_sim!`](@ref) a terminator such as a
[`SPRFittingPaper2023.VarianceTerminator`](@ref) or
[`SPRFittingPaper2023.SimNumberTerminator`](@ref).
1000` here). See the [`run_spr_sim!`](@ref) docs and the docs for the
terminators [`SPRFittingPaper2023.VarianceTerminator`](@ref) or
[`SPRFittingPaper2023.SimNumberTerminator`](@ref) for more information on
terminating simulations in relation to the number / accuracy of the desired
averages.

Finally, let's load and plot the corresponding aligned SPR data that we originally estimated these parameters from to confirm the good fit.
Finally, let's load and plot the corresponding aligned SPR data that we
originally estimated these parameters from to confirm the good fit.
```@example fwdsim
datadir = joinpath(@__DIR__, "..", "..", "data")
fname = joinpath(datadir, "Data_FC4_10-05-22_Protein07_FD-11A_RBD-13.8_aligned.csv")
Expand Down

0 comments on commit 5a240de

Please sign in to comment.