Merge pull request #13 from lanl/lauren_branch
Added new ModelF for bigger data and added some more examples/tutorials
lbeesleyBIOSTAT authored Apr 24, 2024
2 parents 4c4974e + a661872 commit 70cc44b
Showing 7 changed files with 2,908 additions and 48 deletions.
611 changes: 611 additions & 0 deletions examples/ex_bayarri_discrepancy.ipynb

Large diffs are not rendered by default.

143 changes: 95 additions & 48 deletions examples/ex_friedman.ipynb

Large diffs are not rendered by default.

792 changes: 792 additions & 0 deletions examples/ex_shpb_clustering.ipynb

Large diffs are not rendered by default.

701 changes: 701 additions & 0 deletions examples/ex_shpb_hierarchical.ipynb

Large diffs are not rendered by default.

539 changes: 539 additions & 0 deletions examples/ex_shpb_pooled.ipynb

Large diffs are not rendered by default.

108 changes: 108 additions & 0 deletions examples/impala_options.ipynb
@@ -0,0 +1,108 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "73a88bcb-ffcb-4b96-90e4-b8422c627f3c",
"metadata": {},
"source": [
"# An Overview of IMPALA Workflow and Options\n",
"In this document, we provide a non-exhaustive list of the functionality and user options for IMPALA. Generally, IMPALA provides a codebase for calibrating computer model outputs/simulations to observed data. \n",
"\n",
"Estimation uses Bayesian Markov Chain Monte Carlo, which has been implemented using a sophisticated sampling method called parallel tempering. The parallel tempering sampling allows IMPALA to navigate very complicated posterior surfaces (see Friedman example), including surfaces with multiple local modes and/or non-identified parameters. \n",
"\n",
"IMPALA calibration analyses generally proceed through several basic steps:\n",
"\n",
"### Specify Simulator\n",
"\n",
"1. If the computer model you want to calibrate is fast to run, define a function f(theta; X) that takes calibration parameter theta as an input and spits out the computer model output\n",
"\n",
"2. If the computer model is slow to run, generate a library of simulator runs and fit an emulator, which takes theta as an input and spits out an approximation of the true computer model output\n",
"\n",
"3. IMPALA has several material strength models already defined (e.g., Preston-Tonks-Wallace and Johnson-Cook). Check the impala/physics folder for pre-defined material strength models already available in IMPALA. \n",
"\n",
"### Initialize Impala Model\n",
"\n",
"1. ModelMaterialStrength: class for pre-defined IMPALA material strength models\n",
"\n",
"2. ModelBassPca_func or ModelBassPca_mult: classes for functional or multivariate emulators fit using the pyBASS library\n",
" \n",
"3. ModelF: class for user-defined functions f(x), which can also be used to implement IMPALA with non-BASS emulators\n",
"\n",
"### Prepare the Fit\n",
"\n",
"1. CalibSetup: initializes an IMPALA calibration object. This is also where you specify parameter bounds and any constraints\n",
"\n",
"2. addVecExperiments: define the observed data, corresponding computer model, discrepancy basis (if any), and several noise model prior hyperparameters. Multiple addVecExperiments calls can be used to add different experiments, possibly with different corresponding computer models. Some inputs include:\n",
" * yobs: a vector (numpy array) of observed data\n",
" * model: an IMPALA model object as defined above. See code documentation for details. \n",
" * sd_est: a list or numpy array of initial values for observation noise standard deviation\n",
" * s2_df: a list or numpy array of initial values for s2 Inverse Gamma prior degrees of freedom\n",
" * s2_ind: a list or numpy array of indices for s2 value associated with each element of yobs\n",
" * meas_error_cor: (optional) correlation matrix for observation measurement errors, default = independent \n",
" * D: (optional) numpy array containing basis functions for discrepancy, possibly including intercept.\n",
" * discrep_tau: (optional) fixed prior variance for discrepancy basis coefficients (discrepancy = D @ discrep_vars, \n",
" * discrep_vars ~ N(0,discrep_tau))\n",
"\n",
"3. setTemperatureLadder: define how the parallel tempering should be implemented, requiring users to specify an array of exponents that will be applied to the data likelihood. An example specifcation is np.array(1.05 ** np.linspace(0,49,50)), which assigns a grid of 50 temperatures. Generally, more temperatures or a finer grid of temperatures are associated with longer runtime but may also be associated with better movement around the posterior surface for complicated posteriors. \n",
"\n",
"4. setMCMC: define how many MCMC iterations to use for the sampler. Most users can leave these settings at default values, with the expection of nmcmc (the number of iterations), which must be specified. \n",
"\n",
"5. setHierPriors: (optional) define hyperparameteters associated with the hierarchical and clustering calibrations. These generally control the amount of shrinkage toward a common theta across experiments. Please refer to the code documnetation for details. \n",
"\n",
"6. setClusterPriors: (optional) define hyperparameteters associated with the clustering calibration, including the maximum number of clusters (nclustmax) and the rate and shape associated with the Gamma prior on the Dirichlet process concentration parameter, eta. \n",
"\n",
"### Run MCMC\n",
"\n",
"1. calibPool: pooled calibration\n",
"\n",
"2. calibHier: hierarchical calibration\n",
"\n",
"3. calibClust: clustered calibration\n",
"\n",
"### Evaluate Convergence\n",
"\n",
"1. Look at trace plots, e.g., using parameter_trace_plot function\n",
"\n",
"2. Look at pairs plots, e.g., using pairs function\n",
"\n",
"3. There are many more convergence diagnostics you can explore. In the future, additional convergence evaluation functions will be added to the IMPALA repo. \n",
"\n",
"### Evaluate Model Fit\n",
"\n",
"1. Posterior predictions can be compared with the training data to evaluate goodness of fit of the calibrated computer model. See the examples elsewhere in the IMPALA repo for code. "
]
},
{
"cell_type": "markdown",
"id": "5678f566",
"metadata": {},
"source": [
"# Summary\n",
"The following figure summarizes the usual IMPALA fitting workflow: \n",
"\n",
"![something](./images/Impala_Diagram.png)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
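For readers new to the package, the following sketch strings together the steps listed in the notebook above for a fast, user-defined simulator. It is a minimal, illustrative example only: the toy simulator f, the parameter names, the synthetic data, the import path, and the keyword arguments are assumptions, and the actual signatures of CalibSetup, addVecExperiments, setTemperatureLadder, setMCMC, and calibPool should be taken from the code documentation and the examples folder.

import numpy as np
from impala import superCal as sc  # assumed import path; see the examples folder for the actual imports

rng = np.random.default_rng(0)

# Step 1: a fast simulator. f maps a calibration parameter vector theta to one
# prediction per observation (a made-up two-parameter toy model).
xobs = np.linspace(0., 1., 50)                        # hypothetical experimental inputs
def f(theta):
    return theta[0] * xobs + theta[1] * xobs ** 2     # toy computer model output

yobs = f([1.0, 0.5]) + rng.normal(0., 0.1, xobs.size)  # synthetic "observed" data

# Step 2: wrap the simulator in an IMPALA model object (ModelF is the class for
# user-defined functions; the keyword names used here are assumptions).
model = sc.ModelF(f, input_names=['theta0', 'theta1'])

# Step 3: prepare the fit -- bounds/constraints, data, temperature ladder, MCMC settings.
bounds = {'theta0': (0., 2.), 'theta1': (0., 2.)}
def constraints(x, bounds):
    # no extra constraints in this toy example; return one keep/drop flag per candidate
    return np.ones(len(x['theta0']), dtype=bool)

setup = sc.CalibSetup(bounds, constraints)
setup.addVecExperiments(yobs=yobs, model=model,
                        sd_est=np.array([0.1]),                   # initial observation-noise SD
                        s2_df=np.array([5.]),                     # Inverse Gamma prior degrees of freedom
                        s2_ind=np.zeros(yobs.size, dtype=int))    # one noise variance shared by all observations
setup.setTemperatureLadder(np.array(1.05 ** np.linspace(0, 49, 50)))  # 50 temperatures, as in the notebook text
setup.setMCMC(nmcmc=10000)

# Step 4: run a pooled calibration; convergence is then checked with trace plots and
# pairs plots, and posterior predictions are compared against yobs as described above.
out = sc.calibPool(setup)

Hierarchical (calibHier) and clustered (calibClust) calibrations follow the same pattern, with setHierPriors and setClusterPriors supplying their additional hyperparameters.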
62 changes: 62 additions & 0 deletions impala/superCal/models_withLik.py
@@ -491,6 +491,68 @@ def discrep_sample(self, yobs, pred, cov, itemp): #Added by Lauren on 11/17/23.
return discrep_vars


#######
### ModelF_bigdata: Simulator Model Evaluation (or Evaluation of an Alternative Emulator Model) for Larger Data
class ModelF_bigdata(AbstractModel):
""" Custom Simulator/Emulator Model """
def __init__(self, f, input_names, exp_ind=None, s2='gibbs'): # not sure if this is vectorized
"""
f : user-defined function taking a single input x with elements x[0] = first element of theta, x[1] = second element of theta, etc. The function must output predictions for all observations
input_names : list of the names of the inputs to f
s2 : method for handling experiment-specific noise s2; options are 'MH' (Metropolis-Hastings Sampling), 'fix' (fixed at s2_est from addVecExperiments call), and 'gibbs' (Gibbs sampling)
"""
self.mod = f
self.input_names = input_names
self.stochastic = False
self.yobs = None
self.meas_error_cor = 1.  # scalar placeholder: independent measurement errors by default
if exp_ind is None:
exp_ind = np.array([0])  # default: a single experiment (1-d so comparisons/indexing below work)
self.nexp = exp_ind.max() + 1
self.exp_ind = exp_ind
self.nd = 0
self.s2 = s2
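# NOTE: the arrays below are preallocated placeholder buffers (length 16200 here);
# they are overwritten on each call to llik, lik_cov_inv, and discrep_sample.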
self.vec = np.linspace(1,16200,16200) #define to speed up llik evaluation
self.vec2 = np.linspace(1,16200,16200) #define to speed up llik evaluation
self.inv = np.linspace(1,16200,16200) #define to speed up lik_cov_inv evaluation
self.m = np.linspace(1,16200,16200) #define to speed up discrep_sample evaluation
self.vmat = np.linspace(1,16200,16200) #define to speed up discrep_sample evaluation
return
def eval(self, parmat, pool = None, nugget=False):
parmat_array = np.vstack([parmat[v] for v in self.input_names]).T # get correct subset/ordering of inputs
if pool is True:
return np.apply_along_axis(self.mod, 1, parmat_array)
else:
nrep = list(parmat.values())[0].shape[0] // self.nexp
out_all = np.apply_along_axis(self.mod, 1, parmat_array)

#out_sub = np.concatenate([out_all[(i*nrep):(i*nrep+nrep), self.exp_ind==i] for i in range(self.nexp)], 1)
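# for each experiment i, keep the rows of out_all corresponding to that experiment's
# theta draws (interleaved every nexp rows) and the output columns with exp_ind == i,
# then stack the selected blocks column-wise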
out_sub = np.concatenate([out_all[np.ix_(np.arange(i, nrep*self.nexp, self.nexp), np.where(self.exp_ind==i)[0])] for i in range(self.nexp)], 1)
return out_sub
# this is evaluating all experiments for all thetas, which is overkill
# need to have some way of dealing with non-pooled eval for this and the bassPCA version
def discrep_sample(self, yobs, pred, cov, itemp): #Added by Lauren on 11/17/23.
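# Conjugate Gaussian update for the discrepancy-basis coefficients: with residual
# r = yobs - pred and diagonal weights V = diag(cov['inv']), draw
# discrep_vars ~ N(S m, S / itemp), where S = (diag(1/discrep_tau) + D^T V D)^{-1}
# and m = D^T V r.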
self.vec = cov['inv'].reshape(-1,1)* (yobs - pred).reshape(-1,1)
self.vmat = np.repeat(cov['inv'].reshape(-1,1),self.nd,axis=1)*self.D
S = np.linalg.inv(
#np.eye(self.nd) / self.discrep_tau #defined by addVecExperiments
np.diag(1/self.discrep_tau) #modification to allow vector-valued discrep_tau
+ self.D.T @ self.vmat
)
self.m = self.D.T @ self.vec
discrep_vars = sc.chol_sample((S @ self.m).flatten(), S/itemp)
return discrep_vars
def llik(self, yobs, pred, cov): # assumes diagonal cov
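# Gaussian log-likelihood with diagonal covariance, up to the additive constant
# -n/2 * log(2*pi): -0.5*log|Sigma| - 0.5*(y - pred)^T Sigma^{-1} (y - pred),
# where the diagonal of Sigma^{-1} is cov['inv'] and log|Sigma| is cov['ldet'].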
self.vec = yobs.flatten() - pred.flatten()
self.vec2 = self.vec*self.vec*cov['inv']
out = -.5 * cov['ldet'] - .5 * self.vec2.sum()
return out
def lik_cov_inv(self, s2vec): # default is diagonal covariance matrix
self.inv = 1/s2vec
ldet = np.log(s2vec).sum()
out = {'inv' : self.inv, 'ldet' : ldet}
return out



#######
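To make the new class concrete, below is a minimal construction sketch for ModelF_bigdata based on the __init__ signature in the diff above. The simulator f, the parameter names, and the experiment sizes are invented for illustration, and the module path is inferred from the file location in this commit.

import numpy as np
from impala.superCal.models_withLik import ModelF_bigdata  # path inferred from this commit's file layout

# two hypothetical experiments whose observations are stacked into one output vector
n1, n2 = 120, 80
exp_ind = np.concatenate([np.zeros(n1, dtype=int), np.ones(n2, dtype=int)])

def f(theta):
    # theta[0], theta[1], ... are the calibration parameters; per the docstring, the
    # function must return predictions for all observations (length n1 + n2 here)
    x = np.linspace(0., 1., n1 + n2)
    return theta[0] + theta[1] * x

model = ModelF_bigdata(f, input_names=['theta0', 'theta1'], exp_ind=exp_ind, s2='gibbs')
# the resulting model object would then be passed to addVecExperiments, as outlined in
# the impala_options notebook above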
