Merge pull request #13 from lanl/lauren_branch
Added new ModelF for bigger data and added some more examples/tutorials
lbeesleyBIOSTAT authored Apr 24, 2024
2 parents 4c4974e + a661872 commit 70cc44b
Showing 7 changed files with 2,908 additions and 48 deletions.
611 changes: 611 additions & 0 deletions examples/ex_bayarri_discrepancy.ipynb

Large diffs are not rendered by default.

143 changes: 95 additions & 48 deletions examples/ex_friedman.ipynb

Large diffs are not rendered by default.

792 changes: 792 additions & 0 deletions examples/ex_shpb_clustering.ipynb

Large diffs are not rendered by default.

701 changes: 701 additions & 0 deletions examples/ex_shpb_hierarchical.ipynb

Large diffs are not rendered by default.

539 changes: 539 additions & 0 deletions examples/ex_shpb_pooled.ipynb

Large diffs are not rendered by default.

108 changes: 108 additions & 0 deletions examples/impala_options.ipynb
@@ -0,0 +1,108 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "73a88bcb-ffcb-4b96-90e4-b8422c627f3c",
"metadata": {},
"source": [
"# An Overview of IMPALA Workflow and Options\n",
"In this document, we provide a non-exhaustive list of the functionality and user options for IMPALA. Generally, IMPALA provides a codebase for calibrating computer model outputs/simulations to observed data. \n",
"\n",
"Estimation uses Bayesian Markov Chain Monte Carlo, which has been implemented using a sophisticated sampling method called parallel tempering. The parallel tempering sampling allows IMPALA to navigate very complicated posterior surfaces (see Friedman example), including surfaces with multiple local modes and/or non-identified parameters. \n",
"\n",
"IMPALA calibration analyses generally proceed through several basic steps:\n",
"\n",
"### Specify Simulator\n",
"\n",
"1. If the computer model you want to calibrate is fast to run, define a function f(theta; X) that takes calibration parameter theta as an input and spits out the computer model output\n",
"\n",
"2. If the computer model is slow to run, generate a library of simulator runs and fit an emulator, which takes theta as an input and spits out an approximation of the true computer model output\n",
"\n",
"3. IMPALA has several material strength models already defined (e.g., Preston-Tonks-Wallace and Johnson-Cook). Check the impala/physics folder for pre-defined material strength models already available in IMPALA. \n",
"\n",
"### Initialize Impala Model\n",
"\n",
"1. ModelMaterialStrength: class for pre-defined IMPALA material strength models\n",
"\n",
"2. ModelBassPca_func or ModelBassPca_mult: classes for functional or multivariate emulators fit using the pyBASS library\n",
" \n",
"3. ModelF: class for user-defined functions f(x), which can also be used to implement IMPALA with non-BASS emulators\n",
"\n",
"### Prepare the Fit\n",
"\n",
"1. CalibSetup: initializes an IMPALA calibration object. This is also where you specify parameter bounds and any constraints\n",
"\n",
"2. addVecExperiments: define the observed data, corresponding computer model, discrepancy basis (if any), and several noise model prior hyperparameters. Multiple addVecExperiments calls can be used to add different experiments, possibly with different corresponding computer models. Some inputs include:\n",
" * yobs: a vector (numpy array) of observed data\n",
" * model: an IMPALA model object as defined above. See code documentation for details. \n",
" * sd_est: a list or numpy array of initial values for observation noise standard deviation\n",
" * s2_df: a list or numpy array of initial values for s2 Inverse Gamma prior degrees of freedom\n",
" * s2_ind: a list or numpy array of indices for s2 value associated with each element of yobs\n",
" * meas_error_cor: (optional) correlation matrix for observation measurement errors, default = independent \n",
" * D: (optional) numpy array containing basis functions for discrepancy, possibly including intercept.\n",
" * discrep_tau: (optional) fixed prior variance for discrepancy basis coefficients (discrepancy = D @ discrep_vars, \n",
" * discrep_vars ~ N(0,discrep_tau))\n",
"\n",
"3. setTemperatureLadder: define how the parallel tempering should be implemented, requiring users to specify an array of exponents that will be applied to the data likelihood. An example specifcation is np.array(1.05 ** np.linspace(0,49,50)), which assigns a grid of 50 temperatures. Generally, more temperatures or a finer grid of temperatures are associated with longer runtime but may also be associated with better movement around the posterior surface for complicated posteriors. \n",
"\n",
"4. setMCMC: define how many MCMC iterations to use for the sampler. Most users can leave these settings at default values, with the expection of nmcmc (the number of iterations), which must be specified. \n",
"\n",
"5. setHierPriors: (optional) define hyperparameteters associated with the hierarchical and clustering calibrations. These generally control the amount of shrinkage toward a common theta across experiments. Please refer to the code documnetation for details. \n",
"\n",
"6. setClusterPriors: (optional) define hyperparameteters associated with the clustering calibration, including the maximum number of clusters (nclustmax) and the rate and shape associated with the Gamma prior on the Dirichlet process concentration parameter, eta. \n",
"\n",
"### Run MCMC\n",
"\n",
"1. calibPool: pooled calibration\n",
"\n",
"2. calibHier: hierarchical calibration\n",
"\n",
"3. calibClust: clustered calibration\n",
"\n",
"### Evaluate Convergence\n",
"\n",
"1. Look at trace plots, e.g., using parameter_trace_plot function\n",
"\n",
"2. Look at pairs plots, e.g., using pairs function\n",
"\n",
"3. There are many more convergence diagnostics you can explore. In the future, additional convergence evaluation functions will be added to the IMPALA repo. \n",
"\n",
"### Evaluate Model Fit\n",
"\n",
"1. Posterior predictions can be compared with the training data to evaluate goodness of fit of the calibrated computer model. See the examples elsewhere in the IMPALA repo for code. "
]
},
{
"cell_type": "markdown",
"id": "5678f566",
"metadata": {},
"source": [
"# Summary\n",
"The following figure summarizes the usual IMPALA fitting workflow: \n",
"\n",
"![something](./images/Impala_Diagram.png)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
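For readers new to the package, the following sketch strings together the steps listed in the notebook above for a fast, user-defined simulator. It is a minimal, illustrative example only: the toy simulator f, the parameter names, the synthetic data, the import path, and the keyword arguments are assumptions, and the actual signatures of CalibSetup, addVecExperiments, setTemperatureLadder, setMCMC, and calibPool should be taken from the code documentation and the examples folder.

import numpy as np
from impala import superCal as sc  # assumed import path; see the examples folder for the actual imports

rng = np.random.default_rng(0)

# Step 1: a fast simulator. f maps a calibration parameter vector theta to one
# prediction per observation (a made-up two-parameter toy model).
xobs = np.linspace(0., 1., 50)                        # hypothetical experimental inputs
def f(theta):
    return theta[0] * xobs + theta[1] * xobs ** 2     # toy computer model output

yobs = f([1.0, 0.5]) + rng.normal(0., 0.1, xobs.size)  # synthetic "observed" data

# Step 2: wrap the simulator in an IMPALA model object (ModelF is the class for
# user-defined functions; the keyword names used here are assumptions).
model = sc.ModelF(f, input_names=['theta0', 'theta1'])

# Step 3: prepare the fit -- bounds/constraints, data, temperature ladder, MCMC settings.
bounds = {'theta0': (0., 2.), 'theta1': (0., 2.)}
def constraints(x, bounds):
    # no extra constraints in this toy example; return one keep/drop flag per candidate
    return np.ones(len(x['theta0']), dtype=bool)

setup = sc.CalibSetup(bounds, constraints)
setup.addVecExperiments(yobs=yobs, model=model,
                        sd_est=np.array([0.1]),                   # initial observation-noise SD
                        s2_df=np.array([5.]),                     # Inverse Gamma prior degrees of freedom
                        s2_ind=np.zeros(yobs.size, dtype=int))    # one noise variance shared by all observations
setup.setTemperatureLadder(np.array(1.05 ** np.linspace(0, 49, 50)))  # 50 temperatures, as in the notebook text
setup.setMCMC(nmcmc=10000)

# Step 4: run a pooled calibration; convergence is then checked with trace plots and
# pairs plots, and posterior predictions are compared against yobs as described above.
out = sc.calibPool(setup)

Hierarchical (calibHier) and clustered (calibClust) calibrations follow the same pattern, with setHierPriors and setClusterPriors supplying their additional hyperparameters.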
62 changes: 62 additions & 0 deletions impala/superCal/models_withLik.py
@@ -491,6 +491,68 @@ def discrep_sample(self, yobs, pred, cov, itemp): #Added by Lauren on 11/17/23.
return discrep_vars


#######
### ModelF_bigdata: Simulator Model Evaluation (or Evaluation of an Alternative Emulator Model) for Larger Data
class ModelF_bigdata(AbstractModel):
""" Custom Simulator/Emulator Model """
def __init__(self, f, input_names, exp_ind=None, s2='gibbs'): # not sure if this is vectorized
"""
f : user-defined function taking a single input x with elements x[0] = first element of theta, x[1] = second element of theta, etc. The function must output predictions for all observations
input_names : list of the names of the inputs to f
s2 : method for handling experiment-specific noise s2; options are 'MH' (Metropolis-Hastings Sampling), 'fix' (fixed at s2_est from addVecExperiments call), and 'gibbs' (Gibbs sampling)
"""
self.mod = f
self.input_names = input_names
self.stochastic = False
self.yobs = None
self.meas_error_cor = 1.  # scalar placeholder: independent measurement errors by default
if exp_ind is None:
exp_ind = np.array([0])  # default: a single experiment (1-d so comparisons/indexing below work)
self.nexp = exp_ind.max() + 1
self.exp_ind = exp_ind
self.nd = 0
self.s2 = s2
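# NOTE: the arrays below are preallocated placeholder buffers (length 16200 here);
# they are overwritten on each call to llik, lik_cov_inv, and discrep_sample.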
self.vec = np.linspace(1,16200,16200) #define to speed up llik evaluation
self.vec2 = np.linspace(1,16200,16200) #define to speed up llik evaluation
self.inv = np.linspace(1,16200,16200) #define to speed up lik_cov_inv evaluation
self.m = np.linspace(1,16200,16200) #define to speed up discrep_sample evaluation
self.vmat = np.linspace(1,16200,16200) #define to speed up discrep_sample evaluation
return
def eval(self, parmat, pool = None, nugget=False):
parmat_array = np.vstack([parmat[v] for v in self.input_names]).T # get correct subset/ordering of inputs
if pool is True:
return np.apply_along_axis(self.mod, 1, parmat_array)
else:
nrep = list(parmat.values())[0].shape[0] // self.nexp
out_all = np.apply_along_axis(self.mod, 1, parmat_array)

#out_sub = np.concatenate([out_all[(i*nrep):(i*nrep+nrep), self.exp_ind==i] for i in range(self.nexp)], 1)
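# for each experiment i, keep the rows of out_all corresponding to that experiment's
# theta draws (interleaved every nexp rows) and the output columns with exp_ind == i,
# then stack the selected blocks column-wise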
out_sub = np.concatenate([out_all[np.ix_(np.arange(i, nrep*self.nexp, self.nexp), np.where(self.exp_ind==i)[0])] for i in range(self.nexp)], 1)
return out_sub
# this is evaluating all experiments for all thetas, which is overkill
# need to have some way of dealing with non-pooled eval for this and the bassPCA version
def discrep_sample(self, yobs, pred, cov, itemp): #Added by Lauren on 11/17/23.
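# Conjugate Gaussian update for the discrepancy-basis coefficients: with residual
# r = yobs - pred and diagonal weights V = diag(cov['inv']), draw
# discrep_vars ~ N(S m, S / itemp), where S = (diag(1/discrep_tau) + D^T V D)^{-1}
# and m = D^T V r.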
self.vec = cov['inv'].reshape(-1,1)* (yobs - pred).reshape(-1,1)
self.vmat = np.repeat(cov['inv'].reshape(-1,1),self.nd,axis=1)*self.D
S = np.linalg.inv(
#np.eye(self.nd) / self.discrep_tau #defined by addVecExperiments
np.diag(1/self.discrep_tau) #modification to allow vector-valued discrep_tau
+ self.D.T @ self.vmat
)
self.m = self.D.T @ self.vec
discrep_vars = sc.chol_sample((S @ self.m).flatten(), S/itemp)
return discrep_vars
def llik(self, yobs, pred, cov): # assumes diagonal cov
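# Gaussian log-likelihood with diagonal covariance, up to the additive constant
# -n/2 * log(2*pi): -0.5*log|Sigma| - 0.5*(y - pred)^T Sigma^{-1} (y - pred),
# where the diagonal of Sigma^{-1} is cov['inv'] and log|Sigma| is cov['ldet'].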
self.vec = yobs.flatten() - pred.flatten()
self.vec2 = self.vec*self.vec*cov['inv']
out = -.5 * cov['ldet'] - .5 * self.vec2.sum()
return out
def lik_cov_inv(self, s2vec): # default is diagonal covariance matrix
self.inv = 1/s2vec
ldet = np.log(s2vec).sum()
out = {'inv' : self.inv, 'ldet' : ldet}
return out



#######
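To make the new class concrete, below is a minimal construction sketch for ModelF_bigdata based on the __init__ signature in the diff above. The simulator f, the parameter names, and the experiment sizes are invented for illustration, and the module path is inferred from the file location in this commit.

import numpy as np
from impala.superCal.models_withLik import ModelF_bigdata  # path inferred from this commit's file layout

# two hypothetical experiments whose observations are stacked into one output vector
n1, n2 = 120, 80
exp_ind = np.concatenate([np.zeros(n1, dtype=int), np.ones(n2, dtype=int)])

def f(theta):
    # theta[0], theta[1], ... are the calibration parameters; per the docstring, the
    # function must return predictions for all observations (length n1 + n2 here)
    x = np.linspace(0., 1., n1 + n2)
    return theta[0] + theta[1] * x

model = ModelF_bigdata(f, input_names=['theta0', 'theta1'], exp_ind=exp_ind, s2='gibbs')
# the resulting model object would then be passed to addVecExperiments, as outlined in
# the impala_options notebook above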
