diff --git a/doc/Constraints.md b/doc/Constraints.md
index 3a767f14..4d74a0bb 100644
--- a/doc/Constraints.md
+++ b/doc/Constraints.md
@@ -15,7 +15,9 @@ documentation is highly recommended.
The nonlinear least-squares system used by the Ceres Solver is written as:
-$$ \mathrm{arg\ min}_{x} \left(\frac{1}{2} \sum_i \rho_i \left(||f(x_{i_1},...,x_{i_k})||^2\right)\right)$$
+```math
+\mathrm{arg\ min}_{x} \left(\frac{1}{2} \sum_i \rho_i \left(||f(x_{i_1},...,x_{i_k})||^2\right)\right)
+```
In Ceres Solver parlance, ρ() is called a "loss function". f() is called a "cost function", which accepts one or
more inputs, x. And the inputs, x, are called "parameter blocks". The "parameter blocks" themselves may be
@@ -42,9 +44,9 @@ are acceptable.
The "cost function" is the main component of the Constraint object. It is responsible for computing the cost to be
minimized by the Ceres Solver optimizer. The cost function must implement some sort of equation to generate a score
for arbitrary input values. In its most generic form, that equation is written simply as:
-
-$$ \begin{bmatrix} r_1 \\ ... \\ r_i\end{bmatrix} = f\left(\begin{bmatrix}x_{1_1} \\ ... \\ x_{1_j}\end{bmatrix}, ..., \begin{bmatrix}x_{n_1} \\ ... \\ x_{n_k}\end{bmatrix}\right)$$
-
+```math
+\begin{bmatrix} r_1 \\ ... \\ r_i\end{bmatrix} = f\left(\begin{bmatrix}x_{1_1} \\ ... \\ x_{1_j}\end{bmatrix}, ..., \begin{bmatrix}x_{n_1} \\ ... \\ x_{n_k}\end{bmatrix}\right)
+```
where f() is the cost function, x1 through xn are the input Variables, each of which may contain
multi-dimensional data, and ri are one or more dimensions of the computed costs. In Ceres Solver notation,
@@ -58,8 +60,9 @@ one or more outputs. However, in practice there are two common forms for cost fu
An observation model, sometimes called a sensor model, predicts a sensor measurement based on the current estimates
of the system Variables. The cost is then computed as the difference between the predicted sensor measurement and the
actual sensor measurement, normalized by the measurement uncertainty.
-
-$$ \begin{bmatrix} r_1 \\ ... \\ r_i\end{bmatrix} = \left(\begin{bmatrix}z_1 \\ ... \\ z_i\end{bmatrix} - h\left(\begin{bmatrix}x_{1_1} \\ ... \\ x_{1_j}\end{bmatrix}, ..., \begin{bmatrix}x_{n_1} \\ ... \\ x_{n_k}\end{bmatrix}\right)\right) \cdot \Sigma ^{-\frac{1}{2}}$$
+```math
+\begin{bmatrix} r_1 \\ ... \\ r_i\end{bmatrix} = \left(\begin{bmatrix}z_1 \\ ... \\ z_i\end{bmatrix} - h\left(\begin{bmatrix}x_{1_1} \\ ... \\ x_{1_j}\end{bmatrix}, ..., \begin{bmatrix}x_{n_1} \\ ... \\ x_{n_k}\end{bmatrix}\right)\right) \cdot \Sigma ^{-\frac{1}{2}}
+```
where z is the sensor measurement, h() is the sensor prediction function, and Σ is the covariance matrix. Within
the least-squares minimization, the entire cost function will get squared. By dividing by the square root of the
@@ -73,7 +76,9 @@ A state transition model, sometimes called a motion model, predicts the value of
current estimates of the system Variables. This is generally used to enforce a physical model of the system, such
as known vehicle kinematics.
-$$ \begin{bmatrix} r_1 \\ ... \\ r_i\end{bmatrix} = \left(\begin{bmatrix}x_{t_1} \\ ... \\ x_{t_i}\end{bmatrix} - f\left(\begin{bmatrix}x_{{t-1}_1} \\ ... \\ x_{{t-1}_i}\end{bmatrix}\right)\right) \cdot \Sigma ^{-\frac{1}{2}}$$
+```math
+\begin{bmatrix} r_1 \\ ... \\ r_i\end{bmatrix} = \left(\begin{bmatrix}x_{t_1} \\ ... \\ x_{t_i}\end{bmatrix} - f\left(\begin{bmatrix}x_{{t-1}_1} \\ ... \\ x_{{t-1}_i}\end{bmatrix}\right)\right) \cdot \Sigma ^{-\frac{1}{2}}
+```
where xt is the current Variable estimate for time _t_, xt-1 is the current Variable estimate
for time _t-1_, f() is the state prediction function that implements the desired kinematic or dynamic model
@@ -246,10 +251,12 @@ modeled this way. Our cost function will follow the "observation model", where t
predict the sensor measurement, and the cost will be the difference between the measurement and the prediction, normalized
by the measurement uncertainty.
-$$ \begin{bmatrix} \mathrm{cost}_1 \\ \mathrm{cost}_2 \\ \mathrm{cost}_3\end{bmatrix}
+```math
+\begin{bmatrix} \mathrm{cost}_1 \\ \mathrm{cost}_2 \\ \mathrm{cost}_3\end{bmatrix}
= \left(\begin{bmatrix}z_x \\ z_y \\ z_{yaw}\end{bmatrix}
- \begin{bmatrix}position_x \\ position_y \\ orientation_{yaw}\end{bmatrix}\right)
-\cdot \Sigma ^{-\frac{1}{2}}$$
+\cdot \Sigma ^{-\frac{1}{2}}
+```
We will make use of Ceres Solver's automatic derivative system to compute the Jacobians. For that to work, we must
implement the cost function equation as a functor object (has an `operator()` method). To compute the cost, our