Sebastian Raschka

Last updated 07/12/2014


Table of contents


I frequently embed all sorts of equations in my IPython notebooks, but instead of re-typing them every time, I thought that it might be worthwhile to have a copy&paste-ready equation glossary at hand.

Since I recently had to work without an internet connection, I decided to compose this in a MathJax-free manner.

For example, if you want to use those equations in an IPython notebook markdown cell, simply wrap them in $-signs, e.g.,

$\mu = 2$

or prepend \begin{equation} and append \end{equation}.
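For instance, wrapping the population mean formula from the glossary below in an equation environment would look like this in a markdown cell:

```
\begin{equation}
\mu = \frac{1}{N} \sum_{i=1}^N x_i
\end{equation}
```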






Bayes Theorem

[back to top]

  • Naive Bayes' classifier:

    • posterior probability:

      P(\omega_j|x) = \frac{p(x|\omega_j) \cdot P(\omega_j)}{p(x)}
    

      \Rightarrow \text{posterior probability} = \frac{ \text{likelihood}  \cdot \text{prior probability}}{\text{evidence}}
    
    • decision rule:

      \text{Decide } \omega_1  \text{ if }  P(\omega_1|x) > P(\omega_2|x)  \text{ else decide } \omega_2 .
      
      \frac{p(x|\omega_1) \cdot P(\omega_1)}{p(x)} > \frac{p(x|\omega_2) \cdot P(\omega_2)}{p(x)}
    
    • objective functions:

      g_1(\pmb x) = P(\omega_1 | \; \pmb{x}), \quad  g_2(\pmb{x}) = P(\omega_2 | \; \pmb{x}), \quad  g_3(\pmb{x}) = P(\omega_3 | \; \pmb{x})
    

      \quad g_i(\pmb{x}) = \pmb{x}^{\,t} \bigg( - \frac{1}{2} \Sigma_i^{-1} \bigg) \pmb{x} + \bigg( \Sigma_i^{-1} \pmb{\mu}_{\,i}\bigg)^t \pmb x + \bigg( -\frac{1}{2} \pmb{\mu}_{\,i}^{\,t}  \Sigma_{i}^{-1} \pmb{\mu}_{\,i} -\frac{1}{2} ln(|\Sigma_i|)\bigg)
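As a minimal sketch (assuming NumPy and made-up likelihoods and priors for two classes), the posterior probabilities and the decision rule above could be evaluated like this:

```python
import numpy as np

# hypothetical class-conditional likelihoods p(x|w_j) and priors P(w_j) for two classes
likelihoods = np.array([0.6, 0.1])   # p(x|w_1), p(x|w_2)
priors      = np.array([0.4, 0.6])   # P(w_1),  P(w_2)

evidence   = np.sum(likelihoods * priors)        # p(x) = sum_j p(x|w_j) * P(w_j)
posteriors = likelihoods * priors / evidence     # P(w_j|x)

# decide w_1 if P(w_1|x) > P(w_2|x), else decide w_2
decision = 1 if posteriors[0] > posteriors[1] else 2
print(posteriors, decision)
```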
    


Binomial distribution

[back to top]

  • Probability density function:

p_k = {n \choose k} \cdot p^k \cdot (1-p)^{n-k}


Co-Variance

[back to top]

S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})

example covariance matrix:

\pmb{\Sigma_1} = 
\begin{bmatrix}1 & 0 & 0 \\
0 & 1 & 0\\
0 & 0 & 1
\end{bmatrix}


Eigenvector and Eigenvalue

[back to top]

\pmb A\pmb{v} =  \lambda\pmb{v}\\\\

\text{where} \\\\

\pmb A = S_{W}^{-1}S_B\\
\pmb{v} = \text{Eigenvector}\\
\lambda = \text{Eigenvalue}
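A minimal NumPy sketch (with a small made-up matrix standing in for S_W^{-1} S_B) that computes the eigenvectors and eigenvalues and verifies A v = λ v:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])              # made-up stand-in for S_W^{-1} S_B

eig_vals, eig_vecs = np.linalg.eig(A)   # columns of eig_vecs are the eigenvectors v
for lam, v in zip(eig_vals, eig_vecs.T):
    print(np.allclose(A.dot(v), lam * v))  # checks A v = lambda v
```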


Least-squares fit regression

[back to top]

  • Linear equation

f(x) = a\cdot x + b

Slope:

a = \frac{S_{xy}}{\sigma_{x}^{2}}\quad

Y-axis intercept:

b = \bar{y} - a\bar{x}\quad

where

S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\quad \text{(covariance)} \\
\sigma_{x}^{2} = \sum_{i=1}^{n} (x_i - \bar{x})^2\quad \text{(variance)}


  • Matrix equation

\pmb X \; \pmb a = \pmb y

\Bigg[ \begin{array}{cc}
x_1 & 1  \\
... & 1 \\
x_n & 1  \end{array} \Bigg]
\bigg[ \begin{array}{c}
a  \\
b \end{array} \bigg]
=\Bigg[ \begin{array}{c}
y_1   \\
...  \\
y_n  \end{array} \Bigg]

\pmb a = (\pmb X^T \; \pmb X)^{-1} \pmb X^T \; \pmb y
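A minimal NumPy sketch (with made-up x and y values) that evaluates both the slope/intercept formulas and the normal equation above:

```python
import numpy as np

# made-up sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# slope and y-axis intercept via covariance and variance
S_xy  = np.sum((x - x.mean()) * (y - y.mean()))
var_x = np.sum((x - x.mean()) ** 2)
a = S_xy / var_x
b = y.mean() - a * x.mean()

# the same coefficients via the normal equation a = (X^T X)^{-1} X^T y
X = np.column_stack([x, np.ones_like(x)])
coeffs = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)   # [slope, intercept]

print(a, b)      # slope and intercept from the scalar formulas
print(coeffs)    # should match
```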


Linear Discriminant Analysis

[back to top]

  • Within-class scatter matrix

S_W = \sum\limits_{i=1}^{c} S_i \\\\

\text{where}  \\\\

S_i = \sum\limits_{\pmb x \in D_i}^n (\pmb x - \pmb m_i)\;(\pmb x - \pmb m_i)^T
 \text{  (scatter matrix for every class)} \\\\

\text{and} \\\\
  
\pmb m_i = \frac{1}{n_i} \sum\limits_{\pmb x \in D_i} \; \pmb x   \text{ (mean vector)}


  • Between class scatter matrix

S_B = \sum\limits_{i=1}^{c} (\pmb m_i - \pmb m) (\pmb m_i - \pmb m)^T
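A minimal NumPy sketch (with a small made-up two-class data set in two dimensions) of how S_W and S_B could be assembled:

```python
import numpy as np

# made-up data: two classes in 2 dimensions
D = {1: np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]]),
     2: np.array([[6.0, 5.0], [7.0, 8.0], [8.0, 8.0]])}

m = np.mean(np.vstack(list(D.values())), axis=0)   # overall mean vector

S_W = np.zeros((2, 2))
S_B = np.zeros((2, 2))
for X_i in D.values():
    m_i = X_i.mean(axis=0)                          # class mean vector m_i
    S_W += (X_i - m_i).T.dot(X_i - m_i)             # within-class scatter S_i
    S_B += np.outer(m_i - m, m_i - m)               # between-class scatter term

print(S_W)
print(S_B)
```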


Maximum Likelihood Estimate

[back to top]

The probability of observing the data set

D = \left\{ \pmb x_1, \pmb x_2,..., \pmb x_n \right\} 

can be pictured as the probability of observing a particular sequence of patterns,
where the probability of observing a particular pattern depends on θ, the parameters of the underlying (class-conditional) distribution. In order to apply MLE, we have to make the assumption that the samples are i.i.d. (independent and identically distributed).

p(D\; | \;  \pmb \theta\;) \\\\
= p(\pmb x_1 \; | \; \pmb \theta\;)\; \cdot \; p(\pmb x_2 \; | \;\pmb \theta\;) \; \cdot \;...  \; p(\pmb x_n \; | \; \pmb \theta\;) \\\\
= \prod_{k=1}^{n} \; p(\pmb x_k \pmb \; | \; \pmb \theta \;)

where θ is the parameter vector that contains the parameters of the particular distribution we want to estimate; p(D | θ) is also called the likelihood of θ.
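As a minimal sketch (assuming a univariate Gaussian and made-up samples), the likelihood can be evaluated numerically; the closed-form MLE (sample mean and biased sample variance) indeed yields the largest log-likelihood:

```python
import numpy as np

# made-up i.i.d. samples, assumed to come from a univariate Gaussian
x = np.array([2.1, 1.9, 2.4, 2.2, 1.8, 2.0])

# log-likelihood l(theta) = sum_k ln p(x_k | mu, sigma^2) for a Gaussian
def log_likelihood(mu, sigma2, samples):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                  - (samples - mu) ** 2 / (2 * sigma2))

# the MLE for a Gaussian: sample mean and (biased) sample variance
mu_mle     = x.mean()
sigma2_mle = np.mean((x - x.mean()) ** 2)

print(log_likelihood(mu_mle, sigma2_mle, x))        # maximum of l(theta)
print(log_likelihood(mu_mle + 0.5, sigma2_mle, x))  # any other mu gives a lower value
```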

  • log-likelihood

p(D|\theta) = \prod_{k=1}^{n} p(x_k|\theta) \\
\Rightarrow l(\theta) = \sum_{k=1}^{n} ln \; p(x_k|\theta)
  • Differentiation

\nabla_{\pmb \theta} \equiv \begin{bmatrix}  
\frac{\partial \; }{\partial \; \theta_1} \\
\frac{\partial \; }{\partial \; \theta_2} \\
...\\
\frac{\partial \; }{\partial \; \theta_p}\end{bmatrix}

\nabla_{\pmb \theta} l(\pmb\theta) \equiv \begin{bmatrix}  
\frac{\partial \; l(\pmb\theta)}{\partial \; \theta_1} \\
\frac{\partial \; l(\pmb\theta)}{\partial \; \theta_2} \\
...\\
\frac{\partial \; l(\pmb\theta)}{\partial \; \theta_p}\end{bmatrix}
= \begin{bmatrix}  
0 \\
0 \\
...\\
0\end{bmatrix}
  • parameter vector

\pmb \theta_i = \bigg[ \begin{array}{c}
\ \theta_{i1} \\
\ \theta_{i2} \\
\end{array} \bigg]=
\bigg[ \begin{array}{c}
\pmb \mu_i \\
\pmb \Sigma_i \\
\end{array} \bigg]


Min-Max scaling

[back to top]

X_{norm} = \frac{X - X_{min}}{X_{max}-X_{min}}


Normal distribution (multivariate)

[back to top]

  • Probability density function

p(\pmb x) \sim N(\pmb \mu, \Sigma)\\\\

p(\pmb x) \sim \frac{1}{(2\pi)^{d/2} \; |\Sigma|^{1/2}} \exp \bigg[ -\frac{1}{2}(\pmb x - \pmb \mu)^t \Sigma^{-1}(\pmb x - \pmb \mu) \bigg]


Normal distribution (univariate)

[back to top]

  • Probability density function

p(x) \sim N(\mu, \sigma^2) \\\\

p(x) \sim \frac{1}{\sqrt{2\pi\sigma^2}} \exp{ \bigg[-\frac{1}{2}\bigg( \frac{x-\mu}{\sigma}\bigg)^2 \bigg] }


Parzen window function

[back to top]

\phi(\pmb u) = \Bigg\{ \begin{array}{ll} 1 & \quad |u_j| \leq 1/2 \; ;\quad \quad j = 1, ..., d \\
0 & \quad \text{otherwise} \end{array}

for a hypercube of unit length 1 centered at the coordinate system's origin. What this function basically does is assign the value 1 to a sample point if it lies within 1/2 of the edges of the hypercube, and 0 if it lies outside (note that the evaluation is done for all dimensions of the sample point).

If we extend this concept, we can define a more general equation that applies to hypercubes of any edge length h_n that are centered at x:

k_n = \sum\limits_{i=1}^{n} \phi \bigg( \frac{\pmb x - \pmb x_i}{h_n} \bigg)\\\\

\text{where}\\\\

\pmb u = \bigg( \frac{\pmb x - \pmb x_i}{h_n} \bigg)
  • probability density estimation with hypercube kernel

p_n(\pmb x) = \frac{1}{n} \sum\limits_{i=1}^{n} \frac{1}{h^d} \phi \bigg[ \frac{\pmb x - \pmb x_i}{h_n} \bigg]

\text{where}\\\\   
h^d = V_n\quad   \text{and}    \quad\phi \bigg[ \frac{\pmb x - \pmb x_i}{h_n} \bigg] = k
  • probability density estimation with Gaussian kernel

p_n(\pmb x) = \frac{1}{n} \sum\limits_{i=1}^{n} \frac{1}{h^d} \phi \Bigg[ \frac{1}{(\sqrt {2 \pi})^d h_{n}^{d}} \exp \; \bigg[ -\frac{1}{2} \bigg(\frac{\pmb x - \pmb x_i}{h_n} \bigg)^2 \bigg] \Bigg]
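A minimal NumPy sketch (one-dimensional, with made-up samples and a fixed window width h_n) of the hypercube-kernel density estimate defined above:

```python
import numpy as np

def phi(u):
    # hypercube window: 1 if |u_j| <= 1/2 in every dimension, else 0
    return 1.0 if np.all(np.abs(u) <= 0.5) else 0.0

def parzen_estimate(x, samples, h_n):
    # p_n(x) = 1/n * sum_i 1/h^d * phi((x - x_i) / h_n)
    d = 1  # dimensionality of this made-up 1D example
    n = len(samples)
    k = sum(phi((x - x_i) / h_n) for x_i in samples)
    return k / (n * h_n ** d)

samples = np.array([2.0, 2.1, 2.5, 3.0, 3.2, 5.0])  # made-up 1D sample
print(parzen_estimate(2.2, samples, h_n=1.0))
print(parzen_estimate(4.0, samples, h_n=1.0))
```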


Population mean

[back to top]

\mu = \frac{1}{N} \sum_{i=1}^N x_i

example mean vector:

\pmb{\mu_1} = 
\begin{bmatrix}0\\0\\0\end{bmatrix}


Poisson distribution (univariate)

[back to top]

  • Probability density function

p(x_k|\theta) = \frac{e^{-\theta}\theta^{x_k}}{x_k!}


Principal Component Analysis

[back to top]

  • Scatter matrix

S = \sum\limits_{k=1}^n (\pmb x_k - \pmb m)\;(\pmb x_k - \pmb m)^T

where

\pmb m = \frac{1}{n} \sum\limits_{k=1}^n \; \pmb x_k \text{   (mean vector)}
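A minimal NumPy sketch (with made-up three-dimensional data) of the scatter matrix; for PCA, its eigenvectors would then serve as the principal components:

```python
import numpy as np

X = np.random.RandomState(0).randn(10, 3)   # made-up data: 10 samples in 3 dimensions

m = X.mean(axis=0)                           # mean vector
S = (X - m).T.dot(X - m)                     # scatter matrix sum_k (x_k - m)(x_k - m)^T

eig_vals, eig_vecs = np.linalg.eig(S)        # principal directions from the scatter matrix
print(S)
print(eig_vals)
```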


Rayleigh distribution (univariate)

[back to top]

  • Probability density function

p(x|\theta) =  \Bigg\{ \begin{array}{c}
  2\theta xe^{- \theta x^2},\quad \quad x \geq0, \\
  0,\quad \text{otherwise.} \\
  \end{array}


Standard deviation

[back to top]

\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2}



Variance

[back to top]

\sigma_{x}^{2} = \sum_{i=1}^{n} (x_i - \bar{x})^2\quad


Z-score

[back to top]

z = \frac{x - \mu}{\sigma}