Multivariate Gaussian Density#

A random vector \(X \in \mathbb{R}^p\) with mean \(\mu\), covariance \(\Sigma\), and density

\[(2\pi)^{-\frac{p}{2}}(\det{\Sigma})^{-\frac{1}{2}}e^{-\frac{1}{2}(x - \mu)^\top\Sigma^{-1}(x - \mu)}\]

is said to follow a multivariate normal (Gaussian) distribution. To evaluate this density, one needs to compute:

  • The determinant term: \((\det{\Sigma})^{-\frac{1}{2}}\)

  • The quadratic form \((x - \mu)^\top\Sigma^{-1}(x - \mu)\)

Without the sweep operator, both of these terms require special computational care. The sweep operator permits straightforward calculation of both terms when we construct

\[\begin{split}\begin{pmatrix} \Sigma & x - \mu\\ x^\top - \mu^\top & 0 \end{pmatrix}\end{split}\]

and sweep on the diagonal entries of \(\Sigma\). This operation will result in the following:

\[\begin{split}\begin{pmatrix} -\Sigma^{-1} & \Sigma^{-1}(x - \mu)\\ (x - \mu)^\top\Sigma^{-1} & -(x - \mu)^\top\Sigma^{-1}(x - \mu) \end{pmatrix}\end{split}\]

and in the process also accumulate \(\det(\Sigma)\). Thus, we can evaluate the MVN density in-place without any additional allocation.
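To make this concrete, here is a minimal NumPy sketch of one sweep step and its application to the augmented matrix above. The helper sweep_k and the sign convention shown are illustrative assumptions for this sketch, not sweepystats' internal implementation:

```python
import numpy as np
from scipy.linalg import toeplitz

def sweep_k(A, k):
    """One sweep step on pivot k (illustrative; not sweepystats' internals)."""
    d = A[k, k]
    B = A - np.outer(A[:, k], A[k, :]) / d  # b_ij = a_ij - a_ik * a_kj / a_kk
    B[k, :] = A[k, :] / d                   # scale pivot row
    B[:, k] = A[:, k] / d                   # scale pivot column
    B[k, k] = -1.0 / d                      # invert and negate the pivot
    return B

p = 3
mu = np.zeros(p)
sigma = toeplitz(0.5 ** np.arange(p))
x = np.array([0.3, -0.5, 1.0])

# augmented matrix [[Sigma, x - mu], [(x - mu)^T, 0]]
A = np.zeros((p + 1, p + 1))
A[:p, :p] = sigma
A[:p, p] = A[p, :p] = x - mu

log_det = 0.0
for k in range(p):
    log_det += np.log(A[k, k])  # each pivot is a ratio of leading minors
    A = sweep_k(A, k)

quad = -A[p, p]  # bottom-right corner is -(x - mu)^T Sigma^{-1} (x - mu)
log_pdf = -0.5 * (p * np.log(2 * np.pi) + log_det + quad)
```

Because each pivot equals a ratio of successive leading minors, the product of the pivots is \(\det(\Sigma)\), so summing their logs yields the log-determinant as a byproduct of the sweeps.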

[1]:
import sweepystats as sw
import numpy as np
from scipy.linalg import toeplitz

Loglikelihood#

To evaluate the likelihood of \(X = (X_1 = x_1,..., X_p = x_p)\), we instantiate the Normal class and call its loglikelihood function.

[2]:
# instantiate a Normal
p = 5
mu = np.zeros(p)
sigma = toeplitz(np.array([0.5**i for i in range(p)]))
d = sw.Normal(mu, sigma)

Here we simulated

\[\begin{split}\Sigma = \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{p-1}\\ \rho & 1 & \rho & \cdots & \rho^{p-2}\\ \rho^2 & \rho & 1 & \cdots & \rho^{p-3}\\ \vdots & & & \ddots & \vdots\\ \rho^{p-1} & \rho^{p-2} & \rho^{p-3} & \cdots & 1 \end{pmatrix}\end{split}\]

with \(\rho = 0.5\). Now we can ask: what is the likelihood of observing \(X = (X_1 = x_1,..., X_p = x_p)\)?

[3]:
X = np.random.normal(size=p)
d.loglikelihood(X)
100%|███████████████████████████████████████████| 5/5 [00:00<00:00, 28610.53it/s]
[3]:
np.float64(-7.372282461672331)

We can check the answer with SciPy:

[4]:
from scipy.stats import multivariate_normal
multivariate_normal.logpdf(X, mean=mu, cov=sigma)
[4]:
np.float64(-7.372282461672331)

Conditional distributions#

Let \(X \sim N(\mu, \Sigma)\). If we partition \(X = (Y, Z)\) so that

\[\begin{split}X = \begin{pmatrix} Y \\ Z \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_Y\\ \mu_Z \end{pmatrix}, \begin{pmatrix} \Sigma_Y & \Sigma_{YZ}\\ \Sigma_{ZY} & \Sigma_{Z} \end{pmatrix}\right)\end{split}\]

then the following classical formulas hold

\[E(Z | Y = y) = \mu_Z + \Sigma_{ZY}\Sigma^{-1}_Y(y - \mu_Y)\]
\[Var(Z | Y = y) = \Sigma_Z - \Sigma_{ZY}\Sigma^{-1}_Y\Sigma_{YZ}.\]

These quantities can be easily computed via the sweep operator. Consider the matrix:

\[\begin{split}\begin{pmatrix} \Sigma_Y & \Sigma_{YZ} & \mu_Y - y\\ \Sigma_{ZY} & \Sigma_{Z} & \mu_Z\\ \mu_Y^\top - y^\top & \mu_Z^\top & 0 \end{pmatrix}.\end{split}\]

Sweeping on the diagonals of \(\Sigma_Y\) yields:

\[\begin{split}\begin{pmatrix} -\Sigma_Y^{-1} & \Sigma_Y^{-1}\Sigma_{YZ} & \Sigma_Y^{-1}(\mu_Y - y)\\ \Sigma_{ZY}\Sigma_Y^{-1} & \Sigma_Z - \Sigma_{ZY}\Sigma_Y^{-1}\Sigma_{YZ} & \mu_Z + \Sigma_{ZY}\Sigma_Y^{-1}(y - \mu_Y)\\ (\mu_Y^\top - y^\top)\Sigma_Y^{-1} & \mu_Z^\top + (y^\top - \mu_Y^\top)\Sigma_Y^{-1}\Sigma_{YZ} & -(\mu_Y - y)^\top\Sigma_Y^{-1}(\mu_Y - y) \end{pmatrix}.\end{split}\]

Thus, the conditional expectation and variance become immediately available, again without allocating any new memory.
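As an illustration, here is a NumPy sketch that builds the bordered matrix above and sweeps only the \(\Sigma_Y\) block to read off both conditional moments. The sweep_k helper is an illustrative assumption, not sweepystats' internal implementation:

```python
import numpy as np
from scipy.linalg import toeplitz

def sweep_k(A, k):
    """One sweep step on pivot k (illustrative; not sweepystats' internals)."""
    d = A[k, k]
    B = A - np.outer(A[:, k], A[k, :]) / d
    B[k, :] = A[k, :] / d
    B[:, k] = A[:, k] / d
    B[k, k] = -1.0 / d
    return B

p, q = 5, 2  # total dimension; condition on the first q coordinates
mu = np.zeros(p)
sigma = toeplitz(0.5 ** np.arange(p))
y = np.array([1.2, -0.4])  # observed values of the first q coordinates

# bordered matrix: last row/column is (mu_Y - y, mu_Z)
M = np.zeros((p + 1, p + 1))
M[:p, :p] = sigma
border = np.concatenate([mu[:q] - y, mu[q:]])
M[:p, p] = M[p, :p] = border

for k in range(q):       # sweep only the diagonal entries of Sigma_Y
    M = sweep_k(M, k)

cond_mean = M[q:p, p]    # mu_Z + Sigma_ZY Sigma_Y^{-1} (y - mu_Y)
cond_var = M[q:p, q:p]   # Sigma_Z - Sigma_ZY Sigma_Y^{-1} Sigma_YZ
```

Both moments come out of a single pass of sweeps over the same storage, which is exactly what makes the sweep-based approach allocation-free.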

Conditional expectations#

We wish to evaluate

\[E(Z | Y = y) = \mu_Z + \Sigma_{ZY}\Sigma^{-1}_Y(y - \mu_Y)\]
[23]:
# instantiate a Normal
p = 6
mu = np.zeros(p)
sigma = toeplitz(np.array([0.5**i for i in range(p)]))
d = sw.Normal(mu, sigma)

Suppose \(p=6\) and we condition on the first 2 elements. What is the conditional mean of the remaining 4 elements?

[24]:
y = np.random.normal(2, size=(2,))
yidx = [0, 1]
d.cond_mean(y, yidx)
[24]:
array([0.66050197, 0.33025098, 0.16512549, 0.08256275])
[25]:
# check answers with brute-force implementation
mu_Y, mu_Z = np.zeros(2), np.zeros(p - len(yidx))
sigma_Y = sigma[0:2, 0:2]
sigma_Z = sigma[2:, 2:]
sigma_ZY = sigma[2:, 0:2]
mu_Z + sigma_ZY @ np.linalg.inv(sigma_Y) @ (y - mu_Y)
[25]:
array([0.66050197, 0.33025098, 0.16512549, 0.08256275])
Tip: yidx does NOT have to be contiguous!
[38]:
p = 6
y = np.random.normal(2, size=(2,))
yidx = [0, 3]
d.cond_mean(y, yidx)
[38]:
array([1.72322558, 1.95738227, 1.58511505, 0.79255753])
[39]:
# check answers with brute-force implementation
zidx = [1, 2, 4, 5]
mu_Y, mu_Z = np.zeros(2), np.zeros(p - len(yidx))
sigma_Y = sigma[np.ix_(yidx, yidx)]
sigma_Z = sigma[np.ix_(zidx, zidx)]
sigma_ZY = sigma[np.ix_(zidx, yidx)]
mu_Z + sigma_ZY @ np.linalg.inv(sigma_Y) @ (y - mu_Y)
[39]:
array([1.72322558, 1.95738227, 1.58511505, 0.79255753])

Conditional Variance#

We wish to evaluate

\[Var(Z | Y = y) = \Sigma_Z - \Sigma_{ZY}\Sigma^{-1}_Y\Sigma_{YZ}.\]
[46]:
# instantiate a Normal
p = 6
mu = np.zeros(p)
sigma = toeplitz(np.array([0.5**i for i in range(p)]))
d = sw.Normal(mu, sigma)

To compute the conditional variance:

[47]:
y = np.random.normal(2, size=(2,))
yidx = [0, 2]
d.cond_var(y, yidx)
[47]:
array([[0.6     , 0.      , 0.      , 0.      ],
       [0.      , 0.75    , 0.375   , 0.1875  ],
       [0.      , 0.375   , 0.9375  , 0.46875 ],
       [0.      , 0.1875  , 0.46875 , 0.984375]])
[48]:
# check answers with brute-force implementation
zidx = [1, 3, 4, 5]
sigma_Y = sigma[np.ix_(yidx, yidx)]
sigma_Z = sigma[np.ix_(zidx, zidx)]
sigma_ZY = sigma[np.ix_(zidx, yidx)]
sigma_YZ = sigma[np.ix_(yidx, zidx)]
sigma_Z - sigma_ZY @ np.linalg.inv(sigma_Y) @ sigma_YZ
[48]:
array([[0.6     , 0.      , 0.      , 0.      ],
       [0.      , 0.75    , 0.375   , 0.1875  ],
       [0.      , 0.375   , 0.9375  , 0.46875 ],
       [0.      , 0.1875  , 0.46875 , 0.984375]])

Of course, for the conditional variance computation, yidx (and hence zidx) need not be contiguous either.