The Multivariate Gaussian Distribution

Author: 引线小白. Permanent link: http://www.limoncc.com/概率论/2017-01-09-多元高斯分布/
License: this blog is published under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

This post collects some basic facts about the multivariate Gaussian distribution, together with a few of my own observations. Comments and discussion are welcome; if you find errors, please point them out.

I. The Univariate Gaussian Distribution

$$\begin{align}
x\sim\mathcal{N}(x\mid \mu,\sigma^2)
=\left(2\pi\right)^{-1/2}(\sigma^2)^{-1/2}\exp\left[-\frac{1}{2}(x-\mu)^2\sigma^{-2}\right]
\end{align}$$

1. The characteristic function of the univariate Gaussian

The characteristic function is $\displaystyle \varphi(t)=\mathrm{E}\left(\mathrm{e}^{\mathrm{i}tx}\right)=\int_{-\infty}^{+\infty}\mathrm{e}^{\mathrm{i}tx}p(x)\mathrm{d}x=\mathrm{e}^{\mathrm{i} t\mu-\frac{1}{2}t^2\sigma^2}$, i.e.
$$\begin{align}
\varphi(t)=\exp \left[\mathrm{i} t\mu-\frac{1}{2}t^2\sigma^2\right]
\end{align}$$
We now prove this.

Let $\displaystyle z=\frac {x-\mu}{\sigma}$, so that $\displaystyle z\sim\mathcal{N}\left(z\mid 0,1\right)=\left(2\pi\right)^{-1/2}\exp \left[-\frac{1}{2}z^2\right]$ and:
$$\begin{align}
\varphi_z(t)
&=\left(2\pi\right)^{-1/2}\int_{-\infty}^{+\infty}\mathrm{e}^{\mathrm{i}tz}\mathrm{e}^{-\frac{1}{2}z^2}\mathrm{d}z
\end{align}$$

1. Recall that a complex number $\displaystyle \mathbb{z}=\mathrm{e}^{\mathrm{i}\theta}=\cos(\theta)+\mathrm{i}\sin(\theta)$ has modulus $\displaystyle \left| \mathbb{z}\right|=1$. Let $\displaystyle A(t)=\mathrm{e}^{\mathrm{i}tz}\mathrm{e}^{-\frac{1}{2}z^2}$, so that:
$$\left|\frac{\partial{A}}{\partial{t}}\right|=\left|\mathrm{i}z\mathrm{e}^{\mathrm{i}tz-\frac{1}{2}z^2}\right|=\left|z\right|\mathrm{e}^{-\frac{1}{2}z^2} $$
and moreover:
$$\left(2\pi\right)^{-1/2}\int_{-\infty}^{+\infty} \left|\frac{\partial{A}}{\partial{t}}\right|\mathrm{d}z=\left(2\pi\right)^{-1/2}\int_{-\infty}^{+\infty}\left|z\right|\mathrm{e}^{-\frac{1}{2}z^2}\mathrm{d}z<\infty$$
Since $\displaystyle \int_{-\infty}^{+\infty}\left|z\right|\mathrm{e}^{-\frac{1}{2}z^2}\mathrm{d}z$ converges, the differentiability property of parameter-dependent improper integrals tells us that $\displaystyle \left(2\pi\right)^{-1/2}\int_{-\infty}^{+\infty} \frac{\partial{A}}{\partial{t}}\mathrm{d}z$ converges uniformly for $\displaystyle t\in (-\infty,+\infty)$. We may therefore differentiate $\displaystyle \varphi_z(t)$ under the integral sign (interchanging integration and differentiation):
$$\varphi'_z(t)=\left(2\pi\right)^{-1/2}\int_{-\infty}^{+\infty} \mathrm{i}z\mathrm{e}^{\mathrm{i}tz-\frac{1}{2}z^2}\mathrm{d}z $$
2. Now integrate by parts with $\displaystyle\begin{cases} u=\mathrm{e}^{\mathrm{i}tz}\\v=-\mathrm{e}^{-\frac{1}{2}z^2}\end{cases} $, so that $\displaystyle \begin{cases} u'=\mathrm{i}t\mathrm{e}^{\mathrm{i}tz}\\v'=z\mathrm{e}^{-\frac{1}{2}z^2}\end{cases} $; then:
$$\begin{align}
\varphi'_z(t)&=\mathrm{i}\left(2\pi\right)^{-1/2}\int_{-\infty}^{+\infty} u\mathrm{d}v\\
&= \left.-\mathrm{i}\left(2\pi\right)^{-1/2}\mathrm{e}^{\mathrm{i}tz-\frac{1}{2}z^2}\right|_{z=-\infty}^{+\infty}-t\left(2\pi\right)^{-1/2}\int_{-\infty}^{+\infty}\mathrm{e}^{\mathrm{i}tz}\mathrm{e}^{-\frac{1}{2}z^2}\mathrm{d}z\\
&=-t\varphi_z(t)
\end{align}$$
where the boundary term vanishes because $\mathrm{e}^{-\frac{1}{2}z^2}\to 0$ as $z\to\pm\infty$.

This yields the initial value problem:
$$\begin{cases}
\varphi'_z(t)+t\varphi_z(t)=0\\
\varphi_z(0)=1
\end{cases}$$
Solving:
$$\begin{align}
\varphi_z(t)=\mathrm{e}^{-\frac{1}{2}t^2}
\end{align}$$
Since $\displaystyle x=\mu+\sigma z$:
$$\begin{align}
\varphi_x(t)=\mathrm{E}\left(\mathrm{e}^{\mathrm{i}tx}\right)
=\mathrm{E}\left(\mathrm{e}^{\mathrm{i}t\mu+\mathrm{i}t\sigma z}\right)
=\mathrm{e}^{\mathrm{i}t\mu}\mathrm{E}\left(\mathrm{e}^{\mathrm{i}t\sigma z}\right)
=\mathrm{e}^{\mathrm{i}t\mu}\varphi_z(\sigma t)
=\mathrm{e}^{\mathrm{i}t\mu-\frac{1}{2}t^2\sigma^2}
\end{align}$$
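As a sanity check, the closed form above can be compared against direct numerical evaluation of $\mathrm{E}\left(\mathrm{e}^{\mathrm{i}tx}\right)$. A minimal NumPy sketch; the values of $\mu$, $\sigma$ and the integration grid are illustrative choices:

```python
import numpy as np

# Verify phi(t) = exp(i t mu - t^2 sigma^2 / 2) by quadrature.
mu, sigma = 1.3, 0.7
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200001)
dx = x[1] - x[0]
p = (2 * np.pi * sigma**2) ** -0.5 * np.exp(-0.5 * (x - mu) ** 2 / sigma**2)

for t in (0.0, 0.5, 2.0):
    phi_num = np.sum(np.exp(1j * t * x) * p) * dx        # E[e^{itx}] numerically
    phi_closed = np.exp(1j * t * mu - 0.5 * t**2 * sigma**2)
    assert abs(phi_num - phi_closed) < 1e-6
```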

II. The Multivariate Gaussian Distribution

1. The multivariate Gaussian distribution

In one dimension the characteristic function is:
$$\begin{align}
\varphi(t)=\exp \left[\mathrm{i}t\mu -\frac{1}{2}t^2\sigma^2\right]
\end{align}$$

In the multivariate case:

$$\begin{align}
\bm{x}\sim\mathcal{N}(\bm{x}\mid\bm{\mu},\bm{\varSigma})=(2\pi)^{-k/2}\left|\bm{\varSigma}\right|^{-1/2}\exp\left[-\frac{1}{2}(\bm{x}-\bm{\mu})^\mathrm{T}\bm{\varSigma}^{-1}(\bm{x}-\bm{\mu}) \right]
\end{align}$$

2. The Mahalanobis transform theorem

Let $\displaystyle \bm{x}\sim\mathcal{N}\left(\bm{\mu},\bm{\varSigma}\right)$ be an arbitrary Gaussian. The map $\displaystyle \bm{y}=\bm{\varSigma}^{-\frac{1}{2}}\left[\bm{x}-\bm{\mu}\right]$ is called the Mahalanobis transform. Then
$$\begin{align}
\bm{y}\sim\mathcal{N}\left(\bm{0},\bm{E}_p\right)
\end{align}$$
In other words, each $\displaystyle y_i$ follows the standard Gaussian $\displaystyle \mathcal{N}\left(0,1\right)$.

Proof:
We know:
$$\begin{align}
p(\bm{x})=(2\pi)^{-\frac{k}{2}}\left|\bm{\varSigma}\right|^{-\frac{1}{2}}\exp\left[-\frac{1}{2}(\bm{x}-\bm{\mu})^\mathrm{T}\bm{\varSigma}^{-1}(\bm{x}-\bm{\mu}) \right]
\end{align}$$
At the same time $\displaystyle \bm{x}=\bm{\varSigma}^{\frac{1}{2}}\bm{y}+\bm{\mu}$ and $\displaystyle \bm{J}=\mathrm{det}\left[\frac{\partial \bm{x}}{\partial \bm{y}^\text{T}}\right]=\left|\bm{\varSigma}\right|^{\frac{1}{2}}$.
By the change-of-variables theorem:
$$\begin{align}
p(\bm{y})=(2\pi)^{-\frac{k}{2}}\exp \left[-\frac{1}{2}\bm{y}^\text{T}\bm{y}\right]
\end{align}$$
This completes the proof.
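The theorem can be checked numerically: taking $\bm{\varSigma}^{-1/2}$ to be the symmetric inverse square root from an eigendecomposition (one common choice), the whitened covariance is the identity. A NumPy sketch with an illustrative $\bm{\varSigma}$:

```python
import numpy as np

# Mahalanobis whitening: y = Sigma^{-1/2}(x - mu) has covariance I.
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])
w, V = np.linalg.eigh(Sigma)                  # Sigma = V diag(w) V^T
Sigma_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T # symmetric inverse square root

# cov(y) = Sigma^{-1/2} Sigma Sigma^{-1/2} = I
cov_y = Sigma_inv_sqrt @ Sigma @ Sigma_inv_sqrt.T
assert np.allclose(cov_y, np.eye(3))
```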

The characteristic function of the joint distribution of independent random variables is the product of their individual characteristic functions, so:
$$\begin{align}
\varphi_y(t)=\exp \left[-\frac{1}{2}t^2\right]\to\varphi_{\bm{y}}(\bm{t})=\prod_{i=1}^p\exp \left[-\frac{1}{2}t_i^2\right]=\exp \left[-\frac{1}{2}\bm{t}^\text{T}\bm{t}\right]
\end{align}$$

3. The characteristic function of the multivariate Gaussian

Using the linear transformation property of characteristic functions:
$$\begin{align}
\varphi_{\bm{x}}(\bm{t})
&=\mathrm{E}\bigg[\exp \left[\mathrm{i}\bm{t}^\text{T}\bm{x}\right]\bigg]=\mathrm{E}\bigg[\exp \left[\mathrm{i}\bm{t}^\text{T}\left(\bm{\varSigma}^{1/2}\bm{y}+\bm{\mu}\right)\right]\bigg]\\
&=\exp \left[\mathrm{i}\bm{t}^\text{T}\bm{\mu}\right]\varphi_{\bm{y}}\bigg(\big[\bm{\varSigma}^{1/2}\big]^\text{T}\bm{t}\bigg)=\exp \left[\mathrm{i}\bm{t}^\text{T}\bm{\mu}\right]\exp \left[-\frac{1}{2}\bm{t}^\text{T}\bm{\varSigma}^{1/2}\bm{\varSigma}^{1/2}\bm{t}\right]\\
&=\exp \left[\mathrm{i}\bm{t}^\text{T}\bm{\mu}-\frac{1}{2}\bm{t}^\text{T}\bm{\varSigma}\bm{t}\right]
\end{align}$$

III. Properties of the Multivariate Gaussian

1. Every marginal of a Gaussian random vector is Gaussian

Let $\displaystyle \bm{x}$ be $k$-dimensional with $\displaystyle \bm{x}\sim \mathcal{N}\big(\bm{\mu},\bm{\varSigma}\big)$. Select any $\displaystyle p$ of the components $\displaystyle \{x_i\}_{i=1}^k$ via an injection $\displaystyle s:\{1,\dots,p\}\to\{1,\dots,k\}$; then the vector $\displaystyle \tilde{\bm{x}}=[x_{s_1},x_{s_2}\cdots x_{s_p}]^\text{T}$ is still Gaussian:
$$\begin{align}
\tilde{\bm{x}}\sim\mathcal{N}\big(\tilde{\bm{\mu}},\tilde{\bm{\varSigma}}\big)
\end{align}$$
where $\displaystyle \tilde{\bm{\mu}}=[\mu_{s_1},\mu_{s_2}\cdots \mu_{s_p}]^\text{T}$ and $\displaystyle \tilde{\bm{\varSigma}}=[\sigma_{s_i s_j}]$, i.e. the $\displaystyle p\times p$ matrix obtained by keeping rows and columns $\displaystyle s_1,s_2\cdots s_p$ of $\displaystyle \bm{\varSigma}$.

Proof:
The characteristic function of $\displaystyle \bm{x}$ is $\displaystyle \varphi_{\bm{x}}(\bm{t})
=\exp \left[\mathrm{i}\bm{t}^\text{T}\bm{\mu}-\frac{1}{2}\bm{t}^\text{T}\bm{\varSigma}\bm{t}\right]$. Setting $\displaystyle t_i=0$ for every index $i$ not among $s_1,s_2\cdots s_p$ gives:
$$\begin{align}
\varphi_{\tilde{\bm{x}}}(\tilde{\bm{t}})
=\exp \left[\mathrm{i}\tilde{\bm{t}}^\text{T}\tilde{\bm{\mu}}-\frac{1}{2}\tilde{\bm{t}}^\text{T}\tilde{\bm{\varSigma}}\tilde{\bm{t}}\right]
\end{align}$$
This completes the proof.
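Computationally, marginalization is nothing more than index selection. A NumPy sketch; the vector $\bm{\mu}$, matrix $\bm{\varSigma}$ and the selection $s$ are illustrative:

```python
import numpy as np

# Marginalizing a Gaussian: keep entries s of mu and rows/columns s of Sigma.
mu = np.array([0.0, 1.0, -1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.1, 0.0],
                  [0.5, 1.0, 0.2, 0.3],
                  [0.1, 0.2, 1.5, 0.4],
                  [0.0, 0.3, 0.4, 1.2]])
s = [0, 2, 3]                                 # selected coordinates
mu_t = mu[s]
Sigma_t = Sigma[np.ix_(s, s)]                 # keep rows and columns s

assert Sigma_t.shape == (3, 3)
assert np.allclose(Sigma_t, Sigma_t.T)        # still a valid covariance
assert np.all(np.linalg.eigvalsh(Sigma_t) > 0)
```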

2. Independence is equivalent to zero correlation

Let $\displaystyle \bm{x}=[\bm{x}_1,\bm{x}_2]^\text{T}$ be jointly Gaussian with marginals $\displaystyle \bm{x}_1\sim \mathcal{N}\big(\bm{\mu}_1,\bm{\varSigma}_1\big)$ and $\displaystyle \bm{x}_2\sim \mathcal{N}\big(\bm{\mu}_2,\bm{\varSigma}_2\big)$. Then $\displaystyle \bm{x}_1\bot \bm{x}_2\iff \mathrm{cov}\big[\bm{x}_1,\bm{x}_2\big]=0$, in which case $\displaystyle \bm{x}\sim\mathcal{N}\big(\bm{\mu}, \bm{\varSigma}\big)$ with $\displaystyle \bm{\mu}=[\bm{\mu}_1,\bm{\mu}_2]^\text{T},\bm{\varSigma}=\begin{bmatrix} \bm{\varSigma}_1 & \bm{0}\\\bm{0}&\bm{\varSigma}_2 \end{bmatrix}$. (Joint Gaussianity matters here: two marginally Gaussian vectors with zero covariance need not be independent.)

Proof:
Use:
1. the uniqueness theorem for characteristic functions;
2. the fact that the joint characteristic function of independent random variables is the product of their individual characteristic functions. The claim is then immediate:
$$\begin{align}
\displaystyle \bm{x}_1\bot \bm{x}_2\iff \mathrm{cov}\big[\bm{x}_1,\bm{x}_2\big]=0
\end{align}$$

3. Invariance under affine (linear) transformations

Let $\displaystyle \bm{x}\sim\mathcal{N}\big(\bm{\mu},\bm{\varSigma}\big)$ and consider the affine transform $\displaystyle \bm{y}=\bm{A}\bm{x}+\bm{b}$; then $\displaystyle \bm{y}\sim\mathcal{N}\big(\bm{A}\bm{\mu}+\bm{b},\bm{A}\bm{\varSigma}\bm{A}^\text{T}\big) $

Proof:
By the affine transformation property of characteristic functions:
$$\begin{align}
\varphi_{\bm{y}}\big(\bm{t}\big)
&=\exp\big[\mathrm{i}\bm{t}^\text{T}\bm{b}\big]\varphi_{\bm{x}}\big(\bm{A}^\text{T}\bm{t}\big)=\exp\big[\mathrm{i}\bm{t}^\text{T}\bm{b}+\mathrm{i}\big(\bm{A}^\text{T}\bm{t}\big)^\text{T}\bm{\mu}-\frac{1}{2}\bm{t}^\text{T}\bm{A}\bm{\varSigma}\bm{A}^\text{T}\bm{t}\big]\\
&=\exp\bigg[\mathrm{i}\bm{t}^\text{T}\big[\bm{A}\bm{\mu}+\bm{b}\big]-\frac{1}{2}\bm{t}^\text{T}\bm{A}\bm{\varSigma}\bm{A}^\text{T}\bm{t}\bigg]
\end{align}$$
and by the uniqueness theorem for characteristic functions:

$$\begin{align}
\bm{y}\sim\mathcal{N}\big(\bm{A}\bm{\mu}+\bm{b},\bm{A}\bm{\varSigma}\bm{A}^\text{T}\big)
\end{align}$$
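The affine rule can be checked empirically by sampling, transforming, and comparing moments. A NumPy sketch; the numbers are illustrative and the tolerances allow for Monte Carlo error:

```python
import numpy as np

# Empirical check of y = A x + b  =>  y ~ N(A mu + b, A Sigma A^T).
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[1.5, 0.4], [0.4, 0.8]])
A = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([0.5, -1.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T + b                               # apply the affine map row-wise

assert np.allclose(y.mean(axis=0), A @ mu + b, atol=0.05)
assert np.allclose(np.cov(y.T), A @ Sigma @ A.T, atol=0.1)
```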
【Corollary 1】Decomposition: any multivariate Gaussian random vector can be affinely transformed into independent, standard Gaussian random variables.
【Corollary 2】Dimension reduction: every linear combination of a random vector is Gaussian $\displaystyle \iff$ the random vector is Gaussian.

The property in Corollary 2 is rather striking, so let us work through the details.
Proof:
【Sufficiency】
Suppose that for every $\displaystyle \bm{a}$ the linear combination $\displaystyle y=\bm{a}^\text{T}\bm{x}$ is one-dimensional Gaussian:
$$\begin{align}
y\sim\mathcal{N}\big(\bm{a}^\text{T}\bm{\mu},\bm{a}^\text{T}\bm{\varSigma}\bm{a}\big)
\end{align}$$
Its characteristic function is:
$$\begin{align}
\varphi_y(t)=\exp\big[\mathrm{i}t\bm{a}^\text{T}\bm{\mu}-\frac{1}{2}t^2\bm{a}^\text{T}\bm{\varSigma}\bm{a}\big]
\end{align}$$
Now change the point of view: set $\displaystyle t=1$ and regard $\displaystyle \bm{a}$ as arbitrary:
$$\begin{align}
\varphi_{\bm{x}}\big(\bm{a}\big)=\mathrm{E}\big[\mathrm{e}^{\mathrm{i}\bm{a}^\text{T}\bm{x}}\big]=\mathrm{E}\big[\mathrm{e}^{\mathrm{i}y}\big]=\varphi_y\big(1\big)=\exp\big[\mathrm{i}\bm{a}^\text{T}\bm{\mu}-\frac{1}{2}\bm{a}^\text{T}\bm{\varSigma}\bm{a}\big]
\end{align}$$
Hence $\displaystyle \bm{x}\sim\mathcal{N}\big(\bm{\mu},\bm{\varSigma}\big)$.
【Necessity】
If $\displaystyle \bm{x}\sim\mathcal{N}\big(\bm{\mu},\bm{\varSigma}\big)$, then the characteristic function of $\displaystyle y=\bm{a}^\text{T}\bm{x}$ is:
$$\begin{align}
\varphi_y(t)=\varphi_{\bm{x}}\big(t \bm{a}\big)=\exp\big[\mathrm{i}t\bm{a}^\text{T}\bm{\mu}-\frac{1}{2}t^2\bm{a}^\text{T}\bm{\varSigma}\bm{a}\big]
\end{align}$$
Hence $\displaystyle y\sim\mathcal{N}\big(\bm{a}^\text{T}\bm{\mu},\bm{a}^\text{T}\bm{\varSigma}\bm{a}\big)$.
This completes the proof.

4. Conditional distributions

Consider $\displaystyle \begin{bmatrix} \bm{x}_1 \\ \bm{x}_2 \end{bmatrix}\sim\mathcal{N}\bigg(\begin{bmatrix} \bm{\mu}_1 \\ \bm{\mu}_2 \end{bmatrix},\begin{bmatrix} \bm{\varSigma}_{11} & \bm{\varSigma}_{12}\\\bm{\varSigma}_{21}&\bm{\varSigma}_{22} \end{bmatrix}\bigg)$. Then the conditional distribution $\displaystyle \bm{x}_2\mid \bm{x}_1$ is again Gaussian:

$$\begin{align}
\bm{x}_2\mid \bm{x}_1\sim\mathcal{N}\bigg(\bm{\mu}_2+\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\big[\bm{x}_1- \bm{\mu}_1\big],\bm{\varSigma}_{22}-\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\bm{\varSigma}_{12}\bigg)
\end{align}$$

Proof:
1. To exploit the equivalence of independence and zero correlation, we use a linear transformation to construct two independent new random variables:

$$\begin{align}
\bm{y}_1&=\bm{x}_1\\
\bm{y}_2&=\bm{T}\bm{x}_1+\bm{x}_2
\end{align}$$

For $\displaystyle \bm{y}_1\bot \bm{y}_2$ we require:
$$\begin{align}
\mathrm{cov}\big[\bm{y}_1,\bm{y_2}\big]
&=\mathrm{E}\Bigg[\bigg(\bm{y}_1- \mathrm{E}\big[\bm{y}_1\big]\bigg)\bigg(\bm{y}_2- \mathrm{E}\big[\bm{y}_2\big]\bigg)^\text{T}\Bigg]\\
&=\mathrm{E}\Bigg[\bigg(\bm{x}_1- \bm{\mu}_1\bigg)\bigg(\bm{T}\bm{x}_1+\bm{x}_2- \bm{T}\bm{\mu}_1- \bm{\mu}_2\bigg)^\text{T}\Bigg]\\
&=\mathrm{E}\Bigg[\bigg(\bm{x}_1- \bm{\mu}_1\bigg)\bigg(\bm{T}\big(\bm{x}_1- \bm{\mu}_1\big)+\bm{x}_2- \bm{\mu}_2\bigg)^\text{T}\Bigg]\\
&=\mathrm{E}\Big[\big(\bm{x}_1- \bm{\mu}_1\big)\big(\bm{x}_1- \bm{\mu}_1\big)^\text{T}\Big]\bm{T}^\text{T}+\mathrm{E}\Big[\big(\bm{x}_1- \bm{\mu}_1\big)\big(\bm{x}_2- \bm{\mu}_2\big)^\text{T}\Big]\\
&=\bm{\varSigma}_{11}\bm{T}^\text{T}+\bm{\varSigma}_{12}=\bm{0}
\end{align}$$
This gives:
$$\begin{align}
\bm{T}=-\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}
\end{align}$$
That is:
$$\begin{align}
\bm{y}_2=-\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\bm{x}_1+\bm{x}_2
\end{align}$$
In matrix form, this linear transformation reads:
$$\begin{align}
\begin{bmatrix} \bm{y}_1 \\ \bm{y}_2 \end{bmatrix}=\begin{bmatrix} \bm{I} & \bm{0}\\ -\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}&\bm{I}\end{bmatrix}\begin{bmatrix} \bm{x}_1 \\ \bm{x}_2\end{bmatrix}
\end{align}$$
2. Using the properties of moments, we compute the expectations and covariances of $\displaystyle \bm{y}_1,\bm{y}_2$; since linear transformations of Gaussians stay Gaussian, this pins down their distributions. Clearly $\displaystyle \mathrm{E}\big[\bm{y}_1\big]=\bm{\mu}_1$, $\displaystyle \mathrm{cov}\big[\bm{y}_1\big]=\bm{\varSigma}_{11}$, and $\displaystyle \mathrm{E}\big[\bm{y}_2\big]=\bm{\mu}_2-\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\bm{\mu}_1$. The key computation is:
$$\begin{align}
\mathrm{cov}\big[\bm{y}_2\big]
&=\mathrm{E}\Bigg[\bigg(\bm{y}_2- \mathrm{E}\big[\bm{y}_2\big]\bigg)\bigg(\bm{y}_2- \mathrm{E}\big[\bm{y}_2\big]\bigg)^\text{T}\Bigg]\\
&=\mathrm{E}\Bigg[\bigg(\bm{x}_2- \bm{\mu}_2+\bm{T}\big(\bm{x}_1- \bm{\mu}_1\big)\bigg)\bigg(\bm{x}_2- \bm{\mu}_2+\bm{T}\big(\bm{x}_1- \bm{\mu}_1\big)\bigg)^\text{T}\Bigg]\\
&=\bm{\varSigma}_{22}+\bm{\varSigma}_{21}\bm{T}^\text{T}+\bm{T}\bm{\varSigma}_{12}+\bm{T}\bm{\varSigma}_{11}\bm{T}^\text{T}\\
&=\bm{\varSigma}_{22}-\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\bm{\varSigma}_{12}-\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\bm{\varSigma}_{12}+\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\bm{\varSigma}_{11}\bm{\varSigma}_{11}^{-1}\bm{\varSigma}_{12}\\
&=\bm{\varSigma}_{22}-\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\bm{\varSigma}_{12}
\end{align}$$

Hence:
$$\begin{align}
\bm{y}_2\sim\mathcal{N}\bigg(\bm{\mu}_2-\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\bm{\mu}_1,\bm{\varSigma}_{22}-\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\bm{\varSigma}_{12}\bigg)
\end{align}$$
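The construction above can be verified in exact arithmetic: with $\bm{T}=-\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}$, the transformed covariance is block diagonal, and the lower block is the Schur complement. A NumPy sketch with illustrative blocks:

```python
import numpy as np

# Check that M Sigma M^T = diag(S11, S22 - S21 inv(S11) S12).
S11 = np.array([[2.0, 0.3], [0.3, 1.0]])
S12 = np.array([[0.5, 0.1], [0.2, 0.4]])
S21 = S12.T
S22 = np.array([[1.5, 0.2], [0.2, 1.1]])
Sigma = np.block([[S11, S12], [S21, S22]])

T = -S21 @ np.linalg.inv(S11)
M = np.block([[np.eye(2), np.zeros((2, 2))], [T, np.eye(2)]])
cov_y = M @ Sigma @ M.T

schur = S22 - S21 @ np.linalg.inv(S11) @ S12   # conditional covariance
assert np.allclose(cov_y[:2, 2:], 0)           # y1 and y2 are uncorrelated
assert np.allclose(cov_y[:2, :2], S11)
assert np.allclose(cov_y[2:, 2:], schur)
```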

3. Moreover $\displaystyle \bm{J}\big(\bm{y}\to \bm{x}\big)=\bigg|\frac{\partial \bm{y}}{\partial \bm{x}^\text{T}}\bigg|=1$, and $\displaystyle \bm{y}_1,\bm{y}_2$ are independent, so:
$$\begin{align}
p\big(\bm{x}_1,\bm{x}_2\big)
=p\big(\bm{y}_1,\bm{y}_2\big)\bm{J}\big(\bm{y}\to \bm{x}\big)
=p\big(\bm{y}_1\big)p\big(\bm{y}_2\big)
\end{align}$$

We can now obtain the distribution of $\displaystyle \bm{x}_2\mid \bm{x}_1$:
$$\begin{align}
p\big(\bm{x}_2\mid \bm{x}_1\big)=\frac{p\big(\bm{x}_1,\bm{x}_2\big)}{p\big(\bm{x}_1\big)}=\frac{p\big(\bm{y}_1\big)p\big(\bm{y}_2\big)}{p\big(\bm{y}_1\big)}=p\big(\bm{y}_2\big)=p_{\bm{y}_2}\big(\bm{x}_2- \bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\bm{x}_1\big)
\end{align}$$
Substituting into the density and simplifying:
$$\begin{align}
\bm{x}_2\mid \bm{x}_1\sim\mathcal{N}\bigg(\bm{\mu}_2+\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\big[\bm{x}_1- \bm{\mu}_1\big],\bm{\varSigma}_{22}-\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\bm{\varSigma}_{12}\bigg)
\end{align}$$
The conditional mean and covariance matrix are:
$$\begin{align}
\bm{\mu}_{2\mid1}&=\mathrm{E}\big[\bm{x}_2\mid \bm{x}_1\big]=\bm{\mu}_2+\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\big[\bm{x}_1- \bm{\mu}_1\big]\\
\bm{\varSigma}_{2\mid 1}&=\mathrm{cov}\big[\bm{x}_2\mid \bm{x}_1\big]=\bm{\varSigma}_{22}-\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\bm{\varSigma}_{12}
\end{align}$$
This completes the proof.

Of course, one can also derive this via the normalization constraint and completing the square.

5. The Gaussian linear model

If $\displaystyle p(\bm{x})=\mathcal{N}\big(\bm{\mu}_{\bm{x}},\bm{\varSigma}_{\bm{x}}\big)$ and $\displaystyle p\big(\bm{y}\mid \bm{x}\big)=\mathcal{N}\big(\bm{A}\bm{x}+\bm{b},\bm{\varSigma}_{\bm{y}\mid \bm{x}}\big)$, then:
$$\begin{align}
p\big(\bm{x}\mid \bm{y}\big)=\mathcal{N}\big(\bm{\mu}_{\bm{x}\mid \bm{y}},\bm{\varSigma}_{\bm{x}\mid \bm{y}}\big)
\end{align}$$
where
$\displaystyle \bm{\varSigma}_{\bm{x}\mid \bm{y}}^{-1}=\bm{\varSigma}_{\bm{x}}^{-1}+\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid \bm{x}}^{-1}\bm{A}$

$\displaystyle \bm{\mu}_{\bm{x}\mid \bm{y}}=
\bm{\varSigma}_{\bm{x}\mid \bm{y}}\big[\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid \bm{x}}^{-1}\big(\bm{y}-\bm{b}\big)+\bm{\varSigma}_{\bm{x}}^{-1} \bm{\mu}_{\bm{x}}\big]
$

and also:
$$\begin{align}
p\big(\bm{y}\big)=\mathcal{N}\big(\bm{A}\bm{\mu}_{\bm{x}}+\bm{b},\bm{\varSigma}_{\bm{y}\mid \bm{x}}+\bm{A}\bm{\varSigma}_{\bm{x}}\bm{A}^\text{T}\big)
\end{align}$$

Proof:
To settle this thoroughly we need several lemmas on block matrices.
We first prove:

1. 【Lemma 1】The identity-matrix lemma:

$$\begin{align}
[\bm{E}-\bm{S}]^{-1}-[\bm{S}^{-1}-\bm{E}]^{-1}=\bm{E}
\end{align}$$
Proof:
$$\begin{align}
&[\bm{E}-\bm{S}]^{-1}-[\bm{S}^{-1}-\bm{E}]^{-1}=\bm{E} \\
&\iff\bm{S}[\bm{E}-\bm{S}]^{-1}-\bm{S}[\bm{S}^{-1}-\bm{E}]^{-1}=\bm{S}\\
&\iff\big[\bm{S}^{-1}-\bm{S}\bm{S}^{-1}\big]^{-1}-\bm{S}[\bm{S}^{-1}-\bm{E}]^{-1}=\bm{S}\\
&\iff[\bm{E}-\bm{S}][\bm{S}^{-1}-\bm{E}]^{-1}=\bm{S}\\
&\iff\bm{S}^{-1}[\bm{E}-\bm{S}][\bm{S}^{-1}-\bm{E}]^{-1}=\bm{S}^{-1}\bm{S}\\
&\iff[\bm{S}^{-1}-\bm{E}][\bm{S}^{-1}-\bm{E}]^{-1}=\bm{E}\\
&\iff\bm{E}=\bm{E}
\end{align}$$
This completes the proof.

2. 【Lemma 2】The inverse of a block matrix

Suppose the block matrix is invertible:
$$\begin{align}
\bm{H}^{-1}=\left[\begin{array}{cc}\bm{A} & \bm{B} \\\bm{C} & \bm{D}\end{array}\right]^{-1}
\end{align}$$

1. 【If $\displaystyle \bm{D}$ is invertible】
First, left-multiply by a matrix to eliminate $\displaystyle \bm{B}$:

$$\begin{align}
\left[\begin{array}{cc}\bm{E} & -\bm{B}\bm{D}^{-1} \\\bm{0} & \bm{E}\end{array}\right]
\left[\begin{array}{cc}\bm{A} & \bm{B} \\\bm{C} & \bm{D}\end{array}\right]
= \left[\begin{array}{cc}\bm{A}-\bm{B}\bm{D}^{-1}\bm{C} & \bm{0} \\\bm{C} & \bm{D}\end{array}\right]
\end{align}$$

Next, right-multiply by a matrix to eliminate $\displaystyle \bm{C}$:

$$\begin{align}
\left[\begin{array}{cc}\bm{E} & -\bm{B}\bm{D}^{-1} \\\bm{0} & \bm{E}\end{array}\right]
\left[\begin{array}{cc}\bm{A} & \bm{B} \\\bm{C} & \bm{D}\end{array}\right]
\begin{bmatrix} \bm{E} & \bm{0} \\-\bm{D}^{-1}\bm{C} & \bm{E}\end{bmatrix}
= \left[\begin{array}{cc}\bm{A}-\bm{B}\bm{D}^{-1}\bm{C} & \bm{0} \\\bm{0} & \bm{D}\end{array}\right]
\end{align}$$

Writing this as $\displaystyle \bm{U}\bm{H}\bm{V}=\bm{W} $, we get $\displaystyle\bm{H}^{-1}=\bm{V}\bm{W}^{-1}\bm{U} $ (immediate):

$$\begin{align}
\left[\begin{array}{cc}\bm{A} & \bm{B} \\\bm{C} & \bm{D}\end{array}\right]^{-1}
=
\begin{bmatrix} \bm{E} & \bm{0} \\-\bm{D}^{-1}\bm{C} & \bm{E} \end{bmatrix}
\left[\begin{array}{cc}[\bm{A}-\bm{B}\bm{D}^{-1}\bm{C}]^{-1} & \bm{0} \\\bm{0} & \bm{D}^{-1}\end{array}\right]
\left[\begin{array}{cc}\bm{E} & -\bm{B}\bm{D}^{-1} \\\bm{0} & \bm{E}\end{array}\right]
\end{align}$$
Simplifying:
$$\begin{align}
\left[\begin{array}{cc}\bm{A} & \bm{B} \\\bm{C} & \bm{D}\end{array}\right]^{-1}
&= \left[\begin{array}{cc}[
\bm{A}-\bm{B}\bm{D}^{-1}\bm{C}]^{-1}
&-[\bm{A}-\bm{B}\bm{D}^{-1}\bm{C}]^{-1} \bm{B}\bm{D}^{-1}
\\
-\bm{D}^{-1}\bm{C}[\bm{A}-\bm{B}\bm{D}^{-1}\bm{C}]^{-1}
& \bm{D}^{-1}+\bm{D}^{-1}\bm{C}[\bm{A}-\bm{B}\bm{D}^{-1}\bm{C}]^{-1}\bm{B}\bm{D}^{-1}
\end{array}\right]\\
&=\left[\begin{array}{cc}
\bm{M}
&- \bm{M} \bm{B}\bm{D}^{-1}
\\
-\bm{D}^{-1}\bm{C}\bm{M}
& \bm{D}^{-1}+\bm{D}^{-1}\bm{C}\bm{M}\bm{B}\bm{D}^{-1}
\end{array}\right]
\end{align}$$

where $\displaystyle \bm{M}=[\bm{A}-\bm{B}\bm{D}^{-1}\bm{C}]^{-1}$.

2. 【Similarly, if $\displaystyle \bm{A}$ is invertible】
$$\begin{align}
\left[\begin{array}{cc}\bm{A} & \bm{B} \\\bm{C} & \bm{D}\end{array}\right]^{-1}
&= \left[\begin{array}{cc}
\bm{A}^{-1}+ \bm{A}^{-1}\bm{B}[\bm{D}-\bm{C}\bm{A}^{-1}\bm{B}]^{-1}\bm{C}\bm{A}^{-1}
&-\bm{A}^{-1}\bm{B}[\bm{D}-\bm{C}\bm{A}^{-1}\bm{B}]^{-1}
\\
-[\bm{D}-\bm{C}\bm{A}^{-1}\bm{B}]^{-1}\bm{C}\bm{A}^{-1}
& [\bm{D}-\bm{C}\bm{A}^{-1}\bm{B}]^{-1}
\end{array}\right]\\
&=\left[\begin{array}{cc}
\bm{A}^{-1}+\bm{A}^{-1}\bm{B}\bm{M}\bm{C}\bm{A}^{-1}
&- \bm{A}^{-1}\bm{B}\bm{M}
\\
-\bm{M}\bm{C}\bm{A}^{-1}
&\bm{M}
\end{array}\right]
\end{align}$$

where $\displaystyle \bm{M}= [\bm{D}-\bm{C}\bm{A}^{-1}\bm{B}]^{-1}$.
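Both block-inverse formulas can be verified against `np.linalg.inv`. A NumPy sketch with illustrative blocks:

```python
import numpy as np

A = np.array([[2.0, 0.3], [0.1, 1.5]])
B = np.array([[0.4, 0.0], [0.2, 0.1]])
C = np.array([[0.1, 0.2], [0.0, 0.3]])
D = np.array([[1.2, 0.1], [0.2, 0.9]])
H = np.block([[A, B], [C, D]])

# D-invertible form: M = (A - B D^{-1} C)^{-1}
Dinv = np.linalg.inv(D)
M1 = np.linalg.inv(A - B @ Dinv @ C)
Hinv1 = np.block([[M1, -M1 @ B @ Dinv],
                  [-Dinv @ C @ M1, Dinv + Dinv @ C @ M1 @ B @ Dinv]])

# A-invertible form: M = (D - C A^{-1} B)^{-1}
Ainv = np.linalg.inv(A)
M2 = np.linalg.inv(D - C @ Ainv @ B)
Hinv2 = np.block([[Ainv + Ainv @ B @ M2 @ C @ Ainv, -Ainv @ B @ M2],
                  [-M2 @ C @ Ainv, M2]])

assert np.allclose(Hinv1, np.linalg.inv(H))
assert np.allclose(Hinv2, np.linalg.inv(H))
```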

3. 【Lemma 3】The determinant of a block matrix:

From the $\displaystyle \bm{B}$-elimination step in Lemma 2 we know:
$$\begin{align}
\det \begin{bmatrix} \bm{E} & -\bm{B}\bm{D}^{-1} \\\bm{0} & \bm{E}\end{bmatrix}\det \begin{bmatrix} \bm{A} & \bm{B}\\\bm{C}&\bm{D} \end{bmatrix}=\det \begin{bmatrix} \bm{A}-\bm{B}\bm{D}^{-1}\bm{C} & \bm{0}\\\bm{C}&\bm{D} \end{bmatrix}=\det[\bm{A}-\bm{B}\bm{D}^{-1}\bm{C}]\det[\bm{D}]
\end{align}$$

We can also proceed the other way (right-multiplying by a matrix to eliminate $\displaystyle \bm{B}$):

$$\begin{align}
\det \begin{bmatrix} \bm{A} & \bm{B}\\\bm{C}&\bm{D} \end{bmatrix}\det \begin{bmatrix} \bm{E} & -\bm{A}^{-1}\bm{B} \\\bm{0} & \bm{E}\end{bmatrix}
=\det \begin{bmatrix} \bm{A} & \bm{0}\\\bm{C}&\bm{D}-\bm{C}\bm{A}^{-1}\bm{B} \end{bmatrix}=\det[\bm{A}]\det[\bm{D}-\bm{C}\bm{A}^{-1}\bm{B}]
\end{align}$$

That is:
$$\begin{align}
\begin{vmatrix} \bm{A} & \bm{B}\\\bm{C}&\bm{D}\end{vmatrix}=\big|\bm{A}-\bm{B}\bm{D}^{-1}\bm{C}\big|\big|\bm{D}\big|=\big|\bm{A}\big|\big|\bm{D}-\bm{C}\bm{A}^{-1}\bm{B}\big|
\end{align}$$

We also have:
$$\begin{align}
\left|\bm{A}-\bm{B}\bm{D}^{-1}\bm{C} \right|=\left|\bm{D}-\bm{C}\bm{A}^{-1}\bm{B}\right|\left|\bm{D}^{-1}\right|\big|\bm{A}\big|
\end{align}$$
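These determinant identities are easy to confirm numerically. A NumPy sketch with illustrative blocks:

```python
import numpy as np

# det[[A, B], [C, D]] = det(A - B D^{-1} C) det(D) = det(A) det(D - C A^{-1} B)
A = np.array([[2.0, 0.3], [0.1, 1.5]])
B = np.array([[0.4, 0.0], [0.2, 0.1]])
C = np.array([[0.1, 0.2], [0.0, 0.3]])
D = np.array([[1.2, 0.1], [0.2, 0.9]])
H = np.block([[A, B], [C, D]])

lhs = np.linalg.det(H)
via_D = np.linalg.det(A - B @ np.linalg.inv(D) @ C) * np.linalg.det(D)
via_A = np.linalg.det(A) * np.linalg.det(D - C @ np.linalg.inv(A) @ B)
assert np.isclose(lhs, via_D) and np.isclose(lhs, via_A)
```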

4. 【Lemma 4】Dimension-switching identities

$\displaystyle [\bm{A}+\bm{B}\bm{C}\bm{D}]^{-1}=\bm{A}^{-1}- \bm{A}^{-1}\bm{B}[\bm{C}^{-1}+\bm{D}\bm{A}^{-1}\bm{B}]^{-1}\bm{D}\bm{A}^{-1}$

$\displaystyle \bm{M}=[\bm{A}-\bm{B}\bm{D}^{-1}\bm{C}]^{-1} =\bm{A}^{-1}+ \bm{A}^{-1}\bm{B}[\bm{D}-\bm{C}\bm{A}^{-1}\bm{B}]^{-1}\bm{C}\bm{A}^{-1} $

$\displaystyle [\bm{A}-\bm{B}\bm{D}^{-1}\bm{C}]^{-1} \bm{B}\bm{D}^{-1}=\bm{A}^{-1}\bm{B}[\bm{D}-\bm{C}\bm{A}^{-1}\bm{B}]^{-1} $
Proof:
We work through one of them as an illustration. To apply 【Lemma 1】, multiply on the right by $\displaystyle \bm{A}$, giving:
$$\begin{align}
&[\bm{A}+\bm{B}\bm{C}\bm{D}]^{-1}=\bm{A}^{-1}- \bm{A}^{-1}\bm{B}[\bm{C}^{-1}+\bm{D}\bm{A}^{-1}\bm{B}]^{-1}\bm{D}\bm{A}^{-1}\\
&\iff[\bm{A}+\bm{B}\bm{C}\bm{D}]^{-1}\bm{A}=\bm{E}- \bm{A}^{-1}\bm{B}[\bm{C}^{-1}+\bm{D}\bm{A}^{-1}\bm{B}]^{-1}\bm{D}\bm{A}^{-1}\bm{A}\\
&\iff[\bm{E}+\bm{A}^{-1}\bm{B}\bm{C}\bm{D}]^{-1}=\bm{E}-\bm{A}^{-1}\bm{B}[\bm{C}^{-1}+\bm{D}\bm{A}^{-1}\bm{B}]^{-1}\bm{D}\\
&\iff[\bm{E}+\bm{A}^{-1}\bm{B}\bm{C}\bm{D}]^{-1}=\bm{E}-[\bm{D}^{-1}\bm{C}^{-1}\bm{B}^{-1}\bm{A}+\bm{E}]^{-1}\\
&\iff\big[\bm{E}-[-\bm{A}^{-1}\bm{B}\bm{C}\bm{D}]\big]^{-1}-\big[[-\bm{A}^{-1}\bm{B}\bm{C}\bm{D}]^{-1}-\bm{E}\big]^{-1}=\bm{E}\,,\quad\bm{S}=-\bm{A}^{-1}\bm{B}\bm{C}\bm{D}\\
&\iff[\bm{E}-\bm{S}]^{-1}-[\bm{S}^{-1}-\bm{E}]^{-1}=\bm{E}
\end{align}$$
Alternatively, these follow directly by comparing the two results of Lemma 2.
This completes the proof.
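The first identity (the Woodbury form) can also be checked numerically. A NumPy sketch with illustrative matrices:

```python
import numpy as np

# Woodbury: (A + B C D)^{-1} = A^{-1} - A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}
A = np.diag([2.0, 3.0, 4.0])                          # 3x3
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])    # 3x2
C = np.array([[0.5, 0.1], [0.0, 0.7]])                # 2x2
D = B.T                                               # 2x3

lhs = np.linalg.inv(A + B @ C @ D)
Ainv = np.linalg.inv(A)
rhs = Ainv - Ainv @ B @ np.linalg.inv(np.linalg.inv(C) + D @ Ainv @ B) @ D @ Ainv
assert np.allclose(lhs, rhs)
```

Note that the identity lets us trade a 3×3 inversion for a 2×2 one, which is the point of the "dimension-switching" name.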

5. 【Lemma 5】The joint distribution of the Gaussian linear regression model

We know $\displaystyle p(\bm{x},\bm{y})=p(\bm{x})p(\bm{y}\mid \bm{x})=\mathcal{N}\big(\bm{\mu}_{\bm{x}},\bm{\varSigma}_{\bm{x}}\big)\mathcal{N}\big(\bm{A}\bm{x}+\bm{b},\bm{\varSigma}_{\bm{y}\mid \bm{x}}\big)$. Let $\displaystyle \bm{z}=\begin{bmatrix} \bm{x} \\\bm{y} \end{bmatrix}$ and take logarithms:
$$\begin{align}
&\ln p(\bm{z})
=\ln p(\bm{x})+\ln p(\bm{y}\mid \bm{x})\\
&=-\frac{1}{2}\big(\bm{x}-\bm{\mu}_{\bm{x}}\big)^\mathrm{T}\bm{\varSigma}_{\bm{x}}^{-1}\big(\bm{x}-\bm{\mu}_{\bm{x}}\big)
-\frac{1}{2}\big(\bm{y}-[\bm{A}\bm{x}+\bm{b}]\big)^\mathrm{T}\bm{\varSigma}_{\bm{y}\mid \bm{x}}^{-1}\big(\bm{y}-[\bm{A}\bm{x}+\bm{b}]\big)\\
&\quad+\ln\bigg[(2\pi)^{-(k+s)/2}\big|\bm{\varSigma}_{\bm{x}}\big|^{-1/2}\big|\bm{\varSigma}_{\bm{y}\mid \bm{x}}\big|^{-1/2}\bigg]\\
&=-\frac{1}{2}\Bigg(\begin{bmatrix} \bm{x} \\ \bm{y} \end{bmatrix}-\begin{bmatrix} \bm{\mu}_{\bm{x}} \\ \bm{A}\bm{\mu}_{\bm{x}}+\bm{b} \end{bmatrix}\Bigg)^\text{T}
\begin{bmatrix} \bm{\varSigma}_{\bm{x}}^{-1}+\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid \bm{x}}^{-1}\bm{A}&-\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid \bm{x}}^{-1}\\ -\bm{\varSigma}_{\bm{y}\mid \bm{x}}^{-1}\bm{A}&\bm{\varSigma}_{\bm{y}\mid \bm{x}}^{-1} \end{bmatrix}
\Bigg(\begin{bmatrix} \bm{x} \\ \bm{y} \end{bmatrix}-\begin{bmatrix} \bm{\mu}_{\bm{x}} \\ \bm{A}\bm{\mu}_{\bm{x}}+\bm{b} \end{bmatrix}\Bigg)\\
&\quad+\ln\bigg[(2\pi)^{-(k+s)/2}\big|\bm{\varSigma}_{\bm{x}}\big|^{-1/2}\big|\bm{\varSigma}_{\bm{y}\mid \bm{x}}\big|^{-1/2}\bigg]\\
&=-\frac{1}{2}\Bigg(\begin{bmatrix} \bm{x} \\ \bm{y} \end{bmatrix}-\begin{bmatrix} \bm{\mu}_{\bm{x}} \\ \bm{A}\bm{\mu}_{\bm{x}}+\bm{b} \end{bmatrix}\Bigg)^\text{T}
\begin{bmatrix}\bm{\varSigma}_{\bm{x}}&\bm{\varSigma}_{\bm{x}}\bm{A}^\text{T}\\\bm{A}\bm{\varSigma}_{\bm{x}}&\bm{\varSigma}_{\bm{y}\mid \bm{x}}+\bm{A}\bm{\varSigma}_{\bm{x}}\bm{A}^\text{T} \end{bmatrix}^{-1}
\Bigg(\begin{bmatrix} \bm{x} \\ \bm{y} \end{bmatrix}-\begin{bmatrix} \bm{\mu}_{\bm{x}} \\ \bm{A}\bm{\mu}_{\bm{x}}+\bm{b} \end{bmatrix}\Bigg)\\
&\quad+\ln\bigg[(2\pi)^{-(k+s)/2}\big|\bm{\varSigma}_{\bm{x}}\big|^{-1/2}\big|\bm{\varSigma}_{\bm{y}\mid \bm{x}}\big|^{-1/2}\bigg]\\
&=\ln\mathcal{N}\Bigg(\begin{bmatrix} \bm{\mu}_{\bm{x}} \\ \bm{A}\bm{\mu}_{\bm{x}}+\bm{b} \end{bmatrix},\begin{bmatrix}\bm{\varSigma}_{\bm{x}}&\bm{\varSigma}_{\bm{x}}\bm{A}^\text{T}\\\bm{A}\bm{\varSigma}_{\bm{x}}&\bm{\varSigma}_{\bm{y}\mid \bm{x}}+\bm{A}\bm{\varSigma}_{\bm{x}}\bm{A}^\text{T} \end{bmatrix}\Bigg)
\end{align}$$
Here we used a few quadratic-form tricks; a brief explanation so they do not seem abrupt. Let $\displaystyle \bm{S}$ be a symmetric matrix and consider the quadratic form:
$$\begin{align}
Q
&=\frac{1}{2}(\bm{x}-\bm{\mu})^\text{T}\bm{S}(\bm{x}-\bm{\mu})
=\frac{1}{2}\bm{x}^\text{T}\bm{S}\bm{x}-\bm{x}^\text{T}\bm{S}\bm{\mu}+\frac{1}{2}\bm{\mu}^\text{T}\bm{S}\bm{\mu}\\
&=\frac{1}{2}\bm{x}^\text{T}\bm{A}\bm{x}-\bm{x}^\text{T}\bm{B}+\frac{1}{2}\bm{\mu}^\text{T}\bm{A}\bm{\mu}
\end{align}$$
where:
$$\begin{align}
\bm{S}&=\bm{A}\\
\bm{\mu}&=\bm{A}^{-1}\bm{B}
\end{align}$$
In other words, the parameters can be recovered by reading off the coefficients of the linear and quadratic terms.

Examining the quadratic terms:
$$\begin{align}
&-\frac{1}{2}\bm{x}^\text{T}[\bm{\varSigma}_{\bm{x}}^{-1}+\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid\bm{x}}^{-1}\bm{A}]\bm{x}-\frac{1}{2}\bm{y}^\text{T}\bm{\varSigma}_{\bm{y}\mid\bm{x}}^{-1}\bm{y}+\frac{1}{2}\bm{y}^\text{T}\bm{\varSigma}_{\bm{y}\mid\bm{x}}^{-1}\bm{A}\bm{x}+\frac{1}{2}\bm{x}^\text{T}\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid\bm{x}}^{-1}\bm{y}\\
&=-\frac{1}{2}\left[\begin{array}{c}\bm{x}\\\bm{y}\end{array}\right]^\text{T}\left[\begin{array}{cc}\bm{\varSigma}_{\bm{x}}^{-1}+\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid\bm{x}}^{-1}\bm{A}&-\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid\bm{x}}^{-1}\\ -\bm{\varSigma}_{\bm{y}\mid\bm{x}}^{-1}\bm{A}&\bm{\varSigma}_{\bm{y}\mid\bm{x}}^{-1}\end{array}\right]\left[\begin{array}{c}\bm{x}\\\bm{y}\end{array}\right]
\end{align}$$
Collecting the linear terms:
$$
\begin{align}
\bm{x}^\text{T}\big[\bm{\varSigma}_{\bm{x}}^{-1}\bm{\mu}_{\bm{x}}-\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid \bm{x}}^{-1}\bm{b}\big]+\bm{y}^\text{T}\bm{\varSigma}_{\bm{y}\mid \bm{x}}^{-1}\bm{b}
=\left[\begin{array}{c}\bm{x}\\\bm{y}\end{array}\right]^\text{T}
\left[\begin{array}{c}\bm{\varSigma}_{\bm{x}}^{-1}\bm{\mu}_{\bm{x}}-\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid \bm{x}}^{-1}\bm{b}\\\bm{\varSigma}_{\bm{y}\mid \bm{x}}^{-1}\bm{b}\end{array}\right]
\end{align}
$$
Then, via the block-matrix inverse and the structure of the quadratic form, we can read off:
$$\begin{align}
\bm{\mu}_{\bm{z}}&=\begin{bmatrix} \bm{\mu}_{\bm{x}} \\ \bm{A}\bm{\mu}_{\bm{x}}+\bm{b} \end{bmatrix}\\
\bm{\varSigma}_{\bm{z}}&=\begin{bmatrix}\bm{\varSigma}_{\bm{x}}&\bm{\varSigma}_{\bm{x}}\bm{A}^\text{T}\\\bm{A}\bm{\varSigma}_{\bm{x}}&\bm{\varSigma}_{\bm{y}\mid \bm{x}}+\bm{A}\bm{\varSigma}_{\bm{x}}\bm{A}^\text{T} \end{bmatrix}
\end{align}$$
At the same time, the block-matrix determinant shows:
$$\begin{align}
\big|\bm{\varSigma}_{\bm{z}}\big|=\big|\bm{\varSigma}_{\bm{x}}\big|\big|\bm{\varSigma}_{\bm{y}\mid \bm{x}}\big|
\end{align}$$
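The joint covariance and the determinant identity can be confirmed numerically. A NumPy sketch with illustrative parameters:

```python
import numpy as np

# Joint covariance of z = [x; y] under the linear Gaussian model,
# and the identity |Sigma_z| = |Sigma_x| |Sigma_{y|x}|.
Sigma_x = np.array([[1.0, 0.2], [0.2, 0.5]])
Sigma_yx = np.array([[0.3, 0.0], [0.0, 0.4]])        # Sigma_{y|x}
A = np.array([[1.0, -1.0], [2.0, 0.5]])

Sigma_z = np.block([[Sigma_x, Sigma_x @ A.T],
                    [A @ Sigma_x, Sigma_yx + A @ Sigma_x @ A.T]])
assert np.isclose(np.linalg.det(Sigma_z),
                  np.linalg.det(Sigma_x) * np.linalg.det(Sigma_yx))
```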
These are the details of why the joint distribution is Gaussian.

【The joint distribution of the Gaussian linear regression model】
If $\displaystyle p(\bm{x})=\mathcal{N}\big(\bm{\mu}_{\bm{x}},\bm{\varSigma}_{\bm{x}}\big)$ and $\displaystyle p\big(\bm{y}\mid \bm{x}\big)=\mathcal{N}\big(\bm{A}\bm{x}+\bm{b},\bm{\varSigma}_{\bm{y}\mid \bm{x}}\big)$, let $\displaystyle \bm{z}=\begin{bmatrix} \bm{x} \\\bm{y} \end{bmatrix}$; then:
$$\begin{align}
\bm{z}\sim\mathcal{N}\Bigg(\begin{bmatrix} \bm{\mu}_{\bm{x}} \\ \bm{A}\bm{\mu}_{\bm{x}}+\bm{b} \end{bmatrix},\begin{bmatrix}\bm{\varSigma}_{\bm{x}}&\bm{\varSigma}_{\bm{x}}\bm{A}^\text{T}\\\bm{A}\bm{\varSigma}_{\bm{x}}&\bm{\varSigma}_{\bm{y}\mid \bm{x}}+\bm{A}\bm{\varSigma}_{\bm{x}}\bm{A}^\text{T} \end{bmatrix}\Bigg)
\end{align}$$
where the precision matrix is:
$\displaystyle \bm{\varLambda}_{\bm{z}}=
\begin{bmatrix} \bm{\varSigma}_{\bm{x}}^{-1}+\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid\bm{x}}^{-1}\bm{A}&-\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid\bm{x}}^{-1}\\ -\bm{\varSigma}_{\bm{y}\mid\bm{x}}^{-1}\bm{A}&\bm{\varSigma}_{\bm{y}\mid\bm{x}}^{-1} \end{bmatrix}$

6. 【Lemma 6】The precision matrix and the covariance matrix

Before the next step can look obvious, we need the notion of a precision matrix: the precision matrix is the inverse of the covariance matrix:
$$\begin{align}
\bm{\varLambda}=\bm{\varSigma}^{-1}=\begin{bmatrix} \bm{\varLambda}_{11} & \bm{\varLambda}_{12} \\\bm{\varLambda}_{21}&\bm{\varLambda}_{22}\end{bmatrix}
\end{align}$$
If $\displaystyle \bm{\varSigma}_{11},\bm{\varSigma}_{22}$ are invertible, the lemmas above readily give:
$\displaystyle \bm{\varLambda}_{11}=\big[\bm{\varSigma}_{11}-\bm{\varSigma}_{12}\bm{\varSigma}_{22}^{-1}\bm{\varSigma}_{21}\big]^{-1}$
$\displaystyle \bm{\varLambda}_{22}=\big[\bm{\varSigma}_{22}-\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}\bm{\varSigma}_{12}\big]^{-1}$
$\displaystyle \bm{\varLambda}_{12}=-\bm{\varLambda}_{11}\bm{\varSigma}_{12}\bm{\varSigma}_{22}^{-1}$
$\displaystyle \bm{\varLambda}_{21}=-\bm{\varLambda}_{22}\bm{\varSigma}_{21}\bm{\varSigma}_{11}^{-1}$
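The block formulas for the precision matrix can be checked against a direct inverse. A NumPy sketch with illustrative blocks:

```python
import numpy as np

# Verify the block formulas for Lambda = inv(Sigma).
S11 = np.array([[2.0, 0.3], [0.3, 1.0]])
S12 = np.array([[0.5, 0.1], [0.2, 0.4]])
S21 = S12.T
S22 = np.array([[1.5, 0.2], [0.2, 1.1]])
Sigma = np.block([[S11, S12], [S21, S22]])
Lam = np.linalg.inv(Sigma)

L11 = np.linalg.inv(S11 - S12 @ np.linalg.inv(S22) @ S21)   # Schur complement forms
L22 = np.linalg.inv(S22 - S21 @ np.linalg.inv(S11) @ S12)
assert np.allclose(Lam[:2, :2], L11)
assert np.allclose(Lam[2:, 2:], L22)
assert np.allclose(Lam[:2, 2:], -L11 @ S12 @ np.linalg.inv(S22))
assert np.allclose(Lam[2:, :2], -L22 @ S21 @ np.linalg.inv(S11))
```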

With the precision matrix we can restate the conditional distribution theorem:

If $\displaystyle \begin{bmatrix} \bm{x}_1 \\ \bm{x}_2 \end{bmatrix}\sim\mathcal{N}\bigg(\begin{bmatrix} \bm{\mu}_1 \\ \bm{\mu}_2 \end{bmatrix},\begin{bmatrix} \bm{\varSigma}_{11} & \bm{\varSigma}_{12}\\\bm{\varSigma}_{21}&\bm{\varSigma}_{22} \end{bmatrix}\bigg)$, then the conditional distribution $\displaystyle \bm{x}_1\mid \bm{x}_2$ is again Gaussian:

$$\begin{align}
\bm{x}_1\mid \bm{x}_2\sim\mathcal{N}\bigg(\bm{\mu}_1+\bm{\varSigma}_{12}\bm{\varSigma}_{22}^{-1}\big[\bm{x}_2- \bm{\mu}_2\big],\bm{\varSigma}_{11}-\bm{\varSigma}_{12}\bm{\varSigma}_{22}^{-1}\bm{\varSigma}_{21}\bigg)
\end{align}$$
with
$$\begin{align}
\bm{\mu}_{1\mid2}
&=\bm{\mu}_1+\bm{\varSigma}_{12}\bm{\varSigma}_{22}^{-1}\big[\bm{x}_2- \bm{\mu}_2\big]\\
&=\bm{\mu}_1- \bm{\varLambda}_{11}^{-1}\bm{\varLambda}_{12}\big[\bm{x}_2- \bm{\mu}_2\big]\\
&=\bm{\varLambda}_{11}^{-1}\bigg[\bm{\varLambda}_{11}\bm{\mu}_1-\bm{\varLambda}_{12}\big[\bm{x}_2- \bm{\mu}_2\big]\bigg]\\
&=\bm{\varSigma}_{1\mid 2}\bigg[\bm{\varLambda}_{11}\bm{\mu}_1-\bm{\varLambda}_{12}\big[\bm{x}_2- \bm{\mu}_2\big]\bigg]
\end{align}$$

$$\begin{align}
\bm{\varSigma}_{1\mid 2}=\bm{\varSigma}_{11}-\bm{\varSigma}_{12}\bm{\varSigma}_{22}^{-1}\bm{\varSigma}_{21}=\bm{\varLambda}_{11}^{-1}
\end{align}$$

7. The final battle

With the joint distribution in hand, we can apply the conditional distribution theorem, and the result follows directly:
$$\begin{align}
p\big(\bm{x}\mid \bm{y}\big)=\mathcal{N}\big(\bm{\mu}_{\bm{x}\mid \bm{y}},\bm{\varSigma}_{\bm{x}\mid \bm{y}}\big)
\end{align}$$
where
$\displaystyle \bm{\varSigma}_{\bm{x}\mid \bm{y}}=\big[\bm{\varSigma}_{\bm{x}}^{-1}+\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid \bm{x}}^{-1}\bm{A}\big]^{-1}=\bm{\varLambda}_{xx}^{-1}$

$$\begin{align}
\bm{\mu}_{\bm{x}\mid \bm{y}}
&=\bm{\varSigma}_{\bm{x}\mid\bm{y}}\bigg[\bm{\varLambda}_{\bm{x}\bm{x}}\bm{\mu}_{\bm{x}}-\bm{\varLambda}_{\bm{x}\bm{y}}\big[\bm{y}-\bm{A}\bm{\mu}_{x}-\bm{b}\big]\bigg]\\
&=\bm{\varSigma}_{\bm{x}\mid \bm{y}}\big[\bm{A}^\text{T}\bm{\varSigma}_{\bm{y}\mid \bm{x}}^{-1}\big(\bm{y}-\bm{b}\big)+\bm{\varSigma}_{\bm{x}}^{-1} \bm{\mu}_{\bm{x}}\big]
\end{align}$$

and also:
$$\begin{align}
p\big(\bm{y}\big)=\mathcal{N}\big(\bm{A}\bm{\mu}_{\bm{x}}+\bm{b},\bm{\varSigma}_{\bm{y}\mid \bm{x}}+\bm{A}\bm{\varSigma}_{\bm{x}}\bm{A}^\text{T}\big)
\end{align}$$
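The two routes to the posterior must agree: the precision-form expressions above, and conditioning the joint distribution of $\bm{z}=[\bm{x};\bm{y}]$ directly. A NumPy cross-check with illustrative parameters:

```python
import numpy as np

# Posterior via the precision form vs. via conditioning the joint.
Sx = np.array([[1.0, 0.2], [0.2, 0.5]])       # Sigma_x
Syx = np.array([[0.3, 0.0], [0.0, 0.4]])      # Sigma_{y|x}
A = np.array([[1.0, -1.0], [2.0, 0.5]])
b = np.array([0.1, -0.2])
mx = np.array([0.5, 1.0])
y = np.array([1.0, 2.0])

# Precision form: Sigma_{x|y} = (Sx^{-1} + A^T Syx^{-1} A)^{-1}
S_post = np.linalg.inv(np.linalg.inv(Sx) + A.T @ np.linalg.inv(Syx) @ A)
mu_post = S_post @ (A.T @ np.linalg.inv(Syx) @ (y - b) + np.linalg.inv(Sx) @ mx)

# Same answer from the joint + the conditional distribution theorem:
Sy = Syx + A @ Sx @ A.T
S_cond = Sx - Sx @ A.T @ np.linalg.inv(Sy) @ A @ Sx
mu_cond = mx + Sx @ A.T @ np.linalg.inv(Sy) @ (y - A @ mx - b)
assert np.allclose(S_post, S_cond)
assert np.allclose(mu_post, mu_cond)
```

The equality of the two covariance expressions is exactly Lemma 4 (the dimension-switching identity) at work.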

IV. Remarks

1. To summarize:
$$\begin{align}\begin{cases}
\text{Mahalanobis transform theorem}\\
\text{univariate Gaussian characteristic function}\\
\text{properties of characteristic functions}
\end{cases}\Rightarrow\text{multivariate Gaussian characteristic function}
\end{align}$$
2. Alternatively, one can get there in a single step from the definition and the change-of-variables theorem.
3. With the characteristic function in hand, we proved a series of key theorems.
4. We then discussed the computation of matrix inverses, defined the precision matrix, and restated the conditional distribution theorem.
5. From there, the Gaussian linear model followed directly.
6. The Gaussian family has many more properties; the journey continues.



A rose for you; fertile soil for others.