Next: Linear trial spaces Up: Parameterizing likelihoods: Variational methods Previous: General parameterizations   Contents


Gaussian priors for parameters

Up to now we have assumed that the prior is given for a function $\phi (\xi)(x,y)$ depending on $x$ and $y$. Instead of a prior in such a function $\phi (\xi)(x,y)$, a prior can also be given in another function of the parameters, $\psi(\xi)$, that does not depend on $(x,y)$. A Gaussian prior in $\psi (\xi) = W_{\psi} \xi$, with $\psi$ a linear function of $\xi$, results in a prior that is also Gaussian in the parameters $\xi$, giving a regularization term

\begin{displaymath}
\frac{1}{2} (\,\xi,\, W_{\psi}^T {{\bf K}}_\psi W_{\psi}\,\xi\,)
=
\frac{1}{2} (\,\xi,\, {{\bf K}}_\xi\,\xi\,),
\end{displaymath} (371)

where ${{\bf K}}_\xi = W_{\psi}^T {{\bf K}}_\psi W_{\psi}$ is not an operator in a space of functions $\phi(x,y)$ but a matrix in the space of parameters $\xi$. The results of Section 4.1 apply to this case provided the following replacement is made:
\begin{displaymath}
\Phi^\prime {{\bf K}} \phi
\rightarrow {{\bf K}}_\xi \xi
.
\end{displaymath} (372)
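
The identity in Eq. (371) and the replacement in Eq. (372) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the dimensions, the random matrix `W_psi`, and the positive definite `K_psi` are hypothetical choices made only for illustration. It confirms that the regularization term written in $\psi$ equals the one written in $\xi$, and that its gradient with respect to $\xi$ is ${\bf K}_\xi \xi$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 parameters xi, 6 components of psi.
n_params, n_psi = 4, 6
W_psi = rng.normal(size=(n_psi, n_params))   # linear map psi = W_psi xi
A = rng.normal(size=(n_psi, n_psi))
K_psi = A @ A.T + np.eye(n_psi)              # symmetric, positive definite
xi = rng.normal(size=n_params)

# Induced matrix in parameter space: K_xi = W_psi^T K_psi W_psi.
K_xi = W_psi.T @ K_psi @ W_psi

# Regularization term written in psi and in xi (Eq. 371).
psi = W_psi @ xi
reg_psi = 0.5 * psi @ K_psi @ psi
reg_xi = 0.5 * xi @ K_xi @ xi


def reg(x):
    """Regularization term as a function of the parameters."""
    p = W_psi @ x
    return 0.5 * p @ K_psi @ p


# Gradient with respect to xi: K_xi xi, compared with central differences.
eps = 1e-6
grad_fd = np.array([(reg(xi + eps * e) - reg(xi - eps * e)) / (2 * eps)
                    for e in np.eye(n_params)])
grad_exact = K_xi @ xi

print(np.isclose(reg_psi, reg_xi), np.allclose(grad_fd, grad_exact, atol=1e-6))
```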

Similarly, a nonlinear $\psi $ requires the replacement
\begin{displaymath}
\Phi^\prime {{\bf K}} \phi
\rightarrow
{\Psi}^\prime {{\bf K}}_\psi \psi
,
\end{displaymath} (373)

where
\begin{displaymath}
\Psi^\prime (k,l)
=
\frac{\partial \psi_l (\xi) }{\partial \xi_k}
.
\end{displaymath} (374)
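
The index convention of Eq. (374) puts the parameter index $k$ first, so $\Psi^\prime$ is the transpose of the Jacobian as it is usually written. The sketch below illustrates this with a hypothetical nonlinear map `psi` from two parameters to three components and a hypothetical symmetric `K_psi`; neither is taken from the text. It compares $\Psi^\prime {\bf K}_\psi \psi$ with a finite-difference gradient of $\frac{1}{2}(\psi, {\bf K}_\psi \psi)$.

```python
import numpy as np


def psi(xi):
    """Hypothetical nonlinear psi: two parameters, three components."""
    return np.array([xi[0] ** 2, xi[0] * xi[1], np.sin(xi[1])])


def psi_prime(xi):
    """Psi'(k, l) = d psi_l / d xi_k (rows: parameters k, columns: components l)."""
    return np.array([
        [2.0 * xi[0], xi[1], 0.0],
        [0.0, xi[0], np.cos(xi[1])],
    ])


# Hypothetical symmetric, positive definite inverse covariance for psi.
K_psi = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.2],
                  [0.0, 0.2, 3.0]])


def reg(xi):
    """Regularization term (1/2) (psi, K_psi psi)."""
    p = psi(xi)
    return 0.5 * p @ K_psi @ p


xi0 = np.array([0.7, -1.2])

# Analytic gradient, the right-hand side of the replacement in Eq. (373).
grad_exact = psi_prime(xi0) @ K_psi @ psi(xi0)

# Central finite differences as an independent check.
eps = 1e-6
grad_fd = np.array([(reg(xi0 + eps * e) - reg(xi0 - eps * e)) / (2 * eps)
                    for e in np.eye(2)])

print(np.allclose(grad_exact, grad_fd, atol=1e-6))
```

The symmetry of `K_psi` is what allows the gradient to be written with a single factor $\Psi^\prime {\bf K}_\psi \psi$.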

Thus, in the general case where a Gaussian (specific) prior in $\phi(\xi)$ and $\psi(\xi)$ is given,
\begin{displaymath}
\begin{array}{rcl}
E_{\phi (\xi),\psi(\xi)}
& = &
-(\,\ln P (\xi),\, N\,)+ (\,P( \xi ),\,\Lambda_X\,)
\\
& &
+\frac{1}{2} (\,\phi (\xi),\, {{\bf K}}\,\phi (\xi)\,)
+\frac{1}{2} (\,\psi (\xi),\, {{\bf K}}_\psi\,\psi (\xi)\,)
,
\end{array}
\end{displaymath} (375)

or, including also non-zero template functions (means) $t$, $t_\psi$ for $\phi $ and $\psi $ as discussed in Section 3.5,
\begin{displaymath}
\begin{array}{rcl}
E_{\phi (\xi),\psi(\xi)}
& = &
-(\,\ln P (\xi),\, N\,)+ (\,P( \xi ),\,\Lambda_X\,)
\\
& &
+\frac{1}{2} (\,\phi (\xi)-t,\, {{\bf K}}\,(\phi (\xi)-t)\,)
\\
& &
+\frac{1}{2} (\,\psi (\xi)-t_\psi,\, {{\bf K}}_\psi\,(\psi (\xi)-t_\psi)\,)
.
\end{array}
\end{displaymath} (376)

The $\phi $- and $\psi $-terms of the energy can be interpreted as corresponding to a probability $p(\xi\vert t,{{\bf K}},t_\psi,{{\bf K}}_\psi)$ (which is $\ne p(\xi\vert t,{{\bf K}})\, p(\xi\vert t_\psi,{{\bf K}}_\psi)$), or, for example, to $p(t_\psi\vert\xi,{{\bf K}}_\psi)\, p(\xi\vert t,{{\bf K}})$, with one of the two terms corresponding to a Gaussian likelihood with $\xi$-independent normalization.
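
The likelihood reading of the $\psi$-term can be illustrated numerically. The sketch below assumes a Gaussian density for $t_\psi$ with mean $\psi(\xi)$ and inverse covariance ${\bf K}_\psi$; the map `psi`, the matrix `K_psi`, and the template `t_psi` are hypothetical. It checks that $-\ln p(t_\psi\vert\xi,{\bf K}_\psi)$ differs from the quadratic $\psi$-term only by a constant that does not depend on $\xi$.

```python
import numpy as np


def neg_log_gauss(t, mean, K):
    """-ln of a Gaussian density in t with the given mean and inverse covariance K."""
    d = t - mean
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * d @ K @ d - 0.5 * logdet + 0.5 * len(t) * np.log(2.0 * np.pi)


def psi(xi):
    """Hypothetical nonlinear function of the parameters."""
    return np.array([xi[0] + xi[1], xi[0] * xi[1]])


K_psi = np.array([[2.0, 0.3],
                  [0.3, 1.0]])      # hypothetical inverse covariance
t_psi = np.array([0.5, -0.2])       # hypothetical template (mean)


def psi_term(xi):
    """Quadratic psi-term of the energy, as in Eq. (376)."""
    d = psi(xi) - t_psi
    return 0.5 * d @ K_psi @ d


xi_a = np.array([0.1, 0.4])
xi_b = np.array([-1.3, 2.0])

# The difference is the log-normalization; it is the same for both parameter vectors.
offset_a = neg_log_gauss(t_psi, psi(xi_a), K_psi) - psi_term(xi_a)
offset_b = neg_log_gauss(t_psi, psi(xi_b), K_psi) - psi_term(xi_b)

print(np.isclose(offset_a, offset_b))
```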

The stationarity equation becomes
\begin{displaymath}
\begin{array}{rcl}
0
& = &
{\bf P}_\xi^\prime {\bf P}^{-1} N
-{\Phi}^\prime {{\bf K}} (\phi-t)
-{\Psi}^\prime {{\bf K}}_\psi (\psi-t_\psi)
-{\bf P}_\xi^\prime \Lambda_X
\\
& = &
G_{\phi,\psi}
-{\bf P}_\xi^\prime \Lambda_X
,
\end{array}
\end{displaymath} (377), (378)

which defines $G_{\phi,\psi}$, and for $\Lambda_X\ne 0$
\begin{displaymath}
\Lambda_X
=
{\bf I}_X {\bf P} \left(
({\bf P}^\prime_\xi)^{\#} G_{\phi,\psi} +\Lambda_X^0
\right)
,
\end{displaymath} (379)

for ${\bf P}_\xi^\prime \Lambda_X^0 = 0$.
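
The role of the pseudo-inverse and of the null-space term $\Lambda_X^0$ in Eq. (379) can be sketched with a small linear system. The example below is only an illustration of that one step, under strong simplifications: a random matrix `A` stands in for ${\bf P}_\xi^\prime$ on a discretized grid, `G` stands in for $G_{\phi,\psi}$, and the factor ${\bf I}_X {\bf P}$ as well as the requirement that $\Lambda_X$ depend on $x$ only are left out. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: 3 parameters, 8 grid points.
n_params, n_grid = 3, 8
A = rng.normal(size=(n_params, n_grid))   # stand-in for P'_xi
G = rng.normal(size=n_params)             # stand-in for G_{phi,psi}

# Moore-Penrose pseudo-inverse gives the minimum-norm solution of A lam = G.
A_pinv = np.linalg.pinv(A)
lam_min = A_pinv @ G

# Any vector in the null space of A can be added without changing A lam.
v = rng.normal(size=n_grid)
lam0 = v - A_pinv @ (A @ v)               # projection of v onto the null space of A
lam = lam_min + lam0

print(np.allclose(A @ lam_min, G), np.allclose(A @ lam0, 0.0), np.allclose(A @ lam, G))
```

With fewer parameters than grid points the system is underdetermined, which is why a term satisfying ${\bf P}_\xi^\prime \Lambda_X^0 = 0$ remains free.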


Table 5: Summary of stationarity equations. For notations, conditions and comments see Sections 3.1.1, 3.2.1, 3.3.2, 3.3.1, 4.1 and 4.2.

\begin{tabular}{llll}
Variable & Error & Stationarity equation & $\Lambda_X$ \\
\hline
$L(x,y)$ & $E_L$ & ${{\bf K}} L = N - {\bf e^L} \Lambda_X$ & ${\bf I}_X \left( N - {{\bf K}} L \right)$ \\
$P(x,y)$ & $E_P$ & ${{\bf K}} P = {\bf P}^{-1} N -\Lambda_X$ & ${\bf I}_X (N - {\bf P}{{\bf K}} P)$ \\
$\phi=\sqrt{P}$ & $E_{\sqrt{P}}$ & ${{\bf K}} \phi = 2{\Phi}^{-1} N -2\Phi \Lambda_X$ & ${\bf I}_X (N - \frac{1}{2}\Phi {{\bf K}} \phi)$ \\
$\phi (x,y)$ & $E_{\phi}$ & ${{\bf K}}\phi = {\bf P}^\prime {\bf P}^{-1} N -{\bf P}^\prime \Lambda_X$ & ${\bf I}_X \left( N - {\bf P} {{\bf P}^\prime}^{-1} {{\bf K}}\, \phi \right)$ \\
$\xi$ & $E_{\phi(\xi)}$ & $\Phi^\prime {{\bf K}}\phi = {\bf P}_\xi^\prime {\bf P}^{-1} N -{\bf P}_\xi^\prime \Lambda_X$ & ${\bf I}_X {\bf P} \left( ({\bf P}^\prime_\xi)^{\#} G_{\phi(\xi)} +\Lambda^0_X \right)$ \\
$\xi$ & $E_{\phi(\xi)\psi(\xi)}$ & $\Phi^\prime {{\bf K}}(\phi-t) + \Psi^\prime {{\bf K}}_\psi (\psi-t_\psi) = {\bf P}_\xi^\prime {\bf P}^{-1} N -{\bf P}_\xi^\prime \Lambda_X$ & ${\bf I}_X {\bf P} \left( ({\bf P}^\prime_\xi)^{\#} G_{\phi,\psi}+\Lambda^0_X \right)$ \\
\end{tabular}



Joerg_Lemm 2001-01-21