

Non-quadratic potentials

Solving learning problems numerically by discretizing the $x$ and $y$ variables allows one, in principle, to deal with arbitrary non-Gaussian priors. Compared to Gaussian priors, however, the resulting stationarity equations are intrinsically nonlinear.

As a typical example, let us formulate a prior in terms of nonlinear, non-quadratic ``potential'' functions $\psi$ acting on ``filtered differences'' $\omega$ = ${\bf W}(\phi -t)$, defined with respect to some positive (semi-)definite inverse covariance ${\bf K}$ = ${\bf W}^T {\bf W}$. In particular, consider a prior factor of the following form

\begin{displaymath}
p(\phi)
=
e^{-\int \! dx\, \psi(\omega(x))-\ln Z_\phi}
=
\frac{e^{-E(\phi)}}{Z_\phi}
,
\end{displaymath} (595)

where $E(\phi)$ = $\int \! dx \,\psi(\omega(x))$. For general density estimation problems we understand $x$ to stand for the pair $(x,y)$. Such priors are used, for example, in image restoration [70,28,168,71,248,246].
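For concreteness, the discretized energy $E(\phi)$ can be written down directly. The following is a minimal numerical sketch; the one-dimensional grid, the first-difference filter ${\bf W}$, the template $t$ and the quadratic test potential are illustrative assumptions, not choices made in the text.

\begin{verbatim}
import numpy as np

# Minimal sketch of Eq. (595): the prior energy E(phi) = sum_x psi(omega(x))
# on a discretized 1D grid. The first-difference filter W, the template t
# and the potential psi are illustrative assumptions.

def first_difference_matrix(n):
    """W with rows (phi_{i+1} - phi_i), so K = W^T W is positive semi-definite."""
    return np.diff(np.eye(n), axis=0)

def energy(phi, W, t, psi):
    """E(phi) = sum over grid points of psi(omega), omega = W (phi - t)."""
    omega = W @ (phi - t)
    return np.sum(psi(omega))

n = 50
W = first_difference_matrix(n)
t = np.zeros(n)                       # reference ("template") function t
phi = np.random.randn(n)
psi_quadratic = lambda s: 0.5 * s**2  # quadratic psi recovers a Gaussian prior
print(energy(phi, W, t, psi_quadratic))
\end{verbatim}

With this quadratic $\psi$ the prior is Gaussian; replacing $\psi$ by one of the cup functions discussed below yields the non-quadratic case.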

For a differentiable function $\psi$ the functional derivative with respect to $\phi(x)$ becomes

\begin{displaymath}
\delta_{\phi(x)} p(\phi)
=
-e^{-\int \! dx^\prime\, \psi(\omega(x^\prime))-\ln Z_\phi}
\int \! dx^{\prime\prime}\,
\psi^\prime(\omega(x^{\prime\prime}))
\,{\bf W}(x^{\prime\prime},x)
,
\end{displaymath} (596)

with $\psi^\prime(z)$ = $d\psi(z)/dz$, from which it follows that
\begin{displaymath}
\delta_{\phi} E(\phi)
=
-\delta_{\phi} \ln p(\phi) = {\bf W}^T \psi^\prime
.
\end{displaymath} (597)
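In a discretized implementation, Eq. (597) is easy to verify against a finite-difference derivative of $E(\phi)$. A sketch, in which ${\bf W}$, $t$ and the robust (Lorentzian) $\psi$ are chosen purely for illustration:

\begin{verbatim}
import numpy as np

# Sketch: verify Eq. (597), delta_phi E = W^T psi'(W(phi - t)), against a
# central finite-difference derivative of E. All concrete choices below
# (W, t, psi) are illustrative assumptions.

def check_gradient(phi, W, t, psi, dpsi, eps=1e-6):
    E = lambda p: np.sum(psi(W @ (p - t)))
    analytic = W.T @ dpsi(W @ (phi - t))
    numeric = np.zeros_like(phi)
    for i in range(len(phi)):
        e = np.zeros_like(phi)
        e[i] = eps
        numeric[i] = (E(phi + e) - E(phi - e)) / (2 * eps)
    return np.max(np.abs(analytic - numeric))

n = 20
W = np.diff(np.eye(n), axis=0)               # first-difference filter
phi, t = np.random.randn(n), np.zeros(n)
psi = lambda s: 0.5 * np.log1p(s**2)         # robust (Lorentzian) psi
dpsi = lambda s: s / (1.0 + s**2)            # its derivative psi'
print(check_gradient(phi, W, t, psi, dpsi))  # should be ~1e-9
\end{verbatim}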

For nonlinear filters acting on $\phi - t$, ${\bf W}$ in Eq. (597) must be replaced by the Jacobian $\delta_{\phi(x)}\omega(x^\prime)$. Instead of a single ${\bf W}$, a ``filter bank'' ${\bf W}_\alpha$ with corresponding ${\bf K}_\alpha$, $\omega_\alpha$, and $\psi_\alpha$ may be used, so that
\begin{displaymath}
p(\phi)
=
e^{-\sum_\alpha
\int \! dx\, \psi_\alpha(\omega_\alpha(x))-\ln Z_\phi}
,
\end{displaymath} (598)

and
\begin{displaymath}
\delta_{\phi} E(\phi)
= \sum_\alpha {\bf W}_\alpha^T \psi_\alpha ^\prime
.
\end{displaymath} (599)
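A sketch of the filter-bank gradient of Eq. (599); the two filters (first and second differences) and the derivatives $\psi_\alpha^\prime$ are illustrative assumptions:

\begin{verbatim}
import numpy as np

# Sketch of Eq. (599): gradient of E(phi) for a "filter bank" {W_alpha}.
# The two filters and their potential derivatives are illustrative choices.

def grad_energy(phi, t, filters, dpsis):
    """delta_phi E = sum_alpha W_alpha^T psi_alpha'(W_alpha (phi - t))."""
    grad = np.zeros_like(phi)
    for W, dpsi in zip(filters, dpsis):
        grad += W.T @ dpsi(W @ (phi - t))
    return grad

n = 50
D1 = np.diff(np.eye(n), axis=0)          # first difference
D2 = np.diff(np.eye(n), n=2, axis=0)     # second difference (discrete Laplacian)
phi, t = np.random.randn(n), np.zeros(n)
g = grad_energy(phi, t, [D1, D2],
                [lambda s: s, lambda s: np.tanh(s)])  # psi_alpha' choices
\end{verbatim}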

The potential functions $\psi$ may be fixed in advance for a given problem. Typical choices that allow discontinuities are symmetric ``cup'' functions with a minimum at zero and flat tails, for which one large step is cheaper than many small ones [238]. Examples are shown in Fig. 12 (a,b). The cusp in (b), where the derivative does not exist, requires special treatment [246]. Such functions can also be interpreted in the sense of robust statistics, as flat tails reduce the sensitivity to outliers [100,101,67,26].

Inverted ``cup'' functions, like the one shown in Fig. 12 (c), have been obtained by optimizing a set of $\psi_\alpha$ with respect to a sample of natural images [246]. (For the statistics of natural images and their relation to wavelet-like filters and sparse coding, see also [175,176].)

Figure 12: Non-quadratic potentials of the form $\psi(x)$ = $a( 1.0 - 1/(1+(\vert x-x_0\vert/b)^\gamma ))$ [246]. ``Diffusion terms'': (a) Winkler's cup function [238] ($a$ = $5$, $b$ = $10$, $\gamma$ = $0.7$, $x_0$ = $0$); (b) with cusp ($a$ = $1$, $b$ = $3$, $\gamma$ = $2$, $x_0$ = $0$). (c) ``Reaction term'' ($a$ = $-4.8$, $b$ = $15$, $\gamma$ = $2.0$, $x_0$ = $0$).
[Figure 12: plots of the three potentials (a), (b), (c) described in the caption; recoverable image file: ps/psi1.eps]
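For reference, the potential family quoted in the caption of Fig. 12 can be evaluated directly; only the plotting grid below is an added assumption:

\begin{verbatim}
import numpy as np

# The potential family from the caption of Fig. 12,
# psi(x) = a * (1 - 1/(1 + (|x - x0|/b)**gamma)),
# evaluated with the three parameter sets quoted there.

def psi(x, a, b, gamma, x0=0.0):
    return a * (1.0 - 1.0 / (1.0 + (np.abs(x - x0) / b) ** gamma))

x = np.linspace(-50.0, 50.0, 401)
panel_a = psi(x, a=5.0, b=10.0, gamma=0.7)    # (a) Winkler's cup function
panel_b = psi(x, a=1.0, b=3.0, gamma=2.0)     # (b) cup with cusp
panel_c = psi(x, a=-4.8, b=15.0, gamma=2.0)   # (c) inverted cup ("reaction term")
\end{verbatim}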

While cup functions promote smoothness for ${\bf W}$ that are differential operators, inverted cup functions can be used to implement structure. For such ${\bf W}$ the gradient algorithm for minimizing $E(\phi)$,

\begin{displaymath}
\phi_{\rm new}
= \phi_{\rm old} - \eta \delta_\phi E(\phi_{\rm old})
,
\end{displaymath} (600)

becomes in the continuum limit a nonlinear parabolic partial differential equation,
\begin{displaymath}
\phi_\tau
= -\sum_\alpha {\bf W}_\alpha^T
\psi_\alpha^\prime ({\bf W}_\alpha(\phi-t))
.
\end{displaymath} (601)

Here a formal time variable $\tau$ has been introduced, so that $(\phi_{\rm new}-\phi_{\rm old})/\eta\rightarrow \phi_\tau = d\phi/d\tau$. For cup functions this equation is of diffusion type [173,188]; if inverted cup functions are included as well, it is of reaction-diffusion type [246]. Such equations are known to generate a great variety of patterns.
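A discretized sketch of this flow, implemented as the explicit Euler iteration of Eq. (600); the single first-difference filter, the saturating $\psi^\prime$, the step size and the step count are illustrative assumptions. With a flat-tailed $\psi^\prime$ the update suppresses smoothing across large differences, the edge-preserving behavior expected of diffusion driven by cup functions:

\begin{verbatim}
import numpy as np

# Sketch of Eqs. (600)/(601): explicit Euler steps of the gradient flow
# phi_tau = -sum_alpha W_alpha^T psi_alpha'(W_alpha (phi - t)).
# Filter, psi', step size eta and step count are illustrative assumptions.

def gradient_flow(phi, t, filters, dpsis, eta=0.2, steps=500):
    for _ in range(steps):
        grad = sum(W.T @ dpsi(W @ (phi - t))
                   for W, dpsi in zip(filters, dpsis))
        phi = phi - eta * grad
    return phi

n = 100
D1 = np.diff(np.eye(n), axis=0)            # first difference -> diffusion type
t = np.zeros(n)                            # reference function t = 0
phi0 = np.where((np.arange(n) >= 40) & (np.arange(n) < 60), 1.0, 0.0)
phi0 = phi0 + 0.2 * np.random.randn(n)     # noisy step edge as initial phi
dpsi = lambda s: s / (1.0 + s**2)          # saturating psi': edge-preserving
phi_smooth = gradient_flow(phi0, t, [D1], [dpsi])
\end{verbatim}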

As an alternative to fixing $\psi$ in advance, or to approximating $\psi$ by sampling from the prior distribution (which is sometimes possible for low-dimensional discrete function spaces like images), one may introduce hyperparameters and adapt potentials $\psi(\theta)$ to the data.

For example, attempting to adapt an unrestricted function $\psi$ with hyperprior $p(\psi)$ by Maximum A Posteriori Approximation, one has to solve the stationarity condition

\begin{displaymath}
0 =
\delta_{\psi(s)} \ln p(\phi,\psi)
=
\delta_{\psi(s)} \ln p(\phi\vert\psi)
+\delta_{\psi(s)} \ln p(\psi)
.
\end{displaymath} (602)

From
\begin{displaymath}
\delta_{\psi(s)} p(\phi\vert\psi)
=
-p(\phi\vert\psi) \int \! dx\, \delta \left(s-\omega(x) \right)
- e^{-E(\phi)}\, \frac{1}{Z_\phi^2}\, \delta_{\psi(s)} Z_\phi
,
\end{displaymath} (603)

together with $\delta_{\psi(s)} Z_\phi$ = $-Z_\phi <\!n(s)\!>$, it follows that
\begin{displaymath}
-\delta_{\psi(s)} \ln p(\phi\vert\psi)
=
n(s) - <n(s)>
,
\end{displaymath} (604)

where the integer
\begin{displaymath}
n(s) = \int \!dx\, \delta \left(s-\omega(x) \right)
,
\end{displaymath} (605)

is the histogram of the filtered differences, and the average histogram
\begin{displaymath}
<n(s)> \; = \int\! d\phi \, p(\phi\vert\psi) \, n(s)
.
\end{displaymath} (606)

The right-hand side of Eq. (604) vanishes at $\phi^*$ if, e.g., $p(\phi\vert\psi)$ = $\delta(\phi-\phi^*)$, which is the case for $\psi(\omega(x;\phi))$ = $\beta \left(\omega(x;\phi)-\omega(x;\phi^*)\right)^2$ in the limit ${\beta\rightarrow\infty}$.
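Numerically, the quantities in Eqs. (604-606) reduce to histogram estimates. In the sketch below, the binning of $s$ and the Monte Carlo estimate of $<\!n(s)\!>$ from prior samples (here stand-in Gaussian samples) are assumptions:

\begin{verbatim}
import numpy as np

# Sketch of Eqs. (604)-(606): the gradient for adapting psi is the mismatch
# between the observed histogram n(s) of filtered differences and its prior
# average <n(s)>. Binning of s and the Monte Carlo estimate of <n(s)> are
# assumptions of this sketch.

def histogram_n(phi, W, t, bins):
    """n(s) of Eq. (605): counts of filtered differences omega per s-bin."""
    counts, _ = np.histogram(W @ (phi - t), bins=bins)
    return counts

def mean_histogram(phi_samples, W, t, bins):
    """<n(s)> of Eq. (606), estimated from samples of p(phi|psi)."""
    return np.mean([histogram_n(p, W, t, bins) for p in phi_samples], axis=0)

n, bins = 50, np.linspace(-3.0, 3.0, 31)
W, t = np.diff(np.eye(n), axis=0), np.zeros(n)
samples = [np.random.randn(n) for _ in range(100)]   # stand-in prior samples
# Per Eq. (604), -delta_{psi(s)} ln p(phi|psi) per bin is n(s) - <n(s)>:
mismatch = histogram_n(samples[0], W, t, bins) - mean_histogram(samples, W, t, bins)
\end{verbatim}

The potential $\psi$ is thus stationary where the observed and the prior-averaged histograms of filtered differences match.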

When introducing hyperparameters, one has to keep in mind that, to be useful in practice, the resulting additional flexibility must be balanced by the number of training data and by the hyperprior.

