
2 Templates and concepts

Technically, in function approximation one is interested in finding a function or hypothesis $h(x)$ which minimizes a given error/energy functional $E[h]$. Using a Bayesian interpretation we understand the error functional, up to a constant, as proportional to the negative log-posterior of the function $h$ given training data $D$ and prior information $D_0$, i.e.

\begin{displaymath}
p(h\vert D,D_0)\propto e^{-\beta E[h]}.
\end{displaymath} (1)
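As a minimal numerical sketch of (1) (with made-up error values and an illustrative choice of $\beta$), the posterior weights of a finite set of candidate hypotheses follow directly from their errors:

\begin{verbatim}
import numpy as np

beta = 1.0                            # inverse temperature (illustrative)
E = np.array([0.3, 1.2, 0.7, 2.5])    # errors E[h] of four candidate hypotheses

weights = np.exp(-beta * E)           # unnormalized posterior, as in (1)
posterior = weights / weights.sum()   # normalize over the candidate set
print(posterior)                      # lower error -> higher posterior probability
\end{verbatim}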

The error functional usually depends both on a finite number of training data $D$ and on additional prior information $D_0$. Special cases of function approximation include density estimation, where $h$ has to fulfill an additional normalization condition, and classification or pattern recognition, where the function $h(x)$ takes only discrete values representing the possible classes or patterns of $x$.

Let us consider as an example an error functional with mean square data terms and a typical smoothness constraint for $d$-dimensional $x$:

\begin{displaymath}
E[h] = \frac{1}{2} \sum_{i=1}^n (y_i-h(x_i))^2
+\frac{\lambda}{2} \int_{-\infty}^\infty \!d^dx\,
\sum_{l=1}^d \left( \frac{\partial h(x)}{\partial x_l}\right)^2.
\end{displaymath} (2)
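To make (2) concrete, the following sketch discretizes the functional for $d=1$ on a regular grid (grid, training data, and the value of $\lambda$ are illustrative; the derivative is approximated by forward differences):

\begin{verbatim}
import numpy as np

x = np.linspace(0.0, 1.0, 101)        # grid for one-dimensional x
dx = x[1] - x[0]
h = np.sin(2 * np.pi * x)             # an illustrative hypothesis on the grid

idx = np.array([10, 50, 90])          # grid indices of the training points x_i
y = np.array([0.2, -0.1, 0.4])        # target values y_i
lam = 0.1                             # regularization parameter lambda

data_term = 0.5 * np.sum((y - h[idx]) ** 2)
smoothness = 0.5 * lam * np.sum(np.diff(h) ** 2) / dx  # ~ integral of (dh/dx)^2
E = data_term + smoothness
\end{verbatim}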

We now rewrite the terms to show their common form. A mean square error term can be written, using bra-ket notation for scalar products and matrix elements of symmetric operators, as
\begin{displaymath}
(y_i-h(x_i))^2
= \int_{-\infty}^\infty \!d^dx \, \delta (x-x_i) (h(x)-t_i(x))^2
= <\! h-t_i \vert P_i\vert h-t_i\!>
\end{displaymath} (3)

with projector $P_i(x,x^\prime) = \delta(x-x_i)\delta(x-x^\prime)$ and $t_i$ the constant function $t_i(x) \equiv y_i$. Thus, (3) is a squared distance of $h$ from $t_i$ at the point $x_i$.
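On a grid, the projector of (3) becomes a matrix with a single nonzero entry, and the identity can be checked directly (a sketch; the grid measure is absorbed into the discrete bra-ket):

\begin{verbatim}
import numpy as np

n = 101
i0, y_i = 40, 0.7                     # grid index of x_i and value y_i (illustrative)

h = np.random.rand(n)                 # arbitrary hypothesis on the grid
t_i = np.full(n, y_i)                 # constant template t_i(x) = y_i

P_i = np.zeros((n, n))
P_i[i0, i0] = 1.0                     # discrete delta(x - x_i) delta(x - x')

# <h - t_i | P_i | h - t_i> reduces to the squared error at x_i
assert np.isclose((h - t_i) @ P_i @ (h - t_i), (y_i - h[i0]) ** 2)
\end{verbatim}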

For the smoothness constraint in (2) we find a quite similar form:

\begin{displaymath}
\int_{-\infty}^\infty \!d^dx\,
\sum_{l=1}^d \left( \frac{\partial h(x)}{\partial x_l}\right)^2
= - <\! h - t_0 \vert\Delta \vert h-t_0\!>
\end{displaymath} (4)

with zero function $t_0(x)\equiv 0$ and $\Delta(x,x^\prime) = \delta(x-x^\prime)\sum_{l=1}^d \frac{\partial^2}{\partial x_l^2}$ the negative semi-definite $d$-dimensional Laplacian. Here we used integration by parts, assuming vanishing boundary terms. Like (3), also (4) can be interpreted as a distance of $h$ from the zero function $t_0$, but now measured over all $x$. Hence, we obtain for functional (2)
\begin{displaymath}
E[h] = \frac{1}{2} \sum_{i=1}^n <\! h - t_i \vert P_i\vert h - t_i\!>
- \frac{\lambda}{2} <\! h - t_0 \vert\Delta \vert h-t_0\!>.
\end{displaymath} (5)
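Both rewritings can be verified numerically. In the sketch below (all grid and data choices illustrative), the gradient is a forward-difference matrix $W$ and the Laplacian is taken as $\Delta = -W^T W$, so the boundary terms of the integration by parts vanish by construction; the bra-ket form (5) then reproduces the direct form (2):

\begin{verbatim}
import numpy as np

n, dx, lam = 101, 0.01, 0.1
h = np.random.rand(n)
idx, y = np.array([10, 50, 90]), np.array([0.2, -0.1, 0.4])

W = (np.eye(n, k=1)[:-1] - np.eye(n)[:-1]) / dx   # forward-difference gradient
Delta = -W.T @ W                                  # discrete Laplacian

# identity (4): integral of (dh/dx)^2 = - <h - t_0 | Delta | h - t_0>
assert np.isclose(np.sum((W @ h) ** 2) * dx, -(h @ Delta @ h) * dx)

# direct form (2) versus bra-ket form (5)
E_direct = (0.5 * np.sum((y - h[idx]) ** 2)
            + 0.5 * lam * np.sum((W @ h) ** 2) * dx)

E_braket = -0.5 * lam * (h @ Delta @ h) * dx
for i0, y_i in zip(idx, y):
    t_i = np.full(n, y_i)
    P_i = np.zeros((n, n))
    P_i[i0, i0] = 1.0                             # discrete projector onto x_i
    E_braket += 0.5 * (h - t_i) @ P_i @ (h - t_i)

assert np.isclose(E_direct, E_braket)
\end{verbatim}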

These examples motivate the following general definitions. Let $H$ denote a Hilbert space of hypothesis functions $h(x)$ of $d$-dimensional $x$.



Definition 1 ((prior) concept $(t,d)$ with (prior) template $t$ and template distance $d$): A prior concept is a pair $(t,d)$ with $t(x)$ a function in $H$ and $d: (H\times H) \rightarrow \{0\}\cup I\!\!R^+$ a (``distance'') functional with $d[t,t]=0$. Note that this allows $d[t,h]=0$ for $h\ne t$. We write $d[t,h]=d[h]$. The function $t(x)\in H$ will be called a (prior) template.
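As a small illustration of the remark that $d[t,h]=0$ is possible for $h\ne t$ (a sketch with an illustrative grid and a distance that only inspects a single point $x_0$):

\begin{verbatim}
import numpy as np

i0 = 40                                 # grid index of a single point x_0

def d(t, h):
    # a "distance" functional in the sense of Definition 1:
    # d[t, t] = 0, but it only looks at x_0
    return abs(h[i0] - t[i0])

t = np.zeros(101)
h = np.sin(np.linspace(0.1, 1.0, 101))  # differs from t almost everywhere
h[i0] = 0.0                             # ...but agrees with t at x_0

assert d(t, t) == 0 and d(t, h) == 0    # d[t, h] = 0 although h != t
\end{verbatim}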

Template functions will be used to represent function prototypes. Notice that templates include standard training data, which is the reason for the brackets around the word ``prior''. We are especially interested in distances quadratic in $h$, for which the functional derivative with respect to $h$ is linear. Such distances can be defined by positive semi-definite operators $O$. Such operators have a decomposition $O = W^T W$, with $W$ invertible if $O$ is positive definite. More precisely, $\sqrt{<\!h\vert O\vert h\!>} = \vert\vert h\vert\vert _O = \vert\vert Wh\vert\vert$ defines a semi-norm on $H$ with $\vert\vert h\vert\vert _O=0$ if $h$ is in the zero space of $O$, i.e. if $O h = 0$. Typical $W$ are projectors into the space of training data, like in (3), and generators of infinitesimal transformations of continuous Lie groups, like the gradient $\nabla$ for translations in (4), with $\nabla^T \nabla = - \Delta$ under appropriate boundary conditions. Thus, we define:



Definition 2 (quadratic (prior) concept $(t,O)$ with template distance operator $O$): A quadratic (prior) concept is a pair $(t,O)$ with $t(x)$ a function in $H$ and $O = W^T W$ a symmetric, positive semi-definite operator, which will be called a template distance operator. $O$ defines the square template distance:

\begin{displaymath}
d^2 [h]
= <\!h-t\vert O\vert h-t\!>
= <\!W(h-t)\vert W(h-t)\!>
= \vert\vert W(h-t)\vert\vert^2
= \vert\vert h-t\vert\vert^2_{O}.
\end{displaymath} (6)
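The chain of equalities in (6) can be checked in finite dimensions, obtaining $W$ from an eigendecomposition of a positive semi-definite matrix $O$ (all matrices illustrative):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
O = A.T @ A                           # positive semi-definite (rank at most 5)

# O = W^T W with W = sqrt(D) V^T from the eigendecomposition O = V D V^T
eigval, V = np.linalg.eigh(O)
eigval = np.clip(eigval, 0.0, None)   # clip tiny negative rounding errors
W = np.diag(np.sqrt(eigval)) @ V.T

h = rng.standard_normal(8)            # here with template t = 0
assert np.isclose(h @ O @ h, np.sum((W @ h) ** 2))   # <h|O|h> = ||W h||^2
\end{verbatim}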

Thus, a quadratic concept defines a Gaussian process over $d$-dimensional $x$ with $p(h) \propto e^{-d^2[h]/2}$ and covariance operator $O^{-1}$. Its matrix elements are sometimes also called Green's function, propagator, or two-point correlation function. The Laplacian in (4), for example, corresponds to the Wiener measure known from Brownian motion or diffusion and is also used as kinetic energy for Euclidean scalar fields in physics (see for example [2]). The zero modes of $O$ represent the projections of $h$ which do not contribute to $d^2 [h]$. The projector $P_i$ in the mean square error term (3), for example, measures the distance only at the (training data) point $x_i$. Continuous template functions may also be restricted to subspaces, e.g. parts of an image or a specific resolution.
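To illustrate the Gaussian process view, the following sketch draws a sample from $p(h)\propto e^{-d^2[h]/2}$ for a discrete Laplacian concept (an illustrative choice), inverting $O$ only on its template space and leaving the zero mode at its template value:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n, dx, lam = 101, 0.01, 0.1

W = (np.eye(n, k=1)[:-1] - np.eye(n)[:-1]) / dx
O = lam * (W.T @ W)                   # O = -lam * Delta, positive semi-definite

eigval, V = np.linalg.eigh(O)
pos = eigval > 1e-10                  # modes with positive eigenvalue

t = np.zeros(n)                       # template t_0 = 0
coeff = rng.standard_normal(pos.sum()) / np.sqrt(eigval[pos])
h = t + V[:, pos] @ coeff             # one draw with covariance pinv(O)
\end{verbatim}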



Definition 3 (template space $H_t$ and template projectors $P_t$): The maximal subspace on which the positive semi-definite $O$ is positive definite will be called the template space $H_t$ of $O$. The corresponding hermitian projector $P_t$ onto this subspace $H_t$, i.e. with $P_t(H) = H_t$, $P_t(H\setminus H_t) = 0$, and $P_t^2 = P_t$, will be called the template projector.

Hence $O$ commutes with the template projector: $[P_t, O] = P_t O - O P_t = 0$. Maximality of $H_t$ means that $1-P_t$ is the projector onto the zero space of $O$, i.e. $O(H\setminus H_t) = 0$. Our aim is to build an error functional $E[h]$ depending on $h$ only through square distances $d^2 [h]$.
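A finite-dimensional sketch of Definition 3 and of the commutation relation (with an illustrative matrix $O$): the template projector is the projector onto the eigenvectors of $O$ with positive eigenvalue:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 7))
O = A.T @ A                           # positive semi-definite, with a zero space

eigval, V = np.linalg.eigh(O)
pos = eigval > 1e-10

P_t = V[:, pos] @ V[:, pos].T         # projector onto the template space H_t

assert np.allclose(P_t @ P_t, P_t)              # P_t^2 = P_t
assert np.allclose(P_t @ O - O @ P_t, 0.0)      # [P_t, O] = 0
assert np.allclose(O @ (np.eye(7) - P_t), 0.0)  # 1 - P_t projects onto zero space
\end{verbatim}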

