
2 Templates and concepts

Technically, in function approximation one is interested in finding a function or hypothesis $h(x)$ which minimizes a given error/energy functional $E[h]$. Using a Bayesian interpretation we understand the error functional, up to a constant, as proportional to the negative log-posterior of the function $h$ given training data $D$ and prior information $D_0$, i.e.

\begin{displaymath}
p(h\vert D,D_0)\propto e^{-\beta E[h]}.
\end{displaymath} (1)
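As a minimal numerical sketch of (1) (with made-up error values and an illustrative choice of $\beta$), the posterior weights of a finite set of candidate hypotheses follow directly from their errors:

\begin{verbatim}
import numpy as np

beta = 1.0                            # inverse temperature (illustrative)
E = np.array([0.3, 1.2, 0.7, 2.5])    # errors E[h] of four candidate hypotheses

weights = np.exp(-beta * E)           # unnormalized posterior, as in (1)
posterior = weights / weights.sum()   # normalize over the candidate set
print(posterior)                      # lower error -> higher posterior probability
\end{verbatim}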

The error functional usually depends both on a finite number of training data $D$ and on additional prior information $D_0$. Special cases of function approximation include density estimation, where $h$ has to fulfill an additional normalization condition, and classification or pattern recognition, where the function $h(x)$ takes only discrete values representing the possible classes or patterns of $x$.

Let us consider as an example an error functional with mean square data terms and a typical smoothness constraint for $d$-dimensional $x$:

\begin{displaymath}
E[h] = \frac{1}{2} \sum_{i=1}^n (y_i-h(x_i))^2
+\frac{\lambda}{2} \int_{-\infty}^\infty \!d^dx\,
\sum_{l=1}^d \left( \frac{\partial h(x)}{\partial x_l}\right)^2.
\end{displaymath} (2)
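To make (2) concrete, the following sketch discretizes the functional for $d=1$ on a regular grid (grid, training data, and the value of $\lambda$ are illustrative; the derivative is approximated by forward differences):

\begin{verbatim}
import numpy as np

x = np.linspace(0.0, 1.0, 101)        # grid for one-dimensional x
dx = x[1] - x[0]
h = np.sin(2 * np.pi * x)             # an illustrative hypothesis on the grid

idx = np.array([10, 50, 90])          # grid indices of the training points x_i
y = np.array([0.2, -0.1, 0.4])        # target values y_i
lam = 0.1                             # regularization parameter lambda

data_term = 0.5 * np.sum((y - h[idx]) ** 2)
smoothness = 0.5 * lam * np.sum(np.diff(h) ** 2) / dx  # ~ integral of (dh/dx)^2
E = data_term + smoothness
\end{verbatim}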

We now rewrite the terms to show their common form. A mean square error term can be written, using bra-ket notation for scalar products and matrix elements of symmetric operators, as
\begin{displaymath}
(y_i-h(x_i))^2
= \int_{-\infty}^\infty \!d^dx \, \delta (x-x_i) (h(x)-t_i(x))^2
= <\! h-t_i \vert P_i\vert h-t_i\!>
\end{displaymath} (3)

with projector $P_i(x,x^\prime) = \delta(x-x_i)\delta(x-x^\prime)$ and $t_i$ the constant function $t_i(x) \equiv y_i$. Thus, (3) is a squared distance of $h$ from $t_i$ at the point $x_i$.
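On a grid, the projector of (3) becomes a matrix with a single nonzero entry, and the identity can be checked directly (a sketch; the grid measure is absorbed into the discrete bra-ket):

\begin{verbatim}
import numpy as np

n = 101
i0, y_i = 40, 0.7                     # grid index of x_i and value y_i (illustrative)

h = np.random.rand(n)                 # arbitrary hypothesis on the grid
t_i = np.full(n, y_i)                 # constant template t_i(x) = y_i

P_i = np.zeros((n, n))
P_i[i0, i0] = 1.0                     # discrete delta(x - x_i) delta(x - x')

# <h - t_i | P_i | h - t_i> reduces to the squared error at x_i
assert np.isclose((h - t_i) @ P_i @ (h - t_i), (y_i - h[i0]) ** 2)
\end{verbatim}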

For the smoothness constraint in (2) we find a quite similar form:

\begin{displaymath}
\int_{-\infty}^\infty \!d^dx\,
\sum_{l=1}^d \left( \frac{\partial h(x)}{\partial x_l}\right)^2
= - <\! h - t_0 \vert\Delta \vert h-t_0\!>
\end{displaymath} (4)

with zero function $t_0(x)\equiv 0$ and $\Delta(x,x^\prime) = \delta(x-x^\prime)\sum_{l=1}^d \frac{\partial^2}{\partial x_l^2}$ the negative semi-definite $d$-dimensional Laplacian. Here we used integration by parts, assuming vanishing boundary terms. Like (3), also (4) can be interpreted as a distance of $h$ from the zero function $t_0$, but now measured over all $x$. Hence, we obtain for functional (2)
\begin{displaymath}
E[h] = \frac{1}{2} \sum_{i=1}^n <\! h - t_i \vert P_i\vert h - t_i\!>
- \frac{\lambda}{2} <\! h - t_0 \vert\Delta \vert h-t_0\!>.
\end{displaymath} (5)
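Both rewritings can be verified numerically. In the sketch below (all grid and data choices illustrative), the gradient is a forward-difference matrix $W$ and the Laplacian is taken as $\Delta = -W^T W$, so the boundary terms of the integration by parts vanish by construction; the bra-ket form (5) then reproduces the direct form (2):

\begin{verbatim}
import numpy as np

n, dx, lam = 101, 0.01, 0.1
h = np.random.rand(n)
idx, y = np.array([10, 50, 90]), np.array([0.2, -0.1, 0.4])

W = (np.eye(n, k=1)[:-1] - np.eye(n)[:-1]) / dx   # forward-difference gradient
Delta = -W.T @ W                                  # discrete Laplacian

# identity (4): integral of (dh/dx)^2 = - <h - t_0 | Delta | h - t_0>
assert np.isclose(np.sum((W @ h) ** 2) * dx, -(h @ Delta @ h) * dx)

# direct form (2) versus bra-ket form (5)
E_direct = (0.5 * np.sum((y - h[idx]) ** 2)
            + 0.5 * lam * np.sum((W @ h) ** 2) * dx)

E_braket = -0.5 * lam * (h @ Delta @ h) * dx
for i0, y_i in zip(idx, y):
    t_i = np.full(n, y_i)
    P_i = np.zeros((n, n))
    P_i[i0, i0] = 1.0                             # discrete projector onto x_i
    E_braket += 0.5 * (h - t_i) @ P_i @ (h - t_i)

assert np.isclose(E_direct, E_braket)
\end{verbatim}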

These examples motivate the following general definitions. Let $H$ denote a Hilbert space of hypothesis functions $h(x)$ of $d$-dimensional $x$.



Definition 1 ((prior) concept $(t,d)$ with (prior) template $t$ and template distance $d$): A prior concept is a pair $(t,d)$ with $t(x)$ a function in $H$ and $d: (H\times H) \rightarrow \{0\}\cup I\!\!R^+$ a (``distance'') functional with $d[t,t]=0$. Note that this allows $d[t,h]=0$ for $h\ne t$. We write $d[t,h]=d[h]$. The function $t(x)\in H$ will be called a (prior) template.
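As a small illustration of the remark that $d[t,h]=0$ is possible for $h\ne t$ (a sketch with an illustrative grid and a distance that only inspects a single point $x_0$):

\begin{verbatim}
import numpy as np

i0 = 40                                 # grid index of a single point x_0

def d(t, h):
    # a "distance" functional in the sense of Definition 1:
    # d[t, t] = 0, but it only looks at x_0
    return abs(h[i0] - t[i0])

t = np.zeros(101)
h = np.sin(np.linspace(0.1, 1.0, 101))  # differs from t almost everywhere
h[i0] = 0.0                             # ...but agrees with t at x_0

assert d(t, t) == 0 and d(t, h) == 0    # d[t, h] = 0 although h != t
\end{verbatim}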

Template functions will be used to represent function prototypes. Notice that templates include standard training data, which is the reason for the brackets around the word ``prior''. We are especially interested in distances quadratic in $h$, for which the functional derivative with respect to $h$ is linear. Such distances can be defined by positive semi-definite operators $O$. Such operators have a decomposition $O = W^T W$, with $W$ invertible if $O$ is positive definite. More precisely, $\sqrt{<\!h\vert O\vert h\!>} = \vert\vert h\vert\vert _O = \vert\vert Wh\vert\vert$ defines a semi-norm on $H$ with $\vert\vert h\vert\vert _O=0$ if $h$ is in the zero space of $O$, i.e. if $O h = 0$. Typical $W$ are projectors into the space of training data, like in (3), and generators of infinitesimal transformations of continuous Lie groups, like the gradient $\nabla$ for translations in (4), with $\nabla^T \nabla = - \Delta$ under appropriate boundary conditions. Thus, we define:



Definition 2 (quadratic (prior) concept $(t,O)$ with template distance operator $O$): A quadratic (prior) concept is a pair $(t,O)$ with $t(x)$ a function in $H$ and $O = W^T W$ a symmetric, positive semi-definite operator, which will be called a template distance operator. $O$ defines the square template distance:

\begin{displaymath}
d^2 [h]
= <\!h-t\vert O\vert h-t\!>
= <\!W(h-t)\vert W(h-t)\!>
= \vert\vert W(h-t)\vert\vert^2
= \vert\vert h-t\vert\vert^2_{O}.
\end{displaymath} (6)
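The chain of equalities in (6) can be checked in finite dimensions, obtaining $W$ from an eigendecomposition of a positive semi-definite matrix $O$ (all matrices illustrative):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
O = A.T @ A                           # positive semi-definite (rank at most 5)

# O = W^T W with W = sqrt(D) V^T from the eigendecomposition O = V D V^T
eigval, V = np.linalg.eigh(O)
eigval = np.clip(eigval, 0.0, None)   # clip tiny negative rounding errors
W = np.diag(np.sqrt(eigval)) @ V.T

h = rng.standard_normal(8)            # here with template t = 0
assert np.isclose(h @ O @ h, np.sum((W @ h) ** 2))   # <h|O|h> = ||W h||^2
\end{verbatim}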

Thus, a quadratic concept defines a Gaussian process over $d$-dimensional $x$ with $p(h) \propto e^{-d^2[h]/2}$ and covariance operator $O^{-1}$. Its matrix elements are sometimes also called Green's function, propagator, or two-point correlation function. The Laplacian in (4), for example, corresponds to the Wiener measure known from Brownian motion or diffusion and is also used as kinetic energy for Euclidean scalar fields in physics (see for example [2]). The zero modes of $O$ represent the projections of $h$ which do not contribute to $d^2 [h]$. The projector $P_i$ in the mean square error term (3), for example, measures the distance only at the (training data) point $x_i$. Continuous template functions may also be restricted to subspaces, e.g. parts of an image or a specific resolution.
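To illustrate the Gaussian process view, the following sketch draws a sample from $p(h)\propto e^{-d^2[h]/2}$ for a discrete Laplacian concept (an illustrative choice), inverting $O$ only on its template space and leaving the zero mode at its template value:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n, dx, lam = 101, 0.01, 0.1

W = (np.eye(n, k=1)[:-1] - np.eye(n)[:-1]) / dx
O = lam * (W.T @ W)                   # O = -lam * Delta, positive semi-definite

eigval, V = np.linalg.eigh(O)
pos = eigval > 1e-10                  # modes with positive eigenvalue

t = np.zeros(n)                       # template t_0 = 0
coeff = rng.standard_normal(pos.sum()) / np.sqrt(eigval[pos])
h = t + V[:, pos] @ coeff             # one draw with covariance pinv(O)
\end{verbatim}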



Definition 3 (template space $H_t$ and template projectors $P_t$): The maximal subspace on which the positive semi-definite $O$ is positive definite will be called the template space $H_t$ of $O$. The corresponding hermitian projector $P_t$ onto this subspace $H_t$, i.e. with $P_t(H) = H_t$, $P_t(H\setminus H_t) = 0$, and $P_t^2 = P_t$, will be called the template projector.

Hence $O$ commutes with the template projector: $[P_t, O] = P_t O - O P_t = 0$. Maximality of $H_t$ means that $1-P_t$ is the projector onto the zero space of $O$, i.e. $O(H\setminus H_t) = 0$. Our aim is to build an error functional $E[h]$ depending on $h$ only through square distances $d^2 [h]$.
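A finite-dimensional sketch of Definition 3 and of the commutation relation (with an illustrative matrix $O$): the template projector is the projector onto the eigenvectors of $O$ with positive eigenvalue:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 7))
O = A.T @ A                           # positive semi-definite, with a zero space

eigval, V = np.linalg.eigh(O)
pos = eigval > 1e-10

P_t = V[:, pos] @ V[:, pos].T         # projector onto the template space H_t

assert np.allclose(P_t @ P_t, P_t)              # P_t^2 = P_t
assert np.allclose(P_t @ O - O @ P_t, 0.0)      # [P_t, O] = 0
assert np.allclose(O @ (np.eye(7) - P_t), 0.0)  # 1 - P_t projects onto zero space
\end{verbatim}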

