
1 Introduction

In this paper we choose the function approximation setting as a formalization of learning. We assume a set of training data $D$ = $\{(x_i,y_i)\vert\, 1\!\le \!i\!\le\!n\}$ to be available. The learning problem consists in finding a function $h(x)$ so that outcomes can be predicted by $y=h(x)$ also for $x$ not contained in the training data. For a face detection task, for example, the independent or input variable $x_i$ may be an array of grey level values representing example images, while the dependent or output variable $y_i$ can simply be a binary variable encoding the attributes ``image contains a face'' and ``image contains no face''. In the analysis of time series, $x_i=t_i$ is a time of measurement, while the measured value $y_i$ might be an exchange rate of the Dollar at time $t_i$.
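
In its simplest version (given here only as an illustrative sketch, not as the error functional developed in this paper), learning in this setting amounts to empirical risk minimization,
\[
\hat h \;=\; \mathop{\mathrm{argmin}}_{h\in\mathcal{H}} \sum_{i=1}^{n} \bigl(y_i - h(x_i)\bigr)^2 ,
\]
where the hypothesis space $\mathcal{H}$ encodes the prior restrictions discussed below. Without such restrictions the problem is ill-posed for continuous $x$, since infinitely many functions interpolate any finite $D$.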

Learning, i.e. generalization from training data to new situations, always requires prior knowledge. Besides stationarity assumptions for the empirical distribution of $y$ at fixed $x$, prior information has to specify the dependencies which allow one to infer from one function value $y=h(x)$ to other function values $y^\prime =h(x^\prime)$ with $x^\prime \ne x$. Hence prior information is especially needed when not all values of the independent variable $x$ are available for training, or when $x$ can take more values than there are training data. This is always the case for continuous $x$. In typical object recognition tasks, for example, the input space has very high dimensionality (e.g. images), while the number of available training data may be small (e.g. because they are costly to obtain). Nevertheless, objects often have very specific characteristics, and human observers may name many of them. Typical examples are constituents, like eyes, nose, and mouth for faces, and their relations, e.g. spatial ones. Methods from fuzzy set theory and fuzzy logic [1] can be used to quantify such linguistic variables or ``fuzzy'' concepts, like ``eyes'' or ``typical distance between eyes''. Beginning with a set of elementary concepts, more complicated concepts can be created by fuzzy logical combination of elementary concepts. For example, we may require a face to have mouth, nose, AND eyes, but all those parts may appear in different variants, e.g. an eye may be open OR closed. The problem thus consists, first, in the quantification of the elementary (fuzzy) concepts and, second, in their probabilistic or fuzzy logical combination.
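
As an illustration, using the standard min/max operators of fuzzy logic [1] (not necessarily the combination rules developed later in this paper), a membership function for the concept ``face'' could be built from elementary membership functions by
\[
\mu_{\mathrm{face}}(x) \;=\; \min\bigl(\mu_{\mathrm{mouth}}(x),\, \mu_{\mathrm{nose}}(x),\, \mu_{\mathrm{eyes}}(x)\bigr),
\qquad
\mu_{\mathrm{eyes}}(x) \;=\; \max\bigl(\mu_{\mathrm{open}}(x),\, \mu_{\mathrm{closed}}(x)\bigr),
\]
where AND is implemented by the minimum and OR by the maximum of the membership degrees; product and (probabilistic) sum are common alternatives.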

Despite the fact that generalization is essentially based on prior information, most implementations of it are implicit in a chosen restricted parameterization (e.g. by choosing a specific neural network architecture) or implicit in the chosen algorithm (e.g. early stopping used for training of neural networks may be seen as a regularization method). Implicit implementations are difficult to analyze and usually do not allow adaptation to specific prior knowledge. As we want to deal with varying and relatively specific prior information, we have to include it explicitly in the error functional. This is done, for example, in regularization theory (from a Bayesian point of view), but usually restricted to relatively simple (quadratic, i.e. convex) smoothness constraints.
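
A classical example of such an explicit, but quadratic, implementation (a standard smoothness prior, quoted here only for illustration) is the error functional of a cubic smoothing spline,
\[
E(h) \;=\; \sum_{i=1}^{n} \bigl(y_i - h(x_i)\bigr)^2 \;+\; \lambda \int \bigl(h^{\prime\prime}(x)\bigr)^2\, dx ,
\]
whose Bayesian reading identifies the first term with a Gaussian likelihood and the second with a Gaussian prior over functions; the regularization parameter $\lambda$ balances data fit against smoothness.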

The approach for explicit implementation of prior knowledge presented here has two steps. In a first step, components of prior information are formalized by defining template functions, i.e. locally or globally defined function prototypes, and distances therefrom. In a second step the components are combined by the logical operations AND and OR, similar to fuzzy logical methods. Implementing a probabilistic AND yields the classical linear regularization methods, which include, for example, splines and Radial Basis Functions. For many problems, knowledge formulated in terms of a probabilistic OR is available. Objects, for example, consist of constituents which may appear in one of several relatively well defined variants, e.g. eyes may be open OR closed, blue OR brown. Especially important are translations, scalings, and other deformations. The implementation of a probabilistic OR yields nonlinear stationarity equations, which are the technical topic of this paper.
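
Schematically (in notation chosen here for illustration, not taken from the following sections), a probabilistic AND of prior components with error terms $E_k(h)$, i.e. a product of the corresponding probabilities, adds the error terms, while a probabilistic OR corresponds to a sum of probabilities:
\[
\mathrm{AND:}\quad E(h) \;=\; \sum_k E_k(h),
\qquad
\mathrm{OR:}\quad E(h) \;=\; -\ln \sum_k p_k\, e^{-E_k(h)} .
\]
If the components $E_k$ are quadratic in $h$, the AND remains quadratic and leads to linear stationarity equations; the OR, being a mixture, is in general nonconvex and leads to nonlinear stationarity equations.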

More than one approach is possible for implementing OR-like combinations. Interpreting the individual components as (shifted and rescaled) log-probabilities results in a mixture model over a function space, resembling a physical system at finite temperature. Alternatively, one can directly construct an effective total posterior by a fuzzy logical-like combination of prior components. If a temperature-like parameter is introduced, this corresponds to the Landau-Ginzburg treatment of phase transitions. The resulting equations are similar, for example, to equations obtained in quantum mechanical scattering theory.
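
To indicate the structure of the resulting equations (again only a sketch in illustrative notation), consider a mixture energy with temperature-like parameter $1/\beta$,
\[
E(h) \;=\; -\frac{1}{\beta} \ln \sum_k e^{-\beta E_k(h)},
\qquad
0 \;=\; \frac{\delta E}{\delta h} \;=\; \sum_k a_k(h)\, \frac{\delta E_k}{\delta h},
\qquad
a_k(h) \;=\; \frac{e^{-\beta E_k(h)}}{\sum_l e^{-\beta E_l(h)}} .
\]
Because the weights $a_k$ themselves depend on $h$, the stationarity condition is nonlinear and has to be solved self-consistently; for $\beta\to\infty$ only the component with minimal $E_k$ survives, while finite $\beta$ interpolates smoothly between components.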

