

Classification

In classification (or pattern recognition) tasks the dependent visible variable $y$ takes discrete values (group, cluster or pattern labels) [16,61,24,47]. We write $y$ = $k$ and $p(y\vert x,h)$ = $P_k(x,h)$, i.e., $\sum_k P_k(x,h)$ = $1$. Having received classification data $D$ = $\{(x_i,k_i)\vert 1\le i\le n\}$, the density estimation error functional for a prior on the function $\phi$ (with components $\phi_k$ and $P$ = $P(\phi)$) reads

\begin{displaymath}
E_{\rm cl.}
=
-\sum_i^n \ln P_{k_i}(x_i;\phi)
+\frac{1}{2}\Big(\phi-t,\, {\bf K}\,(\phi-t) \Big)
+(P(\phi), \Lambda_X)
.
\end{displaymath} (335)

In classification the scalar product corresponds to an integral over $x$ and a summation over $k$, e.g.,
\begin{displaymath}
\Big(\phi-t,\, {\bf K}\,(\phi-t) \Big)
=
\sum_{k,k^\prime} \int\!dx\,dx^\prime\,
(\phi_{k}(x)-t_{k}(x))\,
{\bf K}_{k,k^\prime}(x,x^\prime)\,
(\phi_{k^\prime}(x^\prime)-t_{k^\prime}(x^\prime))
,
\end{displaymath} (336)

and $(P,\Lambda_X)$ = $\int\!dx\,\Lambda_X(x)\sum_k P_k(x)$.
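
To make this notation concrete, the following minimal Python sketch evaluates the discretized quadratic form of Eq. (336); the grid sizes, the template $t$, and the random positive semi-definite operator ${\bf K}$ are purely illustrative assumptions, not taken from the text.

import numpy as np

# Illustrative discretization: n_x grid points in x, n_classes values of k.
n_classes, n_x = 3, 50
dx = 1.0 / n_x                           # grid spacing for the x-integrals

rng = np.random.default_rng(0)
phi = rng.normal(size=(n_classes, n_x))  # phi_k(x) on the grid
t = np.zeros((n_classes, n_x))           # template function t_k(x)

# Generic symmetric positive semi-definite K_{k,k'}(x,x'), represented as a
# matrix over the stacked index (k, x).
A = rng.normal(size=(n_classes * n_x, n_classes * n_x))
K = A @ A.T

d = (phi - t).reshape(-1)                # stack (k, x) into one index
quad_form = d @ K @ d * dx * dx          # sum over k,k' and integrals over x,x'
print(0.5 * quad_form)                   # regularizer term in Eq. (335)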

For zero-one loss $l(x,k,a)$ = $1-\delta_{k,a(x)}$ -- a typical loss function for classification problems -- the optimal decision (or Bayes classifier) is given by the mode of the predictive density (see Section 2.2.2), i.e.,

\begin{displaymath}
a(x) = {\rm argmax}_k \, p(k\vert x,D,D_0)
.
\end{displaymath} (337)
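
This can be checked directly: under the zero-one loss above, the expected loss (risk) of a deterministic decision rule $a$ at $x$ is
\begin{displaymath}
r(a,x) = \sum_k p(k\vert x,D,D_0)\,\big(1-\delta_{k,a(x)}\big)
= 1 - p(a(x)\vert x,D,D_0)
,
\end{displaymath}
which is minimal when $a(x)$ selects the class with the largest predictive probability, i.e., the mode.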

In the saddle point approximation $p(k\vert x,D,D_0)\approx p(k\vert x,\phi^*)$, where the minimizer $\phi^*$ of $E_{\rm cl.}(\phi)$ can be found by solving the stationarity equation (228).

For the choice $\phi_k$ = $P_k$, non-negativity and normalization must be ensured explicitly. For $\phi$ = $L$ with $P$ = $e^L$, non-negativity is automatically fulfilled, but the Lagrange multiplier term must be included to ensure normalization.

Normalization is guaranteed automatically by using unnormalized probabilities $\phi_k$ = $z_k$ with $P_k$ = $z_k/\sum_l z_l$ (for which non-negativity still has to be checked), or shifted log-likelihoods $\phi_k$ = $g_k$ with $g_k$ = $L_k +\ln \sum_l e^{g_l}$, i.e., $P_k$ = $e^{g_k}/\sum_l e^{g_l}$. In that case the nonlocal normalization terms are part of the likelihood and no Lagrange multiplier has to be used [236]. The resulting equation can be solved in the space defined by the $X$-data (see Eq. (153)). The restriction of $\phi_k$ = $g_k$ to linear functions $\phi_k(x) = w_k x +b_k$ yields log-linear models [154]. Recently a mean field theory for Gaussian process classification has been developed [177,179].
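
As a minimal illustration of this parameterization (the weights $w_k$, $b_k$ and the test point below are assumed values, not taken from the text), the following Python sketch computes softmax probabilities from shifted log-likelihoods $g_k$ restricted to linear functions and returns the plug-in Bayes decision of Eq. (337), with $p(k\vert x,D,D_0)$ approximated by $P_k(x;\phi^*)$ as in the saddle point approximation.

import numpy as np

def softmax(g):
    # P_k = exp(g_k) / sum_l exp(g_l): non-negative and normalized by construction
    g = g - g.max()                      # subtract the maximum for numerical stability
    e = np.exp(g)
    return e / e.sum()

# Log-linear model: g_k(x) = w_k x + b_k (illustrative parameter values)
rng = np.random.default_rng(0)
n_classes, n_features = 3, 2
w = rng.normal(size=(n_classes, n_features))
b = rng.normal(size=n_classes)

def predict_proba(x):
    return softmax(w @ x + b)            # P_k(x) from shifted log-likelihoods g_k

def bayes_classifier(x):
    return int(np.argmax(predict_proba(x)))  # mode of the approximate predictive density

x = np.array([0.5, -1.0])
print(predict_proba(x), bayes_classifier(x))

Subtracting the maximum of $g$ before exponentiating leaves the ratios, and hence $P_k$, unchanged while avoiding numerical overflow.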

Table 3 lists some special cases of density estimation. The last line of the table, referring to inverse quantum mechanics, will be discussed in the next section.


Table 3: Special cases of density estimation

likelihood $p(y\vert x,h)$     | problem type
-------------------------------|---------------------------
of general form                | density estimation
discrete $y$                   | classification
Gaussian with fixed variance   | regression
mixture of Gaussians           | clustering
quantum mechanical likelihood  | inverse quantum mechanics


