Additive models


Trial functions $\phi$ may be chosen as a sum of simpler functions $\phi_l$, each depending only on part of the $x$ and $y$ variables. More precisely, we consider functions $\phi_l$ depending on projections $z_l$ = ${\bf I}_l^{(z)} z$ of the vector $z$ = $(x,y)$ of all $x$ and $y$ components. ${\bf I}_l^{(z)}$ denotes a projector in the vector space of $z$ (and not in the space of functions $\Phi(x,y)$). Hence, $\phi$ takes the form

$\begin{displaymath} \phi (z) = \sum_l \phi_l (z_l) , \end{displaymath}$

(391)

so only one-dimensional functions $\phi_l$ have to be determined. Restricting the functions $\phi_l$ to a parameterized function space yields a ``parameterized additive model''

$\begin{displaymath} \phi (z) = \sum_l \phi_l (\xi, z_l), \end{displaymath}$

(392)

which has to be solved for the parameters $\xi$. The model can also be generalized to a model ``additive in parameters $\xi_l$''

$\begin{displaymath} \phi (z) = \sum_l \phi_l (\xi_l,x,y), \end{displaymath}$

(393)

where the functions $\phi_l (\xi_l,x,y)$ are not restricted to one-dimensional functions depending only on projections $z_l$ on the coordinate axes. If the parameters $\xi_l$ determine the component functions $\phi_l$ completely, this yields just the mixture models of Section 4.4. Another example is projection pursuit (discussed in Section 4.8), where a parameter vector $\xi_l$ corresponds to a projection $\xi_l \cdot z$. In that case, even for given $\xi_l$, a one-dimensional function $\phi_l(\xi_l \cdot z)$ still has to be determined.
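As an illustration of the parameterized additive model (392), the following is a minimal sketch in Python with NumPy. It assumes, purely for illustration, that each $\phi_l$ is a cubic polynomial in a single coordinate $z_l$ and that the parameters $\xi$ are found by ordinary least squares on sampled target values, rather than from the full stationarity equations of this section. The names `additive_design`, `fit_additive` and `predict_additive` are hypothetical.

```python
import numpy as np

def additive_design(Z, degree=3):
    """Design matrix for phi(z) = sum_l phi_l(xi, z_l), where each phi_l is
    a polynomial (without constant term) in the single coordinate z_l."""
    n, d = Z.shape
    cols = [np.ones(n)]  # one shared constant term
    for l in range(d):
        for p in range(1, degree + 1):
            cols.append(Z[:, l] ** p)
    return np.column_stack(cols)

def fit_additive(Z, t, degree=3):
    """Least-squares estimate of the parameter vector xi (an assumption of
    this sketch, not the variational procedure of the text)."""
    A = additive_design(Z, degree)
    xi, *_ = np.linalg.lstsq(A, t, rcond=None)
    return xi

def predict_additive(Z, xi, degree=3):
    return additive_design(Z, degree) @ xi

rng = np.random.default_rng(0)
Z = rng.uniform(-1.0, 1.0, size=(200, 2))
t = Z[:, 0] ** 2 - 0.5 * Z[:, 1] ** 3  # target that is itself additive
xi = fit_additive(Z, t)
max_err = np.max(np.abs(predict_additive(Z, xi) - t))
```

Because the target in this toy setup is itself a sum of one-dimensional polynomials, the fitted model should reproduce it up to rounding error; a target with genuine interactions between coordinates would leave a residual.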

An ansatz like (391) is made more flexible by also including interactions

$\begin{displaymath} \phi (x,y) = \sum_l \phi_l (z_l) +\sum_{kl} \phi_{kl} (z_k, z_l) +\sum_{klm} \phi_{klm} (z_k, z_l, z_m) + \cdots . \end{displaymath}$

(394)

The functions $\phi_{kl\cdots} (z_k, z_l,\cdots)$ can be chosen to depend on product terms like $z_{l,i} z_{k,j}$, or $z_{l,i} z_{k,j} z_{m,n}$, where $z_{l,i}$ denotes one-dimensional sub-variables of $z_l$.
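The effect of the interaction terms in (394) can be sketched numerically. The example below assumes one-dimensional components $z_l$, quadratic polynomials for the $\phi_l$, and pairwise interactions $\phi_{kl}$ proportional to the product $z_k z_l$; the fit is again plain least squares, chosen only to keep the sketch short. The function names `design` and `residual_norm` are hypothetical.

```python
import numpy as np
from itertools import combinations

def design(Z, pairwise=False):
    """Columns for sum_l phi_l(z_l), optionally extended by product terms
    z_k * z_l as the simplest pairwise interactions phi_kl(z_k, z_l)."""
    n, d = Z.shape
    cols = [np.ones(n)]
    cols += [Z[:, l] ** p for l in range(d) for p in (1, 2)]
    if pairwise:
        cols += [Z[:, k] * Z[:, l] for k, l in combinations(range(d), 2)]
    return np.column_stack(cols)

def residual_norm(A, t):
    """Norm of the least-squares residual of t on the columns of A."""
    xi, *_ = np.linalg.lstsq(A, t, rcond=None)
    return np.linalg.norm(A @ xi - t)

rng = np.random.default_rng(1)
Z = rng.uniform(-1.0, 1.0, size=(300, 3))
t = Z[:, 0] + Z[:, 1] * Z[:, 2]  # target with one genuine interaction

res_additive = residual_norm(design(Z), t)                 # no interactions
res_interact = residual_norm(design(Z, pairwise=True), t)  # with z_k * z_l
```

In this setup the purely additive design cannot represent the product $z_1 z_2$ and leaves a sizeable residual, while the design with pairwise product terms fits the target up to rounding error.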

In additive models in the narrower sense [218,92,93,94] $z_l$ is a subset of the $x$, $y$ components, i.e., $z_l \subseteq \{ x_i \vert 1\le i\le d_x \}$ $\cup$ $\{ y_j \vert 1\le j\le d_y \}$, $d_x$ denoting the dimension of $x$, $d_y$ the dimension of $y$. In regression, for example, one usually takes the one-element subsets $z_l$ = $\{x_l\}$ for $1\le l \le d_x$.

In more general schemes the projections of $z$ do not have to be restricted to projections on the coordinate axes. In particular, the projections can be optimized too. For example, one-dimensional projections ${\bf I}_l^{(z)} z$ = $w\cdot z$ with $z,w\in X\times Y$ (where $\cdot$ denotes a scalar product in the space of $z$ variables) are used by ridge approximation schemes. For regression problems they include one-layer (and similarly multilayer) feedforward neural networks (see Section 4.9), projection pursuit regression (see Section 4.8), ridge regression [151,152], and hinge functions [31]. For a detailed discussion of the regression case see [76].
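A ridge approximation of this kind can be written down in a few lines. The sketch below assumes a three-dimensional $z$ and takes every one-dimensional function to be a scaled and shifted $\tanh$, which corresponds to a one-hidden-layer network; the weights are random placeholders rather than optimized values, and `ridge_model` is a hypothetical name.

```python
import numpy as np

def ridge_model(Z, W, b, a):
    """phi(z) = sum_l a_l * tanh(w_l . z + b_l): a sum of one-dimensional
    functions, each applied to a projection w_l . z of the full vector z."""
    return np.tanh(Z @ W.T + b) @ a

rng = np.random.default_rng(2)
W = rng.normal(size=(4, 3))  # 4 projection directions w_l in a 3-dim z space
b = rng.normal(size=4)       # offsets b_l
a = rng.normal(size=4)       # output weights a_l
Z = rng.normal(size=(5, 3))  # 5 sample points z
phi = ridge_model(Z, W, b, a)

# A single ridge function is constant along any direction orthogonal to w.
w = W[:1]
v = np.cross(w[0], np.array([1.0, 0.0, 0.0]))  # v is orthogonal to w
same = np.allclose(ridge_model(Z, w, b[:1], a[:1]),
                   ridge_model(Z + v, w, b[:1], a[:1]))
```

The final check illustrates why such functions are called ridge functions: shifting $z$ orthogonally to $w$ leaves $w\cdot z$, and hence the function value, unchanged.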

For the ansatz (391), the stationarity equation for $E_\phi$ becomes

$\begin{displaymath} 0 = {\bf P}^\prime_l {\bf P}^{-1} N -{{\bf K}} \phi -{\bf P}^\prime_l \Lambda_X , \end{displaymath}$

(395)

with

$\begin{displaymath} {\bf P}^\prime_l (z_l,z^\prime) = \frac{\delta P(z^\prime)}{\delta \phi_l(z_l)} . \end{displaymath}$

(396)

Considering a density $P$ that is also decomposed into components $P_l$ determined by the components $\phi_l$

$\begin{displaymath} P(z) = \sum_l P_l(\phi_l(z_l)), \end{displaymath}$

(397)

the derivative (396) becomes

$\begin{displaymath} {\bf P}^\prime_l (z_l,z_l^\prime) = \frac{\delta P_l(z_l^\prime)}{\delta \phi_l(z_l)} , \end{displaymath}$

(398)

so that specifying an additive prior

$\begin{displaymath} \frac{1}{2} \sum_{kl}(\,\phi_k -t_k,\, {{\bf K}_{kl}}\,(\phi_l-t_l)\,) , \end{displaymath}$

(399)

the stationarity conditions are coupled equations for the component functions $\phi_l$ which, because ${\bf P}$ is diagonal, only contain integrations over $z_l$-variables

$\begin{displaymath} 0 = \frac{\delta P_l}{\delta \phi_l} {\bf P}^{-1} N -\sum_k {\bf K}_{lk} (\phi_k-t_k) -\frac{\delta P_l}{\delta \phi_l} \Lambda_X . \end{displaymath}$

(400)

For the parameterized approach (392) one finds

$\begin{displaymath} 0 = \Phi_l^\prime {\bf P}^\prime_l {\bf P}^{-1} N -\Phi_l^\prime {{\bf K}} \phi -\Phi_l^\prime {\bf P}^\prime_l \Lambda_X , \end{displaymath}$

(401)

with

$\begin{displaymath} \Phi_l^\prime (k,z_l) = \frac{\partial \phi_l(z_l)}{\partial \xi_k} . \end{displaymath}$

(402)

For the ansatz (393), $\Phi_l^\prime (k,z)$ would be restricted to a subset of $\xi_k$.
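The derivative matrix (402) and its restriction for the ansatz (393) can be made concrete with a small numerical sketch. It assumes, for illustration only, components of the form $\phi_l(\xi_l, z_l) = \xi_{l,0} \sin(\xi_{l,1} z_l)$, so that each $\phi_l$ depends on its own two parameters, and it checks the analytic derivatives against central finite differences. The function names are hypothetical.

```python
import numpy as np

def components(xi, z):
    """Vector of phi_l(xi_l, z_l) = xi[l, 0] * sin(xi[l, 1] * z[l])."""
    return xi[:, 0] * np.sin(xi[:, 1] * z)

def jacobian(xi, z):
    """J[l, k] = d phi_l / d xi_k, with xi flattened in row-major order.
    Only the two parameters belonging to component l give nonzero entries."""
    L = len(z)
    J = np.zeros((L, 2 * L))
    for l in range(L):
        J[l, 2 * l] = np.sin(xi[l, 1] * z[l])
        J[l, 2 * l + 1] = xi[l, 0] * z[l] * np.cos(xi[l, 1] * z[l])
    return J

def numeric_jacobian(xi, z, h=1e-6):
    """Central finite-difference approximation of the same derivatives."""
    flat = xi.ravel()
    J = np.zeros((len(z), flat.size))
    for k in range(flat.size):
        step = np.zeros_like(flat)
        step[k] = h
        plus = components((flat + step).reshape(xi.shape), z)
        minus = components((flat - step).reshape(xi.shape), z)
        J[:, k] = (plus - minus) / (2 * h)
    return J

xi = np.array([[1.0, 0.5], [2.0, -1.0], [0.3, 2.0]])
z = np.array([0.2, -0.7, 1.1])
J = jacobian(xi, z)
J_num = numeric_jacobian(xi, z)
```

The resulting matrix is block structured: row $l$ has nonzero entries only in the columns of the parameters $\xi_l$, which is the restriction to a subset of the $\xi_k$ in this toy setting.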


Joerg_Lemm 2001-01-21