

Additive models

Trial functions $\phi $ may be chosen as a sum of simpler functions $\phi_l$, each depending only on part of the $x$ and $y$ variables. More precisely, we consider functions $\phi_l$ depending on projections $z_l$ = ${\bf I}_l^{(z)} z$ of the vector $z$ = $(x,y)$ of all $x$ and $y$ components. Here ${\bf I}_l^{(z)}$ denotes a projector in the vector space of $z$ (and not in the space of functions $\Phi(x,y)$). Hence, $\phi $ takes the form

\begin{displaymath}
\phi (z) = \sum_l \phi_l (z_l)
,
\end{displaymath} (391)

so only one-dimensional functions $\phi_l$ have to be determined. Restricting the functions $\phi_l$ to a parameterized function space yields a ``parameterized additive model''
\begin{displaymath}
\phi (z) = \sum_l \phi_l (\xi, z_l),
\end{displaymath} (392)

which has to be solved for the parameters $\xi$. The model can also be generalized to a model ``additive in parameters $\xi_l$''
\begin{displaymath}
\phi (z) = \sum_l \phi_l (\xi_l,x,y),
\end{displaymath} (393)

where the functions $\phi_l (\xi_l,x,y)$ are not restricted to one-dimensional functions depending only on projections $z_l$ on the coordinate axes. If the parameters $\xi_l$ determine the component functions $\phi_l$ completely, this yields just the mixture models of Section 4.4. Another example is projection pursuit (discussed in Section 4.8), where a parameter vector $\xi_l$ corresponds to a projection $\xi_l \cdot z$. In that case, even for given $\xi_l$, a one-dimensional function $\phi_l(\xi_l \cdot z)$ still has to be determined.
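
As a minimal illustration of (391) and (392), the following Python sketch evaluates an additive function on a vector $z$ using coordinate projections. The component functions, the linear form of the parameterized components, and all names (`additive_phi`, `parameterized_additive_phi`, `components`) are assumptions chosen for illustration only; they are not taken from the text.

```python
import numpy as np

def additive_phi(z, components):
    """Evaluate phi(z) = sum_l phi_l(z_l) for coordinate projections.

    components: list of (indices, function) pairs; indices selects the
    sub-vector z_l of z, and function maps z_l to a scalar.
    """
    z = np.asarray(z, dtype=float)
    return sum(f(z[list(idx)]) for idx, f in components)

def parameterized_additive_phi(z, xi):
    """Hypothetical parameterized version with linear components:
    phi_l(xi, z_l) = xi[l, 0] + xi[l, 1] * z_l for one-dimensional z_l."""
    z = np.asarray(z, dtype=float)
    xi = np.asarray(xi, dtype=float)
    return float(np.sum(xi[:, 0] + xi[:, 1] * z))

# Example with z = (x_1, x_2, y) and three one-dimensional components.
components = [
    ((0,), lambda u: u[0] ** 2),      # phi_1(x_1)
    ((1,), lambda u: np.sin(u[0])),   # phi_2(x_2)
    ((2,), lambda u: -0.5 * u[0]),    # phi_3(y)
]
value = additive_phi([1.0, 0.0, 2.0], components)
value_param = parameterized_additive_phi([1.0, 2.0], [[0.0, 1.0], [1.0, 2.0]])
```

In the first function each component sees only its own sub-vector $z_l$; in the second, determining $\phi$ reduces to determining the parameter array `xi`.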

An ansatz like (391) is made more flexible by also including interactions

\begin{displaymath}
\phi (x,y) =
\sum_l \phi_l (z_l)
+\sum_{kl} \phi_{kl} (z_k, z_l)
+\sum_{klm} \phi_{klm} (z_k, z_l, z_m) +
\cdots .
\end{displaymath} (394)

The functions $\phi_{kl\cdots} (z_k, z_l,\cdots)$ can be chosen to depend on product terms like $z_{l,i} z_{k,j}$, or $z_{l,i} z_{k,j} z_{m,n}$, where $z_{l,i}$ denotes one-dimensional sub-variables of $z_l$.
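
For illustration, assume two one-dimensional variables $z_1$, $z_2$ and an interaction function that is linear in the product term, with all second-order contributions collected into a single coefficient $c_{12}$ (a hypothetical parameter introduced here). Truncated at second order, the expansion then reads
\begin{displaymath}
\phi (z_1,z_2) = \phi_1 (z_1) + \phi_2 (z_2) + c_{12}\, z_1 z_2 .
\end{displaymath}

For smooth functions the mixed derivative $\partial^2 \phi / \partial z_1 \partial z_2$ = $c_{12}$ vanishes for every purely additive $\phi$, so a nonzero $c_{12}$ describes structure that no choice of $\phi_1$ and $\phi_2$ alone can represent.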

In additive models in the narrower sense [218,92,93,94] $z_l$ is a subset of the $x$ and $y$ components, i.e., $z_l \subseteq
\{ x_i \vert 1\le i\le d_x \}$ $\cup$ $\{ y_j \vert 1\le j\le d_y \}$, with $d_x$ denoting the dimension of $x$ and $d_y$ the dimension of $y$. In regression, for example, one usually takes the one-element subsets $z_l$ = $\{x_l\}$ for $1\le l \le d_x$.
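
The regression case with one-element subsets $z_l$ = $\{x_l\}$ can be sketched as follows. This is a plain least-squares fit with polynomial components, which is an assumption made for brevity: it ignores the prior and normalization terms of the text, and the function name `fit_additive_polynomial` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_additive_polynomial(x, y, degree=3):
    """Least-squares fit of h(x) = c + sum_l sum_p xi[l, p] * x_l**p.

    Each coordinate x_l contributes its own one-dimensional polynomial,
    so no products of different coordinates appear in the design matrix.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    columns = [np.ones(n)]                    # constant term c
    for l in range(d):
        for p in range(1, degree + 1):
            columns.append(x[:, l] ** p)      # powers of coordinate l only
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design

# Synthetic, noise-free additive target in two input dimensions.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 1.0 + 2.0 * x[:, 0] - x[:, 1] ** 2
coef, design = fit_additive_polynomial(x, y, degree=2)
residual = np.max(np.abs(design @ coef - y))
```

Because the target is itself additive and lies in the chosen function space, the fit reproduces it up to rounding; a target containing a product such as $x_1 x_2$ would leave a residual that only interaction terms could remove.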

In more general schemes the projections of $z$ do not have to be restricted to projections on the coordinate axes. In particular, the projections can be optimized too. For example, one-dimensional projections ${\bf I}_l^{(z)} z$ = $w\cdot z$ with $z,w\in X\times Y$ (where $\cdot$ denotes a scalar product in the space of $z$ variables) are used by ridge approximation schemes. For regression problems these include one-layer (and similarly multilayer) feedforward neural networks (see Section 4.9), projection pursuit regression (see Section 4.8), ridge regression [151,152], and hinge functions [31]. For a detailed discussion of the regression case see [76].
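
A sum of ridge functions can be sketched in Python as below. The choice of $\tanh$ as the one-dimensional profile, the bias terms, and the names `ridge_sum`, `weights`, `biases`, `amplitudes` are assumptions made for illustration; they correspond to a generic one-hidden-layer network rather than to a specific scheme from the cited references.

```python
import numpy as np

def ridge_sum(z, weights, biases, amplitudes):
    """phi(z) = sum_l a_l * tanh(w_l . z + b_l): a sum of ridge functions.

    Each term depends on z only through the scalar projection w_l . z.
    """
    z = np.asarray(z, dtype=float)
    projections = weights @ z + biases        # one scalar per component l
    return float(amplitudes @ np.tanh(projections))

# Two components acting on a three-dimensional z.
weights = np.array([[1.0, 0.0, 0.0],
                    [0.5, -0.5, 0.0]])
biases = np.zeros(2)
amplitudes = np.array([1.0, 2.0])

# Shifting z along a direction orthogonal to every w_l leaves phi unchanged.
value_a = ridge_sum([0.3, -0.2, 0.0], weights, biases, amplitudes)
value_b = ridge_sum([0.3, -0.2, 5.0], weights, biases, amplitudes)
```

The invariance of `value_a` and `value_b` reflects the defining property of a ridge function: it is constant on hyperplanes orthogonal to its projection vector. Optimizing the rows of `weights` corresponds to optimizing the projections.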

For the ansatz (391), the stationarity equation for $E_\phi$ becomes

\begin{displaymath}
0 = {\bf P}^\prime_l {\bf P}^{-1} N
-{{\bf K}} \phi
-{\bf P}^\prime_l \Lambda_X
,
\end{displaymath} (395)

with
\begin{displaymath}
{\bf P}^\prime_l (z_l,z^\prime)
= \frac{\delta P(z^\prime)}{\delta \phi_l(z_l)}
.
\end{displaymath} (396)

If the density $P$ is also decomposed into components $P_l$ determined by the components $\phi_l$,
\begin{displaymath}
P(z) = \sum_l P_l(\phi_l(z_l)),
\end{displaymath} (397)

the derivative (396) becomes
\begin{displaymath}
{\bf P}^\prime_l (z_l,z_l^\prime)
= \frac{\delta P_l(z_l^\prime)}{\delta \phi_l(z_l)}
,
\end{displaymath} (398)

so that, specifying an additive prior
\begin{displaymath}
\frac{1}{2} \sum_{kl}(\,\phi_k -t_k,\, {{\bf K}_{kl}}\,(\phi_l-t_l)\,)
,
\end{displaymath} (399)

the stationarity conditions are coupled equations for the component functions $\phi_l$ which, because ${\bf P}$ is diagonal, contain only integrations over $z_l$-variables
\begin{displaymath}
0=
\frac{\delta P_l}{\delta \phi_l}
{\bf P}^{-1} N
-\sum_k {{\bf K}_{lk}} (\phi_k-t_k)
-\frac{\delta P_l}{\delta \phi_l} \Lambda_X.
\end{displaymath} (400)
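
The prior contribution to these stationarity conditions can be checked by differentiating the additive prior (399) with respect to a single component $\phi_l$. The following short derivation assumes real symmetric block operators, ${\bf K}_{kl}^T$ = ${\bf K}_{lk}$, which is an assumption made here and not stated explicitly above; the summation index $m$ is introduced only for this derivation.
\begin{displaymath}
\frac{\delta}{\delta \phi_l}\,
\frac{1}{2} \sum_{km}(\,\phi_k -t_k,\, {\bf K}_{km}\,(\phi_m-t_m)\,)
= \frac{1}{2} \sum_{m} {\bf K}_{lm}\,(\phi_m-t_m)
+ \frac{1}{2} \sum_{k} {\bf K}_{kl}^T\,(\phi_k-t_k)
= \sum_{k} {\bf K}_{lk}\,(\phi_k-t_k) .
\end{displaymath}

The component $\phi_l$ is thus coupled to all other components $\phi_k$ for which the block ${\bf K}_{lk}$ does not vanish.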

For the parameterized approach (392) one finds

\begin{displaymath}
0 = \Phi_l^\prime {\bf P}^\prime_l {\bf P}^{-1} N
-\Phi_l^\prime {{\bf K}} \phi
-\Phi_l^\prime {\bf P}^\prime_l \Lambda_X
,
\end{displaymath} (401)

with
\begin{displaymath}
\Phi_l^\prime (k,z_l)
= \frac{\partial \phi_l(z_l)}{\partial \xi_k}
.
\end{displaymath} (402)

For the ansatz (393) $\Phi_l^\prime (k,z)$ would be restricted to a subset of $\xi_k$.
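
The role of the Jacobian $\Phi^\prime$ can be illustrated numerically on a grid. The sketch below treats only a quadratic prior term with zero template, $\frac{1}{2}(\phi,{\bf K}\phi)$, and omits the likelihood and normalization terms; the grid, the operator `K`, the two-parameter function `phi`, and all other names are hypothetical choices made for this example.

```python
import numpy as np

z = np.linspace(0.0, 1.0, 5)                 # discretized z variable
K = np.eye(5) + 0.1 * np.ones((5, 5))        # symmetric positive definite

def phi(xi):
    """Hypothetical parameterized function phi(xi, z) on the grid."""
    return xi[0] * z + np.sin(xi[1] * z)

def jacobian(xi):
    """Phi'[k, i] = d phi(z_i) / d xi_k."""
    return np.stack([z, z * np.cos(xi[1] * z)])

def energy(xi):
    """Quadratic prior energy 0.5 * (phi, K phi)."""
    f = phi(xi)
    return 0.5 * f @ K @ f

def energy_gradient(xi):
    """Chain rule: dE/dxi = Phi' K phi."""
    return jacobian(xi) @ (K @ phi(xi))

xi = np.array([0.7, 1.3])
eps = 1e-6
# Central finite differences as an independent check of the chain rule.
numeric = np.array([
    (energy(xi + eps * e) - energy(xi - eps * e)) / (2.0 * eps)
    for e in np.eye(2)
])
analytic = energy_gradient(xi)
```

Multiplying the functional gradient ${\bf K}\phi$ by `jacobian(xi)` turns a condition on every grid point into one condition per parameter, which is why each term of the parameterized stationarity equation carries the factor $\Phi_l^\prime$.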


Joerg_Lemm 2001-01-21