Maximum likelihood approximation
A maximum likelihood approach selects the potential $v$ with maximal likelihood under the training data. Beginning with a discussion of the parametric approach, we consider a potential $v(x;\xi)$ parameterized by a parameter vector $\xi$ with components $\xi_l$.
To find the parameter vector which maximizes the training likelihood we have to solve the stationarity equation
\begin{align}
0 &= \nabla_\xi\, p(x_T|\xi) \tag{21}\\
  &= p(x_T|\xi)\, \sum_i \frac{\nabla_\xi\, p(x_i|\xi)}{p(x_i|\xi)} , \tag{22}
\end{align}
with $\nabla_\xi$ denoting the gradient operator with components $\partial/\partial\xi_l$, and $p(x_T|\xi) = \prod_i p(x_i|\xi)$ the likelihood of the training data points $x_i$.
Obtaining $p(x_i|\xi)$ from Eq. (20), we see that to solve Eq. (21) we have to calculate the derivatives of the eigenvalues $E_\alpha$ and of the eigenfunctions $\phi_\alpha$ at the data points $x_i$. Those are implicitly defined by the eigenvalue equation (18), $H\,\phi_\alpha = E_\alpha\,\phi_\alpha$, for $H = H(v(\xi))$.
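To make the quantities entering Eq. (21) concrete, the following minimal numerical sketch (our illustration, not part of the derivation) computes the canonical-ensemble likelihood on a grid. It assumes the form $p(x|v) = \sum_\alpha p_\alpha\,|\phi_\alpha(x)|^2$ with Boltzmann weights $p_\alpha \propto e^{-\beta E_\alpha}$ for Eq. (20); the finite-difference discretization, the units $\hbar = m = 1$, and the Gaussian-well parameterization $v(x;\xi) = -\xi_1\, e^{-x^2/\xi_2}$ are illustrative assumptions.
\begin{verbatim}
# Sketch: canonical-ensemble likelihood p(x|v) = sum_a p_a |phi_a(x)|^2
# on a uniform grid; discretization and parameterization are illustrative.
import numpy as np

def hamiltonian(v, dx):
    """Finite-difference H = -(1/2) d^2/dx^2 + v (hbar = m = 1)."""
    n = len(v)
    lap = (np.diag(np.full(n - 1, 1.0), -1) - 2.0 * np.eye(n)
           + np.diag(np.full(n - 1, 1.0), 1)) / dx**2
    return -0.5 * lap + np.diag(v)

def likelihood(idx, v, dx, beta):
    """p(x_i|v) at grid indices idx, weights p_a ~ exp(-beta E_a)."""
    E, phi = np.linalg.eigh(hamiltonian(v, dx))  # E ascending, phi[:, a]
    w = np.exp(-beta * (E - E.min()))            # shift avoids overflow
    dens = (phi**2 / dx) @ (w / w.sum())         # sum_a p_a |phi_a(x)|^2
    return dens[idx]                             # density, integrates to 1

x = np.linspace(-5.0, 5.0, 201); dx = x[1] - x[0]
xi = np.array([2.0, 1.5])                        # xi_1 (depth), xi_2 (width)
v = -xi[0] * np.exp(-x**2 / xi[1])               # v(x; xi), Gaussian well
print(likelihood(np.array([80, 100, 120]), v, dx, beta=1.0))
\end{verbatim}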
To proceed we take the derivative of the eigenvalue equation (18) with respect to $\xi_l$ (writing $\partial_l = \partial/\partial\xi_l$),
\begin{equation}
(\partial_l H)\,\phi_\alpha + H\,\partial_l\phi_\alpha
= (\partial_l E_\alpha)\,\phi_\alpha + E_\alpha\,\partial_l\phi_\alpha .
\tag{23}
\end{equation}
Projecting onto $\phi_\alpha$, using $\langle\phi_\alpha|\phi_\alpha\rangle = 1$ and the hermitian conjugate of Eq. (18), we arrive at
\begin{equation}
\partial_l E_\alpha = \langle\phi_\alpha|\,\partial_l H\,|\phi_\alpha\rangle .
\tag{24}
\end{equation}
Rearranging Eq. (23) yields an equation for the derivatives of the eigenfunctions,
\begin{equation}
(H - E_\alpha)\,\partial_l\phi_\alpha
= (\partial_l E_\alpha - \partial_l H)\,\phi_\alpha .
\tag{25}
\end{equation}
Because all orbitals with energy $E_\alpha$ (which may be more than one if $E_\alpha$ is degenerate) are in the null space of the operator $H - E_\alpha$, Eq. (25) alone does not determine $\partial_l\phi_\alpha$ uniquely.
We also notice that, because the left-hand side of Eq. (25) vanishes if projected onto an eigenfunction $\phi_\gamma$ with $E_\gamma = E_\alpha$, we find for degenerate eigenfunctions $\langle\phi_\gamma|\,\partial_l H\,|\phi_\alpha\rangle = 0$ for $\gamma \ne \alpha$, if we choose $\langle\phi_\gamma|\phi_\alpha\rangle = \delta_{\gamma\alpha}$.
A unique solution for $\partial_l\phi_\alpha$ can be obtained by setting $\langle\phi_\gamma|\,\partial_l\phi_\alpha\rangle = 0$ for all eigenfunctions $\phi_\gamma$ with $E_\gamma = E_\alpha$. This corresponds to fixing normalization and phase of the eigenfunctions and, in the case of degenerate eigenvalues, uses the freedom to work with arbitrary orthonormal linear combinations of the corresponding eigenfunctions.
Because the operator $H - E_\alpha$ is invertible in the space spanned by all eigenfunctions $\phi_\gamma$ with energy $E_\gamma \ne E_\alpha$, this yields, using orthonormal eigenfunctions,
\begin{equation}
\partial_l\phi_\alpha
= \sum_{\gamma:\, E_\gamma \ne E_\alpha}
\phi_\gamma\,
\frac{\langle\phi_\gamma|\,\partial_l H\,|\phi_\alpha\rangle}{E_\alpha - E_\gamma} .
\tag{26}
\end{equation}
For nondegenerate energies the sum becomes $\sum_{\gamma\ne\alpha}$.
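As a consistency check (ours, not from the text), Eqs. (24) and (26) can be compared against finite differences for a nondegenerate level; the grid Hamiltonian and the parameterization $v(x;\xi) = -\xi_1 e^{-x^2/\xi_2}$ are the same illustrative assumptions as above.
\begin{verbatim}
# Check Eq. (24) and Eq. (26) against finite differences (illustrative).
import numpy as np

def hamiltonian(v, dx):
    n = len(v)
    lap = (np.diag(np.full(n - 1, 1.0), -1) - 2.0 * np.eye(n)
           + np.diag(np.full(n - 1, 1.0), 1)) / dx**2
    return -0.5 * lap + np.diag(v)

x = np.linspace(-5.0, 5.0, 201); dx = x[1] - x[0]
v_of = lambda xi: -xi[0] * np.exp(-x**2 / xi[1])   # illustrative v(x; xi)
xi = np.array([2.0, 1.5])
a, l, eps = 0, 0, 1e-5                   # level a (nondegenerate), parameter l
e_l = np.eye(2)[l]

E, phi = np.linalg.eigh(hamiltonian(v_of(xi), dx))
# dH/dxi_l is diagonal, with entries dv(x)/dxi_l (finite difference in xi_l)
dv = (v_of(xi + eps * e_l) - v_of(xi - eps * e_l)) / (2 * eps)

dE = phi[:, a] @ (dv * phi[:, a])        # Eq. (24), Hellmann-Feynman
denom = E[a] - E
denom[a] = np.inf                        # excludes gamma = alpha from the sum
coef = (phi.T @ (dv * phi[:, a])) / denom  # <phi_g|dH|phi_a> / (E_a - E_g)
dphi = phi @ coef                        # Eq. (26), with <phi_a|dphi_a> = 0

# Finite-difference reference, eigenvector phases aligned to phi[:, a]
Ep, pp = np.linalg.eigh(hamiltonian(v_of(xi + eps * e_l), dx))
Em, pm = np.linalg.eigh(hamiltonian(v_of(xi - eps * e_l), dx))
sp = np.sign(pp[:, a] @ phi[:, a]); sm = np.sign(pm[:, a] @ phi[:, a])
print(dE, (Ep[a] - Em[a]) / (2 * eps))   # should agree
print(np.abs(dphi - (sp * pp[:, a] - sm * pm[:, a]) / (2 * eps)).max())
\end{verbatim}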
The stationarity equation (21) can now be solved iteratively: starting from an initial guess $\xi^{(0)}$ for $\xi$, one calculates the $E_\alpha$ and $\phi_\alpha$ of the corresponding potential, obtains $\partial_l E_\alpha$ and $\partial_l\phi_\alpha$ from Eqs. (24,25), and thus $\nabla_\xi\, p(x_T|\xi)$ from Eq. (22). Then a new guess $\xi^{(i+1)}$ is calculated (switching to log-likelihoods),
\begin{equation}
\xi^{(i+1)} = \xi^{(i)} + \eta\, A^{-1}\, \nabla_\xi \ln p(x_T|\xi^{(i)}) ,
\tag{27}
\end{equation}
with some step width $\eta$ and some positive definite operator $A$ (approximating, for example, the Hessian of $-\ln p(x_T|\xi)$). This procedure is iterated until convergence.
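A minimal sketch of the iteration (27) follows, with the illustrative choices $A = $ identity (plain gradient ascent), a fixed step width $\eta$, and, for brevity, a finite-difference gradient of the log-likelihood in place of the analytic expressions (22), (24), (26); the toy data points are also our invention.
\begin{verbatim}
# Sketch of the update rule (27) with A = identity (illustrative choices).
import numpy as np

def hamiltonian(v, dx):
    n = len(v)
    lap = (np.diag(np.full(n - 1, 1.0), -1) - 2.0 * np.eye(n)
           + np.diag(np.full(n - 1, 1.0), 1)) / dx**2
    return -0.5 * lap + np.diag(v)

def log_lik(xi, idx, x, dx, beta):
    v = -xi[0] * np.exp(-x**2 / xi[1])   # illustrative v(x; xi)
    E, phi = np.linalg.eigh(hamiltonian(v, dx))
    w = np.exp(-beta * (E - E.min()))
    dens = (phi**2 / dx) @ (w / w.sum())  # p(x|v) on the grid
    return np.log(dens[idx]).sum()        # ln p(x_T|xi)

x = np.linspace(-5.0, 5.0, 201); dx = x[1] - x[0]; beta = 1.0
idx = np.array([90, 95, 100, 105, 110])  # toy 'observed' grid points
xi, eta, eps = np.array([1.0, 1.0]), 0.05, 1e-5

for it in range(100):                    # iterate Eq. (27) until converged
    grad = np.array([(log_lik(xi + eps * e, idx, x, dx, beta)
                      - log_lik(xi - eps * e, idx, x, dx, beta)) / (2 * eps)
                     for e in np.eye(2)])
    xi = np.maximum(xi + eta * grad, 0.1)  # A = 1; keep toy parameters sane
print(xi, log_lik(xi, idx, x, dx, beta))
\end{verbatim}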
While a parametric approach restricts the space of possible potentials $v$, a nonparametric approach treats each function value $v(x)$ itself as an individual degree of freedom, not restricting the space of potentials. The corresponding nonparametric stationarity equation is obtained analogously to the parametric stationarity equation (21), replacing partial derivatives by the functional derivative operator $\nabla_v = \delta/\delta v$ with components $\delta/\delta v(x)$ [59].
Because the functional derivative of $v$ is simply
\begin{equation}
\frac{\delta v(x')}{\delta v(x)} = \delta(x - x') ,
\tag{28}
\end{equation}
we get, using the same arguments as those leading to Eqs. (24) and (26),
\begin{equation}
\frac{\delta E_\alpha}{\delta v(x)} = |\phi_\alpha(x)|^2 ,
\tag{29}
\end{equation}
and therefore
\begin{equation}
\frac{\delta \phi_\alpha(x')}{\delta v(x)}
= \sum_{\gamma:\, E_\gamma \ne E_\alpha}
\phi_\gamma(x')\,
\frac{\phi_\gamma^*(x)\,\phi_\alpha(x)}{E_\alpha - E_\gamma} .
\tag{30}
\end{equation}
(The partial derivatives with respect to the parameters and the functional derivative with respect to $v$ are related by the chain rule,
\begin{displaymath}
\frac{\partial}{\partial\xi_l}
= \int\!dx\, \frac{\partial v(x)}{\partial\xi_l}\, \frac{\delta}{\delta v(x)} ,
\qquad \mbox{i.e.,} \qquad
\nabla_\xi = V'\,\nabla_v ,
\end{displaymath}
with the operator $V'$ having matrix elements $V'(l,x) = \partial v(x)/\partial\xi_l$.)
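On a grid, the functional derivative $\delta/\delta v(x)$ becomes an ordinary partial derivative with respect to the grid value $v_k$ (up to a factor $dx$ from the discretized integral). The following sketch (again our illustration, with the same assumed toy parameterization) checks the discretized version of Eq. (29) against a finite difference and verifies the chain rule $\nabla_\xi = V'\,\nabla_v$.
\begin{verbatim}
# Discretized functional derivative, Eq. (29), and the chain rule check.
import numpy as np

def hamiltonian(v, dx):
    n = len(v)
    lap = (np.diag(np.full(n - 1, 1.0), -1) - 2.0 * np.eye(n)
           + np.diag(np.full(n - 1, 1.0), 1)) / dx**2
    return -0.5 * lap + np.diag(v)

x = np.linspace(-5.0, 5.0, 201); dx = x[1] - x[0]
xi = np.array([2.0, 1.5])
v = -xi[0] * np.exp(-x**2 / xi[1])       # illustrative v(x; xi)
E, phi = np.linalg.eigh(hamiltonian(v, dx))
a, k, eps = 0, 100, 1e-6                 # level a, grid point x_k

# Eq. (29) on the grid: dE_a/dv_k = phi_a[k]^2
# (= |phi_a(x_k)|^2 dx in continuum normalization; here sum phi^2 = 1)
dE_dv = phi[:, a]**2
vp = v.copy(); vp[k] += eps
vm = v.copy(); vm[k] -= eps
fd = (np.linalg.eigh(hamiltonian(vp, dx))[0][a]
      - np.linalg.eigh(hamiltonian(vm, dx))[0][a]) / (2 * eps)
print(dE_dv[k], fd)                      # should agree

# Chain rule (operator V'): dE_a/dxi_1 = sum_k (dv_k/dxi_1) dE_a/dv_k,
# where xi_1 is the depth parameter xi[0] of the toy potential.
dv_dxi1 = -np.exp(-x**2 / xi[1])         # dv(x)/dxi_1 for this v(x; xi)
print(dv_dxi1 @ dE_dv)                   # = <phi_a| dH/dxi_1 |phi_a>
\end{verbatim}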
The large flexibility of the nonparametric approach allows an optimal adaptation of $v$ to the available training data. However, as is well known in the context of learning, it is this same flexibility which makes a satisfactory generalization to non-training data (e.g., in the future) impossible, leading, for example, to `pathological', $\delta$-functional-like solutions. Nonparametric approaches therefore require additional restrictions in the form of a priori information.
In the next section we will include a priori information in the form of stochastic processes, similar to Bayesian statistics with Gaussian processes [16,37,44,48,60-63] or to classical regularization theory [2,4,16]. In particular, a priori information will be implemented explicitly, by which we mean that it will be expressed directly in terms of the function values themselves. This is a great advantage over parametric methods, where a priori information is implicit in the chosen parameterization and is thus typically difficult or impossible to analyze and not easily adapted to the situation under study. Indeed, because it is only a priori knowledge which relates training to non-training data, its explicit control is essential for any successful learning.