
Maximum posterior approximation

Let us consider prior densities which are the product of a Gaussian prior $p_0$ as in Eq. (34), or, more generally, of a mixture of Gaussian processes as in Eq. (40), and a non-Gaussian energy prior $p_{{}_{\scriptsize U}}$ of the form of Eq. (43). In that case, the stationarity equation we have to solve to maximize the posterior density of Eq. (33) reads

\begin{displaymath}
0 =
\delta_{v(x)} \ln p_0(v)
+\delta_{v(x)} \ln p_{{}_{\scriptsize U}}(v)
+\sum_i \delta_{v(x)} \ln p(x_i\vert\hat x,v)
.
\end{displaymath} (44)

While $\delta_{v(x)} p(x_i\vert\hat x,v)$ has already been calculated in Sect. 3.1.3, we now also need $\delta_{v(x)} p_0(v)$ and $\delta_{v(x)} p_{{}_{\scriptsize U}}(v)$. For a Gaussian $p_0(v)$ the functional derivative is easily found to be
\begin{displaymath}
\delta_{v} \ln p_0
=
\frac{\delta_{v} p_0}{p_0}
= -\lambda{\bf K}_0(v-v_0)
.
\end{displaymath} (45)

Similarly, for a mixture of Gaussian processes it is not difficult to show that
\begin{displaymath}
\delta_{v} \ln p_0
=-\lambda \sum_k^M p_0(k\vert v) {\bf K}_k (v-v_k)
\end{displaymath} (46)

where $p_0(k\vert v)$ = ${p_0(v,k)}/{p_0(v)}$.
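
For illustration, the following Python sketch evaluates the prior gradients (45) and (46) on a discretized potential. The grid Laplacian used as smoothness operator, the uniform mixture weights, and the Gaussian normalization entering $p_0(k\vert v)$ are placeholder assumptions, not the specific choices of this paper.

\begin{verbatim}
import numpy as np

def laplacian_operator(n, dx):
    # Simple example of a smoothness operator K_k: a discretized negative Laplacian.
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / dx**2

def mixture_prior_gradient(v, templates, operators, lam, weights=None):
    # delta_v ln p_0 = -lambda sum_k p_0(k|v) K_k (v - v_k), Eq. (46);
    # the single Gaussian prior of Eq. (45) is recovered for one component.
    M = len(templates)
    if weights is None:
        weights = np.full(M, 1.0 / M)
    # p_0(k|v) = p_0(v,k)/p_0(v); determinant prefactors are assumed equal
    # for all components and therefore cancel.
    log_q = np.array([np.log(w) - 0.5 * lam * (v - vk) @ Kk @ (v - vk)
                      for w, vk, Kk in zip(weights, templates, operators)])
    log_q -= log_q.max()
    p_k = np.exp(log_q) / np.exp(log_q).sum()
    grad = -lam * sum(pk * (Kk @ (v - vk))
                      for pk, vk, Kk in zip(p_k, templates, operators))
    return grad, p_k
\end{verbatim}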

To obtain the functional derivative of the non-Gaussian $p_{{}_{\scriptsize U}}$ we first calculate

\begin{displaymath}
\delta_{v(x)} U =
<\delta_{v(x)} E>
-\beta <E\, \delta_{v(x)} E>
+\beta <E> <\delta_{v(x)} E>
.
\end{displaymath} (47)

As $\delta_{v(x)} E_\alpha$ has already been found in Eq. (29), this yields
\begin{displaymath}
\delta_{v(x)} E_U =
\mu\left(U-\kappa\right)
\left(
<\vert\phi (x)\vert^2>
-\beta \left(
<E\,\vert\phi (x)\vert^2>
-U <\vert\phi (x)\vert^2>
\right)
\right)
.
\end{displaymath} (48)
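
To make Eqs. (47) and (48) explicit on a grid, a minimal sketch follows; it assumes canonical weights $p_\alpha \propto \exp(-\beta E_\alpha)$, reads off from Eq. (48) an energy prior of the quadratic form $E_U = (\mu/2)(U-\kappa)^2$, and uses $\delta_{v(x)} E_\alpha = \vert\phi_\alpha (x)\vert^2$ as in Eq. (29).

\begin{verbatim}
import numpy as np

def delta_E_U(phi, energies, beta, mu, kappa):
    # phi: array (n_states, n_grid) of eigenfunctions; energies: (n_states,).
    w = np.exp(-beta * (energies - energies.min()))
    p = w / w.sum()                              # thermal probabilities p_alpha
    dens = np.abs(phi) ** 2                      # |phi_alpha(x)|^2 = delta_{v(x)} E_alpha
    U = p @ energies                             # average energy U = <E>
    mean_dens = p @ dens                         # <|phi(x)|^2>
    mean_E_dens = (p * energies) @ dens          # <E |phi(x)|^2>
    dU = mean_dens - beta * (mean_E_dens - U * mean_dens)   # Eq. (47)
    return mu * (U - kappa) * dU                 # Eq. (48)
\end{verbatim}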

Collecting all terms, we can now solve the stationarity equation (44) by iteration

\begin{displaymath}
v^{\rm new}
=
v^{\rm old}\! +
\eta {\bf A}^{-1}\left(
\lambda {\bf K}_0 (v_0 - v^{\rm old})
+\sum_i \delta_x\ln p(x_i\vert\hat x,v^{\rm old})
-\delta_x E_U
\right)
,
\end{displaymath} (49)

where we introduced ${\bf K}_0$ = $\sum_k p_0(k\vert v)\,{\bf K}_k$ and $v_0$ = ${\bf K}_0^{-1} \sum_k p_0(k\vert v)\, {\bf K}_k v_k$; furthermore, a step width $\eta$ and a positive definite iteration matrix ${\bf A}$ have to be selected. Choosing ${\bf A}$ as the identity matrix means moving in the direction of the gradient of the posterior. Taking for ${\bf A}$ the Hessian, one obtains the Newton method. Quasi-Newton methods, like the DFP (Davidon-Fletcher-Powell) or BFGS (Broyden-Fletcher-Goldfarb-Shanno) variable metric methods, approximate the Hessian iteratively [68-72]. (For the case of solving for continuous functions see [73].)
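
A minimal sketch of the iteration (49), assuming the gradient contributions discussed above are available as functions of the discretized potential; the fixed positive definite matrix ${\bf A}$ shown here stands in for the identity, Hessian, or quasi-Newton choices just mentioned.

\begin{verbatim}
import numpy as np

def map_iteration(v_init, grad_log_prior, grad_log_likelihood, grad_E_U,
                  A=None, eta=0.1, n_steps=200, tol=1e-8):
    # Preconditioned ascent on the log-posterior, Eq. (49):
    # v_new = v_old + eta A^{-1} (delta ln p_0 + sum_i delta ln p(x_i|..) - delta E_U).
    v = v_init.copy()
    for _ in range(n_steps):
        g = grad_log_prior(v) + grad_log_likelihood(v) - grad_E_U(v)
        step = g if A is None else np.linalg.solve(A, g)  # A = identity: gradient direction
        v_new = v + eta * step
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    return v
\end{verbatim}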

A simple and useful choice in our case is ${\bf A}$ = $\lambda{\bf K}_0$, which approximates the Hessian. For a single Gaussian prior this choice does not depend on $v$ and thus does not have to be recalculated during the iteration. Eq. (49) then becomes

\begin{displaymath}
v^{\rm new}
=
(1-\eta)\, v^{\rm old}\! +
\eta \left(
v_0\!
+(\lambda {\bf K}_0)^{-1}\Big(
\sum_i \delta_x\ln p(x_i\vert\hat x,v^{\rm old})
-\delta_x E_U
\Big)
\right)
.
\end{displaymath} (50)
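
With this choice, a single step of Eq. (50) reduces to a convex combination of the old potential with a template-plus-correction term; a minimal sketch (same placeholder gradient functions as above):

\begin{verbatim}
import numpy as np

def map_update_K0(v_old, v0, K0, lam, grad_log_likelihood, grad_E_U, eta=0.5):
    # One step of Eq. (50):
    # v_new = (1-eta) v_old + eta (v0 + (lam K0)^{-1}(sum_i delta ln p - delta E_U)).
    correction = np.linalg.solve(lam * K0,
                                 grad_log_likelihood(v_old) - grad_E_U(v_old))
    return (1.0 - eta) * v_old + eta * (v0 + correction)
\end{verbatim}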

Due to the nonparametric approach for the potential, combined with a priori information implemented as a stochastic process, the Bayesian approach formulated in the previous sections is clearly computationally demanding. The situation for inverse quantum theory is worse than, e.g., for Gaussian process priors in regression problems (i.e., for a Gaussian likelihood, local in the regression function), where it is only necessary to work with matrices having a dimension equal to the number of training data [16,48]. In our case, where the likelihood is nonlocal in the potential and non-Gaussian prior terms may also occur, the stationarity equation has to be solved by discretizing the problem.
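
As an illustration of the discretized setting, the following sketch builds a one-dimensional Hamiltonian on a grid (three-point Laplacian, with $\hbar$ = $m$ = 1 assumed) and returns the eigenvalues $E_\alpha$ and grid-normalized eigenfunctions $\phi_\alpha$ needed by the likelihood and energy terms above.

\begin{verbatim}
import numpy as np

def discrete_hamiltonian(v, dx):
    # H = -(1/2) d^2/dx^2 + v(x), three-point finite differences (hbar = m = 1 assumed).
    n = len(v)
    lap = (-2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / dx**2
    return -0.5 * lap + np.diag(v)

def spectrum(v, dx):
    # Eigenvalues E_alpha and eigenfunctions phi_alpha, normalized so sum |phi|^2 dx = 1.
    E, phi = np.linalg.eigh(discrete_hamiltonian(v, dx))
    return E, phi.T / np.sqrt(dx)
\end{verbatim}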

The following section demonstrates that at least one-dimensional problems can be solved numerically without further approximation. Higher dimensional problems, however, e.g., for many-body systems, require additional approximations. In such higher dimensional situations, the potential may be parameterized (without necessarily discarding the prior terms), or the problem has to be divided into lower dimensional subproblems, e.g., by restricting $v$ to certain (typically additive or multiplicative) combinations of lower dimensional functions (similar, for example, to additive models [19], projection pursuit [20], or neural-network-like [22] approaches).
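
As a toy illustration of such a dimensional reduction (not the specific construction used in this paper), an additive ansatz represents a two-dimensional potential by two one-dimensional functions:

\begin{verbatim}
import numpy as np

def additive_potential(v1, v2):
    # Additive ansatz v(x1, x2) = v1(x1) + v2(x2), built by broadcasting two 1D grids.
    return v1[:, None] + v2[None, :]
\end{verbatim}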

