Mathematical models allow us to make predictions. To predict a future event, such as the result of a measurement, we must first derive a relationship between the model and measurand. The problem of deducing predictions from a mathematical model is called the forward problem. In contrast, an inverse problem consists of using experimental data to infer the values of the parameters that characterise our mathematical model.

As an example, suppose that you are in a spacecraft orbiting the Moon. As you orbit, you proceed to make measurements of the Moon’s gravitational ﬁeld.

Now, if you knew the distribution of mass inside the moon, then you could predict the values of your measurements by using Newtonian physics, that is, by solving the forward problem. However, you have been assigned the inverse problem: estimating the distribution of mass inside the moon based on your measurements. Since there are many different mass distributions that would yield exactly the same gravitational field (hence, the same set of measurements), the problem has multiple solutions (in fact, infinitely many solutions).

In order to constrain the number of possible solutions, you might attempt to incorporate any relevant a priori information you have available. For example, you would want to only consider solutions in which the density at any point in the Moon is nonnegative. You may also try to account for any uncertainties associated with the measurement process, so they can be accounted for when quantifying the uncertainty in your inferences. But how exactly would you do all this in a coherent and systematic manner?

A simple, yet powerful method, which is applicable to any kind of inverse problem, can be obtained by taking a probabilistic viewpoint. In this approach, we represent our a priori information by a probability distribution.

Informally, we may view the shape of this distribution as quantifying our current state of belief that a given possible solution is true. If we believed that all solutions were equally likely, then we would represent this by a uniform distribution; if we believed a certain range of solutions we more likely than others, then this range of solutions would have a higher concentration of probability mass. To account for the uncertainties in our measurement process, we must consider both the likelihood that our measurement device makes an error and how these errors distributed. If you think about it, this too can be represented by a probability distribution.

Once we have collected our data, we are able to update our prior beliefs in light of the current observations by an application of Bayes’ rule. Note that our updated beliefs do not correspond to a single solution, but a range of possible solutions, represented (again) as a probability distribution. You might exclaim that we are back to where we started. Not quite. Because our updated beliefs have incorporated all the information at hand, the resulting distribution will have a relatively higher probability mass concentrated around the ‘true’ solution. So while we may not have obtained an exact solution, we have effectively converged on a those which are most probable given our observations and prior knowledge.