Example 04 – Markov Chain Monte Carlo


ReMU has built-in support for generating Markov Chain Monte Carlo (MCMC) samples using the PyMC package. This example only gives a basic introduction to using PyMC in conjunction with ReMU. For a detailed description of PyMC's functionality, please refer to its documentation:

https://pymcmc.readthedocs.io/en/latest/

Before we can use the MCMC, we have to create the response matrix and the LikelihoodMachine, just like in the previous examples:

from six import print_
import numpy as np
from matplotlib import pyplot as plt
from remu import binning
from remu import likelihood
import pymc

with open("../01/reco-binning.yml", 'rt') as f:
    reco_binning = binning.yaml.load(f)
with open("../01/optimised-truth-binning.yml", 'rt') as f:
    truth_binning = binning.yaml.load(f)

reco_binning.fill_from_csv_file("../00/real_data.txt")
data = reco_binning.get_entries_as_ndarray()

response_matrix = "../03/response_matrix.npz"
lm = likelihood.LikelihoodMachine(data, response_matrix, limit_method='prohibit')

Because the MCMC is a Bayesian method, we need to define prior probabilities for the parameters of the CompositeHypothesis we want to test. Explaining how to choose an "objective" prior (or defining what that even is) is beyond the scope of this example. Let us just use a simple flat prior for the template weights. Because we already know that the total number of events is somewhere around 1000, we will constrain the prior to the interval (0, 2000):

truth_binning.fill_from_csv_file("../00/modelA_truth.txt")
modelA = truth_binning.get_values_as_ndarray()
modelA /= np.sum(modelA)
truth_binning.reset()
truth_binning.fill_from_csv_file("../00/modelB_truth.txt")
modelB = truth_binning.get_values_as_ndarray()
modelB /= np.sum(modelB)

def flat_prior(value=100):
    # Return the log of the prior probability:
    # flat (log prob 0) inside the allowed interval, -inf (prob 0) outside
    if value >= 0 and value <= 2000:
        return 0
    else:
        return -np.inf

modelA_shape = likelihood.TemplateHypothesis([modelA], parameter_priors=[flat_prior],
    parameter_names=["template_weight"])
modelB_shape = likelihood.TemplateHypothesis([modelB], parameter_priors=[flat_prior],
    parameter_names=["template_weight"])

The function defining the prior must take one keyword argument, value, and return the log of the prior probability. It must define a default value for value, which must lie inside the allowed region of the parameter space; this default is used as the starting point for the MCMC evaluation. Since we are creating TemplateHypothesis objects with only one template each, the parameter is a single floating point number: the template weight. If we used multiple templates, the parameter would be a 1D array with all template weights.
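
For illustration, a prior for a hypothesis with multiple templates could look like the following. This is only a hedged sketch (flat_vector_prior is not part of the example files): the single parameter is now a 1D array of template weights, and the default must again lie inside the allowed region:

def flat_vector_prior(value=np.array([100., 100.])):
    # Log prior for a vector of template weights: flat as long as all
    # weights are non-negative and the total stays below 2000
    if np.all(value >= 0) and np.sum(value) <= 2000:
        return 0.
    else:
        return -np.inf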

With these CompositeHypothesis objects we can now create an MCMC object and sample the posterior distribution of the parameter (see the PyMC documentation):

mcmcA = lm.mcmc(modelA_shape)

mcmcA.sample(iter=1000)
pymc.Matplot.plot(mcmcA, suffix='_noburn')
../../_images/template_weight_noburn.png

One can clearly see the effect of the starting point in the top left trace of the sample. Depending on how long it takes the MCMC to converge to the true posterior distribution, this kind of effect can seriously bias your results. To remove the biased part from the trace, we can use the burn argument. It tells the sampler to simply throw away the specified number of samples at the beginning of the trace:

mcmcA.sample(iter=1000, burn=100)
pymc.Matplot.plot(mcmcA, suffix='_burn')
../../_images/template_weight_burn.png

The bottom left plot still shows considerable auto-correlation in the trace, meaning it is not a true random sample of the posterior distribution. This can be "fixed" with the thin argument. It tells the sampler to only store every Nth sample of the trace. Of course, this means we have to generate correspondingly more samples:

mcmcA.sample(iter=1000*10, burn=100, thin=10)
pymc.Matplot.plot(mcmcA, suffix='_A')
../../_images/template_weight_A.png

The histogram on the right now shows a good sample of the posterior distribution of the template weight. The solid vertical line shows the median of the sample. The dotted lines show the Highest Posterior Density (HPD) 95% credible interval.

The MCMC methods will return not only the template_weight parameter (which we have named in the creation of the CompositeHypothesis), but also a trace for a variable called toy_index:

../../_images/toy_index_A.png

This variable is created automatically when calling mcmc(). For the MCMC, the choice of which of the toy response matrices to use is just another random parameter. So if the data is able to constrain the detector response, this constraint will emerge naturally during the sampling.
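
As a quick check of this, one can inspect the toy_index trace directly. The following is a hedged sketch (not part of the original example files) that plots a histogram of the sampled indices; a roughly flat distribution means the data does not constrain the detector response toys:

toysA_check = mcmcA.trace('toy_index')[:]
fig, ax = plt.subplots()
# A flat histogram indicates the toys are sampled (almost) uniformly,
# i.e. the data does not strongly prefer any detector response variation
ax.hist(toysA_check, bins=20)
ax.set_xlabel("toy_index")
fig.savefig("toy_index_hist.png")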

With the above settings we can do the same for model B:

mcmcB = lm.mcmc(modelB_shape)

mcmcB.sample(iter=1000*10, burn=100, thin=10)
pymc.Matplot.plot(mcmcB, suffix='_B')
../../_images/template_weight_B.png

The credible intervals and more can also be requested from the MCMC object directly:

print_(mcmcA.stats())
print_(mcmcB.stats())
{'toy_index': {'95% HPD interval': array([ 6., 98.]), 'n': 2890, 'quantiles': {2.5: 5.0, 25: 21.0, 50: 47.0, 75: 68.0, 97.5: 98.0}, 'standard deviation': 27.457854834276297, 'mc error': 0.9383861111623771, 'mean': 46.569204152249135}, 'template_weight': {'95% HPD interval': array([ 817.39586919, 1048.72264546]), 'n': 2890, 'quantiles': {2.5: 807.4967925851467, 25: 887.7430157657353, 50: 925.0862201449331, 75: 960.83055487454, 97.5: 1046.377713651184}, 'standard deviation': 68.71622360903692, 'mc error': 4.213118594059102, 'mean': 922.9383071702083}}
{'toy_index': {'95% HPD interval': array([ 6., 98.]), 'n': 990, 'quantiles': {2.5: 2.0, 25: 23.0, 50: 47.0, 75: 72.0, 97.5: 98.0}, 'standard deviation': 28.162040125498738, 'mc error': 0.9389776683864186, 'mean': 47.5010101010101}, 'template_weight': {'95% HPD interval': array([ 908.25504415, 1152.34693071]), 'n': 990, 'quantiles': {2.5: 909.6416808930901, 25: 985.5794604878171, 50: 1026.6923545918248, 75: 1070.1641984464632, 97.5: 1165.1074864018794}, 'standard deviation': 63.52466760250271, 'mc error': 2.5950428337944085, 'mean': 1028.6273729632112}}

If the information you require is not part of the standard output of stats(), you can of course access the trace directly and do your own analyses:

traceA = mcmcA.trace('template_weight')[:]
traceB = mcmcB.trace('template_weight')[:]
print_(np.percentile(traceA, [2.5, 16., 50, 84, 97.5]))
print_(np.percentile(traceB, [2.5, 16., 50, 84, 97.5]))
[ 827.86829661  873.71701564  928.41368425  982.15358462 1039.65978533]
[ 909.9407182   966.49689809 1026.591219   1090.97850769 1163.12892536]

Compare these central 1 and 2 sigma credible intervals with the confidence intervals of the previous examples.

To understand how the data favours one model over the other, we can calculate the (log of the) Bayes factor:

toysA = mcmcA.trace('toy_index')[:]
toysB = mcmcB.trace('toy_index')[:]
tA = traceA[:,np.newaxis]
iA = toysA[:,np.newaxis]
tB = traceB[:,np.newaxis]
iB = toysB[:,np.newaxis]

print_(lm.marginal_log_likelihood(modelA_shape, tA, iA)
    - lm.marginal_log_likelihood(modelB_shape, tB, iB))
-1.6202502816378512

Model B is strongly favoured. This Bayes factor used the posterior average likelihood. Usually the prior average likelihood is used, to avoid "using the data twice". We can do this by creating an MCMC object that ignores the data and sampling from it:

mcmcA_prior = lm.mcmc(modelA_shape, prior_only=True)
mcmcA_prior.sample(iter=1000*10, burn=100, thin=10)
mcmcB_prior = lm.mcmc(modelB_shape, prior_only=True)
mcmcB_prior.sample(iter=1000*10, burn=100, thin=10)
traceA_prior = mcmcA_prior.trace('template_weight')[:]
traceB_prior = mcmcB_prior.trace('template_weight')[:]
toysA_prior = mcmcA_prior.trace('toy_index')[:]
toysB_prior = mcmcB_prior.trace('toy_index')[:]
tAp = traceA_prior[:,np.newaxis]
iAp = toysA_prior[:,np.newaxis]
tBp = traceB_prior[:,np.newaxis]
iBp = toysB_prior[:,np.newaxis]

print_(lm.marginal_log_likelihood(modelA_shape, tAp, iAp)
    - lm.marginal_log_likelihood(modelB_shape, tBp, iBp))
-1.553604449794868

Model B is still strongly favoured.

Another possible value to check is the Posterior distribution of the Likelihood Ratio (PLR):

ratios, preference = lm.plr(modelA_shape, tA, iA, modelB_shape, tB, iB)
print_(preference)
0.897689011325

The preference measures the probability of the likelihood ratio preferring model B over model A, given the sampled distribution of the model parameters and the data. It shows a clear preference for model B.
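
If you want to look beyond the single preference number, you can also inspect the sampled ratios themselves. This is only a rough sketch, assuming ratios is an array of likelihood-ratio samples as returned by plr(); check the LikelihoodMachine documentation for the exact convention:

fig, ax = plt.subplots()
# Histogram of the sampled likelihood ratios between the two models
ax.hist(np.asarray(ratios).flatten(), bins=20)
ax.set_xlabel("sampled likelihood ratio")
fig.savefig("plr_hist.png")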

As done in previous examples, let us now consider a model where both model A and model B can contribute with different weights:

mixed_model = likelihood.TemplateHypothesis([modelA, modelB])

This kind of model already makes the choice of prior non-obvious. Should it be flat in each weight? Should it be flat in the sum of weights? What if we re-parametrise the model with the sum and difference of the models as templates? As mentioned before, there is no universally agreed, one-size-fits-all best choice for the prior. Aside from some obviously bad choices, the prior should not matter that much when the data is actually able to constrain your model well.

At least the problem of parametrisation dependence can be solved by using a Jeffreys Prior. It yields identical posterior distributions, no matter how you parametrise your model. ReMU implements it in the JeffreysPrior class:

prior = likelihood.JeffreysPrior(
    response_matrix = response_matrix,
    translation_function = mixed_model.translate,
    parameter_limits = [(0,None), (0,None)],
    default_values = [10., 10.],
    total_truth_limit = 2000)

mixed_model.parameter_priors = [prior]
mixed_model.parameter_names = ["weights"]

It needs to know the response matrices in use, the function that translates the model parameters into truth bin values, the limits of those parameters, and their default values (which are used as starting points for the MCMC). Optionally, one can provide a limit on the total number of true events. All combinations of parameters that predict more events than that limit get a prior probability of 0. This can be used to ensure that the prior is "proper", i.e. that it can be normalised to 1. Note that the vector of template weights is treated as a single parameter by the TemplateHypothesis. This is why we can use a single JeffreysPrior for it.
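
To get a feeling for what the translation function does, one can call it directly. This is just an illustration with an arbitrary example parameter vector (not part of the original files): it maps the two template weights onto the predicted truth bin values:

# Arbitrary example weights for model A and model B
example_weights = np.array([500., 500.])
print_(mixed_model.translate(example_weights))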

We can take a look at what the prior distribution actually looks like by creating an MCMC object that ignores the data and sampling from it:

mcmc = lm.mcmc(mixed_model, prior_only=True)
mcmc.sample(iter=25000, burn=1000, tune_throughout=True, thin=25)
pymc.Matplot.plot(mcmc, suffix='_prior')
../../_images/weights_1_prior.png

Since the variable weights is now a vector of floating point values, each value gets its own plot and is denoted with the suffix _i, where i is the index of the value. The template weight of model A is weights_0, the weight of model B is weights_1.

Now let us see what the posterior looks like when we take the actual data into account:

mcmc = lm.mcmc(mixed_model)
mcmc.sample(iter=250000, burn=10000, tune_throughout=True, thin=250)
pymc.Matplot.plot(mcmc, suffix='_mixed')
../../_images/weights_1_mixed.png

Compare the 95% credible interval of weights_0 with the confidence interval we constructed in a previous example.
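
For a direct numerical comparison, one can again compute an interval from the trace. This is a short sketch (not part of the original example), using the same approach as for the single-template models above:

# Central 95% interval of the model A weight in the mixed model
weights_mixed = mcmc.trace('weights')[:]
print_(np.percentile(weights_mixed[:, 0], [2.5, 97.5]))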

PyMC will give you these 1D histograms for all parameters in your models. With only two parameters, though, we can still plot everything in a single 2D histogram:

weights = mcmc.trace('weights')[:]
fig, ax = plt.subplots()
ax.hist2d(weights[:,0],weights[:,1], bins=20)
ax.set_xlabel("model A weight")
ax.set_ylabel("model B weight")
fig.savefig("posterior.png")
../../_images/posterior.png

The preference for model B is quite clear. The trace contains all information about the posterior distribution of the model parameters, and we can do calculations with it. For example, let us take a look at the posterior probability of the sum of weights:

fig, ax = plt.subplots()
ax.hist(weights.sum(axis=1), bins=20)
ax.set_xlabel("sum of template weights")
fig.savefig("sum_posterior.png")
../../_images/sum_posterior.png

The response matrix (and thus the LikelihoodMachine) is only valid for a certain range of values in the truth bins. If the tested theory predicts more events than were simulated and used to build the matrix, the detector response is not well determined and the results are not reliable. It is thus necessary to test whether the posterior distribution of the truth bin values, as predicted by the models, is far enough away from those limits:

truth_trace = mixed_model.translate(weights)
toys = mcmc.trace('toy_index')[:]
lm.plot_truth_bin_traces("truth_traces.png", truth_trace, plot_limits='relative')
../../_images/truth_traces.png

The box plots show the full range of the sample (vertical lines), the quartiles (boxes), the median (horizontal lines), the 5% and 95% percentiles (error bars), and the mean values (points). Here it is clear that the posterior distributions are well below the limits.

There is a similar method to plot the predicted expectation values in reco space:

lm.plot_reco_bin_traces("reco_traces.png", truth_trace, toy_index=toys, plot_data=True)
../../_images/reco_traces.png

This can be used as a visual check of whether the models can cover the measured data at all.