Example 02 – Simple model fits
Aims
Create a
LikelihoodCalculator
with the response matrix and experiment dataCalculate likelihoods of model predictions
Create a
HypothesisTester
and calculate p-valuesFit parameters and calculate p-values of composite hypotheses
Construct confidence intervals of parameters of composite hypotheses
Instructions
The calculation of likelihoods and p-values is handled by the classes in the
likelihood
module:
import numpy as np
from remu import binning
from remu import plotting
from remu import likelihood
Some calculations handled in this module can be parallelized by setting the
mapper
function to something that uses parallel processes or threads, e.g.
the map function of a multiprocess.Pool
object:
from multiprocess import Pool
pool = Pool(8)
likelihood.mapper = pool.map
This is completely optional, but can speed up the calculation of p-values
considerably. Please not the use of the multiprocess
package, instead of
Python’s native multiprocessing
. The latter does not support the pickling
of arbitrary functions, so it does not work.
First we will create DataModel
and ResponseMatrixPredictor
objects from the information of the previous examples:
response_matrix = "../01/response_matrix.npz"
with open("../01/reco-binning.yml", 'rt') as f:
reco_binning = binning.yaml.full_load(f)
with open("../01/optimised-truth-binning.yml", 'rt') as f:
truth_binning = binning.yaml.full_load(f)
reco_binning.fill_from_csv_file("../00/real_data.txt")
data = reco_binning.get_entries_as_ndarray()
data_model = likelihood.PoissonData(data)
matrix_predictor = likelihood.ResponseMatrixPredictor(response_matrix)
The data model knows how to compare event rate predictions to the data and
calculate the respective likelihoods, in this case using Poisson statistics.
The matrix predictor contains the information of the previously built response
matrix and is used to predict reco-space event rates from truth-space event
rates. These two can now be combined into a LikelihoodCalculator
and
HypothesisTester
:
calc = likelihood.LikelihoodCalculator(data_model, matrix_predictor)
test = likelihood.HypothesisTester(calc)
Likelihood calculators are in charge of computing likelihoods of parameter sets. In this case, it will calculate the likelihoods of truth-space event-rates, as that is what the predictor is expecting as parameters. Hypothesis testers use the likelihood calculator to do statistical tests and calculate p-values.
Now we need some models to test against the data. We will use the models A and B of the previous steps, but we will turn them into area-normalised templates:
truth_binning.fill_from_csv_file("../00/modelA_truth.txt")
modelA = truth_binning.get_values_as_ndarray()
modelA /= np.sum(modelA)
truth_binning.reset()
truth_binning.fill_from_csv_file("../00/modelB_truth.txt")
modelB = truth_binning.get_values_as_ndarray()
modelB /= np.sum(modelB)
Let us calculate some likelihoods and p-values with these templates, assuming total of 1000 expected events in the truth space (i.e. before efficiency effects):
print(calc(modelA*1000))
print(test.likelihood_p_value(modelA*1000))
print(calc(modelB*1000))
print(test.likelihood_p_value(modelB*1000))
-27.800234247380704
0.1968
-24.480283069694824
0.6804
The exact results may vary due to statistical fluctuations. Especially the p-values are calculated by generating random data sets assuming the tested model is true. The fraction of data sets with a worse likelihood than the actual measured one is the p-value. Depending on the required confidence level, we could exclude the “1000*A” hypothesis, while the “1000*B” hypothesis is more compatible with the data.
Models that predict the true distribution of events usually have some free parameters. For example we could assume that the shapes of models A and B are well motivated but the total number of events is not well predicted. To test these more flexible models, we can create predictors that take the free parameters of the models as inputs and predict event rates in truth space. By composing (i.e. “chaining”) these predictors to the matrix predictor, we can then build a likelihood calculator that takes these parameters as inputs directly:
modelA_shape = likelihood.TemplatePredictor([modelA])
modelA_reco_shape = matrix_predictor.compose(modelA_shape)
calcA = likelihood.LikelihoodCalculator(data_model, modelA_reco_shape)
This example uses the TemplatePredictor
class, which takes a list of
templates as its initialisation parameter and creates a predictor with one
template weight parameter per template. Since we only provide one template
here, it only takes one parameter.
We can now do a maximum likelihood fit with the model, using a
BasinHoppingMaximiser
:
maxi = likelihood.BasinHoppingMaximizer()
retA = maxi(calcA)
print(retA)
fun: 25.257278617610154
log_likelihood: -25.257278617610154
lowest_optimization_result: fun: 25.257278617610154
hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>
jac: array([6.75015063e-06])
message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
nfev: 20
nit: 4
njev: 10
status: 0
success: True
x: array([900.29272504])
message: ['requested number of basinhopping iterations completed successfully']
minimization_failures: 0
nfev: 140
nit: 10
njev: 70
success: True
x: array([900.29272504])
The parameter values of the maximum likelihood solution are returned as
ret.x
. The actual maximum log likelihood is stored in
ret.log_likelihood
. The other properties of the returned object show the
status of the optimisation and are not important for this example.
Instead of composing the predictors and building a new likelihood calculator with the result, it is also possible to directly compose the model predictor with the likelihood calculator:
modelB_shape = likelihood.TemplatePredictor([modelB])
calcB = calc.compose(modelB_shape)
retB = maxi(calcB)
print(retB)
fun: 24.439068081734
log_likelihood: -24.439068081734
lowest_optimization_result: fun: 24.439068081734
hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>
jac: array([7.10542172e-07])
message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
nfev: 18
nit: 4
njev: 9
status: 0
success: True
x: array([986.50497796])
message: ['requested number of basinhopping iterations completed successfully']
minimization_failures: 0
nfev: 104
nit: 10
njev: 52
success: True
x: array([986.50497796])
The maximum likelihood solutions for model A shows a lower number of events
than that of model B. This is due to the higher average efficiency of
reconstructing the events of model A, i.e. their distribution in y
. The
maximum log likelihood of model B is higher than for model A. So model B is
able to describe the given data better than model A. This is also reflected in
the p-values:
testA = likelihood.HypothesisTester(calcA, maximizer=maxi)
testB = likelihood.HypothesisTester(calcB, maximizer=maxi)
print(testA.max_likelihood_p_value())
print(testB.max_likelihood_p_value())
0.428
0.54
Here we explicitly told the hypothesis testers which maximiser to use, but this is optional.
Again the p-value is calculated from randomly generated data sets assuming the given model is true. This time it is the ratio of data sets that yield a worse maximum likelihood though. This means a fit is performed for each mock data set.
We can also take a qualitative look at the fit of data and the two models by plotting the result in reco space:
pltr = plotting.get_plotter(reco_binning)
pltr.plot_entries(label='data', hatch=None)
modelA_reco, modelA_weights = modelA_reco_shape(retA.x)
modelA_logL = calcA(retA.x)
modelA_p = testA.likelihood_p_value(retA.x)
modelB_reco, modelB_weights = calcB.predictor(retB.x)
modelB_logL = calcB(retB.x)
modelB_p = testB.likelihood_p_value(retB.x)
pltr.plot_array(modelA_reco,
label='model A: $\log L=%.1f$, $p=%.3f$'%(modelA_logL, modelA_p),
hatch=None, linestyle='dashed')
pltr.plot_array(modelB_reco,
label='model B: $\log L=%.1f$, $p=%.3f$'%(modelB_logL, modelB_p),
hatch=None, linestyle='dotted')
pltr.legend(loc='lower center')
pltr.savefig("reco-comparison.png")
The p-value shown in this plot is again the likelihood_p_value()
(as
opposed to the max_likelihood_p_value()
). This is a better
representation of the goodness of fit of the maximum likelihood solution,
roughly equivalent to checking for a “chi-square” close to the number of bins.
The return values of the predictors are formated to accomodate multiple weighted predictions per set of parameters, i.e. systematic incertainties. We can ignore the weights for now.
Usually there is more than one template to be fitted to the data. Let’s see what happens if we allow combinations of model A and B:
mix_model = likelihood.TemplatePredictor([modelA, modelB])
calc_mix = calc.compose(mix_model)
ret = maxi.maximize_log_likelihood(calc_mix)
print(ret)
fun: 23.818800345296474
log_likelihood: -23.818800345296474
lowest_optimization_result: fun: 23.818800345296474
hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
jac: array([0.00000000e+00, 2.84216869e-06])
message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
nfev: 87
nit: 4
njev: 29
status: 0
success: True
x: array([351.46741373, 601.38025556])
message: ['requested number of basinhopping iterations completed successfully']
minimization_failures: 0
nfev: 273
nit: 10
njev: 91
success: True
x: array([351.46741373, 601.38025556])
test = likelihood.HypothesisTester(calc_mix)
print(test.max_likelihood_p_value())
0.664
The two parameters of this new combined model are the weights of model A and B respectively. This allows a contribution of model A in the maximum likelihood solution.
It might be useful to calculate a confidence interval for a parameter embedded in a larger hypothesis with more parameters. This can be done by fixing that parameter at different values (reducing the number of free parameters) and calculating the likelihood ratio of this new embedded hypothesis and the embedding original hypothesis. Comparing this likelihood ratio with the expected distribution of likelihood ratios assuming the embedded hypothesis is true yields p-values that can be used to construct the confidence interval:
p_values = []
A_values = np.linspace(0, 1000, 11)
for A in A_values:
p = test.max_likelihood_ratio_p_value((A,None))
print(A, p)
p_values.append(p)
Calculating these might take a while. The method
max_likelihood_ratio_p_value()
fixes the specified parameters and
generates toy data sets at the best fit point of the remaining parameters. It
then computes the maximum likelihoods for both the fixed and unfixed version of
the predictors for all toy data sets. The fraction of toy data sets with an
equal or worse maximum likelihood ratio then the real data is the p-value.
This p-value is sometimes called the “profile plug-in p-value”, as one “plugs in” the maximum likelihood estimate of the hypothesis’ (nuisance) parameters to generate the toy data and calculate the p-value. It’s coverage properties are not exact, so care has to be taken to make sure it performs as expected (e.g. by testing it with simulated data).
In the limit of “large statistics”, the maximum log likelihood ratio should be distributed like a chi-square distribution, according to Wilks’ theorem. This can be used to speed up the calculation of p-values considerably, as it skips the generation of fit to toy data sets:
wilks_p_values = []
fine_A_values = np.linspace(0, 1000, 100)
for A in fine_A_values:
p = test.wilks_max_likelihood_ratio_p_value((A,None))
print(A, p)
wilks_p_values.append(p)
This can then be plotted with your usual plotting libraries:
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
ax.set_xlabel("Model A weight")
ax.set_ylabel("p-value")
ax.plot(A_values, p_values, label="Profile plug-in")
ax.plot(fine_A_values, wilks_p_values, label="Wilks")
ax.axvline(ret.x[0], color='k', linestyle='solid')
ax.axhline(0.32, color='k', linestyle='dashed')
ax.axhline(0.05, color='k', linestyle='dashed')
ax.legend(loc='best')
fig.savefig("p-values.png")
The confidence interval is the region of the parameter space with a p-value over the desired test significance. The maximum likelihood solution is shown as vertical line. Please note that this assumes that the full hypothesis with no fixed parameters is true, i.e. that some combination of model A and model B templates actually describes reality.