Example 03 – Detector uncertainties¶
Aims¶
- Generate random detector response variations
- Include detector uncertainties in the response matrix by using a
ResponseMatrixArrayBuilder
- Include statistical uncertainties by generating randomly varied matrices
- Use array of response matrices in
LikelihoodMachine
Instructions¶
The properties of experimental setups are usually only known to a finite precision. The remaining uncertainty on these properties, and thus the uncertainty on the detector response to true events, must be reflected in the response matrix.
We can create an event-by-event variation of the detector response with the provided script:
$ ../simple_experiment/vary_detector.py ../00/modelA_data.txt modelA_data.txt
$ ../simple_experiment/vary_detector.py ../00/modelB_data.txt modelB_data.txt
It creates 100 randomly generated variations of the simulated detector and varies the simulation that was created in example 00 accordingly. Each variation describes one possible real detector and is often called a “toy variation” or a “universe”.
Events are varied in two ways: The values of reconstructed variables can be different for each toy, and each event can get assigned a weight. The former can be used to cover uncertainties in the detector precision and accuracy, while the latter is a simple way to deal with uncertainties in the overall probability of reconstructing an event.
In this case, the values of the reconstructed reco_x
are varied and saved
as reco_x_0
to reco_x_99
. The detection efficiency is also varied and
the ratio of the events nominal efficiency and the efficiency assuming the toy
detector stored as weight_0
up to weight_99
. Let us plot the different
reconstructed distributions we get from the toy variations:
from remu import binning
import numpy as np
with open("../01/reco-binning.yml", 'r') as f:
reco_binning = binning.yaml.load(f)
# Get real data
reco_binning.fill_from_csv_file("../00/real_data.txt")
data = reco_binning.get_values_as_ndarray()
# Get toys
n_toys = 100
toyA = []
toyB = []
for i in range(n_toys):
reco_binning.reset()
reco_binning.fill_from_csv_file("modelA_data.txt",
weightfield='weight_%i'%(i,), rename={'reco_x_%i'%(i,): 'reco_x'},
buffer_csv_files=True)
toyA.append(reco_binning.get_values_as_ndarray())
reco_binning.reset()
reco_binning.fill_from_csv_file("modelB_data.txt",
weightfield='weight_%i'%(i,), rename={'reco_x_%i'%(i,): 'reco_x'},
buffer_csv_files=True)
toyB.append(reco_binning.get_values_as_ndarray())
# Plot
figax = None
for A, B in zip(toyA, toyB):
figax = reco_binning.plot_ndarray(None, A/10.,
kwargs1d={'alpha':0.05, 'color':'b'}, figax=figax)
figax = reco_binning.plot_ndarray(None, B/10.,
kwargs1d={'alpha':0.05, 'color':'r'}, figax=figax)
A = np.mean(toyA, axis=0)
B = np.mean(toyB, axis=0)
figax = reco_binning.plot_ndarray(None, A/10.,
kwargs1d={'color':'b', 'label':"Model A"}, figax=figax)
figax = reco_binning.plot_ndarray(None, B/10.,
kwargs1d={'color':'r', 'label':"Model B"}, figax=figax)
figax = reco_binning.plot_ndarray("data.png", data,
kwargs1d={'color':'k', 'label':"Data"}, sqrt_errors=True, figax=figax)
Here we plot the data with sqrt(N)
“error bars” just to give an indication
of the expected statistical fluctuations. Also we scaled the model predictions
by a factor 10, simply because that makes them correspond roughly to the data.
In this example, the systematic uncertainties seem to be of a similar size to
the expected statistical variations.
We are using a few specific arguments to the fill_from_csv_file
method
here:
weightfield
- Tells the object which field to use as the weight of the events. Does not need to follow any naming conventions. Here we tell it to use the field corresponding to each toy.
rename
- The binning of the response matrix expects a variable called
reco_x
. This variable is not present in the varied data sets. Here we can specify a dictionary with columns in the data that will be renamed before filling the matrix. buffer_csv_files
- Reading a CSV file from disk and turning it into an array of floating point values takes quite a bit of time. We can speed up the process by buffering the intermediate array on disk. This means the time consuming parsing of a text file only has to happen once per CSV file.
ReMU handles the detector uncertainties by building a response matrix for each
toy detector separately. These matrices are then used in parallel to calculate
likelihoods. In the end the marginal (i.e. average) likelihood is used as
the final answer. Some advanced features of ReMU (like sparse matrices, or
nuisance bins) require quite a bit of book-keeping to make the toy matrices
behave consistently and as expected. This is handled by the
ResponseMatrixArrayBuilder
:
import matplotlib.pyplot as plt
from scipy import stats
from remu import migration
from copy import deepcopy
builder = migration.ResponseMatrixArrayBuilder(1)
Its only argument is the number of randomly drawn statistical variations per toy response matrix. The number of simulated events that are used to build the response matrix influences the statistical uncertainty of the response matrix elements. This can be seen as an additional “systematic” uncertainty of the detector response. By generating randomly fluctuated response matrices from the toy matrices, it can be handled organically with the other systematics. In this case we generate 1 randomly varied matrix per toy matrix, yielding a total of 100 matrices in the end.
The toy matrices themselves are built like the nominal matrix in the previous examples and then added to the builder:
with open("../01/reco-binning.yml", 'rt') as f:
reco_binning = binning.yaml.load(f)
with open("../01/optimised-truth-binning.yml", 'rt') as f:
truth_binning = binning.yaml.load(f)
resp = migration.ResponseMatrix(reco_binning, truth_binning)
n_toys = 100
for i in range(n_toys):
resp.reset()
resp.fill_from_csv_file(["modelA_data.txt", "modelB_data.txt"],
weightfield='weight_%i'%(i,), rename={'reco_x_%i'%(i,): 'reco_x'},
buffer_csv_files=True)
resp.fill_up_truth_from_csv_file(["../00/modelA_truth.txt",
"../00/modelB_truth.txt"], buffer_csv_files=True)
builder.add_matrix(resp)
The builder generates the actual array of floating point numbers as soon as the
ResponseMatrix
object is added. It retains no connection to the object
itself.
Now we just need to save the set of response matrices for later use in the likelihood fits:
M = builder.get_random_response_matrices_as_ndarray()
np.save("response_matrix.npy", M)
entries = builder.get_truth_entries_as_ndarray()
np.save("generator_truth.npy", entries)
The LikelihoodMachine
is created just like in the previous example:
from six import print_
from matplotlib import pyplot as plt
from remu import likelihood
with open("../01/reco-binning.yml", 'rt') as f:
reco_binning = binning.yaml.load(f)
with open("../01/optimised-truth-binning.yml", 'rt') as f:
truth_binning = binning.yaml.load(f)
reco_binning.fill_from_csv_file("../00/real_data.txt")
data = reco_binning.get_entries_as_ndarray()
# No systematics LikelihoodMachine
response_matrix = np.load("../01/response_matrix.npy")
generator_truth = np.load("../01/generator_truth.npy")
lm = likelihood.LikelihoodMachine(data, response_matrix,
truth_limits=generator_truth, limit_method='prohibit')
# Systematics LikelihoodMachine
response_matrix_syst = np.load("response_matrix.npy")
generator_truth_syst = np.load("generator_truth.npy")
response_matrix_syst.shape = (np.prod(response_matrix_syst.shape[:-2]),) \
+ response_matrix_syst.shape[-2:]
lm_syst = likelihood.LikelihoodMachine(data, response_matrix_syst,
truth_limits=generator_truth_syst, limit_method='prohibit')
To show the influence of the systematic uncertainties, we create two
LikelihoodMachine
objects here. One with the average detector
response and one with the set of systematically varied responses. The set of
response matrices is saved as an A x B x X x Y
shaped NumPy array, with the
number of systematic toy variations A
, the number of randomly drawn
matrices per toy B
, the number of reco bins X
, and the number of truth
bins Y
. To make handling the matrices easier, we reshape the matrices to an
N x X x Y
shaped array here.
Now we can test some models against the data, just like in the previous example:
truth_binning.fill_from_csv_file("../00/modelA_truth.txt")
modelA = truth_binning.get_values_as_ndarray()
modelA /= np.sum(modelA)
truth_binning.reset()
truth_binning.fill_from_csv_file("../00/modelB_truth.txt")
modelB = truth_binning.get_values_as_ndarray()
modelB /= np.sum(modelB)
modelA_shape = likelihood.TemplateHypothesis([modelA])
retA = lm.max_log_likelihood(modelA_shape)
print_(retA)
L: -28.188288113161803
fun: 28.188288113161803
lowest_optimization_result: fun: 28.188288113161803
hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>
jac: array([ 0.])
message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
nfev: 22
nit: 9
status: 0
success: True
x: array([ 935.84655028])
message: ['requested number of basinhopping iterations completed successfully']
minimization_failures: 3
nfev: 1758
nit: 100
x: array([ 935.84655028])
retA_syst = lm_syst.max_log_likelihood(modelA_shape)
print_(retA_syst)
L: -28.676388171194702
fun: 28.676388171194702
lowest_optimization_result: fun: 28.676388171194702
hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>
jac: array([ 0.])
message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
nfev: 10
nit: 4
status: 0
success: True
x: array([ 941.15273763])
message: ['requested number of basinhopping iterations completed successfully']
minimization_failures: 1
nfev: 1462
nit: 100
x: array([ 941.15273763])
modelB_shape = likelihood.TemplateHypothesis([modelB])
retB = lm.max_log_likelihood(modelB_shape)
print_(retB)
L: -24.840381628756134
fun: 24.840381628756134
lowest_optimization_result: fun: 24.840381628756134
hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>
jac: array([ 9.23705556e-06])
message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
nfev: 38
nit: 11
status: 0
success: True
x: array([ 1042.03782852])
message: ['requested number of basinhopping iterations completed successfully']
minimization_failures: 3
nfev: 2156
nit: 100
x: array([ 1042.03782852])
retB_syst = lm_syst.max_log_likelihood(modelB_shape)
print_(retB_syst)
L: -25.701169780679152
fun: 25.701169780679152
lowest_optimization_result: fun: 25.701169780679152
hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>
jac: array([ -1.42108547e-06])
message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
nfev: 14
nit: 6
status: 0
success: True
x: array([ 1039.7300549])
message: ['requested number of basinhopping iterations completed successfully']
minimization_failures: 1
nfev: 1644
nit: 100
x: array([ 1039.7300549])
Let us take another look at how the fitted templates and the data compare in reco space. This time we are going to use ReMU’s built in function to plot the mean and average value of projections of binnings, when given a set of bin contents rather than a single one:
figax = reco_binning.plot_values(None,
kwargs1d={'color': 'k', 'label': 'data'}, sqrt_errors=True)
modelA_reco = response_matrix.dot(modelA_shape.translate(retA.x))
modelB_reco = response_matrix.dot(modelB_shape.translate(retB.x))
modelA_reco_syst = response_matrix_syst.dot(
modelA_shape.translate(retA_syst.x))
modelB_reco_syst = response_matrix_syst.dot(
modelB_shape.translate(retB_syst.x))
reco_binning.plot_ndarray(None, modelA_reco,
kwargs1d={'color': 'b', 'label': 'model A'},
error_xoffset=-.1, figax=figax)
reco_binning.plot_ndarray(None, modelA_reco_syst,
kwargs1d={'color': 'b', 'label': 'model A syst'},
error_xoffset=-.1, figax=figax)
reco_binning.plot_ndarray(None, modelB_reco,
kwargs1d={'color': 'r', 'label': 'model B'},
error_xoffset=+.1, figax=figax)
reco_binning.plot_ndarray("reco-comparison.png", modelB_reco_syst,
kwargs1d={'color': 'r', 'label': 'model B syst'},
error_xoffset=+.1, figax=figax)
And of course we can compare the p-values of the two fitted models assuming the nominal detector response:
print_(lm.max_likelihood_p_value(modelA_shape, nproc=4))
print_(lm.max_likelihood_p_value(modelB_shape, nproc=4))
0.084
0.508
With the p-values yielded when taking the systematic uncertainties into account:
print_(lm_syst.max_likelihood_p_value(modelA_shape, nproc=4))
print_(lm_syst.max_likelihood_p_value(modelB_shape, nproc=4))
0.104
0.516
We can construct confidence intervals on the template weights of the models using the maximum likelihood p-value:
p_values_A = []
p_values_B = []
p_values_A_syst = []
p_values_B_syst = []
values = np.linspace(600, 1600, 21)
for v in values:
fixed_model_A = modelA_shape.fix_parameters((v,))
fixed_model_B = modelB_shape.fix_parameters((v,))
A = lm.max_likelihood_p_value(fixed_model_A, nproc=4)
A_syst = lm_syst.max_likelihood_p_value(fixed_model_A, nproc=4)
B = lm.max_likelihood_p_value(fixed_model_B, nproc=4)
B_syst = lm_syst.max_likelihood_p_value(fixed_model_B, nproc=4)
print_(v, A, A_syst, B, B_syst)
p_values_A.append(A)
p_values_B.append(B)
p_values_A_syst.append(A_syst)
p_values_B_syst.append(B_syst)
fig, ax = plt.subplots()
ax.set_xlabel("Model weight")
ax.set_ylabel("p-value")
ax.plot(values, p_values_A, label="Model A", color='b', linestyle='dotted')
ax.plot(values, p_values_A_syst, label="Model A syst", color='b',
linestyle='solid')
ax.plot(values, p_values_B, label="Model B", color='r', linestyle='dotted')
ax.plot(values, p_values_B_syst, label="Model B syst", color='r',
linestyle='solid')
ax.axvline(retA.x[0], color='b', linestyle='dotted')
ax.axvline(retA_syst.x[0], color='b', linestyle='solid')
ax.axvline(retB.x[0], color='r', linestyle='dotted')
ax.axvline(retB_syst.x[0], color='r', linestyle='solid')
ax.axhline(0.32, color='k', linestyle='dashed')
ax.axhline(0.05, color='k', linestyle='dashed')
ax.legend(loc='best')
fig.savefig("p-values.png")
The “fixed” models have no free parameters here, making them degenerate
CompositeHypothesis
, equivalent to simple hypotheses. This means
using the max_likelihood_p_value()
is equivalent to testing the
corresponding simple hypotheses using the likelihood_p_value()
. No
actual maximisation is taking place. Note that the p-values for model A are
generally smaller than those for model B. This is consistent with previous
results showing that the data is better described by model B.
Finally, let us construct confidence intervals of the template weights, assuming that the corresponding model is correct:
p_values_A = []
p_values_B = []
p_values_A_syst = []
p_values_B_syst = []
values = np.linspace(600, 1600, 21)
for v in values:
fixed_model_A = modelA_shape.fix_parameters((v,))
fixed_model_B = modelB_shape.fix_parameters((v,))
A = lm.max_likelihood_ratio_p_value(fixed_model_A, modelA_shape,
nproc=4)
A_syst = lm_syst.max_likelihood_ratio_p_value(fixed_model_A,
modelA_shape, nproc=4)
B = lm.max_likelihood_ratio_p_value(fixed_model_B, modelB_shape,
nproc=4)
B_syst = lm_syst.max_likelihood_ratio_p_value(fixed_model_B,
modelB_shape, nproc=4)
print_(v, A, A_syst, B, B_syst)
p_values_A.append(A)
p_values_B.append(B)
p_values_A_syst.append(A_syst)
p_values_B_syst.append(B_syst)
fig, ax = plt.subplots()
ax.set_xlabel("Model weight")
ax.set_ylabel("p-value")
ax.plot(values, p_values_A, label="Model A", color='b', linestyle='dotted')
ax.plot(values, p_values_A_syst, label="Model A syst", color='b',
linestyle='solid')
ax.plot(values, p_values_B, label="Model B", color='r', linestyle='dotted')
ax.plot(values, p_values_B_syst, label="Model B syst", color='r',
linestyle='solid')
ax.axvline(retA.x[0], color='b', linestyle='dotted')
ax.axvline(retA_syst.x[0], color='b', linestyle='solid')
ax.axvline(retB.x[0], color='r', linestyle='dotted')
ax.axvline(retB_syst.x[0], color='r', linestyle='solid')
ax.axhline(0.32, color='k', linestyle='dashed')
ax.axhline(0.05, color='k', linestyle='dashed')
ax.legend(loc='best')
fig.savefig("ratio-p-values.png")
Note that the p-values for both models go up to 1.0 here (within the granularity of the parameter scan) and that the constructed confidence intervals are very different from the ones before. This is because we are asking a different question now. Before we asked the question “What is the probability of getting a worse likelihood assuming that the tested model-parameter is true?”. Now we ask the question “What is the probability of getting a worse best-fit likelihood ratio, assuming the tested model-parameter is true?”. Since the likelihood ratio is 1.0 at the best fit point and the likelihood ratio of nested hypotheses is less than or equal to 1.0 by construction, the p-value is 100% there.