.. _example05: ========================= Example 05 -- Backgrounds ========================= Aims ==== * Understand and deal with background events in measurements Instructions ============ Real experiments will almost always include some sort of background or noise events in the recorded data. ReMU is well equipped to deal with those in a consistent fashion. Let us first create the noisy data as well as simulations of background and noise events:: # Create "real" data ../simple_experiment/run_experiment.py 10 real_data.txt --enable-background # Create BG simulation ../simple_experiment/simulate_experiment.py 100 background nominal_bg_data.txt bg_truth.txt # Create Noise simulation ../simple_experiment/simulate_experiment.py 100 noise noise_data.txt noise_truth.txt # Create Variations ../simple_experiment/vary_detector.py nominal_bg_data.txt bg_data.txt In this context, we call "noise" events that are recorded, but that do not have a meaningful true information associated with them. "Background" on the other hand can be understood as regular events in the detector, but which we are not interested in (e.g. because they are created by a different process than our main subject of study). Background events can behave differently than signal events, but because they also have a defined true information, we can treat them just like signal. The only difference is that they will occupy different truth bins than the signal events. Noise events on the other hand do not have meaningful true properties, so we will just assign them all to a single truth bin. The content of that single bin will then determine the number of expected noise events in the reconstructed binnings. Starting from the truth binning we created for the signal models in a previous example, let us create a binning for the background events:: import numpy as np from remu import binning from remu import migration from remu import matrix_utils from remu import plotting builder = migration.ResponseMatrixArrayBuilder(1) with open("../01/reco-binning.yml", 'rt') as f: reco_binning = binning.yaml.full_load(f) with open("../01/optimised-truth-binning.yml", 'rt') as f: signal_binning = binning.yaml.full_load(f) bg_binning = signal_binning.clone() resp = migration.ResponseMatrix(reco_binning, bg_binning) i = 0 resp.fill_from_csv_file("bg_data.txt", weightfield='weight_%i'%(i,), rename={'reco_x_%i'%(i,): 'reco_x'}, buffer_csv_files=True) resp.fill_up_truth_from_csv_file("bg_truth.txt", buffer_csv_files=True) entries = resp.get_truth_entries_as_ndarray() while np.min(entries) < 10: resp = matrix_utils.improve_stats(resp) entries = resp.get_truth_entries_as_ndarray() bg_binning = resp.truth_binning bg_binning.reset() reco_binning.reset() This ``bg_binning`` will now include at least 10 events in each truth bin, when building the response matrix. Next we will create a binning that can distinguish between noise, background and signal events, according to a variable aptly called ``event_type``:: truth_binning = binning.LinearBinning( variable = 'event_type', bin_edges = [-1.5, -0.5, 0.5, 1.5], subbinnings = { 1: bg_binning, 2: signal_binning, } ) with open("truth-binning.yml", 'wt') as f: binning.yaml.dump(truth_binning, f) The :class:`.LinearBinning` would have only three bins, but we insert the previously created background and signal binnings as subbinnings into the second and third bin respectively. This means that events which would fall into those bins will get further subdivided by the subbinnings. We thus have a binning that puts noise events (``event_type == -1``) into a single bin, sorts background events (``event_type == 0``) according to ``bg_binning``, and sorts signal events (``event_type == 1``) according to ``signal_binning``. Now we create a :class:`.ResponseMatrix` with this binning, just like we did in previous examples:: resp = migration.ResponseMatrix(reco_binning, truth_binning, nuisance_indices=[0]) To fill the matrix, we need to tell it which ``event_type`` each event is, though. This information might already be part of the simulated data, but in this case we have to add that variable by hand. For this we can use the ``cut_function`` parameter. A cut function takes the data (a structured numpy array) as its only argument and returns the data that should be filled into the binning:: import numpy.lib.recfunctions as rfn def set_signal(data): return rfn.append_fields(data, 'event_type', np.full_like(data['true_x'], 1.)) def set_bg(data): return rfn.append_fields(data, 'event_type', np.full_like(data['true_x'], 0.)) def set_noise(data): return rfn.append_fields(data, 'event_type', np.full_like(data['reco_x'], -1.)) n_toys = 100 for i in range(n_toys): resp.reset() resp.fill_from_csv_file(["../03/modelA_data.txt", "../03/modelB_data.txt"], weightfield='weight_%i'%(i,), rename={'reco_x_%i'%(i,): 'reco_x'}, cut_function=set_signal, buffer_csv_files=True) resp.fill_up_truth_from_csv_file( ["../00/modelA_truth.txt", "../00/modelB_truth.txt"], cut_function=set_signal, buffer_csv_files=True) resp.fill_from_csv_file("bg_data.txt", weightfield='weight_%i'%(i,), rename={'reco_x_%i'%(i,): 'reco_x'}, cut_function=set_bg, buffer_csv_files=True) resp.fill_up_truth_from_csv_file("bg_truth.txt", cut_function=set_bg, buffer_csv_files=True) # Calling `fill_up_truth_from_csv` twice only works because # the files fill completely different bins resp.fill_from_csv_file("noise_data.txt", cut_function=set_noise, buffer_csv_files=True) builder.add_matrix(resp) builder.export("response_matrix.npz") We can take a look at the truth information that has been filled into the last of the matrices:: pltr = plotting.get_plotter(truth_binning) pltr.plot_values(density=False) pltr.savefig('truth.png') .. image:: truth.png The base binning is a :class:`.LinearBinning` with only three bins. The corresponding plotter does not know how to plot the subbinnings, so it just marginalizes them out. To plot the content of all truth bins, we can use the basic :class:`.BinningPlotter`, which simply plots the content of each bin:: pltr = plotting.BinningPlotter(truth_binning) pltr.plot_values(density=False) pltr.savefig('all_truth.png') .. image:: all_truth.png We can look at the content of the subbinings directly for some nicer plots:: pltr = plotting.get_plotter(signal_binning) pltr.plot_values() pltr.savefig('signal_truth.png') .. image:: signal_truth.png :: pltr = plotting.get_plotter(bg_binning) pltr.plot_values() pltr.savefig('bg_truth.png') .. image:: bg_truth.png And we can take a look at the efficiencies using the corresponding convenience function:: matrix_utils.plot_mean_efficiency(resp, "efficiency.png") .. image:: efficiency.png Next we can use the response matrix for some hypothesis tests. First we need to create the :class:`.ResponseMatrixPredictor` and the templates to be used with the :class:`.TemplatePredictor`:: from remu import likelihood from multiprocess import Pool pool = Pool(8) likelihood.mapper = pool.map with open("../01/reco-binning.yml", 'rt') as f: reco_binning = binning.yaml.full_load(f) with open("truth-binning.yml", 'rt') as f: truth_binning = binning.yaml.full_load(f) reco_binning.fill_from_csv_file("real_data.txt") data = reco_binning.get_entries_as_ndarray() data_model = likelihood.PoissonData(data) response_matrix = "response_matrix.npz" matrix_predictor = likelihood.ResponseMatrixPredictor(response_matrix) calc = likelihood.LikelihoodCalculator(data_model, matrix_predictor) maxi = likelihood.BasinHoppingMaximizer() import numpy.lib.recfunctions as rfn def set_signal(data): return rfn.append_fields(data, 'event_type', np.full_like(data['true_x'], 1.)) def set_bg(data): return rfn.append_fields(data, 'event_type', np.full_like(data['true_x'], 0.)) truth_binning.fill_from_csv_file("../00/modelA_truth.txt", cut_function=set_signal) modelA = truth_binning.get_values_as_ndarray() modelA /= np.sum(modelA) truth_binning.reset() truth_binning.fill_from_csv_file("../00/modelB_truth.txt", cut_function=set_signal) modelB = truth_binning.get_values_as_ndarray() modelB /= np.sum(modelB) truth_binning.reset() truth_binning.fill_from_csv_file("bg_truth.txt", cut_function=set_bg) bg = truth_binning.get_values_as_ndarray() bg /= np.sum(bg) truth_binning.reset() noise = truth_binning.get_values_as_ndarray() noise[0] = 1. Since we put all noise events into the first truth bin, the noise template is just a value of 1 in that bin. Now we start by trying to fit the model A template without background or noise events:: modelA_only = likelihood.TemplatePredictor([modelA]) calcA_only = calc.compose(modelA_only) retA_only = maxi(calcA_only) print(retA_only) .. include:: modelA_only_fit.txt :literal: To judge how well the result actually fit, we can consult the results p-value:: testA_only = likelihood.HypothesisTester(calcA_only) print(testA_only.likelihood_p_value(retA_only.x)) .. include:: modelA_only_gof.txt :literal: It clearly is a very bad fit, which is reflected in the maximum likelihood p-value as well:: print(testA_only.max_likelihood_p_value()) .. include:: modelA_only_p_value.txt :literal: So the signal-only model A hypothesis can be excluded. Now let us try again with background and noise templates added in:: modelA_bg = likelihood.TemplatePredictor([noise, bg, modelA]) calcA_bg = calc.compose(modelA_bg) retA_bg = maxi(calcA_bg) print(retA_bg) .. include:: modelA_bg_fit.txt :literal: :: testA_bg = likelihood.HypothesisTester(calcA_bg) print(testA_bg.likelihood_p_value(retA_bg.x)) .. include:: modelA_bg_gof.txt :literal: :: print(testA_bg.max_likelihood_p_value(), file=f) .. include:: modelA_bg_p_value.txt :literal: This fit is clearly much better. We can repeat the same with model B:: modelB_only = likelihood.TemplatePredictor([modelB]) calcB_only = calc.compose(modelB_only) retB_only = maxi(calcB_only) print(retB_only) .. include:: modelB_only_fit.txt :literal: :: testB_only = likelihood.HypothesisTester(calcB_only) print(testB_only.likelihood_p_value(retB_only.x)) .. include:: modelB_only_gof.txt :literal: :: print(testB_only.max_likelihood_p_value()) .. include:: modelB_only_p_value.txt :literal: :: modelB_bg = likelihood.TemplatePredictor([noise, bg, modelB]) calcB_bg = calc.compose(modelB_bg) retB_bg = maxi(calcB_bg) print(retB_bg) .. include:: modelB_bg_fit.txt :literal: :: testB_bg = likelihood.HypothesisTester(calcB_bg) print(testB_bg.likelihood_p_value(retB_bg.x)) .. include:: modelB_bg_gof.txt :literal: :: print(testB_bg.max_likelihood_p_value()) .. include:: modelB_bg_p_value.txt :literal: We can also take a qualitative look at the results by plotting the maximum likelihood predictions in reco and truth space:: pltr = plotting.get_plotter(reco_binning) modelA_reco, modelA_weights = calcA_only.predictor(retA_only.x) modelB_reco, modelB_weights = calcB_only.predictor(retB_only.x) modelA_bg_reco, modelA_bg_weights = calcA_bg.predictor(retA_bg.x) modelB_bg_reco, modelB_bg_weights = calcB_bg.predictor(retB_bg.x) pltr.plot_array(modelA_reco, label='model A only', stack_function=0.68, hatch=r'//', edgecolor='C1') pltr.plot_array(modelA_bg_reco, label='model A + bg', stack_function=0.68, hatch=r'*', edgecolor='C1') pltr.plot_array(modelB_reco, label='model B only', stack_function=0.68, hatch=r'\\', edgecolor='C2') pltr.plot_array(modelB_bg_reco, label='model B + bg', stack_function=0.68, hatch=r'O', edgecolor='C2') pltr.plot_entries(edgecolor='C0', label='data', hatch=None, linewidth=2.) pltr.legend() pltr.savefig('reco-comparison.png') .. image:: reco-comparison.png :: pltr = plotting.get_plotter(truth_binning) pltr.plot_array(modelA_only(retA_only.x)[0], label='model A only', hatch=r'//', edgecolor='C1', density=False) pltr.plot_array(modelA_bg(retA_bg.x)[0], label='model A + bg', hatch=r'*', edgecolor='C1', density=False) pltr.plot_array(modelB_only(retB_only.x)[0], label='model B only', hatch=r'\\', edgecolor='C2', density=False) pltr.plot_array(modelB_bg(retB_bg.x)[0], label='model B + bg', hatch=r'O', edgecolor='C2', density=False) pltr.legend(loc='upper left') pltr.savefig('truth-comparison.png') .. image:: truth-comparison.png