ResponseMatrix
- class remu.migration.ResponseMatrix(reco_binning, truth_binning, nuisance_indices=None, impossible_indices=None, response_binning=None)
Matrix that describes the detector response to true events.
- Parameters
- reco_binning : RectangularBinning
The Binning object describing the reco categorization.
- truth_binning : RectangularBinning
The Binning object describing the truth categorization.
- nuisance_indices : list of ints, optional
List of indices of nuisance truth bins. These bins are treated as if their efficiency were exactly 1.
- impossible_indices : list of ints, optional
List of indices of impossible reco bins. These bins are treated as if their probability were exactly 0.
- response_binning : CartesianProductBinning, optional
The Binning object describing the reco and truth categorization. Usually it is generated from the truth and reco binnings.
Notes
The truth and reco binnings will be combined with their cartesian_product method.
The truth bins corresponding to the nuisance_indices will be treated as if they had a total efficiency of 1.
The reco bins corresponding to the impossible_indices will be treated as if they were filled with a probability of 0.
Two response matrices can be combined by adding them:
new_resp = respA + respB
This yields a new matrix that is equivalent to one that has been filled with the data in both respA and respB. The truth and reco binnings in respA and respB must be identical for this to make sense.
Attributes
- truth_binning : Binning
The Binning object for the truth information of the events.
- reco_binning : Binning
The Binning object for the reco information of the events.
- response_binning : CartesianProductBinning
The CartesianProductBinning of reco and truth binning.
- nuisance_indices : list of int
The truth data indices that will be handled as nuisance bins.
- impossible_indices : list of int
The reco data indices that will be treated as impossible to occur.
- filled_truth_indices : list of int
The data indices of truth bins that have at least one event in them.
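The Notes state that adding two response matrices is equivalent to filling one matrix with both data sets. The sketch below illustrates why, in plain Python and under simplifying assumptions: ToyResponse is a hypothetical stand-in, not part of remu, and it only tracks summed weights per bin. Because bin contents are sums over events, adding them bin by bin reproduces a combined fill; the entries and sums of squared weights that the real class also exposes are sums in the same way.

```python
class ToyResponse:
    """Hypothetical stand-in for a response matrix; not the remu API."""

    def __init__(self, n_reco, n_truth):
        # response_w[i][j]: summed weights of events in reco bin i and truth bin j
        self.response_w = [[0.0] * n_truth for _ in range(n_reco)]
        # truth_w[j]: summed weights of all true events in truth bin j
        self.truth_w = [0.0] * n_truth

    def fill(self, events):
        """Fill (reco_bin, truth_bin, weight) tuples; reco_bin None means not reconstructed."""
        for reco_bin, truth_bin, weight in events:
            self.truth_w[truth_bin] += weight
            if reco_bin is not None:
                self.response_w[reco_bin][truth_bin] += weight

    def __add__(self, other):
        # Adding only makes sense for identical binnings.
        if (len(self.response_w), len(self.truth_w)) != (len(other.response_w), len(other.truth_w)):
            raise ValueError("binnings must be identical")
        new = ToyResponse(len(self.response_w), len(self.truth_w))
        new.response_w = [[a + b for a, b in zip(row_a, row_b)]
                          for row_a, row_b in zip(self.response_w, other.response_w)]
        new.truth_w = [a + b for a, b in zip(self.truth_w, other.truth_w)]
        return new


events_a = [(0, 0, 1.0), (1, 0, 1.0), (None, 1, 1.0)]
events_b = [(1, 1, 2.0), (0, 0, 0.5)]

resp_a = ToyResponse(2, 2)
resp_a.fill(events_a)
resp_b = ToyResponse(2, 2)
resp_b.fill(events_b)

new_resp = resp_a + resp_b

# Filling a single matrix with both samples gives the same bin contents.
combined = ToyResponse(2, 2)
combined.fill(events_a + events_b)
print(new_resp.response_w == combined.response_w)  # True
print(new_resp.truth_w == combined.truth_w)        # True
```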
Methods
- clone(): Create a functioning copy of the response matrix.
- export(filename[, compress, nstat, sparse]): Save all necessary information for using the response matrix.
- fill(*args, **kwargs): Fill events into the binnings.
- fill_from_csv_file(*args, **kwargs): Fill binnings from csv file.
- fill_up_truth(*args, **kwargs): Re-fill the truth bins with the given events file.
- fill_up_truth_from_csv_file(*args, **kwargs): Re-fill the truth bins with the given csv file.
- generate_random_response_matrices([size, shape]): Generate random response matrices according to the estimated variance.
- get_in_bin_variation_as_ndarray([shape, ...]): Get an estimate for the variation of the response within a bin.
- get_mean_response_matrix_as_ndarray([shape]): Get the means of the posterior distributions of the response matrix elements.
- get_reco_entries_as_ndarray(*args, **kwargs): Get the number of entries in the reco binning as an ndarray.
- get_reco_sumw2_as_ndarray(*args, **kwargs): Get the sum of squared weights in the reco binning as an ndarray.
- get_reco_values_as_ndarray(*args, **kwargs): Get the values of the reco binning as an ndarray.
- get_response_entries_as_ndarray(*args, **kwargs): Get the number of entries in the response binning as an ndarray.
- get_response_matrix_as_ndarray([shape, ...]): Return the ResponseMatrix as an ndarray.
- get_response_sumw2_as_ndarray(*args, **kwargs): Get the sum of squared weights in the response binning as an ndarray.
- get_response_values_as_ndarray(*args, **kwargs): Get the values of the response binning as an ndarray.
- get_statistical_variance_as_ndarray([shape]): Get the statistical variance of the single ResponseMatrix elements as an ndarray.
- get_truth_entries_as_ndarray(*args, **kwargs): Get the number of entries in the truth binning as an ndarray.
- get_truth_sumw2_as_ndarray(*args, **kwargs): Get the sum of squared weights in the truth binning as an ndarray.
- get_truth_values_as_ndarray(*args, **kwargs): Get the values of the truth binning as an ndarray.
- reset(): Reset all binnings.
- set_reco_entries_from_ndarray(*args, **kwargs): Set the number of entries in the reco binning from an ndarray.
- set_reco_sumw2_from_ndarray(*args, **kwargs): Set the sum of squared weights in the reco binning from an ndarray.
- set_reco_values_from_ndarray(*args, **kwargs): Set the values of the reco binning from an ndarray.
- set_response_entries_from_ndarray(*args, ...): Set the number of entries in the response binning from an ndarray.
- set_response_sumw2_from_ndarray(*args, **kwargs): Set the sum of squared weights in the response binning from an ndarray.
- set_response_values_from_ndarray(*args, **kwargs): Set the values of the response binning from an ndarray.
- set_truth_entries_from_ndarray(*args, **kwargs): Set the number of entries in the truth binning from an ndarray.
- set_truth_sumw2_from_ndarray(*args, **kwargs): Set the sum of squared weights in the truth binning from an ndarray.
- set_truth_values_from_ndarray(*args, **kwargs): Set the values of the truth binning from an ndarray.
- export(filename, compress=False, nstat=None, sparse=True)
Save all necessary information for using the response matrix.
The information is stored in a NumPy .npz archive.
- Parameters
- filename : str or file
Where to store the arrays.
- compress : bool, optional
Whether to use compression.
- nstat : int, optional
How many random variations of the matrix to generate. Default: export the mean matrix without random variations.
- sparse : bool, optional
Whether to export a sparse version or the full matrix.
- fill_from_csv_file(*args, **kwargs)
Fill binnings from csv file.
See Binning.fill_from_csv_file for a description of the parameters.
See also
fill_up_truth_from_csv_file
Re-fill only the truth bins from a different file.
- fill_up_truth(*args, **kwargs)
Re-fill the truth bins with the given events file.
This can be used to get proper efficiencies if the true signal events are stored separately from the reconstructed events.
It takes the same parameters as fill().
Notes
A new truth binning is created and filled with the provided events. Each bin is compared to the corresponding bin in the already present truth binning, and the larger of the two values is taken as the new truth. This way, event types that are not present in the pure truth data, e.g. background, are not affected. The operation can only increase the value of the truth bins, lowering their efficiency.
For each truth bin, one of the following must be true for this operation to make sense:
- All events in the migration matrix are also present in the new truth events. In this case, the additional truth events lower the efficiency of the truth bin. This is the case, for example, if not all true signal events are reconstructed.
- All events in the new truth events are also present in the migration matrix. In this case, the new truth events have no influence on the response matrix. This is the case, for example, if only a subset of the reconstructed background is saved in the truth file.
If there are events in the response matrix that are not in the new truth events and there are events in the new truth events that are not in the response matrix, this method will lead to a wrong efficiency of the affected truth bin.
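The max rule in the Notes can be sketched in a few lines. The helpers fill_up_truth and efficiencies below are hypothetical and operate on plain lists of summed weights rather than on remu binnings; they only show how taking the larger truth content per bin lowers the efficiency of a signal bin while leaving a background bin untouched, matching the two valid cases listed above.

```python
def fill_up_truth(truth_w, refill_w):
    """Hypothetical helper: keep the larger of the old and re-filled content per truth bin."""
    if len(truth_w) != len(refill_w):
        raise ValueError("truth binnings must be identical")
    return [max(old, new) for old, new in zip(truth_w, refill_w)]


def efficiencies(response_w, truth_w):
    """Total efficiency per truth bin: reconstructed weight divided by true weight."""
    reco_sum = [sum(row[j] for row in response_w) for j in range(len(truth_w))]
    return [r / t if t > 0 else 0.0 for r, t in zip(reco_sum, truth_w)]


# response_w[i][j]: weight reconstructed in reco bin i from truth bin j.
response_w = [[30.0, 10.0],
              [20.0, 10.0]]
# Truth content known from the reconstructed sample alone.
truth_w = [50.0, 20.0]
# Truth bin 0 (signal): the truth-only sample also contains the lost events.
# Truth bin 1 (background): the truth-only sample holds only a subset.
refill_w = [80.0, 5.0]

new_truth = fill_up_truth(truth_w, refill_w)
print(new_truth)                            # [80.0, 20.0]
print(efficiencies(response_w, new_truth))  # [0.625, 1.0]
```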
- fill_up_truth_from_csv_file(*args, **kwargs)
Re-fill the truth bins with the given csv file.
This can be used to get proper efficiencies if the true signal events are saved in a separate file from the reconstructed events.
It takes the same parameters as fill_from_csv_file().
Notes
A new truth binning is created and filled with the events from the provided file. Each bin is compared to the corresponding bin in the already present truth binning, and the larger of the two values is taken as the new truth. This way, event types that are not present in the pure truth data, e.g. background, are not affected. The operation can only increase the value of the truth bins, lowering their efficiency.
For each truth bin, one of the following must be true for this operation to make sense:
- All events in the migration matrix are also present in the truth file. In this case, the additional truth events lower the efficiency of the truth bin. This is the case, for example, if not all true signal events are reconstructed.
- All events in the truth file are also present in the migration matrix. In this case, the events in the truth file have no influence on the response matrix. This is the case, for example, if only a subset of the reconstructed background is saved in the truth file.
If there are events in the response matrix that are not in the truth file and there are events in the truth file that are not in the response matrix, this method will lead to a wrong efficiency of the affected truth bin.
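For the CSV variant, the extra step is binning the truth-only events from the file before applying the same per-bin maximum. The column names true_x and weight, the bin edges, and the helper bin_truth_csv are all hypothetical and not taken from remu; the sketch only uses Python's csv module to illustrate the idea on a one-dimensional truth binning.

```python
import csv
import io
from bisect import bisect_right

# Hypothetical truth-only CSV with one true observable and an event weight.
TRUTH_CSV = """true_x,weight
0.5,1.0
1.5,1.0
0.2,2.0
3.0,1.0
"""

EDGES = [0.0, 1.0, 2.0]  # hypothetical truth bins [0, 1) and [1, 2)


def bin_truth_csv(text, edges):
    """Sum event weights per truth bin; events outside the edges are ignored."""
    contents = [0.0] * (len(edges) - 1)
    for row in csv.DictReader(io.StringIO(text)):
        x = float(row["true_x"])
        i = bisect_right(edges, x) - 1
        if 0 <= i < len(contents):
            contents[i] += float(row["weight"])
    return contents


old_truth = [2.5, 4.0]  # truth content already present in the response matrix
refilled = bin_truth_csv(TRUTH_CSV, EDGES)
# Per-bin maximum, as described in the Notes above.
new_truth = [max(old, new) for old, new in zip(old_truth, refilled)]
print(refilled)   # [3.0, 1.0]
print(new_truth)  # [3.0, 4.0]
```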
- generate_random_response_matrices(size=None, shape=None, **kwargs)
Generate random response matrices according to the estimated variance.
- Parameters
- size : int or tuple of ints, optional
How many random matrices should be generated.
- shape : tuple of ints, optional
The shape of the returned matrices. Defaults to (#(reco bins), #(truth bins)).
- kwargs : optional
See get_mean_response_matrix_as_ndarray() for a description of further optional keyword arguments.
- Returns
- ndarray
Notes
This is a three-step process:
- Draw the binomial efficiencies from Beta distributions.
- Draw the multinomial reconstruction probabilities from a Dirichlet distribution.
- Draw weight corrections from normal distributions.
If no shape is specified, it will be set to (#(reco bins), #(truth bins)).
If truth_indices are provided, a sliced matrix with only the given columns will be returned.
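The first two steps can be sketched with Python's random module, assuming unweighted events so that the third step (weight corrections) can be skipped. The function random_response_matrix and its arguments are hypothetical. Following the priors described for get_statistical_variance_as_ndarray(), the efficiency is drawn from a Beta distribution with a uniform prior, and the reconstruction probabilities from a Dirichlet distribution with a uniform prior, sampled by normalising independent Gamma variates. Nuisance truth bins get an efficiency of exactly 1 and impossible reco bins a probability of exactly 0.

```python
import random


def random_response_matrix(n_ij, n_truth, rng, nuisance=(), impossible=()):
    """Draw one reco x truth response matrix from Beta and Dirichlet posteriors.

    n_ij[i][j]: number of events from truth bin j reconstructed in reco bin i.
    n_truth[j]: number of true events in truth bin j.
    Events are assumed unweighted, so the weight-correction step is skipped.
    """
    n_reco, n_tru = len(n_ij), len(n_truth)
    matrix = [[0.0] * n_tru for _ in range(n_reco)]
    for j in range(n_tru):
        n_rec_j = sum(n_ij[i][j] for i in range(n_reco))
        # Step 1: efficiency from Beta(1 + reconstructed, 1 + lost),
        # i.e. a uniform prior updated with the simulated events.
        if j in nuisance:
            eff = 1.0
        else:
            eff = rng.betavariate(1 + n_rec_j, 1 + n_truth[j] - n_rec_j)
        # Step 2: reco probabilities from Dirichlet(1 + n_ij), sampled as
        # normalised Gamma variates; impossible reco bins stay at 0.
        gammas = [0.0 if i in impossible else rng.gammavariate(1 + n_ij[i][j], 1.0)
                  for i in range(n_reco)]
        total = sum(gammas)
        for i in range(n_reco):
            matrix[i][j] = eff * gammas[i] / total
    return matrix


rng = random.Random(1)
n_ij = [[8, 1],
        [2, 5]]
n_truth = [12, 10]
samples = [random_response_matrix(n_ij, n_truth, rng) for _ in range(2000)]
# Column sums are the sampled efficiencies; for truth bin 0 they should
# scatter around the posterior mean (1 + 10) / (2 + 12) = 11/14.
mean_eff0 = sum(m[0][0] + m[1][0] for m in samples) / len(samples)
```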
- get_in_bin_variation_as_ndarray(shape=None, truth_indices=None, normalize=True, **kwargs)
Get an estimate for the variation of the response within a bin.
The in-bin variation is estimated from the maximum difference to the surrounding truth bins. The differences can be normalized to the estimated statistical errors, so values close to one indicate a statistically dominated variation.
- Parameters
- shape : tuple of ints, optional
The shape of the returned ndarray. Default: (#(reco bins), #(truth bins))
- truth_indices : list of ints, optional
Return a sliced matrix with only the given columns.
- normalize : bool, optional
Divide the variation by the statistical variance.
- **kwargs : optional
Additional keyword arguments are passed to get_mean_response_matrix_as_ndarray() and get_statistical_variance_as_ndarray().
- Returns
- ndarray
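A rough sketch of the in-bin variation estimate, under assumptions not stated above: the truth bins are taken to lie along a single axis, so the surrounding bins of column j are simply j - 1 and j + 1, and normalize=True here divides by the statistical error (the square root of the variance) to obtain a dimensionless number. The exact neighbourhood and normalization used by the library may differ; in_bin_variation is a hypothetical function working on plain nested lists.

```python
import math


def in_bin_variation(mean, variance, normalize=True):
    """Largest absolute difference of each element to its neighbouring truth bins.

    mean, variance: reco x truth nested lists. The truth bins are assumed to lie
    along a single axis, so the neighbours of column j are columns j - 1 and j + 1.
    """
    n_reco, n_truth = len(mean), len(mean[0])
    out = [[0.0] * n_truth for _ in range(n_reco)]
    for i in range(n_reco):
        for j in range(n_truth):
            diffs = [abs(mean[i][j] - mean[i][k])
                     for k in (j - 1, j + 1) if 0 <= k < n_truth]
            spread = max(diffs, default=0.0)
            if normalize:
                # Express the spread in units of the statistical error.
                spread /= math.sqrt(variance[i][j])
            out[i][j] = spread
    return out


mean = [[0.5, 0.75, 0.25]]
variance = [[0.0625, 0.0625, 0.25]]
print(in_bin_variation(mean, variance, normalize=False))  # [[0.25, 0.5, 0.5]]
print(in_bin_variation(mean, variance))                   # [[1.0, 2.0, 1.0]]
```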
- get_mean_response_matrix_as_ndarray(shape=None, **kwargs)
Get the means of the posterior distributions of the response matrix elements.
This is different from the “raw” matrix returned by get_response_matrix_as_ndarray(). The latter simply divides the sums of weights in the respective bins.
- Parameters
- shape : tuple of ints, optional
The shape of the returned matrices. Defaults to (#(reco bins), #(truth bins)).
- expected_weight : float, optional
The expected average weight of the events. This is used in the calculation of the weight variance. Default: 1.0
- nuisance_indices : list of ints, optional
List of truth bin indices. These bins will be treated as if their efficiency were exactly 1. Default: Use the nuisance_indices attribute of the ResponseMatrix.
- impossible_indices : list of ints, optional
List of reco bin indices. These bins will be treated as if their probability were exactly 0. Default: Use the impossible_indices attribute of the ResponseMatrix.
- truth_indices : list of ints, optional
List of truth bin indices. Only return the response of the given truth bins. Default: Return full matrices.
- Returns
- ndarray
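For unweighted events (all weights equal to 1, so the weight correction m_ij is 1), the posterior means have closed forms: the mean of Beta(a, b) is a / (a + b), and the mean of a Dirichlet component is alpha_i / sum(alpha). The hypothetical function below multiplies the two means, assuming the efficiency and the reconstruction probabilities are independent as in the three-step sampling of generate_random_response_matrices(), and it uses the uniform priors described in the Notes of get_statistical_variance_as_ndarray(). It is a sketch, not the library's implementation.

```python
def mean_response_matrix(n_ij, n_truth, nuisance=(), impossible=()):
    """Posterior-mean response matrix for unweighted events and uniform priors."""
    n_reco, n_tru = len(n_ij), len(n_truth)
    allowed = [i for i in range(n_reco) if i not in impossible]
    matrix = [[0.0] * n_tru for _ in range(n_reco)]
    for j in range(n_tru):
        n_rec_j = sum(n_ij[i][j] for i in allowed)
        # Mean of Beta(1 + reconstructed, 1 + lost); nuisance bins use exactly 1.
        eff = 1.0 if j in nuisance else (1 + n_rec_j) / (2 + n_truth[j])
        # Mean of Dirichlet(1 + n_ij) over the reco bins that are possible.
        alpha0 = sum(1 + n_ij[i][j] for i in allowed)
        for i in allowed:
            matrix[i][j] = eff * (1 + n_ij[i][j]) / alpha0
    return matrix


n_ij = [[8, 1],
        [2, 5]]
n_truth = [12, 10]
mean = mean_response_matrix(n_ij, n_truth)
# The raw estimate for element (0, 0) is 8 / 12; the posterior mean is
# (11 / 14) * (9 / 12), pulled slightly towards the uniform prior.
```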
- get_reco_entries_as_ndarray(*args, **kwargs)
Get the number of entries in the reco binning as an ndarray.
- get_reco_sumw2_as_ndarray(*args, **kwargs)
Get the sum of squared weights in the reco binning as an ndarray.
- get_response_entries_as_ndarray(*args, **kwargs)
Get the number of entries in the response binning as an ndarray.
- get_response_matrix_as_ndarray(shape=None, truth_indices=None)
Return the ResponseMatrix as an ndarray.
Uses the information in the truth and response binnings to calculate the response matrix.
- Parameters
- shape : tuple of ints, optional
The shape of the returned ndarray. Default: (#(reco bins), #(truth bins))
- truth_indices : list of ints, optional
Only return the response of the given truth bins. Default: Return full matrix.
- Returns
- ndarray
Notes
If shape is None, it is set to (#(reco bins), #(truth bins)). The expected response of a truth vector can then be calculated like this:
v_reco = response_matrix.dot(v_truth)
If truth_indices are provided, a sliced matrix with only the given columns will be returned.
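The raw matrix and the dot product from the Notes can be written out in plain Python. raw_response_matrix and dot are hypothetical helpers. The sketch assumes each element is the summed weight in a response bin divided by the summed weight in the corresponding truth bin, and it sets columns of truth bins without any weight to 0, which is a choice of this sketch rather than documented behaviour.

```python
def raw_response_matrix(response_w, truth_w, truth_indices=None):
    """Element (i, j): weight in response bin (i, j) divided by weight in truth bin j.

    Truth bins without any weight give a column of zeros (a choice of this sketch).
    If truth_indices is given, only those columns are returned.
    """
    columns = range(len(truth_w)) if truth_indices is None else truth_indices
    return [[row[j] / truth_w[j] if truth_w[j] > 0 else 0.0 for j in columns]
            for row in response_w]


def dot(matrix, vector):
    """Plain-Python matrix-vector product, like response_matrix.dot(v_truth)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


response_w = [[6.0, 1.0],
              [2.0, 3.0]]
truth_w = [10.0, 5.0]

response_matrix = raw_response_matrix(response_w, truth_w)
print(response_matrix)  # [[0.6, 0.2], [0.2, 0.6]]

v_truth = [100.0, 50.0]
v_reco = dot(response_matrix, v_truth)  # approximately [70.0, 50.0]
print(raw_response_matrix(response_w, truth_w, truth_indices=[1]))  # [[0.2], [0.6]]
```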
- get_response_sumw2_as_ndarray(*args, **kwargs)
Get the sum of squared weights in the response binning as an ndarray.
- get_response_values_as_ndarray(*args, **kwargs)
Get the values of the response binning as an ndarray.
- get_statistical_variance_as_ndarray(shape=None, **kwargs)
Get the statistical variance of the single ResponseMatrix elements as an ndarray.
The variance is estimated from the actual bin contents in a Bayesian motivated way.
- Parameters
- shape : tuple of ints, optional
The shape of the returned matrix. Defaults to (#(reco bins), #(truth bins)).
- kwargs : optional
See get_mean_response_matrix_as_ndarray() for a description of further optional keyword arguments.
- Returns
- ndarray
Notes
The response matrix creation is modeled as a three-step process:
- Reconstruction efficiency according to a binomial process.
- Distribution of truth events among the reco bins according to a multinomial distribution.
- Correction of the categorical probabilities according to the mean weights of the events in each bin.
So the response matrix element can be written like this:
R_ij = m_ij * p_ij * eff_j
where eff_j is the total efficiency of events in truth bin j, p_ij is the unweighted multinomial reconstruction probability in reco bin i, and m_ij is the weight correction. The variance of R_ij is estimated by estimating the variances of these values separately.
The variance of eff_j is estimated using the Bayesian conjugate prior for binomial distributions: the Beta distribution. We assume a prior that is uniform in the reconstruction efficiency. We then update it with the simulated events. The variance of the posterior distribution is taken as the variance of the efficiency.
The variance of p_ij is estimated using the Bayesian conjugate prior for multinomial distributions: the Dirichlet distribution. We assume a prior that is uniform in the reconstruction probabilities, i.e. one that is ignorant about them. We then update it with the simulated events. The variance of the posterior distribution is taken as the variance of the transition probability.
If a list of nuisance_indices is provided, the probabilities of not reconstructing events in the respective truth categories will be fixed to 0. This is useful for background categories where one is not interested in the true number of events.
If a list of impossible_indices is provided, the probabilities of reconstructing events in the respective reco categories will be fixed to 0. This is useful for bins that are impossible to have any events in them by their definition.
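The posterior variances mentioned above follow from standard formulas: Var = a b / ((a + b)^2 (a + b + 1)) for Beta(a, b), and Var_i = alpha_i (alpha_0 - alpha_i) / (alpha_0^2 (alpha_0 + 1)) with alpha_0 = sum(alpha) for each Dirichlet component. The sketch below evaluates them for one truth bin with uniform priors; the helper names are hypothetical, and impossible reco bins would simply be left out of the Dirichlet parameters.

```python
def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


def dirichlet_variances(alphas):
    """Marginal variance of each component of a Dirichlet(alphas) distribution."""
    alpha0 = sum(alphas)
    return [a * (alpha0 - a) / (alpha0 ** 2 * (alpha0 + 1)) for a in alphas]


# One truth bin: 12 true events, 8 reconstructed in reco bin 0 and 2 in reco bin 1.
n_true = 12
n_per_reco_bin = [8, 2]
n_reconstructed = sum(n_per_reco_bin)

# Uniform priors Beta(1, 1) and Dirichlet(1, ..., 1), updated with the counts.
eff_var = beta_variance(1 + n_reconstructed, 1 + n_true - n_reconstructed)  # Beta(11, 3)
p_vars = dirichlet_variances([1 + n for n in n_per_reco_bin])              # Dirichlet(9, 3)
print(eff_var, p_vars)
```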
The variance of m_ij is estimated from the errors of the average weights in the matrix elements, as the classical “standard error of the mean”. To avoid problems with bins with 0 or 1 entries, we add a “prior expectation” point to the data. This ensures that all bins have at least 1 entry (no divisions by zero) and that variances can be estimated even for bins with only one (true) entry (from the difference to the expected value).
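The weight part can be sketched with Python's statistics module. mean_weight_and_error2 is a hypothetical helper: it appends expected_weight as one pseudo-observation and returns the mean weight together with the squared standard error of the mean. How remu turns this into the variance of the correction factor m_ij is not shown here, and unlike the description above, this sketch simply refuses bins without real entries.

```python
import statistics


def mean_weight_and_error2(weights, expected_weight=1.0):
    """Mean weight of a bin and its squared standard error, with one prior point.

    The expected weight is appended as an extra pseudo-observation, so a bin
    with a single real entry still yields a finite variance estimate.
    """
    if not weights:
        # This sketch does not define a variance for bins without entries.
        raise ValueError("need at least one entry")
    points = list(weights) + [expected_weight]
    mean = statistics.mean(points)
    error2 = statistics.variance(points) / len(points)
    return mean, error2


# A bin with one real entry of weight 1.2: approximately (1.1, 0.01).
print(mean_weight_and_error2([1.2]))
```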
This is just an estimate! The true variance of the randomly generated response matrices can deviate from the returned numbers. Also, these variances ignore the correlations between matrix elements.
If no shape is specified, it will be set to (N_reco, N_truth).
If truth_indices are provided, a sliced matrix with only the given columns will be returned.
- get_truth_entries_as_ndarray(*args, **kwargs)
Get the number of entries in the truth binning as an ndarray.
- get_truth_sumw2_as_ndarray(*args, **kwargs)
Get the sum of squared weights in the truth binning as an ndarray.
- get_truth_values_as_ndarray(*args, **kwargs)
Get the values of the truth binning as an ndarray.
- set_reco_entries_from_ndarray(*args, **kwargs)
Set the number of entries in the reco binning from an ndarray.
- set_reco_sumw2_from_ndarray(*args, **kwargs)
Set the sum of squared weights in the reco binning from an ndarray.
- set_reco_values_from_ndarray(*args, **kwargs)
Set the values of the reco binning from an ndarray.
- set_response_entries_from_ndarray(*args, **kwargs)
Set the number of entries in the response binning from an ndarray.
- set_response_sumw2_from_ndarray(*args, **kwargs)
Set the sum of squared weights in the response binning from an ndarray.
- set_response_values_from_ndarray(*args, **kwargs)
Set the values of the response binning from an ndarray.
- set_truth_entries_from_ndarray(*args, **kwargs)
Set the number of entries in the truth binning from an ndarray.