ResponseMatrix

class remu.migration.ResponseMatrix(reco_binning, truth_binning, nuisance_indices=None, impossible_indices=None, response_binning=None)[source]

Matrix that describes the detector response to true events.

Parameters
reco_binningRectangularBinning

The Binning object describing the reco categorization.

truth_binningRectangularBinning

The Binning object describing the truth categorization.

nuisance_indiceslist of ints, optional

List of indices of nuisance truth bins. These are treated like their efficiency is exactly 1.

impossible_indiceslist of ints, optional

List of indices of impossible reco bins. These are treated like their probability is exactly 0.

response_binningCartesianProductBinning, optional

The Binning object describing the reco and truth categorization. Usually this will be generated from the truth and reco binning.

Notes

The truth and reco binnings will be combined with their cartesian_product method.

The truth bins corresponding to the nuisance_indices will be treated like they have a total efficiency of 1.

The reco bins corresponding to the impossible_indices will be treated like they are filled with a probability of 0.

Two response matrices can be combined by adding them: new_resp = respA + respB. This yields a new matrix that is equivalent to one that has been filled with the data in both respA and respB. The truth and reco binnings in respA and respB must be identical for this to make sense.
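The following is a minimal NumPy sketch of why adding is the right way to combine two samples. It does not use the ReMU classes; the arrays resp_a, resp_b, truth_a and truth_b are made-up bin contents that stand in for the filled binnings.

```python
import numpy as np

# Hypothetical weighted counts, shape (n_reco, n_truth), from two independent samples.
resp_a = np.array([[8.0, 1.0], [2.0, 3.0]])
resp_b = np.array([[4.0, 0.0], [1.0, 6.0]])
# Hypothetical numbers of true events per truth bin (including unreconstructed ones).
truth_a = np.array([12.0, 5.0])
truth_b = np.array([6.0, 8.0])

# Adding the bin contents corresponds to filling one matrix with both samples.
combined = (resp_a + resp_b) / (truth_a + truth_b)

# Averaging the two normalized matrices is generally not the same thing.
averaged = 0.5 * (resp_a / truth_a + resp_b / truth_b)
```

In this sketch the two results differ in the second truth bin, because the samples contribute different numbers of true events there.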

Attributes

truth_binning

(Binning) The Binning object for the truth information of the events.

reco_binning

(Binning) The Binning object for the reco information of the events.

response_binning

(CartesianProductBinning) The CartesianProductBinning of reco and truth binning.

nuisance_indices

(list of int) The truth data indices that will be handled as nuisance bins.

impossible_indices

(list of int) The reco data indices that will be treated as impossible to occur.

filled_truth_indices

(list of int) The data indices of truth bins that have at least one event in them.

Methods

clone()

Create a functioning copy of the response matrix.

export(filename[, compress, nstat, sparse])

Save all necessary information for using the response matrix.

fill(*args, **kwargs)

Fill events into the binnings.

fill_from_csv_file(*args, **kwargs)

Fill binnings from csv file.

fill_up_truth(*args, **kwargs)

Re-fill the truth bins with the given events file.

fill_up_truth_from_csv_file(*args, **kwargs)

Re-fill the truth bins with the given csv file.

generate_random_response_matrices([size, shape])

Generate random response matrices according to the estimated variance.

get_in_bin_variation_as_ndarray([shape, ...])

Get an estimate for the variation of the response within a bin.

get_mean_response_matrix_as_ndarray([shape])

Get the means of the posterior distributions of the response matrix elements.

get_reco_entries_as_ndarray(*args, **kwargs)

Get the number of entries in the reco binning as ndarray.

get_reco_sumw2_as_ndarray(*args, **kwargs)

Get the sum of squared weights in the reco binning as ndarray.

get_reco_values_as_ndarray(*args, **kwargs)

Get the values of the reco binning as ndarray.

get_response_entries_as_ndarray(*args, **kwargs)

Get the number of entries in the response binning as ndarray.

get_response_matrix_as_ndarray([shape, ...])

Return the ResponseMatrix as an ndarray.

get_response_sumw2_as_ndarray(*args, **kwargs)

Get the sum of squared weights in the response binning as ndarray.

get_response_values_as_ndarray(*args, **kwargs)

Get the values of the response binning as ndarray.

get_statistical_variance_as_ndarray([shape])

Get the statistical variance of the single ResponseMatrix elements as ndarray.

get_truth_entries_as_ndarray(*args, **kwargs)

Get the number of entries in the truth binning as ndarray.

get_truth_sumw2_as_ndarray(*args, **kwargs)

Get the sum of squared weights in the truth binning as ndarray.

get_truth_values_as_ndarray(*args, **kwargs)

Get the values of the truth binning as ndarray.

reset()

Reset all binnings.

set_reco_entries_from_ndarray(*args, **kwargs)

Set the number of entries in the reco binning from an ndarray.

set_reco_sumw2_from_ndarray(*args, **kwargs)

Set the sum of squared weights in the reco binning from an ndarray.

set_reco_values_from_ndarray(*args, **kwargs)

Set the values of the reco binning from an ndarray.

set_response_entries_from_ndarray(*args, ...)

Set the number of entries in the response binning from an ndarray.

set_response_sumw2_from_ndarray(*args, **kwargs)

Set the sum of squared weights in the response binning from an ndarray.

set_response_values_from_ndarray(*args, **kwargs)

Set the values of the response binning from an ndarray.

set_truth_entries_from_ndarray(*args, **kwargs)

Set the number of entries in the truth binning from an ndarray.

set_truth_sumw2_from_ndarray(*args, **kwargs)

Set the sum of squared weights in the truth binning from an ndarray.

set_truth_values_from_ndarray(*args, **kwargs)

Set the values of the truth binning from an ndarray.

clone()[source]

Create a functioning copy of the response matrix.

export(filename, compress=False, nstat=None, sparse=True)[source]

Save all necessary information for using the response matrix.

Saves all necessary information for using the response matrix in a NumPy .npz archive.

Parameters
filenamestr or file

Where to store the arrays.

compressbool, optional

Whether to use compression.

nstatint, optional

How many random variations of the matrix to generate. Default: export only the mean matrix, without random variations.

sparsebool, optional

Whether to export a sparse version instead of the full matrix.
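Since the result is a regular .npz archive, it can be inspected with plain NumPy. The sketch below shows the general pattern only. The names of the arrays that export() stores are not listed here, so an in-memory archive with a made-up array name (example_matrix) stands in for a real exported file.

```python
import io
import numpy as np

# Stand-in for an exported file: an in-memory .npz archive with a made-up array name.
buffer = io.BytesIO()
np.savez_compressed(buffer, example_matrix=np.eye(2))
buffer.seek(0)

# List the stored arrays first, then load them by name.
with np.load(buffer) as archive:
    names = archive.files
    arrays = {name: archive[name] for name in names}
```

With a real export, the file name passed to export() would take the place of buffer in the np.load call.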

fill(*args, **kwargs)[source]

Fill events into the binnings.

fill_from_csv_file(*args, **kwargs)[source]

Fill binnings from csv file.

See Binning.fill_from_csv_file for a description of the parameters.

See also

fill_up_truth_from_csv_file

Re-fill only truth bins from different file.

fill_up_truth(*args, **kwargs)[source]

Re-fill the truth bins with the given events file.

This can be used to get proper efficiencies if the true signal events are stored separate from the reconstructed events.

It takes the same parameters as fill().

Notes

A new truth binning is created and filled with the provided events. Each bin is compared to the corresponding bin in the already present truth binning. The larger value of the two is taken as the new truth. This way, event types that are not present in the pure truth data, e.g. background, are not affected by this. It can only increase the value of the truth bins, lowering their efficiency.

For each truth bin, one of the following must be true for this operation to make sense:

  • All events in the migration matrix are also present in the new truth events. In this case, the additional truth events lower the efficiency of the truth bin. This is the case, for example, if not all true signal events are reconstructed.

  • All events in the new truth events are also present in the migration matrix. In this case, the events in the new truth events have no influence on the response matrix. This is the case, for example, if only a subset of the reconstructed background is saved in the truth file.

If there are events in the response matrix that are not in the new truth events and there are events in the new truth events that are not in the response matrix, this method will lead to a wrong efficiency of the affected truth bin.
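The per-bin "take the larger value" rule can be sketched with plain NumPy arrays. This is an illustration of the rule described in the notes, not the ReMU implementation, and all numbers are made up.

```python
import numpy as np

# Hypothetical truth-bin contents already in the matrix (reconstructed signal + background).
truth_in_matrix = np.array([40.0, 25.0, 10.0])
# Hypothetical contents from a separate truth-only sample; the last bin (background) is absent there.
truth_new = np.array([50.0, 25.0, 0.0])
# Hypothetical sum of reconstructed events per truth bin.
reconstructed = np.array([40.0, 20.0, 10.0])

# Keep the larger value per bin.
truth_filled_up = np.maximum(truth_in_matrix, truth_new)

# Efficiencies before and after the fill-up.
eff_before = reconstructed / truth_in_matrix
eff_after = reconstructed / truth_filled_up
```

Only the first bin changes: its truth content grows from 40 to 50, so its efficiency drops, while the background bin that is missing from the truth-only sample keeps its value.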

fill_up_truth_from_csv_file(*args, **kwargs)[source]

Re-fill the truth bins with the given csv file.

This can be used to get proper efficiencies if the true signal events are saved in a separate file from the reconstructed events.

It takes the same parameters as fill_from_csv_file().

Notes

A new truth binning is created and filled with the events from the provided file. Each bin is compared to the corresponding bin in the already present truth binning. The larger value of the two is taken as the new truth. This way, event types that are not present in the pure truth data, e.g. background, are not affected by this. It can only increase the value of the truth bins, lowering their efficiency.

For each truth bin, one of the following must be true for this operation to make sense:

  • All events in the migration matrix are also present in the truth file. In this case, the additional truth events lower the efficiency of the truth bin. This is the case, for example, if not all true signal events are reconstructed.

  • All events in the truth file are also present in the migration matrix. In this case, the events in the truth file have no influence on the response matrix. This is the case, for example, if only a subset of the reconstructed background is saved in the truth file.

If there are events in the response matrix that are not in the truth file and there are events in the truth file that are not in the response matrix, this method will lead to a wrong efficiency of the affected truth bin.

generate_random_response_matrices(size=None, shape=None, **kwargs)[source]

Generate random response matrices according to the estimated variance.

Parameters
sizeint or tuple of ints, optional

How many random matrices should be generated.

shapetuple of ints, optional

The shape of the returned matrices. Defaults to (#(reco bins), #(truth bins)).

kwargsoptional

See get_mean_response_matrix_as_ndarray() for a description of more optional kwargs.

Returns
ndarray

Notes

This is a three step process:

  1. Draw the binomial efficiencies from Beta distributions.

  2. Draw the multinomial reconstruction probabilities from a Dirichlet distribution.

  3. Draw weight corrections from normal distributions.

If no shape is specified, it will be set to (#(reco bins), #(truth bins)).

If truth_indices are provided, a sliced matrix with only the given columns will be returned.
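The three steps can be sketched with NumPy's random generator. This is a simplified illustration, not the ReMU implementation: the counts are made up, the priors (uniform Beta, symmetric Dirichlet with concentration 1) and the width of the weight correction are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical counts: reco_counts[i, j] events from truth bin j reconstructed in reco bin i.
reco_counts = np.array([[30, 2], [5, 20]])
truth_counts = np.array([50, 40])  # all true events per truth bin
n_reco, n_truth = reco_counts.shape
n_toys = 1000

# Step 1: efficiencies from Beta posteriors (uniform prior assumed).
k = reco_counts.sum(axis=0)
eff = rng.beta(k + 1, truth_counts - k + 1, size=(n_toys, n_truth))

# Step 2: reconstruction probabilities from Dirichlet posteriors
# (symmetric prior with concentration 1 assumed), one draw per truth bin.
prob = np.stack(
    [rng.dirichlet(reco_counts[:, j] + 1, size=n_toys) for j in range(n_truth)],
    axis=-1,
)  # shape (n_toys, n_reco, n_truth)

# Step 3: weight corrections from normal distributions
# (mean 1 and a made-up width; real widths would depend on the event weights).
weight = rng.normal(loc=1.0, scale=0.01, size=(n_toys, n_reco, n_truth))

# Combine the three factors into toy response matrices.
toys = weight * prob * eff[:, np.newaxis, :]
```

Each toys[n] has the default shape (#(reco bins), #(truth bins)), and the spread over n gives a feeling for the statistical uncertainty of the matrix elements.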

get_in_bin_variation_as_ndarray(shape=None, truth_indices=None, normalize=True, **kwargs)[source]

Get an estimate for the variation of the response within a bin.

The in-bin variation is estimated from the maximum difference to the surrounding truth bins. The differences can be normalized to the estimated statistical errors, so values close to one indicate a statistically dominated variation.

Parameters
shapetuple of ints, optional

The shape of the returned ndarray. Default: (#(reco bins), #(truth bins))

truth_indiceslist of ints, optional

Return a sliced matrix with only the given columns.

normalizebool, optional

Divide the variation by the statistical variance

**kwargsoptional

Additional keyword arguments are passed to get_mean_response_matrix_as_ndarray() and get_statistical_variance_as_ndarray().

Returns
ndarray

get_mean_response_matrix_as_ndarray(shape=None, **kwargs)[source]

Get the means of the posterior distributions of the response matrix elements.

This is different from the “raw” matrix one gets from get_response_matrix_as_ndarray(). The latter simply divides the sums of weights in the respective response and truth bins.
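The difference is easiest to see in sparsely filled bins. The sketch below compares a plain ratio of weights with a posterior-mean efficiency. It is an illustration with made-up numbers, not the ReMU code, and the uniform Beta(1, 1) prior is an assumption of the sketch.

```python
import numpy as np

# Hypothetical sums of weights: response_sumw[i, j] for reco bin i and truth bin j.
response_sumw = np.array([[3.0, 0.0], [1.0, 0.0]])
truth_sumw = np.array([5.0, 0.0])  # the second truth bin is empty

# "Raw" estimate: plain ratio of sums of weights; the empty truth bin gives NaN here.
with np.errstate(divide="ignore", invalid="ignore"):
    raw = response_sumw / truth_sumw

# Posterior-mean efficiency under a uniform Beta(1, 1) prior (assumption of this sketch).
k = response_sumw.sum(axis=0)
eff_mean = (k + 1) / (truth_sumw + 2)
```

The plain ratio is undefined for the empty truth bin, while the posterior mean falls back to the prior expectation of 0.5.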

Parameters
shapetuple of ints, optional

The shape of the returned matrices. Defaults to (#(reco bins), #(truth bins)).

expected_weightfloat, optional

The expected average weight of the events. This is used in the calculation of the weight variance. Default: 1.0

nuisance_indiceslist of ints, optional

List of truth bin indices. These bins will be treated like their efficiency is exactly 1. Default: Use the nuisance_indices attribute of the ResponseMatrix.

impossible_indiceslist of ints, optional

List of reco bin indices. These bins will be treated like their probability is exactly 0. Default: Use the impossible_indices attribute of the ResponseMatrix.

truth_indiceslist of ints, optional

List of truth bin indices. Only return the response of the given truth bins. Default: Return full matrices.

Returns
ndarray

get_reco_entries_as_ndarray(*args, **kwargs)[source]

Get the number of entries in the reco binning as ndarray.

get_reco_sumw2_as_ndarray(*args, **kwargs)[source]

Get the sum of squared weights in the reco binning as ndarray.

get_reco_values_as_ndarray(*args, **kwargs)[source]

Get the values of the reco binning as ndarray.

get_response_entries_as_ndarray(*args, **kwargs)[source]

Get the number of entries in the response binning as ndarray.

get_response_matrix_as_ndarray(shape=None, truth_indices=None)[source]

Return the ResponseMatrix as an ndarray.

Uses the information in the truth and response binnings to calculate the response matrix.

Parameters
shapetuple of ints, optional

The shape of the returned ndarray. Default: (#(reco bins), #(truth bins))

truth_indiceslist of ints, optional

Only return the response of the given truth bins. Default: Return full matrix.

Returns
ndarray

Notes

If shape is None, it is set to (#(reco bins), #(truth bins)). The expected response of a truth vector can then be calculated like this:

v_reco = response_matrix.dot(v_truth)

If truth_indices are provided, a sliced matrix with only the given columns will be returned.
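A small stand-alone example of this folding step, using a made-up 3x2 array in place of the matrix returned by this method:

```python
import numpy as np

# Hypothetical response matrix with shape (n_reco, n_truth) = (3, 2).
response_matrix = np.array([[0.6, 0.1], [0.2, 0.5], [0.1, 0.2]])
v_truth = np.array([100.0, 50.0])

# Expected reco-bin contents for the given truth vector.
v_reco = response_matrix.dot(v_truth)

# In this sketch, the column sums are the total efficiencies of the truth bins.
efficiency = response_matrix.sum(axis=0)

# Selecting truth bins corresponds to slicing columns.
truth_indices = [1]
sliced = response_matrix[:, truth_indices]
```

Here v_reco has one entry per reco bin, and sliced keeps all reco bins but only the column of truth bin 1.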

get_response_sumw2_as_ndarray(*args, **kwargs)[source]

Get the sum of squared weights in the response binning as ndarray.

get_response_values_as_ndarray(*args, **kwargs)[source]

Get the values of the response binning as ndarray.

get_statistical_variance_as_ndarray(shape=None, **kwargs)[source]

Get the statistical variance of the single ResponseMatrix elements as ndarray.

The variance is estimated from the actual bin contents in a Bayesian motivated way.

Parameters
shapetuple of ints, optional

The shape of the returned matrix. Defaults to (#(reco bins), #(truth bins)).

kwargsoptional

See get_mean_response_matrix_as_ndarray() for a description of more optional kwargs.

Returns
ndarray

Notes

The response matrix creation is modeled as a three step process:

  1. Reconstruction efficiency according to a binomial process.

  2. Distribution of truth events among the reco bins according to a multinomial distribution.

  3. Correction of the categorical probabilities according to the mean weights of the events in each bin.

So the response matrix element can be written like this:

R_ij = m_ij * p_ij * eff_j

where eff_j is the total efficiency of events in truth bin j, p_ij is the unweighted multinomial reconstruction probability in reco bin i and m_ij the weight correction. The variance of R_ij is estimated by estimating the variances of these values separately.

The variance of eff_j is estimated by using the Bayesian conjugate prior for binomial distributions: the Beta distribution. We assume a prior that is uniform in the reconstruction efficiency. We then update it with the simulated events. The variance of the posterior distribution is taken as the variance of the efficiency.
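As a worked example of this estimate: with a uniform prior Beta(1, 1), k reconstructed events out of n true events give the posterior Beta(k + 1, n - k + 1). The helper below (a hypothetical function for illustration, assuming unweighted counts) evaluates the standard variance formula of the Beta distribution.

```python
def beta_posterior_variance(n_reconstructed, n_total):
    """Variance of Beta(k + 1, n - k + 1), the posterior for a uniform prior."""
    a = n_reconstructed + 1
    b = n_total - n_reconstructed + 1
    return a * b / ((a + b) ** 2 * (a + b + 1))


# An empty truth bin keeps the variance of the uniform prior, 1/12.
var_empty = beta_posterior_variance(0, 0)
# 8 of 10 events reconstructed.
var_filled = beta_posterior_variance(8, 10)
```

The variance shrinks as more simulated events are added to the truth bin.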

The variance of p_ij is estimated by using the Bayesian conjugate prior for multinomial distributions: the Dirichlet distribution. We assume a prior that is ignorant about the reconstruction probabilities. We then update it with the simulated events. The variance of the posterior distribution is taken as the variance of the transition probability.
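The Dirichlet estimate can be sketched in the same way. The helper below is a hypothetical illustration, not the ReMU code. It assumes unweighted counts and a symmetric prior whose concentration (the prior argument, 1.0 by default) is a choice of this sketch.

```python
import numpy as np


def dirichlet_posterior_variance(counts, prior=1.0):
    """Marginal variances of a Dirichlet posterior with parameters counts + prior."""
    alpha = np.asarray(counts, dtype=float) + prior
    alpha0 = alpha.sum()
    mean = alpha / alpha0
    return mean * (1.0 - mean) / (alpha0 + 1.0)


# Events of one truth bin distributed over three reco bins.
var_p = dirichlet_posterior_variance([30, 5, 0])
```

For two reco bins and no events this reduces to the Beta(1, 1) variance of 1/12, which is a useful consistency check.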

If a list of nuisance_indices is provided, the probabilities of not reconstructing events in the respective truth categories will be fixed to 0. This is useful for background categories where one is not interested in the true number of events.

If a list of impossible_indices is provided, the probabilities of reconstructing events in the respective reco categories will be fixed to 0. This is useful for bins that are impossible to have any events in them by their definition.

The variance of m_ij is estimated from the errors of the average weights in the matrix elements, as the classical “standard error of the mean”. To avoid problems with bins with 0 or 1 entries, we add a “prior expectation” point to the data. This ensures that all bins have at least 1 entry (no divisions by zero) and that variances can be estimated even for bins with only one (true) entry (from the difference to the expected value).
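The idea of the added “prior expectation” point can be sketched as follows. This is a simplified illustration, not the ReMU implementation: the helper name is hypothetical, and using the population variance (ddof=0) is a choice of this sketch that keeps the result defined for a single entry.

```python
import numpy as np


def mean_weight_and_variance(weights, expected_weight=1.0):
    """Mean weight and squared standard error, with one prior pseudo-entry added."""
    # Append the prior expectation as one extra entry, so there is always data.
    w = np.append(np.asarray(weights, dtype=float), expected_weight)
    mean = w.mean()
    # Squared standard error of the mean; defined for any len(w) >= 1.
    var_of_mean = w.var() / len(w)
    return mean, var_of_mean


# A bin with a single real entry of weight 2.0 still yields a non-zero variance.
mean_one, var_one = mean_weight_and_variance([2.0])
# An empty bin falls back to the expected weight.
mean_empty, var_empty = mean_weight_and_variance([])
```

The single-entry bin gets its variance from the difference between the entry and the expected weight, as described above.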

This is just an estimate! The true variance of the randomly generated response matrices can deviate from the returned numbers. Also, these variances ignore the correlations between matrix elements.

If no shape is specified, it will be set to (N_reco, N_truth).

If truth_indices are provided, a sliced matrix with only the given columns will be returned.

get_truth_entries_as_ndarray(*args, **kwargs)[source]

Get the number of entries in the truth binning as ndarray.

get_truth_sumw2_as_ndarray(*args, **kwargs)[source]

Get the sum of squared weights in the truth binning as ndarray.

get_truth_values_as_ndarray(*args, **kwargs)[source]

Get the values of the truth binning as ndarray.

reset()[source]

Reset all binnings.

set_reco_entries_from_ndarray(*args, **kwargs)[source]

Set the number of entries in the reco binning from an ndarray.

set_reco_sumw2_from_ndarray(*args, **kwargs)[source]

Set the sum of squared weights in the reco binning from an ndarray.

set_reco_values_from_ndarray(*args, **kwargs)[source]

Set the values of the reco binning from an ndarray.

set_response_entries_from_ndarray(*args, **kwargs)[source]

Set the number of entries in the response binning from an ndarray.

set_response_sumw2_from_ndarray(*args, **kwargs)[source]

Set the sum of squared weights in the response binning from an ndarray.

set_response_values_from_ndarray(*args, **kwargs)[source]

Set the values of the response binning from an ndarray.

set_truth_entries_from_ndarray(*args, **kwargs)[source]

Set the number of entries in the truth binning from an ndarray.

set_truth_sumw2_from_ndarray(*args, **kwargs)[source]

Set the sum of squared weights in the truth binning from an ndarray.

set_truth_values_from_ndarray(*args, **kwargs)[source]

Set the values of the truth binning from an ndarray.