matrix_utils

Utility functions for working with response matrices.

remu.matrix_utils.compatibility(first, second, N=None, return_all=False, truth_indices=None, min_quality=0.95, **kwargs)

Calculate the compatibility between two response matrices.

This checks whether the point “the matrices are identical” is an outlier in the distribution of matrix differences, as defined by the statistical uncertainties of the matrix elements. The test statistic is the Mahalanobis distance. If that point is not a plausible part of the distribution, it is not reasonable to assume that the true matrices are identical.

Parameters
first, second : ResponseMatrix

The two response matrices to compare.

N : int, optional

Number of random matrices to be generated for the calculation. This number must be larger than the number of reco bins! Otherwise the covariances cannot be calculated correctly. Defaults to #(reco bins) + 100.

return_all : bool, optional

If False, return only null_prob_count and null_prob_chi2. If True, also return null_distance, distances, and df.

truth_indices : list of ints, optional

Only use the given truth indices to calculate the compatibility. If this is not specified, only indices with a minimum “quality” (min_quality) are used. This quality requires enough statistics in the bins that the difference between the mean matrices is not dominated by the shared prior.

Returns
null_prob_count : float

The Bayesian p-value evaluated by counting the expected number of random matrix differences more extreme than the mean difference.

null_prob_chi2 : float

The Bayesian p-value evaluated by assuming a chi-square distribution of the squares of Mahalanobis distances.

null_distance : float, optional

The squared Mahalanobis distance of the mean differences between the two matrices:

D_M^2( mean(first.random_matrices - second.random_matrices) )

distances : ndarray, optional

The set of squared Mahalanobis distances between randomly generated matrix differences and the mean matrix difference:

D_M^2( (first.random_matrices - second.random_matrices)
     - mean(first.random_matrices - second.random_matrices) )

df : int, optional

Degrees of freedom of the assumed chi-squared distribution of the squared Mahalanobis distances. This is equal to the number of matrix elements that are considered for the calculation:

df = len(truth_indices) * #(reco_bins in matrix)

Notes

The distribution of matrix differences is evaluated by generating N random response matrices from each of the two compared matrices and calculating their (n-dimensional) differences. The resulting set of matrix differences defines the mean, mean(differences), and the covariance matrix, cov(differences). The covariance in turn defines a metric for the Mahalanobis distance D_M(x) on the space of matrix differences, where x is a set of matrix element differences.

The distance between the mean difference and the null hypothesis, that the two true matrices are identical, is the null_distance:

null_distance = D_M(0 - mean(differences)) = D_M(mean(differences))

The compatibility between the matrices is now defined as the Bayesian probability that the true difference between the matrices is more extreme (has a larger distance from the mean difference) than the null hypothesis. For this, we can reuse the set of matrix differences that was used to calculate the covariance matrix:

distances = D_M(differences - mean(differences))
null_prob_count = np.sum(distances >= null_distance) / distances.size

It will be 1 if the mean difference between the matrices is 0, and tend to 0 when the mean difference between the matrices is far from 0. “Far” in this case is determined by the uncertainty, i.e. the covariance, of the difference determination.
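
The counting estimate can be illustrated with a minimal NumPy sketch. It does not use ReMU itself: the array `differences` is a synthetic stand-in for the flattened differences of randomly generated response matrices, and the helper `mahalanobis` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for N flattened matrix differences, shape (N, n elements).
# In practice these would come from randomly generated response matrices.
n_samples, n_elements = 500, 6
differences = rng.normal(loc=0.1, scale=1.0, size=(n_samples, n_elements))

# Mean and covariance of the differences define the Mahalanobis metric.
mean = differences.mean(axis=0)
cov = np.cov(differences, rowvar=False)
cov_inv = np.linalg.inv(cov)


def mahalanobis(x):
    """Mahalanobis distance of each row of x, using cov_inv as the metric."""
    x = np.atleast_2d(x)
    return np.sqrt(np.einsum("ij,jk,ik->i", x, cov_inv, x))


# Distance of the null hypothesis (difference = 0) from the mean difference.
null_distance = mahalanobis(mean)[0]

# Distances of all random differences from the mean difference.
distances = mahalanobis(differences - mean)

# Fraction of random differences at least as extreme as the null hypothesis.
null_prob_count = np.sum(distances >= null_distance) / distances.size
print(null_prob_count)
```

Because the covariance is estimated from the same N samples, N has to exceed the number of considered matrix elements for the inverse to exist.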

If the differences are normally distributed, the squared Mahalanobis distances follow a chi-squared distribution. The number of degrees of freedom of that distribution is the number of variates, i.e. the number of response matrix elements that are being considered. This can be used to calculate a theoretical value for the compatibility:

df = len(truth_indices) * #(reco_bins)
null_prob_chi2 = chi2.sf(null_distance**2, df)

Since the distribution of differences is not necessarily Gaussian, this is only an estimate. Its advantage is that it is less dependent on the number of randomly drawn matrices.
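
As a rough illustration of how the two estimates relate, the following sketch compares the counting p-value with the chi-squared one on synthetic Gaussian differences. It assumes SciPy is available and works with squared distances throughout; the data and variable names are made up for this example and are not taken from ReMU.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

# Synthetic Gaussian differences, shape (N, n elements); illustration only.
n_samples, n_elements = 2000, 4
differences = rng.normal(loc=0.5, scale=1.0, size=(n_samples, n_elements))

mean = differences.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(differences, rowvar=False))

# Squared Mahalanobis distances of the random differences from their mean.
centred = differences - mean
sq_distances = np.einsum("ij,jk,ik->i", centred, cov_inv, centred)

# Squared Mahalanobis distance of the null hypothesis from the mean.
null_sq_distance = float(mean @ cov_inv @ mean)

# One degree of freedom per considered matrix element.
df = n_elements

null_prob_count = np.mean(sq_distances >= null_sq_distance)
null_prob_chi2 = chi2.sf(null_sq_distance, df)
print(null_prob_count, null_prob_chi2)
```

For Gaussian inputs like these the two numbers should agree closely; for non-Gaussian differences they can diverge, which is why the chi-squared value is only an estimate.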

remu.matrix_utils.improve_stats(response_matrix, data_index=None)

Reduce the statistical uncertainty by merging some bins in the truth binning.

Parameters
response_matrix : ResponseMatrix
data_index : int, optional

Improve the stats at this truth binning data index. Defaults to the bin with the fewest entries.

Returns
new_response_matrix : ResponseMatrix

Warning

The resulting matrix will have the nuisance/impossible indices set to []!

Notes

Depending on the truth binning, one or more bins will be merged. The bin corresponding to data_index will be among them. The “direction” of the merge (i.e. which neighbouring bin to merge it with) is decided by the compatibility of the sets of to-be-merged bins. That is, the algorithm tries to minimize the response difference between the merged bins.
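
The idea of choosing a merge direction can be sketched with a toy example. This is not the ReMU implementation: it assumes a one-dimensional truth binning, uses a plain Euclidean difference between response columns as a stand-in for the compatibility measure, and all arrays and names are hypothetical.

```python
import numpy as np

# Toy mean response matrix, shape (reco bins, truth bins), and the number
# of entries in each truth bin. All numbers are made up for illustration.
response = np.array([
    [0.60, 0.50, 0.20, 0.10],
    [0.20, 0.30, 0.50, 0.30],
    [0.05, 0.10, 0.20, 0.50],
])
entries = np.array([400, 12, 350, 500])

# Default choice: the truth bin with the fewest entries.
data_index = int(np.argmin(entries))

# Neighbouring bins in a 1D truth binning that could be merged with it.
candidates = [
    i for i in (data_index - 1, data_index + 1) if 0 <= i < response.shape[1]
]

# Choose the neighbour whose response column differs least.
# Assumption: Euclidean norm as a simple stand-in for the compatibility.
diffs = [
    np.linalg.norm(response[:, i] - response[:, data_index]) for i in candidates
]
partner = candidates[int(np.argmin(diffs))]

# Merge the two columns, weighting each by its number of entries.
lo, hi = sorted((data_index, partner))
weights = entries[[lo, hi]]
merged_column = response[:, [lo, hi]] @ weights / weights.sum()

new_response = np.delete(response, hi, axis=1)
new_response[:, lo] = merged_column
new_entries = np.delete(entries, hi)
new_entries[lo] = weights.sum()
print(partner, new_entries)
```

The merged matrix has one truth bin fewer, and the low-statistics bin now shares its entries with the more similar neighbour.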

remu.matrix_utils.mahalanobis_distance(first, second, shape=None, N=None, return_distances_from_mean=False, **kwargs)

Calculate the squared Mahalanobis distance of the two matrices for each truth bin.

Parameters
first, second : ResponseMatrix

The two response matrices for the comparison.

shape : tuple of ints, optional

The shape of the returned matrix. Defaults to (#(truth bins),).

N : int, optional

Number of random matrices to be generated for the calculation. This number must be larger than the number of reco bins! Otherwise the covariances cannot be calculated correctly. Defaults to #(reco bins) + 100.

return_distances_from_mean : bool, optional

Also return the ndarray distances_from_mean.

**kwargs : optional

Additional keyword arguments are passed through to generate_random_response_matrices().

Returns
distance : ndarray

Array of shape shape with the squared Mahalanobis distance of the mean difference between the matrices for each truth bin:

D_M^2( mean(first.random_matrices - second.random_matrices) )

distances_from_mean : ndarray, optional

Array of shape (N,)+shape with the squared Mahalanobis distances between the randomly generated matrix differences and the mean matrix difference for each truth bin:

D_M^2( (first.random_matrices - second.random_matrices)
     - mean(first.random_matrices - second.random_matrices) )
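
The shapes of the two return values can be made concrete with a small NumPy sketch. It assumes that the covariance is estimated separately for each truth bin over the reco bins, which is an interpretation of the description above rather than a confirmed detail of the implementation. The random arrays stand in for matrices generated from two response matrices.

```python
import numpy as np

rng = np.random.default_rng(2)

# N must exceed the number of reco bins for the covariance to be invertible.
n_samples, n_reco, n_truth = 200, 3, 5

# Synthetic stand-ins for the random matrices of two response matrices,
# shape (N, reco bins, truth bins).
first_random = rng.normal(0.30, 0.05, size=(n_samples, n_reco, n_truth))
second_random = rng.normal(0.32, 0.05, size=(n_samples, n_reco, n_truth))
differences = first_random - second_random

distance = np.empty(n_truth)
distances_from_mean = np.empty((n_samples, n_truth))

for j in range(n_truth):
    d = differences[:, :, j]  # shape (N, reco bins) for truth bin j
    mean = d.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(d, rowvar=False))

    # Squared distance of the mean difference from zero.
    distance[j] = mean @ cov_inv @ mean

    # Squared distances of the random differences from their mean.
    centred = d - mean
    distances_from_mean[:, j] = np.einsum("ij,jk,ik->i", centred, cov_inv, centred)

print(distance.shape, distances_from_mean.shape)
```

Here `distance` has the default shape (#(truth bins),) and `distances_from_mean` has shape (N,) + shape.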

See also

compatibility

remu.matrix_utils.plot_compatibility(first, second, filename=None, **kwargs)

Plot the compatibility of the two matrices.

Parameters
first, second : ResponseMatrix

Two instances of ResponseMatrix for comparison.

filename : str, optional

The filename where the plot will be saved.

**kwargs : optional

Additional keyword arguments are passed to compatibility().

Returns
fig : Figure

The figure that was used for plotting.

ax : Axes

The axes that were used for plotting.

See also

compatibility

remu.matrix_utils.plot_in_bin_variation(response_matrix, filename=None, **kwargs)

Plot the maximum in-bin variation for each truth bin.

The plot will contain the minimum, maximum, and median marginalization of these maximum values.

Parameters
response_matrix : ResponseMatrix

The response matrix to plot.

filename : str, optional

The filename where the plot will be saved.

**kwargs : optional

Additional keyword arguments are passed to the plotting function.

Returns
fig : Figure

The figure that was used for plotting.

ax : Axes

The axes that were used for plotting.

remu.matrix_utils.plot_mahalanobis_distance(first, second, filename=None, plot_expectation=True, **kwargs)

Plot the squared Mahalanobis distance D_M^2 between two matrices.

Parameters
first, second : ResponseMatrix

The two response matrices for the comparison.

plot_expectation : bool, optional

Also plot the expected distance.

filename : str, optional

Save the plot to this location.

**kwargs : optional

Additional keyword arguments are passed to the plotting function.

Returns
fig : Figure

The figure that has been plotted on.

ax : Axes

The axes that have been plotted into.

Notes

The expected distance is only an estimate based on the statistics in the bins. It is not exact and should be treated as a rough guide rather than a hard compatibility criterion.

remu.matrix_utils.plot_mean_efficiency(response_matrix, filename=None, nuisance_value=0.0, **kwargs)

Plot mean efficiencies for all truth bins.

This ignores the statistical uncertainties of the bin entries. The plot will contain the minimum, maximum, and median marginalization of these mean efficiencies.

Parameters
response_matrix : ResponseMatrix

The response matrix to plot.

filename : str, optional

The filename where the plot will be saved.

nuisance_value : float, optional

Nuisance bins are set to this value.

**kwargs : optional

Additional keyword arguments are passed to the plotting function.

Returns
fig : Figure

The figure that was used for plotting.

ax : Axes

The axes that were used for plotting.

remu.matrix_utils.plot_mean_response_matrix(response_matrix, filename=None, **kwargs)

Plot the smearing and efficiency.

Parameters
response_matrix : ResponseMatrix

The response matrix to plot.

filename : str, optional

The filename where the plot will be saved.

**kwargs : optional

Additional keyword arguments are passed to the plotting function.

Returns
fig : Figure

The figure that was used for plotting.

ax : Axes

The axes that were used for plotting.

remu.matrix_utils.plot_relative_in_bin_variation(response_matrix, filename=None, **kwargs)

Plot the maximum in-bin variation relative to statistical uncertainty.

The plot will contain the minimum, maximum, and median marginalization of these maximum values.

Parameters
response_matrix : ResponseMatrix

The response matrix to plot.

filename : str, optional

The filename where the plot will be saved.

**kwargs : optional

Additional keyword arguments are passed to the plotting function.

Returns
fig : Figure

The figure that was used for plotting.

ax : Axes

The axes that were used for plotting.

remu.matrix_utils.plot_statistical_uncertainty(response_matrix, filename=None, **kwargs)

Plot the maximum sqrt(statistical variance) of each truth bin.

The plot will contain the minimum, maximum, and median marginalization of these maximum values.

Parameters
response_matrix : ResponseMatrix

The response matrix to plot.

filename : str, optional

The filename where the plot will be saved.

**kwargs : optional

Additional keyword arguments are passed to the plotting function.

Returns
fig : Figure

The figure that was used for plotting.

ax : Axes

The axes that were used for plotting.