matrix_utils

Utility functions for working with response matrices.

remu.matrix_utils.compatibility(first, second, N=None, return_all=False, truth_indices=None, min_quality=0.95, **kwargs)[source]

Calculate the compatibility between two response matrices.

This checks whether the point “the matrices are identical” is an outlier in the distribution of matrix differences, as defined by the statistical uncertainties of the matrix elements. The Mahalanobis distance serves as the test statistic. If that point is not a plausible part of the distribution, it is not reasonable to assume that the true matrices are identical.

Parameters

first, secondResponseMatrix: The two response matrices to compare.
Nint, optional: Number of random matrices to be generated for the calculation. This number must be larger than the number of reco bins! Otherwise the covariances cannot be calculated correctly. Defaults to #(reco bins) + 100.
return_allbool, optional: If False, return only null_prob_count and null_prob_chi2.
truth_indiceslist of ints, optional: Only use the given truth indices to calculate the compatibility. If this is not specified, only indices with a minimum “quality” are used. The quality criterion requires enough statistics in the bins that the difference between the mean matrices is not dominated by the shared prior.

Returns

null_prob_countfloat

The Bayesian p-value evaluated by counting the expected number of random matrix differences more extreme than the mean difference.

null_prob_chi2float

The Bayesian p-value evaluated by assuming a chi-squared distribution of the squared Mahalanobis distances.

null_distancefloat, optional

The squared Mahalanobis distance of the mean differences between the two matrices:

D_M^2( mean(first.random_matrices - second.random_matrices) )

distancesndarray, optional

The set of squared Mahalanobis distances between randomly generated matrix differences and the mean matrix difference:

D_M^2( (first.random_matrices - second.random_matrices)
     - mean(first.random_matrices - second.random_matrices) )

dfint, optional

Degrees of freedom of the assumed chi-squared distribution of the squared Mahalanobis distances. This is equal to the number of matrix elements that are considered for the calculation:

df = len(truth_indices) * #(reco_bins in matrix)

See also

mahalanobis_distance

Notes

The distribution of matrix differences is evaluated by generating N random response matrices from both compared matrices and calculating the (n-dimensional) differences. The resulting set of matrix differences defines the mean mean(differences) and the covariance matrix cov(differences). The covariance in turn defines a metric for the Mahalanobis distance D_M(x) on the space of matrix differences, where x is a set of matrix element differences.

The distance between the mean difference and the Null hypothesis, that the two true matrices are identical, is the null_distance:

null_distance = D_M(0 - mean(differences)) = D_M(mean(differences))

The compatibility between the matrices is now defined as the Bayesian probability that the true difference between the matrices is more extreme (has a larger distance from the mean difference) than the Null hypothesis. For this, we can just evaluate the set of matrix differences that was used to calculate the covariance matrix:

distances = D_M(differences - mean(differences))
null_prob_count = np.sum(distances >= null_distance) / distances.size

It will be 1 if the mean difference between the matrices is 0, and tend to 0 when the mean difference between the matrices is far from 0. “Far” in this case is determined by the uncertainty, i.e. the covariance, of the difference determination.
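
The counting procedure described above can be sketched with plain NumPy. This is only an illustration under stated assumptions: the array `differences` is filled with synthetic Gaussian numbers standing in for the flattened differences of randomly generated matrices, and the helper `mahalanobis` is a hypothetical name, not part of ReMU.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for flattened matrix element differences:
# N random draws of n considered matrix elements (not real ReMU output).
N, n = 500, 6
differences = rng.normal(loc=0.1, scale=1.0, size=(N, n))

mean = differences.mean(axis=0)
cov = np.cov(differences, rowvar=False)  # (n, n) covariance of the differences
cov_inv = np.linalg.inv(cov)


def mahalanobis(x, cov_inv):
    """Mahalanobis distance D_M(x) for each row of x (hypothetical helper)."""
    x = np.atleast_2d(x)
    return np.sqrt(np.einsum("ij,jk,ik->i", x, cov_inv, x))


# Distance between the mean difference and the null hypothesis (difference = 0)
null_distance = mahalanobis(mean, cov_inv)[0]

# Distances of the individual random differences from their mean
distances = mahalanobis(differences - mean, cov_inv)

# Fraction of random differences at least as far from the mean as the null point
null_prob_count = np.sum(distances >= null_distance) / distances.size
print(null_distance, null_prob_count)
```

Because the covariance is estimated from the same draws, the mean of the squared distances from the mean equals n * (N - 1) / N by construction, which is a quick sanity check for the inversion.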

In the case of normally distributed differences, the squared Mahalanobis distances follow a chi-squared distribution. The number of degrees of freedom of that distribution is the number of variates, i.e. the number of response matrix elements that are being considered. This can be used to calculate a theoretical value for the compatibility:

df = len(truth_indices) * #(reco_bins)
null_prob_chi2 = chi2.sf(null_distance**2, df)

Since the distribution of differences is not necessarily Gaussian, this is only an estimate. Its advantage is that it is less dependent on the number of randomly drawn matrices.
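
To see how the two estimates relate, the following sketch computes both for synthetic Gaussian differences, where the chi-squared assumption holds by construction. The sizes and the shift of one standard deviation per element are arbitrary assumptions chosen for illustration; none of the numbers come from ReMU.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

# Hypothetical sizes: 2 truth indices, 3 reco bins -> df = 6 matrix elements
n_truth_indices, n_reco_bins = 2, 3
df = n_truth_indices * n_reco_bins

# Synthetic differences with a true shift of one standard deviation per element
N = 2000
differences = rng.normal(loc=1.0, scale=1.0, size=(N, df))

mean = differences.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(differences, rowvar=False))

# Distance of the null hypothesis (difference = 0) from the mean difference
null_distance = np.sqrt(mean @ cov_inv @ mean)

# Distances of the random differences from their mean
centred = differences - mean
distances = np.sqrt(np.einsum("ij,jk,ik->i", centred, cov_inv, centred))

null_prob_count = np.sum(distances >= null_distance) / distances.size
null_prob_chi2 = chi2.sf(null_distance**2, df)
print(null_prob_count, null_prob_chi2)
```

For Gaussian input the two values should agree up to the counting noise of the N draws; for non-Gaussian differences they can diverge, which is why the chi-squared value is only an estimate.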

remu.matrix_utils.improve_stats(response_matrix, data_index=None)[source]

Reduce the statistical uncertainty by merging some bins in the truth binning.

Parameters

response_matrixResponseMatrix
data_indexint, optional: Improve the stats at this truth binning data index. Defaults to the bin with the fewest entries.

Returns

new_response_matrixResponseMatrix

Warning

The resulting matrix will have the nuisance/impossible indices set to []!

Notes

Depending on the truth binning, one or more bins will be merged. The bin corresponding to data_index will be among them. The “direction” of the merge (i.e. which neighbouring bin to merge it with) is decided by the compatibility of the sets of to-be-merged bins. In other words, the algorithm tries to minimize the response difference between the merged bins.
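
The idea of choosing the merge direction can be sketched for a one-dimensional truth binning. This is not the ReMU implementation: the function `choose_merge_partner` is hypothetical, the response values are made up, and a simple sum of squared differences stands in for the compatibility measure that the library is documented to use.

```python
import numpy as np

# Hypothetical mean response for 4 neighbouring truth bins (columns)
# and 3 reco bins (rows); not real ReMU output.
response = np.array([
    [0.60, 0.58, 0.30, 0.10],
    [0.20, 0.22, 0.40, 0.30],
    [0.05, 0.06, 0.20, 0.50],
])


def choose_merge_partner(response, data_index):
    """Return the neighbouring truth bin whose response is most similar.

    Stand-in metric: sum of squared element differences. ReMU's
    improve_stats is documented to use the compatibility of the bins instead.
    """
    n_truth = response.shape[1]
    neighbours = [i for i in (data_index - 1, data_index + 1) if 0 <= i < n_truth]
    diffs = [
        np.sum((response[:, data_index] - response[:, i]) ** 2) for i in neighbours
    ]
    return neighbours[int(np.argmin(diffs))]


partner = choose_merge_partner(response, data_index=1)
print(partner)  # 0: truth bin 1 responds almost like truth bin 0
```

Merging the bins with the most similar response keeps the information lost by the coarser binning as small as possible.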

remu.matrix_utils.mahalanobis_distance(first, second, shape=None, N=None, return_distances_from_mean=False, **kwargs)[source]

Calculate the squared Mahalanobis distance of the two matrices for each truth bin.

Parameters

first, secondResponseMatrix: The two response matrices to compare.
shapetuple of ints, optional: The shape of the returned matrix. Defaults to (#(truth bins),).
Nint, optional: Number of random matrices to be generated for the calculation. This number must be larger than the number of reco bins! Otherwise the covariances cannot be calculated correctly. Defaults to #(reco bins) + 100.
return_distances_from_meanbool, optional: Also return the ndarray distances_from_mean.
**kwargsoptional: Additional keyword arguments are passed through to generate_random_response_matrices().
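
The requirement on N can be illustrated with a small NumPy experiment. A sample covariance estimated from N draws has rank at most N - 1, so with too few draws it is singular and cannot be inverted for the Mahalanobis distance. The number of reco bins and the Gaussian samples below are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_reco = 5  # hypothetical number of reco bins


def covariance_rank(N):
    """Rank of the sample covariance estimated from N random draws."""
    samples = rng.normal(size=(N, n_reco))
    return np.linalg.matrix_rank(np.cov(samples, rowvar=False))


rank_too_few = covariance_rank(n_reco)       # N == number of reco bins
rank_enough = covariance_rank(n_reco + 100)  # the documented default
print(rank_too_few, rank_enough)
```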

Returns

distancendarray

Array of shape shape with the squared Mahalanobis distance of the mean difference between the matrices for each truth bin:

D_M^2( mean(first.random_matrices - second.random_matrices) )

distances_from_meanndarray, optional

Array of shape (N,)+shape with the squared Mahalanobis distances between the randomly generated matrix differences and the mean matrix difference for each truth bin:

D_M^2( (first.random_matrices - second.random_matrices)
     - mean(first.random_matrices - second.random_matrices) )

See also

compatibility
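
A minimal sketch of the per-truth-bin calculation, assuming the random matrices are available as arrays of shape (N, #(reco bins), #(truth bins)). The arrays here are synthetic stand-ins, not the output of generate_random_response_matrices(), and the loop is written for clarity rather than speed.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sizes; the arrays stand in for randomly generated matrices.
n_reco, n_truth = 4, 3
N = n_reco + 100

first_random = rng.normal(loc=0.5, scale=0.05, size=(N, n_reco, n_truth))
second_random = rng.normal(loc=0.5, scale=0.05, size=(N, n_reco, n_truth))
differences = first_random - second_random

distance = np.empty(n_truth)
distances_from_mean = np.empty((N, n_truth))
for j in range(n_truth):
    d = differences[:, :, j]  # (N, n_reco) draws for truth bin j
    mean = d.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(d, rowvar=False))
    # Squared Mahalanobis distance of the mean difference
    distance[j] = mean @ cov_inv @ mean
    # Squared distances of the individual draws from the mean difference
    centred = d - mean
    distances_from_mean[:, j] = np.einsum("ik,kl,il->i", centred, cov_inv, centred)

print(distance.shape, distances_from_mean.shape)
```

The shapes correspond to the documented defaults: (#(truth bins),) for distance and (N,) + shape for distances_from_mean.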

remu.matrix_utils.plot_compatibility(first, second, filename=None, **kwargs)[source]

Plot the compatibility of the two matrices.

Parameters

first, secondResponseMatrix: Two instances of ResponseMatrix for comparison.
filenamestring: The filename where the plot will be saved.
**kwargsoptional: Additional keyword arguments are passed to compatibility().

Returns

figFigure: The figure that was used for plotting.
axAxis: The axis that was used for plotting.

See also

compatibility

remu.matrix_utils.plot_in_bin_variation(response_matrix, filename=None, **kwargs)[source]

Plot the maximum in-bin variation for each truth bin.

The plot will contain the minimum, maximum, and median marginalization of these maximum numbers.

Parameters

response_matrixResponseMatrix: The response matrix to plot.
filenamestring: The filename where the plot will be saved.
**kwargsoptional: Additional keyword arguments are passed to the plotting function.

Returns

figFigure: The figure that was used for plotting.
axAxis: The axis that was used for plotting.

See also

ResponseMatrix.get_in_bin_variation_as_ndarray

remu.matrix_utils.plot_mahalanobis_distance(first, second, filename=None, plot_expectation=True, **kwargs)[source]

Plot the squared Mahalanobis distance D_M^2 between two matrices.

Parameters

first, secondResponseMatrix: The two response matrices for the comparison.
plot_expectationbool: Also plot the expected distance.
filenamestr, optional: Save the plot to this location.
**kwargsoptional: Additional keyword arguments are passed to the plotting function.

Returns

figFigure: The figure that has been plotted on.
axAxes: The axes that have been plotted into.

See also

mahalanobis_distance

Notes

The expected distance is only an estimate based on the statistics in the bins. It is not exact and should be treated as a rough guide rather than a hard compatibility criterion.

remu.matrix_utils.plot_mean_efficiency(response_matrix, filename=None, nuisance_value=0.0, **kwargs)[source]

Plot mean efficiencies for all truth bins.

This ignores the statistical uncertainties of the bin entries. The plot will contain the minimum, maximum, and median marginalization of these mean efficiencies.

Parameters

response_matrixResponseMatrix: The response matrix to plot.
filenamestring: The filename where the plot will be saved.
nuisance_valuefloat, optional: Nuisance bins are set to this value.
**kwargsoptional: Additional keyword arguments are passed to the plotting function.

Returns

figFigure: The figure that was used for plotting.
axAxis: The axis that was used for plotting.
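
As an illustration of the plotted quantities, the sketch below computes mean efficiencies from a made-up mean response matrix and marginalizes them over one of two truth variables. It assumes that the efficiency of a truth bin is the sum of its response over all reco bins and that the truth bins form a regular 2 x 3 grid; the numbers and the grid layout are invented, and how ReMU arranges the projections internally is not shown here.

```python
import numpy as np

# Hypothetical mean response matrix: 3 reco bins (rows) x 6 truth bins (columns)
response = np.array([
    [0.5, 0.4, 0.3, 0.2, 0.1, 0.0],
    [0.2, 0.3, 0.3, 0.2, 0.3, 0.2],
    [0.1, 0.2, 0.1, 0.2, 0.1, 0.2],
])

# Efficiency of a truth bin: total probability of ending up in any reco bin
efficiency = response.sum(axis=0)

# Assume the 6 truth bins form a 2 x 3 grid in two truth variables (x, y)
eff_grid = efficiency.reshape(2, 3)

# Marginalize over y for each x bin in three different ways
eff_min = eff_grid.min(axis=1)
eff_max = eff_grid.max(axis=1)
eff_median = np.median(eff_grid, axis=1)
print(eff_min, eff_median, eff_max)
```

Showing the minimum, median, and maximum together gives an impression of how much the efficiency varies across the variables that are projected out.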

remu.matrix_utils.plot_mean_response_matrix(response_matrix, filename=None, **kwargs)[source]

Plot the smearing and efficiency.

Parameters

response_matrixResponseMatrix: The response matrix to plot.
filenamestring: The filename where the plot will be saved.
**kwargsoptional: Additional keyword arguments are passed to the plotting function.

Returns

figFigure: The figure that was used for plotting.
axAxis: The axis that was used for plotting.

remu.matrix_utils.plot_relative_in_bin_variation(response_matrix, filename=None, **kwargs)[source]

Plot the maximum in-bin variation relative to statistical uncertainty.

The plot will contain the minimum, maximum, and median marginalization of these maximum numbers.

Parameters

response_matrixResponseMatrix: The response matrix to plot.
filenamestring: The filename where the plot will be saved.
**kwargsoptional: Additional keyword arguments are passed to the plotting function.

Returns

figFigure: The figure that was used for plotting.
axAxis: The axis that was used for plotting.

See also

ResponseMatrix.get_in_bin_variation_as_ndarray

remu.matrix_utils.plot_statistical_uncertainty(response_matrix, filename=None, **kwargs)[source]

Plot the maximum sqrt(statistical variance) of each truth bin.

The plot will contain the minimum, maximum, and median marginalization of these maximum numbers.

Parameters

response_matrixResponseMatrix: The response matrix to plot.
filenamestring: The filename where the plot will be saved.
**kwargsoptional: Additional keyword arguments are passed to the plotting function.

Returns

figFigure: The figure that was used for plotting.
axAxis: The axis that was used for plotting.

See also

ResponseMatrix.get_statistical_variance_as_ndarray
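
The quantity plotted by plot_statistical_uncertainty can be sketched as follows, assuming the statistical variances are available as an array of shape (#(reco bins), #(truth bins)). The variance values are invented for the example and the array layout is an assumption, not the documented return format of get_statistical_variance_as_ndarray.

```python
import numpy as np

# Hypothetical statistical variances of the matrix elements:
# 3 reco bins (rows) x 4 truth bins (columns)
variance = np.array([
    [0.0004, 0.0009, 0.0001, 0.0016],
    [0.0001, 0.0025, 0.0004, 0.0009],
    [0.0009, 0.0001, 0.0036, 0.0004],
])

# Largest standard deviation among the reco bins of each truth bin
max_sigma = np.sqrt(variance).max(axis=0)
print(max_sigma)
```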