HypothesisTester

class remu.likelihood.HypothesisTester(likelihood_calculator, maximizer=<remu.likelihood.BasinHoppingMaximizer object>)

Class for statistical tests of hypotheses.

Methods

likelihood_p_value(parameters[, N])

Calculate the likelihood p-value of a set of parameters.

max_likelihood_p_value([fix_parameters, N])

Calculate the maximum-likelihood p-value.

max_likelihood_ratio_p_value(fix_parameters)

Calculate the maximum-likelihood-ratio p-value.

wilks_max_likelihood_ratio_p_value(...[, ...])

Calculate the maximum-likelihood-ratio p-value using Wilks' theorem.

likelihood_p_value(parameters, N=2500, **kwargs)

Calculate the likelihood p-value of a set of parameters.

The likelihood p-value is the probability of hypothetical alternative data yielding a lower likelihood than the actual data, given that the simple hypothesis described by the parameters is true.

Parameters
parameters : array like

The evaluated hypothesis expressed as a vector of its parameters. Can be a multidimensional array of vectors. The p-value for each vector is calculated independently.

N : int, optional

The number of MC evaluations of the hypothesis.

**kwargs : optional

Additional keyword arguments will be passed to the likelihood calculator.

Returns
p : float or ndarray

The likelihood p-value.

Notes

The p-value is estimated by creating N data samples according to the given parameters. The data is varied by both the statistical and systematic uncertainties resulting from the prediction. The number of data sets, n, that yield a likelihood as bad as, or worse than, the likelihood given the actual data is counted. The estimate for p is then:

p = n/N.

The variance of the estimator follows that of binomial statistics:

         var(n)   Np(1-p)      1
var(p) = ------ = ------- <= ---- .
          N^2       N^2       4N

The expected uncertainty can thus be directly influenced by choosing an appropriate number of toy data sets.
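The estimate and its binomial uncertainty can be illustrated with a small toy study. The following is a minimal sketch that does not use ReMU at all: it assumes a hypothetical Gaussian model with unit width in place of the likelihood calculator, and the names `log_likelihood` and `toy_likelihood_p_value` are made up for this illustration.

```python
import math
import random


def log_likelihood(data, mu):
    """Gaussian log-likelihood with unit width (constant terms dropped)."""
    return -0.5 * sum((x - mu) ** 2 for x in data)


def toy_likelihood_p_value(data, mu, N=2500, seed=0):
    """Estimate p = n/N and its binomial standard error from N toy data sets."""
    rng = random.Random(seed)
    observed = log_likelihood(data, mu)
    n = 0
    for _ in range(N):
        # Throw a toy data set assuming the simple hypothesis `mu` is true.
        toy = [rng.gauss(mu, 1.0) for _ in data]
        # Count toys whose likelihood is as bad as, or worse than, the observed one.
        if log_likelihood(toy, mu) <= observed:
            n += 1
    p = n / N
    std_error = math.sqrt(p * (1.0 - p) / N)
    return p, std_error


p, err = toy_likelihood_p_value([0.3, -1.1, 0.8, 1.9], mu=0.0)
upper_bound = 1.0 / (2.0 * math.sqrt(2500))  # sqrt(1 / (4N)) for N = 2500
```

The standard error returned here never exceeds `upper_bound`, which is the square root of the 1/(4N) limit quoted above.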

max_likelihood_p_value(fix_parameters=None, N=250)

Calculate the maximum-likelihood p-value.

The maximum-likelihood p-value is the probability of the data yielding a lower maximum likelihood (over the possible parameter space of the likelihood calculator) than the actual data, given that the best-fit parameter set is true.

Parameters
fix_parameters : array like, optional

Optionally fix some or all of the parameters of the LikelihoodCalculator.

N : int, optional

The number of MC evaluations of the hypothesis.

**kwargs : optional

Additional keyword arguments will be passed to the maximiser.

Returns
p : float or ndarray

The maximum-likelihood p-value.

Notes

When used to reject composite hypotheses, this p-value is sometimes called the “profile plug-in p-value”, as one “plugs in” the maximum likelihood estimate of the hypothesis parameters to calculate it. Its coverage properties are not exact, so care has to be taken to make sure it performs as expected (e.g. by testing it with simulated data).

The p-value is estimated by randomly creating N data samples according to the best-fit parameter set. The number of data sets, n, that yield a maximum likelihood as bad as, or worse than, the maximum likelihood given the actual data is counted. The estimate for p is then:

p = n/N.

The variance of the estimator follows that of binomial statistics:

         var(n)   Np(1-p)      1
var(p) = ------ = ------- <= ---- .
          N^2       N^2       4N

The expected uncertainty can thus be directly influenced by choosing an appropriate number of evaluations.
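Since the variance bound depends only on N, it can be inverted to pick a number of evaluations for a desired precision. The helpers below are a small sketch of that arithmetic; `worst_case_std` and `required_toys` are hypothetical names and not part of ReMU.

```python
import math


def worst_case_std(N):
    """Upper bound on the standard deviation of p = n/N, from var(p) <= 1/(4N)."""
    return 0.5 / math.sqrt(N)


def required_toys(target_std, p=0.5):
    """Smallest N with sqrt(p(1-p)/N) <= target_std; p=0.5 is the worst case."""
    # The small offset guards against floating-point round-off pushing an
    # exact result up to the next integer.
    return math.ceil(p * (1.0 - p) / target_std**2 - 1e-9)


default_std = worst_case_std(250)    # bound for the default N=250
n_for_one_percent = required_toys(0.01)
```

If a rough value of p is already known, passing it as `p` gives a less conservative N than the worst-case default.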

max_likelihood_ratio_p_value(fix_parameters, alternative_fix_parameters=None, N=250, **kwargs)

Calculate the maximum-likelihood-ratio p-value.

The maximum-likelihood-ratio p-value is the probability of the data yielding a lower ratio of maximum likelihoods (over the possible parameter spaces of the composite hypotheses) than the actual data, given that the best-fit parameter set of the tested hypothesis H0 is true.

Parameters
fix_parameters : array like

Fix some or all of the parameters of the LikelihoodCalculator. This defines the tested hypothesis H0.

alternative_fix_parameters : array like, optional

Optionally fix some of the parameters of the LikelihoodCalculator to define the alternative hypothesis H1. If this is not specified, H1 is the fully unconstrained calculator.

N : int, optional

The number of MC evaluations of the hypothesis.

**kwargs : optional

Additional keyword arguments will be passed to the maximiser.

Returns
p : float or ndarray

The maximum-likelihood-ratio p-value.

Notes

When used to reject composite hypotheses, this p-value is sometimes called the “profile plug-in p-value”, as one “plugs in” the maximum likelihood estimate of the hypothesis parameters to calculate it. Its coverage properties are not exact, so care has to be taken to make sure it performs as expected (e.g. by testing it with simulated data).

The p-value is estimated by randomly creating N data samples according to the best-fit parameter set of H0. The number of data sets, n, that yield a likelihood ratio as bad as, or worse than, the likelihood ratio given the actual data is counted. The estimate for p is then:

p = n/N.

The variance of the estimator follows that of binomial statistics:

         var(n)   Np(1-p)      1
var(p) = ------ = ------- <= ---- .
          N^2       N^2       4N

The expected uncertainty can thus be directly influenced by choosing an appropriate number of evaluations.
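The toy procedure for the likelihood ratio can be sketched on a model where both maximisations have closed forms. This example is an illustration only and does not call ReMU: it assumes a hypothetical Gaussian mean model with unit width, where H0 fixes the mean to `mu0` (so its best fit is `mu0` itself) and H1 leaves the mean free. The function names are invented for this sketch.

```python
import random


def log_ratio(data, mu0):
    """log[max L(H0) / max L(H1)] for a unit-width Gaussian mean model.

    H0 fixes the mean to mu0; H1 leaves it free, so its maximum
    is reached at the sample mean.
    """
    mean = sum(data) / len(data)
    return -0.5 * len(data) * (mean - mu0) ** 2


def toy_ratio_p_value(data, mu0, N=2000, seed=1):
    """Fraction of toy data sets with a ratio as bad as, or worse than, the data."""
    rng = random.Random(seed)
    observed = log_ratio(data, mu0)
    n = 0
    for _ in range(N):
        # Toys are thrown assuming the best fit of H0 is true.
        toy = [rng.gauss(mu0, 1.0) for _ in data]
        if log_ratio(toy, mu0) <= observed:
            n += 1
    return n / N


p = toy_ratio_p_value([0.3, -1.1, 0.8, 1.9], mu0=0.0)
```

In a realistic case both maximisations are numerical and must be repeated for every toy data set, which is why the cost grows quickly with N.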

wilks_max_likelihood_ratio_p_value(fix_parameters, alternative_fix_parameters=None, **kwargs)

Calculate the maximum-likelihood-ratio p-value using Wilks’ theorem.

The maximum-likelihood-ratio p-value is the probability of the data yielding a lower ratio of maximum likelihoods (over the possible parameter spaces of the composite hypotheses) than the actual data, given that the best-fit parameter set of the tested hypothesis H0 is true. This method assumes that Wilks’ theorem holds.

Parameters
fix_parameters : array like

Fix some or all of the parameters of the LikelihoodCalculator. This defines the tested hypothesis H0.

alternative_fix_parameters : array like, optional

Optionally fix some of the parameters of the LikelihoodCalculator to define the alternative hypothesis H1. If this is not specified, H1 is the fully unconstrained calculator.

**kwargs : optional

Additional keyword arguments will be passed to the maximiser.

Returns
p : float

The maximum-likelihood-ratio p-value.

Notes

This method assumes that Wilks’ theorem holds, i.e. that -2 times the logarithm of the maximum likelihood ratio of the two hypotheses follows a chi-squared distribution whose number of degrees of freedom is the difference in the number of free parameters:

ndof = number_of_parameters_of_H1 - number_of_parameters_of_H0
p_value = scipy.stats.chi2.sf(-2*log_likelihood_ratio, df=ndof)
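For one or two degrees of freedom the chi-squared survival function has a simple closed form, which is handy for checking results by hand. The sketch below uses only the standard library; `wilks_p_value` is a hypothetical helper and not a ReMU function, and it deliberately covers only `ndof` of 1 or 2.

```python
import math


def wilks_p_value(log_likelihood_ratio, ndof):
    """Chi-squared survival probability of -2*log_likelihood_ratio for ndof in {1, 2}."""
    # For nested hypotheses the log ratio is <= 0; clip numerical noise at zero.
    x = max(-2.0 * log_likelihood_ratio, 0.0)
    if ndof == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if ndof == 2:
        return math.exp(-x / 2.0)
    raise ValueError("closed forms are only implemented for ndof = 1 or 2")


p_one = wilks_p_value(-0.5 * 3.841458820694124, ndof=1)
p_two = wilks_p_value(-0.5 * 5.991464547107979, ndof=2)
```

Both inputs are the usual 95% quantiles of the respective chi-squared distributions, so each call should give a p-value close to 0.05.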