HypothesisTester

class remu.likelihood.HypothesisTester(likelihood_calculator, maximizer=<remu.likelihood.BasinHoppingMaximizer object>)

Class for statistical tests of hypotheses.

Methods

`likelihood_p_value`(parameters[, N])	Calculate the likelihood p-value of a set of parameters.
`max_likelihood_p_value`([fix_parameters, N])	Calculate the maximum-likelihood p-value.
`max_likelihood_ratio_p_value`(fix_parameters)	Calculate the maximum-likelihood-ratio p-value.
`wilks_max_likelihood_ratio_p_value`(...[, ...])	Calculate the maximum-likelihood-ratio p-value using Wilks' theorem.

likelihood_p_value(parameters, N=2500, **kwargs)

Calculate the likelihood p-value of a set of parameters.

The likelihood p-value is the probability of hypothetical alternative data yielding a lower likelihood than the actual data, given that the simple hypothesis described by the parameters is true.

Parameters

parameters (array like): The evaluated hypothesis expressed as a vector of its parameters. Can be a multidimensional array of vectors, in which case the p-value for each vector is calculated independently.
N (int, optional): The number of MC evaluations of the hypothesis.
**kwargs (optional): Additional keyword arguments will be passed to the likelihood calculator.

Returns

p (float or ndarray): The likelihood p-value.

See also

max_likelihood_p_value
max_likelihood_ratio_p_value

Notes

The p-value is estimated by creating N data samples according to the given parameters. The data are varied according to both the statistical and the systematic uncertainties of the prediction. The number n of data sets that yield a likelihood as bad as, or worse than, the likelihood of the actual data is counted. The estimate for p is then:

p = n/N.

The variance of the estimator follows that of binomial statistics:

         var(n)   Np(1-p)      1
var(p) = ------ = ------- <= ---- .
          N^2       N^2       4N

The expected uncertainty can thus be directly influenced by choosing an appropriate number of toy data sets.
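
The counting estimate described in these notes can be sketched with a toy model. The following is only an illustration under stated assumptions, not the implementation used by this class: the data are independent Poisson counts with a fixed expectation per bin, systematic uncertainties are ignored, and the names `poisson_log_likelihood` and `toy_likelihood_p_value` are hypothetical.

```python
import math

import numpy as np

# Vectorised log(k!) via the log-gamma function (standard library only).
_log_factorial = np.vectorize(lambda k: math.lgamma(k + 1.0), otypes=[float])


def poisson_log_likelihood(counts, expectation):
    """Sum of Poisson log-probabilities over the last axis (the bins)."""
    counts = np.asarray(counts)
    expectation = np.asarray(expectation, dtype=float)
    terms = counts * np.log(expectation) - expectation - _log_factorial(counts)
    return np.sum(terms, axis=-1)


def toy_likelihood_p_value(data, expectation, n_toys=2500, seed=None):
    """Estimate p = n/N and its binomial standard error for a simple hypothesis."""
    rng = np.random.default_rng(seed)
    expectation = np.asarray(expectation, dtype=float)
    observed = poisson_log_likelihood(data, expectation)
    # Draw N toy data sets from the hypothesis itself (statistical variation only).
    toys = rng.poisson(expectation, size=(n_toys, expectation.size))
    toy_log_l = poisson_log_likelihood(toys, expectation)
    # n: toys whose likelihood is as bad as, or worse than, that of the data.
    n = np.count_nonzero(toy_log_l <= observed)
    p = n / n_toys
    std_err = math.sqrt(p * (1.0 - p) / n_toys)
    return p, std_err
```

The returned standard error is the square root of the binomial variance above, evaluated at the estimated p, so it never exceeds 1/(2*sqrt(N)).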

max_likelihood_p_value(fix_parameters=None, N=250)

Calculate the maximum-likelihood p-value.

The maximum-likelihood p-value is the probability of the data yielding a lower maximum likelihood (over the possible parameter space of the likelihood calculator) than the actual data, given that the best-fit parameter set is true.

Parameters

fix_parameters (array like, optional): Optionally fix some or all of the parameters of the LikelihoodCalculator.
N (int, optional): The number of MC evaluations of the hypothesis.
**kwargs (optional): Additional keyword arguments will be passed to the maximiser.

Returns

p (float or ndarray): The maximum-likelihood p-value.

See also

likelihood_p_value
max_likelihood_ratio_p_value
LikelihoodCalculator.fix_parameters

Notes

When used to reject composite hypotheses, this p-value is sometimes called the “profile plug-in p-value”, as one “plugs in” the maximum likelihood estimate of the hypothesis parameters to calculate it. Its coverage properties are not exact, so care has to be taken to make sure it performs as expected (e.g. by testing it with simulated data).

The p-value is estimated by randomly creating N data samples according to the best-fit truth vector. The number n of data sets that yield a maximum likelihood as bad as, or worse than, the maximum likelihood of the actual data is counted. The estimate for p is then:

p = n/N.

The variance of the estimator follows that of binomial statistics:

         var(n)   Np(1-p)      1
var(p) = ------ = ------- <= ---- .
          N^2       N^2       4N

The expected uncertainty can thus be directly influenced by choosing an appropriate number of evaluations.
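
The plug-in procedure can be sketched with a deliberately simple composite hypothesis. This is an assumption-laden toy, not the library's code: the expectation is a fixed template scaled by a single free parameter mu, whose maximum likelihood estimate has the closed form sum(data)/sum(template), so no numerical maximiser is needed. The function names are hypothetical.

```python
import math

import numpy as np

# Vectorised log(k!) via the log-gamma function (standard library only).
_log_factorial = np.vectorize(lambda k: math.lgamma(k + 1.0), otypes=[float])


def max_log_likelihood(counts, template):
    """Maximise the Poisson log-likelihood over the scale mu (closed form)."""
    counts = np.asarray(counts, dtype=float)
    template = np.asarray(template, dtype=float)
    # ML estimate of the scale; clipped to avoid log(0) for empty data sets.
    mu_hat = np.maximum(counts.sum(axis=-1, keepdims=True) / template.sum(), 1e-12)
    expectation = mu_hat * template
    terms = counts * np.log(expectation) - expectation - _log_factorial(counts)
    return np.sum(terms, axis=-1), mu_hat[..., 0]


def toy_plug_in_p_value(data, template, n_toys=250, seed=None):
    """Estimate the maximum-likelihood (plug-in) p-value by counting toys."""
    rng = np.random.default_rng(seed)
    template = np.asarray(template, dtype=float)
    observed, mu_best = max_log_likelihood(data, template)
    # "Plug in" the best fit: toys are thrown from the best-fit expectation.
    toys = rng.poisson(mu_best * template, size=(n_toys, template.size))
    # Every toy is refitted before its maximum likelihood is compared.
    toy_max, _ = max_log_likelihood(toys, template)
    return np.count_nonzero(toy_max <= observed) / n_toys
```

Refitting each toy is what distinguishes this from the simple-hypothesis case: the comparison is between maximum likelihoods, not between likelihoods at fixed parameters.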

max_likelihood_ratio_p_value(fix_parameters, alternative_fix_parameters=None, N=250, **kwargs)

Calculate the maximum-likelihood-ratio p-value.

The maximum-likelihood-ratio p-value is the probability of the data yielding a lower ratio of maximum likelihoods (over the possible parameter spaces of the composite hypotheses) than the actual data, given that the best-fit parameter set of the tested hypothesis H0 is true.

Parameters

fix_parameters (array like): Fix some or all of the parameters of the LikelihoodCalculator. This defines the tested hypothesis H0.
alternative_fix_parameters (array like, optional): Optionally fix some of the parameters of the LikelihoodCalculator to define the alternative hypothesis H1. If this is not specified, H1 is the fully unconstrained calculator.
N (int, optional): The number of MC evaluations of the hypothesis.
**kwargs (optional): Additional keyword arguments will be passed to the maximiser.

Returns

p (float or ndarray): The maximum-likelihood-ratio p-value.

See also

wilks_max_likelihood_ratio_p_value
likelihood_p_value
max_likelihood_p_value

Notes

When used to reject composite hypotheses, this p-value is sometimes called the “profile plug-in p-value”, as one “plugs in” the maximum likelihood estimate of the hypothesis parameters to calculate it. Its coverage properties are not exact, so care has to be taken to make sure it performs as expected (e.g. by testing it with simulated data).

The p-value is estimated by randomly creating N data samples according to the best-fit truth vector of H0. The number n of data sets that yield a maximum likelihood ratio as bad as, or worse than, the ratio obtained from the actual data is counted. The estimate for p is then:

p = n/N.

The variance of the estimator follows that of binomial statistics:

         var(n)   Np(1-p)      1
var(p) = ------ = ------- <= ---- .
          N^2       N^2       4N

The expected uncertainty can thus be directly influenced by choosing an appropriate number of evaluations.
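
The bound var(p) <= 1/(4N) can be turned into a rule of thumb for choosing N. The helpers below are a minimal sketch with hypothetical names; they use only the worst case p = 0.5, so the actual uncertainty for small or large p is lower.

```python
import math


def worst_case_std(n_toys):
    """Upper bound on the standard deviation of p = n/N, reached at p = 0.5."""
    if n_toys <= 0:
        raise ValueError("n_toys must be positive")
    return math.sqrt(1.0 / (4.0 * n_toys))


def toys_for_target_std(target_std):
    """Smallest N whose worst-case standard deviation is at most target_std."""
    if target_std <= 0:
        raise ValueError("target_std must be positive")
    return math.ceil(1.0 / (4.0 * target_std**2))
```

For the default N=250 the bound is about 0.032, while N=2500 gives 0.01. Each factor of ten in N buys only a factor of about 3.2 in precision, and every evaluation of this method requires maximising both hypotheses.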

wilks_max_likelihood_ratio_p_value(fix_parameters, alternative_fix_parameters=None, **kwargs)

Calculate the maximum-likelihood-ratio p-value using Wilks’ theorem.

The maximum-likelihood-ratio p-value is the probability of the data yielding a lower ratio of maximum likelihoods (over the possible parameter spaces of the composite hypotheses) than the actual data, given that the best-fit parameter set of the tested hypothesis H0 is true. This method assumes that Wilks’ theorem holds.

Parameters

fix_parameters (array like): Fix some or all of the parameters of the LikelihoodCalculator. This defines the tested hypothesis H0.
alternative_fix_parameters (array like, optional): Optionally fix some of the parameters of the LikelihoodCalculator to define the alternative hypothesis H1. If this is not specified, H1 is the fully unconstrained calculator.
**kwargs (optional): Additional keyword arguments will be passed to the maximiser.

Returns

p (float): The maximum-likelihood-ratio p-value.

See also

max_likelihood_ratio_p_value
likelihood_p_value
max_likelihood_p_value

Notes

This method assumes that Wilks’ theorem holds, i.e. that -2 times the logarithm of the maximum likelihood ratio of the two hypotheses follows a chi-squared distribution whose number of degrees of freedom is the difference in the number of free parameters:

ndof = number_of_parameters_of_H1 - number_of_parameters_of_H0
p_value = scipy.stats.chi2.sf(-2*log_likelihood_ratio, df=ndof)
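
The two lines above can be wrapped into a small runnable function. This is a sketch only: `wilks_p_value` and its arguments are hypothetical names, and the maximised log-likelihoods of H0 and H1 are assumed to have been obtained elsewhere (for example from a maximiser).

```python
from scipy import stats


def wilks_p_value(max_log_l_h0, max_log_l_h1, n_params_h0, n_params_h1):
    """p-value of H0 against a nesting alternative H1, assuming Wilks' theorem."""
    ndof = n_params_h1 - n_params_h0
    if ndof <= 0:
        raise ValueError("H1 must have more free parameters than H0")
    # Log of the ratio L(H0)/L(H1); it is never positive when H0 is nested in H1.
    log_likelihood_ratio = max_log_l_h0 - max_log_l_h1
    # Survival function of the chi-squared distribution with ndof degrees of freedom.
    return stats.chi2.sf(-2.0 * log_likelihood_ratio, df=ndof)
```

Identical maximum likelihoods for both hypotheses give a p-value of 1, and the p-value shrinks as H1 fits the data increasingly better than H0. No toy data sets are thrown, which makes this much cheaper than the MC estimate, but the result is only as good as the asymptotic approximation.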