HypothesisTester
- class remu.likelihood.HypothesisTester(likelihood_calculator, maximizer=<remu.likelihood.BasinHoppingMaximizer object>)
Class for statistical tests of hypotheses.
Methods
likelihood_p_value(parameters[, N]): Calculate the likelihood p-value of a set of parameters.
max_likelihood_p_value([fix_parameters, N]): Calculate the maximum-likelihood p-value.
max_likelihood_ratio_p_value(fix_parameters): Calculate the maximum-likelihood-ratio p-value.
wilks_max_likelihood_ratio_p_value(...[, ...]): Calculate the maximum-likelihood-ratio p-value using Wilks' theorem.
- likelihood_p_value(parameters, N=2500, **kwargs)
Calculate the likelihood p-value of a set of parameters.
The likelihood p-value is the probability of hypothetical alternative data yielding a lower likelihood than the actual data, given that the simple hypothesis described by the parameters is true.
- Parameters
- parameters : array like
The evaluated hypothesis expressed as a vector of its parameters. Can be a multidimensional array of vectors. The p-value for each vector is calculated independently.
- N : int, optional
The number of MC evaluations of the hypothesis.
- **kwargs : optional
Additional keyword arguments will be passed to the likelihood calculator.
- Returns
- p : float or ndarray
The likelihood p-value.
Notes
The p-value is estimated by creating N data samples according to the given parameters. The data is varied by both the statistical and systematic uncertainties resulting from the prediction. The number of data sets that yield a likelihood as bad as, or worse than, the likelihood given the actual data, n, is counted. The estimate for p is then:
p = n/N.
The variance of the estimator follows that of binomial statistics:
var(p) = var(n) / N^2 = N p (1 - p) / N^2 <= 1 / (4N).
The expected uncertainty can thus be directly influenced by choosing an appropriate number of toy data sets.
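The counting procedure described in these notes can be illustrated with a minimal, self-contained sketch. It does not use ReMU: the Poisson bin model and the names `poisson_log_likelihood` and `toy_likelihood_p_value` are assumptions made for illustration, and only statistical fluctuations are thrown (no systematic variations).

```python
import math
import numpy as np

# Vectorised log(k!) via the log-gamma function.
_log_factorial = np.vectorize(lambda k: math.lgamma(k + 1.0))

def poisson_log_likelihood(counts, expectation):
    """Full Poisson log-likelihood, summed over the last (bin) axis."""
    counts = np.asarray(counts, dtype=float)
    terms = counts * np.log(expectation) - expectation - _log_factorial(counts)
    return np.sum(terms, axis=-1)

def toy_likelihood_p_value(data, expectation, N=2500, rng=None):
    """Estimate p = n/N and its binomial standard deviation from N toy data sets."""
    rng = np.random.default_rng() if rng is None else rng
    observed = poisson_log_likelihood(data, expectation)
    # Throw N toy data sets assuming the hypothesis (the expectation) is true.
    toys = rng.poisson(expectation, size=(N, len(expectation)))
    toy_ll = poisson_log_likelihood(toys, expectation)
    # Count toys with a likelihood as bad as, or worse than, the observed one.
    n = np.count_nonzero(toy_ll <= observed)
    p = n / N
    sigma = math.sqrt(p * (1.0 - p) / N)  # sqrt of var(p) = p(1-p)/N
    return p, sigma

expectation = np.array([10.0, 20.0, 15.0])  # hypothetical prediction per bin
data = np.array([12, 18, 30])               # hypothetical observed counts
p, sigma = toy_likelihood_p_value(data, expectation, N=2500,
                                  rng=np.random.default_rng(1))
print(p, sigma)
```

With N = 2500 the standard deviation of the estimate cannot exceed 1/(2*sqrt(N)) = 0.01, whatever the true p-value is.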
- max_likelihood_p_value(fix_parameters=None, N=250)
Calculate the maximum-likelihood p-value.
The maximum-likelihood p-value is the probability of the data yielding a lower maximum likelihood (over the possible parameter space of the likelihood calculator) than the actual data, given that the best-fit parameter set is true.
- Parameters
- fix_parameters : array like, optional
Optionally fix some or all of the parameters of the LikelihoodCalculator.
- N : int, optional
The number of MC evaluations of the hypothesis.
- **kwargs : optional
Additional keyword arguments will be passed to the maximiser.
- Returns
- p : float or ndarray
The maximum-likelihood p-value.
Notes
When used to reject composite hypotheses, this p-value is sometimes called the “profile plug-in p-value”, as one “plugs in” the maximum likelihood estimate of the hypothesis parameters to calculate it. Its coverage properties are not exact, so care has to be taken to make sure it performs as expected (e.g. by testing it with simulated data).
The p-value is estimated by randomly creating N data samples according to the best-fit parameters. The number of data sets that yield a maximum likelihood as bad as, or worse than, the maximum likelihood given the actual data, n, is counted. The estimate for p is then:
p = n/N.
The variance of the estimator follows that of binomial statistics:
var(p) = var(n) / N^2 = N p (1 - p) / N^2 <= 1 / (4N).
The expected uncertainty can thus be directly influenced by choosing an appropriate number of evaluations.
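Since each evaluation requires a maximisation, it can help to choose N deliberately. The binomial variance p(1-p)/N can be inverted to give the number of evaluations needed for a target standard deviation. The helper `evaluations_needed` below is a hypothetical name and not part of ReMU; it is a sketch of the arithmetic only.

```python
import math

def evaluations_needed(target_sigma, p_guess=0.5):
    """Smallest N with sqrt(p_guess * (1 - p_guess) / N) <= target_sigma.

    p_guess = 0.5 is the worst case and reproduces the bound var(p) <= 1/(4N).
    """
    return math.ceil(p_guess * (1.0 - p_guess) / target_sigma**2)

# Worst case: a standard deviation of 0.01 on the p-value estimate.
n_worst = evaluations_needed(0.01)
# If the p-value is expected to be around 0.05, fewer evaluations suffice.
n_small_p = evaluations_needed(0.01, p_guess=0.05)
print(n_worst, n_small_p)
```

The default N=250 of this method corresponds to a worst-case standard deviation of about 0.03.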
- max_likelihood_ratio_p_value(fix_parameters, alternative_fix_parameters=None, N=250, **kwargs)
Calculate the maximum-likelihood-ratio p-value.
The maximum-likelihood-ratio p-value is the probability of the data yielding a lower ratio of maximum likelihoods (over the possible parameter spaces of the composite hypotheses) than the actual data, given that the best-fit parameter set of the tested hypothesis H0 is true.
- Parameters
- fix_parameters : array like
Fix some or all of the parameters of the LikelihoodCalculator. This defines the tested hypothesis H0.
- alternative_fix_parameters : array like, optional
Optionally fix some of the parameters of the LikelihoodCalculator to define the alternative hypothesis H1. If this is not specified, H1 is the fully unconstrained calculator.
- N : int, optional
The number of MC evaluations of the hypothesis.
- **kwargs : optional
Additional keyword arguments will be passed to the maximiser.
- Returns
- p : float or ndarray
The maximum-likelihood-ratio p-value.
Notes
When used to reject composite hypotheses, this p-value is sometimes called the “profile plug-in p-value”, as one “plugs in” the maximum likelihood estimate of the hypothesis parameters to calculate it. Its coverage properties are not exact, so care has to be taken to make sure it performs as expected (e.g. by testing it with simulated data).
The p-value is estimated by randomly creating N data samples according to the best-fit parameters of H0. The number of data sets that yield a likelihood ratio as bad as, or worse than, the likelihood ratio given the actual data, n, is counted. The estimate for p is then:
p = n/N.
The variance of the estimator follows that of binomial statistics:
var(p) = var(n) / N^2 = N p (1 - p) / N^2 <= 1 / (4N).
The expected uncertainty can thus be directly influenced by choosing an appropriate number of evaluations.
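The following toy sketch illustrates the idea of a likelihood-ratio p-value from random data samples. It is not the ReMU implementation: it assumes a Poisson model whose expectation is a single scale factor `s` times a fixed `template`, with H0 fixing `s = s0` and H1 leaving `s` free. All names are hypothetical. Because H0 is a simple hypothesis here, no refit under H0 is needed; for a composite H0 the samples would be drawn from its best-fit parameters instead.

```python
import numpy as np

def log_likelihood_ratio(counts, template, s0):
    """log[L(H0) / max L(H1)] for the model expectation = s * template.

    The log(n!) terms cancel in the ratio and are omitted.
    """
    counts = np.asarray(counts, dtype=float)
    # Analytic maximum-likelihood estimate of s under H1 (clipped away from 0
    # so that the logarithm stays finite for all-empty toy data sets).
    s_hat = counts.sum(axis=-1, keepdims=True) / template.sum()
    s_hat = np.maximum(s_hat, 1e-12)

    def log_l(s):
        mu = s * template
        return np.sum(counts * np.log(mu) - mu, axis=-1)

    return log_l(s0) - log_l(s_hat)

def toy_ratio_p_value(data, template, s0, N=250, rng=None):
    """Fraction of H0 toys with a ratio as bad as, or worse than, the data."""
    rng = np.random.default_rng() if rng is None else rng
    observed = log_likelihood_ratio(data, template, s0)
    # Draw toy data sets assuming H0 is true.
    toys = rng.poisson(s0 * template, size=(N, len(template)))
    toy_ratios = log_likelihood_ratio(toys, template, s0)
    n = np.count_nonzero(toy_ratios <= observed)
    return n / N, observed

template = np.array([10.0, 20.0, 15.0])  # hypothetical prediction per bin
data = np.array([12, 18, 30])            # hypothetical observed counts
p_ratio, observed_ratio = toy_ratio_p_value(
    data, template, s0=1.0, N=2500, rng=np.random.default_rng(2)
)
print(p_ratio, observed_ratio)
```

The log likelihood ratio is never positive, since H1 contains H0; smaller (more negative) values indicate worse agreement with H0.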
- wilks_max_likelihood_ratio_p_value(fix_parameters, alternative_fix_parameters=None, **kwargs)
Calculate the maximum-likelihood-ratio p-value using Wilks' theorem.
The maximum-likelihood-ratio p-value is the probability of the data yielding a lower ratio of maximum likelihoods (over the possible parameter spaces of the composite hypotheses) than the actual data, given that the best-fit parameter set of the tested hypothesis H0 is true. This method assumes that Wilks' theorem holds.
- Parameters
- fix_parameters : array like
Fix some or all of the parameters of the LikelihoodCalculator. This defines the tested hypothesis H0.
- alternative_fix_parameters : array like, optional
Optionally fix some of the parameters of the LikelihoodCalculator to define the alternative hypothesis H1. If this is not specified, H1 is the fully unconstrained calculator.
- **kwargs : optional
Additional keyword arguments will be passed to the maximiser.
- Returns
- p : float
The maximum-likelihood-ratio p-value.
Notes
This method assumes that Wilks' theorem holds, i.e. that minus two times the logarithm of the maximum likelihood ratio of the two hypotheses is distributed like a chi-squared distribution:
    ndof = number_of_parameters_of_H1 - number_of_parameters_of_H0
    p_value = scipy.stats.chi2.sf(-2*log_likelihood_ratio, df=ndof)
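As a worked number, the formula above can be wrapped in a small standalone function. The name `wilks_p_value` and the input values are hypothetical; the parameter counts are taken to mean the number of free (unfixed) parameters of each hypothesis.

```python
from scipy.stats import chi2

def wilks_p_value(log_likelihood_ratio, n_free_h1, n_free_h0):
    """p-value from the chi-squared survival function, assuming Wilks' theorem."""
    ndof = n_free_h1 - n_free_h0
    return chi2.sf(-2.0 * log_likelihood_ratio, df=ndof)

# Hypothetical fit result: log[L(H0)/L(H1)] = -2 with one extra free parameter in H1.
p_wilks = wilks_p_value(-2.0, n_free_h1=3, n_free_h0=2)
print(p_wilks)
```

Here the test statistic is 4 with one degree of freedom, which corresponds to a two-sided two-sigma fluctuation, so the p-value is roughly 0.046.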