Null-hypothesis distribution is estimated from randomly permuted data labels.
The distribution is estimated by calling fit() with an appropriate Measure or TransferError instance and a training dataset, plus a validation dataset in the case of a TransferError. For a configurable number of cycles, the training data labels are permuted and the corresponding measure is computed. In the case of a TransferError, this is the error when predicting the correct labels of the validation dataset.
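The permutation cycles described above can be sketched in plain Python. This is only an illustration of the general technique, not this class's implementation: the helper names `permutation_null_samples` and `mean_difference`, the toy data, and the fixed seed are all assumptions introduced for the example.

```python
import random


def permutation_null_samples(measure, samples, labels, n_cycles=100, seed=0):
    """Return n_cycles values of `measure` computed on label-permuted data.

    Hypothetical helper: it mimics the idea of repeated label permutation,
    not the actual fit() of the class documented here.
    """
    rng = random.Random(seed)
    null_values = []
    for _ in range(n_cycles):
        permuted = list(labels)   # copy, so the original labels stay intact
        rng.shuffle(permuted)     # one random permutation per cycle
        null_values.append(measure(samples, permuted))
    return null_values


def mean_difference(samples, labels):
    """Toy measure (an assumption): mean of class 1 minus mean of class 0."""
    ones = [s for s, l in zip(samples, labels) if l == 1]
    zeros = [s for s, l in zip(samples, labels) if l == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


samples = [0.1, 0.3, 0.2, 0.4, 1.1, 1.3, 1.2, 1.4]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

observed = mean_difference(samples, labels)
null_values = permutation_null_samples(mean_difference, samples, labels,
                                       n_cycles=200)
```

Here `observed` is the measure under the true labels, and `null_values` approximates what the measure looks like when labels carry no information about the data.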
The distribution can be queried using the cdf() method, which can be configured to report probabilities/frequencies from the left or right tail, i.e. the fraction of the distribution that is lower or larger than some critical value.
This class also supports FeaturewiseMeasure. In that case cdf() returns an array of featurewise probabilities/frequencies.
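As a rough illustration of the tail query, the sketch below computes the fraction of stored null values on one side of a critical value, both for a scalar measure and per feature. The function names are hypothetical, and the inclusive comparisons (`<=`, `>=`) are an assumption; how the class itself treats ties is not stated here.

```python
def tail_fraction(null_values, x, tail="left"):
    """Fraction of null values at or below (left) or at or above (right) x."""
    if tail == "left":
        hits = sum(1 for v in null_values if v <= x)
    elif tail == "right":
        hits = sum(1 for v in null_values if v >= x)
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return hits / len(null_values)


def featurewise_tail_fraction(null_rows, xs, tail="left"):
    """Apply tail_fraction per feature.

    null_rows holds one row per permutation and one column per feature;
    xs holds one critical value per feature.
    """
    columns = list(zip(*null_rows))
    return [tail_fraction(col, x, tail) for col, x in zip(columns, xs)]


# Scalar measure: five permutation results.
null_values = [0.2, 0.4, 0.5, 0.6, 0.8]
left = tail_fraction(null_values, 0.4, tail="left")
right = tail_fraction(null_values, 0.4, tail="right")

# Featurewise measure: four permutations, two features.
null_rows = [[0.2, 0.9], [0.4, 0.7], [0.6, 0.5], [0.8, 0.3]]
per_feature = featurewise_tail_fraction(null_rows, [0.4, 0.4], tail="left")
```

With a featurewise measure the result is one frequency per feature, which matches the array-valued return described above.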
Notes
Available conditional attributes:
(Conditional attributes enabled by default suffixed with +)
Methods
cdf(x) | |
clean() | Clean stored distributions |
dists() | |
fit(measure, ds) | Fit the distribution by performing multiple cycles that repeatedly permute labels in the training dataset. |
p(x[, return_tails]) | Returns the p-value for values of x. |
rcdf(x) | |
reset() |
Initialize Monte-Carlo Permutation Null-hypothesis testing
Parameters :
permutator : Node
dist_class : class
measure : Measure or None
enable_ca : None or list of str
disable_ca : None or list of str
tail : {'left', 'right', 'any', 'both'}
descr : str
Clean stored distributions
Storing all of the distributions might be too expensive (e.g. in the case of Nonparametric), and the scope of the object might be too broad to wait for it to be destroyed. clean() rebinds dist_samples to an empty list so that the garbage collector can reclaim the memory.
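The rebinding idea can be sketched with a small stand-in class. `NullDistStore` and its `add()` method are hypothetical names used only for illustration; only the attribute name `dist_samples` and the method name `clean()` come from the description above.

```python
class NullDistStore:
    """Hypothetical stand-in for an object that keeps permutation results."""

    def __init__(self):
        self.dist_samples = []

    def add(self, value):
        self.dist_samples.append(value)

    def clean(self):
        # Rebind to a fresh empty list. Once nothing else references the
        # old list, the garbage collector is free to reclaim it, even
        # though this object itself stays alive.
        self.dist_samples = []


store = NullDistStore()
for value in range(1000):
    store.add(float(value))

n_before = len(store.dist_samples)
store.clean()
n_after = len(store.dist_samples)
```

The object remains usable after clean(); only the stored samples are dropped.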
Fit the distribution by performing multiple cycles that repeatedly permute labels in the training dataset.
Parameters :
measure : Measure or None
ds : Dataset which gets permuted and used to compute the measure