amount : {‘equal’} or int or float
Specify the number of elements to be selected (within the current
limit). The amount can be given as an integer corresponding to the
absolute number of elements per unique attribute value (see attr), as a
float corresponding to the fraction of elements, or as the keyword
‘equal’. In the latter case, the number of elements to be selected is
determined by the smallest number of elements available for any unique
attribute value within the current limit.
attr : str
Dataset attribute whose unique values define element classes that are
to be balanced in number.
count : int
Number of iterations to perform in generate().
limit : None or str or dict
If None, the whole dataset is considered as a single chunk. If a single
attribute name is given, its unique values define chunks of data that
are balanced individually. Finally, if a dictionary is provided, its
keys define attribute names and its values (a single value or a
sequence thereof) define attribute values; all key-value combinations
across all given items define a “selection” of samples or features to
be balanced.
apply_selection : bool
Flag whether the balanced selection shall be applied, i.e. whether the
output dataset contains only the selected elements. If False, the
selection is instead added as an attribute that merely marks the
selected elements (see the space argument).
include_offlimit : bool
If True, all samples that were off limit (i.e. not included in the
balancing input) are included in the balanced selection. If False
(the default), they are excluded.
space : str
Name of the selection marker attribute in the output dataset that is
created if the balanced selection is not applied to the output dataset
(see apply_selection argument).
enable_ca : None or list of str
Names of the conditional attributes that should be enabled in addition
to the default ones.
disable_ca : None or list of str
Names of the conditional attributes that should be disabled.
pass_attr : str, list of str|tuple, optional
Additional attributes to pass on to an output dataset. Attributes can
be taken from all three attribute collections of an input dataset
(sa, fa, a – see Dataset.get_attr()), or from the collection
of conditional attributes (ca) of a node instance. Corresponding
collection name prefixes should be used to identify attributes, e.g.
‘ca.null_prob’ for the conditional attribute ‘null_prob’, or
‘fa.stats’ for the feature attribute stats. In addition to a plain
attribute identifier it is possible to use a tuple to trigger more
complex operations. The first tuple element is the attribute
identifier, as described before. The second element is the name of the
target attribute collection (sa, fa, or a). The third element is the
axis number of a multidimensional array that shall be swapped with the
current first axis. The fourth element is a new name that shall be
used for an attribute in the output dataset.
Example: (‘ca.null_prob’, ‘fa’, 1, ‘pvalues’) will take the
conditional attribute ‘null_prob’ and store it as a feature attribute
‘pvalues’, while swapping the first and second axes. Simplified
instructions can be given by leaving out consecutive tuple elements
starting from the end.
postproc : Node instance, optional
Node to perform post-processing of results. This node is applied
in __call__() to perform a final processing step on the result
dataset. If None, nothing is done.
descr : str
Description of the instance.
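To illustrate how the three forms of `amount` could translate into per-class counts, here is a minimal plain-Python sketch. The helpers `n_per_class` and `balanced_indices` are hypothetical and not part of the library; rounding fractional amounts with Python's `round()` and drawing elements with `random.Random.sample` are assumptions of this sketch, not documented behavior.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Union

Amount = Union[str, int, float]


def n_per_class(labels: Sequence[Hashable], amount: Amount) -> Dict[Hashable, int]:
    """Number of elements to select for each unique label value.

    Hypothetical helper, not part of PyMVPA.
    """
    counts = Counter(labels)
    if amount == 'equal':
        # 'equal': the smallest number of available elements across
        # all unique values is used for every value.
        smallest = min(counts.values())
        return {label: smallest for label in counts}
    if isinstance(amount, float):
        # float: fraction of the available elements per value
        # (rounding via round(), i.e. half-to-even, is an assumption).
        return {label: int(round(amount * c)) for label, c in counts.items()}
    if isinstance(amount, int):
        # int: absolute number of elements per unique value.
        return {label: amount for label in counts}
    raise ValueError("amount must be 'equal', an int or a float")


def balanced_indices(labels: Sequence[Hashable], amount: Amount,
                     rng: random.Random) -> List[int]:
    """Randomly draw the requested number of indices per label value."""
    selected: List[int] = []
    for label, n in n_per_class(labels, amount).items():
        candidates = [i for i, lab in enumerate(labels) if lab == label]
        # rng.sample raises ValueError if n exceeds the available elements.
        selected.extend(rng.sample(candidates, n))
    return sorted(selected)


labels = ['a', 'a', 'a', 'b', 'b']
print(n_per_class(labels, 'equal'))   # {'a': 2, 'b': 2}
print(n_per_class(labels, 0.5))       # {'a': 2, 'b': 1}
print(balanced_indices(labels, 'equal', random.Random(0)))
```

With ‘equal’, the two label values end up with the same count, bounded by the rarer value ('b' has only two elements).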
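The three forms of `limit` can be read as a rule for grouping element indices. The sketch below is a hypothetical plain-Python illustration, not the library's implementation: attributes are modeled as a dict of equal-length lists, and the dictionary form is interpreted as a logical AND across keys and an OR over the values given for a single key. Elements that fall outside every group are the off-limit elements referred to by `include_offlimit`.

```python
from typing import Dict, Hashable, List, Sequence, Union

Attrs = Dict[str, Sequence[Hashable]]
Limit = Union[None, str, Dict[str, object]]


def limit_groups(attrs: Attrs, n: int, limit: Limit) -> List[List[int]]:
    """Split element indices into groups that are balanced independently.

    Hypothetical helper, not part of PyMVPA.
    """
    if limit is None:
        # The whole dataset forms a single group.
        return [list(range(n))]
    if isinstance(limit, str):
        # One group per unique value of the named attribute.
        groups: Dict[Hashable, List[int]] = {}
        for i, value in enumerate(attrs[limit]):
            groups.setdefault(value, []).append(i)
        return list(groups.values())

    def matches(i: int) -> bool:
        # Every key must match; a sequence of values means "any of these".
        for name, wanted in limit.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if attrs[name][i] not in allowed:
                return False
        return True

    return [[i for i in range(n) if matches(i)]]


def offlimit_indices(n: int, groups: List[List[int]]) -> List[int]:
    """Indices not covered by any group, i.e. off limit."""
    covered = {i for group in groups for i in group}
    return [i for i in range(n) if i not in covered]


attrs = {'chunks': [0, 0, 1, 1, 2], 'targets': ['a', 'b', 'a', 'b', 'a']}
print(limit_groups(attrs, 5, 'chunks'))          # [[0, 1], [2, 3], [4]]
sel = limit_groups(attrs, 5, {'chunks': [0, 1], 'targets': 'a'})
print(sel, offlimit_indices(5, sel))             # [[0, 2]] [1, 3, 4]
```

Note that only the dictionary form can leave elements off limit in this sketch; with None or a single attribute name every element belongs to some group.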
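The interplay of `apply_selection`, `include_offlimit` and `space` can be sketched on plain index lists. `finalize_selection` is a hypothetical helper, not the library's code, and the marker name 'balanced_set' in the usage lines is only an illustrative value for `space`.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def finalize_selection(n: int, selected: Sequence[int], offlimit: Sequence[int],
                       apply_selection: bool, include_offlimit: bool
                       ) -> Tuple[List[int], Optional[List[bool]]]:
    """Return (indices kept in the output, boolean marker or None).

    Hypothetical helper, not part of PyMVPA.
    """
    chosen = set(selected)
    if include_offlimit:
        # Off-limit elements join the balanced selection.
        chosen.update(offlimit)
    if apply_selection:
        # The output contains only the selected elements; no marker needed.
        return sorted(chosen), None
    # All elements are kept; the selection is only marked.
    return list(range(n)), [i in chosen for i in range(n)]


sample_attrs: Dict[str, List[bool]] = {}
kept, marker = finalize_selection(5, [0, 2], [1, 3, 4],
                                  apply_selection=False, include_offlimit=False)
space = 'balanced_set'  # illustrative name for the marker attribute
sample_attrs[space] = marker
print(kept, sample_attrs)
# [0, 1, 2, 3, 4] {'balanced_set': [True, False, True, False, False]}
```

Keeping all elements and storing a boolean marker lets downstream code decide later whether to subset, which is the trade-off `apply_selection=False` appears to offer.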
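The tuple form of `pass_attr` can be illustrated with NumPy. `resolve_pass_attr` below is a hypothetical helper, not the library's code: it looks up a prefixed identifier in a plain dict standing in for the attribute collections, swaps the requested axis with the first axis via `numpy.swapaxes`, and falls back to the original attribute name when no new name is given. When the target collection is omitted it returns None, leaving that choice to whatever default the library applies.

```python
from typing import Any, Mapping, Optional, Tuple, Union

import numpy as np

Spec = Union[str, Tuple[Any, ...]]


def resolve_pass_attr(spec: Spec, lookup: Mapping[str, Any]
                      ) -> Tuple[Optional[str], str, np.ndarray]:
    """Return (target collection, output name, value) for a pass_attr spec.

    Hypothetical helper, not part of PyMVPA.
    """
    if isinstance(spec, str):
        spec = (spec,)  # plain identifier: same as a one-element tuple
    attr_id = spec[0]
    _, name = attr_id.split('.', 1)  # e.g. 'ca.null_prob' -> 'null_prob'
    # Trailing elements may be omitted; fall back to neutral defaults.
    target = spec[1] if len(spec) > 1 else None
    axis = spec[2] if len(spec) > 2 else 0
    new_name = spec[3] if len(spec) > 3 else name
    value = np.asarray(lookup[attr_id])
    if axis != 0:
        # Swap the requested axis with the current first axis.
        value = np.swapaxes(value, 0, axis)
    return target, new_name, value


lookup = {'ca.null_prob': np.zeros((2, 3)), 'fa.stats': [1.0, 2.0]}
target, name, value = resolve_pass_attr(('ca.null_prob', 'fa', 1, 'pvalues'), lookup)
print(target, name, value.shape)                   # fa pvalues (3, 2)
print(resolve_pass_attr('fa.stats', lookup)[:2])   # (None, 'stats')
```

This mirrors the documented example: ‘null_prob’ is stored under the name ‘pvalues’ in the feature attribute collection with its first two axes swapped.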