Binning

class remu.binning.Binning(bins, subbinnings=None, value_array=None, entries_array=None, sumw2_array=None, phasespace=None, dummy=False)[source]

A Binning is a set of disjoint Bins.

Parameters
bins: list of Bin

The list of disjoint bins.

subbinnings: dict of {bin_index: Binning}, optional

Subbinnings to replace certain bins.

value_array: slice of ndarray, optional

A slice of a numpy array, where the values of the bins will be stored.

entries_array: slice of ndarray, optional

A slice of a numpy array, where the number of entries will be stored.

sumw2_array: slice of ndarray, optional

A slice of a numpy array, where the squared weights will be stored.

phasespace: PhaseSpace, optional

The PhaseSpace the binning resides in.

dummy: bool, optional

Do not create any arrays to store the data.

Notes

Subbinnings are used to get a finer binning within a given bin. The bin to be replaced by the finer binning is specified using the native bin index, i.e. the number it would have before any subbinnings are assigned. Subbinnings are inserted into the numpy arrays at the position of the original bins. This changes the effective data index of all later bins.
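
The index shift described above can be sketched in plain Python. This is only an illustration of the bookkeeping, not the library's implementation: the helper names and the `subbinning_sizes` dict (native bin index mapped to the number of data slots of its subbinning) are assumptions made for this example.

```python
def data_size(nbins, subbinning_sizes):
    """Total number of data slots (hypothetical helper).

    Every bin that is replaced by a subbinning contributes the size of
    that subbinning instead of a single slot.
    """
    return nbins + sum(size - 1 for size in subbinning_sizes.values())


def bin_to_data_index(bin_i, subbinning_sizes):
    """First data slot that belongs to the native bin `bin_i`."""
    # Each subbinning in front of `bin_i` shifts it by (size - 1) slots.
    shift = sum(size - 1 for i, size in subbinning_sizes.items() if i < bin_i)
    return bin_i + shift


# Assumed layout: 4 native bins, bin 1 is replaced by a subbinning with 3 bins.
sizes = {1: 3}
total = data_size(4, sizes)
positions = [bin_to_data_index(i, sizes) for i in range(4)]
```

In this layout the bins after bin 1 move back by two slots, so the number of data slots is larger than the number of bins.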

The data itself is stored in NumPy arrays (or views of such) that are managed by the Binning. The arrays are linked to the contained Bin objects and subbinnings by setting their respective storage arrays to sliced views of the data arrays. The original arrays in the bins and subbinnings will always be replaced.
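
The following NumPy snippet illustrates why sliced views are suitable for this kind of linking. It only demonstrates general NumPy behaviour; the variable names are made up and nothing here calls into the Binning class itself.

```python
import numpy as np

# One shared data array, as a Binning might manage it (names are illustrative).
values = np.zeros(6)

# Basic slicing returns a view on the same memory, not a copy.
sub_values = values[1:4]

# Writing through the view changes the parent array as well.
sub_values[0] += 2.5

shared = np.shares_memory(values, sub_values)
```

Because the view and the parent share memory, a bin or subbinning that writes to its own storage array updates the data array of the enclosing Binning at the same time.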

Attributes

bins

(tuple of Bin) The list of disjoint bins on the PhaseSpace.

nbins

(int) The number of bins in the binning.

data_size

(int) The number of elements in the data arrays. Might differ from nbins due to subbinnings.

subbinnings

(dict of {bin_index: Binning}, optional) Subbinnings to replace certain bins.

value_array

(slice of ndarray) A slice of a numpy array, where the values of the bins are stored.

entries_array

(slice of ndarray) A slice of a numpy array, where the number of entries is stored.

sumw2_array

(slice of ndarray) A slice of a numpy array, where the squared weights are stored.

phasespace

(PhaseSpace) The PhaseSpace the binning resides in.

Methods

clone(**kwargs)

Create a functioning copy of the Binning.

event_in_binning(event)

Check whether an event fits into any of the bins.

fill(event[, weight, raise_error, rename])

Fill the events into their respective bins.

fill_data_index(i[, weight])

Add the weight(s) to the given data position.

fill_from_csv_file(*args, **kwargs)

Fill the binning with events from a CSV file.

fill_multiple_from_csv_file(binnings, filename)

Fill multiple Binnings from the same csv file(s).

from_yaml(loader, node)

Convert a representation node to a Python object.

get_adjacent_bin_indices()

Return a list of adjacent bin indices.

get_adjacent_data_indices()

Return a list of adjacent data indices.

get_bin_data_index(bin_i)

Calculate the data array index from the bin number.

get_data_bin_index(data_i)

Calculate the bin number from the data array index.

get_entries_as_ndarray([shape, indices])

Return the number of entries in the bins as ndarray.

get_event_bin(event)

Get the bin of the event.

get_event_bin_index(event)

Get the bin number of the given event.

get_event_data_index(event)

Get the data array index of the given event.

get_event_subbins(event)

Get the tuple of subbins of the event.

get_subbins(data_index)

Return a tuple of the bin and subbins corresponding to the data_index.

get_sumw2_as_ndarray([shape, indices])

Return the sum of squared weights in the bins as ndarray.

get_values_as_ndarray([shape, indices])

Return the bin values as ndarray.

insert_subbinning(bin_index, binning)

Insert a new subbinning into the binning.

insert_subbinning_on_ndarray(array, ...)

Insert values of a new subbinning into the array.

is_dummy()

Return True if there is no data array linked to this binning.

iter_subbins()

Iterate over all bins and subbins.

link_arrays()

Link the data storage arrays into the bins and subbinnings.

marginalize_subbinnings([bin_indices])

Return a clone of the Binning with subbinnings removed.

marginalize_subbinnings_on_ndarray(array[, ...])

Marginalize out the bins corresponding to the subbinnings.

reset([value, entries, sumw2])

Reset all bin values to 0.

set_entries_from_ndarray(arr)

Set the number of bin entries to the values of the ndarray.

set_sumw2_from_ndarray(arr)

Set the sums of squared weights to the values of the ndarray.

set_values_from_ndarray(arr)

Set the bin values to the values of the ndarray.

to_yaml(dumper, obj)

Convert a Python object to a representation node.

yaml_dumper

yaml_loader

clone(**kwargs)[source]

Create a functioning copy of the Binning.

Can specify additional kwargs for the initialisation of the new Binning.

event_in_binning(event)[source]

Check whether an event fits into any of the bins.

fill(event, weight=1, raise_error=False, rename=None)[source]

Fill the events into their respective bins.

Parameters
event: [iterable of] dict like or NumPy structured array or Pandas DataFrame

The event(s) to be filled into the binning.

weight: float or iterable of floats, optional

The weight of the event(s). Can be either a scalar which is then used for all events or an iterable of weights for the single events. Default: 1.

raise_error: bool, optional

Raise a ValueError if an event is not in the binning. Otherwise ignore the event. Default: False

rename: dict, optional

Dict for translating event variable names to binning variable names. Default: {}, i.e. no translation
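
How the parameters interact can be sketched with a toy one-dimensional stand-in. The function below is not the library's `fill()`: the `bin_edges` argument, the fixed variable name `'x'` and the half-open intervals are assumptions chosen to keep the example short.

```python
def fill(bin_edges, values, events, weight=1.0, raise_error=False, rename=None):
    """Toy 1D fill: bin i covers [bin_edges[i], bin_edges[i + 1]) in 'x'."""
    rename = rename or {}
    # Accept a single event as well as an iterable of events.
    events = [events] if isinstance(events, dict) else list(events)
    try:
        weights = list(weight)               # one weight per event
    except TypeError:
        weights = [weight] * len(events)     # scalar weight used for all events
    for event, w in zip(events, weights):
        # Translate event variable names to binning variable names.
        event = {rename.get(key, key): val for key, val in event.items()}
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= event["x"] < bin_edges[i + 1]:
                values[i] += w
                break
        else:
            # No bin matched: raise or silently ignore the event.
            if raise_error:
                raise ValueError("event is not in the binning")


values = [0.0, 0.0]
fill(
    [0, 1, 2],
    values,
    [{"pos": 0.5}, {"pos": 1.5}, {"pos": 7.0}],
    weight=[1.0, 2.0, 3.0],
    rename={"pos": "x"},
)
```

The third event lies outside both bins and is dropped because `raise_error` keeps its default of False.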

fill_data_index(i, weight=1.0)[source]

Add the weight(s) to the given data position.

Also increases the number of entries and sum of squared weights accordingly.

Parameters
i: int

The index of the data arrays to be filled.

weight: float or iterable of floats, optional

Weight(s) to be added to the value of the bin.
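
A minimal sketch of the bookkeeping described above, using plain lists in place of the data arrays. It assumes that every weight counts as one entry; the function is a stand-in written for this example, not the method itself.

```python
def fill_data_index(values, entries, sumw2, i, weight=1.0):
    """Add the weight(s) to slot `i` and update entries and sumw2 (sketch)."""
    try:
        weights = list(weight)     # iterable of weights
    except TypeError:
        weights = [weight]         # single scalar weight
    for w in weights:
        values[i] += w             # sum of weights
        entries[i] += 1            # assumed: one entry per weight
        sumw2[i] += w ** 2         # sum of squared weights


values, entries, sumw2 = [0.0] * 3, [0] * 3, [0.0] * 3
fill_data_index(values, entries, sumw2, 1, weight=[2.0, 3.0])
```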

fill_from_csv_file(*args, **kwargs)[source]

Fill the binning with events from a CSV file.

Parameters
filename: string or list of strings

The csv file with the data. Can be a list of filenames.

weightfield: string, optional

The column with the event weights.

weight: float or iterable of floats, optional

A single weight that will be applied to all events in the file. Can be an iterable with one weight for each file if filename is a list.

rename: dict, optional

A dict with columns that should be renamed before filling:

{'csv_name': 'binning_name'}

cut_function: function, optional

A function that modifies the loaded data before filling into the binning, e.g.:

cut_function = lambda data: data[data['binning_name'] > some_threshold]

This is done after the optional renaming.

buffer_csv_files: bool, optional

Save the results of loading CSV files in temporary files that can be recovered if the same CSV file is loaded again. This speeds up filling multiple Binnings with the same CSV-files considerably! Default: False

chunksize: int, optional

Load csv file in chunks of <chunksize> rows. This reduces the memory footprint of the loading operation, but can slow it down. Default: 10000

Notes

The file must be formatted like this:

first_varname,second_varname,...
<first_value>,<second_value>,...
<first_value>,<second_value>,...
<first_value>,<second_value>,...
...

For example:

x,y,z
1.0,2.1,3.2
4.1,2.0,2.9
3,2,1

All values are interpreted as floats. If weightfield is given, that field will be used as weights for the event. Other keyword arguments are passed on to the Binning’s fill() method. If filename is a list, all elements are handled recursively.
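
The file format and the order of renaming, cutting and weighting can be sketched with the standard library alone. The `load_events` helper is hypothetical and works on a list of dicts, whereas the real method loads the file into an array; the column names `w` and `pos` are made up for the example.

```python
import csv
import io


def load_events(csv_text, weightfield=None, rename=None,
                cut_function=lambda rows: rows):
    """Parse CSV text into float-valued events plus weights (sketch)."""
    rename = rename or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    # All values are read as floats; columns are renamed before the cut.
    rows = [{rename.get(key, key): float(val) for key, val in row.items()}
            for row in reader]
    rows = cut_function(rows)
    # The weight column, if any, is split off from the event variables.
    weights = [row.pop(weightfield) if weightfield else 1.0 for row in rows]
    return rows, weights


text = "x,y,w\n1.0,2.1,0.5\n4.1,2.0,2.0\n3,2,1\n"
events, weights = load_events(
    text,
    weightfield="w",
    rename={"x": "pos"},
    cut_function=lambda rows: [row for row in rows if row["pos"] > 2.0],
)
```

The cut refers to the renamed column `pos`, since in this sketch, as in the description above, renaming happens first.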

classmethod fill_multiple_from_csv_file(binnings, filename, weightfield=None, weight=1.0, rename=None, cut_function=<function Binning.<lambda>>, buffer_csv_files=False, chunksize=10000, **kwargs)[source]

Fill multiple Binnings from the same csv file(s).

This method saves time, because the numpy array only has to be generated once. Other than the list of binnings to be filled, the (keyword) arguments are identical to the ones used by the instance method fill_from_csv_file().

classmethod from_yaml(loader, node)[source]

Convert a representation node to a Python object.

get_adjacent_bin_indices()[source]

Return a list of adjacent bin indices.

Returns
adjacent_indices: list of ndarray

The adjacent indices of each bin

get_adjacent_data_indices()[source]

Return a list of adjacent data indices.

Returns
adjacent_indices: list of ndarray

The adjacent indices of each data index

Notes

Data indices inside a subbinning will only ever be adjacent to other indices inside the same subbinning. There is no information available about which bins in a subbinning are adjacent to which bins in the parent binning.

get_bin_data_index(bin_i)[source]

Calculate the data array index from the bin number.

get_data_bin_index(data_i)[source]

Calculate the bin number from the data array index.

All data indices inside a subbinning will return the bin index of that subbinning.
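
A small sketch of this reverse lookup, under the assumption that the layout is described by the number of bins and a dict mapping native bin indices to the data size of their subbinning. The helper and its arguments are illustrative only.

```python
def data_to_bin_index(data_i, nbins, subbinning_sizes):
    """Map a data array index back to the native bin index (sketch)."""
    start = 0
    for bin_i in range(nbins):
        # A bin with a subbinning occupies several consecutive data slots.
        width = subbinning_sizes.get(bin_i, 1)
        if start <= data_i < start + width:
            return bin_i
        start += width
    raise IndexError("data index out of range")


# Assumed layout: 4 native bins, bin 1 is replaced by a subbinning with 3 bins.
sizes = {1: 3}
mapping = [data_to_bin_index(d, 4, sizes) for d in range(6)]
```

Data indices 1, 2 and 3 all fall inside the subbinning and therefore map to bin 1.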

get_entries_as_ndarray(shape=None, indices=None)[source]

Return the number of entries in the bins as ndarray.

Parameters
shape: tuple of ints

Shape of the resulting array. Default: (len(bins),)

indices: list of ints

Only return the given bins. Default: Return all bins.

Returns
ndarray

An ndarray with the numbers of entries of the bins.

get_event_bin(event)[source]

Get the bin of the event.

Returns None if the event does not fit in any bin.

Parameters
event: dict like

A dictionary (or similar object) with one value of each variable in the binning, e.g.:

{'x': 1.4, 'y': -7.47}

Returns
Bin or None

The Bin object the event fits into.

get_event_bin_index(event)[source]

Get the bin number of the given event.

Returns None if the event does not belong to any bin.

Parameters
event: dict like

A dictionary (or similar object) with one value of each variable in the binning, e.g.:

{'x': 1.4, 'y': -7.47}

Returns
int or None

The bin number

Notes

The bin number can be used to access the corresponding Bin, or the subbinning in that bin (if it exists):

i = binning.get_event_bin_index(event)
binning.bins[i]
binning.subbinnings[i]

This is not the same as the corresponding index in the data array if there are any subbinnings present.

This is a dumb method that just loops over all bins until it finds a fitting one. It should be replaced with something smarter for more specific binning classes.
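
The linear search, and one possible smarter replacement, can be sketched as follows. Both functions are illustrations written for this page: bins are modelled as plain predicate functions, and the bisect variant assumes contiguous, sorted, half-open bins in a single variable, which need not match any particular binning class.

```python
import bisect


def get_event_bin_index(bins, event):
    """Loop over all bins and return the first index that accepts the event."""
    for i, contains in enumerate(bins):
        if contains(event):
            return i
    return None


def get_event_bin_index_sorted(edges, x):
    """Binary search for contiguous bins [edges[i], edges[i + 1])."""
    i = bisect.bisect_right(edges, x) - 1
    return i if 0 <= i < len(edges) - 1 else None


# Two toy bins in 'x'; other variables of the event are ignored here.
bins = [lambda e: 0 <= e["x"] < 1, lambda e: 1 <= e["x"] < 2]
inside = get_event_bin_index(bins, {"x": 1.4, "y": -7.47})
outside = get_event_bin_index(bins, {"x": 5.0, "y": 0.0})
fast_inside = get_event_bin_index_sorted([0, 1, 2], 1.4)
fast_outside = get_event_bin_index_sorted([0, 1, 2], 5.0)
```

The loop costs time proportional to the number of bins per event, while the binary search needs only a logarithmic number of comparisons.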

get_event_data_index(event)[source]

Get the data array index of the given event.

Returns None if the event does not belong to any bin.

Parameters
event: dict like

A dictionary (or similar object) with one value of each variable in the binning, e.g.:

{'x': 1.4, 'y': -7.47}

Returns
int or None

The data array index

get_event_subbins(event)[source]

Get the tuple of subbins of the event.

Returns None if the event does not fit in any bin.

Parameters
event: dict like

A dictionary (or similar object) with one value of each variable in the binning, e.g.:

{'x': 1.4, 'y': -7.47}

Returns
(bin[, subbin[, subbin …]]) or None

get_subbins(data_index)[source]

Return a tuple of the bin and subbins corresponding to the data_index.

Returns
(bin[, subbin[, subbin …]])

get_sumw2_as_ndarray(shape=None, indices=None)[source]

Return the sum of squared weights in the bins as ndarray.

Parameters
shape: tuple of ints

Shape of the resulting array. Default: (len(bins),)

indices: list of ints

Only return the given bins. Default: Return all bins.

Returns
ndarray

An ndarray with the sum of squared weights of the bins.

get_values_as_ndarray(shape=None, indices=None)[source]

Return the bin values as ndarray.

Parameters
shape: tuple of ints

Shape of the resulting array. Default: (len(bins),)

indices: list of ints

Only return the given bins. Default: Return all bins.

Returns
ndarray

An ndarray with the values of the bins.

insert_subbinning(bin_index, binning)[source]

Insert a new subbinning into the binning.

Parameters
bin_index: int

The bin to be replaced with the subbinning.

binning: Binning

The new subbinning

Returns
new_binning: Binning

A copy of this binning with the new subbinning.

Warning

This will replace the content of the bin with the content of the new subbinning!

insert_subbinning_on_ndarray(array, bin_index, insert_array)[source]

Insert values of a new subbinning into the array.

Parameters
array: ndarray

The data to work on.

bin_index: int

The bin to be replaced with the subbinning.

insert_array: ndarray

The array to be inserted.

Returns
new_array: ndarray

The modified array.
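
For a one-dimensional array, the operation might look like the NumPy sketch below, shown together with the reverse step that marginalize_subbinnings_on_ndarray() is documented to perform. Both helpers are assumptions for illustration: they take a data index directly (equal to the bin index when no other subbinnings are present), and they assume that marginalizing means summing the sub-bin values.

```python
import numpy as np


def insert_on_array(array, data_index, insert_array):
    """Replace the single slot at `data_index` with the values of `insert_array`."""
    return np.concatenate(
        [array[:data_index], insert_array, array[data_index + 1:]]
    )


def marginalize_on_array(array, data_index, width):
    """Sum `width` consecutive slots starting at `data_index` into one slot."""
    total = array[data_index:data_index + width].sum()
    return np.concatenate(
        [array[:data_index], [total], array[data_index + width:]]
    )


coarse = np.array([10.0, 6.0, 3.0])
# The content of slot 1 is replaced by the three sub-bin values.
fine = insert_on_array(coarse, 1, np.array([1.0, 2.0, 3.0]))
# Summing the three sub-bin slots gives a coarse array again.
back = marginalize_on_array(fine, 1, 3)
```

Here the inserted values happen to add up to the original bin content, so the round trip reproduces the coarse array; in general the inserted values replace the old content.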

is_dummy()[source]

Return True if there is no data array linked to this binning.

iter_subbins()[source]

Iterate over all bins and subbins.

Will yield a tuple of the bins in this Binning and all subbinnings in the order they correspond to the data indices.

Yields
(bin[, subbin[, subbin …]])
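
The order of the yielded tuples can be sketched with a recursive generator. The data model is a simplification invented for this example: bins are plain strings, and each subbinning is given as a pair of its bin list and its own subbinning dict.

```python
def iter_subbins(bins, subbinnings):
    """Yield (bin[, subbin[, ...]]) tuples in data index order (sketch)."""
    for i, current_bin in enumerate(bins):
        if i in subbinnings:
            sub_bins, sub_subbinnings = subbinnings[i]
            # Descend into the subbinning and prefix the parent bin.
            for subs in iter_subbins(sub_bins, sub_subbinnings):
                yield (current_bin,) + subs
        else:
            yield (current_bin,)


bins = ["A", "B", "C"]
subbinnings = {1: (["b0", "b1"], {})}   # bin "B" is split into two sub-bins
result = list(iter_subbins(bins, subbinnings))
```

In this sketch, the position of a tuple in `result` is its data index, which is also the kind of lookup that get_subbins(data_index) is described as providing.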

link_arrays()[source]

Link the data storage arrays into the bins and subbinnings.

marginalize_subbinnings(bin_indices=None)[source]

Return a clone of the Binning with subbinnings removed.

Parameters
bin_indices: list of int, optional

The bin indices of the subbinnings to be marginalized. If no indices are specified, all subbinnings are marginalized.

Returns
new_binning: Binning

marginalize_subbinnings_on_ndarray(array, bin_indices=None)[source]

Marginalize out the bins corresponding to the subbinnings.

Parameters
array: ndarray

The data to work on.

bin_indices: list of int, optional

The bin indices of the subbinnings to be marginalized. If no indices are specified, all subbinnings are marginalized.

Returns
new_array: ndarray

reset(value=0.0, entries=0, sumw2=0.0)[source]

Reset all bin values, numbers of entries and sums of squared weights (to 0 by default).

Parameters
value: float, optional

Set the bin values to this value.

entries: int, optional

Set the number of entries in each bin to this value.

sumw2: float, optional

Set the sum of squared weights in each bin to this value.

set_entries_from_ndarray(arr)[source]

Set the number of bin entries to the values of the ndarray.

set_sumw2_from_ndarray(arr)[source]

Set the sums of squared weights to the values of the ndarray.

set_values_from_ndarray(arr)[source]

Set the bin values to the values of the ndarray.

classmethod to_yaml(dumper, obj)[source]

Convert a Python object to a representation node.

yaml_loader

alias of yaml.loader.FullLoader