RectangularBinning

class remu.binning.RectangularBinning(**kwargs)

Bases: remu.binning.Binning

Binning made exclusively of RectangularBins.

Parameters:

binedges : dict

Dictionary mapping each variable name to its bin edges, e.g.:

import numpy as np

binedges = {
    'x': [0, 1, 2, 50],
    'y': (-float('inf'), 0, float('inf')),
    'z': np.linspace(0, 77, 55),
}

include_upper : bool, optional

If True, bins include their upper edges instead of their lower edges, i.e. each bin covers (lower, upper] instead of [lower, upper). Default: False

variables : list of strings, optional

List that determines the order of the variables. Will be generated from binedges if not provided.
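A variable with n bin edges has n - 1 bins. The binning as a whole is the grid of all per-variable bins. The following sketch counts them for the binedges example above. count_bins is a hypothetical helper, not part of remu. The variable order is passed explicitly here instead of relying on how remu derives it from binedges.

```python
import math

import numpy as np


def count_bins(binedges, variables):
    """Hypothetical helper: number of bins per variable and in total."""
    # n edges along one variable delimit n - 1 bins
    nbins = tuple(len(binedges[var]) - 1 for var in variables)
    # The full binning is the grid of all per-variable bins
    return nbins, math.prod(nbins)


binedges = {
    'x': [0, 1, 2, 50],
    'y': (-float('inf'), 0, float('inf')),
    'z': np.linspace(0, 77, 55),
}
nbins, total = count_bins(binedges, ['x', 'y', 'z'])
print(nbins, total)  # (3, 2, 54) 324
```

The infinite edges of y give it two open-ended bins, one below and one above 0.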

cartesian_product(other)

Create the Cartesian product of two rectangular binnings.

The two binnings must not share any variables and must have the same value of include_upper. The resulting binning spans the variables of both binnings, each with its respective edges.
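The preconditions and the resulting edges can be sketched with plain dictionaries. cartesian_product_edges is a hypothetical helper that only combines the edges, not the bin contents. Placing the variables of the first binning before those of the second is an assumption.

```python
def cartesian_product_edges(edges_a, upper_a, edges_b, upper_b):
    """Hypothetical sketch: combine the bin edges of two rectangular binnings."""
    # Documented precondition 1: no shared variables
    shared = set(edges_a) & set(edges_b)
    if shared:
        raise ValueError(f"Binnings share variables: {sorted(shared)}")
    # Documented precondition 2: identical include_upper
    if upper_a != upper_b:
        raise ValueError("Binnings differ in include_upper")
    # Each variable keeps its own edges (order: first binning, then second)
    combined = dict(edges_a)
    combined.update(edges_b)
    return combined, upper_a


edges, include_upper = cartesian_product_edges(
    {'x': [0, 1, 2]}, False, {'y': [0, 10, 20, 30]}, False)
n_total = (len(edges['x']) - 1) * (len(edges['y']) - 1)
print(list(edges), include_upper, n_total)  # ['x', 'y'] False 6
```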

Parameters:	other : RectangularBinning
Returns:	RectangularBinning

event_in_binning(event)

Check whether an event fits into any of the bins.

fill(event, weight=1, raise_error=False, rename={})

Fill the events into their respective bins.

Parameters:

event : [iterable of] dict-like or NumPy structured array or pandas DataFrame: The event(s) to be filled into the binning.
weight : float or iterable of floats, optional: The weight of the event(s). Can be either a scalar, which is then used for all events, or an iterable with one weight per event. Default: 1.
raise_error : bool, optional: If True, raise a ValueError when an event does not fit into any bin. Otherwise the event is ignored. Default: False
rename : dict, optional: Dict for translating event variable names to binning variable names. Default: {}, i.e. no translation
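The following is a rough sketch of filling for a single variable. It assumes, and this page does not confirm it, that each event adds its weight to the bin value, one to the bin entries, and its squared weight to sumw2. Those are the quantities returned by get_values_as_ndarray(), get_entries_as_ndarray() and get_sumw2_as_ndarray(). fill_1d and its state dictionary are hypothetical.

```python
import bisect


def fill_1d(edges, state, events, variable, weight=1.0,
            raise_error=False, rename=None):
    """Hypothetical sketch of filling weighted events into a 1D binning.

    state holds per-bin lists under 'value', 'entries' and 'sumw2'.
    """
    rename = rename or {}
    # A scalar weight is used for every event
    if isinstance(weight, (int, float)):
        weights = [weight] * len(events)
    else:
        weights = list(weight)
    for event, w in zip(events, weights):
        # Translate event variable names to binning variable names
        event = {rename.get(k, k): v for k, v in event.items()}
        # Lower edges included: bin i covers [edges[i], edges[i+1])
        i = bisect.bisect_right(edges, event[variable]) - 1
        if not 0 <= i < len(edges) - 1:
            if raise_error:
                raise ValueError(f"Event {event} is not in the binning")
            continue  # ignore events outside the binning
        state['value'][i] += w
        state['entries'][i] += 1
        state['sumw2'][i] += w * w


edges = [0.0, 1.0, 2.0, 50.0]
state = {key: [0] * 3 for key in ('value', 'entries', 'sumw2')}
fill_1d(edges, state, [{'a': 0.5}, {'a': 1.5}, {'a': 1.7}, {'a': 99.0}],
        variable='x', weight=[1.0, 2.0, 3.0, 4.0], rename={'a': 'x'})
print(state)
```

The last event lies above the highest edge, so it is skipped because raise_error is False.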

fill_from_csv_file(*args, **kwargs)

Fill the binning with events from a CSV file.

Parameters:

filename : string or list of strings

The CSV file with the data, or a list of CSV filenames.

weightfield : string, optional

The column with the event weights.

weight : float or iterable of floats, optional

A single weight that will be applied to all events in the file. Can be an iterable with one weight for each file if filename is a list.

rename : dict, optional

A dict with columns that should be renamed before filling:

{'csv_name': 'binning_name'}

cut_function : function, optional

A function that modifies the loaded data before filling into the binning, e.g.:

cut_function = lambda data: data[data['binning_name'] > some_threshold]

This is done after the optional renaming.

buffer_csv_files : bool, optional

Cache the results of loading CSV files in temporary files, which are reused if the same CSV file is loaded again. This considerably speeds up filling multiple Binnings from the same CSV files. Default: False

chunksize : int, optional

Load the CSV file in chunks of chunksize rows. This reduces the memory footprint of loading, but can slow it down. Default: 10000

Notes

The file must be formatted like this:

first_varname,second_varname,...
<first_value>,<second_value>,...
<first_value>,<second_value>,...
<first_value>,<second_value>,...
...

For example:

x,y,z
1.0,2.1,3.2
4.1,2.0,2.9
3,2,1

All values are interpreted as floats. If weightfield is given, that column is used as the event weights. Other keyword arguments are passed on to the Binning’s fill() method. If filename is a list, all elements are handled recursively.
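The documented file format maps directly onto a NumPy structured array. The sketch below parses it with numpy.genfromtxt, then renames columns and applies the cut in the documented order. load_csv is hypothetical and ignores buffering and chunking. Using weightfield exclusively when it is given, instead of combining it with weight, is an assumption.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn


def load_csv(fileobj, weightfield=None, weight=1.0, rename=None,
             cut_function=lambda data: data):
    """Hypothetical sketch: parse the documented CSV format into a
    structured array plus one weight per event."""
    # The header row provides the field names; every value becomes a float
    data = np.atleast_1d(np.genfromtxt(fileobj, delimiter=',', names=True))
    if rename:
        data = rfn.rename_fields(data, rename)
    # The cut is applied after the optional renaming
    data = cut_function(data)
    if weightfield is not None:
        weights = data[weightfield]  # assumption: weight is ignored here
    else:
        weights = np.full(len(data), weight)
    return data, weights


text = "x,y,z\n1.0,2.1,3.2\n4.1,2.0,2.9\n3,2,1\n"
data, weights = load_csv(io.StringIO(text), rename={'z': 'w'},
                         cut_function=lambda d: d[d['x'] > 2])
```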

classmethod fill_multiple_from_csv_file(binnings, filename, weightfield=None, weight=1.0, rename={}, cut_function=<function <lambda>>, buffer_csv_files=False, chunksize=10000, **kwargs)

Fill multiple Binnings from the same csv file(s).

This saves time because the NumPy array only has to be generated once. Apart from the list of binnings to be filled, the (keyword) arguments are identical to those of the instance method fill_from_csv_file().

get_bin_number_tuple(i_bin)

Translate the linear bin number of the event to a tuple of single variable bin numbers.

Turns this:

i_bin

into this:

(i_x, i_y, i_z)

The order of the indices in the tuple conforms to the order of variables. The bins are ordered row-major (C-style), i.e. increasing the bin number of the last variable by one increases the overall bin number also by one. The increments of the other variables depend on the number of bins in each variable.
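The row-major decomposition can be sketched with repeated divmod, peeling off the last (fastest-varying) variable first. bin_number_to_tuple and nbins (the number of bins per variable) are hypothetical names. NumPy's np.unravel_index performs the same C-order split.

```python
def bin_number_to_tuple(i_bin, nbins):
    """Hypothetical sketch: split a row-major (C-style) linear bin number
    into one index per variable."""
    indices = []
    # The last variable varies fastest, so it is extracted first
    for n in reversed(nbins):
        i_bin, i_var = divmod(i_bin, n)
        indices.append(i_var)
    return tuple(reversed(indices))


print(bin_number_to_tuple(7, (3, 2, 4)))  # (0, 1, 3)
```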

get_entries_as_ndarray(shape=None, indices=None)

Return the number of entries in the bins as ndarray.

Parameters:

shape : tuple of ints, optional: Shape of the resulting array. Default: `(len(bins),)`
indices : list of ints, optional: Only return the given bins. Default: Return all bins.

Returns:

ndarray: An ndarray with the numbers of entries of the bins.
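The shape and indices arguments can be sketched with plain NumPy operations. The same conventions apply to get_sumw2_as_ndarray() and get_values_as_ndarray(). as_ndarray is hypothetical, and selecting the indices before reshaping is an assumption.

```python
import numpy as np


def as_ndarray(flat, shape=None, indices=None):
    """Hypothetical sketch of the shape/indices handling."""
    arr = np.asarray(flat)
    if indices is not None:
        arr = arr[indices]  # keep only the requested bins
    if shape is not None:
        arr = arr.reshape(shape)  # C (row-major) order, NumPy's default
    return arr


# Made-up entries of a binning with 2 x 3 bins, in linear bin order
flat = [5, 0, 2, 1, 4, 3]
grid = as_ndarray(flat, shape=(2, 3))  # grid[i_x, i_y] == flat[3 * i_x + i_y]
subset = as_ndarray(flat, indices=[1, 4])
```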

get_event_bin(event)

Get the bin of the event.

Returns None if the event does not fit in any bin.

Parameters:

event : dict-like: A dictionary (or similar object) with one value for each variable in the binning, e.g. {'x': 1.4, 'y': -7.47}

Returns:

Bin or None: The `Bin` object the event fits into, or None.

get_event_bin_number(event)

Get the bin number for a given event.

get_event_tuple(event)

Get the variable index tuple for a given event.
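The per-variable lookup can be sketched with numpy.digitize, whose right argument corresponds to include_upper. event_tuple is a hypothetical helper. Returning None outside the binning mirrors get_event_bin(). A linear bin number would then follow from the row-major rule described under get_tuple_bin_number().

```python
import numpy as np


def event_tuple(event, binedges, variables, include_upper=False):
    """Hypothetical sketch: per-variable bin indices of an event, or None."""
    indices = []
    for var in variables:
        edges = binedges[var]
        # digitize returns i with edges[i-1] <= x < edges[i];
        # right=True switches to edges[i-1] < x <= edges[i]
        i = int(np.digitize(event[var], edges, right=include_upper)) - 1
        if not 0 <= i < len(edges) - 1:
            return None  # outside the binning in this variable
        indices.append(i)
    return tuple(indices)


binedges = {'x': [0, 1, 2, 50], 'y': (-float('inf'), 0, float('inf'))}
event = {'x': 1.0, 'y': 0.0}
print(event_tuple(event, binedges, ['x', 'y']))                      # (1, 1)
print(event_tuple(event, binedges, ['x', 'y'], include_upper=True))  # (0, 0)
```

An event sitting exactly on an edge lands in the bin above it by default and in the bin below it with include_upper.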

get_sumw2_as_ndarray(shape=None, indices=None)

Return the sum of squared weights in the bins as ndarray.

Parameters:

shape : tuple of ints, optional: Shape of the resulting array. Default: `(len(bins),)`
indices : list of ints, optional: Only return the given bins. Default: Return all bins.

Returns:

ndarray: An ndarray with the sum of squared weights of the bins.

get_tuple_bin_number(i_var)

Translate a tuple of variable bin numbers to the linear bin number of the event.

Turns this:

(i_x, i_y, i_z)

into this:

i_bin

The order of the indices in the tuple must conform to the order of variables. The bins are ordered row-major (C-style), i.e. increasing the bin number of the last variable by one increases the overall bin number also by one. The increments of the other variables depend on the number of bins in each variable.
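The row-major composition can be sketched as a Horner-style accumulation over the variables. tuple_to_bin_number is a hypothetical name. NumPy's np.ravel_multi_index uses the same C order by default and serves as a cross-check.

```python
import numpy as np


def tuple_to_bin_number(i_var, nbins):
    """Hypothetical sketch: row-major (C-style) linear bin number."""
    i_bin = 0
    for i, n in zip(i_var, nbins):
        # Scale what has been accumulated so far by this variable's bin count
        i_bin = i_bin * n + i
    return i_bin


nbins = (3, 2, 4)
print(tuple_to_bin_number((1, 0, 3), nbins))        # 11
print(int(np.ravel_multi_index((1, 0, 3), nbins)))  # 11
```

Increasing the last index by one raises the bin number by one. The first index has a stride of 2 * 4 = 8.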

get_values_as_ndarray(shape=None, indices=None)

Return the bin values as ndarray.

Parameters:

shape : tuple of ints, optional: Shape of the resulting array. Default: `(len(bins),)`
indices : list of ints, optional: Only return the given bins. Default: Return all bins.

Returns:

ndarray: An ndarray with the values of the bins.

marginalize(variables, reduction_function=<function sum>)

Marginalize out the given variables and return a new RectangularBinning.

Parameters:

variables : iterable of strings: Iterable of variable names to be marginalized out.
reduction_function : function, optional: Use this function to marginalize out the entries over the specified variables. Must support the axis keyword argument. Default: numpy.sum
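On the row-major array of bin contents, marginalizing amounts to applying reduction_function over the axes of the removed variables. marginalize_array is a hypothetical helper working on a made-up (2, 3, 4) array. It only illustrates why the reduction must accept the axis keyword.

```python
import numpy as np


def marginalize_array(values, variables, marginalize_out,
                      reduction_function=np.sum):
    """Hypothetical sketch: reduce the axes of the given variables."""
    axes = tuple(variables.index(var) for var in marginalize_out)
    remaining = [var for var in variables if var not in marginalize_out]
    return reduction_function(values, axis=axes), remaining


variables = ('x', 'y', 'z')
# Made-up bin values with nbins = (2, 3, 4): values[x, y, z] = 12x + 4y + z
values = np.arange(24, dtype=float).reshape((2, 3, 4))
summed, remaining = marginalize_array(values, variables, ['y'])
maximum, _ = marginalize_array(values, variables, ['y'], np.max)
both, _ = marginalize_array(values, variables, ['x', 'z'])
```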

plot_entries(filename, variables=None, divide=True, kwargs1d={}, kwargs2d={}, figax=None, **kwargs)

Plot the binning's entries.

See plot_ndarray() for a description of possible parameters.

plot_ndarray(filename, arr, variables=None, divide=True, kwargs1d={}, kwargs2d={}, figax=None, reduction_function=<function sum>, denominator=None, sqrt_errors=False, error_xoffset=0.0, error_band=False, legendprop={}, no_plot=False)

Plot a visual representation of an array containing the entries or values of the binning.

Parameters:

filename : string or None

The target filename of the plot. If None, the plot will not be saved to disk, which is only useful together with the figax option.

arr : ndarray

The array containing the data to be plotted. If the data contains more than one set of bin values (ndim==2), the mean value and standard deviation are plotted.

variables : optional

One of the following:

list of strings: List of variables to plot marginal histograms for.
None: Plot marginal histograms for all variables.
(list of strings, list of strings): Plot 2D histograms of the Cartesian product of the two variable lists. 2D histograms where both variables are identical are plotted as 1D histograms.
(None, None): Plot 2D histograms of all possible variable combinations. 2D histograms where both variables are identical are plotted as 1D histograms.

Default: None

divide : bool, optional

Divide the bin content by the bin size before plotting. Default: True

kwargs1d, kwargs2d : dict, optional

Additional keyword arguments for the 1D/2D histograms. If the key label is present, a legend will be drawn.

legendprop : dict, optional

Additional prop arguments for the legend.

figax : tuple of (Figure, list of list of Axis), optional

Pair of figure and axes to be used for plotting. Can be used to plot multiple binnings on top of one another. Default: Create new figure and axes.

reduction_function : function, optional

Use this function to marginalize out variables. Default: numpy.sum

denominator : ndarray, optional

A second array can be provided as a denominator. It is projected the same way arr is prior to dividing.

sqrt_errors : bool, optional

Plot sqrt(n) error bars. Overrides the plotting of mean and std in case of 2D arrays.

error_xoffset : float, optional

Shifts the error bars in the x direction away from the bin centres.

error_band : bool or ‘step’, optional

Fill area instead of drawing error bars.

no_plot : bool, optional

Do not plot anything, just create the figure and axes.

Returns:

fig : Figure: The Figure that was used for plotting.
ax : list of list of Axis: The axes that were used for plotting.
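What divide=True computes can be sketched for a small 2D array. This assumes that "bin size" means the bin width for a 1D marginal histogram and the bin area for a 2D histogram. The edges and values are made up for illustration. How remu handles bins with infinite edges is not covered here.

```python
import numpy as np

# Made-up 2D binning: edges along x and y, values laid out row-major
x_edges = np.array([0.0, 1.0, 3.0])
y_edges = np.array([0.0, 2.0, 4.0, 5.0])
values = np.array([[2.0, 4.0, 1.0],
                   [6.0, 8.0, 3.0]])

# Marginal histogram in x: sum over y, then divide by the x bin widths
x_marginal = values.sum(axis=1) / np.diff(x_edges)

# 2D histogram: divide by the area of each bin (outer product of widths)
density = values / np.outer(np.diff(x_edges), np.diff(y_edges))
```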

plot_sumw2(filename, variables=None, divide=True, kwargs1d={}, kwargs2d={}, figax=None, **kwargs)

Plot the binning's sum of squared weights sumw2.

See plot_ndarray() for a description of possible parameters.

plot_values(filename, variables=None, divide=True, kwargs1d={}, kwargs2d={}, figax=None, **kwargs)

Plot the binning's values.

See plot_ndarray() for a description of possible parameters.

project(variables, **kwargs)

Project the binning onto the given variables and return a new RectangularBinning.

The variable order of the original binning is preserved.

Parameters:

variables : iterable of strings: Iterable of variable names on which to project the binning.
kwargs : optional: Additional keyword arguments are passed on to `marginalize()`.

Returns:	RectangularBinning
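Projecting is the complement of marginalizing: every variable not requested is reduced, and the kept variables stay in their original order regardless of the order in which they were requested. The snippet below sketches this on a made-up array of ones. The names are hypothetical.

```python
import numpy as np

variables = ('x', 'y', 'z')
values = np.ones((2, 3, 4))  # made-up bin values, row-major

keep = ['z', 'x']  # the requested order does not matter
# The original variable order is preserved
kept = [var for var in variables if var in keep]
# All other variables are summed out, as marginalize() would by default
axes = tuple(i for i, var in enumerate(variables) if var not in keep)
projected = values.sum(axis=axes)
print(kept, projected.shape)  # ['x', 'z'] (2, 4)
```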

rebin(remove_binedges)

Return a new RectangularBinning with the given bin edges removed.

The values of the bins adjacent to the removed bin edges are summed up in the resulting larger bin. Note that bin values are lost if the first or last bin edge of a variable is removed.

Parameters:

remove_binedges : dict of lists of integers: A dictionary specifying the indices of the bin edges of each variable that should be removed. Binning variables that are not part of the dictionary are kept as is. E.g. if you want to remove bin edge 2 in `var_A` and bin edges 3, 4 and 7 in `var_C`:

remove_binedges = { 'var_A': [2], 'var_C': [3, 4, 7] }
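Merging along one variable works as follows. After interior edge k is removed, original bins k-1 and k lie between the same pair of remaining edges and are summed. Removing the first or last edge drops the outermost bin. rebin_axis is a hypothetical helper operating on a plain NumPy array of bin values, and remu itself may be implemented differently.

```python
import numpy as np


def rebin_axis(values, edges, remove, axis=0):
    """Hypothetical sketch: drop bin edges along one axis and merge bins."""
    remove = set(remove)
    keep = [i for i in range(len(edges)) if i not in remove]
    pieces = []
    for lo, hi in zip(keep[:-1], keep[1:]):
        # Original bins lo .. hi-1 now lie between two kept edges
        index = [slice(None)] * values.ndim
        index[axis] = slice(lo, hi)
        pieces.append(values[tuple(index)].sum(axis=axis))
    new_edges = [edges[i] for i in keep]
    return np.stack(pieces, axis=axis), new_edges


edges = [0, 1, 2, 3, 4]
merged, merged_edges = rebin_axis(np.array([1.0, 2.0, 3.0, 4.0]), edges, [2])
dropped, dropped_edges = rebin_axis(np.array([1.0, 2.0, 3.0, 4.0]), edges, [0])
grid, _ = rebin_axis(np.arange(8.0).reshape(2, 4), edges, [2], axis=1)
```

Removing edge 0 loses the content of the first bin, as the note above warns.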

reset(value=0.0, entries=0, sumw2=0.0)

Reset the bin values, entries and sums of squared weights. By default, all of them are set to 0.

Parameters:

value : float, optional: Set the bin values to this value.
entries : int, optional: Set the number of entries in each bin to this value.
sumw2 : float, optional: Set the sum of squared weights in each bin to this value.

set_entries_from_ndarray(arr)

Set the number of bin entries to the values of the ndarray.

set_sumw2_from_ndarray(arr)

Set the sums of squared weights to the values of the ndarray.

set_values_from_ndarray(arr)

Set the bin values to the values of the ndarray.

slice(variable_slices, return_indices=False)

Return a new RectangularBinning containing the given variable slices.

Parameters:

variable_slices : dict of slices

A dictionary specifying the bin slices of each variable. Binning variables that are not part of the dictionary are kept as is. E.g. if you want the slice of bin 2 in var_A and bins 1 through to the last in var_C:

variable_slices = { 'var_A': slice(2,3), 'var_C': slice(1,None) }

Please note that strides other than 1 are not supported.

return_indices : bool, optional

If True, also return the indices of the new binning:

new_values = binning.get_values_as_ndarray(
                            shape=binning.nbins)[indices]

Returns:

sliced_binning : RectangularBinning: A RectangularBinning consisting of the specified slices.
indices : list of ints, optional: The indices of the bins of the new RectangularBinning in the original Binning.
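The relation between the slices, the new bin edges and the returned indices can be sketched for a 2D example. slice_plan is hypothetical. Treating indices as the flat row-major bin numbers in the original binning follows the Returns description above.

```python
import numpy as np


def slice_plan(binedges, variables, variable_slices):
    """Hypothetical sketch: edges of the sliced binning and the flat
    (row-major) numbers of its bins in the original binning."""
    nbins = tuple(len(binedges[var]) - 1 for var in variables)
    new_edges, bin_slices = {}, []
    for var, n in zip(variables, nbins):
        # Variables without an entry are kept as is
        start, stop, step = variable_slices.get(var, slice(None)).indices(n)
        if step != 1:
            raise ValueError("strides other than 1 are not supported")
        # Bins start .. stop-1 are bounded by edges start .. stop
        new_edges[var] = list(binedges[var][start:stop + 1])
        bin_slices.append(slice(start, stop))
    # Number every bin, lay the numbers out on the nbins grid, cut the block
    numbers = np.arange(int(np.prod(nbins))).reshape(nbins)
    return new_edges, numbers[tuple(bin_slices)].ravel().tolist()


binedges = {'x': [0, 1, 2, 3], 'y': [0, 10, 20, 30, 40]}
new_edges, indices = slice_plan(
    binedges, ['x', 'y'], {'x': slice(2, 3), 'y': slice(1, None)})
print(new_edges, indices)
```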