RectangularBinning
-
class remu.binning.RectangularBinning(**kwargs)
Bases: remu.binning.Binning
Binning made exclusively out of RectangularBins
Parameters: - binedges : dict
Dictionary of bin edges for rectangular binning, e.g.:
{ 'x': [0, 1, 2, 50], 'y': (-float('inf'), 0, float('inf')), 'z': np.linspace(0,77,55), }
- include_upper : bool, optional
Make bins include upper edges instead of lower edges. Default: False
- variables : list of strings, optional
List that determines the order of the variables. Will be generated from binedges if not provided.
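The effect of include_upper on a single variable can be sketched in plain Python. This is only an illustration of the half-open interval convention described above, not ReMU's implementation; the helper name find_bin_index is hypothetical.

```python
from bisect import bisect_left, bisect_right


def find_bin_index(edges, value, include_upper=False):
    """Return the index of the 1D bin containing value, or None if outside.

    Hypothetical helper, not part of remu. Assumes sorted edges.
    """
    if include_upper:
        # Bins are (lower, upper]: a value on an edge belongs to the bin below.
        i = bisect_left(edges, value) - 1
    else:
        # Bins are [lower, upper): a value on an edge belongs to the bin above.
        i = bisect_right(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None


edges = [0, 1, 2, 50]
on_edge_default = find_bin_index(edges, 1)                    # bin [1, 2)
on_edge_upper = find_bin_index(edges, 1, include_upper=True)  # bin (0, 1]
```

With the default setting the last edge (50) lies outside the binning, while with include_upper=True the first edge (0) does.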
-
cartesian_product(other)
Create the Cartesian product of two rectangular binnings.
The two binnings must not share any variables and must have the same value of include_upper. The resulting binning spans the variables of both binnings with their respective edges.
Parameters: - other : RectangularBinning
Returns: - RectangularBinning
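At the level of bin edges, the product amounts to merging two binedges dictionaries after checking that they have no variable in common. The following is a minimal sketch under that assumption; cartesian_product_edges is a hypothetical helper working on plain dicts, not a remu function.

```python
def cartesian_product_edges(edges_a, edges_b):
    """Merge two binedges dicts; refuse to combine shared variables.

    Hypothetical helper, not part of remu.
    """
    shared = set(edges_a) & set(edges_b)
    if shared:
        raise ValueError(f"Binnings share variables: {sorted(shared)}")
    combined = dict(edges_a)
    combined.update(edges_b)
    return combined


xy = {"x": [0, 1, 2], "y": [0, 10, 20, 30]}
z = {"z": [-1, 0, 1]}
product = cartesian_product_edges(xy, z)

# The number of bins of the product is the product of the bin counts.
n_bins = 1
for edges in product.values():
    n_bins *= len(edges) - 1
```

Here the 2 x 3 bins in (x, y) combine with the 2 bins in z to give 12 bins.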
-
event_in_binning(event)
Check whether an event fits into any of the bins.
-
fill(event, weight=1, raise_error=False, rename={})
Fill the events into their respective bins.
Parameters: - event : [iterable of] dict like or Numpy structured array or Pandas DataFrame
The event(s) to be filled into the binning.
- weight : float or iterable of floats, optional
The weight of the event(s). Can be either a scalar which is then used for all events or an iterable of weights for the single events. Default: 1.
- raise_error : bool, optional
Raise a ValueError if an event is not in the binning. Otherwise ignore the event. Default: False
- rename : dict, optional
Dict for translating event variable names to binning variable names. Default: {}, i.e. no translation
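How the rename and weight arguments are interpreted can be sketched without the library. The helper prepare_fill below is hypothetical and only mirrors the description above: event keys are translated with rename, and a scalar weight is repeated for every event.

```python
def prepare_fill(events, weight=1, rename=None):
    """Translate variable names and expand the weight(s) per event.

    Hypothetical helper, not part of remu.
    """
    rename = rename or {}
    if isinstance(events, dict):
        events = [events]  # a single event is treated as a list of one
    translated = [{rename.get(k, k): v for k, v in e.items()} for e in events]
    try:
        weights = list(weight)  # one weight per event
    except TypeError:
        weights = [weight] * len(translated)  # scalar used for all events
    return translated, weights


events, weights = prepare_fill(
    [{"E": 1.4, "y": -7.47}, {"E": 0.2, "y": 3.0}],
    weight=0.5,
    rename={"E": "x"},
)
```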
-
fill_from_csv_file(*args, **kwargs)
Fill the binning with events from a CSV file.
Parameters: - filename : string or list of strings
The csv file with the data. Can be a list of filenames.
- weightfield : string, optional
The column with the event weights.
- weight : float or iterable of floats, optional
A single weight that will be applied to all events in the file. Can be an iterable with one weight for each file if filename is a list.
- rename : dict, optional
A dict with columns that should be renamed before filling:
{'csv_name': 'binning_name'}
- cut_function : function, optional
A function that modifies the loaded data before filling into the binning, e.g.:
cut_function(data) = data[ data['binning_name'] > some_threshold ]
This is done after the optional renaming.
- buffer_csv_files : bool, optional
Save the results of loading CSV files in temporary files that can be recovered if the same CSV file is loaded again. This speeds up filling multiple Binnings with the same CSV-files considerably! Default: False
- chunksize : int, optional
Load csv file in chunks of <chunksize> rows. This reduces the memory footprint of the loading operation, but can slow it down. Default: 10000
Notes
The file must be formatted like this:
first_varname,second_varname,...
<first_value>,<second_value>,...
<first_value>,<second_value>,...
<first_value>,<second_value>,...
...
For example:
x,y,z
1.0,2.1,3.2
4.1,2.0,2.9
3,2,1
All values are interpreted as floats. If weightfield is given, that field will be used as weights for the events. Other keyword arguments are passed on to the Binning’s fill() method. If filename is a list, all elements are handled recursively.
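The order of operations described above (read the header row, interpret values as floats, rename columns, apply the cut, pick up weights) can be sketched with the standard library. This is not how remu loads files; load_events is a hypothetical helper, and it works on a list of dicts where the real cut_function receives the loaded data array.

```python
import csv
import io

csv_text = "x,y,z,w\n1.0,2.1,3.2,0.5\n4.1,2.0,2.9,1.0\n3,2,1,2.0\n"


def load_events(handle, rename=None, weightfield=None,
                cut_function=lambda rows: rows):
    """Read CSV rows as float-valued events. Hypothetical helper, not remu."""
    rename = rename or {}
    events = []
    for row in csv.DictReader(handle):
        # All values are interpreted as floats; columns are renamed first.
        events.append({rename.get(k, k): float(v) for k, v in row.items()})
    # The cut is applied after the renaming.
    events = cut_function(events)
    if weightfield is None:
        weights = [1.0] * len(events)
    else:
        weights = [e[weightfield] for e in events]
    return events, weights


events, weights = load_events(
    io.StringIO(csv_text),
    rename={"x": "energy"},
    weightfield="w",
    cut_function=lambda rows: [r for r in rows if r["energy"] > 2.0],
)
```

Because the cut runs after the renaming, it refers to the binning name ("energy") rather than the CSV column name ("x").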
-
classmethod fill_multiple_from_csv_file(binnings, filename, weightfield=None, weight=1.0, rename={}, cut_function=<function <lambda>>, buffer_csv_files=False, chunksize=10000, **kwargs)
Fill multiple Binnings from the same csv file(s).
This method saves time, because the numpy array only has to be generated once. Other than the list of binnings to be filled, the (keyword) arguments are identical to the ones used by the instance method fill_from_csv_file().
-
get_bin_number_tuple(i_bin)
Translate the linear bin number of the event to a tuple of single variable bin numbers.
Turns this:
i_bin
into this:
(i_x, i_y, i_z)
The order of the indices in the tuple conforms to the order of variables. The bins are ordered row-major (C-style), i.e. increasing the bin number of the last variable by one increases the overall bin number also by one. The increments of the other variables depend on the number of bins in each variable.
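The row-major ordering can be illustrated with a few lines of plain Python. The function below is a hypothetical stand-in for the translation, assuming only the ordering described above; nbins_per_variable lists the number of bins of each variable in variable order.

```python
def bin_number_to_tuple(i_bin, nbins_per_variable):
    """Unravel a linear bin number into per-variable bin numbers (C order).

    Hypothetical helper, not part of remu.
    """
    indices = []
    # The last variable varies fastest, so peel it off first.
    for n in reversed(nbins_per_variable):
        i_bin, i_var = divmod(i_bin, n)
        indices.append(i_var)
    return tuple(reversed(indices))


shape = (3, 2, 4)  # bins in x, y, z
first = bin_number_to_tuple(0, shape)
second = bin_number_to_tuple(1, shape)
last = bin_number_to_tuple(23, shape)
```

Going from bin 0 to bin 1 increments only the last index, while bin 4 is the first bin with the second index raised and bin 8 the first with the first index raised.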
-
get_entries_as_ndarray(shape=None, indices=None)
Return the number of entries in the bins as ndarray.
Parameters: - shape: tuple of ints
Shape of the resulting array. Default: (len(bins),)
- indices: list of ints
Only return the given bins. Default: Return all bins.
Returns: - ndarray
An ndarray with the numbers of entries of the bins.
-
get_event_bin(event)
Get the bin of the event.
Returns None if the event does not fit in any bin.
Parameters: - event : dict like
A dictionary (or similar object) with one value of each variable in the binning, e.g.:
{'x': 1.4, 'y': -7.47}
Returns: - Bin or None
The Bin object the event fits into.
-
get_event_bin_number(event)
Get the bin number for a given event.
-
get_event_tuple(event)
Get the variable index tuple for a given event.
-
get_sumw2_as_ndarray(shape=None, indices=None)
Return the sum of squared weights in the bins as ndarray.
Parameters: - shape: tuple of ints
Shape of the resulting array. Default: (len(bins),)
- indices: list of ints
Only return the given bins. Default: Return all bins.
Returns: - ndarray
An ndarray with the sum of squared weights of the bins.
-
get_tuple_bin_number(i_var)
Translate a tuple of variable bin numbers to the linear bin number of the event.
Turns this:
(i_x, i_y, i_z)
into this:
i_bin
The order of the indices in the tuple must conform to the order of variables. The bins are ordered row-major (C-style), i.e. increasing the bin number of the last variable by one increases the overall bin number also by one. The increments of the other variables depend on the number of bins in each variable.
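The increments mentioned above are the row-major strides of the binning. A minimal sketch, assuming only the ordering described here (the function name is hypothetical and the code does not use remu):

```python
def tuple_to_bin_number(i_var, nbins_per_variable):
    """Ravel per-variable bin numbers into a linear bin number (C order).

    Hypothetical helper, not part of remu.
    """
    i_bin = 0
    for i, n in zip(i_var, nbins_per_variable):
        # Shift what has been accumulated so far by the size of this axis.
        i_bin = i_bin * n + i
    return i_bin


shape = (3, 2, 4)  # bins in x, y, z
step_z = tuple_to_bin_number((0, 0, 1), shape)  # stride of the last variable
step_y = tuple_to_bin_number((0, 1, 0), shape)  # stride of the middle variable
step_x = tuple_to_bin_number((1, 0, 0), shape)  # stride of the first variable
last = tuple_to_bin_number((2, 1, 3), shape)
```

For this shape the strides are 8, 4 and 1: raising i_x by one skips 2 * 4 bins, raising i_y by one skips 4 bins.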
-
get_values_as_ndarray(shape=None, indices=None)
Return the bin values as ndarray.
Parameters: - shape: tuple of ints
Shape of the resulting array. Default: (len(bins),)
- indices: list of ints
Only return the given bins. Default: Return all bins.
Returns: - ndarray
An ndarray with the values of the bins.
-
marginalize(variables, reduction_function=<function sum>)
Marginalize out the given variables and return a new RectangularBinning.
Parameters: - variables : iterable of strings
Iterable of variable names to be marginalized out.
- reduction_function : function
Use this function to marginalize out the entries over the specified variables. Must support the axis keyword argument. Default: numpy.sum
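What marginalizing out one variable does to the bin contents can be sketched on a flat, row-major list of values. This is an illustration only: marginalize_values is a hypothetical helper, and in place of an axis-aware function such as numpy.sum it applies a plain list reduction to each group of bins.

```python
from itertools import product


def marginalize_values(values, shape, axis, reduction_function=sum):
    """Reduce a flat row-major list of bin values over one axis.

    Hypothetical helper, not part of remu.
    """
    new_shape = shape[:axis] + shape[axis + 1:]
    # Row-major strides of the original shape.
    strides, stride = [], 1
    for n in reversed(shape):
        strides.insert(0, stride)
        stride *= n
    result = []
    for kept in product(*(range(n) for n in new_shape)):
        group = []
        for i in range(shape[axis]):
            idx = kept[:axis] + (i,) + kept[axis:]
            group.append(values[sum(j * s for j, s in zip(idx, strides))])
        result.append(reduction_function(group))
    return result, new_shape


values = [1, 2, 3, 4, 5, 6]  # 2 bins in x, 3 bins in y
over_y, shape_x = marginalize_values(values, (2, 3), axis=1)
over_x, shape_y = marginalize_values(values, (2, 3), axis=0)
max_over_y, _ = marginalize_values(values, (2, 3), axis=1,
                                   reduction_function=max)
```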
-
plot_entries(filename, variables=None, divide=True, kwargs1d={}, kwargs2d={}, figax=None, **kwargs)
Plot the binning's entries.
See plot_ndarray() for a description of possible parameters.
-
plot_ndarray(filename, arr, variables=None, divide=True, kwargs1d={}, kwargs2d={}, figax=None, reduction_function=<function sum>, denominator=None, sqrt_errors=False, error_xoffset=0.0, error_band=False, legendprop={}, no_plot=False)
Plot a visual representation of an array containing the entries or values of the binning.
Parameters: - filename : string or None
The target filename of the plot. If None, the plot will not be saved to disk. This is only useful with the figax option.
- arr : ndarray
The array containing the data to be plotted. If the data contains more than one set of bin values (ndim==2), the mean value and standard deviation are plotted.
- variables : optional
One of the following:
- list of strings
List of variables to plot marginal histograms for.
- None
Plot marginal histograms for all variables.
- (list of strings, list of strings)
Plot 2D histograms of the Cartesian product of the two variable lists. 2D histograms where both variables are identical are plotted as 1D histograms.
- (None, None)
Plot 2D histograms of all possible variable combinations. 2D histograms where both variables are identical are plotted as 1D histograms.
Default: None
- divide : bool, optional
Divide the bin content by the bin size before plotting.
- kwargs1d, kwargs2d : dict, optional
Additional keyword arguments for the 1D/2D histograms. If the key label is present, a legend will be drawn.
- legendprop : dict, optional
Additional prop arguments for the legend.
- figax : tuple of (Figure, list of list of Axis), optional
Pair of figure and axes to be used for plotting. Can be used to plot multiple binnings on top of one another. Default: Create new figure and axes.
- reduction_function : function, optional
Use this function to marginalize out variables. Default: numpy.sum
- denominator : ndarray, optional
A second array can be provided as a denominator. It is projected the same way arr is prior to dividing.
- sqrt_errors : bool, optional
Plot sqrt(n) error bars. Overrides the plotting of mean and std in case of 2D arrays.
- error_xoffset : float, optional
Shifts the error bars in the x direction away from the bin centres.
- error_band : bool or ‘step’, optional
Fill area instead of drawing error bars.
- no_plot : bool, optional
Do not plot anything, just create the figure and axes.
Returns: - fig : Figure
The Figure that was used for plotting.
- ax : list of list of Axis
The axes that were used for plotting.
-
plot_sumw2(filename, variables=None, divide=True, kwargs1d={}, kwargs2d={}, figax=None, **kwargs)
Plot the binning's sum of squared weights sumw2.
See plot_ndarray() for a description of possible parameters.
-
plot_values(filename, variables=None, divide=True, kwargs1d={}, kwargs2d={}, figax=None, **kwargs)
Plot the binning's values.
See plot_ndarray() for a description of possible parameters.
-
project(variables, **kwargs)
Project the binning onto the given variables and return a new RectangularBinning.
The variable order of the original binning is preserved.
Parameters: - variables : iterable of strings
Iterable of variable names on which to project the binning.
- kwargs : optional
Additional keyword arguments are passed on to marginalize().
Returns: - RectangularBinning
-
rebin(remove_binedges)
Return a new RectangularBinning with the given bin edges removed.
The values of the bins adjacent to the removed bin edges will be summed up in the resulting larger bin. Please note that bin values are lost if the first or last bin edge of a variable is removed.
Parameters: - remove_binedges : dict of lists of integers
A dictionary specifying the bin edge indices of each variable that should be removed. Binning variables that are not part of the dictionary are kept as is. E.g. if you want to remove bin edge 2 in var_A and bin edges 3, 4 and 7 in var_C:
remove_binedges = { 'var_A': [2], 'var_C': [3, 4, 7] }
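For a single variable, the merging of adjacent bins and the loss of values at the outer edges can be sketched as follows. rebin_1d is a hypothetical helper operating on plain lists; it does not use remu and covers only the one-dimensional case.

```python
def rebin_1d(edges, values, remove_indices):
    """Remove bin edges by index and sum the values of the merged bins.

    Hypothetical helper, not part of remu.
    """
    remove = set(remove_indices)
    new_edges = [e for i, e in enumerate(edges) if i not in remove]
    new_values = [0.0] * (len(new_edges) - 1)
    for i_bin, value in enumerate(values):
        lower, upper = edges[i_bin], edges[i_bin + 1]
        # Find the new bin that fully contains the old bin, if any.
        for j in range(len(new_edges) - 1):
            if new_edges[j] <= lower and upper <= new_edges[j + 1]:
                new_values[j] += value
                break
    return new_edges, new_values


edges = [0, 1, 2, 3, 4]
values = [10.0, 20.0, 30.0, 40.0]
inner_edges, inner_values = rebin_1d(edges, values, [2])
outer_edges, outer_values = rebin_1d(edges, values, [0])
```

Removing the inner edge 2 merges the two middle bins into one with value 50. Removing the first edge drops the first bin, and its value of 10 is lost.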
-
reset(value=0.0, entries=0, sumw2=0.0)
Reset all bins to the given values. By default, everything is set to 0.
Parameters: - value : float, optional
Set the bin values to this value.
- entries : int, optional
Set the number of entries in each bin to this value.
- sumw2 : float, optional
Set the sum of squared weights in each bin to this value.
-
set_entries_from_ndarray(arr)
Set the number of bin entries to the values of the ndarray.
-
set_sumw2_from_ndarray(arr)
Set the sums of squared weights to the values of the ndarray.
-
set_values_from_ndarray(arr)
Set the bin values to the values of the ndarray.
-
slice(variable_slices, return_indices=False)
Return a new RectangularBinning containing the given variable slices.
Parameters: - variable_slices : dict of slices
A dictionary specifying the bin slices of each variable. Binning variables that are not part of the dictionary are kept as is. E.g. if you want the slice of bin 2 in var_A and bins 1 through to the last in var_C:
variable_slices = { 'var_A': slice(2,3), 'var_C': slice(1,None) }
Please note that strides other than 1 are not supported.
- return_indices : bool, optional
If True, also return the indices of the new binning:
new_values = binning.get_values_as_ndarray(shape=binning.nbins)[indices]
Returns: - sliced_binning : RectangularBinning
A RectangularBinning consisting of the specified slices.
- indices : list of ints, optional
The indices of the bins of the new RectangularBinning in the original Binning.
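Which original bin numbers end up in the sliced binning follows from the row-major bin ordering. The sketch below computes them for a plain description of the binning; sliced_bin_indices and its nbins_per_variable argument are hypothetical and independent of remu.

```python
from itertools import product


def sliced_bin_indices(nbins_per_variable, variable_slices):
    """Return the original linear bin numbers selected by the slices.

    Hypothetical helper, not part of remu. nbins_per_variable maps each
    variable (in variable order) to its number of bins.
    """
    ranges = []
    for var, n in nbins_per_variable.items():
        # Variables without a slice are kept as is.
        start, stop, step = variable_slices.get(var, slice(None)).indices(n)
        if step != 1:
            raise ValueError("Strides other than 1 are not supported.")
        ranges.append(range(start, stop))
    shape = list(nbins_per_variable.values())
    indices = []
    for idx in product(*ranges):
        i_bin = 0
        for i, n in zip(idx, shape):
            i_bin = i_bin * n + i  # row-major (C-style) bin number
        indices.append(i_bin)
    return indices


nbins = {"var_A": 4, "var_B": 2, "var_C": 3}
indices = sliced_bin_indices(
    nbins, {"var_A": slice(2, 3), "var_C": slice(1, None)}
)
```

The slice keeps 1 x 2 x 2 = 4 of the 24 original bins.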