.. _examplePD:

========================================================
Example PD -- Advanced data loading with pandas and ROOT
========================================================

Aims
====

*   Use pandas :class:`DataFrame` to fill a :class:`.Binning`
*   Use uproot to load ROOT files and fill them into a :class:`.Binning`

Instructions
============

Pandas is an open source, BSD-licensed library providing high-performance,
easy-to-use data structures and data analysis tools for the Python programming
language:

https://pandas.pydata.org/

It provides a :class:`DataFrame` class, which is a useful tool to organise
structured data::

    from remu import binning
    from remu import plotting
    import numpy as np
    import pandas as pd
    import yaml

    # Limit the number of rows shown when printing a data frame
    pd.set_option('display.max_rows', 10)
    # Toy data: three normally distributed momentum components
    px = np.random.randn(1000)*20
    py = np.random.randn(1000)*20
    pz = np.random.randn(1000)*20
    df = pd.DataFrame({'px': px, 'py': py, 'pz': pz})
    print(df)

.. include:: df.txt
    :literal:

ReMU supports :class:`DataFrame` objects as inputs for all
:meth:`fill<.Binning.fill>` methods::

    with open("muon-binning.yml", 'r') as f:
        muon_binning = yaml.full_load(f)

    # The column names of the data frame are matched to the binning variables
    muon_binning.fill(df)

    pltr = plotting.get_plotter(muon_binning, ['py','pz'], ['px'])
    pltr.plot_values()
    pltr.savefig("pandas.png")

.. image:: pandas.png

This way, ReMU supports the same input file formats as the pandas library,
e.g. CSV, JSON, HDF5 and SQL.

Using the uproot library, pandas can also be used to load ROOT files:

https://github.com/scikit-hep/uproot5

The ROOT framework is the de-facto standard for data analysis in high energy
particle physics:

https://root.cern.ch/

Uproot does *not* need the actual ROOT framework to be installed to work.
It can convert a flat ROOT :class:`TTree` directly into a usable pandas
:class:`DataFrame`::

    import uproot

    flat_tree = uproot.open("Zmumu.root")['events']
    print(flat_tree.keys())

.. include:: flat_keys.txt
    :literal:

::

    df = flat_tree.arrays(library="pd")
    print(df)

.. include:: flat_df.txt
    :literal:

::

    muon_binning.reset()
    # Map the column names of the tree to the variable names of the binning
    muon_binning.fill(df, rename={'px1': 'px', 'py1': 'py', 'pz1': 'pz'})

    pltr = plotting.get_plotter(muon_binning, ['py','pz'], ['px'])
    pltr.plot_values()
    pltr.savefig("flat_muons.png")

.. image:: flat_muons.png

ReMU expects exactly one row per event. If the ROOT file is not flat, but has a
more complicated structure, it must be converted first. For example, let us
take a look at a file where each event has a varying number of reconstructed
particles::

    structured_tree = uproot.open("HZZ.root")['events']
    print(structured_tree.keys())

.. include:: structured_keys.txt
    :literal:

::

    df = structured_tree.arrays(["NMuon", "Muon_Px", "Muon_Py", "Muon_Pz"], library='pd')
    print(df)

.. include:: structured_df.txt
    :literal:

This kind of data frame with "lists" as cell elements can be inconvenient to
handle. But we can flatten it using the `awkward` library::

    import awkward as ak

    arr = structured_tree.arrays(["NMuon", "Muon_Px", "Muon_Py", "Muon_Pz"])
    df = ak.to_dataframe(arr)
    print(df)

.. include:: flattened_df.txt
    :literal:

This double-index structure is still not suitable as input for ReMU, though.
We can select only the first muon in each event, to get the required "one row
per event" structure::

    idx = pd.IndexSlice
    # All events (first index level), but only the first muon (second level)
    df = df.loc[idx[:,0], :]
    print(df)

.. include:: sliced_df.txt
    :literal:

::

    muon_binning.reset()
    muon_binning.fill(df, rename={'Muon_Px': 'px', 'Muon_Py': 'py', 'Muon_Pz': 'pz'})

    pltr = plotting.get_plotter(muon_binning, ['py','pz'], ['px'])
    pltr.plot_values()
    pltr.savefig("sliced_muons.png")

.. image:: sliced_muons.png
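
The selection of the first muon can be tried out without any ROOT file. The
following is a minimal sketch, assuming a hand-made data frame with a two-level
index like the one of the flattened data frame. The level names ``entry`` and
``subentry``, the numbers, and the variable names ``toy`` and ``first`` are
assumptions made up for illustration. Dropping the second index level at the
end is optional and only shown for convenience.

```python
import pandas as pd

# Hypothetical toy data: three events with 2, 1 and 3 muons respectively.
# The (entry, subentry) index mimics a flattened jagged array.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (2, 2)],
    names=["entry", "subentry"],
)
toy = pd.DataFrame(
    {
        "Muon_Px": [10.0, 11.0, 20.0, 30.0, 31.0, 32.0],
        "Muon_Py": [1.0, 1.5, 2.0, 3.0, 3.5, 3.75],
        "Muon_Pz": [-5.0, -6.0, -7.0, -8.0, -9.0, -10.0],
    },
    index=index,
)

# Keep only the first muon (subentry 0) of every event.
idx = pd.IndexSlice
first = toy.loc[idx[:, 0], :]

# Optionally drop the now constant subentry level,
# leaving a plain index with one row per event.
first = first.droplevel("subentry")

print(first)
```

An event without any muons has no rows in a frame that was flattened this way,
so it is also absent after the selection. If such events should be counted,
they need to be handled separately before filling the binning.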