uproot.dask

Defined in uproot._dask on line 10.

uproot._dask.dask(files, *, filter_name=<function no_filter>, filter_typename=<function no_filter>, filter_branch=<function no_filter>, recursive=True, full_paths=False, step_size='100 MB', library='ak', ak_add_doc=False, custom_classes=None, allow_missing=False, open_files=True, form_mapping=None, **options)
Parameters:
  • files – See below.

  • filter_name (None, glob string, regex string in "/pattern/i" syntax, function of str → bool, or iterable of the above) – A filter to select TBranches by name.

  • filter_typename (None, glob string, regex string in "/pattern/i" syntax, function of str → bool, or iterable of the above) – A filter to select TBranches by type.

  • filter_branch (None or function of uproot.TBranch → bool, uproot.interpretation.Interpretation, or None) – A filter to select TBranches using the full uproot.TBranch object. If the function returns False or None, the TBranch is excluded; if the function returns True, it is included with its standard interpretation; if an uproot.interpretation.Interpretation, this interpretation overrules the standard one.

  • recursive (bool) – If True, include all subbranches of branches as separate fields; otherwise, only search one level deep.

  • full_paths (bool) – If True, include the full path to each subbranch with slashes (/); otherwise, use the descendant’s name as the field name.

  • step_size (int or str) – If an integer, the maximum number of entries to include in each chunk; if a string, the maximum memory_size to include in each chunk. The string must be a number followed by a memory unit, such as “100 MB”.

  • library (str or uproot.interpretation.library.Library) – The library that is used to represent arrays. If library='np' it returns a dict of dask arrays and if library='ak' it returns a single dask-awkward array. library='pd' has not been implemented yet and will raise a NotImplementedError.

  • ak_add_doc (bool) – If True and library="ak", add the TBranch title to the Awkward __doc__ parameter of the array.

  • custom_classes (None or dict) – If a dict, override the classes from the uproot.ReadOnlyFile or uproot.classes.

  • allow_missing (bool) – If True, skip over any files that do not contain the specified TTree.

  • open_files (bool) – If True (default), the function opens the files to read their metadata, so only the main data read is delayed until compute is called on the Dask collections. If False, opening the files and reading the metadata are also delayed until the compute call; in this case, branch names are inferred by opening only the first file.

  • form_mapping (Callable[awkward.forms.Form] -> awkward.forms.Form | None) – If not None and library="ak", apply this remapping function to the Awkward form of the input data. Every form key in the remapped form must correspond to data that is available in the input form.

  • options – See below.
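The filter_name and filter_typename parameters accept several kinds of values. The minimal sketch below models the semantics described above using only the standard library. `make_filter` is a hypothetical helper, not part of uproot, and two details are assumptions about the behavior: the "/pattern/flags" form is matched with `re.match` (anchored at the start of the name), and any other string is treated as a case-sensitive glob.

```python
import fnmatch
import re
from typing import Callable

# Hypothetical, simplified model of the filter semantics; NOT uproot's code.
_REGEX_SPEC = re.compile(r"^/(.*)/([imsx]*)$")


def make_filter(spec) -> Callable[[str], bool]:
    """Turn None, a glob, a "/regex/flags" string, a callable, or an
    iterable of these into a single predicate str -> bool."""
    if spec is None:
        # None means "no filtering": accept every name.
        return lambda name: True
    if callable(spec):
        return lambda name: bool(spec(name))
    if isinstance(spec, str):
        m = _REGEX_SPEC.match(spec)
        if m is not None:
            pattern, flags = m.groups()
            # Inline flags such as (?i) make "/pattern/i" case-insensitive.
            regex = re.compile(f"(?{flags}){pattern}" if flags else pattern)
            return lambda name: regex.match(name) is not None
        # Any other string is a case-sensitive glob; plain names match exactly.
        return lambda name: fnmatch.fnmatchcase(name, spec)
    # An iterable of specs: a name passes if ANY of them accepts it.
    predicates = [make_filter(s) for s in spec]
    return lambda name: any(p(name) for p in predicates)
```

Under this model, `make_filter(["/^muon_/i", "nMuon"])` keeps `Muon_pt`, `Muon_eta`, and `nMuon` from `["Muon_pt", "Muon_eta", "Electron_pt", "nMuon"]`, because a list accepts a name when any of its entries does.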
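The step_size parameter controls how entries are grouped into chunks. The following is a minimal sketch of that idea, not uproot's implementation. `parse_memory_size` and `entry_ranges` are hypothetical helpers, binary units (1 MB = 1024² bytes) are an assumption, and converting a memory budget into entries through a fixed average `bytes_per_entry` is a simplification that may not match how uproot estimates entry sizes.

```python
import re

# ASSUMPTION: binary units (1 kB = 1024 bytes); uproot's convention may differ.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
_SIZE = re.compile(r"^\s*(\d+(?:\.\d*)?|\.\d+)\s*([KMGT]?B)\s*$", re.IGNORECASE)


def parse_memory_size(text: str) -> int:
    """Convert a string such as "100 MB" into a number of bytes."""
    m = _SIZE.match(text)
    if m is None:
        raise ValueError(f"not a memory size: {text!r}")
    number, unit = m.groups()
    return int(float(number) * _UNITS[unit.upper()])


def entry_ranges(num_entries: int, step_size, bytes_per_entry: float = 1.0):
    """Split [0, num_entries) into contiguous (start, stop) chunks.

    An int step_size is the maximum number of entries per chunk; a string is
    a memory budget, converted to entries with an ASSUMED average entry size.
    """
    if isinstance(step_size, str):
        step = max(1, int(parse_memory_size(step_size) // bytes_per_entry))
    else:
        step = step_size
    if step < 1:
        raise ValueError("step_size must be positive")
    return [(start, min(start + step, num_entries))
            for start in range(0, num_entries, step)]
```

For instance, `entry_ranges(2304, 1000)` yields `[(0, 1000), (1000, 2000), (2000, 2304)]`; under this model, each range would map to one chunk of the resulting Dask collection.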
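The interaction of recursive and full_paths can be illustrated with a toy model in which a branch hierarchy is a nested dict. `field_names` is a hypothetical helper that only mirrors the two parameter descriptions above; it is not uproot's naming logic.

```python
# Toy model: a branch tree is a nested dict {name: {child_name: {...}}}.
# `field_names` is hypothetical and only mirrors the recursive/full_paths prose.
def field_names(tree, recursive=True, full_paths=False, _prefix=""):
    names = []
    for name, children in tree.items():
        path = f"{_prefix}{name}"
        # full_paths=True keeps the slash-separated path; otherwise just the name.
        names.append(path if full_paths else name)
        # recursive=False stops at the first level of the hierarchy.
        if recursive and children:
            names.extend(field_names(children, recursive, full_paths, path + "/"))
    return names
```

With `tree = {"event": {"muon": {"pt": {}, "eta": {}}}, "nMuon": {}}`, the defaults give `["event", "muon", "pt", "eta", "nMuon"]`, `full_paths=True` gives `["event", "event/muon", "event/muon/pt", "event/muon/eta", "nMuon"]`, and `recursive=False` gives `["event", "nMuon"]`.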

Returns dask equivalents of the backends supported by uproot. If library='np', the function returns a Python dict of dask arrays. If library='ak', the function returns a single dask-awkward array.

For example:

>>> uproot.dask(root_file)
dask.awkward<from-uproot, npartitions=1>
>>> uproot.dask(root_file, library='np')
{'Type': dask.array<Type-from-uproot, shape=(2304,), dtype=object, chunksize=(2304,), chunktype=numpy.ndarray>, ...}

This function (naturally) depends on Dask. To use it with library="np":

# with pip
pip install "dask[complete]"
# or with conda
conda install dask

To use it with library='ak':

pip install dask-awkward   # not on conda-forge yet

Allowed types for the files parameter:

  • str/bytes: relative or absolute filesystem path or URL, without any colons other than a Windows drive letter or URL scheme. Examples: "rel/file.root", "C:\abs\file.root", "http://where/what.root"

  • str/bytes: same with an object-within-ROOT path, separated by a colon. Example: "rel/file.root:tdirectory/ttree"

  • pathlib.Path: always interpreted as a filesystem path or URL only (no object-within-ROOT path), regardless of whether there are any colons. Examples: Path("rel:/file.root"), Path("/abs/path:stuff.root")

  • glob syntax in str/bytes and pathlib.Path. Examples: Path("rel/*.root"), "/abs/*.root:tdirectory/ttree"

  • dict: keys are filesystem paths, values are objects-within-ROOT paths. Example: {"/data_v1/*.root": "ttree_v1", "/data_v2/*.root": "ttree_v2"}

  • already-open TTree objects.

  • iterables of the above.
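The string rules above hinge on which colon, if any, separates the file path from the object-within-ROOT path. The sketch below is a simplified model, not uproot's parser. `split_file_spec` and `normalize_files` are hypothetical names, glob expansion and already-open TTree objects are not handled, and the colon heuristics (a drive letter at position 1, "//" after a URL scheme) are assumptions.

```python
from pathlib import PurePath


# Hypothetical, simplified model of the files rules; NOT uproot's parser.
def split_file_spec(spec):
    """Return (file_path, object_path_or_None) for one str/bytes/Path spec."""
    if isinstance(spec, PurePath):
        # Paths are never split: any colon stays part of the filename.
        return str(spec), None
    if isinstance(spec, bytes):
        spec = spec.decode()
    colon = spec.rfind(":")
    if colon == -1:
        return spec, None
    # A colon right after a single-letter drive ("C:\...") is not a separator.
    is_drive = colon == 1 and spec[0].isalpha()
    # A colon followed by "//" belongs to a URL scheme ("http://...").
    is_scheme = spec.startswith("//", colon + 1)
    if is_drive or is_scheme:
        return spec, None
    return spec[:colon], spec[colon + 1:]


def normalize_files(files):
    """Flatten str/bytes/Path, dicts, and iterables of them into
    (file_path, object_path) pairs."""
    if isinstance(files, (str, bytes, PurePath)):
        return [split_file_spec(files)]
    if isinstance(files, dict):
        # Dict keys are file paths, values are object-within-ROOT paths.
        return list(files.items())
    pairs = []
    for item in files:
        pairs.extend(normalize_files(item))
    return pairs
```

In this model, "http://where/what.root:ttree" splits at the last colon into the URL and "ttree", while "http://where/what.root" and "C:\abs\file.root" are left whole.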

Options (type; default):

Other file entry points:

  • uproot.open: opens one file to read any of its objects.

  • uproot.iterate: iterates through chunks of contiguous entries in TTrees.

  • uproot.concatenate: returns a single concatenated array from TTrees.

  • uproot.dask (this function): returns an unevaluated Dask array from TTrees.