uproot.dask

Defined in uproot._dask on line 31.

uproot._dask.dask(files, *, filter_name=<function no_filter>, filter_typename=<function no_filter>, filter_branch=<function no_filter>, recursive=True, full_paths=False, step_size=uproot._util.unset, steps_per_file=uproot._util.unset, library='ak', ak_add_doc=False, custom_classes=None, allow_missing=False, open_files=True, form_mapping=None, allow_read_errors_with_report=False, known_base_form=None, decompression_executor=None, interpretation_executor=None, **options)
Parameters:
  • files – See below.

  • filter_name (None, glob string, regex string in "/pattern/i" syntax, function of str → bool, or iterable of the above) – A filter to select TBranches by name.

  • filter_typename (None, glob string, regex string in "/pattern/i" syntax, function of str → bool, or iterable of the above) – A filter to select TBranches by type.

  • filter_branch (None or function of uproot.TBranch → (bool, uproot.interpretation.Interpretation, or None)) – A filter to select TBranches using the full uproot.TBranch object. If the function returns False or None, the TBranch is excluded; if the function returns True, it is included with its standard interpretation; if the function returns an uproot.interpretation.Interpretation, this interpretation overrules the standard one.

  • recursive (bool) – If True, include all subbranches of branches as separate fields; otherwise, only search one level deep.

  • full_paths (bool) – If True, include the full path to each subbranch with slashes (/); otherwise, use the descendant’s name as the field name.

  • step_size (int or str) – If an integer, the maximum number of entries to include in each chunk/partition; if a string, the maximum memory_size to include in each chunk/partition. The string must be a number followed by a memory unit, such as “100 MB”. Mutually incompatible with steps_per_file: only set step_size or steps_per_file, not both. Cannot be used with open_files=False.

  • steps_per_file (int, default 1) – Subdivide files into the specified number of chunks/partitions. Mutually incompatible with step_size: only set step_size or steps_per_file, not both. If both step_size and steps_per_file are unset, steps_per_file’s default value of 1 (whole file per chunk/partition) is used, regardless of open_files.

  • library (str or uproot.interpretation.library.Library) – The library that is used to represent arrays. If library='np' it returns a dict of dask arrays and if library='ak' it returns a single dask-awkward array. library='pd' has not been implemented yet and will raise a NotImplementedError.

  • ak_add_doc (bool) – If True and library="ak", add the TBranch title to the Awkward __doc__ parameter of the array.

  • custom_classes (None or dict) – If a dict, override the classes from the uproot.ReadOnlyFile or uproot.classes.

  • allow_missing (bool) – If True, skip over any files that do not contain the specified TTree.

  • open_files (bool) – If True (default), the function will open the files to read file metadata, i.e. only the main data read is delayed till the compute call on the dask collections. If False, the opening of the files and reading the metadata is also delayed till the compute call. In this case, branch-names are inferred by opening only the first file.

  • form_mapping (Callable[awkward.forms.Form] -> awkward.forms.Form | None) – If not None and library="ak", apply this remapping function to the Awkward form of the input data. The form keys of the desired form should be available as data in the input form.

  • allow_read_errors_with_report (bool or tuple of exceptions) – If True, catch OSError exceptions and return an empty array for these nodes in the task graph. If a tuple, catch any of those exceptions and return empty arrays for those nodes. In either case, the return value of this function becomes a two-element tuple, where the first element is the dask-awkward collection of interest and the second element is a report dask-awkward collection.

  • known_base_form (awkward.forms.Form | None) – If not None, use this form instead of opening one file to determine the dataset’s form. Only available with open_files=False.

  • decompression_executor (None or Executor with a submit method) – The executor that is used to decompress TBaskets; if None, a uproot.TrivialExecutor is created. Executors attached to a file are shut down when the file is closed.

  • interpretation_executor (None or Executor with a submit method) – The executor that is used to interpret uncompressed TBasket data as arrays; if None, a uproot.TrivialExecutor is created. Executors attached to a file are shut down when the file is closed.

  • options – See below.
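
The name filters accept a plain function of str → bool, which can be tried on its own before it is handed to uproot.dask. The following is a minimal sketch: the predicate keep_muon_kinematics, the branch names, and the open_lazily wrapper are hypothetical and serve only as illustration.

```python
def keep_muon_kinematics(name: str) -> bool:
    """Select branches named Muon_pt, Muon_eta, or Muon_phi (hypothetical names)."""
    return name.startswith("Muon_") and name.rsplit("_", 1)[-1] in {"pt", "eta", "phi"}


# Hypothetical branch names, used only to show what the predicate selects.
branch_names = ["Muon_pt", "Muon_eta", "Muon_charge", "Jet_pt", "run"]
selected = [name for name in branch_names if keep_muon_kinematics(name)]
print(selected)


def open_lazily(files):
    """Pass the predicate as filter_name; files is any value accepted by uproot.dask."""
    import uproot  # imported here so the predicate can be tried without uproot installed

    return uproot.dask(files, filter_name=keep_muon_kinematics)
```

A glob string such as "Muon_*" or a regex string in "/pattern/i" syntax can be passed as filter_name in the same position; a function is mainly useful when the selection rule does not fit a single pattern.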

Returns dask equivalents of the backends supported by uproot. If library='np', the function returns a Python dict of dask arrays. If library='ak', the function returns a single dask-awkward array.

For example:

>>> uproot.dask(root_file)
dask.awkward<from-uproot, npartitions=1>
>>> uproot.dask(root_file, library='np')
{'Type': dask.array<Type-from-uproot, shape=(2304,), dtype=object, chunksize=(2304,), chunktype=numpy.ndarray>, ...}
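
Because step_size and steps_per_file are mutually incompatible, it can help to assemble the chunking arguments in one place before making a call like the ones above. This is a sketch under that assumption only: chunking_kwargs and lazy_array are hypothetical helpers, not part of uproot.

```python
def chunking_kwargs(step_size=None, steps_per_file=None):
    """Return keyword arguments for uproot.dask, refusing to set both chunking options."""
    if step_size is not None and steps_per_file is not None:
        raise ValueError("set step_size or steps_per_file, not both")
    kwargs = {}
    if step_size is not None:
        kwargs["step_size"] = step_size  # int (entries) or str such as "100 MB"
    if steps_per_file is not None:
        kwargs["steps_per_file"] = steps_per_file
    return kwargs  # an empty dict leaves both parameters unset


by_memory = chunking_kwargs(step_size="100 MB")
by_count = chunking_kwargs(steps_per_file=4)
print(by_memory, by_count)


def lazy_array(files, **chunking):
    """Forward validated chunking arguments to uproot.dask."""
    import uproot  # imported here so the helper above can be tried without uproot installed

    return uproot.dask(files, **chunking_kwargs(**chunking))
```

When neither option is given, the documented default of one chunk/partition per file applies.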

Allowed types for the files parameter:

  • str/bytes: relative or absolute filesystem path or URL, without any colons other than Windows drive letter or URL scheme. Examples: "rel/file.root", "C:\abs\file.root", "http://where/what.root"

  • str/bytes: same with an object-within-ROOT path, separated by a colon. Example: "rel/file.root:tdirectory/ttree"

  • pathlib.Path: always interpreted as a filesystem path or URL only (no object-within-ROOT path), regardless of whether there are any colons. Examples: Path("rel:/file.root"), Path("/abs/path:stuff.root")

  • glob syntax in str/bytes and pathlib.Path. Examples: Path("rel/*.root"), "/abs/*.root:tdirectory/ttree"

  • dict: keys are filesystem paths, values are objects-within-ROOT paths. Example: {"/data_v1/*.root": "ttree_v1", "/data_v2/*.root": "ttree_v2"}

  • dict: keys are filesystem paths, values are dicts containing objects-within-ROOT and steps (chunks/partitions), given either as a list of starts and stops or as a list of offsets. Example:

    {"/data_v1/tree1.root": {"object_path": "ttree_v1", "steps": [[0, 10000], [15000, 20000], ...]},
     "/data_v1/tree2.root": {"object_path": "ttree_v1", "steps": [0, 10000, 20000, ...]}}

    (This files pattern is incompatible with step_size and steps_per_file.)

  • already-open TTree objects.

  • iterables of the above.
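
The dict form with explicit steps can be built programmatically when the number of entries per file is already known. The sketch below assumes half-open [start, stop] entry ranges; the helper entry_ranges and the entry count of 25000 are hypothetical, while the path and tree name are taken from the example above.

```python
def entry_ranges(num_entries, entries_per_step):
    """Split the entries 0..num_entries into consecutive [start, stop] pairs."""
    return [
        [start, min(start + entries_per_step, num_entries)]
        for start in range(0, num_entries, entries_per_step)
    ]


# Hypothetical entry count; in practice it would come from the file's metadata.
files = {
    "/data_v1/tree1.root": {
        "object_path": "ttree_v1",
        "steps": entry_ranges(25000, 10000),
    }
}
print(files["/data_v1/tree1.root"]["steps"])
# prints [[0, 10000], [10000, 20000], [20000, 25000]]
```

A dict built this way is passed as the files argument on its own, without step_size or steps_per_file, since this pattern is incompatible with both.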

Options (type; default):

  • handler (uproot.source.chunk.Source class; None)

  • timeout (float for HTTP, int for XRootD; 30)

  • max_num_elements (None or int; None)

  • num_workers (int; 1)

  • use_threads (bool; False on the emscripten platform (i.e. in a web browser), else True)

  • num_fallback_workers (int; 10)

  • begin_chunk_size (memory_size; 403, the smallest a ROOT file can be)

  • minimal_ttree_metadata (bool; True)
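
These options are passed to uproot.dask as additional keyword arguments. As a small sketch, the hypothetical helper checked_options below catches misspelled option names before the call; the set of names is copied from the list above, and the chosen values (timeout=120, num_workers=4) are illustrative only.

```python
# Option names as listed above; the helper itself is hypothetical.
KNOWN_OPTIONS = {
    "handler",
    "timeout",
    "max_num_elements",
    "num_workers",
    "use_threads",
    "num_fallback_workers",
    "begin_chunk_size",
    "minimal_ttree_metadata",
}


def checked_options(**options):
    """Return the options unchanged, or raise if a name is not in the documented list."""
    unknown = sorted(set(options) - KNOWN_OPTIONS)
    if unknown:
        raise TypeError(f"unrecognized options: {unknown}")
    return options


remote_options = checked_options(timeout=120, num_workers=4)
print(remote_options)


def lazy_remote_array(files):
    """Forward the checked options to uproot.dask as keyword arguments."""
    import uproot  # imported here so the helper above can be tried without uproot installed

    return uproot.dask(files, **remote_options)
```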

Other file entry points:

  • uproot.open: opens one file to read any of its objects.

  • uproot.iterate: iterates through chunks of contiguous entries in TTrees.

  • uproot.concatenate: returns a single concatenated array from TTrees.

  • uproot.dask (this function): returns an unevaluated Dask array from TTrees.