uproot.dask
Defined in uproot._dask on line 31.
- uproot._dask.dask(files, *, filter_name=<function no_filter>, filter_typename=<function no_filter>, filter_branch=<function no_filter>, recursive=True, full_paths=False, step_size=uproot._util.unset, steps_per_file=uproot._util.unset, library='ak', ak_add_doc=False, custom_classes=None, allow_missing=False, open_files=True, form_mapping=None, allow_read_errors_with_report=False, known_base_form=None, decompression_executor=None, interpretation_executor=None, **options)
- Parameters:
files – See below.
filter_name (None, glob string, regex string in "/pattern/i" syntax, function of str → bool, or iterable of the above) – A filter to select TBranches by name.
filter_typename (None, glob string, regex string in "/pattern/i" syntax, function of str → bool, or iterable of the above) – A filter to select TBranches by type.
filter_branch (None or function of uproot.TBranch → bool, uproot.interpretation.Interpretation, or None) – A filter to select TBranches using the full uproot.TBranch object. If the function returns False or None, the TBranch is excluded; if the function returns True, it is included with its standard interpretation; if it returns an uproot.interpretation.Interpretation, this interpretation overrules the standard one.
recursive (bool) – If True, include all subbranches of branches as separate fields; otherwise, only search one level deep.
full_paths (bool) – If True, include the full path to each subbranch with slashes ("/"); otherwise, use the descendant's name as the field name.
step_size (int or str) – If an integer, the maximum number of entries to include in each chunk/partition; if a string, the maximum memory size to include in each chunk/partition. The string must be a number followed by a memory unit, such as "100 MB". Mutually incompatible with steps_per_file: set only one of them, not both. Cannot be used with open_files=False.
steps_per_file (int, default 1) – Subdivide each file into the specified number of chunks/partitions. Mutually incompatible with step_size: set only one of them, not both. If both step_size and steps_per_file are unset, steps_per_file's default value of 1 (whole file per chunk/partition) is used, regardless of open_files.
library (str or uproot.interpretation.library.Library) – The library that is used to represent arrays. If library='np', this function returns a dict of dask arrays; if library='ak', it returns a single dask-awkward array. library='pd' has not been implemented yet and raises a NotImplementedError.
ak_add_doc (bool) – If True and library="ak", add the TBranch title to the Awkward array's __doc__ parameter.
custom_classes (None or dict) – If a dict, override the classes from the uproot.ReadOnlyFile or uproot.classes.
allow_missing (bool) – If True, skip over any files that do not contain the specified TTree.
open_files (bool) – If True (default), open the files to read file metadata, so that only the main data read is delayed until compute is called on the dask collections. If False, opening the files and reading the metadata is also delayed until the compute call; in this case, branch names are inferred by opening only the first file.
form_mapping (None or Callable[[awkward.forms.Form], awkward.forms.Form]) – If not None and library="ak", apply this remapping function to the Awkward form of the input data. The form keys of the desired form should be available data in the input form.
allow_read_errors_with_report (bool or tuple of exceptions) – If True, catch OSError exceptions and return an empty array for these nodes in the task graph. If a tuple, catch any of those exceptions and return empty arrays for those nodes. In either case, the return value of this function becomes a two-element tuple, in which the first element is the dask-awkward collection of interest and the second is a report dask-awkward collection.
known_base_form (None or awkward.forms.Form) – If not None, use this form instead of opening one file to determine the dataset's form. Only available with open_files=False.
decompression_executor (None or Executor with a submit method) – The executor that is used to decompress TBaskets; if None, a uproot.TrivialExecutor is created. Executors attached to a file are shut down when the file is closed.
interpretation_executor (None or Executor with a submit method) – The executor that is used to interpret uncompressed TBasket data as arrays; if None, a uproot.TrivialExecutor is created. Executors attached to a file are shut down when the file is closed.
options – See below.
Returns dask equivalents of the backends supported by uproot. If library='np', the function returns a Python dict of dask arrays. If library='ak', the function returns a single dask-awkward array.

For example:

>>> uproot.dask(root_file)
dask.awkward<from-uproot, npartitions=1>
>>> uproot.dask(root_file, library='np')
{'Type': dask.array<Type-from-uproot, shape=(2304,), dtype=object, chunksize=(2304,), chunktype=numpy.ndarray>,
 ...}
Allowed types for the files parameter:
str/bytes: relative or absolute filesystem path or URL, without any colons other than Windows drive letter or URL schema. Examples: "rel/file.root", "C:\abs\file.root", "http://where/what.root"
str/bytes: same with an object-within-ROOT path, separated by a colon. Example: "rel/file.root:tdirectory/ttree"
pathlib.Path: always interpreted as a filesystem path or URL only (no object-within-ROOT path), regardless of whether there are any colons. Examples: Path("rel:/file.root"), Path("/abs/path:stuff.root")
glob syntax in str/bytes and pathlib.Path. Examples: Path("rel/*.root"), "/abs/*.root:tdirectory/ttree"
dict: keys are filesystem paths, values are objects-within-ROOT paths. Example: {"/data_v1/*.root": "ttree_v1", "/data_v2/*.root": "ttree_v2"}
dict: keys are filesystem paths, values are dicts containing an object-within-ROOT path and steps (chunks/partitions), either as a list of starts and stops or as a list of offsets. Example:
{"/data_v1/tree1.root": {"object_path": "ttree_v1", "steps": [[0, 10000], [15000, 20000], ...]},
 "/data_v1/tree2.root": {"object_path": "ttree_v1", "steps": [0, 10000, 20000, ...]}}
(This files pattern is incompatible with step_size and steps_per_file.)
already-open TTree objects.
iterables of the above.
Options (type; default):
handler (uproot.source.chunk.Source class; None)
timeout (float for HTTP, int for XRootD; 30)
max_num_elements (None or int; None)
num_workers (int; 1)
use_threads (bool; False on the emscripten platform (i.e. in a web browser), else True)
num_fallback_workers (int; 10)
begin_chunk_size (memory_size; 403, the smallest a ROOT file can be)
minimal_ttree_metadata (bool; True)
Other file entry points:
uproot.open: opens one file to read any of its objects.
uproot.iterate: iterates through chunks of contiguous entries in TTrees.
uproot.concatenate: returns a single concatenated array from TTrees.
uproot.dask (this function): returns an unevaluated Dask array from TTrees.