Parallel I/O

An essential aspect of uproot’s file reader is that data Sources are completely distinct from Cursors, which track positions in a Source. This interface is similar to memory-mapped files, which do not track a position but respond to requests for data by address as needed. It is unlike traditional file handles, which bind the source of data (an integer file descriptor linked to a file through syscalls) and a position within it (queried by tell and changed by seek) into an indivisible unit. By default, uproot reads data through memory-mapped files; all other sources are made to look like memory-mapped files.
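To make the contrast concrete, here is a minimal standard-library sketch: an io.BytesIO handle carries its own position, while a memoryview over the same bytes stands in for a memory-mapped Source and is read purely by address. The read helper and the integer "cursors" are illustrative stand-ins, not uproot’s API.

```python
import io

# A traditional file handle: the data and the position form one unit.
handle = io.BytesIO(b"ROOT-file-bytes")
handle.seek(5)                 # change the position
first = handle.read(4)         # reading also moves the shared position
position = handle.tell()       # query the position: now 9

# A memory-mapped-style source: no position, only requests by address.
source = memoryview(b"ROOT-file-bytes")

def read(source, start, stop):
    """Return bytes [start, stop) without touching any hidden state."""
    return bytes(source[start:stop])

# Positions live in separate, independent integers (hypothetical "cursors").
cursor_a = 5
cursor_b = 0
a = read(source, cursor_a, cursor_a + 4)   # b"file"
b = read(source, cursor_b, cursor_b + 4)   # b"ROOT"
```

Two readers of the handle would fight over one position; two cursors over the memoryview never interfere.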

Throughout the ROOT I/O and TTree-handling modules, Sources and Cursors are passed as independent objects. A Cursor cannot read data without being given an explicit Source. When parts of a file are to be read in parallel, lightweight Cursors are duplicated, one per thread, while Sources are only duplicated (e.g. multiple file handles into the same file) if the source is not inherently thread-safe (as memory-mapped files are).
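A minimal sketch of this division of labor, assuming an immutable bytes object as the Source (shared by all threads, never duplicated) and plain integer offsets as the Cursors (one per task). All names here are hypothetical; big-endian integers are used because ROOT serializes data big-endian.

```python
from concurrent.futures import ThreadPoolExecutor
import struct

# Hypothetical source: immutable bytes are safe to share between threads,
# just as a memory-mapped file is, so it is never duplicated.
source = struct.pack(">8i", *range(8))

def read_block(start_index, count):
    """Each task owns a lightweight cursor (here, a plain int)."""
    index = start_index
    values = []
    for _ in range(count):
        (value,) = struct.unpack_from(">i", source, index)
        values.append(value)
        index += 4          # advance only this task's cursor
    return values

# Split the 8 integers into 4 blocks of 2, one cursor per block.
starts = [0, 8, 16, 24]
with ThreadPoolExecutor(max_workers=4) as pool:
    blocks = list(pool.map(read_block, starts, [2] * 4))
```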

Even when not reading in parallel, copying a Cursor when passing it to a subroutine is a lightweight way to keep one’s place without the spaghetti of seek and tell commands to backtrack, as is often necessary in the ROOT file structure.
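The idea can be sketched as follows. SketchCursor and peek_version are hypothetical names, and the record layout (a 4-byte header followed by a 2-byte version) is made up for illustration. The subroutine advances its own copy, so the caller never has to seek back.

```python
import copy
import struct

class SketchCursor:
    """Hypothetical stand-in for a Cursor: just an index."""
    def __init__(self, index):
        self.index = index

    def field(self, source, fmt):
        (value,) = fmt.unpack_from(source, self.index)
        self.index += fmt.size
        return value

def peek_version(source, cursor):
    # Work on a copy, so the caller's place is left untouched.
    local = copy.copy(cursor)
    local.field(source, struct.Struct(">i"))    # skip a 4-byte header
    return local.field(source, struct.Struct(">h"))

source = struct.pack(">ih", 1234, 7)
cursor = SketchCursor(0)
version = peek_version(source, cursor)   # cursor.index is still 0
```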

uproot.source.cursor.Cursor

class uproot.source.cursor.Cursor(index, origin=0, refs=None)

Maintain a position in a Source that updates as data are read.

Attributes, properties, and methods:

  • index (int) the position.
  • origin (int) “beginning of buffer” position, used in the refs key in uproot.rootio._readobjany.
  • refs (None or dict-like) manages cross-references in uproot.rootio._readobjany.
  • copied return a copy of this Cursor with modifications.
  • skipped return a copy of this Cursor with the index moved forward.
  • skip move the index of this Cursor forward.
  • fields interpret bytes in the Source with given data types and skip the index past them.
  • field interpret bytes in the Source with a given data type and skip the index past it.
  • bytes return a range of bytes from the Source and skip the index past it.
  • array return a range of bytes from the Source as a typed Numpy array and skip the index past it.
  • string read a string from the Source, interpreting the first 1 or 5 bytes as a size and skip the index past it.
  • cstring read a null-terminated string from the Source and skip the index past it.
  • skipstring interpret the first 1 or 5 bytes as a size and skip the index past the string (without creating a Python string).
  • hexdump view a section of the Source as formatted by the POSIX hexdump program and do not move the index.
Parameters:
  • index (int) – the initial index.
  • origin (int) – the origin, (default is 0).
  • refs (None or dict-like) – if None (default), use a new dict as the ref; otherwise, use the value provided.
Cursor.copied(index=None, origin=None, refs=None)

Return a copy of this Cursor with modifications.

Parameters:
  • index (None or int) – if not None (default), use this as the new index position.
  • origin (None or int) – if not None (default), use this as the new origin.
  • refs (None or dict-like) – if not None (default), use this as the new refs.
Returns:

the new cursor.

Return type:

Cursor

Notes

This is a shallow copy: the refs are shared with the parent and all other copies.
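A minimal sketch of what "shallow" means here, using a hypothetical SketchCursor that mirrors the documented parameters of copied: the index and origin are independent per copy, while the refs dict is the very same object in both.

```python
class SketchCursor:
    """Hypothetical cursor mirroring the documented copied() semantics."""
    def __init__(self, index, origin=0, refs=None):
        self.index = index
        self.origin = origin
        self.refs = {} if refs is None else refs

    def copied(self, index=None, origin=None, refs=None):
        # Only the given fields change; refs is passed by reference,
        # so the copy and the original share one dict (a shallow copy).
        return SketchCursor(
            self.index if index is None else index,
            self.origin if origin is None else origin,
            self.refs if refs is None else refs,
        )

parent = SketchCursor(100, origin=64)
child = parent.copied(index=120)
child.refs[116] = "some object"   # a cross-reference recorded via the copy
```

The parent sees the new entry in its refs, which is what lets cross-references resolve across nested reads.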

Cursor.skipped(numbytes, origin=None, refs=None)

Return a copy of this Cursor with the index moved forward.

Parameters:
  • numbytes (int) – number of bytes to be skipped in the copy, leaving the original unchanged.
  • origin (None or int) – if not None (default), use this as the new origin.
  • refs (None or dict-like) – if not None (default), use this as the new refs.
Returns:

the new cursor.

Return type:

Cursor

Notes

This is a shallow copy: the refs are shared with the parent and all other copies.

Cursor.skip(numbytes)

Move the index of this Cursor forward.

Parameters: numbytes (int) – number of bytes to skip.
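The difference between skipped and skip is whether a new cursor is returned or the existing one is mutated. A minimal sketch with a hypothetical SketchCursor:

```python
class SketchCursor:
    """Hypothetical cursor showing skip (in place) vs. skipped (copy)."""
    def __init__(self, index, origin=0, refs=None):
        self.index = index
        self.origin = origin
        self.refs = {} if refs is None else refs

    def skipped(self, numbytes, origin=None, refs=None):
        # New cursor moved forward; the original keeps its index.
        return SketchCursor(
            self.index + numbytes,
            self.origin if origin is None else origin,
            self.refs if refs is None else refs,
        )

    def skip(self, numbytes):
        # Mutates this cursor; nothing is returned.
        self.index += numbytes

cursor = SketchCursor(10)
ahead = cursor.skipped(6)    # ahead.index == 16, cursor.index still 10
cursor.skip(4)               # cursor.index is now 14
```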
Cursor.fields(source, format)

Interpret bytes in the Source with given data types and skip the index past them.

Parameters:
  • source (Source) – data to be read.
  • format (struct.Struct) – compiled parser from Python’s struct library.
Returns:

field values (types determined by format)

Return type:

tuple

Cursor.field(source, format)

Interpret bytes in the Source with a given data type and skip the index past it.

Parameters:
  • source (Source) – data to be read.
  • format (struct.Struct) – compiled parser from Python’s struct library; must return only one field.
Returns:

field value

Return type:

type determined by format
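Both fields and field amount to struct.Struct.unpack_from at the cursor’s index, followed by advancing the index by Struct.size. A minimal sketch with hypothetical function names that return the new index instead of mutating a cursor; the record layout is invented for illustration, and the ">" prefix reflects ROOT’s big-endian encoding.

```python
import struct

def read_fields(source, index, fmt):
    """Hypothetical equivalent of Cursor.fields: unpack, then advance."""
    values = fmt.unpack_from(source, index)
    return values, index + fmt.size

def read_field(source, index, fmt):
    """Like read_fields, but the format must describe exactly one value."""
    values, index = read_fields(source, index, fmt)
    if len(values) != 1:
        raise ValueError("format must describe exactly one field")
    return values[0], index

header = struct.Struct(">ihq")     # int32, int16, int64: 14 bytes, no padding
version = struct.Struct(">h")
source = header.pack(100, 5, 2**40) + version.pack(3)

(nbytes, kind, pointer), index = read_fields(source, 0, header)
ver, index = read_field(source, index, version)
```

Compiling each Struct once and reusing it is what makes passing struct.Struct objects (rather than format strings) worthwhile.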

Cursor.bytes(source, length)

Return a range of bytes from the Source and skip the index past it.

Parameters:
  • source (Source) – data to be read.
  • length (int) – number of bytes.
Returns:

raw view of data from source.

Return type:

numpy.ndarray of numpy.uint8

Cursor.array(source, length, dtype)

Return a range of bytes from the Source as a typed Numpy array and skip the index past it.

Parameters:
  • source (Source) – data to be read.
  • length (int) – number of items.
  • dtype (numpy.dtype) – type of the array.
Returns:

interpreted view of data from source.

Return type:

numpy.ndarray
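Note that length means bytes for Cursor.bytes but items for Cursor.array. uproot returns NumPy arrays here; to stay dependency-free, this hypothetical sketch uses memoryview and struct instead, so it illustrates the bookkeeping rather than the real return types.

```python
import struct

def read_bytes(source, index, length):
    """Raw view of `length` bytes; the memoryview avoids copying."""
    view = memoryview(source)[index:index + length]
    return view, index + length

def read_array(source, index, length, code):
    """`length` items of big-endian type `code` (e.g. "f" for float32).

    The item count times the item size gives the end position,
    the same bookkeeping a dtype provides.
    """
    fmt = struct.Struct(">%d%s" % (length, code))
    return list(fmt.unpack_from(source, index)), index + fmt.size

source = struct.pack(">3f", 1.5, 2.5, -4.0) + b"\x01\x02"
floats, index = read_array(source, 0, 3, "f")   # 3 items = 12 bytes
raw, index = read_bytes(source, index, 2)       # 2 bytes
```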

Cursor.string(source)

Read a string from the Source, interpreting the first 1 or 5 bytes as a size and skip the index past it.

Parameters: source (Source) – data to be read.
Returns: Python string (bytes in Python 3).
Return type: bytes
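A sketch of the 1-or-5-byte size rule, assuming ROOT’s convention that a first byte of 255 signals that a 4-byte big-endian size follows (hence 1 or 5 bytes of size). The function names are hypothetical; skip_string illustrates how skipstring can advance without building the payload.

```python
import struct

def read_size(source, index):
    """1-byte size, or 255 followed by a 4-byte big-endian size."""
    length = source[index]
    index += 1
    if length == 255:
        (length,) = struct.unpack_from(">I", source, index)
        index += 4
    return length, index

def read_string(source, index):
    length, index = read_size(source, index)
    return bytes(source[index:index + length]), index + length

def skip_string(source, index):
    # Same size logic, but no bytes object is created for the payload.
    length, index = read_size(source, index)
    return index + length

short_record = b"\x05TTree"
long_payload = b"x" * 300
long_record = b"\xff" + struct.pack(">I", 300) + long_payload
```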
Cursor.cstring(source)

Read a null-terminated string from the Source and skip the index past it.

The index is also skipped past the null that terminates the string.

Parameters: source (Source) – data to be read.
Returns: Python string (bytes in Python 3).
Return type: bytes
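A minimal sketch of the null-terminated case with a hypothetical read_cstring; the returned index lands one past the null, as described above.

```python
def read_cstring(source, index):
    """Bytes up to the first null; the returned index is past the null."""
    end = source.index(b"\x00", index)   # raises ValueError if no null
    return bytes(source[index:end]), end + 1

source = b"Events\x00tree\x00"
name, index = read_cstring(source, 0)        # b"Events", index 7
title, index = read_cstring(source, index)   # b"tree", index 12
```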
Cursor.skipstring(source)

Interpret the first 1 or 5 bytes as a size and skip the index past the string (without creating a Python string).

Parameters: source (Source) – data to be read.
Cursor.hexdump(source, size=160, offset=0, format='%02x')

View a section of the Source as formatted by the POSIX hexdump program and do not move the index.

This is much more useful than simply hexdumping the whole file, since partial interpretation is necessary to find the right point in the file to dump.

Parameters:
  • source (Source) – data to be read.
  • size (int) – number of bytes to view; default is 160 (10 lines).
  • offset (int) – where to start the view, relative to index; default is 0 (at index).
  • format (str) – Python’s printf-style format string for individual bytes; default is “%02x” (zero-prefixed, two-character hexadecimal).
Returns:

hexdump-formatted view to be printed

Return type:

str
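A hypothetical sketch of such a view, roughly in the style of `hexdump -C` (16 bytes per line, address on the left, printable ASCII on the right). It takes the cursor’s index as a plain argument and never changes it; with the default size of 160 bytes it produces 10 lines.

```python
def hexdump(source, index, size=160, offset=0, format="%02x"):
    """Format bytes [index+offset, index+offset+size), 16 per line."""
    start = index + offset
    data = bytes(source[start:start + size])
    lines = []
    for pos in range(0, len(data), 16):
        chunk = data[pos:pos + 16]
        hexpart = " ".join(format % b for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%08x  %-47s  |%s|" % (start + pos, hexpart, text))
    return "\n".join(lines)

dump = hexdump(b"root\x00\x00\xf4\x4a" + bytes(range(16)), 0, size=24)
print(dump)
```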

uproot.source.source.Source

class uproot.source.source.Source(data)

Interface for data sources.

Sources do not need to inherit from this class, but they do need to satisfy the interface described below.

  • parent(self) return the Source from which this was copied; may be None.
  • threadlocal(self) either return self (if thread-safe) or return a thread-safe copy, such as a new file handle into the same file.
  • dismiss(self) thread-local copies are no longer needed; they may be eliminated if redundant.
  • data(self, start, stop, dtype=None) return a view of data from the starting byte (inclusive) to the stopping byte (exclusive), with a given Numpy type (numpy.uint8 if None).
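A minimal sketch of a class satisfying this interface, assuming an in-memory bytes buffer. BytesSource is a hypothetical name; the dtype argument is accepted but ignored here, whereas a real Source would return a NumPy view of that type.

```python
class BytesSource:
    """Hypothetical in-memory Source satisfying the interface above."""
    def __init__(self, data, parent=None):
        self._data = memoryview(data)
        self._parent = parent

    def parent(self):
        return self._parent

    def threadlocal(self):
        # Immutable bytes are already thread-safe, so no copy is needed.
        return self

    def dismiss(self):
        pass                 # nothing to release in this sketch

    def data(self, start, stop, dtype=None):
        # start inclusive, stop exclusive; dtype ignored (assumption).
        return self._data[start:stop]

source = BytesSource(b"abcdef")
view = source.data(1, 4)
```

A source that is not thread-safe would instead return a new object from threadlocal, passing itself as the parent.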

uproot.FileSource

static FileSource.defaults(path)

Provide sensible defaults for a FileSource.

The default parameters are:

  • chunkbytes: 8*1024 (8 kB per chunk, the minimum that pages into memory if you try to read one byte on a typical Linux system).
  • limitbytes: 1024**2 (1 MB), a very modest amount of RAM.
Parameters: path (str) – local file path of the input file (it must not be moved during reading!).
Returns: a new file source.
Return type: FileSource
class uproot.source.file.FileSource(path, *args, **kwds)

Emulate a memory-mapped interface with traditional file handles, opening many if necessary.

FileSource objects avoid double-reading and many small reads by caching data in chunks. All thread-local copies of a FileSource share a ThreadSafeArrayCache to avoid double-reads across threads.

Parameters:
  • path (str) – local file path of the input file (it must not be moved during reading!).
  • chunkbytes (int) – number of bytes per chunk.
  • limitbytes (int) – maximum number of bytes to keep in the cache.

Notes

Methods implementing the Source interface are not documented here.
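The chunked caching described above can be sketched as follows. ChunkedFileSource is hypothetical, the least-recently-used eviction policy is an assumption, and unlike the real FileSource it is not thread-safe and does not share a ThreadSafeArrayCache. With the default parameters, the cache holds 1024**2 // (8*1024) = 128 chunks.

```python
import os
import tempfile
from collections import OrderedDict

class ChunkedFileSource:
    """Hypothetical chunked, size-limited cache over one file handle."""
    def __init__(self, path, chunkbytes=8 * 1024, limitbytes=1024**2):
        self._file = open(path, "rb")
        self.chunkbytes = chunkbytes
        self.maxchunks = max(1, limitbytes // chunkbytes)
        self._cache = OrderedDict()
        self.reads = 0                      # count of actual file reads

    def _chunk(self, i):
        if i in self._cache:
            self._cache.move_to_end(i)      # mark as recently used
            return self._cache[i]
        self._file.seek(i * self.chunkbytes)
        chunk = self._file.read(self.chunkbytes)
        self.reads += 1
        self._cache[i] = chunk
        if len(self._cache) > self.maxchunks:
            self._cache.popitem(last=False) # evict least recently used
        return chunk

    def data(self, start, stop):
        first = start // self.chunkbytes
        last = (stop - 1) // self.chunkbytes
        joined = b"".join(self._chunk(i) for i in range(first, last + 1))
        offset = first * self.chunkbytes
        return joined[start - offset:stop - offset]

    def close(self):
        self._file.close()

content = bytes(range(256)) * 100           # 25600 bytes of test data
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(content)
    path = f.name

source = ChunkedFileSource(path, chunkbytes=1024, limitbytes=4096)
a = source.data(10, 20)        # reads chunk 0
b = source.data(500, 1500)     # chunk 0 is cached; only chunk 1 is read
reads_so_far = source.reads
source.close()
os.remove(path)
```

Two requests that touch the same chunk cost one file read, which is how small reads are consolidated.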

uproot.MemmapSource

static MemmapSource.defaults(path)

Provide sensible defaults for a MemmapSource.

This is a dummy function, as MemmapSource is not parameterizable. It exists to satisfy code symmetry.

Parameters: path (str) – local file path of the input file.
Returns: a new memory-mapped source.
Return type: MemmapSource
class uproot.source.memmap.MemmapSource(path)

Thin wrapper around a memory-mapped file, which already behaves like a Source.

Parameters:path (str) – local file path of the input file.

Notes

Methods implementing the Source interface are not documented here.
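Because Python’s mmap objects already support slicing by address, a wrapper can be very thin. A hypothetical sketch (not uproot’s implementation) using a read-only mapping that is shared rather than duplicated across threads:

```python
import mmap
import os
import tempfile

class SketchMemmapSource:
    """Hypothetical thin wrapper: an mmap already answers requests by address."""
    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def threadlocal(self):
        return self          # read-only mapping: shared, not duplicated

    def data(self, start, stop):
        return self._map[start:stop]

    def close(self):
        self._map.close()
        self._file.close()

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"root" + bytes(96))
    path = f.name

source = SketchMemmapSource(path)
magic = source.data(0, 4)
source.close()
os.remove(path)
```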

uproot.XRootDSource

static XRootDSource.defaults(path)

Provide sensible defaults for an XRootDSource.

The default parameters are:

  • chunkbytes: 8*1024 (8 kB per chunk).
  • limitbytes: 1024**2 (1 MB), a very modest amount of RAM.
Parameters: path (str) – remote file URL.
Returns: a new XRootD source.
Return type: XRootDSource
class uproot.source.xrootd.XRootDSource(path, *args, **kwds)

Emulate a memory-mapped interface with XRootD.

XRootD is already thread-safe, but provides no caching. XRootDSource objects avoid double-reading and many small reads by caching data in chunks. They are not duplicated when splitting into threads.

Parameters:
  • path (str) – remote file URL.
  • chunkbytes (int) – number of bytes per chunk.
  • limitbytes (int) – maximum number of bytes to keep in the cache.

Notes

Methods implementing the Source interface are not documented here.

uproot.source.compressed.CompressedSource

class uproot.source.compressed.Compression(fCompress)

Describe the compression of a compressed block.

Attributes, properties, and methods:

  • algo (int) algorithm code.
  • level (int) 0 is no compression, 1 is least, 9 is most.
  • algoname (str) algorithm expressed as a string: "zlib", "lzma", "old", or "lz4".
  • copy(algo=None, level=None) copy this Compression object, possibly changing a field.
  • decompress(source, cursor, compressedbytes, uncompressedbytes) decompress data from source at cursor, knowing the compressed and uncompressed size.
Parameters: fCompress (int) – ROOT fCompress field.
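A hypothetical sketch of decoding fCompress, assuming ROOT’s encoding fCompress = 100 * algorithm + level, with algorithm codes 1 to 4 mapping to the names listed above. Treating algorithm 0 (ROOT’s "use the global default") as zlib is an additional assumption of this sketch.

```python
class SketchCompression:
    """Hypothetical decoding of fCompress, assuming 100 * algo + level."""
    ALGONAMES = {1: "zlib", 2: "lzma", 3: "old", 4: "lz4"}

    def __init__(self, fCompress):
        # Assumption: algorithm 0 (global default) is treated as zlib.
        self.algo = max(fCompress // 100, 1)
        self.level = fCompress % 100

    @property
    def algoname(self):
        return self.ALGONAMES.get(self.algo, "unknown")

    def copy(self, algo=None, level=None):
        out = SketchCompression(0)
        out.algo = self.algo if algo is None else algo
        out.level = self.level if level is None else level
        return out

lz4 = SketchCompression(404)       # algorithm 4, level 4
default = SketchCompression(1)     # algorithm 0 -> zlib, level 1
stronger = lz4.copy(level=9)
```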
class uproot.source.compressed.CompressedSource(compression, source, cursor, compressedbytes, uncompressedbytes)

A Source for compressed data.

Decompresses on demand, without caching the result, so cache options in higher-level array functions are very important.

Ordinary users would never create a CompressedSource. They are produced when a TKey encounters a compressed value.

Parameters:
  • compression (Compression) – inherited description of the compression. Note that this is overridden by the first two bytes of the compressed block, which can disagree with the higher-level description; when they do, the block’s own bytes take precedence.
  • source (Source) – the source in which compressed data may be found.
  • cursor (Cursor) – location in the source.
  • compressedbytes (int) – number of bytes after compression.
  • uncompressedbytes (int) – number of bytes before compression.
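A hypothetical sketch of how the block’s own header can override the inherited description. It assumes ROOT’s 9-byte block header (a 2-byte algorithm tag such as "ZL" or "XZ", one method byte, then 3-byte little-endian compressed and uncompressed sizes) and handles only the two algorithms available in Python’s standard library; make_block exists only to build test input.

```python
import lzma
import zlib

def decompress_block(block):
    """Hypothetical: decompress one block; its tag picks the algorithm."""
    tag = bytes(block[:2])
    compressedbytes = int.from_bytes(block[3:6], "little")
    uncompressedbytes = int.from_bytes(block[6:9], "little")
    payload = bytes(block[9:9 + compressedbytes])
    if tag == b"ZL":
        out = zlib.decompress(payload)
    elif tag == b"XZ":
        out = lzma.decompress(payload)
    else:
        raise NotImplementedError("algorithm tag %r not handled here" % tag)
    if len(out) != uncompressedbytes:
        raise ValueError("size mismatch")
    return out

def make_block(tag, payload, original):
    """Build a test block with the assumed 9-byte header layout."""
    return (tag + b"\x08"
            + len(payload).to_bytes(3, "little")
            + len(original).to_bytes(3, "little")
            + payload)

original = b"event data " * 50
zl = make_block(b"ZL", zlib.compress(original), original)
xz = make_block(b"XZ", lzma.compress(original), original)
```

Nothing is cached: calling decompress_block twice decompresses twice, which is why caching at a higher level matters.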