The LHC's Run 3 will push the envelope on data-intensive workflows and, since at the lowest level this data is managed using the ROOT software framework, preparations for managing it are already under way. At the beginning of LHC Run 1, all ROOT data was compressed with the ZLIB algorithm; since then, ROOT has added support for additional algorithms such as LZMA and LZ4, each with unique strengths. This work must continue as industry introduces new techniques: by adopting better compression algorithms, ROOT can save disk space and reduce the I/O and bandwidth consumed by the online and offline workflows of the experiments. In addition to alternate algorithms, we have been exploring alternate techniques to improve parallelism and to apply pre-conditioners to the serialized data. We have performed a survey of the performance of these new compression techniques, covering a variety of compression use cases for ROOT files provided by different LHC experiments. We also provide insight into solutions applied to resolve bottlenecks in compression algorithms, resulting in improved ROOT performance.
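As a minimal illustration of how these algorithm choices surface in user code, the sketch below writes the same simple tree under two compression configurations. It relies on ROOT's Compression.h helpers (ROOT::CompressionSettings with the ROOT::kZLIB and ROOT::kLZ4 enums); the file and branch names are illustrative.

#include "TFile.h"
#include "TTree.h"
#include "Compression.h"

// Write an identical tree under the given compression setting.
static void write_one(const char *fname, int settings) {
   TFile f(fname, "RECREATE", "", settings);
   auto *t = new TTree("Events", "events");   // owned (and deleted) by the file
   float pt = 0;
   t->Branch("pt", &pt, "pt/F");
   for (int i = 0; i < 100000; ++i) { pt = 0.1f * i; t->Fill(); }
   f.Write();
}

void write_compressed() {
   // Archival-oriented: ZLIB at a high level favours small files.
   write_one("zlib9.root", ROOT::CompressionSettings(ROOT::kZLIB, 9));
   // Analysis-oriented: LZ4 trades some file size for faster decompression.
   write_one("lz4.root", ROOT::CompressionSettings(ROOT::kLZ4, 4));
}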
ROOT provides a flexible format used throughout the HEP community. The range of use cases - from an archival data format to end-stage analysis - has required a number of tradeoffs to be exposed to the user. For example, a high "compression level" in the traditional DEFLATE algorithm will result in a smaller file (saving disk space) at the cost of slower decompression (costing CPU time when read). At the scale of an LHC experiment, poor design choices can result in terabytes of wasted space or wasted CPU time. We explore and attempt to quantify some of these tradeoffs. Specifically, we explore: the use of alternate compression algorithms to optimize for read performance; an alternate method of compressing individual events to allow efficient random access; and a new approach to whole-file compression. Quantitative results are given, as well as guidance on how to make compression decisions for different use cases.
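One way to quantify the size-versus-CPU tradeoff for a given sample is to time a full sequential read of each variant and compare it against the size on disk. A minimal sketch, assuming files shaped like the ones written above (names and branch are illustrative):

#include <cstdio>
#include "TFile.h"
#include "TTree.h"
#include "TStopwatch.h"

// Time a full sequential read of the "Events" tree and report the file size.
void time_read(const char *fname) {
   TFile f(fname);
   auto *t = f.Get<TTree>("Events");
   float pt = 0;
   t->SetBranchAddress("pt", &pt);

   TStopwatch sw;   // starts timing on construction
   for (Long64_t i = 0, n = t->GetEntries(); i < n; ++i)
      t->GetEntry(i);   // forces decompression of each basket
   sw.Stop();

   std::printf("%s: %lld bytes on disk, %.3f s to read\n",
               fname, f.GetSize(), sw.RealTime());
}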
Evolution of ROOT package management
Shadura, O; Bockelman, B; Vassilev, V
Journal of Physics: Conference Series, vol. 1525, no. 1, April 2020. Journal article; peer reviewed; open access.
ROOT is a large code base with a complex set of build-time dependencies; there is a significant difference in compilation time between the "core" of ROOT and the full-fledged deployment. We present results on a "delayed build" for internal ROOT packages and external packages. This gives the ability to offer a "lightweight" core of ROOT, later extended by building additional modules that extend its functionality. As a part of this work, we have improved the separation of ROOT code into distinct modules and packages with minimal dependencies. This approach gives users better flexibility and the possibility to combine various build features without rebuilding from scratch. Dependency hell is a common problem in software, and particularly in the HEP software ecosystem. We discuss an improved artifact management ("lazy-install") system as a solution to this "dependency hell" problem. A HEP software stack usually consists of multiple sub-projects with dependencies; the development model is often distributed, independent, and non-coherent among the sub-projects. We believe that software should be designed to take advantage of components that are already available, or have already been designed and implemented for use elsewhere, rather than "reinventing the wheel". The main idea is to build the ROOT project and all of its dependencies recursively and incrementally, which is fundamentally different from adding one external project and rebuilding from scratch. In addition, this allows us to keep a list of dependencies in order to resolve possible incompatibilities of transitive dependencies caused by version conflicts. In our contribution, we present our approach to the artifact management system of ROOT, together with a set of examples and use cases.
Speeding HEP Analysis with ROOT Bulk I/O
Bockelman, B; Zhang, Z; Shadura, O
Journal of Physics: Conference Series, vol. 1525, no. 1, April 2020. Journal article; peer reviewed; open access.
Distinct HEP workflows have distinct I/O needs; while ROOT I/O excels at serializing the complex C++ objects common to reconstruction, analysis workflows typically have simpler objects and can sustain higher event rates. To serve these workflows, we have developed a "bulk I/O" interface, allowing multiple events' data to be returned per library call. This reduces ROOT-related overheads and increases event rates - orders-of-magnitude improvements are shown in microbenchmarks. Unfortunately, this bulk interface is difficult to use: it requires users to identify when it is applicable, and they must still "think" in terms of events rather than arrays of data. We have therefore integrated the bulk I/O interface into the new RDataFrame analysis framework inside ROOT. As RDataFrame's interface can provide improved type information, the framework itself can determine what data is readable via bulk I/O and automatically switch between interfaces. We demonstrate how this can improve event rates when reading analysis data formats such as CMS's NanoAOD.
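Because RDataFrame hides the choice of read path behind a declarative interface, an analysis like the sketch below can benefit from the bulk reader transparently whenever the framework determines the branch types allow it. The file name and branches follow CMS NanoAOD conventions and are illustrative:

#include "ROOT/RDataFrame.hxx"

void rdf_example() {
   // Declarative analysis over a NanoAOD-style file; the framework decides
   // internally which branches are simple enough for the bulk read path.
   ROOT::RDataFrame df("Events", "nanoaod.root");   // illustrative input
   auto h = df.Filter("nMuon > 0")
              .Define("leading_pt", "Muon_pt[0]")
              .Histo1D("leading_pt");
   h->Print();   // the event loop runs here, on first access to the result
}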
The ROOT I/O (RIO) subsystem is foundational to most HEP experiments - it provides a file format, a set of APIs/semantics, and a reference implementation in C++. It is often found at the base of an experiment's framework and is used to serialize the experiment's data; in the case of an LHC experiment, this may be hundreds of petabytes of files! Individual physicists further use RIO to perform their end-stage analysis, reading from intermediate files they generate from experiment data. RIO is thus incredibly flexible: it must serve as a file format for archival data (optimized for space) and for working data (optimized for read speed). To date, most of the technical work has focused on improving the former use case. We present work designed to help improve RIO for analysis. We analyze the real-world impact of the LZ4 compression algorithm, which decreases decompression times (at a corresponding cost in disk space). We introduce new APIs that read RIO data in bulk, removing the per-event overhead of a C++ function call. We compare the performance with the existing RIO APIs for simply-structured data and show how this can be complementary with efforts to improve the parallelism of the RIO stack.
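A sketch of what such a bulk read can look like from user code is given below, based on ROOT's experimental TBulkBranchRead interface reached through TBranch::GetBulkRead(). Since the interface is experimental, exact names and signatures may differ between ROOT versions; the file, tree, and branch names are illustrative, and we assume the returned values arrive in host byte order for fundamental types.

#include "TFile.h"
#include "TTree.h"
#include "TBranch.h"
#include "TBufferFile.h"
#include "ROOT/TBulkBranchRead.hxx"

void bulk_read_sketch() {
   TFile f("events.root");                    // illustrative input file
   auto *tree = f.Get<TTree>("Events");
   auto *branch = tree->GetBranch("pt");      // a simple float branch
   TBufferFile buf(TBuffer::kWrite, 32 * 1024);

   for (Long64_t entry = 0, n = tree->GetEntries(); entry < n; ) {
      // One call hands back a whole basket's worth of entries, avoiding the
      // per-event overhead of a TTree::GetEntry() function call.
      Int_t count = branch->GetBulkRead().GetBulkEntries(entry, buf);
      if (count <= 0) break;
      auto *pt = reinterpret_cast<float *>(buf.GetCurrent());
      for (Int_t i = 0; i < count; ++i) {
         // ... process pt[i] ...
      }
      entry += count;
   }
}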
Over the past few years, Grid Computing technologies have reached a high level of maturity. One key aspect of this success has been the development and adoption of newer Compute Elements to interface external Grid users with local batch systems. These new Compute Elements allow for better handling of job requirements and more precise management of diverse local resources. Despite this level of maturity, however, the Grid Computing world lacks diversity in local execution platforms. As Grid Computing technologies have historically been driven by the needs of the High Energy Physics community, most resource providers run the platform (operating system version and architecture) that best suits the needs of their particular users. In parallel, the development of virtualization and cloud technologies has accelerated recently, making available a variety of solutions, both commercial and academic, proprietary and open source. Virtualization facilitates performing computational tasks on platforms not available at most computing sites. This work attempts to join these technologies, allowing users to interact with computing sites through one of the standard Compute Elements, HTCondor-CE, while running their jobs within VMs on a local cloud platform, OpenStack, when needed. The system re-routes, in a transparent way, end-user jobs into dynamically-launched VM worker nodes when they have requirements that cannot be satisfied by the static local batch system nodes. Moreover, once the automated mechanisms are in place, it becomes straightforward to allow an end user to invoke a custom Virtual Machine at the site. This allows cloud resources to be used without requiring the user to establish a separate account. Both scenarios are described in this work.
The HTCondor-CE is the next-generation gateway software for the Open Science Grid (OSG). It is responsible for providing a network service which authorizes remote users and provides a resource provisioning service (other well-known gateways include Globus GRAM, CREAM, ARC CE, and OpenStack's Nova). Based on the venerable HTCondor software, this new CE is simply a highly-specialized configuration of HTCondor. It was developed and adopted to provide the OSG with a more flexible, scalable, and easier-to-manage gateway. Further, the focus of the HTCondor-CE is not job submission (as in GRAM or CREAM) but resource provisioning. This software does not exist in a vacuum: to deploy this gateway across the OSG, we had to integrate it with the CE configuration, deploy a corresponding information service, coordinate with sites, and overhaul our documentation.
When processing large amounts of data, the rate at which reading and writing can take place is a critical factor. High energy physics data processing relying on ROOT is no exception. The recent parallelisation of the LHC experiments' software frameworks and the analysis of the ever-increasing amount of collision data collected by the experiments have further emphasised this issue, underlining the need to increase the implicit parallelism expressed within the ROOT I/O. In this contribution we highlight the improvements of the ROOT I/O subsystem targeting satisfactory scaling behaviour in a multithreaded context. The effect of parallelism on the individual steps which ROOT chains to read and write data - namely (de)compression, (de)serialisation, and access to the storage backend - is discussed. Performance measurements are presented through real-life examples coming from CMS production workflows on traditional server platforms and on highly parallel architectures such as the Intel Xeon Phi.
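From the user's side, this implicit parallelism is switched on with a single call before any I/O takes place; a minimal sketch (the thread count and input names are illustrative):

#include <iostream>
#include "TROOT.h"
#include "ROOT/RDataFrame.hxx"

void parallel_read() {
   // Enable ROOT's implicit multithreading: (de)compression and
   // (de)serialisation of independent baskets can then proceed on a
   // pool of worker threads.
   ROOT::EnableImplicitMT(8);   // thread count is illustrative

   ROOT::RDataFrame df("Events", "events.root");   // illustrative input
   auto mean = df.Mean("pt");
   std::cout << *mean << std::endl;   // event loop runs in parallel here
}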
The Worldwide LHC Computing Grid (WLCG) is the largest grid computing infrastructure in the world, pooling the resources of 170 computing centers (sites). One advantage of grid computing is that multiple copies of data can be distributed across different sites, allowing user access that is independent of geographic location or software. Each site is able to communicate using software stacks collectively referred to as "middleware". One key middleware piece is the storage element (SE), which provides remote POSIX-like access to a site's storage. The middleware stack managed by the Open Science Grid (OSG) used a storage resource manager (SRM) protocol implementation that, among other things, allowed sites to load-balance servers providing the Grid File Transfer Protocol (GridFTP) interface. OSG is eliminating the use of an SRM entirely and is transitioning to a solution based solely on GridFTP, load-balanced at the network level with Linux Virtual Server (LVS). LVS is a core component of the Linux kernel, so this change both increases maintainability and reduces the complexity of the site. In this document, we outline our methodologies and results from large-scale testing of an LVS+GridFTP cluster for data reads. Additionally, we discuss potential optimizations to the cluster to maximize total throughput.
The main goal of the project is to demonstrate the ability to use HTTP data federations in a manner analogous to the existing AAA infrastructure of the CMS experiment. An initial testbed at Caltech has been built, and changes in the CMS software (CMSSW) are being implemented in order to improve HTTP support. The testbed consists of a set of machines at the Caltech Tier2 that improve the support infrastructure for data federations at CMS. As a first step, we are building systems that produce and ingest network data transfers of up to 80 Gbps. In collaboration with AAA, HTTP support is enabled at the US redirector and the Caltech testbed. A plugin for CMSSW is being developed for HTTP access based on the DaviX software; it will replace the present fork/exec and curl-based mechanisms for HTTP access. In addition, extensions to the XRootD HTTP implementation are being developed to add functionality such as client-based monitoring identifiers. In the future, patches will be developed to better integrate HTTP-over-XRootD with the Open Science Grid (OSG) distribution. First results of the transfer tests using HTTP are presented in this paper, together with details about the initial setup.
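For context, once Davix support is compiled into a ROOT/CMSSW build, a remote HTTP read looks the same as a local one from the user's side, since TFile::Open dispatches http(s) URLs to the Davix-backed remote-file implementation. A minimal sketch with a hypothetical URL:

#include <memory>
#include "TFile.h"
#include "TTree.h"

void http_read() {
   // TFile::Open hands https:// URLs to the Davix plugin when available.
   std::unique_ptr<TFile> f(
      TFile::Open("https://redirector.example.org/store/data/file.root"));   // hypothetical URL
   if (!f || f->IsZombie()) return;
   auto *t = f->Get<TTree>("Events");
   if (t) t->Print();   // e.g. inspect the remote tree's structure
}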