ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web, or a number of different shared file systems. In order to analyze this data, the user can choose from a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting, while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. Central to these analysis tools are the histogram classes, which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like PostScript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks (e.g. data mining in HEP) by using PROOF, which will take care of optimally distributing the work over the available resources in a transparent way.
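The interactive workflow described above can be illustrated with a minimal macro sketch. The classes used (TFile, TH1F, TRandom3) and the predefined "gaus" fit model are standard ROOT; the file, histogram, and function names are chosen purely for illustration and do not come from the text above.

// example.C -- minimal ROOT macro: fill a histogram, fit it, store the result.
// Run interactively with the interpreter:  root -l example.C
#include "TFile.h"
#include "TH1F.h"
#include "TRandom3.h"

void example()
{
   // Output file: any ROOT object can be written to it in ROOT's
   // machine-independent compressed binary format.
   TFile *f = TFile::Open("example.root", "RECREATE");

   // One-dimensional histogram: 100 bins between -4 and 4.
   TH1F *h = new TH1F("h", "Gaussian sample;x;entries", 100, -4., 4.);

   // Fill with pseudo-random data.
   TRandom3 rng(0);
   for (int i = 0; i < 10000; ++i)
      h->Fill(rng.Gaus(0., 1.));

   // Perform a regression (fit) with the predefined Gaussian model
   // and persist the histogram, including the fit result.
   h->Fit("gaus");
   h->Write();
   f->Close();
}

The same macro can later be compiled on the fly (root -l example.C+) to run at full compiled speed over larger samples.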
Program title: ROOT
Catalogue identifier: AEFA_v1_0
Program summary URL:
http://cpc.cs.qub.ac.uk/summaries/AEFA_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: LGPL
No. of lines in distributed program, including test data, etc.: 3 044 581
No. of bytes in distributed program, including test data, etc.: 36 325 133
Distribution format: tar.gz
Programming language: C++
Computer: Intel i386, Intel x86-64, Motorola PPC, Sun Sparc, HP PA-RISC
Operating system: GNU/Linux, Windows XP/Vista, Mac OS X, FreeBSD, OpenBSD, Solaris, HP-UX, AIX
Has the code been vectorized or parallelized?: Yes
RAM: > 55 Mbytes
Classification: 4, 9, 11.9, 14
Nature of problem: Storage, analysis and visualization of scientific data
Solution method: Object store, wide range of analysis algorithms and visualization methods
Additional comments: For an up-to-date author list see:
http://root.cern.ch/drupal/content/root-development-team and
http://root.cern.ch/drupal/content/former-root-developers
Running time: Depending on the data size and complexity of analysis algorithms
References:
[1] http://root.cern.ch.
A new stable version (“production version”) v5.28.00 of ROOT [1] has been published [2]. It features several major improvements in many areas, most notably in data storage performance as well as in statistics and graphics features. Some of these improvements have already been anticipated in the original publication, Antcheva et al. (2009) [3]. This version will be maintained for at least 6 months; new minor revisions (“patch releases”) will be published [4] to solve problems reported with this version.
Program title: ROOT
Catalogue identifier: AEFA_v2_0
Program summary URL:
http://cpc.cs.qub.ac.uk/summaries/AEFA_v2_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU Lesser General Public License v2.1
No. of lines in distributed program, including test data, etc.: 2 934 693
No. of bytes in distributed program, including test data, etc.: 1009
Distribution format: tar.gz
Programming language: C++
Computer: Intel i386, Intel x86-64, Motorola PPC, Sun Sparc, HP PA-RISC
Operating system: GNU/Linux, Windows XP/Vista/7, Mac OS X, FreeBSD, OpenBSD, Solaris, HP-UX, AIX
Has the code been vectorized or parallelized?: Yes
RAM: > 55 Mbytes
Classification: 4, 9, 11.9, 14
Catalogue identifier of previous version: AEFA_v1_0
Journal reference of previous version: Comput. Phys. Commun. 180 (2009) 2499
Does the new version supersede the previous version?: Yes
Nature of problem: Storage, analysis and visualization of scientific data
Solution method: Object store, wide range of analysis algorithms and visualization methods
Reasons for new version: Added features and corrections of deficiencies
Summary of revisions: The release notes at http://root.cern.ch/root/v528/Version528.news.html give a module-oriented overview of the changes in v5.28.00. Highlights include:
• File format: Reading of TTrees has been improved dramatically with respect to CPU time (30%) and notably with respect to disk space.
• Histograms: A new TEfficiency class has been provided to handle the calculation of efficiencies and their uncertainties, TH2Poly for polygon-shaped bins (e.g. maps), TKDE for kernel density estimation, and TSVDUnfold for singular value decomposition (a minimal TEfficiency usage sketch follows this summary).
• Graphics: Kerning is now supported in TLatex, PostScript and PDF; a table of contents can be added to PDF files. A new font provides italic symbols. A TPad containing GL content can be stored in a binary (i.e. non-vector) image file; support for full-scene anti-aliasing has been added. Usability enhancements to EVE.
• Math: New interfaces for generating random numbers according to a given distribution, goodness-of-fit tests of unbinned data, binning multidimensional data, and several advanced statistical functions were added.
• RooFit: Introduction of HistFactory; major additions to RooStats.
• TMVA: Updated to version 4.1.0, adding, e.g., support for simultaneous classification of multiple output classes for several multivariate methods.
• PROOF: Many new features, adding to PROOF's usability, plus improvements and fixes.
• PyROOT: Support of Python 3 has been added.
• Tutorials: Several new tutorials were provided for the above new features (notably RooStats).
A detailed list of all the changes is available at http://root.cern.ch/root/htmldoc/examples/V5.
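As referenced in the Histograms item above, the following is a minimal usage sketch of the new TEfficiency class; the variable, binning, and toy selection are illustrative assumptions, not taken from the release notes.

// efficiency_sketch.C -- count passed vs. total events per bin with TEfficiency
// and retrieve the per-bin efficiency with its asymmetric uncertainties.
#include <cstdio>
#include "TEfficiency.h"
#include "TRandom3.h"

void efficiency_sketch()
{
   // 20 bins in a hypothetical variable (e.g. transverse momentum) from 0 to 100.
   TEfficiency eff("eff", "selection efficiency;p_{T};#varepsilon", 20, 0., 100.);

   TRandom3 rng(0);
   for (int i = 0; i < 10000; ++i) {
      double x = rng.Uniform(0., 100.);
      bool passed = rng.Uniform(0., 1.) < 0.8;   // toy selection, 80% efficient
      eff.Fill(passed, x);
   }

   // Per-bin efficiency and its asymmetric (by default Clopper-Pearson) errors.
   const int bin = 11;   // bin containing x = 50 for 20 bins over [0, 100]
   double e  = eff.GetEfficiency(bin);
   double lo = eff.GetEfficiencyErrorLow(bin);
   double up = eff.GetEfficiencyErrorUp(bin);
   printf("efficiency in bin %d: %.3f -%.3f +%.3f\n", bin, e, lo, up);
}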
Additional comments: For an up-to-date author list see:
http://root.cern.ch/drupal/content/root-development-team and
http://root.cern.ch/drupal/content/former-root-developers.
The distribution file for this program is over 30 Mbytes and therefore is not delivered directly when a download or e-mail is requested. Instead, an HTML file giving details of how the program can be obtained is sent.
Running time: Depending on the data size and complexity of analysis algorithms.
References:
[1] http://root.cern.ch.
[2] http://root.cern.ch/drupal/content/production-version-528.
[3] I. Antcheva, M. Ballintijn, B. Bellenot, M. Biskup, R. Brun, N. Buncic, Ph. Canal, D. Casadei, O. Couet, V. Fine, L. Franco, G. Ganis, A. Gheata, D. Gonzalez Maline, M. Goto, J. Iwaszkiewicz, A. Kreshuk, D. Marcos Segura, R. Maunder, L. Moneta, A. Naumann, E. Offermann, V. Onuchin, S. Panacek, F. Rademakers, P. Russo, M. Tadel, ROOT — A C++ framework for petabyte data storage, statistical analysis and visualization, Comput. Phys. Commun. 180 (2009) 2499.
[4] http://root.cern.ch/drupal/content/root-version-v5-28-00-patch-release-notes.
Abstract
The High Luminosity upgrade of the Large Hadron Collider (HL-LHC) will produce particle collisions with up to 200 simultaneous proton-proton interactions. These unprecedented conditions will create a combinatorial complexity for charged-particle track reconstruction whose computational cost is expected to surpass the projected computing budget using conventional CPUs. Motivated by this, and taking into account the prevalence of heterogeneous computing in cutting-edge High Performance Computing centers, we propose an efficient, fast and highly parallelizable bottom-up approach to track reconstruction for the HL-LHC, along with an associated implementation on GPUs, in the context of the Phase 2 CMS outer tracker. Our algorithm, called Segment Linking (or Line Segment Tracking), takes advantage of localized track stub creation, combining individual stubs to progressively form higher-level objects that are subject to kinematical and geometrical requirements compatible with genuine physics tracks. The local nature of the algorithm makes it ideal for parallelization under the Single Instruction, Multiple Data paradigm, as hundreds of objects can be built simultaneously. The computing and physics performance of the algorithm has been tested on an NVIDIA Tesla V100 GPU, already yielding efficiency and timing measurements that are on par with the latest, multi-CPU versions of existing CMS tracking algorithms.
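The following is a highly simplified C++ sketch of the bottom-up, pairwise linking idea described in the abstract. It is not the CMS implementation: the Stub fields, cut values, and function names are illustrative assumptions. In the GPU implementation each candidate pair would be handled by its own thread rather than by nested loops.

// Sketch: stubs on adjacent layers are paired into segments, and each candidate
// pair is checked independently against geometric/kinematic requirements, so all
// pairs can be evaluated in parallel (one GPU thread per pair).
#include <cmath>
#include <vector>

struct Stub { float r, phi, z; };      // simplified local track stub
struct Segment { int inner, outer; };  // indices of the two linked stubs

// Hypothetical compatibility cut: limited azimuthal bend and longitudinal slope.
bool compatible(const Stub& a, const Stub& b)
{
   float dphi = std::fabs(b.phi - a.phi);
   float dzdr = std::fabs((b.z - a.z) / (b.r - a.r));
   return dphi < 0.05f && dzdr < 2.0f;   // illustrative thresholds only
}

std::vector<Segment> buildSegments(const std::vector<Stub>& inner,
                                   const std::vector<Stub>& outer)
{
   std::vector<Segment> out;
   // Each (i, j) combination is independent; on a GPU this double loop becomes
   // a grid of threads, each writing its own result.
   for (int i = 0; i < (int)inner.size(); ++i)
      for (int j = 0; j < (int)outer.size(); ++j)
         if (compatible(inner[i], outer[j]))
            out.push_back({i, j});
   return out;
}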
In April of 2014, the UCSD T2 Center deployed hdfs-xrootd-fallback, a UCSD-developed software system that interfaces Hadoop with XRootD to increase the reliability of the Hadoop file system. The hdfs-xrootd-fallback system allows a site to depend less on local file replication and more on the global replication provided by the XRootD federation to ensure data redundancy. Deploying the software has allowed us to reduce Hadoop replication on a significant subset of files in our cluster, freeing hundreds of terabytes in our local storage, and to recover HDFS blocks lost due to storage degradation. An overview of the architecture of the hdfs-xrootd-fallback system will be presented, as well as details of our experience operating the service over the past year.
The Pacific Research Platform is an initiative to interconnect Science DMZs between campuses across the West Coast of the United States over a 100 Gbps network. The LHC @ UC is a proof-of-concept pilot project that focuses on interconnecting 6 University of California campuses. It is spearheaded by computing specialists from the UCSD Tier 2 Center in collaboration with the San Diego Supercomputer Center. A machine has been shipped to each campus, extending the concept of the Data Transfer Node to a "cluster in a box" that is fully integrated into the local compute, storage, and networking infrastructure. The node contains a full HTCondor batch system and an XRootD proxy cache. User jobs routed to the DTN can run on 40 additional slots provided by the machine, and can also flock to a common GlideinWMS pilot pool, which sends jobs out to any of the participating UCs, as well as to Comet, the new supercomputer at SDSC. In addition, a common XRootD federation has been created to interconnect the UCs and give the ability to arbitrarily export data from the home university, making it available wherever the jobs run. The UC-level federation also statically redirects to either the ATLAS FAX or the CMS AAA federation, depending on end-user VO membership credentials, to make globally published datasets available. XRootD read operations from the federation pass through the nearest DTN proxy cache, located at the site where the jobs run. This reduces wide-area network overhead for subsequent accesses and improves overall read performance. Details on the technical implementation, challenges faced and overcome in setting up the infrastructure, and an analysis of usage patterns and system scalability will be presented.
Faced with physical and energy density limitations on clock speed, contemporary microprocessor designers have increasingly turned to on-chip parallelism for performance gains. Algorithms should accordingly be designed with ample amounts of fine-grained parallelism if they are to realize the full performance of the hardware. This requirement can be challenging for algorithms that are naturally expressed as a sequence of small-matrix operations, such as the Kalman filter methods widely in use in high-energy physics experiments. In the High-Luminosity Large Hadron Collider (HL-LHC), for example, one of the dominant computational problems is expected to be finding and fitting charged-particle tracks during event reconstruction; today, the most common track-finding methods are those based on the Kalman filter. Experience at the LHC, both in the trigger and offline, has shown that these methods are robust and provide high physics performance. Previously we reported the significant parallel speedups that resulted from our efforts to adapt Kalman-filter-based tracking to many-core architectures such as Intel Xeon Phi. Here we report on how effectively those techniques can be applied to more realistic detector configurations and event complexity.
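A minimal sketch of the structure-of-arrays layout typically used to expose this fine-grained parallelism in small-matrix operations; the dimensions, type names, and batch width below are illustrative assumptions, not the experiment code.

// N small matrices are processed in lock-step so the innermost loop vectorizes
// across matrices rather than within a single one.
#include <array>
#include <cstddef>

constexpr std::size_t D = 6;    // track-state dimension (illustrative)
constexpr std::size_t N = 16;   // number of matrices processed together (SIMD width)

// Element (i, j) of all N matrices stored contiguously: data[i][j][n].
struct MatrixBatch {
   std::array<std::array<std::array<float, N>, D>, D> data{};   // zero-initialized
};

// C += A * B for all N matrices at once; C is assumed zero-initialized, and the
// loop over n is trivially vectorizable by the compiler.
void multiply(const MatrixBatch& A, const MatrixBatch& B, MatrixBatch& C)
{
   for (std::size_t i = 0; i < D; ++i)
      for (std::size_t j = 0; j < D; ++j)
         for (std::size_t k = 0; k < D; ++k)
            for (std::size_t n = 0; n < N; ++n)   // SIMD lane loop
               C.data[i][j][n] += A.data[i][k][n] * B.data[k][j][n];
}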
Following the success of the XRootD-based US CMS data federation, the AAA project investigated extensions of the federation architecture by developing two sample implementations of an XRootD disk-based caching proxy. The first one simply starts fetching a whole file as soon as a file-open request is received, and is suitable when completely random file access is expected or it is already known that the whole file will be read. The second implementation supports on-demand downloading of partial files. Extensions to the Hadoop Distributed File System have been developed to allow for an immediate fallback to network access when local HDFS storage fails to provide the requested block. Both cache implementations are in pre-production testing at UCSD.
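As a conceptual illustration of the second strategy, the sketch below shows the read path of an on-demand, block-level cache: remote (federation) access happens only on a cache miss, after which the data are served from local disk. The class and method names are hypothetical and do not correspond to the XRootD proxy code.

// Conceptual sketch of on-demand block caching (strategy 2 above).
#include <cstdint>
#include <set>

class BlockCache {
public:
   explicit BlockCache(std::int64_t blockSize) : fBlockSize(blockSize) {}

   // Fetch only the blocks overlapping [offset, offset + length), then serve
   // the read locally; contrast with strategy 1, which fetches the whole file
   // as soon as the file is opened.
   void read(std::int64_t offset, std::int64_t length)
   {
      for (std::int64_t b = offset / fBlockSize; b * fBlockSize < offset + length; ++b)
         if (!fCached.count(b)) {
            fetchBlockFromOrigin(b);   // remote (WAN) access only on a cache miss
            fCached.insert(b);
         }
      serveFromLocalDisk(offset, length);
   }

private:
   void fetchBlockFromOrigin(std::int64_t /*block*/) { /* remote read elided */ }
   void serveFromLocalDisk(std::int64_t /*off*/, std::int64_t /*len*/) { /* local read elided */ }

   std::int64_t fBlockSize;
   std::set<std::int64_t> fCached;   // indices of blocks already downloaded
};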
In the High-Luminosity Large Hadron Collider (HL-LHC), one of the most challenging computational problems is expected to be finding and fitting charged-particle tracks during event reconstruction. The methods currently in use at the LHC are based on the Kalman filter. Such methods have been shown to be robust and to provide good physics performance, both in the trigger and offline. In order to improve computational performance, we explored Kalman-filter-based methods for track finding and fitting, adapted for many-core SIMD (single instruction, multiple data) and SIMT (single instruction, multiple thread) architectures. Our adapted Kalman-filter-based software has obtained significant parallel speedups using such processors, e.g., Intel Xeon Phi, Intel Xeon SP (Scalable Processors) and (to a limited degree) NVIDIA GPUs. Recently, an effort has started towards the integration of our software into the CMS software framework, in view of its exploitation for Run III of the LHC. Prior reports have shown that our software in fact allows for significant improvements over the existing framework in terms of computational performance, with comparable physics performance, even when applied to realistic detector configurations and event complexity. Here, we demonstrate that in such conditions physics performance can be further improved with respect to our prior reports, while retaining the improvements in computational performance, by making use of knowledge of the detector and its geometry.
The CMS analysis computing model has always relied on jobs running near the data, with data allocation between CMS compute centers organized at the management level, based on the expected needs of the CMS community. While this model provided high CPU utilization during job run times, there were times when a large fraction of CPUs at certain sites were sitting idle due to lack of demand, all while terabytes of data were never accessed. To improve the utilization of both CPU and disks, CMS is moving toward controlled overflowing of jobs from sites that have the data but are oversubscribed to others with spare CPU and network capacity, with those jobs accessing the data through real-time XRootD streaming over the WAN. The major limiting factor for remote data access is the ability of the source storage system to serve such data, so the number of jobs accessing it must be carefully controlled. The CMS approach is to implement the overflowing by means of glideinWMS, a Condor-based pilot system, providing the WMS with the known storage limits and letting it schedule jobs within those limits. This paper presents the detailed architecture of the overflow-enabled glideinWMS system, together with operational experience of the past 6 months.