Abstract
The PHENIX experiment (the Pioneering High Energy Nuclear Interaction eXperiment) is the largest of the four experiments that have operated at the Relativistic Heavy Ion Collider (RHIC), taking data in 2000-2016. PHENIX has made fundamental contributions to the discovery and study of the Quark-Gluon Plasma, to the advancement of spin physics, and to other areas. Currently, the PHENIX Collaboration is analyzing the large data samples collected previously, while facing challenges in the area of Data and Analysis Preservation. We describe the strategy and practices employed by the PHENIX Collaboration to meet these challenges by leveraging state-of-the-art platforms and tools created and maintained by the High Energy and Nuclear Physics communities, including Zenodo, HEPData, OpenData, REANA and others.
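The abstract names Zenodo among the preservation platforms; as a hedged illustration, the sketch below shows how an analysis artifact might be deposited programmatically through Zenodo's public REST deposition API. The token, file name, and metadata values are placeholders, and this is an illustrative sketch rather than the PHENIX collaboration's actual workflow.

```python
import requests

API = "https://zenodo.org/api/deposit/depositions"
params = {"access_token": "PLACEHOLDER_TOKEN"}  # personal token (placeholder)

# Create an empty deposition, then stream a file into its storage bucket.
dep = requests.post(API, params=params, json={}).json()
bucket = dep["links"]["bucket"]
with open("phenix_analysis_note.pdf", "rb") as fp:   # hypothetical artifact
    requests.put(f"{bucket}/phenix_analysis_note.pdf", data=fp, params=params)

# Attach minimal descriptive metadata so the record is citable and findable.
metadata = {"metadata": {
    "title": "Example PHENIX preservation record",
    "upload_type": "dataset",
    "description": "Illustrative deposition sketch.",
    "creators": [{"name": "PHENIX Collaboration"}],
}}
requests.put(f"{API}/{dep['id']}", params=params, json=metadata)
```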
For several years the PanDA Workload Management System has been the basis for distributed production and analysis for the ATLAS experiment at the LHC. Since the start of data taking, PanDA usage has ramped up steadily, typically exceeding 500k completed jobs/day by June 2011. The associated monitoring data volume has been rising as well, to levels that present a new set of challenges in the areas of database scalability and monitoring system performance and efficiency. These challenges are being met with an R&D effort aimed at implementing a scalable and efficient monitoring data storage based on a NoSQL solution (Cassandra). We present our motivations for using this technology, as well as the data design and the techniques used for efficient indexing of the data. We also discuss the hardware requirements as they were determined by testing with actual data and realistic loads.
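As a hedged illustration of the time-series data design such a Cassandra store might use, the sketch below (using the Python cassandra-driver) buckets job records by day so that each partition stays bounded and time-range reads stay local to a partition. The keyspace, table, column names, and hosts are all invented for illustration and are not the schema described in the paper.

```python
from cassandra.cluster import Cluster

cluster = Cluster(["cass-node1", "cass-node2"])  # hypothetical contact points
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS panda_mon
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# One partition per day: writes spread over time, and queries for a time
# window touch only the matching day buckets.
session.execute("""
    CREATE TABLE IF NOT EXISTS panda_mon.jobs_by_day (
        day      text,       -- coarse time bucket, e.g. '2011-06-15'
        modtime  timestamp,  -- clustering column: orders rows in the bucket
        pandaid  bigint,
        site     text,
        status   text,
        PRIMARY KEY ((day), modtime, pandaid)
    ) WITH CLUSTERING ORDER BY (modtime DESC)
""")
```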
Future experiments such as the Deep Underground Neutrino Experiment (DUNE) will use very large Liquid Argon Time Projection Chambers (LArTPC) containing tens of kilotons of cryogenic medium. To be able to utilize a sensitive volume of that size, the current design employs arrays of wire electrodes grouped into readout planes, arranged with a stereo angle. This leads to certain challenges for object reconstruction due to ambiguities inherent in such a scheme. We present a novel reconstruction method (named "Wirecell") inspired by principles used in tomography, which brings the LArTPC technology closer to its full potential.
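To make the tomographic idea concrete, here is a hedged toy in Python: each wire sums the charge of every cell it crosses, and the cell charges are recovered by non-negative least squares, the non-negativity of deposited charge being one of the physics constraints that breaks wire ambiguities. The 4-cell geometry and charge values are invented; a real Wirecell reconstruction involves the full wire-plane geometry, time slicing, and regularization that this sketch omits.

```python
import numpy as np
from scipy.optimize import nnls

# Toy geometry matrix: A[i, j] = 1 if wire i crosses cell j in a time slice.
A = np.array([
    [1, 1, 0, 0],   # plane-U wire covering cells 0 and 1
    [0, 0, 1, 1],   # plane-U wire covering cells 2 and 3
    [1, 0, 1, 0],   # plane-V wire covering cells 0 and 2
    [0, 1, 0, 1],   # plane-V wire covering cells 1 and 3
], dtype=float)

true_q = np.array([5.0, 0.0, 2.0, 0.0])  # charge deposited per cell
wire_q = A @ true_q                      # what the wires actually measure

# Invert the ambiguous wire view under the charge >= 0 constraint.
solved_q, _ = nnls(A, wire_q)
print(solved_q)  # [5, 0, 2, 0]: non-negativity resolves the ambiguity here
```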
The Production and Distributed Analysis System (PanDA) plays a key role in the ATLAS distributed computing infrastructure. All ATLAS Monte Carlo simulation and data reprocessing jobs pass through the PanDA system. We will describe how PanDA manages job execution on the grid using dynamic resource estimation and data replication, together with intelligent brokerage, in order to meet the scaling and automation requirements of ATLAS distributed computing. PanDA is also the primary ATLAS system for processing user and group analysis jobs, bringing further requirements for quick, flexible adaptation to the rapidly evolving analysis use cases of the early data-taking phase, in addition to the high reliability, robustness and usability needed to provide efficient and transparent utilization of the grid for analysis users. We will describe how PanDA meets ATLAS requirements, the evolution of the system in light of operational experience, how the system has performed during the first LHC data-taking phase, and plans for the future.
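As a hedged sketch of what such brokerage can look like in miniature, the toy below scores candidate sites on input-data locality, recent reliability, and free capacity, and routes the work to the best one. The Site fields, the weights, and the example numbers are all invented for illustration and do not reflect PanDA's actual brokerage algorithm.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_slots: int        # job slots currently available
    data_fraction: float   # fraction of the input data already resident (0..1)
    reliability: float     # recent job success rate (0..1)

def broker(n_jobs: int, sites: list[Site]) -> Site:
    """Pick the site where the jobs are likely to start soonest and succeed.

    Toy score: resident data avoids transfers, reliability avoids retries,
    and capacity measures how much of the task the site can absorb now.
    """
    def score(s: Site) -> float:
        capacity = min(1.0, s.free_slots / max(n_jobs, 1))
        return 0.5 * s.data_fraction + 0.3 * s.reliability + 0.2 * capacity
    return max(sites, key=score)

sites = [Site("SITE_A", 2000, 0.6, 0.97), Site("SITE_B", 400, 1.0, 0.99)]
print(broker(1000, sites).name)  # SITE_B: full data locality outweighs slots
```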
The article deals with the influence of weak and super-weak impacts on certain technological processes in nanotechnology related to the synthesis of nanoscale films and coatings. We also touch upon the impact of weak diffraction fields of complex shape on the formation of fractal films and coatings.
The Deep Underground Neutrino Experiment (DUNE) will employ a set of Liquid Argon Time Projection Chambers (LArTPC) with a total mass of 40 kt as the main components of its Far Detector. In order to validate this technology and characterize the detector performance at full scale, an ambitious experimental program (called "protoDUNE") has been initiated, which includes beam tests at CERN of large-scale prototypes of the single-phase and dual-phase LArTPC technologies. The total raw data volume slated to be collected during the scheduled 3-month beam run is estimated to be in excess of 2.5 PB for each detector. This data volume will require that the protoDUNE experiment carefully design its DAQ, data handling and data quality monitoring systems to be capable of dealing with the challenges inherent in peta-scale data management, while simultaneously fulfilling the requirements of disseminating the data to a worldwide collaboration and to DUNE-associated computing sites. We present our approach to solving these problems by leveraging the design, expertise and components created for the LHC and Intensity Frontier experiments into a unified architecture capable of meeting the needs of protoDUNE.
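A quick back-of-envelope calculation makes the scale concrete: 2.5 PB per detector spread uniformly over a 3-month run implies the average sustained rate computed below. The assumptions (decimal petabytes, continuous running) are mine; real DAQ rates are bursty and the peak is considerably higher.

```python
# Average rate implied by >2.5 PB per detector over a ~3-month beam run.
raw_volume_bytes = 2.5e15            # 2.5 PB, decimal units (assumption)
run_seconds = 90 * 24 * 3600         # ~3 months of continuous running
avg_rate_gb_s = raw_volume_bytes / run_seconds / 1e9
print(f"average sustained rate ~ {avg_rate_gb_s:.2f} GB/s")  # ~0.32 GB/s
```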
This document describes the design of the new Production System of the ATLAS experiment at the LHC [1]. The Production System is the top-level workflow manager which translates physicists' needs for production-level processing and analysis into actual workflows executed across the more than one hundred Grid sites used globally by ATLAS. As the production workload has increased in volume and complexity in recent years (the count of ATLAS production tasks is above one million, with each task containing hundreds or thousands of jobs), there is a need to upgrade the Production System to meet the challenging requirements of the next LHC run while minimizing the operating costs. In the new design, the main subsystems are the Database Engine for Tasks (DEFT) and the Job Execution and Definition Interface (JEDI). Based on users' requests, DEFT manages inter-dependent groups of tasks (Meta-Tasks) and generates the corresponding data processing workflows. The JEDI component then dynamically translates the task definitions from DEFT into actual workload jobs executed in the PanDA Workload Management System [2]. We present the requirements, design parameters, basics of the object model and concrete solutions utilized in building the new Production System and its components.
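To illustrate the Meta-Task idea, here is a hedged sketch of inter-dependent tasks as a dependency graph, released to the job layer in topological order once their inputs exist. The task names and the mapping are invented; DEFT's real object model is considerably richer than a plain dict.

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Hypothetical Meta-Task: each task mapped to the tasks whose outputs it needs.
meta_task = {
    "evgen":    set(),         # event generation has no inputs
    "simul":    {"evgen"},     # detector simulation consumes generated events
    "reco":     {"simul"},     # reconstruction consumes simulated data
    "merge":    {"reco"},      # output merging
    "validate": {"reco"},      # validation runs in parallel with merging
}

# A DEFT-like engine would hand each task to the job layer (JEDI/PanDA)
# only after everything upstream of it has completed.
for task in TopologicalSorter(meta_task).static_order():
    print("release task:", task)
```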
In recent years the concept of Big Data has become well established in IT. Systems managing large data volumes produce metadata that describe the data and the workflows. These metadata are used to obtain information about the current system state and for statistical and trend analysis of the processes these systems drive. Over time, the amount of stored metadata can grow dramatically. In this article we present our studies demonstrating how metadata storage scalability and performance can be improved by using a hybrid RDBMS/NoSQL architecture.
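A hedged, self-contained sketch of the hybrid idea: keep hot, frequently updated metadata in an RDBMS and roll completed records into a key-value archive. Here sqlite3 stands in for the relational store and shelve for the NoSQL side, purely so the example runs without external services; the schema and fields are invented.

```python
import json
import shelve
import sqlite3

# Hot store: active metadata in the RDBMS, where transactional updates
# and ad-hoc relational queries are cheap.
rdb = sqlite3.connect(":memory:")
rdb.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, meta TEXT)")
rdb.execute("INSERT INTO jobs VALUES (1, 'finished', ?)",
            (json.dumps({"site": "SITE_A", "walltime": 4200}),))

# Cold store: finished records move to a key-value archive, keeping the
# RDBMS small while the archive scales out (shelve stands in for NoSQL).
with shelve.open("/tmp/jobs_archive") as archive:
    done = rdb.execute("SELECT id, status, meta FROM jobs "
                       "WHERE status = 'finished'").fetchall()
    for job_id, status, meta in done:
        archive[str(job_id)] = {"status": status, **json.loads(meta)}
    rdb.execute("DELETE FROM jobs WHERE status = 'finished'")
```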
The PanDA (Production and Distributed Analysis) Workload Management System is used for ATLAS distributed production and analysis worldwide. The needs of ATLAS global computing imposed challenging requirements on the design of PanDA in areas such as scalability, robustness, automation, diagnostics, and usability for both production shifters and analysis users. Through a system-wide job database, the PanDA monitor provides a comprehensive and coherent view of the system and of job execution, from high-level summaries to detailed drill-down job diagnostics. It is (like the rest of PanDA) an Apache-based Python application backed by Oracle. The presentation layer is HTML code generated on the fly in the Python application, which is also responsible for managing the database queries. However, this approach is lacking in user interface flexibility, simplicity of communication with external systems, and ease of maintenance. A decision was therefore made to migrate the PanDA monitor server to the Django Web Application Framework and apply JSON/AJAX technology in the browser front end. This allows us to greatly reduce the amount of application code, separate data preparation from presentation, leverage open source tools for functions such as authentication and authorization, and provide a richer and more dynamic user experience. We describe our approach, design and initial experience with the migration process.
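As a hedged illustration of the data/presentation split, the hypothetical Django view below returns job-state counts as JSON for an AJAX front end to render. The view name and numbers are invented, and JsonResponse is a convenience the modern framework provides; none of this is the actual PanDA monitor code.

```python
# views.py -- hypothetical view: the server prepares data, the browser renders.
from django.http import JsonResponse

def job_summary(request):
    site = request.GET.get("site", "all")
    # A real monitor would aggregate these counts from the job database;
    # the numbers here are placeholders.
    data = {
        "site": site,
        "states": {"running": 1520, "finished": 48210, "failed": 1336},
    }
    return JsonResponse(data)  # the AJAX front end turns this into HTML
```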
For several years the PanDA Workload Management System has been the basis for distributed production and analysis for the ATLAS experiment at the LHC. Since the start of data taking, PanDA usage has ramped up steadily, typically exceeding 500k completed jobs/day by June 2011. The associated monitoring data volume has been rising as well, to levels that present a new set of challenges in the areas of database scalability and monitoring system performance and efficiency. These challenges are being met with an R&D effort aimed at implementing a scalable and efficient monitoring data storage based on a NoSQL solution (Cassandra). We present our motivations for using this technology, as well as the data design and the techniques used for efficient indexing of the data. We also discuss the hardware requirements as they were determined by testing with actual data and a realistic rate of queries. In conclusion, we present our experience with operating a Cassandra cluster over an extended period of time and with a data load adequate for the planned application.
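Complementing the schema sketch after the earlier version of this abstract, here is a hedged example of the read path such a design enables: a time-window query pinned to a single day partition, served by the clustering order rather than a full scan. The table, keyspace, and host names carry over from that sketch and remain invented.

```python
from datetime import datetime
from cassandra.cluster import Cluster

cluster = Cluster(["cass-node1"])        # hypothetical contact point
session = cluster.connect("panda_mon")   # keyspace from the earlier sketch

# The day bucket pins the partition; the clustering column (modtime) makes
# the six-hour slice an efficient in-partition range read.
rows = session.execute(
    """
    SELECT pandaid, site, status
    FROM jobs_by_day
    WHERE day = %s AND modtime >= %s AND modtime < %s
    """,
    ("2011-06-15", datetime(2011, 6, 15, 0), datetime(2011, 6, 15, 6)),
)
for row in rows:
    print(row.pandaid, row.site, row.status)
```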