ATLAS software and computing is in a period of intensive evolution. The current long shutdown presents an opportunity to assimilate lessons from the very successful Run 1 (2009–2013) and to prepare for the substantially increased computing requirements of Run 2 (from spring 2015). Run 2 will bring a near doubling of the energy and the data rate, high event pile-up levels, and higher event complexity from detector upgrades, meaning the number and complexity of events to be analyzed will increase dramatically. At the same time, operational loads must be reduced through greater automation, a wider array of opportunistic resources must be supported, costly storage must be used with greater efficiency, a sophisticated new analysis model must be integrated, and concurrency features of new processors must be exploited. This paper surveys the distributed computing aspects of the upgrade program and the plans for 2014 to exercise the new capabilities in a large-scale Data Challenge.
We discuss the physics potential and the experimental challenges of an upgraded LHC running at an instantaneous luminosity of 10^35 cm^-2 s^-1. The detector R&D needed to operate ATLAS and CMS in a very high radiation environment and the expected detector performance are discussed. A few examples of the increased physics potential are given, ranging from precise measurements within the Standard Model (in particular in the Higgs sector) to the discovery reach for several New Physics processes.
The EventIndex is the complete catalogue of all ATLAS real and simulated events, keeping the references to all permanent files that contain a given event in any processing stage; its implementation has been substantially revised in advance of LHC Run 3 in order to scale to the higher production rates. The Event Picking Server automates the procedure of finding the locations of large numbers of events and extracting and collecting them into separate files. It supports different event formats and provides a flexible workflow for different types of input data. The graphical interface of the Event Picking Server is integrated with ATLAS SSO, and a monitoring system tracks the performance of all parts of the service.
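As a rough illustration of the event-picking idea described above, the following Python sketch models a catalogue that maps (run number, event number) pairs to the GUIDs of the files containing each event. The class, its methods and the sample values are hypothetical stand-ins for the purpose of the example, not the actual EventIndex or Event Picking Server API.

    # Minimal sketch of an event-picking lookup over a simplified catalogue
    # keyed by (run_number, event_number); names and values are illustrative.
    from collections import defaultdict

    class EventCatalogue:
        def __init__(self):
            # (run, event) -> list of (processing_stage, file_guid)
            self._index = defaultdict(list)

        def add(self, run, event, stage, guid):
            self._index[(run, event)].append((stage, guid))

        def pick(self, requests, stage):
            """Return the file GUIDs holding each requested event at a given stage."""
            result = {}
            for run, event in requests:
                guids = [g for s, g in self._index.get((run, event), []) if s == stage]
                result[(run, event)] = guids
            return result

    catalogue = EventCatalogue()
    catalogue.add(358031, 1234567, "AOD", "8A4F0E3C-0000-0000-0000-000000000001")
    print(catalogue.pick([(358031, 1234567)], stage="AOD"))

In the real service, the lookup result is then used to extract the selected events from the referenced files and collect them into separate output files.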
For many years the ATLAS experiment has used a large database infrastructure based on Oracle to store several different types of non-event data: time-dependent detector configuration and conditions data, calibrations and alignments, configurations of Grid sites, catalogues for data management tools, job records for distributed workload management tools, and run and event metadata. The rapid development of "NoSQL" databases (structured storage services) over the last five years has allowed an extended and complementary usage of traditional relational databases and new structured storage tools, improving the performance of existing applications and extending their functionality with the possibilities offered by modern storage systems. The trend is towards using the best tool for each kind of data: separating, for example, the intrinsically relational metadata from payload storage, and frequently updated records that benefit from transactions from archived information. Access to all components is orchestrated by specialised services that run on front-end machines and shield the user from the complexity of the data storage infrastructure. This paper describes this technology evolution in the ATLAS database infrastructure and presents a few examples of large database applications that benefit from it.
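The "best tool for each kind of data" pattern can be illustrated with a small Python sketch in which sqlite3 stands in for the relational side (Oracle in production) and a plain dictionary stands in for a structured-storage payload store. The table layout, function names and payload content are assumptions made only for this example, not the ATLAS schema.

    # Illustrative split between relational metadata and payload storage.
    import sqlite3, hashlib

    payload_store = {}  # stand-in for a structured storage service (e.g. HBase)

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE conditions_meta (
                      tag TEXT, since INTEGER, until INTEGER, payload_key TEXT)""")

    def insert_condition(tag, since, until, payload_bytes):
        key = hashlib.sha1(payload_bytes).hexdigest()
        payload_store[key] = payload_bytes               # large blob: NoSQL side
        db.execute("INSERT INTO conditions_meta VALUES (?, ?, ?, ?)",
                   (tag, since, until, key))             # small row: SQL side

    def fetch_condition(tag, run):
        row = db.execute("""SELECT payload_key FROM conditions_meta
                            WHERE tag = ? AND since <= ? AND until > ?""",
                         (tag, run, run)).fetchone()
        return payload_store[row[0]] if row else None

    insert_condition("PixelAlign-v1", 350000, 360000, b"alignment constants blob")
    print(fetch_condition("PixelAlign-v1", 358031))

The point of the split is that the small, queryable interval-of-validity metadata stays transactional and relational, while the bulky payload lives in a store optimised for large, rarely updated objects.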
This paper reports on the activities aimed at improving the architecture and performance of the ATLAS EventIndex implementation in Hadoop. The EventIndex contains tens of billions of event records, each of which consists of ∼100 bytes, all having the same probability of being searched or counted. Data formats are one important area for optimizing the performance and storage footprint of applications based on Hadoop. This work reports on production usage and on tests with several data formats, including MapFiles, Apache Parquet and Avro, and with various compression algorithms. The query engine also plays a critical role in the architecture. We also report on the use of HBase for the EventIndex, focusing on the optimizations performed in production and on the scalability tests. Additional engines that have been tested include Cloudera Impala, in particular for its SQL interface and its optimizations for data-warehouse workloads and reporting.
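The kind of format and compression comparison described above can be sketched, purely for illustration, with pyarrow: synthetic event records of roughly the quoted size are written to Parquet with different codecs and the resulting file sizes compared. The column names are invented for the example, and codec availability depends on the local pyarrow build.

    # Hedged sketch of a Parquet format/compression footprint comparison.
    import os
    import pyarrow as pa
    import pyarrow.parquet as pq

    records = {
        "run_number":   list(range(350000, 350000 + 10000)),
        "event_number": list(range(10000)),
        "file_guid":    ["8A4F0E3C-0000-0000-0000-%012d" % i for i in range(10000)],
        "trigger_mask": [i % 256 for i in range(10000)],
    }
    table = pa.table(records)

    for codec in ("NONE", "SNAPPY", "GZIP", "ZSTD"):
        path = "eventindex_%s.parquet" % codec.lower()
        pq.write_table(table, path, compression=codec)
        print(codec, os.path.getsize(path), "bytes")

A similar loop over Avro and MapFile writers, run on representative record samples, is the natural way to quantify the trade-off between storage footprint and read performance mentioned in the abstract.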
The ATLAS experiment is commissioning its computing system in preparation for LHC data. Part of this activity consists in testing the data flow from the online data acquisition to the offline processing system, and the distribution of raw and processed data to the external computing centres. A series of functional and rate tests was performed in 2006 and 2007, allowing the optimisation of the hardware and software components of this system; the last phase of commissioning, the so-called Final Dress Rehearsal, consisting of an integration test of all components, will take place later in 2007. This paper describes the tests performed, the problems encountered, and the solutions found.
The ATLAS EventIndex and its evolution towards Run 3. Villaplana Perez, M; Alexandrov, E; Aleksandrov, I; et al. Journal of Physics: Conference Series, 04/2020, Volume 1525, Issue 1. Journal Article, Conference Proceeding. Peer-reviewed. Open Access.
The ATLAS experiment has produced hundreds of petabytes of data and expects to have one order of magnitude more in the future. These data are spread among hundreds of computing Grid sites around the world. The EventIndex is the complete catalogue of all ATLAS events, real and simulated, keeping the references to all permanent files that contain a given event in any processing stage. It provides the means to select and access event data in the ATLAS distributed storage system, and provides support for completeness and consistency checks and for trigger and offline selection overlap studies. The EventIndex employs various data handling technologies, such as Hadoop and Oracle databases, and it is integrated with other parts of the ATLAS distributed computing infrastructure, including the systems for data, metadata, and production management. The project has been in operation since the start of LHC Run 2 in 2015, and it is under continuous development in order to satisfy the production and analysis demands and to follow technology evolution. The main data store in Hadoop, based on MapFiles and HBase, has worked well during Run 2, but new solutions are being explored for the future. This paper reports on the current system performance and on the studies of a new data storage prototype that can carry the EventIndex through Run 3.
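To give a concrete, hedged illustration of the HBase part of the data store, the sketch below writes and reads back one event record with the happybase client. The host, table name, row-key layout and column family are illustrative assumptions rather than the production EventIndex schema, and a running HBase Thrift service is required.

    # Illustrative only: one event record stored and retrieved via happybase.
    import happybase

    connection = happybase.Connection("hbase-master.example.org")  # assumed host
    table = connection.table("eventindex_prototype")               # assumed table

    # Row key built from run and event number so that per-event lookups are direct.
    row_key = b"0358031:0001234567"
    table.put(row_key, {
        b"ref:raw_guid":  b"8A4F0E3C-0000-0000-0000-000000000001",
        b"ref:aod_guid":  b"8A4F0E3C-0000-0000-0000-000000000002",
        b"trig:pattern":  b"\x01\x00\x7f\x00",
    })

    print(table.row(row_key))
    connection.close()

The choice of a composite row key ordered by run and event number is one common way to make both single-event picking and per-run scans cheap in a wide-column store.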
The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. The major use cases are event picking, feeding the Event Service used on some production sites, and technical checks of the completeness and consistency of processing campaigns. The system design is highly modular, so that its components (the data collection system, the storage system based on Hadoop, the query web service and the interfaces to other ATLAS systems) could be developed separately and in parallel during LS1. The EventIndex is in operation for the start of LHC Run 2. This paper describes the high-level system architecture, the technical design choices, and the deployment process and issues. The performance of the data collection and storage systems, as well as of the query services, is also reported.
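One of the quoted use cases, checking the completeness of a processing campaign, can be sketched as a simple set comparison between the events catalogued for a parent format and for a derived format. The inputs below are synthetic, and the function is only a stand-in for the real consistency checks.

    # Hedged sketch of a completeness/consistency check between two stages.
    def completeness_check(parent_events, derived_events):
        """Both arguments are iterables of (run_number, event_number) pairs."""
        parent = set(parent_events)
        derived = set(derived_events)
        missing = parent - derived     # events lost in the derived stage
        extra = derived - parent       # events downstream that have no parent
        return missing, extra

    raw = [(358031, i) for i in range(1000)]
    aod = [(358031, i) for i in range(1000) if i != 42]   # one event lost

    missing, extra = completeness_check(raw, aod)
    print(len(missing), "missing,", len(extra), "unexpected")   # 1 missing, 0 unexpected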
Conditions data (for example: alignment, calibration, data quality) are used extensively in the processing of real and simulated data in ATLAS. The volume and variety of the conditions data needed by different types of processing are quite diverse, so optimizing access to them requires a careful understanding of conditions usage patterns. These patterns can be quantified by mining representative log files from each type of processing and gathering detailed information about conditions usage for that type of processing into a central repository.
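The log-mining step can be sketched as follows, assuming a hypothetical log-line format in which each conditions-folder read is reported on its own line; the regular expression and file names are illustrative only.

    # Minimal sketch of mining job logs for conditions usage patterns.
    import re
    from collections import Counter
    from pathlib import Path

    FOLDER_RE = re.compile(r"Reading conditions folder (?P<folder>/\S+)")  # assumed format

    def count_folder_accesses(log_paths):
        usage = Counter()
        for path in log_paths:
            for line in Path(path).read_text(errors="replace").splitlines():
                match = FOLDER_RE.search(line)
                if match:
                    usage[match.group("folder")] += 1
        return usage

    # usage = count_folder_accesses(["athena_job_1.log", "athena_job_2.log"])
    # for folder, n in usage.most_common(10):
    #     print(folder, n)

Aggregating such counters per processing type into a central repository is what makes it possible to compare usage patterns across reconstruction, simulation and analysis workloads.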
The ATLAS Distributed Computing (ADC) group established a new Computing Run Coordinator (CRC) shift at the start of LHC Run 2 in 2015. The main goal was to rely on a person with a good overview of the ADC activities to ease the ADC experts' workload. The CRC shifter keeps track of ADC tasks related to their fields of expertise and responsibility, while maintaining a global view of the day-to-day operations of the ADC system. During Run 1, this task was accomplished by a member of the expert team called the ADC Manager on Duty (AMOD), a position that was removed during the shutdown period due to the reduced number and availability of ADC experts foreseen for Run 2. The CRC position was proposed to cover some of the AMOD's former functions, while allowing more people involved in computing to participate. In this way, CRC shifters also help with the training of future ADC experts. The CRC shifters coordinate daily ADC shift operations, including tracking open issues, reporting, and representing ADC in relevant meetings. The CRC also facilitates communication between the ADC expert team and the other ADC shifters. These include the Distributed Analysis Support Team (DAST), which is the first point of contact for all distributed analysis questions, and the ATLAS Distributed Computing Shifters (ADCoS), who check and report problems in central services, sites, Tier-0 export, data transfers and production tasks. Finally, the CRC looks at the level of ADC activities on a weekly or monthly timescale to ensure that ADC resources are used efficiently.