Abstract
CERN experiments are preparing for the HL-LHC era, which will bring an unprecedented volume of scientific data. These data will need to be stored and processed by thousands of physicists, but the expected resource growth is nowhere near the extrapolated requirements of existing models, in terms of both storage volume and compute power. Opportunistic CPU resources such as HPCs and university clusters can provide extra CPU cycles, but there is no opportunistic storage. In this article we present the main architectural ideas, deployment details, and test results of a prototype distributed data processing and storage system designed to improve resource efficiency by reducing the overhead of data access. The prototype was built using geographically distributed WLCG sites and university clusters in Russia.
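The abstract does not fix an access protocol, but the idea of reducing data-access overhead can be illustrated with a small sketch: before reading, a client probes the candidate storage endpoints and picks the one with the lowest network latency. All endpoint names and ports below are hypothetical, and the latency-based selection policy is an assumption for illustration, not the prototype's actual mechanism.

```python
# Illustrative sketch only: endpoint names and the latency-based
# selection policy are assumptions, not the prototype's design.
import socket
import time

# Hypothetical federation endpoints (host, port) at distributed sites.
ENDPOINTS = {
    "moscow": ("storage.moscow.example.org", 1094),
    "dubna": ("storage.dubna.example.org", 1094),
    "geneva": ("storage.geneva.example.org", 1094),
}

def connect_latency(host: str, port: int, timeout: float = 2.0) -> float:
    """Return the TCP connect time in seconds, or infinity on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return float("inf")

def closest_endpoint() -> str:
    """Pick the site with the lowest connect latency to cut access overhead."""
    return min(ENDPOINTS, key=lambda s: connect_latency(*ENDPOINTS[s]))

if __name__ == "__main__":
    print("Reading from:", closest_endpoint())
```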
The next phase of LHC operations, the High Luminosity LHC (HL-LHC), which aims at a ten-fold increase in the luminosity of proton-proton collisions at an energy of 14 TeV, is expected to start operation in 2027-2028 and will deliver an unprecedented volume of scientific data at the multi-exabyte scale. This amount of data has to be stored, and the corresponding storage system must ensure fast and reliable data delivery for processing by scientific groups distributed all over the world. The present LHC computing and data processing model will not be able to provide the required infrastructure growth, even taking into account the expected evolution of hardware technology. To address this challenge, new state-of-the-art computing infrastructure technologies are now being developed; they are presented here. The possibility of applying the HL-LHC distributed data handling techniques to other particle and astro-particle physics experiments dealing with large-scale data volumes, such as DUNE, LSST, Belle-II, JUNO and SKAO, is also discussed.
During the Afghan War, the Mujahideen claimed that the Afghan communists were atheists who were subservient to Moscow and did not have the legitimacy to rule Afghanistan. The war became a contest for legitimacy in Afghanistan and internationally. The Soviets and the Afghan communists portrayed communist Afghanistan as Islamic, and therefore legitimate, in the international arena. The Soviets mounted an information campaign emphasising Islam and strengthened Afghanistan's contacts with Muslim countries to show that the Afghan communists were Muslims too. They hoped international recognition would reduce Muslim countries' support for the Mujahideen and improve the Afghan communists' acceptance at home.
The rapid increase in the volume of data from the experiments running at the Large Hadron Collider (LHC) has prompted the physics computing community to evaluate new data handling and processing solutions. Russian grid sites and university clusters scattered over a large area aim to unite their resources for future production work while also supporting large physics collaborations. In our project we address the fundamental problem of designing a computing architecture that integrates distributed storage resources for the LHC experiments and other data-intensive science applications and provides access to data from heterogeneous computing facilities. The studies include the development and implementation of a federated data storage prototype for Worldwide LHC Computing Grid (WLCG) centres of different levels and university clusters within one national cloud. The prototype is based on computing resources located in Moscow, Dubna, Saint Petersburg, Gatchina and Geneva. The project intends to implement a federated distributed storage for all kinds of operations, such as read, write and transfer, with access via WAN from grid centres, university clusters, supercomputers, and academic and commercial clouds. The efficiency and performance of the system are demonstrated using synthetic and experiment-specific tests, including real data processing and analysis workflows from the ATLAS and ALICE experiments, as well as compute-intensive bioinformatics applications (PALEOMIX) running on supercomputers. We present the topology and architecture of the designed system, report performance and statistics for different access patterns, and show how federated data storage can be used efficiently by physicists and biologists. We also describe how sharing data on a widely distributed storage system can lead to a new computing model and a reformed computing style, for instance how a bioinformatics program running on a supercomputer can read and write data from the federated storage.
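As a rough illustration of the synthetic read/write tests mentioned above, the following sketch times a write and a read of a fixed-size payload at each site and reports throughput. It assumes each site exposes a POSIX-like mount point; the paths and payload size are invented stand-ins, since the abstract does not specify the actual endpoints or protocols.

```python
# A minimal sketch of a synthetic read/write throughput test.
# Mount points and payload size are hypothetical.
import os
import time

SITES = {
    "moscow": "/mnt/federated/moscow",
    "dubna": "/mnt/federated/dubna",
}
PAYLOAD = os.urandom(64 * 1024 * 1024)  # 64 MiB synthetic payload

def throughput_mib_s(nbytes: int, seconds: float) -> float:
    return nbytes / seconds / 2**20

for site, root in SITES.items():
    path = os.path.join(root, "synthetic_test.bin")
    t0 = time.monotonic()
    with open(path, "wb") as f:          # write test
        f.write(PAYLOAD)
        f.flush()
        os.fsync(f.fileno())             # force data out to the storage
    write_s = time.monotonic() - t0
    t0 = time.monotonic()
    with open(path, "rb") as f:          # read test; note the OS page cache
        f.read()                         # can inflate local read figures
    read_s = time.monotonic() - t0
    os.remove(path)
    print(f"{site}: write {throughput_mib_s(len(PAYLOAD), write_s):.1f} MiB/s, "
          f"read {throughput_mib_s(len(PAYLOAD), read_s):.1f} MiB/s")
```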
BigPanDA monitoring is a web application that processes and presents the states of Production and Distributed Analysis (PanDA) system objects in various ways. Analysing hundreds of millions of computation entities, such as events or jobs, BigPanDA monitoring builds reports at different scales and levels of abstraction in real time. The information provided allows users to drill down into the cause of a particular event failure or to observe the broad picture, such as tracking the performance of computation nucleus and satellite sites or the progress of a whole production campaign. The PanDA system was originally developed for the ATLAS experiment. Currently, it manages the execution of more than 2 million jobs per day, distributed over 170 computing centres worldwide. BigPanDA, a core component commissioned in mid-2014, is now the primary source of information for ATLAS users about the state of their computations and a source of decision-support information for shifters, operators and managers. In this work, we describe the evolution of the architecture, the current status, and the plans for the development of BigPanDA monitoring.
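To make the idea of multi-scale reports concrete, here is a toy roll-up in the spirit of what such a monitor aggregates: the same job records feed both a broad per-site summary and a drill-down into individual failure reasons. The record fields are assumptions for illustration, not the actual PanDA schema.

```python
# Illustrative only: a toy roll-up of job states, from broad summary
# down to individual failures. Field names are assumed, not PanDA's.
from collections import Counter

jobs = [
    {"site": "CERN", "status": "finished"},
    {"site": "CERN", "status": "failed", "error": "stage-out timeout"},
    {"site": "JINR", "status": "running"},
]

# Broad picture: counts per (site, status) pair.
summary = Counter((j["site"], j["status"]) for j in jobs)
for (site, status), n in sorted(summary.items()):
    print(f"{site:6} {status:9} {n}")

# Drill-down: the error messages behind the failures at one site.
failures = [j["error"] for j in jobs
            if j["site"] == "CERN" and j["status"] == "failed"]
print("CERN failure reasons:", failures)
```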
The Production and Distributed Analysis (PanDA) system has been developed to meet ATLAS production and analysis requirements for a data-driven workload management system capable of operating at the Large Hadron Collider (LHC) data processing scale. The heterogeneous resources used by the ATLAS experiment are distributed worldwide at hundreds of sites, thousands of physicists analyse the data remotely, the volume of processed data is beyond the exabyte scale, dozens of scientific applications are supported, and data processing requires more than a few billion hours of computing usage per year. PanDA performed very well over the last decade, including the LHC Run 1 data-taking period. Nevertheless, it was decided to upgrade the whole system concurrently with the LHC's first long shutdown in order to cope with the rapidly changing computing infrastructure. After two years of reengineering effort, PanDA has embedded capabilities for fully dynamic and flexible workload management. The static batch job paradigm was discarded in favor of a more automated and scalable model. Workloads are dynamically tailored for optimal usage of resources, with the brokerage taking network traffic and forecasts into account. Computing resources are partitioned based on dynamic knowledge of their status and characteristics. The pilot has been refactored around a plugin structure for easier development and deployment. Bookkeeping is handled at both coarse and fine granularities for efficient utilization of pledged or opportunistic resources. An in-house security mechanism authenticates the pilot and data management services in off-grid environments such as volunteer computing and private local clusters. The PanDA monitor has been extensively optimized for performance and extended with analytics to provide aggregated summaries of the system as well as drill-down to operational details. Many other features are planned or have recently been implemented, and the system has been adopted by non-LHC experiments, for example bioinformatics groups successfully running the Paleomix (microbial genome and metagenome) payload on supercomputers. In this paper we focus on the new and planned features that are most important to the next decade of distributed computing workload management.
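Brokerage that takes network traffic and forecasts into account can be sketched as a scoring problem: each site is ranked by a combination of queue pressure and the forecast time to move the input data, and the workload goes to the best-scoring site. The site names, weights and numbers below are invented for illustration; PanDA's actual brokerage logic is considerably more elaborate.

```python
# A hedged sketch of network-aware brokerage. Scoring weights, site
# list and forecast numbers are assumptions, not PanDA's real policy.
SITES = {
    # queued jobs, free CPU slots, forecast rate to the input data (MB/s)
    "TRIUMF": {"queued": 120, "free_slots": 400, "net_mbps": 800},
    "BNL":    {"queued": 900, "free_slots": 250, "net_mbps": 1200},
    "NRC-KI": {"queued": 40,  "free_slots": 60,  "net_mbps": 300},
}

def score(site: dict, input_gb: float) -> float:
    """Lower is better: queue pressure plus forecast transfer time."""
    queue_pressure = site["queued"] / max(site["free_slots"], 1)
    transfer_s = input_gb * 1024 / site["net_mbps"]
    return queue_pressure + transfer_s / 3600.0  # put both on a rough common scale

def broker(input_gb: float) -> str:
    """Send the workload to the best-scoring site for this input size."""
    return min(SITES, key=lambda s: score(SITES[s], input_gb))

print(broker(input_gb=500))  # a 500 GB input favors well-connected, idle sites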
Contemporary scientific experiments produce significant amounts of data, as well as scientific publications based on these data. Since the volumes of both are constantly increasing, it becomes more and more difficult to establish a connection between a given paper and the underlying data. Yet such an association is a crucial piece of information for various tasks, such as validating the scientific results presented in a paper, comparing different approaches to a problem, or simply understanding the state of some area of science. The authors of this paper are working on the Data Knowledge Base (DKB) R&D project, initiated in 2016 to address this issue for the ATLAS experiment at CERN. The project aims to develop a software environment providing storage and a coherent representation of the basic information objects. In this paper the authors present a metadata model developed for the ATLAS experiment, the architecture of the DKB system, and its main components. Special attention is paid to the implementation of the Kafka-based ETL subsystem and to the mechanism for extracting meta-information from the texts of ATLAS publications.
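A minimal sketch of one stage of such a Kafka-based ETL pipeline, using the kafka-python client: consume publication records, extract candidate dataset names from the text, and publish paper-to-dataset links downstream. The topic names, record shape and extraction regex are assumptions for illustration; the actual DKB pipeline is more involved.

```python
# Illustrative ETL stage with kafka-python. Topics, record fields and
# the dataset-name pattern are hypothetical, not the DKB's own.
import json
import re
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "raw-publications",                      # hypothetical input topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

DATASET_RE = re.compile(r"data1[0-9]_13TeV\.\w+")  # toy dataset-name pattern

for msg in consumer:
    paper = msg.value                        # e.g. {"id": ..., "text": ...}
    datasets = DATASET_RE.findall(paper.get("text", ""))
    if datasets:                             # transform: paper -> dataset links
        producer.send("paper-dataset-links",
                      {"paper_id": paper["id"], "datasets": datasets})
```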
•Wide distribution and high abundance of I. persulcatus ticks in Karelia.
•Higher infection rate of I. persulcatus ticks than of I. ricinus.
•TBEV was not detected in southwestern Karelia, where I. ricinus dominates.
•No correlation between the infection rate of ticks and the sex of the vector.
•Geographical and temporal variability of tick infection rates.
Ixodid ticks (Acarina, Ixodidae) are vectors of dangerous human infections. The main tick species that determine the epidemiological situation for tick-borne diseases in northern Europe are Ixodes ricinus and Ixodes persulcatus. In recent years, significant changes in the number and distribution of these species have been observed, accompanied by an expansion of the sympatric range. This work summarizes the data of long-term studies carried out in Karelia since 2007 on the infection of I. persulcatus and I. ricinus ticks with various pathogens, including new viruses with unclear pathogenic potential. As a result, tick-borne encephalitis virus (TBEV, Siberian genotype), Alongshan virus, several representatives of the family Phenuiviridae, Borrelia afzelii, Borrelia garinii, Ehrlichia muris, Candidatus Rickettsia tarasevichiae and Candidatus Lariskella arthropodarum were identified. Data were obtained on the geographical and temporal variability of tick infection rates with these main pathogens. The average infection rates of I. persulcatus with TBEV and Borrelia burgdorferi sensu lato were 4.4% and 23.4%, and those of I. ricinus were 1.1% and 11.9%, respectively. We did not find a correlation between the sex of the vector and the infection rate of ticks with TBEV, B. burgdorferi s.l. or Ehrlichia muris/chaffeensis. In general, the peculiarities of the epidemiological situation in Karelia are determined by the wide distribution and high abundance of I. persulcatus ticks and by their relatively high infection rate with TBEV and B. burgdorferi s.l. in most of the territory, including the periphery of the range.
The second generation of the ATLAS Production System, ProdSys2, is a distributed workload manager that runs hundreds of thousands of jobs daily, from dozens of different ATLAS-specific workflows, across more than a hundred heterogeneous sites. It achieves high utilization by combining dynamic job definition based on many criteria, such as input and output size, memory requirements and CPU consumption, with manageable scheduling policies, and by supporting different kinds of computing resources, such as the grid, clouds, supercomputers and volunteer computers. The system dynamically assigns a group of jobs (a task) to a group of geographically distributed computing resources. Dynamic assignment and resource utilization is one of the major features of the system; it did not exist in the earliest versions of the production system, where the grid resource topology was predefined along national and/or geographical patterns. The Production System has a sophisticated job fault-recovery mechanism, which allows multi-terabyte tasks to run efficiently without human intervention (a sketch of the idea follows below). We have implemented a "train" model and open-ended production, which allow tasks to be submitted automatically as soon as a new set of data becomes available and physics-group data processing and analysis to be chained with the experiment's central production. We present an overview of the ATLAS Production System, and of the features and architecture of its major components: task definition, web user interface and monitoring. We describe the important design decisions and lessons learned from operational experience during the first year of LHC Run 2. We also report the performance of the designed system and how various workflows, such as data (re)processing, Monte Carlo and physics-group production, and user analysis, are scheduled and executed within one production system on heterogeneous computing resources.
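The fault-recovery mechanism mentioned above can be sketched as an automatic retry policy that classifies a failure and adjusts job parameters before resubmission, for example doubling the memory request after an out-of-memory failure. The failure classes, retry limit and parameter adjustments below are assumptions for illustration, not the actual ProdSys2 policy.

```python
# A hedged sketch of automatic job fault recovery. Error classes and
# the retry policy are invented; ProdSys2's mechanism is far richer.
import random

MAX_ATTEMPTS = 3

def run_job(memory_mb: int) -> str:
    """Stand-in for real job execution; fails randomly to exercise retries."""
    return random.choice(["finished", "oom", "transient"])

def run_with_recovery(memory_mb: int = 2000) -> bool:
    for attempt in range(1, MAX_ATTEMPTS + 1):
        outcome = run_job(memory_mb)
        if outcome == "finished":
            return True
        if outcome == "oom":
            memory_mb *= 2           # retry memory failures with more RAM
        # transient errors are simply retried with the same parameters
        print(f"attempt {attempt} failed ({outcome}), "
              f"retrying with {memory_mb} MB")
    return False                     # give up; flag for human follow-up

print("task recovered:", run_with_recovery())
```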