The ATLAS detector at CERN's Large Hadron Collider presents data handling requirements on an unprecedented scale. From 2008 onward, the ATLAS distributed data management system, Don Quijote2 (DQ2), must manage tens of petabytes of experiment data per year, distributed globally via the LCG, OSG and NDGF computing grids, now commonly known as the WLCG. Since its inception in 2005, DQ2 has continuously managed all experiment data for the ATLAS collaboration, which now comprises over 3000 scientists participating from more than 150 universities and laboratories in 34 countries. Fulfilling its primary requirement of providing a highly distributed, fault-tolerant and scalable architecture, DQ2 was successfully upgraded from managing data on a terabyte scale to managing data on a petabyte scale. We present improvements and enhancements to DQ2 based on the increasing demands for ATLAS data management. We describe performance issues, architectural changes and implementation decisions, the current state of deployment in test and production, as well as anticipated future improvements. Test results presented here show that DQ2 is capable of handling data up to and beyond the requirements of full-scale data-taking.
ATLAS Data Carousel. Barisits, Martin; Elmsheuser, Johannes; Borodin, M ...
11/2019, Volume 245. Conference Proceeding. Peer-reviewed. Open access.
The ATLAS experiment at CERN's LHC stores detector and simulation data in raw and derived data formats across more than 150 Grid sites world-wide, currently in total about 200PB on disk and 250PB on tape. Data have different access characteristics due to various computational workflows, and can be accessed from different media, such as remote I/O, disk cache on hard disk drives or SSDs. Also, larger data centers provide the majority of offline storage capability via tape systems. For the High-Luminosity LHC (HL-LHC), the estimated data storage requirements are several times larger than the present forecast of available resources, based on a flat budget assumption. On the computing side, ATLAS Distributed Computing was very successful in recent years with high performance and high throughput computing integration and in using opportunistic computing resources for Monte Carlo simulation. On the other hand, equivalent opportunistic storage does not exist. ATLAS started the "Data Carousel" project to increase the usage of less expensive storage, i.e. tapes or even commercial storage, so it is not limited to tape technologies exclusively. Data Carousel orchestrates data processing between workload management, data management, and storage services with the bulk data resident on offline storage. The processing is executed by staging and promptly processing a sliding window of inputs on faster buffer storage, such that only a small percentage of input data is available at any one time. With this project, we aim to demonstrate that this is a natural way to dramatically reduce storage costs. The first phase of the project started in the fall of 2018 and consisted of I/O tests of the sites' archiving systems. Phase II now requires a tight integration of the workload and data management systems. Additionally, the Data Carousel studies the feasibility of running multiple computing workflows from tape. The project is progressing well, and the results presented in this document will be used before LHC Run 3.
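To make the sliding-window idea concrete, here is a minimal sketch in Python. The helpers stage_from_tape, process and release_buffer are illustrative placeholders, not the actual ATLAS workload or data management APIs; the real system issues bulk stage requests asynchronously and in parallel rather than one file at a time.

```python
from collections import deque

def stage_from_tape(name):
    # Placeholder: in ATLAS this would be a bulk stage request to a
    # site's tape system via the data management layer.
    print(f"staging {name} from tape to buffer disk")
    return name

def process(name):
    # Placeholder for the workload management system running a job.
    print(f"processing {name}")

def release_buffer(name):
    # Placeholder: free the buffer space once the input is consumed.
    print(f"releasing buffer space for {name}")

def carousel(inputs, window_size=3):
    """Keep only a sliding window of inputs resident on fast buffer storage."""
    pending = iter(inputs)
    staged = deque(stage_from_tape(x) for _, x in zip(range(window_size), pending))
    while staged:
        current = staged.popleft()
        process(current)
        release_buffer(current)
        nxt = next(pending, None)
        if nxt is not None:
            staged.append(stage_from_tape(nxt))

carousel([f"dataset_{i}" for i in range(8)], window_size=3)
```

At any moment at most window_size inputs occupy buffer disk, which is what keeps the required fraction of staged data small regardless of the total input volume.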
Background. The purpose of this study was to evaluate the significance of aortic rupture on clinical outcome in patients after aortic repair for acute type A dissection.
Methods. One hundred and twenty patients underwent aortic operations with resection of the intimal tear and open distal anastomosis. Median age was 60 years (range, 16 to 87 years); 78 were male. Thirty-six patients had only ascending aortic replacement, 82 had hemiarch repair, and 2 had the entire arch replaced. Retrograde cerebral perfusion was utilized in 66 patients (53%). Rupture, defined as free blood in the pericardial space, was present in 60 patients (50%). Univariate and multivariate analyses were performed to assess the risk factors for mortality and neurologic dysfunction.
Results. Overall hospital mortality rate was 24.2% ± 4.0% (± 70% confidence level) and did not differ between patients with and without aortic rupture (p = 0.83). The incidence of permanent neurologic dysfunction was 9.4% overall: 10.5% with rupture and 8.3% without rupture (p = 0.75). Multivariate analysis revealed absence of retrograde cerebral perfusion and any postoperative complication as statistically significant indicators of in-hospital mortality (p < 0.05). Overall 1- and 5-year survival was 85.3% and 33.7%; among discharged patients, survival in the nonruptured group was 89% and 37%, versus 81% and 31% in the ruptured group (p = 0.01).
Conclusions. Aortic rupture at the time of surgery does not increase the risk of hospital mortality or permanent neurologic complications in patients with acute type A dissections. However, aortic rupture at the time of surgery does influence long-term survival.
The ATLAS Distributed Data Management system organizes more than 90PB of physics data across more than 100 sites globally. Over 5 million files are transferred daily, with strongly varying usage patterns. For performance and scalability reasons it is imperative to adapt and improve the data management system continuously. Future system modifications in hardware, software, and policy therefore need to be evaluated to ensure they accomplish the intended results without unwanted side effects. Due to the complexity of large-scale distributed systems, this evaluation process is primarily based on expert knowledge, as conventional evaluation methods are inadequate. This error-prone process lacks quantitative estimates and can lead to inaccurate or incorrect conclusions. In this work we present a novel, full-scale simulation framework. This modular simulator is able to accurately model the ATLAS Distributed Data Management system. The design and architecture of the component-based software are presented and discussed. The evaluation is based on comparison with historical workloads and concentrates on the accuracy of the simulation framework. Our results show that the simulator models the distributed data management system with an accuracy of about 80%.
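As a rough illustration of the discrete-event style of modelling such a simulator implies, consider the toy transfer model below. The sites, link bandwidths and transfer sizes are invented for the example and bear no relation to the actual framework or to the real ATLAS topology.

```python
import heapq
import random

# Toy discrete-event model of file transfers between sites. Sites, rates
# and sizes are illustrative assumptions, not the simulator from the text.
random.seed(42)
SITES = ["CERN", "BNL", "FZK"]
BANDWIDTH_MBPS = {("CERN", "BNL"): 800, ("CERN", "FZK"): 600, ("BNL", "FZK"): 400}

def simulate(n_transfers=10):
    clock, events, done = 0.0, [], []
    for i in range(n_transfers):
        src, dst = random.sample(SITES, 2)
        size_mb = random.uniform(100, 2000)
        submit = random.uniform(0, 60)  # submission time in seconds
        heapq.heappush(events, (submit, i, src, dst, size_mb))
    while events:
        submit, i, src, dst, size_mb = heapq.heappop(events)
        clock = max(clock, submit)
        key = (src, dst) if (src, dst) in BANDWIDTH_MBPS else (dst, src)
        duration = size_mb * 8 / BANDWIDTH_MBPS[key]  # seconds at link rate
        done.append((i, src, dst, clock + duration))
    return done

for i, src, dst, finish in simulate():
    print(f"transfer {i}: {src} -> {dst}, finished at t={finish:.1f}s")
```

A simulator of this kind can be replayed against historical transfer logs; the fraction of transfers whose simulated durations match the recorded ones gives an accuracy figure of the sort quoted above.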
The ATLAS Distributed Data Management system requires accounting of its contents at the metadata layer. This is a hard problem due to the large scale of the system, the high dimensionality of attributes, and the high rate of concurrent modifications of data. The system must efficiently account for more than 90PB of disk and tape storage holding upwards of 500 million files across 100 sites globally. In this work a generic accounting system is presented which is able to scale to the requirements of ATLAS. The design and architecture are presented, and the implementation is discussed. An emphasis is placed on design choices such that the underlying data models are generally applicable to different kinds of accounting, reporting and monitoring.
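One way such a generic, high-dimensional accounting model can work is to update counters for every subset of attributes incrementally on each file event, so that any report becomes a direct lookup. The sketch below is a guess at the general technique; the attribute names are illustrative, not the actual ATLAS schema.

```python
from collections import Counter
from itertools import combinations

# Incrementally maintained accounting over attribute subsets: any report
# on (site, owner, datatype, ...) is a counter lookup, not a scan.
class Accounting:
    def __init__(self, dimensions):
        self.dimensions = dimensions
        self.nbytes = Counter()
        self.files = Counter()

    def _keys(self, attrs):
        # One key per non-empty subset of dimensions, e.g. (site,), (site, owner), ...
        for r in range(1, len(self.dimensions) + 1):
            for dims in combinations(self.dimensions, r):
                yield tuple((d, attrs[d]) for d in dims)

    def record(self, attrs, size, sign=+1):
        # sign=+1 for file creation, sign=-1 for deletion.
        for key in self._keys(attrs):
            self.nbytes[key] += sign * size
            self.files[key] += sign

acc = Accounting(("site", "owner", "datatype"))
acc.record({"site": "CERN", "owner": "prod", "datatype": "RAW"}, 2_000_000)
acc.record({"site": "CERN", "owner": "user1", "datatype": "AOD"}, 500_000)
print(acc.nbytes[(("site", "CERN"),)])  # total bytes at CERN: 2500000
```

The same counters answer accounting, reporting and monitoring queries alike, which is the sense in which the data model generalizes.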
Dynamic and adaptive data-management in ATLAS. Lassnig, Mario; Garonne, Vincent; Branco, Miguel ...
Journal of Physics. Conference Series, 04/2010, Volume 219, Issue 6. Journal Article. Peer-reviewed. Open access.
Distributed data-management on the grid is subject to huge uncertainties, yet static policies govern its usage. Due to the unpredictability of user behaviour, the high latency and the heterogeneous nature of the environment, distributed data-management on the grid is challenging. In this paper we present the first steps towards a future dynamic data-management system that adapts to changing conditions and environment. Such a system would reduce the number of manual interventions and remove unnecessary software layers, thereby providing a higher quality of service to the collaboration.
This paper describes a monitoring framework for large-scale data management systems with frequent data access. This framework allows large data management systems to generate meaningful information from collected tracing data and to be queried on demand for specific user usage patterns with respect to source and destination locations, period intervals, and other searchable parameters. The feasibility of such a system at the petabyte scale is demonstrated by describing the implementation and operational experience of a real-world management information system for the ATLAS experiment employing the proposed framework. Our observations suggest that the proposed user monitoring framework is capable of scaling to meet the needs of very large data management systems.
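To illustrate the kind of on-demand query the framework supports, here is a minimal sketch. The trace record fields and filter parameters are assumptions chosen to match the searchable dimensions named in the abstract, not the framework's actual interface.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical trace record with the dimensions mentioned in the text:
# user, source/destination location, and timestamp for period filtering.
@dataclass
class Trace:
    timestamp: datetime
    user: str
    src_site: str
    dst_site: str
    nbytes: int

def query(traces, src=None, dst=None, start=None, end=None):
    """Filter collected traces by source, destination and period interval."""
    for t in traces:
        if src and t.src_site != src:
            continue
        if dst and t.dst_site != dst:
            continue
        if start and t.timestamp < start:
            continue
        if end and t.timestamp > end:
            continue
        yield t

traces = [
    Trace(datetime(2011, 5, 1, 12), "user1", "CERN", "BNL", 10**9),
    Trace(datetime(2011, 5, 2, 9), "user2", "BNL", "FZK", 5 * 10**8),
]
for t in query(traces, src="CERN"):
    print(t.user, t.src_site, "->", t.dst_site, t.nbytes)
```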
The distributed data management system of the high-energy physics experiment ATLAS has a critical dependency on the Oracle Relational Database Management System. Recently, however, the increased appearance of data warehouse-like workloads in the experiment has put considerable and increasing strain on the Oracle database. In particular, the analysis of archived data and the aggregation of data for summary purposes have been especially demanding. For this reason, structured storage systems were evaluated to offload the Oracle database and to handle processing of data in a non-transactional way. These include distributed file systems like HDFS that support parallel execution of computational tasks on distributed data, as well as non-relational databases like HBase, Cassandra, or MongoDB. In this paper, the most important analysis and aggregation use cases of the data management system are presented, along with how structured storage systems were established to process them.
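The aggregation workloads in question fit the map/reduce pattern that HDFS-style systems parallelize. A minimal single-process sketch of that pattern is shown below; the input format and field names are invented for the example, and a real deployment would run the two phases distributed across the cluster rather than in one Python process.

```python
from collections import defaultdict

# Map/reduce-style aggregation: summing transfer volume per destination
# site outside the transactional database. Input format is an assumption.
def map_phase(lines):
    for line in lines:
        dst, nbytes = line.split(",")  # e.g. "BNL,1048576"
        yield dst, int(nbytes)

def reduce_phase(pairs):
    totals = defaultdict(int)
    for dst, nbytes in pairs:
        totals[dst] += nbytes
    return dict(totals)

log = ["BNL,1048576", "FZK,524288", "BNL,2097152"]
print(reduce_phase(map_phase(log)))  # {'BNL': 3145728, 'FZK': 524288}
```

Because neither phase needs transactional guarantees, this kind of summary job can be moved wholesale off Oracle onto the structured storage layer.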
The ATLAS DDM accounting and Storage Usage Service. Megino, Fernando H Barreiro; Garonne, Vincent; Jezequel, Stephane ...
Journal of Physics. Conference Series, 04/2010, Volume 219, Issue 7. Journal Article. Peer-reviewed. Open access.
The ATLAS Distributed Data Management system is the system developed and used by ATLAS for handling large amounts of data. It encompasses data bookkeeping and the management of large-scale production transfers as well as end-user data access requests. The multi-petabyte ATLAS data volume already under management requires an accounting and monitoring service that collects various data usage information in order to present and compare it from the experiment and application perspectives. In this paper we describe the design and implementation of the DDM Accounting and Storage Usage Service, built to meet this monitoring requirement.
We present a probabilistic tracing method that captures both user and system behaviour for large-scale distributed applications. Our method extends the notion of data stream monitoring to work within what we define as concealed environments. We detail the conceptual design and implementation of our method. Additionally, we evaluate the scalability of the tracing method in a real petabyte-scale distributed data management system. Finally, we demonstrate the usefulness of the collected trace data in three scenarios. First, we use collected trace data to examine the arrival of user events and find self-similar processes. Second, we examine the behaviour and performance of mass storage systems in a grid under concurrent requests. Third, we develop a model for prediction of user event arrivals based on historical data. Our results suggest that a probabilistic tracing method is scalable, straightforward to integrate with existing applications, and provides useful insight into the behaviour of very large-scale applications.
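The core of probabilistic tracing is sampling: each client reports an event only with probability p, and observed counts are scaled by 1/p to estimate true rates, which bounds reporting load regardless of event volume. The sketch below is a generic illustration of that technique under these assumptions; report() is a hypothetical stand-in for the actual collection endpoint.

```python
import random

# Probabilistic tracing sketch: report each event with probability p and
# weight reported events by 1/p so aggregates estimate the true counts.
def make_tracer(p=0.01, rng=random.random):
    def trace(event):
        if rng() < p:
            report(event, weight=1.0 / p)  # scale up to estimate true rate
    return trace

def report(event, weight):
    # Hypothetical collection endpoint; a real system would ship this
    # record to the monitoring backend instead of printing it.
    print(f"traced {event!r} with weight {weight:.0f}")

trace = make_tracer(p=0.5, rng=random.Random(7).random)
for i in range(6):
    trace(f"file_access_{i}")
```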