The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on some production sites, and technical checks of the completeness and consistency of processing campaigns. The system design is highly modular, so that its components (the data collection system, the storage system based on Hadoop, the query web service and the interfaces to other ATLAS systems) could be developed separately and in parallel during LS1. The EventIndex is in operation for the start of LHC Run 2. This paper describes the high-level system architecture, the technical design choices, and the deployment process and issues. The performance of the data collection and storage systems, as well as of the query services, is also reported.
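The event-picking use case above amounts to a keyed lookup: given an event's identifiers, return the GUIDs of the files that contain it. The following is a minimal illustrative sketch only; the record fields and the in-memory dictionary are hypothetical stand-ins for the real Hadoop-based store.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EventRecord:
    run_number: int
    event_number: int
    trigger_pattern: str   # hypothetical encoding of the trigger mask
    file_guids: tuple      # GUIDs of the files containing this event

# Toy in-memory index keyed by (run, event); the real system queries Hadoop.
index = {
    (300000, 17): EventRecord(300000, 17, "0x1A", ("guid-aaa", "guid-bbb")),
    (300000, 42): EventRecord(300000, 42, "0x03", ("guid-ccc",)),
}

def pick_event(run: int, event: int) -> tuple:
    """Event picking: return the GUIDs of the files holding the event."""
    rec = index.get((run, event))
    return rec.file_guids if rec else ()
```

A picking request for a known event returns its file GUIDs, while an unknown event yields an empty result.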
The ATLAS EventIndex contains records of all events processed by ATLAS, in all processing stages. These records include the references to the files containing each event (the GUID of the file) and the internal pointer to each event in the file. This information is collected by all jobs that run at Tier-0 or on the Grid and process ATLAS events. Each job produces a snippet of information for each permanent output file. This information is packed and transferred to a central broker at CERN using an ActiveMQ messaging system, then unpacked, sorted and reformatted in order to be stored and catalogued in a central Hadoop server. This contribution describes in detail the Producer/Consumer architecture used to convey this information from the running jobs through the messaging system to the Hadoop server.
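The pack-on-the-producer / unpack-sort-on-the-consumer flow described above can be sketched as follows. This is an assumption-laden illustration: the payload layout and the compression choice are hypothetical, not the actual EventIndex wire format.

```python
import json
import zlib

def produce_snippet(file_guid: str, events) -> bytes:
    """Producer side: pack one output file's event records into a
    compressed payload suitable for shipping through a message broker."""
    payload = {"guid": file_guid,
               "events": [{"run": r, "evt": e} for r, e in events]}
    return zlib.compress(json.dumps(payload).encode())

def consume_snippet(blob: bytes) -> dict:
    """Consumer side: unpack the payload and sort the records by
    (run, event) so they can be appended to the sorted central store."""
    payload = json.loads(zlib.decompress(blob).decode())
    payload["events"].sort(key=lambda d: (d["run"], d["evt"]))
    return payload
```

In the real system the compressed blob would travel through ActiveMQ between these two steps; here the two functions are simply chained.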
The ATLAS EventIndex has been running in production since mid-2015, reliably collecting information worldwide about all produced events and storing it in a central Hadoop infrastructure at CERN. A subset of this information is copied to an Oracle relational database for fast dataset discovery, event picking, cross-checks with other ATLAS systems and checks for event duplication. The system design and its optimization serve event-picking requests ranging from a few events up to scales of tens of thousands of events; in addition, data consistency checks are performed for large production campaigns. Detecting duplicate events within the scope of physics collections has recently arisen as an important use case. This paper describes the general architecture of the project and the data flow and operation issues, which are addressed by recent developments to improve the throughput of the overall system. In this direction, the data collection system is reducing its usage of the messaging infrastructure to overcome the performance shortcomings detected during production peaks: an object storage approach is used instead to convey the event-index information, with messages only signalling its location and status. Recent changes in the Producer/Consumer architecture are also presented in detail, as well as the monitoring infrastructure.
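The object-storage approach above decouples the bulky payload from the messaging layer: the producer uploads the data to an object store and sends only a small message with its location and status. A minimal sketch, with in-memory stand-ins (a dict and a list) for the object store and the broker:

```python
import json
import uuid

object_store = {}    # stand-in for the real object storage service
message_queue = []   # stand-in for the message broker

def upload_and_signal(event_index_data: dict) -> str:
    """Producer: write the bulky payload to object storage, then send a
    small message carrying only its location and status."""
    key = str(uuid.uuid4())
    object_store[key] = json.dumps(event_index_data)
    message_queue.append({"location": key, "status": "uploaded"})
    return key

def consume_next() -> dict:
    """Consumer: read the small message, then fetch the payload by its
    location from the object store."""
    msg = message_queue.pop(0)
    return json.loads(object_store[msg["location"]])
```

The messages stay small and uniform regardless of how many event records each payload carries, which is the point of the redesign.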
Modern scientific experiments collect vast amounts of data that must be catalogued to meet multiple use cases and search criteria. In particular, high-energy physics experiments currently in operation produce several billion events per year. A database with the references to the files containing each event in every stage of processing is necessary in order to retrieve the selected events from data storage systems. The ATLAS EventIndex project is studying the best way to store the necessary information using modern data storage technologies (Hadoop, HBase etc.) that allow saving key-value pairs in memory, and to select the best tools to support this application in terms of performance, robustness and ease of use. This paper describes the initial design and performance tests, and the project evolution towards deployment and operation during 2014.
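In a key-value store of the kind mentioned above, the design of the key itself matters: a fixed-width binary key keeps all events of one run clustered and numerically sorted. A small sketch of that idea (the key layout is an illustrative assumption, not the EventIndex schema):

```python
def row_key(run: int, event: int) -> bytes:
    """Fixed-width big-endian key: lexicographic byte order then matches
    numeric (run, event) order, as in HBase-style rowkey design."""
    return run.to_bytes(4, "big") + event.to_bytes(8, "big")

store = {}  # stand-in for the key-value store

def put(run: int, event: int, value) -> None:
    store[row_key(run, event)] = value

def get(run: int, event: int):
    return store.get(row_key(run, event))
```

Because the keys sort bytewise in the same order as the numbers they encode, range scans over a run become contiguous scans over the store.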
Spanish ATLAS Tier-2: facing up to LHC Run 2
González de la Hoz, S.; Del Peso, J.; Fassi, F.; ...
Journal of Physics: Conference Series, 664(5), December 2015. Journal article, peer-reviewed, open access.
The goal of this work is to describe how the Spanish ATLAS Tier-2 is addressing the main challenges of Run 2. The considerable increase of energy and luminosity for the upcoming Run 2 with respect to Run 1 has led to a revision of the ATLAS computing model, as well as of some of the main ATLAS computing tools. In this paper, the adaptation to these changes is described. The Spanish ATLAS Tier-2 is an R&D project consisting of a distributed infrastructure composed of three sites; its members are involved in ATLAS computing progress, namely work on different tasks and the development of new tools (e.g. the EventIndex).
The ATLAS experiment has produced hundreds of petabytes of data and expects to have one order of magnitude more in the future. These data are spread among hundreds of computing Grid sites around the world. The EventIndex is the complete catalogue of all ATLAS events, real and simulated, keeping the references to all permanent files that contain a given event in any processing stage. It provides the means to select and access event data in the ATLAS distributed storage system, and supports completeness and consistency checks as well as trigger and offline selection overlap studies. The EventIndex employs various data handling technologies, such as Hadoop and Oracle databases, and is integrated with other parts of the ATLAS distributed computing infrastructure, including the systems for data, metadata and production management. The project has been in operation since the start of LHC Run 2 in 2015, and it is in permanent development in order to satisfy the production and analysis demands and follow technology evolution. The main data store in Hadoop, based on MapFiles and HBase, has worked well during Run 2, but new solutions are being explored for the future. This paper reports on the current system performance and on the studies of a new data storage prototype that can carry the EventIndex through Run 3.
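A MapFile, mentioned above as the basis of the main data store, is essentially a sorted sequence of key-value records with an index enabling binary-search lookups. The following toy class illustrates only that access pattern; the real Hadoop MapFile adds a sparse in-memory index over on-disk sequence files.

```python
import bisect

class SortedMapFile:
    """Minimal sketch of a MapFile-style store: records kept sorted by
    key, point lookups done by binary search."""
    def __init__(self, records):
        records = sorted(records)            # iterable of (key, value)
        self._keys = [k for k, _ in records]
        self._vals = [v for _, v in records]

    def get(self, key):
        """Return the value for key, or None if the key is absent."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return None
```

Keeping the data sorted at write time is what makes both point lookups and ordered scans cheap at read time.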
The EventIndex project consists of the development and deployment of a complete catalogue of events for experiments with large amounts of data, such as the ATLAS experiment at the LHC accelerator at CERN. Data to be stored in the EventIndex are produced by all production jobs that run at CERN or on the Grid; for every permanent output file, a snippet of information, containing the file unique identifier and the relevant attributes of each event, is sent to the central catalogue. The estimated insertion rate during LHC Run 2 is about 80 Hz of file records containing ∼15 kHz of event records. This contribution describes the system design, the initial performance tests of the full data collection and cataloguing chain, and the project evolution towards full deployment and operation by the end of 2014.
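The two rates quoted above imply the average payload per file record and the sustained daily event-record volume, as a quick back-of-the-envelope check:

```python
file_rate_hz = 80        # file records per second (from the abstract)
event_rate_hz = 15_000   # event records per second (~15 kHz)

# Average number of event records carried by one file record.
events_per_file = event_rate_hz / file_rate_hz      # 187.5

# Sustained event-record volume over one day (86 400 s).
events_per_day = event_rate_hz * 86_400             # 1 296 000 000
```

So each file snippet carries on the order of two hundred event records, and the store must absorb roughly 1.3 billion event records per day at the quoted sustained rate.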
The ATLAS EventIndex is the catalogue of event-related metadata for the information collected from the ATLAS detector. The basic unit of this information is the event record, containing the event identification parameters, pointers to the files containing the event, and trigger decision information. The main use case for the EventIndex is event picking, along with data consistency checks for large production campaigns. The EventIndex employs the Hadoop platform for data storage and handling, and a messaging system for the collection of information. The information for the EventIndex is collected both at Tier-0, when the data are first produced, and from the Grid, when various types of derived data are produced. The EventIndex uses several types of auxiliary information from other ATLAS sources for data collection and processing: trigger tables from the conditions metadata database (COMA), dataset information from the AMI data catalogue and the Rucio data management system, and information on production jobs from the ATLAS production system. The ATLAS production system is also used for the collection of event information from the Grid jobs. EventIndex development started in 2012, and in the middle of 2015 the system was commissioned and started collecting event metadata as part of ATLAS Distributed Computing operations.
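The enrichment of event records with auxiliary information from COMA and AMI described above can be pictured as a join against lookup tables. Everything below is hypothetical: the field names (including "smk" for a trigger-configuration key) and the table contents are illustrative stand-ins, not the actual schemas of those services.

```python
# Hypothetical lookup tables standing in for COMA (trigger tables)
# and AMI (dataset information).
coma_trigger_tables = {101: ["HLT_e26", "HLT_mu20"]}
ami_datasets = {"data15_13TeV.00280500.AOD": {"events": 1_000_000}}

def enrich(event_record: dict) -> dict:
    """Attach trigger-menu and dataset metadata to a raw event record."""
    rec = dict(event_record)  # copy; do not mutate the caller's record
    rec["trigger_menu"] = coma_trigger_tables.get(rec["smk"], [])
    rec["dataset_info"] = ami_datasets.get(rec["dataset"])
    return rec
```

The point is that the event record itself stays compact; the auxiliary sources are consulted only when the extra context is needed.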
The ATLAS EventIndex is a data catalogue system that stores event-related metadata for all (real and simulated) ATLAS events, at all processing stages. As it consists of different components that depend on other applications (such as distributed storage and various sources of information), we need to monitor the conditions of many heterogeneous subsystems to make sure everything is working correctly. This paper describes how we gather information about the EventIndex components and related subsystems: the Producer/Consumer architecture for data collection, health parameters from the servers that run EventIndex components, the status of the EventIndex web interface, and the Hadoop infrastructure that stores EventIndex data. This information is collected, processed and then displayed using CERN service monitoring software based on the Kibana analytics and visualization package, provided by the CERN IT Department. EventIndex monitoring is used both by the EventIndex team and by the ATLAS Distributed Computing shift crew.
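Monitoring pipelines of the kind described above typically ship per-component health reports as timestamped JSON documents that a Kibana dashboard can aggregate. A minimal sketch of building one such document; the field names are hypothetical, not the actual schema of the CERN IT monitoring service.

```python
import json
import time

def health_document(component: str, status: str, metrics: dict) -> str:
    """Build one JSON monitoring document for a single EventIndex
    component, of the kind a collector could ship for display in Kibana."""
    doc = {
        "producer": "eventindex",          # hypothetical source tag
        "component": component,            # e.g. "hadoop", "web-interface"
        "status": status,                  # e.g. "ok", "degraded", "down"
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
        "metrics": metrics,                # free-form numeric health data
    }
    return json.dumps(doc)
```

One document per component per collection cycle gives the dashboard a uniform stream to filter and plot, whatever the underlying subsystem is.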