Until now, geometry information for the detector description of the ATLAS experiment was only defined in C++ code, stored in online relational databases integrated into the experiment's frameworks or ...described in files with text-based markup languages. In all cases, to build and use the complete detector geometry, a full software stack was needed. In this paper, we present a new and scalable mechanism to store the geometry data and to query and serve the detector description data through a web interface and a REST API. This new approach decouples the geometry information from the experiment's framework. Moreover, it provides new functionalities to users, who can now search for specific volumes and get partial detector description, or filter geometry data based on custom criteria. We present two approaches to build a REST API to serve geometry data, based on two different technologies used in other fields and communities: The graph database Neo4j and the search engine ElasticSearch. We describe their characteristics, and we compare them in a HEP context.
HL-LHC will confront the WLCG community with enormous data storage, management and access challenges. These are as much technical as economical. In the WLCG-DOMA Access working group, members of the ...experiments and site managers have explored different models for data access and storage strategies to reduce cost and complexity, taking into account the boundary conditions given by our community.Several of these scenarios have been evaluated quantitatively, such as the Data Lake model and incremental improvements of the current computing model with respect to resource needs, costs and operational complexity.To better understand these models in depth, analysis of traces of current data accesses and simulations of the impact of new concepts have been carried out. In parallel, evaluations of the required technologies took place. These were done in testbed and production environments at small and large scale.We will give an overview of the activities and results of the working group, describe the models and summarise the results of the technology evaluation focusing on the impact of storage consolidation in the form of Data Lakes, where the use of streaming caches has emerged as a successful approach to reduce the impact of latency and bandwidth limitation.We will describe the experience and evaluation of these approaches in different environments and usage scenarios. In addition we will present the results of the analysis and modelling efforts based on data access traces of the experiments.
We will describe a component of the Intelligent Data Delivery Service being developed in collaboration with IRIS-HEP and the LHC experiments. ServiceX is an experiment-agnostic service to enable ...on-demand data delivery specifically tailored for nearly-interactive vectorized analysis. This work is motivated by the data engineering challenges posed by HL-LHC data volumes and the increasing popularity of python and Spark-based analysis workflows.
ServiceX gives analyzers the ability to query events by dataset metadata. It uses containerized transformations to extract just the data required for the analysis. This operation is colocated with the data to avoid transferring unnecessary branches over the WAN. Simple filtering operations are supported to further reduce the amount of data transferred.
Transformed events are cached in a columnar datastore to accelerate delivery of subsequent similar requests. ServiceX will learn commonly related columns and automatically include them in the transformation to increase the potential for cache hits by other users.
Selected events are streamed to the analysis system using an efficient wire protocol that can be readily consumed by a variety of computational frameworks. This reduces time-to-insight for physics analysis by delegating to ServiceX the complexity of event selection, slimming, reformatting, and streaming.
Caching Servers for ATLAS Gardner, R W; Hanushevsky, A; Vukotic, I ...
Journal of physics. Conference series,
10/2017, Letnik:
898, Številka:
6
Journal Article
Recenzirano
Odprti dostop
As many LHC Tier-3 and some Tier-2 centers look toward streamlining operations, they are considering autonomously managed storage elements as part of the solution. These storage elements are ...essentially file caching servers. They can operate as whole file or data block level caches. Several implementations exist. In this paper we explore using XRootD caching servers that can operate in either mode. They can also operate autonomously (i.e. demand driven), be centrally managed (i.e. a Rucio managed cache), or operate in both modes. We explore the pros and cons of various configurations as well as practical requirements for caching to be effective. While we focus on XRootD caches, the analysis should apply to other kinds of caches as well.
Big Data technologies have proven to be very useful for storage, processing and visualization of derived metrics associated with ATLAS distributed computing (ADC) services. Logfiles, database ...records, and metadata from a diversity of systems have been aggregated and indexed to create an analytics platform for ATLAS ADC operations analysis. Dashboards, wide area data access cost metrics, user analysis patterns, and resource utilization efficiency charts are produced flexibly through queries against a powerful analytics cluster. Here we explore whether these techniques and associated analytics ecosystem can be applied to add new modes of open, quick, and pervasive access to ATLAS event data. Such modes would simplify access and broaden the reach of ATLAS public data to new communities of users. An ability to efficiently store, filter, search and deliver ATLAS data at the event and/or sub-event level in a widely supported format would enable or significantly simplify usage of machine learning environments and tools like Spark, Jupyter, R, SciPy, Caffe, TensorFlow, etc. Machine learning challenges such as the Higgs Boson Machine Learning Challenge, the Tracking challenge, Event viewers (VP1, ATLANTIS, ATLASrift), and still to be developed educational and outreach tools would be able to access the data through a simple REST API. In this preliminary investigation we focus on derived xAOD data sets. These are much smaller than the primary xAODs having containers, variables, and events of interest to a particular analysis. Being encouraged with the performance of Elasticsearch for the ADC analytics platform, we developed an algorithm for indexing derived xAOD event data. We have made an appropriate document mapping and have imported a full set of standard model W/Z datasets. We compare the disk space efficiency of this approach to that of standard ROOT files, the performance in simple cut flow type of data analysis, and will present preliminary results on its scaling characteristics with different numbers of clients, query complexity, and size of the data retrieved.
The Worldwide LHC Computing Grid relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any ...network issues, including connection failures, congestion, traffic routing, etc. The WLCG Network and Transfer Metrics project aims to integrate and combine all network-related monitoring data collected by the WLCG infrastructure. This includes FTS monitoring information, monitoring data from the XRootD federation, as well as results of the perfSONAR tests. The main challenge consists of further integrating and analyzing this information in order to allow the optimizing of data transfers and workload management systems of the LHC experiments. In this contribution, we present our activity in commissioning WLCG perfSONAR network and integrating network and transfer metrics: We motivate the need for the network performance monitoring, describe the main use cases of the LHC experiments as well as status and evolution in the areas of configuration and capacity management, datastore and analytics, including integration of transfer and network metrics and operations and support.
Using Xrootd to Federate Regional Storage Bauerdick, L; Benjamin, D; Bloom, K ...
Journal of physics. Conference series,
01/2012, Letnik:
396, Številka:
4
Journal Article, Conference Proceeding
Recenzirano
Odprti dostop
While the LHC data movement systems have demonstrated the ability to move data at the necessary throughput, we have identified two weaknesses: the latency for physicists to access data and the ...complexity of the tools involved. To address these, both ATLAS and CMS have begun to federate regional storage systems using Xrootd. Xrootd, referring to a protocol and implementation, allows us to provide data access to all disk-resident data from a single virtual endpoint. This “redirector” discovers the actual location of the data and redirects the client to the appropriate site. The approach is particularly advantageous since typically the redirection requires much less than 500 milliseconds and the Xrootd client is conveniently built into LHC physicists’ analysis tools. Currently, there are three regional storage federations - a US ATLAS region, a European CMS region, and a US CMS region. The US ATLAS and US CMS regions include their respective Tier 1, Tier 2 and some Tier 3 facilities; a large percentage of experimental data is available via the federation. Additionally, US ATLAS has begun studying low-latency regional federations of close-by sites. From the base idea of federating storage behind an endpoint, the implementations and use cases diverge. The CMS software framework is capable of efficiently processing data over high-latency links, so using the remote site directly is comparable to accessing local data. The ATLAS processing model allows a broad spectrum of user applications with varying degrees of performance with regard to latency; a particular focus has been optimizing n-tuple analysis. Both VOs use GSI security. ATLAS has developed a mapping of VOMS roles to specific file system authorizations, while CMS has developed callouts to the site's mapping service. Each federation presents a global namespace to users. For ATLAS, the global-to-local mapping is based on a heuristic-based lookup from the site's local file catalog, while CMS does the mapping based on translations given in a configuration file. We will also cover the latest usage statistics and interesting use cases that have developed over the previous 18 months.
The computing models of the LHC experiments are gradually moving from hierarchical data models with centrally managed data pre-placement towards federated storage which provides seamless access to ...data files independently of their location and dramatically improve recovery due to fail-over mechanisms. Construction of the data federations and understanding the impact of the new approach to data management on user analysis requires complete and detailed monitoring. Monitoring functionality should cover the status of all components of the federated storage, measuring data traffic and data access performance, as well as being able to detect any kind of inefficiencies and to provide hints for resource optimization and effective data distribution policy. Data mining of the collected monitoring data provides a deep insight into new usage patterns. In the WLCG context, there are several federations currently based on the XRootD technology. This paper will focus on monitoring for the ATLAS and CMS XRootD federations implemented in the Experiment Dashboard monitoring framework. Both federations consist of many dozens of sites accessed by many hundreds of clients and they continue to grow in size. Handling of the monitoring flow generated by these systems has to be well optimized in order to achieve the required performance. Furthermore, this paper demonstrates the XRootD monitoring architecture is sufficiently generic to be easily adapted for other technologies, such as HTTP/WebDAV dynamic federations.
We describe recent I/O testing frameworks that we have developed and applied within the UK GridPP Collaboration, the ATLAS experiment and the DPM team, for a variety of distinct purposes. These ...include benchmarking vendor supplied storage products, discovering scaling limits of SRM solutions, tuning of storage systems for experiment data analysis, evaluating file access protocols, and exploring I/O read patterns of experiment software and their underlying event data models. With multiple grid sites now dealing with petabytes of data, such studies are becoming essential. We describe how the tests build, and improve, on previous work and contrast how the use-cases differ. We also detail the results obtained and the implications for storage hardware, middleware and experiment software.
We detail recent changes to ROOT-based I/O within the ATLAS experiment. The ATLAS persistent event data model continues to make considerable use of a ROOT I/O backend through POOL persistency. Also ...ROOT is used directly in later stages of analysis that make use of a flat-ntuple based “D3PD” data-type. For POOL/ROOT persistent data, several improvements have been made including implementation of automatic basket optimisation, memberwise streaming, and changes to split and compression levels. Optimisations have also been made for the D3PD format. We present a full evaluation of the resulting performance improvements from these, including in the case of selected retrieval of events. We also evaluate ongoing changes internal to ROOT, in the ATLAS context, for both POOL and D3PD data. We report results not only from test systems, but also utilising new automated tests on real ATLAS production resources which employ a wide range of storage technologies.