ATLAS Distributed Computing during LHC Run-1 was challenged by steadily increasing computing, storage and network requirements. In addition, the complexity of processing task workflows and their associated data management requirements led to a new paradigm in the ATLAS computing model for Run-2, accompanied by extensive evolution and redesign of the workflow and data management systems. The new systems were put into production at the end of 2014 and gained robustness and maturity during 2015 data taking. ProdSys2, the new request and task interface; JEDI, the dynamic job execution engine developed as an extension to PanDA; and Rucio, the new data management system, form the core of the Run-2 ATLAS distributed computing engine. One of the big changes for Run-2 was the adoption of the Derivation Framework, which moves the chaotic, CPU- and data-intensive part of user analysis into centrally organized train production, delivering derived AOD datasets to user groups for final analysis. The effectiveness of the new model was demonstrated through the delivery of analysis datasets to users just one week after data taking, by completing the calibration loop, Tier-0 processing and train production steps promptly. The great flexibility of the new system also makes it possible to execute part of the Tier-0 processing on the grid when Tier-0 resources experience a backlog during high data-taking periods. The introduction of the data lifetime model, in which each dataset is assigned a finite lifetime (with extensions possible for frequently accessed data), was made possible by Rucio; thanks to this, the storage crises experienced in Run-1 have not reappeared during Run-2. In addition, the distinction between Tier-1 and Tier-2 disk storage, now largely artificial given the quality of Tier-2 resources and their networking, has been removed through the introduction of dynamic ATLAS clouds that group a storage-endpoint nucleus with its nearby execution satellite sites. All stable ATLAS sites are now able to store unique or primary copies of datasets. ATLAS Distributed Computing is evolving further to speed up request processing by introducing network awareness, using machine learning and optimising the latencies during the execution of the full chain of tasks. The Event Service, a new workflow and job execution engine, is designed around check-pointing at the level of event processing in order to use opportunistic resources more efficiently. ATLAS has been extensively exploring the possibilities of using computing resources beyond conventional grid sites in the WLCG fabric to deliver as many computing cycles as possible and thereby enhance the statistical power of the Monte Carlo samples, delivering better physics results. The exploitation of opportunistic resources was at an early stage throughout 2015, at the level of 10% of the total ATLAS computing power, but in the next few years it is expected to deliver much more. In addition, demonstrating the ability to use an opportunistic resource can lead to securing ATLAS allocations on the facility, so the importance of this work goes beyond the initial CPU cycles gained. In this paper, we give an overview and compare the performance, development effort, flexibility and robustness of the various approaches.
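As a rough illustration of the data lifetime model described in this abstract, the following self-contained sketch assigns each dataset a finite lifetime and extends it on access; the default lifetime, extension rule and dataset name are illustrative assumptions, not the actual Rucio implementation or API.

```python
import time

# Illustrative sketch of a dataset lifetime policy, loosely modelled on the
# ideas described above; it is NOT the actual Rucio implementation.
DEFAULT_LIFETIME = 180 * 24 * 3600   # assumed default: 180 days, in seconds
EXTENSION = 90 * 24 * 3600           # assumed extension granted on access

class Dataset:
    def __init__(self, name, created=None):
        self.name = name
        self.created = created or time.time()
        self.expires = self.created + DEFAULT_LIFETIME

    def touch(self):
        """Record an access: frequently used data earns a lifetime extension."""
        self.expires = max(self.expires, time.time() + EXTENSION)

    def expired(self):
        return time.time() > self.expires

def reaper(datasets):
    """Return datasets whose lifetime has elapsed and which may be deleted."""
    return [d for d in datasets if d.expired()]

if __name__ == "__main__":
    d = Dataset("data15_13TeV.periodD.DAOD_EXOT5")  # hypothetical dataset name
    d.touch()                                       # an analysis job reads it
    print(d.name, "expires at", time.ctime(d.expires))
```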
Fifteen Chinese High-Performance Computing sites, many of them on the TOP500 list of the most powerful supercomputers, are integrated into a common infrastructure providing coherent access to users through a RESTful interface called SCEAPI. These resources have been integrated into the ATLAS Grid production system using a bridge between ATLAS and SCEAPI which translates the authorization and job submission protocols between the two environments. The ARC Computing Element (ARC-CE) forms the bridge, using an extended batch system interface to allow job submission to SCEAPI. The ARC-CE was set up at the Institute of High Energy Physics, Beijing, in order to be as close as possible to the SCEAPI front-end interface at the Computing Network Information Center, also in Beijing. This paper describes the technical details of the integration between ARC-CE and SCEAPI and presents the results obtained so far with two supercomputer centers, Tianhe-IA and ERA. These two centers have been the pilots for ATLAS Monte Carlo simulation in SCEAPI and have been providing CPU power since fall 2015.
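To make the kind of protocol translation performed by such a bridge concrete, here is a minimal sketch of posting a job description to a REST endpoint; the URL, token header and payload fields are hypothetical placeholders, not the actual SCEAPI specification.

```python
import requests

# Hypothetical illustration of submitting a job through a RESTful bridge.
# The endpoint, authentication header and payload fields are assumptions,
# not the real SCEAPI protocol.
SCEAPI_URL = "https://sceapi.example.cn/v1/jobs"   # placeholder URL
TOKEN = "..."                                      # credential obtained out of band

def submit_job(executable, args, cores=1, walltime_minutes=720):
    payload = {
        "executable": executable,
        "arguments": args,
        "cores": cores,
        "walltime": walltime_minutes,
    }
    resp = requests.post(
        SCEAPI_URL,
        json=payload,
        headers={"Authorization": "Bearer " + TOKEN},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["jobId"]   # assumed response field

if __name__ == "__main__":
    job_id = submit_job("runSim.sh", ["--events", "1000"])
    print("submitted", job_id)
```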
With ever-greater computing needs and fixed budgets, big scientific experiments are turning to opportunistic resources as a means to add much-needed extra computing power. These resources can be very different in design from those that comprise the Grid computing of most experiments, and therefore exploiting them requires a change in strategy for the experiment. They may be highly restrictive in what can be run or in connections to the outside world, or tolerate opportunistic usage only on condition that tasks may be terminated without warning. The Advanced Resource Connector Computing Element (ARC CE), with its non-intrusive architecture, is designed to integrate resources such as High Performance Computing (HPC) systems into a computing Grid. The ATLAS experiment developed the ATLAS Event Service (AES) primarily to address the issue of jobs that can be terminated at any point when opportunistic computing capacity is needed by someone else. This paper describes the integration of these two systems in order to exploit opportunistic resources for ATLAS in a restrictive environment. In addition to the technical details, results from deployment of this solution in the SuperMUC HPC centre in Munich are shown.
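The value of event-level check-pointing on preemptable resources can be sketched as follows; the checkpoint file, output naming and granularity are illustrative assumptions, not the actual AES code.

```python
import json, os

# Illustrative event loop with per-event check-pointing, in the spirit of the
# Event Service described above. If the job is killed at any point, only the
# event currently in flight is lost.
CHECKPOINT = "progress.json"          # assumed local checkpoint file

def load_progress():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["next_event"]
    return 0

def save_progress(next_event):
    with open(CHECKPOINT, "w") as f:
        json.dump({"next_event": next_event}, f)

def process(event_index):
    # placeholder for the real per-event payload (e.g. detector simulation)
    return {"event": event_index, "status": "done"}

def run(total_events):
    start = load_progress()
    for i in range(start, total_events):
        result = process(i)
        # each finished event is written out immediately, so preemption
        # costs at most one event's worth of CPU
        with open("out_%06d.json" % i, "w") as f:
            json.dump(result, f)
        save_progress(i + 1)

if __name__ == "__main__":
    run(100)
```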
ATLAS@Home is a volunteer computing project which allows the public to contribute to computing for the ATLAS experiment through their home or office computers. The project has grown continuously since its creation in mid-2014 and now counts almost 100,000 volunteers. The combined volunteer resources make up a sizeable fraction of the overall resources for ATLAS simulation. This paper takes stock of the experience gained so far and describes the next steps in the evolution of the project. These improvements include running natively on Linux to ease deployment on, for example, university clusters; using multiple cores inside one task to reduce the memory requirements; and running different types of workload, such as event generation. In addition to the technical details, the success of ATLAS@Home as an outreach tool is evaluated.
ATLAS@Home: Harnessing Volunteer Computing for HEP
Adam-Bourdarios, C; Cameron, D; Filipčič, A; et al.
Journal of Physics: Conference Series, 12/2015, Volume 664, Issue 2
Journal Article, Conference Proceeding; Peer reviewed; Open access
A recent common theme in HEP computing is the exploitation of opportunistic resources in order to provide the maximum statistics possible for Monte Carlo simulation. Volunteer computing has been used over the last few years in many other scientific fields and by CERN itself to run simulations of the LHC beams. The ATLAS@Home project was started to allow volunteers to run simulations of collisions in the ATLAS detector. So far many thousands of members of the public have signed up to contribute their spare CPU cycles for ATLAS, and there is potential for volunteer computing to provide a significant fraction of ATLAS computing resources. Here we describe the design of the project, the lessons learned so far and the future plans.
While current grid middleware implementations are quite advanced in terms of connecting jobs to resources, their client tools are generally quite minimal, and features for managing large sets of jobs are left to the user to implement. The ARC Control Tower (aCT) is a very flexible job management framework that can be run on anything from a single user's laptop to a multi-server distributed setup. aCT was originally designed to enable ATLAS jobs to be submitted to the ARC CE. However, with the recent redesign of aCT, in which the ATLAS-specific elements are clearly separated from the ARC job management parts, the control tower can now easily be reused as a flexible generic distributed job manager for other communities. This paper gives a detailed explanation of how aCT works as a job management framework, goes through the steps needed to create a simple job manager using aCT, and shows that it can easily manage thousands of jobs.
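A minimal sketch of the kind of job manager described here is shown below; the class and method names are hypothetical and do not reproduce the real aCT Python API, but they illustrate the submit/poll/retire pattern a community would plug its own logic into.

```python
import time

# Hypothetical skeleton of a pull-style job manager in the spirit of aCT;
# the names and structure are assumptions, not the real aCT API.
class DummyBackend:
    """Stand-in for a compute-element client, only to make the sketch runnable."""
    def submit(self, job): return "remote-" + str(job["id"])
    def status(self, remote_id): return "FINISHED"
    def fetch_output(self, remote_id): pass

class SimpleJobManager:
    def __init__(self, backend):
        self.backend = backend      # object that knows how to talk to a CE
        self.active = {}            # local job id -> remote job id

    def fetch_new_jobs(self):
        """Return job descriptions from the community's own bookkeeping."""
        return [{"id": 1, "executable": "payload.sh"}]   # illustrative only

    def run_once(self):
        # 1. submit any newly defined jobs
        for job in self.fetch_new_jobs():
            self.active[job["id"]] = self.backend.submit(job)
        # 2. poll submitted jobs and retire finished or failed ones
        for local_id, remote_id in list(self.active.items()):
            if self.backend.status(remote_id) in ("FINISHED", "FAILED"):
                self.backend.fetch_output(remote_id)
                del self.active[local_id]

    def run_forever(self, poll_seconds=60):
        while True:
            self.run_once()
            time.sleep(poll_seconds)

if __name__ == "__main__":
    SimpleJobManager(DummyBackend()).run_once()
```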
Distributed computing resources available for high-energy physics research are becoming less dedicated to one type of workflow, and researchers' workloads are increasingly exploiting modern computing technologies such as parallelism. The current pilot job management model used by many experiments relies on static dedicated resources and cannot easily adapt to these changes. The model used for ATLAS in the Nordic countries and some other places enables a flexible job management system based on dynamic resource allocation. Rather than a fixed set of resources managed centrally, the model allows resources to be requested on the fly. The ARC Computing Element (ARC-CE) and the ARC Control Tower (aCT) are the key components of the model. The aCT requests jobs from the ATLAS job management system (PanDA) and submits a fully-formed job description to ARC-CEs. ARC-CE can then dynamically request the required resources from the underlying batch system. In this paper we describe the architecture of the model and the experience of running many millions of ATLAS jobs on it.
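The pull model described here can be sketched roughly as follows: fetch an activated job from PanDA, turn it into a complete job description, and hand it to an ARC-CE. The PanDA endpoint, response fields and the xRSL attributes below are simplified assumptions for illustration, not the production aCT code.

```python
import requests

# Simplified sketch of the aCT pull model described above; the URL, payload
# fields and xRSL fragment are assumptions, not the production protocol.
PANDA_URL = "https://pandaserver.example.org/getJob"   # placeholder endpoint

def get_panda_job(site):
    """Ask the job management system for an activated job for this site."""
    resp = requests.post(PANDA_URL, data={"siteName": site}, timeout=60)
    resp.raise_for_status()
    return resp.json()            # assumed to contain the job parameters

def to_xrsl(job):
    """Build a (simplified) xRSL job description for submission to an ARC-CE."""
    return (
        "&(executable=\"runpilot.sh\")"
        "(arguments=\"" + job.get("jobPars", "") + "\")"
        "(jobname=\"panda-" + str(job.get("PandaID", "unknown")) + "\")"
        "(count=1)"
        "(memory=\"2000\")"
    )

if __name__ == "__main__":
    job = get_panda_job("ARC-SITE-1")      # hypothetical site name
    xrsl = to_xrsl(job)
    print(xrsl)                            # in aCT this would go to the ARC-CE
```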
Staging data to and from remote storage services on the Grid for users’ jobs is a vital component of the ARC computing element. A new data staging framework for the computing element has recently been developed to address issues with the present framework, which has essentially remained unchanged since its original implementation 10 years ago. This new framework consists of an intelligent data transfer scheduler which handles priorities and fair-share, a rapid caching system, and the ability to delegate data transfer over multiple nodes to increase network throughput. This paper uses data from real user jobs running on production ARC sites to present an evaluation of the new framework. It is shown to make more efficient use of the available resources, reduce the overall time to run jobs, and avoid the problems seen with the previous simplistic scheduling system. In addition, its simple design coupled with intelligent logic provides greatly increased flexibility for site administrators, end users and future development.
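To make the scheduling idea concrete, here is a small illustrative priority queue with a per-user fair-share penalty; the weighting scheme and data structures are assumptions in the spirit of the framework described above, not the scheduler actually implemented in ARC.

```python
import heapq
import itertools
from collections import defaultdict

# Illustrative priority + fair-share scheduler for data-staging requests;
# a sketch of the idea only, not the actual ARC implementation.
class TransferScheduler:
    def __init__(self):
        self._queue = []                   # heap of (effective_priority, seq, request)
        self._seq = itertools.count()      # tie-breaker to keep FIFO order
        self._active_per_user = defaultdict(int)

    def add(self, user, priority, source, destination):
        # lower number = served earlier; users with many active transfers are
        # penalised so that no single user can monopolise the transfer slots
        effective = priority + self._active_per_user[user]
        heapq.heappush(self._queue, (effective, next(self._seq),
                                     {"user": user, "src": source, "dst": destination}))

    def next_transfer(self):
        if not self._queue:
            return None
        _, _, req = heapq.heappop(self._queue)
        self._active_per_user[req["user"]] += 1
        return req

    def finished(self, req):
        self._active_per_user[req["user"]] -= 1

if __name__ == "__main__":
    s = TransferScheduler()
    s.add("alice", 10, "srm://remote/file1", "/cache/file1")
    s.add("bob", 5, "srm://remote/file2", "/cache/file2")
    print(s.next_transfer())               # bob's lower number is served first
```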
Ultra-high-energy photons with energies exceeding 10^17 eV offer a wealth of connections to different aspects of cosmic-ray astrophysics as well as to gamma-ray and neutrino astronomy. The recent observations of photons with energies in the 10^15 eV range further motivate searches for even higher-energy photons. In this paper, we present a search for photons with energies exceeding 2 × 10^17 eV using about 5.5 yr of hybrid data from the low-energy extensions of the Pierre Auger Observatory. The upper limits on the integral photon flux derived here are the most stringent ones to date in the energy region between 10^17 and 10^18 eV.
Optical properties of the atmospheric boundary layer (ABL) above the land–sea transition interface were measured using a scanning Mie lidar located 30 km away from the Adriatic coast. Based on the two-dimensional range-height-indicator scans, detailed information on the ABL was obtained, including parameters such as atmospheric optical depth, aerosol extinction coefficient and the height of the ABL. The presented case study indicates that the height of the ABL in the land–sea transition zone and the adjacent mountainous region was changing rapidly due to highly variable atmospheric conditions.