These proceedings give a summary of the many software upgrade projects undertaken to prepare ATLAS for the challenges of Run-2 of the LHC. Those projects include a significant reduction of the CPU time required for the reconstruction of real data with high average pile-up rates compared to 2012, which is needed to meet the challenges of the expected increase in pile-up and the higher data-taking rate of up to 1 kHz. By far the most ambitious project is the implementation of a completely new analysis model, based on a new ROOT-readable reconstruction format, xAOD. The new model also includes a reduction framework, based on a train model, for centrally producing skimmed data samples, as well as an analysis framework. These proceedings close with a brief overview of future software projects and plans leading up to Long Shutdown 2, the next major ATLAS software upgrade phase.
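Since the xAOD format is directly readable in ROOT, a minimal standalone sketch of the access pattern, following the xAODRootAccess interface, is shown below; the input file name is a placeholder and error handling is reduced to the essentials.

    // Minimal sketch of reading a ROOT-readable xAOD file with xAOD::TEvent.
    // The input file name is a placeholder.
    #include <cstdio>
    #include <memory>
    #include <TFile.h>
    #include "xAODRootAccess/Init.h"
    #include "xAODRootAccess/TEvent.h"
    #include "xAODEventInfo/EventInfo.h"

    int main() {
        xAOD::Init();  // set up the xAOD ROOT access environment
        std::unique_ptr<TFile> f(TFile::Open("sample.xAOD.root", "READ"));
        xAOD::TEvent event(xAOD::TEvent::kClassAccess);
        event.readFrom(f.get());

        const Long64_t nEntries = event.getEntries();
        for (Long64_t i = 0; i < nEntries; ++i) {
            event.getEntry(i);
            const xAOD::EventInfo* ei = nullptr;
            // Interface objects give transparent access to the stored data
            if (event.retrieve(ei, "EventInfo").isSuccess()) {
                printf("run %u  event %llu\n", ei->runNumber(), ei->eventNumber());
            }
        }
        return 0;
    }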
Input data for applications that run in cloud computing centres can be stored at distant repositories, often with multiple copies of popular data stored at many sites. Locating and retrieving the remote data can be challenging, and we believe that federating the storage can address this problem. A federation would locate the closest copy of the data on the basis of GeoIP information. Currently we are using the dynamic data federation Dynafed, a software solution developed by CERN IT. Dynafed supports several industry-standard connection protocols, such as Amazon's S3 and Microsoft's Azure, as well as WebDAV and HTTP. Dynafed functions as an abstraction layer under which the protocol-dependent authentication details are hidden from the user, who needs to provide only an X.509 certificate. We have set up an instance of Dynafed and integrated it into the ATLAS data distribution management system. We report on the challenges faced during the installation and integration. We have tested ATLAS analysis jobs submitted by the PanDA production system, and we report on our first experiences with its operation.
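As an illustration of the client side, the following is a minimal sketch, using libcurl, of retrieving a file through such a federation endpoint; the endpoint URL and proxy path are hypothetical, and following redirects reflects Dynafed's behaviour of redirecting the client to the chosen replica.

    // Sketch: fetch a file through a Dynafed federation endpoint,
    // authenticating with an X.509 proxy. URL and proxy path are hypothetical.
    #include <curl/curl.h>
    #include <cstdio>

    int main() {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL* h = curl_easy_init();
        if (!h) return 1;

        // Hypothetical federation endpoint exposing the namespace over WebDAV/HTTP
        curl_easy_setopt(h, CURLOPT_URL,
                         "https://dynafed.example.org/fed/atlas/data/file.root");
        // Grid-style X.509 proxy used as both client certificate and key
        curl_easy_setopt(h, CURLOPT_SSLCERT, "/tmp/x509up_u1000");
        curl_easy_setopt(h, CURLOPT_SSLKEY,  "/tmp/x509up_u1000");
        // Dynafed answers with a redirect to the closest replica (GeoIP-based)
        curl_easy_setopt(h, CURLOPT_FOLLOWLOCATION, 1L);

        CURLcode rc = curl_easy_perform(h);
        if (rc != CURLE_OK)
            fprintf(stderr, "transfer failed: %s\n", curl_easy_strerror(rc));

        curl_easy_cleanup(h);
        curl_global_cleanup();
        return rc == CURLE_OK ? 0 : 1;
    }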
The use of opportunistic cloud resources by HEP experiments has significantly increased over the past few years. Clouds that are owned or managed by the HEP community are connected to the LHCONE network or the research network, with global access to HEP computing resources. Private clouds, such as those supported by non-HEP research funds, are generally connected to the international research network; commercial clouds, however, are either not connected to the research network or connect only to research sites within their national boundaries. Since research network connectivity is a requirement for HEP applications, we need to find a solution that provides a high-speed connection. We are studying a solution with a virtual router that addresses the use case in which a commercial cloud has research network connectivity only in a limited region. In this situation, we host a virtual router at our HEP site and require that all traffic from the commercial site transit through the virtual router. Although this may lengthen the network path and increase the load on the HEP site, it is a workable solution that would enable the use of the remote cloud for low-I/O applications. We are exploring some simple open-source solutions. In this paper, we present the results of our studies and how they will benefit our use of private and public clouds for HEP computing.
In this paper we explain how C++ code quality is managed in ATLAS using a range of tools, from compile-time through to run-time testing, and reflect on the substantial progress made in the last two years, largely through the use of static analysis tools such as Coverity®, an industry-standard tool that enables quality comparison with general open-source C++ code. Other available code analysis tools are also discussed, as is the role of unit testing, with an example of how the GoogleTest framework can be applied to our codebase.
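As a flavour of the latter, a minimal GoogleTest example is sketched below; the function under test is invented purely for illustration.

    // Minimal GoogleTest example; sumClusterEnergy is a toy function
    // standing in for a piece of the codebase under test.
    #include <gtest/gtest.h>
    #include <numeric>
    #include <vector>

    double sumClusterEnergy(const std::vector<double>& cells) {
        return std::accumulate(cells.begin(), cells.end(), 0.0);
    }

    TEST(ClusterEnergyTest, SumsCellEnergies) {
        EXPECT_DOUBLE_EQ(sumClusterEnergy({1.0, 2.5, 0.5}), 4.0);
    }

    TEST(ClusterEnergyTest, EmptyClusterIsZero) {
        EXPECT_DOUBLE_EQ(sumClusterEnergy({}), 0.0);
    }

    int main(int argc, char** argv) {
        ::testing::InitGoogleTest(&argc, argv);
        return RUN_ALL_TESTS();
    }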
The distributed cloud using the CloudScheduler VM provisioning service is one of the longest-running systems for HEP workloads. It has run millions of jobs for ATLAS and Belle II over the past few years using private and commercial clouds around the world. Our goal is to scale the distributed cloud to the 10,000-core level, with the ability to run any type of application (low I/O, high I/O and high memory) on any cloud. To achieve this goal, we have been implementing changes that utilize context-aware computing designs currently employed in the mobile communication industry. Context awareness makes use of real-time and archived data to respond to user or system requirements. In our distributed cloud, we have many opportunistic clouds with no local HEP services, software or storage repositories. A context-aware design significantly improves the reliability and performance of our system by locating the nearest instance of the required services. We describe how we are collecting and managing contextual information from our workload management systems, the clouds, the virtual machines and our services. This information is used not only to monitor the system but also to carry out automated corrective actions. We are incrementally adding new alerting and response services to our distributed cloud, which will enable us to scale the number of clouds and virtual machines. Further, a context-aware design will enable us to run analysis or high-I/O applications on opportunistic clouds. We envisage an open-source HTTP data federation (for example, the Dynafed system at CERN) as a service that would provide us access to existing storage elements used by the HEP experiments.
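A minimal sketch of the selection logic implied by such a design, assuming measured latency is the proximity measure, might look as follows; the endpoints and the canned probe results are purely illustrative, and a real system would time an HTTP HEAD or TCP connect to each candidate.

    // Sketch of context-aware service selection: choose the candidate
    // endpoint with the lowest measured latency. Probe results are canned.
    #include <functional>
    #include <iostream>
    #include <limits>
    #include <string>
    #include <vector>

    using Probe = std::function<double(const std::string&)>;  // latency in ms

    std::string pickNearest(const std::vector<std::string>& endpoints,
                            const Probe& probe) {
        std::string best;
        double bestMs = std::numeric_limits<double>::max();
        for (const auto& ep : endpoints) {
            double ms = probe(ep);                 // negative = unreachable
            if (ms >= 0.0 && ms < bestMs) { bestMs = ms; best = ep; }
        }
        return best;
    }

    int main() {
        // Hypothetical service endpoints with illustrative probe results
        std::vector<std::string> squids = {
            "squid.cern.example", "squid.victoria.example", "squid.chicago.example"};
        Probe fake = [](const std::string& ep) {
            if (ep == "squid.victoria.example") return 12.0;  // closest
            if (ep == "squid.chicago.example")  return 55.0;
            return 140.0;
        };
        std::cout << "nearest service: " << pickNearest(squids, fake) << "\n";
        return 0;
    }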
After the current maintenance period, the LHC will provide higher-energy collisions with increased luminosity. In order to keep up with these higher rates, ATLAS software needs to speed up substantially. However, ATLAS code comprises approximately 6M lines, written by many different programmers with different backgrounds, which makes code optimisation a challenge. To help with this effort, different profiling tools and techniques are being used. These include well-known tools, such as the Valgrind suite and Intel Amplifier; less common tools, like Pin, PAPI and GOoDA; as well as techniques such as library interposing. In this paper we focus mainly on Pin tools and GOoDA. Pin is a dynamic binary instrumentation tool that can obtain statistics such as call counts and instruction counts, and can interrogate functions' arguments. It has been used to obtain profiles of CLHEP matrix operations and vector sizes in linear algebra calculations, which has provided the insight necessary to achieve significant performance improvements. Complementing this, GOoDA, an in-house performance tool built in collaboration with Google and based on hardware performance monitoring unit events, is used to identify hotspots in the code associated with different types of hardware limitation, such as CPU resources, caches or memory bandwidth. GOoDA has been used to improve the performance of the new magnetic field code and to identify potential vectorization targets in several places, such as the Runge-Kutta propagation code.
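As a flavour of what a Pin tool looks like, the following is essentially the standard instruction-counting example distributed with Pin: an instrumentation routine registers a tiny analysis callback before every instruction.

    // Pin instruction-counting tool, following the standard "inscount"
    // example shipped with Pin; counts every executed instruction.
    #include <iostream>
    #include "pin.H"

    static UINT64 icount = 0;

    // Analysis routine: called before every executed instruction
    VOID docount() { icount++; }

    // Instrumentation routine: called once per instruction when first seen
    VOID Instruction(INS ins, VOID* v) {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
    }

    VOID Fini(INT32 code, VOID* v) {
        std::cerr << "Executed " << icount << " instructions" << std::endl;
    }

    int main(int argc, char* argv[]) {
        if (PIN_Init(argc, argv)) return 1;
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();  // never returns
        return 0;
    }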
In a complex multi-developer, multi-package software environment, such as the ATLAS offline framework Athena, tracking the performance of the code can be a non-trivial task in itself. In this paper we describe improvements in the instrumentation of ATLAS offline software that have given considerable insight into the performance of the code and helped to guide the optimization work. The first tool we used to instrument the code is PAPI, a programming interface for accessing hardware performance counters. PAPI events can count floating-point operations, cycles, instructions and cache accesses. Triggering PAPI to start and stop counting around each algorithm and processed event gives a good understanding of the algorithm-level performance of ATLAS code. Further data can be obtained using Pin, a dynamic binary instrumentation tool. Pin tools can be used to obtain statistics similar to PAPI's, but with the advantage of not requiring recompilation of the code. Fine-grained routine- and instruction-level instrumentation is also possible. Pin tools can additionally interrogate the arguments to functions, such as those in linear algebra libraries, so that a detailed usage profile can be obtained. These tools have characterized the extensive use of vector and matrix operations in ATLAS tracking. Currently CLHEP is used here, which is not an optimal choice. To help evaluate replacement libraries, a testbed has been set up that allows comparison of the performance of different linear algebra libraries (including CLHEP, Eigen and SMatrix/SVector). Results are then presented via the ATLAS Performance Management Board framework, which runs daily with the current development branch of the code and monitors reconstruction and Monte Carlo jobs. This framework analyses the CPU and memory performance of algorithms, and an overview of the results is presented on a web page. These tools have provided the insight necessary to plan and implement performance enhancements in ATLAS code by identifying the most common operations, with their call parameters well understood, and allowing improvements to be quantified in detail.
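A minimal sketch of the PAPI start/stop pattern described above, with an arbitrary loop standing in for one algorithm processing one event and an illustrative choice of events:

    // Sketch of PAPI-based per-algorithm counting: start the counters
    // before a region executes and stop them afterwards.
    #include <papi.h>
    #include <cstdio>

    int main() {
        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) return 1;

        int evset = PAPI_NULL;
        PAPI_create_eventset(&evset);
        PAPI_add_event(evset, PAPI_TOT_INS);  // instructions completed
        PAPI_add_event(evset, PAPI_TOT_CYC);  // total cycles

        long long counts[2] = {0, 0};
        PAPI_start(evset);

        // --- region standing in for one algorithm processing one event ---
        volatile double sum = 0.0;
        for (int i = 0; i < 1000000; ++i) sum += i * 0.5;
        // -----------------------------------------------------------------

        PAPI_stop(evset, counts);
        printf("instructions: %lld  cycles: %lld  IPC: %.2f\n",
               counts[0], counts[1], (double)counts[0] / counts[1]);
        return 0;
    }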
We report the observation of a narrow charmoniumlike state produced in the exclusive decay process B± → K±π+π−J/ψ. This state, which decays into π+π−J/ψ, has a mass of 3872.0 ± 0.6(stat) ± 0.5(syst) MeV, a value that is very near the M(D0) + M(D*0) mass threshold. The results are based on an analysis of 152M BB̄ events collected at the Υ(4S) resonance in the Belle detector at the KEKB collider. The signal has a statistical significance that is in excess of 10σ.
The deployment of HEP applications in heterogeneous grid environments can be challenging because many of the applications depend on specific OS versions and have a large number of complex software dependencies. Virtual machine monitors such as Xen could be used to package HEP applications, complete with their execution environments, to run on resources that do not meet their operating system requirements. Our previous work has shown that HEP applications running within Xen suffer little or no performance penalty as a result of virtualization. However, a practical strategy is required for deploying, booting and controlling virtual machines on a remote cluster. One tool that promises to overcome the deployment hurdles using standard grid technology is the Globus Virtual Workspaces project. We describe strategies for the deployment of Xen virtual machines using Globus Virtual Workspace middleware that simplify the deployment of HEP applications.
We describe a high-throughput computing system for running jobs on public and private computing clouds using the HTCondor job scheduler and the cloudscheduler VM provisioning service. The distributed cloud computing system is designed to simultaneously use dedicated and opportunistic cloud resources at local and remote locations. It has been used for large-scale production particle physics workloads for many years using thousands of cores on three continents. A decade after its initial design and implementation, cloudscheduler has been modernized to take advantage of new software designs, improved operating system capabilities and support packages. The updated cloudscheduler is more resilient and scalable, with expanded capabilities. We present an overview of the original design and then describe the new version of the distributed compute cloud system. We conclude with a review of the current status and future plans.