Abstract
Particle accelerators are an important tool to study the fundamental properties of elementary particles. Currently the highest-energy accelerator is the LHC at CERN, in Geneva, Switzerland. Each of its four major detectors, such as the CMS detector, produces dozens of petabytes of data per year to be analyzed by a large international collaboration. The processing is carried out on the Worldwide LHC Computing Grid, which spans more than 170 computing centers around the world and is used by a number of particle physics experiments. Recently the LHC experiments were encouraged to make increasing use of HPC resources. While Grid resources are homogeneous with respect to the Grid middleware used, HPC installations can differ widely in their setup. In order to integrate HPC resources into the highly automated processing setups of the CMS experiment, a number of challenges need to be addressed. Processing requires access to primary data and metadata as well as to the experiment software. At Grid sites, all of this is provided through a set of services available at each center. At HPC sites, however, many of these capabilities cannot be easily provided and have to be enabled in user space or by other means. HPC centers also often restrict network access to remote services from the compute nodes, which is a further severe limitation. The paper discusses a number of solutions and recent experiences of the CMS experiment in including HPC resources in its processing campaigns.
The Spanish CMS Analysis Facility at CIEMAT / Cárdenas-Montes, M.; Delgado Peris, A.; Flix, J. ...
EPJ Web of Conferences, 2024, Volume 295. Journal Article, Conference Proceeding. Peer reviewed. Open access.
The increasingly large data volumes that the LHC experiments will accumulate in the coming years, especially in the High-Luminosity LHC era, call for a paradigm shift in the way experimental datasets are accessed and analyzed. The current model, based on data reduction on the Grid infrastructure followed by interactive analysis of samples of manageable size on the physicists' individual computers, will be superseded by the adoption of Analysis Facilities. This rapidly evolving concept is converging to include dedicated hardware infrastructures and computing services optimized for the effective analysis of large HEP data samples. This paper describes the actual implementation of this new analysis facility model at the CIEMAT institute, in Spain, to support the local CMS experiment community. Our work details the deployment of dedicated high-performance hardware, the operation of data staging and caching services ensuring prompt and efficient access to CMS physics analysis datasets, and the integration and optimization of a custom analysis framework based on ROOT's RDataFrame and the CMS NanoAOD format. Finally, performance results obtained by benchmarking the deployed infrastructure and software against a CMS analysis workflow are summarized.
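To give an idea of the kind of workflow such a facility targets, the following is a minimal Python sketch of an RDataFrame analysis over CMS NanoAOD. The file path and the selection are placeholders, not taken from the paper; the tree name "Events" and branches nMuon/Muon_pt follow the usual NanoAOD conventions.

    # Minimal sketch of an RDataFrame analysis over CMS NanoAOD (placeholder input).
    import ROOT

    # Use all available cores for the event loop.
    ROOT.ROOT.EnableImplicitMT()

    # NanoAOD stores events in a TTree called "Events".
    df = ROOT.RDataFrame("Events", "root://xrootd.example.org//store/placeholder/nanoaod.root")

    # Select events with at least two muons and histogram the leading muon pT.
    h = (df.Filter("nMuon >= 2", "dimuon events")
           .Define("leading_mu_pt", "Muon_pt[0]")
           .Histo1D(("leading_mu_pt", ";p_{T} [GeV];Events", 50, 0.0, 200.0), "leading_mu_pt"))

    # Accessing the result triggers the lazy event loop.
    print("selected events:", h.GetEntries())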
The CMS experiment is working to integrate an increasing number of High Performance Computing (HPC) resources into its distributed computing infrastructure. The case of the Barcelona Supercomputing Center (BSC) is particularly challenging, as severe network restrictions prevent the use of CMS standard computing solutions. The CIEMAT CMS group has performed significant work to overcome these constraints and make BSC resources available to CMS. The developments include adapting the workload management tools, replicating the CMS software repository to BSC storage, providing alternative access to detector conditions data, and setting up a service to transfer produced output data to a nearby storage facility. In this work, we discuss the current status of this integration activity and present recent developments, such as a front-end service to improve slot usage efficiency and an enhanced transfer service that supports the staging of input data for workflows at BSC. Moreover, significant efforts have been devoted to improving the scalability of the deployed solution, automating its operation, and simplifying the matchmaking of CMS workflows that are suitable for execution at BSC.
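The core idea of such a transfer service can be illustrated with a highly simplified Python sketch: scan a drop area on the HPC shared file system for finished job outputs and copy them to a nearby grid storage element with the standard xrdcp client. The directory, the destination endpoint and the polling interval are hypothetical, not the paper's actual implementation.

    # Simplified sketch of an output transfer loop (paths and endpoint are hypothetical).
    import pathlib
    import subprocess
    import time

    OUTBOX = pathlib.Path("/gpfs/cms/outbox")               # drop area on the HPC shared FS
    DESTINATION = "root://storage.example.es//store/user/"  # nearby storage element

    def transfer_pending():
        for f in OUTBOX.glob("*.root"):
            result = subprocess.run(["xrdcp", str(f), DESTINATION + f.name])
            if result.returncode == 0:
                f.unlink()  # remove the local copy only after a successful transfer

    while True:
        transfer_pending()
        time.sleep(60)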
In 2029 the LHC will start the high-luminosity LHC program, with a boost in the integrated luminosity resulting in an unprecedented amount of experimental and simulated data samples to be transferred, processed and stored in disk and tape systems across the Worldwide LHC Computing Grid. Content delivery network solutions are being explored with the purposes of improving the performance of compute tasks reading input data via the wide area network, and of providing a mechanism for cost-effective deployment of lightweight storage systems supporting traditional or opportunistic compute resources. In this contribution we study the benefits of applying cache solutions for the CMS experiment, in particular the configuration and deployment of XCache serving data to two Spanish WLCG sites supporting CMS: the Tier-1 site at PIC and the Tier-2 site at CIEMAT. The deployment and configuration of the system and the monitoring tools developed for it are shown, as well as data popularity studies in relation to the optimization of the cache configuration, the effects on CPU efficiency improvements for analysis tasks, and the cost benefits and impact of including this solution in the region.
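From the client side, the appeal of an XCache deployment is that an analysis job only has to address the cache endpoint instead of the wide-area federation; on a cache miss the file is fetched from the configured origin and kept on local disk for later reads. The Python sketch below illustrates this, assuming a cache whose origin is the CMS global redirector; the logical file name and the cache hostname are placeholders.

    # Sketch of reading a file directly vs. through a site XCache (placeholders throughout).
    import ROOT

    LFN = "/store/data/placeholder/NANOAOD/file.root"        # logical file name
    DIRECT = "root://cms-xrd-global.cern.ch/" + LFN           # read over the WAN via the federation
    CACHED = "root://xcache.example.es:1094/" + LFN           # hypothetical site cache, origin = federation

    # After the first access, the cached URL is served from local disk.
    f = ROOT.TFile.Open(CACHED)
    events = f.Get("Events")
    print("entries:", events.GetEntries())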
If the dark matter (DM), which is considered to constitute most of the mass of galaxies, is made of supersymmetric particles, the central region of our Galaxy should emit gamma rays produced by their annihilation. We use detailed models of the Milky Way to make accurate estimates of continuum gamma-ray fluxes. We argue that the most important effect, which was previously neglected, is the compression of the dark matter due to the infall of baryons to the galactic center: it boosts the expected signal by a factor of 1000. To illustrate this effect, we computed the expected gamma-ray fluxes in the minimal supergravity scenario. Our models predict that the signal could be detected at high confidence levels by imaging atmospheric Cherenkov telescopes, assuming that neutralinos make up most of the DM in the Universe.
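The size of the boost from baryonic compression can be understood from the standard expression for the annihilation flux (the textbook formula, not reproduced from the paper), which scales with the square of the dark matter density integrated along the line of sight:

    \frac{d\Phi_\gamma}{dE}(\Delta\Omega) \;=\;
      \frac{\langle \sigma v \rangle}{8\pi\, m_\chi^2}\,
      \frac{dN_\gamma}{dE}
      \int_{\Delta\Omega} d\Omega \int_{\mathrm{l.o.s.}} \rho_\chi^2(l)\, dl

Because the flux goes as rho squared, even a moderate adiabatic increase of the central density translates into a large enhancement of the predicted signal.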
CMS is tackling the exploitation of CPU resources at HPC centers whose compute nodes do not have network connectivity to the Internet. Pilot agents and payload jobs need to interact with external services from the compute nodes: access to the application software (CernVM-FS) and conditions data (Frontier), management of input and output data files (data management services), and job management (HTCondor). Finding an alternative route to these services is challenging. Seamless integration in the CMS production system without causing any operational overhead is a key goal. The case of the Barcelona Supercomputing Center (BSC), in Spain, is particularly challenging, due to its especially restrictive network setup. We describe in this paper the solutions developed within CMS to overcome these restrictions and integrate this resource in production. Singularity containers with application software releases are built and pre-placed in the HPC facility's shared file system, together with conditions data files. HTCondor has been extended to relay communications between running pilot jobs and HTCondor daemons through the HPC shared file system. This operation mode also allows piping input and output data files through the HPC file system. Results, issues encountered during the integration process, and remaining concerns are discussed.
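As a toy illustration of the general pattern (not the actual HTCondor extension described above), the Python sketch below shows how messages can be relayed through a shared file system when worker nodes have no outbound network: a bridge process on a node with connectivity polls a shared directory for request files dropped by the workers, forwards them, and writes the replies back next to the requests. Directory names and the forwarding stub are hypothetical.

    # Toy file-system relay (illustrative only; paths and the forwarding stub are hypothetical).
    import pathlib
    import time

    RELAY = pathlib.Path("/gpfs/relay")   # shared between worker nodes and the bridge node

    def forward_to_external_service(payload: bytes) -> bytes:
        # Placeholder: in a real system this call would be made from the connected
        # side of the relay towards the external service.
        return b"ack:" + payload

    while True:
        for req in RELAY.glob("*.request"):
            reply = forward_to_external_service(req.read_bytes())
            req.with_suffix(".reply").write_bytes(reply)
            req.unlink()
        time.sleep(5)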
Lightweight site federation for CMS support / Acosta-Silva, C.; Delgado Peris, A.; Flix, J. ...
EPJ Web of Conferences, 01/2020, Volume 245. Journal Article, Conference Proceeding. Peer reviewed. Open access.
There is a general trend in WLCG towards the federation of resources, aiming for increased simplicity, efficiency, flexibility, and availability. Although general VO-agnostic federation of resources between two independent and autonomous resource centres may prove arduous, a considerable amount of flexibility in resource sharing can be achieved in the context of a single WLCG VO with a relatively simple approach. We have demonstrated this for PIC and CIEMAT, the Spanish Tier-1 and Tier-2 sites for CMS, by making use of the existing CMS XRootD federation infrastructure and profiting from the common CE/batch technology used by the two centres. This work describes how compute slots are shared between the two sites, so that the capacity of one site can be dynamically increased with idle execution slots from the remote site, and how data can be efficiently accessed irrespective of its location. Our contribution includes measurements for diverse CMS workflows comparing performance between local and remote execution, and can also be regarded as a benchmark to explore future potential scenarios, where storage resources would be concentrated in a reduced number of sites.
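The "data accessed irrespective of its location" part typically relies on the XRootD federation as a fallback: a job landing at either site first tries its local storage and, if the file is not there, falls back to a regional or global redirector. A minimal Python sketch of this access pattern follows; the local endpoint and the logical file name are illustrative, while the redirector hostnames are the commonly used CMS ones.

    # Sketch of local-first data access with federation fallback (illustrative endpoints).
    import ROOT

    LFN = "/store/data/placeholder/NANOAOD/file.root"

    def open_with_fallback(lfn):
        for prefix in ("root://storage.local.example/",      # site storage (hypothetical)
                       "root://xrootd-cms.infn.it/",          # European CMS redirector
                       "root://cms-xrd-global.cern.ch/"):     # global CMS redirector
            f = ROOT.TFile.Open(prefix + lfn)
            if f and not f.IsZombie():
                return f
        raise IOError("file not reachable at any endpoint: " + lfn)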
The Worldwide LHC Computing Grid project (WLCG) provides the computing and storage resources required by the LHC collaborations to store, process and analyse their data. It includes almost 200,000 CPU cores, 200 PB of disk storage and 200 PB of tape storage distributed among more than 150 sites. The WLCG operations team is responsible for several essential tasks, such as the coordination of testing and deployment of Grid middleware and services, communication with the experiments and the sites, follow-up and resolution of operational issues, and medium/long-term planning. In 2012 WLCG critically reviewed all operational procedures and restructured the organisation of the operations team as a more coherent effort in order to improve its efficiency. In this paper we describe how the new organisation works, its recent successes, and the changes to be implemented during the long LHC shutdown in preparation for LHC Run 2.
With the advent of workloads containing explicit requests for multiple cores in a single grid job, grid sites faced a new set of challenges in workload scheduling. The most common batch schedulers deployed at HEP computing sites do a poor job at multicore scheduling when using only their native capabilities. This paper describes how efficient multicore scheduling was achieved at the sites the authors represent, by implementing dynamically sized multicore partitions via a minimalistic addition to the Torque/Maui batch system already in use at those sites. The paper further includes example results from use of the system in production, as well as measurements of the dependence of performance (especially the ramp-up in throughput for multicore jobs) on node size and job size.
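The sizing logic behind such a dynamic partition can be sketched in a few lines of Python: given the number of idle multicore job requests, decide how many whole nodes should currently be dedicated to multicore work, within configured bounds. This is only an illustration of the idea, not the authors' Torque/Maui addition; the node size, job size and bounds are assumptions.

    # Simplified sizing rule for a dynamic multicore partition (assumed parameters).
    import math

    CORES_PER_NODE = 8             # assumed node size
    JOB_CORES = 4                  # assumed cores requested per multicore job
    MIN_NODES, MAX_NODES = 0, 50   # bounds of the dynamic partition

    def target_multicore_nodes(idle_multicore_jobs: int) -> int:
        """Number of whole nodes to dedicate to multicore jobs right now."""
        needed = math.ceil(idle_multicore_jobs * JOB_CORES / CORES_PER_NODE)
        return max(MIN_NODES, min(MAX_NODES, needed))

    # Example: 25 idle 4-core jobs on 8-core nodes -> 13 nodes in the partition.
    print(target_multicore_nodes(25))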
The Worldwide LHC Computing Grid relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion, traffic routing, etc. The WLCG Network and Transfer Metrics project aims to integrate and combine all network-related monitoring data collected by the WLCG infrastructure. This includes FTS monitoring information, monitoring data from the XRootD federation, as well as results of the perfSONAR tests. The main challenge consists of further integrating and analyzing this information in order to allow optimization of data transfers and of the workload management systems of the LHC experiments. In this contribution, we present our activity in commissioning the WLCG perfSONAR network and integrating network and transfer metrics: we motivate the need for network performance monitoring, describe the main use cases of the LHC experiments, and report on the status and evolution in the areas of configuration and capacity management, data store and analytics (including the integration of transfer and network metrics), and operations and support.