The Computing Model of the CMS experiment was prepared in 2005 and described in detail in the CMS Computing Technical Design Report. With the experience of the first years of LHC data taking and with the evolution of the available technologies, the CMS Collaboration identified areas where improvements were desirable. In this work we describe the most important modifications that have been, or are being, implemented in the Distributed Computing Model of CMS. The Worldwide LHC Computing Grid (WLCG) project acknowledged that the whole distributed computing infrastructure is affected by changes of this kind, which are happening in most LHC experiments, and decided to create several Technical Evolution Groups (TEG) to assess the situation and develop a strategy for the future. In this work we also describe the CMS view on the TEG activities.
The CMS experiment utilizes a distributed computing infrastructure whose performance depends heavily on the fast and smooth distribution of data between the different CMS sites. Data must be transferred from the Tier-0 (CERN) to the Tier-1 sites for processing, storing and archiving, and timeliness and good transfer quality are vital to avoid overflowing the CERN storage buffers. At the same time, processed data has to be distributed from Tier-1 sites to all Tier-2 sites for physics analysis, while Monte Carlo simulations are sent back to Tier-1 sites for further archival. At the core of all the transfer machinery is the PhEDEx (Physics Experiment Data Export) data transfer system. It is very important to ensure reliable operation of the system, and the operational tasks comprise monitoring and debugging all transfer issues. Based on transfer quality information, the Site Readiness tool is used to plan future resource utilization. We review the operational procedures created to enforce reliable data delivery to CMS distributed sites all over the world. Additionally, we need to keep data and metadata consistent at all sites, both on disk and on tape. In this presentation, we describe the principles and actions taken to keep data consistent between the sites' storage systems and the central CMS data replication database (TMDB/DBS), while ensuring fast and reliable delivery of data samples of hundreds of terabytes to the entire CMS physics community.
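As a rough illustration of the consistency checking mentioned above, the sketch below compares a site storage dump against a file list exported from the central catalog. The file names and formats are invented for the example; the real PhEDEx/DBS consistency tools are considerably more elaborate.

```python
# Minimal sketch of a site/catalog consistency check, assuming two
# hypothetical inputs: a storage dump from the site and a file list
# exported from the central catalog (TMDB/DBS). Real CMS tools differ.

def load_file_list(path):
    """Read one logical file name per line into a set."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def compare(site_dump, catalog_dump):
    site = load_file_list(site_dump)        # files actually on storage
    catalog = load_file_list(catalog_dump)  # files the catalog expects
    orphans = site - catalog   # on disk/tape but unknown to the catalog
    missing = catalog - site   # registered centrally but lost at the site
    return orphans, missing

if __name__ == "__main__":
    orphans, missing = compare("site_storage_dump.txt", "tmdb_export.txt")
    print(f"{len(orphans)} orphan files, {len(missing)} missing files")
```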
After many years of preparation, the CMS computing system has reached a situation where stability in operations limits the possibility to introduce innovative features. Nevertheless, it is this same need for stability and smooth operations that requires the introduction of features that were considered non-strategic in earlier phases. Examples are: adequate authorization to control and prioritize access to storage and computing resources; improved monitoring to investigate problems and identify bottlenecks in the infrastructure; increased automation to reduce the manpower needed for operations; and an effective process for deploying new releases of the software tools in production. We present the work of the CMS Distributed Computing Integration Activity, which is responsible for providing a liaison between the CMS distributed computing infrastructure and the software providers, both internal and external to CMS. In particular, we describe the introduction of new middleware features during the last 18 months, as well as the requirements put to Grid and Cloud software developers for the future.
Section Editors: Groep, D L; Bonacorsi, D
Journal of Physics: Conference Series, 06/2014, Volume 513
Journal Article
Peer reviewed
Open access
1. Data Acquisition, Trigger and Controls: Niko Neufeld (CERN, niko.neufeld@cern.ch); Tassos Belias (Demokritos, belias@inp.demokritos.gr); Andrew Norman (FNAL, anorman@fnal.gov); Vivian O'Dell (FNAL, odell@fnal.gov)
2. Event Processing, Simulation and Analysis: Rolf Seuster (TRIUMF, seuster@cern.ch); Florian Uhlig (GSI, f.uhlig@gsi.de); Lorenzo Moneta (CERN, Lorenzo.Moneta@cern.ch); Pete Elmer (Princeton, peter.elmer@cern.ch)
3. Distributed Processing and Data Handling: Nurcan Ozturk (U Texas Arlington, nurcan@uta.edu); Stefan Roiser (CERN, stefan.roiser@cern.ch); Robert Illingworth (FNAL); Davide Salomoni (INFN CNAF, Davide.Salomoni@cnaf.infn.it); Jeff Templon (Nikhef, templon@nikhef.nl)
4. Data Stores, Data Bases, and Storage Systems: David Lange (LLNL, lange6@llnl.gov); Wahid Bhimji (U Edinburgh, wbhimji@staffmail.ed.ac.uk); Dario Barberis (Genova, Dario.Barberis@cern.ch); Patrick Fuhrmann (DESY, patrick.fuhrmann@desy.de); Igor Mandrichenko (FNAL, ivm@fnal.gov); Mark van de Sanden (SURF SARA, sanden@sara.nl)
5. Software Engineering, Parallelism & Multi-Core: Solveig Albrand (LPSC/IN2P3, solveig.albrand@lpsc.in2p3.fr); Francesco Giacomini (INFN CNAF, francesco.giacomini@cnaf.infn.it); Liz Sexton (FNAL, sexton@fnal.gov); Benedikt Hegner (CERN, benedikt.hegner@cern.ch); Simon Patton (LBNL, SJPatton@lbl.gov); Jim Kowalkowski (FNAL, jbk@fnal.gov)
6. Facilities, Infrastructures, Networking and Collaborative Tools: Maria Girone (CERN, Maria.Girone@cern.ch); Ian Collier (STFC RAL, ian.collier@stfc.ac.uk); Burt Holzman (FNAL, burt@fnal.gov); Brian Bockelman (U Nebraska, bbockelm@cse.unl.edu); Alessandro de Salvo (Roma 1, Alessandro.DeSalvo@ROMA1.INFN.IT); Helge Meinhard (CERN, Helge.Meinhard@cern.ch); Ray Pasetes (FNAL, rayp@fnal.gov); Steven Goldfarb (U Michigan, Steven.Goldfarb@cern.ch)
In this day and age, capturing the state of the art of computing in conference proceedings is increasingly hard. It is also quite common for submitted abstracts to refer to studies yet to be done, and the time span between abstract submission and the actual conference is often less than six months. By the time the proceedings appear in journal form, a similar period after the closing session, some of the work is over a year old, by which time new ideas will have been formed and the deployment of current ones progressed, at times beyond recognition. The preface is continued in the pdf.
The CMS offline computing system is composed of roughly 80 sites (including the most experienced T3s) and a number of central services to distribute, process and analyze data worldwide. A high level of stability and reliability is required from the underlying infrastructure and services, partially covered by local or automated monitoring and alarming systems such as Lemon and SLS: the former collects metrics from sensors installed on computing nodes and triggers alarms when values are out of range; the latter measures the quality of service and warns managers when service is affected. CMS has established computing shift procedures with personnel operating worldwide from remote Computing Centers, under the supervision of the Computing Run Coordinator at CERN. These dedicated 24/7 computing shifts help to detect and react in a timely manner to any unexpected error, and hence ensure that CMS workflows are carried out efficiently and in a sustained manner. Synergy among all the involved actors is exploited to ensure 24/7 monitoring, alarming and troubleshooting of the CMS computing sites and services. We review the deployment of the monitoring and alarming procedures, and report on the experience gained throughout the first two years of LHC operation. We describe the efficiency of the communication tools employed, the coherent monitoring framework, the proactive alarming systems and the proficient troubleshooting procedures that helped the CMS computing facilities and infrastructure to operate at high reliability levels.
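As a caricature of the threshold-based alarming performed by a system like Lemon, the sketch below flags metrics whose values fall outside a configured range. The metric names and thresholds are invented for the example; the actual Lemon sensor and alarm infrastructure works differently.

```python
# Minimal sketch of threshold-based metric alarming in the style of
# Lemon: sensors report metric values, and an alarm is raised when a
# value falls outside its configured acceptable range. The metric
# names and thresholds here are invented for illustration.

THRESHOLDS = {
    "cpu_load":         (0.0, 20.0),   # (min, max) acceptable values
    "disk_used_pct":    (0.0, 90.0),
    "transfer_quality": (0.5, 1.0),
}

def check_metrics(samples):
    """Yield (metric, value) pairs whose value is out of range."""
    for metric, value in samples.items():
        lo, hi = THRESHOLDS.get(metric, (float("-inf"), float("inf")))
        if not lo <= value <= hi:
            yield metric, value

if __name__ == "__main__":
    samples = {"cpu_load": 35.2, "disk_used_pct": 74.0, "transfer_quality": 0.31}
    for metric, value in check_metrics(samples):
        print(f"ALARM: {metric}={value} out of range")
```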
Approaching LHC data taking, the CMS experiment is deploying, commissioning and operating the building blocks of its grid-based computing infrastructure. The commissioning program includes the testing, deployment and operation of various storage solutions to support the computing workflows of the experiment. Recently, some of the Tier-1 and Tier-2 centers supporting the collaboration have started to deploy StoRM-based storage systems. These are POSIX-based disk storage systems on top of which StoRM implements the Storage Resource Manager (SRM) version 2 interface, allowing for standards-based access from the Grid. In these notes we briefly describe the experience achieved so far at the CNAF Tier-1 center and at the IFCA Tier-2 center.
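The core design idea of StoRM, exposing the SRM interface directly over a POSIX filesystem, can be sketched as a namespace translation from SRM URLs (SURLs) to local paths. The endpoint, storage-area prefix and mount point below are invented examples, not StoRM's actual configuration or code.

```python
# Simplified illustration of the namespace mapping at the heart of a
# StoRM-like service: the site-file-name (SFN) part of an SRM URL is
# translated onto a POSIX path on the underlying disk system, after
# which ordinary filesystem calls apply. Prefixes below are invented.
import os
from urllib.parse import urlparse, parse_qs

SFN_PREFIX = "/cms"               # hypothetical SRM storage-area prefix
POSIX_ROOT = "/storage/gpfs/cms"  # hypothetical POSIX mount point

def surl_to_posix(surl):
    """Map an SRM URL such as
    srm://se.example.org:8444/srm/managerv2?SFN=/cms/store/file.root
    onto the corresponding POSIX path."""
    parsed = urlparse(surl)
    query = parse_qs(parsed.query)
    sfn = query.get("SFN", [parsed.path])[0]
    if not sfn.startswith(SFN_PREFIX):
        raise ValueError(f"SFN {sfn} outside the managed storage area")
    return POSIX_ROOT + sfn[len(SFN_PREFIX):]

# A stat() on the translated path stands in for an SRM Ls request.
path = surl_to_posix("srm://se.example.org:8444/srm/managerv2?SFN=/cms/store/file.root")
print(path, os.path.exists(path))
```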
The CMS CERN Analysis Facility (CAF). Buchmüller, O; Bonacorsi, D; Fanzago, F ...
Journal of Physics: Conference Series, 04/2010, Volume 219, Issue 5
Journal Article
Peer reviewed
Open access
The CMS CERN Analysis Facility (CAF) was primarily designed to host a large variety of latency-critical workflows. These break down into alignment and calibration, detector commissioning and diagnosis, and high-interest physics analysis requiring fast turnaround. In addition to the low-latency requirement on the batch farm, another mandatory condition is efficient access to the RAW detector data stored at the CERN Tier-0 facility. The CMS CAF also foresees resources for interactive login by a large number of CMS collaborators located at CERN, as an entry point for their day-to-day analysis. These resources will run on a separate partition in order to protect the high-priority use cases described above. While the CMS CAF represents only a modest fraction of the overall CMS resources on the WLCG Grid, an appropriately sized user-support service needs to be provided. We describe the building, commissioning and operation of the CMS CAF during the year 2008. The facility was heavily and routinely used by almost 250 users during multiple commissioning and data challenge periods. It reached a CPU capacity of 1.4 MSI2k and a disk capacity at the petabyte scale. In particular, we focus on the performance in terms of networking, disk access and job efficiency, and extrapolate prospects for the upcoming LHC first year of data taking. We also present the experience gained and the limitations observed in operating such a large facility, in which well-controlled workflows are combined with more chaotic analysis by a large number of physicists.
The Scattering and Neutrino Detector at the LHC (SND@LHC) started taking data at the beginning of Run 3 of the LHC. The experiment is designed to perform measurements with neutrinos produced in proton-proton collisions at the LHC in an energy range between 100 GeV and 1 TeV. It covers a previously unexplored pseudo-rapidity range of 7.2 < η < 8.4. The detector is located 480 m downstream of the ATLAS interaction point in the TI18 tunnel. It comprises a veto system, a target consisting of tungsten plates interleaved with nuclear emulsion and scintillating fiber (SciFi) trackers, followed by a muon detector (UpStream, US and DownStream, DS). In this article we report the measurement of the muon flux in three subdetectors: the emulsion, the SciFi trackers and the DownStream Muon detector. The muon flux per integrated luminosity through an 18 × 18 cm² area in the emulsion is (1.5 ± 0.1 (stat)) × 10⁴ fb/cm². The muon flux per integrated luminosity through a 31 × 31 cm² area in the centre of the SciFi is (2.06 ± 0.01 (stat) ± 0.12 (sys)) × 10⁴ fb/cm². The muon flux per integrated luminosity through a 52 × 52 cm² area in the centre of the downstream muon system is (2.35 ± 0.01 (stat) ± 0.10 (sys)) × 10⁴ fb/cm². The total relative uncertainty of the measurements by the electronic detectors is 6% for the SciFi and 4% for the DS measurement. The Monte Carlo simulation prediction of these fluxes is 20–25% lower than the measured values.
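As a unit check on the numbers above, a flux per integrated luminosity in fb/cm² (muons per cm² per fb⁻¹), multiplied by an integrated luminosity and a detector area, yields an expected muon count. The sketch below uses the SciFi measurement from the abstract, with an assumed 1 fb⁻¹ of integrated luminosity for illustration.

```python
# Worked unit check using the SciFi numbers quoted above: a flux per
# integrated luminosity in fb/cm^2 (muons per cm^2 per fb^-1),
# multiplied by an integrated luminosity in fb^-1 and an area in cm^2,
# yields an expected muon count. The 1 fb^-1 is an assumed example.

flux_per_lumi = 2.06e4   # fb/cm^2, SciFi measurement from the text
area_cm2 = 31 * 31       # cm^2, SciFi area from the text
lumi_fb = 1.0            # fb^-1, assumed for the example

n_muons = flux_per_lumi * lumi_fb * area_cm2
print(f"~{n_muons:.2e} muons through the SciFi per {lumi_fb} fb^-1")
# -> ~1.98e+07 muons
```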