The second phase of the LHC, the High-Luminosity LHC, is scheduled to start in 2029, after a shutdown during which the beam intensity and focusing will be significantly upgraded. For this HL-LHC era, ...also the CMS detector will receive an extensive upgrade, primarily to maintain its physics performance at increasing pileup. The Phase-2 CMS Level-1 trigger rate will increase to 750 kHz, for an estimated data rate in excess of 50 Tbit/s. The Phase-2 CMS off-detector electronics will be based on the ATCA standard, with back-end boards receiving the detector data from the on-detector front-ends via custom, radiation-tolerant, optical links. The CMS Phase-2 data acquisition design tightens the integration between trigger control and data flow, extending the synchronous regime of the DAQ system. At the core of the design is the DAQ and Timing Hub, a custom ATCA hub card forming the bridge between the different, detector-specific, control and readout electronics and the common timing, trigger, and control systems. The overall synchronisation and data flow of the experiment is handled by the Trigger and Timing Control and Distribution System. For increased flexibility during commissioning and calibration runs, the design of the Phase-2 trigger and timing distribution system breaks with the traditional distribution tree, in favour of a configurable network connecting multiple independent control units to all off-detector endpoints. In order to reduce the number of custom hardware designs required, the DAQ hardware is designed such that it can also be used to implement the Trigger and Timing Control and Distribution System.
The upgraded High Luminosity LHC, after the third Long Shutdown (LS3), will provide an instantaneous luminosity of 7.5 × 1034 cm−2s−1 (levelled), at the price of extreme pileup of up to 200 ...interactions per crossing. In LS3, the CMS Detector will also undergo a major upgrade to prepare for the phase-2 of the LHC physics program, starting around 2025. The upgraded detector will be read out at an unprecedented data rate of up to 50 Tb/s and an event rate of 750 kHz. Complete events will be analysed by software algorithms running on standard processing nodes, and selected events will be stored permanently at a rate of up to 10 kHz for offline processing and analysis. In this paper we discuss the baseline design of the DAQ and HLT systems for the phase-2, taking into account the projected evolution of high speed network fabrics for event building and distribution, and the anticipated performance of general purpose CPU. Implications on hardware and infrastructure requirements for the DAQ "data center" are analysed. Emerging technologies for data reduction are considered. Novel possible approaches to event building and online processing, inspired by trending developments in other areas of computing dealing with large masses of data, are also examined. We conclude by discussing the opportunities offered by reading out and processing parts of the detector, wherever the front-end electronics allows, at the machine clock rate (40 MHz). This idea presents interesting challenges and its physics potential should be studied.
The Compact Muon Solenoid is a large a complex general purpose experiment at the CERN Large Hadron Collider (LHC), built and maintained by many collaborators from around the world. Efficient ...operation of the detector requires widespread and timely access to a broad range of monitoring and status information. To that end the Web Based Monitoring (WBM) system was developed to present data to users located anywhere from many underlying heterogeneous sources, from real time messaging systems to relational databases. This system provides the power to combine and correlate data in both graphical and tabular formats of interest to the experimenters, including data such as beam conditions, luminosity, trigger rates, detector conditions, and many others, allowing for flexibility on the user's side. This paper describes the WBM system architecture and describes how the system has been used from the beginning of data taking until now (Run1 and Run 2).
Performance of the CMS Event Builder Andre, J-M; Behrens, U; Branson, J ...
Journal of physics. Conference series,
10/2017, Letnik:
898, Številka:
3
Journal Article
Recenzirano
Odprti dostop
The data acquisition system (DAQ) of the CMS experiment at the CERN Large Hadron Collider assembles events at a rate of 100 kHz, transporting event data at an aggregate throughput of O(100GB/s) to ...the high-level trigger farm. The DAQ architecture is based on state-of-the-art network technologies for the event building. For the data concentration, 10/40 Gbit/s Ethernet technologies are used together with a reduced TCP/IP protocol implemented in FPGA for a reliable transport between custom electronics and commercial computing hardware. A 56 Gbit/s Infiniband FDR Clos network has been chosen for the event builder. This paper presents the implementation and performance of the event-building system.
The efficiency of the Data Acquisition (DAQ) of the Compact Muon Solenoid (CMS) experiment for LHC Run 2 is constantly being improved. A significant factor affecting the data taking efficiency is the ...experience of the DAQ operator. One of the main responsibilities of the DAQ operator is to carry out the proper recovery procedure in case of failure of data-taking. At the start of Run 2, understanding the problem and finding the right remedy could take a considerable amount of time (up to many minutes). Operators heavily relied on the support of on-call experts, also outside working hours. Wrong decisions due to time pressure sometimes lead to an additional overhead in recovery time. To increase the efficiency of CMS data-taking we developed a new expert system, the DAQExpert, which provides shifters with optimal recovery suggestions instantly when a failure occurs. DAQExpert is a web application analyzing frequently updating monitoring data from all DAQ components and identifying problems based on expert knowledge expressed in small, independent logic-modules written in Java. Its results are presented in real-time in the control room via a web-based GUI and a sound-system in a form of short description of the current failure, and steps to recover.
The New CMS DAQ System for Run-2 of the LHC Bawej, Tomasz; Behrens, Ulf; Branson, James ...
IEEE transactions on nuclear science,
06/2015, Letnik:
62, Številka:
3
Journal Article
Recenzirano
Odprti dostop
The data acquisition (DAQ) system of the CMS experiment at the CERN Large Hadron Collider assembles events at a rate of 100 kHz, transporting event data at an aggregate throughput of 100 GB/s to the ...high level trigger (HLT) farm. The HLT farm selects interesting events for storage and offline analysis at a rate of around 1 kHz. The DAQ system has been redesigned during the accelerator shutdown in 2013/14. The motivation is twofold: Firstly, the current compute nodes, networking, and storage infrastructure will have reached the end of their lifetime by the time the LHC restarts. Secondly, in order to handle higher LHC luminosities and event pileup, a number of sub-detectors will be upgraded, increasing the number of readout channels and replacing the off-detector readout electronics with a μTCA implementation. The new DAQ architecture will take advantage of the latest developments in the computing industry. For data concentration, 10/40 Gb/s Ethernet technologies will be used, as well as an implementation of a reduced TCP/IP in FPGA for a reliable transport between custom electronics and commercial computing hardware. A Clos network based on 56 Gb/s FDR Infiniband has been chosen for the event builder with a throughput of ~ 4 Tb/s. The HLT processing is entirely file based. This allows the DAQ and HLT systems to be independent, and to use the HLT software in the same way as for the offline processing. The fully built events are sent to the HLT with 1/10/40 Gb/s Ethernet via network file systems. Hierarchical collection of HLT accepted events and monitoring meta-data are stored into a global file system. This paper presents the requirements, technical choices, and performance of the new system.
TCP and the socket abstraction have barely changed over the last two decades, but at the network layer there has been a giant leap from a few megabits to 100 gigabits in bandwidth. At the same time, ...CPU architectures have evolved into the multi-core era and applications are expected to make full use of all available resources. Applications in the data acquisition domain based on the standard socket library running in a Non-Uniform Memory Access (NUMA) architecture are unable to reach full efficiency and scalability without the software being adequately aware about the IRQ (Interrupt Request), CPU and memory affinities. During the first long shutdown of LHC, the CMS DAQ system is going to be upgraded for operation from 2015 onwards and a new software component has been designed and developed in the CMS online framework for transferring data with sockets. This software attempts to wrap the low-level socket library to ease higher-level programming with an API based on an asynchronous event driven model similar to the DAT uDAPL API. It is an event-based application with NUMA optimizations, that allows for a high throughput of data across a large distributed system. This paper describes the architecture, the technologies involved and the performance measurements of the software in the context of the CMS distributed event building.
The High Luminosity LHC (HL-LHC) will start operating in 2027 after the third Long Shutdown (LS3), and is designed to provide an ultimate instantaneous luminosity of 7:5 × 10
34
cm
−2
s
−1
, at the ...price of extreme pileup of up to 200 interactions per crossing. The number of overlapping interactions in HL-LHC collisions, their density, and the resulting intense radiation environment, warrant an almost complete upgrade of the CMS detector. The upgraded CMS detector will be read out by approximately fifty thousand highspeed front-end optical links at an unprecedented data rate of up to 80 Tb/s, for an average expected total event size of approximately 8 − 10 MB. Following the present established design, the CMS trigger and data acquisition system will continue to feature two trigger levels, with only one synchronous hardware-based Level-1 Trigger (L1), consisting of custom electronic boards and operating on dedicated data streams, and a second level, the High Level Trigger (HLT), using software algorithms running asynchronously on standard processors and making use of the full detector data to select events for offline storage and analysis. The upgraded CMS data acquisition system will collect data fragments for Level-1 accepted events from the detector back-end modules at a rate up to 750 kHz, aggregate fragments corresponding to individual Level- 1 accepts into events, and distribute them to the HLT processors where they will be filtered further. Events accepted by the HLT will be stored permanently at a rate of up to 7.5 kHz. This paper describes the baseline design of the DAQ and HLT systems for the Phase-2 of CMS.
During Run-1 of the LHC, many operational procedures have been automated in the run control system of the Compact Muon Solenoid (CMS) experiment. When detector high voltages are ramped up or down or ...upon certain beam mode changes of the LHC, the DAQ system is automatically partially reconfigured with new parameters. Certain types of errors such as errors caused by single-event upsets may trigger an automatic recovery procedure. Furthermore, the top-level control node continuously performs cross-checks to detect sub-system actions becoming necessary because of changes in configuration keys, changes in the set of included front-end drivers or because of potential clock instabilities. The operator is guided to perform the necessary actions through graphical indicators displayed next to the relevant command buttons in the user interface. Through these indicators, consistent configuration of CMS is ensured. However, manually following the indicators can still be inefficient at times. A new assistant to the operator has therefore been developed that can automatically perform all the necessary actions in a streamlined order. If additional problems arise, the new assistant tries to automatically recover from these. With the new assistant, a run can be started from any state of the sub-systems with a single click. An ongoing run may be recovered with a single click, once the appropriate recovery action has been selected. We review the automation features of CMS Run Control and discuss the new assistant in detail including first operational experience.