The data acquisition system (DAQ) of the CMS experiment at the CERN Large Hadron Collider (LHC) assembles events of 2MB at a rate of 100 kHz. The event builder collects event fragments from about 750 ...sources and assembles them into complete events which are then handed to the High-Level Trigger (HLT) processes running on
O
(1000) computers. The aging eventbuilding hardware will be replaced during the long shutdown 2 of the LHC taking place in 2019/20. The future data networks will be based on 100 Gb/s interconnects using Ethernet and Infiniband technologies. More powerful computers may allow to combine the currently separate functionality of the readout and builder units into a single I/O processor handling simultaneously 100 Gb/s of input and output traffic. It might be beneficial to preprocess data originating from specific detector parts or regions before handling it to generic HLT processors. Therefore, we will investigate how specialized coprocessors, e.g. GPUs, could be integrated into the event builder. We will present the envisioned changes to the event-builder compared to today’s system. Initial measurements of the performance of the data networks under the event-building traffic pattern will be shown. Implications of a folded network architecture for the event building and corresponding changes to the software implementation will be discussed.
The CMS DAQ Pinball Machine Cornu, Cynthia; Deldicque, Christian; Gladki, Maciej ...
EPJ Web of Conferences,
01/2020, Letnik:
245
Journal Article, Conference Proceeding
Recenzirano
Odprti dostop
We present an interactive game for up to seven players that demonstrates the challenges of on-line event selection at the Compact Muon Solenoid (CMS) experiment to the public. The game - in the shape ...of a popular classic pinball machine - was conceived and prototyped by an interdisciplinary team of graphic designers, physicists and engineers at the CMS Create hackathon in 2016. Having won the competition, the prototype was turned into a fully working machine that is now exhibited on the CMS visitors’ path. Teams of 2-7 visitors can compete with one another to collect as many interesting events as possible within a simulated LHC fill. In a fun and engaging way, the game conveys concepts such as multi-level triggering, pipelined processing, event building, the importance of purity in event selection and more subtle details such as dead time. The multi-player character of the game corresponds to the distributed nature of the actual trigger and data acquisition system of the experiment. We present the concept of the game, its design and its technical implementation centered around an Arduino micro-controller controlling 700 RGB LEDs and a sound subsystem running on a Mac mini.
The part of the CMS Data Acquisition (DAQ) system responsible for data readout and event building is a complex network of interdependent distributed applications. To ensure successful data taking, ...these programs have to be constantly monitored in order to facilitate the timeliness of necessary corrections in case of any deviation from specified behaviour. A large number of diverse monitoring data samples are periodically collected from multiple sources across the network. Monitoring data are kept in memory for online operations and optionally stored on disk for post-mortem analysis. We present a generic, reusable solution based on an open source NoSQL database, Elasticsearch, which is fully compatible and non-intrusive with respect to the existing system. The motivation is to benefit from an offthe-shelf software to facilitate the development, maintenance and support efforts. Elasticsearch provides failover and data redundancy capabilities as well as a programming language independent JSON-over-HTTP interface. The possibility of horizontal scaling matches the requirements of a DAQ
monitoring system. The data load from all sources is balanced by redistribution over an Elasticsearch cluster that can be hosted on a computer cloud. In order to achieve the necessary robustness and to validate the scalability of the approach the above monitoring solution currently runs in parallel with an existing in-house developed DAQ monitoring system.
The data acquisition (DAQ) system of the Compact Muon Solenoid (CMS) at CERN reads out the detector at the level-1 trigger accept rate of 100 kHz, assembles events with a bandwidth of 200 GB/s, ...provides these events to the high level-trigger running on a farm of about 30k cores and records the accepted events. Comprising custom-built and cutting edge commercial hardware and several 1000 instances of software applications, the DAQ system is complex in itself and failures cannot be completely excluded. Moreover, problems in the readout of the detectors,in the first level trigger system or in the high level trigger may provoke anomalous behaviour of the DAQ systemwhich sometimes cannot easily be differentiated from a problem in the DAQ system itself. In order to achieve high data taking efficiency with operators from the entire collaboration and without relying too heavily on the on-call experts, an expert system, the DAQ-Expert, has been developed that can pinpoint the source of most failures and give advice to the shift crew on how to recover in the quickest way. The DAQ-Expert constantly analyzes monitoring data from the DAQ system and the high level trigger by making use of logic modules written in Java that encapsulate the expert knowledge about potential operational problems. The results of the reasoning are presented to the operator in a web-based dashboard, may trigger sound alerts in the control room and are archived for post-mortem analysis - presented in a web-based timeline browser. We present the design of the DAQ-Expert and report on the operational experience since 2017, when it was first put into production.
The Compact Muon Solenoid (CMS) is one of the experiments at the CERN Large Hadron Collider (LHC). The CMS Online Monitoring system (OMS) is an upgrade and successor to the CMS Web-Based Monitoring ...(WBM)system, which is an essential tool for shift crew members, detector subsystem experts, operations coordinators, and those performing physics analyses. The CMS OMS is divided into aggregation and presentation layers. Communication between layers uses RESTful JSON:API compliant requests. The aggregation layer is responsible for collecting data from heterogeneous sources, storage of transformed and pre-calculated (aggregated) values and exposure of data via the RESTful API. The presentation layer displays detector information via a modern, user-friendly and customizable web interface. The CMS OMS user interface is composed of a set of cutting-edge software frameworks and tools to display non-event data to any authenticated CMS user worldwide. The web interface tree-like component structure comprises (top-down): workspaces, folders, pages, controllers and portlets. A clear hierarchy gives the required flexibility and control for content organization. Each bottom element instantiates a portlet and is a reusable component that displays a single aspect of data, like a table, a plot, an article, etc. Pages consist of multiple different portlets and can be customized at runtime by using a drag-and-drop technique. This is how a single page can easily include information from multiple online sources. Different pages give access to a summary of the current status of the experiment, as well as convenient access to historical data. This paper describes the CMS OMS architecture, core concepts and technologies of the presentation layer.
The primary goal of the online cluster of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is to build event data from the detector and to select interesting collisions ...in the High Level Trigger (HLT) farm for offline storage. With more than 1500 nodes and a capacity of about 850 kHEPSpecInt06, the HLT machines represent similar computing capacity of all the CMS Tier1 Grid sites together. Moreover, it is currently connected to the CERN IT datacenter via a dedicated 160 Gbps network connection and hence can access the remote EOS based storage with a high bandwidth. In the last few years, a cloud overlay based on OpenStack has been commissioned to use these resources for the WLCG when they are not needed for data taking. This online cloud facility was designed for parasitic use of the HLT, which must never interfere with its primary function as part of the DAQ system. It also allows to abstract from the different types of machines and their underlying segmented networks. During the LHC technical stop periods, the HLT cloud is set to its static mode of operation where it acts like other grid facilities. The online cloud was also extended to make dynamic use of resources during periods between LHC fills. These periods are a-priori unscheduled and of undetermined length, typically of several hours, once or more a day. For that, it dynamically follows LHC beam states and hibernates Virtual Machines (VM) accordingly. Finally, this work presents the design and implementation of a mechanism to dynamically ramp up VMs when the DAQ load on the HLT reduces towards the end of the fill.
Performance of the CMS Event Builder Andre, J-M; Behrens, U; Branson, J ...
Journal of physics. Conference series,
10/2017, Letnik:
898, Številka:
3
Journal Article
Recenzirano
Odprti dostop
The data acquisition system (DAQ) of the CMS experiment at the CERN Large Hadron Collider assembles events at a rate of 100 kHz, transporting event data at an aggregate throughput of O(100GB/s) to ...the high-level trigger farm. The DAQ architecture is based on state-of-the-art network technologies for the event building. For the data concentration, 10/40 Gbit/s Ethernet technologies are used together with a reduced TCP/IP protocol implemented in FPGA for a reliable transport between custom electronics and commercial computing hardware. A 56 Gbit/s Infiniband FDR Clos network has been chosen for the event builder. This paper presents the implementation and performance of the event-building system.
The High Luminosity LHC (HL-LHC) will start operating in 2027 after the third Long Shutdown (LS3), and is designed to provide an ultimate instantaneous luminosity of 7:5 × 10
34
cm
−2
s
−1
, at the ...price of extreme pileup of up to 200 interactions per crossing. The number of overlapping interactions in HL-LHC collisions, their density, and the resulting intense radiation environment, warrant an almost complete upgrade of the CMS detector. The upgraded CMS detector will be read out by approximately fifty thousand highspeed front-end optical links at an unprecedented data rate of up to 80 Tb/s, for an average expected total event size of approximately 8 − 10 MB. Following the present established design, the CMS trigger and data acquisition system will continue to feature two trigger levels, with only one synchronous hardware-based Level-1 Trigger (L1), consisting of custom electronic boards and operating on dedicated data streams, and a second level, the High Level Trigger (HLT), using software algorithms running asynchronously on standard processors and making use of the full detector data to select events for offline storage and analysis. The upgraded CMS data acquisition system will collect data fragments for Level-1 accepted events from the detector back-end modules at a rate up to 750 kHz, aggregate fragments corresponding to individual Level- 1 accepts into events, and distribute them to the HLT processors where they will be filtered further. Events accepted by the HLT will be stored permanently at a rate of up to 7.5 kHz. This paper describes the baseline design of the DAQ and HLT systems for the Phase-2 of CMS.
The efficiency of the Data Acquisition (DAQ) of the Compact Muon Solenoid (CMS) experiment for LHC Run 2 is constantly being improved. A significant factor affecting the data taking efficiency is the ...experience of the DAQ operator. One of the main responsibilities of the DAQ operator is to carry out the proper recovery procedure in case of failure of data-taking. At the start of Run 2, understanding the problem and finding the right remedy could take a considerable amount of time (up to many minutes). Operators heavily relied on the support of on-call experts, also outside working hours. Wrong decisions due to time pressure sometimes lead to an additional overhead in recovery time. To increase the efficiency of CMS data-taking we developed a new expert system, the DAQExpert, which provides shifters with optimal recovery suggestions instantly when a failure occurs. DAQExpert is a web application analyzing frequently updating monitoring data from all DAQ components and identifying problems based on expert knowledge expressed in small, independent logic-modules written in Java. Its results are presented in real-time in the control room via a web-based GUI and a sound-system in a form of short description of the current failure, and steps to recover.