The data acquisition system (DAQ) of the CMS experiment at the CERN Large Hadron Collider (LHC) assembles events of 2MB at a rate of 100 kHz. The event builder collects event fragments from about 750 ...sources and assembles them into complete events which are then handed to the High-Level Trigger (HLT) processes running on
O
(1000) computers. The aging eventbuilding hardware will be replaced during the long shutdown 2 of the LHC taking place in 2019/20. The future data networks will be based on 100 Gb/s interconnects using Ethernet and Infiniband technologies. More powerful computers may allow to combine the currently separate functionality of the readout and builder units into a single I/O processor handling simultaneously 100 Gb/s of input and output traffic. It might be beneficial to preprocess data originating from specific detector parts or regions before handling it to generic HLT processors. Therefore, we will investigate how specialized coprocessors, e.g. GPUs, could be integrated into the event builder. We will present the envisioned changes to the event-builder compared to today’s system. Initial measurements of the performance of the data networks under the event-building traffic pattern will be shown. Implications of a folded network architecture for the event building and corresponding changes to the software implementation will be discussed.
The part of the CMS Data Acquisition (DAQ) system responsible for data readout and event building is a complex network of interdependent distributed applications. To ensure successful data taking, ...these programs have to be constantly monitored in order to facilitate the timeliness of necessary corrections in case of any deviation from specified behaviour. A large number of diverse monitoring data samples are periodically collected from multiple sources across the network. Monitoring data are kept in memory for online operations and optionally stored on disk for post-mortem analysis. We present a generic, reusable solution based on an open source NoSQL database, Elasticsearch, which is fully compatible and non-intrusive with respect to the existing system. The motivation is to benefit from an offthe-shelf software to facilitate the development, maintenance and support efforts. Elasticsearch provides failover and data redundancy capabilities as well as a programming language independent JSON-over-HTTP interface. The possibility of horizontal scaling matches the requirements of a DAQ
monitoring system. The data load from all sources is balanced by redistribution over an Elasticsearch cluster that can be hosted on a computer cloud. In order to achieve the necessary robustness and to validate the scalability of the approach the above monitoring solution currently runs in parallel with an existing in-house developed DAQ monitoring system.
The primary goal of the online cluster of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is to build event data from the detector and to select interesting collisions ...in the High Level Trigger (HLT) farm for offline storage. With more than 1500 nodes and a capacity of about 850 kHEPSpecInt06, the HLT machines represent similar computing capacity of all the CMS Tier1 Grid sites together. Moreover, it is currently connected to the CERN IT datacenter via a dedicated 160 Gbps network connection and hence can access the remote EOS based storage with a high bandwidth. In the last few years, a cloud overlay based on OpenStack has been commissioned to use these resources for the WLCG when they are not needed for data taking. This online cloud facility was designed for parasitic use of the HLT, which must never interfere with its primary function as part of the DAQ system. It also allows to abstract from the different types of machines and their underlying segmented networks. During the LHC technical stop periods, the HLT cloud is set to its static mode of operation where it acts like other grid facilities. The online cloud was also extended to make dynamic use of resources during periods between LHC fills. These periods are a-priori unscheduled and of undetermined length, typically of several hours, once or more a day. For that, it dynamically follows LHC beam states and hibernates Virtual Machines (VM) accordingly. Finally, this work presents the design and implementation of a mechanism to dynamically ramp up VMs when the DAQ load on the HLT reduces towards the end of the fill.
The data acquisition (DAQ) system of the Compact Muon Solenoid (CMS) at CERN reads out the detector at the level-1 trigger accept rate of 100 kHz, assembles events with a bandwidth of 200 GB/s, ...provides these events to the high level-trigger running on a farm of about 30k cores and records the accepted events. Comprising custom-built and cutting edge commercial hardware and several 1000 instances of software applications, the DAQ system is complex in itself and failures cannot be completely excluded. Moreover, problems in the readout of the detectors,in the first level trigger system or in the high level trigger may provoke anomalous behaviour of the DAQ systemwhich sometimes cannot easily be differentiated from a problem in the DAQ system itself. In order to achieve high data taking efficiency with operators from the entire collaboration and without relying too heavily on the on-call experts, an expert system, the DAQ-Expert, has been developed that can pinpoint the source of most failures and give advice to the shift crew on how to recover in the quickest way. The DAQ-Expert constantly analyzes monitoring data from the DAQ system and the high level trigger by making use of logic modules written in Java that encapsulate the expert knowledge about potential operational problems. The results of the reasoning are presented to the operator in a web-based dashboard, may trigger sound alerts in the control room and are archived for post-mortem analysis - presented in a web-based timeline browser. We present the design of the DAQ-Expert and report on the operational experience since 2017, when it was first put into production.
The Compact Muon Solenoid (CMS) is one of the experiments at the CERN Large Hadron Collider (LHC). The CMS Online Monitoring system (OMS) is an upgrade and successor to the CMS Web-Based Monitoring ...(WBM)system, which is an essential tool for shift crew members, detector subsystem experts, operations coordinators, and those performing physics analyses. The CMS OMS is divided into aggregation and presentation layers. Communication between layers uses RESTful JSON:API compliant requests. The aggregation layer is responsible for collecting data from heterogeneous sources, storage of transformed and pre-calculated (aggregated) values and exposure of data via the RESTful API. The presentation layer displays detector information via a modern, user-friendly and customizable web interface. The CMS OMS user interface is composed of a set of cutting-edge software frameworks and tools to display non-event data to any authenticated CMS user worldwide. The web interface tree-like component structure comprises (top-down): workspaces, folders, pages, controllers and portlets. A clear hierarchy gives the required flexibility and control for content organization. Each bottom element instantiates a portlet and is a reusable component that displays a single aspect of data, like a table, a plot, an article, etc. Pages consist of multiple different portlets and can be customized at runtime by using a drag-and-drop technique. This is how a single page can easily include information from multiple online sources. Different pages give access to a summary of the current status of the experiment, as well as convenient access to historical data. This paper describes the CMS OMS architecture, core concepts and technologies of the presentation layer.
Status of the CMS Detector Control System Bauer, Gerry; Behrens, Ulf; Bowen, Matthew ...
Journal of physics. Conference series,
01/2012, Letnik:
396, Številka:
1
Journal Article
Recenzirano
Odprti dostop
The Compact Muon Solenoid (CMS) is a CERN multi-purpose experiment that exploits the physics of the Large Hadron Collider (LHC). The Detector Control System (DCS) is responsible for ensuring the ...safe, correct and efficient operation of the experiment, and has contributed to the recording of high quality physics data. The DCS is programmed to automatically react to the LHC operational mode. CMS sub-detectors’ bias voltages are set depending on the machine mode and particle beam conditions. An operator provided with a small set of screens supervises the system status summarized from the approximately 6M monitored parameters. Using the experience of nearly two years of operation with beam the DCS automation software has been enhanced to increase the system efficiency by minimizing the time required by sub-detectors to prepare for physics data taking. From the infrastructure point of view the DCS will be subject to extensive modifications in 2012. The current rack mounted control PCs will be replaced by a redundant pair of DELL Blade systems. These blade servers are a high-density modular solution that incorporates servers and networking into a single chassis that provides shared power, cooling and management. This infrastructure modification associated with the migration to blade servers will challenge the DCS software and hardware factorization capabilities. The on-going studies for this migration together with the latest modifications are discussed in the paper.
The supervisory level of the Detector Control System (DCS) of the CMS experiment is implemented using Finite State Machines (FSM), which model the behaviours and control the operations of all the ...sub-detectors and support services. The FSM tree of the whole CMS experiment consists of more than 30.000 nodes. An analysis of a system of such size is a complex task but is a crucial step towards the improvement of the overall performance of the FSM system. This paper presents the analysis of the CMS FSM system using the micro Common Representation Language 2 (mcrl2) methodology. Individual mCRL2 models are obtained for the FSM systems of the CMS sub-detectors using the ASF+SDF automated translation tool. Different mCRL2 operations are applied to the mCRL2 models. A mCRL2 simulation tool is used to closer examine the system. Visualization of a system based on the exploration of its state space is enabled with a mCRL2 tool. Requirements such as command and state propagation are expressed using modal mu-calculus and checked using a model checking algorithm. For checking local requirements such as endless loop freedom, the Bounded Model Checking technique is applied. This paper discusses these analysis techniques and presents the results of their application on the CMS FSM system.
The Compact Muon Solenoid (CMS) experiment has developed an electrical implementation of the S-LINK64 extension (Simple Link Interface 64 bit) operating at 400 MB/s in order to read out the detector. ...This paper studies a possible replacement of the existing S-LINK64 implementation by an optical link, based on 10 Gigabit Ethernet in order to fulfil larger throughput, replace aging hardware and simplify an architecture. A prototype transmitter unit has been developed based on the FPGA Altera PCI Express Development Kit with a custom firmware. A standard PC has been acted as receiving unit. The data transfer has been implemented on a stack of protocols: RDP over IP over Ethernet. This allows receiving the data by standard hardware components like PCs or network switches and NICs. The first test proved that basic exchange of the packets between transmitter and receiving unit works. The paper summarizes the status of these studies.
The CMS detector control system (DCS) is responsible for controlling and monitoring the detector status and for the operation of all CMS sub detectors and infrastructure. This is required to ensure ...safe and efficient data taking so that high quality physics data can be recorded. The current system architecture is composed of more than 100 servers in order to provide the required processing resources. An optimization of the system software and hardware architecture is under development to ensure redundancy of all the controlled subsystems and to reduce any downtime due to hardware or software failures. The new optimized structure is based mainly on powerful and highly reliable blade servers and makes use of a fully redundant approach, guaranteeing high availability and reliability. The analysis of the requirements, the challenges, the improvements and the optimized system architecture as well as its specific hardware and software solutions are presented.
Gaseous detectors with sense wires are still in use today in small experiments as well as modern ones as those at the LHC. This short note is about the construction of a small wire chamber with ...limited resources, which could be used both as an educational tool and also as a tracker in small experiments. The particular detector type selected for this work is the so called "Delay Wire Chamber": it has only two output channels per plane and can be made fully gas tight for educational operations. The design can be made with free software tools, and the construction can be achieved by relatively simple means.