The upgraded LHCb detector, due to start data taking in 2022, will have to process an average data rate of 4 TB/s in real time. Because LHCb’s physics objectives require that the full detector information for every LHC bunch crossing is read out and made available for real-time processing, this bandwidth challenge is equivalent to that of the ATLAS and CMS HL-LHC software read-out, but deliverable five years earlier. Over the past six years, the LHCb collaboration has undertaken a bottom-up rewrite of its software infrastructure, pattern recognition, and selection algorithms to enable them to efficiently exploit modern highly parallel computing architectures. We review the impact of this reoptimization on the energy efficiency of the real-time processing software and hardware which will be used for the upgrade of the LHCb detector. We also review the impact of the decision to adopt a hybrid computing architecture consisting of GPUs and CPUs for the real-time part of LHCb’s future data processing. We discuss the implications of these results for how LHCb’s real-time power requirements may evolve in the future, particularly in the context of a planned second upgrade of the detector.
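As a rough consistency check (the 30 MHz collision rate is taken from the other abstracts in this list, not from this one), the quoted 4 TB/s corresponds to an average event size of roughly 130 kB:

\[
\frac{4 \times 10^{12}\ \mathrm{B/s}}{30 \times 10^{6}\ \mathrm{events/s}} \approx 1.3 \times 10^{5}\ \mathrm{B} \approx 130\ \mathrm{kB\ per\ event}.
\]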
Real-time data processing is a central aspect of particle physics experiments, with high demands on computing resources. The LHCb experiment must cope with the 30 million proton-proton bunch collisions per second delivered by the Large Hadron Collider (LHC), which produce \(10^9\) particles/s. The resulting input data rate of 32 Tb/s needs to be processed in real time by the LHCb trigger system, which includes both reconstruction and selection algorithms to reduce the number of saved events. The trigger system is implemented in two stages and deployed in a custom data centre. We present Looking Forward, a high-throughput track-following algorithm designed for the first stage of the LHCb trigger and optimised for GPUs. The algorithm focuses on the reconstruction of particles traversing the whole LHCb detector and is developed to obtain the best physics performance while respecting the throughput limitations of the trigger. Its physics and computing performance is discussed and validated with simulated samples.
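The abstract gives no implementation detail; as a minimal sketch of the track-following idea it describes (extrapolate a candidate track to the next detector layer and collect the hits inside a search window), assuming a straight-line model and entirely hypothetical names (`TrackSeed`, `Layer`, `hitsInWindow`, the window size), consider:

```cpp
#include <cmath>
#include <vector>

// Hypothetical types, not from the LHCb code base.
struct Hit { float x, z; };
struct TrackSeed { float x0, z0, tx; };           // straight-line state: x(z) = x0 + tx*(z - z0)
struct Layer { std::vector<Hit> hits; float z; }; // hits assumed sorted in x

// Extrapolate the seed to the layer's z position (straight-line approximation).
inline float extrapolateX(const TrackSeed& s, float z) {
    return s.x0 + s.tx * (z - s.z0);
}

// Collect hits that fall inside a symmetric search window around the prediction.
std::vector<Hit> hitsInWindow(const TrackSeed& s, const Layer& layer, float window) {
    std::vector<Hit> out;
    const float xPred = extrapolateX(s, layer.z);
    for (const Hit& h : layer.hits) {
        if (std::fabs(h.x - xPred) < window) out.push_back(h);
    }
    return out;
}
```

On a GPU, the outer loop over seeds would typically map one track candidate to one thread, and keeping each layer's hits sorted in x allows the linear scan to be replaced by a binary search for the window boundaries.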
The reconstruction of particle trajectories, known as tracking, is a central step in the reconstruction of particle collisions in High Energy Physics detectors. At the LHCb detector in the Large Hadron Collider, bunches of particles collide 30 million times per second. These collisions produce about \(10^9\) particle trajectories per second that need to be reconstructed in real time in order to filter and store the data. Upcoming improvements to the LHCb detector will replace the hardware filter with a full software filter, posing a computing challenge that requires a renovation of current algorithms and the underlying hardware. We present Search by triplet, a local tracking algorithm optimized for parallel architectures. We design our algorithm to reduce Read-After-Write dependencies as well as conditional branches, increasing its potential for parallelization. We analyze the complexity of our algorithm and validate our results. We show the scaling of our algorithm for an increasing number of collision events, and sustained-throughput tests of our algorithm sequence on a simulated dataflow. We develop CPU and GPU implementations of our work, and hide the transmission times between device and host by executing a multi-stream pipeline. Our results provide a reliable basis for an informed assessment of the feasibility of LHCb event reconstruction on parallel architectures, enabling us to develop cost models for upcoming technology upgrades. The created software infrastructure is extensible and permits the addition of subsequent reconstruction algorithms.
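As a hedged illustration of the triplet idea and of branch reduction (not the actual Search by triplet code; the geometry, names, and tolerance handling are simplified assumptions), the inner "best third hit" scan can be written so that the conditional update compiles to a select rather than a data-dependent branch:

```cpp
#include <cstddef>
#include <vector>

struct Hit { float x, y, z; };

// Score how well hit h2 continues the straight line through h0 and h1.
// Lower is better.
inline float tripletChi2(const Hit& h0, const Hit& h1, const Hit& h2) {
    const float dz01 = h1.z - h0.z, dz12 = h2.z - h1.z;
    const float xPred = h1.x + (h1.x - h0.x) / dz01 * dz12;
    const float yPred = h1.y + (h1.y - h0.y) / dz01 * dz12;
    const float dx = h2.x - xPred, dy = h2.y - yPred;
    return dx * dx + dy * dy;
}

// Find the best third hit for a (h0, h1) doublet.
// Precondition: layer2 is non-empty.
std::size_t bestThirdHit(const Hit& h0, const Hit& h1, const std::vector<Hit>& layer2) {
    std::size_t best = 0;
    float bestChi2 = tripletChi2(h0, h1, layer2[0]);
    for (std::size_t i = 1; i < layer2.size(); ++i) {
        const float c = tripletChi2(h0, h1, layer2[i]);
        const bool better = c < bestChi2;
        bestChi2 = better ? c : bestChi2;   // select, not a branch
        best     = better ? i : best;
    }
    return best;
}
```

Writing the update as a select keeps all SIMT lanes on the same control path, which illustrates the kind of conditional-branch reduction the abstract mentions.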
The LHCb detector is due to be upgraded for processing high-luminosity collisions, which will increase the data bandwidth to the event filter farm from 100 GB/s to 4 TB/s, encouraging us to look for new ways of accelerating the Online reconstruction. The Coprocessor Manager is a new framework for integrating LHCb's existing computation pipelines with massively parallel algorithms running on GPUs and other accelerators. This paper describes the system and analyzes its performance.
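The abstract gives no implementation detail; as one plausible shape for such an integration layer (a thread-safe queue that batches events for an accelerator worker), here is a minimal C++ sketch in which every name (`Event`, `BatchQueue`, `popBatch`) is hypothetical rather than taken from the Coprocessor Manager:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <vector>

// Hypothetical work item: a serialized event to be processed on an accelerator.
struct Event { std::vector<char> raw; };

// A minimal thread-safe queue that hands out events in batches, so that
// CPU-side pipelines can feed a GPU worker without blocking each other.
class BatchQueue {
public:
    void push(Event e) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(e)); }
        cv_.notify_one();
    }
    // Block until at least one event is available, then drain up to maxBatch.
    std::vector<Event> popBatch(std::size_t maxBatch) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        std::vector<Event> batch;
        while (!q_.empty() && batch.size() < maxBatch) {
            batch.push_back(std::move(q_.front()));
            q_.pop();
        }
        return batch;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Event> q_;
};
```

Batching amortizes the fixed cost of each accelerator launch and host-device transfer over many events, which is why frameworks of this kind group work before dispatching it.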
A GPU offloading mechanism for LHCb. Badalov, Alexey; Perez, Daniel Hugo Campora; Zvyagin, Alexander ... Journal of Physics: Conference Series, 06/2014, Volume 513, Issue 5. Journal Article.
Millions of particles are collided every second at the LHCb detector, located at the Large Hadron Collider at CERN. The particles produced in these collisions pass through various detecting devices, which will produce a combined raw data rate of up to 40 Tbps by 2021. These data will be fed through a data acquisition system which reconstructs individual particles and filters the collision events in real time. This process will occur in a heterogeneous farm employing exclusively off-the-shelf CPU and GPU hardware, in a two-stage process known as the High Level Trigger. The reconstruction of charged particle trajectories in physics detectors, also referred to as track reconstruction or tracking, determines the position, charge and momentum of particles as they pass through detectors. The Vertex Locator subdetector (VELO) is the closest such detector to the beamline, placed outside the region where the LHCb magnet produces a sizable magnetic field. It is used to reconstruct straight particle trajectories, which serve as seeds for the reconstruction in other subdetectors and to locate collision vertices. The VELO subdetector will detect up to \(10^9\) particles every second, which need to be reconstructed in real time in the High Level Trigger. We present Search by triplet, an efficient track reconstruction algorithm designed to run efficiently across parallel architectures. We extend previous work and explain the algorithm's evolution since its inception. We show the scaling of our algorithm under various conditions, analyze the amortized complexity of each of its constituent parts, and profile its performance. Our algorithm is the current state-of-the-art in VELO track reconstruction on SIMT architectures, and we quantify its improvements over previous results.
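Because the VELO sits outside the region of sizable magnetic field, its tracks are straight lines; as a small self-contained example of the kind of fit applied to such track candidates (generic least squares, not LHCb code), fitting x as a linear function of z looks like this:

```cpp
#include <vector>

struct Hit { float x, z; };
struct LineFit { float x0, tx; };  // x(z) = x0 + tx * z

// Ordinary least-squares fit of x as a linear function of z.
// Assumes at least two hits at distinct z positions.
LineFit fitStraightLine(const std::vector<Hit>& hits) {
    float sz = 0, sx = 0, szz = 0, szx = 0;
    const float n = static_cast<float>(hits.size());
    for (const Hit& h : hits) {
        sz += h.z; sx += h.x; szz += h.z * h.z; szx += h.z * h.x;
    }
    const float det = n * szz - sz * sz;
    const float tx = (n * szx - sz * sx) / det;
    const float x0 = (sx - tx * sz) / n;
    return {x0, tx};
}
```

The same fit is performed independently in y, and the resulting straight-line states are what downstream algorithms use as seeds.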
High-energy physics faces increasing computational challenges in real-time event reconstruction for the near-future high-luminosity era. Using the LHCb vertex detector as a use case, we explore a new algorithm for particle track reconstruction based on the minimisation of an Ising-like Hamiltonian with a linear algebra approach. The use of a classical matrix inversion technique results in tracking performance similar to the current state-of-the-art, but with worse time-complexity scaling. To address this, we also present an implementation as a quantum algorithm, using the Harrow-Hassidim-Lloyd (HHL) algorithm: this approach can potentially provide an exponential speedup as a function of the number of input hits over its classical counterpart, in spite of limitations due to the well-known HHL Hamiltonian-simulation and readout problems. The findings presented in this paper shed light on the potential of leveraging quantum computing for real-time particle track reconstruction in high-energy physics.
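Schematically (the paper defines the exact Hamiltonian; this is only the generic quadratic form that such Ising-like formulations minimise over segment-activation variables \(\mathbf{s}\), relaxed from binary to continuous values):

\[
H(\mathbf{s}) = -\tfrac{1}{2}\,\mathbf{s}^{\mathsf{T}} A\,\mathbf{s} + \mathbf{b}^{\mathsf{T}}\mathbf{s},
\qquad
\nabla H = 0 \;\Longrightarrow\; A\,\mathbf{s} = \mathbf{b},
\]

so the minimisation reduces to solving a linear system, which can be attacked classically by matrix inversion or, in principle, by the HHL quantum algorithm.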
The LHCb experiment is preparing a major upgrade, resulting in a need for a high-end network for its data acquisition system. Its capacity will grow to a target speed of 40 Tb/s, aggregated by 500 nodes. This can only be achieved reasonably by using links capable of coping with 100 Gb/s line rates. The constantly increasing need for bandwidth has initiated the development of several 100 Gigabit/s network technologies. Three candidates on the horizon need to be considered: Intel® Omni-Path, 100G Ethernet and EDR InfiniBand. We present test results with such links, both using standard benchmarks (e.g. iperf) and using a custom-built benchmark called LHCb-DAQPIPE. The key benefit of these measurements is that they characterize the behavior of the system at an early development stage, exposing the limitations of the different network components and indicating which elements may need further optimization in the future.
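As a toy illustration of what a point-to-point throughput measurement does (this is not DAQPIPE, and the host, port, and buffer sizes below are arbitrary assumptions), a minimal TCP sender that reports its achieved bandwidth could look like this in C++ on a POSIX system:

```cpp
// Toy TCP throughput sender. Pair with a sink on the receiving node,
// e.g. `nc -l 5001 > /dev/null`.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const char* host = "127.0.0.1";        // arbitrary test endpoint
    const int port = 5001;
    const std::size_t bufSize = 1 << 20;   // 1 MiB per send
    const std::size_t rounds = 4096;       // ~4 GiB total

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, host, &addr.sin_addr);
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        perror("connect");
        return 1;
    }

    std::vector<char> buf(bufSize, 0);
    const auto t0 = std::chrono::steady_clock::now();
    std::size_t sent = 0;
    for (std::size_t i = 0; i < rounds; ++i) {
        const ssize_t n = send(fd, buf.data(), buf.size(), 0);
        if (n < 0) { perror("send"); return 1; }
        sent += static_cast<std::size_t>(n);
    }
    const std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    std::printf("%.2f Gb/s\n", sent * 8 / dt.count() / 1e9);
    close(fd);
    return 0;
}
```

Tools like iperf and DAQPIPE measure the same quantity but add multiple streams, many-to-many traffic patterns, and protocol options, which is what exposes the component limitations discussed above.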
Real-time data processing is one of the central tasks of particle physics experiments and requires large computing resources. The LHCb (Large Hadron Collider beauty) experiment will be upgraded to cope with a bunch-collision rate of 30 million per second, producing \(10^9\) particles/s. A data rate of 40 Tb/s needs to be processed in real time to make the filtering decisions that determine which data to store. This poses a computing challenge that requires exploration of modern hardware and software solutions. We present Compass, a particle tracking algorithm and a parallel raw-input decoding scheme optimised for GPUs. It is data-oriented, designed for highly parallel architectures, and optimised for fast, localised data access. Our algorithm is configurable, and we explore the trade-offs in computing and physics performance of various configurations. A CPU implementation that delivers the same physics performance as our GPU implementation is also presented. We discuss the achieved physics performance and validate it with Monte Carlo simulated data. We present a computing performance analysis comparing consumer- and server-grade GPUs, and a CPU. We show the feasibility of using a full GPU decoding and particle tracking algorithm for high-throughput reconstruction of particle trajectories, with our algorithm improving throughput by up to 7.4\(\times\) compared to the LHCb baseline.
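The "data-oriented ... fast, localised data access" design typically means a structure-of-arrays layout, so that consecutive GPU threads read contiguous memory; as a generic illustration (not the actual Compass data model):

```cpp
#include <cstddef>
#include <vector>

// Array-of-structures: the fields of one hit are contiguous, but thread i and
// thread i+1 reading `x` touch memory strided by sizeof(HitAoS).
struct HitAoS { float x, y, z; };
using HitsAoS = std::vector<HitAoS>;

// Structure-of-arrays: all x values are contiguous, so consecutive threads
// reading hits i and i+1 produce coalesced (fast, localised) accesses.
struct HitsSoA {
    std::vector<float> x, y, z;
    void reserve(std::size_t n) { x.reserve(n); y.reserve(n); z.reserve(n); }
};
```

On SIMT hardware the SoA form lets a warp's loads be served by a few wide memory transactions instead of one transaction per thread, which is the main reason data-oriented GPU code is organized this way.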