For Run 2 of the LHC, LHCb is replacing a significant part of its event filter farm with new compute nodes. To evaluate the best-performing solution, we have developed a method to convert our high-level trigger application into a stand-alone, bootable benchmark image. With additional instrumentation, we turned it into a self-optimising benchmark that explores techniques such as late forking, NUMA balancing and the choice of the optimal number of threads, i.e. it automatically optimises box-level performance. We have run this procedure on a wide range of Haswell-E CPUs and numerous other architectures from both Intel and AMD, including the latest Intel micro-blade servers. We present results in terms of performance, power consumption, overheads and relative cost.
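The abstract does not say how the self-optimising benchmark searches for the best configuration. The Python sketch below shows one simple possibility, under that assumption: an exhaustive sweep over the number of worker threads that keeps the setting with the highest measured throughput. The names `find_best_setting` and `toy_throughput` are hypothetical, and the toy model only stands in for a real measurement; it is not based on any benchmark data. A real benchmark would likely sweep several parameters (forking point, NUMA placement, thread count) jointly.

```python
from typing import Callable, Iterable, Optional


def find_best_setting(candidates: Iterable[int],
                      measure: Callable[[int], float]) -> tuple[int, float]:
    """Run `measure` for every candidate setting and return the best one.

    `measure` is assumed to return a throughput (higher is better),
    e.g. events per second for a given number of worker threads.
    """
    best_n: Optional[int] = None
    best_rate = float("-inf")
    for n in candidates:
        rate = measure(n)
        if rate > best_rate:
            best_n, best_rate = n, rate
    if best_n is None:
        raise ValueError("no candidate settings given")
    return best_n, best_rate


def toy_throughput(n_workers: int, n_cores: int = 24) -> float:
    """Hypothetical stand-in for a measurement, NOT real data:
    linear scaling up to the core count, then a small penalty
    for every oversubscribed worker."""
    if n_workers <= n_cores:
        return float(n_workers)
    return n_cores - 0.1 * (n_workers - n_cores)


best, rate = find_best_setting(range(1, 49), toy_throughput)
print(best, rate)  # 24 24.0
```

An exhaustive sweep is the simplest choice when each measurement is cheap relative to the total tuning budget; with many parameters, a coordinate-wise or hill-climbing search would reduce the number of runs.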
The 2020 upgrade of the LHCb detector will vastly increase the rate of collisions that the online system needs to process in software in order to filter events in real time. 30 million collisions per second will pass through a selection chain in which each step is executed only if the event was accepted by the preceding one. The Kalman filter is a stage of the event reconstruction that, owing to its timing characteristics and its early position in the selection chain, consumes 40% of the total reconstruction time in the current trigger software. This makes it a time-critical component as the LHCb trigger evolves into a full software trigger in the upgrade. The Cross Kalman algorithm allows performance tests across a variety of architectures, including multi- and many-core platforms, and has been successfully integrated and validated in the LHCb codebase. Since its inception, new hardware architectures have become available whose features require fine-grained tuning to fully utilize their resources. In this paper we present performance benchmarks and explore the Intel® Skylake and Intel® Knights Landing architectures in depth. We determine the performance gain over previous architectures and show that the efficiency of our implementation is close to the maximum attainable given the mathematical formulation of our problem.
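As background, the following Python sketch shows the generic predict and update steps of a linear Kalman filter for a straight-line track with state (x, tx), fitted to position measurements taken along z. It is a textbook formulation for illustration only, not the Cross Kalman implementation; the straight-line model, the vague slope prior, the optional slope process noise `q_tx` and the hit resolution `sigma_x` are all assumptions, and the helper names are hypothetical.

```python
from dataclasses import dataclass

Matrix = list[list[float]]


@dataclass
class TrackState:
    x: float     # position
    tx: float    # slope dx/dz
    cov: Matrix  # 2x2 covariance of (x, tx)


def predict(state: TrackState, dz: float, q_tx: float = 0.0) -> TrackState:
    """Propagate a straight-line state by dz: x' = x + tx*dz, tx' = tx.

    Computes F C F^T (+ Q) with F = [[1, dz], [0, 1]]; q_tx is a
    hypothetical process noise on the slope (e.g. multiple scattering).
    """
    (c00, c01), (c10, c11) = state.cov
    n00 = c00 + dz * (c01 + c10) + dz * dz * c11
    n01 = c01 + dz * c11
    n10 = c10 + dz * c11
    n11 = c11 + q_tx
    return TrackState(state.x + state.tx * dz, state.tx, [[n00, n01], [n10, n11]])


def update(state: TrackState, measured_x: float, sigma_x: float) -> TrackState:
    """Combine the predicted state with a measurement of x (H = [1, 0])."""
    (c00, c01), (c10, c11) = state.cov
    residual = measured_x - state.x
    s = c00 + sigma_x ** 2          # variance of the residual
    k0, k1 = c00 / s, c10 / s       # Kalman gain
    new_cov = [[(1 - k0) * c00, (1 - k0) * c01],   # (I - K H) C
               [c10 - k1 * c00, c11 - k1 * c01]]
    return TrackState(state.x + k0 * residual, state.tx + k1 * residual, new_cov)


def fit(hits: list[tuple[float, float]], sigma_x: float = 0.1) -> TrackState:
    """Fit (z, x) hits in order of increasing z; returns the state at the last hit."""
    z_prev, x0 = hits[0]
    # Initialise from the first hit with a vague (assumed) prior on the slope.
    state = TrackState(x0, 0.0, [[sigma_x ** 2, 0.0], [0.0, 1e4]])
    for z, x in hits[1:]:
        state = update(predict(state, z - z_prev), x, sigma_x)
        z_prev = z
    return state


final = fit([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)])
print(round(final.x, 3), round(final.tx, 3))
```

Because each update depends on the result of the previous one, the steps for a single track are sequential; a common way to exploit SIMD or many-core hardware is therefore to fit many independent tracks in parallel.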
The LHCb DAQ Network is a real-time, high-performance network in which 350 data sources send data over a Gigabit Ethernet LAN to more than 1500 receiving nodes. The aggregate throughput of this application, called Event Building, exceeds 60 Gbps. In the protocol employed by LHCb, all sending nodes simultaneously transmit their portions of an event to one receiving node at a time, which is selected using a credit-token scheme. The resulting traffic is very bursty and sensitive to irregularities in the temporal distribution of packet bursts to the same destination or region of the network. To study the relevant properties of such a dataflow, a non-disruptive monitoring setup based on a network-capable FPGA (NetFPGA) has been deployed. The NetFPGA can time-stamp packets with a precision of the order of a hundred nanoseconds. We study in detail the timing structure of the Event Building communication, and we identify potential effects of micro-bursts, such as packet drops in buffers or jitter.
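A credit-token scheme of the kind mentioned above can be pictured with a minimal Python sketch. It assumes a single central manager and is not the LHCb implementation: each receiver grants credits (free event buffers), and each event is assigned to a receiver holding at least one credit, taken in round-robin order. The class and method names are hypothetical. Because every source then sends its fragment of that event to the same receiver at once, the traffic converges on one destination, which is where the burstiness comes from.

```python
from collections import deque
from typing import Optional


class CreditManager:
    """Toy credit-token scheduler: a receiver may be assigned an event
    only while it holds at least one credit (a free event buffer).

    Invariant: a receiver is in the ready queue iff it has credits > 0.
    """

    def __init__(self) -> None:
        self._credits: dict[str, int] = {}   # receiver -> available credits
        self._ready: deque[str] = deque()    # receivers with credit, round-robin order

    def grant(self, receiver: str, n: int = 1) -> None:
        """Receiver announces n additional free buffers."""
        if n <= 0:
            raise ValueError("a grant must add at least one credit")
        if self._credits.get(receiver, 0) == 0:
            self._ready.append(receiver)     # becomes eligible again
        self._credits[receiver] = self._credits.get(receiver, 0) + n

    def assign(self) -> Optional[str]:
        """Choose the destination of the next event, or None if no receiver has credit."""
        if not self._ready:
            return None
        receiver = self._ready.popleft()
        self._credits[receiver] -= 1         # one buffer consumed by this event
        if self._credits[receiver] > 0:
            self._ready.append(receiver)     # still has credit: back of the queue
        return receiver


mgr = CreditManager()
mgr.grant("node-a", 2)
mgr.grant("node-b", 1)
print([mgr.assign() for _ in range(4)])  # ['node-a', 'node-b', 'node-a', None]
```

Tying assignments to credits means a receiver is never sent an event it has no buffer for; the cost is that all senders target the same node simultaneously, producing the micro-bursts that the monitoring setup is designed to observe.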
The LHCb Data Acquisition during LHC Run 1
Alessio, F; Brarda, L; Bonaccorsi, E ...
Journal of Physics: Conference Series, 01/2014, Volume 513, Issue 1
Journal Article; peer reviewed; open access
The LHCb Data Acquisition system reads data from over 300 read-out boards and distributes them to more than 1500 event-filter servers. It uses a simple push protocol over Gigabit Ethernet. After filtering, the data is consolidated into files for permanent storage using a SAN-based storage system. Since the beginning of data taking, many lessons have been learned, and the reliability and robustness of the system have been greatly improved. We report on these changes and improvements, their motivation, and how we intend to develop the system for Run 2. We also report on how we try to optimise the usage of CPU resources during LHC running ("deferred triggering") and on the implications for the data acquisition.
During data taking at the LHC at CERN, millions of collisions are recorded every second by the LHCb detector. The LHCb Online computing farm, comprising around 15000 cores, is dedicated to reconstructing events in real time in order to select those of physics interest. The selected events are later analysed offline, more precisely, on the Grid. This imposes very stringent requirements on the reconstruction software, which has to be as efficient as possible. Modern CPUs support so-called vector extensions to their instruction sets, which allow a single instruction to operate on several data elements at once. Several libraries expose this Single Instruction, Multiple Data (SIMD) programming paradigm for issuing these instructions. Using vectorisation in our codebase can provide performance boosts, ultimately leading to improvements in physics reconstruction. In this paper, we present vectorisation studies of significant reconstruction algorithms. A variety of vectorisation libraries are analysed and compared in terms of design, maintainability and performance. We also present the steps taken to systematically measure the performance of the released software, to ensure that the run time of the vectorised software remains consistent.
We describe a fully GPU-based implementation of the first-level trigger for the upgrade of the LHCb detector, due to start data taking in 2021. We demonstrate that our implementation, named Allen, can process the 40 Tbit/s data rate of the upgraded LHCb detector and perform a wide variety of pattern recognition tasks. These include finding the trajectories of charged particles, finding proton–proton collision points, identifying particles as hadrons or muons, and finding the displaced decay vertices of long-lived particles. We further demonstrate that Allen can be implemented on around 500 scientific or consumer GPU cards, that it is not I/O bound, and that it can be operated at the full LHC collision rate of 30 MHz. Allen is the first complete high-throughput GPU trigger proposed for a HEP experiment.
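The figures quoted in this abstract (40 Tbit/s, 30 MHz, around 500 GPUs) imply an average event size and a per-GPU load that a few lines of arithmetic can check. This Python sketch assumes the load is spread evenly across the GPUs and ignores protocol overhead; the function name `per_gpu_budget` is hypothetical.

```python
def per_gpu_budget(total_bps: float, collision_rate_hz: float, n_gpus: int) -> dict[str, float]:
    """Split an aggregate input rate evenly across n_gpus (assumes perfect balancing)."""
    bits_per_event = total_bps / collision_rate_hz
    return {
        "kbytes_per_event": bits_per_event / 8 / 1e3,       # average event size in kB
        "gbps_per_gpu": total_bps / n_gpus / 1e9,           # input bandwidth per GPU
        "events_per_s_per_gpu": collision_rate_hz / n_gpus, # event rate per GPU
    }


budget = per_gpu_budget(40e12, 30e6, 500)
for key, value in budget.items():
    print(f"{key}: {value:.1f}")
```

With these inputs the sketch gives roughly 166.7 kB per event, 80 Gbit/s and 60 kHz of events per GPU, which is the scale of work each card would have to sustain on average.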