A new gridding technique for the solution of partial differential equations in spherical geometry is presented. The method is based on a decomposition of the sphere into six identical regions, obtained by projecting the sides of a circumscribed cube onto a spherical surface. By choosing the coordinate lines on each region to be arcs of great circles, one obtains six coordinate systems which are free of any singularity and define the same metric. Taking full advantage of the symmetry properties of the decomposition, a variation of the composite mesh finite difference method can be applied to couple the six grids and obtain, with a high degree of efficiency, very accurate numerical solutions of partial differential equations on the sphere. The advantages of this new technique over both spectral and uniform longitude–latitude grid point methods are discussed in the context of applications on serial and parallel architectures. We present results of two test cases for numerical approximations to the shallow water equations in spherical geometry: the linear advection of a cosine bell and the nonlinear evolution of a Rossby–Haurwitz wave. Performance analysis for this latter case indicates that the new method can provide, with substantial savings in execution times, numerical solutions which are as accurate as those obtainable with the spectral transform method.
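The projection described in this abstract can be illustrated with a minimal Python sketch. It assumes an equiangular parameterisation, in which each face carries two angles `xi` and `eta` in [-π/4, π/4]; the face labels, the choice of tangent axes and the function names are hypothetical and are not taken from the paper. A line of constant `xi` lies in a plane through the centre of the sphere, so it is an arc of a great circle.

```python
import math

# Outward normal and two in-face tangent axes for each face of the cube.
# The labels and the orientation of the tangent axes are illustrative choices.
FACES = {
    "+x": ((1, 0, 0), (0, 1, 0), (0, 0, 1)),
    "-x": ((-1, 0, 0), (0, -1, 0), (0, 0, 1)),
    "+y": ((0, 1, 0), (-1, 0, 0), (0, 0, 1)),
    "-y": ((0, -1, 0), (1, 0, 0), (0, 0, 1)),
    "+z": ((0, 0, 1), (0, 1, 0), (-1, 0, 0)),
    "-z": ((0, 0, -1), (0, 1, 0), (1, 0, 0)),
}


def face_to_sphere(face, xi, eta):
    """Map angles (xi, eta) in [-pi/4, pi/4] on one cube face to a unit-sphere point."""
    n, u, v = FACES[face]
    a, b = math.tan(xi), math.tan(eta)
    # Point on the cube face, then central projection onto the unit sphere.
    p = [n[i] + a * u[i] + b * v[i] for i in range(3)]
    r = math.sqrt(sum(c * c for c in p))
    return tuple(c / r for c in p)


def face_grid(face, n):
    """Return an (n + 1) x (n + 1) grid of sphere points, equally spaced in angle."""
    step = (math.pi / 2) / n
    start = -math.pi / 4
    return [
        [face_to_sphere(face, start + i * step, start + j * step) for j in range(n + 1)]
        for i in range(n + 1)
    ]
```

Calling `face_grid(name, n)` for each of the six keys of `FACES` covers the whole sphere with no polar singularity; neighbouring faces share the points along their common edge.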
The use of GPUs to implement general purpose computational tasks, known as GPGPU for the past fifteen years, has reached maturity. Applications take advantage of the parallel architectures of these devices in many different domains. Over the last few years several works have demonstrated the effectiveness of the integration of GPU-based systems in the high level trigger of various HEP experiments. On the other hand, the use of GPUs in the DAQ and low level trigger systems, characterized by stringent real-time constraints, poses several challenges. In order to achieve such a goal we devised NaNet, an FPGA-based PCI-Express Network Interface Card design capable of direct (zero-copy) data transfer with CPU and GPU (GPUDirect) while processing incoming and outgoing data streams online. The board also supports multiple link technologies (1/10/40GbE and custom ones). The validity of our approach has been tested in the context of the NA62 CERN experiment, harnessing the computing power of latest-generation NVIDIA Pascal GPUs and of the FPGA hosted by NaNet to build in real time refined physics-related primitives for the RICH detector (i.e. the Cherenkov ring parameters) that enable the building of more stringent conditions for data selection in the low level trigger.
Graphical processors for HEP trigger systems Ammendola, R.; Biagioni, A.; Chiozzi, S. ...
Nuclear instruments & methods in physics research. Section A, Accelerators, spectrometers, detectors and associated equipment,
02/2017, Volume 845
Journal Article
Peer reviewed
General-purpose computing on GPUs is emerging as a new paradigm in several fields of science, although so far applications have been tailored to employ GPUs as accelerators in offline computations. With the steady decrease of GPU latencies and the increase in link and memory throughputs, the time is ripe for real-time applications using GPUs in high-energy physics data acquisition and trigger systems. We will discuss the use of online parallel computing on GPUs for synchronous low level trigger systems, focusing on tests performed on the trigger of the CERN NA62 experiment. Latencies of all components need analysing, networking being the most critical. To keep it under control, we envisioned NaNet, an FPGA-based PCIe Network Interface Card (NIC) enabling GPUDirect connection. Moreover, we discuss how specific trigger algorithms can be parallelised and thus benefit from a GPU implementation, in terms of increased execution speed. Such improvements are particularly relevant for the foreseen LHC luminosity upgrade where highly selective algorithms will be crucial to maintain sustainable trigger rates with very high pileup.
Graphics Processing Units for HEP trigger systems Ammendola, R.; Bauce, M.; Biagioni, A. ...
Nuclear instruments & methods in physics research. Section A, Accelerators, spectrometers, detectors and associated equipment,
07/2016, Volume 824
Journal Article
Peer reviewed
General-purpose computing on GPUs (Graphics Processing Units) is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughput, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming feasible. We will discuss the use of online parallel computing on GPUs for synchronous low level triggers, focusing on the trigger system of the CERN NA62 experiment. The use of GPUs in higher level trigger systems is also briefly considered.
In an attempt to develop an interconnection architecture optimized for hybrid HPC systems dedicated to scientific computing, we designed APEnet+, a point-to-point, low-latency and high-performance network controller supporting 6 fully bidirectional off-board links over a 3D torus topology. The first release of APEnet+ (named V4) was a board based on a 40 nm Altera FPGA, integrating 6 channels at 34 Gbps of raw bandwidth per direction and a PCIe Gen2 x8 host interface. It was the first device of its kind to implement an RDMA protocol to directly read and write data from and to Fermi and Kepler NVIDIA GPUs using NVIDIA peer-to-peer and GPUDirect RDMA protocols, obtaining real zero-copy GPU-to-GPU transfers over the network. The latest generation of APEnet+ systems (now named V5) implements a PCIe Gen3 x8 host interface on a 28 nm Altera Stratix V FPGA, with multi-standard fast transceivers (up to 14.4 Gbps) and an increased amount of configurable internal resources and hardware IP cores to support the main standard interconnection protocols. Herein we present the APEnet+ V5 architecture, the status of its hardware and its system software design. Both its Linux Device Driver and the low-level libraries have been redeveloped to support the PCIe Gen3 protocol, introducing optimizations and solutions based on hardware/software co-design.
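To illustrate why a 3D torus needs exactly six links per node, the following minimal Python sketch computes the neighbours of a node and the shortest hop count between two nodes. The coordinate convention and the function names are assumptions made for illustration; they do not describe the APEnet+ routing logic.

```python
def torus_neighbours(node, dims):
    """Return the six neighbours of node = (x, y, z) in a 3D torus of size dims."""
    x, y, z = node
    nx, ny, nz = dims
    # One link in each direction along each axis; the modulo gives the wraparound.
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


def hop_distance(a, b, dims):
    """Minimum number of hops between nodes a and b, using wraparound links."""
    return sum(min((p - q) % n, (q - p) % n) for p, q, n in zip(a, b, dims))
```

In a 4 × 4 × 4 torus, for example, node (0, 0, 0) is directly linked to (3, 0, 0) through the wraparound link, so no pair of nodes is more than two hops apart along any single axis.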
With processor architecture evolution, the HPC market has undergone a paradigm shift. The adoption of low-cost, Linux-based clusters extended the reach of HPC from its roots in modelling and simulation of complex physical systems to a broader range of industries, from biotechnology, cloud computing, computer analytics and big data challenges to manufacturing sectors. In this perspective, the near-future HPC systems can be envisioned as composed of millions of low-power computing cores, densely packed - meaning cooling by appropriate technology - with a tightly interconnected, low latency and high performance network and equipped with a distributed storage architecture. Each of these features - dense packing, distributed storage and high performance interconnect - represents a challenge, made all the harder by the need to solve them at the same time. These challenges lie as stumbling blocks along the road towards Exascale-class systems; the ExaNeSt project acknowledges them and tasks itself with investigating ways around them.
NaNet is an FPGA-based PCIe Gen2 x8 NIC supporting 1/10 GbE links and the custom 34 Gbps APElink channel. The design has GPUDirect RDMA capabilities and features a network stack protocol offloading module, making it suitable for building low-latency, real-time GPU-based computing systems. We provide a detailed description of the NaNet hardware modular architecture. Benchmarks for latency and bandwidth for GbE and APElink channels are presented, followed by a performance analysis on the case study of the GPU-based low level trigger for the RICH detector in the NA62 CERN experiment, using either the NaNet GbE or APElink channels. Finally, we give an outline of the project's future activities.
A GPU-based low level (L0) trigger is currently integrated in the experimental setup of the RICH detector of the NA62 experiment to assess the feasibility of building more refined physics-related trigger primitives and thus improve the trigger discriminating power. To ensure the real-time operation of the system, a dedicated data transport mechanism has been implemented: an FPGA-based Network Interface Card (NaNet-10) receives data from detectors and forwards them with low, predictable latency to the memory of the GPU performing the trigger algorithms. Results on the reconstruction of ring-shaped hit patterns will be reported and discussed.
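As a rough illustration of what reconstructing a ring-shaped hit pattern involves, the sketch below fits a single circle to a list of hit positions with a generic algebraic least-squares fit. This is a textbook technique chosen for brevity and is not claimed to be the algorithm run on the NA62 GPUs; the function names and the `(x, y)` hit format are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_ring(hits):
    """Fit x^2 + y^2 + D*x + E*y + F = 0 to (x, y) hits; return (cx, cy, radius)."""
    n = len(hits)
    if n < 3:
        raise ValueError("at least three hits are needed")
    sx = sum(x for x, _ in hits)
    sy = sum(y for _, y in hits)
    sxx = sum(x * x for x, _ in hits)
    syy = sum(y * y for _, y in hits)
    sxy = sum(x * y for x, y in hits)
    sz = sum(x * x + y * y for x, y in hits)
    sxz = sum(x * (x * x + y * y) for x, y in hits)
    syz = sum(y * (x * x + y * y) for x, y in hits)
    # Normal equations of the linear least-squares problem in (D, E, F).
    a = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [-sxz, -syz, -sz]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("hits are collinear or degenerate")

    def with_column(col):
        # Copy of the matrix with one column replaced by rhs (Cramer's rule).
        return [[rhs[i] if j == col else a[i][j] for j in range(3)] for i in range(3)]

    big_d, big_e, big_f = (det3(with_column(c)) / d for c in range(3))
    cx, cy = -big_d / 2, -big_e / 2
    return cx, cy, math.sqrt(cx * cx + cy * cy - big_f)
```

Because the fit reduces to a handful of sums and one small linear solve per ring, many candidate rings can be processed independently, which is the kind of workload that maps well onto a GPU.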
High-sensitivity double-gap phenolic resistive plate chambers (RPCs) are studied for the Phase-2 upgrade of the Compact Muon Solenoid (CMS) muon system at high pseudorapidity $\eta$. Whereas the present CMS RPCs have a gas gap thickness of 2 mm, we propose to use thinner gas gaps, which will improve the performance of these RPCs. To validate this proposal, we constructed double-gap RPCs with two different gap thicknesses of 1.2 and 1.4 mm by using high-pressure laminated plates having a mean resistivity of about $5 \times 10^{10}$ Ω-cm. This paper presents test results using cosmic muons and $^{137}$Cs gamma rays. The rate capabilities of these thin-gap RPCs measured with the gamma source exceed the maximum rate expected in the new high-$\eta$ endcap RPCs planned for future Phase-2 runs of the Large Hadron Collider (LHC).