In the attempt to develop an interconnection architecture optimized for hybrid HPC systems dedicated to scientific computing, we designed APEnet+, a point-to-point, low-latency and high-performance network controller supporting 6 fully bidirectional off-board links over a 3D torus topology. The first release of APEnet+ (named V4) was a board based on a 40 nm Altera FPGA, integrating 6 channels at 34 Gbps of raw bandwidth per direction and a PCIe Gen2 x8 host interface. It was the first device of its kind to implement an RDMA protocol to directly read and write data from and to Fermi and Kepler NVIDIA GPUs using the NVIDIA peer-to-peer and GPUDirect RDMA protocols, achieving true zero-copy GPU-to-GPU transfers over the network. The latest generation of APEnet+ systems (now named V5) implements a PCIe Gen3 x8 host interface on a 28 nm Altera Stratix V FPGA, with multi-standard fast transceivers (up to 14.4 Gbps) and an increased amount of configurable internal resources and hardware IP cores to support the main interconnection standard protocols. Herein we present the APEnet+ V5 architecture, the status of its hardware and its system software design. Both its Linux device driver and the low-level libraries have been redeveloped to support the PCIe Gen3 protocol, introducing optimizations and solutions based on hardware/software co-design.
The use of GPUs to implement general-purpose computational tasks, a practice known as GPGPU for some fifteen years now, has reached maturity. Applications take advantage of the parallel architectures of these devices in many different domains. Over the last few years several works have demonstrated the effectiveness of integrating GPU-based systems in the high-level trigger of various HEP experiments. On the other hand, the use of GPUs in DAQ and low-level trigger systems, characterized by stringent real-time constraints, poses several challenges. In order to achieve such a goal we devised NaNet, an FPGA-based PCI-Express Network Interface Card design capable of direct (zero-copy) data transfer with CPU and GPU (GPUDirect) while processing incoming and outgoing data streams online. The board also provides support for multiple link technologies (1/10/40GbE and custom ones). The validity of our approach has been tested in the context of the NA62 CERN experiment, harvesting the computing power of last-generation NVIDIA Pascal GPUs and of the FPGA hosted by NaNet to build, in real time, refined physics-related primitives for the RICH detector (i.e. the Cerenkov ring parameters) that enable the building of more stringent conditions for data selection in the low-level trigger.
Graphical processors for HEP trigger systems
Ammendola, R.; Biagioni, A.; Chiozzi, S.; et al.
Nuclear Instruments & Methods in Physics Research, Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 02/2017, Volume 845
Journal Article · Peer-reviewed · Open access
General-purpose computing on GPUs is emerging as a new paradigm in several fields of science, although so far applications have been tailored to employ GPUs as accelerators in offline computations. With the steady decrease of GPU latencies and the increase in link and memory throughputs, the time is ripe for real-time applications using GPUs in high-energy physics data acquisition and trigger systems. We will discuss the use of online parallel computing on GPUs for synchronous low-level trigger systems, focusing on tests performed on the trigger of the CERN NA62 experiment. The latencies of all components need analysing, networking being the most critical. To keep it under control, we envisioned NaNet, an FPGA-based PCIe Network Interface Card (NIC) enabling a GPUDirect connection. Moreover, we discuss how specific trigger algorithms can be parallelised and thus benefit from a GPU implementation in terms of increased execution speed. Such improvements are particularly relevant for the foreseen LHC luminosity upgrade, where highly selective algorithms will be crucial to maintain sustainable trigger rates with very high pileup.
With processor architecture evolution, the HPC market has undergone a paradigm shift. The adoption of low-cost, Linux-based clusters extended the reach of HPC from its roots in modelling and simulation of complex physical systems to a broader range of industries, from biotechnology, cloud computing, computer analytics and big data challenges to manufacturing sectors. In this perspective, near-future HPC systems can be envisioned as composed of millions of low-power computing cores, densely packed (demanding appropriate cooling technology), with a tightly interconnected, low-latency and high-performance network, and equipped with a distributed storage architecture. Each of these features (dense packing, distributed storage and high-performance interconnect) represents a challenge, made all the harder by the need to solve them at the same time. These challenges lie as stumbling blocks along the road towards Exascale-class systems; the ExaNeSt project acknowledges them and tasks itself with investigating ways around them.
A commercial Graphics Processing Unit (GPU) is used to build a fast Level 0 (L0) trigger system tested parasitically with the TDAQ (Trigger and Data Acquisition) systems of the NA62 experiment at CERN. In particular, the parallel computing power of the GPU is exploited to perform real-time fitting in the Ring Imaging CHerenkov (RICH) detector. Direct GPU communication using an FPGA-based board has been used to reduce the data transmission latency. The performance of the system for multi-ring reconstruction obtained during the NA62 physics run will be presented.
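The per-ring computation at the heart of such a system can be illustrated with an algebraic (Kasa-style) least-squares circle fit, which recovers a ring's center and radius from hit coordinates in closed form, making it well suited to parallel execution. The sketch below is ours, not the experiment's actual GPU kernel, and uses NumPy for brevity:

```python
import numpy as np

def fit_ring(x, y):
    """Kasa algebraic circle fit: linearize (x-a)^2 + (y-b)^2 = r^2
    into 2*a*x + 2*b*y + c = x^2 + y^2 with c = r^2 - a^2 - b^2,
    then solve the overdetermined system by least squares."""
    A = np.column_stack([2.0 * x, 2.0 * y, np.ones_like(x)])
    rhs = x ** 2 + y ** 2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return a, b, np.sqrt(c + a ** 2 + b ** 2)
```

On a GPU, many candidate rings can be fitted concurrently, one thread block per ring, since each fit reduces to a small independent linear solve.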
Abstract
BACKGROUND
Glioblastoma Multiforme (GBM) is one of the most devastating cancers known. Despite decades of research, we still lack an efficient treatment. The heterogeneity in cell-type composition, along with the presence of a subpopulation of cells with high tumorigenic capacity named glioblastoma stem cells (GSCs), makes GBM extremely hard to treat. An extensive body of work supports the hypothesis that an aberrant functional expression of membrane ion channels mediates the progression of solid tumors. Potassium, calcium, and chloride channels have been largely correlated with carcinogenesis. However, little is known about voltage-gated sodium channels (Nav) in GBM. In fact, the role of this membrane ionic permeability in GBM progression and relapse has yet to be unveiled.
MATERIAL AND METHODS
Experiments were performed on human GSCs obtained from surgical specimens at the Neurosurgery Department of IRCCS-AOU San Marino IST (Genova, Italy), from patients who had not received therapy before intervention. The mRNA profile of the cell lines, as well as the transcript expression of stemness markers, was evaluated both in control conditions and in the presence of the Nav channel blocker Tetrodotoxin (TTX, 30 μM). The protein content of the stemness markers and their intermediates was quantified by Western blot analysis. Nav-mediated inward currents were recorded from single cells and measured in voltage clamp by applying consecutive voltage steps of +10 mV from a holding potential of -70 mV up to +60 mV. The transient inward current was calculated at the peak by subtracting the baseline leak current. Current density (pA/pF) was calculated as the ratio between the peak current recorded at +20 mV and the capacitance of the cell. The resting membrane potential (RMP) was also assessed in each recorded cell.
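The current-density figure described above reduces to a simple normalization: peak current minus baseline leak, divided by cell capacitance. A minimal sketch (the function name and example values are illustrative, not measured data):

```python
def current_density(peak_current_pA, leak_current_pA, capacitance_pF):
    # Transient inward current is the peak minus the baseline leak (pA);
    # dividing by the cell capacitance normalizes it to density (pA/pF),
    # allowing comparison across cells of different sizes.
    return (peak_current_pA - leak_current_pA) / capacitance_pF

# e.g. a -450 pA peak with -50 pA leak in a 20 pF cell gives -20 pA/pF
density = current_density(-450.0, -50.0, 20.0)
```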
RESULTS
We identified a subpopulation of GBM cells (of the GBM Proneural subtype) that functionally expresses TTX-sensitive inward currents. Transcriptomic investigation revealed significant expression of SCN1A mRNA. We showed that Nav density positively correlates with the RMP in GBM cells. Additionally, Nav blockade promotes glioma cell proliferation and G1/S accumulation. Pharmacological blockade of Nav-mediated currents had a significant impact on some stemness markers at both the mRNA and protein expression levels. A regulatory downstream pathway modulated by Nav has also been investigated.
CONCLUSION
Our evidence suggests that Nav-mediated currents are significantly expressed in a subpopulation of GBM cells. As a result, the present study intends to demonstrate that Nav plays a fundamental role during GBM progression and correlates with tumor resistance to treatments.
NaNet is a modular design of a family of FPGA-based PCIe Network Interface Cards specialized for low-latency real-time operations. NaNet features a Network Interface module that implements RDMA-style communications with both the host (CPU) and the GPU accelerators' memories (GPUDirect P2P/RDMA), relying on the services of a high-performance PCIe Gen3 x8 core. The NaNet I/O Interface is highly flexible and is designed for low and predictable communication latency: a dedicated stage manages the network stack protocol in the FPGA logic, offloading this task from the host operating system and thus eliminating the associated process jitter effects. Between the two aforementioned modules stand the data processing and switch modules: the first implements application-dependent processing on streams (e.g. performing compression algorithms), while the second routes data streams between the I/O channels and the Network Interface module. This general architecture has been specialized so far into three configurations, namely NaNet-1, NaNet3 and NaNet-10, in order to meet the requirements of different experimental setups: NaNet-1 features a GbE channel plus three custom 34 Gbps serial channels and is implemented on the Altera Stratix IV FPGA Development Kit; NaNet3 is implemented on the Terasic DE5-NET Stratix V FPGA development board and supports four custom 2.5 Gbps deterministic-latency optical channels; NaNet-10 features four 10GbE SFP+ ports and is also implemented on the Terasic DE5-NET board. We will provide performance results for the three NaNet implementations and describe their usage in the CERN NA62 and KM3NeT-IT underwater neutrino telescope experiments, showing that the architecture is very flexible and yet capable of matching the requirements of low-latency real-time applications with intensive I/O tasks involving the CPU and/or the GPU accelerators.
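The data flow through the processing and switch stages described above can be caricatured in software, although the real logic lives in FPGA hardware; all names and the queue-based model below are illustrative only:

```python
def process(payload):
    # Placeholder for the application-dependent processing stage
    # (e.g. a compression algorithm applied to the data stream).
    return payload

def route(packet, queues):
    # Switch stage: deliver the processed payload to the queue of the
    # Network Interface endpoint it is destined for (host or GPU memory).
    queues[packet["dest"]].append(process(packet["payload"]))

# One queue per destination memory target, drained by the RDMA engine.
queues = {"cpu": [], "gpu": []}
route({"dest": "gpu", "payload": b"rich-hits"}, queues)
```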
The KM3NeT-Italia underwater neutrino detection unit, the tower, consists of 14 floors. Each floor supports 6 Optical Modules, containing the front-end electronics needed to digitize the PMT signals and to format and transmit the data, plus 2 hydrophones used to reconstruct the position of the Optical Modules in real time, for a maximum tower throughput of more than 600 MB/s. All floor data are collected by the Floor Control Module (FCM) board and transmitted by optical, bidirectional, virtual point-to-point connections to the on-shore laboratory, each FCM needing an on-shore counterpart as communication endpoint. In this contribution we present NaNet3, an on-shore readout board based on an Altera Stratix V GX FPGA, able to manage multiple FCM data channels with a capability of 800 Mbps each. The design is a NaNet customization for the KM3NeT-Italia experiment, adding support in its I/O interface for a synchronous link protocol with deterministic latency at the physical level and for a Time Division Multiplexing protocol at the data level.
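At the data level, a Time Division Multiplexing scheme of the kind mentioned above can be sketched as a round-robin regrouping of frame slots per channel. The sketch below is illustrative only: the slot ordering is assumed, and the default channel count merely echoes the 14-floor tower described here, not the actual NaNet3 frame format:

```python
def tdm_demux(frame_slots, n_channels=14):
    # Data-level TDM (assumed round-robin layout): consecutive slots in
    # a frame carry data from consecutive floor channels, so slot i
    # belongs to channel i mod n_channels; regroup the slots per channel.
    channels = [[] for _ in range(n_channels)]
    for i, slot in enumerate(frame_slots):
        channels[i % n_channels].append(slot)
    return channels
```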