This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) for realising image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state of the art in image processing using FPGAs.
Fuzzing is a well-established technique in the software domain to uncover bugs and vulnerabilities. Yet, applications of fuzzing for security vulnerabilities in hardware systems are scarce, principally because they require access to design information, i.e., HDL source code. Moreover, observation of internal hardware state during runtime is typically an ineffective information source, as its documentation is often not publicly available. In addition, such observation during runtime is also inefficient due to bandwidth-limited analysis interfaces, i.e., JTAG, and minimal introspection of hardware-internal modules. In this work, we investigate fuzzing for Xilinx 7-Series and UltraScale(+) FPGA configuration engines, the control plane governing the (secure) bitstream configuration within the FPGA. Our goal is to examine the effectiveness of fuzzing to analyze and document the opaque inner workings of FPGA configuration engines, with a primary emphasis on identifying security vulnerabilities. Using only the publicly available hardware chip and dispersed documentation, we first design and implement ConFuzz, an advanced FPGA configuration engine fuzzing and rapid prototyping framework. Based on our detailed understanding of the bitstream file format, we then systematically define three novel key fuzzing strategies for Xilinx FPGA configuration engines. Moreover, our strategies are executed through mutational structure-aware fuzzers and incorporate various novel custom-tailored, FPGA-specific optimizations to reduce the search space. Our evaluation reveals previously undocumented behavior within the configuration engine, including critical findings such as system crashes leading to unresponsive states of the whole FPGA. In addition, our investigations not only lead to the rediscovery of the recent Starbleed attack but also uncover a novel unpatchable vulnerability, denoted as JustSTART (CVE-2023-20570), capable of circumventing RSA authentication for Xilinx UltraScale(+). We also discuss effective countermeasures, based on secure FPGA settings, to prevent the aforementioned attacks.
From edge devices to cloud servers, providing optimized hardware acceleration for specific applications has become a key approach to improve the efficiency of computer systems. Traditionally, many systems employ commercial field-programmable gate arrays (FPGAs) to implement dedicated hardware accelerators as the CPU's co-processor. However, commercial FPGAs are designed with generic architectures and are provided as discrete chips, which makes it difficult to meet increasingly diversified market needs, such as balancing reconfigurable hardware resources for a specific application, or integrating the FPGA into a customer's system-on-a-chip (SoC) as an embedded FPGA (eFPGA). In this paper, we propose an eFPGA generation suite with a customizable architecture and an integrated development environment (IDE), which covers the entire eFPGA design generation, testing, and utilization stages. For eFPGA design generation, our intellectual property (IP) generation flow can explore the optimal logic cell, routing, and array structures for given target applications. For testability, we employ a previously proposed shipping test method that is 100% accurate at detecting all stuck-at faults in the entire FPGA-IP. In addition, we propose a user-friendly and customizable Web-based IDE framework for the generated eFPGA based on the Node-RED development framework. In the case study, we show an eFPGA architecture exploration example for a differential privacy encryption application using the proposed suite. We then show the implementation and evaluation of the eFPGA prototype with a 55 nm test element group chip design.
The Field Programmable Gate Array (FPGA) was introduced in 1985 and is now widely used in everyday applications. An FPGA is a semiconductor device built around configurable logic blocks (CLBs) connected by an interconnect matrix, which makes the FPGA powerful and flexible. Compared with an Application Specific Integrated Circuit (ASIC), an FPGA has a simpler design cycle: a design written in a hardware description language (HDL) such as Verilog or VHDL can be created faster and more efficiently, and it requires less manufacturing time. Compared with a microcontroller, an FPGA offers more flexibility and higher processing speed. This paper discusses the classification of FPGA routing architectures, namely island-style FPGAs and hierarchical FPGAs, and reviews recent FPGA applications in security systems, video and image processing, and medical electronics.
An Integrated System for Basic Eye Movements
Song, Yang; Zhang, Xiaolin
The Journal of The Institute of Image Information and Television Engineers, 2012, Volume 66, Issue 11
A human-like active binocular vision system, inspired by binocular eye movements in animals, would help robots with automatic fast target switching, smooth target pursuit and efficient visual stabilization. In this paper, a control model that integrates saccadic eye movement, smooth pursuit eye movement, the vestibulo-ocular reflex and the optokinetic response is proposed. The control interface of the model has been simplified to one external saccadic command input. Using this target selection command, target switching, target pursuit and visual stabilization of the cameras run automatically. To implement the system with parallel processing, like that used in neural networks, the control model and multi-motor control are implemented in an FPGA chip. Finally, the proposed model was tested using an image processing PC and a binocular robot head, and the results show the high efficiency of this control model.
Deep neural networks (DNNs) have attracted significant attention for their excellent accuracy, especially in areas such as computer vision and artificial intelligence. To enhance their performance, technologies for their hardware acceleration are being studied. FPGA technology is a promising choice for hardware acceleration, given its low power consumption and high flexibility, which make it particularly suitable for embedded systems. However, complex DNN models may need more computing and memory resources than those available in many current FPGAs. This paper presents FP-BNN, a binarized neural network (BNN) for FPGAs, which drastically cuts down the hardware consumption while maintaining acceptable accuracy. We introduce a Resource-Aware Model Analysis (RAMA) method, and remove the bottleneck involving multipliers by bit-level XNOR and shifting operations, and the bottleneck of parameter access by data quantization and optimized on-chip storage. We evaluate the FP-BNN accelerator designs for MNIST multi-layer perceptrons (MLP), CIFAR-10 ConvNet, and AlexNet on a Stratix-V FPGA system. An inference performance on the order of tera operations per second with acceptable accuracy loss is obtained, which shows improvement in speed and energy efficiency over other computing platforms.
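The idea of replacing multipliers with bit-level XNOR operations can be illustrated with a small software sketch. It assumes activations and weights are restricted to +1/-1 and packed one bit per value (bit set for +1); the function names `pack_bits` and `xnor_popcount_dot` are hypothetical, and this is a generic illustration of the XNOR-popcount technique rather than the FP-BNN hardware design.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int; bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two length-n +1/-1 vectors given in packed form.

    XNOR marks positions where the signs agree (product +1); the remaining
    positions disagree (product -1), so dot = matches - (n - matches).
    """
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    agree = ~(a_bits ^ w_bits) & mask   # XNOR, truncated to n bits
    matches = bin(agree).count("1")     # population count
    return 2 * matches - n


# Usage: compare against the ordinary multiply-accumulate result.
activations = [1, -1, 1, 1]
weights = [1, 1, -1, 1]
packed = xnor_popcount_dot(pack_bits(activations), pack_bits(weights), len(weights))
reference = sum(a * w for a, w in zip(activations, weights))
print(packed, reference)
```

In hardware, the XNOR and population count map onto simple logic rather than multipliers, which is the resource saving the abstract refers to.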
Picnic is a post-quantum digital signature scheme, the security of which relies solely on symmetric-key primitives such as block ciphers and hash functions instead of number-theoretic assumptions. One of the main concerns of Picnic is the large signature size. Although Katz et al.’s protocol (MPCitH-PP) significantly reduces the size of Picnic, the involvement of more parties in MPCitH-PP leads to longer signing/verification times and more hardware resources. This poses new challenges for implementing high-performance Picnic on resource-constrained FPGAs. To our knowledge, existing hardware implementations of MPCitH-based signatures support three parties only. In this work, we investigate the optimization of the implementation of MPCitH-PP and, for the first time, successfully deploy MPCitH-PP with more than three parties on resource-constrained FPGAs, e.g., Xilinx Artix-7 and Kintex-7. In particular, we propose a series of optimizations, which include pipelining and parallel optimization for MPCitH-PP and the optimization of the underlying symmetric primitives. In addition, we make a slight modification to the computation of the offline commitment, which further reduces the number of Keccak computations. These optimizations significantly improve the hardware performance of Picnic3. Signing messages on our FPGA takes 0.047 ms for the L1 security level, outperforming the hardware implementation of Picnic1 by a factor of about 5.3, which is the fastest implementation of post-quantum signatures as far as we know. Our FPGA implementation for the L5 security level takes 0.146 ms, beating Picnic1 by a factor of 8.5 and outperforming Sphincs by a factor of 17.3.
The Zero Degree Calorimeter (ZDC) was designed to provide the event geometry and luminosity measurements in heavy-ion operation. In order to exploit the potential offered by the LHC’s increased luminosity in Run 3, the ZDC upgraded its readout system to acquire all collisions in self-triggered mode without dead time. The purpose of the upgrade was to enable the detector to cope with the increased event rate while preserving its time and charge resolution performance. The ZDC operating conditions in Run 3 Pb–Pb collisions are extremely challenging due to the presence of electromagnetic dissociation processes (EMD). For example, when running in self-triggered mode the ZDC system will need to sustain a readout rate of ∼2.5 MHz for the channels of the most exposed calorimeters, compared to the foreseen hadronic rate of 50 kHz sustained by the other detectors. The previous electronics, based on charge-to-digital converters (QDCs) with a fixed dead time of ∼10 μs and on readout through the VME bus, could not cope with such a high rate. Moreover, a crucial aspect of the ZDC operation in Run 3 is acquiring the events with a reduced bunch spacing of 50 ns (shorter than the signal length of ∼60 ns) in the presence of high signal dynamics (from a single neutron to ∼60 neutrons). The new acquisition chain is based on a 12-bit digitizer with a sampling rate of about 1 GS/s, assembled on an FPGA Mezzanine Card. The signals produced by the ZDC channels are digitized, and samples are processed through an FPGA to extract information such as timing, baseline average estimation and luminosity. The architecture of the new readout system, the auto-trigger strategy, the firmware organization and the ZDC performance during 2022 Pb–Pb collisions are presented.
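The rate argument in this abstract can be checked with a short back-of-the-envelope calculation. The sketch below uses only the figures quoted above (a dead time of about 10 μs, a required rate of about 2.5 MHz, 50 ns bunch spacing, and a signal length of about 60 ns). The non-paralyzable dead-time model and the function names are assumptions made for illustration, not something stated in the abstract.

```python
def max_rate_hz(dead_time_s):
    """Upper bound on the readout rate when every event blocks the system for dead_time_s."""
    return 1.0 / dead_time_s


def recorded_fraction(true_rate_hz, dead_time_s):
    """Fraction of events recorded under an assumed non-paralyzable dead-time model."""
    return 1.0 / (1.0 + true_rate_hz * dead_time_s)


def overlap_ns(signal_length_ns, bunch_spacing_ns):
    """How long a signal extends into the next bunch slot (0 if it fits)."""
    return max(0.0, signal_length_ns - bunch_spacing_ns)


# Figures quoted in the abstract (approximate values).
DEAD_TIME_S = 10e-6        # fixed dead time of the previous QDC-based electronics
REQUIRED_RATE_HZ = 2.5e6   # self-triggered rate for the most exposed calorimeters

print("max sustainable rate [kHz]:", round(max_rate_hz(DEAD_TIME_S) / 1e3, 3))
print("recorded fraction at 2.5 MHz:", round(recorded_fraction(REQUIRED_RATE_HZ, DEAD_TIME_S), 4))
print("signal overlap into next bunch [ns]:", overlap_ns(60.0, 50.0))
```

Under these assumptions the old electronics saturate near 100 kHz, far below the required rate, and consecutive signals overlap by roughly 10 ns, which is consistent with the motivation given for a dead-time-free digitizer-based readout.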
Training convolutional neural networks (CNNs) requires intensive computations as well as a large amount of storage and memory access. While low-bandwidth off-chip memories in prior FPGA works have hindered the system-level performance, modern FPGAs offer high bandwidth memory (HBM2) that unlocks opportunities to improve the throughput/energy of FPGA-based CNN training. This paper presents an FPGA accelerator for CNN training which (1) uses HBM2 for efficient off-chip communication, and (2) supports various training operations (e.g., residual connections, stride-2 convolutions) for modern CNNs. We analyze the impact of HBM2 on CNN training workloads, provide a comprehensive comparison with DDR3, and present strategies to efficiently use HBM2 features for enhanced CNN training performance. For training ResNet-20/VGG-like CNNs on the CIFAR-10 dataset with a low batch size of 2, the proposed CNN training accelerator on an Intel Stratix-10 MX FPGA demonstrates 1.4×/1.7× energy-efficiency improvement compared to a Stratix-10 GX FPGA with DDR3 memory, and 4.5×/9.7× energy-efficiency improvement compared to a Tesla V100 GPU.