In recent years, FPGAs have become popular for CNN acceleration, and many CNN-to-FPGA toolchains have been proposed to deploy CNNs on FPGAs quickly. However, with these toolchains, updating the CNN means regenerating RTL code and re-running implementation, which is time-consuming and may suffer from timing-closure problems. We therefore propose HBDCA, a toolchain and its corresponding accelerator. The CNN on HBDCA is defined by the contents of BRAM. The toolchain integrates the Xilinx UpdateMEM utility, which updates BRAM contents without re-synthesis or re-implementation. It also integrates TensorFlow Lite, which provides high-accuracy quantization: HBDCA supports 8-bit per-channel quantization of weights and 8-bit per-layer quantization of activations. Upgrading the CNN on the accelerator may change its kernel size, so the flexible structure of HBDCA supports kernel-level parallelism for three kernel sizes (3×3, 5×5, 7×7). HBDCA implements four types of parallelism in convolution layers and two types of parallelism in fully-connected layers. To reduce the number of memory accesses, both spatial and temporal data-reuse techniques are applied to the convolution and fully-connected layers. In particular, temporal reuse is adopted at both the row and column level of an input feature map of a convolution layer, so data is read from BRAM only once and reused in the following clock cycles. Experiments show that, by updating BRAM contents with a single UpdateMEM command, three CNNs with different kernel sizes (3×3, 5×5, 7×7) can be implemented on HBDCA. Compared with the traditional design flow, UpdateMEM reduces development time by 7.6×–9.1× across different synthesis and implementation strategies. Compared with similar CNNs created by other toolchains, HBDCA has lower latency (9.97 µs–50.73 µs) and eliminates re-implementation when the CNN is updated. Compared with similar CNNs created by dedicated designs, HBDCA also achieves the lowest latency (9.97 µs), the highest accuracy (99.14%), and the lowest power (1.391 W). For a different CNN created by a similar toolchain that also eliminates the re-implementation process, HBDCA achieves a higher speedup of 120.28×.
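The 8-bit per-channel weight quantization the abstract attributes to TensorFlow Lite can be sketched as follows. This is an illustrative NumPy sketch of the standard symmetric per-channel scheme, not the HBDCA toolchain's actual code; the function name and the choice of channel axis are assumptions for illustration.

```python
import numpy as np

def quantize_per_channel(weights, axis=0):
    """Symmetric 8-bit per-channel weight quantization (TFLite-style sketch).

    Each channel along `axis` gets its own scale so that the channel's
    largest-magnitude weight maps onto the int8 range [-127, 127].
    Illustrative only; not the HBDCA toolchain's implementation.
    """
    # Move the channel axis to the front and flatten the remaining axes.
    w = np.moveaxis(weights, axis, 0).reshape(weights.shape[axis], -1)
    # One scale per channel: max |w| / 127 (guard against all-zero channels).
    scales = np.maximum(np.abs(w).max(axis=1), 1e-8) / 127.0
    # Quantize: round to nearest int8 value within [-127, 127].
    q = np.clip(np.round(w / scales[:, None]), -127, 127).astype(np.int8)
    return q, scales

# Example: channels with very different ranges each keep full int8 resolution.
w = np.array([[0.5, -1.0], [0.01, 0.02], [10.0, -5.0], [0.0, 0.0]])
q, s = quantize_per_channel(w)
# Dequantization error is bounded by half a quantization step per channel.
err = np.abs(q * s[:, None] - w).max()
```

A per-layer scheme (as used here for activations) is the same computation with a single scale taken over the whole tensor instead of one per channel.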
A series of binarized neural networks (BNNs) have shown acceptable accuracy on image-classification tasks and achieved excellent performance on field-programmable gate arrays (FPGAs). Nevertheless, we observe that with existing BNN designs, changing the target BNN or accelerating a new BNN is quite time-consuming. This paper therefore presents FCA-BNN, a flexible and configurable accelerator that employs a layer-level configurable technique to execute each layer of the target BNN seamlessly. First, to save resources and improve energy efficiency, hardware-oriented optimal formulas are introduced to design an energy-efficient computing array for different sizes of padded-convolution and fully-connected layers. Moreover, to accelerate target BNNs efficiently, we exploit an analytical model to explore the optimal design parameters for FCA-BNN. Finally, the proposed mapping flow changes the target network by entering its order and accelerates a new network by compiling and loading the corresponding instructions, without generating or loading a new bitstream. Evaluations on three major BNN structures show that the inference accuracy of FCA-BNN differs from that of a GPU by only 0.07%, 0.31%, and 0.4% for LFC, VGG-like, and Cifar-10 AlexNet, respectively. Furthermore, FCA-BNN reaches 0.8× the energy efficiency of existing customized FPGA accelerators for LFC and 2.6× for VGG-like. For Cifar-10 AlexNet, FCA-BNN is 188.2× and 60.6× more energy-efficient than a CPU and a GPU, respectively. To the best of our knowledge, FCA-BNN is the most efficient design for changing the target BNN and accelerating a new BNN while maintaining competitive performance.
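The arithmetic that makes BNN computing arrays so cheap in hardware is the standard XNOR-popcount formulation of the binarized dot product. The sketch below illustrates that general technique, not FCA-BNN's specific array: with weights and activations in {-1, +1} packed as bits (1 for +1, 0 for -1), the dot product over n lanes reduces to n - 2·popcount(a XOR b).

```python
def bnn_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Binarized dot product via XOR + popcount (standard BNN trick).

    a_bits, b_bits: n-lane bit-packed vectors, bit value 1 encodes +1
    and 0 encodes -1. Returns sum(a_i * b_i) over the n lanes.
    Illustrative sketch; not FCA-BNN's actual computing array.
    """
    mask = (1 << n) - 1                       # keep only the n valid lanes
    disagree = bin((a_bits ^ b_bits) & mask).count("1")  # lanes where a_i*b_i = -1
    return n - 2 * disagree

# Example: a = (+1,+1,-1,+1), b = (+1,-1,-1,+1) packed LSB-first.
# Products are (+1,-1,+1,+1), so the dot product is 2.
result = bnn_dot(0b1011, 0b1001, 4)
```

In an FPGA fabric the XOR (or equivalently XNOR) maps to LUTs and the popcount to a compressor tree, which is why binarized layers need no multipliers at all.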
Variable block size motion estimation (VBSME) is adopted in the MPEG-4 AVC/H.264 standard. To increase hardware utilization for VBSME with the full-search block-matching algorithm (FSBMA), this paper proposes a new high-performance reconfigurable VLSI architecture that supports a "meander"-like scan format for high data reuse of the search area. The architecture supports the three data flows of the scan format through a reconfigurable computing array and a search-area memory. The computing array achieves 100% processing-element (PE) utilization and reuses the SADs of smaller blocks to calculate the 41 motion vectors (MVs) of a 16×16 block in parallel. The design is implemented in TSMC 0.18 µm CMOS technology. At a clock frequency of 180 MHz, the architecture allows real-time processing of 1280×720 video at 45 fps with a search range of [−16, +16].
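The SAD-reuse idea behind the 41 motion vectors can be sketched in software. In H.264 VBSME, a 16×16 macroblock has 41 sub-block partitions (sixteen 4×4, eight 4×8, eight 8×4, four 8×8, two 8×16, two 16×8, one 16×16), and every larger partition's SAD is a sum of 4×4 SADs, so the absolute differences are computed only once per candidate. This NumPy sketch illustrates that reuse at a single candidate displacement; it is not the paper's VLSI data flow, and the function name and argument layout are assumptions.

```python
import numpy as np

def sads_41(cur, ref, dx, dy):
    """All 41 H.264 partition SADs at one candidate MV (dx, dy).

    Illustrative sketch of SAD reuse, not the paper's hardware data flow:
    the sixteen 4x4 SADs are computed once, and every larger partition's
    SAD is built by summing them instead of re-subtracting pixels.
    """
    # Absolute differences between the 16x16 current block and the
    # displaced 16x16 window of the reference (search-area) frame.
    diff = np.abs(cur.astype(int) - ref[dy:dy + 16, dx:dx + 16].astype(int))
    # Sixteen 4x4 SADs, arranged as a 4x4 grid of sub-blocks.
    s44 = diff.reshape(4, 4, 4, 4).sum(axis=(1, 3))
    # Larger partitions reuse the 4x4 SADs.
    s48 = s44.reshape(4, 2, 2).sum(axis=2)          # eight 4x8  (horizontal pairs)
    s84 = s44.reshape(2, 2, 4).sum(axis=1)          # eight 8x4  (vertical pairs)
    s88 = s44.reshape(2, 2, 2, 2).sum(axis=(1, 3))  # four 8x8
    s816 = s88.sum(axis=1)                          # two 8x16 (top, bottom)
    s168 = s88.sum(axis=0)                          # two 16x8 (left, right)
    s1616 = s88.sum()                               # one 16x16
    return np.concatenate([s44.ravel(), s48.ravel(), s84.ravel(),
                           s88.ravel(), s816, s168, [s1616]])
```

A full search would evaluate this at every displacement in the ±16 range and keep, for each of the 41 partitions, the displacement with the minimum SAD.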
A novel flame-retardant ternary composite of polymer/crosslinked rubber/nano-magnesium hydroxide (MH), prepared by blending a thermoplastic polymer with a special compound powder of crosslinked rubber/nano-MH, is introduced in this paper. The compound powder was prepared by co-spray-drying a fluid mixture of nano-MH slurry and irradiated rubber latex. Cone calorimeter testing showed that the new ternary composite had better flame retardancy than the composite obtained by the conventional process, including a longer time to ignition and a lower mean heat release rate in the initial period. Thermogravimetry and transmission electron microscopy were used to analyze the reasons for the difference in flame retardancy. It was found that the more uniform dispersion of nano-MH in the new ternary composite, compared with the conventional one, may be the main reason for its better flame retardancy.
A synergistic effect on flame retardancy was found when acrylonitrile-butadiene ultra-fine fully vulcanized powdered rubber (NB-UFPR) was incorporated into an ethylene vinyl acetate/nano-magnesium hydroxide (EVA/nano-MH) composite by a new process. The fire performance of EVA and the EVA composites was compared in this communication by cone calorimeter testing (CCT). The CCT data indicated that the addition of NB-UFPR to the EVA/nano-MH system not only reduced the heat release rate but also prolonged the ignition time of the composite, contrary to the effect of NB-UFPR when added alone to the EVA polymer. Thermogravimetric analysis revealed that nano-MH accelerated the loss of acetic acid, but NB-UFPR helped to reduce this accelerating effect. FTIR spectra showed a new absorption at 3374 cm−1 in the blends of EVA/NB-UFPR and EVA/NB-UFPR/nano-MH.