In a wireless power transfer (WPT) system based on magnetic resonant coupling, one of the most challenging design issues is to maintain a reasonable level of power transfer efficiency (PTE) even when the distance between the transmitter and the receiver changes. When the distance varies, the PTE decreases drastically due to the impedance mismatch between the resonator of the transmitter and that of the receiver. This paper presents a novel serial/parallel capacitor matrix in the transmitter, where the impedance can be automatically reconfigured to track the optimum impedance-matching point as the distance varies. Dynamic WPT matching is achieved by changing the combination of serial and parallel capacitors in the capacitor matrix. An interesting observation is that the resonant frequency of the proposed capacitor matrix is not shifted, even with capacitor-matrix tuning. To quickly find the capacitor combination that achieves maximum power transfer, a window-prediction-based search algorithm is also presented. The proposed resonant WPT system is implemented with a resonant frequency of 13.56 MHz, and the experimental results with 1 W power transfer show that the transfer efficiency increases up to 88% when the distance changes from 0 to 1.2 m.
In this paper, we present an energy-efficient architecture of the Canny edge detector for advanced mobile vision applications. Three key techniques for reducing the computational complexity of the Canny edge detector are presented. First, by exploiting the rank characteristic of the convolution kernels of the Gaussian smoothing and Sobel gradient filters, common computations are identified and shared in the image filter design to reduce the number of additions and multiplications. Second, for the gradient magnitude/direction computation, only three directions of neighboring pixels are considered, which reduces computation energy with minor degradation in conformance performance (CP). Third, for the adaptive threshold selection, an interesting observation is that the mean values of gradient magnitudes show small variations depending on the classified block types. Thus, the threshold selection process can be simplified to multiplying the mean value of the local block by predetermined constants. The proposed low-complexity Canny edge detector has been implemented using both field-programmable gate arrays (FPGAs) and a 65-nm standard-cell library. The FPGA implementation on a Xilinx Virtex-V (XC5VSX240T) shows that our edge detector achieves area savings of 48% and execution-time savings of 73% over the conventional architecture without seriously sacrificing detection performance. The proposed edge detector implemented with the 65-nm standard-cell library can easily support real-time ultrahigh-definition video processing (50 frames/s) with a power consumption of 5.48 mW (108.84 μJ/frame).
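The "rank characteristic" mentioned in the abstract above can be illustrated with the standard 3 × 3 Sobel kernel, which has rank 1 and therefore factors into a column pass and a row pass. The sketch below only demonstrates that factorization; the helper names (`conv2d_valid`, `separable_valid`) are hypothetical, and how the paper actually shares computations between the Gaussian and Sobel filters is not stated in the abstract.

```python
import numpy as np

# Horizontal Sobel kernel and its rank-1 factors.
SOBEL_X = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]])
SMOOTH = np.array([1, 2, 1])   # vertical smoothing taps
DIFF = np.array([1, 0, -1])    # horizontal difference taps


def conv2d_valid(image, kernel):
    """Direct 2-D correlation over the 'valid' region (no padding)."""
    image = np.asarray(image, dtype=np.int64)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.int64)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + out_h, j:j + out_w]
    return out


def separable_valid(image, col_taps, row_taps):
    """Same result using a column pass followed by a row pass."""
    image = np.asarray(image, dtype=np.int64)
    out_h = image.shape[0] - len(col_taps) + 1
    tmp = sum(col_taps[i] * image[i:i + out_h, :] for i in range(len(col_taps)))
    out_w = image.shape[1] - len(row_taps) + 1
    return sum(row_taps[j] * tmp[:, j:j + out_w] for j in range(len(row_taps)))
```

For a k × k rank-1 kernel, the two-pass form needs about 2k multiply-accumulates per output pixel instead of k², which is the kind of saving a separable filter design can build on.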
Recently, the accuracy of spiking neural networks (SNNs) has been significantly improved by deploying convolutional neural networks (CNNs) and their parameters to SNNs. Deep convolutional SNNs, however, require a large amount of computation, which is the major bottleneck for energy-efficient SNN processor design. In this paper, we present an input-dependent computation reduction approach, where relatively unimportant neurons are identified and pruned without seriously sacrificing accuracy. Specifically, a temporal-domain neuron pruning scheme is proposed that prunes less important neurons and skips their future operations based on layer-wise pruning thresholds on the membrane voltages. To find the pruning thresholds, two threshold search algorithms are presented that can efficiently trade off accuracy and computational complexity for a given computation reduction ratio. The proposed neuron pruning scheme has been implemented in a 65 nm CMOS process. The SNN processor achieves a 57% energy reduction and a 2.68× speedup, with up to 0.82% accuracy loss and 7.3% area overhead on the CIFAR-10 dataset.
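The temporal-domain pruning idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: a single integrate-and-fire layer is simulated, a neuron is pruned if its membrane voltage is below a layer-wise threshold at one hypothetical checkpoint (`check_step`), and pruned neurons receive no further updates. The function name, the reset-by-subtraction rule, and the single checkpoint are all assumptions.

```python
import numpy as np


def run_layer_with_pruning(spike_train, weights, fire_threshold,
                           prune_threshold, check_step):
    """Simulate one integrate-and-fire layer with temporal neuron pruning.

    spike_train: (T, n_in) array of 0/1 input spikes.
    weights:     (n_out, n_in) synaptic weights.
    Returns the output spikes and the number of neuron updates performed.
    """
    spike_train = np.asarray(spike_train)
    weights = np.asarray(weights, dtype=float)
    n_out = weights.shape[0]
    v = np.zeros(n_out)                      # membrane voltages
    active = np.ones(n_out, dtype=bool)      # neurons not yet pruned
    out_spikes = np.zeros((spike_train.shape[0], n_out), dtype=np.int8)
    updates = 0
    for t, s in enumerate(spike_train):
        # Only active neurons integrate their inputs.
        v[active] += weights[active] @ s
        updates += int(active.sum())
        # Assumed pruning rule: drop neurons below the threshold at the checkpoint.
        if t == check_step:
            active &= v >= prune_threshold
        fired = active & (v >= fire_threshold)
        out_spikes[t, fired] = 1
        v[fired] -= fire_threshold           # reset by subtraction (assumption)
    return out_spikes, updates
```

Passing a `check_step` that never occurs (for example `-1`) disables pruning, which gives a baseline update count to compare against.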
In many digital signal processing applications, some parts of a word stored in embedded static random access memories (SRAMs) are more important than other parts of the word. Because of these differences in importance, memory failures that occur in more important bit locations generally cause larger system performance degradation than those in less important locations. This brief presents a low-complexity unequal-error-protection error correcting code (UEEP-ECC) approach for the embedded memories in digital signal processors. In the proposed UEEP-ECC, a repetition code is combined with a Bose-Chaudhuri-Hocquenghem (BCH) code to selectively provide stronger error correction capability for more important data portions without a large hardware overhead. An efficient generation algorithm that finds the UEEP-ECC minimizing the power of the memory core and ECC logic is also presented. The experimental results show that the UEEP-ECC scheme achieves considerable power savings and data quality improvements in both H.264 and fast Fourier transform applications.
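To make the unequal-protection idea in the abstract above concrete, the sketch below protects only the most significant bits of a word with a triple-repetition code and a bitwise majority vote. It is an illustration under assumptions: the function names, the 16-bit width, and the choice of four protected bits are hypothetical, and the BCH component and the way the paper combines the two codes are not modeled.

```python
def protect_msbs(word, width=16, n_msb=4):
    """Return the stored word plus two extra copies of its top n_msb bits."""
    msb = word >> (width - n_msb)
    return word, msb, msb


def recover(word, copy1, copy2, width=16, n_msb=4):
    """Majority-vote the top n_msb bits; lower bits are passed through as-is."""
    shift = width - n_msb
    msb = word >> shift
    # Bitwise majority of three values.
    voted = (msb & copy1) | (msb & copy2) | (copy1 & copy2)
    low = word & ((1 << shift) - 1)
    return (voted << shift) | low
```

A single bit flip in the protected field of any one copy is corrected by the vote, while a flip in the unprotected lower bits passes through unchanged; in this sketch those bits would rely on whatever weaker code covers the rest of the word.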
This paper presents a hybrid multimode Bose-Chaudhuri-Hocquenghem (BCH) encoder for reducing the input length of syndrome calculation (SC) based on a re-encoding approach. In previous re-encoding approaches, a conventional BCH encoder with a long generator polynomial is used as a remainder operator to reduce the input length of SC. However, the input length is still large, since a long polynomial is used as the denominator of the remainder operator for re-encoding. In the proposed approach, several minimal polynomials are employed as the denominators of the remainder operators by utilizing the hardware of the hybrid multimode BCH encoder. As a result, the re-encoding scheme allows SC to be implemented with the minimum input length, which leads to considerable area and latency reduction in the SC module design. The proposed BCH encoder architecture and reduced SC modules are implemented using Samsung 65 nm technology. The experimental results show that, in the case of BCH (8640, 8192, 32) codes, the total area of the SC modules is reduced by 96% compared to the previous re-encoding-based SC module design, while the proposed multimode BCH encoder architecture also provides reconfigurable error correction capability for 1 ≤ t_sel ≤ 32.
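The reason a remainder operator can shorten the SC input is a standard identity: if m_i(x) is the minimal polynomial of α^i, then r(α^i) = (r mod m_i)(α^i), because m_i(α^i) = 0. The sketch below checks this in the small field GF(2^4) built from x^4 + x + 1; the field size, the example word, and all helper names are illustrative assumptions and do not reproduce the paper's BCH (8640, 8192, 32) design.

```python
PRIM = 0b10011   # x^4 + x + 1, primitive polynomial for GF(16)
ALPHA = 0b0010   # the primitive element alpha
M1 = 0b10011     # minimal polynomial of alpha:   x^4 + x + 1
M3 = 0b11111     # minimal polynomial of alpha^3: x^4 + x^3 + x^2 + x + 1


def gf_mul(a, b):
    """Multiply two GF(16) elements (shift-and-add with reduction)."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return res


def gf_pow(a, n):
    """Raise a GF(16) element to a non-negative integer power."""
    res = 1
    for _ in range(n):
        res = gf_mul(res, a)
    return res


def poly_eval(poly, x):
    """Evaluate a binary polynomial (bit k = coefficient of x^k) at x via Horner."""
    acc = 0
    for k in range(poly.bit_length() - 1, -1, -1):
        acc = gf_mul(acc, x) ^ ((poly >> k) & 1)
    return acc


def poly_mod(poly, divisor):
    """Remainder of binary polynomial division over GF(2)."""
    dlen = divisor.bit_length()
    while poly.bit_length() >= dlen:
        poly ^= divisor << (poly.bit_length() - dlen)
    return poly
```

In this toy setting a 15-bit received word is reduced to a remainder of at most 4 bits before the syndrome is evaluated, so each syndrome needs only the short remainder as input.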
In this paper, we present an energy- and area-efficient spiking neural network (SNN) processor based on novel spike-count-based methods. For a low-cost SNN design, we propose hardware-friendly complexity reduction techniques for both the learning and inference modes of operation. First, for the unsupervised learning process, we propose a spike-count-based learning method. This learning approach utilizes pre- and post-synaptic spike counts to reduce the bit-width of the synaptic weights as well as the number of weight updates. For energy-efficient inference, we propose an accumulation-based computing scheme, where the number of input spikes for each input axon is accumulated without instant membrane updates until a predefined number of spikes is reached. In addition, computation skip schemes identify meaningless computations and skip them to improve energy efficiency. Based on the proposed low-complexity design techniques, we design and implement the SNN processor in a 65 nm CMOS process. According to the implementation results, the SNN processor achieves a recognition accuracy of 87.4% on the MNIST dataset using only 230 k 1-bit synaptic weights with 400 excitatory neurons. The energy consumption is 0.26 pJ/SOP and 0.31 μJ/inference in inference mode, and 1.42 pJ/SOP and 2.63 μJ/learning in learning mode.
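The accumulation-based computing scheme in the abstract above can be sketched by comparing per-spike membrane updates with updates deferred until an axon has collected a predefined number of spikes. This is a minimal sketch under assumptions: firing and leakage between updates are ignored, leftover counts are flushed at the end of the window, and the names `instant_updates`, `accumulated_updates`, and `batch` are hypothetical.

```python
import numpy as np


def instant_updates(spike_train, weights):
    """Update the membrane voltages once for every individual input spike."""
    spike_train = np.asarray(spike_train)
    weights = np.asarray(weights, dtype=float)
    v = np.zeros(weights.shape[0])
    n_updates = 0
    for s in spike_train:
        for axon in np.flatnonzero(s):
            v += weights[:, axon]
            n_updates += 1
    return v, n_updates


def accumulated_updates(spike_train, weights, batch):
    """Count spikes per axon and update only when `batch` spikes are collected."""
    spike_train = np.asarray(spike_train)
    weights = np.asarray(weights, dtype=float)
    v = np.zeros(weights.shape[0])
    counts = np.zeros(weights.shape[1], dtype=np.int64)
    n_updates = 0
    for s in spike_train:
        counts += s
        for axon in np.flatnonzero(counts >= batch):
            v += counts[axon] * weights[:, axon]
            counts[axon] = 0
            n_updates += 1
    # Assumption: flush any leftover counts at the end of the input window.
    for axon in np.flatnonzero(counts):
        v += counts[axon] * weights[:, axon]
        n_updates += 1
    return v, n_updates
```

Under these assumptions both functions reach the same final membrane voltages, while the accumulated version performs fewer membrane updates whenever an axon spikes more than once per batch.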
Spiking neural network (SNN) systems that use rank order coding (ROC) for input spike encoding generally suffer from low recognition accuracy and unnecessary computations that increase complexity. In this paper, we present a spiking convolutional neural network (Spiking CNN) architecture that significantly improves recognition accuracy as well as computational efficiency based on a novel ROC and modified kernel sizes. The proposed ROC generates spike trains based on the maximum input value without sorting operations. In addition, because the recognition accuracy is affected by the reduced number of spikes as layers become deeper, the proposed ROC is inserted just before the final layer to increase the number of input spikes. The 2 × 2 pooling kernels are also replaced with 4 × 4 kernels to reduce the network size. The hardware architecture of the proposed Spiking CNN has been implemented in a 65 nm CMOS process. A neuron-centric membrane voltage update approach is also efficiently exploited in the convolutional and fully connected layers to improve hardware energy efficiency. The Spiking CNN processor processes 2.85 K classifications per second at 6.79 μJ/classification. It also achieves a recognition accuracy of 90.2% on the MNIST dataset using unsupervised learning with spike-timing-dependent plasticity (STDP).
Two main bottlenecks encountered when implementing energy-efficient spike-timing-dependent plasticity (STDP) based sparse coding are the complex computation of the winner-take-all (WTA) operation and the repetitive neuronal operations in time-domain processing. In this article, we present an energy-efficient STDP-based sparse coding processor. The low-cost hardware is based on the following algorithmic reduction techniques. First, the complex WTA operation is simplified based on the prediction of spike-emitting neurons. Sparsity-based approximations in the spatial and temporal domains are also efficiently exploited to remove redundant neurons with negligible loss in algorithmic accuracy. We designed and implemented the STDP-based sparse coding hardware in a 65 nm CMOS process. By exploiting input sparsity, the proposed SNN architecture can dynamically trade off algorithmic quality for computation energy (up to 74%) for natural image (maximum 0.01 RMSE increment) and MNIST (no accuracy loss) applications. In inference mode, the SNN hardware achieves a throughput of 374 Mpixels/s and 840.2 GSOP/s with an energy efficiency of 781.52 pJ/pixel and 0.35 pJ/SOP.