Domain-specific hardware is becoming a promising direction against the backdrop of slowing improvement in general-purpose processors due to the foreseeable end of Moore's Law. Machine learning, especially deep neural networks (DNNs), has become the most dazzling domain, witnessing successful applications across a wide spectrum of artificial intelligence (AI) tasks. The remarkable accuracy of DNNs comes at the cost of heavy memory consumption and high computational complexity, which greatly impedes their deployment in embedded systems. Therefore, the concept of DNN compression was naturally proposed and is widely used for memory saving and compute acceleration. In the past few years, a tremendous number of compression techniques have sprung up to pursue a satisfactory tradeoff between processing efficiency and application accuracy. Recently, this wave has spread to the design of neural network accelerators in pursuit of extremely high performance. However, the body of related work is vast and the reported approaches diverge considerably. This fragmented landscape motivates us to provide a comprehensive survey of recent advances toward the goal of efficient compression and execution of DNNs without significantly compromising accuracy, covering both the high-level algorithms and their applications in hardware design. In this article, we review the mainstream compression approaches: compact models, tensor decomposition, data quantization, and network sparsification. We explain their compression principles, evaluation metrics, sensitivity analysis, and joint use. Then, we answer the question of how to leverage these methods in the design of neural network accelerators and present the state-of-the-art hardware architectures. Finally, we discuss several open issues such as fair comparison, testing workloads, automatic compression, influence on security, and framework/hardware-level support, and identify promising topics in this field as well as the possible challenges. This article aims to enable readers to quickly build up a big picture of neural network compression and acceleration, clearly evaluate the various methods, and confidently get started in the right way.
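To make the tensor-decomposition family mentioned above concrete, the following minimal sketch factorizes a fully connected layer's weight matrix with a truncated SVD, replacing one large layer with two smaller ones; the layer sizes, rank, and random data are illustrative assumptions, not values from the article.

```python
import numpy as np

# Minimal sketch of SVD-based low-rank compression of a dense layer.
# Sizes and rank are illustrative assumptions, not from the article.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024))   # original weight matrix (out x in)

rank = 64                              # target rank: the compression knob
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * s[:rank]             # (512, 64)
B = Vt[:rank, :]                       # (64, 1024)

x = rng.standard_normal(1024)
y_full = W @ x                         # original layer
y_lowrank = A @ (B @ x)                # two smaller layers replace one

print(f"params: {W.size} -> {A.size + B.size}")
print("relative error:", np.linalg.norm(y_full - y_lowrank) / np.linalg.norm(y_full))
```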
This study designs a fuzzy double hidden layer recurrent neural network (FDHLRNN) controller for a class of nonlinear systems using terminal sliding-mode control (TSMC). The proposed FDHLRNN is a fully regulated network that can be viewed as a combination of a fuzzy neural network (FNN) and a radial basis function neural network (RBF NN) to improve the accuracy of nonlinear approximation, so it inherits the advantages of both networks. The main advantage of the proposed FDHLRNN is that the output values of the FNN and the double hidden layer recurrent neural network (DHLRNN) are considered simultaneously, and outer-layer feedback is added to increase the dynamic approximation ability. The FDHLRNN is designed to approximate the nonlinear sliding-mode equivalent control term so as to reduce the switching gain. To ensure the best approximation capability and control performance, the proposed FDHLRNN with TSMC is applied to a second-order nonlinear model. Two simulation examples verify that the proposed FDHLRNN has a faster convergence speed and that the FDHLRNN with TSMC has good dynamic properties and robustness, and a hardware experiment with an active power filter proves the feasibility of the method.
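For reference, a generic terminal sliding-mode construction for a second-order plant is sketched below; the notation and gains are textbook-style assumptions rather than the paper's, with the FDHLRNN estimate standing in for the equivalent control term.

```latex
% Generic TSMC construction for a second-order plant
% \ddot{x} = f(x,\dot{x}) + u tracking a reference x_d.
% Notation and gains are illustrative, not taken from the paper.
\begin{align}
  e &= x - x_d, &
  s &= \dot{e} + \beta\, e^{q/p}, \quad \beta > 0,\ p > q > 0 \ \text{odd integers},\\
  u &= \hat{u}_{\mathrm{eq}} - k\,\operatorname{sgn}(s), & k &> 0.
\end{align}
```

On the surface $s = 0$ the error obeys $\dot{e} = -\beta e^{q/p}$ and reaches zero in finite time; since the neural network supplies the estimate $\hat{u}_{\mathrm{eq}}$, the switching gain $k$ only needs to cover the residual approximation error, which is the sense in which the approximator reduces the switching gain.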
Physics-informed neural networks (PINNs) are a novel deep learning paradigm suited to solving forward and inverse problems of nonlinear partial differential equations (PDEs). By embedding the physical information delineated by PDEs in feedforward neural networks, PINNs are trained as surrogate models that approximate the solution of the PDEs without the need for labeled data. Owing to the excellent capability of neural networks to describe complex relationships, a variety of PINN-based methods have been developed to solve different kinds of problems such as integer-order PDEs, fractional PDEs, stochastic PDEs, and integro-differential equations (IDEs). However, for the state-of-the-art PINN methods applied to IDEs, integral discretization is a key prerequisite so that IDEs can be transformed into ordinary differential equations (ODEs), and this discretization inevitably introduces discretization and truncation errors into the solution. In this study, we propose an auxiliary physics-informed neural network (A-PINN) framework for solving forward and inverse problems of nonlinear IDEs. By defining auxiliary output variable(s) to represent the integral(s) in the governing equation and employing automatic differentiation of the auxiliary output to replace the integral operator, the proposed A-PINN bypasses the limitation of integral discretization. Distinct from the neural network in the original PINN, which only approximates the variables in the governing equation, the proposed A-PINN framework constructs a multi-output neural network that simultaneously calculates the primary outputs and auxiliary outputs, which respectively approximate the variables and the integrals in the governing equation. Subsequently, the relationship between the primary outputs and auxiliary outputs is constrained by new output conditions in compliance with physical laws. Using a first-order nonlinear Volterra IDE benchmark problem, we validate that the proposed A-PINN obtains more accurate solutions than the conventional PINN. We further demonstrate the good performance of A-PINN in solving forward problems involving a nonlinear Volterra IDE system, a nonlinear 2-dimensional Volterra IDE, a nonlinear 10-dimensional Volterra IDE, and a nonlinear Fredholm IDE. Finally, the A-PINN framework is applied to the inverse problem of nonlinear IDEs, and the results show that the unknown parameters can be satisfactorily discovered even from heavily noisy data.
•A novel PINN method (A-PINN) for nonlinear integro-differential equations (IDEs).
•Avoids integral discretization by defining auxiliary variables for the integrals.
•A multi-output neural network calculates both primary outputs and auxiliary outputs.
•A mesh-free method that does not suffer from discretization and truncation errors.
•Application of A-PINN to various forward and inverse problems of IDEs.
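As a minimal sketch of the auxiliary-output idea, the PyTorch snippet below solves a toy first-order Volterra IDE u'(x) = g(x) + ∫₀ˣ u(t) dt by adding an auxiliary output v(x) for the integral; the specific equation, the initial condition u(0) = 1, the network size, and the training settings are illustrative assumptions, not the paper's benchmarks.

```python
import torch
import torch.nn as nn

# Toy A-PINN-style sketch for the Volterra IDE  u'(x) = g(x) + \int_0^x u(t) dt.
# Define v(x) = \int_0^x u(t) dt, so v'(x) = u(x) and v(0) = 0: the integral is
# replaced by a differential constraint on an auxiliary network output.
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 2))          # outputs: [u(x), v(x)]

def g(x):                                      # assumed forcing term of the toy IDE
    return torch.ones_like(x)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x = torch.linspace(0.0, 1.0, 101).reshape(-1, 1).requires_grad_(True)
x0 = torch.zeros(1, 1)                         # collocation point for the conditions at 0

for step in range(2000):
    uv = net(x)
    u, v = uv[:, :1], uv[:, 1:]
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    dv = torch.autograd.grad(v, x, torch.ones_like(v), create_graph=True)[0]
    res_pde = du - g(x) - v                    # IDE residual, v standing in for the integral
    res_aux = dv - u                           # auxiliary output condition v' = u
    uv0 = net(x0)
    loss = ((res_pde**2).mean() + (res_aux**2).mean()
            + ((uv0[:, 0] - 1.0)**2).mean()    # u(0) = 1 (assumed initial condition)
            + ((uv0[:, 1])**2).mean())         # v(0) = 0 by definition of the integral
    opt.zero_grad()
    loss.backward()
    opt.step()
```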
Deep neural networks have been applied in many applications, exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant computation resources and energy costs. These challenges can be overcome through optimizations such as network compression, often with little loss of accuracy; in some cases accuracy may even improve. This paper provides a survey of two types of network compression: pruning and quantization. Pruning can be categorized as static if it is performed offline or dynamic if it is performed at run-time. We compare pruning techniques, describe the criteria used to remove redundant computations, and discuss trade-offs in element-wise, channel-wise, shape-wise, filter-wise, layer-wise, and even network-wise pruning. Quantization reduces computation by reducing the precision of the datatype: weights, biases, and activations are typically quantized to 8-bit integers, although lower-bit-width implementations, including binary neural networks, are also discussed. Pruning and quantization can be used independently or combined. We compare current techniques, analyze their strengths and weaknesses, present compressed-network accuracy results on a number of frameworks, and provide practical guidance for compressing networks.
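A minimal sketch of the two compression steps surveyed here, using PyTorch's pruning utilities for static element-wise magnitude pruning followed by simple post-training symmetric 8-bit weight quantization; the layer shape, the 50% sparsity target, and the per-tensor scale are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 128)

# Static, element-wise magnitude pruning: zero out the 50% smallest weights.
prune.l1_unstructured(layer, name="weight", amount=0.5)
prune.remove(layer, "weight")                 # make the pruning mask permanent

# Post-training symmetric quantization of weights to signed 8-bit integers,
# with a single per-tensor scale (an assumed, simple calibration scheme).
w = layer.weight.detach()
scale = w.abs().max() / 127.0
q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
w_deq = q.float() * scale                     # dequantized weights for inference

print("sparsity:", (w == 0).float().mean().item())
print("max quantization error:", (w - w_deq).abs().max().item())
```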
Prediction of remaining useful life (RUL) is an indispensable part of prognostics and health management in complex systems. To exploit in parallel the spatial and temporal features implicit in measurement data, this paper proposes a novel parallel hybrid neural network that combines a one-dimensional convolutional neural network (1DCNN) and a bidirectional gated recurrent unit (BiGRU) to predict RUL. Specifically, the spatial and temporal information in historical data is extracted in parallel by the 1DCNN and the BiGRU, respectively. On this basis, the trained network can be applied to RUL prediction in real time. The proposed parallel hybrid neural network is evaluated on two public datasets, namely an aircraft turbofan engine dataset and a milling dataset. Experimental results demonstrate that the proposed parallel hybrid network effectively predicts the RUL and outperforms existing methods reported in the literature.
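The sketch below shows one plausible PyTorch realization of such a parallel 1DCNN/BiGRU architecture with feature fusion for RUL regression; all layer sizes, the sensor count, and the window length are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ParallelCNNBiGRU(nn.Module):
    """Parallel spatial (1-D CNN) and temporal (BiGRU) branches, fused for RUL."""
    def __init__(self, n_sensors=14, hidden=32):
        super().__init__()
        self.cnn = nn.Sequential(                       # spatial branch
            nn.Conv1d(n_sensors, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))                    # -> (batch, 16, 1)
        self.gru = nn.GRU(n_sensors, hidden,            # temporal branch
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(16 + 2 * hidden, 1)       # fused regression head

    def forward(self, x):                               # x: (batch, window, n_sensors)
        spatial = self.cnn(x.transpose(1, 2)).squeeze(-1)
        temporal, _ = self.gru(x)
        fused = torch.cat([spatial, temporal[:, -1, :]], dim=1)
        return self.head(fused)                         # predicted RUL

rul = ParallelCNNBiGRU()(torch.randn(8, 30, 14))        # (batch=8, 1)
```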
Deep neural network (DNN) learning approaches have contributed much to the classification of hyperspectral images (HSIs), and most of them use the convolutional neural network (CNN). HSI data are characterized by high dimensionality, correlation, nonlinearity, and large volume. It is therefore particularly important to extract deeper features from HSIs through dimensionality reduction, which helps improve classification in both the spectral and spatial domains. In this article, we present a spatial-spectral HSI classification algorithm, local similarity projection Gabor filtering (LSPGF), which combines a local similarity projection (LSP)-based reduced-dimensional CNN with a 2-D Gabor filtering algorithm. First, local similarity analysis is used to reduce the dimensionality of the hyperspectral data, and a 2-D Gabor filter is applied to the reduced data to generate spatial tunnel information. Second, a CNN extracts features from the original hyperspectral data to generate spectral tunnel information. Third, the spatial and spectral tunnel information is fused into spatial-spectral feature information, which is fed into a deep CNN to extract more effective features; finally, a dual optimization classifier classifies the extracted features. This article compares the performance of the proposed method with other algorithms on three public HSI databases and shows that LSPGF achieves the highest overall classification accuracy on all three datasets.
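As an illustration of the 2-D Gabor filtering stage, the snippet below applies a small bank of Gabor filters (via OpenCV) to a synthetic stand-in for one dimension-reduced band, producing spatial feature maps; the filter parameters and data are assumptions, not values from the article.

```python
import numpy as np
import cv2

# Synthetic stand-in for one dimension-reduced band of an HSI cube.
band = np.random.rand(64, 64).astype(np.float32)

# Bank of 2-D Gabor filters at 4 orientations; kernel size, sigma, wavelength
# (lambd), aspect ratio (gamma) and phase (psi) are illustrative choices.
responses = []
for theta in np.arange(0, np.pi, np.pi / 4):
    kernel = cv2.getGaborKernel((11, 11), 3.0, theta, 8.0, 0.5, 0.0)
    responses.append(cv2.filter2D(band, cv2.CV_32F, kernel))

spatial_features = np.stack(responses, axis=-1)   # (64, 64, 4) "spatial tunnel"
print(spatial_features.shape)
```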
The on-chip implementation of learning algorithms would speed up the training of neural networks in crossbar arrays. The circuit-level design and implementation of a backpropagation algorithm using gradient-descent operation for neural network architectures is an open problem. In this paper, we propose analog backpropagation learning circuits for various memristive learning architectures, such as the deep neural network, binary neural network, multiple neural network, hierarchical temporal memory, and long short-term memory. The circuit design and verification are done using TSMC 180-nm CMOS process models and TiO2-based memristor models. Application-level validations of the system are performed on the XOR problem and the MNIST handwritten digit and Yale face image databases.
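For reference, here is a purely behavioral (software) sketch of the gradient-descent backpropagation rule that such analog circuits realize, demonstrated on the XOR task mentioned in the abstract; it is a NumPy reference implementation and does not model memristor or CMOS behavior.

```python
import numpy as np

# Behavioral sketch of backpropagation with gradient descent on XOR.
rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.standard_normal((2, 4)); b1 = np.zeros(4)   # 2-4-1 network
W2 = rng.standard_normal((4, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)            # forward pass
    y = sigmoid(h @ W2 + b2)
    d2 = (y - Y) * y * (1 - y)          # output-layer error (squared-error loss)
    d1 = (d2 @ W2.T) * h * (1 - h)      # backpropagated hidden-layer error
    W2 -= lr * h.T @ d2; b2 -= lr * d2.sum(0)
    W1 -= lr * X.T @ d1; b1 -= lr * d1.sum(0)

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))  # ~[0, 1, 1, 0]
```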
In this article, we propose a complementary deep-neural-network (C-DNN) processor that combines a convolutional neural network (CNN) and a spiking neural network (SNN) to take advantage of both. The C-DNN processor can support both complementary inference and training with its heterogeneous CNN and SNN core architecture. In addition, the C-DNN processor is the first DNN accelerator application-specific integrated circuit (ASIC) that can support CNN-SNN workload division by exploiting their magnitude-energy tradeoff. The C-DNN processor integrates a CNN-SNN workload allocator and an attention module to find the more energy-efficient network domain for each workload in a DNN, enabling the processor to operate at the energy-optimal point. Moreover, the SNN processing element (PE) array with a distributed L1 cache reduces redundant memory accesses for SNN processing by 42.2%-49.1%. For highly energy-efficient DNN training, the C-DNN processor integrates a global counter and a local delta-weight (LDW) unit to eliminate power-consuming counters for forward delta-weight generation. Furthermore, forward delta-weight-based sparsity generation (FDWSG) is proposed to reduce the number of training operations by 31%-79%. The C-DNN processor achieves an energy efficiency of 85.8 and 79.9 TOPS/W for inference on CIFAR-10 and CIFAR-100, respectively (VGG-16). Moreover, it achieves ImageNet classification with a state-of-the-art energy efficiency of 24.5 TOPS/W (ResNet-50). For training, the C-DNN processor achieves state-of-the-art energy efficiencies of 84.5 and 17.2 TOPS/W on CIFAR-10 and ImageNet, respectively, and reaches 77.1% accuracy for ImageNet training with ResNet-50.
This reprint focuses on applications of machine learning models in a diverse range of fields and problems. It reports substantive results on a wide range of learning methods; discusses the conceptualization of problems, data representation, feature engineering, and machine learning models; undertakes critical comparisons with existing techniques; and presents an interpretation of the results. The topics within the chapters fall into six categories: computer vision, teaching and learning, social media, forecasting, basic problems of machine learning, and other topics.
•This paper addresses multi-step-ahead traffic flow prediction tasks.
•It develops a deep learning-based approach to fully mine the spatial-temporal features of traffic flow.
•It employs a visualization approach to understand the working mechanism of neural networks on traffic flow data.
Deep neural networks (DNNs) have recently demonstrated the capability to predict traffic flow with big data. While existing DNN models can provide better performance than shallow models, making full use of the spatial-temporal characteristics of traffic flow to improve their performance remains an open issue, and our understanding of how they behave on traffic data is still limited. This paper proposes a DNN-based traffic flow prediction model (DNN-BTF) to improve prediction accuracy. The DNN-BTF model makes full use of the weekly/daily periodicity and spatial-temporal characteristics of traffic flow. Inspired by recent work in machine learning, an attention-based model is introduced that automatically learns to determine the importance of past traffic flow. A convolutional neural network is used to mine the spatial features and a recurrent neural network to mine the temporal features of traffic flow. We also show through visualization how the DNN-BTF model understands traffic flow data, challenging the conventional view in the transportation field that neural networks are purely "black-box" models. Data from the open-access PeMS database were used to validate the proposed DNN-BTF model on a long-term-horizon prediction task. Experimental results demonstrate that our method outperforms state-of-the-art approaches.
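A minimal sketch of the attention idea described above: learn an importance weight for each past time step and form a weighted summary before prediction. The dimensions and the linear scoring network are illustrative assumptions, not the DNN-BTF design.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Weight each past time step by learned importance, then summarize."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)           # importance score per step

    def forward(self, h):                             # h: (batch, steps, feat_dim)
        alpha = torch.softmax(self.score(h), dim=1)   # (batch, steps, 1)
        context = (alpha * h).sum(dim=1)              # weighted summary for prediction
        return context, alpha.squeeze(-1)             # alpha exposes step importance

h = torch.randn(4, 12, 16)                            # e.g., features of 12 past intervals
context, weights = TemporalAttention()(h)
print(weights.shape)                                  # (4, 12): learned importance per step
```

The returned weights are what makes such a model inspectable by visualization: plotting them against past intervals shows which history the network relied on, in the spirit of the abstract's argument against the "black-box" view.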