Neural networks (NNs) have been demonstrated to be useful in a broad range of applications such as image recognition, automatic translation and advertisement recommendation. State-of-the-art NNs are known to be both computationally and memory intensive, due to their ever-deeper structures, i.e., multiple layers with massive numbers of neurons and connections (i.e., synapses). Sparse neural networks have emerged as an effective solution for reducing the amount of computation and memory required. Though existing NN accelerators are able to process dense and regular networks efficiently, they cannot benefit from the reduction of synaptic weights. In this paper, we propose a novel accelerator, Cambricon-X, to exploit the sparsity and irregularity of NN models for increased efficiency. The proposed accelerator features a PE-based architecture consisting of multiple Processing Elements (PEs). An Indexing Module (IM) efficiently selects and transfers needed neurons to connected PEs with reduced bandwidth requirements, while each PE stores irregular and compressed synapses for local computation in an asynchronous fashion. With 16 PEs, our accelerator achieves at most 544 GOP/s in a small form factor (6.38 mm² and 954 mW at 65 nm). Experimental results over a number of representative sparse networks show that our accelerator achieves, on average, 7.23x speedup and 6.43x energy savings against the state-of-the-art NN accelerator.
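The gather-then-accumulate idea behind such an indexing module can be illustrated in software as a compressed (CSR-style) dot product, in which only the input neurons named by the stored synapse indices are fetched. This is a minimal sketch under that interpretation; the function and variable names are illustrative, not the accelerator's actual design:

```python
# Sketch: index-based selection of needed input neurons, as a sparse
# accelerator might do per output neuron. Names/layout are illustrative.

def sparse_neuron(neurons, weights, indices):
    """Compute one output neuron from compressed synapses.

    neurons : dense input activations
    weights : nonzero synaptic weights (compressed storage)
    indices : input positions the nonzero weights connect to
    """
    # The "indexing module": gather only the needed input neurons,
    # so bandwidth scales with the nonzeros, not the dense width.
    selected = [neurons[i] for i in indices]
    # Each PE then multiply-accumulates the compressed pair lists.
    return sum(w * x for w, x in zip(weights, selected))

neurons = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, -1.0]   # only 2 of 4 synapses survived pruning
indices = [0, 3]        # they connect to inputs 0 and 3
print(sparse_neuron(neurons, weights, indices))  # 0.5*1.0 + (-1.0)*4.0 = -3.5
```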
Cambricon-S Zhou, Xuda; Du, Zidong; Guo, Qi ...
2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO),
10/2018
Conference Proceeding
Neural networks have rapidly become the dominant algorithms as they achieve state-of-the-art performance in a broad range of applications such as image recognition, speech recognition and natural language processing. However, neural networks keep moving towards deeper and larger architectures, posing a great challenge in terms of the huge amounts of data and computation involved. Although sparsity has emerged as an effective solution for directly reducing the intensity of computation and memory accesses, the irregularity caused by sparsity (including sparse synapses and neurons) prevents accelerators from completely leveraging its benefits; it also introduces a costly indexing module into accelerators.
In this paper, we propose a cooperative software/hardware approach to address the irregularity of sparse neural networks efficiently. First, we observe local convergence: larger weights tend to gather into small clusters during training. Based on that key observation, we propose a software-based coarse-grained pruning technique that drastically reduces the irregularity of sparse synapses. The coarse-grained pruning technique, together with local quantization, significantly reduces the size of indexes and improves the network compression ratio. We further design a hardware accelerator, Cambricon-S, to efficiently address the remaining irregularity of sparse synapses and neurons. The novel accelerator features a selector module to filter out unnecessary synapses and neurons. Compared with a state-of-the-art sparse neural network accelerator, our accelerator is 1.71x and 1.37x better in terms of performance and energy efficiency, respectively.
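Coarse-grained pruning can be sketched as removing whole blocks of weights at once, so that a single index per surviving block replaces one index per surviving weight. A minimal software sketch, assuming a simple block-L2-norm criterion (the paper's exact criterion and block shape may differ):

```python
import math

def coarse_grained_prune(weights, block, threshold):
    """Zero out whole blocks of weights whose L2 norm falls below threshold.

    Pruning at block granularity means one index per surviving block,
    shrinking index storage versus element-wise sparsity.
    (Illustrative sketch, not the paper's exact procedure.)
    """
    pruned = list(weights)
    kept_blocks = []
    for start in range(0, len(weights), block):
        chunk = weights[start:start + block]
        norm = math.sqrt(sum(w * w for w in chunk))
        if norm < threshold:
            # whole block removed: no index needed for it at all
            for i in range(start, min(start + block, len(weights))):
                pruned[i] = 0.0
        else:
            kept_blocks.append(start // block)
    return pruned, kept_blocks

w = [0.9, 0.8, 0.01, 0.02, 0.7, 0.6, 0.03, 0.01]
pruned, kept = coarse_grained_prune(w, block=2, threshold=0.1)
print(pruned)  # small-norm blocks zeroed out
print(kept)    # only two block indices need storing
```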
DaDianNao: A Machine-Learning Supercomputer Yunji Chen; Tao Luo; Shaoli Liu ...
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture,
2014-Dec.
Conference Proceeding
Many companies are deploying services, either for consumers or industry, which are largely based on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computationally and memory intensive. A number of neural network accelerators have recently been proposed which can offer a high computational capacity/area ratio, but which remain hampered by memory accesses. However, unlike the memory wall faced by processors on general-purpose workloads, the CNN and DNN memory footprint, while large, is not beyond the capability of the on-chip storage of a multi-chip system. This property, combined with the CNN/DNN algorithmic characteristics, can lead to high internal bandwidth and low external communications, which can in turn enable high-degree parallelism at a reasonable area cost. In this article, we introduce a custom multi-chip machine-learning architecture along those lines. We show that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system. We implement the node down to the place-and-route at 28 nm, containing a combination of custom storage and computational units, with industry-grade interconnects.
This note studies the global robust output regulation problem by state feedback for strict feedforward systems. By utilizing the general framework for tackling the output regulation problem, the output regulation problem is converted into a global robust stabilization problem for a class of feedforward systems subject to both time-varying static and dynamic uncertainties. The stabilization problem is then solved by using a small-gain-based bottom-up recursive design procedure.
•LINC01152 is upregulated in HBV-positive HCC.
•LINC01152 promotes HCC cell proliferation and tumor formation in nude mice.
•HBx increases the transcription of LINC01152.
•LINC01152 binds to the promoter region of IL-23.
•The IL-23/Stat3/p-Stat3 pathway is involved in the mechanism by which LINC01152 affects HCC.
Accumulating evidence suggests that long noncoding RNAs (lncRNAs) play important roles in hepatitis B virus (HBV) infections. However, how lncRNAs regulate the hepatocellular carcinoma (HCC) process remains largely unknown. In this study we found that the expression of LINC01152 was significantly increased in HBV-positive HCC tissues and cells and was induced by HBx in vitro. The overexpression of LINC01152 increased HCC cell proliferation and promoted tumor formation in nude mice. Mechanistically, HBx increased the transcription of LINC01152. Elevated LINC01152 binds to the promoter region of IL-23, promoting its transcriptional activity and upregulating the levels of Stat3 and p-Stat3. Our findings suggest that LINC01152 plays an important role in the development of HBV-related hepatocellular carcinoma and may serve as a therapeutic marker for hepatocellular carcinoma.
The stable spline (SS) kernel and the diagonal correlated (DC) kernel are two kernels that have been applied and studied extensively for kernel-based regularized LTI system identification. In this note, we show that similar to the derivation of the SS kernel, the continuous-time DC kernel can be derived by applying the same "stable" coordinate change to a "generalized" first-order spline kernel, and thus, can be interpreted as a stable generalized first-order spline kernel. This interpretation provides new facets to understand the properties of the DC kernel. In particular, we derive a new orthonormal basis expansion of the DC kernel and the explicit expression of the norm of the reproducing kernel Hilbert space associated with the DC kernel. Moreover, for the nonuniformly sampled DC kernel, we derive its maximum entropy property and show that its kernel matrix has tridiagonal inverse.
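For context, the DC kernel in its commonly used discrete-time form, as stated in the standard regularized system identification literature (not taken from this note), is:

```latex
\[
  k_{\mathrm{DC}}(i,j) \;=\; c\,\lambda^{(i+j)/2}\,\rho^{|i-j|},
  \qquad c \ge 0,\; 0 \le \lambda < 1,\; |\rho| \le 1,
\]
% Here \lambda controls the exponential decay of the impulse response
% estimate and \rho the correlation between neighboring coefficients.
% The special case \rho = \sqrt{\lambda} recovers the TC kernel
%   k_{\mathrm{TC}}(i,j) = c\,\lambda^{\max(i,j)}.
```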
Estimation of distribution algorithms (EDAs) are widely used in stochastic optimization. Impressive experimental results have been reported in the literature. However, little work has been done on analyzing the computation time of EDAs in relation to the problem size. It is still unclear how well EDAs (with a finite population size larger than two) will scale up when the dimension of the optimization problem (problem size) goes up. This paper studies the computational time complexity of a simple EDA, i.e., the univariate marginal distribution algorithm (UMDA), in order to gain more insight into EDA complexity. First, we discuss how to measure the computational time complexity of EDAs. A classification of problem hardness based on our discussions is then given. Second, we prove a theorem relating problem hardness to the probability conditions of EDAs. Third, we propose a novel approach to analyzing the computational time complexity of UMDA using discrete dynamic systems and Chernoff bounds. Following this approach, we are able to derive a number of results on the first hitting time of UMDA on a well-known unimodal pseudo-boolean function, i.e., the LeadingOnes problem, and another problem derived from LeadingOnes, named BVLeadingOnes. Although both problems are unimodal, our analysis shows that LeadingOnes is easy for the UMDA, while BVLeadingOnes is hard for the UMDA. Finally, in order to address the key issue of what problem characteristics make a problem hard for UMDA, we discuss in depth the idea of "margins" (or relaxation). We prove theoretically that the UMDA with margins can solve the BVLeadingOnes problem efficiently.
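A minimal sketch of the UMDA with margins on the LeadingOnes problem, assuming standard truncation selection (the function names and parameter choices are illustrative, not the exact variant analyzed in the paper):

```python
import random

def leading_ones(x):
    """LeadingOnes: number of consecutive ones from the left."""
    n = 0
    for bit in x:
        if bit != 1:
            break
        n += 1
    return n

def umda(n=20, pop=100, mu=50, generations=200, seed=0):
    """Univariate Marginal Distribution Algorithm with margins.

    Maintains per-bit marginals p[i], re-estimated each generation from
    the best mu of pop samples; margins clamp p[i] away from 0 and 1 so
    no bit value is ever irrecoverably lost. Illustrative sketch.
    """
    rng = random.Random(seed)
    p = [0.5] * n
    lo, hi = 1.0 / n, 1.0 - 1.0 / n   # the "margins"
    best = 0
    for _ in range(generations):
        samples = [[1 if rng.random() < p[i] else 0 for i in range(n)]
                   for _ in range(pop)]
        samples.sort(key=leading_ones, reverse=True)
        best = max(best, leading_ones(samples[0]))
        elite = samples[:mu]                      # truncation selection
        for i in range(n):
            freq = sum(x[i] for x in elite) / mu  # marginal frequency
            p[i] = min(max(freq, lo), hi)         # clamp to margins
        if best == n:
            break
    return best

print(umda())  # typically reaches the optimum n = 20
```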
Hyperparameter optimization remains the core issue in Gaussian processes (GPs) for machine learning. The classical hyperparameter optimization scheme based on maximum likelihood estimation is impractical for big data processing, as its computational complexity is cubic in the number of data points. With the rapid development of efficient parallel data processing on ever cheaper and more powerful hardware, distributed models and algorithms will become ubiquitous. In this letter, we propose an alternative distributed GP hyperparameter optimization scheme using the efficient proximal alternating direction method of multipliers, proposed by Hong et al. in 2016, and we derive the closed-form solution for the local sub-problems. In contrast to existing schemes of a similar kind, ours balances well the computational load on each local machine and the communication overhead required for global consensus of the local hyperparameter estimates. The proposed scheme can work in either a synchronous or an asynchronous manner, and is thus flexible enough to be adopted in different computing facilities. Experimental results with both synthetic and real datasets validate the outstanding performance of the proposed scheme.
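The structure of such a distributed scheme can be sketched with plain consensus ADMM, where each local machine solves its sub-problem in closed form and only scalar estimates are exchanged. Here toy quadratic losses stand in for the GP log marginal likelihoods, so the closed-form local update is exact; this is an illustrative sketch of the ADMM pattern, not the proximal variant of Hong et al.:

```python
def consensus_admm(a, b, rho=1.0, iters=200):
    """Consensus ADMM for min_theta sum_i (a_i/2) * (theta - b_i)^2.

    Each 'machine' i holds (a_i, b_i) and solves its local sub-problem
    in closed form; only scalar estimates are communicated, mirroring
    the load/communication balance of distributed hyperparameter
    estimation. Toy losses stand in for GP marginal likelihoods.
    """
    m = len(a)
    theta = [0.0] * m   # local hyperparameter copies
    u = [0.0] * m       # scaled dual variables
    z = 0.0             # global consensus variable
    for _ in range(iters):
        # Closed-form local updates (parallel across machines).
        theta = [(a[i] * b[i] + rho * (z - u[i])) / (a[i] + rho)
                 for i in range(m)]
        # Gather step: average for global consensus.
        z = sum(theta[i] + u[i] for i in range(m)) / m
        # Dual ascent step.
        u = [u[i] + theta[i] - z for i in range(m)]
    return z

# The minimizer is the weighted mean sum(a_i*b_i)/sum(a_i) = 6/3 = 2.
print(consensus_admm(a=[2.0, 1.0], b=[1.0, 4.0]))
```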
Stent intimal hyperplasia leads to in-stent restenosis and thrombosis. This study determined whether Fibulin-1 activity in smooth muscle cells (SMCs) contributes to stent restenosis or thrombosis.
Stent implantation was conducted in a pig model. Target vessel samples were stained and analyzed by protein mass spectrometry. Cell experiments and Fibulin-1 SMC-specific knockout mice (Fbln1SMKO) were used to investigate the mechanism of Fibulin-1-induced SMC proliferation and thrombosis.
SMC proliferation and phenotypic transition are the main pathological changes of intimal hyperplasia in venous stents. Protein mass spectrometry analysis revealed a total of 67 upregulated proteins and 39 downregulated proteins in intimal hyperplasia after stent implantation compared with normal iliac vein tissues. Among them, Fibulin-1 was among the most altered proteins. Fibulin-1-overexpressing human SMCs (Fibulin-1-hSMCs) showed increased migration and phenotypic switching from the contractile to the secretory type, and Fibulin-1 inhibition decreased the activity of SMCs. Mechanistically, Fibulin-1-hSMCs displayed increased levels of angiotensin-converting enzyme (ACE) expression and angiotensin II signaling. Inhibition of ACE or angiotensin II signaling alleviated the migration of Fibulin-1-hSMCs. Using Fibulin-1 SMC-specific knockout mice (Fbln1SMKO) and a venous thrombosis model, we demonstrated that Fibulin-1 deletion attenuated intimal SMC proliferation and thrombosis. Further, Fibulin-1 concentration was high in iliac vein compression syndrome (IVCS) patients treated with stents and was an independent predictor of venous insufficiency.
Fibulin-1 promotes SMC proliferation partially through ACE secretion and angiotensin II signaling after stent implantation. Fibulin-1 plays a role in venous insufficiency syndrome, implicating the protein in the detection and treatment of IVCS.
This paper studies the disturbance attenuation problem for a class of nonlinear systems in feedforward form that is subject to both dynamic uncertainty and disturbance. When the disturbance vanishes, the equilibrium of the closed-loop system is globally asymptotically stable. Two versions of the small gain theorem with restrictions are employed to establish the global attractiveness and local stability of the closed-loop system at the equilibrium, respectively.