Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey of recent advances toward the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic codesigns, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the tradeoffs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, for various CNN shapes by reconfiguring the architecture. CNNs are widely used in modern AI systems but also bring challenges on throughput and energy efficiency to the underlying hardware. This is because their computation requires a large amount of data, creating significant data movement between on-chip and off-chip memory that is more energy-consuming than computation. Minimizing data movement energy cost for any CNN shape, therefore, is the key to high throughput and energy efficiency. Eyeriss achieves these goals by using a proposed processing dataflow, called row stationary (RS), on a spatial architecture with 168 processing elements. The RS dataflow reconfigures the computation mapping of a given shape, which optimizes energy efficiency by maximally reusing data locally to reduce expensive data movement, such as DRAM accesses. Compression and data gating are also applied to further improve energy efficiency. Eyeriss processes the convolutional layers at 35 frames/s and 0.0029 DRAM access/multiply-and-accumulate (MAC) operation for AlexNet at 278 mW (batch size N = 4), and 0.7 frames/s and 0.0035 DRAM access/MAC for VGG-16 at 236 mW (N = 3).
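The DRAM access/MAC metric above can be turned into absolute DRAM traffic with a quick back-of-envelope calculation. A minimal sketch, assuming roughly 666 million MACs per frame for AlexNet's five convolutional layers (a commonly cited figure, not stated in the abstract itself):

```python
# Back-of-envelope check of Eyeriss's reported DRAM traffic for AlexNet.
# ASSUMPTION: ~666 million MACs per frame for AlexNet's conv layers;
# the 0.0029 accesses/MAC and 35 frames/s figures are from the abstract.
MACS_PER_FRAME = 666e6
DRAM_ACCESS_PER_MAC = 0.0029
FRAMES_PER_S = 35

dram_accesses_per_frame = MACS_PER_FRAME * DRAM_ACCESS_PER_MAC
dram_accesses_per_s = dram_accesses_per_frame * FRAMES_PER_S
print(f"{dram_accesses_per_frame / 1e6:.1f} M DRAM accesses per frame")
print(f"{dram_accesses_per_s / 1e6:.0f} M DRAM accesses per second")
```

Under these assumptions the chip makes on the order of 2 million DRAM accesses per frame, which illustrates how aggressively the RS dataflow keeps data on-chip relative to the hundreds of millions of MACs performed.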
The authors demonstrate the key role dataflows play in the optimization of energy efficiency for deep neural network (DNN) accelerators. By introducing a systematic approach to analyzing the problem, along with a new dataflow, called Row-Stationary, that is up to 2.5 times more energy efficient than existing dataflows when processing a state-of-the-art DNN, this work provides guidelines for future DNN accelerator designs.
A recent trend in deep neural network (DNN) development is to extend the reach of deep learning applications to platforms that are more resource- and energy-constrained, e.g., mobile devices. These endeavors aim to reduce the DNN model size and improve the hardware processing efficiency, and have resulted in DNNs that are much more compact in their structures and/or have high data sparsity. These compact or sparse models differ from traditional large ones in that their layer shapes and sizes vary much more, and they often require specialized hardware to exploit sparsity for performance improvement. Therefore, many DNN accelerators designed for large DNNs do not perform well on these models. In this paper, we present Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs. To deal with the widely varying layer shapes and sizes, it introduces a highly flexible on-chip network, called hierarchical mesh, that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, which improves the utilization of the computation resources. Furthermore, Eyeriss v2 can process sparse data directly in the compressed domain for both weights and activations, and therefore is able to improve both processing speed and energy efficiency with sparse models. Overall, with sparse MobileNet, Eyeriss v2 in a 65-nm CMOS process achieves a throughput of 1470.6 inferences/s and 2560.3 inferences/J at a batch size of 1, which is 12.6× faster and 2.5× more energy-efficient than the original Eyeriss running MobileNet.
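Dividing the reported Eyeriss v2 figures by the quoted speedup and energy gain recovers the implied baseline numbers for the original Eyeriss on MobileNet. A minimal arithmetic check, using only values stated in the abstract:

```python
# The abstract reports Eyeriss v2's absolute results and its gains over
# the original Eyeriss on MobileNet; dividing out gives the implied baseline.
v2_throughput = 1470.6   # inferences/s, batch size 1, sparse MobileNet
v2_efficiency = 2560.3   # inferences/J
speedup = 12.6           # reported speedup over original Eyeriss
energy_gain = 2.5        # reported energy-efficiency gain

v1_throughput = v2_throughput / speedup
v1_efficiency = v2_efficiency / energy_gain
print(f"implied original Eyeriss: {v1_throughput:.0f} inf/s, "
      f"{v1_efficiency:.0f} inf/J")
```

The implied baseline of roughly 117 inferences/s underscores how poorly an accelerator tuned for large, dense CNN layers utilizes its resources on a compact network like MobileNet.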
The independent measurement of the Hubble constant with gravitational-wave standard sirens will potentially shed light on the tension between the local distance ladders and Planck experiments. Therefore, a thorough understanding of the sources of systematic uncertainty for the standard siren method is crucial. In this Letter, we focus on two scenarios that will potentially dominate the systematic uncertainty of standard sirens. First, simulations of electromagnetic counterparts of binary neutron star mergers suggest aspherical emissions, so the binaries available for the standard siren method can be selected by their viewing angles. This selection effect can lead to ≳2% bias in the Hubble constant measurement even with mild selection. Second, if the binary viewing angles are constrained by the electromagnetic counterpart observations but the bias of the constraints is not controlled under ∼10°, the resulting systematic uncertainty in the Hubble constant will be >3%. In addition, we find that neither of these systematics can be properly removed by the viewing angle measurement from gravitational-wave observations. Compared with the known dominant systematic uncertainty for standard sirens, the ≤2% gravitational-wave calibration uncertainty, the effects from the viewing angle appear to be more significant. Therefore, the systematic uncertainty from the viewing angle might be a major challenge before the standard sirens can resolve the tension in the Hubble constant, which is currently ∼9%.
Gravitational-wave detections provide a novel way to determine the Hubble constant, which is the current rate of expansion of the Universe. This 'standard siren' method, with the absolute distance calibration provided by the general theory of relativity, was used to measure the Hubble constant using the gravitational-wave detection of the binary neutron-star merger GW170817 by the Laser Interferometer Gravitational-Wave Observatory (LIGO) and Virgo, combined with optical identification of the host galaxy NGC 4993. This independent measurement is of particular interest given the discrepancy between the value of the Hubble constant determined using type Ia supernovae via the local distance ladder (73.24 ± 1.74 kilometres per second per megaparsec) and the value determined from cosmic microwave background observations (67.4 ± 0.5 kilometres per second per megaparsec): these values differ by about 3σ. Local distance ladder observations may achieve a precision of one per cent within five years, but at present there are no indications that further observations will substantially reduce the existing discrepancies. Here we show that additional gravitational-wave detections by LIGO and Virgo can be expected to constrain the Hubble constant to a precision of approximately two per cent within five years and approximately one per cent within a decade. This is because observing gravitational waves from the merger of two neutron stars, together with the identification of a host galaxy, enables a direct measurement of the Hubble constant independent of the systematics associated with other available methods. In addition to clarifying the discrepancy between existing low-redshift (local ladder) and high-redshift (cosmic microwave background) measurements, a precision measurement of the Hubble constant is of crucial value in elucidating the nature of dark energy.
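The "about 3σ" tension and the projected precision follow from straightforward arithmetic on the quoted numbers. A minimal sketch, assuming independent Gaussian errors on the two measurements and a GW170817-like per-event uncertainty of ~15% (the 15% figure is an assumption, not stated in the abstract):

```python
import math

# Significance of the H0 tension, assuming independent Gaussian errors.
H0_ladder, sigma_ladder = 73.24, 1.74  # km/s/Mpc, local distance ladder
H0_cmb, sigma_cmb = 67.4, 0.5          # km/s/Mpc, CMB observations

tension_sigma = (H0_ladder - H0_cmb) / math.hypot(sigma_ladder, sigma_cmb)
print(f"tension: {tension_sigma:.1f} sigma")

# Standard-siren precision improves roughly as 1/sqrt(N) with N detections.
# ASSUMPTION: ~15% fractional H0 uncertainty per GW170817-like event.
per_event = 0.15
for n in (1, 50, 200):
    print(f"N = {n:3d} events: ~{100 * per_event / math.sqrt(n):.1f}% precision")
```

Under these assumptions the tension evaluates to roughly 3.2σ, and a few tens of joint detections reach the ~2% level, consistent with the five-year projection in the abstract.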
The detection of GW170817 and the identification of its host galaxy have allowed for the first standard-siren measurement of the Hubble constant, with an uncertainty of ∼14%. As more detections of binary neutron stars with redshift measurements are made, the uncertainty will shrink. The dominant factors will be the number of joint detections and the uncertainty on the luminosity distance of each event. Neutron star-black hole mergers are also promising sources for Advanced LIGO and Virgo. If the black hole spin induces precession of the orbital plane, the degeneracy between luminosity distance and orbital inclination is broken, leading to a much better distance measurement. In addition, neutron star-black hole sources are observable to larger distances, owing to their higher mass. Neutron star-black hole mergers could also emit electromagnetic radiation: depending on the black hole spin and on the mass ratio, the neutron star can be tidally disrupted, resulting in electromagnetic emission. We quantify the distance uncertainty for a wide range of black hole masses, spins, and orientations and find that the 1σ statistical uncertainty can be up to a factor of ∼10 better than for a nonspinning binary neutron star merger with the same signal-to-noise ratio. The better distance measurement, the larger gravitational-wave detectable volume, and the potentially bright electromagnetic emission imply that spinning black hole-neutron star binaries can be the optimal standard-siren sources as long as their astrophysical rate is larger than O(10) Gpc^{-3} yr^{-1}, a value allowed by current astrophysical constraints.
The innate immune system deploys a variety of sensors to detect signs of infection. Nucleic acids represent a major class of pathogen signatures that can trigger robust immune responses. The presence of DNA in the cytoplasm of mammalian cells is a danger signal that activates innate immune responses; however, how cytosolic DNA triggers these responses remained unclear until recently. In this review, we focus on the mechanism of DNA sensing by the newly discovered cGAS-cGAMP-STING pathway and highlight recent progress in dissecting the in vivo functions of this pathway in immune defense as well as autoimmunity.
The development of red thermally activated delayed fluorescence (TADF) emitters having excellent optoelectronic properties and satisfactory electroluminescence efficiency is full of challenges due to strict molecular design principles. Two red TADF molecules, 3‐(9,9‐dimethylacridin‐10(9H)‐yl)acenaphtho[1,2‐b]quinoxaline‐9,10‐dicarbonitrile and 3‐(2,7‐dimethyl‐10H‐spiro[acridine‐9,9′‐fluoren]‐10‐yl)acenaphtho[1,2‐b]quinoxaline‐9,10‐dicarbonitrile, are developed by adopting a donor–acceptor molecular architecture bearing an electron‐accepting acenaphtho[1,2‐b]quinoxaline‐9,10‐dicarbonitrile (ANQDC) moiety and a 9,9‐dimethyl‐9,10‐dihydroacridine or 2,7‐dimethyl‐10H‐spiro[acridine‐9,9′‐fluorene] electron donor. The combined effects of rigid and planar D/A moieties and high steric hindrance between D and A groups endow both molecules with high rigidity to suppress nonradiative decay processes, resulting in high photoluminescence quantum efficiencies (ΦPLs) of up to 95%. Attributed to the linear and planar acceptor motif and rod‐like molecular configuration, both emitters achieve high horizontal ratios of emitting dipole orientation of ≈80%. The organic light‐emitting diodes (OLEDs) based on both emitters exhibit red emissions peaking at ≈615 nm and successfully afford ultrahigh electroluminescence performance with an external quantum efficiency of nearly 28% and a power efficiency of above 50 lm W−1, on par with the state‐of‐the‐art device efficiency for red TADF OLEDs. This presents a feasible design strategy for excellent TADF emitters simultaneously possessing high ΦPLs and horizontally aligned emitting dipoles.
An ultrahigh‐efficiency red thermally activated delayed fluorescence (TADF) OLED with an external quantum efficiency of nearly 28% and a power efficiency exceeding 50 lm W−1 is realized. The OLEDs incorporate excellent red TADF emitters, simultaneously exhibiting 95% photoluminescence quantum efficiency and preferentially horizontal emitting dipole orientation.
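The ~28% external quantum efficiency is consistent with the standard OLED efficiency decomposition, EQE ≈ γ · η_ST · Φ_PL · η_out. A minimal consistency check, assuming ideal charge balance (γ ≈ 1), full singlet-triplet harvesting by TADF (η_ST ≈ 1), and an outcoupling factor of ~0.30 for an emitter with ~80% horizontally oriented dipoles; none of these three values appears in the abstract:

```python
# Consistency check: EQE ~ gamma * eta_ST * Phi_PL * eta_out.
# ASSUMPTIONS: gamma = 1 (ideal charge balance), eta_ST = 1 (TADF harvests
# both singlets and triplets), eta_out = 0.30 (outcoupling for ~80%
# horizontal dipole orientation). Only Phi_PL = 0.95 is from the abstract.
gamma, eta_st = 1.0, 1.0
phi_pl = 0.95    # reported photoluminescence quantum efficiency
eta_out = 0.30   # assumed outcoupling for horizontally aligned dipoles

eqe = gamma * eta_st * phi_pl * eta_out
print(f"estimated EQE ~ {100 * eqe:.1f}%")
```

The estimate of ~28.5% lands close to the reported value, which illustrates why combining a high ΦPL with horizontally aligned dipoles (which raises η_out above the ~20% typical of isotropic emitters) is the key to the device performance.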
Deep convolutional neural networks (CNNs) are widely used in modern AI systems for their superior accuracy, but at the cost of high computational complexity. The complexity comes from the need to simultaneously process hundreds of filters and channels in the high-dimensional convolutions, which involve a significant amount of data movement. Although highly parallel compute paradigms, such as SIMD/SIMT, effectively address the computation requirement to achieve high throughput, energy consumption still remains high, as data movement can be more expensive than computation. Accordingly, finding a dataflow that supports parallel processing with minimal data movement cost is crucial to achieving energy-efficient CNN processing without compromising accuracy. In this paper, we present a novel dataflow, called row-stationary (RS), that minimizes data movement energy consumption on a spatial architecture. This is realized by exploiting local data reuse of filter weights and feature map pixels, i.e., activations, in the high-dimensional convolutions, and by minimizing data movement of partial sum accumulations. Unlike dataflows used in existing designs, which only reduce certain types of data movement, the proposed RS dataflow can adapt to different CNN shape configurations and reduces all types of data movement by maximally utilizing the processing engine (PE) local storage, direct inter-PE communication, and spatial parallelism. To evaluate the energy efficiency of the different dataflows, we propose an analysis framework that compares energy cost under the same hardware area and processing parallelism constraints. Experiments using the CNN configurations of AlexNet show that the proposed RS dataflow is more energy efficient than existing dataflows in both convolutional (1.4× to 2.5×) and fully-connected layers (at least 1.3× for batch sizes larger than 16). The RS dataflow has also been demonstrated on a fabricated chip, which verifies our energy analysis.
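The kind of energy analysis described above weights each memory level's access count by its normalized energy cost. A minimal sketch, assuming illustrative normalized access costs for a storage hierarchy (register file 1×, inter-PE transfer 2×, global buffer 6×, DRAM 200×, figures often quoted for 65 nm designs) and made-up access counts; this is not the paper's actual framework or data:

```python
# Sketch of a dataflow energy model: total data-movement energy is the sum
# over memory levels of (access count) * (normalized energy per access).
# ASSUMPTION: the normalized costs below and both access-count profiles
# are illustrative values, not taken from the paper.
COST = {"RF": 1, "inter-PE": 2, "buffer": 6, "DRAM": 200}

def movement_energy(accesses):
    """Normalized data-movement energy for per-level access counts."""
    return sum(n * COST[level] for level, n in accesses.items())

# A reuse-heavy mapping (RS-like) shifts traffic from DRAM to local storage;
# a naive mapping re-fetches operands from DRAM instead of reusing them.
rs_like = {"RF": 1000, "inter-PE": 200, "buffer": 100, "DRAM": 5}
naive   = {"RF": 1000, "inter-PE": 0,   "buffer": 0,   "DRAM": 50}

print(movement_energy(rs_like), movement_energy(naive))
```

Even with these toy numbers, trading a handful of DRAM accesses for many cheap local ones cuts the total movement energy severalfold, which is exactly the lever the RS dataflow exploits.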