Silicon-based static random access memories (SRAMs) and digital Boolean logic have been the workhorses of state-of-the-art computing platforms. Despite tremendous strides in scaling the ubiquitous metal-oxide-semiconductor transistor, the underlying von Neumann computing architecture has remained unchanged. The limited throughput and energy efficiency of state-of-the-art computing systems result, to a large extent, from the well-known von Neumann bottleneck. The energy and throughput inefficiency of von Neumann machines has been accentuated in recent times by the present emphasis on data-intensive applications such as artificial intelligence, machine learning, and cryptography. A possible approach towards mitigating the overhead associated with the von Neumann bottleneck is to enable in-memory Boolean computations. In this paper, we present an augmented version of the conventional SRAM bit-cell, called the X-SRAM, with the ability to perform in-memory, vector Boolean computations in addition to the usual memory storage operations. We propose at least six different schemes for enabling in-memory vector computations, including NAND, NOR, IMP (implication), and XOR logic gates, for two bit-cell topologies: the 8T cell and the 8+T differential cell. In addition, we present a novel 'read-compute-store' scheme, wherein the computed Boolean function can be stored directly in the memory without latching the data and carrying out a subsequent write operation. The feasibility of the proposed schemes has been verified using predictive transistor models and detailed Monte Carlo variation analysis. As an illustration, we also demonstrate the efficacy of the proposed in-memory computations by implementing the advanced encryption standard algorithm on a non-standard von Neumann machine in which the conventional SRAM is replaced by X-SRAM. Our simulations indicate that up to 75% of memory accesses can be saved using the proposed techniques.
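The in-memory Boolean operations described above can be illustrated with a behavioral sketch. This is not a circuit-level model; it only assumes, as the abstract describes, that two rows are read simultaneously and that the shared read-bitline discharge (zero, one, or two discharge paths) is resolved against sensing references to produce NOR, NAND, and XOR outputs. All function and parameter names here are illustrative, not from the paper.

```python
# Behavioral sketch of in-memory bitwise Boolean operations in an
# 8T-SRAM-like array: with two read wordlines asserted, the number of
# cells storing '1' (0, 1, or 2) determines the bitline discharge,
# which different sensing references decode into NOR, NAND, and XOR.
def inmem_ops(row_a, row_b):
    out = {"NOR": [], "NAND": [], "XOR": []}
    for a, b in zip(row_a, row_b):
        ones = a + b                       # number of discharge paths
        out["NOR"].append(int(ones == 0))  # no path: bitline stays high
        out["NAND"].append(int(ones < 2))  # fewer than two paths
        out["XOR"].append(int(ones == 1))  # exactly one path
    return out
```

For example, `inmem_ops([1, 0, 1, 0], [1, 1, 0, 0])` yields NOR = [0, 0, 0, 1], NAND = [0, 1, 1, 1], and XOR = [0, 1, 1, 0], matching a bitwise truth table over the two stored words.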
The rapid growth of brain-inspired computing, coupled with the inefficiencies of CMOS implementations of neuromorphic systems, has led to intense exploration of efficient hardware implementations of the functional units of the brain, namely, neurons and synapses. However, efforts have largely been invested in implementations in the electrical domain, with potential limitations in switching speed, packing density of large integrated systems, and interconnect losses. As an alternative, neuromorphic engineering in the photonic domain has recently gained attention. In this work, we propose a purely photonic implementation of an integrate-and-fire spiking neuron based on the phase-change dynamics of Ge2Sb2Te5 (GST) embedded on top of a microring resonator, which alleviates the energy constraints of phase-change materials (PCMs) in the electrical domain. We also show that such a neuron can potentially be integrated with on-chip synapses into an all-photonic spiking neural network inferencing framework that promises to be ultrafast and can potentially offer a large operating bandwidth.
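The integrate-and-fire behavior at the heart of this neuron can be sketched abstractly. Assuming, for illustration only, that each input optical pulse shifts the GST phase state by a fixed increment and that an output spike is emitted (and the element reset) once the accumulated state crosses a threshold, the dynamics reduce to the following; the step size and threshold are illustrative values, not measured device parameters.

```python
# Minimal integrate-and-fire sketch: input pulses integrate into the
# GST phase state; crossing the threshold fires a spike and resets.
def integrate_and_fire(pulses, step=0.2, threshold=1.0):
    state, spikes = 0.0, []
    for p in pulses:
        state += step * p          # each pulse nudges the phase state
        if state >= threshold:     # ring transmission crosses threshold
            spikes.append(1)
            state = 0.0            # reset the phase-change element
        else:
            spikes.append(0)
    return spikes
```

With a step of 0.2, five consecutive unit pulses produce a single spike on the fifth input, i.e. the neuron integrates sub-threshold inputs over time rather than responding to each pulse.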
We compute the differential cross-section for inclusive prompt photon production in deeply inelastic scattering (DIS) of electrons on nuclei at small x in the framework of the Color Glass Condensate (CGC) effective theory. The leading order (LO) computation in this framework resums leading logarithms in x as well as power corrections to all orders in Q_{s,A}^2/Q^2, where Q_{s,A}(x) is the nuclear saturation scale. This LO result is proportional to universal dipole and quadrupole Wilson line correlators in the nucleus. In the soft photon limit, the Low-Burnett-Kroll theorem allows us to recover existing results on inclusive DIS dijet production. The k_⊥- and collinearly factorized expressions for prompt photon production in DIS are also recovered in a leading twist approximation to our result. In the latter case, our result corresponds to the dominant next-to-leading order (NLO) perturbative QCD contribution at small x. We next discuss the computation of the NLO corrections to inclusive prompt photon production in the CGC framework. In particular, we emphasize the advantages for higher order computations in inclusive photon production, and for fully inclusive DIS, arising from the simple momentum space structure of the dressed quark and gluon "shock wave" propagators in the "wrong" light cone gauge A^- = 0 for a nucleus moving with P_N^+ → ∞.
Spin-transfer torque (STT) mechanisms in vertical and lateral spin valves, together with magnetization reversal and domain wall motion driven by spin-orbit torque (SOT), have opened up new possibilities for efficiently mimicking "neural" and "synaptic" functionalities with much lower area and energy consumption than CMOS implementations. In this paper, we review various STT/SOT devices that can provide a compact and area-efficient implementation of artificial neurons and synapses. We provide a device-circuit-system perspective and envision the design of an all-spin neuromorphic processor (with varying degrees of bio-fidelity) that can be potentially appealing for ultra-low-power cognitive applications.
In this letter, we investigate the design space of hysteresis-free negative capacitance FETs (NCFETs) by performing a cross-architecture analysis using HfZrOx ferroelectric (FE-HZO) integrated on bulk MOSFETs, fully depleted SOI FETs (FDSOI), and sub-10-nm FinFETs. Our simulation analysis shows that the FDSOI and FinFET configurations greatly benefit NCFET performance due to their undoped body and improved gate control, which enable better capacitance matching with the ferroelectric. A low-voltage NC-FinFET operating down to 0.25 V is predicted using ultra-thin 3-nm FE-HZO.
Non-Boolean computing based on emerging post-CMOS technologies can potentially pave the way for low-power neural computing platforms. However, existing work on such emerging neuromorphic architectures has focused on mimicking either the neuron or the synapse functionality, but not both. While memristive devices have been proposed to emulate biological synapses, spintronic devices have proved to be efficient at performing the thresholding operation of the neuron at ultra-low currents. In this work, we propose an all-spin artificial neural network in which a single spintronic device acts as the basic building block of the system. The device offers a direct mapping to the synapse and neuron functionalities of the brain, while inter-layer network communication is accomplished via CMOS transistors. To the best of our knowledge, this is the first demonstration of a neural architecture in which a single nanoelectronic device is able to mimic both neurons and synapses. The ultra-low-voltage operation of low-resistance magneto-metallic neurons enables the low-voltage operation of the array of spintronic synapses, thereby leading to ultra-low-power neural architectures. Device-level simulations, calibrated to experimental results, were used to drive the circuit- and system-level simulations of the neural network for a standard pattern recognition problem. Simulation studies indicate energy savings of ~400× in comparison to a corresponding digital/analog CMOS neuron implementation.
Large-scale digital computing almost exclusively relies on the von Neumann architecture, which comprises separate units for storage and computation. The energy-expensive transfer of data from the memory units to the computing cores results in the well-known von Neumann bottleneck. Various approaches aimed at bypassing the von Neumann bottleneck are being extensively explored in the literature. These include in-memory computing based on CMOS and beyond-CMOS technologies, wherein, by making modifications to the memory array, vector computations can be carried out as close to the memory units as possible. In-memory techniques based on CMOS technology are of special importance due to the ubiquitous presence of field-effect transistors and the resultant ease of large-scale manufacturing and commercialization. At the same time, perhaps the most important computation required for applications such as machine learning is the dot-product operation. Emerging nonvolatile memristive technologies have been shown to be very efficient at computing analog dot products in an in situ fashion. The memristive analog computation of the dot product results in much faster operation than digital in-memory bitwise Boolean computations on vectors. However, challenges with respect to large-scale manufacturing, coupled with the limited endurance of memristors, have hindered rapid commercialization of memristive computing solutions. In this paper, we show that the standard eight-transistor (8T) digital SRAM array can be configured as an analog-like in-memory multibit dot-product engine (DPE). By applying appropriate analog voltages to the read ports of the 8T SRAM array and sensing the output current, an approximate analog-digital DPE can be implemented. We present two different configurations for enabling multibit dot-product computations in the 8T SRAM cell array, without modifying the standard bit-cell structure. We also demonstrate the robustness of the proposal in the presence of nonidealities such as line resistances and transistor threshold voltage variations. Since our proposal preserves the standard 8T SRAM array structure, it can be used as a storage element with standard read-write instructions and as an on-demand analog-like dot-product accelerator.
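The analog-like dot-product principle above can be sketched in one line of physics: if each read port contributes a current roughly proportional to its wordline voltage times the stored weight, the column current sums the products by Kirchhoff's current law. The sketch below assumes an idealized linear port (no line resistance or threshold-voltage variation); the unit transconductance value is illustrative, not a device parameter from the paper.

```python
# Idealized column of an SRAM-based dot-product engine: input voltages
# v_i on the read wordlines, stored multibit weights w_i, and a column
# current I = g_unit * sum(v_i * w_i) summed on the shared bitline.
def dot_product_column(voltages, weights, g_unit=1e-6):
    # g_unit: illustrative unit transconductance (A/V)
    return sum(v * w * g_unit for v, w in zip(voltages, weights))
```

For instance, voltages [1.0, 0.5] against weights [2, 4] give a column current of 4 µA, i.e. the analog readout of the dot product 1.0·2 + 0.5·4 = 4.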
Spiking neural networks (SNNs) with a large number of weights and a varied weight distribution can be difficult to implement in emerging in-memory computing hardware due to the limitations on crossbar size (for implementing the dot product), the constrained number of conductance states in non-CMOS devices, and the power budget. We present a sparse SNN topology in which noncritical connections are pruned to reduce the network size, and the remaining critical synapses are weight-quantized to accommodate the limited conductance states. Pruning is based on the power-law weight-dependent spike-timing-dependent plasticity (STDP) model: synapses between pre- and post-neurons with high spike correlation are retained, whereas synapses with low correlation or uncorrelated spiking activity are pruned. The weights of the retained connections are quantized to the available number of conductance states. The process of pruning noncritical connections and quantizing the weights of critical synapses is performed at regular intervals during training. We evaluated our sparse and quantized network on the MNIST dataset and on a subset of images from the Caltech-101 dataset. The compressed topology achieved a classification accuracy of 90.1% (91.6%) on the MNIST (Caltech-101) dataset with 3.1× (2.2×) and 4× (2.6×) improvements in energy and area, respectively. The compressed topology is energy- and area-efficient while maintaining the same classification accuracy as a two-layer fully connected SNN topology.
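The periodic prune-and-quantize step described above can be sketched as follows. The sketch assumes the learned weight magnitude serves as the proxy for spike correlation under the power-law STDP model (small weight = low correlation = pruned), and that survivors snap to uniformly spaced conductance levels; the threshold and level count are illustrative, not the paper's settings.

```python
# Prune low-correlation synapses, then quantize the survivors to the
# available number of conductance states (uniform levels in [0, w_max]).
def prune_and_quantize(weights, prune_thresh=0.1, n_states=5, w_max=1.0):
    out = []
    step = w_max / (n_states - 1)      # spacing between conductance levels
    for w in weights:
        if abs(w) < prune_thresh:      # noncritical synapse: prune
            out.append(0.0)
        else:                          # critical synapse: quantize
            out.append(round(w / step) * step)
    return out
```

Running this at regular intervals during training, as the abstract describes, lets the remaining synapses adapt around the pruned and quantized values rather than compressing only once after training.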
In this letter, we propose a one-transistor ferroelectric NOR-type (Fe-NOR) non-volatile memory based on HfZrOx ferroelectric FETs (FeFETs). The enhanced drain-channel coupling in ultra-short-channel FeFETs is utilized to dynamically modulate the memory window of the storage cells, thereby enabling simple erase, program, and read operations. Our simulation analysis predicts sub-1-V program/erase voltages in the proposed Fe-NOR memory array, which therefore presents a significantly lower-power alternative to conventional FeRAM and NOR flash memories.
Over the past few years, spiking neural networks (SNNs) have become popular as a possible pathway to enable low-power event-driven neuromorphic hardware. However, their application in machine learning has largely been limited to very shallow neural network architectures for simple problems. In this paper, we propose a novel algorithmic technique for generating an SNN with a deep architecture and demonstrate its effectiveness on complex visual recognition problems such as CIFAR-10 and ImageNet. Our technique applies to both VGG and Residual network architectures, with significantly better accuracy than the state of the art. Finally, we present an analysis of the sparse event-driven computations to demonstrate the reduced hardware overhead of operating in the spiking domain.
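The abstract does not spell out the algorithmic technique, so as a purely illustrative aside, one widely used family of approaches for obtaining deep SNNs replaces the ReLU units of a trained ANN with integrate-and-fire neurons whose firing rate approximates the original activation. The sketch below shows that rate approximation for a single layer; it should not be read as the authors' method, and all names and parameters are hypothetical.

```python
# One layer of rate-coded integrate-and-fire neurons: weighted input
# rates accumulate as membrane potential; crossing the threshold emits
# a spike and resets by subtraction, so the output spike rate over many
# timesteps approximates ReLU(W x) for inputs in [0, 1].
def if_layer_rates(inputs, weights, threshold=1.0, timesteps=100):
    n_out = len(weights)               # weights[j][i]: input i -> neuron j
    v = [0.0] * n_out                  # membrane potentials
    spikes = [0] * n_out               # spike counts per neuron
    for _ in range(timesteps):
        for j in range(n_out):
            v[j] += sum(w * x for w, x in zip(weights[j], inputs))
            if v[j] >= threshold:      # fire and reset by subtraction
                spikes[j] += 1
                v[j] -= threshold
    return [s / timesteps for s in spikes]
```

With inputs [0.4, 0.4] and a single neuron with weights [0.5, 0.5], the output rate converges to about 0.4, matching ReLU(0.5·0.4 + 0.5·0.4); the reset-by-subtraction keeps the residual potential, which is what makes the rate approximation accurate over long time windows.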