Multivalued logic (MVL) computing could provide bit density beyond that of Boolean logic. Unlike conventional transistors, heterojunction transistors (H‐TRs) exhibit negative transconductance (NTC) regions. Using the NTC characteristics of H‐TRs, ternary inverters have recently been demonstrated. However, they have shown incomplete inverter characteristics; the output voltage (VOUT) does not fully swing from VDD to GND. A new H‐TR device structure that consists of a dinaphtho[2,3‐b:2′,3′‐f]thieno[3,2‐b]thiophene (DNTT) layer stacked on a PTCDI‐C13 layer is presented. Due to the continuous DNTT layer from source to drain, the proposed device exhibits novel switching behavior: p‐type off / p‐type subthreshold region / NTC / p‐type on. As a result, it has a very high on/off current ratio (≈10^5) and exhibits NTC behavior. It is also demonstrated that an array of 36 of these H‐TRs has 100% yield, a uniform on/off current ratio, and uniform NTC characteristics. Furthermore, the proposed ternary inverter exhibits full VDD‐to‐GND swing of VOUT with three distinct logic states. The proposed transistors and inverters exhibit hysteresis‐free operation due to the use of a hydrophobic gate dielectric and encapsulating layers. Based on this, the transient operation of a ternary inverter circuit is demonstrated for the first time.
A new heterojunction transistor structure used to demonstrate a high‐performance ternary inverter circuit is described. As the proposed heterojunction transistors provide a high on/off current ratio, a low contact resistance, hysteresis‐free operation, and negative transconductance behavior, the ternary inverter using them exhibits full VDD‐to‐GND swing, three distinct logic states, and transient operation.
The importance of implementing an efficient convolutional neural network (CNN) is increasing. A weight-sharing spiking CNN inference system (WS-SCNN) employing efficient convolution layers (ECLs) is proposed and modeled to enable the compact convolutional processing of the spiking neural network (SNN) inference. The proposed ECL efficiently maps convolutional features between inputs and filter weights. The ECL does not replicate the synaptic filter array with respect to input sliding, which minimizes the number of synaptic devices required to implement hardware SNNs. A four-bit weight quantization capability of a fabricated charge-trap flash synaptic device is used to verify the accurate multiplication and summation of weights in the ECL. Moreover, a nine-layer WS-SCNN consisting of multiple ECLs is modeled, and the benefits of the WS-SCNN in terms of the area and energy are evaluated. Simulation results show that the WS-SCNN has 5.68 and 103.5 times higher energy and area efficiency than conventional SCNN systems, respectively.
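A minimal software sketch of the two ideas named in this abstract, under stated assumptions: one kernel is reused at every sliding window rather than replicated, and weights are restricted to 16 levels (four bits). The function names, the uniform quantizer, and the per-window replication model used for the baseline device count are illustrative assumptions, not the paper's hardware mapping.

```python
def quantize_uniform(w, bits=4, w_max=1.0):
    """Map a real weight to one of 2**bits evenly spaced levels in [-w_max, w_max]."""
    steps = 2 ** bits - 1                      # 15 steps between 16 levels
    clipped = max(-w_max, min(w_max, w))       # saturate out-of-range weights
    step = 2 * w_max / steps
    index = round((clipped + w_max) / step)    # integer level, 0..15
    return index * step - w_max


def conv2d_shared(image, kernel):
    """Valid 2-D cross-correlation that reuses one kernel at every window."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def synapse_counts(in_h, in_w, k):
    """Weights stored if the kernel is copied per window (assumed baseline) vs. shared."""
    windows = (in_h - k + 1) * (in_w - k + 1)
    return windows * k * k, k * k
```

For a single 3×3 kernel on a 28×28 input, `synapse_counts(28, 28, 3)` gives 6084 stored weights in the assumed replicated mapping against 9 when the kernel is shared.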
We introduce a new clocking approach for digital systems to achieve better resilience to process, voltage, and temperature (PVT) variations. The proposed scheme is based on elastic clock methodology that uses locally generated clocks and elastic handshaking control, thereby achieving efficient and fast adaptation to the variations. However, the elastic clock-based design still requires a significant amount of timing margin due to delay mismatch between the critical path and the replica path for local clock generation, thus reducing the advantages of the elastic clock. We propose a timing error correction scheme tailored to the elastic clock methodology to eliminate such an extra timing margin. We implement an encryption/decryption core in 28-nm CMOS technology for silicon verification. Measurement results show that the proposed scheme reduces energy consumption by 35% and achieves 3.86× higher performance over the margined baseline design.
We introduce an area/energy-efficient precision-scalable neural network accelerator architecture. Previous precision-scalable hardware accelerators have limitations such as the under-utilization of multipliers for low bit-width operations and the large area overhead to support various bit precisions. To mitigate these problems, we first propose a bitwise summation, which reduces the area overhead for the bit-width scaling. In addition, we present a channel-wise aligning scheme (CAS) to efficiently fetch inputs and weights from on-chip SRAM buffers and a channel-first and pixel-last tiling (CFPL) scheme to maximize the utilization of multipliers on various kernel sizes. A test chip was implemented in 28-nm CMOS technology, and the experimental results show that the throughput and energy efficiency of our chip are up to 7.7× and 1.64× higher than those of the state-of-the-art designs, respectively. Moreover, additional 1.5-3.4× throughput gains can be achieved using the CFPL method compared to the CAS.
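As background for the precision-scaling problem, the sketch below shows the generic decomposition that precision-scalable multipliers build on: a wide unsigned product assembled from narrow partial products that are shifted and summed. This is not the bitwise summation scheme proposed in the abstract; the function names and the 2-bit chunk width are assumptions chosen for illustration.

```python
def split_chunks(value, total_bits, chunk_bits):
    """Split an unsigned integer into little-endian chunks of chunk_bits each."""
    mask = (1 << chunk_bits) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, chunk_bits)]


def multiply_from_chunks(a, b, total_bits=8, chunk_bits=2):
    """Compute a * b for unsigned operands using only narrow multiplications."""
    a_chunks = split_chunks(a, total_bits, chunk_bits)
    b_chunks = split_chunks(b, total_bits, chunk_bits)
    acc = 0
    for i, a_part in enumerate(a_chunks):
        for j, b_part in enumerate(b_chunks):
            # Each narrow product is weighted by the positions of its chunks.
            acc += (a_part * b_part) << ((i + j) * chunk_bits)
    return acc
```

With 2-bit chunks, one 8-bit product uses sixteen 2-bit multiplications. The same sixteen narrow multipliers could instead serve sixteen independent 2-bit products, which is the utilization trade-off the abstract refers to.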
Binary neural networks (BNNs) greatly reduce the memory footprint and computational complexity, so they are gaining interest for various mobile applications. In BNNs, the first layer often accounts for the largest part of the entire computing time because the layer usually uses multi-bit multiplications. However, traditional hardware designed for BNN computing focuses primarily on the remaining layers, resulting in significant performance degradation. In this brief, we introduce the Binaryware architecture, which achieves high-performance computation on both the first and the remaining layers. Experimental results show that our Binaryware improves the throughput per compute area by 1.5-13.3× on various BNN workloads.
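To illustrate why the first layer differs from the rest, the sketch below contrasts the standard XNOR-popcount dot product used for binarized layers with a first-layer dot product that still needs multi-bit inputs. The function names and the bit encoding (bit 1 for +1, bit 0 for -1) are assumptions for illustration and do not describe the Binaryware datapath.

```python
def pack_bits(signs):
    """Pack a list of +1/-1 values into an int, with bit i set when signs[i] is +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two n-element +1/-1 vectors via XNOR and popcount."""
    mask = (1 << n) - 1
    matches = bin(~(a_word ^ b_word) & mask).count("1")  # positions with equal signs
    return 2 * matches - n                               # matches minus mismatches


def first_layer_dot(inputs, signs):
    """First-layer dot product: multi-bit inputs against +1/-1 weights."""
    return sum(x * s for x, s in zip(inputs, signs))
```

The binarized layers reduce to one XNOR and one popcount per word, whereas `first_layer_dot` needs a signed multi-bit accumulation per input (for example, 8-bit pixel values).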
In this letter, we demonstrate a conductive-bridging RAM (CBRAM) with excellent multi-level cell (MLC) and linear conductance characteristics for an artificial synaptic device of neuromorphic systems. Our findings show that inherent characteristics of CBRAM can achieve the linear conductance and MLC characteristics as a product of an integer unit of the conductance. However, uncontrolled metal-ion injection into the switching layer results in a significant degradation of device uniformity, leading to degradation in the classification accuracy. Thus, we introduce a multi-layer CBRAM configuration (Cu/HfO2/Ta/Cu2S/W) to control the ionic motion in electrolytes. As a result of device engineering, highly improved classification accuracy is achieved on the CIFAR-10 data set.
Multilevel metal interconnects are crucial for the development of large-scale organic integrated circuits. In particular, three-dimensional integrated circuits require a large number of vertical interconnects between layers. Here, we present a novel multilevel metal interconnect scheme that involves solvent-free patterning of insulator layers to form an interconnecting area that ensures a reliable electrical connection between two metals in different layers. Using this highly reliable interconnect method, the highest stacked organic transistors to date, a three-dimensional organic integrated circuit consisting of 5 transistors and 20 metal layers, is successfully fabricated in a solvent-free manner. All transistors exhibit outstanding device characteristics, including a high on/off current ratio of ~10
, no hysteresis behavior, and excellent device-to-device uniformity. We also demonstrate two vertically-stacked complementary inverter circuits that use transistors on 4 different floors. All circuits show superb inverter characteristics with a 100% output voltage swing and gain up to 35 V per V.
Artificial neural network (ANN) computations based on graphics processing units (GPUs) consume high power. Resistive random-access memory (RRAM) has been gaining attention as a promising technology for implementing power-efficient ANNs, replacing GPUs. However, the nonlinear I-V characteristics of RRAM devices have been limiting their use for ANN implementation. In this letter, we propose a method and a circuit to address issues due to the nonlinear I-V characteristics. We demonstrate the feasibility of the method by simulating its application to multiple neural networks, from a multi-layer perceptron to a deep convolutional neural network, based on a typical RRAM model. Results from classifying datasets including ImageNet show that the proposed method produces much higher accuracy than the naive linear mapping for a wide range of nonlinearity.
We present a 9T1C SRAM cell-based capacitive computing-in-memory (CIM) circuit for neural network computation. The proposed design improves tolerance against process variation with a smaller cell area compared to previous capacitive SRAM CIM designs while inheriting the advantages of capacitive SRAM CIM hardware such as the linearity in multiply-accumulate (MAC) results and suppression of the static readout current. We also demonstrate a compact and low-power ADC for CIM readout, which improves the energy efficiency significantly. Finally, we demonstrate a programmable on-chip ADC reference voltage generator circuit for adjusting the ADC input range using bitcell replica arrays. The proposed circuit reduces the ADC bit-resolution requirement by considering the distribution of MAC results, and also helps to address the effect of the parasitic bitline capacitance. Measurement results show that a 128×128 macro fabricated in 28 nm CMOS achieves 1519.5 TOPS/W at 0.7 V.
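The link between ADC input range and bit resolution can be sketched with an idealized uniform ADC model. This is an assumed textbook model, not the measured circuit: the function names, the example range of 48 to 80, and the 4-bit resolution are hypothetical. It only shows that narrowing the reference range to where MAC results are assumed to concentrate shrinks the step size at a fixed bit count.

```python
def adc_lsb(ref_lo, ref_hi, bits):
    """Step size of an ideal uniform ADC spanning [ref_lo, ref_hi)."""
    return (ref_hi - ref_lo) / (1 << bits)


def adc_code(value, ref_lo, ref_hi, bits):
    """Output code of the ideal ADC, saturating outside the reference range."""
    lsb = adc_lsb(ref_lo, ref_hi, bits)
    code = int((value - ref_lo) // lsb)
    return max(0, min((1 << bits) - 1, code))
```

For a 128-row column, a 4-bit ADC spanning the full 0 to 128 range has a step of 8 MAC counts. The same 4 bits spanning the hypothetical 48 to 80 window have a step of 2, and results outside the window saturate.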
Ultra-low voltage operation of memory cells has become a topic of much interest due to its applications in very low energy computing and communications. However, due to parameter variations in scaled technologies, stable operation is critical for the success of low-voltage SRAMs. It has been shown that conventional 6T SRAMs fail to achieve reliable subthreshold operation. Hence, researchers have considered alternative SRAM configurations for subthreshold operation, using single-ended 8T or 10T bit-cells for improved stability. While these bit-cells significantly improve SRAM stability in the subthreshold region, the single-ended sensing methods suffer from reduced bit-line swing due to bit-line leakage noise. In addition, efficient bit-interleaving within a column may not be possible, and hence multiple-bit soft errors can be a real issue. In this paper, we propose a differential 10T bit-cell that effectively separates read and write operations, thereby achieving high cell stability. The proposed bit-cell also provides an efficient bit-interleaving structure to achieve soft-error tolerance with conventional Error Correcting Codes (ECC). For read access, we employ a dynamic DCVSL scheme to compensate for bit-line leakage noise, thereby improving bit-line swing. To verify the proposed techniques, a 32 kb array of the proposed 10T bit-cell is fabricated in 90 nm CMOS technology. The hardware measurement results demonstrate that this bit-cell array successfully operates down to 160 mV. For leakage power comparison, we also fabricated 49 kb arrays of the 6T and the proposed 10T bit-cells. Measurement results show that the leakage power of the proposed bit-cell is close to that of the 6T (between 0.96x and 1.22x of 6T).