Low-power 3D-ICs are well suited to mobile systems; however, they pose a number of challenges associated with thermal stress, particularly in designs with many stacked layers. Using a low supply voltage (VDD) and a power-down mode helps reduce the power consumption of 3D-ICs while alleviating aging and thermal effects. These solutions require low-voltage memory and power-down circuitry. Memristor-based logic provides good state retention and restore for power-down operation, and resistive RAM (ReRAM) uses a lower write voltage than conventional Flash memory. This paper reviews design challenges associated with low-voltage SRAM, memristor logic, and ReRAM. We also propose a novel scheme involving homogeneous memory with heterogeneous VDD (HMHV) to further reduce the power consumption of 3D-ICs comprising multiple memory layers.
Resistive random access memory (RRAM) is a promising new non-volatile memory technology capable of operating at low power as well as high speed. Although RRAM is capable of lower energy consumption and substantially more cycles than Flash memory, comprehending and maintaining its ability to store data under stressed conditions remains the key challenge for mainstream acceptance. This is due in large part to the filamentary nature of the RRAM element at the nanoscale. A filament-based resistive memory relies on the formation of current-conducting paths (filaments) from defects, e.g., oxygen vacancies. The defects often lead to trap-limited current conduction. Without proper process control or RESET algorithms, unwanted defects may be added near the filaments under device stress, further aggravating the resistance instabilities.
This paper proposes a write resistance tracking circuit (WRTC) to improve the memory window of HfOx-based resistive memory. With a 50-ns single voltage pulse, the minimal resistance of the high resistance state in the 1-kb array of resistive switching elements can increase from 25 kΩ to 65 kΩ by using the proposed verify circuit. The WRTC uses the transition current detection method based on the feedback of the memory cell to control the write driver. The WRTC achieves distinct bistable resistance states, avoids the occurrence of over-RESET, and enhances the memory window of the RRAM cell.
This brief presents an event-driven keyword spotting (KWS) system for reducing the significant but usually ignored energy dissipation of the "always-on" A/D converter and microphone. "Low energy per inference" and "fast responsiveness" are new design goals of such a KWS engine. A 7-layer 1-dimensional binarized convolutional neural network (1D-BCNN) was designed to achieve 95% inference accuracy for detecting 10 keywords, plus silence and unknown, from raw speech, and 64 32-element signed binary inner product units were allocated in the engine to deliver a maximum throughput of 4,096 operations/cycle. The 16 nm implementation consumes only 0.1 mm² of silicon area and 5 μJ/inference of energy (including memory accesses), while achieving a 1.72 ms response time. The performance is comparable to state-of-the-art KWS designs without sacrificing the number of detectable keywords or inference accuracy.
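The signed binary inner product and the 4,096 operations/cycle figure can be illustrated with a minimal Python sketch. It assumes the common XNOR-popcount formulation for ±1 vectors and assumes each multiply-accumulate counts as two operations; neither detail is stated in the abstract, and the helper names `pack_bits` and `binary_inner_product` are hypothetical.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (bit = 1 encodes +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_inner_product(a_word, w_word, n=32):
    """Signed inner product of two n-element +1/-1 vectors via XNOR and popcount."""
    mask = (1 << n) - 1
    # XNOR marks the positions where both operands carry the same sign.
    agree = bin(~(a_word ^ w_word) & mask).count("1")
    # Each agreement contributes +1 and each disagreement contributes -1.
    return 2 * agree - n


# Throughput check, assuming one multiply plus one accumulate per element.
UNITS = 64            # inner product units (from the abstract)
ELEMENTS = 32         # elements per unit (from the abstract)
OPS_PER_ELEMENT = 2   # assumption: multiply + accumulate
peak_ops_per_cycle = UNITS * ELEMENTS * OPS_PER_ELEMENT

if __name__ == "__main__":
    a = [1, -1, 1, 1]
    w = [1, 1, -1, 1]
    print(binary_inner_product(pack_bits(a), pack_bits(w), n=4))
    print(peak_ops_per_cycle)
```

Under these assumptions, 64 units × 32 elements × 2 operations reproduces the quoted peak of 4,096 operations per cycle.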
Many big-data (BD) processors reduce power consumption by employing ternary content-addressable memory (TCAM) [1-2] with pre-stored signature patterns as filters to reduce the amount of data sent for processing in the following stage (i.e., wireless transmission). To further reduce standby power, BD-processors commonly use nonvolatile memory (NVM) to back up the signature patterns of SRAM-based TCAM (sTCAM) [3] during power interruptions or frequent-off operations. However, this 2-macro (sTCAM + NVM) scheme suffers long delays and requires considerable energy for wake-up operations, due to the word-by-word serial transfer of data between NVM and TCAM macros. Most of the signature patterns are seldom updated (written); therefore, single-macro nonvolatile TCAM (nvTCAM) can be used for BD-processors to reduce area and facilitate fast/low-power wake-up operations, compared to the 2-macro approach. Previous nvTCAMs were designed using diode-connected 4T2R with STT-MTJ (D4T2R) [4], 2T2R with PCM [5], and 4T2R with ReRAM [2]; however, they suffer from the following issues: (1) large cell area (A) and high write energy (E_w) due to the use of two NVM (2R) devices; (2) limited word-length (WDL, k bits) caused by the small current-ratio (I-ratio = I_ML-MIS/(k × I_ML)) between the match-line (ML) mismatch current (I_ML-MIS) and the ML leakage current of k matched cells (k × I_ML); (3) long search delays (T_SD) and excessive search energy (E_s) due to the large ML parasitic load (C_ML) and small I-ratio. ReRAM is promising for nvTCAM due to its low E_w, high resistance-ratio (R-ratio), and multiple-level cell (MLC) capability. To overcome issues (1) to (3), this study develops an MLC-based 3T1R nvTCAM with bi-directional voltage-divider control (BVDC). A 2×64×64b 3T1R nvTCAM macro is fabricated using back-end-of-line (BEOL) ReRAM [6] and a 90 nm CMOS process, with a 2.27× cell size reduction compared with sTCAM in the same technology and a T_SD of 0.96 ns for WDL = 64b.
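To see why a small I-ratio limits the word length, the definition I-ratio = I_ML-MIS/(k × I_ML) can be evaluated for a few word lengths. This is a minimal sketch; the current values below are hypothetical placeholders chosen only for illustration and are not taken from the paper.

```python
def i_ratio(i_ml_mis, i_ml, k):
    """Ratio of one cell's mismatch current to the leakage of k matched cells."""
    return i_ml_mis / (k * i_ml)


# Hypothetical currents in amperes, for illustration only.
I_ML_MIS = 10e-6  # mismatch current of a single mismatched cell
I_ML = 10e-9      # leakage current of a single matched cell

if __name__ == "__main__":
    for k in (16, 32, 64, 128):
        print(f"WDL = {k:3d} bits -> I-ratio = {i_ratio(I_ML_MIS, I_ML, k):.2f}")
```

Because the matched-cell leakage grows linearly with k, the I-ratio halves each time the word length doubles, which narrows the sensing margin between a single-bit mismatch and a full match.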
The wake-up procedure, which demands a prolonged time and a high electric field, poses a significant obstacle to the aggressive scaling of the ferroelectric (FE) Hf0.5Zr0.5O2 (HZO) thickness. A comprehensive understanding of its origin is thus imperative. A new mechanism, referred to as interfacial-layer soft breakdown (IL-SBD), is proposed to elucidate the wake-up behavior in the ultrathin HZO capacitor. Compelling experimental evidence is presented to support the critical role of the IL and its SBD. A multi-domain FE wake-up model is developed that incorporates defect generation, trap-assisted tunneling within the IL, and charge screening at the IL/HZO interface. Remarkably, this model accurately reproduces the trend of thickness-dependent wake-up behavior, emphasizing the utmost significance of IL optimization in ultrathin HZO.
Computing-in-memory (CIM) is renowned in deep learning due to its high energy efficiency resulting from highly parallel computing with minimal data movement. However, current SRAM-based CIM designs suffer from long latency when loading weights or feature maps from DRAM for large AI models. Moreover, previous SRAM-based CIM architectures lack end-to-end model inference. To address these issues, this paper proposes CIMR-V, an end-to-end CIM accelerator with RISC-V that incorporates CIM layer fusion, a convolution/max pooling pipeline, and weight fusion, resulting in an 85.14% reduction in latency for the keyword spotting model. Furthermore, the proposed CIM-type instructions facilitate end-to-end AI model inference and full stack flow, effectively synergizing the high energy efficiency of CIM and the high programmability of RISC-V. Implemented using TSMC 28nm technology, the proposed design achieves an energy efficiency of 3707.84 TOPS/W and 26.21 TOPS at 50 MHz.
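The reported figures can be related to one another with a short back-of-the-envelope calculation. The sketch below assumes that the 26.21 TOPS throughput and the 3707.84 TOPS/W efficiency refer to the same operating point, which the abstract does not confirm, so the derived power is only indicative.

```python
# Reported figures from the abstract.
throughput_tops = 26.21          # tera-operations per second
efficiency_tops_per_w = 3707.84  # tera-operations per second per watt
clock_hz = 50e6                  # 50 MHz

# Implied power, assuming both figures describe the same operating point.
power_w = throughput_tops / efficiency_tops_per_w
power_mw = power_w * 1e3

# Implied number of operations completed per clock cycle.
ops_per_cycle = throughput_tops * 1e12 / clock_hz

if __name__ == "__main__":
    print(f"implied power: {power_mw:.2f} mW")
    print(f"implied operations per cycle: {ops_per_cycle:,.0f}")
```

Under that assumption, the design would draw roughly 7 mW and complete on the order of 5×10⁵ operations per cycle, which is consistent with the highly parallel nature of CIM arrays.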
This paper presents the HZO ferroelectric tunnel junction (FTJ) as a unique opportunity for in-memory computing. The device operates at an extremely low sub-nA current while simultaneously achieving 50-ns fast switching, >10⁷ cycling endurance, >10-yr retention, minimal variability, and analog state modulation. We analyze an FTJ-based deep binary neural network. It achieves better accuracy and remarkable 702, 101, and 7×10⁴ times improvements in power, area, and energy-area product efficiency, respectively, compared with those using NVMs with a typical μA cell current designed for fast memory access.
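As a rough consistency check on the three improvement factors, the energy-area figure can be compared against the product of the power and area factors. This sketch assumes the energy improvement equals the power improvement (that is, equal latency in both designs), which the abstract does not state.

```python
# Improvement factors reported in the abstract.
power_gain = 702
area_gain = 101
reported_energy_area_gain = 7e4

# Assumption: energy gain equals power gain (same latency in both designs).
estimated_energy_area_gain = power_gain * area_gain

# Relative gap between the estimate and the rounded figure in the abstract.
relative_gap = (
    abs(estimated_energy_area_gain - reported_energy_area_gain)
    / reported_energy_area_gain
)

if __name__ == "__main__":
    print(estimated_energy_area_gain)
    print(f"{relative_gap:.1%}")
```

The product, 70,902, lies within about 1.3% of the quoted 7×10⁴, so the three figures are mutually consistent under this assumption.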