Resistive Random Access Memory (RRAM) is a new type of non-volatile memory based on the resistive memory device. Researchers are currently moving from resistive device development to memory circuit design and implementation, hoping to fabricate memory chips that can be deployed in the market in the near future. However, so far the low manufacturing yield is still a major issue. In this paper, we propose defect and fault models specific to RRAM, i.e., the Over-Forming (OF) defect and the Read-One-Disturb (R1D) fault. We then propose a March algorithm to cover these defects and faults in addition to the conventional RAM faults, which is called March C*. We also develop a novel squeeze-search scheme to identify the OF defect, which leads to the Stuck-At Fault (SAF). The proposed test algorithm is applied to a first-cut 4-Mb HfO₂-based RRAM test chip. Results show that OF defects and R1D faults do exist in the RRAM chip. We also identify specific failure patterns from the test results, which are shown to be induced by multiple short defects between bit-lines. By identifying the defects and faults, designers and process engineers can improve the RRAM yield in a more cost-effective way.
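As a rough illustration of how a March test exercises a memory, the sketch below runs the conventional March C- sequence (not the paper's March C*, whose elements are not given in the abstract) against a toy bit-addressable memory. The `FaultyMemory` class, its single stuck-at cell, and the `run_march` helper are hypothetical names introduced only for this example.

```python
from typing import List, Optional, Tuple

class FaultyMemory:
    """Toy bit-addressable memory with an optional stuck-at cell (hypothetical model)."""

    def __init__(self, size: int, stuck_at: Optional[Tuple[int, int]] = None):
        self.cells = [0] * size
        self.stuck_at = stuck_at  # (address, forced value) or None
        if stuck_at is not None:
            self.cells[stuck_at[0]] = stuck_at[1]

    def write(self, addr: int, value: int) -> None:
        # A stuck-at cell ignores the written value.
        if self.stuck_at is not None and self.stuck_at[0] == addr:
            value = self.stuck_at[1]
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        return self.cells[addr]

# Conventional March C-: each element is (address order, [(operation, value), ...]).
MARCH_C_MINUS = [
    ("any",  [("w", 0)]),
    ("up",   [("r", 0), ("w", 1)]),
    ("up",   [("r", 1), ("w", 0)]),
    ("down", [("r", 0), ("w", 1)]),
    ("down", [("r", 1), ("w", 0)]),
    ("any",  [("r", 0)]),
]

def run_march(mem: FaultyMemory, size: int, elements=MARCH_C_MINUS) -> List[int]:
    """Apply every march element and return the sorted addresses whose reads mismatched."""
    failing = set()
    for order, ops in elements:
        addrs = range(size - 1, -1, -1) if order == "down" else range(size)
        for addr in addrs:
            for op, value in ops:
                if op == "w":
                    mem.write(addr, value)
                elif mem.read(addr) != value:
                    failing.add(addr)
    return sorted(failing)
```

Calling `run_march(FaultyMemory(16, stuck_at=(5, 1)), 16)` reports address 5 as failing. A fault such as R1D, where the read itself disturbs the cell, would need a different memory model and, presumably, extra read operations in the march elements.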
Previous SRAM-based computing-in-memory (SRAM-CIM) macros suffer from small read margins for high-precision operations, large cell array area overhead, and limited compatibility with many input and weight configurations. This work presents a 1-to-8-bit configurable SRAM CIM unit-macro using: 1) a hybrid structure combining 6T-SRAM based in-memory binary product-sum (PS) operations with digital near-memory-computing multibit PS accumulation to increase read accuracy and reduce area overhead; 2) column-based place-value-grouped weight mapping and a serial-bit input (SBIN) mapping scheme to facilitate reconfiguration and increase array efficiency under various input and weight configurations; 3) a self-reference multilevel reader (SRMLR) to reduce read-out energy and achieve a sensing margin 2× that of the mid-point reference scheme; and 4) an input-aware bitline voltage compensation scheme to ensure successful read operations across various input-weight patterns. A 4-Kb configurable 6T-SRAM CIM unit-macro was fabricated using a 55-nm CMOS process with foundry 6T-SRAM cells. The resulting macro achieved access times of 3.5 ns per cycle (pipeline) and energy efficiency of 0.6-40.2 TOPS/W under binary to 8-b input/8-b weight precision.
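The idea of combining binary in-memory product-sums with digital multibit accumulation can be sketched arithmetically as follows. This is a minimal model assuming unsigned inputs and weights; the function name `bit_serial_mac` and its parameters are hypothetical, and the macro's actual mapping, sign handling, and circuit behavior are not described in the abstract.

```python
from typing import Sequence

def bit_serial_mac(inputs: Sequence[int], weights: Sequence[int],
                   in_bits: int, w_bits: int) -> int:
    """Multibit dot product built from binary product-sums (unsigned values only).

    For every pair (input bit i, weight bit j), a binary product-sum counts the
    positions where both bits are 1; that is the part an array could compute.
    The digital side then shifts each count by its place value and accumulates.
    """
    total = 0
    for i in range(in_bits):          # input bits fed serially, LSB first
        for j in range(w_bits):       # weight bits grouped by place value
            ps = sum(((x >> i) & 1) & ((w >> j) & 1)
                     for x, w in zip(inputs, weights))
            total += ps << (i + j)    # place-value weighting and accumulation
    return total
```

For unsigned operands that fit in the stated bit widths, the result equals the ordinary dot product `sum(x * w for x, w in zip(inputs, weights))`.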
Many mobile SoC chips employ a "two-macro" approach including volatile and nonvolatile memory macros (i.e., SRAM and Flash), to achieve high-performance or low-voltage power-on operation with the capability of power-off nonvolatile data storage. However, the two-macro approach suffers from slow store/restore speeds due to word-by-word serial transfer of data between the volatile and nonvolatile memories. Slow store/restore speeds require long power-on/off time and leave the device vulnerable to sudden power failure. This study proposes a resistive memory (memristor) based nonvolatile SRAM (or memristor latch) cell to achieve fast bit-to-bit parallel store/restore operations, low store/restore energy consumption, and a compact cell area. This resistive nonvolatile 8T2R (Rnv8T) cell includes two fast-write memristor (RRAM) devices vertically stacked over the 8T, and a novel 2T memristor-switch, which provides both memristor control and SRAM write-assist functions. The write-assist feature enables the Rnv8T cell to use read-favored transistor sizing to prevent read/write failure at lower VDDs. We also fabricated the first macro-level memristor-based (or RRAM-based) nonvolatile SRAM. This 16 Kb Rnv8T macro achieved the lowest store energy and R/W VDDmin (0.45 V) of any nonvolatile SRAM or two-macro solution.
Existing nonvolatile ternary content-addressable-memory (nvTCAM) suffers from limited word-length (WDL), large write-energy (E_W) and search-energy (E_S), and large cell area (A). This paper develops a 3T1R nvTCAM cell using a single multiple-level cell (MLC)-resistive RAM (ReRAM) device to achieve long WDL, lower E_W and E_S, and reduced cell area. Two peripheral control schemes were developed, dual-replica-row self-timed and invalid-entry power consumption suppression (IEPCS), for the suppression of dc current in 3T1R nvTCAM cells in order to reduce E_S. Two versions of the IEPCS scheme were developed (basic and charge-recycle-controlled) to alter the tradeoff between area overhead and power consumption in the updating of invalid-bits. A 128 b × 64 b 3T1R nvTCAM macro was fabricated using back-end-of-line ReRAM in a 90-nm CMOS process. The fabricated MLC-based 3T1R nvTCAM macro achieved sub-1-ns search-delay and sub-6-ns wake-up time with supply voltage of 1 V and WDL = 64 b.
ReRAM is a promising next-generation nonvolatile memory (NVM) with fast write speed and low-power operation. However, ReRAM faces two major challenges in read operations: 1) low read yield due to wide resistance distribution and 2) the requirement of accurate bit line (BL) bias voltage control to prevent read disturbance. This study proposes two process-variation-tolerant schemes for current-mode read operation of ReRAM: parallel-series reference-cell (PSRC) and process-temperature-aware dynamic BL-bias (PTADB) schemes. These schemes are meant to improve the read speed and yield of ReRAM, while taking read disturbance into consideration. PSRC narrows the reference current distribution to achieve high read yield against resistance variation. PTADB achieves small fluctuations in BL bias voltage to prevent read disturbance, while providing rapid BL precharge speeds. This study fabricated a 4-Mb ReRAM macro to confirm the effectiveness of the proposed schemes for both SLC and MLC operations. The fastest sub-8-ns (7.2 ns) read-write random access time among megabit scaled embedded NVM macros has been demonstrated.
This paper outlines the RC-filtered stress-decoupled (RCSD) 4T2R nonvolatile TCAM (nvTCAM) with the following benefits: 1) reduced NVM-stress; 2) reduced ML parasitic load; and 3) suppression of match-line (ML) leakage current from match cells. The RCSD-4T2R cell achieves a 6× reduction in NVM-stress, a 2× increase in maximum wordlength, and a 2× reduction in search delay. In this paper, we also outline two search schemes, referred to as dynamic source-line pulse controlled (DSL-PC) search and dataline-pulse controlled (DL-PC) search, which were developed specifically for the RCSD-4T2R nvTCAM. We fabricated a 128 × 32 b RCSD-4T2R nvTCAM macro with HfO ReRAM using a 180 nm CMOS process. Using the DSL-PC and DL-PC schemes, the measured search delay of the RCSD-4T2R nvTCAM macro was 1.2 ns under typical VDD.
Magneto-static stray field (H_stray) interactions become an important issue when perpendicular CoFeB/MgO magnetic tunnel junctions (MTJs) are miniaturized. This raises the issue of which of the two mainstream etching processes, the pillar structure and the step structure, is better able to retain MTJ performance at extremely small scales. In the current study, we first simulated H_stray effects as a function of Ruderman–Kittel–Kasuya–Yosida strength within a synthetic antiferromagnetic structure for the two structures. Our results revealed that H_stray interactions were less influential (in terms of offset field) in step MTJs than in pillar MTJs during MTJ miniaturization. This is in good agreement with experimental results. This finding is further supported by adding Dzyaloshinskii–Moriya interactions into the free-layer of the two structures. We further simulated thermal stability with the inclusion of H_stray for 30 nm MTJs. We found that adding etching damage effects (i.e. assuming both anisotropy constant and saturation magnetization of the free layer had some degree of loss) into the model of the pillar MTJ was necessary to obtain a trend that is close to the experimental results of thermal stability. This information can provide some guidance on the technical choices for the MTJ miniaturization.
Computing-in-memory (CIM) based on SRAM is a promising approach to achieving energy-efficient multiply-and-accumulate (MAC) operations in artificial intelligence (AI) edge devices; however, existing SRAM-CIM chips support only DNN inference. The flow of training data requires that CIM arrays perform convolutional computation using transposed weight matrices. This article presents a two-way transpose (TWT) multiply cell with high resistance to process variation and a novel read scheme that uses input-aware zone prediction of maximum partial MAC values to enhance the signal margin for robust readout. A 28-nm 64-kb TWT CIM macro fabricated using foundry-provided compact 6T-SRAM cells achieved T_AC of 3.8-21 ns and energy efficiency of 7-61.1 TOPS/W in performing MAC operations using 2-8-b inputs, 4-8-b weights, and 10-20-b outputs.
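To see why training needs transposed access to the same stored weights, consider a fully connected layer (a simplification of the convolutional case in the abstract). The forward pass computes y = W x along the rows of W, while backpropagation computes W^T g along its columns. The sketch below reads one stored matrix both ways without ever building a transposed copy; the function names are hypothetical and say nothing about how the TWT cell itself is implemented.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[int]]

def mac_forward(W: Matrix, x: Sequence[int]) -> List[int]:
    """Forward pass y = W x: one MAC per stored row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mac_transposed(W: Matrix, g: Sequence[int]) -> List[int]:
    """Backward pass W^T g: one MAC per stored column, no transposed copy made."""
    rows, cols = len(W), len(W[0])
    return [sum(W[r][c] * g[r] for r in range(rows)) for c in range(cols)]
```

With `W = [[1, 2, 3], [4, 5, 6]]`, the forward call takes a length-3 input and returns two outputs, while the transposed call takes a length-2 gradient and returns three values, both operating on the same stored array.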
This article presents a novel static random access memory computing-in-memory (SRAM-CIM) structure designed for high-precision multiply-and-accumulate (MAC) operations with high energy efficiency (EF), high readout accuracy, and short compute latency. The proposed device employs 1) a time-domain incremental-accumulation (TDIA) scheme to enable high-accumulation MAC operations while maintaining a large signal margin across MAC values (MACVs), 2) a dynamic differential-reference (D2REF) scheme based on software-hardware co-design to reduce read energy consumption, and 3) a low-dMACV-aware recursive time-to-digital converter (LMAR-TDC) for implementation with the D2REF scheme to further suppress readout energy consumption. A 28 nm 1 Mb SRAM-CIM macro fabricated using foundry-provided compact 6T-SRAM cells achieved EF of 39.31 TOPS/W and compute latency of 6.6 ns for 8b-MAC operations with 64 accumulations per cycle and near-full output precision (22b).
This work reports the complete framework from device to architecture for deep learning acceleration in an all-spin artificial neural network (ANN) built by highly manufacturable STT-MRAM technology. The most compact analog integrate-and-fire neuron reported to date is developed based on the back-hopping oscillation in magnetic tunnel junctions. This novel device is unique because it performs numerous essential neural functions simultaneously, including current integration, voltage spike generation, state reset, and 4-bit precision. The device itself is also a stochastic binary synapse, and thus eases the implementation of the compact all-spin ANN with high accuracy for online training.