Abstract
The impact of a series resistor (R_S) on the variability and endurance performance of a memristor was studied in the TaO_x memristive system. A dynamic voltage divider between R_S and the memristor during both the set and the reset switching cycles can suppress the inherent irregularity of the voltage dropped on the memristor, resulting in greatly reduced switching variability. By selecting the proper resistance value of R_S for the set and reset cycles respectively, we observed dramatically improved endurance of the TaO_x memristor. Such a voltage divider effect can thus be critical for memristor applications that require low variability, high endurance and fast speed.
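The self-limiting behavior of the voltage divider can be sketched numerically. This is an illustrative model only; the resistance and voltage values below are assumptions, not the paper's measured parameters.

```python
# Sketch of the series-resistor voltage divider. As the memristor
# resistance R_mem drops during a set transition, a growing share of the
# applied voltage shifts onto R_S, which self-limits the voltage actually
# seen by the device and damps fluctuations in the switching process.

def memristor_voltage(v_applied, r_mem, r_s):
    """Voltage dropped on the memristor in series with R_S."""
    return v_applied * r_mem / (r_mem + r_s)

v = 1.0       # applied voltage (V), assumed for illustration
r_s = 2e3     # series resistor (ohms), assumed for illustration
for r_mem in (20e3, 10e3, 2e3, 500.0):   # resistance falling during set
    v_mem = memristor_voltage(v, r_mem, r_s)
    print(f"R_mem = {r_mem:8.0f} ohm -> V_mem = {v_mem:.3f} V")
```

As the printout shows, the device voltage falls monotonically as the memristor switches to lower resistance, which is the negative-feedback mechanism behind the reduced variability.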
Memristor-based synaptic networks have been widely investigated and applied to neuromorphic computing systems for their fast computation and low design cost. As memristors continue to mature and achieve higher density, bit failures within crossbar arrays can become a critical issue. These can degrade the computation accuracy significantly. In this work, we propose a defect rescuing design to restore the computation accuracy. In our proposed design, significant weights in a specified network are first identified, and retraining and remapping algorithms are described. For a two-layer neural network with 92.64% classification accuracy on MNIST digit recognition, our evaluation based on real device testing shows that our design can recover almost its full performance when 20% random defects are present.
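The core idea of identifying significant weights and remapping them away from stuck cells can be sketched as follows. All names, the toy weight matrix, the defect rate, and the greedy column-assignment heuristic are illustrative assumptions; the paper's actual identification, retraining, and remapping algorithms are more involved.

```python
import numpy as np

# Toy model of defect-aware weight mapping: weights with the largest
# magnitude are treated as "significant", and logical weight columns are
# greedily assigned to the physical crossbar columns whose stuck cells
# overlap least with those significant entries.

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))              # toy weight matrix (assumed)
defects = rng.random(w.shape) < 0.2      # ~20% random stuck cells (assumed)

# Identify significant weights: top 25% by magnitude (assumed threshold).
thresh = np.quantile(np.abs(w), 0.75)
significant = np.abs(w) >= thresh

# With the naive identity mapping, significant weights landing on defect
# cells are corrupted.
lost_before = int(np.sum(significant & defects))

# Greedy remapping: pick, for each logical column, the free physical
# column with the least significant-weight/defect overlap.
remaining = list(range(w.shape[1]))
assign = {}
for j in range(w.shape[1]):
    best = min(remaining,
               key=lambda p: np.sum(significant[:, j] & defects[:, p]))
    assign[j] = best
    remaining.remove(best)

lost_after = int(sum(np.sum(significant[:, j] & defects[:, assign[j]])
                     for j in range(w.shape[1])))
print("significant weights on defects:", lost_before, "->", lost_after)
```

In the full design, retraining would then compensate for the remaining corrupted weights; this sketch only illustrates the remapping half of the idea.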
Metal-insulator-metal (MIM) structures based on titanium dioxide have demonstrated reversible and non-volatile resistance-switching behavior and have been identified with the concept of the memristor. Microphysical studies suggest that the development of sub-oxide phases in the material drives the resistance changes. The creation of these phases, however, has a number of negative effects, such as requiring an elevated voltage, increasing the device-to-device variability, damaging the electrodes due to oxygen evolution, and ultimately limiting the device lifetime. In this work we show that the deliberate inclusion of a sub-oxide layer in the MIM structure maintains the favorable switching properties of the device, while eliminating many of the negative effects. Electrical and microphysical characterization of the resulting structures was performed, utilizing X-ray and electron spectroscopy and microscopy. In contrast to structures which are not engineered with a sub-oxide layer, we observed dramatically reduced microphysical changes after electrical operation.
In-memory computing (IMC) is attracting interest for accelerating data-intensive computing tasks, such as artificial intelligence (AI), machine learning (ML), and scientific computing. IMC is typically conducted in the analog domain in crosspoint arrays of resistive random access memory (RRAM) devices or memristors. However, the precision of analog operations can be hindered by various sources of noise, such as the nonlinearity of the circuit components and the programming variations due to stuck devices and stochastic switching. Here we demonstrate high-precision IMC by a custom program-verify algorithm that uses redundancy to limit the impact of stuck devices and analog slicing to encode the analog programming error in a separate memory cell. The PageRank problem, which consists of calculating the principal eigenvector of the link matrix, is used as a reference problem, adopting a fully integrated RRAM circuit. We extend these results to also include a convolutional neural network (CNN). We demonstrate a computing accuracy of 6.7 equivalent number of bits (ENOB). Finally, we compare our results to the solution of the same problem by a static random access memory (SRAM)-based IMC, showcasing an advantage for the RRAM implementation in terms of energy efficiency and scaling.
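The PageRank reference problem reduces to finding the principal eigenvector of a column-stochastic link matrix, which is exactly the fixed point the analog circuit converges to. A software power-iteration sketch on a toy four-page graph (the link data and damping factor are illustrative assumptions):

```python
import numpy as np

# links[i, j] = 1 if page j links to page i (toy graph, assumed).
links = np.array([[0, 0, 1, 0],
                  [1, 0, 0, 0],
                  [1, 1, 0, 1],
                  [0, 1, 0, 0]], dtype=float)

M = links / links.sum(axis=0)      # column-stochastic transition matrix
d = 0.85                           # damping factor (common choice)
n = M.shape[0]
G = d * M + (1 - d) / n            # damped "Google" matrix

r = np.full(n, 1.0 / n)            # uniform initial rank vector
for _ in range(100):               # power iteration toward the
    r = G @ r                      # principal eigenvector
r /= r.sum()
print(np.round(r, 3))
```

The converged vector `r` satisfies `G @ r ≈ r`, i.e. it is the principal (eigenvalue-1) eigenvector that ranks the pages.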
Vector-matrix multiplication dominates the computation time and energy for many workloads, particularly neural network algorithms and linear transforms (e.g., the Discrete Fourier Transform). Utilizing the natural current accumulation feature of memristor crossbars, we developed the Dot-Product Engine (DPE) as a high-density, high-power-efficiency accelerator for approximate matrix-vector multiplication. We first developed a conversion algorithm to map arbitrary matrix values appropriately to memristor conductances in a realistic crossbar array, accounting for device physics and circuit issues to reduce computational errors. Accurate device resistance programming in large arrays is enabled by closed-loop pulse tuning and access transistors. To validate our approach, we simulated and benchmarked one of the state-of-the-art neural networks for pattern recognition on the DPEs. The results show no accuracy degradation compared to a software approach (99% pattern recognition accuracy on the MNIST data set) with only a 4-bit DAC/ADC requirement, while the DPE can achieve a speed-efficiency product of 1,000× to 10,000× compared to a custom digital ASIC.
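The crossbar principle behind the DPE can be sketched in a few lines: matrix entries are mapped linearly onto a programmable conductance window, the array computes column currents by Ohm's and Kirchhoff's laws, and the result is rescaled back. The conductance window and the simple linear map below are illustrative assumptions, not the paper's conversion algorithm.

```python
import numpy as np

g_min, g_max = 1e-6, 1e-4   # assumed programmable conductance range (S)

def to_conductance(a):
    """Linear map of a non-negative matrix onto [g_min, g_max]."""
    a = np.asarray(a, dtype=float)
    scale = (g_max - g_min) / a.max()
    return g_min + scale * a, scale

a = np.array([[1.0, 2.0],       # toy matrix to be stored in the crossbar
              [3.0, 4.0]])
v = np.array([0.5, 0.25])       # toy input voltage vector

g, scale = to_conductance(a)
i = v @ g                       # column currents: I_j = sum_i V_i * G_ij
result = (i - g_min * v.sum()) / scale   # undo the offset and scale
print(result)                   # recovers v @ a
```

Because the map is affine, the offset term `g_min * sum(V)` can be subtracted exactly, so in this idealized (noise-free, zero-wire-resistance) model the recovered product matches `v @ a`.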
The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training: both digital and hybrid digital-analog using resistive RAM (ReRAM) crossbars. ReRAM-based accelerators have demonstrated the effectiveness of ReRAM crossbars at performing matrix-vector multiplication operations that are prevalent in training. However, they still suffer from inefficiency due to the use of serial reads and writes for performing the weight gradient and update step. A few works have demonstrated the possibility of performing outer products in crossbars, which can be used to realize the weight gradient and update step without the use of serial reads and writes. However, these works have been limited to low precision operations which are not sufficient for typical training workloads. Moreover, they have been confined to a limited set of training algorithms for fully-connected layers only. To address these limitations, we propose a bit-slicing technique for enhancing the precision of ReRAM-based outer products, which is substantially different from bit-slicing for matrix-vector multiplication only. We incorporate this technique into a crossbar architecture with three variants catered to different training algorithms. To evaluate our design on different types of layers in neural networks (fully-connected, convolutional, etc.) and training algorithms, we develop PANTHER, an ISA-programmable training accelerator with compiler support. Our design can also be integrated into other accelerators in the literature to enhance their efficiency.
Our evaluation shows that PANTHER achieves up to 8.02×, 54.21×, and 103× energy reductions as well as 7.16×, 4.02×, and 16× execution time reductions compared to digital accelerators, ReRAM-based accelerators, and GPUs, respectively.
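The general bit-slicing idea, splitting high-precision weights into low-precision slices stored in separate crossbars and recombining partial products by shift-and-add, can be sketched as follows. The 2-bit slice width, 8-bit weights, and toy data are illustrative assumptions; PANTHER's slicing for outer products adds further machinery not shown here.

```python
import numpy as np

BITS_PER_SLICE = 2          # assumed per-crossbar cell precision
N_SLICES = 4                # 4 slices x 2 bits = 8-bit weights

def slice_matrix(w):
    """Split unsigned integer weights into low-order-first 2-bit slices."""
    mask = 2**BITS_PER_SLICE - 1
    return [(w >> (s * BITS_PER_SLICE)) & mask for s in range(N_SLICES)]

rng = np.random.default_rng(1)
w = rng.integers(0, 256, size=(4, 4))   # toy 8-bit weight matrix
x = rng.integers(0, 16, size=4)         # toy input vector

# Each slice crossbar computes a low-precision partial product; the
# digital periphery recombines them with shift-and-add.
y = sum((x @ s) << (i * BITS_PER_SLICE)
        for i, s in enumerate(slice_matrix(w)))
print(y)                                # equals the full-precision x @ w
```

Because the slices are an exact positional decomposition of the weights, the shift-and-add recombination reproduces the full-precision product while each crossbar only ever holds 2-bit values.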
Memristors with tunable non-volatile resistance offer in-memory computing capability that avoids the von Neumann bottleneck. However, large-scale experimental demonstration to this end is yet to be implemented due to the immaturity of the device and integration technologies. Here we report our recent progress in analog computing using an analog-voltage-amplitude vector input and an analog memristor-conductance matrix, with applications in signal and image processing. The vector-matrix multiplication is processed in the memristor crossbars in one step, with 5-8 bit precision depending on the array size. The demonstration is made possible by high memristor yield (99.8%), stable multilevel memristance states, a linear current-voltage (I-V) relation in the operation range, and low wire resistance between the cells.