DDR4 SDRAM introduced a new level in the DRAM organization hierarchy: the bank group (BG). The main purpose of BGs is to increase I/O bandwidth without widening the DRAM-internal bus. We found, however, that the new hierarchy offers other benefits as well. To obtain them, we propose a new DRAM architecture that uses the BG hierarchy to create BG-Level Parallelism (BGLP). Exploiting BGLP increases the overall parallelism of DRAM operations. We also argue that BGLP is feasible in the cost-sensitive DRAM industry, because the additional cost is negligible and only cost-insensitive area needs to be modified.
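To make the bank-group hierarchy concrete, the sketch below models the standard DDR4 column-to-column spacing it introduces: column commands to different bank groups need only tCCD_S between them, while commands to the same bank group must wait the longer tCCD_L. This illustrates the existing BG behavior the abstract starts from, not the proposed BGLP architecture; the tCCD_L value is an assumed example (it varies by speed bin).

```python
# Toy model of DDR4 column-command spacing across bank groups (BGs).
# Assumed example timings in clock cycles; real values come from the speed bin.
TCCD_S = 4  # min gap between column commands to different BGs (one BL8 burst)
TCCD_L = 6  # min gap between column commands to the same BG (example value)

def issue_times(bank_groups):
    """Earliest issue cycle for each READ/WRITE, given its target BG, in order."""
    times = []
    last_in_bg = {}  # BG index -> cycle of the last command sent to that BG
    for bg in bank_groups:
        t = 0
        if times:
            # Any two consecutive column commands need at least tCCD_S.
            t = times[-1] + TCCD_S
        if bg in last_in_bg:
            # Commands to the same BG also need tCCD_L since that BG's last one.
            t = max(t, last_in_bg[bg] + TCCD_L)
        times.append(t)
        last_in_bg[bg] = t
    return times
```

For example, `issue_times([0, 0, 0, 0])` yields `[0, 6, 12, 18]`, while interleaving two groups with `issue_times([0, 1, 0, 1])` yields `[0, 4, 8, 12]`, keeping the data bus busy every tCCD_S cycles.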
In-DRAM Data Initialization Seol, Hoseok; Shin, Wongyu; Jang, Jaemin ...
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Nov. 2017, Volume: 25, Issue: 11
Journal Article
Peer reviewed
Initializing memory with zeros is essential for safe memory management. However, initializing a large memory area slows the system down significantly. The most likely cause of this slowdown is the limited method available for initializing DRAM: at present, the only way to initialize a DRAM area is to issue multiple WRITE commands, and WRITE commands make initialization slow because of their small granularity and their occupancy of the data bus. In this brief, we propose an efficient in-DRAM initialization method inspired by the internal structure and operation of DRAM. The proposed method, called row reset, uses a DRAM row buffer to zero out an entire DRAM row at a time. Row reset allows multiple DRAM banks to be initialized in parallel without off-chip data transfer, reducing initialization time by up to 63 times. Row reset is also practical, because it can be implemented with existing DRAM circuitry and incurs no additional area overhead.
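A back-of-the-envelope model shows why the WRITE-based approach is bus-bound while row-granularity zeroing is not. This is a minimal sketch under stated assumptions: a 64-byte WRITE (BL8 on a 64-bit bus) occupying the data bus for 4 cycles, an 8 KB rank-level row, 16 banks working in parallel, and a hypothetical per-row-reset latency parameter. It ignores command-bus limits and timing constraints such as tRRD/tFAW.

```python
import math

# Assumed example geometry (not from the paper): one 64-bit DDR4 rank.
BYTES_PER_WRITE = 64     # BL8 burst on a 64-bit data bus
WRITE_BUS_CYCLES = 4     # a BL8 burst occupies the data bus for 4 clock cycles
ROW_BYTES = 8 * 1024     # rank-level row (page) size
NUM_BANKS = 16

def write_init_cycles(area_bytes):
    """Lower bound for zeroing with WRITEs: limited by data-bus occupancy."""
    writes = math.ceil(area_bytes / BYTES_PER_WRITE)
    return writes * WRITE_BUS_CYCLES

def row_reset_init_cycles(area_bytes, cycles_per_row_reset):
    """Row-granularity zeroing with banks in parallel, no data-bus transfer.

    cycles_per_row_reset is a hypothetical per-row latency parameter.
    """
    rows = math.ceil(area_bytes / ROW_BYTES)
    rounds = math.ceil(rows / NUM_BANKS)  # one row per bank per round
    return rounds * cycles_per_row_reset
```

For 1 MiB, the WRITE path needs 16,384 WRITEs (65,536 bus cycles), while the row-granularity path needs 128 row operations in 8 parallel rounds. The resulting ratio depends entirely on the assumed parameters and is not the measured figure from the brief.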
Rank-Level Parallelism in DRAM Shin, Wongyu; Jang, Jaemin; Choi, Jungwhan ...
IEEE Transactions on Computers, July 1, 2017, Volume: 66, Issue: 7
Journal Article
Peer reviewed
DRAM systems are organized hierarchically as Channel-Rank-Bank: a channel is connected to multiple ranks, and each rank has multiple banks. This hierarchical structure makes it possible to create parallelism in DRAM. The current DRAM architecture supports bank-level parallelism: as many rows as there are banks can be moved simultaneously at the bank level. However, rank-level parallelism is not supported. As a result, only one column can be accessed at a time, even though each rank has its own data bus that can carry a column. In other words, current DRAM operations do not exploit the structural opportunity created by multiple ranks. We therefore propose a novel DRAM architecture that supports rank-level parallelism, so that as many columns as there are ranks can be moved concurrently at the rank level. In this paper, we illustrate rank-level parallelism and its benefits for DRAM operations.
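The gain from rank-level parallelism can be sketched with a toy cycle count. This is a hypothetical model, assuming the requested columns are spread evenly across ranks and each column transfer takes a fixed number of burst cycles; it is not the paper's timing model.

```python
import math

def column_transfer_cycles(num_columns, num_ranks, burst_cycles, rank_parallel):
    """Cycles to move num_columns column bursts (toy model).

    Conventional: one column at a time for the whole channel.
    Rank-parallel (hypothetical): up to num_ranks columns move at once,
    one per rank, assuming columns are evenly spread across ranks.
    """
    if rank_parallel:
        return math.ceil(num_columns / num_ranks) * burst_cycles
    return num_columns * burst_cycles
```

With 8 columns, 2 ranks, and 4-cycle bursts, the conventional case takes 32 cycles and the rank-parallel case 16.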
It is widely known that relatively long DRAM latency is a bottleneck in computing systems. However, DRAM vendors are strongly reluctant to reduce DRAM latency because of the additional manufacturing cost. Our goal is therefore to reduce DRAM latency without any modification to the existing DRAM structure. To accomplish this, we focus on an intrinsic phenomenon in DRAM: the variation of electric charge in DRAM cell capacitors. From it, we draw two key insights: i) the row-access latency of a DRAM row is a function of the time elapsed since the row was last refreshed, and ii) it is also a function of the time remaining until the row is next refreshed. Based on these two insights, we propose two mechanisms to reduce DRAM latency: NUAT-1, which exploits the first insight, and NUAT-2, which exploits the second. Circuit- and system-level simulations show performance improvements across various environments.
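The two insights can be sketched as lookup functions a memory controller might consult. All bin edges, latencies, and the restore floor below are hypothetical illustration values, and the linear restore model is an assumption; the abstract does not give the actual NUAT tables.

```python
import bisect

# Assumed example values, not taken from the paper.
REFRESH_WINDOW_MS = 64.0                      # each row refreshed once per window
ELAPSED_EDGES_MS = [8.0, 24.0, 48.0]          # hypothetical bin boundaries
TRCD_BY_BIN_NS = [10.0, 11.5, 12.5, 13.75]    # last entry = nominal (worst case)
MIN_RESTORE_FRACTION = 0.5                    # hypothetical floor on restore work

def nuat1_trcd_ns(elapsed_ms):
    """Insight i: pick activation latency from time since the last refresh.
    A recently refreshed row holds more charge, so a shorter bin is used."""
    return TRCD_BY_BIN_NS[bisect.bisect_right(ELAPSED_EDGES_MS, elapsed_ms)]

def nuat2_restore_fraction(elapsed_ms):
    """Insight ii (hypothetical linear model): the closer the next refresh,
    the less restoration the row needs, down to a fixed floor."""
    remaining_ms = max(0.0, REFRESH_WINDOW_MS - elapsed_ms)
    return max(MIN_RESTORE_FRACTION, remaining_ms / REFRESH_WINDOW_MS)
```

A row accessed right after refresh gets the shortest bin (10.0 ns here), and a row accessed late in the window falls back to the nominal value.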
A feedback edge combiner is proposed for the duty-cycle corrector (DCC) of a delay-locked loop (DLL) to widen the range of allowed input duty cycles. The feedback edge combiner generates the rising edge of the DCC output at the rising edge of the input clock, and the falling edge of the DCC output at the rising edge of a feedback clock, which is the DCC output delayed by half a period. A dual-delay-line digitally controlled delay line (DCDL) is used for seamless boundary switching. The chip area of the DCDL is reduced by about 46% by using two short coarse delay lines, followed by a fine phase mixer (FPM) and then a long coarse delay line in series, instead of two long coarse delay lines followed by an FPM. Measurements on a chip fabricated in 65-nm CMOS show an allowed input duty-cycle range of 20% to 80%; root-mean-square and peak-to-peak jitter of 2.69 ps and 14.0 ps, respectively, at 2 GHz and 1.2 V; and an operating frequency range of 0.12 to 2.0 GHz at 1.2 V. The measured power consumption is 3.3 mW/GHz at 1.2 V, and the chip area is 0.059 mm².
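The edge-combining rule explains why the output duty cycle is independent of the input's: the output rises with the input and falls exactly half a period later. The sketch below is an idealized, sample-by-sample model that assumes the half-period delay is exact (in the real circuit it is produced by the delay line); it ignores jitter and delay-line quantization.

```python
def edge_combiner_duty(period, high_time, cycles):
    """Idealized feedback edge combiner; returns the output duty cycle.

    Input: clock with the given period and high time (integer time steps).
    Output rises on each input rising edge and falls on the rising edge of
    a feedback clock, i.e. the output delayed by half a period.
    """
    half = period // 2
    n = period * cycles
    inp = [1 if (t % period) < high_time else 0 for t in range(n)]
    out = [0] * n
    state = 0
    rise_times = set()  # times at which the output rose
    for t in range(n):
        in_rise = inp[t] == 1 and (t == 0 or inp[t - 1] == 0)
        fb_rise = (t - half) in rise_times  # feedback clock = output delayed by T/2
        if in_rise:
            state = 1
            rise_times.add(t)
        elif fb_rise:
            state = 0
        out[t] = state
    return sum(out) / n
```

Both `edge_combiner_duty(100, 20, 10)` and `edge_combiner_duty(100, 80, 10)` return 0.5: inputs with 20% and 80% duty cycles both produce a 50% output.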
Reliable vehicular ad hoc networks (VANETs) play an important role in supporting smart, converged services for autonomous vehicles. To this end, this article proposes an application-layer overlay platform for hyper-connected vehicular ad hoc networks. By effectively exploiting the redundancy of communication among vehicles and infrastructure on the road, the platform aims to provide reliable, high-performance communication among vehicles. Because it is built at the application layer, it is compatible with existing routing protocols, and it minimizes communication and computation overhead. To verify the proposed scheme, we compare its performance with that of existing routing protocols.
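The abstract does not detail how the overlay exploits redundancy. As a generic illustration only, not the article's protocol, the sketch computes how redundant copies over independent paths raise delivery probability, and shows receiver-side suppression of duplicate copies by message ID. Path independence and the `(msg_id, payload)` message format are assumptions.

```python
def delivery_probability(path_success, num_paths):
    """Probability that at least one of num_paths redundant copies arrives,
    assuming each path succeeds independently with probability path_success."""
    return 1.0 - (1.0 - path_success) ** num_paths

def deduplicate(received):
    """Keep the first copy of each message id; later copies are redundant.

    received: iterable of (msg_id, payload) tuples (hypothetical format).
    """
    seen = set()
    unique = []
    for msg_id, payload in received:
        if msg_id not in seen:
            seen.add(msg_id)
            unique.append((msg_id, payload))
    return unique
```

With 80% per-path success, two independent paths give about 96% delivery. Deduplicating at the receiver keeps that redundancy invisible to the application.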
Current computer systems require large memory capacities to handle tremendous volumes of data. A DRAM cell consists of a transistor and a capacitor, and their size directly affects DRAM density. Technology scaling provides higher density, but at the expense of lower drivability: the smaller transistor has higher series resistance, which slows the restoration of charge in cells. Because DRAM cell access is destructive, DRAM operations require recovery processes, and among them, write recovery has the most difficulty meeting its timing constraints. In this paper, we explore an intrinsic mechanism of the DRAM write operation and find a relation between restoration time and retention time. Based on this observation, we propose a practical mechanism, Relaxed Refresh with Compensated Write Recovery (RRCW), which efficiently mitigates refresh overhead by providing longer restoration periods. To minimize the penalty of the longer restoration, we also introduce Refresh-Aware Write Recovery (RAWR), which shortens the extended recovery time according to how long the row will wait until it is next refreshed. Lastly, we introduce a scheduling policy to use RAWR efficiently. Evaluations show that the benefits of our mechanisms grow as memory intensity increases.
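The trade-off can be sketched in two small functions: the fraction of time a rank is busy refreshing (tRFC/tREFI), which a relaxed refresh interval reduces, and a RAWR-style write-recovery time that scales with the wait until the next refresh. The timing values and the linear scaling are hypothetical illustrations; the abstract does not specify RAWR's actual policy.

```python
# Hypothetical timing values, for illustration only.
TWR_NOMINAL_NS = 15.0        # standard write-recovery time
TWR_EXTENDED_NS = 22.0       # longer restoration assumed under relaxed refresh
REFRESH_INTERVAL_MS = 128.0  # relaxed per-row refresh interval (assumed)

def refresh_busy_fraction(trfc_ns, trefi_ns):
    """Fraction of time spent refreshing: one tRFC every tREFI."""
    return trfc_ns / trefi_ns

def rawr_twr_ns(ms_until_refresh):
    """RAWR-style write recovery (hypothetical linear model).

    A row written shortly before its next refresh only has to hold its
    data until that refresh, so the extended recovery time is scaled
    down toward the nominal value as the remaining wait shrinks.
    """
    wait = min(max(ms_until_refresh, 0.0), REFRESH_INTERVAL_MS)
    frac = wait / REFRESH_INTERVAL_MS
    return TWR_NOMINAL_NS + frac * (TWR_EXTENDED_NS - TWR_NOMINAL_NS)
```

Doubling tREFI (for example, from 7,800 ns to 15,600 ns with tRFC = 350 ns) halves the refresh busy fraction, while `rawr_twr_ns` returns the full 22 ns only for rows that must wait a whole interval.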
The relatively high latency of DRAM is mostly caused by the long row-activation time, which consists of sensing time and restoring time. Memory controllers cannot distinguish between the two, because both are performed consecutively in response to a single row-activation command. If the two steps were separated, restoring could be deferred until DRAM access is uncongested. Hence, we propose Quick-Access DRAM (Q-DRAM), which separates sensing from restoring. Our approach allows destructive access (i.e., a row-activation command performs only sensing, without restoring) by using multiple row buffers per bank. We call the destructive access and the per-bank multiple row buffers quick-access and quick-buffers (q-buffers), respectively. In addition, we propose Quick-access Trigger (Q-TRIGGER) and RESTORER to utilize Q-DRAM: Q-TRIGGER decides whether quick-access is required, and RESTORER decides when to restore the data to the destroyed cells. Specifically, RESTORER finds the right time to hide restoring latency by predicting data-bus occupancy and by exploiting bank-level locality. Evaluations show that Q-DRAM significantly improves performance in both single-core and multi-core systems.
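The interplay between q-buffers and deferred restoration can be sketched as a small queue model. This is a hypothetical simplification: the bus-idle prediction is passed in as a boolean, Q-TRIGGER's decision is not modeled, and when every q-buffer holds an unrestored row, the oldest is restored before a new quick-access proceeds. Class and method names are illustrative, not from the paper.

```python
from collections import deque

class QuickBufferBank:
    """Toy model of one bank with per-bank quick-buffers (q-buffers)."""

    def __init__(self, num_qbuffers):
        self.num_qbuffers = num_qbuffers
        self.pending = deque()   # rows sensed (destroyed) but not yet restored
        self.restored = []       # rows restored so far, in order

    def quick_access(self, row):
        # Sense without restoring. If all q-buffers hold unrestored rows,
        # the oldest one must be restored first (forced restore).
        if len(self.pending) == self.num_qbuffers:
            self.restored.append(self.pending.popleft())
        self.pending.append(row)

    def tick(self, bus_busy_predicted):
        # RESTORER-like policy: restore one row only when the data bus is
        # predicted to be idle, so restoring latency is hidden.
        if not bus_busy_predicted and self.pending:
            self.restored.append(self.pending.popleft())
```

With two q-buffers, accessing rows 10 and 11 during a busy period defers both restores; a third access forces row 10 to be restored, and the next idle tick restores row 11.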
As DRAM data bandwidth increases, a tremendous amount of energy is dissipated in the DRAM data bus. To reduce this energy, modern DRAMs have adopted interfaces with asymmetric termination, such as Pseudo Open Drain (POD) and Low Voltage Swing Terminated Logic (LVSTL). In interfaces using asymmetric termination, the termination energy is proportional to the Hamming weight of the data words. In this work, we propose Bitwise Difference Encoding (BD-Encoding), which decreases the Hamming weight of data words and thereby reduces energy consumption on the modern DRAM data bus. Because a smaller Hamming weight also reduces switching activity, switching energy and power noise are reduced as well. BD-Encoding exploits the similarity of data words on the DRAM data bus: we observed that similar data words (i.e., data words with a small Hamming distance) are highly likely to be sent at similar times. Based on this observation, the BD-coder keeps recently sent data in both the memory controller and the DRAMs, and transfers the bitwise difference between the current data and the most similar stored data. In an evaluation using SPEC CPU2006, BD-Encoding with 64 recent data words reduced termination energy by 58.3% and switching energy by 45.3%. In addition, BD-Encoding reduced L·dI/dt noise by 55%.
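The encoding step can be sketched as "XOR against the most similar recent word." In this minimal sketch, under assumed details, encoder and decoder keep identical histories, the encoder falls back to sending the raw word when no history entry lowers the Hamming weight, and the cost of transmitting the history index (6 bits for 64 entries) is ignored. The `BDCoder` class is a hypothetical illustration, not the paper's hardware design.

```python
from collections import deque

def popcount(x):
    return bin(x).count("1")

class BDCoder:
    """Bitwise-difference encoder/decoder sketch (assumed details)."""

    def __init__(self, depth=64):
        # Recently sent words; both sides append every word in the same order.
        self.history = deque(maxlen=depth)

    def encode(self, word):
        # Start from "send raw" (index None), then look for a history entry
        # whose XOR with the word has a smaller Hamming weight.
        best_idx, best_diff = None, word
        for i, past in enumerate(self.history):
            diff = word ^ past
            if popcount(diff) < popcount(best_diff):
                best_idx, best_diff = i, diff
        self.history.append(word)
        return best_idx, best_diff

    def decode(self, idx, diff):
        # Look up before appending, so indices match the encoder's view.
        word = diff if idx is None else self.history[idx] ^ diff
        self.history.append(word)
        return word
```

For example, after sending `0xF0`, encoding `0xF1` yields index 0 and difference `0x1`, a single set bit instead of five.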
Multiple Clone Row DRAM Choi, Jungwhan; Shin, Wongyu; Jang, Jaemin ...
2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA),
06/2015
Conference Proceeding
Several previous works have changed the DRAM bank structure to reduce memory access latency and have shown performance improvements. However, changes to the area-optimized DRAM bank can incur a large area overhead. To solve this problem, we propose Multiple Clone Row DRAM (MCR-DRAM), which uses the existing DRAM bank structure without any modification.
Our key idea is the Multiple Clone Row (MCR), in which multiple rows are turned on or off simultaneously to form a single logical row. MCR provides two advantages that enable our low-latency mechanisms (Early-Access, Early-Precharge, and Fast-Refresh). First, MCR speeds up sensing by increasing the number of cells being sensed, which allows a READ/WRITE command to an MCR to be issued earlier than is possible for a normal row (Early-Access). Second, cells in an MCR are refreshed more frequently without additional REFRESH commands, which reduces the amount of charge each cell leaks during the refresh interval. The reduced charge leakage allows a PRECHARGE command to be served before the activated cells are fully restored (Early-Precharge), and a REFRESH operation to be completed before the refreshed cells are fully restored (Fast-Refresh).
Although MCR-DRAM sacrifices memory capacity for low latency, it can be dynamically reconfigured from low-latency to full-capacity DRAM. MCR-DRAM improves both performance and energy efficiency in single-core and multi-core systems.
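The two MCR advantages and its capacity cost can be sketched numerically. Assuming the k clone cells of a logical row share one bitline precharged to V_DD/2, standard charge sharing gives a bitline signal of (V_cell − V_DD/2)·kC_s/(kC_s + C_b), which grows with k; and if the k clone rows are refreshed at evenly spaced times, the logical row sees a refresh every window/k. The capacitances and voltage are hypothetical values chosen for illustration.

```python
# Hypothetical cell/bitline parameters for illustration only.
C_CELL_FF = 24.0      # storage capacitance per cell (fF)
C_BITLINE_FF = 85.0   # bitline capacitance (fF)
VDD = 1.2             # array voltage (V); bitline precharged to VDD / 2

def bitline_swing(num_clones, v_cell=VDD):
    """Charge-sharing signal when num_clones identical cells share one bitline."""
    c_cells = num_clones * C_CELL_FF
    return (v_cell - VDD / 2) * c_cells / (c_cells + C_BITLINE_FF)

def effective_refresh_interval_ms(num_clones, refresh_window_ms=64.0):
    """If each clone row is refreshed once per window at evenly spaced times,
    the logical row is refreshed num_clones times per window."""
    return refresh_window_ms / num_clones

def capacity_fraction(num_clones):
    """Fraction of physical capacity visible when each logical row uses num_clones rows."""
    return 1.0 / num_clones
```

With these values, two clones raise the sensing signal from about 0.13 V to about 0.22 V and shorten the effective refresh interval to 32 ms, at the cost of half the capacity, which is the trade-off the dynamic reconfiguration lets the system undo.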