Paper Title

NoM: Network-on-Memory for Inter-Bank Data Transfer in Highly-Banked Memories

Authors

Seyyed Hossein SeyyedAghaei Rezaei, Mehdi Modarressi, Rachata Ausavarungnirun, Mohammad Sadrosadati, Onur Mutlu, Masoud Daneshtalab

Abstract

Data copy is a widely used memory operation in many programs and operating system services. In conventional computers, a data copy is often carried out as two separate read and write transactions that pass data back and forth between the DRAM chip and the processor chip. Some prior mechanisms propose to avoid this unnecessary data movement by using the shared internal bus in the DRAM chip to copy data directly within the DRAM chip (e.g., between two DRAM banks). While these methods outperform conventional techniques, data copy across different DRAM banks is still much slower than data copy within the same DRAM bank. Hence, these techniques have limited benefit for emerging 3D-stacked memories (e.g., HMC and HBM) that contain hundreds of DRAM banks across multiple memory controllers. In this paper, we present Network-on-Memory (NoM), a lightweight inter-bank data communication scheme that enables direct data copy across both memory banks of a 3D-stacked memory. NoM adopts a TDM-based circuit-switching design, where circuit setup is done by the memory controller. Compared to state-of-the-art approaches, NoM enables both fast data copy between multiple DRAM banks and concurrent data transfer operations. Our evaluation shows that NoM improves the performance of data-intensive workloads by 3.8X and 75%, on average, compared to the baseline conventional 3D-stacked DRAM architecture and state-of-the-art techniques, respectively.
