Published December 11, 2024 | Version v1
Journal article Open

UpDown: A Novel Architecture for Unlimited Memory Parallelism

Description

The emergence of HBM as a high-volume memory product has made memory bandwidths of 1.2 TB/s (1 stack) to 4.8 TB/s (4 stacks) feasible. Exploiting such bandwidths requires high memory-level parallelism, but the memory access mechanisms in today's CPUs are ill-suited to generating it. We define the Memory Parallelism Abstract Machine (MPAM), which characterizes the limits of a variety of commercial and research designs.

We propose the UpDown architecture, which generates unlimited, cost-efficient memory parallelism using split-transaction accesses and a large compute-namespace to synchronize memory responses. Using MPAM, we show that the memory parallelism UpDown generates is constrained only by the memory technology servicing the system and by the memory reference issue rate.

Our evaluation shows that the smallest compute element of UpDown, a single lane, can generate up to 3.5x more memory parallelism than a modern out-of-order CPU core, despite its much smaller area (<1%). We also show that 64 lanes of the UpDown architecture can sustain 1,673 outstanding memory references, nearly saturating the full bandwidth of 1 HBM3e stack (1.2 TB/s). Finally, we show that UpDown is much more energy- and power-efficient.
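
The link between bandwidth and outstanding references in the figures above can be illustrated with Little's law (references in flight = request rate x latency). The sketch below is only a back-of-the-envelope reading of the abstract, not the MPAM model itself. The bandwidth, lane count, and 1,673 outstanding references come from the description; the 64-byte request size and the 100 ns latency are assumptions made for illustration, and the function names are hypothetical.

```python
def required_outstanding(bandwidth_bytes_per_s, latency_s, request_bytes):
    """Little's law: references in flight = request rate (req/s) * latency (s)."""
    request_rate = bandwidth_bytes_per_s / request_bytes
    return request_rate * latency_s


def implied_latency_s(outstanding, bandwidth_bytes_per_s, request_bytes):
    """Inverse of the above: latency at which `outstanding` references fill the bandwidth."""
    return outstanding * request_bytes / bandwidth_bytes_per_s


# Figures taken from the description.
HBM_STACK_BANDWIDTH = 1.2e12   # bytes/s, one HBM3e stack
OUTSTANDING_REFS = 1673        # sustained by 64 lanes
LANES = 64

# Assumptions for illustration only (not stated in the description).
REQUEST_BYTES = 64             # assumed size of each memory reference
ASSUMED_LATENCY_S = 100e-9     # assumed memory round-trip latency

needed = required_outstanding(HBM_STACK_BANDWIDTH, ASSUMED_LATENCY_S, REQUEST_BYTES)
per_lane = OUTSTANDING_REFS / LANES
latency_ns = implied_latency_s(OUTSTANDING_REFS, HBM_STACK_BANDWIDTH, REQUEST_BYTES) * 1e9

print(f"references needed at assumed latency: {needed:.0f}")
print(f"average outstanding references per lane: {per_lane:.1f}")
print(f"latency implied by 1,673 references at full bandwidth: {latency_ns:.1f} ns")
```

Under these assumptions, saturating one stack takes on the order of a couple of thousand references in flight, which is why per-core limits on outstanding accesses matter at HBM bandwidths. A different request size or latency changes the numbers proportionally.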

Files

UpDown.pdf (2.1 MB)
md5:ef437886087f7c56d62e72245ea11b2d

Additional details

Identifiers

DOI
10.1145/3695794.3695801
Other
oai:uchicago.tind.io:14326

Funding

Army Research Office
Advanced Graphical Intelligence Logical Computing Environment (AGILE) research program
National Science Foundation
CNS-1907863
National Science Foundation
Computation Innovation Fellows Award

UChicago Information

Division(s)
Physical Sciences Division
Department(s)
Computer Science