Near-Memory Processing: It's the Hardware AND Software, Stupid!


Boris Grot, University of Edinburgh, GB


Conventional computing systems are increasingly challenged by the need to process rapidly growing volumes of data, often at online speeds. One promising way to boost compute efficiency is Near-Memory Processing (NMP), which integrates lightweight compute logic close to the memory arrays. NMP affords massive bandwidth to memory-resident data and dramatically reduces energy-hungry data movement. A key challenge in leveraging NMP effectively is that today's high-performance data processing algorithms have been designed for CPUs with powerful cores, large caches, and bandwidth-constrained memory interfaces. NMP architectures, by contrast, are limited to simple logic and small caches while offering abundant memory bandwidth. Hence, achieving high efficiency with NMP requires careful algorithm-hardware co-design to maximize bandwidth utilization within a highly constrained area and power budget. I will describe one instance of such a co-designed NMP architecture for data analytics and show that it reaps significant performance and energy-efficiency advantages over both CPU-based and baseline NMP systems.