For decades, the von Neumann architecture has constrained computing performance. In this model, memory and compute are physically separated, forcing data to shuttle back and forth between the processor and off-chip memory. As AI models have grown, this bottleneck has become severe: memory bandwidth now limits performance more than raw compute power. Research shows that while peak FLOPS have scaled rapidly across hardware generations, DRAM and interconnect bandwidth have lagged far behind. The result is the “memory wall”.
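The imbalance is easiest to see with a roofline-style estimate. The sketch below uses illustrative numbers for a modern accelerator (roughly 1,000 TFLOPS of dense 16-bit compute and ~3 TB/s of HBM bandwidth); the exact figures are assumptions, but the gap they reveal between compute and bandwidth is the point.

```python
# Roofline-style back-of-the-envelope: is a kernel compute- or memory-bound?
# The hardware numbers below are illustrative assumptions, not vendor specs.

PEAK_FLOPS = 1.0e15      # ~1,000 TFLOPS of dense 16-bit compute (assumed)
PEAK_BW    = 3.0e12      # ~3 TB/s of HBM bandwidth (assumed)

# Machine balance: FLOPs the chip can perform per byte it can fetch.
machine_balance = PEAK_FLOPS / PEAK_BW   # ~333 FLOPs per byte

def attainable_tflops(arithmetic_intensity):
    """Roofline: performance is capped either by compute or by bandwidth * intensity."""
    return min(PEAK_FLOPS, PEAK_BW * arithmetic_intensity) / 1e12

# Batch-1 LLM decoding is roughly a matrix-vector product: each weight byte is
# read once and used for ~1 multiply-accumulate, so intensity is ~2 FLOPs/byte.
for name, intensity in [("GEMV (LLM decode)", 2), ("Large GEMM (training)", 300)]:
    bound = "memory-bound" if intensity < machine_balance else "compute-bound"
    print(f"{name:22s} {intensity:4d} FLOPs/byte -> "
          f"{attainable_tflops(intensity):7.1f} TFLOPS attainable ({bound})")
```

Under these assumed numbers, a batch-1 decode kernel can use well under 1% of the chip's peak FLOPS: bandwidth, not arithmetic, sets the ceiling.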
Why AI hit the wall first
Modern AI workloads - especially large language models - require constant movement of weights and activations between memory and compute units. Even if only 20% of accesses must go all the way to DRAM, that latency can be enough to dominate total execution time. At that point, adding more compute units no longer improves performance: they simply sit idle waiting for data.
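A short average-memory-access-time calculation shows how a 20% DRAM fraction can dominate. The latencies below (a few nanoseconds for on-chip SRAM, on the order of 100 ns for DRAM) are typical ballpark figures, not measurements of any particular chip.

```python
# Average memory access time (AMAT) when 20% of accesses reach DRAM.
# Latencies are ballpark assumptions: on-chip SRAM ~5 ns, off-chip DRAM ~100 ns.

SRAM_LATENCY_NS = 5.0
DRAM_LATENCY_NS = 100.0
dram_fraction = 0.20            # only 20% of accesses go all the way to DRAM

amat = (1 - dram_fraction) * SRAM_LATENCY_NS + dram_fraction * DRAM_LATENCY_NS
dram_share = dram_fraction * DRAM_LATENCY_NS / amat

print(f"AMAT = {amat:.1f} ns, of which DRAM accounts for {dram_share:.0%}")
# -> AMAT = 24.0 ns, of which DRAM accounts for 83%
# Doubling the number of compute units changes none of these numbers:
# the extra units would simply wait on the same DRAM accesses.
```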
Breaking the wall: new AI chip architectures
1. Compute‑in‑memory (CIM)
A major breakthrough highlighted in recent research integrates computation directly into memory arrays, eliminating long‑distance data movement. CIM architectures dramatically reduce energy consumption and latency by performing operations where data already resides.
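One way to see the appeal is an energy budget. The toy model below compares a conventional flow (fetch each operand from DRAM, then multiply-accumulate) with an in-memory flow that skips the off-chip transfer. The per-operation energies are commonly cited ballpark figures - hundreds of picojoules per DRAM byte versus roughly a picojoule per MAC - not numbers from the research described above.

```python
# Toy energy model: conventional fetch-then-compute vs. compute-in-memory.
# Per-operation energies are ballpark assumptions, not measured silicon data.

E_DRAM_PER_BYTE_PJ = 150.0   # off-chip DRAM access energy per byte (assumed)
E_MAC_PJ           = 1.0     # one multiply-accumulate in logic (assumed)
E_CIM_MAC_PJ       = 0.3     # MAC performed inside the memory array (assumed)

def conventional_energy_pj(num_macs, bytes_per_mac=2):
    """Every operand byte crosses the chip boundary before it is used."""
    return num_macs * (bytes_per_mac * E_DRAM_PER_BYTE_PJ + E_MAC_PJ)

def cim_energy_pj(num_macs):
    """Weights stay in the array; only the in-array MAC itself costs energy."""
    return num_macs * E_CIM_MAC_PJ

n = 1_000_000  # one million multiply-accumulates
print(f"conventional:      {conventional_energy_pj(n) / 1e6:.1f} uJ")
print(f"compute-in-memory: {cim_energy_pj(n) / 1e6:.1f} uJ")
# The gap is dominated by data movement, not by the arithmetic itself.
```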
2. 3D‑stacked AI chips
Another approach is vertical integration. Instead of laying memory and logic side‑by‑side, 3D chips stack them, shortening data paths from millimetres to microns. A recent study involving Stanford, MIT, and Carnegie Mellon demonstrates that monolithic 3D chips can remove the internal “distance penalty” that slows AI workloads.
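The “distance penalty” can be made concrete with a toy wire-energy model: the energy to move a bit grows roughly with the length of the wire it travels. The capacitance-per-millimetre and voltage below are illustrative assumptions, not figures from the Stanford/MIT/Carnegie Mellon work.

```python
# Toy model of the "distance penalty": energy to move one bit scales with
# wire length (E = C * V^2, with capacitance proportional to length).
# The constants are illustrative assumptions, not process data.

CAP_PER_MM_F = 0.2e-12   # ~0.2 pF of wire capacitance per millimetre (assumed)
VDD_V        = 1.0       # supply voltage (assumed)

def bit_move_energy_pj(distance_mm):
    """Approximate switching energy to drive one bit across a wire."""
    return CAP_PER_MM_F * distance_mm * VDD_V**2 * 1e12

planar_mm  = 5.0      # logic reaching a memory die sitting beside it
stacked_mm = 0.01     # a ~10-micron vertical connection in a 3D stack

print(f"planar  ({planar_mm} mm):   {bit_move_energy_pj(planar_mm):.3f} pJ/bit")
print(f"stacked ({stacked_mm} mm): {bit_move_energy_pj(stacked_mm):.4f} pJ/bit")
print(f"path length reduced ~{planar_mm / stacked_mm:.0f}x per bit moved")
```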
3. Rethinking model and hardware co‑design
Researchers also argue that AI models themselves must evolve to reduce memory traffic - through choices such as lower-precision weights, sparsity, and attention variants that shrink the key-value cache - aligning model architectures with hardware realities.
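As one concrete instance of co-design, the sketch below estimates the bytes a decoder must stream per generated token under two model-side choices: weight precision and the number of key/value heads. The model dimensions are hypothetical (roughly 7B-parameter scale) and are not taken from any specific study.

```python
# Model/hardware co-design: model-side choices directly set memory traffic.
# Dimensions below describe a hypothetical ~7B-parameter decoder (assumed).

params = 7e9            # total weight count
context_len = 4096      # tokens of KV cache kept resident
layers, d_head, kv_heads_mha, kv_heads_gqa = 32, 128, 32, 8

def weight_traffic_gb(bytes_per_weight):
    """Batch-1 decode reads every weight once per generated token."""
    return params * bytes_per_weight / 1e9

def kv_cache_gb(kv_heads, bytes_per_elem=2):
    """Keys and values for every layer, head, and cached token."""
    return 2 * layers * kv_heads * d_head * context_len * bytes_per_elem / 1e9

print(f"weight bytes/token  FP16: {weight_traffic_gb(2):.1f} GB   INT4: {weight_traffic_gb(0.5):.1f} GB")
print(f"KV cache            MHA:  {kv_cache_gb(kv_heads_mha):.1f} GB   GQA:  {kv_cache_gb(kv_heads_gqa):.1f} GB")
# Quantization and grouped-query attention are model changes, but their main
# payoff is fewer bytes crossing the memory interface per token.
```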
Merging for the memory
AI chips are breaking the memory wall not by making processors faster, but by bringing memory and compute closer together - or merging them entirely. These innovations mark a fundamental shift in chip design and will define the next decade of AI performance.