The Network is The Computer
Yesterday as I mowed the lawn I was listening to the latest episode of The Talk Show, which discussed the two rumoured hardware announcements at Apple’s now-imminent Worldwide Developers Conference. All eyes are understandably on the AR/VR headset, but there’s also good reason to expect them to complete the transition to Apple Silicon with a new Mac Pro. The show dealt with these as two distinct developments, but on reflection I think there’s a connection between them: connection.
Since John Ternus described the Mac Pro as a story for another day, a persistent question has been how it will go beyond the Mac Studio. A key factor in what makes Apple Silicon perform so well is that everything is integrated into a single package. The higher-end parts are essentially more of the same blocks, still in a single package, and the M1 Ultra seems to have already pushed this approach as far as it can go. Moreover, expandability has previously been what set the Mac Pro apart. How does that work in the world of Apple Silicon?
On the headset side, one interesting aspect of the rumoured specs is that it will contain two M2s. Leaving aside the why, the answer to the how might be the same as that for the above question. Both point towards the idea that Apple is developing some kind of high-speed device-level interconnect.
Does that mean Apple is abandoning its approach of extreme integration at the silicon level? Far from it — such an interconnect would be the next step along the path, allowing everything to be in a single chip[1] where that makes sense, but not being limited where it doesn’t. In effect, instead of being a souped-up iPhone, the Mac Pro would be a network of souped-up iPhones in a single device, in an interesting inversion of the old Sun Microsystems slogan at the top of this post[2].
However fast such an interconnect is, though, it can never be as fast as staying on the same piece of silicon. This kind of non-uniform architecture is nothing new. The most obvious example is the memory hierarchy (registers, cache, RAM, swap), but a more directly relevant example would be the first multi-socket AMD x64 systems of the early 2000s. Each CPU had its own directly connected pool of RAM, a non-uniform memory access (NUMA) design: another CPU could address that memory, but the request had to go via the “owning” CPU and was thus a lot slower. These details were abstracted away, but if you cared about performance you needed to pay attention to them. While this complexity is tempting to avoid, it’s often the right trade-off.
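The kind of attention I mean here is things like processor affinity. As a minimal sketch (Linux-specific, and the choice of which core to pin to is purely illustrative), a process can inspect and pin the CPUs it is allowed to run on:

```python
import os

# On Linux, a process can query which CPUs the scheduler may run it on.
cpus = os.sched_getaffinity(0)  # 0 means the calling process
print(f"eligible CPUs: {sorted(cpus)}")

# Pin the process to a single CPU. On a NUMA system you would pick a
# core attached to the memory node holding your data; min() here is
# just an arbitrary illustration.
target = min(cpus)
os.sched_setaffinity(0, {target})
assert os.sched_getaffinity(0) == {target}

# Restore the original mask so the scheduler is free again.
os.sched_setaffinity(0, cpus)
```

On multi-socket hardware, tools like numactl expose the same idea for memory placement as well as CPU placement.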
Another, more modern, example of non-uniform computing is the GPU. Not only do these often have their own RAM[3], but also a completely separate programming model. If you’re not abstracting away the complexity with a high-level framework like Unity or TensorFlow, this requires far more investment and effort than simply keeping an eye on processor affinity. The benefits of accelerated computing are such that, for a lot of applications, this is more than worth it.
Even GPUs, though, don’t scale up infinitely. Nvidia, the leader in that space, has moved from selling chips, to cards, to integrated systems with optimised internal interconnects. Beyond this, CEO Jensen Huang is emphasising “data center level computing”, adding another level to the hierarchy.
For the most demanding types of computing, this kind of fractal approach has always been a feature. Occasionally, the particulars of implementation mask it at certain levels, but I suspect that with a new Mac Pro (and almost incidentally with the headset) Apple will be bringing it back to the fore. We’ll see tomorrow.
[1] I’m using “chip” loosely here — like many modern devices, Apple Silicon is actually multiple semiconductor dies tightly integrated into a single package. This represents, of course, another point on the interconnect continuum, a half-step between SoC and PCB.
[2] By coincidence, this slogan also came up in The Talk Show, but in the context of ambient, or ubiquitous, computing. This is a very interesting angle (for me in particular) on both the headset and computing more generally, but that’s for another day.
[3] Apple Silicon’s GPUs are a notable exception to this — the CPU, GPU and other accelerators all use a common pool of RAM. Programmed correctly, this gives a big performance boost by avoiding the need to shunt data to and from GPU RAM. This stark difference in model is often cited as a reason why external GPUs don’t make sense for an Apple Silicon Mac Pro, but I don’t see any reason why the two models couldn’t coexist in the same machine, especially in the increasingly common case of using the GPU for non-graphical tasks.