Out on a Limb
I’ve been intending to write more here, but most of what is happening in the world feels too big to tackle. To ease myself in gently, I decided to concentrate on relative trivialities, and you don’t get a lot more trivial than Apple predictions that will be defunct in a day.
The murmur of rumours that Apple is about to transition the Mac line to in-house ARM processors has built to a crescendo over recent months, and the general consensus is that they’re going to announce this transition at next week’s remote WWDC. The commentariat have the general, plausible predictions covered, so here’s a long shot — unlikely to come to pass, but interesting to think about.
The Mac has already gone through not one but two transitions of processor architecture (68000 to PowerPC to x86), and in both cases offered an emulation layer to allow software compiled for the old architecture to run on the new. This avoids early adopters of the new platform facing a wilderness of applications until developers catch up.
Despite this, there’s overwhelming scepticism that Apple will pull the same trick this time around, for the simple reason that it doesn’t seem feasible. Intel’s Core chips were sufficiently faster than the contemporary PowerPC G4s that they could offer an acceptable emulation experience, and the earlier PowerPCs were so much faster than 68000s that the emulator soon offered better performance than the original hardware. In contrast, goes the standard reasoning, while Apple’s A-series ARM processors are certainly impressive, they don’t provide the order-of-magnitude improvement over the current Intel chips needed to make emulation any more than a miserable experience.
However, I think this misses a key point. What if Apple’s new Mac chips — let’s call them “X-series” — can run x86 code natively? Their existing, highly optimised cores could be paired with a translation layer that took in an x86 instruction stream and converted it into the ARM instruction set. This idea isn’t as absurd as it may seem at first glance, for a number of reasons.
Firstly, this is essentially how CISC architectures like x86 are implemented anyway — translation of the incoming instructions to microcode for actual execution. In fact, this structure was one of the key motivations for RISC architectures, as described by John Hennessy in a recent interview on ACM ByteCast:
I think Dave and I both also had exposure to the primary way in which many computers and even mainframes were designed then, using lots of microcode. I think we both looked at it and said, “Well this machine is doing a lot of things at runtime that could be done at compile time with less overhead and more efficiency. So why not just do it then, simplify the instruction set. Why not make the micro instructions the instruction set rather than add an extra level of interpretation.” I think that was a great insight.
While the distinction between RISC and CISC is less than clear-cut in modern designs, ARM’s origins in the above mindset make it a plausible candidate to serve as CISC microcode.
Secondly, there’s ample precedent in an adjacent field — games consoles. The SuperNES and MegaDrive both contained pretty much complete hardware versions of their predecessors (the NES and Master System, respectively), as did the early models of the PlayStation 3. It doesn’t seem likely that Apple would include a separate x86 chip alongside an ARM, as that blows away the cost and power saving advantages of the transition. However, it validates the idea of including dedicated hardware to support backwards compatibility.
Finally, and more importantly, there are numerous precedents to be found in the history of ARM itself. ARM processors have long supported multiple distinct instruction sets at once. Beyond the standard 32- and 64-bit variants, there have been several iterations of Thumb, a related but distinct instruction set that trades flexibility for code compactness.
A bigger departure, and a closer analogue to a hypothetical x86 layer, is Jazelle. This was an optional extension to the architecture that allowed processors to execute Java bytecode directly. While the Java Virtual Machine is far simpler than x86, the same principles could be applied.
One of these principles was using software to fill in the gaps. The most common instructions were supported directly in hardware, but obscure features or corner cases raised an interrupt, falling back to a software implementation. Unlike, say, Qualcomm or Samsung, Apple are in an excellent position to follow this lead, due to their oft-cited advantage of controlling both the hardware and software. Designing both in tandem is their bread and butter.
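To make the "hardware for the common case, software for the rest" idea concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration: the opcode names, the split between the two tables, and the Trap exception standing in for a hardware interrupt. It is not how Jazelle or any real processor is implemented.

```python
# Toy model: common operations run on a fast "hardware" path, while
# anything else raises a trap that a software handler services.
# Opcode names and the hardware/software split are invented for illustration.


class Trap(Exception):
    """Stands in for the interrupt raised on an unimplemented opcode."""

    def __init__(self, opcode):
        super().__init__(opcode)
        self.opcode = opcode


# Fast path: operations the hypothetical core executes directly.
HARDWARE_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}

# Slow path: rarely used operations implemented in software.
SOFTWARE_OPS = {
    "mul": lambda a, b: a * b,
}


def hardware_execute(opcode, a, b):
    """Execute an opcode on the fast path, or trap if it is not supported."""
    op = HARDWARE_OPS.get(opcode)
    if op is None:
        raise Trap(opcode)
    return op(a, b)


def run(program):
    """Run (opcode, a, b) tuples; return the results and the number of traps."""
    results, traps = [], 0
    for opcode, a, b in program:
        try:
            results.append(hardware_execute(opcode, a, b))
        except Trap as trap:
            traps += 1
            results.append(SOFTWARE_OPS[trap.opcode](a, b))
    return results, traps
```

Running a program that is mostly "add" and "sub" with an occasional "mul" takes the trap only for the rare case, which is the property that keeps the fallback cheap on average.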
Apple’s control of the OS gives them a further advantage: they can tailor exactly how they support running x86 code. Rosetta, the technology used for the PowerPC to Intel transition, was implemented as a just-in-time translation layer that the OS wired into processes running binaries that hadn’t been recompiled. This architecture would allow the system to work around any gaps in the hardware emulation in advance.
To approach things from the other direction, the software translation layer could convert x86 code into ARM, augmented by additional instructions for operations that can’t be translated to efficient RISC code. This would allow the additional hardware to be limited to just those instructions, and seems like the most attractive option if the emulation is only a transitional measure while developers port their applications to the new architecture.
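As a rough illustration of that second approach, the Python sketch below translates a tiny, made-up subset of x86 into ARM-style text, and emits a placeholder "assist" instruction for anything it cannot map directly. The register mapping, the X86ASSIST mnemonic, and the choice of which instructions count as hard are all hypothetical, and the sketch ignores condition flags entirely.

```python
# Toy translator: simple x86-style instructions become ARM-style text, and
# anything without a direct mapping becomes a hypothetical assist instruction.
# The register map and the X86ASSIST mnemonic are invented for illustration;
# x86 condition flags are deliberately ignored.

REGS = {"eax": "w0", "ebx": "w1", "ecx": "w2"}


def translate(program):
    """Translate (mnemonic, *registers) tuples into ARM-style strings."""
    out = []
    for mnemonic, *operands in program:
        regs = [REGS[name] for name in operands]
        if mnemonic == "mov":
            out.append(f"MOV {regs[0]}, {regs[1]}")
        elif mnemonic in ("add", "sub"):
            # x86 uses two operands (dst op= src); ARM spells out all three.
            out.append(f"{mnemonic.upper()} {regs[0]}, {regs[0]}, {regs[1]}")
        else:
            # No efficient RISC equivalent in this toy: defer to the
            # hypothetical dedicated instruction.
            out.append(" ".join(["X86ASSIST", mnemonic, *regs]))
    return out
```

The point of the design is that only the final branch needs extra silicon. Everything the translator can express in ordinary ARM instructions runs on the existing cores unchanged.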
Of course, this kind of hybrid hardware/software emulation is easy to conjecture about, but a major effort to actually implement. Apple may have decided that the cost isn’t justified, or they may have made a positive choice not to offer x86 compatibility for some broader strategic reason. Hence, I won’t be surprised if tomorrow they unveil an ARM macOS that won’t run old software. If they do offer backwards compatibility, I’ll be very curious to read about the details. Either way, it’ll be an interesting step for ARM.
Update 2020-06-28: So, I got the day wrong, and this article preceded the announcement by a matter of hours, rather than a day and a half. It looks like I was right about the what, in that the new ARM Macs will indeed run x86 code via a translation layer (Rosetta 2), but not the how. I’ve not seen anything to suggest that Rosetta 2 takes advantage of any specific hardware support, and the fact that the Developer Transition Kit is based on an existing iPad processor (the A12Z) heavily implies that it doesn’t.
That said, the DTK is most definitely not representative of ARM Mac hardware that will eventually be released as a product. It’s entirely possible that the Mac-specific Apple Silicon will include features to accelerate Rosetta 2. Whatever the case, the overall news from WWDC is making it increasingly likely that the Apple device I buy this year won’t be a phone.
Update 2020-11-26: Perhaps I wasn’t so far off after all; to quote an excellent thread from @erratarob on Twitter:
So Apple simply cheated. They added Intel’s memory-ordering to their CPU. When running translated x86 code, they switch the mode of the CPU to conform to Intel’s memory ordering.
Not full, native x86 support by any means, but a smart approach: spending die area on the pain points, and doing the rest in software. Whatever the how, the what is rock-solid, performant emulation. I’ve been using an M1 MacBook Air for a few days now, and one of the many things that’s impressive is that x86 software just works: no glitches or weird behaviour, and at the speed you’d expect. This was an area where the ARM transition could have bitten Apple hard, but they’ve exceeded even my most optimistic expectations. A tremendous feat of engineering.
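For anyone wondering why memory ordering in particular is a pain point, here is a small Python toy that enumerates the outcomes of the classic "message passing" test. One thread writes some data and then sets a flag; another reads the flag and then the data. The model is my own simplification, not a description of Apple's or Intel's hardware: "in order" means each thread's accesses happen in program order (which, for this particular test, gives the same answer as x86's ordering), and "reordered" means a thread's accesses to different locations may be swapped.

```python
from itertools import permutations

# Writer publishes data, then raises a flag. Reader checks the flag, then
# reads the data. Location names and this two-thread setup are illustrative.
WRITER = [("store", "data"), ("store", "flag")]
READER = [("load", "flag"), ("load", "data")]


def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def execute(schedule):
    """Run one global schedule; return the (flag, data) values the reader saw."""
    memory = {"data": 0, "flag": 0}
    seen = {}
    for kind, location in schedule:
        if kind == "store":
            memory[location] = 1
        else:
            seen[location] = memory[location]
    return seen["flag"], seen["data"]


def outcomes(allow_reordering):
    """Collect every (flag, data) result the reader can observe."""
    def orders(ops):
        return permutations(ops) if allow_reordering else [tuple(ops)]

    results = set()
    for writer in orders(WRITER):
        for reader in orders(READER):
            for schedule in interleavings(list(writer), list(reader)):
                results.add(execute(schedule))
    return results
```

In this toy, the result (1, 0), where the reader sees the flag set but reads stale data, never appears in outcomes(False) but does appear in outcomes(True). Code compiled for x86 can rely on never seeing that result, so a translator targeting a more relaxed machine would have to insert barriers around memory accesses or, as the thread describes, have the hardware provide the stricter ordering.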