Mach-O and Universal Binaries
by Jon on Friday, June 9, 2006 file under: Technology

Today I was reading about the ELF file format, which Linux and many Unix-like operating systems use for their executables (programs, apps, whatever you want to call them). Even operating systems like GNU Hurd, which uses GNU Mach as its microkernel, use ELF, as opposed to Mach-O, which many Mach-based operating systems, including OS X, use. So why does OS X stick with Mach-O instead of moving to ELF like most other OSes? Wikipedia suggests Universal Binaries, but that's not exactly true.

Like ELF, Mach-O only supports one type of machine code per file. So if you have an Intel Mac and I have a PowerPC Mac, one Mach-O file will not work for us both. Mach-O doesn't natively support multiple architectures, so an archive format was created (recently marketed as Universal Binary) which merges multiple Mach-O files into a single file. So if I have two binaries, say [PowerPC App] and [Intel App], each in their own Mach-O files, the Universal equivalent would be similar to [i|PowerPC App|Intel App], where the i at the beginning of the file is information about which part of the file is for which architecture. The application loader can take either a standard Mach-O file or a multiple-binary file, which is denoted by a special header.
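To make the [i|PowerPC App|Intel App] layout concrete, here's a rough sketch in Python of gluing two Mach-O slices together. The `build_fat` helper is hypothetical (it's my own name, not a real API), and alignment is simplified for illustration; real fat files pad each slice out to a page boundary.

```python
import struct

# Assumed CPU type values, taken from the header dump later in this post.
CPU_POWERPC, CPU_X86 = 0x12, 0x07

def build_fat(slices):
    """slices: list of (cputype, cpusubtype, mach_o_bytes).

    Returns the bytes of a minimal fat file: an 8-byte fat header
    (magic number + archive count), one 20-byte record per slice,
    then the slices themselves. All header fields are big-endian
    32-bit integers. Alignment padding is omitted for brevity.
    """
    header = struct.pack(">II", 0xCAFEBABE, len(slices))
    records = b""
    offset = 8 + 20 * len(slices)  # slice data starts after the records
    for cputype, cpusubtype, body in slices:
        records += struct.pack(">IIIII",
                               cputype, cpusubtype, offset, len(body), 0)
        offset += len(body)
    return header + records + b"".join(body for _, _, body in slices)
```

The "i" at the front of the file is exactly that header-plus-records prefix: it tells the loader which byte range belongs to which architecture.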

There are quite a few interesting tidbits of information surrounding these "Universal" or fat binaries. First of all, this isn't the first time Apple has packaged binaries for multiple architectures into one file. The current scheme, however, was first used by NeXT to provide binaries for multiple architectures, and again by Apple to provide support for both 32- and 64-bit PowerPC architectures, well before the term Universal Binary was being thrown around.

In 1994, when Apple moved from Motorola 68K chips to PowerPC, the migration process consisted partially of fat binaries with both 68K and PowerPC code. However, this was a much different scheme which used the resource and data fork design of Apple's file systems to separate the code blocks.

In 1996, when NeXT started branching out to x86- and RISC-based processors, it developed the current scheme of combining all binaries into a single file. Anyone familiar with Apple history knows that NeXTSTEP eventually became OS X, which inherited this capability but didn't utilize it until the release of the Power Mac G5. At this point, Apple again needed a method of allowing programs to use the full capabilities of the 64-bit G5, while still allowing programs to work on the 32-bit G4.

With the introduction of the Intel-based Macs, the fat binaries, as they were previously known (or to most people unknown), began to be marketed as Universal Binaries, but the truth is, the capability was always present in OS X, and was actually being used to some extent with PPC/PPC64 binaries. What's really interesting - and not very publicized - is that a Universal Binary might actually include three different binaries for the program: one for PowerPC (G3, G4), one for PowerPC 64 (G5), and one for Intel (and I'm sure in the not too distant future, an Intel 64 version as well). Along with a decent emulation layer for old programs which weren't compiled with native code for various architectures, the idea of fat binaries makes architecture transitions transparent to end users. This fits perfectly with Apple's "it just works" policy of software and hardware.

Let's take a look at the binary structure of a Universal Binary. This example uses the binary for SubEthaEdit. I've broken the fat header down into 32-bit chunks, the size of all of the values in the fat binary header. This section starts right at the beginning of the file.

cafe babe # Fat Binary Magic Number #
0000 0002 # Number of archives      #
0000 0012 # CPU Type                ##
0000 0000 # CPU Subtype              #
0000 1000 # Archive Offset           # Archive 1
000b 71ec # Archive Size             #
0000 000c # Alignment               ##
0000 0007 # CPU Type                ##
0000 0003 # CPU Subtype              #
000b 9000 # Archive Offset           # Archive 2
000c 2c18 # Archive Size             #
0000 000c # Alignment               ##

The first block contains the "Magic Number". This is how OS X knows what kind of binary it's going to try to run; CAFEBABE is the magic number used for fat binaries. The second number is the number of archives in the binary. Like I said earlier, the current Apple documentation suggests that up to three binaries can be put into one file, but from the size of the field, it looks like over four billion (2^32 - 1) binaries could be stored (this would be of limited usefulness, however, since the offset field would run out of space much more quickly). Next come the headers for each of the binaries. The first block of each header is the CPU Type: from this example, it would seem that 0x12 is PowerPC and 0x7 is x86. The next block is the CPU Subtype; again, for this example, the subtype for PowerPC is empty (0x0), and the Intel subtype is 0x3.
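The field-by-field walkthrough above can be sketched as a small Python parser. This is a reading of the layout as described in this post (big-endian 32-bit fields: magic, archive count, then cputype/cpusubtype/offset/size/alignment per archive); the dictionary keys are my own names, not official ones.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # magic number for big-endian fat binaries

def parse_fat_header(data):
    """Parse the fat header at the start of a Universal Binary.

    All header fields are big-endian 32-bit integers. Returns one
    dict per architecture, describing where its Mach-O slice lives.
    """
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary")
    archs = []
    pos = 8  # the per-architecture records follow the 8-byte header
    for _ in range(nfat_arch):
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">IIIII", data, pos)
        archs.append({"cputype": cputype, "cpusubtype": cpusubtype,
                      "offset": offset, "size": size, "align": align})
        pos += 20  # each record is five 32-bit words
    return archs
```

Feeding it the twelve words from the SubEthaEdit dump above yields two records: a PowerPC slice (cputype 0x12) at offset 0x1000 and an Intel slice (cputype 0x7) at offset 0xb9000.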

Next comes information about the addressing of each archive. We see the offset, which tells us where that program starts in this file, and the size, which tells us how large that block is. If we go to those addresses in the file, we'll see:

          0 1  2 3  4 5  6 7  8 9  a b  c d  e f
0001000   feed face 0000 0012 0000 0000 0000 0002
00b81e0   696e 6b55 524c 4b65 7900 0000 0000 0000
00b81f0   0000 0000 0000 0000 0000 0000 0000 0000
00b9000   cefa edfe 0700 0000 0300 0000 0200 0000
017bc00   5765 6245 6c65 6d65 6e74 4c69 6e6b 5552
017bc10   4c4b 6579 0000 0000
          0 1  2 3  4 5  6 7  8 9  a b  c d  e f

As you can see, at offset 0x1000 we have the Mach-O magic number FEEDFACE, and at 0xb9000 we have... wait a second, what is CEFAEDFE? What's neat is that not only does this file contain code for two different processor architectures, but one is big-endian (most significant byte first) and the other is little-endian (least significant byte first). Basically, PowerPC processors read multi-byte values in the opposite byte order from Intel processors. If we reverse the order of the bytes, we'll get FEEDFACE at 0xb9000, just like we expected.
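The byte reversal is easy to check for yourself. Here's a one-liner sketch in Python: take the four bytes exactly as they appear on disk at 0xb9000 and flip their order.

```python
# The Intel slice's magic number as it appears on disk (little-endian).
raw = bytes.fromhex("cefaedfe")

# Reversing the byte order recovers the familiar Mach-O magic.
swapped = raw[::-1]
print(swapped.hex())  # feedface
```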

We can also see that the first archive ends at 0x1000 (the offset) + 0xb71ec (the size) - 0x1 (the last byte sits at offset + size - 1), which is 0xb81eb. Using the same math, the Intel binary ends at 0x17bc17.
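That end-address arithmetic, spelled out with the values from the fat header:

```python
# Last byte of each slice: offset + size - 1, using the fat header values.
ppc_end = 0x1000 + 0xB71EC - 1
intel_end = 0xB9000 + 0xC2C18 - 1
print(hex(ppc_end))    # 0xb81eb
print(hex(intel_end))  # 0x17bc17
```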

I'd imagine that after this point, the Universal Binary loader passes the address off to the Mach-O loader, and away it goes, none the wiser that the program was anything but a plain Mach-O file. Because native code is being run, it runs just as fast as a pure native Mach-O file. The only drawback to a fat binary is its size, which is about double the size of a lone Mach-O file. But since we know the file structure of the fat binary, we could easily trim away the fat and get a pure native binary back. In fact, there are utilities like lipo which do just that.

Another interesting idea is that there is nothing to suggest that all the binaries in the fat binary archive need to provide similar functions. The binary could be a web browser when run on one architecture, and a solitaire game when run on another. I don't know how useful that would be, but there's no reason you couldn't do it.

So wait, let's go back to the beginning. The Wikipedia article on Mach-O says that Apple uses Mach-O to support Universal Binaries, which isn't true according to Apple's documentation. The Apple documentation says a Mach-O file can only contain one type of binary, so Universal Binaries are really an archive format capable of storing multiple Mach-O binaries for different architectures. Phew, that was a lot to say.

So the question still remains: why didn't Apple choose to go with ELF for their Intel binaries, like most other OS vendors? Well, after being sidetracked by all this cool fat binary information, I never did find specific advantages of Mach-O over ELF. Hopefully I'll keep reading, and if I find something, I'll let you know.
