How does Rosetta 2 work?

I'd love to understand more about how Rosetta 2 works. The Apple Developer article is brief. Has anyone done a deep analysis on how Rosetta 2 works, how it is invoked and whether it's possible to use it via an API?

Some questions:

  • How are x86_64 applications launched under Rosetta?
  • Is it possible to dynamically invoke translation for a portion of x86 instructions?
  • Might it be possible to bridge Rosetta to QEMU or similar to allow fast virtualization of Intel Docker images?

Solution 1:

Rosetta 2 works by doing an ahead-of-time (AOT) translation of the Intel code into corresponding ARM code. It can do this efficiently mainly because the M1 CPU has hardware support for switching the memory-ordering model observed by a given thread into a model equivalent to the Intel x86 one (TSO, total store order). Memory ordering governs the consistency guarantees a program gets when multiple processors (i.e., cores, in this case) access shared memory, and x86 code is written assuming the stronger TSO guarantees.

Users can observe the translation the first time they launch an Intel app on an M1 Mac, as the first launch is noticeably slow. The translated code is cached and reused, making subsequent launches much faster.

If you have a universal binary that contains code for several architectures, you can specifically invoke Rosetta 2 by asking for the Intel slice. You can do that from the Terminal like this:

arch -x86_64 ./mycommand

Note that this choice of architecture is inherited by any program that the "mycommand" process in turn chooses to run.

Rosetta 2 as delivered by Apple in macOS Big Sur is not set up to dynamically invoke translation for a portion of x86 instructions. It is entirely focused on AOT translation of whole binaries in advance, and there is no user-facing interface for translating a small set of instructions on the fly. Rosetta 2 does include a JIT engine that can translate instructions generated at runtime (for example, when you run an Intel-based browser with a JIT JavaScript engine), but it is not a general-purpose JIT engine that you could use for other purposes through an API or similar.

If you want to do that for research purposes or just out of pure interest, you can take the instructions you want translated and wrap them in a simple application shell (essentially a minimal main()-only C program), then run it. The cached, translated version of the program will then include the translated instructions for inspection.

The cache is available in these folders:

/var/db/oah/
/System/Library/dyld/aot_shared_cache

There's no immediate way of "bridging" Rosetta 2 to QEMU to allow fast virtualization of Intel Docker images. QEMU contains its own x86 emulation, so it can run Intel Docker images on the M1 without involving Rosetta 2 at all; in that case, however, "fast" is a very subjective measure.

Solution 2:

This answer addresses the deep-analysis part of the question: how Rosetta 2 works.

I have reverse-engineered Rosetta 2 a little bit. For more details, see the GitHub pages of this project.

Cache files of Rosetta 2 are located at both /System/Library/dyld/aot_shared_cache and /var/db/oah/.

/System/Library/dyld/aot_shared_cache contains only the translated results for system dylibs (e.g., /usr/lib/system/libsystem_blocks.dylib and /usr/lib/system/libxpc.dylib; the project page lists the full set for macOS Big Sur version 11.1). Because this single file bundles the translations of many system dylibs, it is huge.

For non-system binaries, such as third-party x86_64 binaries, the files with the .aot extension under /var/db/oah contain the translated results.

aot_shared_cache is a (big) file; are there any tools available to extract specific content from it?

@nohillside I have created a simple Python script that shows the contents of aot_shared_cache. You can use it to extract specific content.