What's the proper implementation for hardware emulation?

I'm going to programme a Game Boy emulator (Z80 is the CPU in case somebody is not familiar with it), and while I was doing my research, I've found some things I'm not so sure about.

The first one was that C is the programming language to choose here. That's not so much of a problem, but I'd like to hear your opinion from today's point of view. Even C++ was not recommended.

The second thing I found out was that everybody was using one function per opcode. That seems logical since it's just one function call and probably better optimised than having one function for the "ADD" instruction and then you've got to find out what registers are used here. But how necessary is that today? Is it something I should stick to or should I rather rewrite my emulator if I notice that another way which might be more convenient just doesn't cut it (more or less modern gaming consoles pop into my mind right now)?

Also, it's kind of demotivating to write a function for "add that register to this register" over and over again. Is there a way to automate that from an opcode map or something like that?


I mostly agree with WingsOfIcarus. I wrote a few emulators already so here is my insight:

  1. The use of function pointers is a good idea (for speed and clarity of code)
  2. OOP is not a problem

    Yes, member calls are a little bit slower, but if you are careful it will not affect performance too much. On the other hand, OOP emulation code is much better to manage/read/understand.

  3. Use an instruction database instead of fixed instruction decoding.

    I am using a single text file which consist of all the necessary information for all instructions. The emulator parses it during initialization (feeds the arrays of function pointers and operands...). In this architecture it is very easy to correct errors in the instruction set without any code change.

    Complex instruction sets documentation are almost always faulty to some point. The worst case is Z80 (I have never see a 100% error-free instruction set). So use more instruction sets, compare them and create an error-free set (if you can).

  4. Add sound, video, keyboard and mouse to your emulation

    This is usually not a problem. On Windows use WaveOut instead of DirectSound. It's more stable, much faster (usable latencies of DSound are sometimes even > 400 ms). With WaveOut I was able to lover latency to 20-80 ms which is OK.

  5. Apply limit speed by T cycles of emulated CPU per second

    I am using machine cycle correct timings which is much slower, but allows me to correctly implement any hardware periphery emulations as (FDC, DMAC, sound chips,...without any hacks)

  6. Apply load/save of files for the emulated platform

For example, this is part of my instruction set (which is directly fed to CPU emulation:

opc      T0 T1 MC1   MC2   MC3   MC4   MC5   MC6   MC7   mnemonic

B8       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,B
B9       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,C
BA       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,D
BB       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,E
BC       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,H
BD       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,L
BE       07 00 M1R 4 MRD 3 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,(HL)
BF       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,A
C0       11 05 M1R 5 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 RET NZ
C1       10 00 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 POP BC
C2L2H2   10 10 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 JP NZ,U16
C3L1H1   10 00 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 JP U16
C4L2H2   17 10 M1R 4 MRD 3 MRD 4 MWR 3 MWR 3 ... 0 ... 0 CALL NZ,U16
C5       11 00 M1R 5 MWR 3 MWR 3 ... 0 ... 0 ... 0 ... 0 PUSH BC
C6U2     07 00 M1R 4 MRD 3 ... 0 ... 0 ... 0 ... 0 ... 0 ADD A,U8
C7       11 00 M1R 5 MWR 3 MWR 3 ... 0 ... 0 ... 0 ... 0 RST 00H
C8       11 05 M1R 5 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 RET Z
C9       10 00 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 RET
CAL2H2   10 10 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 JP Z,U16

opc:    operation code [hex]
        L1,H1,U1,S1 means first operand direct number or address
        L2,H2,U2,S2 means second operand direct number or address
        L3,H3,U3,S3 means third operand direct number or address
        H,L ... U16 high and low byte
        U   ... U8 unsigned byte
        S   ... S8 signed byte

T0      normal instruction duration [T] always 2 decimal digits
T1      instruction duration if condition not met [T] always 2 decimal digits

MC1++   Machine cycle first is type,second is duration [T] always 1 decimal digit
        ...     unused
        M1R     M1 cycle
        MRD     memory read
        MWR     memory write
        IOR     IO read
        IOW     IO write
        NON     no external operation (internal computation)
        INT     interrupt cycle

mnem    instruction text (mnemonic)
  • opc is used for the address in an array of pointers
  • mnemonic is used to select the proper function pointer, and operands type
  • T0 and T1 are used for instructions timing (this is enough for rough emulations)
  • MC1++ are used for correct MC timings (to implement correct hardware emulation and contentions timing)

Here is my Zilog Z80A complete instruction set with machine cycle timing link for download. Feel free to use (just mention my nick somewhere). After porting to this I was finally able to 100% pass the ZEXALL test. For more info see Writing a graphical Z80 emulator in C or C++.


First suggestion, you shouldn't use nested switch statements, you should rather use array of function pointers, alot faster -> better emulation, and nicer code, nested switch-es can also get a bit messy, here are some links where you can read more about these arrays
http://www.newty.de/fpt/fpt.html
http://www.multigesture.net/wp-content/uploads/mirror/zenogais/FunctionPointers.htm

Second suggestion, Yes you can do it in C#, Java, C++, but since you want every single bit of your CPU cycles so you can get as close emulation as possible - emulating one CPU cycle of target architecture with least number of CPU cycles on curret architecture, and OOP isn't so good in this case from what I heard/read from people. One of the things is performance, and second is pretty much obvious, emulation is, as you probably noticed, really complex task and wraping it in OOP can be unnecessary pain in the neck.


Here's a pretty cool implementation of working with some opcodes for an NES emulator:

http://bisqwit.iki.fi/jutut/kuvat/programming_examples/nesemu1/

Here's the accompanying youtube videos that have a little more explanation as to what's going on

http://www.youtube.com/watch?v=y71lli8MS8s

It uses C++ templates and some additional C++11 features. As to whether you choose C++ or C that is up to you but it shouldn't really matter a whole lot. If you're just emulating a gameboy I doubt that speed is going to be an issue on modern processors so try to just use whatever you're comfortable with.