Saturday, April 24, 2010

Cycle Accurate Computer Emulation

Background

    A computer emulator is a program that imitates the functionality of another computer. This article discusses techniques that primarily address the needs of emulators that mimic old game consoles and computer systems. These emulators let users get close to the look and feel of an old computer on a modern PC.

 
Emulator running the game Metal Gear
 
    Computer emulation requires a fair amount of resources on the host computer. Early emulators had to implement several shortcuts to speed up the emulation in order to run games at normal speed on a PC.

    One common optimization was to break the emulated time into chunks that are small enough not to have a major impact on the appearance but large enough to allow optimizing the implementation of each emulated subsystem. These chunks of time were called cyclic tasks, and they were often the length of a video scan line. On 80’s computers, a video scan line is usually a few hundred CPU cycles. During each scan line, the emulator advances the time by emulating each subsystem independently of the others.
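
    As a rough illustration of this chunked approach, here is a minimal sketch in Python. The Subsystem class, the run_frame function and the two timing constants are assumptions made up for this example; they are not taken from any particular emulator.

```python
# Illustrative timing values (assumptions); the text only says "a few hundred".
CYCLES_PER_SCANLINE = 228
SCANLINES_PER_FRAME = 262


class Subsystem:
    """Hypothetical subsystem that only counts the cycles it has emulated."""

    def __init__(self, name):
        self.name = name
        self.cycles = 0

    def run(self, cycles):
        # A real subsystem would execute CPU instructions, draw pixels, etc.
        self.cycles += cycles


def run_frame(subsystems):
    """Advance every subsystem one scan line at a time for a whole frame."""
    for _ in range(SCANLINES_PER_FRAME):
        for subsystem in subsystems:
            # Each subsystem runs the full chunk without seeing the others'
            # intermediate state; changes made mid-line are not observed
            # until the next chunk.
            subsystem.run(CYCLES_PER_SCANLINE)


cpu, video = Subsystem("cpu"), Subsystem("video")
run_frame([cpu, video])
```

    The inner loop is cheap because each subsystem can process a whole scan line in one call, which is exactly what makes the approach fast and also what makes it imprecise.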

    This approach works pretty well, and in many cases a user wouldn’t be able to tell much difference between a real old system and the emulated system. But when looking more closely at the output of the emulated system, you would see small graphical glitches where the wrong pixels are shown on the screen. The emulator fails to update the screen accurately if any change to the graphics is made in the middle of a scan line. Worse cases occur as well, when a game relies on the synchronization between subsystems, such as the timing of status register changes. This may cause a game or application to fail to run at all.

Issues

    Some modern emulators try to address these issues and avoid artifacts caused by running subsystems independently for fixed, relatively large chunks of time. The most obvious way to address this would perhaps be to advance the emulated time one CPU cycle at a time and make sure all peripheral devices are updated each cycle. Although this is technically possible, it would require too many resources from the host computer, and not even a modern PC would be able to emulate an 80’s computer at normal speed.

    Another approach to handling synchronization between subsystems without artifacts and glitches is to keep the idea of executing each subsystem independently of the others as much as possible, but recognize the points in time when subsystems interact.

    The basic idea is to identify synchronization points where two subsystems interact and advance the emulated time of the subsystems independently to that point in time. There are several possible synchronization points:
  • Writing to memory shared by subsystems (e.g. Video RAM)
  • Reading from memory mapped I/O
  • Reading from and writing to I/O ports
  • Interrupts
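
    The idea of catching a subsystem up to a synchronization point can be sketched as follows. This is a minimal Python example under stated assumptions: the VideoChip class, its single color register and the one-pixel-per-cycle rendering are hypothetical and chosen only to keep the example short.

```python
class VideoChip:
    """Hypothetical video chip: renders one pixel per cycle from one register."""

    def __init__(self):
        self.time = 0      # emulated time this chip has been run up to
        self.color = 0     # register the CPU can write to
        self.pixels = []   # rendered output

    def sync(self, now):
        # Catch up: render with the color that was in effect since last sync.
        self.pixels.extend([self.color] * (now - self.time))
        self.time = now

    def write_color(self, now, value):
        # Synchronization point: bring the chip up to 'now' before the
        # write takes effect, so earlier pixels keep the old color.
        self.sync(now)
        self.color = value


video = VideoChip()
video.write_color(100, 7)   # CPU writes in the middle of a line, at cycle 100
video.sync(228)             # end of the (assumed) 228-cycle scan line
```

    Here the first 100 pixels keep the old color and the remaining 128 get the new one, even though the chip was only run twice during the line.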
    All these synchronization points occur within the emulated system, but there is also some synchronization required between the emulated system and the host system, in particular:
  • Video rendering
  • Audio playback

Solution

    An easy way to accommodate the need for dynamic synchronization points is to build the synchronization around a timeout service. The timeout service allows subsystems to register timeouts at points in time when a subsystem will perform an operation that may affect other subsystems. For example, a video system may set up a timeout to occur when the horizontal blank status bit is modified, or a DMA device could set up a timeout for when the DMA transfer is complete.

    The timeout service manages all scheduled timeouts and picks the one that is nearest in the future. It then tells each subsystem that requires synchronization at that time to run its emulation forward to the time of the timeout. Once the emulated time has been advanced, the timeout service picks the next timeout and repeats the operation.
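
    A timeout service along these lines could be sketched as below. This is a minimal Python example, not the implementation of any specific emulator: the TimeoutService and Recorder classes and their method names are assumptions, and for simplicity every registered device is synchronized at every timeout rather than only the ones that need it.

```python
import heapq


class TimeoutService:
    """Hypothetical timeout service built on a priority queue."""

    def __init__(self):
        self.now = 0
        self._queue = []     # heap of (time, sequence number, callback)
        self._sequence = 0   # tie-breaker so callbacks are never compared
        self._devices = []

    def add_device(self, device):
        self._devices.append(device)

    def schedule(self, time, callback):
        heapq.heappush(self._queue, (time, self._sequence, callback))
        self._sequence += 1

    def run_until(self, end_time):
        # Repeatedly pick the nearest timeout, run all devices up to it,
        # then let the owner of the timeout perform its operation.
        while self._queue and self._queue[0][0] <= end_time:
            time, _, callback = heapq.heappop(self._queue)
            for device in self._devices:
                device.sync(time)
            self.now = time
            callback(time)
        for device in self._devices:
            device.sync(end_time)
        self.now = end_time


class Recorder:
    """Hypothetical device that just remembers how far it has been run."""

    def __init__(self):
        self.time = 0

    def sync(self, now):
        self.time = now


service = TimeoutService()
device = Recorder()
service.add_device(device)
fired = []
service.schedule(300, lambda t: fired.append(("dma_done", t)))
service.schedule(228, lambda t: fired.append(("hblank", t)))
service.run_until(500)
```

    The timeouts fire in time order regardless of the order in which they were registered, and a callback is free to schedule further timeouts, for example the next horizontal blank.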

    With most peripheral devices, the next synchronization point is well known, but for the main CPU it may be harder to know when it will access a shared resource. The reason is that the CPU executes a program with alternative flows that are not known ahead of time. In a single CPU system this is not really an issue. The CPU can be the first subsystem to run to the next synchronization point, and if it needs to access a shared resource, it inserts a new synchronization point at the time of the interaction. The timeout service then makes sure all devices are synchronized before the CPU accesses the shared resource.
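
    The CPU-first scheme can be sketched as follows. This is a simplified Python example under several assumptions: the Cpu and Device classes are hypothetical, a program is just a list of (cycles, value) pairs where a value other than None means a write to the device, and the CPU synchronizes the device directly instead of going through a timeout service.

```python
class Device:
    """Hypothetical peripheral that logs when each write arrives."""

    def __init__(self):
        self.time = 0
        self.writes = []   # (device time at the write, value)

    def sync(self, now):
        self.time = now

    def write(self, value):
        self.writes.append((self.time, value))


class Cpu:
    """Hypothetical CPU running a list of (cycles, value_or_None) steps."""

    def __init__(self, program):
        self.program = program
        self.pc = 0
        self.time = 0

    def run_until(self, target, device):
        while self.pc < len(self.program):
            cycles, value = self.program[self.pc]
            if self.time + cycles > target:
                break                  # would pass the known sync point
            self.time += cycles
            self.pc += 1
            if value is not None:
                # Unplanned interaction: create a synchronization point
                # here and catch the device up before the access happens.
                device.sync(self.time)
                device.write(value)


device = Device()
cpu = Cpu([(4, None), (11, 0x42), (4, None), (11, 0x99)])
cpu.run_until(20, device)   # CPU runs first, up to the next known timeout
device.sync(20)             # then the device is brought to the same time
```

    The write lands at cycle 15, in the middle of the chunk, and the device sees it at exactly that time because it was caught up first.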

Conclusion

    This approach is similar to that of the early emulators in that subsystems advance the emulated time independently. The big difference is that nowadays a host computer is powerful enough to set these synchronization points dynamically instead of relying on a few statically defined ones.

    These and other techniques are used in the blueMSX emulator (www.bluemsx.com), which is a cycle accurate emulator for Z80 based computer systems.
