Monday, March 12, 2012

Fast Forwarding and Simpoint support in MARSS

One of the most requested features in MARSS has been fast-forwarding N instructions before starting simulation. We recently developed hooks into QEMU's translation logic that count the number of instructions emulated. With this logic, MARSS can now support both fast-forwarding and Simpoints. Both features are currently available in the features branch.

Fast-Forwarding
The following new simconfig options have been added for fast-forwarding support in MARSS:
  • -fast-fwd-insns N: Fast-forward a total of N instructions. This count includes user-level and kernel-level instructions across all emulated CPUs. Once the limit is reached, MARSS switches to simulation mode.
  • -fast-fwd-user-insns N: Fast-forward N user-level instructions. In this mode kernel-level instructions are still emulated but are not counted. Once N user-level instructions have executed, MARSS switches to simulation mode and simulates both user-level and kernel-level instructions.
  • -fast-fwd-checkpoint CHK_NAME: Create a checkpoint named 'CHK_NAME' after fast-forwarding the specified number of instructions. MARSS kills the simulation instance once the checkpoint has been created.
As described above, the default behavior is to switch to simulation mode after the specified number of instructions has been emulated. Because kernel-level execution is non-deterministic in emulation mode, we also support creating a checkpoint after fast-forwarding. Users can then fast-forward only once and rely on the checkpoint to start simulation at the same RIP every time.
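
As a rough illustration of how these options combine, here is a small Python sketch that assembles an option string for a simconfig file. The helper `fast_fwd_options` and the checkpoint name `ffwd_1b` are hypothetical; only the three flag names come from this post. Treating the total and user-level counts as mutually exclusive is an assumption of the sketch, not documented MARSS behavior.

```python
def fast_fwd_options(insns=None, user_insns=None, checkpoint=None):
    """Build a simconfig option string from the fast-forward flags above.

    Hypothetical helper: only the flag names are taken from the post.
    """
    # Assumption: a run uses either a total count or a user-level count.
    if insns is not None and user_insns is not None:
        raise ValueError("choose a total or a user-level count, not both")
    if insns is None and user_insns is None:
        raise ValueError("an instruction count is required")

    parts = []
    if insns is not None:
        parts += ["-fast-fwd-insns", str(insns)]
    else:
        parts += ["-fast-fwd-user-insns", str(user_insns)]

    # Optional: checkpoint after fast-forwarding instead of simulating.
    if checkpoint is not None:
        parts += ["-fast-fwd-checkpoint", checkpoint]
    return " ".join(parts)


# Fast-forward one billion user-level instructions, then checkpoint.
print(fast_fwd_options(user_insns=1_000_000_000, checkpoint="ffwd_1b"))
```

The resulting string would be placed alongside the other options in a simconfig file.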

For multicore emulation/simulation, fast-forwarding a specified number of instructions is a little more complicated, because often only a few CPUs are executing code while the others are idle. MARSS first divides the total number of instructions to emulate equally among all CPUs in the machine. When some of those CPUs are idle, MARSS re-allocates a portion of the remaining instruction count from the idle CPUs to the non-idle CPUs so that the specified instruction limit can still be reached.
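
The allocation scheme can be sketched in a few lines of Python. This is only a model of the idea, not the MARSS implementation: the function names are hypothetical, and the sketch moves the entire remaining budget of each idle CPU to the active ones, whereas the post only says that a portion is re-allocated.

```python
def initial_allocation(total_insns, num_cpus):
    """Split the budget evenly; the first CPUs absorb any remainder."""
    base, extra = divmod(total_insns, num_cpus)
    return [base + (1 if i < extra else 0) for i in range(num_cpus)]


def rebalance(remaining, idle):
    """Move the remaining budget of idle CPUs to the non-idle CPUs.

    remaining: per-CPU instruction counts still to be emulated
    idle:      per-CPU flags, True if that CPU is currently idle
    """
    active = [i for i, is_idle in enumerate(idle) if not is_idle]
    if not active:
        return list(remaining)  # nothing is running, nothing to move

    # Pool everything the idle CPUs still hold (assumption: all of it).
    pool = sum(r for r, is_idle in zip(remaining, idle) if is_idle)
    base, extra = divmod(pool, len(active))

    result = [0 if is_idle else r for r, is_idle in zip(remaining, idle)]
    for k, cpu in enumerate(active):
        result[cpu] += base + (1 if k < extra else 0)
    return result


budget = initial_allocation(10, 4)
print(budget)
print(rebalance(budget, [False, True, True, False]))
```

The total budget is preserved by the rebalancing step, so the overall instruction limit is unchanged even though the per-CPU shares differ.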

Simpoint
Simpoints have been one of the most widely used and reliable methods in computer-architecture research for simulating the part of an application that is representative of its full run. We decided to implement support for Simpoints so that we could evaluate the performance of different applications against real hardware, but more on that later. MARSS uses a simpoint file to create checkpoints after the specified numbers of instructions have been emulated. For how to create checkpoints based on simpoints, and how to use the weights file and mstats.py to calculate weighted IPC, please refer to the wiki page on Simpoints.
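
To make the weighting step concrete, here is a minimal Python sketch of one common way to combine per-simpoint results. It assumes every simulation point covers the same number of instructions, so the weights are applied to CPI and the result is inverted. The function name and the sample numbers are hypothetical, and this is not necessarily how mstats.py computes its figure; the wiki page is the reference for that.

```python
def weighted_ipc(ipc_by_point, weight_by_point):
    """Combine per-simpoint IPC values into one weighted IPC.

    ipc_by_point:    {simpoint_id: measured IPC}
    weight_by_point: {simpoint_id: weight from the weights file}
    """
    total_weight = sum(weight_by_point.values())
    # Assumption: equal-length intervals, so cycles scale with CPI.
    weighted_cpi = sum(
        weight_by_point[p] / ipc_by_point[p] for p in weight_by_point
    ) / total_weight
    return 1.0 / weighted_cpi


# Made-up values, for illustration only.
ipcs = {0: 2.0, 1: 1.0}
weights = {0: 0.5, 1: 0.5}
print(weighted_ipc(ipcs, weights))
```

Dividing by the total weight keeps the result well defined even if the weights in the file do not sum exactly to 1.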

Counting Emulated Instructions
QEMU's emulation engine, TCG, is designed to be fast. It first converts each instruction into a list of micro-instructions and then converts these micro-instructions into a binary buffer that can be executed and re-executed without any change. This makes it a little difficult to count the number of instructions emulated directly.

We recently developed support for Simpoints, which requires counting the number of instructions emulated and creating a checkpoint after a specific number of instructions. During this development we found a way to add a hook to each translated binary block that counts the number of instructions emulated at run time. We added a counter to each CPU context, set to the number of instructions that CPU context is allowed to execute. In each translated block we added a hook that decrements the CPU context's instruction counter by the number of instructions in the block. When the counter reaches 0, we switch to simulation mode or create a checkpoint, depending on the user's configuration options.
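
The counting scheme can be modeled in Python as follows. This is a toy model rather than QEMU or MARSS code: `CPUContext`, `block_hook`, and `fast_forward` are hypothetical names, and translated blocks are represented only by their instruction counts. Because the counter is decremented once per block, the sketch treats "reaches 0" as "drops to 0 or below", which is an assumption.

```python
class CPUContext:
    """Holds the per-CPU instruction budget described above."""

    def __init__(self, budget):
        self.insns_left = budget


def block_hook(cpu, block_insns):
    """Charge one translated block; return True once the budget is spent."""
    cpu.insns_left -= block_insns
    return cpu.insns_left <= 0


def fast_forward(cpu, block_sizes):
    """Run blocks until the hook fires; return instructions emulated."""
    executed = 0
    for n in block_sizes:
        executed += n
        if block_hook(cpu, n):
            break  # here MARSS would switch mode or create a checkpoint
    return executed


cpu = CPUContext(budget=10)
print(fast_forward(cpu, [4, 4, 4, 4]), cpu.insns_left)
```

In this model the stopping point always falls on a block boundary, so the number of instructions emulated can exceed the requested budget by up to one block minus one instruction.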

1 comment:

  1. Thanks for your introduction, it helps me a lot. I have a question about "-fast-fwd-user-insns N". In QEMU, if I set a checkpoint and also use the fast-forward command, the benchmark starts from the checkpoint and then fast-forwards for N instructions. But according to your description, "users can use fast-forwarding only once and then rely on checkpoint to start simulation at specific RIP everytime", it seems that if I set a checkpoint and also use the fast-forward command, MARSS will go directly into simulation mode instead of fast-forwarding, is that right? What should I do if I want to fast-forward from the checkpoint in the benchmark, as I did in QEMU? Thanks in advance for your reply.
