Tuesday, May 1, 2012

Using R to Analyze, Manipulate and Generate Graphs from Periodic Statistics

As most of you may know, MARSS provides a simple interface for generating a periodic dump of selected statistics counters.  You can easily load this periodic stats file into Excel or a similar application to generate graphs or process the data as needed.  But after long simulation runs this data often grows so large that most spreadsheet applications crash or become too slow to handle it.  I recently ran into this issue while generating results for our research paper, where I needed to plot data from a CSV file and also apply some simple equations to selected columns.  That is when I discovered 'The R Project', which provides a great environment for plotting and manipulating large amounts of data.  This post shows how to use R to generate periodic graphs and perform simple mathematical operations on one or more columns.

Periodic graph of L2 Cache Miss Ratio generated using R

Generating Periodic Stats file

In the MARSS code, call 'enable_periodic_dump()' on the StatObject and recompile.  Then add the -time-stats-logfile TIME_STATS_FILENAME and -time-stats-period 100K options to your simconfig file to dump the values of your stats counters every 100K cycles.  The output file is in plain CSV format: the first column is the sim_cycle count, followed by one column per stats counter.
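If you do not have a dump file at hand yet, a small synthetic file in the same layout is enough to try out the R commands in this post.  The sketch below is only an illustration: the counter values are random numbers, the file name sample_time_stats.csv is made up, and the only thing taken from MARSS is the layout described above (sim_cycle first, one counter per column).

```r
# Build a small synthetic periodic-stats file for experimenting.
# All values are made up; only the column layout follows the description above.
set.seed(1)                                   # reproducible fake data
n <- 50                                       # number of dump intervals

stats <- data.frame(
  sim_cycle = as.integer((1:n) * 100000),     # one row per 100K cycles
  base_machine.L2_0.cpurequest.count.hit.read.hit  = rpois(n, lambda = 900),
  base_machine.L2_0.cpurequest.count.hit.write.hit = rpois(n, lambda = 300)
)

# Hypothetical file name, written to R's temporary directory.
sample_file <- file.path(tempdir(), "sample_time_stats.csv")
write.csv(stats, sample_file, row.names = FALSE)

# Show the header and the first two data rows.
readLines(sample_file, n = 3)
```

You can then pass sample_file to read.csv() in place of your real stats file.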

Reading a file and Plotting a simple graph

To get started with R, install the core packages using apt-get or yum and start the interactive shell by typing 'R' in your terminal.  To read in the CSV file, use the 'read.csv()' function:

> data <- read.csv(YOUR_CSV_FILENAME)
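Before going further it is worth checking what R actually read.  The sketch below uses a tiny inline CSV with made-up values as a stand-in for a real stats file, so it runs on its own; the inspection functions are the part that matters.

```r
# Hypothetical two-row CSV standing in for a real periodic-stats file.
csv_text <- "sim_cycle,base_machine.L2_0.cpurequest.count.hit.read.hit
100000,912
200000,887"

data <- read.csv(text = csv_text)

names(data)   # column names as R stored them
nrow(data)    # number of dump intervals that were read
str(data)     # type of each column
head(data)    # first few rows
```

For a very large file, the nrows argument of read.csv() lets you load only the first rows while you check the layout.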


In R, a value is assigned to a variable with the '<-' operator.  The 'data' variable now contains all of our columns, each of which can be accessed in either of the following two ways:

> data[["sim_cycle"]]

> data$sim_cycle


Both commands print the "sim_cycle" column from our periodic statistics file.  Now let's assume that your periodic stats file has a column for L2 cache read hits.  You can print that column using the following command:

> data[["base_machine.L2_0.cpurequest.count.hit.read.hit"]]
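The counter names are long, so it can help to search the column names instead of typing them from memory.  Here is a minimal sketch using a made-up data frame; the L1_D_0 column name is hypothetical and is only there to give grep() something to filter out.

```r
# Made-up data frame standing in for a loaded stats file.
data <- data.frame(
  sim_cycle = c(100000, 200000),
  base_machine.L2_0.cpurequest.count.hit.read.hit   = c(912, 887),
  base_machine.L1_D_0.cpurequest.count.hit.read.hit = c(5000, 5100)  # hypothetical name
)

# List every column whose name contains "L2_0" (plain substring match).
l2_cols <- grep("L2_0", names(data), fixed = TRUE, value = TRUE)
print(l2_cols)

# Summarize the first match instead of printing every row.
summary(data[[l2_cols[1]]])
```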


Printing this data on the command line is not very useful (!), so let's plot it:

> plot(data[["base_machine.L2_0.cpurequest.count.hit.read.hit"]])


This shows a plot with the default plotting options, drawing one dot per data point.  The x-axis of this plot is the row number of each data point, but for analysis we want to see L2 cache read hits against the sim_cycle counter.  Use the following command to plot the L2 cache read-hit count with sim_cycle on the x-axis:

> plot(data[["sim_cycle"]], data[["base_machine.L2_0.cpurequest.count.hit.read.hit"]])
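If you want the graph in a file (for a paper, say) rather than in a window, you can open a graphics device before calling plot().  The following is a sketch with made-up data and a hypothetical output file name; it also sets readable axis labels, since the default labels are the full R expressions.

```r
# Made-up data standing in for a loaded stats file.
data <- data.frame(
  sim_cycle = (1:20) * 100000,
  base_machine.L2_0.cpurequest.count.hit.read.hit = c(
    900, 880, 910, 870, 860, 905, 915, 890, 885, 875,
    895, 920, 930, 865, 855, 900, 910, 880, 870, 890)
)

# Hypothetical output file in R's temporary directory.
out_file <- file.path(tempdir(), "l2_read_hit.pdf")

pdf(out_file, width = 8, height = 6)   # open a PDF graphics device
plot(data[["sim_cycle"]],
     data[["base_machine.L2_0.cpurequest.count.hit.read.hit"]],
     type = "l",                       # lines instead of dots
     xlab = "sim_cycle",
     ylab = "L2 read hits",
     main = "L2 cache read hits over time")
dev.off()                              # close the device so the file is written
```

I used pdf() here because it is available in every R installation; png() takes the same kind of arguments if you prefer a bitmap.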

Simple Operations on Multiple Columns

Once you understand how to access the data columns in R, it's pretty simple to perform mathematical operations on one or more columns.  For example, to generate an L2 cache miss ratio plot:

> l2_read_hit <- data[["base_machine.L2_0.cpurequest.count.hit.read.hit"]]

> l2_write_hit <- data[["base_machine.L2_0.cpurequest.count.hit.write.hit"]]

> l2_read_miss <- data[["base_machine.L2_0.cpurequest.miss.read"]]

> l2_write_miss <- data[["base_machine.L2_0.cpurequest.miss.write"]]


# Calculate total hit/miss and total access

> l2_hit <- l2_read_hit + l2_write_hit

> l2_miss <- l2_read_miss + l2_write_miss

> l2_access <- l2_hit + l2_miss


# Now generate miss ratio and plot it

> l2_miss_ratio <- l2_miss / l2_access

> plot(data[["sim_cycle"]], l2_miss_ratio, type="l")

# type="l" will print lines instead of dots
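One thing to watch for: if an interval has no L2 accesses at all, the division is 0/0 and R returns NaN for that row.  The sketch below, using made-up numbers, marks such intervals as NA explicitly so they are easy to spot.

```r
# Made-up per-interval counts; the second interval has no L2 accesses.
l2_hit  <- c(90, 0, 45)
l2_miss <- c(10, 0, 5)
l2_access <- l2_hit + l2_miss

# Only divide where there was at least one access; use NA elsewhere.
l2_miss_ratio <- ifelse(l2_access > 0, l2_miss / l2_access, NA)
print(l2_miss_ratio)
```

With type="l", plot() simply leaves a gap in the line wherever the ratio is NA.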


The example above shows how easy it is to generate a periodic graph of the L2 miss ratio for your simulation results using periodic stats and R.

EDIT : Here is a sample R script that generates IPC, Cache miss rate and Cache miss ratio graphs from periodic dump files.


1 comment:

  1. Hello,
    Thanks for your post. I have a question regarding MARSS.
    As I can see from the recent paper, it supports full-system simulation, which means it should be able to run any OS on top of QEMU. But does it support simulating up to kilo-cores? Or can I run it across many machines to create my own distributed version, and then use all the simulated cores on those machines as one kilo-core simulated environment? If not, would you recommend any other working simulator like yours that supports this kind of full-system simulation?
