VPC3: A High-Performance Trace Compression Utility

This page is a mirror of http://www.csl.cornell.edu/~burtscher/research/tracecompression/index.html

This work has largely been superseded by TCgen.

 

 

Contact

 

Prof. Martin Burtscher

Computer Systems Laboratory

School of Electrical and Computer Engineering

Cornell University, Ithaca, NY 14853

 

E-mail: burtscher @ csl.cornell.edu

Web: http://www.csl.cornell.edu/~burtscher/

 

 

Overview

 

In 2003, we introduced VPC1 and VPC2, two trace compression algorithms that are based on value predictors.  Both algorithms are quite effective on extended traces (i.e., traces that capture PCs plus other information) but suffer from a relatively slow decompression speed.

 

VPC3 fixes this shortcoming.  It employs a brand new compression algorithm that compresses and decompresses extended program traces faster than other algorithms while at the same time achieving a much higher compression rate.

 

The key idea behind VPC3 is to use value predictors to expose and enhance the patterns in the traces.  It does this by splitting the original trace into four streams (subtraces).  Bzip2 is then used to compress the four streams.

 

 

Downloading and Using VPC3

 

The Source Code of VPC3 and a Sample Trace are available for download and use under these Software License Terms and Conditions.  The provided code compresses and decompresses traces that consist of a 4-byte header (which has to spell “vpc3” in ASCII) followed by one or more trace entries, each of which comprises a 4-byte PC field and an 8-byte extended data (ED) field.

 

4-byte header; 4-byte PC1, 8-byte ED1; 4-byte PC2, 8-byte ED2; …

 

 

Sample Results

 

The table below shows the geometric-mean compression rate, compression time, and decompression time of three popular compression algorithms as well as VPC3.  The traces we used for this study were taken from the SPECcpu2000 benchmark programs and capture the PC and the effective address of all load and store instructions that miss in a simulated 16kB, direct-mapped, 64-byte line, write-allocate L1 data cache.

 

 

gzip

bzip2

sequitur

vpc3

compression rate

7.0

11.5

21.0

38.2

compression time (min)

12.1

7.6

7.0

2.1

decompression time (min)

0.23

1.30

0.80

0.76

 

 

VPC3’s Operation

 

Below is a simple example illustrating how VPC3 compresses each PC and extended data (ED) value from the uncompressed trace.  For simplicity, the specific internals of each of the predictors have been left out.  Note that the numbers next to the arrows represent bus widths and not PC or ED values.  VPC3 converts a trace into four output streams, each of which is then compressed with bzip2.  The four compressed streams can optionally be concatenated into one file.

 

 

Let us assume that the current PC value in the trace is 1024 and the corresponding ED value is 16.  The four predictors VPC3 uses to predict PCs make their predictions and compare them to the actual PC.  In our example, predictor1’s prediction of 1024 matches the current PC value.  The selector then writes that predictor’s identification code (1) to the first stream.

 

Then the current PC value is used to index into the ED predictors.  Each predictor outputs its current prediction and compares it to the true current ED value.  Of the ten predictors VPC3 uses to predict ED values, predictor10’s prediction of 16 is a match.  The selector writes the corresponding predictor code (10) to the third stream.

 

Finally, all the predictors are updated with the true PC or ED value before moving on to the next PC/ED pair in the trace.

 

 

Let us assume the next PC/ED pair in the trace is 8200/385.  Again, the four PC predictors output four predictions.  However, for this PC none of the predictors produces a match.  The selector thus writes a special code (0) to the first stream to flag the value as unpredictable, and the unpredictable PC value (8200) is written to the second stream.

 

Once more, the current PC value is used to index into the ED predictors.  This time none of the ED predictors produces a correct prediction.  Hence, the selector writes a special code (0) to the third stream, indicating an unpredictable value, and the actual ED value (385) is recorded in the fourth stream.

 

Then, all the predictors are updated appropriately before moving on to the next PC/ED pair in the trace.

 

 

Related Publications

 

M. Burtscher.  “VPC3: A Fast and Effective Trace-Compression Algorithm.”  Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’04).  June 2004.  [abstract] [pdf]

 

M. Burtscher and M. Jeeradit.  “Compressing Extended Program Traces Using Value Predictors.”  12th International Conference on Parallel Architectures and Compilation Techniques (PACT’03), pp. 159-169.  September 2003.  [abstract] [pdf]

 
 

Compression Links

 

MACHE: a trace-compression utility that exploits the locality of memory references

PDATS2: a compression algorithm for traces that is based on storing only the differences between successive memory references

SEQUITUR: a trace-compression algorithm that builds a context-free grammar and allows the extraction of interesting information from the compressed trace without the need for decompression

SBC: a stream-based compression algorithm for program traces

 

GZIP: a general-purpose compression utility found on most UNIX systems

BZIP2: a general-purpose compressor providing better but slower compression than gzip

LZOP: a general-purpose compression utility providing very high compression and decompression speeds

 

General FAQ on compression

 
 

Acknowledgment

 

This work was supported in part by the National Science Foundation under Grants No. 0208567 and 0312966.

 

 


Page maintained by Martin Burtscher, Nana B. Sam, Ilya Ganusov, Sandra J. Jackson, Paruj Ratanaworabhan, and Jian Ke.

Last update: 17 June 2004