Trace-based Affine Reconstruction of Codes

This webpage contains a software implementation of the ideas presented in the paper Trace-Based Affine Reconstruction of Codes, accepted for presentation at the 14th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2016).

The present artifact consists of three parts:

  1. An appendix to the accepted paper: trace_reconstruction.pdf
  2. A .tgz file containing code, data and instructions as described in the appendix: trace_artifact.tgz. It contains the following files:
    1. trace_ae.py: source code of the reconstruction engine. It has been annotated with meaningful information for the artifact evaluation process. It contains all the executable parts of the artifact.
    2. polybench-c-3.2.tgz: contains annotated PolyBench sources, to give insight into how the trace library included in the Python file corresponds to memory references in PolyBench codes. These can be used to generate evaluation traces at the reviewers' discretion.
    3. workflow.ipynb: IPython Notebook containing an example session using the tool interactively.
    4. workflow.html: the same IPython Notebook in HTML form (this document).
    5. cholesky.4.trace.bz2: trace file used in the example notebook.
  3. A set of pre-processed trace files that can optionally be used as input to the reconstruction engine: Traces

Experiment workflow

This section describes how to use the tool in four different use cases: manual reconstruction of synthetically generated traces, use of the automatic test features included in the tool to reconstruct synthetically generated traces, reconstruction of pre-processed traces downloaded from the authors' personal page, and local generation of custom traces by the reviewer. Note that this section is also available as an IPython Notebook in files $\texttt{workflow.ipynb}$ and $\texttt{workflow.html}$, packed in the artifact.

Reconstruction of synthetically generated traces

Besides the reconstruction engine, $\texttt{trace_ae.py}$ includes a trace library containing all the PolyBench/C 3.2 memory references. The $\texttt{generate_trace()}$ function can be used to synthetically generate a trace equivalent to any PolyBench one in the following way:

In [ ]:
import numpy as np
import trace_ae
(a,c,d) = trace_ae.generate_trace(
    test="cholesky",
    access=4,
    size="standard" )

The previous command will generate a trace equivalent to $\texttt{cholesky.4}$, with only two differences:

  • The base address of the access (first referenced address) is 0.
  • The base stride (data size) is 1, instead of the size in bytes of the base array data type.

As mentioned in the last paragraph of Section 3.4, these are not relevant for the reconstruction process. The $\texttt{generate_trace()}$ function returns three values:

  • $\texttt{a}$: NumPy array containing the synthetic trace.
  • $\texttt{c}$: NumPy array containing the first observed stride, used to start the reconstruction process, and calculated as:
In [ ]:
c = np.array([a[1]-a[0]])
  • $\texttt{d}$: NumPy array containing the remaining first-order differences of trace $\texttt{a}$ (all strides after the first one, which is stored in $\texttt{c}$), calculated as:
In [ ]:
d = a[2:] - a[1:-1]
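
For reference, both arrays can be derived from any raw address trace in the same way. The following helper is a hypothetical sketch (it is not part of $\texttt{trace_ae.py}$) that mirrors the $\texttt{c}$ and $\texttt{d}$ outputs of $\texttt{generate_trace()}$:

In [ ]:
import numpy as np

def split_trace(a):
    # Hypothetical helper: split an address trace into the first
    # observed stride and the remaining first-order differences,
    # mirroring the c and d values returned by generate_trace().
    a = np.asarray(a)
    c = np.array([a[1] - a[0]])   # first observed stride
    d = a[2:] - a[1:-1]           # strides of all subsequent accesses
    return c, d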

Once the synthetic trace is generated, the reconstruction process is started by calling:

In [ ]:
trace_ae.extract( d, c, cut=3 )

In the previous call, the parameter $\texttt{cut}$ indicates the depth threshold for the maximum solution size to try. Since the simplest representation of the $\texttt{cholesky.4}$ reference is a 3-level loop, using $\texttt{cut}$ below 3 will not return a valid reconstruction; these failed attempts complete very quickly. Using $\texttt{cut}$ above 3 will return a valid reconstruction, but it may not be a minimal one from the structural point of view (i.e., it may use loops of depth greater than 3). Moreover, the number of explored branches grows exponentially with loop depth, so it is not desirable to attempt a reconstruction with a depth greater than strictly necessary. For this reason, in order to guarantee a breadth-first traversal of the solution space, in our tests $\texttt{extract()}$ is called with incremental values of $\texttt{cut}$ until a solution is found (see Section A.4.2 in the paper), as sketched below.
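
A minimal driver implementing this incremental strategy could look as follows. This is a sketch: it assumes $\texttt{extract()}$ returns $\texttt{None}$ when no solution of depth at most $\texttt{cut}$ exists, which may differ from the actual return convention:

In [ ]:
# Hypothetical breadth-first driver (not part of trace_ae.py): retry
# extract() with increasing depth thresholds until a solution appears.
# Assumes extract() returns None on failure; the actual return
# convention may differ.
solution = None
for cut in range(1, 8):           # 8 is an arbitrary safety bound
    solution = trace_ae.extract( d, c, cut=cut )
    if solution is not None:
        break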

Upon completion, $\texttt{extract()}$ returns the components of the solution, if one has been found:

  • Coefficients vector $\overrightarrow{c}$.
  • Iteration matrix $\mathbf{I}$, containing a subset of the iteration indices in the reconstruction.
  • Bounds matrix $\mathbf{U}$.
  • Bounds vector $\overrightarrow{w}$.

The Artifact Evaluation version also prints two values necessary to assess the experiments carried out in the paper:

  • Number of references predicted by $\overrightarrow{\gamma}$ versus the total number of references in the trace.
  • Maximum memory usage for storing the solution elements and backtracking stack during the reconstruction process.

Note that this manual way of reconstructing a synthetic trace is not convenient for performance (runtime) measurements, which are covered in the following section.

Automatic reconstruction using system tests

In order to avoid the manual process of constructing a synthetic trace through $\texttt{generate_trace()}$, the Python code includes three convenience functions:

  • To run the reconstruction on a single reference of a benchmark, run:
In [ ]:
trace_ae.system_tests( 
    test="cholesky",
    access=4,
    size="standard",
    timing=True,
    debug=True )

This call will generate and reconstruct reference $\texttt{cholesky.4}$. It shows the time taken by the reconstruction process and, after it ends, checks that running the generated loop rebuilds the exact memory trace used as input. Note that the reconstruction process will first try to reconstruct the trace using $\texttt{cut=1}$, then $\texttt{cut=2}$, and finally $\texttt{cut=3}$, at which point it succeeds. The reported time includes all three calls to $\texttt{extract()}$. Also note that the debugging process is very costly: rebuilding the original trace from $\mathbf{U}$, $\overrightarrow{w}$ and $\overrightarrow{c}$ and checking that it exactly matches the input takes longer than the reconstruction itself (although, admittedly, this check is not as optimized as the reconstruction engine, which is the subject of the paper).

  • To run the reconstruction of all the references of a benchmark, run:
In [ ]:
trace_ae.all_accesses(
    test="cholesky",
    size="standard",
    timing=True,
    debug=True )

This call invokes $\texttt{system_tests()}$ for all the references in the trace library belonging to benchmark $\texttt{cholesky}$. Reconstruction times are reported per reference.

  • To run the full system tests, reconstructing all references of all PolyBench benchmarks, run:
In [ ]:
trace_ae.all_tests(
    size="standard",
    timing=True,
    debug=True )

Note that this is a very lengthy process (it may take 24-48 hours, depending on the hardware used for the tests). Alternatively, the reviewer may use $\texttt{size="small"}$ to generate PolyBench traces equivalent to those obtained when compiling the PolyBench codes with $\texttt{-DSMALL_DATASET}$. This process takes only a few minutes and suffices for a qualitative assessment of the reconstruction, as shown below.
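
For instance, the quick qualitative run can be launched as:

In [ ]:
trace_ae.all_tests(
    size="small",
    timing=True,
    debug=True )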

Reconstruction of downloaded traces

The reconstruction process for downloaded traces is straightforward:

  1. Download the chosen trace from the artifact home page.
  2. Uncompress it using $\texttt{bunzip2}$.
  3. Read the trace into memory using the following command:
In [ ]:
(a,c,d)=trace_ae.read_trace( "cholesky.4.trace" )

Here, $\texttt{read_trace()}$ expects an absolute or relative path to the trace file. The function returns $\texttt{a}$, $\texttt{c}$ and $\texttt{d}$ with the same meanings as when generating the trace synthetically using $\texttt{generate_trace()}$.

  4. Start the reconstruction process as before:

In [ ]:
trace_ae.extract( d, c, cut=3 )

The solution should be structurally identical to the one obtained from synthetic traces. The only differences should reside in the base address and in $\overrightarrow{c}$, which in this case is scaled by the data size of the base array.
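
For reference, a reader equivalent in spirit to $\texttt{read_trace()}$ could look like the following sketch. This is hypothetical code: it assumes one reference tag plus one hexadecimal address per line, as produced by the $\texttt{printf()}$ instrumentation described in the next subsection; the actual $\texttt{read_trace()}$ implementation may differ:

In [ ]:
import numpy as np

def read_trace_sketch(path):
    # Hypothetical reader: take the last whitespace-separated token
    # of each line as a hexadecimal address.
    with open(path) as f:
        a = np.array([int(line.split()[-1], 16) for line in f])
    c = np.array([a[1] - a[0]])   # first observed stride
    d = a[2:] - a[1:-1]           # remaining first-order differences
    return (a, c, d)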

Local generation of custom traces by the reviewer

The reviewer may easily generate a single reference trace from the PolyBench sources packed with the artifact:

  1. Choose a benchmark and reference, e.g., $\texttt{cholesky.4}$.
  2. Edit the source code of the $\texttt{cholesky}$ benchmark, found in $\texttt{polybench-c-3.2/linear-algebra/kernels/cholesky/cholesky.c}$.
  3. In the $\texttt{scop}$ section of the code, look for the access annotated with the number $\texttt{4}$. In particular, $\texttt{cholesky.4}$ appears on line 82:

    x = x - A[j][k] * A[i][k]; // 4 -> A[j][k] (read)

  4. Insert a $\texttt{printf()}$ immediately after the reference in the code:

    printf( "4 \%lx\n", &A[j][k] );

  5. Run the code and store the output of the inserted $\texttt{printf()}$ into a file $\texttt{cholesky.4.trace}$.
  6. Use the reconstruction engine to process the generated trace, as shown below.
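
The generated file can then be read and reconstructed exactly as a downloaded trace:

In [ ]:
(a,c,d) = trace_ae.read_trace( "cholesky.4.trace" )
trace_ae.extract( d, c, cut=3 )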

Evaluation and expected result

There are four evaluation dimensions that the reviewer should take into account:

  1. Qualitative evaluation: the reconstruction process should return $\mathbf{U}$, $\overrightarrow{w}$ and $\overrightarrow{c}$ such that they reconstruct the exact affine input trace.

  2. Performance evaluation: the reconstruction process should finish in a time that conforms to the findings in the paper.

  3. Evaluation as a predictor: the reconstruction process should predict the number of accesses shown in Table 2.

  4. Memory consumption: Section 4 gives the lower and upper bounds on memory consumption. The code in $\texttt{trace_ae.py}$ automatically measures this consumption for each invocation of $\texttt{extract()}$.