Trace-Based Affine Reconstruction of Codes

This webpage covers the trace reconstruction tool presented in the paper Trace-based Affine Reconstruction of Codes, by Gabriel Rodríguez, José M. Andión, Mahmut T. Kandemir, and Juan Touriño, presented in the 14th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2016). The software has been licensed under an Apache 2.0 License and publish and we are considering its publication in github.

The provided Python file is a proof-of-concept implementation of the trace reconstruction tool. Although it includes many performance optimizations it is by no means optimal. It includes a synthetic library of traces extracted from the PolyBench C/3.2 suite, which can be used to evaluate and test the capabilities of the tool.

Experiment workflow

This example describes how to use the tool in two different use cases: manual reconstruction of synthetically generated traces, and generation of custom traces by the user.

Reconstruction of synthetically generated traces

Besides the reconstruction engine, the tool includes a trace library capable of generatic traces equivalent to all the PolyBench/C 3.2 memory references. The $\texttt{generate_trace()}$ function can be used to synthetically generate a trace in the following way:

In [ ]:
import numpy as np
import trace_1_0 as trace
(a,c,d) = trace.generate_trace(
                size="standard" )

The previous command will generate a trace equivalent to $\texttt{cholesky.4}$, with only two differences:

  • The base address of the access (first referenced address) is 0.
  • The base stride (data size) is 1, instead of the size in bytes of the base array data type.

As mentioned in the last paragraph of Section 3.4 of the paper, these are not relevant for the reconstruction process. Note that $\texttt{cholesky.4}$ is a notation used in the synthetic trace library to reference the different accesses in each application. The $\texttt{generate_trace()}$ function returns three values:

  • $\texttt{a}$: NumPy array containing the synthetic trace.
  • $\texttt{c}$: NumPy array containing the first observed stride, used to start the reconstruction process, and calculated as:
In [ ]:
c = np.array([a[1]-a[0]])
  • $\texttt{d}$: NumPy array containing the first order differences of trace $\texttt{a}$, calculated as:
In [ ]:
d = a[2:] - a[1:-1]

Once the synthetic trace is generated, the reconstruction process is started by calling:

In [ ]:
trace_ae.extract( d, c, cut=3 )

In the previous call, the parameter $\texttt{cut}$ indicates the depth threshold for the maximum solution size to try. Since the simplest representation of the $\texttt{cholesky.4}$ reference is a 3-level loop, using $\texttt{cut}$ below 3 will not return a valid reconstruction. This is a very fast process. Using $\texttt{cut}$ above 3 will return a valid reconstruction, but it may not be a minimal one from the structural point of view (i.e., it may use loops of depth greater than 3). Besides, the number of explored branches grows exponentially with loop depth, and therefore it is not desirable to try a reconstruction with a depth greater than strictly necessary. For this reason, in order to guarantee a breadth-first traversal of the solution space, in our tests $\texttt{extract()}$ is called with incremental values of $\texttt{cut}$ until a solution is found (see Section A.4.2 in the paper).

Upon completion, $\texttt{extract()}$ returns the components of the solution, if it has been found:

  • Coefficients vector $\overrightarrow{c}$.
  • Iteration matrix $\mathbf{I}$, containing a subset of the iteration indices in the reconstruction.
  • Bounds matrix $\mathbf{U}$.
  • Bounds vector $\overrightarrow{w}$.

Local generation of custom traces by the user

The user may easily generate a single reference trace from the PolyBench sources:

  1. Choose a benchmark and reference, e.g., $\texttt{cholesky.4}$.
  2. Edit the source code of the $\texttt{cholesky}$ benchmark, found in $\texttt{polybench-c-3.2/linear-algebra/kernels/cholesky/cholesky.c}$.
  3. In the $\texttt{scop}$ section of the code, look for the following access:

    x = x - A[j][k] * A[i][k]; // 4 -> A[j][k] (read)

  4. Insert a $\texttt{printf()}$ immediately after the reference in the code:

    printf( "4 \%lx\n", &A[j][k] );

  5. Run the code and store the output of the inserted $\texttt{printf()}$ into a file $\texttt{cholesky.4.trace}$.
  6. Use the reconstruction engine to process the generated trace.

After being loaded into memory, the processed trace can be reconstructed using the $\texttt{reconstruct()}$ function, which automatically creates the coefficients vector and the first-order differences:

In [ ]:
trace.reconstruct( a, max_dimension=3 )

Exhaustive reconstructions

In order to improve performance, the standard version of the reconstruction algorithm performs limited backtracking. This drastically reduces the number of potential solutions explored, and is enough to reconstruct most traces (e.g., the entire PolyBench suite). However, sometimes it is not possible to find a minimal solution without an exhaustive exploration. It can be turned on using the "exhaustive" boolean variable. The synthetic trace library includes a test to exemplify this:

In [ ]:
In [ ]:
trace.reconstruct( a, max_dimension=3, exhaustive=False )
In [ ]:
trace.reconstruct( a, max_dimension=3, exhaustive=True )

Note how using the exhaustive exploration the process is much slower, but it manages to find the minimal solution. Also note that the minimal solution is not the only feasible one:

In [ ]:
trace.reconstruct( a, max_dimension=5, exhaustive=False )