This webpage contains a software implementation of the ideas presented in the paper Trace-Based Affine Reconstruction of Codes, accepted for presentation in the 14th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2016).
The present artifact consists of three parts:
This example describes how to use the tool in four different use cases: manual reconstruction of synthetically generated traces, usage of automatic test features included in the tool to reconstruct synthetically generated traces, reconstruction of pre-processed traces downloaded from the authors personal page, and local generation of custom traces by the reviewer. Note that this section is also available as an iPython Notebook in files $\texttt{workflow.ipynb}$ and $\texttt{workflow.html}$, packed in the artifact.
Besides the reconstruction engine, $\texttt{trace_ae.py}$ includes a trace library including all the PolyBench/C 3.2 memory references. The $\texttt{generate_trace()}$ function can be used to synthetically generate a trace equivalent to any PolyBench one in the following way:
import numpy as np
import trace_ae
(a,c,d) = trace_ae.generate_trace(
test="cholesky",
access=4,
size="standard" )
The previous command will generate a trace equivalent to $\texttt{cholesky.4}$, with only two differences:
As mentioned in the last paragraph of Section 3.4, these are not relevant for the reconstruction process. The $\texttt{generate_trace()}$ function returns three values:
c = np.array([a[1]-a[0]])
d = a[2:] - a[1:-1]
Once the synthetic trace is generated, the reconstruction process is started by calling:
trace_ae.extract( d, c, cut=3 )
In the previous call, the parameter $\texttt{cut}$ indicates the depth threshold for the maximum solution size to try. Since the simplest representation of the $\texttt{cholesky.4}$ reference is a 3-level loop, using $\texttt{cut}$ below 3 will not return a valid reconstruction. This is a very fast process. Using $\texttt{cut}$ above 3 will return a valid reconstruction, but it may not be a minimal one from the structural point of view (i.e., it may use loops of depth greater than 3). Besides, the number of explored branches grows exponentially with loop depth, and therefore it is not desirable to try a reconstruction with a depth greater than strictly necessary. For this reason, in order to guarantee a breadth-first traversal of the solution space, in our tests $\texttt{extract()}$ is called with incremental values of $\texttt{cut}$ until a solution is found (see Section A.4.2 in the paper).
Upon completion, $\texttt{extract()}$ returns the components of the solution, if it has been found:
The Artifact Evaluation version also prints two values necessary to assess the experiments carried out in the paper:
Note that this manual way of reconstructing a synthetic trace is not convenient for performance (runtime) measurements, which are covered in the following section.
In order to avoid the manual process of constructing a synthetic trace through $\texttt{generate_trace()}$, the Python code includes three convenience functions:
trace_ae.system_tests(
test="cholesky",
access=4,
size="standard",
timing=True,
debug=True )
This call will generate and reconstruct reference $\texttt{cholesky.4}$. It will show the time for the reconstruction process and, after it ends, will check that running the generated loop rebuilds the exact memory trace used as input. Note that the reconstruction process will first try to reconstruct the trace using $\texttt{cut=1}$, then $\texttt{cut=2}$, and finally $\texttt{cut=3}$ before it suceeds. Reported time includes all three calls to $\texttt{extract()}$. Also note that the debugging process is very costly: reconstructing the original trace from $\mathbf{U}$, $\overrightarrow{w}$ and $\overrightarrow{c}$ and checking that it is exactly equal to the original one takes longer than the reconstruction process itself (although, admittedly, this process is not as optimized as the reconstruction which is the subject of the paper).
trace_ae.all_accesses(
test="cholesky",
size="standard",
timing=True,
debug=True )
This call invokes $\texttt{system_tests()}$ for all the references in the trace library belonging to benchmark $\texttt{cholesky}$. Reconstruction times are reported per reference.
trace_ae.all_tests(
size="standard",
timing=True,
debug=True )
Note that this is a very lengthy process (it may take 24-48h, depending on the hardware used for the tests). Alternatively, the reviewer may use $\texttt{size="small"}$ to generate PolyBench traces equivalent to those obtained when compiling PolyBench codes using $\texttt{-DSMALL_DATASET}$. This process will only take a few minutes, and is enough for assessing the reconstruction from the qualitative point of view.
The reconstruction process for downloaded traces is straightforward:
(a,c,d)=trace_ae.read_trace( "cholesky.4.trace" )
Where $\texttt{read_trace()}$ expects a full or relative path to the trace file. The function returns $\texttt{a}$, $\texttt{c}$ and $\texttt{d}$ with the same meanings as when generating the trace synthetically using $\texttt{generate_trace()}$.
4. Start the reconstruction process as before:
trace_ae.extract( d, c, cut=3 )
The solution must be structurally equal to the one obtained using synthetic traces. The only differences should reside in the base address, and in $\overrightarrow{c}$ which will be multiplied in this case by the data size of the base array.
The reviewer may easily generate a single reference trace from the PolyBench sources packed with the artifact:
x = x - A[j][k] * A[i][k]; // 4 -> A[j][k] (read)
printf( "4 \%lx\n", &A[j][k] );
There are four evaluation dimensions that the reviewer should take into account:
Qualitative evaluation: the reconstruction process should return $\mathbf{U}$, $\overrightarrow{w}$ and $\overrightarrow{c}$ such that they reconstruct the exact affine input trace.
Performance evaluation: the reconstruction process should finish in a time that conforms to the findings in the paper.