Parallel and distributed computing / Computer Architecture Group.

+ Parallel and distributed computing:

- Compilers for parallel architectures:

Automatic generation of parallel code: This research line addresses the automatic parallelization of sequential programs. The focus is on compiler techniques that convert a sequential program into a concurrent one to be executed on modern multi-core and many-core architectures. We are developing advanced program analyses to discover the parallelism implicit in sequential programs, together with code transformation techniques to build the most efficient parallel version of the program.
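As a minimal illustration of the kind of transformation involved (a hand-written sketch, not the actual output of our compiler), the following C fragment shows a sequential loop whose only cross-iteration dependence is a reduction, together with the OpenMP version a parallelizing compiler could generate for it (built with -fopenmp):

    /* Sequential kernel: a dot product. Dependence analysis shows that the
       only cross-iteration dependence is the accumulation into s, i.e. a
       reduction, so the loop can be parallelized. */
    double dot_seq(const double *x, const double *y, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += x[i] * y[i];
        return s;
    }

    /* Concurrent version: iterations are distributed over the available
       cores and the reduction is handled by the OpenMP runtime. */
    double dot_par(const double *x, const double *y, int n) {
        double s = 0.0;
        #pragma omp parallel for reduction(+:s)
        for (int i = 0; i < n; i++)
            s += x[i] * y[i];
        return s;
    }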

Iterative Optimization: We are working on the iterative optimization of codes on heterogeneous architectures including GPUs and/or CPUs. Iterative optimization makes it possible to generate a large number of optimized versions of the same application and to select the fastest one for a given architecture by means of analytical models, heuristics, or the execution of the real code.
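The following C program is a minimal sketch of the empirical side of this process, assuming a hypothetical blocked transpose kernel whose tile size is the tuning parameter: every candidate version is executed on the target machine and the fastest one is selected.

    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 2048

    /* One "version" of the code: a blocked matrix transpose parameterized
       by its tile size (a hypothetical kernel used only for illustration). */
    static void transpose_tiled(double *dst, const double *src, int tile) {
        for (int ii = 0; ii < N; ii += tile)
            for (int jj = 0; jj < N; jj += tile)
                for (int i = ii; i < ii + tile && i < N; i++)
                    for (int j = jj; j < jj + tile && j < N; j++)
                        dst[(size_t)j * N + i] = src[(size_t)i * N + j];
    }

    static double seconds(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        double *src = malloc((size_t)N * N * sizeof *src);
        double *dst = malloc((size_t)N * N * sizeof *dst);
        for (size_t i = 0; i < (size_t)N * N; i++) src[i] = (double)i;

        int tiles[] = {8, 16, 32, 64, 128};
        int best_tile = tiles[0];
        double best_time = 1e30;

        /* Empirical search: run every candidate version and keep the
           fastest one for this particular machine. */
        for (size_t k = 0; k < sizeof tiles / sizeof tiles[0]; k++) {
            double t0 = seconds();
            transpose_tiled(dst, src, tiles[k]);
            double t = seconds() - t0;
            printf("tile = %4d: %.4f s\n", tiles[k], t);
            if (t < best_time) { best_time = t; best_tile = tiles[k]; }
        }
        printf("selected tile size: %d\n", best_tile);

        free(src); free(dst);
        return 0;
    }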

 

- Languages and tools for parallel programming:

Our group designs and builds tools (Servet) and libraries (HTA, UPCBLAS) to improve the productivity of programmers, particularly in the development of parallel applications. Servet is a portable suite of benchmarks that obtains the most relevant hardware parameters to support the automatic optimization of applications on multicore clusters. The Hierarchically Tiled Array (HTA) data type is a class designed to facilitate the writing of tile-based programs in object-oriented languages; HTAs allow exploiting locality as well as expressing parallelism with much less effort than other approaches. UPCBLAS is a parallel numerical library for dense matrix computations written in UPC (Unified Parallel C), a PGAS (Partitioned Global Address Space) language. The popularity of PGAS languages has grown in recent years thanks to their high programmability and good performance, especially on hierarchical architectures such as multicore clusters.
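The fragment below is not the HTA or UPCBLAS interface; it is a plain-C sketch of the tiling idea these libraries build on: the computation is expressed tile by tile so that each tile fits in cache, and the same tiles are the natural unit for distributing work among threads or processes.

    #define N 1024
    #define T 64   /* tile size, an illustrative choice */

    /* Tiled matrix multiplication: C += A * B, processed in T x T blocks.
       An HTA expresses the same computation directly in terms of tile
       objects instead of explicit blocked loops. */
    void matmul_tiled(const double A[N][N], const double B[N][N],
                      double C[N][N]) {
        for (int ii = 0; ii < N; ii += T)
            for (int jj = 0; jj < N; jj += T)
                for (int kk = 0; kk < N; kk += T)
                    /* update one T x T tile of C */
                    for (int i = ii; i < ii + T; i++)
                        for (int k = kk; k < kk + T; k++)
                            for (int j = jj; j < jj + T; j++)
                                C[i][j] += A[i][k] * B[k][j];
    }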

Our proposals, which cover distributed, shared and hybrid memory systems, lead to codes that are better structured, more readable and easier to maintain than those built using standard tools, while achieving very similar performance. Much of this research has been performed in close collaboration with leading universities such as the University of Illinois at Urbana-Champaign and with top-tier companies such as HP and IBM.

 

- Fault tolerance and malleability of parallel applications:

Systems intended for the execution of long-running parallel applications should provide fault-tolerance capabilities, since the probability of failure increases with the execution time and the number of nodes. Checkpointing and rollback recovery is one of the most popular techniques to provide fault-tolerance support. We have developed CPPC (ComPiler for Portable Checkpointing), an application-level checkpointing tool for message-passing applications designed with a special focus on portability.
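The fragment below is not the CPPC interface; it is a hand-written sketch of the application-level checkpointing idea for an iterative MPI code: at safe points each process dumps the variables needed to restart (the iteration counter and its local data), and after a failure the relaunched application rolls back to the last checkpoint instead of starting from scratch.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define LOCAL_N    1000   /* elements owned by each process */
    #define CKPT_EVERY 100    /* checkpointing frequency, in iterations */

    static void save_checkpoint(int rank, int iter, const double *x) {
        char name[64];
        snprintf(name, sizeof name, "ckpt_rank%d.bin", rank);
        FILE *f = fopen(name, "wb");
        if (!f) return;
        fwrite(&iter, sizeof iter, 1, f);
        fwrite(x, sizeof *x, LOCAL_N, f);
        fclose(f);
    }

    /* Returns the iteration to resume from, or 0 if there is no checkpoint. */
    static int load_checkpoint(int rank, double *x) {
        char name[64];
        snprintf(name, sizeof name, "ckpt_rank%d.bin", rank);
        FILE *f = fopen(name, "rb");
        if (!f) return 0;
        int iter = 0;
        if (fread(&iter, sizeof iter, 1, f) != 1 ||
            fread(x, sizeof *x, LOCAL_N, f) != LOCAL_N)
            iter = 0;
        fclose(f);
        return iter;
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double *x = calloc(LOCAL_N, sizeof *x);
        int start = load_checkpoint(rank, x);    /* rollback point on restart */

        for (int iter = start; iter < 10000; iter++) {
            /* ... one iteration of the computation and its MPI exchanges ... */
            if (iter > 0 && iter % CKPT_EVERY == 0)
                save_checkpoint(rank, iter, x);  /* taken at a safe point */
        }

        free(x);
        MPI_Finalize();
        return 0;
    }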

Currently we are exploring the implementation of malleability for MPI applications as an extension to the CPPC tool, so that applications can be transparently reconfigured during their execution.
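As an illustration of the basic MPI mechanism underneath (generic MPI code, not the CPPC extension itself), the sketch below grows a running job with MPI_Comm_spawn and merges the old and new processes into a single communicator, over which the application data would then be redistributed.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        MPI_Comm parent;
        MPI_Comm_get_parent(&parent);

        MPI_Comm expanded;   /* intracommunicator with old and new processes */
        if (parent == MPI_COMM_NULL) {
            /* Original processes: spawn 2 extra copies of this binary
               (argv[0] is assumed to be a valid path) and merge them in. */
            MPI_Comm intercomm;
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                           MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
            MPI_Intercomm_merge(intercomm, 0, &expanded);
            MPI_Comm_disconnect(&intercomm);
        } else {
            /* Newly spawned processes: join the merged communicator. */
            MPI_Intercomm_merge(parent, 1, &expanded);
            MPI_Comm_disconnect(&parent);
        }

        int size;
        MPI_Comm_size(expanded, &size);
        printf("running with %d processes after reconfiguration\n", size);
        /* ... redistribute data and continue the computation on 'expanded' ... */

        MPI_Comm_free(&expanded);
        MPI_Finalize();
        return 0;
    }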

 

- General-purpose computation on GPUs:

In this field we are working on the development of tools that allow the automatic or semi-automatic implementation of algorithms on a GPU. Another objective is to develop high-level libraries for multi-GPU systems focused on providing simple yet efficient communication among the different GPUs. All our research is centred on providing solutions based on the main programming models for these platforms, such as OpenCL and CUDA.
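As a small example of the kind of operation such libraries must make simple (generic CUDA runtime calls from C, not the API of our own libraries), the program below copies a buffer directly between two GPUs, using a peer-to-peer transfer when the hardware allows it.

    #include <stdio.h>
    #include <cuda_runtime.h>

    #define N (1 << 20)

    int main(void) {
        int ngpus = 0;
        cudaGetDeviceCount(&ngpus);
        if (ngpus < 2) { fprintf(stderr, "need at least 2 GPUs\n"); return 1; }

        /* Allocate one buffer on each of the first two devices. */
        float *buf0, *buf1;
        cudaSetDevice(0);
        cudaMalloc((void **)&buf0, N * sizeof(float));
        cudaSetDevice(1);
        cudaMalloc((void **)&buf1, N * sizeof(float));

        /* Enable direct (peer-to-peer) access between the two GPUs when
           supported, so the copy is not staged through host memory. */
        int can_access = 0;
        cudaDeviceCanAccessPeer(&can_access, 1, 0);
        if (can_access) {
            cudaSetDevice(1);
            cudaDeviceEnablePeerAccess(0, 0);
        }

        /* Copy the buffer that lives on GPU 0 to GPU 1; the runtime uses a
           direct transfer if peer access is enabled, and staging otherwise. */
        cudaMemcpyPeer(buf1, 1, buf0, 0, N * sizeof(float));
        cudaDeviceSynchronize();

        cudaFree(buf1);
        cudaSetDevice(0);
        cudaFree(buf0);
        return 0;
    }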
