Jorge Veiga Fachal

About Me

I am a PhD candidate working on the distributed processing of Big Data workloads. My main research interests cover the technologies involved in this area, as well as topics such as Cloud Computing and High Performance Computing.

Research Interests

Big Data
MapReduce
Hadoop
Distributed Storage
High Performance Computing
Cloud Computing

Journal Publications

- Jorge Veiga, Roberto R. Expósito, Bruno Raffin and Juan Touriño
  "Optimization of real-world MapReduce applications with Flame-MR: practical use cases"
  IEEE Access
  Vol. 6, pp. 69750-69762, November 2018.
  Preprint Online
- Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño
  "Enhancing in-memory efficiency for MapReduce-based data processing"
  Journal of Parallel and Distributed Computing
  Vol. 120, pp. 323-338, October 2018.
  Preprint Online
- Jorge Veiga, Jonatan Enes, Roberto R. Expósito and Juan Touriño
  "BDEv 3.0: Energy efficiency and microarchitectural characterization of Big Data processing frameworks"
  Future Generation Computer Systems
  Vol. 86, pp. 565-581, September 2018.
  Preprint Online
- Roberto R. Expósito, Jorge Veiga, Jorge González-Domínguez and Juan Touriño
  "MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud"
  Bioinformatics
  Vol. 33, Issue 17, pp. 2762-2764, May 2017.
  Online
- Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño
  "Flame-MR: An event-driven architecture for MapReduce applications"
  Future Generation Computer Systems
  Vol. 65, pp. 46-56, December 2016.
  Preprint Online
- Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño
  "Analysis and evaluation of MapReduce solutions on an HPC cluster"
  Computers & Electrical Engineering
  Vol. 50, pp. 200-216, February 2016.
  Preprint Online

Book Chapters

- Jorge Veiga, Roberto R. Expósito and Juan Touriño
  "Performance Evaluation of Big Data Analysis"
  Encyclopedia of Big Data Technologies
  pp. 1-6, January 2018.
  Preprint Online

Conference Papers

- Jorge Veiga, Roberto R. Expósito, Xoán C. Pardo, Guillermo L. Taboada and Juan Touriño
  "Performance Evaluation of Big Data Frameworks for Large-Scale Data Analytics"
  2016 IEEE International Conference on Big Data (IEEE BigData 2016)
  Washington, DC, USA, December 2016.
  Preprint Online
- Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño
  "MREv: an Automatic MapReduce Evaluation Tool for Big Data Workloads"
  International Conference on Computational Science (ICCS'15)
  Reykjavík, Iceland, June 2015.
  Preprint Online
- Jorge Veiga, Guillermo L. Taboada, Xoán C. Pardo and Juan Touriño
  "Optimización del tiempo y coste de almacenamiento en la nube con el servicio HPS3"
  XXV Jornadas de Paralelismo
  Valladolid, Spain, September 2014.
- Jorge Veiga, Guillermo L. Taboada, Xoán C. Pardo and Juan Touriño
  "The HPS3 service: reduction of cost and transfer time for storing data on clouds"
  16th IEEE International Conference on High Performance Computing and Communications (HPCC'14)
  Paris, France, August 2014.
  Preprint Online

Projects

BDEv

BDEv is a tool for evaluating Big Data processing solutions in terms of performance and resource efficiency. It includes several ready-to-use frameworks (e.g. Hadoop, Spark, Flink) and manages the configuration needed to leverage the available computational resources, such as CPU, memory and network interfaces. These frameworks can be evaluated with the benchmarks included in the BDEv distribution (e.g. TeraSort, WordCount), and custom commands can also be executed. Moreover, BDEv simplifies running experiments and collecting their results by automatically generating graphs.

BDEv Homepage

Flame-MR

Flame-MR is a MapReduce framework that improves the performance of Hadoop applications. It employs several kinds of optimizations, such as avoiding memory copies, efficient sort and merge algorithms, and flexible use of resources. Moreover, its event-driven architecture overlaps data transfer with data processing. Flame-MR also keeps binary compatibility with Hadoop, so applications do not have to be modified or recompiled to run on it. Experimental results show that Flame-MR can reduce the execution time of iterative workloads by half.

Flame-MR Homepage

MarDRe

MarDRe is a de novo MapReduce-based parallel tool for removing duplicate and near-duplicate DNA reads from large-scale FASTQ/FASTA datasets. Duplicate reads are identical or nearly identical sequences that differ in only a few mismatches, so removing them decreases the memory requirements and computational time of downstream analysis without losing biological information. MarDRe is written in Java and built upon Apache Hadoop.

MarDRe Homepage
