About Me
I am a PhD candidate working on distributed computation of Big Data workloads. My main research interests are the technologies behind these workloads, along with related topics such as Cloud Computing and High Performance Computing.
Research Interests
Big Data
MapReduce
Hadoop
Distributed Storage
High Performance Computing
Cloud Computing
Journal Publications
- Roberto R. Expósito, Jorge Veiga, Jorge González-Domínguez and Juan Touriño
"MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud"
Bioinformatics
Vol. 33, Issue 17, pp. 2762-2764, May 2017.
Book chapters
Conference Papers
- Jorge Veiga, Guillermo L. Taboada, Xoán C. Pardo and Juan Touriño
"Optimización del tiempo y coste de almacenamiento en la nube con el servicio HPS3"
XXV Jornadas de Paralelismo
Valladolid, Spain, September 2014.
Projects
- BDEv
BDEv is a tool for evaluating Big Data processing solutions in terms of performance and resource efficiency. It includes several ready-to-use frameworks (e.g. Hadoop, Spark, Flink) and manages the configuration needed to leverage the available computational resources, such as CPU, memory and network interfaces. These frameworks can be evaluated using the benchmarks included in the BDEv distribution (e.g. TeraSort, WordCount) or by running custom commands. BDEv also simplifies running experiments and collecting results by automatically generating graphs.
- Flame-MR
Flame-MR is a MapReduce framework that improves the performance of Hadoop applications. It employs several optimizations, such as avoiding memory copies, efficient sort and merge algorithms, and flexible use of resources. Its event-driven architecture also overlaps data transfer with processing. Flame-MR keeps binary compatibility with Hadoop, so applications do not have to be modified or recompiled to run on it. Experimental results show that Flame-MR can reduce the execution time of iterative workloads by half.
- MarDRe
MarDRe is a de novo MapReduce-based parallel tool that removes duplicate and near-duplicate DNA reads from large-scale FASTQ/FASTA datasets. Duplicate reads are identical or nearly identical sequences that differ in a few mismatches, so removing them reduces the memory requirements and computational time of downstream analysis without losing biological information. MarDRe is written in Java and built upon Apache Hadoop.
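The general idea of MapReduce-style duplicate removal can be illustrated with a small single-process Python sketch. This is not MarDRe's actual implementation (which is written in Java on Hadoop); it only assumes that reads sharing a prefix are grouped together and that two equal-length reads count as duplicates when they differ in at most a given number of positions. The function names and the `prefix_len` and `max_mismatches` parameters are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def map_phase(reads, prefix_len):
    """Emit (prefix, read) pairs so that similar reads share a key."""
    for read in reads:
        yield read[:prefix_len], read


def shuffle(pairs):
    """Group values by key, mimicking the MapReduce shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def mismatches(a, b):
    """Count the positions at which two equal-length reads differ."""
    return sum(x != y for x, y in zip(a, b))


def reduce_phase(group, max_mismatches):
    """Keep a read only if it is not a near-duplicate of a kept read."""
    kept = []
    for read in group:
        is_duplicate = any(
            len(read) == len(other)
            and mismatches(read, other) <= max_mismatches
            for other in kept
        )
        if not is_duplicate:
            kept.append(read)
    return kept


def remove_duplicates(reads, prefix_len=4, max_mismatches=1):
    """Run the map, shuffle and reduce steps in a single process."""
    groups = shuffle(map_phase(reads, prefix_len))
    result = []
    for key in sorted(groups):  # sorted keys give a deterministic order
        result.extend(reduce_phase(groups[key], max_mismatches))
    return result


if __name__ == "__main__":
    sample = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "TTTTCCCC", "ACGTTTTT"]
    print(remove_duplicates(sample))
```

In this sketch, grouping by prefix keeps the pairwise comparisons local to each group, which is what would let the reduce step run in parallel across groups on a real cluster. The trade-off is that reads whose mismatches fall inside the prefix land in different groups and are never compared.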
Contact information
- Faculty of Informatics
Campus de Elviña s/n
A Coruña, Spain
- +34 881 011 212
- jorge.veiga@udc.es