Work in Progress: 20200317

Deep Learning chroma keyer implementation

(gl) Implementación dun chroma keyer usando Deep Learning
(es) Implementación de un chroma keyer usando Deep Learning

Student

Daniel Castro Veiga

Supervision

Luis Omar Álvarez Mures (Cinfo, UDC)
Francisco Javier Taibo Pena (UDC)
Emilio José Padrón González (UDC)

Brief description

Chroma key compositing, or chroma keying, is a visual effects/post-production technique for compositing (layering) two images or video streams together based on color hues (chroma range). The technique has been used heavily in many fields to remove the background behind the subject of a photo or video, particularly in the newscasting, motion picture, and video game industries. A color range in the foreground footage is made transparent, allowing separately filmed background footage or a static image to be inserted into the scene.

The technique is also referred to as color keying, colour-separation overlay (CSO; primarily by the BBC), or by various terms for specific color-related variants such as green screen and blue screen. Chroma keying can be done with a background of any color that is uniform and distinct, but green and blue backgrounds are the most common because they differ most in hue from human skin tones. No part of the subject being filmed or photographed may duplicate the color used as the backing [1].
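
As an illustration of the classical approach described above, the following is a minimal sketch of a hue-based keyer in plain Python. The function names, the tolerance values, and the tiny 2x2 test images are assumptions made for this example; they are not part of the project or of any particular tool.

```python
import colorsys

def is_key_color(pixel, key_hue=1 / 3, hue_tol=0.08, min_sat=0.4, min_val=0.2):
    """Return True when an (r, g, b) pixel in 0..255 falls inside the key color range.

    key_hue=1/3 is pure green on the HSV wheel; all thresholds are illustrative.
    """
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Hue is circular, so take the shortest distance around the wheel.
    diff = abs(h - key_hue)
    hue_dist = min(diff, 1.0 - diff)
    # Require some saturation and brightness so greys and shadows are not keyed.
    return hue_dist <= hue_tol and s >= min_sat and v >= min_val

def chroma_key(foreground, background, **thresholds):
    """Replace key-colored foreground pixels with the matching background pixels."""
    return [
        [bg if is_key_color(fg, **thresholds) else fg
         for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]

GREEN = (0, 255, 0)       # backing color
SKIN = (224, 172, 105)    # stands in for the subject
SKY = (30, 60, 200)       # replacement background

foreground = [[GREEN, SKIN], [SKIN, GREEN]]
background = [[SKY, SKY], [SKY, SKY]]
composite = chroma_key(foreground, background)
```

The sketch also shows why the subject must not contain the backing color: any pixel whose hue falls inside the tolerance is replaced, whether or not it belongs to the background.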

Deep learning is a branch of artificial intelligence (AI) concerned with emulating the learning approach that human beings use to gain certain types of knowledge. At its simplest, deep learning can be thought of as a way to automate predictive analytics. Whereas many traditional machine learning algorithms learn linear or shallow models, deep learning models are built from layers stacked in a hierarchy of increasing complexity and abstraction. Each layer in the hierarchy applies a nonlinear transformation to its input and passes what it has learned to the next layer, so that the whole stack produces a statistical model as output. Training iterations continue until the output has reached an acceptable level of accuracy. The number of processing layers through which data must pass is what inspired the label "deep" [2].
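
To make the idea of stacked nonlinear layers concrete, here is a minimal sketch of a forward pass through a two-layer network in plain Python. The weights are hand-picked for illustration rather than trained, and the interpretation of the output as a "foreground score" for an RGB pixel is an assumption of this example, not the project's model.

```python
import math

def dense(inputs, weights, biases):
    """Affine map: one output per weight row, computed as w . x + b."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Elementwise nonlinearity used by the hidden layers."""
    return [max(0.0, v) for v in values]

def sigmoid(value):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))

def forward(features, layers):
    """Pass the input through every layer: ReLU on hidden layers, sigmoid on the last."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(dense(activations, weights, biases)[0])

# Hand-picked (not trained) weights, for illustration only.
layers = [
    ([[1.0, -1.0, 0.0], [0.0, -1.0, 1.0]], [0.0, 0.0]),  # hidden layer: 3 -> 2
    ([[4.0, 4.0]], [-1.0]),                               # output layer: 2 -> 1
]

foreground_score = forward([0.9, 0.2, 0.1], layers)  # reddish pixel, RGB in 0..1
background_score = forward([0.0, 1.0, 0.0], layers)  # pure green pixel
```

In a real system the weights would be adjusted iteratively by a training procedure until the scores are accurate enough; only the forward computation is sketched here.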

Because deep learning models are loosely inspired by the way the human brain processes information, they can be applied to many tasks that people perform. Deep learning is currently used in most common image recognition tools, natural language processing (NLP), and speech recognition software. These tools are starting to appear in applications as diverse as self-driving cars and language translation services.

The goal of this project is to segment objects of interest from the background in real time using SegNet. SegNet is a deep encoder-decoder architecture for multi-class pixelwise segmentation researched and developed by members of the Computer Vision and Robotics Group at the University of Cambridge, UK [3]. This segmentation will be used to provide a 3D chroma key in real time. Specific application domains include games and broadcasting.
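
The following minimal sketch shows how a per-pixel segmentation could drive the compositing step in place of a color range. The label map is hard-coded here as a stand-in for the output of a segmentation network; the class id, function names, and sample frames are all hypothetical and chosen only for this example.

```python
PERSON_CLASS = 1  # hypothetical label id for the object of interest

def labels_to_matte(label_map, keep_classes):
    """Turn a per-pixel class map into a binary alpha matte (1.0 = keep the pixel)."""
    return [[1.0 if label in keep_classes else 0.0 for label in row]
            for row in label_map]

def composite_with_matte(foreground, background, matte):
    """Alpha-blend two RGB frames per channel: out = a * fg + (1 - a) * bg."""
    return [
        [tuple(round(a * f + (1.0 - a) * b) for f, b in zip(fg, bg))
         for fg, bg, a in zip(fg_row, bg_row, a_row)]
        for fg_row, bg_row, a_row in zip(foreground, background, matte)
    ]

# Stand-in for a network prediction: 0 = background, 1 = person.
label_map = [[0, 1],
             [1, 0]]
frame = [[(90, 90, 90), (200, 150, 120)],
         [(210, 160, 130), (80, 80, 80)]]
backdrop = [[(0, 0, 255), (0, 0, 255)],
            [(0, 0, 255), (0, 0, 255)]]

matte = labels_to_matte(label_map, keep_classes={PERSON_CLASS})
result = composite_with_matte(frame, backdrop, matte)
```

Because the matte comes from class labels rather than from hue, the original background does not need to be a uniform colored backing in this formulation.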

[1] The Green Screen Handbook: Real-World Production Techniques
Jeff Foster.
Sybex, 1st edition (March 15, 2010)
ISBN-10: 0470521074

[2] Deep Learning in Neural Networks: An Overview
Juergen Schmidhuber.
Neural Networks, Vol 61, pp 85-117, Jan 2015
DOI: 10.1016/j.neunet.2014.09.003

[3] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling
Vijay Badrinarayanan, Ankur Handa, Roberto Cipolla.
arXiv preprint, 2015
https://arxiv.org/abs/1505.07293

Specific objectives

  • The main objective of this project is to develop a 3D chroma keyer using a SegNet-based architecture.

  • The student will explore innovative methods to perform chroma keying in real time.

  • A standalone proof-of-concept application will be developed.

Methodology

An Agile development method will guide the project: after a preliminary phase of study and documentation, the different tasks will be built in relatively short sprints.

Development steps

  • Analysis of requirements and project scheduling, according to the student's availability.

  • Study and documentation.

    • Chroma keying.
    • TensorFlow, C++, libav.
    • SegNet.

  • Incremental, iterative work sequences (sprints) to develop a real-time chroma keyer.

Material

  • Personal computer with GPU and internet access.