Image processing algorithm optimization with CUDA for Pure Data

Author: Rudi Giot

Download full paper: Image Processing Algorithm Optimization with CUDA for Pd.pdf

Production lines featuring industrial vision are becoming more and more widespread. This kind of automation needs systems that can capture pictures, analyze them and learn from them in order to take appropriate action. These processes are often computationally heavy and are applied to high-definition images at high frame rates. Powerful computers are thus needed to keep up with ever-growing production rates. NVIDIA is developing CUDA (Compute Unified Device Architecture) [1], an interface for parallel data computation on graphics processing units (GPUs). This could increase the performance of any system equipped with a GPU. A CUDA program is made up of two parts: one running on the host (the CPU) and the other running on the device (the GPU). The non-parallelizable stages of the program run on the host, while the parallelizable ones run on the device. Pure Data, thanks to its graphical, modular development environment, allows fast prototype development. These factors led us to start a research program dedicated to building image processing modules for Pure Data written in CUDA. As a first step, we are adapting the most frequently used algorithms, which already exist within the GEM library. Our first results are encouraging. For instance, for the conversion of an RGB (red, green, blue) color image to a greyscale image, tests show that GPU computing achieves an average speed-up factor of 109 compared with CPU-only computing.
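The host/device split described above can be illustrated with the greyscale conversion itself. The following is a minimal sketch, not the paper's implementation: the names `rgbToGrey` and `convertToGrey`, the packed 8-bit RGB layout, and the Rec. 601 luma weights are all assumptions made for the example.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Device part: one thread converts one pixel (assumed packed 8-bit RGB input).
__global__ void rgbToGrey(const unsigned char* rgb, unsigned char* grey, int numPixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPixels) return;  // guard the last, partially filled block
    float r = rgb[3 * i], g = rgb[3 * i + 1], b = rgb[3 * i + 2];
    // Assumed Rec. 601 luma weights; adding 0.5f rounds to the nearest integer.
    grey[i] = static_cast<unsigned char>(0.299f * r + 0.587f * g + 0.114f * b + 0.5f);
}

// Host part: allocate, upload, launch, download. Returns false on any CUDA error.
bool convertToGrey(const std::vector<unsigned char>& rgb, std::vector<unsigned char>& grey)
{
    const int numPixels = static_cast<int>(rgb.size() / 3);
    grey.resize(numPixels);
    if (numPixels == 0) return true;

    unsigned char *dRgb = nullptr, *dGrey = nullptr;
    bool ok = cudaMalloc(&dRgb, rgb.size()) == cudaSuccess
           && cudaMalloc(&dGrey, grey.size()) == cudaSuccess
           && cudaMemcpy(dRgb, rgb.data(), rgb.size(), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (numPixels + threads - 1) / threads;  // round up
        rgbToGrey<<<blocks, threads>>>(dRgb, dGrey, numPixels);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(grey.data(), dGrey, grey.size(), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dRgb);   // cudaFree accepts a null pointer
    cudaFree(dGrey);
    return ok;
}
```

Every pixel is computed independently of the others, which is why this stage is a natural fit for the device. Note that the wrapper performs one upload and one download per call.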
However, a CPU + GPU architecture has a weakness: data transfers between the host's main memory and the graphics card. Most of the processing time (more than 90%) is spent on those transfers. Each CUDA function block in Pure Data incurs a double transfer, from CPU to GPU and back again, so performance is not yet optimal. Later in the project we will therefore work on minimizing those transfers. The idea is to have a single transfer from CPU to GPU at the start of the program and a single transfer back at the end, carrying the result of the whole process. In conclusion, running image processing algorithms on the graphics card is a very effective solution for complex processing. Integrating CUDA blocks into Pure Data facilitates and accelerates the prototyping of applications. This approach suits any field requiring a high frame rate, a high resolution, a large number of operations or computationally intensive processes. This includes industrial, medical and artistic applications.
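The transfer-minimizing idea can be sketched as follows: the image is uploaded once, several processing stages run back to back on the device buffer, and only the final result is downloaded. This is an illustration under stated assumptions, not the project's design: the stages `invert` and `threshold` and the wrapper `runPipeline` are hypothetical and operate on an 8-bit greyscale image.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Two illustrative per-pixel stages (hypothetical, not taken from GEM).
__global__ void invert(unsigned char* img, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) img[i] = static_cast<unsigned char>(255 - img[i]);
}

__global__ void threshold(unsigned char* img, int n, unsigned char level)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) img[i] = img[i] >= level ? 255 : 0;
}

// Runs both stages in place with a single upload and a single download.
bool runPipeline(std::vector<unsigned char>& img, unsigned char level)
{
    const int n = static_cast<int>(img.size());
    if (n == 0) return true;

    unsigned char* dImg = nullptr;
    if (cudaMalloc(&dImg, img.size()) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dImg, img.data(), img.size(), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;  // round up
        invert<<<blocks, threads>>>(dImg, n);            // result stays in device memory
        threshold<<<blocks, threads>>>(dImg, n, level);  // consumes it without a round trip
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(img.data(), dImg, img.size(), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dImg);
    return ok;
}
```

Both kernels are launched on the default stream, so they execute in order. With a separate upload and download per stage, a chain of n stages would need 2n transfers; here it needs two regardless of n.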

References

  1. Compute Unified Device Architecture

4th international Pure Data Convention 2011 Weimar ~ Berlin