PDCON:Conference/Image Processing Algorithm Optimization with CUDA for Pure Data

These factors led us to start a research program dedicated to implementing image processing modules for Pure Data in CUDA. We will first port the most frequently used algorithms, which already exist in the GEM library. Our first results are encouraging: for RGB-to-greyscale image conversion, tests show that GPU computing is on average 109 times faster than CPU-only computing. However, a CPU + GPU architecture has a weakness: data transfers between host (CPU) memory and the graphics card. More than 90% of the computation time is spent on those transfers, because each CUDA function block in Pure Data performs a double transfer, from CPU to GPU and back. Performance is therefore not yet optimal, and future work on the project will aim to minimize those transfers. The idea is to perform one transfer from CPU to GPU at the start of the program and one transfer back at the end, carrying the result of the whole process.
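The single-upload, single-download idea described above can be sketched as follows. This is only a minimal illustration, not the project's actual code: the kernel names (<code>rgbaToGrey</code>, <code>threshold</code>), the RGBA pixel layout, the Rec. 601 luma weights, the image size and the second thresholding stage are all assumptions chosen for the example. The point it shows is that two processing stages run back to back on a buffer that stays in GPU memory, so the chain costs two transfers in total instead of two per stage.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err = (call);                                       \
        if (err != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error: %s\n",                    \
                         cudaGetErrorString(err));                      \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Stage 1 (hypothetical): interleaved RGBA -> 8-bit grey, one thread per pixel.
// The Rec. 601 weights are an assumption; the source does not state which are used.
__global__ void rgbaToGrey(const uchar4* in, unsigned char* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        uchar4 p = in[i];
        out[i] = (unsigned char)(0.299f * p.x + 0.587f * p.y + 0.114f * p.z + 0.5f);
    }
}

// Stage 2 (hypothetical): in-place binary threshold on the grey image.
__global__ void threshold(unsigned char* img, int n, unsigned char t)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        img[i] = (img[i] >= t) ? 255 : 0;
    }
}

int main()
{
    const int width = 640, height = 480, n = width * height;  // assumed frame size

    // Host-side test frame filled with a constant colour.
    std::vector<uchar4> hostIn(n, make_uchar4(200, 100, 50, 255));
    std::vector<unsigned char> hostOut(n);

    uchar4* devIn = nullptr;
    unsigned char* devGrey = nullptr;
    CHECK(cudaMalloc(&devIn, n * sizeof(uchar4)));
    CHECK(cudaMalloc(&devGrey, n * sizeof(unsigned char)));

    // Single CPU -> GPU transfer for the whole chain.
    CHECK(cudaMemcpy(devIn, hostIn.data(), n * sizeof(uchar4),
                     cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Both stages operate on device memory; nothing returns to the CPU in between.
    rgbaToGrey<<<blocks, threads>>>(devIn, devGrey, n);
    CHECK(cudaGetLastError());
    threshold<<<blocks, threads>>>(devGrey, n, 128);
    CHECK(cudaGetLastError());

    // Single GPU -> CPU transfer with the final result.
    CHECK(cudaMemcpy(hostOut.data(), devGrey, n * sizeof(unsigned char),
                     cudaMemcpyDeviceToHost));

    std::printf("first output pixel: %d\n", (int)hostOut[0]);

    CHECK(cudaFree(devIn));
    CHECK(cudaFree(devGrey));
    return 0;
}
```

Assuming the CUDA toolkit is installed, the file can be compiled with <code>nvcc</code>. In a Pure Data patch, the equivalent design would have consecutive CUDA objects pass a device pointer to each other rather than a host image, which is what removes the per-block double transfer.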
In conclusion, running image processing algorithms on the graphics card is a very effective solution for complex processing. Integrating CUDA blocks into [[Pure Data]] simplifies and accelerates the prototyping of applications. This approach suits any field requiring a high frame rate, a high resolution, a large number of operations or computation-intensive processes, including industrial, medical and artistic applications.
<videoflash type="vimeo">36434429|700|400</videoflash>


===References===