
subTask 1.5 (UM): Multimedia applications

Leader: Gregorio Bernabé; Researchers: Joaquín Franco, Juan Fernández

1. Brief Description of the Goals

Two multicore platforms are currently attracting enormous attention because of their potential for sustained performance: the Cell Broadband Engine (Cell BE from now on) and the NVIDIA Tesla computing solutions. The former is a recent heterogeneous chip-multiprocessor (CMP) architecture jointly developed by IBM, Sony and Toshiba to offer very high performance, especially on game and multimedia applications. In fact, it is the heart of the PlayStation 3. The latter are general-purpose GPUs (GPGPU) used as data-parallel computing devices based on the Compute Unified Device Architecture (CUDA) common to the latest NVIDIA GPUs. What both have in common is a multicore platform that offers an enormous potential performance benefit, driven by a non-traditional programming model. In this task we try to provide some insight into the peculiarities of both with regard to their cost, performance, programmability and limitations, in order to target scientific computing.

2. Scientific and Technical Developed Activities

Our studies began with the implementation of the 3D fast wavelet transform algorithm on a single processor, with and without HyperThreading. Some results of this work were published by Bernabé et al. in Journal of Parallel Computing, vol. 33, n. 1, 2007, by Vélez et al. in the book Emerging Technology in Breast Imaging and Mammography, chapter 22, 2007, and by Bernabé et al. in Journal of Systems & Software, vol. 82, n. 3, 2009. We continued with the implementation of the wavelet transform on GPUs and the Cell BE, published by Fernández et al. in CSC 2008 Conference.

We provided some insight into the peculiarities of CUDA in order to target scientific computing by means of a specific example. In particular, we showed that the parallelization of the two-dimensional fast wavelet transform on the NVIDIA Tesla C870 achieves a speedup of 20.8 for an image size of 8192x8192 when compared with the fastest host-only implementation using OpenMP, with the data transfers between main memory and device memory included in the measurement. Results were published by Franco et al. in PDP 2009 Conference and more recently by Franco et al. in Journal of Real-Time Image Processing, vol. 7, n. 3, 2012.

We extended the analysis to the 3D-FWT scenario, where speedup factors were improved using a new set of optimization techniques. We presented different alternatives and programming techniques for an efficient parallelization of the 3D Fast Wavelet Transform on multicore CPUs and manycore GPUs. OpenMP and pthreads were used on the CPU to build different implementations in order to maximize parallelism, whereas CUDA and OpenCL were selected to exploit data parallelism on the GPU with explicit memory handling. The CUDA version on the Fermi architecture achieved the highest speedups, improving on the CPU execution times by 5.3x to 7.4x for different image sizes, and by up to 81x when communications are neglected. Meanwhile, OpenCL obtains solid gains in the range from 2x for small frame sizes to 3x for larger ones. Some results of this work were published by Franco et al. in ICCS 2010 Conference and by Bernabé et al. in LASCAS 2012 Conference.
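To illustrate why the fast wavelet transform lends itself to data-parallel execution, the following minimal sketch computes one level of a separable 2D transform in plain Python. It is only an illustration under stated assumptions: the Haar filter, the function names `haar_1d` and `haar_2d`, and the 4x4 test image are hypothetical choices made here, not the filters or code used in the publications above. The point it shows is that every row, and then every column, is filtered independently of the others, which is the kind of work that can be distributed across CPU threads or GPU threads.

```python
import math


def haar_1d(signal):
    """One level of the 1D Haar transform (assumes an even-length input).

    Returns the approximation (low-pass) coefficients followed by the
    detail (high-pass) coefficients.
    """
    s = 1.0 / math.sqrt(2.0)  # orthonormal scaling factor
    half = len(signal) // 2
    approx = [(signal[2 * i] + signal[2 * i + 1]) * s for i in range(half)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) * s for i in range(half)]
    return approx + detail


def haar_2d(image):
    """One level of the separable 2D transform: filter rows, then columns."""
    # Row pass: each row is transformed independently of the others.
    rows = [haar_1d(row) for row in image]
    # Column pass: transpose, transform each column independently.
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    # Transpose back to row-major layout.
    return [list(row) for row in zip(*cols)]


# Hypothetical 4x4 constant test image (illustration only).
image = [[1.0, 1.0, 1.0, 1.0] for _ in range(4)]
coeffs = haar_2d(image)
```

For a constant image such as this one, all the energy ends up in the low-pass (top-left) quadrant of `coeffs` and the detail quadrants are zero. A 3D version would add a third pass of the same 1D filter along the time (frame) axis.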


Publications: [Bernabe07], [Velez07], [Fernandez08], [Franco08], [Bernabe09], [Franco09]

 Projects funded by Public Calls:  

External collaborations Academia: Manuel Ujaldon

External collaborations Industry: --

Company Agreements: --

PhD dissertations: --

Patents: --