Activity 5: Cell BE Processor

Leader: Juan Fernández; Researchers: José Luis Abellán, Manuel E. Acacio

Removal of Activity 5: Cell BE processor

Reason: The IBM Cell BE support has been deprecated.

1. Brief Description of the Goals

Among all contemporary CMP (chip-multiprocessor) architectures, one is currently attracting enormous attention because of its architectural particularities and its tremendous potential in terms of sustained performance: the Cell Broadband Engine (Cell BE from now on). From an architectural point of view, the Cell BE can be classified as a heterogeneous CMP that integrates up to nine cores of two different types with SIMD capabilities (a PowerPC Processor Element, or PPE, and up to eight Synergistic Processor Elements, or SPEs). This lets us exploit not only thread-level parallelism but also data-level parallelism. In addition, several Cell BE processors can be linked together in Cell-based blade configurations; in a dual Cell-based blade, for instance, the on-chip Element Interconnect Bus (EIB) is extended across the two chips. As for its memory model, the Cell BE uses a hybrid scheme: a shared main memory common to all cores, plus eight small private local memories (the 256 KB local stores of the SPEs). The combination of all these factors makes programming the Cell BE a complex task. Because the private local memories are so small, Cell BE programmers must explicitly manage multiple threads that orchestrate frequent data movements between main memory and the local stores, overlapping computation with communication.
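
The overlap of computation and communication described above is usually achieved with double buffering: while the core computes on one local buffer, the next block of data is already being transferred into a second one. The C sketch below illustrates only the buffer-rotation pattern, under stated assumptions. `dma_get`, `dma_wait`, `CHUNK` and `sum_squares_double_buffered` are hypothetical names, and the transfer is simulated with a synchronous `memcpy` so that the code runs on any machine. On a real SPE, the fetch would instead be an asynchronous `mfc_get` (from the Cell SDK header `spu_mfcio.h`) completed by waiting on its tag group, and only then would computation and transfer actually overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical local-buffer size in elements (a real SPE local store is
   256 KB and must also hold code and stack). */
#define CHUNK 64

/* Stand-in for an asynchronous DMA "get" from main memory into a local
   buffer. On a real SPE this would be mfc_get() with a tag and would
   return before the transfer completes; here it is a synchronous memcpy. */
static void dma_get(float *local_buf, const float *main_mem, size_t n) {
    memcpy(local_buf, main_mem, n * sizeof(float));
}

/* Stand-in for waiting on the tag group of buffer `slot`. With the
   synchronous dma_get above there is nothing to wait for. */
static void dma_wait(int slot) { (void)slot; }

/* Sum of squares over n elements in main memory, streamed through two
   local buffers: chunk i+1 is requested before chunk i is processed. */
double sum_squares_double_buffered(const float *data, size_t n) {
    static float buf[2][CHUNK];   /* the two local buffers */
    double acc = 0.0;
    assert(data != NULL || n == 0);
    if (n == 0) return acc;

    size_t nchunks = (n + CHUNK - 1) / CHUNK;
    int cur = 0;

    /* Prime the pipeline with the first chunk. */
    dma_get(buf[cur], data, n < CHUNK ? n : CHUNK);

    for (size_t i = 0; i < nchunks; i++) {
        int next = cur ^ 1;
        /* Request chunk i+1 into the other buffer before computing. */
        if (i + 1 < nchunks) {
            size_t off = (i + 1) * CHUNK;
            size_t len = (n - off) < CHUNK ? (n - off) : CHUNK;
            dma_get(buf[next], data + off, len);
        }
        /* Make sure chunk i has arrived, then compute on it. */
        dma_wait(cur);
        size_t off = i * CHUNK;
        size_t len = (n - off) < CHUNK ? (n - off) : CHUNK;
        for (size_t k = 0; k < len; k++)
            acc += (double)buf[cur][k] * buf[cur][k];
        cur = next;
    }
    return acc;
}
```

The buffer being refilled is always the one whose computation finished in the previous iteration, so no data are overwritten while in use.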

In this scenario, a number of programming models and platforms have been proposed for the Cell BE. Some of them are based on well-known shared-memory and message-passing libraries such as OpenMP and MPI, while others, such as CellSs and RapidMind, are based on task-parallel and stream programming models, respectively. In all of them, the synchronization and coordination of multiple threads is a common source of programming errors and performance bottlenecks. For that reason, these programming models either offer explicit synchronization and communication collective primitives, such as MPI_Barrier or #pragma omp barrier, or hide synchronization and communication altogether, in order to make code less error-prone and/or more efficient. In either case, fast and efficient collective primitives are clearly needed, since they are the building blocks of more elaborate algorithms for critical tasks such as thread scheduling and load balancing.
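
To make the notion of a barrier primitive concrete, here is a minimal sketch of a centralized sense-reversing barrier, a textbook shared-memory design written with C11 atomics. It is not taken from any of the programming models cited above, nor from the Cell BE implementations developed in this activity; the type and function names (`sr_barrier_t`, `sr_barrier_init`, `sr_barrier_wait`) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Centralized sense-reversing barrier: a shared arrival counter plus a
   global sense flag that flips once per barrier episode. */
typedef struct {
    atomic_int count;    /* threads still to arrive in this episode */
    atomic_bool sense;   /* flipped by the last thread to arrive */
    int nthreads;
} sr_barrier_t;

void sr_barrier_init(sr_barrier_t *b, int nthreads) {
    assert(nthreads > 0);
    atomic_init(&b->count, nthreads);
    atomic_init(&b->sense, false);
    b->nthreads = nthreads;
}

/* Each thread keeps its own local_sense (initially false) across calls. */
void sr_barrier_wait(sr_barrier_t *b, bool *local_sense) {
    *local_sense = !*local_sense;            /* sense for this episode */
    if (atomic_fetch_sub(&b->count, 1) == 1) {
        /* Last arriver: reset the counter first, then release everyone. */
        atomic_store(&b->count, b->nthreads);
        atomic_store(&b->sense, *local_sense);
    } else {
        /* Spin until the last arriver flips the global sense. */
        while (atomic_load(&b->sense) != *local_sense)
            ;
    }
}
```

Each participating thread would declare `bool local_sense = false;` and call `sr_barrier_wait(&b, &local_sense)` at every synchronization point. Reversing the sense on each episode is what makes the barrier reusable: a fast thread entering the next episode waits for the opposite flag value, so it cannot be released by the previous one. On the Cell BE, SPE threads would more likely rely on hardware mechanisms such as mailboxes or signal notification registers than on spinning over a word in main memory.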

This research line mainly focuses on devising fast and efficient synchronization, coordination and communication primitives with hardware support for the Cell BE and other CMP architectures. In addition, we will try to take advantage of these primitives to optimize the performance of real applications.

2. Scientific and Technical Developed Activities

In the first stage of this activity, we carried out parallel implementations of well-known problems. In particular, we contributed to the parallelization of both a common search algorithm and a well-known scientific application, SWEEP3D. The results of these works were published by Villa et al. and by Petrini et al. at the 2007 IPDPS Int'l Conference. After that, we developed a tool to analyze and evaluate all of the synchronization and communication mechanisms provided by the Cell BE, and used it to fully characterize the behavior of a dual Cell-based blade. The results of this characterization were published by Abellán et al. at the 2008 Euromicro PDP Int'l Conference and the 2008 ICCS Int'l Conference, and in The Journal of Supercomputing, vol. 53, issue 2, 2010. Next, we studied several implementations of synchronization and collective primitives on a dual Cell-based blade and developed very fast and efficient versions of the barrier, broadcast and reduce collective primitives. The results of this investigation were published by Gaona et al. at the 2009 Euro-Par Int'l Conference.

After this, we decided to cancel this activity, since the Cell BE processor was discontinued and became obsolete. Instead, we dedicated the resources originally planned for this activity to investigating efficient hardware implementations of barrier and lock synchronization primitives in the context of many-core CMPs. This research has been carried out as part of subtask 1.2 (UM).
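
Broadcast and reduce primitives are commonly organized as trees, so that the number of communication rounds grows logarithmically with the number of cores. The sketch below simulates a binomial-tree reduce (sum) followed by a binomial-tree broadcast, sequentially, over an array of per-core values; with the 16 SPEs of a dual Cell-based blade, each operation takes four rounds. It is a generic textbook scheme, not the algorithm of Gaona et al.; the function names are hypothetical, and on real hardware each assignment would be a transfer between cores (e.g. a DMA), with a synchronization step between rounds.

```c
#include <assert.h>

/* Binomial-tree reduce (sum) over n per-core values, simulated
   sequentially. In the round with a given stride, rank i receives from
   rank i + stride whenever i is a multiple of 2 * stride. The result
   ends up in vals[0]; the return value is the number of rounds. */
int tree_reduce_sum(long *vals, int n) {
    assert(n > 0);
    int rounds = 0;
    for (int stride = 1; stride < n; stride *= 2) {
        for (int i = 0; i + stride < n; i += 2 * stride)
            vals[i] += vals[i + stride];   /* "message" from i+stride to i */
        rounds++;
    }
    return rounds;
}

/* Binomial-tree broadcast of vals[0]: the same tree walked in reverse,
   so in each round rank i sends to rank i + stride. Returns the number
   of rounds. */
int tree_broadcast(long *vals, int n) {
    assert(n > 0);
    int top = 1;
    while (top < n) top *= 2;              /* smallest power of two >= n */
    int rounds = 0;
    for (int stride = top / 2; stride >= 1; stride /= 2) {
        for (int i = 0; i + stride < n; i += 2 * stride)
            vals[i + stride] = vals[i];    /* "message" from i to i+stride */
        rounds++;
    }
    return rounds;
}
```

Within a round, the ranks being written and the ranks being read never overlap, so the sequential loop produces the same result a parallel execution of that round would.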

Publications: [Villa07], [Petrini07], [Abellan07], [Abellan08a], [Abellan08b], [Abellan08c], [Fernández08], [Abellan09], [Gaona09]

Projects funded by Public Calls: TIN2009-14475-C04-01, HiPEAC, national grants

External collaborations Academia: --

External collaborations Industry: --

Company Agreements: --

PhD dissertations: José Luis Abellán Miguel

Patents:  --