
Activity 4: True zero-copy communication protocol and associated hardware and software interfaces

Leader: Jose Duato; Researchers: Federico Silla, Yana Esteves, Santiago Mislata, Rosa Ana Bellver, Ferran Pérez

1. Brief Description of the Goals

Eliminating buffer-to-buffer copies is the most efficient way to increase communication performance. Current zero-copy communication protocols do not account for the copy from where the data are generated (usually a CPU ALU) to a memory buffer before a send call is invoked, nor for the copy from the reception buffer to where the data are processed at the receiving side. Eliminating those copies would enhance communication performance, but it would prevent optimizations based on the transmission of long messages and, most importantly, would dramatically change the way message-passing applications are developed. Previous attempts (e.g. iWarp) failed to gain wide acceptance despite their clear technical superiority.
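The extra copy described above can be illustrated with a small software analogy. The sketch below is not the project's protocol: the function names and the `nic_buffer` argument are hypothetical, and Python cannot model an ALU-to-buffer transfer. It only contrasts producing results into a temporary object and then copying them into a send buffer with writing each result directly into the buffer it will be sent from.

```python
# Toy analogy only: hypothetical names, no real network interface involved.

def send_with_staging(values, nic_buffer: bytearray) -> int:
    """Produce results first, then copy them into the send buffer.

    Returns the number of bytes moved by the extra buffer-to-buffer copy.
    """
    produced = bytes(v & 0xFF for v in values)   # results land in a temporary object
    nic_buffer[:len(produced)] = produced        # staging copy into the send buffer
    return len(produced)


def send_in_place(values, nic_buffer: bytearray) -> int:
    """Write each result straight into the send buffer.

    Returns 0 because no separate staging copy takes place.
    """
    view = memoryview(nic_buffer)                # a view shares storage with the buffer
    for i, v in enumerate(values):
        view[i] = v & 0xFF                       # result is stored where it is sent from
    return 0
```

Both functions leave the same bytes in the buffer; only the first one moves them twice.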

A completely different approach to zero-copy communication consists of replacing the message-passing communication model with a shared-memory model. Accesses to remote memory can then be performed simply by issuing load/store instructions. Additionally, removing communication libraries and implementing the mechanisms in hardware minimizes latency. Importantly, this approach to cluster communication matches the communication model of current multicore chips, and thus benefits from existing expertise in programming those chips. Moreover, memory regions and memory-mapped peripheral controllers can be easily shared among different cluster nodes and flexibly assigned to different virtual machines.
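As a rough illustration of load/store access to remote memory, the following sketch models a cluster-wide address space in software. Everything in it is an assumption made for the example: the `ToyCluster` class, the 4-bit node field, and the 64 KiB of exported memory per node do not come from the project or from any HyperTransport specification. The point is only that a single address identifies both the owning node and the location within it, so no send or receive call appears in the program.

```python
NODE_BITS = 4      # assumption: at most 16 nodes in the toy address map
OFFSET_BITS = 16   # assumption: 64 KiB of exported memory per node


def global_address(node: int, offset: int) -> int:
    """Build a cluster-wide address from a node id and a local offset."""
    return (node << OFFSET_BITS) | offset


class ToyCluster:
    """Software stand-in for a cluster whose memories share one address space."""

    def __init__(self, num_nodes: int):
        if not 0 < num_nodes <= (1 << NODE_BITS):
            raise ValueError("num_nodes does not fit in the toy address map")
        self.memories = [bytearray(1 << OFFSET_BITS) for _ in range(num_nodes)]

    def _decode(self, addr: int):
        # Upper bits select the node, lower bits select the byte inside it.
        return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)

    def store(self, addr: int, value: int) -> None:
        node, offset = self._decode(addr)
        self.memories[node][offset] = value & 0xFF

    def load(self, addr: int) -> int:
        node, offset = self._decode(addr)
        return self.memories[node][offset]
```

A caller would write `cluster.store(global_address(2, 0x10), 0xAB)` and later read the same address with `cluster.load(...)`, regardless of which node issues the access.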

However, shared-memory architectures have existed for several decades, and they are significantly more expensive and less scalable than message-passing ones. The fundamentally different approach followed in this project to achieve scalability and low cost consists of not enforcing cache coherence. By removing support for cache coherence, scalability can be dramatically improved, and changes to existing processors are minimal. Moreover, those changes can be implemented in silicon external to the processors. The price to pay is that programming becomes somewhat more complex than with cache coherence support. However, multiple research groups all over the world are already working on non-coherent shared-memory models (usually implemented on top of message-passing hardware).
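The added programming burden of a non-coherent model can be sketched with a toy simulation. The `NonCoherentNode` class and its `flush` and `invalidate` methods are hypothetical and stand in for whatever write-back and invalidation mechanisms a real system provides; they are not the project's interface. Each node keeps a private cache that nothing keeps consistent, so the writer must explicitly write data back and the reader must explicitly discard its stale copy.

```python
class NonCoherentNode:
    """Toy node with a private write-back cache and no hardware coherence."""

    def __init__(self, shared: dict):
        self.shared = shared   # stands in for globally addressable memory
        self.cache = {}        # private cached copies, never updated by other nodes
        self.dirty = set()     # addresses modified locally and not yet written back

    def load(self, addr: int) -> int:
        if addr not in self.cache:               # miss: fetch from shared memory
            self.cache[addr] = self.shared.get(addr, 0)
        return self.cache[addr]                  # hit: may return a stale value

    def store(self, addr: int, value: int) -> None:
        self.cache[addr] = value                 # write stays in the private cache
        self.dirty.add(addr)

    def flush(self) -> None:
        """Write locally modified data back to shared memory."""
        for addr in self.dirty:
            self.shared[addr] = self.cache[addr]
        self.dirty.clear()

    def invalidate(self) -> None:
        """Drop clean cached copies so the next load refetches them."""
        self.cache = {a: v for a, v in self.cache.items() if a in self.dirty}
```

In this model a consumer keeps reading the old value until the producer has called `flush()` and the consumer has called `invalidate()`; with hardware cache coherence, neither call would be needed.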

Moreover, in order to make sure that the proposed architecture is accepted by the market, we are implementing a working prototype and, most important of all, we are working shoulder to shoulder with industry to standardize the developed solutions.

2. Scientific and Technical Activities Carried Out

This task has explored a new distributed shared-memory architecture with the aim of having it accepted by the market. To that end, the new architecture should not only offer performance and scalability beyond those of current shared-memory architectures, but should also be based on commodity hardware so that it is cost-effective and, furthermore, be supported by industry as a standard solution.

In order to achieve the last two goals (a standard solution supported by industry and based on commodity hardware), we set up a direct path for technology transfer to top information technology companies (AMD, Sun, HP) through membership of the HyperTransport Technology Consortium (HTC), membership of the HTC Technical Working Group (the committee that makes decisions on standardization of HyperTransport technology), and leadership of the HTC Advanced Technology Group (the committee that proposes technology to standardize, in addition to the HTC member companies). The main outcome was that we were the principal contributors to the High Node Count HyperTransport Specification 1.0, announced by the HyperTransport Technology Consortium on February 11, 2009. Other results from this effort were the standardization of the physical connectors and cables for the previous standard, as well as the creation of a standard that describes how to use this new technology with commodity hardware (Ethernet interconnects).

Regarding the first goal (technology features beyond the current state of the art), we set up a cluster prototype with 1024 cores and FPGA-based network interfaces, which allowed us to implement and evaluate the HyperTransport extensions on FPGAs, as well as to prove the scalability of our shared-memory architecture and its associated software. On top of this architecture we have executed not only regular shared-memory applications but also a distributed shared-memory database, showing the noticeable benefits of the proposal. Results from this work have been published by F. Silla et al. at the CLUSTER 2010, HPCC 2011 and HiPC 2011 conferences.


Publications: [Duato06a], [Duato06b]

Projects funded by Public Calls: TIN2006-15516-C04-01, TIN2009-14475-C04-01, PROMETEO/2008/060

External collaborations Academia: Holger Fröning and Ulrich Brüning; Sudhakar Yalamanchili

External collaborations Industry: --

Company Agreements: HyperTransport Technology Consortium (US)

PhD dissertations: --

Patents: --