Main Menu
About us
Project Description
Quantitative Results
Research Lines
Research Results
Impact on Society
Press room
Contact us
Secure Login
Events Calendar
« < October 2017 > »
25 26 27 28 29 30 1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31 1 2 3 4 5

subTask 1.3(UPV): Development of congestion management techniques

Leader: Antonio Robles; Researchers: Joan Lluís Ferrer, Gaspar Mora, Jesús Escudero (UCLM), Pedro Javier García (UCLM), José Flich, Elvira Baydal, Pedro López, Francisco J. Quiles

1. Brief Description of the Goals

Network congestion may arise when several packet flows persistently request the same output inside a switch. In lossless network technologies, congestion propagates quickly through the network due to flow control, forming congestion trees and giving raise to the head-of-line (HOL) blocking phenomenon. Congestion dramatically degrades network performance (throughput drops, latency increases exponentially). The specific cause of this degradation is that congested flows may share some network resources (queues, links) with non-congested flows, thereby the former slowing the advance of the latter. Therefore, it is necessary to detect early the beginning of congestion, as well as the nodes responsible for that situation, applying corrective actions only over them.  

The main objective of this task is to design efficient and scalable congestion management techniques for network interconnects commonly used in PC clusters. To this end, we have worked in two different approaches for dealing with congestion: detection & recovery (detection-notification-throttling) strategies and HOL elimination. In the first approach, we will assume the use of Explicit Congestion Notifications in the context of an end-to-end congestion management mechanism. Our work will be focused on developing thorough packet marking techniques to properly identify hot flows and also a fair scheme of corrective actions to be applied at the source hosts. The second approach will assume the use of Regional Explicit Congestion Management (RECN), an efficient and scalable solution for eliminating HOL blocking previously proposed by our group. In this project, we propose similar techniques in those interconnect technologies where RECN cannot be applied. Specifically, RECN has been designed for networks using source routing and switches with queues at both input and output ports, thus we will propose techniques suitable for networks using deterministic distributed routing and switches with queues only at input ports. Additionally, we will try to propose simple and cost-effective HOL blocking elimination techniques that reduce RECN switch complexity.

2. Scientific and Technical Developed Activities

We have proposed a cost-effective end-to-end congestion management scheme for PC clusters based on Explicit Congestion Notification which combines a novel packet marking mechanism with a novel two-level scheme of corrective actions. The packet marking consists in marking packets at the switch input buffers and validating them at the output buffers that feed those links that are truly saturated. Only the packets belonging to flow responsible for congestion (hot flows) will be validated, thus avoiding applying corrective actions on cold flows. The corrective actions are based on a two-level scheme. In the first level the adjustment of the packet injection rate is performed by using a dynamic window. It is based on the idea of limiting, for each flow, the number of outstanding packets in the network. If congestion persists, a second level will reduce even more the injection rate. It consists of introducing waiting periods in between the injection of two packets meanwhile the window size is kept to the minimum value of 1. The results of this preliminary work were published by Ferrer et al. in 2007 Euromicro PDP Conference.

Additionally, an effective injection recovery strategy that meets the tradeoff between achieving a fast response time when congestion has finished and avoiding to inject too much traffic when the network is still congested has been proposed. Evaluation results show the robustness of the proposed scheme, which behaves better than previous proposals, especially in terms of the lower penalization suffered by flows non responsible for congestion. Also, a thorough analysis of the interaction between different packet marking mechanisms and corrective/recovery actions schemes in order to identify the factors that influence more the overall behavior of the congestion management scheme has been carried out. The results of this work were published in 2008 EuroPar Conference. Finally, we have improved the proposed mechanism in to aspects. First, we have proposed a new packet marking technique that simplifies the packet marking and enables early congestion detection, whose results were published in 2010 Euromicro PDP Conference. Second, we have improved its scalability by reducing the storage resources required in the network interfaces of the source nodes without penalizing performance noticeably. As summarize, the main results of the referred research line were published by Ferrer at al. in IEEE Transactions on Computers, vol. 61, no. 9, 2012.

The previous results have constituted the main contributions of the PhD thesis of Joan-Lluís Ferrer, defended by December 2012.

We have proposed several congestion-control techniques based on the same approach as the previously proposed RECN (Regional Explicit Congestion Notification) technique. Specifically, the key idea is to isolate the hot flows that contribute to congestion in queues dynamically allocated, in order to eliminate the HoL-blocking caused by congestion situations.

The first of the techniques proposed according to this approach is known as RECN-IQ (RECN-Input-Queued), published by Mora et al. in ICPP 2007 conference. It is suitable for networks using source-based routing, and based on switches with queues only at input ports. It is one of the main contributions of the PhD dissertation of Gaspar Mora. These results helped us to define the patent “Method for Congestion Management of a Network, a Switch, and a Network”, which improves over a previous joint patent with Xyratex. The second technique is known as FBICM (Flow-Based Implicit Congestion Management), published by Escudero et al. in HiPC Conference 2008. An improved version of FBICM was published by Escudero et al. in ICPADS Conference 2010. This technique is suitable for networks using deterministic distributed routing, so the managing of congestion information is completely different from those policies in the RECN and RECN-IQ techniques. Finally, we have proposed a third technique based on the RECN approach, known as DRBCM (Distributed-Routing-Based Congestion Management), published by Escudero et al. in Transactions on Parallel and Distributed Systems (paper accepted, publication pending). Like FBICM, DRBCM is suitable for networks using deterministic distributed routing, but only for those with fat-tree topology. This last condition allows us to reduce the resources required to implement FBICM. DRBCM is the base of a pending Spanish patent, titled "Procedimiento de control de congestión en redes multietapa con encaminamiento distribuido".

Besides, we have proposed a technique known as CCFIT (Combined Congested-Flow Isolation and Throttling), published by Escudero et al. in ICCPP Conference 2011. This technique is a combination of FBICM with the injection-throttling (Ith) mechanism implemented in InfiniBand-based networks, so that the resources required by FBICM to isolate hot flows are reduced, as Ith reduces the injection rate of many of these flows.

On the other hand, we have proposed several congestion-control techniques based on traffic-unaware (i.e. independent on traffic situation in the network) queuing schemes to separate packet flows, so that the HoL-blocking is reduced without introducing significant complexity. According to this straightforward approach, we proposed OBQA (Output-Based Queue Assignment), published by Escudero et al. in EuroPar Conference 2010 and in Journal of Parallel and Distributed Systems vol. 71, issue 11, 2011. OBQA is suitable for networks with fat-tree topology using the DESTRO routing algorithm. An improved version of OBQA which dynamically assigns memory slots to queues, in order to leverage memory resources, was also proposed. This version is known as OBDQA (Output-Based Dynamic Queue Assignment), published by Escudero et al. in Concurrency and Computation: Practice and Experience, vol. 23, issue 17, 2011.

It is worth mentioning that FBICM, DRBCM, CCFIT, OBQA and OBDQA are the main contributions of the PhD dissertation of Jesús Escudero.

Publications: [Ferrer07], [Ferrer08a], [Ferrer08b], [Ferrer09], [Garcia07], [Mora07a], [Mora07b], [Escudero07], [Escudero08a], [Escudero08b], [Escudero08c], [Escudero09], [Eberle08]   

Projects funded by Public Calls: TIN2006-15516-C04-01TIN2009-14475-C04-01 TC20080546.

External collaborations Academia: --

External collaborations Industry: Sun Microsystems (Norway)Xyratex (United Kingdom)

Company Agreements: --

PhD dissertations: Joan-Lluís Ferrer i PérezJesús Escudero Sahuquillo

Patents: Method for Congestion Management of a Network, a Switch, and a Network, 2009/2011