
Subtask 2.3 (UPV): Design of a new fault-tolerant communication model for on-chip networks

Leader: Federico Silla; Researchers: Carles Hernández

1. Brief Description of the Goals

Current VLSI manufacturing processes allow designers to integrate a large number of cores on a single die, producing what is known as a CMP (chip multiprocessor). The cores in a CMP must be interconnected by a network-on-chip (NoC). Current integration scales make it possible to place several billion transistors, working at gigahertz frequencies, on a single die. They also bring new challenges: shrinking transistor sizes introduce process variation, which makes the behaviour of the manufactured chip partly unpredictable.

In future CMPs, expected to be implemented in 22 nm technology by 2015, the probability of having faulty links in the network will increase noticeably, both because a NoC is composed of a large number of links and because of the large process variability expected. A mechanism is therefore required that adapts to the exact conditions of the manufactured chip, so that chip performance is not drastically reduced. There are several approaches to dealing with process variability:

i. Discarding the faulty chip. This option increases manufacturing cost, because a non-negligible percentage of chips would have to be discarded as variability increases.

ii. Reducing the clock frequency of the entire chip. The chip is not discarded, so the cost is lower, and the chip is marketed as a lower-performance part. However, this approach slows down the entire chip even when only some of the links in the NoC are faulty.

iii. Keeping the initial clock frequency, even though several links are not able to switch at that rate. This is the best choice, since it increases performance with respect to the previous approach. The open question is how to use the slower links efficiently, so that as much of the available bandwidth as possible is used.

2. Scientific and Technical Activities Developed

To achieve the goals of this task, the first step was to quantify process variation for different technology scales. For this purpose we developed a methodology that takes into account the physical layout of the network and the physical and electrical characteristics of the technology used to implement the chip, and that provides a delay estimate for each of the transistors of the network. This accurate delay information can later be fed into network simulators to assess the benefits of different policies intended to minimize the effects of process variation.

Once we were able to quantify process variation, we devised several mechanisms that recover the bandwidth still available in links affected by process variation, which would otherwise be labeled as faulty. These mechanisms recover different percentages of the available bandwidth, so the designer would use one or another depending on the cost that can be afforded. Briefly, the mechanisms are a) PR (Phit Reduction), in which the wires of a link that are slower than a threshold are discarded (this requires a circuit that slices flits appropriately at the transmitter side and a circuit that rebuilds the received flits), and b) SMC (Space-Multiplexed Channels), which divides the link into a small number of sublinks and drives each sublink at the maximum frequency allowed by the wires that belong to it.
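
The difference between the two mechanisms can be illustrated with a small numerical sketch. It is not the published implementation: the function names, the per-wire frequencies, the choice of contiguous equal-size sublinks and the assumption that each wire carries one bit per clock cycle are all hypothetical, chosen only to show how PR and SMC recover bandwidth that a single link-wide clock would lose.

```python
from typing import List


def baseline_bandwidth(wire_freqs: List[float]) -> float:
    """Whole link clocked at the rate of its slowest wire (no recovery mechanism)."""
    return len(wire_freqs) * min(wire_freqs)


def pr_bandwidth(wire_freqs: List[float], threshold: float) -> float:
    """Phit Reduction (toy model): wires slower than `threshold` are discarded,
    and the remaining, narrower link is clocked at `threshold`."""
    usable = [f for f in wire_freqs if f >= threshold]
    return len(usable) * threshold


def smc_bandwidth(wire_freqs: List[float], num_sublinks: int) -> float:
    """Space-Multiplexed Channels (toy model): the link is split into contiguous,
    equal-size sublinks, each clocked at the rate of its own slowest wire."""
    n = len(wire_freqs)
    if num_sublinks <= 0 or n % num_sublinks != 0:
        raise ValueError("wire count must be a positive multiple of num_sublinks")
    size = n // num_sublinks
    total = 0.0
    for start in range(0, n, size):
        sublink = wire_freqs[start:start + size]
        total += len(sublink) * min(sublink)
    return total


# Hypothetical 8-wire link: maximum frequency of each wire in GHz.
# With one bit per wire per cycle, the results are in Gbit/s.
wire_freqs = [2.0, 2.0, 1.5, 2.0, 2.0, 2.0, 1.0, 2.0]

baseline = baseline_bandwidth(wire_freqs)
pr = pr_bandwidth(wire_freqs, threshold=2.0)
smc_2 = smc_bandwidth(wire_freqs, num_sublinks=2)
smc_4 = smc_bandwidth(wire_freqs, num_sublinks=4)

print(f"baseline={baseline}, PR={pr}, SMC(2)={smc_2}, SMC(4)={smc_4}")
```

In this toy model a finer SMC partition recovers more bandwidth, but each additional sublink needs its own clocking, which is the cost trade-off the designer has to weigh.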

In addition to recovering the bandwidth still available in the links, we also developed process mapping strategies that take into account the different frequencies present in the chip, so that global task scheduling minimizes execution time.
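
As a rough illustration of why the mapping matters, the sketch below uses a deliberately simplified model that is not taken from the published work: each communicating process pair (a "flow") is assumed to use one dedicated link, and the heaviest flows are greedily placed on the fastest links. The flow names, link names, volumes and bandwidths are hypothetical.

```python
from typing import Dict


def map_flows_to_links(flow_volumes: Dict[str, float],
                       link_bandwidths: Dict[str, float]) -> Dict[str, str]:
    """Greedy variability-aware mapping: heaviest flow on the fastest link."""
    if len(flow_volumes) > len(link_bandwidths):
        raise ValueError("toy model needs at least one link per flow")
    flows = sorted(flow_volumes.items(), key=lambda kv: kv[1], reverse=True)
    links = sorted(link_bandwidths.items(), key=lambda kv: kv[1], reverse=True)
    return {flow: link for (flow, _), (link, _) in zip(flows, links)}


def completion_time(mapping: Dict[str, str],
                    flow_volumes: Dict[str, float],
                    link_bandwidths: Dict[str, float]) -> float:
    """Time at which the slowest flow finishes (volume / bandwidth)."""
    return max(flow_volumes[f] / link_bandwidths[l] for f, l in mapping.items())


# Hypothetical data: traffic per flow in Gbit, effective link bandwidth in Gbit/s.
flow_volumes = {"A-B": 120.0, "C-D": 40.0, "E-F": 80.0}
link_bandwidths = {"L0": 10.0, "L1": 16.0, "L2": 13.0}

# Mapping that ignores link speed: flows placed on links in declaration order.
naive = dict(zip(flow_volumes, link_bandwidths))
aware = map_flows_to_links(flow_volumes, link_bandwidths)

naive_time = completion_time(naive, flow_volumes, link_bandwidths)
aware_time = completion_time(aware, flow_volumes, link_bandwidths)
print(f"naive: {naive_time} s, variability-aware: {aware_time} s")
```

A real mapping strategy would also have to account for routing, shared links and computation time; the sketch only shows that placing heavy communication on links that run slower because of process variation lengthens execution.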

The results of this task were published by C. Hernández et al. at the IPDPS 2009, DATE 2010, NoCs 2010 and ICPP 2011 conferences. Further results were published, also by C. Hernández et al., in the Journal of Parallel and Distributed Computing, vol. 75, issue 5, 2011, in IEEE Computer Architecture Letters, vol. 30, issue 4, 2011, and in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31, issue 2, 2012. Finally, the research carried out in this task has also been published as the PhD thesis of C. Hernández.

Publications: [Hern09a], [Hern09b], [Hern08s], [Hern08b], [Hern08c]

Projects funded by Public Calls: TIN2006-15516-C04-01, TIN2009-14475-C04-01, STREP Num. 248972

External collaborations Academia: --

External collaborations Industry: EXTOLL (Germany), AIC (Taiwan)

Company Agreements: --

PhD dissertations: Carles Hernández Luz

Patents: --