

subTask 2.1 (UM) Design and implementation of efficient mechanisms for data transfer and storage in
Leader: Juan Piernas/Juan Fernandez; Researchers: Pilar González-Férez, Ana Avilés, Juan Sánchez

1. Brief Description of the Goals

Secondary storage is usually a performance bottleneck in modern computer systems, from desktops to large distributed systems. This performance problem can be tackled from two points of view: locally, by developing new mechanisms to improve the throughput of the hard disks attached to each node, and globally, by implementing new scalable techniques and algorithms for distributed storage management. In this project, we have proposed different approaches to improve the performance of the secondary storage both locally and globally.

2. Scientific and Technical Developed Activities

In order to improve the performance of the secondary storage of each cluster node, we have designed and implemented three mechanisms that can greatly improve the I/O performance of both hard and solid-state drives: KDSim, REDCAP and DADS.

KDSim is an in-kernel disk simulator that provides a framework for simultaneously simulating the performance obtained by different I/O system mechanisms and algorithms, and for dynamically turning them on and off, or selecting between different options or policies, to improve the overall system performance. REDCAP is a RAM-based disk cache that effectively enlarges the built-in cache present in disk drives. By using KDSim, this cache is dynamically activated or deactivated according to the throughput achieved. Results show that, by using KDSim and REDCAP, a system can improve its I/O performance by up to 88% for workloads with some spatial locality, on both hard and solid-state drives, while achieving the same performance as a “normal system” for workloads with random or sequential access patterns. These results were published by Gonzalez et al. in the IEEE MSST 2007, IEEE MASCOTS 2008 and SBAC-PAD 2010 conferences.

DADS (Dynamic and Automatic Disk Scheduling) simultaneously compares two different Linux I/O schedulers and dynamically selects the one that achieves the best I/O performance for the current workload. The comparison is made by running two instances of KDSim inside the Linux kernel. Results show that, by using DADS, the performance achieved is always close to that obtained by the best scheduler. System administrators are thus relieved of choosing a fixed scheduler that may provide good performance for some workloads but degrade system throughput when the workloads change. These results were published by Gonzalez et al. in the ACM SAC 2012 Conference.
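The selection step that DADS performs can be illustrated with a toy model (this is a hypothetical sketch, not the actual KDSim/DADS kernel code): simulate two scheduling policies on the same stream of block requests, estimate each one's service cost, and pick the cheaper policy.

```python
# Toy model of DADS-style dynamic scheduler selection (illustrative only,
# not the real KDSim/DADS implementation): simulate two disk scheduling
# policies on the same request stream and pick the lower-cost one.

def seek_cost(order, start=0):
    """Total head movement (in blocks) for servicing requests in 'order'."""
    cost, pos = 0, start
    for block in order:
        cost += abs(block - pos)
        pos = block
    return cost

def fcfs(requests):
    """First-come, first-served: keep the arrival order."""
    return list(requests)

def elevator(requests, start=0):
    """One-way sweep (C-LOOK-like): ascending blocks >= start, then the rest."""
    up = sorted(b for b in requests if b >= start)
    wrap = sorted(b for b in requests if b < start)
    return up + wrap

def select_scheduler(requests, start=0):
    """Simulate both policies and return the name of the cheaper one."""
    candidates = {
        "fcfs": seek_cost(fcfs(requests), start),
        "elevator": seek_cost(elevator(requests, start), start),
    }
    return min(candidates, key=candidates.get)

# A scattered workload favours the elevator sweep over FCFS order.
print(select_scheduler([90, 10, 70, 30], start=0))  # elevator
```

The real mechanism makes this decision continuously inside the kernel, so the "winning" scheduler can change as the workload changes.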

In the case of distributed file systems, we have implemented a new approach to Active Storage in collaboration with the Pacific Northwest National Laboratory (USA). Active Storage reduces the bandwidth requirements between the compute and storage elements of a cluster by moving appropriate processing tasks to the storage nodes; executing tasks on the storage nodes also allows Active Storage to leverage their processing power. Active Storage has been implemented on the Lustre and PVFS parallel file systems. We have also provided a scientist-friendly environment that makes it easy to describe and run an Active Storage job. These results were published by Piernas et al. in the ACM/IEEE SC 2007 Conference. The current implementation of Active Storage is able to deal with striped files, i.e., files whose data is spread across several nodes and which are typically used to improve the I/O bandwidth. It is also able to deal with netCDF files, which are very common for data exchange in some scientific applications. These results were published by Piernas and Nieplocha in the Euro-Par 2008 Conference and in Parallel Computing, Vol. 36, No. 1, 2010.
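The bandwidth-saving idea behind Active Storage can be sketched as follows (a hypothetical toy, with an invented API, not the Lustre/PVFS implementation): each storage node applies the processing task to the file stripes it holds locally, so only the much smaller results cross the network.

```python
# Toy sketch of the Active Storage idea (hypothetical API, not the actual
# Lustre/PVFS implementation): each storage node processes the stripes it
# stores locally, and only the reduced results travel over the network.

def stripe_layout(data, stripe_size, n_nodes):
    """Round-robin striping: node i holds stripes i, i+n, i+2n, ..."""
    stripes = [data[i:i + stripe_size] for i in range(0, len(data), stripe_size)]
    return {node: stripes[node::n_nodes] for node in range(n_nodes)}

def active_storage_run(layout, task):
    """Run 'task' on each node's local stripes; ship back only the results."""
    return {node: [task(s) for s in local] for node, local in layout.items()}

data = list(range(100))                        # a "file" of 100 records
layout = stripe_layout(data, stripe_size=10, n_nodes=2)
partial = active_storage_run(layout, task=sum)  # reduce each stripe in place
total = sum(sum(results) for results in partial.values())
print(total)  # 4950, equal to sum(data), with no bulk data movement
```

In this sketch each node ships back one number per local stripe instead of the stripe itself, which is exactly where the bandwidth saving comes from.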

Finally, we have also designed and implemented a new, enhanced type of OSD device, called OSD+, and a file system based on it: the Fusion Parallel File System (FPFS). OSD+ devices support both data objects and directory objects. Unlike the “data” objects present in a traditional OSD, directory objects store file names and attributes, and support metadata-related operations. By using OSD+ devices, we have shown how the metadata cluster of FPFS can effectively be managed by all the servers in the system, improving the performance, scalability and availability of the metadata service. These results were published by Avilés et al. in the SBAC-PAD 2011 Conference.
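The distinction between data objects and directory objects, and how directory objects let every device serve metadata, can be sketched like this (all names here are illustrative inventions, not the FPFS API):

```python
# Hypothetical sketch of the OSD+ idea (illustrative names, not the FPFS
# API): besides plain data objects, an OSD+ device holds directory objects
# that store file names and attributes and serve metadata operations, so
# every device in the cluster can act as a metadata server.

from zlib import crc32

class DirectoryObject:
    """A directory object: maps file names to their attributes."""
    def __init__(self):
        self.entries = {}
    def create(self, name, attrs):
        self.entries[name] = attrs
    def lookup(self, name):
        return self.entries.get(name)

class OSDPlus:
    """An OSD+ device holding both data objects and directory objects."""
    def __init__(self):
        self.data_objects = {}
        self.dir_objects = {}
    def get_dir(self, path):
        return self.dir_objects.setdefault(path, DirectoryObject())

def metadata_server_for(path, devices):
    """Hash the directory path so metadata load spreads over all devices."""
    return devices[crc32(path.encode()) % len(devices)]

cluster = [OSDPlus() for _ in range(4)]
dev = metadata_server_for("/home/user", cluster)
dev.get_dir("/home/user").create("results.txt", {"size": 42})
print(dev.get_dir("/home/user").lookup("results.txt"))  # {'size': 42}
```

Because the hash spreads directory objects over all OSD+ devices, no dedicated metadata server is needed, which is the source of the scalability and availability gains mentioned above.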

Publications: [Gonzalez07], [Gonzalez08a], [Gonzalez08b], [Piernas07], [Piernas08]

Projects funded by Public Calls: 

External collaborations Academia: Toni Cortés, Jarek Nieplocha

External collaborations Industry: --

Company Agreements: --

PhD dissertations: María Pilar González Férez

Patents: --