BMe Research Grant

Laszlo TOKA

E-mail address

Home page

Department's home page

Phone: +36 30 8594040

BMe Research Grant - 2010

1st Prize

Doctoral School of Information Sciences

Department of Telecommunications and Media Informatics

Dr. Attila Vidács and Dr. Pietro Michiardi

Safe data backup for free

Introducing the research area

Nowadays, information technology is present both in our work and personal life. Accumulated information results in large and continuously increasing amount of digital data, a significant part of which is confidential and cannot be reproduced easily, therefore their safe storage is essential. Numerous methods and applications are available to simultaneously produce and back up copies, but the difficulty and high costs of the archiving process preclude most users to provide safety to their data.
My research aims at studying backup service in preferably peer-to-peer (P2P, distributed network of functionally equal peers) systems, where users save their backup data on one of the others' unexploited storage devices over the Internet for free. As a main consequence, no scalability problems arise (higher number of users come with larger overall storage space); moreover, geographical as well as ownership diversity of the storing hosts assure great safety for the backed up data. However, the management of users who are not willing to share their local resources with other participants is extremely important for the system to remain operational. Furthermore, assuring the quality of service in a distributed system requires careful planning.


Brief introduction of the research place

I carried out my research work at BME Department of Telecommunications and Media Informatics in Hungary, at Telecom Bretagne and Eurecom in France. All three institutions conduct research on distributed computer systems and networks using game and distributed algorithmic theory-based models. The quality of their work is reflected in the large number of high quality international publications on prestigious and selective forums.


History and context of the research

Additionally to the widespread use of personal computers at firms and households, recently we are also experiencing an explosive growth in the number of mobile devices capable of creating digital data: PDAs, tablet PCs, smartphones etc. The volume of user-generated content (documents, photos, audio and video files) increases at a greater rate than ever before. It is therefore a relevant issue to find safe storage for the large amount of information that is usually difficult or impossible to re-generate. The importance of data duplication and storage at different locations, i.e., data backup, stems from the provided protection against data perishing, possibly due to unintentional deletion, random device failure, unforeseen natural disaster or theft. When preparing archives, it is important that the backed up files are saved in geographically remote locations, preferably in more than one copy; when restoring lost data the availability of archives is essential.

Two important aspects incur when creating safe backups: costs and automation. Currently available options mostly improve one at the expense of the other. The cheapest and least automated solution is to save the data on some secondary storage device from time to time (e.g., on external hard drive, USB flash drive, CD, DVD, Blu-ray) and to place the device to a safe location. At the other end of the spectrum are the online storage services (Amazon S3, Dropbox, Wuala etc.): saved data is transferred over the Internet to profit-oriented companies' reliable servers, data centers where they store it after replication. Although this latter solution is automated (e.g., files placed in a designated folder are saved continuously), the occurring cost is much higher than with manual, local (offline) backups.


Aim of the research

My research focused on the feasibility of a data backup system that would combine existing methods' advantages, without their drawbacks, of course. In addition to the relatively high expenses of user-friendly online backup services, other problems arise in relation to data privacy and reliability. Outsourcing important and often sensitive data into the hands of a single, profit-oriented company comes with serious security risks. Encrypting the files appropriately solves the secrecy issues. The consequences of the data center's vulnerability and the service provider company's eventual bankruptcy, however, are still not remedied.

In a P2P system, backed up files can be stored in a distributed manner at several users at different locations (Figure 1). Every user shares resources with the community, and in return, can save backup data on other members' storage devices having network access. The system is designed to maintain high availability and accessibility for the backup, and to assure the recoverability of local files in the event of data loss (Figure 2). Nevertheless, participants may join to misuse the system. Exclusion of those, who do not wish to share their own resources (disk space, Internet connection, time spent online) with the community, is important to maintain the service for ethical users of the network [6]. Once appropriate incentives are in place, distributed architecture outperforms the centralized one in scalability: a growing number of users does not lead to service performance degradation. On the contrary, saved data can be further dispersed on the larger user base.

My research mainly focused on these incentives: on their impact, efficiency and the related control burden. Storing data on temporarily unavailable users, even with implemented incentives, causes temporary data loss. To remedy this phenomenon, data must be stored in multiple copies, i.e. with redundancy. The management of data redundancy and network operations greatly affect the quality of service; the investigation of these aspects was also subject of my research.



Figure 1. The user's program, running on its computer being connected to the Internet, safely stores its data by scattering the secure copies on other participants' computers.



Figure 2. After a loss of local data, the program recovers the required data from the storing participants.


Methods and results

I started my research work by creating a theoretical model of P2P storage systems, in order to study possible incentive mechanisms (policies and rules) [1]. I suggested and compared two approaches for encouraging resource sharing. In symmetric systems, the service that each participant may receive is limited to the user's contribution. In payment-based systems a profit-oriented agent buys and sells network storage from and to participants, respectively. I described user "selfishness" with a non-cooperative game theoretic [7] model and studied the overall user benefit in both systems. I also provided necessary and sufficient conditions for a system to socially outperform the other.

Next, I developed the symmetric system model so that users were able to "selfishly" choose the peers, on which they stored data [2, 3]. Selection of storing partners is based on their characteristics (network availability, Internet connection's upload and download bandwidths). In case of mutual willingness, two users offer the same amount of storage space to each other. I demonstrated that the selfish peer selection encourages users to raise their devoted contribution to the system in terms of online availability and dedicated bandwidth. This essentially improves the overall quality of service offered by the system. My work also included extension of a well-known matching theory problem [8], which allowed me to formulate the algorithmic problem of peer selection in the game theoretic context. I have proposed a fast (polynomial time) algorithm that increases the user satisfaction as much as possible via the organized peer selection. I have verified my model and the effectiveness of the algorithm through simulation tests.

Basically, to maintain a P2P backup system, participating users must share three types of resources: disk space, bandwidth and time spent connected to the network. However, a system with a small number of users might not be able to guarantee the appropriate quality of service merely with the shared user resources available. Therefore, I examined the effects of introducing a central storage server in order to avoid such (temporary) situations and presented the cost implications of persistent quality guarantees [4].
In such a hybrid system, the central, reliable (highly available) server might be used to store data in exchange for the reimbursement of costs. Furthermore, it can also be used to restore data from the remaining copies when backup is lost on peers that leave the system. I have shown that the relatively low occurring costs of such a hybrid system allow for highly relaxing certain incentive rules.

Subsequently, I built analytic models to study the system's other, relevant operational parameters both in pure P2P and hybrid systems. Then, I evaluated various options of data redundancy, data maintenance and network data transfer schemes with simulations. To show reasonable comparison between different system designs, I introduced simple metrics to describe the quality of service, e.g., duration of archiving and data recovery process, probability of backup loss. I determined the settings that provided well-suited service quality for backup purposes and, at the same time, implied the possible smallest burden on users (i.e., minimal shared resources). Also suggested a new procedure for determining data redundancy, and evaluated its performance against currently available techniques. The redundancy-determining method is based on the time required to retrieve backup data and it guarantees high service quality levels while significantly reducing the applied redundancy and so users' storage and bandwidth requirements.


Finally, I validated the results of my research by implementing the developed algorithms in a P2P backup system application [5]. I wrote a user program with the established incentive rules and tested the created prototype in various settings. The experiments were implemented on a global research network, the PlanetLab and included hundreds of simulated users. The need for incentives has been shown by comparing performance results of experiments with and without symmetric peer selection mechanism. The careful adjustments of data redundancy also proved beneficial to other, existing methods. The experiments revealed that safe, free and user-friendly backup service is feasible in a P2P system.


Expected impact and further research

My research findings include extensive investigation of distributed storage and backup systems with various features. The presented combination of matching and game theory models may provide new insights into existing problems. The defined performance metrics and the novel data redundancy calculation approach based on the data retrieval duration may affect other P2P storage system designs (not necessarily only for backup purposes). In addition, the prototype can provide a basis for developing practical backup applications.

The real impact of the proposed incentive mechanism could only be measured during operation in a system with real users. Therefore, setting up a real experimental testbed is certainly one of the directions of future work. Observable online availability pattern of users also worth further investigations: peer selection based on the daily, weekly recurring regularity may further reduce the resources necessary to share.


Publications, references, links

Related own publications

[1] Patrick Maille, Laszlo Toka: Managing Peer-to-Peer Data Storage System in a Selfish Society, IEEE JSAC Special issue on "Game Theory in Communication Systems", Volume 26, Issue 7, September 2008, pp. 1295–1301
[2] Pietro Michiardi, Laszlo Toka: Selfish Neighbor Selection in Peer-to-Peer Backup and Storage Applications, Euro-Par'09, 15th International Conference on Parallel and Distributed Computing, Delft, The Netherlands, 2009
[3] Laszlo Toka, Pietro Michiardi: Analysis of User-Driven Peer Selection in Peer-to-Peer Backup and Storage Systems, Springer Telecommunication Systems, Special Issue dedicated to GameComm'08, 2010
[4] Laszlo Toka, Matteo Dell'Amico, Pietro Michiardi: Online Data Backup: Peer-Assisted Approach, IEEE P2P'10, IEEE International Conference on Peer-to-Peer Computing, Delft, The Netherlands, 2010
[5] Attila Csoma, Laszlo Toka, Attila Vidacs: A Peer-to-Peer Backup System with Incentives, TSP'10, 33rd International Conference on Telecommunications and Signal Processing, Vienna, Austria, 2010 (submitted)


Link collection


BME Department of Telecommunications and Media Informatics: http://www.tmit.bme.hu/?bl=y
HSNLab: http://hsnlab.tmit.bme.hu/Welcome
Télécom Bretagne: http://www.telecom-bretagne.eu/
Eurecom: http://www.eurecom.fr/
Amazon S3: http://aws.amazon.com/s3/
Dropbox: http://www.dropbox.com/
Wuala: http://www.wuala.com/
Game Theory: http://en.wikipedia.org/wiki/Game_theory
PlanetLab: http://www.planet-lab.org/


List of references

[6] Eytan Adar, Bernardo A. Huberman: Free Riding in Gnutella, Technical report, Xerox Palo Alto Research Center, 2000
[7] Martin J. Osborne, Ariel Rubinstein: A Course in Game Theory, MIT Press, 1994
[8] David Gale, Lloyd S. Shapley: College admissions and the stability of marriage, American Mathematical Monthly, Volume 69, Issue 1, 1962, pp. 9–15