BMe Research Grant
BMe Research Grant - 2010
Nowadays, information technology is present
both in our work and personal life. Accumulated information results in large
and continuously increasing amount of digital data, a significant part of which is confidential
and cannot be reproduced easily, therefore their safe storage is essential.
Numerous methods and applications are available to
simultaneously produce and back up
copies, but the difficulty and high
costs of the archiving process
preclude most users to provide safety to their data.
My research aims at studying backup service in preferably peer-to-peer (P2P, distributed network of functionally equal peers) systems, where users save their backup data on one of the others' unexploited storage devices over the Internet for free. As a main consequence, no scalability problems arise (higher number of users come with larger overall storage space); moreover, geographical as well as ownership diversity of the storing hosts assure great safety for the backed up data. However, the management of users who are not willing to share their local resources with other participants is extremely important for the system to remain operational. Furthermore, assuring the quality of service in a distributed system requires careful planning.
I carried out my research work at BME Department of Telecommunications and
Media Informatics in Hungary, at Telecom Bretagne and Eurecom in France. All
three institutions conduct research on distributed computer systems and networks
using game and distributed algorithmic theory-based models. The quality of
their work is reflected in the large number of high quality international
publications on prestigious and selective forums.
Additionally to the widespread use of personal computers at firms and households, recently we are also experiencing
an explosive growth in the number of mobile devices capable of creating digital data: PDAs,
tablet PCs, smartphones etc. The volume of user-generated content
(documents, photos, audio and video files) increases at a greater rate than ever
before. It is therefore a relevant issue to find safe storage for the
large amount of information that is usually
difficult or impossible to re-generate. The importance of data duplication and
storage at different locations, i.e., data backup, stems from the provided
protection against data perishing, possibly due to unintentional deletion,
random device failure, unforeseen natural disaster or
theft. When preparing archives, it is important that the backed up files
are saved in geographically remote locations, preferably in more than one copy;
when restoring lost data the availability of archives is essential.
Two important aspects incur when creating safe backups: costs and automation. Currently available options mostly improve one at the expense of the other. The cheapest and least automated solution is to save the data on some secondary storage device from time to time (e.g., on external hard drive, USB flash drive, CD, DVD, Blu-ray) and to place the device to a safe location. At the other end of the spectrum are the online storage services (Amazon S3, Dropbox, Wuala etc.): saved data is transferred over the Internet to profit-oriented companies' reliable servers, data centers where they store it after replication. Although this latter solution is automated (e.g., files placed in a designated folder are saved continuously), the occurring cost is much higher than with manual, local (offline) backups.
research focused on the feasibility of a data backup system that would combine
existing methods' advantages, without their drawbacks, of course.
In addition to the relatively high
expenses of user-friendly online backup services, other problems arise in relation to data privacy and reliability.
Outsourcing important and often sensitive data into the hands of a single,
profit-oriented company comes with serious security risks. Encrypting the
files appropriately solves the secrecy
issues. The consequences of the data
center's vulnerability and the service
provider company's eventual bankruptcy, however, are still not
In a P2P system, backed up files can be stored in a distributed manner at several users at different locations (Figure 1). Every user shares resources with the community, and in return, can save backup data on other members' storage devices having network access. The system is designed to maintain high availability and accessibility for the backup, and to assure the recoverability of local files in the event of data loss (Figure 2). Nevertheless, participants may join to misuse the system. Exclusion of those, who do not wish to share their own resources (disk space, Internet connection, time spent online) with the community, is important to maintain the service for ethical users of the network . Once appropriate incentives are in place, distributed architecture outperforms the centralized one in scalability: a growing number of users does not lead to service performance degradation. On the contrary, saved data can be further dispersed on the larger user base.
My research mainly focused on these incentives: on their impact, efficiency and the related control burden. Storing data on temporarily unavailable users, even with implemented incentives, causes temporary data loss. To remedy this phenomenon, data must be stored in multiple copies, i.e. with redundancy. The management of data redundancy and network operations greatly affect the quality of service; the investigation of these aspects was also subject of my research.
Figure 1. The user's program, running on its computer being connected to the Internet, safely stores its data by scattering the secure copies on other participants' computers.
Figure 2. After a loss of local data, the program recovers the required data from the storing participants.
started my research work by creating a theoretical model of P2P storage systems,
in order to study possible incentive mechanisms (policies and rules) . I suggested and
compared two approaches for encouraging resource
In symmetric systems, the service that each participant
may receive is limited to the user's contribution. In payment-based
systems a profit-oriented agent buys and sells
network storage from and to participants, respectively. I described user "selfishness" with
a non-cooperative game theoretic  model and
studied the overall user benefit in both systems. I also provided necessary and
sufficient conditions for a system to socially
outperform the other.
Next, I developed the symmetric system model so that users were able to "selfishly" choose the peers, on which they stored data [2, 3]. Selection of storing partners is based on their characteristics (network availability, Internet connection's upload and download bandwidths). In case of mutual willingness, two users offer the same amount of storage space to each other. I demonstrated that the selfish peer selection encourages users to raise their devoted contribution to the system in terms of online availability and dedicated bandwidth. This essentially improves the overall quality of service offered by the system. My work also included extension of a well-known matching theory problem , which allowed me to formulate the algorithmic problem of peer selection in the game theoretic context. I have proposed a fast (polynomial time) algorithm that increases the user satisfaction as much as possible via the organized peer selection. I have verified my model and the effectiveness of the algorithm through simulation tests.
Basically, to maintain a P2P backup system, participating users must share three types of resources: disk space, bandwidth and time spent connected to the network. However, a system with a small number of users might not be able to guarantee the appropriate quality of service merely with the shared user resources available. Therefore, I examined the effects of introducing a central storage server in order to avoid such (temporary) situations and presented the cost implications of persistent quality guarantees . In such a hybrid system, the central, reliable (highly available) server might be used to store data in exchange for the reimbursement of costs. Furthermore, it can also be used to restore data from the remaining copies when backup is lost on peers that leave the system. I have shown that the relatively low occurring costs of such a hybrid system allow for highly relaxing certain incentive rules.
Subsequently, I built analytic models to study the system's other, relevant operational parameters both in pure P2P and hybrid systems. Then, I evaluated various options of data redundancy, data maintenance and network data transfer schemes with simulations. To show reasonable comparison between different system designs, I introduced simple metrics to describe the quality of service, e.g., duration of archiving and data recovery process, probability of backup loss. I determined the settings that provided well-suited service quality for backup purposes and, at the same time, implied the possible smallest burden on users (i.e., minimal shared resources). Also suggested a new procedure for determining data redundancy, and evaluated its performance against currently available techniques. The redundancy-determining method is based on the time required to retrieve backup data and it guarantees high service quality levels while significantly reducing the applied redundancy and so users' storage and bandwidth requirements.
Finally, I validated the results of my research by implementing the developed algorithms in a P2P backup system application . I wrote a user program with the established incentive rules and tested the created prototype in various settings. The experiments were implemented on a global research network, the PlanetLab and included hundreds of simulated users. The need for incentives has been shown by comparing performance results of experiments with and without symmetric peer selection mechanism. The careful adjustments of data redundancy also proved beneficial to other, existing methods. The experiments revealed that safe, free and user-friendly backup service is feasible in a P2P system.
My research findings include extensive investigation of distributed storage and
backup systems with
various features. The presented
combination of matching and game theory models may provide new insights
into existing problems. The defined
performance metrics and the novel data redundancy
calculation approach based on the data retrieval duration may
affect other P2P storage system designs (not necessarily only for backup
addition, the prototype can provide a basis for developing practical backup
The real impact of the proposed incentive mechanism could only be measured during operation in a system with real users. Therefore, setting up a real experimental testbed is certainly one of the directions of future work. Observable online availability pattern of users also worth further investigations: peer selection based on the daily, weekly recurring regularity may further reduce the resources necessary to share.
Related own publications
 Patrick Maille, Laszlo Toka: Managing Peer-to-Peer Data Storage System in a Selfish Society, IEEE JSAC Special issue on "Game Theory in Communication Systems", Volume 26, Issue 7, September 2008, pp. 1295–1301
 Pietro Michiardi, Laszlo Toka: Selfish Neighbor Selection in Peer-to-Peer Backup and Storage Applications, Euro-Par'09, 15th International Conference on Parallel and Distributed Computing, Delft, The Netherlands, 2009
 Laszlo Toka, Pietro Michiardi: Analysis of User-Driven Peer Selection in Peer-to-Peer Backup and Storage Systems, Springer Telecommunication Systems, Special Issue dedicated to GameComm'08, 2010
 Laszlo Toka, Matteo Dell'Amico, Pietro Michiardi: Online Data Backup: Peer-Assisted Approach, IEEE P2P'10, IEEE International Conference on Peer-to-Peer Computing, Delft, The Netherlands, 2010
 Attila Csoma, Laszlo Toka, Attila Vidacs: A Peer-to-Peer Backup System with Incentives, TSP'10, 33rd International Conference on Telecommunications and Signal Processing, Vienna, Austria, 2010 (submitted)
BME Department of Telecommunications and Media
Télécom Bretagne: http://www.telecom-bretagne.eu/
Amazon S3: http://aws.amazon.com/s3/
Game Theory: http://en.wikipedia.org/wiki/Game_theory
List of references
 Eytan Adar, Bernardo A. Huberman: Free Riding in Gnutella, Technical report, Xerox Palo Alto Research Center, 2000
 Martin J. Osborne, Ariel Rubinstein: A Course in Game Theory, MIT Press, 1994
 David Gale, Lloyd S. Shapley: College admissions and the stability of marriage, American Mathematical Monthly, Volume 69, Issue 1, 1962, pp. 9–15