System

Every node of the distributed backup system provides a certain amount of storage space on its local hard drive, which can be used to store backup data. Whenever a file is inserted into the backup system, it is split into several blocks. These blocks are then encrypted using convergent encryption: the key for the symmetric-key algorithm (AES is used here) is the hash of the unencrypted file block. Because identical blocks therefore produce identical ciphertext, file blocks can be shared between different files or even different users, which reduces storage and network requirements. The following figure shows the system’s data flow structure.

Distributed Backup System - Data Flow
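
The convergent encryption step described above can be sketched in Java as follows. This is a minimal illustration, not the project's actual code: the class and method names are hypothetical, and the choice of SHA-256 (truncated to 128 bits for the AES key), CBC mode, and a fixed all-zero IV are assumptions made so that the sketch is deterministic.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;

// Hypothetical helper class; names are illustrative only.
final class ConvergentEncryption {

    // Assumption: the key is the first 16 bytes of SHA-256(plaintext block),
    // giving an AES-128 key that depends only on the block content.
    static byte[] deriveKey(byte[] plainBlock) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(plainBlock);
            return Arrays.copyOf(digest, 16);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumption: a fixed all-zero IV, so equal blocks give equal ciphertext.
    private static Cipher cipher(int mode, byte[] key) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(new byte[16]));
        return c;
    }

    static byte[] encryptBlock(byte[] plainBlock, byte[] key) {
        try {
            return cipher(Cipher.ENCRYPT_MODE, key).doFinal(plainBlock);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] decryptBlock(byte[] encryptedBlock, byte[] key) {
        try {
            return cipher(Cipher.DECRYPT_MODE, key).doFinal(encryptedBlock);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two users holding the same block derive the same key and ciphertext.
        byte[] blockOfUserA = "shared file block".getBytes(StandardCharsets.UTF_8);
        byte[] blockOfUserB = "shared file block".getBytes(StandardCharsets.UTF_8);
        byte[] cipherA = encryptBlock(blockOfUserA, deriveKey(blockOfUserA));
        byte[] cipherB = encryptBlock(blockOfUserB, deriveKey(blockOfUserB));
        System.out.println("ciphertexts equal: " + Arrays.equals(cipherA, cipherB));
    }
}
```

Since each key is derived from a single plaintext block, a given key only ever encrypts that one block, which is why a fixed IV is tolerable in this sketch. A node that receives an already stored ciphertext block can detect the duplicate without knowing the key.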

For each file, a file block list is created, which stores pointers, in the form of identifiers, to the file blocks belonging to that file. The file block list is encrypted using public-key cryptography (RSA is used here). Moreover, the user can specify how many copies of each file block and of each file block list are inserted into the backup system. A higher number means higher resilience, but also higher storage and network requirements. Directories are treated in a very similar way: a directory list simply consists of pointers to the file block lists of the contained files. The proposed system has been implemented in Java, using the freely available Pastry libraries.
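
To make the file block list concrete, the following Java sketch splits a file's content into blocks and collects one identifier per block. It rests on assumptions the description does not settle: the block size is a free parameter, the identifier is taken to be the hex-encoded SHA-256 hash of the block bytes, and all names are hypothetical. The RSA encryption of the list and the Pastry-based insertion of copies are left out.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; names are illustrative only.
final class FileBlockListSketch {

    // Split the file content into blocks of at most blockSize bytes.
    static List<byte[]> split(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            int end = Math.min(off + blockSize, data.length);
            blocks.add(Arrays.copyOfRange(data, off, end));
        }
        return blocks;
    }

    // Assumed identifier scheme: hex-encoded SHA-256 of the block bytes.
    static String blockId(byte[] block) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(block);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // The file block list: one identifier per block, in file order.
    static List<String> buildBlockList(byte[] data, int blockSize) {
        List<String> ids = new ArrayList<>();
        for (byte[] block : split(data, blockSize)) {
            ids.add(blockId(block));
        }
        return ids;
    }
}
```

With a block size of 4, the contents `AAAABBBB` and `AAAACCCC` yield block lists whose first identifiers coincide, so the block `AAAA` would need to be stored only once per requested copy. A directory list could be built the same way, with identifiers of file block lists in place of identifiers of file blocks.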

This work was done as a final course project in 2006 at Wayne State University, Detroit, MI, USA.

Final Project Report (PDF)