Piotr Borowski

CS student exploring Web development, with algorithmic background

Client-server application for distributed file storage

source code and manual: github

Below is a short overview of the biggest individual project of my bachelor’s degree: a command-line application, written in C++, for storing files on a group of cooperating servers.

Architecture

The application consists of server nodes and client nodes that communicate over the network according to a protocol predefined by the task author. Nodes cooperate in a group that can grow arbitrarily large, and they can join or leave it dynamically. All servers provide the same functionality, as do all clients; no node plays a special role. Client nodes browse, upload, download or delete files in the group, while server nodes store them.

Nodes form a group by using the same multicast IP address. Each server provides some non-volatile storage for files, so the total capacity of the group changes as nodes join and leave. Every file stored in the group is located on exactly one server, and every client has full access to it.
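To make the group idea more concrete, here is a minimal sketch of how a node could join a multicast group with Linux sockets. It is not the project's actual code: the function names `make_membership` and `join_group`, the use of `SO_REUSEADDR`, and binding to any interface are assumptions made for illustration.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Builds the membership request for a group address in dotted-decimal form.
// Throws if the string is not a valid IPv4 multicast address.
ip_mreq make_membership(const std::string& group_addr) {
    ip_mreq req{};
    if (inet_pton(AF_INET, group_addr.c_str(), &req.imr_multiaddr) != 1)
        throw std::invalid_argument("not an IPv4 address: " + group_addr);
    if (!IN_MULTICAST(ntohl(req.imr_multiaddr.s_addr)))
        throw std::invalid_argument("not a multicast address: " + group_addr);
    req.imr_interface.s_addr = htonl(INADDR_ANY);  // let the kernel pick the interface
    return req;
}

// Opens a UDP socket bound to `port`, joins the group and returns the descriptor.
int join_group(const std::string& group_addr, std::uint16_t port) {
    ip_mreq req = make_membership(group_addr);
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) throw std::runtime_error("socket() failed");

    int reuse = 1;  // assumption: several nodes on one machine may share the port
    setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof reuse);

    sockaddr_in local{};
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(port);
    if (bind(sock, reinterpret_cast<sockaddr*>(&local), sizeof local) < 0 ||
        setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, &req, sizeof req) < 0) {
        close(sock);
        throw std::runtime_error("could not join multicast group");
    }
    return sock;
}
```

Every node that calls `join_group` with the same address and port receives the datagrams sent to that group, which is what lets clients discover servers without knowing their individual addresses.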

Client and server nodes communicate by exchanging packets in a fixed format over UDP. An example of such a packet:

#include <cstdint>

struct SIMPL_CMD {
    char cmd[10];          // command type, e.g. "ADD", "NO_WAY", "HELLO"
    std::uint64_t cmd_seq; // unique number used to match requests and responses
    char data[];           // content, e.g. list of files on the server
                           // (flexible array member: a compiler extension in C++)
};
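A packet like the one above has to be turned into bytes before it goes on the wire. The sketch below shows one way to do that without relying on the in-memory layout of the struct. It assumes the `cmd` field is padded with zero bytes and `cmd_seq` is sent in big-endian order; the real protocol may define these details differently, and the names `Packet`, `encode` and `decode` are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::size_t kCmdLen = 10;              // width of the cmd field
constexpr std::size_t kHeaderLen = kCmdLen + 8;  // cmd + 64-bit cmd_seq

struct Packet {
    std::string cmd;
    std::uint64_t cmd_seq;
    std::string data;
};

// Serializes a packet: cmd padded with zero bytes to 10 bytes, cmd_seq as
// 8 big-endian bytes (an assumption), then the raw data.
std::vector<unsigned char> encode(const Packet& p) {
    assert(p.cmd.size() <= kCmdLen);
    std::vector<unsigned char> out(kHeaderLen, 0);
    std::copy(p.cmd.begin(), p.cmd.end(), out.begin());
    for (std::size_t i = 0; i < 8; ++i)
        out[kCmdLen + i] = static_cast<unsigned char>(p.cmd_seq >> (8 * (7 - i)));
    out.insert(out.end(), p.data.begin(), p.data.end());
    return out;
}

// Parses a datagram into `out`; returns false if it is too short for a header.
bool decode(const std::vector<unsigned char>& buf, Packet& out) {
    if (buf.size() < kHeaderLen) return false;
    auto cmd_end = std::find(buf.begin(), buf.begin() + kCmdLen, 0);
    out.cmd.assign(buf.begin(), cmd_end);  // stop at the first padding byte
    out.cmd_seq = 0;
    for (std::size_t i = 0; i < 8; ++i)
        out.cmd_seq = (out.cmd_seq << 8) | buf[kCmdLen + i];
    out.data.assign(buf.begin() + kHeaderLen, buf.end());
    return true;
}
```

Writing the fields byte by byte keeps the wire format independent of struct padding and of the byte order of the machine, which matters when nodes run on different hardware.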

File transfers, on the other hand, go over TCP, since it is connection-oriented and, unlike UDP, retransmits lost packets.
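Even over TCP, a single `send` call may write only part of a buffer, so file transfer code usually loops until everything is out. The following is a minimal sketch of that pattern, not the project's actual code. The helper names `send_all` and `recv_until_eof` are hypothetical, and `MSG_NOSIGNAL` is Linux-specific.

```cpp
#include <sys/socket.h>
#include <sys/types.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Writes the whole buffer to a connected stream socket, retrying on
// partial writes and on interruption by a signal.
void send_all(int sock, const char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    std::size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(sock, buf + sent, len - sent, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR) continue;
            throw std::runtime_error("send() failed");
        }
        sent += static_cast<std::size_t>(n);
    }
}

// Reads until the peer closes its sending side and returns everything received.
std::string recv_until_eof(int sock) {
    std::string out;
    char chunk[4096];
    for (;;) {
        ssize_t n = recv(sock, chunk, sizeof chunk, 0);
        if (n == 0) return out;  // orderly shutdown by the peer
        if (n < 0) {
            if (errno == EINTR) continue;
            throw std::runtime_error("recv() failed");
        }
        out.append(chunk, static_cast<std::size_t>(n));
    }
}
```

In this sketch the end of the file is signalled by the sender shutting down its side of the connection; a protocol could just as well announce the file size up front.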

Challenges

The fundamental difficulty, and also the purpose of this project, was using Linux sockets. It took me some time to wrap my head around sockets, binding, addresses, ports, connecting and so on, just to be able to send the first bytes over the network.

Once nodes could exchange data, concurrency issues followed. Many clients might want to send a file to the same server at once, or one client might try to delete a file while another is downloading it. Even simple console logging sometimes came out interleaved when two or more threads were each handling a TCP connection. To deal with this I used mutexes and tried to leave as little room as possible for a nasty execution flow.
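The logging problem is the easiest one to illustrate. Below is a minimal sketch of a logger that guards a shared stream with a mutex; the class `Logger` and the demo function `log_concurrently` are hypothetical and only show the general technique, not the code from the project.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Serializes writes to one output stream so that lines coming from
// different threads are never interleaved.
class Logger {
public:
    explicit Logger(std::ostream& out) : out_(out) {}

    void log(const std::string& line) {
        std::lock_guard<std::mutex> lock(mutex_);  // released at end of scope
        out_ << line << '\n';
    }

private:
    std::ostream& out_;
    std::mutex mutex_;
};

// Demo: several threads log concurrently into an in-memory stream.
std::string log_concurrently(int threads, int lines_per_thread) {
    std::ostringstream sink;
    Logger logger(sink);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&logger, t, lines_per_thread] {
            for (int i = 0; i < lines_per_thread; ++i)
                logger.log("worker " + std::to_string(t) + " sent chunk " +
                           std::to_string(i));
        });
    for (auto& w : workers) w.join();
    return sink.str();
}
```

The same pattern, one mutex per shared resource and the shortest possible critical section, applies to per-file state such as "this file is currently being downloaded".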

Another time-consuming task was making the application robust against invalid packets, requests and user input in general. One of the requirements was to report such errors together with a description. Problems could surface deep in the call stack, e.g. in the middle of sending a packet, so C++ exceptions were helpful.
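As an illustration of why exceptions fit here, the sketch below throws from a low-level check and reports the description in one place at the top. The exception type `InvalidPacket`, the validation rules in `check_command` and the message format are all assumptions for this example; the real protocol's rules and required error format may differ.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Thrown anywhere in packet handling; what() carries the description.
class InvalidPacket : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Hypothetical low-level check: a command name is 1 to 10 characters,
// uppercase letters or '_' only.
void check_command(const std::string& cmd) {
    if (cmd.empty()) throw InvalidPacket("empty command");
    if (cmd.size() > 10) throw InvalidPacket("command longer than 10 bytes");
    for (char c : cmd)
        if (!(c >= 'A' && c <= 'Z') && c != '_')
            throw InvalidPacket(std::string("unexpected character '") + c +
                                "' in command");
}

// Top-level handler: the error surfaces here no matter how deep it occurred.
// Returns the line that would be reported to the user.
std::string handle(const std::string& cmd, const std::string& sender) {
    try {
        check_command(cmd);
        return "ok";
    } catch (const InvalidPacket& e) {
        return "invalid packet from " + sender + ": " + e.what();
    }
}
```

For example, `handle("", "10.0.0.5:4000")` yields `invalid packet from 10.0.0.5:4000: empty command`, while the code between the check and the handler needs no error-passing logic of its own.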

There were additional points for synchronizing the server group: servers should talk to each other so that no file is duplicated. It was not an easy task to implement; however, we could describe our solution in a readme file. My idea, based on the Ricart-Agrawala algorithm and Lamport timestamps, was correct. Or at least the scoring says so.
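For readers unfamiliar with the two building blocks, here is a minimal sketch of a Lamport logical clock and of the request ordering that Ricart-Agrawala relies on. It shows only the textbook rules, not the solution from the readme; the names `LamportClock`, `Request` and `has_priority` are chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Logical clock following Lamport's rules.
class LamportClock {
public:
    // Local event or message send: advance the clock, return the new timestamp.
    std::uint64_t tick() { return ++time_; }

    // Message receipt: move past both the local time and the sender's timestamp.
    std::uint64_t receive(std::uint64_t remote) {
        time_ = std::max(time_, remote) + 1;
        return time_;
    }

    std::uint64_t now() const { return time_; }

private:
    std::uint64_t time_ = 0;
};

// A request to enter the critical section, e.g. to add a file to the group.
struct Request {
    std::uint64_t timestamp;
    int node_id;
};

// Ricart-Agrawala ordering: the earlier timestamp wins, the node id breaks ties.
bool has_priority(const Request& a, const Request& b) {
    return a.timestamp != b.timestamp ? a.timestamp < b.timestamp
                                      : a.node_id < b.node_id;
}
```

In Ricart-Agrawala, a node that wants the critical section sends a timestamped request to all others and proceeds once everyone has replied; a node defers its reply when its own pending request has priority under the ordering above.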

Experience

At the beginning this task was a nightmare for me because of the high barrier to entry. But with the help of my friends I came to like the project and enjoyed it more and more as it progressed. Sending files between two computers over Wi-Fi with my own code gave me a lot of satisfaction. In the end everything worked as planned and I got a full score but, most importantly, I learned a lot.