Introduction

In his book Distributed Systems: Principles and Paradigms, Tanenbaum discusses the difficulty of accurately characterising distributed systems and alludes to the generally poor definitions given in the literature, stating that “none of them [are] satisfactory, and none of them [are] in agreement with any of the others". Distributed systems are so prevalent that it is difficult to come up with a definition that captures them all. Tanenbaum therefore provides a loose characterisation of a distributed system as a “collection of independent computers that appears to its users as a single coherent system", which is sufficient within the context of this work. In this report, we focus on exploring decentralised systems, specifically those with a peer-to-peer (p2p) architecture, and introduce the main output of this project: Butter.

Background

In this section we provide some background on distributed systems, distributed architectures and how they give rise to centralised and decentralised properties, as well as overlay and peer-to-peer networks.

Distributed systems & architectures

Distributed systems have a few notable characteristics. Firstly, the “differences between the various computers [constituting the system] and the way in which they communicate are mostly hidden from the user", meaning that the underlying organisation of the system is abstracted away from the user. Secondly, “users and applications can interact with a distributed system in a continuous and uniform way, regardless of when and where the interactions take place", which makes them well suited to delivering application services over the internet. Finally, distributed behaviour often lies in between the logical application layers of the system, hence it is often referred to as the distributed behaviour middleware.

Distributed systems can be organised in several ways depending on how the software components are placed and how they interact; each organisation leads to an instance of a system architecture. Figure [fig:distributed_taxonomy] illustrates the taxonomy of distributed systems in terms of the different possible architectures.

In this report, like many other works on distributed systems, we will think in terms of clients initiating connections to servers to fulfil requests. A server is a “process implementing a specific service", such as a database, while a “client is a process that requests a service from a server by sending it a request and subsequently waiting for the server’s reply". Thinking in these terms helps us reason about the system and manage its complexity.

When a system is distributed into logical layers, such as a database, processing components and a user interface, the distribution is vertical, as data flows from one layer to the next. In this architectural style, users interact with the service through an endpoint, the top layer of the vertical stack. The user-level clients are distinct from, and mutually exclusive with, the service-providing nodes.

While having a vertical distribution, from a systems’ management perspective, can help to logically and physically split system components across machines to provide a service, it is not the only way of organising a distributed system. For instance, distributing clients and servers in such a way that they operate based on their own data set (i.e., their view of the system), thus balancing load, is referred to as horizontal distribution. This distribution style is what gives rise to decentralised systems.

Centralisation & decentralisation

One of the pitfalls of non-distributed systems (i.e. just a single process) or a system with at least one vertical step in its distribution is that they introduce centralisation. Centralisation is when there are unique central processes that others depend on to fulfil a service. If a serving component of the system fails, clients become unable to access information and interact with the service. This precariousness comes about due to the inherent interdependence of the system subcomponents. For example, a typical architecture for a web application includes a vertically distributed database, data processing server and user interface. The resulting system is distributed yet centralised: if any one of the subcomponents fails, e.g. the database, users are unable to interact with the service. Large, heavily centralised systems consist of few servers relative to the number of clients, hence if a server becomes unavailable, a large number of clients may be unable to function. In some critical systems, where information availability is of paramount importance, a centralised system is at a higher risk of failure, making the system less dependable.

A subtle design to note is that a system may still be centralised even if the underlying server infrastructure is itself horizontally distributed. For example, many cloud services lie behind load-balancing servers. The load-balancing server does not itself provide the core service but plays a supporting role in balancing server load. While the underlying system running the service is horizontally distributed, there is still only a single endpoint with which the client interacts: that of the load-balancing server. This is illustrated in Figure [fig:vertHorArch], where there is a single publicly available load-balancing server which acts as an endpoint for the overall service.

A good example of the risks caused by a centralised service is the ‘2021 Facebook outage’ which made Facebook, WhatsApp and Instagram unavailable for over six hours on October 4th 2021. An engineer made a configuration error while updating a router protocol resulting in the Border Gateway Protocol (BGP) “disconnecting Facebook data centres globally". This, in turn, prevented Facebook’s main DNS server from connecting to other nodes. Facebook has a globally distributed architecture, with backup databases and data processing servers, but users still interact with the services through their domain. If users are unable to resolve the facebook.com domain, they cannot access the underlying service regardless of whether their internal architecture is highly horizontally distributed. This was further worsened by engineers being unable to access the buildings to fix the configuration error as the card authentication system also relied on Facebook’s own DNS servers. This case study shows that high levels of centralisation and dependency can be risk-prone and suggests that relying on large centralised cloud services to provide communication services may be unwise.

Decentralisation, which arises from horizontally distributed architectures, can offer some distinct advantages over a centralised design for delivering certain services. Given that decentralised systems are devoid of hierarchical organisation or centralised control, there is no single point of failure; however, this comes at a significant performance cost relative to a centralised equivalent providing the same service. Notably, maintaining information across several nodes comes at the cost of time complexity (as several nodes must process the information), space complexity (as some form of redundancy is often introduced) and, most significantly, message complexity (as nodes need to communicate with each other to determine their partial views of the system). Designing efficient decentralised protocols is of paramount importance for a decentralised system to effectively deliver its service.

Overlay networks

An overlay is a logical network that is implemented on top of some underlying network (see Figure [fig:overlay_network]). There is a one-to-one mapping between the nodes in each layer, i.e., each node in the overlay is associated with a node in the underlying network. The overlay nodes process and forward packets in an application-specific way. There can exist several overlay networks on top of the same underlying network each implementing its own application-specific behaviour and overlays can also be nested, one on top of another.
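To make the one-to-one mapping concrete, the sketch below is purely illustrative (the type and field names are our own, not from any particular overlay): each overlay node wraps exactly one underlay address, and application-specific forwarding operates on overlay identifiers while delivery uses the underlying network.

```go
// Illustrative sketch of an overlay node (names are our own): a logical
// identifier paired with the address of the single underlying host that
// implements it, plus application-specific links to other overlay nodes.
package overlay

type NodeID string

// OverlayNode maps one overlay identity onto one underlay address.
type OverlayNode struct {
	ID         NodeID
	Underlay   string            // e.g. "192.0.2.10:4000" on the underlying network
	Neighbours map[NodeID]string // overlay links, resolved to underlay addresses
}

// NextHop picks the underlay address to forward to for a destination overlay
// node; here a direct neighbour lookup, whereas real overlays apply their own
// application-specific routing.
func (n *OverlayNode) NextHop(dst NodeID) (string, bool) {
	addr, ok := n.Neighbours[dst]
	return addr, ok
}
```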

Peer-to-peer systems

In this work we will focus on a horizontally distributed system architecture known as peer-to-peer. Peers are “distributed computational entities each of which are considered equally important in terms of initiating an interaction and each of which provides its own resources". The peers cooperate and collaborate to provide a service, often to a distributed community of users. In a peer-to-peer system, nodes behave both as clients and servers. This symmetry supports community resource sharing; however, for effective service delivery, a fault-tolerant design is required.

Peer-to-peer systems are both a “social and technical phenomenon". They allow for community resource aggregation to share storage space (e.g. Gnutella, Napster, BitTorrent), CPU cycles (e.g. SETI@Home), or support collaborative environments (e.g. Groove), hence p2p networks are often used to support community projects that may otherwise be unachievable. Several factors have fostered the growth of such systems: first, the low cost and high availability of large numbers of computing and storage resources; second, increased network connectivity; and finally, a desire to become less dependent on cloud services.

Peers form self-organising networks that are ‘overlayed’ on underlying network infrastructure. By their nature, they have the advantage of providing services with “massive scalability, and fault-tolerance". However, because peer-to-peer systems are decentralised (i.e., fully horizontally distributed), managing data consistency as well as data and service availability is more complex. In many cases it is “difficult to provide guarantees with peer-to-peer systems because the peers come and go"; we refer to this process as ‘churn’.

Note that it is not uncommon to see the terms ‘node’, ‘peer’ and ‘process’ used interchangeably in works on distributed systems and peer-to-peer systems. In this work we will use the terms ‘node’ and ‘peer’ interchangeably.

Types of peer-to-peer architectures

As seen in Figure [fig:distributed_taxonomy], there exist two types of peer-to-peer architectures: structured and unstructured. In a structured architecture, the “network is constructed using a deterministic procedure". Most commonly the network implements a Distributed Hash Table (DHT), where the overlay network “assigns keys to data items and organises its peers into a graph that maps each data key to a peer". This produces a structured graph which enables efficient information retrieval. Common structured overlay network protocols include Chord and Kademlia.
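The deterministic key-to-peer assignment behind a DHT can be illustrated with the following simplified sketch (our own rendering, not the Chord or Kademlia algorithms themselves): keys and peers are hashed onto the same identifier space, and a key is stored at the peer whose identifier follows it on the ring.

```go
// Simplified sketch of DHT-style key placement (not Chord or Kademlia
// themselves): keys and peers share one hashed identifier space, and a key is
// assigned to the first peer whose identifier is >= the key's identifier.
package dht

import (
	"crypto/sha1"
	"encoding/binary"
	"sort"
)

// id hashes an arbitrary string onto a 64-bit identifier ring.
func id(s string) uint64 {
	sum := sha1.Sum([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

// PeerForKey returns the peer responsible for key, wrapping around the ring
// if the key hashes beyond the largest peer identifier.
func PeerForKey(key string, peers []string) string {
	if len(peers) == 0 {
		return ""
	}
	k := id(key)
	sort.Slice(peers, func(i, j int) bool { return id(peers[i]) < id(peers[j]) })
	for _, p := range peers {
		if id(p) >= k {
			return p
		}
	}
	return peers[0] // wrap around the ring
}
```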

In an unstructured overlay network, each node is atomic and indistinguishable from any other node. The protocols do not attempt to organise peers, instead constructing a random graph. The resulting network is flat, though it can also be organised into hierarchical overlays. In this paradigm, randomised algorithms are used for maintaining the network. Each node maintains a list of its neighbours (known hosts), and this list is constructed somewhat randomly. A node makes decisions based on its partial view of the network: its immediate known hosts and the information it has collected during its lifetime. The network can at best “offer probabilities that quality goals will be met, and these probabilities typically increase with the size of the population of peers".

For retrieving information, a structured p2p architecture is significantly more efficient than an unstructured one. In an unstructured overlay, queries for content are spread to a large fraction of peers, and there is no coupling between topology and information location, so the overlay topology cannot be exploited to improve search efficiency. On the other hand, unstructured networks are better suited to high-churn environments, in which nodes frequently join and exit the network ungracefully because of link or process failure. In a structured network there is an overhead for joining and exiting the network, which comes from the ‘bootstrapping’ process whereby the network re-structures itself to allow the insertion or removal of a node while maintaining a consistent structure. In a community-driven service operating over the internet, nodes are often unreliable: frequently joining, leaving, failing, or otherwise ungracefully disconnecting from the network. This means that unstructured networks can offer better performance in high-churn environments.

Motivation & Goal

This project was mainly motivated by a sense of disillusionment with ‘cloud’ services. Cloud services are “infrastructure, platforms, or software that are hosted by third-party providers and made available to users through the internet". They are often advertised as highly dependable solutions, however, as the Facebook outage example in Section 1.1.2 shows, this is not always the case. The problem of over-reliance on centralised cloud infrastructures is particularly pertinent for community services such as Wikipedia. Wikipedia is a valuable resource of public information hence a decentralised design could be beneficial. It would allow the information to be highly available and devoid of controlling authorities making it much more difficult for malicious users to corrupt or censor information. Finally, given the information is contributed by the community of users it seems logical to have it hosted by the community as well, giving rise to the notion of community infrastructure.

There is an elegant efficiency to decentralised designs. In a heavily centralised, busy network, the servers are expected to handle most of the load while many clients remain idle. In a peer-to-peer system, on the other hand, everyone contributes resources for the benefit of the network. This gives rise to an interesting model of internet consumption where all users are expected to contribute if they want something from a service. Instead of monetising services through advertising or subscription fees, users collaborate to provide infrastructure. With both open-source software and community-hosted infrastructure we have the ability to provide entirely community-driven services.

In addition, the design challenge of building a peer-to-peer system provides grounds to explore how information can exist beyond specific infrastructure and to develop probabilistic systems that reflect how people deal with information in the real world. The result is a system that, in effect, behaves fairly unpredictably, hence we need to design algorithms and protocols that maximise the probability of the service being delivered correctly.

Finally, instead of simply adopting a pre-existing decentralised network stack such as libp2p, we built one, because despite the benefits of a decentralised approach, the majority of popular p2p networks include structured elements such as lookup tables, super-peers or Distributed Hash Tables (DHTs). These are introduced primarily to improve network performance by reducing message complexity. However, structured network elements reintroduce some of the pitfalls of the client-server model. This motivated the project’s goal of exploring entirely unstructured p2p architectures and their potential for highly-available, fault-tolerant service delivery.

Contribution: Butter

A node in a peer-to-peer system has to have “abilities to enable routing, efficient search of data items, selection of nearby peers, redundant storage, permanence, hierarchical naming" amongst many other features. In this project, instead of building a specific instance of a decentralised application, where the distributed behaviour is tightly coupled with the application specific logic, we built a framework allowing the isolation of the distributed behaviour middleware. This framework creates a consistent interface for building decentralised applications which is later used for the case studies. In addition, having a highly modular framework allowed for experimenting with different peer-to-peer solutions and protocols. The main output and contribution of this project is the creation of the framework named ‘Butter’, after its approach to managing information availability by ‘spreading’ it across nodes.

Problem brief

This section serves as a brief introduction to the main identified problems that a peer-to-peer middleware needs to handle. Greater depth is provided for each problem in Section 3, however, it may be useful to be aware of each of the problems before reading Section 2 and [ch:projectManagement].

Outline

To better understand this report, it is important to note that this project blurs the line between a software engineering project, trying to piece together and implement protocols and technologies to produce a product, and an academic research project, proposing new designs and protocols, experimenting and collecting data. With that said, the report cannot be structured like a pure software engineering report or a research paper. In addition, building and designing a distributed middleware such as Butter requires solving many problems before the system is functional, so, the implementation is broken down into modules for each problem. These modules are introduced in Chapter 3 allowing us to present the problem statement, related work, design and implementation as well as testing and evaluation in a self-contained section.

In this report, we will present some similar projects to Butter and discuss how the project was managed. Then we will describe the process of building the Butter framework. Finally, we will discuss some of the case studies, possible future work and conclude. In the conclusion we will summarise the report, discuss the legal, social and ethical considerations for Butter and peer-to-peer systems more broadly and finish with a personal statement on the project from the author.

Core problems and the Butter module that implements the solution

Problem                                   | Butter module
Peer discovery                            | Discovery
Known host maintenance (edge selection)   | Known host management
NAT traversal and internet discovery      | Wider discovery
Persistent information                    | Persistent information
Information retrieval                     | Information retrieval

Related projects

As discussed in the introduction, the high-level output of this project is Butter, a peer-to-peer framework and networking stack for building decentralised applications. In this section of the report we will look at other similar projects that were a source of inspiration and helped inform this work. For the related work and literature review of a specific problem in designing peer-to-peer distributed systems refer to Section 3.

Many of the early peer-to-peer networks were popularised by file sharing systems such as Gnutella, Napster and the BitTorrent protocol. Peer-to-peer architectures provided new means of accessing and sharing information with a greater degree of freedom, as the networks were not governed by any single institution. However, this freedom came at a cost: shared information could break copyright law, and it was inherently very difficult to hold any single individual accountable. No single organisation or entity was responsible for the system’s infrastructure, as users would contribute their personal resources. This method of networking and information propagation has since been adopted by various other systems such as mobile ‘ad-hoc’ networks and peer-to-peer sensor networks.

Gnutella

Gnutella is, or rather was, a peer-to-peer file sharing system. It was initially developed by Justin Frankel and Tom Pepper at a subsidiary of AOL and released in March 2000. Within 24h of its release, AOL attempted to take down the network over concerns about the copyright liability of a platform created to freely share files. However, Gnutella was released as open source software under the GNU General Public Licence and AOL were unable to take it down before copies were made. Gnutella was quickly adopted and developed by diverse groups, becoming the basis for a range of peer-to-peer networks. Despite AOL’s attempts to take it down, the ball was set in motion giving rise to what would become the first large scale peer-to-peer network. Over the next decade, Gnutella would be the subject of large amounts of research and work to improve its performance and efficiency.

At its core, Gnutella is a protocol for search over a flat topology of nodes. Although files are stored locally by individual nodes (i.e. a single node, not the entire network, is responsible for storing a file), Gnutella implements a decentralised model for document location and retrieval. Figure [fig:gnutella] shows the process of retrieving the Strawberries.txt file on a Gnutella network. In this model, every node is both a server and a client; it can both query and deliver information. The network topology is flat and each node has the same functionality, so nodes can be termed ‘peers’.

On the Gnutella network there is no centralised directory and the system does not precisely control the network topology or file placement. The placement of data items is not based on any knowledge of the topology, as it is in a structured peer-to-peer design. To locate a data item, a node queries its neighbours, gradually flooding the network in a breadth-first-search manner. The lookup query is flooded to all neighbours, and if no match is found, the query is then forwarded to their neighbours. The query is typically bounded to a certain radius, specified with a Time-to-Live (TTL) flag. This design is resilient to high churn rates, i.e., peers entering and leaving the system. However, as the network grows, the search mechanism does not scale effectively and generates unexpected loads on the network.
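A minimal sketch of this TTL-bounded flooding search is shown below (our own simplification: real Gnutella floods asynchronously over TCP, caches message IDs to suppress re-broadcasts, and back-propagates hits along the reverse path).

```go
// Minimal sketch of a TTL-bounded breadth-first search over an unstructured
// overlay; a deliberate simplification of the Gnutella query mechanism.
package flood

type Node struct {
	Addr       string
	Files      map[string][]byte
	Neighbours []*Node
}

// Query explores the overlay breadth-first, at most ttl hops from the
// querying node, and returns the first copy of the requested file found.
func (n *Node) Query(name string, ttl int) ([]byte, bool) {
	type hop struct {
		node  *Node
		depth int
	}
	visited := map[string]bool{n.Addr: true}
	queue := []hop{{n, 0}}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if data, ok := cur.node.Files[name]; ok {
			return data, true
		}
		if cur.depth == ttl {
			continue // radius reached; do not forward further
		}
		for _, peer := range cur.node.Neighbours {
			if !visited[peer.Addr] {
				visited[peer.Addr] = true
				queue = append(queue, hop{peer, cur.depth + 1})
			}
		}
	}
	return nil, false
}
```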

To join the network, a node initially connects to one of several hosts that are known to be highly available, these are listed on the Gnutella website. This is one of the main weaknesses of the project. While the search mechanism is decentralised and is hence robust and fault-tolerant, the way of first joining and interacting with the network requires a form of centralisation. The ability for new nodes to join is entirely dependent on the availability of the known hosts listed on the Gnutella website. If the known hosts or the website itself were to become unavailable, then no new nodes would be able to join. Having nodes unable to join the network affects the overall probability of the network fulfilling its service.

Once connected to the network, peers can send messages to each other (refer to Figure [fig:gnutella]). These messages are either directly communicated to all peers with which the sender has open TCP connections or back-propagated, i.e., sent on a specific connection on the reverse of the path taken by an initial broadcast message. Each peer keeps a short cache of the recently routed messages in order to prevent re-broadcasting and to implement back-propagation.

To become a member of the network, a node has to open at least one connection with other peers already on the network. Peers periodically send PING messages to their neighbours to discover other participating peers. A peer decides who to connect to based only on local information. The entire application-level network is composed of peers and open TCP connections as links. The resulting network is a dynamic, self-organised network of independent entities.

Later versions of Gnutella introduced super-peers or ultra-peers. These are self-appointed peers with better bandwidth connectivity. Requests are preferentially routed via super-peers, if possible, to help improve the routing performance of the network. However, the design is still limited by the fundamental need to flood large parts of the network to retrieve information.

Finally, while Gnutella laid some of the foundations for implementing a decentralised search mechanism, it does not provide means for improving information availability by having information maintained by the network rather than being strictly tied to a node.

JXTA

JXTA was initially released by Sun Microsystems in 2001. It is a network programming and computing platform that is designed to solve a number of problems in distributed computing, especially in peer-to-peer networking. The project was discontinued when Sun was acquired by Oracle in 2009. JXTA provided a network programming platform specifically designed to be the foundation for peer-to-peer systems. The resulting platform was independent of specific transport protocols, languages and application logic.

Figure 2.1: P2P software architecture. JXTA technology provides a layer on top of which services and applications are built. Credit: http://www.jxta.org

Figure 2.1 shows how the JXTA architecture stack breaks down into three layers. At the bottom, the core layer deals with peer establishment, communication management such as routing, and other low-level elements. The six protocols at the JXTA core are listed in Appendix 9. In the middle, a service layer handles higher-level concepts, such as searching and file sharing. The top layer is for application services such as messaging, and storage systems.

BitTorrent

BitTorrent is a centralised peer-to-peer protocol that was originally conceived in 2001 by Bram Cohen. The protocol was designed to deliver a file sharing service similar to that of Gnutella. In BitTorrent, like in Gnutella, the burden of file storage is on the community, however, BitTorrent systems also have central serving nodes that handle location management. This enables much greater performance in information retrieval while maintaining some of the advantages of peer-to-peer architectures, namely, sharing bandwidth and load across the network and mitigating the need for data centres.

The protocol is designed to incite contribution by taking a ‘tit-for-tat’ approach where a peer responds with the same action that its other collaborating peer performed previously, e.g., if a node downloads a file hosted by other nodes, it subsequently hosts the file.

The architecture (depicted in Figure [fig:bittorentArchitecture]) consists of a central location, called a tracker, and many peers. When attempting to download a file from the network, a peer initially connects to a known tracker to download a .torrent file. This file contains metadata about the requested file, such as its length, name, hashing information and the URL of a tracker.

Trackers keep track of all the peers hosting a file using a protocol layered on top of HTTP. A downloader sends information about the file it is downloading to the tracker. The tracker responds with a list of contact information about the peers that are downloading the same file. Downloaders then use this information to connect to each other. A downloader that has the complete file, known as a seed, must send out at least one complete copy of the original file. Each downloader announces to all of its peers which piece of information it has. When a peer finishes downloading a piece, it checks that the hash matches with that of the .torrent file, and announces that it has that piece to all of its peers. This verifies data integrity.
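The per-piece integrity check can be illustrated as below; this is a sketch assuming BitTorrent's usual SHA-1 piece hashes, and the function name is ours rather than part of the actual .torrent encoding or any BitTorrent client.

```go
// Sketch of per-piece integrity checking (assumes SHA-1 piece hashes as used
// by BitTorrent; the name PieceValid is illustrative, not a real client API).
package torrent

import (
	"bytes"
	"crypto/sha1"
)

// PieceValid reports whether a downloaded piece matches the expected hash
// taken from the .torrent metadata; only then is the piece announced to peers.
func PieceValid(piece []byte, expected [sha1.Size]byte) bool {
	sum := sha1.Sum(piece)
	return bytes.Equal(sum[:], expected[:])
}
```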

libp2p

libp2p, unlike Gnutella or BitTorrent, does not deliver a specific user service but rather provides developers with a platform to build decentralised applications, much like JXTA. At the time of writing, it is used as the underlying peer-to-peer networking stack behind the IPFS project, Filecoin (a cryptocurrency based on sharing files) and the Ethereum cryptocurrency. libp2p is not a single library but rather a continuously maintained specification for the implementation of peer-to-peer protocols. The project’s ambition is to inform and make developing peer-to-peer applications significantly easier in an effort to increase their ubiquity.

Figure 2.2: libp2p Peer IDs that are unique across subnetworks

The project introduces some interesting innovations such as Peer IDs, which can uniquely identify peers across subnetworks, producing an overlay that bridges networks (see Figure 2.2). One of the major innovations of libp2p is the introduction of ‘multiaddresses’, which specify a peer and communication protocol in a standard string format. This allows an address to imply not only a recipient but also the protocol for addressing it. Peers can hence form connections between each other using many protocols for different scenarios. The connection between peers in libp2p is called a stream. More than one stream can be active between the same pair of peers, and each stream is logically independent of the others. When a new stream is opened, the two peers negotiate which protocols should be used: when a peer proposes a protocol, the other can accept it or send a message saying that it does not support it. When a consensus is reached, the peers start using the agreed-upon protocol.
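For illustration, a multiaddress encodes each protocol and its parameter as successive path segments. The sketch below parses one such string using only the standard library; the address shown is a made-up example, and real applications would use the go-multiaddr package rather than splitting strings by hand.

```go
// Illustration of the multiaddress format; the address is a made-up example
// and real libp2p code would use the go-multiaddr package instead.
package main

import (
	"fmt"
	"strings"
)

func main() {
	// network protocol + address, transport protocol + port, then peer identity
	addr := "/ip4/198.51.100.7/tcp/4001/p2p/QmExamplePeerID"
	parts := strings.Split(strings.TrimPrefix(addr, "/"), "/")
	for i := 0; i+1 < len(parts); i += 2 {
		fmt.Printf("%-4s -> %s\n", parts[i], parts[i+1])
	}
	// Output:
	// ip4  -> 198.51.100.7
	// tcp  -> 4001
	// p2p  -> QmExamplePeerID
}
```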

To discover new peers in the network, libp2p uses multicast broadcasting for local peers and bootstrap endpoints that enable the peer to be discovered by the wider network. When a peer is online, it can observe the network to learn which are the most reliable peers, i.e., highly available peers. By storing their IDs in a bootstrap list, it can then readily join the network again without a prompted bootstrap endpoint. Peers can share information about highly available nodes between themselves. This approach introduces some central points of failure in the discovery mechanisms; however, libp2p offers a fault-tolerant solution by providing several discovery mechanisms and relying on rendezvous as a fallback.

For NAT traversal, libp2p offers several protocols. If the router supports UPnP (Universal Plug and Play) or NAT-PMP (NAT Port Mapping Protocol), libp2p automatically tries to configure the router to enable inbound traffic. Another technique used by libp2p is to listen for incoming connections on the same public port of the router as the port the router associates with the peer’s outbound connections. While these are far from perfect solutions, they are some of the best options currently available. NAT traversal could be avoided with wider adoption of the IPv6 protocol by Internet Service Providers (ISPs). IPv6 allows for the unique identification of all internet-connected devices, but for the moment, most ISPs do not support the new protocol.

libp2p offers three protocols for content routing: Multicast DNS (mDNS), Kademlia DHT (KAD), and Publish-Subscribe. mDNS consists of broadcasting a query and asking local peers to identify themselves, while KAD leverages a Distributed Hash Table to find a peer with specific content. The Publish-Subscribe protocol has two implementations: FloodSub (based on network flooding) and GossipSub. In GossipSub peers are organised in a 2-layer logical network: a sparse layer, called full-message, where the published messages are exchanged using gossiping algorithms, and a dense layer, called metadata-only, used to maintain the other layer.

libp2p modules

One of the core characteristics of the libp2p project is that it is fundamentally modular; this is expressed in their logo and conveyed throughout their documentation. Each module can be imported separately and independently of any other module and addresses a specific problem in peer-to-peer distributed computing. The libp2p module architecture can be seen in Figure 2.3.

Building Butter

In this chapter we cover the research and technical aspects of building the Butter framework. We shall discuss some general characteristics of the framework and global design decisions that have influence across all the modules. Then we shall dive into the specifics of each module, the problem(s) it seeks to solve, related work, design and implementation. We will then evaluate the module’s design based on testing when appropriate.

Butter is a networking stack and framework for building decentralised applications (dapps). Butter’s collection of modules can be used in conjunction to handle all the networking behaviour of a user-specified decentralised application. Furthermore, the framework is designed to feel similar in use to other backend web development frameworks such as Django or Express. As a developer, you append extra functionality to a Butter node to describe the user-level application processing, and the rest is handled by the framework. In other words, the framework’s goal is to allow application services to be delivered, in a decentralised fashion, with minimum friction.
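To give a flavour of what “appending extra functionality to a node" might look like, the sketch below is purely hypothetical: the identifiers are our own and are not Butter’s actual API. The idea is only that a developer registers application-level handlers while discovery, known host management and retrieval happen behind the framework.

```go
// Purely hypothetical sketch of the intended developer experience; these
// identifiers are illustrative and are not Butter's actual API.
package main

import "log"

// Node stands in for a Butter node with its networking modules attached.
type Node struct{ handlers map[string]func([]byte) []byte }

// Register appends user-level application behaviour to the node, in the
// spirit of route handlers in frameworks such as Express or Django.
func (n *Node) Register(route string, h func([]byte) []byte) {
	if n.handlers == nil {
		n.handlers = make(map[string]func([]byte) []byte)
	}
	n.handlers[route] = h
}

// Start would hand control to the framework: discovery, known host
// management and information retrieval all happen behind this call.
func (n *Node) Start() { log.Println("node running") }

func main() {
	var node Node
	node.Register("greet", func(payload []byte) []byte {
		return append([]byte("hello "), payload...)
	})
	node.Start()
}
```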

Figure [fig:butter-platform-taxonomy] shows how the framework lies within the wider networking stack. The top layer is a user-defined application which interacts with the Butter modules. The architecture is similar to that of JXTA as seen in Figure 2.1. Butter provides a high-level API for developers to use, abstracting away the underlying behaviour that handles the distributed aspects of the system.

Like in the Gnutella implementation, Butter nodes perform tasks normally associated with both clients and servers. On one hand, they provide client-side interfaces through which users can query other nodes of the network, while at the same time they also accept queries from nodes and respond based on their partial view of the system. The decentralised design should result in highly fault-tolerant characteristics, as operation of the network will not be interrupted if a subset of nodes goes offline.

Before covering the design specific to each module, we introduce the core design themes that run throughout the framework. These are listed below:

Table 3.1: Butter modules and the core technology used to implement each solution

Module                       | Core technology
Discovery                    | UDP multicast
Known host management        | Known host quality metric
Wider discovery              | Port forwarding & Ambassadors
Persistent storage           | PCG
Information retrieval (IR)   | Random TTL BFS

Table 3.1 can be used as a quick reference summarising the core technologies used in each module. Greater detail is provided in each of the module sections.

Finally, in order to reason about the module designs, we should stress the importance of message complexity in distributed systems. As computer scientists we are used to seeing time and space complexity when assessing the theoretical efficiency of a design; however, in distributed systems there is an extra factor to take into account: message complexity. In the literature, message complexity and communication complexity are used interchangeably to denote the amount of communication required to solve a problem when the input is distributed among two or more parties; graphically, it can be expressed as the maximum number of messages transmitted over any edge.

Testbed

Butter provides a testbed that enables stress testing and experimenting with Butter peer-to-peer networks. The tool can be used to spawn n nodes on a single machine (using goroutines), artificially introduce churn and add random data to the network. These steps can be carried out in various user-defined sequences with timeouts allowing network recovery. Specific features of the testbed can be extended based on the testing requirements of the module.
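A hedged sketch of how such a testbed sequence might be scripted is given below; the helper names are hypothetical and the real testbed’s interface may differ. The shape is the important part: spawn n nodes in goroutines, give discovery time to settle, then introduce churn and allow recovery.

```go
// Hypothetical sketch of a testbed run; the helper names are illustrative
// and may differ from the real testbed's interface.
package main

import (
	"math/rand"
	"time"
)

// spawnNode stands in for starting a Butter node in its own goroutine.
func spawnNode(id int, quit <-chan struct{}) {
	// a real node would bind a port, run discovery and serve requests here
	<-quit
}

func main() {
	const n = 50
	quit := make([]chan struct{}, n)
	for i := 0; i < n; i++ {
		quit[i] = make(chan struct{})
		go spawnNode(i, quit[i])
	}
	time.Sleep(30 * time.Second) // timeout so discovery can settle

	// Introduce churn: ungracefully kill a random fifth of the nodes.
	for _, i := range rand.Perm(n)[:n/5] {
		close(quit[i])
	}
	time.Sleep(30 * time.Second) // allow the network to recover before adding data
}
```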

Limitations & challenges

Firstly, it is important to note how difficult it is to accurately test peer-to-peer networks in simulation. The systems are designed to connect different remote devices and hence without access to large amounts of network testing hardware, it can be difficult to run accurate simulations. Simulating nodes on a single system will often be limited, despite efforts to add randomised latency and introduce node and link failure. While the simulation might not present a realistic environment it does give an opportunity to create extreme scenarios to test edge cases.

In addition, there are hardware constraints to testing on a single system. Generating a large simulated network across many threads, introducing churn, and opening and closing ports in rapid succession locally on the system can cause the testbed to behave unpredictably. Hence, the number of nodes that can be spawned is bounded by the test device’s resources.

Nodes behave asynchronously, so they can take unpredictable amounts of time to spawn (e.g. blocking while the OS allocates a port or while a thread is created), and it is difficult to determine how long a spawned node will take to discover other nodes on the network. This makes it difficult to create and destroy nodes in rapid succession. We are forced to include timeouts to give nodes sufficient time to start up and connect to the network (i.e. discover other nodes). This results in unpredictable success of test runs and/or very long test runs (up to several hours).

An interesting future improvement to the testbed would be to allow the generation of specified network topologies. This would enable controlled tests of very specific edge cases.

Discovery

Before any service can be delivered by a node, it needs to be known by other nodes, i.e., a node cannot benefit from or provide a service to the network if it is unknown by the network. In an effort to remove any form of centralisation, nodes cannot communicate with a known endpoint at spawn because there is no known endpoint. In other words, when the node is first spawned, it is not aware of any other nodes and hence cannot participate in the network, so the problem can be thought of as: how does a node get known by other nodes and conversely how do other nodes get to know the newly spawned node?

Note that this version of the peer discovery problem is only relevant in local area networks (LAN). Local area networks generally provide highly reliable communication facilities based on broadcasting, making it much easier to develop distributed discovery systems. Please refer to the Wider discovery section to see how nodes discover each other across subnetworks.

Here we will discuss some notable mechanisms and technologies used to enable communication between initially unknown nodes on LAN.

Broadcasting

There are two types of network links: point-to-point links and broadcast links. A point-to-point link consists of a single ‘sender’ process communicating with a single ‘receiving’ process (often referred to as a listening process). Broadcast links, on the other hand, can have multiple sending and receiving nodes, all connected to the same shared broadcast channel. In essence, broadcasting allows all hosts connected to a network to share the same communication channel, so a packet sent by one host is received by all the other hosts on the network.

In broadcasting, we often specify the address of the intended recipient in the address field of the packet. While the packet is sent to all others on the network, only the recipient host processes it. However, it is also possible to address a packet to all hosts on the network by specifying a special code in the address field. When such a packet is transmitted, it is received and processed by all the hosts in the network.

One of the main limitations of broadcast is that it has no mechanism to limit the recipients of a broadcast and so sends packets to all devices on a local area network. This is not of much importance on small local networks but can introduce significant bandwidth usage on larger LANs.

Multicasting

Multicasting is a transmission method in which copies of a packet are transmitted to the group of hosts in the network interested in receiving it. The relationship between source and destination is one-to-many, as opposed to one-to-all for broadcasting. In multicasting, the destination address is specified as a group address.

Multicast group membership is configured when devices send ‘join’ packets to an upstream router. The routers and switches keep track of this membership; so when multicast packets arrive at a switch, they are only sent to devices that want them.

Multicast DNS

Multicast DNS (mDNS) is a technology originally developed at Apple under the name Bonjour that has since been adopted as an internet standard. It is used to locate a device or service by name on a small local network without using a pre-configured name server, i.e., a DNS server. While the protocol uses the same packet structure and commands as DNS, it does not rely on a DNS server; instead, computers on a network create their own local DNS records and store them in memory. When a host on the network requires the IP address of another host, it sends a DNS query using a multicast UDP message. All mDNS hosts see this query and the host storing the IP address responds. Because messages are exchanged using multicast, all other mDNS hosts see this exchange, can make a note of the network name and IP address, and can then update their local cache.

Design & implementation

The Butter peer discovery mechanism is loosely inspired by the mDNS protocol. Multicast was preferred over broadcast in an effort to minimise wasted bandwidth: with multicast, only devices running Butter node processes receive and interpret the packets.

As a node spawns, it initially has no known hosts. When a node’s list of known hosts is empty, the node goes into discovery mode. In discovery mode, the node, at regular intervals, sends a PING packet containing its listening address along a UDP multicast channel. All peers have a background procedure that listens out for incoming PING packets. If a PING packet is received by a remote host, it attempts to append the new host to its list of known hosts (given enough available memory) and responds with a PONG packet containing its own listening address. If a discovering node receives no response, a 10-second timeout occurs before trying again.

Please see Algorithms [alg:ping] and [alg:pong] to view the pseudocode for the discovery mechanism.

This implementation uses goroutines, Go’s lightweight threads, to run the asynchronous procedures. In addition, Go’s standard net package provides an implementation of a UDP multicast server and client, which is used to initialise the UDP multicast channel.
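As a rough sketch of how this can be wired up with the standard net package (the multicast group, port and packet format below are illustrative assumptions, not necessarily those used by Butter):

```go
// Sketch of a multicast PING/PONG discovery loop using only the standard
// library; group address, port and packet format are illustrative.
package discovery

import (
	"log"
	"net"
	"strings"
	"time"
)

const groupAddr = "239.0.0.1:9999" // hypothetical multicast group for discovery

// ping periodically announces this node's listening address over the
// multicast group until at least one known host is found.
func ping(listenAddr string, haveKnownHosts func() bool) {
	addr, err := net.ResolveUDPAddr("udp4", groupAddr)
	if err != nil {
		log.Fatal(err)
	}
	conn, err := net.DialUDP("udp4", nil, addr)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	for !haveKnownHosts() {
		if _, err := conn.Write([]byte("PING " + listenAddr)); err != nil {
			log.Println("ping failed:", err)
		}
		time.Sleep(10 * time.Second) // default discovery retry interval
	}
}

// listen runs in the background on every node, adding newly announced peers
// to the known host list and replying with a PONG.
func listen(addKnownHost func(addr string)) {
	addr, err := net.ResolveUDPAddr("udp4", groupAddr)
	if err != nil {
		log.Fatal(err)
	}
	conn, err := net.ListenMulticastUDP("udp4", nil, addr)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	buf := make([]byte, 1024)
	for {
		n, src, err := conn.ReadFromUDP(buf)
		if err != nil {
			continue
		}
		msg := string(buf[:n])
		if strings.HasPrefix(msg, "PING ") {
			addKnownHost(strings.TrimPrefix(msg, "PING "))
			// reply so the discovering node learns about us too
			conn.WriteToUDP([]byte("PONG"), src)
		}
	}
}
```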

Once an initial connection is made, the PING procedure stops and will only restart if the node detects it no longer has any known hosts. Nodes always listen for incoming PING packets. Once connected, a node can learn about other peers on the network by querying its newly known host about other remote hosts.

Testing & Evaluation

The approach taken by Butter is similar to that of libp2p when discovering local peers. While it is effective, and uses significantly less bandwidth than broadcasting, there are some limitations. Firstly, the ping timeout interval affects the speed at which a new node can be discovered by the network. In the worst case, discovery can take as long as the ping timeout interval plus any latency.

In a simulated testing environment, using the Butter testbed, a slow ping rate affected network functionality as nodes programmatically attempted to retrieve information before having any known hosts. In a practical environment this should not pose an issue, but the module does provide a user parameter to change the default 10-second interval so it can be set to best suit the operating environment. In most practical cases, a 10-second interval should suffice.

Finally, as discussed previously, the main limitation of this method is that it is restricted to LAN discovery. In addition, on certain LAN networks with extra security protocols, UDP multicasting to certain reserved groups may fail, and hence nodes will be unable to discover each other. A solution may be to implement several fall-back discovery protocols, as in libp2p; this could be developed in a later version of the project.

Known host management

Known host management in an unstructured peer-to-peer system is arguably simpler than in a structured peer-to-peer architecture. Graphically, if a node is a vertex, its known hosts determine its edges. The direction of an edge implies ‘who knows who’: if two nodes are known to each other, the edge is bidirectional, but it is also possible that hosts are not mutually known to each other.

In a structured network, a certain topology needs to be maintained in order for the overlay network to behave as intended. In an unstructured topology, there is no need to maintain known hosts in a specific structure; however, it is important to maintain a balance in the number of known hosts, so that the network is sufficiently connected to function effectively without exceeding a node’s resources. Nodes have finite memory to store known hosts, so peer selection becomes important.

One option is to simply choose known hosts on a ‘first-come-first-served’ basis; however, this causes issues in some cases. For example, take three severely memory-restricted nodes that only have the capacity to store one known host each. Upon spawning, the first two would become aware of each other and store each other in their known host lists, leaving the last node unknown and hence unable to participate in the network (illustrated in Figure [fig:memoryRestrictedUnmanaged]). This edge case highlights that we want to design a known host selection and maintenance protocol that, in addition to maintaining a list of live known hosts, always accepts hosts that would otherwise be unknown by the network (illustrated in Figure [fig:memoryRestrictedManaged]).

A few extra properties of known hosts might be desirable: for example, it might be easier to query large parts of the network if node q’s known hosts themselves know a lot of other hosts. Furthermore, q may want known hosts it can rely on, i.e., hosts that are highly available. Finally, it might also be good to know hosts with lots of available storage so that q can readily share information with its peers. Intuitively, we might be inclined to think that we should optimise known hosts to be highly available nodes, with plenty of available storage, that know lots of other nodes. However, if that were the case, new nodes would be actively disregarded by the network, as they would inherently have low uptime and know relatively few other nodes. We have to make the distinction between what is good for the node and what is good for the network.

Gnutella group membership messages

While the Gnutella project does not implement a very complex known host management protocol it does implement a Group membership protocol in which a peer joining the network broadcasts PING messages to announce its presence. The message is then forwarded to its neighbours, initiating back-propagated PONG messages, which contain information about peers, such as the IP address, number and size of the data items.

JXTA peer information protocol

Much like Gnutella, JXTA does not attempt to actively optimise a node’s known hosts but provides a peer information protocol for peers to learn about the capabilities and status of others. For example, a PING message can be sent to see if a peer is alive. A query can also be sent regarding a peer’s properties where each property is returned as a name and a value string.

Design & implementation

In Butter, a node can query other remote nodes to obtain their NodeQuality metadata. The data contained in NodeQuality is: uptime, available storage space and number of known hosts. Nodes periodically request the NodeQuality metrics from their known hosts and cache the data between updates.

Known host list maintenance

Firstly, the simpler part of the Known host management module is handling dead known hosts. During the regular NodeQuality requests, if a node is unresponsive, after a given timeout period, the host is removed from the list of known hosts. This reduces the probability that a node attempts to communicate with a dead host during other operations such as retrieving information or sharing discovered peers. Making sure that the hosts within the list are alive and responding with a minimum delay also prevents dead hosts from taking up list capacity, leaving room for new hosts to join.
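A hedged sketch of the NodeQuality caching and dead-host pruning described above is given below; the field names and refresh loop are illustrative assumptions rather than the exact Butter types.

```go
// Illustrative sketch of NodeQuality caching and dead-host pruning; the
// field names and refresh loop are assumptions, not the exact Butter code.
package knownhosts

import (
	"sync"
	"time"
)

// NodeQuality is the metadata a remote host reports about itself.
type NodeQuality struct {
	Uptime         time.Duration
	AvailableSpace uint64 // bytes of user-allocated storage still free
	KnownHosts     int
}

// hostCache keeps the most recently seen NodeQuality for each known host.
type hostCache struct {
	mu    sync.Mutex
	hosts map[string]NodeQuality
}

// refresh periodically re-queries every known host, caching the response and
// dropping hosts that fail to answer within the timeout built into query.
func (c *hostCache) refresh(interval time.Duration, query func(addr string) (NodeQuality, error)) {
	for range time.Tick(interval) {
		c.mu.Lock()
		for addr := range c.hosts {
			q, err := query(addr)
			if err != nil {
				delete(c.hosts, addr) // treat an unresponsive host as dead
				continue
			}
			c.hosts[addr] = q
		}
		c.mu.Unlock()
	}
}
```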

Peer selection

In addition, the Known host management module is responsible for peer selection. Peer selection becomes a problem when a node’s known host list is at capacity and decisions need to be made as to whether knowing or ignoring a host is best (with regard to both the node and the network).

The approach to peer selection is loosely inspired by various optimisation algorithms (optimal, sub-optimal and soft constraint). We define an optimisation problem to be: “a set of variables, each with an associated domain, an objective function that maps total assignments to real numbers, and an optimality criterion, which is typically to find a total assignment that minimises or maximises the objective function". To avoid the edge case discussed previously, i.e., new peers being unable to get themselves known by pre-existing nodes on the network because they do not fit the desired NodeQuality, each node attempts to optimise its list of known hosts for a diverse distribution of NodeQuality values. In other words, nodes do not optimise for a specific kind of remote host but rather for a diverse set of hosts. In terms of an optimisation problem, the possible known hosts form the set of variables, each with an associated NodeQuality; the objective function determines how diverse a known host list is (based on the node’s own perception of diversity guided by its partial view); and the optimality criterion is to find a set of known hosts that maximises the node’s known host diversity.
The procedure is as follows:

Notice that this algorithm is somewhat reminiscent of AI optimisation algorithms where, given a current known host list state, we evaluate permutations and consider whether the new state is an improvement or not.

The objective function for diversity is based on the node’s partial view. A node looks at its metadata and the metadata of all of its immediate known hosts and classifies them according to this understanding of the network. With this objective function, at capacity, a node will attempt to have an equal distribution of node types, i.e., with varying NodeQuality metrics, by mostly accepting nodes that increase the diversity of its list of known hosts.
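A simplified sketch of this acceptance decision is shown below; the classification buckets, scoring and randomness factor are our own rendering of the idea and may differ from Butter’s implementation. A candidate is accepted when the list has spare capacity, when it would otherwise be unknown (no known hosts), occasionally at random, or when it increases the diversity score.

```go
// Simplified sketch of diversity-driven peer selection; the classification,
// scoring and randomness factor are illustrative, not Butter's exact rules.
package selection

import "math/rand"

type NodeQuality struct {
	UptimeSeconds  int64
	AvailableSpace uint64
	KnownHosts     int
}

// class buckets a host into a coarse type based on this node's partial view.
func class(q NodeQuality) int {
	c := 0
	if q.UptimeSeconds > 3600 {
		c |= 1 // long-lived
	}
	if q.AvailableSpace > 1<<30 {
		c |= 2 // storage-rich
	}
	if q.KnownHosts > 10 {
		c |= 4 // well-connected
	}
	return c
}

// diversity counts how many distinct host classes appear in a known host list.
func diversity(list []NodeQuality) int {
	seen := map[int]bool{}
	for _, q := range list {
		seen[class(q)] = true
	}
	return len(seen)
}

// accept decides whether a candidate should be kept, possibly replacing the
// oldest entry once the known host list is at capacity.
func accept(list []NodeQuality, candidate NodeQuality, capacity int) bool {
	if len(list) < capacity || candidate.KnownHosts == 0 {
		return true // spare room, or a host nobody else would otherwise know
	}
	if rand.Float64() < 0.05 {
		return true // randomness factor to avoid network-wide convergence
	}
	swapped := append(append([]NodeQuality{}, list[1:]...), candidate)
	return diversity(swapped) > diversity(list)
}
```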

An important edge case to consider is when many nodes are spawned in quick succession, with similar parameter settings, e.g., similar user allocated memory. This leads to many nodes with very similar NodeQuality values. In this case, any given node will be optimising for a diverse set of known hosts, but the diversity between nodes and hence the global diversity may be poor. This results in the network converging towards knowing the same hosts. This can be thought of in terms of local and global diversity.

In this scenario, all nodes have similar uptime, available storage and number of known hosts, and hence the same diversity metric. If a new node wants to join the network, it runs through the Discovery protocol (as it knows no hosts) and is quickly accepted by a host (as hosts always accept nodes with no known hosts). However, if the node does not fit within the unanimous diversity metric, it will quickly be rejected by the network; it will then go back into discovery mode, be discovered, and eventually be rejected once again. This will keep recurring.

A solution would be to not have any form of diversity-driven optimisation and simply accept nodes at random; however, this may lead to an uneven distribution of information on the network and poor network performance. Instead, we rely on a few factors to decrease the probability of this occurring. Firstly, in most practical environments nodes are not symmetric: nodes typically have their own unique perception of the world, as they have their own uptime, available storage (user-allocated memory) and collection of known hosts. In addition, a randomness factor is added to the diversity protocol. This means that, on occasion, a node is accepted or removed by chance despite its valuation by the objective function, i.e., regardless of the diversity metric. This prevents a convergence towards only accepting certain node types, in an attempt to improve the probability of global diversity.

Testing & evaluation

An extra optimisation to improve known host list maintenance would be to remove hosts that are found to be dead when carrying out other operations, such as information retrieval. This would allow the list of known hosts to be updated between NodeQuality request intervals. To achieve this we would have to introduce an API endpoint within the Known host management module to enable other modules to make changes to the known host list. However, this has the potential to increase complexity for developers and increases the risk of accidental mismanagement and incorrect removal of known hosts, so this feature is not implemented in the current version of the module.

During the implementation, several different NodeQuality intervals were explored. Obviously, shorter intervals mean that the known host list is a better representation of the known hosts; however, this leads to greater message complexity. For an individual node, the message complexity of the NodeQuality procedure grows linearly with its number of known hosts, but the aggregate number of queries across the network grows much more quickly, as every node queries its own known hosts. The NodeQuality interval should therefore balance maintaining an accurate reflection of the hosts against flooding the network.

Table 3.2: Standard deviation of node types in three test cases, for two nodes chosen at random. The z-statistic compares the distributions across the two nodes.

Test                                      | S.D. Node A | S.D. Node B | Z-statistic
No randomised acceptance, uniform nodes   | 1.0897      | 0.9682      | 0.4797
Random acceptance, uniform nodes          | 0.6614      | 0.9682      | 0.8472
Asymmetric nodes                          | 0.9990      | 0.8291      | 1.0640

In order to test the diversity mechanisms we generated several networks of Butter nodes at random using the testbed. The data from this experiment can be found in Table 3.2. The standard deviation is a measure of distribution of host types. If the standard deviation is small, then we roughly have the same amount of each host type and the local diversity is relatively high. The z-statistic measure is used to compare the distribution between nodes.

In the first instance of the test, nodes were spawned with the same allocated memory in quick succession, resulting in what we would assume to be similar host types. The mechanism that occasionally accepts a node at random regardless of host type was disabled. In this test, we observed slightly higher standard deviations of the host-type distribution in comparison to the other test environments. This suggests that in a network with many uniform nodes, maintaining a diverse set of known hosts is more difficult. Note that in all three tests, the difference in distribution of known hosts between the two nodes selected at random is fairly small, suggesting there is little difference between the two distributions’ diversities. Once we introduce the random acceptance mechanism, the standard deviation decreases, suggesting there is greater diversity in the maintained list of known hosts.

In future, it may be interesting to test how the network diversity mechanisms perform in prescribed topologies. Integrating the Butter testbed with tools such as NetworkX may allow the exploration of edge cases and provide a visual testing environment.

A last factor to consider is that NodeQuality information becomes out-of-date more quickly on higher-churn networks. If nodes are frequently dying and the interval is large, there is a higher probability that hosts in the list are unavailable. Currently, the NodeQuality update interval is a user-specified parameter. In a high-churn simulated environment (see Section 3.5.3.1) we saw significantly fewer failed requests when the update interval was 10 seconds as opposed to 30 seconds. An interesting future improvement might be to implement a dynamic update interval relying on some churn-rate detection mechanism.

Finally, in future, more metadata could be added to the NodeQuality to broaden the diversity metric. For example, it may be interesting to give a sense of the information hosted by a peer, as having diverse knowledge between known hosts may increase the probability of quickly finding some piece of information.

Wider discovery

For peers to communicate over the internet they need to be accessible publicly, so that others can query them, i.e. so that they can serve requests. Once available publicly, they also need to be known by other peers on different subnetworks. This module was designed to address both those problems, referred to here as NAT traversal and Internet discovery respectively.

NAT

Network Address Translation (NAT) is the process of translating an IP address so that it makes sense from one subnetwork to another. It is necessary because, when using IPv4, there are not enough addresses to uniquely identify every device on a large network. Instead, devices on a subnetwork lie behind a router which acts as an endpoint, routing packets to the appropriate device on the local area network. This solution makes it difficult for peers to ‘listen’ for incoming connections behind a router.

When making a request, a temporary port is opened in the router, enabling the server to communicate with the device that made the request. However, if a peer wishes to communicate with a peer on another subnetwork, i.e. it wishes to be served by another peer, it cannot uniquely identify that node, as all it knows is the IP address of the subnetwork (i.e. that of the router), not that of the individual machine that could serve the request. A solution to this is provided by the IPv6 protocol, which introduces a significantly larger address space, enabling unique identification of all internet-connected devices. However, for security reasons many Internet Service Providers have not enabled IPv6, and hence the technology is not yet ubiquitous. The process of establishing and maintaining connections across gateways that implement network address translation is called NAT traversal, and it is a requirement for peers to be able to serve each other.

Internet discovery

Consider the problem of locating a service. In a local area network, a process can simply broadcast a message to every machine, asking if it is running the service it needs. This may be inefficient, but LAN links enable this behaviour. Only the machines that are running the service respond, each providing its network address in the reply message. Such a location scheme is not possible in a wider network such as the internet. Instead, special location services need to be designed.

NAT traversal

Firstly, you can avoid the need for NAT traversal entirely by port forwarding, i.e. allocating a router port that directs incoming requests to the desired machine. Alternatively, there are three main approaches to NAT traversal frequently used in peer-to-peer systems. UPnP is a protocol that requires software support from the router and essentially automates the port forwarding configuration process; however, it is not supported by all routers and is often disabled by default over security concerns. Another approach is STUN, which requires a publicly available server that detects the presence of NAT and helps the machine behind the router determine how it appears to the outside world. The final technique is hole punching, which uses a third, publicly reachable computer to coordinate communication between two peers behind NAT and establish a route between them.

It is important to note that certain firewalls may prevent these techniques from working, so some peer-to-peer systems, such as libp2p or BitTorrent, attempt several techniques simultaneously, depending on what works and is available for the specific instance of communication between the peers.

Internet discovery

Internet discovery can be approached in several ways depending on the architecture of the peer-to-peer system. In structured peer-to-peer networks, discovery comes about by bootstrapping onto the network, i.e. joining the network and letting it restructure itself around the newly joined node. libp2p achieves this in its kad-dht module, where once a node is bootstrapped it can be found according to the protocols of the Kademlia distributed hash table.

In unstructured peer-to-peer networks the problem is significantly harder to solve. Gnutella achieves internet discovery by providing a list of well-known, highly available nodes which act as rendezvous servers and enable peers to discover others across subnetworks. While this technique works, it is not fully decentralised and hence can be prone to failure.

Design & implementation

NAT traversal

Butter does not implement a solution to NAT traversal yet. Instead, users are expected to port forward to make themselves publicly visible to others. While this is not an ideal solution and requires users to have a certain level of technical literacy to manually configure their router, it does mitigate the need for NAT traversal. As discussed in the related work, there are several other possible techniques all of which have their drawbacks.

A possible implementation using UPnP was considered; however, the protocol is unsupported by some routers and disabled by default on most. In addition, it is a difficult protocol to work with, and Go currently does not provide standard libraries to handle the complexity, resulting in the feature being disregarded.

Internet discovery

For internet discovery, Butter introduces Ambassadors, which are similar to rendezvous servers but community driven. Essentially, they are peers like any other with additional functionality that enables them to act as meeting points between peers. They can introduce peers to each other, and hence help to propagate connections between subnetworks. As a user, when spawning a Butter node, you can specify whether you want your node to be an Ambassador, on the condition that it is publicly accessible. An Ambassador node appends a flag to its host quality metadata, which lets its peers know it is an Ambassador.
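
As a rough illustration of how the Ambassador flag might sit alongside the rest of the host quality metadata, consider the sketch below; the surrounding fields and names are assumptions for illustration, not Butter's actual data structure.

```go
package main

import "fmt"

// NodeQuality is an illustrative sketch of the host quality metadata
// exchanged between peers. The report only states that an ambassador flag
// is appended to this metadata; the other fields here are assumptions.
type NodeQuality struct {
	Uptime           float64 // proportion of time the host has been reachable
	AvailableStorage uint64  // bytes the host can dedicate to the network
	Ambassador       bool    // true if publicly reachable and willing to introduce peers
}

func main() {
	q := NodeQuality{Uptime: 0.98, AvailableStorage: 1 << 30, Ambassador: true}
	fmt.Printf("peer metadata: %+v\n", q)
}
```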

Evaluation

Port forwarding is an imperfect solution that relies on users configuring their routers to make themselves publicly available. This is far from ideal, as it requires a certain level of technical literacy to participate in the network. Other techniques exist, but they also have their flaws. A better approach could be achieved with IPv6; however, for the moment we are dependent on Internet Service Providers enabling IPv6 support on their networks.

Testing was difficult for this module as it would have required simulating subnetworks and routers. In the future, further testing will be needed to better evaluate the solution to Internet discovery. It may be interesting to explore simulating subnetworks as an extension to the Butter testbed.

Ambassadors are probably one of the weakest parts of the framework’s design as they introduce some form of centralisation. At least one known endpoint is needed for two subnetworks to be bridged by an Ambassador. Once a single bridge is made, other nodes can learn about and communicate with other publicly available nodes; however, a first bridge still needs to be made. Future versions of Butter will seek to provide a more decentralised approach.

Persistent storage

A fault-tolerant decentralised design can be beneficial with regards to data availability in a service where information needs to be stored. By introducing data maintenance mechanisms we can make information persist beyond an instance of a specific node. Trivially, if there is no possibility of node or link failure, a node can simply transfer the information it hosts to another node before gracefully exiting the network. However, if we introduce the possibility of failure, maintaining a high probability of data retention becomes significantly more challenging.

An obvious solution to making data persist despite failure is to introduce a certain level of information redundancy on the network. However, efficiently managing redundant copies of information is a non-trivial challenge. If the network has a high churn rate, i.e. a high turnover of nodes either gracefully or ungracefully (by failure) leaving the network, this problem becomes highly relevant.

In this module of Butter, an overlay network was designed based on the premise of Peer Content Groups (PCGs). The original design was first modified to remove reliance on structured network elements, and here we suggest further extensions to the protocols, improving the performance of the data retention mechanism while maintaining a decentralised design and usable levels of efficiency.

In this section we will explore some of the existing approaches and technologies that enable persistent storage of information on a peer-to-peer network.

Peer Content Groups (PCGs)

Peer Content Groups provide an intuitive framework for reasoning about persistent information on a network. Instead of thinking about data in terms of individual nodes, we think about data being hosted by logical entities known as PCGs. The original premise for PCGs was to allow for transparent interaction with the network. So, if a peer fulfilling a request fails, the request can still be handled by other members of its group. This puts the responsibility of the quality of service on the peer-to-peer network, rather than on the peer making the request.

The protocol is as follows: when information is added to the network, a group is created, hence, each group maintains one data block on the network, replicating it across its members. A node can be a member of as many groups as it has the memory capacity to store. The network of groups is ‘overlaid’ on the network of Butter nodes. Members of the same group are not necessarily known hosts to each other, so the PCG network may have different edges to the underlying known host network (see Figure [fig:overlayPCG]). Groups recruit new members through the use of advertisements, and a node may join a group by responding to an advertisement. Advertisements introduce the main limitation of the original PCG implementation. In the original PCG protocol there is the notion of super-peers which work as rendezvous points where group advertisements are publicised. This enables efficient communication across the network. However, this reliance on super-peers re-introduces elements of centralisation.

Groups know to advertise for new members based on their group status. Each node maintains its own view of the group status by using heartbeat pings. The heartbeat pings act as an eventually perfect failure detector, i.e. an oracle that eventually outputs an accurate representation of which nodes in the group have failed. When a node receives a heartbeat message from a peer, it updates its localGroupView, i.e. what that group member believes to be the group’s state. If a node does not receive a heartbeat ping from a peer within a given timeout period, that peer is removed from the node’s view of the group members. This process allows the group status to tend towards consensus.
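
A minimal sketch of this timeout-based failure detection, assuming each member simply records the time of the last heartbeat it received from every peer (the localGroupView name follows the text, but the structure is illustrative):

```go
package main

import (
	"fmt"
	"time"
)

// localGroupView maps a group member's identifier to the time its last
// heartbeat was received. It represents what this node believes the
// group's state to be.
type localGroupView map[string]time.Time

// onHeartbeat records that a heartbeat was just received from peer id.
func (v localGroupView) onHeartbeat(id string) {
	v[id] = time.Now()
}

// prune removes peers whose last heartbeat is older than the timeout,
// acting as an eventually perfect failure detector: a dead peer will
// eventually stop refreshing its entry and be removed from the view.
func (v localGroupView) prune(timeout time.Duration) {
	for id, last := range v {
		if time.Since(last) > timeout {
			delete(v, id)
		}
	}
}

func main() {
	view := localGroupView{}
	view.onHeartbeat("peer-A")
	view.onHeartbeat("peer-B")
	time.Sleep(50 * time.Millisecond)
	view.onHeartbeat("peer-A") // peer-B misses its heartbeat
	view.prune(40 * time.Millisecond)
	fmt.Println(len(view)) // 1: only peer-A is still considered alive
}
```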

If the group is in an arbitrarily defined ‘unsafe’ state, i.e. if the group is too low on members and hence the information it is responsible for is at relatively high risk of loss, a leader is elected to publish an advertisement at the rendezvous point. If available, a new node will join the group.

The group membership problem

As seen in Section 3.5.1.1, it is required that PCGs maintain some consensus on the group status, i.e. what group members are still alive and hosting the group information. This allows the data to remain highly available despite node failure. The generalisation of this problem is introduced in Riccardi’s paper as the Group Membership Problem (GMP). The GMP consists of two ideas: eventually perfect failure detection and consensus between non-faulty group members on current group member status.

There are several methods that can be employed to achieve consensus between group members and hence have an accurate group status. Here we discuss two possible approaches: Heartbeat protocols and Randomised gossiping protocols.

Design & implementation

Here we will discuss the design and implementation of Butter’s persistent information storage module based on PCG. Note that throughout this section we assume the use of reliable and ordered links as the implementation is built on top of the TCP protocol.

PCG

The Butter Persistent information storage module implements a modified version of the PCG protocol. This enables the persistence of information beyond specific node instances resulting in transparent content delivery despite high network churn.

PCGs are groups of network nodes, i.e. peers, that contain a copy of a piece of data. Groups improve the dependability of the system by maintaining data availability as long as at least one peer in the group remains fault-free. The group members are responsible for maintaining the integrity of the group. The integrity metric is defined as the number of non-faulty peers n over the desired replication constant r, i.e. how many nodes are hosting a replicated piece of information over how many nodes are expected to be hosting a replicated piece of information. Should n < r, then the elected leader will attempt to rectify the fault by recruiting new peers from its known hosts.
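
The integrity check itself is simple; the sketch below shows the n < r trigger, with the actual recruitment from known hosts left as a comment since it depends on the advertisement mechanism described elsewhere:

```go
package main

import "fmt"

// groupIntegrity is the number of non-faulty members n over the desired
// replication constant r, as defined in the text.
func groupIntegrity(n, r int) float64 {
	return float64(n) / float64(r)
}

func main() {
	const r = 3 // desired replication constant (Butter's default group size)
	n := 2      // live members after one failure

	fmt.Printf("integrity: %.2f\n", groupIntegrity(n, r))
	if n < r {
		// In Butter the elected leader would now attempt to recruit r-n new
		// members from its known hosts; here we only report the deficit.
		fmt.Printf("group unsafe: recruit %d new member(s)\n", r-n)
	}
}
```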

Group membership

As introduced in the GMP, maintaining consensus between peers in a group so that they collaborate to maintain information is one of the core problems to solve. The problem is particularly difficult to solve efficiently in unstructured networks.

Butter’s implementation uses a heartbeat protocol as it provides a simple solution to the two sub-problems in the Group Membership Problem, i.e. failure detection and consensus. However, heartbeat protocols have the primary disadvantage of producing message complexities of $O(n^2)$, which makes them unsuitable for large group sizes. With Butter’s PCG implementation, however, the default group size is relatively small (r = 3), reducing the impact of the quadratic message complexity.

As the primary focus of this design is to maximise availability, a probabilistic approach, such as a gossip-based algorithm, is less suitable (at least until further quantitative testing is carried out). The primary benefits of gossip-based approaches over heartbeats can be seen when group sizes are much larger, making the cost of maintaining consensus between group members impractical. But gossip-based approaches provide only probabilistic confidence in consensus, which leads to a higher risk of information loss.

Another advantage of heartbeats is that they can be modified to provide faster detection of peer failure by changing the heartbeat interval. By changing the heartbeat interval and desired replication constant parameters, the Butter network can be adjusted to better reflect the operating environment. For example, on lower-churn networks, it may be suitable to increase the heartbeat interval and reduce the replication constant to decrease message complexity. So a heartbeat design can be adapted to better suit the specifics of the network by tuning the parameters. Each parameter can affect system performance on multiple metrics such as the probability of information retention, the mean time to detection and network usage.

If heartbeats happen, by chance, to be synchronised, there can be long periods of uncertainty where no members have an accurate representation of the group status. Butter mitigates this by introducing randomised ‘palpitations’. While the heartbeat interval is generally regular for all group members, occasionally a node introduces a random extra heartbeat and resets its interval from that point. This reduces the probability that all the heartbeats remain synchronised, allowing more continuous polling of the group status as the heartbeats are offset.
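
The following sketch shows one way randomised palpitations could be layered on top of a regular heartbeat ticker; the palpitation probability and intervals are illustrative values, not Butter's defaults:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// runHeartbeats sends a heartbeat every interval. Occasionally it also
// fires an extra 'palpitation' after a random delay and restarts the
// cadence from that point, so members whose heartbeats happen to be
// synchronised drift apart over time.
func runHeartbeats(interval time.Duration, send func(), stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			send()
			if rand.Float64() < 0.1 {
				// Palpitation: extra beat after a random fraction of the
				// interval, then restart the regular cadence from here.
				time.Sleep(time.Duration(rand.Float64() * float64(interval)))
				send()
				ticker.Reset(interval)
			}
		case <-stop:
			return
		}
	}
}

func main() {
	stop := make(chan struct{})
	go runHeartbeats(200*time.Millisecond, func() { fmt.Println("heartbeat") }, stop)
	time.Sleep(time.Second)
	close(stop)
}
```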

Groups can be in one of three states:

An illustrated example

Figure [fig:overlayPCG] illustrates an instance of a Butter network with a PCG overlay. In this example we see an underlying LAN where the edges represent physical or local Wi-Fi connections. The directed edges at the Butter level represent a node’s known hosts, and the edges in the PCG layer represent group membership.

The group of P1, P2 and P3 is responsible for maintaining the information for the “Orange" webpage. In the case that r = 3, they form a complete group. The group of P2 and P4 is responsible for “Strawberry" but is in a cold state. In a cold state, a leader is elected amongst the two nodes, and its responsibility is to find a node amongst its known hosts that is able to participate in the group. In this case, say that P4 is elected leader; it interacts with the underlying Butter node B4 and sees that it has available known hosts B2 and B5. Say B5 is asked to join the group; if it has the available storage, it will join and complete the group.

Extra optimisation

A geo tag can be appended to each node’s host quality metadata so that, when a leader is elected to find a new peer to join the group (if the group is deemed to be in an unsafe state), it will favour picking nodes with different geo tags. This attempts to maximise the probability of redundant copies of information being distributed geographically, resulting in less shared infrastructure and improved information retrieval by reducing the average latency and steps taken to discover data.
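
A sketch of this geo-aware recruitment preference, where the leader favours a candidate whose geo tag is not yet represented in the group (the field names are illustrative):

```go
package main

import "fmt"

// host is an illustrative known-host record carrying a geo tag.
type host struct {
	Addr string
	Geo  string
}

// pickRecruit prefers a candidate whose geo tag is not already present in
// the group, maximising the geographic spread of the replicas; if none
// exists it falls back to the first available candidate.
func pickRecruit(candidates []host, groupGeos map[string]bool) (host, bool) {
	for _, c := range candidates {
		if !groupGeos[c.Geo] {
			return c, true
		}
	}
	if len(candidates) > 0 {
		return candidates[0], true
	}
	return host{}, false
}

func main() {
	known := []host{{"10.0.0.2:8080", "eu-west"}, {"10.0.0.3:8080", "us-east"}}
	inGroup := map[string]bool{"eu-west": true}
	if h, ok := pickRecruit(known, inGroup); ok {
		fmt.Println("recruit:", h.Addr, h.Geo) // recruit: 10.0.0.3:8080 us-east
	}
}
```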

Testing & evaluation

In this section we will discuss how the design was tested as well as the different relationships between parameters such as heartbeat intervals and replication count. Based on the tests we evaluate the design and discuss some of the benefits and shortfalls of the implementation as well as what could be improved in future iterations.

Methodology

The testing process is carried out as follows (a simplified sketch of the churn step is given after the list):

  1. The testbed generates n nodes on different ports, each tasked with storing a random string of data.

  2. The test waits for the nodes to spawn and form a network.

  3. chanceToDie and churnTime parameters are specified. The chanceToDie determines the probability that a node is terminated during the churnTime.

  4. The testbed churns the network, simulating nodes failing over time. During churn, new nodes are created to maintain the network node count at n.

  5. The network is left a moment to recover, allowing the remaining nodes to re-create and update their list of known hosts. New nodes are created to replace failed nodes, in order to maintain the number of nodes on the network.

  6. After a given period of time has passed, a new querying node is created with a list of all of the identifiers for information initially added to the network. The querying node attempts to retrieve all of the data that was initially stored in the network during initialisation. This node counts the number of successful and failed information queries, and so provides a metric of the proportion of data that persisted on the network ‘post-churn’. The information retrieval algorithm used by the querying node is BFS (more on this in Section 3.6). BFS allows for a thorough exploration of the network, so that if information is not found we can be confident it is no longer present.
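
A simplified sketch of the churn step (steps 3–5 above); spawnNode and killNode stand in for testbed helpers and are hypothetical names, not part of Butter's public API:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// churn simulates node failure for churnTime: every second, each node has a
// 1-in-chanceToDie probability of being terminated, and a replacement is
// spawned so the network stays at a constant size.
func churn(nodes []int, chanceToDie int, churnTime time.Duration,
	spawnNode func() int, killNode func(id int)) []int {

	deadline := time.Now().Add(churnTime)
	for time.Now().Before(deadline) {
		for i, id := range nodes {
			if rand.Intn(chanceToDie) == 0 {
				killNode(id)
				nodes[i] = spawnNode() // keep the node count constant
			}
		}
		time.Sleep(time.Second)
	}
	return nodes
}

func main() {
	next := 5
	spawn := func() int { next++; return next }
	kill := func(id int) { fmt.Println("killed node", id) }

	nodes := []int{1, 2, 3, 4, 5}
	churn(nodes, 50, 5*time.Second, spawn, kill)
}
```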

Results & evaluation

Experimental data from test rig. Tested on 100 simulated nodes, repeated 5 times at each heartbeat interval setting. Note: Messages sent is the cumulative amount of heartbeat messages sent between group participants over a 60s churn time, chanceToDie was set to 1 in 50 across all heartbeat intervals.
Heartbeat interval (s) | Nb. messages sent | Success rate (%)
10 | 600 | 58.00
5 | 1200 | 60.00
2 | 3000 | 72.00
1 | 6000 | 74.00

There were some initial issues with testing due to the speed of churn in simulation relative to the speed at which the nodes were carrying out heartbeats, i.e. the simulated churn rate was extremely high and heartbeat intervals too far apart. This was resolved by changing the heartbeat interval from 10s to 2s. This does increase the message complexity of the network greatly and so in practice the parameter should be considered carefully, based on the specifics of the network.

To demonstrate how message complexity scales with different heartbeat intervals and how heartbeat intervals influence information retention, we can observe the data in Table 3.3. For a short heartbeat interval $t_h$ (e.g. 1 or 2 seconds), the probability of all the peers maintaining some data X failing within $t_h + t_r$, where $t_r$ is the time to recover to n = r, is very low. However, this requires the nodes to constantly flood the group with heartbeat queries and hence scales poorly.

When the initial tests were carried out, success rate was lower than expected. This turned out to be because heartbeat intervals were all in sync and hence inspired the design for heartbeat palpitations. Once palpitations were introduced and hence the intervals offset, success rate improved greatly.

Experimental data collected from testbed. Tested on 250 simulated nodes, repeated 5 times at each chanceToDie; average is rounded to the closest node. Note: the chanceToDie is per second, the network churn time was 30s and the heartbeats were set to every 2s.
chanceToDie | Nb. failed to retrieve | Success rate (%)
1 in 50 | 63 | 74.80
1 in 100 | 29 | 88.40
1 in 1000 | 4 | 98.40
1 in 10,000 | 0 | 100.00

In an effort to help interpret the data in Table 3.4, consider the probability of an initially spawned node surviving churn when the chanceToDie is 1 in 50. In that case, every second, for 30 seconds, each of the 250 nodes has a 1 in 50 chance of dying. In other words, every second there is a 49/50 chance of survival to the next second. The probability of an initially spawned node surviving the churn stage of the testing process is therefore $(\frac{49}{50})^{30}=0.545$. So, with a 1 in 50 chanceToDie, we can expect just under half of the network to have failed over the course of the churn. With the persistent storage mechanism we have managed to retain on average 74.80% of the original information. While this is not a perfect solution, it shows a significant improvement in terms of availability. The tests stressed the importance of choosing an appropriate heartbeat interval.

Having carried out testing we can consider that the implemented solution has succeeded in providing a certain level of data persistence across a decentralised peer-to-peer network. Despite the limitations of the testbed, Table 3.4 shows that a significant amount of information that would otherwise have been lost, if no mechanisms for data persistence existed, was still present in the network after a period of relatively high simulated network churn.

An interesting future development might be to introduce dynamic group sizes. The groups and hence redundant copies of the data could dynamically grow with the popularity, i.e. frequency of access of a piece of information. This may improve availability for popular information, but more importantly, it would spread the load of information requests ensuring that average file download latency does not increase significantly for highly desired information. In addition it could increase the probability of QUERYHITs in information retrieval. This will be further discussed in Section 3.6.

Finally, PCG provides an elegant way of reasoning about persistent information on the network. The extension of PCG developed here improves the fault-tolerance and scalability of the original design by taking away some of the aspects that introduce centralisation, i.e. the publishing rendezvous super-peers. However, there is still a significant drawback in message complexity that will need to be addressed for networks at scale. A better future implementation may involve implementing randomised gossiping to solve the GMP, however, more testing will be needed.

Information retrieval

Information Retrieval (IR) across an unstructured network is a graph search problem. Distributed search algorithms are designed to successfully locate resources while incurring low overhead (in time, space and message complexity).

In structured networks, the network topology can be exploited to increase search efficiency; this is not possible in an unstructured topology. Hence, efficiently retrieving information is non-trivial in an unstructured network.

In short, the question can be posed as: how to find information quickly and efficiently while relying on a partial view of the network (the limited information available to a node) and not on a central repository of global knowledge. Solutions to this problem can be divided into two primary categories: blind, and informed.

In this section we present several techniques for information retrieval of a specified document on an unstructured peer-to-peer network. Typically flooding, random walks or expanding rings are used to retrieve information stored by peers. Each peer visited evaluates the query locally against its own content, and so can support complex queries.

BFS is the technique that was originally used in the Gnutella network and is illustrated in Figure [fig:gnutella]. The BFS search protocol in a peer-to-peer network is as follows: a querying node generates a QUERY message which is propagated to all its known hosts. When a peer receives a QUERY request, it searches its local storage for a match. If no match is found, it forwards the QUERY to all of its known hosts (other than the sender). If some node q receives the QUERY and has a match in its local storage, q generates a QUERYHIT message to transmit the document. QUERYHIT messages are sent along the same path that carried the QUERY message.

Extension to BFS with time-to-live

The original Gnutella implementation of BFS sacrifices performance and network utilisation for the sake of simplicity. Each query consumes excessive network and processing resources because a query is propagated along all links (including nodes with high latency). This means a low-bandwidth node can easily become a bottleneck. This became apparent as the Gnutella network became more popular and the search mechanisms failed to scale. One technique to avoid overusing network bandwidth is to associate each query with a time-to-live (TTL) parameter. The TTL parameter determines the maximum number of hops that a given query should be forwarded. A typical value for the TTL is 7, and it is decremented each time the query is forwarded. When the TTL reaches 0, the message is dropped. Note that some existing documents may not be located due to the limited TTL.

Random Breadth-first search technique

Kalogeraki and Gunopulos propose an alternative to BFS, namely Random breadth-first search (RBFS), and evaluate its performance. They state that it can be a “dramatic improvement over the ‘naive’ BFS approach". In RBFS, a peer forwards a QUERY to a random subset of its known hosts. The size of the subset can be a user-specified parameter. The researchers suggest half of the peers (a proportion of 0.5), as in testing this proved the best compromise between IR success and message generation. RBFS, like the BFS mechanism, allows nodes to make local decisions quickly since they only need to select a portion of their known hosts. However, the algorithm is probabilistic. It is possible that some large segments of the network may be unreachable because a node was unable to understand that a particular link would lead the query to a large segment of the network.

Directed BFS with ‘Most Results in Past’ heuristic

In a directed BFS approach, each node forwards a query to a subset of its peers based on some heuristic. This technique was originally proposed by Yang et al. They compared a number of query routing heuristics and found that the ‘Most Results in Past’ (>RES) heuristic was the most ‘satisfactory’. In other words, more documents were found in fewer steps using this heuristic. In >RES, a peer q forwards a search message to the n peers which returned the most results for the last k queries. In their experiments they chose n = 1 and k = 10, effectively turning Directed BFS into a directed depth-first search (DFS) approach. This technique may perform well because it routes queries to the larger network segments (which subsequently may also contain more relevant answers).

Intelligent Search Mechanism

The Intelligent Search Mechanism (ISM) is a more sophisticated approach to Directed BFS with ‘Most Results in Past’ heuristic. In an effort to minimise the number of queries made and improve search performance a peer estimates, for each query, which of its peers are more likely to reply to the query, and propagates the query message to those peers only. The mechanism consists of two components: a profiling mechanism used by a node to build a profile of its known hosts (neighbouring peers) and a relevance rank that uses the peer’s profiles to select the neighbours that will lead a query to the most relevant answers.

Gossiping to replicate global state

A different approach to those previously discussed is, instead of searching based on a partial view of the network, to maintain a global state in a decentralised fashion using randomised gossiping. Cuenca-Acuna and Nguyen suggest an approach to constructing a content-addressable publish/subscribe service that uses gossiping of global state across unstructured communities. Their framework is based on a global inverted index, i.e. a globally maintained map of content to its location. The inverted index is partially constructed by each node $n_k$; specifically, $n_k$ constructs a bloom filter $b_k$ of its local index and propagates it to the rest of the network using gossiping. A bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set; in this case it is an efficient way of representing whether $n_k$ hosts a given piece of information. While this is an interesting approach, the paper does state that the technique does not scale, and maintaining a global state across nodes may not be suitable for large networks.
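
To illustrate the idea, the sketch below shows a minimal bloom filter with two hash functions: adding a key sets two bit positions, and a membership test can return a false positive but never a false negative. It is purely illustrative of the data structure, not the cited framework's implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a minimal bloom filter: a fixed bit set with two hash functions.
// It answers "possibly present" (with some false-positive rate) or
// "definitely not present".
type bloom struct {
	bits []bool
}

func newBloom(size int) *bloom { return &bloom{bits: make([]bool, size)} }

// positions derives two bit positions for a key using FNV-1a and FNV-1.
func (b *bloom) positions(key string) (int, int) {
	h1 := fnv.New32a()
	h1.Write([]byte(key))
	h2 := fnv.New32()
	h2.Write([]byte(key))
	n := uint32(len(b.bits))
	return int(h1.Sum32() % n), int(h2.Sum32() % n)
}

func (b *bloom) Add(key string) {
	i, j := b.positions(key)
	b.bits[i], b.bits[j] = true, true
}

func (b *bloom) MightContain(key string) bool {
	i, j := b.positions(key)
	return b.bits[i] && b.bits[j]
}

func main() {
	// A node could summarise its local index like this and gossip the
	// compact bit set to the rest of the network.
	f := newBloom(1024)
	f.Add("recipe-pancakes")
	fmt.Println(f.MightContain("recipe-pancakes")) // true
	fmt.Println(f.MightContain("missing-doc"))     // very likely false
}
```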

Design & implementation

Gnutella showed us that a ‘naive’ BFS is not a practical solution, as it floods the network with queries and so scales poorly for large networks. It is especially impractical because, without a TTL, there is no upper bound to the breadth of a search query. Each search query requires excessive network and processing resources as it is propagated along all links (including high-latency links).

The design and implementation of the Butter Information retrieval (IR) module is largely inspired by the work carried out on later versions of the Gnutella project (discussed in Chapter 2). We did not implement a semi-centralised peer-to-peer architecture such as BitTorrent’s because reintroducing centralised elements, in an effort to increase search efficiency, makes the resulting network less fault-tolerant and hence less dependable. On the other hand, Butter’s IR module needs to retrieve information efficiently with regard to time and message complexity to make application services built using the framework usable.

Butter implements an RBFS technique with a user-specified TTL (per query). Having a user-specified TTL allows the user to define how far they are willing to go to fetch a piece of information. Users also have the option to fall back on a BFS mechanism if the information is not found using the default RBFS mechanism.

The RBFS implementation in Butter is as follows:

The proportion of the known hosts selected can be user specified. If unspecified it defaults to 0.5. An example successful run is illustrated in Figure [fig:succesfulRBFS].
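
A minimal sketch of a forwarding step consistent with the description above (a random proportion of known hosts, with the TTL decremented per hop); it is illustrative rather than Butter's actual code:

```go
package main

import (
	"fmt"
	"math/rand"
)

// forwardRBFS selects a random proportion of the known hosts and forwards
// the query to each of them with the TTL decremented. When the TTL reaches
// zero the query is dropped. send is a placeholder for the network call.
func forwardRBFS(knownHosts []string, proportion float64, ttl int,
	send func(host string, ttl int)) {

	if ttl <= 0 {
		return // query has exhausted its hops
	}
	k := int(float64(len(knownHosts)) * proportion)
	if k < 1 && len(knownHosts) > 0 {
		k = 1
	}
	// Pick k hosts at random without repetition.
	perm := rand.Perm(len(knownHosts))
	for _, idx := range perm[:k] {
		send(knownHosts[idx], ttl-1)
	}
}

func main() {
	hosts := []string{"a", "b", "c", "d"}
	forwardRBFS(hosts, 0.5, 3, func(h string, ttl int) {
		fmt.Printf("QUERY -> %s (ttl=%d)\n", h, ttl)
	})
}
```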

Butter stores information in a ‘chunk’ data structure, which contains a 4 KB array for the information, some optional keywords to associate with that information, and a part/totalParts number. If the data is smaller than 4 KB, it is stored in a single chunk; otherwise, when data is added to the network, it is broken down into several chunks with the appropriate part numbers.

Information is uniquely identified by the hash of the whole information and a chunk number. Initially a query search is based on the hashed information regardless of the chunk number, simply finding the first hash match. Once an initial hit is made, the querying node learns about the number of chunks and hence generates queries for all the remaining parts. This enables the rest of the search to be carried out in parallel and improves retrieval speed.
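
A sketch of such a chunk record and its identifier, assuming a SHA-256 hash of the whole piece of information combined with the part number; the field names and hash choice are assumptions for illustration:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chunk is an illustrative sketch of the 4 KB storage unit described above.
type chunk struct {
	Data       [4096]byte // the information itself (or a 4 KB slice of it)
	Keywords   []string   // optional keywords associated with the information
	Part       int        // which part of the whole this chunk holds
	TotalParts int        // how many chunks the whole piece was split into
}

// chunkID identifies a chunk by the hash of the *whole* information plus
// the part number, so that once one chunk is found the querying node can
// derive the identifiers of all remaining parts.
func chunkID(whole []byte, part int) string {
	h := sha256.Sum256(whole)
	return fmt.Sprintf("%x/%d", h[:], part)
}

func main() {
	whole := []byte("some document added to the network")
	fmt.Println(chunkID(whole, 0))
	fmt.Println(chunkID(whole, 1))
}
```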

Testing & Evaluation

Using the testbed, we spawned nodes and stored random information strings in each of the nodes. We stored the information identifiers in a global database and then spawned another node to systematically query for each piece of stored information. We measured the average number of messages generated and documents found, as well as the average time taken, for the ‘naive’ BFS and RBFS-with-TTL search mechanisms.

Comparing information retrieval techniques. Carried out for 10 queries on 100 simulated nodes.
Technique | Avg. number of messages | Documents found (%) | Avg. time taken (s)
BFS | 587 | 100 | 10.1
RBFS | 235 | 68 | 5.4

Table 3.5 shows a clear improvement in message complexity using a RBFS approach over BFS. However, the % of successfully retrieved documents is significantly lower. There is a compromise to be made between speed, message complexity and success rate. A failed scenario of RBFS is depicted in Figure [fig:failedRBFS]. The value of the proportion of known hosts selected at random could be tweaked from the default 0.5 to other values to improve the success rate.

Two features mentioned in the Persistent storage section could have an impact on IR performance. We mentioned a node geo tag enabling information to be spread geographically, which in turn would improve resilience. This may also help minimise latency and hence improve search time. In addition, dynamic group sizes based on the perceived popularity of a piece of information, i.e. information that is frequently queried, may increase the probability of a hit and spread the load across more nodes; however, this would have to be further tested.

In their work, Zeinalipour and Gunopulos have shown that we can greatly improve the percentage of successful document retrievals, while maintaining message complexity similar to RBFS, by using an informed approach. RBFS does not use any explicit technique to guide the search query towards the most relevant content, which is a desirable property in information retrieval. A future implementation will look at implementing an informed approach, most probably the Intelligent Search Mechanism.

Case studies

While so far we have explored the design and implementation of the framework, it is important to also explore how the framework will be used to develop decentralised applications. Three example decentralised applications were implemented alongside Butter to test features and the development experience. One of the simplest examples was a reverse echo application, where a user submits a string to the service and it returns the reversed string, with the processing happening elsewhere on the network. While this is not a compelling application, it provided grounds for testing. Two significantly more compelling case studies were explored: a decentralised chat application and a wiki application. The chat and wiki applications demonstrate the advantages of decentralisation for communication and for highly available information storage respectively.

Decentralised chat applications are a fascinating service case study, as decentralised communication goes back to the original roots of the Internet’s inception. ARPANET, the predecessor to the Internet, which laid the foundational groundwork for the technology, was conceived as a “computer communications system without a central core, with no headquarters or base of operations that could be attacked and destroyed by enemies thus blacking out the entire network in one fell swoop". To contextualise, this research was carried out at the height of the Cold War. In addition, it seems intuitive to design communication systems in a decentralised way, as this is typically how we as humans navigate and communicate in our daily lives. However, as systems grow in user-base, it becomes natural to employ structured mechanisms to manage communication complexity.

In recent decades we have seen the rise of cloud services, i.e. services provided by third parties. Since this shift, we have blindly moved our habits to these services for practicality, as they are presented as highly reliable and dependable. Many of these services become viable businesses either by asking for user subscriptions in exchange for use or, more commonly, by monetising their service through advertisement revenue. The resulting services are often proprietary, centralised and out of the user’s control. This poses an interesting question: should we allow ourselves to become highly dependent on cloud services, especially for communication services that are so fundamental to our daily lives? If we can trust the cloud service providers, the problem is moot; if not, this model of service delivery cannot be sustained.

Butter’s demo chat application allows direct peer-to-peer communication between peers on the same LAN. This application is still rudimentary and will be further extended to enable direct peer-to-peer communication across the internet once the Wider discovery module is further tested.

The other case study for a Butter application is a wiki. A wiki can be thought of as an encyclopedia service, typically driven by community-contributed information. It is a particularly pertinent case study for a fault-tolerant decentralised information service. The information in a wiki tends to be provided by contributing members of the community. There is no single author, and hence it can be difficult to attribute information to any single figurehead. This model of information service brings about some interesting questions: Who owns the information? Who is responsible for the information and the consequences of its dissemination? Who maintains and hosts the service? In certain scenarios, the service can be of immense value to its users and society more broadly, so it is important that the service be dependable and highly available. In addition, ubiquitous access to information may be desirable, so the service needs to be designed to be devoid of central control and censorship.

Some wikis are hosted and maintained by purely altruistic organisations and individuals. However, this is not always the case, and in some cases it is wise to consider whether we should trust private third-parties to have the interests of the community at heart. This problem could be avoided if the service were delivered by an autonomous decentralised network where, like the information, the infrastructure would be community contributed as well.

In Butter’s wiki demo, the application behaves similarly to a typical wiki. Users can publish and retrieve informative articles, but it is significant to note that there is no central server, the information is not stored in a central index or hosted by a single third-party. Each node instance running the service has never explicitly been made aware of others, yet the service is still delivered. Nodes work together to maintain the information using the PCG mechanisms and the Information retrieval module handles decentralised search across the network removing the need for a central database. This has the effect of creating an autonomous service.

Having said that, there are still severe limitations with Butter and decentralised service delivery more broadly. Something that has been made clear throughout the literature is that the problem of scalable decentralised services has yet to be solved. In addition, it is important to note that not all services should or need to be autonomously delivered. There is a case that services involving personal information should not be autonomous.

Take the example of a blog or social media sharing platform. These platforms may deal with personal information which should be considered with caution. Butter is not yet equipped to deal with personal information. Currently, there are no inbuilt encryption mechanisms and while the autonomy means information is devoid of malicious control it makes controlling personal information on the network difficult. This is further discussed in the legal, social and ethical considerations section (see 5.4) of the project.

Conclusion

Summary

This report has introduced some core concepts in distributed architectures and how they can give rise to decentralised systems. We have described two common types of peer-to-peer architectures: unstructured and structured, and discussed their benefits and limitations. Furthermore, we introduced the motivations and major contributions of this work. In Chapter 2 we gave an overview of several significant projects both past and present focused on building peer-to-peer systems. Aspects of the project’s management were discussed in the ensuing chapter where we briefly justify the decision to gravitate towards an incremental development model and present some turning points in the project’s development.

The technical detail for the project is contained within Chapter 3. There, each module of the framework was described and evaluated in detail. The core problems presented are: local peer discovery, NAT traversal and internet discovery, peer selection, persistent information storage and information retrieval. Each module of the framework attempts to provide a decentralised solution to one or more of these problems. The report finishes by discussing some case studies that present interesting arguments and open a debate about decentralised systems and the resulting autonomy of the services they provide.

Butter vs. libp2p

It may be interesting to briefly look at how Butter compares to its closest neighbour: libp2p.

Having initially been sceptical of libp2p, I have a much better appreciation for its design having built a peer-to-peer framework myself. The concept of multiaddresses is a particularly ingenious way of creating generalised nodes which are capable of providing infrastructure to deliver many types of services. In addition, libp2p’s focus on modular design enables developers to use it for specific problems they encounter without needing to convert their entire existing system to the framework. libp2p’s specification is implemented in JavaScript, Go and Rust, with future plans for Python, Java and Haskell, which should help make it a ubiquitous solution for designing peer-to-peer systems regardless of the language of choice.

Furthermore, another particularly good feature of libp2p is that it is independent of any specific transport layer, making it more flexible and adaptable to different use cases. In addition, the project’s focus on high quality documentation and on promoting research into p2p systems has been a valuable contribution to the field.

Future work

While we have mentioned possible improvements to the individual Butter modules in their respective sections we have not yet discussed broader future directions for the project. It is my full intention to continue to research and improve the Butter framework.

Since the project’s first open source publication, several developers have been in contact discussing possibilities to collaborate and exchange ideas. This is tremendously exciting and there is a lot of work to be done as the problem of scalable decentralised systems has yet to be solved. In addition, I have been in contact with the developers at Protocol labs (libp2p) and the Maidsafe project (SAFE network) discussing future roles and possibilities for contribution.

One of the first improvements to make would be to develop a significantly more robust testbed. One of the main limiting factors in the progress and research of the project was testing, so improved testing utilities would be beneficial. It would be particularly interesting to expand on the work of Zeinalipour on PeerWare. PeerWare is a testbed for information retrieval on peer-to-peer networks that enables testing on specific network topology instances. Adding this functionality to the Butter testbed would enable better exploration of edge cases, particularly for the Known host management module.

Another interesting direction for the project would be to integrate a Butter node with a browser, similar to what is being done by the Beaker Browser project. This has been a long-term goal from the project’s inception, as it provides an elegant means of inciting contribution to the network. This design would support a future Internet consumption model where users are expected to contribute resources as they consume services available on the Internet. The effect would be similar to that of the ‘tit-for-tat’ BitTorrent protocol, where once you download a file you are expected to host it for others (seeding).

Throughout the project, Adam Chester and I have discussed the possibility of collaborating on a paper. The intention would be to provide a summative review of the various problems and protocols present in unstructured peer-to-peer architectures. Furthermore, more work needs to be carried out on the peer selection problem, which has not been widely researched thus far, as most research focuses on structured approaches to generating network topologies. In addition, further extensions to the PCG mechanisms will be required before the network could be widely deployed (see Section 5.4).

Legal, social and ethical considerations

Peer-to-peer systems are both a technical and social phenomenon. As Gnutella has shown, they can be associated with a host of legal and ethical considerations. Notably, peer-to-peer systems have been notoriously used to share information protected by copyright law. This was both the case on the Gnutella network and on some BitTorrent networks.

Allusions have already been made in Section 4 to the limitations of decentralised and autonomous networks in regard to information management. Focusing specifically on Butter, there is currently no option to update or remove information from the network. The PCG mechanism can, in its current form, only store information, and thus far has been entirely focused on information persistence.

The ‘right to be forgotten’ and the right to update and correct information cannot be met in the current implementation of Butter. It will be necessary, before Butter can be put into practical use, to extend PCG to enable update and delete operations as we have a legal and ethical responsibility to design systems that allow incorrect information to be modified and comply with the ‘right to be forgotten’.

Author’s assessment of the project

This project has been fascinating, and I am pleased with my new-found knowledge and proud of the resulting work. I distinctly remember how exciting it was when the peer discovery mechanisms first started working, being able to see two entirely independent computers become aware of each other’s existence. I am particularly pleased with the work on the persistent information storage mechanisms and find the idea of autonomous networks enticing.

This project has allowed me to gain a significant level of expertise in the field of decentralised peer-to-peer systems, and it has without doubt become one of my passions. The framework is open source and is my first project to gain traction in the open-source community. I have had several individuals approach me with a desire for collaboration and have even received a job offer based on the work carried out for the project.

Original objectives

For the reader’s convenience, the objectives from the original project specification are listed below.

Functional objectives

  1. Design and implement a decentralised application platform.

    1. Implement rudimentary TCP communication between peers

    2. Implement peer discovery mechanism

    3. Implement data structure commonly understood by the peer-to-peer network nodes (most likely a decentralised hashtable)

    4. Implement an addressing mechanism, possibly content-based addressing

    5. Implement mechanism in which network data consistency is managed with node exit

    6. Implement agent nodes that help maintain data consistency even with node failure

  2. Build a knowledge base/encyclopedia Wikipedia style application on top as a case study.

    1. Use decentralised platform API to store encyclopedia content

    2. Design a method of accessing and visualising the content

Non-functional objectives

  1. Learn Rust/Go/JS

  2. Learn about decentralised networks

  3. Learn about historic and new approaches to the decentralisation problem

  4. Propose a use case where users contribute as peers whilst navigating the application on the decentralised platform

  5. Discuss the legal, ethical and social impacts of a decentralised knowledge base

  6. Contribute a valuable learning resource to newcomers (such as myself) getting started with decentralised systems

Possible extensions

A possible extension would be to achieve a use case, where a user using a decentralised application, in exchange, contributes resources to the decentralised network similar to how ’torrenting’ or the Beaker Browser work. As a user views and downloads files, they also host them for others on the network.

Out of scope

Enabling the network to connect to peers over the Internet, while being exciting, is out of scope for the project. The Internet adds a layer of complexity as peer discovery is problematic on a network as extensive as the Internet. Discovering peers would require Network Address Translation, moving from one sub-network to another, which is challenging to do in a decentralised way.

Original timeline

Original timeline taken from the project specification.


Revised timeline

Below is an accurate timeline of the progress of the project. This timeline, taken from the project presentation slides, gives a loose overview of the key tasks carried out during the project’s development.

List of JXTA protocols

Project JXTA has defined six protocols so far.