On Frameworks for Implementing Distributed Protocols

Sergey Fedorov - Aug 29 - - Dev Community

This post concludes the second phase of the state-of-the-art exploration in the scope of milestone M0.1 of the Replica_IO project, namely exploration of existing frameworks for implementing distributed protocols. It shares the main conclusions drawn from exploring 7 different frameworks.

The original post can be found here.

A companion video is available on YouTube:

Exploring Distributed Protocol Frameworks

Trying to make a real breakthrough, such as what the Replica_IO project aims at, it is important to learn from prior attempts to deal with the problem. Having explored how real-world code bases typically implement the core distributed protocols like consensus and having summarized the findings in the previous post, I continued the exploration by surveying existing attempts to find a better approach to the problem, i.e. different frameworks for implementing distributed protocols.

I looked at the documentation and examples provided by each of the frameworks, as well as into their implementation, in order to figure out how they model and structure distributed systems, what kind of notation is used to specify and implement distributed protocols in those frameworks, what is their approach to communication, concurrency, and composition of protocol components, and what they offer for ensuring the correctness of protocols and their implementations.

After having explored each of the frameworks, I summarized and shared some of my findings. You can find those overviews on this wiki page.

Here is the full list of 7 frameworks, based on different programming languages, that I explored1:

  • Mir — a framework for implementing, debugging, and analyzing distributed protocols (based on Go);
  • Atlas — a modular framework for building distributed mechanisms focused on configurability and performance (based on Rust);
  • Babel — a generic framework for implementing and executing distributed protocols (based on Java);
  • DiStaL — a framework for implementing and executing distributed protocols (based on Scala);
  • PSync — a framework for implementing and verifying fault-tolerant distributed protocols (based on Scala);
  • Disel — a framework for implementation and compositional machine-assisted verification of distributed systems and their clients (based on Coq);
  • Verdi — a framework for implementing and formally verifying distributed systems (based on Coq).

In the subsequent sections, I will share some of the observations and conclusions I made while exploring those frameworks. I decided to structure the discussion around the following aspects:

  • model: how distributed systems and their components are modeled;
  • structure: how distributed protocol components are structured and composed together;
  • notation: what kind of notation is used to specify and implement distributed protocols;
  • operation: how distributed protocol components are executed and interact with each other;
  • verification: how distributed protocols and their implementations are verified for correctness.

But before we go into details, I would like to note that most of those frameworks were purely academic efforts and almost all of them seem to be abandoned now; unfortunately, they didn't seem to have found practical use. Nevertheless, it was good to learn from them. Let's now dive in.

Model

The way that systems and their components are modeled has a profound effect on the structure and shape of their specifications and implementation, on the operational aspects and verification for correctness. We can consider different levels of abstraction when modeling distributed systems: the high-level model of the system as a whole, the model of individual nodes within the system, as well as individual protocols and components within the nodes. Let's take a look at how the explored frameworks model distributed systems and their components.

The common approach is to model distributed systems at high level as state transition systems where the global system state changes upon external and internal events, such as client requests, messages exchanged within the system, and timeouts. In this model, the global system state includes the states of all individual nodes and their components, as well as the environment state. The environment state usually includes the state of the network the nodes communicate through, in particular the messages in transit. Transitioning from one state to another happens according to a global transition function triggered by events. The transition function receives the current state of the system together with the triggering event and returns the new system state.

The explored frameworks follow the message-passing approach, where the states of individual nodes and protocol components are disjoint, i.e. different parts do not share pieces of state. Interaction between nodes happens through the network by sending and receiving messages. Sending and receiving of messages are modeled as events modifying the global system state by updating the set of messages in transit in the network state and, in case of message receiving, the state of the target component in the destination node. For example, in Disel, the system state includes a "message soup" for each protocol, which models the current state and the history of the network. In Disel's abstract model, messages are never removed from the message soup, instead they are marked either as active or consumed.

Nodes and network failures are modeled as special events, e.g. dropping or duplicating messages in the network, disabling normal event handling in faulty nodes or (partially) resetting their state. Some of the frameworks only consider crash faults, leaving Byzantine faults for the future work. The system models in most of the frameworks do not seem to include timing assumption, so they can be considered asynchronous. In contrast, PSync employs the Heard-Of model based on communication-closed rounds, which provides an illusion of simple synchronous communication on top of the partial synchrony of the actual underlying network. In this model, protocol execution proceeds in explicit rounds, alternating communication with protocol state transition based on the set of messages received during the round. Network and node faults in this model are unified, and the network assumptions are specified in terms of the heard-of sets.

Communication between nodes is predominantly modeled as fire-and-forget message delivery, where messages can be reordered or dropped by the network abstraction. Though, in Verdi, protocols are first modeled with an idealistic, reliable network semantics, which can then be translated into weaker fault models using verified system transformers. Babel allows modeling protocols in stronger communication models using communication channel abstractions, which represent communication mechanisms with different properties and guarantees.

Individual components within nodes are commonly modeled as sequential, event-driven state machines interacting with each other via reliable message passing. The message passing normally follows the one-to-one one-way or request-response patterns; however, some of the frameworks also support one-to-many notifications between components. Some frameworks (e.g., Verdi) explicitly distinguish between the events that are external to the distributed protocol, like client requests, and internal events, like exchanging messages between protocol components and nodes within the distributed system.

To overcome the limitations of strict state separation between protocol components in the abstract model, Disel allows coupling protocols via inter-protocol behavioral dependencies, called send-hooks, which allow restricted logical access to other protocol's state. Disel doesn't seem to strictly follow the message-passing model for protocol components within the same node, supporting generic composition of protocol components. Disel also provides mechanisms to establish stronger properties of protocols and their combinations by strengthening them with additional inductive invariants.

As we can see, although there are some interesting variations and extensions, the underlying system model in the explored frameworks is largely the same, namely the one of a state transition system composed of components, which are sequential, event-driven state machines with disjoint state, interacting via message passing. This is remarkably similar to how distributed protocols are usually implemented in real-world code bases, which do not use any framework, as I described in the previous post. This approach tends to shift the focus more to the operational rather than functional and logical aspects of the system. There I also pointed out, when discussing how protocol implementations attempt to evade concurrency, that this approach may complicate the implementation and cause fragmentation of the protocol logic. Perhaps, adopting the same kind of the underlying model is one of the reasons why the explored frameworks do not seem to have made a real breakthrough in designing and implementing distributed protocols.

Structure

In all of the explored frameworks, protocol implementations rely on some kind of runtime or shim provided by the framework, hiding low-level concerns from implementation of the protocol logic. In most of the frameworks, the runtime is responsible for coordinating the execution of protocol components and interaction between them; some of the frameworks also provide there dedicated interfaces for communication over the network and setting timers. Protocol components normally need to be registered within the runtime before they can function within the system.

In most cases, protocol components within the same node can interact by simply sending some kind of internal messages to each other, without establishing any explicit connection. In contrast, components in Atlas require explicit orchestration of the interaction with the rest of the system. Disel instead supports composition of components expressed as effectful functional programs; it also allows loosely coupling protocol components via inter-protocol behavioral dependencies, called send-hooks, at the formal specification level.

Since the runtime in Mir is only responsible for coordinating the execution of protocol components and interaction between them, there are special components that provide functionality for sending and receiving messages over the network, as well as for setting local timers and for cryptographic operations. Mir explicitly distinguishes between passive components, which can only produce events synchronously, as a result of processing input events provided by the runtime, and active components, which can asynchronously produce new events on their own.

There are frameworks that provide means for enhancing protocol components with additional properties. Disel provides a protocol combinator ProtocolWithIndInv, which allows elaborating a protocol by strengthening it with additional inductive invariants. Verdi provides verified system transformers, which allow transforming protocols specified, implemented, and verified in an idealistic, reliable network semantics into an equivalent implementation, preserving the transformed system properties under a weaker fault model. PSync provides a class, which can be used to wrap protocol round instances to support updating progress conditions and synchronizing rounds in a Byzantine setting.

There are two main approaches to structure interaction of distributed protocols with the rest of the application. Some frameworks provide a mechanism for interacting with protocol components by sending and receiving messages, same as protocol components interact with each other. Other frameworks allow defining dedicated interfaces for that purpose, featuring callable methods or special IO events. In some of the frameworks, protocol components are supplied with some kind of handles to trigger side effects, such as sending messages or setting up timers; in other frameworks, side effects only happen after returning the control back to the runtime, e.g. through the return value or using a monadic structure.

Protocol components commonly consist of the component's state and protocol logic structured as handlers modifying the state or state transition functions, which are triggered by the runtime upon certain events or conditions. In Distal and Disel, the handlers/transitions can be augmented with guarding conditions that must hold in order to trigger the action.

The round-based model in PSync imposes a particular structure of protocol components. Protocol components in PSync must specify a sequence of protocol rounds. Each round, in general, consists of methods to: initialize the round, send messages at the beginning of the round, process received messages, and finally update the internal state before transitioning into the next round.

PSync and Disel explicitly separate the protocol specification from its implementation, whereby the specification is used to formally verify the implementation. Protocol specifications in PSync consist of protocol properties, as well as safety and liveness predicates (assumptions). In order to aid automated verification in PSync, the specification should also include round and/or phase invariants; round transition relations are automatically derived from the code, although this imposes certain limitations on the code. In Disel, high-level abstract protocol specifications are defined in terms of state-space coherence predicates and send/receive transitions.

The configuration of protocol components within nodes and their internal structure can be more static or dynamic. For example, in Babel, protocol components and various components' callbacks are registered within the runtime dynamically. The configuration of top-level components in Mir is rather static, for it cannot be changed after initialization, but there is a special component, called factory module, that supports creating sub-components dynamically. Such flexibility at runtime can make formal verification particularly hard, so the frameworks focused on formal verification (PSync, Disel, and Verdi) tend to be rather static with respect to the configuration and internal structure of protocol components.

Overall, the abstract model adopted by a framework, e.g. the model of a generic state transition system with message-passing or the Heard-Of model based on communication-closed rounds, largely determines how protocol specifications and implementations are structured within the framework. The framework's features like formal verification can add further restrictions, e.g. restricting runtime flexibility of the configuration and internal structure of protocol components.

Notation

All of the explored frameworks use the general-purpose programming language they are based on as the primary means for expressing protocol implementations. To enhance ergonomics and expressiveness, most of the frameworks introduce some notational extensions, e.g. elements of an embedded domain-specific language (eDSL).

Using a regular general-purpose programming language as the basis allows tapping directly into the comprehensive features of the language, such as the type system, polymorphism, inheritance, and metaprogramming. On the other hand, implementing protocol components in regular code can produce some undesirable effects such as nondeterminism, e.g. when iterating over built-in unordered data structures, which is problematic for reproducible simulation-based testing. For that reason, Mir provides some utility functions that should be used in place of usual idiomatic code to avoid that kind of nondeterminism. Rich expressiveness of general-purpose languages can also make automatic verification difficult, e.g. PSync is quite limited in what kind of Scala constructs it supports in automatic derivation of transition relations from the code.

In most cases, the notational extensions introduced by the frameworks serve to make the code more declarative and concise. One target of such enhancements is providing a convenient and clear way of expressing the typical event-oriented pseudo-code notation with upon statements. For example, the protocol logic in Distal is implemented by defining rules and the corresponding actions, whereby the rules are expressed in a declarative style resembling typical pseudo-code found in the literature and specify event predicates, such as the message type and matching conditions, as well as the means to specify composite events, which are triggered by collections of messages.

Another area of applying notational enhancements is for expressing certain actions performed by the protocol logic and overcoming the limitations of the host language. For example, Distal provides a special notation for sending messages, discarding received messages, as well as for scheduling actions to be executed in future. Disel extends Coq's specification language Gallina with effectful commands (actions), such as sending and receiving messages, reading from local state, monadic sequential composition, and general recursion. For message sending and receiving actions, Disel provides transition wrappers, which lift low-level operations on the networking layer to the level of well-typed program primitives corresponding to the protocol specification. Similarly, Verdi provides a monad for expressing the actions of sending messages, emitting output event, and manipulating the current state, as well as convenience notation for various monadic bindings.

Finally, expressing protocol properties, assumptions, and invariants, as well as the related annotations in the protocol implementation can also benefit from notational enhancements. PSync, for instance, defines a DSL for expressing properties, predicates and invariants, in which one of the main primitives is the notion of domain, representing a set of values of certain type with universal and existential quantification defined for it, as well as set comprehension. Disel features a special notation for representing the higher-order Hoare types of program fragments.

Many elements of a DSL can be implemented using common programming language techniques like polymorphism, inheritance, composition, higher-order functions, and the type system features. This way, Distal implements most of its DSL as ordinary methods and convenience aliases. However, this approach has certain limitations, and such techniques as metaprogramming (macros) or code generation are often required. For instance, in Distal, the key element of the DSL is implemented as a macro; in PSync, automatic derivation of transition relations from the code is also implemented as a macro. Code generation in Mir reduces the amount of hand-written boilerplate code by processing Protobuf definitions annotated with special extensions. Disel and Verdi take advantage of Coq's syntax extension features.

Clear and convenient notation for defining protocol specifications and their implementation is crucial for ergonomics and expressiveness. Building upon ordinary code, clever use of common programming language techniques, such as polymorphism, inheritance, composition, higher-order functions, and the type system features, should be the preferred approach for achieving notational expressiveness. Such techniques as metaprogramming and code generation can greatly help overcoming the limitations of that approach or further improving the notation, but they can make the framework more complicated and, therefore, should be employed judiciously.

Operation

In terms of operation, the core part in most of the frameworks is some kind of runtime or engine that orchestrates the execution of protocol components and their interaction. For concurrent execution of protocol components, the runtimes mostly rely on conventional concurrency mechanisms used in the corresponding language's ecosystem, such as goroutines in Go and execution pools in Java and Scala. Atlas, notably, is based on native OS threads rather than an async Rust runtime like Tokio, presumably due to a performance overhead of the latter.

There are two main approaches to implementing interaction based on message-passing between protocol components and with the networking layer: using a central event dispatching loop or through explicit channels established between components. Mir is a good example of the former approach, whereas Atlas follows the latter one. Since different protocol components normally operate asynchronously, there is often some kind of event queue or buffer placed between the components to accommodate for the asynchrony.

This raises an issue of preventing unbounded growth of those queues. Mir addresses this problem by temporarily blocking the influx of external events from active modules when the amount of events buffered in internal queues exceeds certain thresholds. Atlas relies on flow control provided by bounded buffered channels for communication between components.

In general, garbage collection issue is an important aspect of operation in distributed protocol implementations. Sometimes it can happen automatically, e.g. in PSync, received message sets are automatically discarded by the runtime upon transitioning into a new round. However, in some cases, it requires special care. For example, in Distal, protocol components should explicitly discard automatically buffered messages that become irrelevant for further execution of the protocol, in order to avoid unbounded growth of state and slowing down evaluation of rules. In Mir, there is an interesting pattern for performing garbage collection of internal component's state, whereby each disposable piece of state is assigned a numerical retention index, and the component removes the pieces of state whose retention index is below a specified value upon processing a dedicated garbage collection event emitted by another component.

Communication between nodes in most of the frameworks is implemented in a simplistic manner, providing best-effort message delivery, following the fire-and-forget communication style. Babel is a notable exception, since it introduces a notion of communication channel abstraction, where different channel types can represent different communication mechanisms with different properties and guarantees, e.g. more reliable message delivery, multiplexing connections, and φ-accrual failure detection.

So orchestrating the execution and coordinating interaction of protocol components often require some kind of runtime provided by the framework. The runtime should take care of asynchrony, garbage collection, communication, and coordination within the node, preventing unbounded growth of internal state and ensuring good performance.

Verification

Not all of the explored frameworks are concerned with verifying correctness of protocols and their implementations. Verification is the primary area of focus for Disel and Verdi, as well as PSync. Verification is not a major concern in Mir, though it provides some mechanisms that can be considered as lightweight verification of protocol implementation correctness.

Mir includes support for recording, inspecting, modifying, and replaying traces of events being passed by the node engine between the components, which can be very helpful for debugging, but also to perform correctness analysis. Mir also comes with a simple discrete-event simulation runtime that can be used for randomized reproducible testing with simulated time.

Full-fledged formal verification of protocol specifications and implementation in Disel and Verdi relies on machine-assisted theorem proving, whereas PSync attempts to automate the process. Theorem proving, even if machine-assisted, is a very difficult, time consuming task and requires special expertise. Automated verification, on the other hand, is in general undecidable and can only be achieved with certain restrictions to the system model and the protocol implementation.

Formal verification requires formal specification of the assumptions and the required properties. In Disel, protocol specifications are defined in terms of state-space coherence predicates and send/receive transitions. In Verdi, the correct behavior of the protocol is specified as a logical predicate over its possible traces of events. Formal specifications in PSync are expressed in terms of protocol properties, expressed in a fragment of a first-order temporal logic, as well as safety and liveness predicates (assumptions), expressed in terms of cardinalities of heard-of sets.

In these frameworks, the method of formally proving the correctness is typically by induction, constructing inductive state invariants. Coming up with appropriate inductive invariants constitutes the greatest difficulty in the verification effort and requires special skills.

In PSync, the simple round-based structure with lock-step semantics makes protocol implementation amenable to automated verification that can check safety and liveness properties. The verification, though, requires inductive invariants at the boundaries between rounds. However, the automated verification problem is decidable with certain constraints.

The correctness of protocol implementation in Verdi is proved directly, whereas Disel employs a different approach. For implementing protocol specification, Disel provides a DSL, embedded shallowly into Coq, that extends Coq's specification language Gallina. Disel programs and their fragments are assigned types corresponding to the protocol specification in higher-order separation-style Hoare logic. For message sending and receiving actions, Disel provides transition wrappers, which lift low-level operations on the networking layer to the level of well-typed program primitives corresponding to the protocol specification. Well-typed programs are guaranteed to be correct by construction w.r.t. the protocol specifications.

In Verdi, once the protocol is specified, implemented, and verified in an idealistic reliable network semantics, it can be translated using verified system transformers into an equivalent implementation, preserving the transformed system properties under a weaker fault model.

in Disel, protocol specifications can be generic and parameterized. The generic protocol specifications with their proven invariants can be used in composition. Disel provides mechanisms to establish stronger properties of generic protocols and their combinations by elaborating the protocol, i.e. strengthening it with additional inductive invariants.

Correctness is very important for the critical fault-tolerant protocols; therefore, the verification aspect deserves special attention. There are relatively easy-to-apply lightweight methods, such as randomized reproducible testing with discrete-event simulation, and there are sophisticated formal methods typically requiring special skills and expertise, such as machine-assisted theorem proving. However, some particularly structured system models may enable automated reasoning and make formal verification more practical, though at the cost of additional limitations.

Conclusions

Having explored these 7 frameworks for implementing distributed protocols, I found most of them not sufficiently developed for practical use, though some of the ideas and techniques employed there were worth exploring. Being purely academic efforts, most of the frameworks seem to have been abandoned after exploring some ideas and publishing the results. As of time of this writing, Mir is perhaps the most mature, well documented, and up-to-date one among these frameworks.

Focusing on some particular aspects of implementing distributed protocols, such as unifying and standardizing components, testing and debugging, notational convenience, or formal verification, while mostly neglecting the remaining aspects, may be appropriate for academic research, but this is certainly not sufficient for achieving practical adoption. Moreover, the programming languages that most of the frameworks are based on would make it hard to integrate the protocol implementations directly into larger code bases written in other languages, let alone that having to learn a less commonly used language like Scala or Coq is an additional obstacle for adoption. Please refer to the previous post about practical aspects of implementing distributed protocols.

Most important, perhaps, is that all of the frameworks seem to basically adopt the same model of a state transition system and trying to mimic the event-oriented notation as found in typical pseudo-code with upon statements in the literature on distributed protocols. This largely impacts how protocol implementations are expressed and structured in those frameworks.

I believe, if we want to make a real breakthrough in both designing and implementing distributed protocols, which is what the Replica_IO project is about, then we should first of all be innovative, challenging the status quo, and think holistically, taking into account all relevant aspects. Trying to rethink the conventional distributed system model and the way of expressing distributed protocols would be a good start.

Next Steps

Having explored some distributed protocol implementations and frameworks for implementing distributed protocols, now it is a good time to clearly state the problems in designing and implementing distributed protocols to focus on for the rest of milestone M0.1 and start gathering ideas on how we could approach them. Apart from that, there are still some potentially related concepts, approaches, and techniques worth looking into as part of the initial state-of-the-art exploration. The exploration tasks are tracked in the scope of this issue on GitHub.

Once the initial exploratory stage is over, it will be time to come up with key ideas concerning core principles that will guide the process of designing and implementing generic components within the framework (milestone M0.1). Then those ideas will be developed into clearly formulated concepts (milestone M0.2), their feasibility will be verified with code (milestone M0.3). After that, prototype, MVP, and production versions of the framework will be developed and released (milestones M1, M2, and M3).

It does not mean at all that exploration, ideation, and prototyping will not take place at later stages; the milestones simply define the framework's general level of maturity. The framework will continuously evolve and expand and at some point become a de facto standard for implementing critical fault-tolerant systems providing a growing collection of easy-to-use reliable and efficient distributed replication mechanisms.

If you like the project and find it valuable, please support its further development! 🙏
❤️ Replica_IO

If you have any thought you would like to share or any question regarding this post, please add a comment here. You are also welcome to start a new discussion or chime in to our Discord server.


  1. If you know of some other framework for implementing distributed protocols that I should have looked into, please let me know. 

. . .
Terabox Video Player