During Q4 of 2024, the CometBFT team at Informal Systems finalized the work on the Dynamic Optimal Graph (DOG) protocol. First announced earlier this year, DOG is a novel gossip protocol that represents a major step forward in transaction dissemination for CometBFT's mempool.
In typical CometBFT-based networks, mempool traffic comprises numerous duplicate messages, resulting in an exponential increase in bandwidth usage; transaction messages alone account for nearly half of the total network traffic generated by CometBFT nodes. DOG reduces this bandwidth consumption significantly, with experiments showing a 2.5x reduction in traffic generated, fewer missed blocks, and improved transaction confirmation latency, as illustrated by these metrics:
Testnet with 200 nodes under a transaction load of 500 tx/s for 15 minutes. On the left, the current mempool; on the right, with DOG enabled. The yellow line is the volume of transaction messages sent by nodes.
This substantial improvement is achieved through two key innovations:
A smart routing mechanism that selectively filters transactions before forwarding them to peers, reducing redundant transactions and improving bandwidth usage without compromising latency. Unlike "pull"-based gossip protocols, DOG avoids introducing additional communication steps or delays, thus maintaining (and even improving!) the low latency offered by the current protocol.
A closed-loop redundancy controller that ensures a minimum level of transaction redundancy, needed for preserving the network's resilience against malicious actors.
By optimizing how transactions are forwarded across the network, DOG not only reduces traffic but, as a side-effect, improves the overall system performance, enabling faster consensus and more efficient resource utilization.
In the following, we’ll dive into the details of how DOG compares to the current protocol, its inner workings, its key properties, and finally some results from our testnet experiments.
The current transaction dissemination protocol used in CometBFT's mempool, which we call Flood, has been in use since the early days of the project. Flood is a classic "push" gossip protocol. Its mechanism is simple but effective: whenever a node receives and adds a transaction to its mempool, it immediately forwards the transaction to all its peers, except for the peer from which it received the transaction.
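To make this concrete, here is a minimal sketch of Flood's forwarding rule in Go. The types and names (`Mempool`, `ReceiveTx`, `Peer`) are illustrative assumptions, not CometBFT's actual API:

```go
package flood // illustrative sketch, not CometBFT's actual API

import "crypto/sha256"

type Tx []byte

// Peer is a minimal abstraction of a connected peer (assumed interface).
type Peer interface {
	ID() string
	SendTx(tx Tx)
}

// Mempool holds the transactions a node has seen and its current peers.
type Mempool struct {
	seen  map[[32]byte]bool // tx hashes already added to the mempool
	peers []Peer
}

// ReceiveTx implements Flood's push rule: on first sight, add the
// transaction and immediately forward it to every peer except the
// sender (from == nil for transactions submitted locally via RPC).
func (m *Mempool) ReceiveTx(tx Tx, from Peer) {
	h := sha256.Sum256(tx)
	if m.seen[h] {
		return // duplicate: dropped, but its bandwidth was already spent
	}
	m.seen[h] = true
	for _, p := range m.peers {
		if from != nil && p.ID() == from.ID() {
			continue // don't echo the tx back to its sender
		}
		p.SendTx(tx)
	}
}
```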
Flood excels in two areas. First, it propagates transactions with the minimal possible latency in decentralized P2P networks. Second, by "flooding" the network with transaction messages, it ensures resilience against malicious actors attempting to censor or disrupt transaction propagation. However, Flood's simplicity comes with a significant downside: its broadcast-based approach generates a high degree of redundancy, causing an exponential increase in bandwidth consumption.
At its core, the DOG protocol introduces a routing mechanism that minimizes the propagation of redundant transactions. As in other gossip protocols, DOG nodes exchange messages to confirm whether they have already seen a specific transaction. DOG goes a step further by using this information to dynamically build a routing table at each node.
The key idea of the protocol is the following. Let's assume a network with no Byzantine actors and where each transaction enters the network from a single node (we'll see later how to overcome these limitations). If a node A receives from a peer B a transaction that it has received before, there must exist a cycle in the network topology. Node A then sends a HaveTx message to B to signal that it already has this transaction, and B closes one of the "routes" it uses to forward data to A, thus cutting the cycle. Over time, this exchange of HaveTx messages allows nodes to cut all the routes that cause redundancy. Note that HaveTx messages carry only a transaction hash, which makes their size insignificant compared to full transaction messages.
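To make the route-cutting rule concrete, here is a hedged Go sketch of the two message handlers involved. All names (`dogNode`, `onTx`, `onHaveTx`) and the `firstSender` bookkeeping are illustrative assumptions, and the sketch omits details such as the Redundancy Controller's gating of HaveTx messages (described below); the precise rules are in the protocol specification:

```go
// Illustrative sketch of DOG's route-cutting rule (assumed names; see
// the protocol specification for the precise behavior).

// A route (source -> target) means: transactions first received from
// peer `source` are forwarded to peer `target`.
type route struct{ source, target string }

type dogNode struct {
	seen        map[string]bool   // tx hashes already received
	firstSender map[string]string // tx hash -> peer that first sent it
	closed      map[route]bool    // routes disabled by HaveTx messages
}

// onTx handles a transaction received from peer `from`.
func (n *dogNode) onTx(txHash, from string, sendHaveTx func(peer, txHash string)) {
	if n.seen[txHash] {
		// Duplicate: there is a cycle through `from`. Signal it so that
		// it cuts one of the routes forwarding data to us. (In the full
		// protocol, the Redundancy Controller may suppress this.)
		sendHaveTx(from, txHash)
		return
	}
	n.seen[txHash] = true
	n.firstSender[txHash] = from
}

// onHaveTx handles HaveTx(txHash) from peer `from`: stop forwarding to
// `from` the transactions that enter via the peer that first sent us
// this transaction, cutting the cycle where it starts.
func (n *dogNode) onHaveTx(txHash, from string) {
	if src, ok := n.firstSender[txHash]; ok {
		n.closed[route{source: src, target: from}] = true
	}
}

// shouldForward is the routing filter applied before relaying a
// transaction first received from `source` to peer `target`.
func (n *dogNode) shouldForward(source, target string) bool {
	return source != target && !n.closed[route{source: source, target: target}]
}
```

Note how closing a route is purely local: the node only records that transactions entering from one peer should no longer be relayed to another, with no global coordination required.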
When routes stabilize, transactions have a single, optimal path to reach every node. The resulting structure is a superposition of directed spanning trees, which is the ideal topology for efficient data dissemination in peer-to-peer networks. Routes form an implicit overlay of one spanning tree per node. Each node only knows the routes from and to its peers, without relying on routing information from other nodes. Overlay trees are created dynamically and may evolve as the network topology changes.
For example, consider the following network topology with 7 nodes, where only nodes 1 and 7 receive transaction load:
Initially all routes are open, meaning that transactions will be transmitted through those paths. Nodes receiving transactions coming from outside the network forward them to all their peers. As redundant transactions are exchanged, nodes begin to send HaveTx messages to signal that certain routes should be closed. Over time, the network converges to an optimized state. Routes marked as red arrows in the diagram represent closed paths.
The result is a superposition of directed spanning trees, where each node has a single, optimal path to reach every other node. For example, transactions entering from node 1 will follow the path 1 → 2 → 4 → 5 → 7, as shown in the spanning tree for node 1 below:
Similarly, transactions originating from node 7 will follow the path 7 → 6 → 4 → 3 → 1, as seen in the spanning tree for node 7:
In non-Byzantine networks, DOG is able to achieve optimal performance with zero redundant messages. In adversarial environments, as expected, a minimum level of redundancy is necessary to counteract possible attacks and ensure that all transactions reach all nodes.
The DOG protocol implements a Redundancy Controller (effectively a closed-loop mechanism) that periodically measures the level of redundant transactions received and decides whether to request more or fewer transactions from peers, thus tuning the redundancy level to a target value.
For instance, a target redundancy level of 0.5 means that the node accepts one duplicate transaction for every two unique transactions received. If the controller detects that redundancy has dropped below this threshold, it sends a ResetRoute message to one of its peers to re-open a previously closed route. This mechanism ensures that all transactions reliably reach their destination, even in the presence of malicious actors or network disruptions, and also allows nodes to dynamically adapt to changes in the network topology.
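As a rough illustration of this control loop, here is a Go sketch with assumed names and a simplified adjustment rule; the actual parameters, thresholds, and timing are defined in the protocol specification:

```go
// Sketch of the closed-loop Redundancy Controller (simplified).
type redundancyController struct {
	target     float64 // e.g. 0.5: one duplicate per two unique txs
	duplicates int     // duplicate txs received in the current interval
	firstTime  int     // first-time (unique) txs in the current interval
}

// adjust runs periodically: it measures the redundancy observed since
// the last call and nudges it toward the target.
func (c *redundancyController) adjust(
	sendResetRoute func(), // asks a peer to re-open a closed route
	blockHaveTx func(block bool), // pauses/resumes sending HaveTx messages
) {
	if c.firstTime == 0 {
		return // no traffic to measure in this interval
	}
	redundancy := float64(c.duplicates) / float64(c.firstTime)
	switch {
	case redundancy < c.target:
		// Too few duplicates: routes are over-pruned. Re-open one and
		// temporarily stop cutting routes so redundancy can recover.
		blockHaveTx(true)
		sendResetRoute()
	case redundancy > c.target:
		// Too many duplicates: resume cutting routes via HaveTx.
		blockHaveTx(false)
	}
	c.duplicates, c.firstTime = 0, 0 // start a new measurement interval
}
```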
The DOG protocol is not limited to the mempool: it is a generic BFT gossip protocol that can be applied to disseminate other types of data as well. We are planning to explore its applicability to other CometBFT messages that consume significant bandwidth, such as block parts or votes.
The protocol implicitly favors faster routes. By cutting routes on peers that send duplicate transactions, which by definition arrive later than the first copy, the protocol naturally optimizes for quicker paths.
Additionally, the protocol preserves the best-effort FIFO ordering of mempool transactions, because its filtering is based only on target peers. Even when this ordering is not strictly guaranteed, it is still important for some applications, such as IBC, which benefits from a lower rejection rate of IBC packets.
DOG was designed as an extension to the existing CometBFT mempool. It functions as a node plugin that can be enabled at any time, though the benefits are maximized when all nodes in the network adopt the protocol.
There are still two minor downsides that we intend to improve in the future:
Convergence time for redundancy: Reaching the optimal routes for the desired levels of transaction redundancy may take each node a time proportional to its number of peers, though typically on the order of a few minutes. This should not pose a problem, as nodes are expected to run for much longer.
Traffic fairness: Nodes with high-speed connections and a large number of peers may end up handling higher traffic load compared to other nodes. However, as the network topology evolves dynamically, this imbalance changes over time.
For more details, check out the protocol specification. It comes together with a formal spec written in Quint, which proved to be invaluable during the design phase of the protocol.
We conducted several experiments on our typical 200-node testnet with the goal of
identifying the optimal default values for the protocol's parameters (#4596, #4597, #4598),
evaluating the performance in scenarios where nodes have a different number of peers (#4614), and
testing potential vulnerabilities (#4606).
In every evaluated scenario, enabling the DOG protocol consistently resulted in improvements across most metrics, with no observed degradation in any area.
Moreover, the reduction in redundant messages has a system-wide positive impact. Continuing with the same experiment from the introduction, we could observe how the whole system operates more efficiently with DOG enabled.
Mempool size is smaller and the number of cache hits is notably reduced as the routes are being built.
We observe:
More stable block production.
Validators need a second consensus round less often.
8 times fewer missing validators during voting.
About 2.7 times fewer missed blocks.
We also observe:
More blocks produced and more transactions on chain after 15 minutes.
All of the above results in an additional 10% reduction of the network traffic generated by BlockPart messages.
Using DOG can also make nodes more efficient and cost-effective. In terms of resource usage, in these experiments we measured 43% lower CPU usage and 10% lower memory usage, as shown in the graph below.
Also in terms of efficiency, there is another side-effect: we notice a 10% reduction in total transaction validation time (CheckTx), putting less pressure on the application and on the node overall.
And lastly, one remark on latency. The experiments reveal 15% lower transaction latency, as a consequence of the system being less loaded. During this experiment, Flood's throughput was 496.9 tx/s on average, and DOG's was almost equal at 496.6 tx/s. The average latency with Flood was 3.63s, while with DOG it was 3.14s.
The DOG protocol is a powerful upgrade for any CometBFT-based network. It is available and ready to be used in CometBFT's main branch. The Comet team at Informal Systems fine-tuned this feature to provide:
Efficient bandwidth usage: its routing mechanism significantly reduces bandwidth in all tested scenarios.
Low latency: DOG does not introduce the delays or extra communication steps that other kinds of gossip protocols add; instead, it selectively filters transactions before forwarding them to peers. In our tests, transaction latency is even lower than with the current protocol.
Byzantine Fault Tolerance (BFT): DOG maintains a minimum level of transaction redundancy, preserving the resilience needed to mitigate the impact of Byzantine attacks.
Improved system-wide performance: faster consensus and more efficient resource utilization.
To further improve the latency of transaction propagation, check out Mempool Lanes, a new feature we designed for the mempool to offer Quality of Service (QoS) guarantees by allowing applications to prioritize transactions for faster processing and dissemination.
As recently announced, the Comet team at Informal is stepping down, and Comet maintenance will be transitioned to the engineering team of the Interchain Foundation. However, Informal Systems retains extensive expertise in building and maintaining CometBFT-based networks, and we remain committed to supporting builder teams who have specialized requirements and are building high-performance applications. Reach out to us at hello@informal.systems and we'd be thrilled to collaborate!