Summary:
We describe the spam mitigations that currently exist in CometBFT.
We make two recommendations. First, we encourage application developers and node operators to use these existing mitigations to make their networks more resilient to spam or traffic surges. Second, we strongly recommend that developers using CometBFT v0.37 or older transition to v0.38 and plan a transition to v1. The CometBFT team is available to actively contribute and support all users on both of these matters. You can find us on Slack, Telegram https://t.me/CometBFT or Discord https://discord.gg/interchain
We end by describing ongoing work to be implemented in future CometBFT versions: (i) quality of service guarantees, and (ii) a more efficient transaction dissemination algorithm called DOG.
Applications that build on CometBFT and the Cosmos SDK typically do not have built-in mitigations for dealing with spam traffic. This can lead to a network becoming congested and having unpredictable block times.
There are various approaches to mitigating such problems today. Below we report on the set of mitigations that apply to CometBFT specifically. We also document our current and ongoing work, through which we plan to introduce quality of service guarantees in future CometBFT releases and make the P2P layer more efficient, in particular with regard to the transaction dissemination algorithm.
CometBFT provides some avenues to mitigate the impact of spam or traffic surges, such as some chains have experienced in the recent period. Generally speaking, CometBFT is a "producer-consumer" consensus engine. The engine can support advanced features such as treating transactions on a case-by-case basis, for example the Quality of Service (QoS) guarantees currently being implemented and described below. Even in the presence of such an advanced feature, the bottom line is that the application is the layer that understands how to throttle the production of spammy transactions, and how to differentiate transactions of various kinds by prioritizing among them.
That being said, the first way for app chains to mitigate spam is to reduce the maximum allowed block size. We published an advisory, ASA-2023-002, towards the end of September 2023. Specifically, in that advisory we recommend that operators set the `BlockParams.MaxBytes` parameter in the genesis file (doc ref for v0.38) to a value smaller than the default of 21 MB. We have since reduced the default value in CometBFT from 21 MB to 4 MB. We have also changed the default of the `max_gas` parameter (doc ref) to 10M (from the previous unbounded value of -1).
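For illustration, here is a minimal excerpt of a genesis file with these consensus parameters set. The values correspond to the 4 MB and 10M defaults mentioned above and are not a recommendation for any specific chain:

```json
{
  "consensus_params": {
    "block": {
      "max_bytes": "4194304",
      "max_gas": "10000000"
    }
  }
}
```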
Related to the `BlockParams.MaxBytes` parameter, we have also advised in ASA-2023-002 that the `timeout_propose` parameter (doc ref) should be computed using the maximum allowed block size as a reference. A larger maximum block size implies that a longer timeout for proposing is necessary.
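For example, in `config.toml` (the value below is only illustrative; an appropriate `timeout_propose` depends on your chain's maximum block size and network conditions):

```toml
[consensus]
# Give proposers enough time to disseminate a full block; larger
# maximum block sizes call for a longer propose timeout.
timeout_propose = "3s"
```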
Separately, we recommend that all nodes in a given network use consistent values for the CometBFT mempool capacity and transaction size parameters. These are the three parameters called `mempool.size`, `mempool.max_txs_bytes`, and `mempool.max_tx_bytes` (doc ref). Without consistent configuration across the different network nodes, some transactions will keep tumbling around the network for a long time. Side note: this also applies to other settings not related to the mempool, such as `minimum-gas-prices` in the app.toml configuration of applications built using the Cosmos SDK.
As a more recent, heuristic mitigation that we have not yet fully corroborated, we advise nodes in a network to avoid using large mempools, which tend to have detrimental effects on performance. We advise against configuring `mempool.max_txs_bytes` to more than roughly 10 times the chain's block size (unless there is a good reason for doing so).
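Putting the above together, a sketch of the relevant `config.toml` section could look as follows, assuming a chain with a 4 MB maximum block size. The numbers are illustrative only; the important points are to pick values suited to your network and to use the same values on all nodes:

```toml
[mempool]
# Maximum number of transactions the mempool may hold.
size = 5000
# Total size of all transactions in the mempool. Heuristic: at most
# roughly 10x the maximum block size (here 10 * 4 MB = 40 MB).
max_txs_bytes = 41943040
# Maximum size of a single transaction accepted into the mempool.
max_tx_bytes = 1048576
```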
We have also designed and added support for CometBFT to no longer perform transaction dissemination, and instead delegate the mempool responsibility to the application layer. This is an optional feature called the "nop" mempool; for a technical description, see ADR 111: nop Mempool. Applications are thus able to implement the mempool through a P2P layer that is separate from CometBFT and potentially optimized for application-specific conditions. This feature is available in v0.37 and newer versions of CometBFT. The documentation reference for v0.37 is here: CometBFT Documentation - Mempool - v0.37.
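If we recall the configuration correctly, opting into the "nop" mempool is done via the mempool `type` option in `config.toml`; please check the documentation for your CometBFT version, as the exact option may differ:

```toml
[mempool]
# "nop" disables CometBFT's mempool entirely: transactions are neither
# stored nor gossiped by CometBFT, and the application takes over
# transaction dissemination.
type = "nop"
```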
There have also been numerous smaller improvements that contribute to better resilience and predictability of a network in the face of transaction surges. Two recent ones are particularly relevant. First, we added a backpressure mechanism so that the mempool stops accepting transactions if it cannot keep up with ReCheck calls (cometbft#3314). Second, as a significant contribution from Dev of the Osmosis team, we added support to make the mempool update async from block.Commit (cometbft#3008; this is just one of an impressive array of performance optimizations that Dev has contributed to CometBFT over the last few months!). The former optimization shipped in CometBFT v0.37 and newer versions. The latter is only on the v1 line at the moment, and we are assessing whether we can backport it to v0.38.
Working closely with users and collaborators, we have identified that a root cause of spam surges in CometBFT-based networks is that the mempool protocol, when overloaded, can make a node progress very slowly in building blocks, affecting the node itself and possibly the rest of the network. Briefly, CometBFT lacks an internal mechanism to prevent pressure in the mempool from spreading to other components. This is largely due to tight coupling between different components in CometBFT, and also due to some limitations of the p2p connection layer (see cometbft#3053).
We are adding mechanisms to prevent the above from happening. The cost of these measures is that some nodes may drop some transactions in some circumstances, i.e., load-shedding. This trade-off can be made customizable, and we are working to do so via the "Mempool Quality of Service" (QoS) design we are aiming to introduce. Reference: cometbft#2803.
We refer to the QoS approach as "mempool lanes" in some of our design documents, because the concrete approach to providing quality of service guarantees entails offering a (configurable) set of lanes for the mempool. The lanes are ordered by priority: as on a highway, some lanes move faster, while others move slower. Transactions that are important or urgent for an application should go in a higher-priority lane. Such transactions will have stronger guarantees (predictability, latency) for their propagation and inclusion in a block. It will be up to the application to decide the prioritization.
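To make the idea concrete, below is a minimal Go sketch of prioritized mempool lanes. All names and structure here are ours, purely for exposition; they do not reflect the actual CometBFT design or API:

```go
package main

import "fmt"

// Tx is a raw transaction; in this sketch it is just bytes.
type Tx []byte

// Lane is a FIFO queue of transactions sharing one priority level.
type Lane struct {
	Name     string
	Priority int // higher values are dispatched first
	txs      []Tx
}

// Mempool holds a fixed set of lanes; the slice is kept ordered by
// descending priority by whoever constructs the mempool.
type Mempool struct {
	lanes []*Lane
}

// Add places a transaction in the named lane. In a real design the
// application would classify the transaction (e.g., during CheckTx)
// rather than the caller choosing a lane explicitly.
func (m *Mempool) Add(lane string, tx Tx) {
	for _, l := range m.lanes {
		if l.Name == lane {
			l.txs = append(l.txs, tx)
			return
		}
	}
}

// Next pops the oldest transaction from the highest-priority
// non-empty lane; this order would drive gossip and block building.
func (m *Mempool) Next() (Tx, bool) {
	for _, l := range m.lanes {
		if len(l.txs) > 0 {
			tx := l.txs[0]
			l.txs = l.txs[1:]
			return tx, true
		}
	}
	return nil, false
}

func main() {
	mp := &Mempool{lanes: []*Lane{
		{Name: "urgent", Priority: 2},
		{Name: "default", Priority: 1},
	}}
	mp.Add("default", Tx("transfer"))
	mp.Add("urgent", Tx("oracle-price-update"))
	if tx, ok := mp.Next(); ok {
		fmt.Printf("dispatched first: %s\n", tx) // the urgent lane wins
	}
}
```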
Besides the QoS design, there is one more exciting feature we are working on, based on a simple yet brilliant idea from our colleague Hernan, a Research Engineer on the Comet team. The feature consists of adding a Dynamic Optimal Graph (DOG) gossip protocol for the mempool. We have been designing, testing, and stabilizing this novel protocol, which improves the efficiency of transaction dissemination and is meant to reduce the number of duplicate gossiped transactions. The outcome is that the system uses bandwidth more efficiently and can cope better with high surges of traffic. The algorithm extends the base FLOOD protocol with a mechanism that eliminates cycles in transaction dissemination; in the absence of this extension, the algorithm falls back to the base FLOOD protocol.
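As a rough intuition for cycle elimination, consider the toy sketch below. It is our own simplification and does not capture the actual DOG mechanics, which are specified in the GitHub issue referenced below. In plain flooding, receiving a duplicate transaction means it reached the node over two different routes, i.e., via a cycle; signaling the sender to stop forwarding along the redundant route removes that cycle:

```go
package gossip

// PeerID identifies a peer connection.
type PeerID string

// Node holds toy gossip state: transactions already seen, and peers
// we have asked to stop forwarding to us after seeing duplicates.
type Node struct {
	seen      map[string]bool // tx hash -> already received
	cut       map[PeerID]bool // routes we asked the sender to close
	peers     []PeerID
	send      func(to PeerID, txHash string) // forward a tx to a peer
	signalCut func(to PeerID)                // ask a peer to stop sending to us
}

// Receive handles a transaction arriving from peer `from`. First-seen
// transactions are forwarded to all other peers (base FLOOD). A
// duplicate reveals a cycle, so we ask the sender to close the
// redundant route; a peer receiving such a signal would stop
// forwarding transactions to us along that route.
func (n *Node) Receive(from PeerID, txHash string) {
	if n.seen[txHash] {
		if !n.cut[from] {
			n.cut[from] = true
			n.signalCut(from)
		}
		return
	}
	n.seen[txHash] = true
	for _, p := range n.peers {
		if p != from {
			n.send(p, txHash)
		}
	}
}
```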
So far, we have obtained compelling performance results. On a 200-node network, with a transaction workload close to saturation, we observed a reduction in transaction dissemination bandwidth of close to 75%. Those results were obtained with conservative guesses for the protocol's configuration; we believe that fine-tuning the configuration can reduce bandwidth even further. If you are curious about the details, check the GitHub issue cometbft#3263!
We are currently working on a demonstration of the QoS design. We plan to make steady progress on both the QoS and DOG designs, with QoS taking priority. It is not yet clear if we can backport any of these new features to v0.37 or v0.38 versions of CometBFT. At minimum, we’re planning to make them work with v1.
For reference, the Cosmos Hub has been using the existing mitigations for several months. They limit transactions based on the `max_gas` parameter, keeping it low enough to prevent transactions from reaching 2 MB or more. They are also in the process of upgrading to v0.38, slated for their next release.
We strongly recommend that developers using CometBFT v0.37 or older plan a transition to v0.38 and then to v1. We understand the transition is not seamless, and upgrades are sometimes difficult. The CometBFT team is available to actively contribute to any network's upgrade and to support the transition away from older versions, and also to help all users mitigate spam or other issues. You can find us on Telegram https://t.me/CometBFT or Discord https://discord.gg/interchain as well as Slack.
Many thanks to the CometBFT and Amulet teams for their constructive feedback on earlier versions of this essay.