In Q2, team Comet will focus on five streams of work. Each stream aims to nibble away at a major problem from our priorities backlog.
The problems we’re tackling have a broader scope than this quarter. Therefore, we expect to continue focusing on these issues throughout the rest of 2023.
The five problems of focus are, in order of priority:
CometBFT provides poor protocol design support to application developers.
There is currently no alternative to network-based state sync.
The JSON/RPC that CometBFT nodes expose is flaky.
Storage consumption is expensive for operators running CometBFT nodes.
Bandwidth consumption is expensive for operators running CometBFT nodes.
For specific solutions, see the full post below.
We launched CometBFT at the beginning of February 2023. In the two months since, we have made significant strides in the development of CometBFT. In this post we will first summarize our main successes during Q1, then provide an overview of the priorities we plan to tackle as part of Q2, as well as the process we used to arrive at these priorities.
Q1 was an interesting period for the CometBFT team because the first month (January) was packed with activities related to forking Tendermint Core, renaming the fork to CometBFT, and preparing its public announcement. This work was tedious but very rewarding: it allowed us to restart development and lay solid groundwork for continuing to evolve this software toward growing the Interchain. Please read the announcement introducing CometBFT here and in this thread.
In terms of technical and documentation deliverables, during Q1 we shipped the following:
Feb 3, 4: we released v0.34.25 and v0.34.26, the former being a security patch to the v0.34.* line. Both of these releases were made from the Informal Systems team's public fork of Tendermint Core.
Feb 27: we released v0.34.27, the first official release of CometBFT, a drop-in replacement for Tendermint Core v0.34.x.
Feb 28: we shipped the official documentation for CometBFT at https://docs.cometbft.com/
March 6: we released v0.37.0, the first CometBFT release with ABCI 1.0.
March 29: we released v0.38.0-alpha.1, a pre-release with ABCI 2.0.
Both v0.37.* and v0.38.* have been in preparation for a long time. We are thankful to all the teams and contributors that have helped design and refine ABCI 2.0 over the last ~2 years. It is truly exciting to see all this work come to fruition, and to see the numerous networks interested in leveraging the new features of the ABCI interface.
We spent the last two weeks of March managing our backlog and getting our priorities in order, so that we have clarity over the work we'll be doing over the next three months.
We have five streams of research and development lined up for the next quarter. We will first describe the problem each line of work tries to address, and then provide some context on the solution space.
CometBFT v0.34, used in production today, implements the ABCI v0.17 interface. ABCI is the API separating the application from the consensus engine; it is the interface application developers have at their disposal to interact with CometBFT.
ABCI v0 was designed circa 2015-2016. Since then, the broader crypto ecosystem has evolved rapidly, and applications demand more flexibility and control today. Application developers seek finer-grained control over what CometBFT puts inside blocks, and want to use CometBFT to exchange information between nodes through channels other than blocks.
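To make the "control over what goes inside blocks" point concrete, here is a minimal sketch in the spirit of the PrepareProposal hook that ABCI 1.0 introduces. The types and filtering policy below are simplified stand-ins defined locally for illustration, not CometBFT's actual implementation (the real types live in CometBFT's abci packages).

```go
package main

import "fmt"

// Simplified stand-ins for the ABCI 1.0 PrepareProposal request/response.
type RequestPrepareProposal struct {
	Txs        [][]byte // candidate transactions from the mempool
	MaxTxBytes int64    // block size budget imposed by consensus
}

type ResponsePrepareProposal struct {
	Txs [][]byte // transactions the application wants in the block
}

// PrepareProposal illustrates the kind of fine-grained control ABCI 1.0
// gives applications over block contents: this toy policy drops empty
// transactions and stops once the size budget is exhausted.
func PrepareProposal(req RequestPrepareProposal) ResponsePrepareProposal {
	var (
		kept [][]byte
		used int64
	)
	for _, tx := range req.Txs {
		if len(tx) == 0 {
			continue // application-specific filtering
		}
		if used+int64(len(tx)) > req.MaxTxBytes {
			break // respect the consensus-imposed budget
		}
		kept = append(kept, tx)
		used += int64(len(tx))
	}
	return ResponsePrepareProposal{Txs: kept}
}

func main() {
	req := RequestPrepareProposal{
		Txs:        [][]byte{[]byte("tx1"), {}, []byte("tx2"), []byte("a-very-long-transaction")},
		MaxTxBytes: 8,
	}
	resp := PrepareProposal(req)
	fmt.Println("txs in proposal:", len(resp.Txs))
}
```

Under ABCI v0 the application had no such hook: the consensus engine assembled the block on its own.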
To address this problem, ABCI++ was proposed circa 2020 (RFC 013). ABCI++ is being delivered across two versions: ABCI v1 (CometBFT v0.37) and ABCI v2 (CometBFT v0.38). As mentioned above, we have released ABCI v1; for v2 we are in the final phases, having already released an alpha-1. What is left is to iron out the interface design, write more comprehensive documentation for ABCI v2, and do extensive large-scale QA, which typically involves 200-node testnets.
We estimate that releasing v0.38.0 will take another 1-2 months of work.
In CometBFT v0.34, synchronizing a fresh node with the rest of the network relies on a protocol that fetches snapshots from the other peers in the network. Some disadvantages of this approach are that it is fragile (because peer snapshots might not be immediately available) and bandwidth-intensive.
Alternatives to network-based state sync exist. For instance, by using a local snapshot, generated by the node itself or copied from a trusted source, state synchronization could avoid large transfers from remote peers. Such an alternative has been explored, and we plan to bring that work into a future release of CometBFT.
This problem is particularly interesting because operators employ state syncing of new nodes frequently. They do so because storage consumption of nodes grows over time, so there is a preference for setting up new nodes regularly. The storage growth issue is something we're aware of, and it is also a priority (see below). The bottom line is that this is an important problem, and as an important user put it in the tracking issue:
yes please, every validator would like to have this
There are multiple dimensions to this problem. First, the JSON/RPC implementation is very complex. Second, it is orthogonal to the main task of CometBFT, which is state machine replication (not serving RPC), so it unnecessarily inflates the surface area we need to maintain. Third, we are aware of security concerns: the RPC is a potential DDoS vector, so we advise defensive measures there. Fourth, the RPC interface can be a bottleneck for IBC relaying operations; we know this from our experience developing the Hermes IBC relayer and running it as part of Informal Staking. Fifth, the RPC interface can impose backpressure on other components of CometBFT (e.g., consensus), which can make the node unreliable by causing it to fall out of sync with the network.
Our approach towards fixing this is still in the exploration phase but we have a promising solution in the form of a Data Companion. We have recently discussed this at our community call, and invite feedback directly in the PRs (ADR 100, ADR 101).
On the storage front, the problem is that pruning does not work effectively, which leads to increasing storage costs over time. At a more basic level, CometBFT supports multiple database backends, but there is no documented characterization of the workloads these backends are expected to handle. Consequently, it is not clear which of them is most appropriate from a performance and efficiency perspective. Similarly to the JSON/RPC problem, we want to reduce the surface area under our maintenance, and would therefore prefer to narrow down the supported databases to the one that fits best.
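For readers unfamiliar with pruning, the intent is height-based retention: everything below a retain height is deleted so storage stops growing without bound. The toy block store below (a map from height to block bytes) is purely illustrative; the real store sits on a database backend, which is exactly where the effectiveness problems mentioned above arise.

```go
package main

import "fmt"

// pruneTo deletes every block below retainHeight from a toy block store
// and returns how many blocks were removed. Effective pruning means the
// node's disk usage tracks the retention window, not the chain's age.
func pruneTo(store map[int64][]byte, retainHeight int64) int {
	pruned := 0
	for h := range store {
		if h < retainHeight {
			delete(store, h)
			pruned++
		}
	}
	return pruned
}

func main() {
	store := map[int64][]byte{}
	for h := int64(1); h <= 100; h++ {
		store[h] = []byte("block")
	}
	n := pruneTo(store, 91) // keep only the last 10 heights
	fmt.Println("pruned", n, "blocks;", len(store), "remain")
}
```

The hard part in practice is not this loop but making the underlying database actually reclaim the space, which is backend-dependent.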
In terms of bandwidth, we are aware that operators are incurring high costs due to the bandwidth consumption of CometBFT-based networks. To mitigate this, an initial investigation led to a 50% reduction in Precommit votes sent among peers. We are continuing that investigation toward reducing non-vote bandwidth consumption (e.g., block parts or transactions). An important element of this investigation is drawing up specifications for the P2P and consensus modules, which are central to bandwidth usage.
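One general way to cut vote traffic is send suppression: never gossip the same precommit to the same peer twice. The toy peer-state tracker below illustrates that idea only; the actual logic behind the Precommit reduction lives in CometBFT's consensus reactor and is considerably more involved.

```go
package main

import "fmt"

// voteKey identifies a precommit by (height, round, validator). Sending
// the same vote to a peer more than once is pure bandwidth waste.
type voteKey struct {
	Height    int64
	Round     int32
	Validator string
}

// peerState tracks, per peer, which votes we have already sent it.
type peerState struct {
	sent map[voteKey]bool
}

func newPeerState() *peerState {
	return &peerState{sent: map[voteKey]bool{}}
}

// shouldSend reports whether the vote still needs to be sent to this
// peer, and marks it as sent.
func (ps *peerState) shouldSend(v voteKey) bool {
	if ps.sent[v] {
		return false
	}
	ps.sent[v] = true
	return true
}

func main() {
	ps := newPeerState()
	v := voteKey{Height: 10, Round: 0, Validator: "val-1"}
	fmt.Println(ps.shouldSend(v)) // first time: send
	fmt.Println(ps.shouldSend(v)) // duplicate: suppressed
}
```

In a real reactor the peer state is also updated from messages the peer itself sends, so votes the peer already has are never queued in the first place.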
We have chosen problems based on the input and feedback we have been gathering from various channels (Slack, Discord, Telegram, Twitter, and community calls).
The specific elements of feedback we were concerned with were:
Are there specific people that can confirm the problem? Do we have a good understanding of the underlying concerns or root cause?
Is there an articulation of the impact of the problem, e.g. in terms of money, opportunities, time, or energy? Does that make the problem urgent, relative to other problems?
Are there specific users we can work with to bounce our ideas off, to de-risk solutions, and ensure continuous feedback?
Can we make incremental steps towards a solution?
Additionally, we gave a slight preference to quick-win and high-impact problems.
This led us to categorize the problems in a table, which brought clarity by comparing them in terms of urgency, impact, users, complexity, and effort.
The table below is a snapshot from our raw notes during the prioritization exercise. It shows the top (most urgent) problems for Q2. Note that the “User” column is non-exhaustive: Some users go unmentioned to avoid swelling the table.
The following snapshot also shows our current short-listed backlog of problems.
Both the chosen priorities and the backlog are aligned with the project board. The difference is that in these tables and our internal discussions we went deeper into assessing the problem and solution space, while also capturing users and impact.
We are aware that this approach to prioritizing is not entirely objective. If there is feedback or ideas to improve, we would be glad to engage with the community to refine our prioritization.
Finally, we're executing on these priorities in parallel with a multi-quarter research effort on forks of Tendermint Core and CometBFT (e.g., the forks of Celestia, Sei, Skip, Numia, or Polygon, among many others). The aim of this investigation is to uncover which improvements other teams made in their forks would be appropriate and desirable for upstreaming into CometBFT. For each candidate, we're assessing its complexity and impact. The idea is to work with some of these teams to upstream changes into mainline CometBFT, so that the whole community can benefit from the corresponding improvements.
Follow @cometbft on Twitter to keep up-to-date with all news related to CometBFT.