At Informal Systems, we take software quality seriously. Consequently, when we are iterating on features for the Cosmos Hub, like Interchain Security, we use an extensive suite of tests. An important part of that suite is our end-to-end tests. In those tests, we spin up a small “Cosmos ecosystem” by running multiple local testnets with a handful of validators each. We submit sequences of transactions to the chains and check that we get correct behavior by inspecting the resulting chain states. These are some of our most high-level tests: they check that all components, such as the various Cosmos SDK modules but also components outside of the application binary like relayers, behave well together, and that our implementation matches the intent of the protocol. We run the application binaries just as validators will run them, so we can be fairly confident that the results match the behavior we would see in production for the same inputs.
We were pretty happy with how our test suite was set up and what coverage it gave us, but we noticed a problem that many blockchain projects eventually run into with end-to-end tests: they are either slow or flaky. For example, imagine we want to test that a governance proposal to change a parameter is executed correctly when it is voted in. For this, we spin up the local testnet, submit a governance proposal, make enough voting power vote yes for the proposal to get accepted, then wait until the voting period is over and check that the parameter change was applied correctly.
Our problem with this workflow is that it is slow - we have to wait for the voting period to end. Of course, we set the voting period shorter than on the real chain (no one wants to wait two weeks for their test results!), but setting it too short is also problematic - if the voting period is over too quickly, we may not have had enough time to make our validators vote yes on the proposal, and our governance proposal will sometimes fail unexpectedly.
We have a similar problem with block times in general: If we make the chain produce blocks slowly, our tests take longer, but if we try to make it produce blocks very fast, validators might not be able to respond in time, and rounds of our underlying consensus engine will fail and have to be retried, which erases any time savings.
It turns out that while a real consensus engine is great for running real blockchains, it’s not great for running local testnets - in these testnets, we control all the validators, so we don’t really need Byzantine fault tolerance.
Enter: CometMock
For testing, we have very different requirements from a chain running in production, but we still want our application to be executed as close to reality as possible. That is why we created CometMock, which is a stand-in for CometBFT, the consensus engine powering Cosmos.
You might know that the CometBFT process communicates with the application via the Application BlockChain Interface, or ABCI for short. The CometBFT process, in turn, communicates with the CometBFT instances of other full nodes, and together they ensure that they send their application instances the same blocks in the same order, so that the applications of all nodes in the chain have a consistent state. Along the way, CometBFT does a lot of work to be correct even in the presence of malicious participants.
CometMock also communicates with an application via ABCI, but the difference is that one CometMock instance talks to many application instances. These are all instances of the same application, so this is akin to having many full nodes, but without the gossip between them. This saves the heavy communication and computation that CometBFT does to be Byzantine fault tolerant, but to the application everything looks exactly the same as if it were talking to the real consensus engine.
This allows us to reduce communication overhead and extra work that is necessary on a real blockchain, but not required in a testing environment.
Even better, since CometMock controls what the application receives over ABCI, we can do some things that real CometBFT does not allow. For example, timestamps - applications typically do not use the system clock to determine time, but instead need to reference the current block time that is provided to them via ABCI as part of the block header. But since it is provided via ABCI, CometMock has full control over it, so we can tell the application what time it is. No need to wait for voting periods to end - just tell CometMock to skip ahead, and to the application, it will look like weeks passed, without having to actually wait more than a few milliseconds!
There are a few more exciting things that CometMock allows us to do, like causing downtime and double-sign infractions very easily.
Let’s get hands-on with CometMock and see how easy it makes it to manipulate the chain state.
The tutorial has some dependencies, in particular jq, which you can install, e.g., via Homebrew:
brew install jq
…and Go:
brew install go@1.20
You can follow along with this demo by cloning the CometMock repo:
git clone https://github.com/informalsystems/CometMock.git
cd CometMock
git checkout v0.37.2-3-tutorial
…and then installing it:
make install
To check that CometMock was installed correctly, let’s check its version:
$ cometmock version
v0.37.2-3-tutorial
In this version string, the first part tells us the version of CometMock we are using is interchangeable with CometBFT v0.37.2.
We also assume you have the binary of some Cosmos SDK application (using Cosmos SDK v0.47) installed. For example, you can use the ‘simapp’ binary that comes with the Cosmos SDK:
git clone https://github.com/cosmos/cosmos-sdk.git
cd cosmos-sdk
git checkout v0.47.5
make install
To check that simd was installed correctly, run
$ simd version
0.47.5
Once we have both a Cosmos SDK app and CometMock installed, let’s start a small testnet to interact with. CometMock provides a handy script that initializes a chain with three validators, all talking to one CometMock instance. Inside the CometMock repo, run
./local-testnet-singlechain.sh simd
…replacing simd with the name of the app you want to use.
This runs for about half a minute. Eventually, you should start seeing the output from CometMock, which looks something like
I[2023-09-11|15:02:01.292] indexed block exents module=txindex height=1
D[2023-09-11|15:02:01.292] indexed transactions module=txindex height=1 num_txs=0
D[2023-09-11|15:02:01.312] Unlocking mutex
D[2023-09-11|15:02:02.313] Locking mutex
I[2023-09-11|15:02:02.313] Running block
I[2023-09-11|15:02:02.322] Sending Commit to clients
I[2023-09-11|15:02:02.322] indexed block exents module=txindex height=2
D[2023-09-11|15:02:02.322] indexed transactions module=txindex height=2 num_txs=0
… and shows us that CometMock is producing blocks.
Let’s check the current block time by using our app binary to query the chain state.
Run:
$ simd q block --node tcp://127.0.0.1:22331 | jq -r '.block.header.time'
2023-09-11T13:05:55.676911Z
… which, in this case, shows us the current year is 2023.
Let’s see what we can do to change that! CometMock uses the underlying system clock to keep track of time, but we can manually tell it to advance time by some amount.
All CometMock-specific functionality can be accessed via RPC calls, for example via JSON-RPC. Advancing time is one of those special functions, so let’s run
curl -H 'Content-Type: application/json' -H 'Accept:application/json' --data '{"jsonrpc":"2.0","method":"advance_time","params":{"duration_in_seconds": "36000000"},"id":1}' 127.0.0.1:22331
…which tells CometMock to advance time by 36000000 seconds, or a bit more than a year.
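Since the duration is given in seconds, it can be convenient to compute it rather than hard-code it. The following is a minimal sketch under a few assumptions: the helper names `days_to_seconds`, `advance_time_payload`, and `cometmock_advance_days` are made up for this example and are not part of CometMock, and the endpoint is the one the testnet script above listens on.

```shell
# Hypothetical helpers for this tutorial; they are not shipped with CometMock.

# Address of the running CometMock instance (the one used throughout this post).
COMETMOCK_ADDR="${COMETMOCK_ADDR:-127.0.0.1:22331}"

# Convert a whole number of days into seconds.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

# Build the JSON-RPC request body for advance_time.
# As in the curl call above, the duration is passed as a string.
advance_time_payload() {
  printf '{"jsonrpc":"2.0","method":"advance_time","params":{"duration_in_seconds": "%s"},"id":1}' "$1"
}

# Send the request to CometMock, advancing time by the given number of days.
cometmock_advance_days() {
  curl -s -H 'Content-Type: application/json' -H 'Accept:application/json' \
    --data "$(advance_time_payload "$(days_to_seconds "$1")")" \
    "$COMETMOCK_ADDR"
}
```

With these helpers, `cometmock_advance_days 14` would skip a two-week voting period. For comparison, the 36000000 seconds used above are roughly 416 days (36000000 / 86400).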
Now, let’s query the current time again,
$ simd q block --node tcp://127.0.0.1:22331 | jq -r '.block.header.time'
2024-11-01T05:42:12.117043Z
Suddenly it’s 2024. We skipped over a year pretty quickly there!
Let’s see another example of CometMock-specific functionality, this time taking validators down to stop them from signing.
Recall, this testnet has three validators. Let’s double-check this by seeing the signing information:
$ simd q slashing signing-infos --node tcp://127.0.0.1:22331
info:
- address: cosmosvalcons1z4kl60l3n4ec2fy9mhsrh8z75ddwthchqxgsl4
  index_offset: "1634"
  jailed_until: "1970-01-01T00:00:00Z"
  missed_blocks_counter: "0"
  start_height: "0"
  tombstoned: false
- address: cosmosvalcons1v349k94tkkjtls4qy8fecquk4te4ddnu5aqh3u
  index_offset: "1634"
  jailed_until: "1970-01-01T00:00:00Z"
  missed_blocks_counter: "0"
  start_height: "0"
  tombstoned: false
- address: cosmosvalcons1e23ydew9fplkd7greu6uavxqaxr8pjw5ljfayn
  index_offset: "1634"
  jailed_until: "1970-01-01T00:00:00Z"
  missed_blocks_counter: "0"
  start_height: "0"
  tombstoned: false
pagination:
  next_key: null
  total: "0"
So we have three validators and none are missing blocks or are tombstoned. Let’s change that by telling CometMock to make a validator stop signing.
First, run this command to put the key address of one of the validators into an env variable:
PRIV_VALIDATOR_KEY_ADDRESS=$(jq -r '.address' ~/nodes/provider/provider-coordinator/config/priv_validator_key.json)
Then let’s use that key address to tell CometMock to make a validator not sign by setting its signing status to ‘down’:
$ curl -H 'Content-Type: application/json' -H 'Accept:application/json' --data '{"jsonrpc":"2.0","method":"set_signing_status","params":{"private_key_address": "'"$PRIV_VALIDATOR_KEY_ADDRESS"'", "status": "down"},"id":1}' 127.0.0.1:22331
{"jsonrpc":"2.0","id":1,"result":{"new_signing_status_map":{"CAA246E5C5487F66F903CF35CEB0C0E98670C9D4":true,"156DFD3FF19D73852485DDE03B9C5EA35AE5DF17":false,"646A5B16ABB5A4BFC2A021D39C0396AAF356B67C":true}}}
The result tells us that one of the validators has its signing status as false now, while the other two have it set to true.
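In a test script, you may want to read this response programmatically instead of by eye. The following is a small sketch using jq, which we installed earlier; the function name `down_validators` is made up for this example, and the response is the one shown above.

```shell
# Hypothetical helper: read a set_signing_status response from stdin and
# print the key address of every validator whose signing status is false.
down_validators() {
  jq -r '.result.new_signing_status_map | to_entries[] | select(.value == false) | .key'
}

# Example: feed in the response from the call above.
response='{"jsonrpc":"2.0","id":1,"result":{"new_signing_status_map":{"CAA246E5C5487F66F903CF35CEB0C0E98670C9D4":true,"156DFD3FF19D73852485DDE03B9C5EA35AE5DF17":false,"646A5B16ABB5A4BFC2A021D39C0396AAF356B67C":true}}}'
printf '%s\n' "$response" | down_validators
```

The printed address should be the one stored in `PRIV_VALIDATOR_KEY_ADDRESS`, which is a quick way to confirm that the intended validator was taken down.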
Let’s see the result of what we did in the signing info:
$ simd q slashing signing-infos --node tcp://127.0.0.1:22331
info:
- address: cosmosvalcons1z4kl60l3n4ec2fy9mhsrh8z75ddwthchqxgsl4
  index_offset: "1674"
  jailed_until: "1970-01-01T00:00:00Z"
  missed_blocks_counter: "0"
  start_height: "0"
  tombstoned: false
- address: cosmosvalcons1v349k94tkkjtls4qy8fecquk4te4ddnu5aqh3u
  index_offset: "1674"
  jailed_until: "1970-01-01T00:00:00Z"
  missed_blocks_counter: "13" ⬅️⬅️⬅️
  start_height: "0"
  tombstoned: false
- address: cosmosvalcons1e23ydew9fplkd7greu6uavxqaxr8pjw5ljfayn
  index_offset: "1674"
  jailed_until: "1970-01-01T00:00:00Z"
  missed_blocks_counter: "0"
  start_height: "0"
  tombstoned: false
pagination:
  next_key: null
  total: "0"
Notice that the second validator has started to miss blocks! It would take a while for the validator to get punished for downtime and jailed, since we only start punishing validators when they miss many blocks over a window of blocks, but of course CometMock can help us do this quickly. We can use CometMock to very quickly produce a lot of empty blocks without transactions. Run
curl -H 'Content-Type: application/json' -H 'Accept:application/json' --data '{"jsonrpc":"2.0","method":"advance_blocks","params":{"num_blocks": "1000"},"id":1}' 127.0.0.1:22331
…which will quickly produce 1000 blocks. In the terminal window for CometMock, you should see blocks flying by after you execute this. Let’s check the signing info again to see whether our validator was jailed:
$ simd q slashing signing-infos --node tcp://127.0.0.1:22331
info:
- address: cosmosvalcons1z4kl60l3n4ec2fy9mhsrh8z75ddwthchqxgsl4
  index_offset: "2833"
  jailed_until: "1970-01-01T00:00:00Z"
  missed_blocks_counter: "0"
  start_height: "0"
  tombstoned: false
- address: cosmosvalcons1v349k94tkkjtls4qy8fecquk4te4ddnu5aqh3u
  index_offset: "0"
  jailed_until: "2024-11-01T05:57:12Z" ⬅️⬅️⬅️
  missed_blocks_counter: "0"
  start_height: "0"
  tombstoned: false
- address: cosmosvalcons1e23ydew9fplkd7greu6uavxqaxr8pjw5ljfayn
  index_offset: "2833"
  jailed_until: "1970-01-01T00:00:00Z"
  missed_blocks_counter: "0"
  start_height: "0"
  tombstoned: false
pagination:
  next_key: null
  total: "0"
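The second validator now has a jailed_until date in the (simulated) future, so it was jailed for downtime. How many missed blocks it takes to get there depends on the slashing module's parameters. The sketch below assumes the Cosmos SDK default values (a `signed_blocks_window` of 100 blocks and a `min_signed_per_window` of 0.5); the testnet script may configure different values, which you can inspect with `simd q slashing params --node tcp://127.0.0.1:22331`. The function name `max_missed_blocks` is made up for this example.

```shell
# Hypothetical helper: the largest number of blocks a validator may miss
# inside one signing window without being jailed for downtime.
# Arguments: window size in blocks, minimum signed share in percent.
max_missed_blocks() {
  local window=$1 min_signed_percent=$2
  echo $(( window - window * min_signed_percent / 100 ))
}

# With the assumed defaults (window of 100 blocks, at least 50% signed):
max_missed_blocks 100 50   # prints 50
```

Under these assumptions, a validator is jailed once it has missed more than 50 blocks within the window, so producing 1000 blocks is far more than enough.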
Let’s demonstrate one last functionality that CometMock offers, which is generating evidence that a validator double-signed. Typically, testing double-signing involves starting two nodes with the same consensus key and tricking their CometBFT processes into double-signing (CometBFT is built with safety features to avoid just signing the same block twice, after all!).
With CometMock, this becomes just another curl command. Let’s make a validator double-sign by running the following commands:
PRIV_VALIDATOR_KEY_ADDRESS=$(jq -r '.address' ~/nodes/provider/provider-alice/config/priv_validator_key.json)
…to grab the key address of another validator (the first one is still jailed until they unjail themselves!) followed by
curl -H 'Content-Type: application/json' -H 'Accept:application/json' --data '{"jsonrpc":"2.0","method":"cause_double_sign","params":{"private_key_address": "'"$PRIV_VALIDATOR_KEY_ADDRESS"'"},"id":1}' 127.0.0.1:22331
…to make that validator double-sign.
Let’s check the signing info again:
$ simd q slashing signing-infos --node tcp://127.0.0.1:22331
info:
- address: cosmosvalcons1z4kl60l3n4ec2fy9mhsrh8z75ddwthchqxgsl4
  index_offset: "2986"
  jailed_until: "1970-01-01T00:00:00Z"
  missed_blocks_counter: "0"
  start_height: "0"
  tombstoned: false
- address: cosmosvalcons1v349k94tkkjtls4qy8fecquk4te4ddnu5aqh3u
  index_offset: "0"
  jailed_until: "2024-11-01T05:57:12Z"
  missed_blocks_counter: "0"
  start_height: "0"
  tombstoned: false
- address: cosmosvalcons1e23ydew9fplkd7greu6uavxqaxr8pjw5ljfayn
  index_offset: "2980"
  jailed_until: "9999-12-31T23:59:59Z" ⬅️⬅️⬅️
  missed_blocks_counter: "0"
  start_height: "0"
  tombstoned: true ⬅️⬅️⬅️
pagination:
  next_key: null
  total: "0"
We can see our validator got tombstoned (part of the punishment for double-signing) and is jailed forever!
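In an automated end-to-end test, a check like this would be an assertion rather than something read off the terminal. Here is a minimal sketch of how that could look with jq; the function name `assert_tombstoned` is made up for this example, and it assumes the signing infos are requested as JSON via the query command's `--output json` flag.

```shell
# Hypothetical helper: read signing-infos JSON from stdin and succeed only if
# the validator with the given consensus address is tombstoned.
assert_tombstoned() {
  jq -e --arg addr "$1" 'any(.info[]; .address == $addr and .tombstoned)' >/dev/null
}

# Example with a trimmed-down version of the output above.
sample='{"info":[{"address":"cosmosvalcons1z4kl60l3n4ec2fy9mhsrh8z75ddwthchqxgsl4","tombstoned":false},{"address":"cosmosvalcons1e23ydew9fplkd7greu6uavxqaxr8pjw5ljfayn","tombstoned":true}]}'
if printf '%s\n' "$sample" | assert_tombstoned cosmosvalcons1e23ydew9fplkd7greu6uavxqaxr8pjw5ljfayn; then
  echo "validator is tombstoned"
fi
```

Against the running testnet, the input would come from `simd q slashing signing-infos --node tcp://127.0.0.1:22331 --output json` piped into `assert_tombstoned` with the address you expect to be tombstoned.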
In the Cosmos Hub team at Informal, we are starting to integrate CometMock in our end-to-end tests. It is already running under the hood in some of our end-to-end tests for Interchain Security. Because we can use CometMock to get faster block times and skip waiting times like voting periods, the running time for this test suite went down from 10 minutes to just 1 minute.
While we probably never want to completely stop running the tests with standard issue CometBFT, we are planning to use CometMock to run a much bigger, partially randomly-generated suite of tests in the future - involving things that are really hard to do in end-to-end tests with the real consensus engine underneath, such as specifically checking edge cases around block numbers and timestamps.
We built CometMock to make our end-to-end tests faster and more reliable. In our tests for Interchain Security, the suite takes about 90% less time with CometMock than with CometBFT.
But this is just the beginning - CometMock can be used to tackle scenarios that are very hard to test with a real consensus engine, such as downtime or double-signing, and makes testing them just a single, easy RPC call. And less time spent on fighting with network and node setups means more time to come up with scenarios that test what matters!
If you find CometMock intriguing, make sure to check out the repository at https://github.com/informalsystems/CometMock and give it a try in your tests. The repo also gives some more details behind how CometMock works if you’re interested in the nuts and bolts!