[Tour de SOL] Dry Run #6 Debrief

Dry Run 6 Debrief

Really excited to share the news from Dry Run 6. This time around, a network of 41 validators lasted for 21 hours while undergoing 12 “ramping” periods, some of which saw average transaction rates of over 9,000 tx/s with max bursts of up to 60,000 tx/s! This behavior replicated the internal tests leading up to the event, giving us confidence in the robustness of the current state of development.

If you recall, our previous Dry Run, DR5, was meant to be our first attempt at ‘real’ performance stress testing rather than just running an idle network. While we learnt a lot from that attempt, it was short-lived due to a loss of consensus and network stability issues.

However, after a significant amount of effort from our community and team, we think DR6 demonstrates that those issues have been worked through and that we now have a highly performant network to continue iterating on. In addition, this performance was achieved before any optimisation of how stake was distributed globally.

After reviewing the logs we were glad to find that the network didn’t fall over due to consensus issues. More on the identified issues below.

Also, we wanted to say thank you to the folks at Staking Facilities who voluntarily put together a block explorer for this Dry Run attempt!

Play by Play

  1. Network booted at roughly 7:30am PST, Jan 7th

  2. After a 24-hour on-boarding period for validators, we started the Ramp TPS tool to begin pushing transactions into the network. By this stage we had 41 external validators connected to the network.

  3. Ramp TPS pushed transactions through the network in continuous 20-minute rounds, with a 1.5-hour rest between rounds. The rate started at 1,000 tx/s in round 1 and increased by 2,000 tx/s with each passing round (i.e. 1,000 tx/s, 3,000 tx/s, 5,000 tx/s, etc.). Validators that kept up with the network after each round were awarded more testnet tokens; those that didn’t keep up were not.

  4. This continued for about 21 hours, reaching round 12 (23,000 tx/s batch size).

  5. However, after Round 12 we started to see a large amount of stake abruptly go offline. About 50% of the stake was still up and moving together, but we simply weren’t able to reach 66%, so we weren’t able to make progress and the network halted.
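The ramp schedule in steps 3 and 4 can be sketched in a few lines. This is only an illustration of the arithmetic described above, not the actual Ramp TPS tool; the function and constant names are hypothetical.

```python
# Illustrative sketch of the ramp schedule described in the play by play.
# Names are hypothetical; this is not the Ramp TPS tool itself.
ROUND_MINUTES = 20   # each round pushes transactions for 20 minutes
REST_MINUTES = 90    # 1.5 hour rest between rounds
START_TPS = 1_000    # target rate in round 1
STEP_TPS = 2_000     # increase applied with each passing round


def target_tps(round_number: int) -> int:
    """Target transaction rate (tx/s) for a 1-indexed round."""
    if round_number < 1:
        raise ValueError("rounds are numbered from 1")
    return START_TPS + STEP_TPS * (round_number - 1)


def elapsed_minutes(rounds_completed: int) -> int:
    """Minutes from the start of round 1 to the end of the given round."""
    if rounds_completed < 1:
        return 0
    rests = rounds_completed - 1  # no rest is counted after the final round
    return rounds_completed * ROUND_MINUTES + rests * REST_MINUTES


if __name__ == "__main__":
    for r in range(1, 13):
        print(f"round {r:2d}: {target_tps(r):6,d} tx/s")
    print(f"elapsed after round 12: {elapsed_minutes(12) / 60:.1f} hours")
```

Under these assumptions, round 12 lands on 23,000 tx/s and twelve rounds with eleven rests take 20.5 hours, which lines up with the roughly 21 hours reported.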

Validator Results from DR6

I’d like to preface this by saying that DR6 was not a competition, and regardless of performance within DR6, all Validators are treated equally. It was simply a test of validator setups, giving them an opportunity to benchmark and optimise for TdS Stage 1.

Having said that, amazing effort from all. The most performant validators, based solely on the amount of stake accumulated during DR6, were:

  1. aurel (Dokia) - 7743.02644 SOL

  2. adrian_i (Moonlet) - 2045.83690 SOL

  3. proofofjk (Stake.Fish) - 2044.85177 SOL

The full stake distribution ranking list is provided here.

Also a shout out to all the other validators who participated:

  • chorusoneinc, alexticea, nodeateam, mabalaru, zemlyanin, melea, mattharrop, syncnode, hma_cope, lunanova, katernoir, p2p_org_, viktor0766, pkrasam, smartya, bneiluj, 01node, pos_bakerz, chrisremus, kellyolson, gnossienli, tlinton, brianlong, ubikcosmos, rosewu, zhanglianghui, rockxrocks, bitcat365, forbole, sunnya97, stakingfund, rudylu, everstake, smith_it2000, stakedinc, syamkumar66, proofofjk, sebytza05, adrianbrink

As we learn from how stake accumulates during these simulated Tour de SOLs, we may also be revisiting our initial compensation design, to ensure that we can award compensation fairly and transparently. More on this soon!

Identified Issues

  • Why DR6 died: More than 33% of the Validators generated an inconsistent bank hash due to a race condition, causing the rest of the network to reject their votes. Link to GitHub issue here.
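To see why losing that many votes halts the network, here is a minimal sketch of the threshold check. It assumes the "66%" mentioned in the play by play stands for a two-thirds supermajority measured by stake, and that progress needs strictly more than that share; the function name and the sample numbers are hypothetical.

```python
from fractions import Fraction

# Assumption: the "66%" in the post is shorthand for two thirds of total stake.
SUPERMAJORITY = Fraction(2, 3)


def can_make_progress(agreeing_stake: int, total_stake: int) -> bool:
    """True if the stake voting together exceeds the supermajority threshold."""
    if total_stake <= 0:
        raise ValueError("total_stake must be positive")
    if not 0 <= agreeing_stake <= total_stake:
        raise ValueError("agreeing_stake must be between 0 and total_stake")
    # Fractions avoid floating-point rounding right at the threshold.
    return Fraction(agreeing_stake, total_stake) > SUPERMAJORITY


if __name__ == "__main__":
    # Roughly the DR6 situation: about half the stake still moving together.
    print(can_make_progress(50, 100))
    # More than a third of votes rejected leaves less than two thirds.
    print(can_make_progress(66, 100))
    print(can_make_progress(67, 100))
```

With about 50% of stake still voting together, the check fails, which matches the halt described above.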

Interested in sharing your validator logs with us? You can drop the file in here. Thanks in advance!

Notes & FAQ:

During DR6 we also noticed some questions coming up relatively frequently, so we’re sharing some of the terms we’ve been using to measure validator performance for the broader community’s information:

  • Node Health - defined by a node’s ability to produce a block during its leader slot. If the node produces blocks in fewer than 80% of its leader slots, it becomes ‘unhealthy’

  • Inactive Status - refers to nodes that either don’t have stake activated or haven’t been voting on blocks
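As a rough illustration of the two definitions above, the checks could look like the sketch below. It reads the 80% rule as "blocks produced in at least 80% of leader slots means healthy"; the function names and inputs are hypothetical and do not come from any Solana tooling.

```python
# Hypothetical helpers illustrating the Node Health and Inactive Status terms.
HEALTHY_PERCENT = 80  # threshold taken from the Node Health definition


def is_healthy(blocks_produced: int, leader_slots: int) -> bool:
    """True if the node produced blocks in at least 80% of its leader slots."""
    if leader_slots <= 0:
        # Assumption: health is undefined for a node with no leader slots.
        raise ValueError("leader_slots must be positive")
    if not 0 <= blocks_produced <= leader_slots:
        raise ValueError("blocks_produced must be between 0 and leader_slots")
    # Integer comparison avoids floating-point error at exactly 80%.
    return blocks_produced * 100 >= leader_slots * HEALTHY_PERCENT


def is_inactive(activated_stake: int, is_voting: bool) -> bool:
    """True if the node has no activated stake or has not been voting."""
    return activated_stake == 0 or not is_voting


if __name__ == "__main__":
    print(is_healthy(79, 100))   # below the threshold
    print(is_healthy(80, 100))   # exactly at the threshold
    print(is_inactive(0, True))  # no activated stake
```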

Other:
