[Tour de SOL] Update on Dry Run 6 & Stage 1

We’ve been hard at work in the two months since Dry Run 5 (DR5), making a ton of upgrades to our testnet’s stability and performance. We’re excited to test them, as we’re starting to hit some major breakthroughs, which gives us enough confidence to share some milestones for the next 2 months (see below for more details). Specifically:

  • January 7th - Dry Run 6
  • Mid-January - Tour de SOL Stage 1

Some of this was also discussed in our recent Validator Roundtable; a link is provided at the end of this post for anyone interested.

Technical Updates since DR5

1) Improvements to Ramp TPS

For those who aren’t aware, Ramp TPS is the tool we’ve been using to push transactions through our testnets to test performance. The transactions are pushed through the network in periodic bursts, each followed by an idle period in which Validators that fell behind have an opportunity to catch up. The size of each transaction burst increases with each round (e.g. a 2k tps burst, then 4k, then 8k, and so on).

Validators that do not fall behind during the bursts are rewarded by having their stake doubled during the idle period. This process repeats until either the network falls over or we reach peak network performance (i.e. average TPS stops increasing).
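To make the ramp-and-reward loop concrete, here is a minimal Python sketch of one Ramp TPS run under a toy model. Everything in it is an assumption for illustration, not the actual Ramp TPS script: each hypothetical `Validator` has a fixed `max_tps` it can keep up with, the network is treated as having fallen over once less than 2/3 of the stake keeps up, and achieved throughput is capped by a made-up `network_cap`.

```python
from dataclasses import dataclass


@dataclass
class Validator:
    name: str
    max_tps: int  # hypothetical: highest burst rate this node can keep up with
    stake: int


def ramp_tps(validators, network_cap, start_tps=2_000, max_rounds=20):
    """Toy Ramp TPS run. Returns the (burst_tps, achieved_tps) rounds and why it stopped."""
    rounds = []
    burst = start_tps
    best = 0
    for _ in range(max_rounds):
        total_stake = sum(v.stake for v in validators)
        keeping_up = [v for v in validators if v.max_tps >= burst]
        up_stake = sum(v.stake for v in keeping_up)
        # Assumption: the network falls over once less than 2/3 of stake keeps up.
        if 3 * up_stake < 2 * total_stake:
            return rounds, "network fell over"
        # Assumption: throughput is bounded by the burst and a fixed network ceiling.
        achieved = min(burst, network_cap)
        if achieved <= best:
            return rounds, "peak reached"  # throughput stopped increasing
        best = achieved
        rounds.append((burst, achieved))
        # Idle period: validators that kept up have their stake doubled.
        for v in keeping_up:
            v.stake *= 2
        burst *= 2  # each burst is twice the last: 2k, 4k, 8k, ...
    return rounds, "max rounds reached"


if __name__ == "__main__":
    vals = [Validator("a", 50_000, 100), Validator("b", 20_000, 100), Validator("c", 5_000, 100)]
    rounds, reason = ramp_tps(vals, network_cap=100_000)
    print(rounds, reason)
    print({v.name: v.stake for v in vals})
```

With the three example validators, the run completes the 2k, 4k, 8k and 16k bursts and falls over at 32k, leaving stakes of 1600, 1600 and 400. The doubling reward is what makes this work: stake drifts toward the nodes that keep up, so the faster nodes end up carrying more of the network's weight in later rounds.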

There have been a few bugs and issues that we’ve rectified since DR5, including:

  • The Ramp TPS script, which was designed to send 10,000 tx every 250ms (40,000 tps), was only sending 10,000 tx every 500ms (20,000 tps) due to network delays. This has now been fixed.

  • The function which doubled the stake for Validators who successfully kept up with the network during the bursts was in some instances not functioning properly. This has now been rectified.
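The first fix is easiest to see as a rate: 10,000 tx every 250ms is 40,000 tps, while 10,000 tx every 500ms is only 20,000 tps, half the intended load. The post doesn't say how the script was fixed; the sketch below only illustrates one common way this kind of drift arises and is avoided, assuming (hypothetically) that each 10,000-tx batch takes 250ms to send. Sleeping a fixed interval after each send adds the send time to the period, whereas scheduling each batch against an absolute deadline absorbs it. Both schedules are simulated rather than timed, so the results are deterministic.

```python
def effective_tps(batch_size: int, period_s: float) -> float:
    """Transactions per second when batch_size tx are sent every period_s seconds."""
    return batch_size / period_s


def relative_schedule(n_batches: int, interval_s: float, send_time_s: float) -> list[float]:
    """Sleep a fixed interval *after* each send, so send time stretches the period."""
    t, starts = 0.0, []
    for _ in range(n_batches):
        starts.append(t)
        t += send_time_s + interval_s
    return starts


def absolute_schedule(n_batches: int, interval_s: float, send_time_s: float) -> list[float]:
    """Start batch k at deadline k * interval_s (or as soon as the previous send ends)."""
    t, starts = 0.0, []
    for k in range(n_batches):
        start = max(k * interval_s, t)  # never start before the previous send finishes
        starts.append(start)
        t = start + send_time_s
    return starts


def mean_period(starts: list[float]) -> float:
    """Average gap between consecutive batch start times."""
    return (starts[-1] - starts[0]) / (len(starts) - 1)


BATCH, INTERVAL, SEND_TIME = 10_000, 0.25, 0.25  # SEND_TIME is a hypothetical delay

for name, schedule in [("relative", relative_schedule), ("absolute", absolute_schedule)]:
    starts = schedule(5, INTERVAL, SEND_TIME)
    print(name, effective_tps(BATCH, mean_period(starts)))
# prints "relative 20000.0", then "absolute 40000.0"
```

Deadline-based pacing only holds the target rate while a send takes no longer than the interval; beyond that point, the sender itself becomes the bottleneck.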

2) Massive Performance and Stability Improvements

In earlier Dry Run attempts, one of the main issues was that once Ramp TPS pushed network throughput to a certain level, the network would simply fall over. This has been a blocker for both Tour de SOL (TdS) and Dry Run 6 (DR6).

Since DR5 we’ve done extensive testing to rectify this, and we’re seeing success: our internal testnet now stays up even when we run Ramp TPS. In our initial experiments we were able to produce an average of 50,000 tps within a small validator network. However, as we added new validators we naturally saw this figure decline due to the increased network latency.

As a result, we developed some internal tools to simulate this network latency, so that we can identify bottlenecks and optimize for ‘real’ performance more reliably. We’re proud to share that in our latest internal stress tests, running a larger validator set on our co-located instances with simulated network latency, we’ve still been able to achieve an average of 50,000 tps, with bursts of up to 100,000 tps.

We hope to re-test and replicate this within DR6, and in time continue to improve these metrics.

Recent and Upcoming Milestones

Below is a rough timeline of recent and upcoming major milestones for December and January, followed by a detailed description of each:

1. 19th Dec - Release of Sultans v0.21.5

2. 20th Dec - Internal Dry Run of Sultans

3. 7th Jan - Dry Run 6

4. Mid-January - Tour de SOL Stage 1

1) Release of Sultans v0.21.5

On 19th December we released Sultans v0.21.5, a patch release to the current testnet. Please follow the link for further details on the release.

2) Internal Dry Run

We launched an Internal Dry Run based on the Sultans v0.21.5 release on the 20th of December, with 16 external validators participating ad hoc. The results were quite respectable: the network averaged ~12,000 tx/s, peaked at 52,000 tx/s and had a median block time of ~2 seconds. It eventually fell over due to some minor bugs, but only after staying up longer than in any previous Dry Run. Action items coming out of the attempt include:

  • Debugging why nodes are using up all their RAM and crashing with out-of-memory (OOM) errors
  • Adjustments to SOL stake distribution during the event to account for block production (i.e. Validator nodes that are skipping more leader blocks should receive less stake for the next round).
  • Minor rework to our network setup so that our own nodes don’t act as a bottleneck, and the network doesn’t halt prematurely if they go down
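The second action item could be expressed as a rule that scales the between-round stake reward by block production. The sketch below uses a hypothetical formula, not the one that will be used in the event: a validator that kept up still earns up to the full doubling, but the bonus is scaled by the fraction of its leader slots it actually produced, so skipping half of them halves the bonus.

```python
def next_round_stake(stake: int, leader_slots: int, skipped_slots: int, kept_up: bool) -> int:
    """Hypothetical rule: doubling bonus scaled by the share of leader slots produced."""
    if not 0 <= skipped_slots <= leader_slots:
        raise ValueError("skipped_slots must be between 0 and leader_slots")
    if not kept_up:
        return stake  # fell behind during the burst: no reward this round
    if leader_slots == 0:
        return stake * 2  # no leader slots this round, so nothing to penalise
    produced = leader_slots - skipped_slots
    # Integer arithmetic keeps stake amounts whole.
    bonus = stake * produced // leader_slots
    return stake + bonus


print(next_round_stake(100, 10, 0, True))   # 200: no skipped slots, full doubling
print(next_round_stake(100, 10, 5, True))   # 150: half the slots skipped, half the bonus
print(next_round_stake(100, 10, 0, False))  # 100: fell behind, no reward
```

Scaling the bonus linearly keeps the existing "double your stake" reward as the best case, so the rule only changes outcomes for validators that skip leader slots.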

3) Dry Run 6

Since the Internal Dry Run on Sultans v0.21.5 was successful, we will officially start Dry Run 6 on this release on January 7th. This will be a more formal stress test of the network, of the kind we’re all familiar with. In our view, the start of Tour de SOL Stage 1 will be gated on DR6.

4) Tour de SOL - Stage 1

If all goes well, we will then be able to officially launch what we’ve all been waiting for: Tour de SOL Stage 1, our incentivised testnet. Since some features might still be disabled, we’ll be revisiting the compensation structure for Stage 1 to make sure it is properly aligned with our goals.

For example, this may involve removing compensation for malicious behaviour and focusing simply on performance metrics for now. We will provide more details on this before Stage 1 begins.

Links

Validator Roundtable #8 - 19/12/2019


Thanks a lot @dominic, eagerly looking forward to Jan 2020… Anything we can do now to prepare?

Great work, @dominic

At this stage, not too much beyond preparing your setup for Dry Run 6. While GPUs are still optional for DR6, given recent progress we recommend that participants take this as an opportunity to test the hardware they intend to use for Stage 1.