In this post, we raise the topic of validator downtime on the Solana network and share the insights we gained, for the benefit of the Solana community.
Many factors influence downtime duration. Typical ones include:
- node operator reaction time (node operators may or may not use specialized monitoring and alerting systems);
- node operator skill (imagine the difference between inexperienced enthusiasts and mature professionals who have worked with such high-load systems for years);
- time to repair breakdowns in the power grid or communication network (which is outside the node operator's control);
- time needed to debug and fix specific configuration errors or replace hardware parts;
- complexity and duration of software updates (e.g., different Solana versions take different amounts of time to install), node startup duration, etc.
Although most of these factors cannot be measured directly, we collected and analysed on-chain data related to the topic. This allowed us to describe quantitatively several aspects of Solana node unavailability: downtime duration statistics over time, their variability across nodes, and the duration of node software updates.
In the full report, we present retrospective downtime statistics for Solana nodes that were active from epoch №209 (5th of August, 2021) to epoch №236 (17th of October, 2021). Historical data reveal trends in downtime dynamics, making it easier to understand the normal behavior of the metric and to identify abnormal fluctuations.
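As a rough illustration of this kind of per-epoch summary, here is a minimal Python sketch. It is not the method used in the report: the record layout `(epoch, validator_id, downtime_minutes)`, the sample values, the function names, and the "three times the typical median" threshold are all assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (epoch, validator_id, downtime_minutes).
# These values are invented for illustration; they are NOT real Solana data.
SAMPLE = [
    (209, "val-a", 10.0),
    (209, "val-b", 30.0),
    (209, "val-c", 20.0),
    (210, "val-a", 5.0),
    (210, "val-b", 15.0),
    (211, "val-a", 120.0),
    (211, "val-b", 200.0),
    (211, "val-c", 160.0),
]


def summarize_by_epoch(records):
    """Group downtime durations by epoch and compute simple summary statistics."""
    by_epoch = defaultdict(list)
    for epoch, _validator, minutes in records:
        by_epoch[epoch].append(minutes)
    return {
        epoch: {"median": median(vals), "mean": mean(vals), "max": max(vals)}
        for epoch, vals in sorted(by_epoch.items())
    }


def abnormal_epochs(summary, factor=3.0):
    """Return epochs whose median downtime exceeds `factor` times the
    median of all per-epoch medians (an assumed, arbitrary threshold)."""
    baseline = median(stats["median"] for stats in summary.values())
    return [e for e, stats in summary.items() if stats["median"] > factor * baseline]


if __name__ == "__main__":
    summary = summarize_by_epoch(SAMPLE)
    for epoch, stats in summary.items():
        print(epoch, stats)
    print("abnormal epochs:", abnormal_epochs(summary))
```

With the invented sample above, epoch 211 is the only one flagged, because its median downtime (160 minutes) exceeds three times the typical per-epoch median (20 minutes). The median is used as the baseline here because a few long outages would otherwise dominate a mean.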
You can find the full downtime analysis here.
The downtime data analyzed is available through the public Redash dashboard.
We hope this information will improve our understanding of the factors that affect downtime and help us bring more sustainability to the Solana network. We look forward to your feedback and suggestions.