Understanding validator logs & skips

I typically have 5-8% skipped slots, it’s too high, and I am trying to determine the issue, but having trouble understanding the logs.

First I look at successful 4 block set: 28425564 - 28425567 (all blocks made)

[2020-08-14T06:59:34.694243100Z INFO  solana_core::replay_stage] new fork:28425564 parent:28425563 (leader) root:28425532
...
[2020-08-14T06:59:34.938892572Z INFO solana_runtime::bank] bank frozen: 28425564 hash: **6DYRMdGZeC4rNwvFR8hNaPn3zkbPwStv7TJZntp3LR8h**  accounts_delta:

^ is that the time it took to make a block?
244ms

and for the other three: 227, 210, 207ms

Then I find where it went wrong: 28425584 - 28425587 (first block good, last three skipped)

I become leader at 28425585 which is in the log file BEFORE I become leader AGAIN at 28425584 later in the log file – in between all that a different validator says it’s no longer the leader because the slot is 28425582 (what happens to 28425583?)… next leader takes over at 28425589 (28425588?)… so confused.

The 1 good, 3 bad pattern is happening to me a lot… somewhere in this data is the reason, but I am struggling to understand what is happening:

[2020-08-14T06:59:42.746946625Z INFO  solana_core::replay_stage] new fork:28425585 parent:28425581 (leader) root:28425550
[2020-08-14T06:59:42.746952576Z INFO  solana_metrics::metrics] datapoint: replay_stage-new_leader slot=28425585i leader="MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx"
[2020-08-14T06:59:42.746991850Z INFO  solana_metrics::metrics] datapoint: replay_stage-my_leader_slot slot=28425585i
[2020-08-14T06:59:42.747204101Z INFO  solana_metrics::metrics] datapoint: bank-new_from_parent-heights slot_height=28425585i block_height=27227072i
[2020-08-14T06:59:42.747927599Z INFO  solana_core::replay_stage] LEADER CHANGE at slot: 28425585 leader: MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx. I am now the leader
[2020-08-14T06:59:42.748018661Z INFO  solana_ledger::blockstore] slot 28425582 is full, last: 130
[2020-08-14T06:59:42.749855022Z INFO  solana_metrics::metrics] datapoint: cluster_info-num_nodes live_count=161i broadcast_count=162i
[2020-08-14T06:59:42.839307302Z ERROR solana_metrics::counter] COUNTER:{"name": "bank-process_transactions-error_count", "counts": 1440229, "samples": 1439000,  "now": 1597388382839, "events": 1}
[2020-08-14T06:59:42.839894442Z INFO  solana_metrics::counter] COUNTER:{"name": "banking_stage-dropped_tx_before_forwarding", "counts": 58, "samples": 592000,  "now": 1597388382839, "events": 0}
[2020-08-14T06:59:42.847576511Z INFO  solana_metrics::counter] COUNTER:{"name": "bank-process_transactions", "counts": 14029989, "samples": 13488000,  "now": 1597388382847, "events": 1}
[2020-08-14T06:59:42.848007225Z ERROR solana_metrics::counter] COUNTER:{"name": "bank-process_transactions-error-blockhash_not_found", "counts": 502163, "samples": 502000,  "now": 1597388382848, "events": 1}
[2020-08-14T06:59:42.854683595Z INFO  solana_core::replay_stage] bank frozen: 28425582
[2020-08-14T06:59:42.855231751Z INFO  solana_metrics::metrics] datapoint: replay-slot-stats slot=28425582i fetch_entries_time=4587i fetch_entries_fail_time=0i entry_poh_verification_time=42669i entry_transaction_verification_time=2040i replay_time=11521i replay_total_elapsed=667960i total_entries=192i total_shreds=131i
[2020-08-14T06:59:42.855245988Z INFO  solana_metrics::metrics] datapoint: collect_rent_eagerly accounts=0i
[2020-08-14T06:59:42.857291634Z INFO  solana_runtime::bank] bank frozen: 28425582 hash: DTw2QALunEbaKM6MViE4598i3oz7PihqyHsi8hzyRkcV accounts_delta: DMm7DWhNEhj7nTJqaX74N4iaAkofqDJZYU5GSTuvJGtc signature_count: 128 last_blockhash: 26U3S5zZ1SvdJUyfFT6fWT1UToFGX8qqdnM1VLHeb9JW capitalization: 488623339413989272
[2020-08-14T06:59:42.857304778Z INFO  solana_runtime::bank] accounts hash slot: 28425582 stats: BankHashStats { num_updated_accounts: 261, num_removed_accounts: 0, num_lamports_stored: 50844019129828, total_data_len: 631478, num_executable_accounts: 0 }
[2020-08-14T06:59:42.859864546Z INFO  solana_metrics::metrics] datapoint: tower-observed slot=28425581i root=28425550i
[2020-08-14T06:59:42.861896657Z INFO  solana_core::replay_stage] MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx slot_weight: 28425582 1432593496718767959750456880 154854895580100586833861699731920 28425581
[2020-08-14T06:59:42.861899663Z INFO  solana_metrics::metrics] datapoint: best_slot slot=28425582i best_slot=28425582i
[2020-08-14T06:59:42.861923097Z INFO  solana_metrics::metrics] datapoint: bank_weight slot=28425582i weight="4A10379125168504F3BB630"
[2020-08-14T06:59:42.861914631Z INFO  solana_core::replay_stage] validator fork confirmed 28425581 1236ms
[2020-08-14T06:59:42.861942754Z INFO  solana_core::replay_stage] voting: 28425582 1432593496718767959750456880
[2020-08-14T06:59:42.861947433Z INFO  solana_core::replay_stage] LEADER CHANGE at slot: 28425582 leader: Dokia75SVtetShgapUBoVFfYjL99fQyr1twxKKyTZKa3. I am no longer the leader
[2020-08-14T06:59:42.862578917Z INFO  solana_metrics::metrics] datapoint: validator-confirmation duration_ms=1236i
[2020-08-14T06:59:42.862582934Z INFO  solana_metrics::metrics] datapoint: tower-vote latest=28425582i root=28425551i
[2020-08-14T06:59:42.862684216Z INFO  solana_core::replay_stage] new root 28425551
[2020-08-14T06:59:42.863211713Z INFO  solana_core::replay_stage] vote bank: Some((28425582, NoSwitch)) reset bank: 28425582
[2020-08-14T06:59:42.863227212Z INFO  solana_metrics::metrics] datapoint: blocks_produced num_blocks_on_fork=109251i num_dropped_blocks_on_fork=3319i
[2020-08-14T06:59:42.863262839Z INFO  solana_core::poh_recorder] reset poh from: E8pDmA9uwfqyNpWQ76PLuGRTpT9y6iKbCjh25fJnu8b7,1819237480,28425581 to: 26U3S5zZ1SvdJUyfFT6fWT1UToFGX8qqdnM1VLHeb9JW,28425582
[2020-08-14T06:59:42.863478397Z INFO  solana_core::replay_stage] MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx reset PoH to tick 1819237312 (within slot 28425582). My next leader slot is 28425584
[2020-08-14T06:59:42.863940009Z INFO  solana_core::replay_stage] new fork:28425583 parent:28425582 root:28425551
[2020-08-14T06:59:42.864063222Z INFO  solana_metrics::metrics] datapoint: bank-new_from_parent-heights slot_height=28425583i block_height=27227073i
[2020-08-14T06:59:42.866564129Z INFO  solana_metrics::metrics] datapoint: block-commitment-cache aggregate-commitment-ms=2i
[2020-08-14T06:59:42.943654024Z INFO  solana_metrics::metrics] datapoint: poh-service ticks=128i hashes=1605017i elapsed_ms=438i
[2020-08-14T06:59:42.956427406Z INFO  solana_metrics::metrics] datapoint: shred_fetch_tvu_forwards index_overrun=0i shred_count=390i slot_bad_deserialize=0i index_bad_deserialize=0i index_out_of_bounds=0i slot_out_of_range=0i duplicate_shred=0i
[2020-08-14T06:59:43.286447022Z INFO  solana_metrics::metrics] datapoint: shred_fetch_repair index_overrun=0i shred_count=8i slot_bad_deserialize=0i index_bad_deserialize=0i index_out_of_bounds=0i slot_out_of_range=0i duplicate_shred=0i
[2020-08-14T06:59:43.323601726Z WARN  solana_core::replay_stage] MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx already have bank in forks at 28425585?
[2020-08-14T06:59:43.323647663Z INFO  solana_metrics::metrics] datapoint: replay-loop-timing-stats total_elapsed_us=1024000i compute_bank_stats_elapsed=4705i select_vote_and_reset_forks_elapsed=8i start_leader_elapsed=1088i reset_bank_elapsed=273i voting_elapsed=1264i select_forks_elapsed=3i compute_slot_stats_elapsed=25i generate_new_bank_forks_elapsed=5648i replay_active_banks_elapsed=106509i reset_duplicate_slots_elapsed=3057i
[2020-08-14T06:59:43.358545993Z INFO  solana_metrics::metrics] datapoint: poh-service ticks=128i hashes=1599872i elapsed_ms=414i
[2020-08-14T06:59:43.396894956Z INFO  solana_ledger::blockstore] slot 28425583 is full, last: 91
[2020-08-14T06:59:43.410432001Z INFO  solana_metrics::counter] COUNTER:{"name": "poh_recorder-tick_lock_contention", "counts": 30188, "samples": 15512000,  "now": 1597388383410, "events": 0}
[2020-08-14T06:59:43.410450767Z INFO  solana_metrics::counter] COUNTER:{"name": "poh_recorder-tick_overhead", "counts": 688975, "samples": 15512000,  "now": 1597388383410, "events": 0}
[2020-08-14T06:59:43.432354897Z INFO  solana_core::replay_stage] bank frozen: 28425583
[2020-08-14T06:59:43.432362161Z INFO  solana_metrics::metrics] datapoint: replay-slot-stats slot=28425583i fetch_entries_time=4356i fetch_entries_fail_time=0i entry_poh_verification_time=51235i entry_transaction_verification_time=1513i replay_time=5817i replay_total_elapsed=567777i total_entries=95i total_shreds=92i
[2020-08-14T06:59:43.432407597Z INFO  solana_metrics::metrics] datapoint: collect_rent_eagerly accounts=0i
[2020-08-14T06:59:43.435693056Z INFO  solana_runtime::bank] bank frozen: 28425583 hash: FG4CiNbzK9FkuGz8nSRmj3mhotRMqmHyA7cbqtRXUU7P accounts_delta: BvsP4WBtJoVWB3EzpUnoCNMGhvLtAKjVFSfpmo81qCDH signature_count: 31 last_blockhash: B2pagzAGNjFnSZMKj6nxBRwGEoQ4wS16L7vFP7Rdw51o capitalization: 488623339413834272
[2020-08-14T06:59:43.435713815Z INFO  solana_runtime::bank] accounts hash slot: 28425583 stats: BankHashStats { num_updated_accounts: 68, num_removed_accounts: 0, num_lamports_stored: 12571705848723, total_data_len: 273302, num_executable_accounts: 0 }
[2020-08-14T06:59:43.439087893Z INFO  solana_metrics::metrics] datapoint: tower-observed slot=28425581i root=28425550i
[2020-08-14T06:59:43.442033410Z INFO  solana_metrics::metrics] datapoint: best_slot slot=28425583i best_slot=28425583i
[2020-08-14T06:59:43.442038400Z INFO  solana_metrics::metrics] datapoint: bank_weight slot=28425583i weight="4A0201968F652904F3BB630"
[2020-08-14T06:59:43.442036175Z INFO  solana_core::replay_stage] MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx slot_weight: 28425583 1431519754883098018211149360 154856327099855469931879910881280 28425582
[2020-08-14T06:59:43.442059880Z INFO  solana_core::replay_stage] voting: 28425583 1431519754883098018211149360
[2020-08-14T06:59:43.442187792Z INFO  solana_metrics::metrics] datapoint: tower-vote latest=28425583i root=28425552i
[2020-08-14T06:59:43.442625960Z INFO  solana_core::replay_stage] new root 28425552
[2020-08-14T06:59:43.445124352Z INFO  solana_metrics::metrics] datapoint: block-commitment-cache aggregate-commitment-ms=2i
[2020-08-14T06:59:43.475338238Z INFO  solana_core::replay_stage] vote bank: Some((28425583, NoSwitch)) reset bank: 28425583
[2020-08-14T06:59:43.475356993Z INFO  solana_metrics::metrics] datapoint: blocks_produced num_blocks_on_fork=109252i num_dropped_blocks_on_fork=3319i
[2020-08-14T06:59:43.475386779Z INFO  solana_core::poh_recorder] reset poh from: AcDRDRpZbxADEYLTNBZsxMfAh9vQF7fiZwAodxbDkV9s,1819237498,28425582 to: B2pagzAGNjFnSZMKj6nxBRwGEoQ4wS16L7vFP7Rdw51o,28425583
[2020-08-14T06:59:43.475399874Z INFO  solana_core::replay_stage] MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx reset PoH to tick 1819237376 (within slot 28425583). My next leader slot is 28425584
[2020-08-14T06:59:43.475414261Z INFO  solana_core::replay_stage] new fork:28425584 parent:28425583 (leader) root:28425552
[2020-08-14T06:59:43.475417648Z INFO  solana_metrics::metrics] datapoint: replay_stage-new_leader slot=28425584i leader="MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx"
[2020-08-14T06:59:43.475424972Z INFO  solana_metrics::metrics] datapoint: replay_stage-my_leader_slot slot=28425584i
[2020-08-14T06:59:43.475529680Z INFO  solana_metrics::metrics] datapoint: bank-new_from_parent-heights slot_height=28425584i block_height=27227074i
[2020-08-14T06:59:43.475986222Z INFO  solana_core::replay_stage] LEADER CHANGE at slot: 28425584 leader: MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx. I am now the leader
[2020-08-14T06:59:43.477501467Z INFO  solana_metrics::metrics] datapoint: broadcast-process-shreds-stats slot=28425585i shredding_time=13765i receive_time=35267i num_data_shreds=67i slot_broadcast_time=5i
[2020-08-14T06:59:43.479371812Z INFO  solana_ledger::blockstore] slot 28425585 is full, last: 67
[2020-08-14T06:59:43.479425614Z INFO  solana_metrics::metrics] datapoint: broadcast-insert-shreds-stats slot=28425585i end_to_end_elapsed=731515i insert_shreds_elapsed=5264i num_shreds=135i
[2020-08-14T06:59:43.482135676Z INFO  solana_metrics::metrics] datapoint: broadcast-transmit-shreds-stats slot=28425585i end_to_end_elapsed=734226i transmit_elapsed=2440i send_mmsg_elapsed=1047i get_peers_elapsed=37131i num_shreds=135i shred_select=397i
[2020-08-14T06:59:43.488418821Z INFO  solana_metrics::counter] COUNTER:{"name": "bank-process_transactions-txs", "counts": 12589378, "samples": 12940000,  "now": 1597388383488, "events": 1}
[2020-08-14T06:59:43.488430763Z INFO  solana_metrics::counter] COUNTER:{"name": "bank-process_transactions-sigs", "counts": 13482782, "samples": 12940000,  "now": 1597388383488, "events": 1}
[2020-08-14T06:59:43.545993071Z INFO  solana_metrics::counter] COUNTER:{"name": "banking_stage-dropped_tx_before_forwarding", "counts": 58, "samples": 593000,  "now": 1597388383545, "events": 0}
[2020-08-14T06:59:43.547055499Z ERROR solana_metrics::counter] COUNTER:{"name": "bank-process_transactions-error-duplicate_signature", "counts": 42001, "samples": 42000,  "now": 1597388383547, "events": 1}
[2020-08-14T06:59:43.547237363Z ERROR solana_metrics::counter] COUNTER:{"name": "bank-process_transactions-error_count", "counts": 1441229, "samples": 1440000,  "now": 1597388383547, "events": 1}
[2020-08-14T06:59:43.551727738Z INFO  solana_metrics::counter] COUNTER:{"name": "bank-process_transactions", "counts": 14030992, "samples": 13489000,  "now": 1597388383551, "events": 1}
[2020-08-14T06:59:43.567135190Z INFO  solana_metrics::counter] COUNTER:{"name": "broadcast_service-entries_received", "counts": 72928, "samples": 44000,  "now": 1597388383567, "events": 2}
[2020-08-14T06:59:43.728938327Z INFO  solana_core::poh_recorder] poh_record: max_tick_height 1819237440 reached, clearing working_bank 28425584
[2020-08-14T06:59:43.729559171Z INFO  solana_core::replay_stage] bank frozen: 28425584
[2020-08-14T06:59:43.729566234Z INFO  solana_metrics::metrics] datapoint: replay-slot-stats slot=28425584i fetch_entries_time=0i fetch_entries_fail_time=0i entry_poh_verification_time=0i entry_transaction_verification_time=0i replay_time=0i replay_total_elapsed=253232i total_entries=0i total_shreds=0i
[2020-08-14T06:59:43.729597944Z INFO  solana_metrics::metrics] datapoint: collect_rent_eagerly accounts=0i
[2020-08-14T06:59:43.730213878Z INFO  solana_ledger::blockstore] slot 28425584 is full, last: 174
[2020-08-14T06:59:43.730272188Z INFO  solana_metrics::metrics] datapoint: broadcast-process-shreds-stats slot=28425584i shredding_time=9691i receive_time=83356i num_data_shreds=175i slot_broadcast_time=252767i
[2020-08-14T06:59:43.730360766Z INFO  solana_metrics::metrics] datapoint: broadcast-insert-shreds-stats slot=28425584i end_to_end_elapsed=252860i insert_shreds_elapsed=17473i num_shreds=350i
[2020-08-14T06:59:43.730562547Z INFO  solana_metrics::metrics] datapoint: broadcast-transmit-shreds-stats slot=28425584i end_to_end_elapsed=253063i transmit_elapsed=5951i send_mmsg_elapsed=2217i get_peers_elapsed=95796i num_shreds=350i shred_select=1128i
[2020-08-14T06:59:43.731314589Z INFO  solana_runtime::bank] bank frozen: 28425584 hash: 3aRm9DtWTGXeUip4CeTEd5bRTucsPCmEvQJzAXiHMza8 accounts_delta: 14MXriEmmRJjqwhtRmiBfDVnaG5XEg5gyqHdhziT8r9i signature_count: 216 last_blockhash: 8Y5qRWRWkw6nTJUUJ9Rr8BEjkKM3ZNZBUx1S3xEn4SWw capitalization: 488623339412754272
[2020-08-14T06:59:43.731334536Z INFO  solana_runtime::bank] accounts hash slot: 28425584 stats: BankHashStats { num_updated_accounts: 411, num_removed_accounts: 0, num_lamports_stored: 85104576665629, total_data_len: 862800, num_executable_accounts: 0 }
[2020-08-14T06:59:43.734095175Z INFO  solana_metrics::metrics] datapoint: tower-observed slot=28425583i root=28425552i
[2020-08-14T06:59:43.736444565Z INFO  solana_metrics::metrics] datapoint: best_slot slot=28425584i best_slot=28425584i
[2020-08-14T06:59:43.736465024Z INFO  solana_metrics::metrics] datapoint: bank_weight slot=28425584i weight="4A0557EC87BA554C61F1030"
[2020-08-14T06:59:43.736442712Z INFO  solana_core::replay_stage] MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx slot_weight: 28425584 1431771910310840163339538480 154857758871765780772043250419760 28425583
[2020-08-14T06:59:43.736488959Z INFO  solana_core::replay_stage] validator fork confirmed 28425583 871ms
[2020-08-14T06:59:43.736494930Z INFO  solana_core::replay_stage] validator fork confirmed 28425582 1549ms
[2020-08-14T06:59:43.736498557Z INFO  solana_metrics::metrics] datapoint: validator-confirmation duration_ms=871i
[2020-08-14T06:59:43.736505310Z INFO  solana_metrics::metrics] datapoint: validator-confirmation duration_ms=1549i
[2020-08-14T06:59:43.736500751Z INFO  solana_core::replay_stage] voting: 28425584 1431771910310840163339538480
[2020-08-14T06:59:43.736609847Z INFO  solana_metrics::metrics] datapoint: tower-vote latest=28425584i root=28425553i
[2020-08-14T06:59:43.736970128Z INFO  solana_core::replay_stage] new root 28425553
[2020-08-14T06:59:43.739022517Z INFO  solana_metrics::metrics] datapoint: block-commitment-cache aggregate-commitment-ms=2i
[2020-08-14T06:59:43.772044631Z INFO  solana_metrics::metrics] submitting 657 points
[2020-08-14T06:59:43.796379515Z INFO  solana_core::replay_stage] vote bank: Some((28425584, NoSwitch)) reset bank: 28425584
[2020-08-14T06:59:43.796424891Z INFO  solana_core::poh_recorder] reset poh from: 3kcPFzBH3roJx79vcFpzXvEA3VqUCjmyQMmRett6Qxp3,1819237462,28425584 to: 8Y5qRWRWkw6nTJUUJ9Rr8BEjkKM3ZNZBUx1S3xEn4SWw,28425584
[2020-08-14T06:59:43.796438417Z INFO  solana_core::replay_stage] MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx reset PoH to tick 1819237440 (within slot 28425584). My next leader slot is 28425586
[2020-08-14T06:59:44.101425731Z INFO  solana_core::progress_map] Propagated stats:
                    total staked: 167533368715231080,
                    observed staked: 2366746363473920,
                    vote pubkeys: {2het6nBRLq9LLZER8fqUEk7j5pbLxq2mVGqSse2nS3tf, CAf8jfgqhia5VNrEF4A7Y9VLD3numMq9DVSceq7cPhNY, 7PmWxxiTneGteGxEYvzj5pGDVMQ4nuN9DfUypEXmaA8o, DQ7D6ZRtKbBSxCcAunEkoTzQhCBKLPdzTjPRRnM6wo1f},
                    node_pubkeys: {MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx, 8GLRbAstsabZuZUx73AoyfGi1FRCWSUhRgMugFyofEz7, HzrEstnLfzsijhaD6z5frkSE2vWZEH5EUfn3bU9swo1f},
                    slot: 28425584,
                    epoch: Some(65)
[2020-08-14T06:59:44.202407752Z INFO  solana_core::progress_map] Propagated stats:
                    total staked: 167533368715231080,
                    observed staked: 8378079841980160,
                    vote pubkeys: {G7akZgkcuqr7Vw1bHXqerKeawQ2Uc1WK9VaYq9APWNLE, CAf8jfgqhia5VNrEF4A7Y9VLD3numMq9DVSceq7cPhNY, 6559KMdiUseNSAkRcK9WcFcNTppoj6jWtKVedpMkBYCn, D9iBN61a4MAt49gvaovohF77zqzoyQ5ERVZUoAfub3o2, 2het6nBRLq9LLZER8fqUEk7j5pbLxq2mVGqSse2nS3tf, 7PmWxxiTneGteGxEYvzj5pGDVMQ4nuN9DfUypEXmaA8o, DQ7D6ZRtKbBSxCcAunEkoTzQhCBKLPdzTjPRRnM6wo1f, FqUfeiVYd1LMNSDw1t9WE6fAXAKzjUVHf8YzZXWQjSMd, beefKGBWeSpHzYBHZXwp5So7wdQGX6mu4ZHCsH3uTar, 4ScTBfR4v2Lw9hWuzayJjfZdBKQVUeg7Rm92Rsq7HZkV, 9EJXJNzaXnycsMr4ZyiC5WUWYPkGNoSSYTGJLYfwnccK},
                    node_pubkeys: {MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx, 8GLRbAstsabZuZUx73AoyfGi1FRCWSUhRgMugFyofEz7, HzrEstnLfzsijhaD6z5frkSE2vWZEH5EUfn3bU9swo1f},
                    slot: 28425584,
                    epoch: Some(65)
[2020-08-14T06:59:44.603747177Z INFO  solana_metrics::counter] COUNTER:{"name": "gossip_filter_crds_values-dropped_requests", "counts": 59038, "samples": 23168000,  "now": 1597388384603, "events": 0}
[2020-08-14T06:59:45.053134647Z INFO  solana_metrics::metrics] datapoint: blocks_produced num_blocks_on_fork=109253i num_dropped_blocks_on_fork=3319i
[2020-08-14T06:59:45.053191845Z INFO  solana_metrics::metrics] datapoint: poh-service ticks=128i hashes=1612219i elapsed_ms=462i
[2020-08-14T06:59:45.053252850Z INFO  solana_metrics::metrics] datapoint: cluster_slots_service-timing lowest_slot_elapsed=10460i update_completed_slots_elapsed=578050i
[2020-08-14T06:59:45.053263891Z INFO  solana_metrics::metrics] datapoint: replay_stage-new_leader slot=28425586i leader="MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx"
[2020-08-14T06:59:45.053266296Z INFO  solana_metrics::metrics] datapoint: replay_stage-skip_leader_slot slot=28425586i parent_slot=28425584i latest_unconfirmed_leader_slot=28425584i
[2020-08-14T06:59:45.053269261Z INFO  solana_metrics::metrics] datapoint: replay_stage-retransmit slot=28425584i
[2020-08-14T06:59:45.053271455Z INFO  solana_metrics::metrics] datapoint: cluster_info-num_nodes live_count=161i broadcast_count=162i
[2020-08-14T06:59:45.053288237Z INFO  solana_metrics::metrics] datapoint: replay_stage-new_leader slot=28425587i leader="MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx"
[2020-08-14T06:59:45.053290311Z INFO  solana_metrics::metrics] datapoint: replay_stage-skip_leader_slot slot=28425587i parent_slot=28425584i latest_unconfirmed_leader_slot=28425584i
[2020-08-14T06:59:45.053294028Z INFO  solana_metrics::metrics] datapoint: cluster_info_stats entrypoint=0i entrypoint2=0i push_vote_read=55i vote_process_push=38i get_votes=4651i get_accounts_hash=0i all_tvu_peers=0i tvu_peers=52911i new_push_requests_num=1465i table_size=52677i purged_values_size=42616i
[2020-08-14T06:59:45.053296974Z INFO  solana_metrics::metrics] datapoint: cluster_info_stats2 retransmit_peers=3093i repair_peers=5389i new_push_requests=4084i new_push_requests2=1i purge=87404i process_pull_resp=23133i filter_pull_resp=6810i process_pull_resp_count=14i pull_response_fail_insert=758i pull_response_fail_timeout=15i pull_response_success=491i process_pull_resp_timeout=15i push_response_count=3097i
[2020-08-14T06:59:45.053300079Z INFO  solana_metrics::metrics] datapoint: cluster_info_stats3 process_pull_resp_len=1269i process_pull_requests=32789i generate_pull_responses=732607i process_prune=100i process_push_message=14740i prune_received_cache=3561i epoch_slots_lookup=10i epoch_slots_push=40i push_message=0i new_pull_requests=531302i mark_pull_request=2i
[2020-08-14T06:59:45.053302494Z INFO  solana_metrics::metrics] datapoint: cluster_info_stats4 skip_push_message_shred_version=0i skip_pull_response_shred_version=0i skip_pull_shred_version=0i push_message_count=1557i push_message_value_count=1897i new_pull_requests_count=1280i prune_message_count=1293i prune_message_len=1356i
[2020-08-14T06:59:45.053305900Z INFO  solana_metrics::metrics] datapoint: poh-service ticks=128i hashes=1599872i elapsed_ms=399i
[2020-08-14T06:59:45.053319767Z INFO  solana_metrics::metrics] datapoint: replay_stage-new_leader slot=28425587i leader="MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx"
[2020-08-14T06:59:45.053340045Z INFO  solana_metrics::metrics] datapoint: replay-loop-timing-stats total_elapsed_us=1081000i compute_bank_stats_elapsed=11243i select_vote_and_reset_forks_elapsed=22i start_leader_elapsed=712i reset_bank_elapsed=134i voting_elapsed=93132i select_forks_elapsed=0i compute_slot_stats_elapsed=13i generate_new_bank_forks_elapsed=6802i replay_active_banks_elapsed=13476i reset_duplicate_slots_elapsed=3913i
[2020-08-14T06:59:45.053356977Z INFO  solana_metrics::metrics] datapoint: recv-window-insert-shreds num_shreds=836i total_elapsed=60415i insert_lock_elapsed=0i insert_shreds_elapsed=22057i shred_recovery_elapsed=24135i chaining_elapsed=74i commit_working_sets_elapsed=1367i write_batch_elapsed=8988i num_inserted=233i num_recovered=184i num_recovered_inserted=184i num_recovered_failed_sig=0i num_recovered_failed_invalid=0i num_recovered_exists=0i
[2020-08-14T06:59:45.053381163Z INFO  solana_metrics::metrics] datapoint: poh-service ticks=128i hashes=1599872i elapsed_ms=385i
[2020-08-14T06:59:45.053464600Z INFO  solana_metrics::metrics] datapoint: poh-service ticks=128i hashes=1599872i elapsed_ms=384i
[2020-08-14T06:59:45.127037869Z INFO  solana_core::repair_service] repair_stats: RepairStats { shred: RepairStatsGroup { count: 20, min: 0, max: 28425583, slot_pubkeys: {28425582: SlotRepairs { highest_shred_index: 125, pubkey_repairs: {FopTvQaGp6K5FadWKZtsLJmrX7gnNGFS2fQ7rv5KHyE1: 1, Ev8D9dwYdfebkdLgAjwiJtCkqS882Uvrit5qN6NTeHMy: 1, GdnSyH3YtwcxFvQrVVJMm1JhTS4QVX7MFsX56uJLUfiZ: 2, ArmanP3SBD1DVSoKCqK7d6S2kLpyncYey2Ybf3o5XkTn: 1, Awes4Tr6TX8JDzEhCZY2QVNimT6iD1zWHzf1vNyGvpLM: 1, 6yf57R7U1J1VXszE7CobdYEWQNJMPRfWEgGfaRsVNk32: 1, CakcnaRDHka2gXyfbEd2d3xsvkJkqsLw2akB3zsN1D2S: 1} }, 28425583: SlotRepairs { highest_shred_index: 90, pubkey_repairs: {Dokia75SVtetShgapUBoVFfYjL99fQyr1twxKKyTZKa3: 1, 6yf57R7U1J1VXszE7CobdYEWQNJMPRfWEgGfaRsVNk32: 2, FopTvQaGp6K5FadWKZtsLJmrX7gnNGFS2fQ7rv5KHyE1: 1, HzrEstnLfzsijhaD6z5frkSE2vWZEH5EUfn3bU9swo1f: 1, FGiEdzde7Fco2WLpNQMat299hUVoykJdaA5hxdmCzHiS: 1, ateamuvZX4iy2xYrNrMyGgwtTpWFiurNzHpmJMZwjar: 1, superCMS6AucZe9aykaks7kUAj3oqB52yMMV81A8exa: 1, 2NJZ1Ajcwtc7hZdVXTXrh2SAiAXnFkVm6MWcGjBZfPkS: 1, 6TkKqq15wXjqEjNg9zqTKADwuVATR9dW3rkNnsYme1ea: 1, CakcnaRDHka2gXyfbEd2d3xsvkJkqsLw2akB3zsN1D2S: 1, AyYqAhyCCxRrNGbm3dY4aGY9SyaQC9UvPTSHvMQK4YW2: 1} }} }, highest_shred: RepairStatsGroup { count: 10, min: 0, max: 28425585, slot_pubkeys: {28425584: SlotRepairs { highest_shred_index: 172, pubkey_repairs: {AmhQFcGvH2hjkucP78rn6GMKSbstYwyFpCDVKZUwBGrG: 1, XkCriyrNwS3G4rzAXtG5B1nnvb5Ka1JtCku93VqeKAr: 1, Dokia75SVtetShgapUBoVFfYjL99fQyr1twxKKyTZKa3: 1} }, 28425585: SlotRepairs { highest_shred_index: 67, pubkey_repairs: {D6pUrfgc5ZyXSfgtCBYozydRSz92pse1S7AZP58muEYk: 1, FopTvQaGp6K5FadWKZtsLJmrX7gnNGFS2fQ7rv5KHyE1: 2, GdnSyH3YtwcxFvQrVVJMm1JhTS4QVX7MFsX56uJLUfiZ: 1, 7Np41oeYqPefeNQEHSv1UDhYrehxin3NStELsSKCT4K2: 1, ChorusmmK7i1AxXeiTtQgQZhQNiXYU84ULeaYF1EH15n: 1, Dokia75SVtetShgapUBoVFfYjL99fQyr1twxKKyTZKa3: 1} }} }, orphan: RepairStatsGroup { count: 0, min: 0, max: 0, slot_pubkeys: {} }, get_best_orphans_us: 0, get_best_shreds_us: 0 }
[2020-08-14T06:59:45.127131165Z INFO  solana_metrics::metrics] datapoint: serve_repair-repair repair-total=30i shred-count=20i highest-shred-count=10i orphan-count=0i repair-highest-slot=28425585i repair-orphan=0i
[2020-08-14T06:59:45.127161522Z INFO  solana_metrics::metrics] datapoint: serve_repair-repair-timing set-root-elapsed=20i get-votes-elapsed=622i add-votes-elapsed=806i get-best-orphans-elapsed=844i get-best-shreds-elapsed=5571i send-repairs-elapsed=6383i
[2020-08-14T06:59:45.376415216Z INFO  solana_metrics::metrics] datapoint: poh-service ticks=128i hashes=1599872i elapsed_ms=385i
[2020-08-14T06:59:45.414781541Z INFO  solana_metrics::metrics] datapoint: replay-loop-timing-stats total_elapsed_us=1010000i compute_bank_stats_elapsed=500i select_vote_and_reset_forks_elapsed=0i start_leader_elapsed=15i reset_bank_elapsed=0i voting_elapsed=0i select_forks_elapsed=1i compute_slot_stats_elapsed=0i generate_new_bank_forks_elapsed=5599i replay_active_banks_elapsed=117i reset_duplicate_slots_elapsed=3217i
[2020-08-14T06:59:45.761445085Z INFO  solana_metrics::metrics] datapoint: poh-service ticks=128i hashes=1599872i elapsed_ms=385i
[2020-08-14T06:59:45.764767535Z INFO  solana_metrics::metrics] datapoint: shred_fetch_tvu_forwards index_overrun=0i shred_count=118i slot_bad_deserialize=0i index_bad_deserialize=0i index_out_of_bounds=0i slot_out_of_range=0i duplicate_shred=0i
[2020-08-14T06:59:45.765748980Z INFO  solana_core::retransmit_stage] retransmit: packets_by_slot: {28425582: 46, 28425583: 184, 28425589: 1}
[2020-08-14T06:59:45.765750253Z INFO  solana_metrics::metrics] datapoint: retransmit-num_nodes count=162i
[2020-08-14T06:59:45.765771753Z INFO  solana_metrics::metrics] datapoint: retransmit-stage total_time=9374i epoch_fetch=1i epoch_cache_update=3932i total_batches=233i total_packets=231i retransmit_total=615i compute_turbine=3323i repair_total=5i discard_total=0i
[2020-08-14T06:59:45.765766163Z INFO  solana_core::retransmit_stage] retransmit: packets_by_source: {"100.14.247.51:8005": 1, "116.202.156.45:8005": 1, "116.202.157.227:8006": 5, "116.202.162.222:8006": 2, "116.202.216.248:8006": 1, "116.202.218.133:8005": 1, "122.11.202.58:8005": 1, "135.181.0.102:8006": 1, "135.181.0.125:8006": 1, "138.201.135.79:8006": 1, "148.251.128.18:8006": 1, "150.238.33.99:8005": 1, "159.69.108.139:8005": 2, "159.69.146.87:8005": 1, "159.69.65.213:8006": 2, "161.97.84.165:8005": 1, "161.97.87.176:8005": 1, "164.68.126.249:8006": 1, "168.119.4.112:8005": 1, "168.119.7.198:8006": 3, "173.212.246.159:8006": 1, "173.249.58.103:8005": 1, "182.239.50.26:8006": 4, "185.181.103.135:8006": 1, "193.105.188.247:8005": 3, "207.180.250.118:8005": 1, "213.239.195.116:8006": 1, "3.216.114.121:8014": 1, "3.8.192.183:8005": 1, "3.9.225.154:8005": 2, "34.64.164.198:8006": 21, "34.74.179.116:8006": 22, "34.90.36.12:8006": 20, "35.196.121.36:11005": 4, "35.208.180.241:8006": 6, "35.233.222.140:8006": 20, "49.12.129.100:8006": 1, "49.12.129.101:8006": 2, "52.117.55.150:8005": 1, "52.77.172.46:8006": 15, "62.171.146.60:8006": 2, "62.171.146.61:8006": 2, "62.171.165.37:8006": 2, "64.225.12.70:8006": 1, "88.198.16.9:8005": 5, "88.99.27.62:8005": 3, "89.202.64.26:8006": 4, "89.33.8.229:8009": 1, "89.42.234.61:8006": 1, "94.130.141.43:8006": 1, "94.130.143.154:8006": 1, "95.168.167.90:8006": 4, "95.216.101.221:8006": 1, "95.216.11.85:8006": 1, "95.216.226.248:8006": 3, "95.216.242.101:8006": 1, "95.216.26.123:8006": 1, "95.216.26.36:8006": 1, "95.216.41.37:8006": 2, "95.216.44.229:8006": 1, "95.216.70.112:8005": 17, "95.217.104.37:8006": 2, "95.217.121.26:8006": 1, "95.217.121.59:8006": 3, "95.217.197.227:9005": 1, "95.217.202.125:8006": 1, "95.217.202.32:8006": 1, "95.217.202.57:8006": 1, "95.217.204.16:8005": 1, "95.217.224.244:8006": 1, "95.217.228.216:8006": 2, "95.217.228.217:8006": 2, "95.217.62.175:8006": 2, "95.217.90.94:8006": 1}
[2020-08-14T06:59:45.817558503Z INFO  solana_core::replay_stage] new fork:28425589 parent:28425584 root:28425553
[2020-08-14T06:59:45.817786333Z INFO  solana_metrics::metrics] datapoint: bank-new_from_parent-heights slot_height=28425589i block_height=27227075i
[2020-08-14T06:59:46.069681831Z INFO  solana_metrics::counter] COUNTER:{"name": "vote-native", "counts": 13480001, "samples": 13480000,  "now": 1597388386069, "events": 1}
[2020-08-14T06:59:46.152993515Z INFO  solana_ledger::blockstore] slot 28425589 is full, last: 88
[2020-08-14T06:59:46.181745197Z INFO  solana_metrics::metrics] datapoint: poh-service ticks=128i hashes=1599872i elapsed_ms=420i
[2020-08-14T06:59:46.182326145Z INFO  solana_core::replay_stage] bank frozen: 28425589
[2020-08-14T06:59:46.182332136Z INFO  solana_metrics::metrics] datapoint: replay-slot-stats slot=28425589i fetch_entries_time=1156i fetch_entries_fail_time=0i entry_poh_verification_time=57324i entry_transaction_verification_time=1318i replay_time=1597i replay_total_elapsed=364048i total_entries=343i total_shreds=89i
[2020-08-14T06:59:46.182364026Z INFO  solana_metrics::metrics] datapoint: collect_rent_eagerly accounts=0i
[2020-08-14T06:59:46.184960544Z INFO  solana_runtime::bank] bank frozen: 28425589 hash: J9hn61py3d8BSc6G61jTq3i769hM7guYDfhYo9De5nYL accounts_delta: 58EMGzBT7Q999rJhvyiMk6X1uoW8e7vsZjZoDsBicWMz signature_count: 23 last_blockhash: 14Sm5RzrFqyYzS6p6PxVTReDXjipKJX5fP8xRG72xVrD capitalization: 488623339412639272
[2020-08-14T06:59:46.184979580Z INFO  solana_runtime::bank] accounts hash slot: 28425589 stats: BankHashStats { num_updated_accounts: 30, num_removed_accounts: 0, num_lamports_stored: 8338147039585, total_data_len: 161372, num_executable_accounts: 0 }
[2020-08-14T06:59:46.188277443Z INFO  solana_metrics::metrics] datapoint: tower-observed slot=28425583i root=28425552i
[2020-08-14T06:59:46.190393934Z INFO  solana_metrics::metrics] datapoint: best_slot slot=28425589i best_slot=28425589i
[2020-08-14T06:59:46.190393603Z INFO  solana_core::replay_stage] MCFmmmXdzTKjBEoMggi8JGFJmd856uYSowuH2sCU5kx slot_weight: 28425589 1079075531088112954146748560 154858837947296868884997397168320 28425584
[2020-08-14T06:59:46.190400637Z INFO  solana_metrics::metrics] datapoint: bank_weight slot=28425589i weight="37C9721C10BA771D852F890"
[2020-08-14T06:59:46.190416837Z INFO  solana_core::replay_stage] voting: 28425589 1079075531088112954146748560
[2020-08-14T06:59:46.190421687Z INFO  solana_core::replay_stage] LEADER CHANGE at slot: 28425589 leader: CakcnaRDHka2gXyfbEd2d3xsvkJkqsLw2akB3zsN1D2S. I am no longer the leader

Hi, just letting you know that I am watching this thread. I will do some experimenting and see if I can help in any way.

1 Like

I’d still love to understand the logs… but just to let you know, I’ve found a problem with my NVMe disk, it “stalls” intermittently, so most likely that is the issue… I have a new one coming tomorrow. :crossed_fingers:

I opened this issue regarding InvalidTickCount errors forcing the next leader to skip slots https://github.com/solana-labs/solana/issues/11653. There is a PR in the works to fix it.

Check your logs to see how many InvalidTickCount errors are coming from your node. These errors indicate bad forks & skipped slots – there might be something to learn there.

1 Like

Changing disk didn’t help skips – every single skip I have in the last 24 hours is; making the first block, skipping the next 3. I guess we’ll see if that change helps. I’m sorely tempted to dump the rig and move to GCP, it’s more expensive, but i don’t care at this point.

Are all of those skips related to the InvalidTickCount error? Do you see different errors? That pattern suggests that your node is bailing-out on the current fork to switch to a heavier fork. When you switch, you also dump/skip the remaining slots in that turn. We need to figure out why you are creating the new forks in the first place…

1 Like