[1 / 5] Optimize logic for gossiping assignments #4848
+51
−13
This is part of the work to further optimize the approval subsystems. For the full context, start by reading #4849 (comment); that is not required, however, as this change is self-contained and nodes benefit from it regardless of whether the subsequent changes land.
While testing with 1000 validators, I found that the logic for determining which validators an assignment should be gossiped to takes a lot of time, because it always iterates through all the peers (1000) to determine which are X and Y neighbours and which peers we should randomly gossip to (4 samples).
This can be optimised so that we don't have to iterate through all 1000 peers for each new assignment: we fetch the list of X and Y peer ids from the topology first, and then stop the loop as soon as we have taken the 4 random samples.
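The idea can be illustrated with a minimal sketch. This is not the code from this PR: `PeerId`, `select_gossip_targets`, and the assumption that the candidate peers are already supplied in random order are all hypothetical simplifications. It only shows the shape of the change: take the grid neighbours directly, then stop scanning as soon as enough random samples have been collected.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the real peer identifier type.
type PeerId = u32;

/// Returns the peers to gossip to, plus how many candidates were visited.
///
/// `grid_neighbours`: X and Y neighbours, assumed to be fetched from the
/// topology up front (no scan over all peers needed).
/// `candidates`: all connected peers, assumed to be in random order already.
fn select_gossip_targets(
    grid_neighbours: &[PeerId],
    candidates: &[PeerId],
    random_samples: usize,
) -> (Vec<PeerId>, usize) {
    let mut targets: Vec<PeerId> = grid_neighbours.to_vec();
    let mut seen: HashSet<PeerId> = grid_neighbours.iter().copied().collect();
    let mut visited = 0;
    let mut sampled = 0;

    for peer in candidates {
        // Early exit: once the random samples are taken, the remaining
        // peers do not need to be looked at.
        if sampled == random_samples {
            break;
        }
        visited += 1;
        // Skip peers that are already targeted as grid neighbours.
        if seen.insert(*peer) {
            targets.push(*peer);
            sampled += 1;
        }
    }

    (targets, visited)
}

fn main() {
    let grid: Vec<PeerId> = vec![3, 7];
    let candidates: Vec<PeerId> = (0..1000).collect();
    let (targets, visited) = select_gossip_targets(&grid, &candidates, 4);
    println!(
        "{} targets after visiting {} of {} peers",
        targets.len(),
        visited,
        candidates.len()
    );
}
```

In this sketch the per-assignment cost depends on the number of grid neighbours and random samples rather than on the total number of peers, which is the effect described above.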
With these improvements, the total CPU time spent in approval-distribution is reduced by 15% on networks with 500 validators and by 20% on networks with 1000 validators.
Test coverage:
propagates_assignments_along_unshared_dimension and propagates_locally_generated_assignment_to_both_dimensions already cover this logic, and both pass, which confirms that there is no breaking change.
Additionally, the approval voting benchmark measures the traffic sent to other peers, and I confirmed that for various network sizes there is no difference in the size of the traffic sent to other peers.