Collation fetching fairness #4880

tdimitrov · 2024-06-26T07:13:31Z

Related to #1797

When fetching collations on the validator side of the collator protocol, we need to ensure that each parachain gets a fair share of core time according to its assignments in the claim queue. This means that the number of collations fetched per parachain should ideally be equal to (and never larger than) the number of claims for that parachain in the claim queue.

The current implementation doesn't guarantee such fairness. For each relay parent there is a waiting_queue (PerRelayParent -> Collations -> waiting_queue) which holds any unfetched collations advertised to the validator. The collations are fetched on a first-in, first-out basis, which means that if two parachains share a core and one of them is more aggressive, it might starve the other. How? At each relay parent up to max_candidate_depth candidates are accepted (enforced in fn is_seconded_limit_reached), so if one of the parachains is quick enough to fill the queue with its advertisements, the validator will never fetch anything from the rest of the parachains even though they are scheduled. This doesn't mean that the aggressive parachain will occupy all the core time (the runtime prevents that), but it will prevent the other parachains sharing the same core from having collations backed.

~~The solution I am proposing extends the checks in is_seconded_limit_reached with an additional check.~~ The solution I am proposing is to limit fetches and advertisements based on the state of the claim queue. At each relay parent the claim queue for the core assigned to the validator is fetched. For each parachain a fetch limit is calculated (equal to the number of its entries in the claim queue). Advertisements are not fetched for a parachain which has exceeded its claims in the claim queue. This solves the problem of aggressive parachains advertising too many collations.
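To make the per-parachain limit concrete, here is a minimal sketch of the idea. It is not the code from this PR: `ParaId` is a plain `u32` stand-in, and `fetch_limits` and `limit_reached` are hypothetical helper names. It also assumes that fetching stops as soon as the number of fetches reaches the number of claims.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real parachain id type.
type ParaId = u32;

/// Count how many claims each parachain holds in the claim queue of one core.
/// The resulting count is used as that parachain's fetch limit.
fn fetch_limits(claim_queue: &[ParaId]) -> HashMap<ParaId, usize> {
    let mut limits = HashMap::new();
    for para in claim_queue {
        *limits.entry(*para).or_insert(0) += 1;
    }
    limits
}

/// Returns true if no more collations should be fetched for `para`.
/// A parachain without any claim has a limit of 0, so it is always rejected.
fn limit_reached(
    limits: &HashMap<ParaId, usize>,
    fetched: &HashMap<ParaId, usize>,
    para: ParaId,
) -> bool {
    let limit = limits.get(&para).copied().unwrap_or(0);
    let done = fetched.get(&para).copied().unwrap_or(0);
    done >= limit
}
```

With a claim queue of `[1, 2, 1]`, parachain 1 gets a limit of two fetches and parachain 2 a limit of one, no matter how many advertisements either of them sends.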

The second part is in the collation fetching logic. Instead of popping the first entry from the waiting_queue, the validator calculates a score for each entry there. The score is (collation fetches performed for parachain A at relay parent X) / (number of entries in the claim queue for parachain A at relay parent X). The score will be lower for parachains which have fewer fetches than expected and 0 for parachains which have no fetches at all. This should provide an ordering based on the urgency of each fetch. If two parachains end up with the same score, the one earlier in the claim queue is preferred.
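The scoring rule can be sketched roughly as follows. This is an illustration only, not the PR's implementation: `ParaId` is a `u32` stand-in, `pick_para_to_fetch` is a hypothetical name, and pending collations are represented as plain strings. The sketch compares the fractions by cross-multiplication to avoid floating point, and it does not apply the per-parachain fetch limit.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the real parachain id type.
type ParaId = u32;

/// Pick the parachain whose pending collation should be fetched next.
/// Score = fetches performed / claims in the claim queue; lowest score wins,
/// ties go to the parachain that appears earlier in the claim queue.
fn pick_para_to_fetch(
    claim_queue: &[ParaId],
    fetched: &HashMap<ParaId, usize>,
    waiting: &HashMap<ParaId, VecDeque<String>>,
) -> Option<ParaId> {
    // Number of claims per parachain (the denominator of the score).
    let mut claims: HashMap<ParaId, usize> = HashMap::new();
    for para in claim_queue {
        *claims.entry(*para).or_insert(0) += 1;
    }

    // Best candidate so far as (para, fetches, claims).
    let mut best: Option<(ParaId, usize, usize)> = None;
    // Walking the claim queue in order makes the tie-break implicit.
    for para in claim_queue {
        // Skip parachains that have nothing waiting to be fetched.
        if waiting.get(para).map_or(true, |q| q.is_empty()) {
            continue;
        }
        let f = fetched.get(para).copied().unwrap_or(0);
        let c = claims[para];
        // f / c < bf / bc  <=>  f * bc < bf * c (all values are non-negative).
        let better = match best {
            None => true,
            Some((_, bf, bc)) => f * bc < bf * c,
        };
        if better {
            best = Some((*para, f, c));
        }
    }
    best.map(|(para, _, _)| para)
}
```

For a claim queue `[1, 2, 1]` with one fetch already done for parachain 1 and none for parachain 2, the scores are 1/2 and 0, so parachain 2 is served next.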

TODOs:

Fix unit tests
Proper fallback - handle missing claim queue api
Write unit tests for the new logic
Add PR doc

tdimitrov · 2024-06-26T07:41:44Z

polkadot/node/network/collator-protocol/src/validator_side/collation.rs

+		if let Some((_, mut lowest_score)) = lowest_score {
+			for claim in claims {
+				if let Some((_, collations)) = lowest_score.iter_mut().find(|(id, _)| *id == claim)
+				{
+					match collations.pop_front() {
+						Some(collation) => return Some(collation),
+						None => {
+							unreachable!("Collation can't be empty!")
+						},
+					}
+				}
+			}
+			unreachable!("All entries in waiting_queue should also be in claim queue")
+		} else {
+			None
+		}


Looking again at this I am a bit uneasy about the unreachables here. I'll try to refactor this to be more reliable.
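One possible direction for such a refactor, sketched under assumptions: the function name `pop_first_by_claim_order`, the `u32` `ParaId` and the slice-of-pairs layout are all hypothetical and only mirror the shape of the hunk above. The point is that a missing or empty entry is skipped and the function returns `None` instead of panicking.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the real parachain id type.
type ParaId = u32;

/// Walk the claims in order and pop the first pending collation that belongs
/// to a claimed parachain. Missing or empty entries are skipped, not treated
/// as unreachable, so an inconsistent state yields `None` and not a panic.
fn pop_first_by_claim_order<C>(
    claims: &[ParaId],
    candidates: &mut [(ParaId, VecDeque<C>)],
) -> Option<C> {
    claims.iter().find_map(|claim| {
        candidates
            .iter_mut()
            .find(|(id, _)| id == claim)
            .and_then(|(_, collations)| collations.pop_front())
    })
}
```

Whether silently skipping is acceptable, or whether the inconsistency should at least be logged, is a design choice this sketch leaves open.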

tdimitrov · 2024-06-28T09:31:33Z

polkadot/node/network/collator-protocol/src/validator_side/mod.rs

@@ -266,9 +264,6 @@ impl PeerData {
 							let candidates =
 								state.advertisements.entry(on_relay_parent).or_default();

-							if candidates.len() > max_candidate_depth {


This error leads to reporting the peer with COST_UNEXPECTED_MESSAGE. I think we should relax it to just ignoring the advertisement.

Pros:

with the new logic submitting more elements than scheduled is not such a major offence

old collators won't get punished for not respecting the claim queue

Cons:

we don't punish spammy collators

tdimitrov · 2024-07-01T12:10:40Z

polkadot/node/network/collator-protocol/src/validator_side/collation.rs

+	///
+	/// If prospective parachains mode is not enabled then we fall back to synchronous backing. In
+	/// this case there is a limit of 1 collation per relay parent.
+	pub(super) fn is_collations_limit_reached(


I'm open to better name suggestions.
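For readers without the full diff, the fallback described in the doc comment could look roughly like this. It is a sketch under assumptions: `Mode` is a hypothetical two-variant enum standing in for the prospective parachains mode, and the parameter list is invented for illustration. Only the function name comes from the hunk above.

```rust
/// Hypothetical stand-in for the prospective parachains mode.
enum Mode {
    /// Prospective parachains disabled: synchronous backing.
    Disabled,
    /// Prospective parachains enabled: limits follow the claim queue.
    Enabled,
}

/// Returns true if no further collation should be accepted.
/// - Disabled: at most 1 collation per relay parent, regardless of parachain.
/// - Enabled: at most as many collations per parachain as it has claims.
fn is_collations_limit_reached(
    mode: &Mode,
    total_at_relay_parent: usize,
    fetched_for_para: usize,
    claims_for_para: usize,
) -> bool {
    match mode {
        Mode::Disabled => total_at_relay_parent >= 1,
        Mode::Enabled => fetched_for_para >= claims_for_para,
    }
}
```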

paritytech-cicd-pr · 2024-07-02T10:52:57Z

The CI pipeline was cancelled due to the failure of one of the required jobs.
Job name: cargo-clippy
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6601866

Collation fetching fairness

f4738dc

tdimitrov added the T8-polkadot This PR/Issue is related to/affects the Polkadot network. label Jun 26, 2024

Comments

c7074da

tdimitrov added 4 commits June 26, 2024 16:39

Fix tests and add some logs

73eee87

Fix per para limit calculation in is_collations_limit_reached

fa321ce

Fix default TestState initialization: claim queue len should be equal to `allowed_ancestry_len`

96392a5

clippy

0f28aa8

tdimitrov force-pushed the tsv-collator-proto-fairness branch from c7f24aa to 0f28aa8 Compare June 28, 2024 08:19

Update is_collations_limit_reached - remove seconded limit

e5ea548

tdimitrov added 2 commits July 1, 2024 13:59

Fix pending fetches and more tests

9abc898

Remove unnecessary clone

c07890b

tdimitrov added 4 commits July 1, 2024 15:20

Comments

e50440e

Better var names

42b05c7

Fix pick_a_collation_to_fetch and add more tests

2f5a466

Fix test: collation_fetching_respects_claim_queue

ff96ef9

Add collation_fetching_fallback_works test + comments

e837689
