
Fix bottleneck TransactionsManager #6656

Closed
wants to merge 19 commits into from

Conversation

emhane
Member

@emhane emhane commented Feb 18, 2024

By default, concurrency in the TransactionFetcher is not enabled at a 'per peer' level. Hence, the number of inflight requests is limited by the total number of p2p connections the node can have, 130 connections by default. This number is some orders of magnitude smaller than the capacities of the other collections that implement Stream in the TransactionsManager. As a result, many hashes received in announcements end up buffered in the cache for hashes pending fetch. This is not necessarily because there is no idle peer at the network level, but rather just none in the context of the TransactionsManager, i.e. the FetchEvent stream hasn't been polled yet although many events may be ready. The problem was identified while implementing #6590, and this PR addresses #6336 in part.

This PR solves the problem by advancing the FuturesUnordered of inflight requests and processing responses synchronously on each operation. Processing a response is what marks a peer as idle. The operations are:

  • on new announcement
  • on trying to fetch hashes pending fetch

These operations attempt to send new requests to idle peers.

The alternative solution considered was increasing the concurrency parameter at the 'per peer' level to more than 1 request per peer at a time. Although this would have solved the problem at hand as a side effect, it would have modified the rate at which GetPooledTransactions requests can be sent to a peer's session. Currently it's "send one request, get one response, send the next request". This is maybe something we want to do in the future, but with stronger intention. The presented solution was also preferred because it marks peers as idle in a more deterministic way (easier to reason about).
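As a rough, self-contained illustration of the approach described above, here is a minimal sketch: completed responses are drained synchronously (which is what marks their peers idle) before new requests are attempted on an announcement. All names here (`SimpleFetcher`, `drain_ready`, `on_new_announcement`) are hypothetical stand-ins, not the actual reth types.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical, simplified model of the idea in this PR; not the reth code.
struct SimpleFetcher {
    /// peer id -> is the peer idle (no inflight request)?
    idle: HashMap<u64, bool>,
    /// Responses that have completed but were not yet processed (peer ids).
    ready_responses: VecDeque<u64>,
    /// Peers a request was sent to in the current round.
    sent: Vec<u64>,
}

impl SimpleFetcher {
    /// Drain completed responses synchronously, freeing up their peers.
    fn drain_ready(&mut self) {
        while let Some(peer) = self.ready_responses.pop_front() {
            // processing a response is what marks the peer as idle again
            self.idle.insert(peer, true);
        }
    }

    /// Called on a new announcement: drain first, then request from idle peers.
    fn on_new_announcement(&mut self, peers: &[u64]) {
        self.drain_ready();
        for &peer in peers {
            if self.idle.get(&peer).copied().unwrap_or(true) {
                self.idle.insert(peer, false); // now inflight
                self.sent.push(peer);
            }
        }
    }
}

fn main() {
    let mut f = SimpleFetcher {
        idle: HashMap::from([(1, false), (2, true)]),
        ready_responses: VecDeque::from([1]), // peer 1's response is ready
        sent: Vec::new(),
    };
    // Without the drain, peer 1 would still look busy even though its
    // response has already completed; with it, both peers get a request.
    f.on_new_announcement(&[1, 2]);
    println!("{:?}", f.sent);
}
```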

@emhane emhane added A-networking Related to networking in general C-perf A change motivated by improving speed, memory usage or disk footprint labels Feb 18, 2024
@emhane emhane requested a review from rkrasiuk February 18, 2024 19:17
Collaborator

@mattsse mattsse left a comment


why was everything made pub?

none of these types, fields, or functions should be pub

This PR solves the problem by advancing the FuturesUnordered of inflight requests and processing responses synchronously on each operation. Processing a response is what marks a peer as idle.

could you please elaborate on this, I'm not following

crates/net/network/src/budget.rs (outdated, resolved)
crates/net/network/src/transactions/fetcher.rs (outdated, resolved)
crates/net/network/src/transactions/fetcher.rs (outdated, resolved)
crates/net/network/src/transactions/fetcher.rs (outdated, resolved)
@emhane emhane requested a review from mattsse February 18, 2024 21:49
@emhane
Member Author

emhane commented Feb 18, 2024

why was everything made pub?

none of these types, fields, or functions should be pub

This PR solves the problem by advancing the FuturesUnordered of inflight requests and processing responses synchronously on each operation. Processing a response is what marks a peer as idle.

could you please elaborate on this, I'm not following

In order to link docs, so that CI would pass. I don't think it hurts anyway; validation should probably be exposed at some point for custom implementations on custom networks.

Collaborator

@mattsse mattsse left a comment


I don't understand why the on_ event handlers now all need a context, or rather why we poll active inflight requests when we handle on_new_pooled_transaction_hashes

The fetcher now polls an internal channel that is filled by the fetcher itself, like an internal buffer that doesn't need any polling

imo the fetcher should keep advancing inflight requests when it is polled

@emhane
Member Author

emhane commented Feb 22, 2024

I don't understand why the on_ event handlers now all need a context, or rather why we poll active inflight requests when we handle on_new_pooled_transaction_hashes

I used noop_context first, but then changed it based on #6656 (comment). I think it can be changed back to noop_context since 5cc593e was committed, and I will try to make that comment more verbose to explain why noop_context is ok.

The fetcher now polls an internal channel that is filled by the fetcher itself, like an internal buffer that doesn't need any polling

yes, it's an intermediary store of fetch events. This is needed since we are limited to 130 inflight requests but potentially try to queue many, many more in each loop of the tx manager future; here, for example, one event can contain up to 20k txns.

if let Poll::Ready(Some(event)) = this.transaction_events.poll_next_unpin(cx) {
    this.on_network_tx_event(event);
    some_ready = true;
}

Please re-read the description of this PR for more detail, or refer to the code comments in #6651 for a description of the orders of magnitude of flow in the tx manager future loop.
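To make the orders-of-magnitude point above concrete, here is a back-of-the-envelope sketch using the numbers from this thread (130 connections, one request per peer, up to ~20k txns in one event). It deliberately ignores that a single request can ask for multiple hashes, so it only bounds the imbalance, not the exact figure.

```rust
fn main() {
    // Numbers from this discussion: at most 130 p2p connections, with one
    // inflight GetPooledTransactions request per peer, while a single
    // network event may announce up to ~20_000 transactions.
    let max_inflight: u32 = 130;
    let hashes_in_one_event: u32 = 20_000;

    // Even if every peer were idle, one event can trigger at most
    // `max_inflight` new requests; the remaining hashes end up buffered
    // in the hashes-pending-fetch cache (simplified: one hash per request).
    let buffered = hashes_in_one_event.saturating_sub(max_inflight);
    println!("at least {buffered} hashes buffered for a later fetch attempt");
}
```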

imo the fetcher should keep advancing inflight requests when it is polled

It has done so since 5cc593e, a commit older than this review.

@emhane emhane force-pushed the emhane/one-req-per-peer-bottleneck branch from 963012b to 5cc593e Compare February 22, 2024 18:45
@emhane emhane force-pushed the emhane/one-req-per-peer-bottleneck branch from 573f333 to 358f3ac Compare February 22, 2024 19:10
@emhane emhane requested a review from mattsse February 22, 2024 19:43
Collaborator

@mattsse mattsse left a comment


This PR solves the problem by advancing the FuturesUnordered of inflight requests and processing responses synchronously on each operation. Processing a response is what marks a peer as idle.

It's not 100% clear to me what the problem is you're describing.

As far as I understood this, it introduces response buffering to free up capacity for outgoing requests.

I think in theory this makes sense, though the way it's implemented feels a bit obfuscated.

Wouldn't we get the same if we drained in-progress requests first, before handling tx fetching + new incoming messages?

crates/net/network/src/transactions/mod.rs (resolved)
Collaborator

@mattsse mattsse left a comment


I'd prefer if we lift the drain call from the event handlers to the poll loop.

Comment on lines +69 to +71

      pub(super) fetch_events_head: UnboundedMeteredReceiver<FetchEvent>,
      /// Handle for queueing [`FetchEvent`]s as a result of advancing inflight requests.
    - pub fetch_events_tail: UnboundedMeteredSender<FetchEvent>,
    + pub(super) fetch_events_tail: UnboundedMeteredSender<FetchEvent>,
Collaborator


There's no need for a sync primitive here if we're using this as a VecDeque, right?

Member Author


A VecDeque is slightly heavier since it keeps track of its length and links the queue in two directions. I don't think channels need to be used exclusively for async communication.

Collaborator


Hmm, VecDeque is a ring buffer; popping a value is just reading and shifting the head.
A channel has a bunch of sync overhead that we don't need, because nothing is shared here.

@emhane
Copy link
Member Author

emhane commented Mar 2, 2024

I'd prefer if we lift the drain call from the event handlers to the poll loop.

Then it doesn't address the bottleneck. When the node is running on more than one OS thread, sessions can make progress while the tx manager future is executing. The bottleneck can also be addressed by not being stuck so long in the tx manager future, since then fewer request attempts will be made (and fewer hashes will be buffered in the hashes-pending-fetch cache because peers are busy); see #6336.

@emhane emhane requested a review from mattsse March 2, 2024 14:30
@mattsse
Collaborator

mattsse commented Mar 2, 2024

From what I understood, the bottleneck can be attributed to the order in which poll currently processes things:

because there is no idle peer on a network level, rather just not in the context of TransactionsManager, i.e. the FetchEvent stream hasn't been polled yet but many events may be ready.

and this PR addresses this by draining (buffering) responses that are ready before fetching more txs?
It does so by draining the responses into a buffer in the event handler functions.

But we'd get the same if we called this at the top level; right now this feels a bit obfuscated.

I'd prefer if we lift the drain call from the event handlers to the poll loop.

Though it's still a bit unclear to me why this is the right fix, rather than changing the order to:

  • process responses
  • try fetch
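The ordering proposed above can be sketched roughly as follows; this is a hypothetical, simplified shape (function and variable names are stand-ins, not reth code): within one poll iteration, finished responses are processed first so the peers they free up are available to the fetch step that follows.

```rust
/// Hypothetical sketch: process responses, then try fetch, in one round.
fn poll_round(
    finished_responses: &mut Vec<u64>,   // peer ids whose request completed
    idle_peers: &mut Vec<u64>,
    hashes_pending_fetch: &mut Vec<u64>,
    requests_sent: &mut Vec<(u64, u64)>, // (peer, hash)
) {
    // 1. process responses: marks those peers idle again
    idle_peers.append(finished_responses);

    // 2. try fetch: pair pending hashes with the now-idle peers
    while let Some(&peer) = idle_peers.last() {
        match hashes_pending_fetch.pop() {
            Some(hash) => {
                idle_peers.pop();
                requests_sent.push((peer, hash));
            }
            None => break,
        }
    }
}

fn main() {
    let mut responses = vec![7]; // peer 7's response just completed
    let mut idle = vec![];
    let mut pending = vec![42]; // one hash waiting to be fetched
    let mut sent = vec![];
    // Because responses are processed first, peer 7 is free to serve the
    // pending hash within the same round.
    poll_round(&mut responses, &mut idle, &mut pending, &mut sent);
    println!("{sent:?}");
}
```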

@emhane
Member Author

emhane commented Mar 2, 2024

From what I understood, the bottleneck can be attributed to the order in which poll currently processes things:

because there is no idle peer on a network level, rather just not in the context of TransactionsManager, i.e. the FetchEvent stream hasn't been polled yet but many events may be ready.

and this PR addresses this by draining (buffering) responses that are ready before fetching more txs? It does so by draining the responses into a buffer in the event handler functions.

But we'd get the same if we called this at the top level; right now this feels a bit obfuscated.

I'd prefer if we lift the drain call from the event handlers to the poll loop.

Though it's still a bit unclear to me why this is the right fix, rather than changing the order to:

  • process responses
  • try fetch

Agreed; this is done in #6590. I'd be fine with closing this PR and addressing the issue at a higher level in the linked PR.

@gakonst
Member

gakonst commented Mar 6, 2024

#6590 is approved, should we close this? @mattsse

@emhane emhane closed this Mar 6, 2024
@emhane emhane deleted the emhane/one-req-per-peer-bottleneck branch March 6, 2024 03:04