
perf(stages): Adds benchmark to TransactionLookupStage #1130

Merged (27 commits) on Feb 9, 2023
Conversation

@joshieDo (Collaborator) commented Feb 2, 2023:

Tries to tackle #1092

Added one benchmark, but to be honest the difference I'm seeing seems too small (2-4%). I was expecting a bigger difference for 20k blocks, and I'm not sure whether it can be explained by instability on my machine.

One way to test the stage before (tx.put) and after (pre-sort + cursor.append):

$ git checkout dec44093c679d650be5fce04e23f22855b99b357
$ cargo bench --package reth-stages --bench criterion --features test-utils
$ git checkout joshie/txlookup
$ cargo bench --package reth-stages --bench criterion --features test-utils

To test insertion into tables with hash keys:

$ cargo bench --package reth-db --bench hash_keys

Note: the sorting step itself is included in the measured time for insert_sorted and put_sorted.

@joshieDo joshieDo added C-enhancement New feature or request A-staged-sync Related to staged sync (pipelines and stages) labels Feb 2, 2023
@joshieDo joshieDo added C-perf A change motivated by improving speed, memory usage or disk footprint and removed C-enhancement New feature or request labels Feb 2, 2023
let last = cursor_txhash.last()?;
let mut append = last.is_none();
if let (false, Some((last_txhash, _))) = (tx_list.is_empty(), last) {
    append = last_txhash < tx_list[0].0;
Collaborator:

We are comparing it with the first item in the list; should tx_list be sorted at this point?
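For context, a simplified, self-contained stand-in for the decision this hunk implements (hypothetical helper; reth's real code works over mdbx cursors): append is only taken when the table is empty, or when the whole pre-sorted batch sorts after the table's current last key.

/// Hypothetical stand-in for the append decision above.
fn can_append(last: Option<([u8; 32], u64)>, tx_list: &[([u8; 32], u64)]) -> bool {
    match (last, tx_list.first()) {
        // Empty table: everything can be appended.
        (None, _) => true,
        // Non-empty table: append only if the (pre-sorted) batch
        // starts strictly after the table's last key.
        (Some((last_txhash, _)), Some((first_txhash, _))) => last_txhash < *first_txhash,
        // Empty batch: nothing to write, so the flag doesn't matter.
        (Some(_), None) => false,
    }
}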

    self.query(|tx| {
        let last = tx.cursor_read::<T>()?.last()?;
        Ok(last.is_none())
    })
}

/// Return full table as Vec
- pub(crate) fn table<T: Table>(&self) -> Result<Vec<(T::Key, T::Value)>, DbError>
+ pub fn table<T: Table>(&self) -> Result<Vec<(T::Key, T::Value)>, DbError>
Member:
let's add a clippy allow for this

Comment on lines 39 to 42
let tx = prepare_blocks(NUM_BLOCKS).unwrap();

measure_txlookup_stage(&mut group, tx);
}
Member:
I would add manual implementations of various access patterns in a loop, in addition to testing the stage itself. It might be easier to compare specific parts if, e.g., you have a bench that does X puts vs. sorted writes with a cursor, etc.
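A minimal criterion sketch of such a manual access-pattern loop, with a std BTreeMap standing in for the mdbx table (the actual comparison ended up in reth-db's hash_keys bench; every name below is illustrative, and the sort is kept inside the measured section, as in this PR's benches):

use std::collections::BTreeMap;

use criterion::{criterion_group, criterion_main, Criterion};

fn bench_put_vs_sorted(c: &mut Criterion) {
    // Scrambled u64 keys approximate random hash keys.
    let pairs: Vec<(u64, u64)> =
        (0..10_000u64).map(|i| (i.wrapping_mul(0x9E37_79B9_7F4A_7C15), i)).collect();

    let mut group = c.benchmark_group("access_patterns");

    // X puts in arrival (random) order.
    group.bench_function("put_unsorted", |b| {
        b.iter(|| {
            let mut table = BTreeMap::new();
            for (k, v) in &pairs {
                table.insert(*k, *v);
            }
            table
        })
    });

    // Pre-sort the batch, then write in ascending key order.
    group.bench_function("put_sorted", |b| {
        b.iter(|| {
            let mut sorted = pairs.clone();
            sorted.sort_unstable_by_key(|(k, _)| *k);
            let mut table = BTreeMap::new();
            for (k, v) in sorted {
                table.insert(k, v);
            }
            table
        })
    });

    group.finish();
}

criterion_group!(benches, bench_put_vs_sorted);
criterion_main!(benches);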

Collaborator:
For append we know it is fastest if the table is empty; the question is what is better if we already have some values with random keys in the table and want to insert an additional batch.

@joshieDo (Collaborator, author) commented Feb 4, 2023:
Yeah, I should add those scenarios. However, I'll probably put them elsewhere, since they're more DB-specific; this one seems like it should only be for stage benchmarking (e.g. reading from different tables and aggregating into one; mem vs. disk read -> write, etc.).

@joshieDo (Collaborator, author) commented Feb 4, 2023:
Added it in 100cea. There is a clear difference between pre-sorted insert and unsorted insert.

cargo bench --package reth-db --bench hash_keys
Results: [benchmark results screenshot]

However, it's not that visible in the stage itself. I think that's because we're now allocating all the data into a list before sorting and inserting it, whereas in the current state we just pass each value straight into put. I'll need to run some more benches, I guess.
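To make the contrast concrete, a hedged sketch of the two write patterns being compared, with a BTreeMap stand-in again (reth's actual stage writes through mdbx cursors, so this only mirrors the shape of the code):

use std::collections::BTreeMap;

type TxHash = [u8; 32];
type TxNumber = u64;
type MockTable = BTreeMap<TxHash, TxNumber>;

// Current state: stream each pair straight into a put as it is produced.
fn write_streaming(table: &mut MockTable, pairs: impl Iterator<Item = (TxHash, TxNumber)>) {
    for (hash, id) in pairs {
        table.insert(hash, id);
    }
}

// This PR: buffer the whole batch, sort by hash, then write in key order
// (which is what lets the real stage take the cursor append fast path).
fn write_sorted(table: &mut MockTable, mut batch: Vec<(TxHash, TxNumber)>) {
    batch.sort_by(|a, b| a.0.cmp(&b.0));
    for (hash, id) in batch {
        table.insert(hash, id);
    }
}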

@gakonst (Member) commented Feb 5, 2023:

Pulled the parts about stage benchmarking out into #1171. Let's keep investigating the DB-specific patterns here.

@gakonst (Member) commented Feb 6, 2023:

TransactionLookup bench on main:

Stages/TransactionLookup
                        time:   [1.2050 s 1.2212 s 1.2390 s]
                        change: [+141.37% +145.11% +149.12%] (p = 0.00 < 0.05)
                        Performance has regressed.

And this PR:

Stages/TransactionLookup
                        time:   [492.02 ms 499.36 ms 507.76 ms]
                        change: [-59.921% -59.109% -58.261%] (p = 0.00 < 0.05)
                        Performance has improved.

Pretty significant improvement! Nice. What else do we need to review this?

"However, it's not that visible in the stage itself. I think that's because we're now allocating all the data into a list before sorting and inserting it, whereas in the current state we just pass each value straight into put. I'll need to run some more benches, I guess."

^Is this still relevant? @joshieDo

@joshieDo (Collaborator, author) commented Feb 6, 2023:

Just leaving here some results and thoughts for inserting into a table whose keys are hashes (b6cf12c).

Notes:

  • preload: preloads the database with X rows.
  • Each size category uses the same dataset for insert_*, put_* and append_*.
  • append_all should be used only to compare table sizes.

Benchmarks

  • 10_000 rows: [benchmark results screenshot]

  • 100_000 rows: [benchmark results screenshot]

  • 1_000_000 rows: [benchmark results screenshot]

Thoughts

  • append improves speed (append_size) and table size (append_all). So if a node has been up for some weeks/months, it could be beneficial to copy the data out to disk, delete the table, and append it back from disk, I guess (see the sketch after these thoughts).
  • insert_sorted is faster than put_sorted, but the table size is the same.
  • insert_sorted is faster than insert_unsorted, but for some reason the table size is bigger.
  • The bigger the number of rows, the more noticeable the difference. (Before, I was only testing small sizes...)

The table size being bigger for insert_sorted is the more curious aspect.
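A rough sketch of the rebuild-by-append idea from the first bullet (hypothetical helper; a BTreeMap stands in for the mdbx table, so this shows the shape of the procedure rather than the actual on-disk space win):

use std::collections::BTreeMap;

/// Hypothetical sketch: dump a table's rows in key order, drop the table,
/// and write them back in order; with a real mdbx cursor the re-insertion
/// would be `append`, which produces densely packed pages.
fn rebuild_by_append<K: Ord + Clone, V: Clone>(table: &mut BTreeMap<K, V>) {
    // 1. Dump rows in ascending key order (a BTreeMap iterates sorted).
    let rows: Vec<(K, V)> = table.iter().map(|(k, v)| (k.clone(), v.clone())).collect();
    // 2. Drop the old table contents.
    table.clear();
    // 3. Re-insert in key order (append fast path in the real DB).
    for (k, v) in rows {
        table.insert(k, v);
    }
}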

@onbjerg (Member) left a comment:

smol nits

- let mut cursor_tx = tx.cursor_write::<tables::Transactions>()?;
+ let mut tx_cursor = tx.cursor_write::<tables::Transactions>()?;
Member:

we use <type>_cursor everywhere else

// Sort before inserting the reverse lookup for hash -> tx_id.
tx_list.sort_by(|txa, txb| txa.0.cmp(&txb.0));

let mut cursor_txhash = tx.cursor_write::<tables::TxHashNumber>()?;
Member:
Suggested change:
- let mut cursor_txhash = tx.cursor_write::<tables::TxHashNumber>()?;
+ let mut txhash_cursor = tx.cursor_write::<tables::TxHashNumber>()?;

crates/stages/src/stages/tx_lookup.rs (outdated review thread, resolved)
Comment on lines 89 to 91
let mut append = last.is_none();
if let (false, Some((last_txhash, _))) = (tx_list.is_empty(), last) {
    append = last_txhash < tx_list[0].0;
Member:
Suggested change:
- let mut append = last.is_none();
- if let (false, Some((last_txhash, _))) = (tx_list.is_empty(), last) {
-     append = last_txhash < tx_list[0].0;
+ let append = tx_list.first().map(|(first_txhash, _)| last_txhash < *first_txhash).unwrap_or_default();

@joshieDo joshieDo marked this pull request as ready for review February 8, 2023 03:32
@joshieDo (Collaborator, author) commented Feb 8, 2023:

Opening for review. The insert_sorted vs. insert_unsorted table-size issue can be investigated separately.

@codecov-commenter:
Codecov Report

Merging #1130 (c6a77d4) into main (ba70b3b) will increase coverage by 0.26%.
The diff coverage is 79.31%.


@@            Coverage Diff             @@
##             main    #1130      +/-   ##
==========================================
+ Coverage   75.29%   75.56%   +0.26%     
==========================================
  Files         335      339       +4     
  Lines       36432    37707    +1275     
==========================================
+ Hits        27433    28492    +1059     
- Misses       8999     9215     +216     
Flag         Coverage Δ
unit-tests   75.56% <79.31%> (+0.26%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                      Coverage Δ
bin/reth/src/db/mod.rs                              0.00%  <0.00%>   (ø)
crates/primitives/src/header.rs                     94.15% <66.66%>  (-0.35%) ⬇️
crates/stages/src/stages/tx_lookup.rs               92.50% <92.85%>  (+0.02%) ⬆️
crates/primitives/src/block.rs                      92.59% <100.00%> (+0.28%) ⬆️
crates/primitives/src/hardfork.rs                   25.00% <0.00%>   (-11.96%) ⬇️
crates/stages/src/stages/total_difficulty.rs        84.48% <0.00%>   (-10.79%) ⬇️
crates/net/dns/src/config.rs                        90.90% <0.00%>   (-9.10%) ⬇️
crates/storage/provider/src/test_utils/noop.rs      8.88%  <0.00%>   (-5.93%) ⬇️
crates/primitives/src/genesis.rs                    81.25% <0.00%>   (-4.00%) ⬇️
crates/rpc/rpc-builder/src/lib.rs                   28.29% <0.00%>   (-3.35%) ⬇️
... and 76 more


Labels: A-staged-sync (Related to staged sync (pipelines and stages)), C-perf (A change motivated by improving speed, memory usage or disk footprint)