
Compressed state dumps #5112

Merged: 5 commits into foundry-rs:master on Jun 6, 2023
Conversation

dbeal-eth (Contributor)

Motivation

Nine months ago I wrote the code for anvil_dumpState and anvil_loadState. At the time, I mentioned the possible benefits of compressing the state dumps.

Recently, my state dumps have been getting so big that they no longer seem to work with the JSON-RPC API (I get errors like "Invalid request" even though there should be no limit on request size).

Whatever the cause, these large state dumps could be made much, much smaller by compressing the blob returned by dumpState and loaded by loadState. This improves the efficiency of data handling for tools outside of anvil, and it eliminates the request-size issue described above.

Solution

Using the flate2 crate, compress the state dump returned by anvil on anvil_dumpState with GzEncoder, using the default compression options.

On anvil_loadState, use GzDecoder. If the input is not gzip-compressed, GzDecoder will fail to parse a header, and anvil will fall back to loading the state directly as uncompressed JSON. This keeps backwards compatibility with dumps produced before this change.
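
A minimal sketch of the round trip, assuming the flate2 API used in this PR (the helper names are illustrative, not the actual Backend methods):

```rust
use std::io::{Read, Write};

use flate2::{read::GzDecoder, write::GzEncoder, Compression};

/// Dump side: gzip-compress the serialized state bytes.
fn compress_dump(json: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(json)?;
    encoder.finish()
}

/// Load side: decompress, falling back to the raw bytes when no gzip
/// header is present (backwards compatibility with old dumps).
fn decompress_dump(raw: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut decoder = GzDecoder::new(raw);
    if decoder.header().is_some() {
        let mut decoded = Vec::new();
        decoder.read_to_end(&mut decoded)?;
        Ok(decoded)
    } else {
        // Not gzip: assume a legacy, uncompressed JSON state dump.
        Ok(raw.to_vec())
    }
}
```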

This change is not forward compatible (i.e., state dumps created with newer versions of anvil after this change will not be loadable by older versions of anvil).

The gzip compression algorithm is so commonplace that it is often used by web servers and browsers for transparent compression, making it a good choice for standardizing the format.

Neither the serde_json encode/decode nor the JSON-RPC API supports streaming, so no streaming is used while compressing/decompressing either. It's not maximally efficient, but that isn't a big deal here.
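
For comparison, a streaming variant would serialize straight into the encoder instead of building the full JSON buffer first. A hypothetical sketch, not what this PR does:

```rust
use flate2::{write::GzEncoder, Compression};
use serde::Serialize;

/// Hypothetical: stream serde_json output directly into the gzip encoder,
/// skipping the intermediate uncompressed Vec. Of limited value here,
/// since the JSON-RPC layer buffers the whole payload anyway.
fn dump_state_streaming<S: Serialize>(state: &S) -> std::io::Result<Vec<u8>> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
    serde_json::to_writer(&mut encoder, state)?; // serde_json::Error converts into io::Error
    encoder.finish()
}
```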

Very large state dumps can be unloadable due to some unknown limitation on the message size (or something), which leads to an `Invalid request` error.

In addition, storing raw JSON dumps to files or …

This change allows for the backwards-compatible addition of the ability to generate compressed state dumps. As even basic compression often yields a ~10x increase in effective stored data capacity, this should allow state dumps to be much larger.
Evalir requested review from mattsse and Evalir, June 5, 2023 23:36
dbeal-eth (Contributor, Author)

previous PR from me introducing state dumps: #2256

Evalir (Member) left a comment:

This is pretty cool and looks sound—but I'll defer to @mattsse for deciding if we actually want this. Just a few comments:

```diff
@@ -630,14 +637,25 @@ impl Backend {
     /// Write all chain data to serialized bytes buffer
     pub async fn dump_state(&self) -> Result<Bytes, BlockchainError> {
         let state = self.serialized_state().await?;
-        let content = serde_json::to_vec(&state).unwrap_or_default().into();
-        Ok(content)
+        let mut e = GzEncoder::new(Vec::new(), Compression::default());
```
Member:

a bit bikesheddy, but we could name this a tad better—maybe encoded_data and just shadow it when doing write_all?

Contributor (Author):

generally addressed in 5deb317

```rust
let mut v = Vec::new();

let state: SerializableState = serde_json::from_slice(if d.header().is_some() {
    // gzip header found: decompress into `v` first
    d.read_to_end(v.as_mut()).map_err(|_| BlockchainError::FailedToDecodeStateDump)?;
    &v
} else {
    &buf // no gzip header: raw JSON (closing branch reconstructed; name `buf` assumed)
})
.map_err(|_| BlockchainError::FailedToDecodeStateDump)?;
```
Member:

hmm does the map_err return an err? if so we could maybe have a variant that includes the error

Contributor (Author):

read_to_end emits an io::Result. If there is a problem, the most descriptive message that could be returned to the user is probably something like Invalid Format, Unexpected Token, or Invalid Length, none of which would be more helpful than just returning a generic decoder error. I could add a distinction between a serde_json decoding failure and a data decompression failure, but again, what difference in action would that actually lead the user to take? They aren't supposed to be modifying state dumps by hand.
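
For illustration, the finer-grained variants under discussion might look roughly like this (a hypothetical sketch; the PR deliberately keeps the single generic variant):

```rust
// Hypothetical error variants distinguishing the two failure modes;
// not adopted, since neither message is more actionable for the user.
#[derive(Debug)]
enum StateDecodeError {
    FailedToDecompress(std::io::Error),     // gzip read_to_end failed
    FailedToDeserialize(serde_json::Error), // serde_json parsing failed
}
```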

Member:

Yeah makes sense actually. It's alright then—this wouldn't surface extra info

dbeal-eth requested a review from Evalir, June 6, 2023 04:55
Evalir (Member) left a comment:

deferring to @mattsse for final approval—will also need nightly lint/clippy

mattsse (Member) left a comment:

makes sense,

lgtm

mattsse merged commit 5ed3842 into foundry-rs:master, Jun 6, 2023
19 checks passed