
Compressed state dumps #5112

Merged: 5 commits into foundry-rs:master on Jun 6, 2023
Conversation

dbeal-eth (Contributor)

Motivation

Nine months ago I wrote the code for anvil_dumpState and anvil_loadState. At the time, I mentioned the possible benefits of compressing the state dumps.

Recently, my state dumps have been getting so big that they no longer seem to work with the JSON-RPC API (I get errors like "Invalid request" even though there should be no limit on request size).

Whatever the cause, these large state dumps could be made much, much smaller by compressing the blob returned by dumpState and loaded by loadState. This improves the efficiency of data handling for tools outside of anvil, and it eliminates the request-size issue described above.

Solution

Using the flate2 crate, compress the state dump returned by anvil on anvil_dumpState with GzEncoder, using the default compression options.

On anvil_loadState, use GzDecoder. If the input is not gzip-compressed, GzDecoder will fail to parse a header, and anvil will fall back to loading the state directly as uncompressed JSON. This keeps backwards compatibility with dumps produced before this change.
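
A minimal sketch of the round trip, assuming the flate2 API used in this PR (the helper names are illustrative, not the actual Backend methods):

```rust
use std::io::{Read, Write};

use flate2::{read::GzDecoder, write::GzEncoder, Compression};

/// Dump side: gzip-compress the serialized state bytes.
fn compress_dump(json: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(json)?;
    encoder.finish()
}

/// Load side: decompress, falling back to the raw bytes when no gzip
/// header is present (backwards compatibility with old dumps).
fn decompress_dump(raw: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut decoder = GzDecoder::new(raw);
    if decoder.header().is_some() {
        let mut decoded = Vec::new();
        decoder.read_to_end(&mut decoded)?;
        Ok(decoded)
    } else {
        // Not gzip: assume a legacy, uncompressed JSON state dump.
        Ok(raw.to_vec())
    }
}
```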

This change is not forward compatible (i.e., state dumps created with newer versions of anvil after this change will not be loadable by older versions of anvil).

The gzip compression algorithm is so commonplace that it is often used by web servers and browsers for transparent compression, making it a good choice for standardizing the format.

Neither the serde_json encode/decode nor the JSON-RPC API supports streaming, so no streaming is used while compressing/decompressing either. It's not maximally efficient, but that isn't a big deal here.
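
For comparison, a streaming variant would serialize straight into the encoder instead of building the full JSON buffer first. A hypothetical sketch, not what this PR does:

```rust
use flate2::{write::GzEncoder, Compression};
use serde::Serialize;

/// Hypothetical: stream serde_json output directly into the gzip encoder,
/// skipping the intermediate uncompressed Vec. Of limited value here,
/// since the JSON-RPC layer buffers the whole payload anyway.
fn dump_state_streaming<S: Serialize>(state: &S) -> std::io::Result<Vec<u8>> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
    serde_json::to_writer(&mut encoder, state)?; // serde_json::Error converts into io::Error
    encoder.finish()
}
```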

Very large state dumps can be unloadable due to some unknown limitation on the message size (or something), which leads to an `Invalid request` error.

In addition, storing raw JSON dumps to files or …

This change allows for the backwards-compatible addition of the ability to generate compressed state dumps. As even basic compression often yields a ~10x increase in effective stored data capacity, this should allow state dumps to be much larger.
Evalir requested review from mattsse and Evalir, June 5, 2023 23:36
dbeal-eth (Contributor, Author)

previous PR from me introducing state dumps: #2256

Evalir (Member) left a comment:

This is pretty cool and looks sound—but I'll defer to @mattsse for deciding if we actually want this. Just a few comments:

```diff
@@ -630,14 +637,25 @@ impl Backend {
     /// Write all chain data to serialized bytes buffer
     pub async fn dump_state(&self) -> Result<Bytes, BlockchainError> {
         let state = self.serialized_state().await?;
-        let content = serde_json::to_vec(&state).unwrap_or_default().into();
-        Ok(content)
+        let mut e = GzEncoder::new(Vec::new(), Compression::default());
```
Member:

a bit bikesheddy, but we could name this a tad better—maybe encoded_data and just shadow it when doing write_all?

Contributor (Author):

generally addressed in 5deb317

```rust
let mut v = Vec::new();

let state: SerializableState = serde_json::from_slice(if d.header().is_some() {
    // gzip header found: decompress into `v` first
    d.read_to_end(v.as_mut()).map_err(|_| BlockchainError::FailedToDecodeStateDump)?;
    &v
} else {
    &buf // no gzip header: raw JSON (closing branch reconstructed; name `buf` assumed)
})
.map_err(|_| BlockchainError::FailedToDecodeStateDump)?;
```
Member:

hmm does the map_err return an err? if so we could maybe have a variant that includes the error

Contributor (Author):

read_to_end emits an io::Result. If there is a problem, the most descriptive message that could be returned to the user is probably something like Invalid Format, Unexpected Token, or Invalid Length, none of which would be more helpful than just returning a generic decoder error. I could add a distinction between a serde_json decoding failure and a data decompression failure, but again, what difference in action would that actually lead the user to take? They aren't supposed to be modifying state dumps by hand.
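
For illustration, the finer-grained variants under discussion might look roughly like this (a hypothetical sketch; the PR deliberately keeps the single generic variant):

```rust
// Hypothetical error variants distinguishing the two failure modes;
// not adopted, since neither message is more actionable for the user.
#[derive(Debug)]
enum StateDecodeError {
    FailedToDecompress(std::io::Error),     // gzip read_to_end failed
    FailedToDeserialize(serde_json::Error), // serde_json parsing failed
}
```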

Member:

Yeah makes sense actually. It's alright then—this wouldn't surface extra info

dbeal-eth requested a review from Evalir, June 6, 2023 04:55
Evalir (Member) left a comment:

deferring to @mattsse for final approval—will also need nightly lint/clippy

mattsse (Member) left a comment:

makes sense,

lgtm

mattsse merged commit 5ed3842 into foundry-rs:master, Jun 6, 2023
19 checks passed