
Add LZ4 compressor with Java9 perf improvements #77153

Merged: 26 commits into elastic:master, Sep 9, 2021

Conversation

Tim-Brooks
Contributor

Java 9 added a number of features that are useful for improving compression
and decompression. These include the Arrays#mismatch method and
VarHandles. This commit adds compression tools forked from the lz4-java
library which incorporate these improvements. We hope to contribute these
changes back to the original project; however, the project currently
supports Java 7, so this is not possible at the moment.
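For context, a minimal sketch (not code from the PR) of how the two Java 9 features mentioned here are typically used in compression code: `Arrays.mismatch` to find match lengths with a single intrinsified call, and a byte-array view `VarHandle` to read multi-byte words without manual shifting.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

public class Java9Features {
    // A VarHandle that views a byte[] as little-endian ints, avoiding
    // manual byte shuffling when reading 4 bytes at a time.
    static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Reads 4 bytes at the given offset as one little-endian int.
    static int readIntLE(byte[] b, int off) {
        return (int) INT_LE.get(b, off);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 9, 9};
        byte[] b = {1, 2, 3, 4, 5, 9};

        // Arrays.mismatch returns the first differing index (or -1 if equal),
        // which maps directly onto computing LZ4 match lengths.
        System.out.println(Arrays.mismatch(a, b));          // 4
        System.out.println(Integer.toHexString(readIntLE(a, 0))); // 4030201
    }
}
```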

@Tim-Brooks Tim-Brooks added >non-issue :Distributed/Network Http and internode communication implementations v8.0.0 labels Sep 1, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Sep 1, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@jpountz
Contributor

jpountz commented Sep 2, 2021

Have you been able to get a sense of how much faster compression/decompression get with this change?
Code-wise I'd like to copy the entire test suite as well, in order to increase confidence that we are not introducing a subtle bug with these performance improvements.

@Tim-Brooks
Contributor Author

@jpountz - I added the tests from the lz4-java library. It does require committing binary files. I hope that is fine.

I benchmarked this using JMH on an m5d.4xlarge instance. The compress benchmark compresses ~1MB of highly compressible observability data 64KB at a time, and the uncompress benchmark decompresses those 64KB blocks.

Benchmark                                 Mode  Cnt     Score    Error  Units
MyBenchmark.testCompressLZ4Java          thrpt   15   536.109 ±  1.879  ops/s
MyBenchmark.testCompressLZ4JavaForked    thrpt   15   907.001 ±  1.124  ops/s
MyBenchmark.testCompressLZ4JavaUnsafe    thrpt   15  1096.882 ± 39.805  ops/s
MyBenchmark.testDecompressLZ4Java        thrpt   15  1513.808 ±  5.423  ops/s
MyBenchmark.testDecompressLZ4JavaForked  thrpt   15  3073.232 ± 47.876  ops/s
MyBenchmark.testDecompressLZ4JavaUnsafe  thrpt   15  2508.444 ± 83.889  ops/s

I think the forked version is faster at decompressing the data than the unsafe version because there is a place in the Unsafe version where data is being copied twice (unnecessarily).

LZ4UnsafeUtils.java

static void safeIncrementalCopy(byte[] dest, int matchOff, int dOff, int matchLen) {
    for (int i = 0; i < matchLen; ++i) {
        // Each byte is written twice: once directly and once through UnsafeUtils.
        dest[dOff + i] = dest[matchOff + i];
        UnsafeUtils.writeByte(dest, dOff + i, UnsafeUtils.readByte(dest, matchOff + i));
    }
}
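For comparison, a sketch of what the copy looks like with the redundant write removed (illustrative only, not the upstream fix). The loop must stay byte-by-byte because the source and destination regions may overlap, which is how LZ4 replicates repeated runs.

```java
public class SafeIncrementalCopy {
    // Single write per byte; re-reading freshly written bytes is intentional
    // when dOff - matchOff is smaller than matchLen.
    static void safeIncrementalCopy(byte[] dest, int matchOff, int dOff, int matchLen) {
        for (int i = 0; i < matchLen; ++i) {
            dest[dOff + i] = dest[matchOff + i];
        }
    }

    public static void main(String[] args) {
        byte[] buf = {5, 6, 0, 0, 0, 0};
        // Overlapping copy extends the two-byte pattern across the region.
        safeIncrementalCopy(buf, 0, 2, 4);
        System.out.println(java.util.Arrays.toString(buf)); // [5, 6, 5, 6, 5, 6]
    }
}
```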

@Tim-Brooks
Contributor Author

Also on ARM (m6gd.4xlarge):

Benchmark                                 Mode  Cnt     Score    Error  Units
MyBenchmark.testCompressLZ4Java          thrpt   15   505.949 ±  0.588  ops/s
MyBenchmark.testCompressLZ4JavaForked    thrpt   15   809.159 ± 14.915  ops/s
MyBenchmark.testCompressLZ4JavaUnsafe    thrpt   15   973.499 ± 13.844  ops/s
MyBenchmark.testDecompressLZ4Java        thrpt   15  1464.056 ± 36.187  ops/s
MyBenchmark.testDecompressLZ4JavaForked  thrpt   15  2461.555 ± 19.669  ops/s
MyBenchmark.testDecompressLZ4JavaUnsafe  thrpt   15  2839.432 ± 59.338  ops/s

Contributor

@jpountz jpountz left a comment

Thanks for running these benchmarks @tbrooks8, the results are super impressive. Let's share this on the lz4-java repository?

It does require committing binary files. I hope that is fine.

Hmm good question. On the one hand I don't like checking in files that don't compress well, but on the other hand I would like to make sure we have solid testing, as any bug here could have terrible consequences like silent data corruption. I wonder what other options we have, e.g. could the build download them?

Contributor

@henningandersen henningandersen left a comment

This looks good to me. I left some smaller comments to consider.

matchOff += 4;
int dec = 0;

assert dOff >= matchOff && dOff - matchOff < 8;
Contributor

I do notice that this is a copy from original, but I wonder how dOff - matchOff can ever be >= 4 since we handle that case above (and then add 4 to both here)? I feel like I am missing something obvious, help me 🙂

Contributor Author

@Tim-Brooks Tim-Brooks Sep 3, 2021

No idea. I had accidentally copied the class file as opposed to the original source. I modified this PR to copy the original source, which logically makes more sense. If it is reordered in the class file, well 🤷‍♂️.

Contributor

@henningandersen henningandersen left a comment

A couple of nits; otherwise this looks good to me.

Contributor

@jpountz jpountz left a comment

I want to be very careful with these changes given our history with LZO, where a very subtle bug caused index files to be silently corrupted upon recovery. I like the suggestion that @henningandersen made to compare that your fork produces the same bytes but we don't seem to be testing this on the test that runs on real-world data (or did I miss it?). This small addition would help me feel more confident that the changes are correct since real-world data tends to have patterns that are hard to reproduce in synthetic data.

Other than that the change looks good to me. I hope we'll be able to merge these changes upstream in the near future to be able to move back to something that is more widely deployed.

@Tim-Brooks
Contributor Author

Tim-Brooks commented Sep 7, 2021

but we don't seem to be testing this on the test that runs on real-world data (or did I miss it?). This small addition would help me feel more confident that the changes are correct since real-world data tends to have patterns that are hard to reproduce in synthetic data.

I added this assertion to compare against the lz4-java safe instance.

tester.copyOf(data), off, len,
compressed, 0, maxCompressedLength);

// Modified to compress using an unforked lz4-java compressor and verify that the results are same.
Contributor Author

Added here

Contributor

@arteam arteam left a comment

I left some small comments. I believe all of them were for issues that exist in the original lz4-java implementation, so I'm not sure whether we want to fix them or keep the number of changes in our fork minimal.


public static final LZ4Compressor INSTANCE = new ESLZ4Compressor();

ESLZ4Compressor() {
Contributor

I think this is redundant since the default constructor will be created by javac

return dOff - destOff;
}

public int compress(byte[] src, int srcOff, int srcLen, byte[] dest, int destOff, int maxDestLen) {
Contributor

I believe @Override is missing here since we override the compress method from LZ4Compressor

public class ESLZ4Decompressor extends LZ4FastDecompressor {
public static final LZ4FastDecompressor INSTANCE = new ESLZ4Decompressor();

ESLZ4Decompressor() {
Contributor

Ditto here: I believe the constructor is redundant.

ESLZ4Decompressor() {
}

public int decompress(byte[] src, int srcOff, byte[] dest, int destOff, int destLen) {
Contributor

I think @Override is missing here; decompress is overridden from LZ4FastDecompressor

int literalLen = token >>> 4;
if (literalLen == 15) {
byte len;
for(boolean var11 = true; (len = SafeUtils.readByte(src, sOff++)) == -1; literalLen += 255) {
Contributor

var11 seems to be a decompiler artifact? It doesn't seem to be used anywhere.
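For reference, the same length-extension logic without the decompiler artifact might read like this (a sketch with a hypothetical helper name, not the PR's code): LZ4 encodes lengths of 15 or more as the token nibble plus a run of 0xFF bytes, each adding 255, terminated by the first non-0xFF byte.

```java
public class Lz4LengthDecode {
    // Decodes an LZ4 extended length starting from the token nibble value:
    // each 0xFF byte adds 255, and the first non-0xFF byte terminates the run.
    static int readExtendedLength(byte[] src, int sOff, int base) {
        byte len;
        while ((len = src[sOff++]) == (byte) 0xFF) {
            base += 0xFF;
        }
        return base + (len & 0xFF);
    }

    public static void main(String[] args) {
        // 15 (token nibble) + 255 + 255 + 16 = 541
        byte[] ext = {(byte) 0xFF, (byte) 0xFF, 16};
        System.out.println(readExtendedLength(ext, 0, 15)); // 541
    }
}
```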

int matchLen = token & 15;
if (matchLen == 15) {
byte len;
for(boolean var15 = true; (len = SafeUtils.readByte(src, sOff++)) == -1; matchLen += 255) {
Contributor

Ditto for var15, seems to be a redundant variable

}
}

public int decompress(ByteBuffer src, int srcOff, ByteBuffer dest, int destOff, int destLen) {
Contributor

I think @Override is missed here

}

static void safeIncrementalCopy(byte[] dest, int matchOff, int dOff, int matchLen) {
for (int i = 0; i < matchLen; ++i) {
Contributor

Can this loop be replaced with if (matchLen >= 0) System.arraycopy(dest, matchOff, dest, dOff, matchLen)?
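One caveat worth noting when weighing this suggestion: `System.arraycopy` copies overlapping regions as if through a temporary buffer, while the byte-by-byte loop deliberately re-reads bytes it has just written, so the two differ whenever `dOff - matchOff < matchLen`. A small sketch of the difference:

```java
import java.util.Arrays;

public class OverlapCopyDemo {
    // Byte-by-byte incremental copy: re-reads freshly written bytes, which
    // is what LZ4 relies on to replicate short repeated runs.
    static void incrementalCopy(byte[] dest, int matchOff, int dOff, int len) {
        for (int i = 0; i < len; ++i) {
            dest[dOff + i] = dest[matchOff + i];
        }
    }

    public static void main(String[] args) {
        byte[] loopBuf = {1, 2, 0, 0, 0, 0};
        incrementalCopy(loopBuf, 0, 2, 4);
        System.out.println(Arrays.toString(loopBuf)); // [1, 2, 1, 2, 1, 2]

        byte[] copyBuf = {1, 2, 0, 0, 0, 0};
        // System.arraycopy handles overlap as if via a temporary buffer
        // (memmove semantics), so the pattern is not replicated.
        System.arraycopy(copyBuf, 0, copyBuf, 2, 4);
        System.out.println(Arrays.toString(copyBuf)); // [1, 2, 1, 2, 0, 0]
    }
}
```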

// Modified wildIncrementalCopy to mirror version in LZ4UnsafeUtils
static void wildIncrementalCopy(byte[] dest, int matchOff, int dOff, int matchCopyEnd) {
if (dOff - matchOff < 4) {
for (int i = 0; i < 4; ++i) {
Contributor

I think we can do System.arraycopy(dest, matchOff, dest, dOff, 4) here

}

static int encodeSequence(byte[] src, int anchor, int matchOff, int matchRef, int matchLen, byte[] dest, int dOff, int destEnd) {
final int runLen = matchOff - anchor;
Contributor

runLen seems to be redundant

@mark-vieira
Contributor

Have we considered simply doing a proper fork of https://github.com/lz4/lz4-java and referencing that rather than vendoring the code into the Elasticsearch codebase? Since the intention is for this to be temporary, we could just fork, publish the JAR as a GitHub package, and pull it in as a normal dependency. We'd also have a better audit of how it differs from upstream and could easily verify behavior by just running the existing test suites.

Since https://github.com/tbrooks8/lz4-java/tree/do_not_copy_twice already exists we can just use that.

@Tim-Brooks
Contributor Author

Recognizing that I know nothing about this.

Since the intention is for this to be temporary we could just fork, publish the JAR as a GitHub package and pull it in as a normal dependency.

I assume this is easy? Like I would not need to do the whole publish to Maven thing for my fork? Do we feel comfortable using my account's fork, or should we just be forking it into an elastic/lz4-java repo?

You did not address this: we would need CI on the forked version. Is that something we can set up?

@Tim-Brooks
Contributor Author

I believe all of them were for the issues that exist in original lz4-java implementation, so I'm not sure whether we want to fix them or keep the amount of changes in our fork minimal

These are all valid suggestions for cleaning up the code. However, we are attempting to only make behavioral changes that are necessary for our performance goals. We would like to otherwise keep it identical with the original code.

@mark-vieira
Contributor

mark-vieira commented Sep 7, 2021

I assume this is easy? Like I would not need to do the whole publish to Maven thing for my fork? Do we feel comfortable using my account's fork, or should we just be forking it into an elastic/lz4-java repo?

Yes, should be easy. I can help facilitate the publication bit. Using an elastic fork would be better but not strictly necessary.

You did not address this: we would need CI on the forked version. Is that something we can set up?

We don't need CI if we don't intend on doing active development on this fork. If we're just implementing a patch we can manually test and build the artifacts for publication.

@henningandersen
Contributor

I would prefer to keep this as is, perhaps zipping the test resources if that helps a little (I think it is not strictly necessary). We are not copying the entire lz4 code, only the relevant pieces that we need to subclass and override.

We want to be able to move forward with this for possible inclusion into 7.x. I think having a separate fork will make it difficult to support, since the fix and release process will be known only to a small group. It will be sort of temporary, but we do not know nor can we control the timeline for getting this included into lz4-java. The changes upstream will be somewhat different too (i.e., not a straightforward contribution). So we may have to support versions running with the code Tim produced here (or at least a 7.x variant of it).

Even if we were to fork, we should keep the changes in new classes like Tim did in this PR to ensure that if someone ends up using lz4-java in our code base, they get the original version, not the modified version (which may not fit every purpose; for instance, it currently assumes only a few threads use the compressor). So it is less a fork with modifications and more a set of new classes with improved behavior for our usage pattern.

First step as we see it is to get this into our main branch to get feedback on the full suite of benchmarks.

@@ -0,0 +1,256 @@
/*
Contributor

Yeah, we need to fix the license header here. We should remove the Elastic header and instead add @notice to the first line of the Apache 2 header to indicate this is vendored code. Just do a search on that string to see other examples. Same applies to all files in this project.

@mark-vieira
Contributor

mark-vieira commented Sep 9, 2021

So it is less of a fork with modification and much more new classes with improved behavior for our usage pattern.

That makes sense. If the changes aren't a general solution it's probably going to take more work to get things merged upstream. That said, we should still aim to do this vs vendoring code into our codebase.

@Tim-Brooks Tim-Brooks merged commit 11c6dfc into elastic:master Sep 9, 2021
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Sep 10, 2021
@mark-vieira
Contributor

Do we have an open issue to add this new lib jar to release manager? As it is we've broken testing for external plugin authors.

cc @rjernst

@mark-vieira
Contributor

Do we have an open issue to add this new lib jar to release manager? As it is we've broken testing for external plugin authors.

For posterity I've added this new artifact to release manager.

Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Sep 29, 2021