
Add LZ4 compressor with Java9 perf improvements #77153

Merged: 26 commits into elastic:master, Sep 9, 2021

Conversation

Tim-Brooks
Contributor

Java 9 added a number of features that are useful for improving compression
and decompression. These include the Arrays#mismatch method and
VarHandles. This commit adds compression tools forked from the lz4-java
library which incorporate these improvements. We hope to contribute these
changes back to the original project; however, the project currently
supports Java 7, so this is not possible at the moment.
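For context, a minimal sketch (not code from the PR) of how the two Java 9 features mentioned here are typically used in compression code: `Arrays.mismatch` to find match lengths with a single intrinsified call, and a byte-array view `VarHandle` to read multi-byte words without manual shifting.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

public class Java9Features {
    // A VarHandle that views a byte[] as little-endian ints, avoiding
    // manual byte shuffling when reading 4 bytes at a time.
    static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Reads 4 bytes at the given offset as one little-endian int.
    static int readIntLE(byte[] b, int off) {
        return (int) INT_LE.get(b, off);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 9, 9};
        byte[] b = {1, 2, 3, 4, 5, 9};

        // Arrays.mismatch returns the first differing index (or -1 if equal),
        // which maps directly onto computing LZ4 match lengths.
        System.out.println(Arrays.mismatch(a, b));          // 4
        System.out.println(Integer.toHexString(readIntLE(a, 0))); // 4030201
    }
}
```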

@Tim-Brooks Tim-Brooks added >non-issue :Distributed/Network Http and internode communication implementations v8.0.0 labels Sep 1, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Sep 1, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@jpountz
Contributor

jpountz commented Sep 2, 2021

Have you been able to get a sense of how much faster compression/decompression get with this change?
Code-wise I'd like to copy the entire test suite as well, in order to increase confidence that we are not introducing a subtle bug with these performance improvements.

@Tim-Brooks
Contributor Author

@jpountz - I added the tests from the lz4-java library. It does require committing binary files. I hope that is fine.

I benchmarked this using JMH on an m5d.4xlarge instance. The compress benchmark compresses ~1MB of highly compressible observability data 64KB at a time, and the uncompress benchmark decompresses those 64KB blocks.

Benchmark                                 Mode  Cnt     Score    Error  Units
MyBenchmark.testCompressLZ4Java          thrpt   15   536.109 ±  1.879  ops/s
MyBenchmark.testCompressLZ4JavaForked    thrpt   15   907.001 ±  1.124  ops/s
MyBenchmark.testCompressLZ4JavaUnsafe    thrpt   15  1096.882 ± 39.805  ops/s
MyBenchmark.testDecompressLZ4Java        thrpt   15  1513.808 ±  5.423  ops/s
MyBenchmark.testDecompressLZ4JavaForked  thrpt   15  3073.232 ± 47.876  ops/s
MyBenchmark.testDecompressLZ4JavaUnsafe  thrpt   15  2508.444 ± 83.889  ops/s

I think the forked version is faster at decompressing the data than the unsafe version because there is a place in the Unsafe version where data is being copied twice (unnecessarily).

LZ4UnsafeUtils.java

static void safeIncrementalCopy(byte[] dest, int matchOff, int dOff, int matchLen) {
    for (int i = 0; i < matchLen; ++i) {
        // Each byte is written twice: once directly and once through UnsafeUtils.
        dest[dOff + i] = dest[matchOff + i];
        UnsafeUtils.writeByte(dest, dOff + i, UnsafeUtils.readByte(dest, matchOff + i));
    }
}
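For comparison, a sketch of what the copy looks like with the redundant write removed (illustrative only, not the upstream fix). The loop must stay byte-by-byte because the source and destination regions may overlap, which is how LZ4 replicates repeated runs.

```java
public class SafeIncrementalCopy {
    // Single write per byte; re-reading freshly written bytes is intentional
    // when dOff - matchOff is smaller than matchLen.
    static void safeIncrementalCopy(byte[] dest, int matchOff, int dOff, int matchLen) {
        for (int i = 0; i < matchLen; ++i) {
            dest[dOff + i] = dest[matchOff + i];
        }
    }

    public static void main(String[] args) {
        byte[] buf = {5, 6, 0, 0, 0, 0};
        // Overlapping copy extends the two-byte pattern across the region.
        safeIncrementalCopy(buf, 0, 2, 4);
        System.out.println(java.util.Arrays.toString(buf)); // [5, 6, 5, 6, 5, 6]
    }
}
```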

@Tim-Brooks
Contributor Author

Also on ARM (m6gd.4xlarge):

Benchmark                                 Mode  Cnt     Score    Error  Units
MyBenchmark.testCompressLZ4Java          thrpt   15   505.949 ±  0.588  ops/s
MyBenchmark.testCompressLZ4JavaForked    thrpt   15   809.159 ± 14.915  ops/s
MyBenchmark.testCompressLZ4JavaUnsafe    thrpt   15   973.499 ± 13.844  ops/s
MyBenchmark.testDecompressLZ4Java        thrpt   15  1464.056 ± 36.187  ops/s
MyBenchmark.testDecompressLZ4JavaForked  thrpt   15  2461.555 ± 19.669  ops/s
MyBenchmark.testDecompressLZ4JavaUnsafe  thrpt   15  2839.432 ± 59.338  ops/s

Contributor

@jpountz jpountz left a comment

Thanks for running these benchmarks @tbrooks8, the results are super impressive. Let's share this on the lz4-java repository?

It does require committing binary files. I hope that is fine.

Hmm good question. On the one hand I don't like checking in files that don't compress well, but on the other hand I would like to make sure we have solid testing, as any bug here could have terrible consequences like silent data corruption. I wonder what other options we have, e.g. could the build download them?

Contributor

@henningandersen henningandersen left a comment

This looks good to me. I left some smaller comments to consider.

matchOff += 4;
int dec = 0;

assert dOff >= matchOff && dOff - matchOff < 8;
Contributor

I do notice that this is a copy from original, but I wonder how dOff - matchOff can ever be >= 4 since we handle that case above (and then add 4 to both here)? I feel like I am missing something obvious, help me 🙂

Contributor Author

@Tim-Brooks Tim-Brooks Sep 3, 2021

No idea. I had accidentally copied the class file as opposed to the original source. I modified this PR to copy the original source, which logically makes more sense. If it is reordered in the class file, well 🤷‍♂️.

Contributor

@henningandersen henningandersen left a comment

A couple of nits; otherwise this looks good to me.

Contributor

@jpountz jpountz left a comment

I want to be very careful with these changes given our history with LZO, where a very subtle bug caused index files to be silently corrupted upon recovery. I like the suggestion that @henningandersen made to compare that your fork produces the same bytes but we don't seem to be testing this on the test that runs on real-world data (or did I miss it?). This small addition would help me feel more confident that the changes are correct since real-world data tends to have patterns that are hard to reproduce in synthetic data.

Other than that the change looks good to me. I hope we'll be able to merge these changes upstream in the near future to be able to move back to something that is more widely deployed.

@Tim-Brooks
Contributor Author

Tim-Brooks commented Sep 7, 2021

but we don't seem to be testing this on the test that runs on real-world data (or did I miss it?). This small addition would help me feel more confident that the changes are correct since real-world data tends to have patterns that are hard to reproduce in synthetic data.

I added this assertion to compare against the lz4-java safe instance.

tester.copyOf(data), off, len,
compressed, 0, maxCompressedLength);

// Modified to compress using an unforked lz4-java compressor and verify that the results are same.
Contributor Author

Added here

Contributor

@arteam arteam left a comment

I left some small comments. I believe all of them were for issues that exist in the original lz4-java implementation, so I'm not sure whether we want to fix them or keep the number of changes in our fork minimal.


public static final LZ4Compressor INSTANCE = new ESLZ4Compressor();

ESLZ4Compressor() {
Contributor

I think this is redundant since the default constructor will be created by javac

return dOff - destOff;
}

public int compress(byte[] src, int srcOff, int srcLen, byte[] dest, int destOff, int maxDestLen) {
Contributor

I believe @Override is missing here since we override the compress method from LZ4Compressor

public class ESLZ4Decompressor extends LZ4FastDecompressor {
public static final LZ4FastDecompressor INSTANCE = new ESLZ4Decompressor();

ESLZ4Decompressor() {
Contributor

Ditto here: I believe the constructor is redundant.

ESLZ4Decompressor() {
}

public int decompress(byte[] src, int srcOff, byte[] dest, int destOff, int destLen) {
Contributor

I think @Override is missing here; decompress is overridden from LZ4FastDecompressor

int literalLen = token >>> 4;
if (literalLen == 15) {
byte len;
for(boolean var11 = true; (len = SafeUtils.readByte(src, sOff++)) == -1; literalLen += 255) {
Contributor

var11 seems to be a decompiler artifact? It doesn't seem to be used anywhere.
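For reference, the same length-extension logic without the decompiler artifact might read like this (a sketch with a hypothetical helper name, not the PR's code): LZ4 encodes lengths of 15 or more as the token nibble plus a run of 0xFF bytes, each adding 255, terminated by the first non-0xFF byte.

```java
public class Lz4LengthDecode {
    // Decodes an LZ4 extended length starting from the token nibble value:
    // each 0xFF byte adds 255, and the first non-0xFF byte terminates the run.
    static int readExtendedLength(byte[] src, int sOff, int base) {
        byte len;
        while ((len = src[sOff++]) == (byte) 0xFF) {
            base += 0xFF;
        }
        return base + (len & 0xFF);
    }

    public static void main(String[] args) {
        // 15 (token nibble) + 255 + 255 + 16 = 541
        byte[] ext = {(byte) 0xFF, (byte) 0xFF, 16};
        System.out.println(readExtendedLength(ext, 0, 15)); // 541
    }
}
```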

int matchLen = token & 15;
if (matchLen == 15) {
byte len;
for(boolean var15 = true; (len = SafeUtils.readByte(src, sOff++)) == -1; matchLen += 255) {
Contributor

Ditto for var15, seems to be a redundant variable

}
}

public int decompress(ByteBuffer src, int srcOff, ByteBuffer dest, int destOff, int destLen) {
Contributor

I think @Override is missed here

}

static void safeIncrementalCopy(byte[] dest, int matchOff, int dOff, int matchLen) {
for (int i = 0; i < matchLen; ++i) {
Contributor

Can this loop be replaced with if (matchLen >= 0) System.arraycopy(dest, matchOff, dest, dOff, matchLen)?
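One caveat worth noting when weighing this suggestion: `System.arraycopy` copies overlapping regions as if through a temporary buffer, while the byte-by-byte loop deliberately re-reads bytes it has just written, so the two differ whenever `dOff - matchOff < matchLen`. A small sketch of the difference:

```java
import java.util.Arrays;

public class OverlapCopyDemo {
    // Byte-by-byte incremental copy: re-reads freshly written bytes, which
    // is what LZ4 relies on to replicate short repeated runs.
    static void incrementalCopy(byte[] dest, int matchOff, int dOff, int len) {
        for (int i = 0; i < len; ++i) {
            dest[dOff + i] = dest[matchOff + i];
        }
    }

    public static void main(String[] args) {
        byte[] loopBuf = {1, 2, 0, 0, 0, 0};
        incrementalCopy(loopBuf, 0, 2, 4);
        System.out.println(Arrays.toString(loopBuf)); // [1, 2, 1, 2, 1, 2]

        byte[] copyBuf = {1, 2, 0, 0, 0, 0};
        // System.arraycopy handles overlap as if via a temporary buffer
        // (memmove semantics), so the pattern is not replicated.
        System.arraycopy(copyBuf, 0, copyBuf, 2, 4);
        System.out.println(Arrays.toString(copyBuf)); // [1, 2, 1, 2, 0, 0]
    }
}
```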

// Modified wildIncrementalCopy to mirror version in LZ4UnsafeUtils
static void wildIncrementalCopy(byte[] dest, int matchOff, int dOff, int matchCopyEnd) {
if (dOff - matchOff < 4) {
for (int i = 0; i < 4; ++i) {
Contributor

I think we can do System.arraycopy(dest, matchOff, dest, dOff, 4) here

}

static int encodeSequence(byte[] src, int anchor, int matchOff, int matchRef, int matchLen, byte[] dest, int dOff, int destEnd) {
final int runLen = matchOff - anchor;
Contributor

runLen seems to be redundant

@mark-vieira
Contributor

Have we considered simply doing a proper fork of https://github.com/lz4/lz4-java and referencing that rather than vendoring the code into the Elasticsearch codebase? Since the intention is for this to be temporary, we could just fork, publish the JAR as a GitHub package, and pull it in as a normal dependency. We'd also have a better audit of how it differs from upstream and could easily verify behavior by just running the existing test suites.

Since https://github.com/tbrooks8/lz4-java/tree/do_not_copy_twice already exists we can just use that.

@Tim-Brooks
Contributor Author

Recognizing that I know nothing about this.

Since the intention is for this to be temporary we could just fork, publish the JAR as a GitHub package and pull it in as a normal dependency.

I assume this is easy? Like I would not need to do the whole publish to Maven thing for my fork? Do we feel comfortable using my account's fork, or should we just be forking it into an elastic/lz4-java repo?

You did not address this: we would need CI on the forked version. Is that something we can set up?

@Tim-Brooks
Contributor Author

I believe all of them were for the issues that exist in original lz4-java implementation, so I'm not sure whether we want to fix them or keep the amount of changes in our fork minimal

These are all valid suggestions for cleaning up the code. However, we are attempting to only make behavioral changes that are necessary for our performance goals. We would like to otherwise keep it identical with the original code.

@mark-vieira
Contributor

mark-vieira commented Sep 7, 2021

I assume this is easy? Like I would not need to do the whole publish to Maven thing for my fork? Do we feel comfortable using my account's fork, or should we just be forking it into an elastic/lz4-java repo?

Yes, should be easy. I can help facilitate the publication bit. Using an elastic fork would be better but not strictly necessary.

You did not address this: we would need CI on the forked version. Is that something we can set up?

We don't need CI if we don't intend on doing active development on this fork. If we're just implementing a patch we can manually test and build the artifacts for publication.

@henningandersen
Contributor

I would prefer to keep this as is, perhaps zipping the test resources if that helps a little (I think it is not strictly necessary). We are not copying the entire lz4 code, only the relevant pieces that we need to subclass and override.

We want to be able to move forward with this for possible inclusion into 7.x. I think having a separate fork will make it difficult to support, since the fix and release process will be known only to a small group. It will be sort of temporary, but we do not know nor can we control the timeline for getting this included into lz4-java. The changes upstream will be somewhat different too (i.e., not a straightforward contribution). So we may have to support versions running with the code Tim produced here (or at least a 7.x variant of it).

Even if we were to fork, we should keep the changes in new classes like Tim did in this PR to ensure that if someone ends up using lz4-java in our code base, they get the original version, not the modified version (which may not fit every purpose; for instance, it currently assumes only a few threads use the compressor). So it is less a fork with modifications and more a set of new classes with improved behavior for our usage pattern.

First step as we see it is to get this into our main branch to get feedback on the full suite of benchmarks.

@@ -0,0 +1,256 @@
/*
Contributor

Yeah, we need to fix the license header here. We should remove the Elastic header and instead add @notice to the first line of the Apache 2 header to indicate this is vendored code. Just do a search on that string to see other examples. Same applies to all files in this project.

@mark-vieira
Contributor

mark-vieira commented Sep 9, 2021

So it is less of a fork with modification and much more new classes with improved behavior for our usage pattern.

That makes sense. If the changes aren't a general solution it's probably going to take more work to get things merged upstream. That said, we should still aim to do this vs vendoring code into our codebase.

@Tim-Brooks Tim-Brooks merged commit 11c6dfc into elastic:master Sep 9, 2021
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Sep 10, 2021
@mark-vieira
Contributor

Do we have an open issue to add this new lib jar to release manager? As it is we've broken testing for external plugin authors.

cc @rjernst

@mark-vieira
Contributor

Do we have an open issue to add this new lib jar to release manager? As it is we've broken testing for external plugin authors.

For posterity I've added this new artifact to release manager.

Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Sep 29, 2021