Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3987 [Java] Concurrency improvement for ReflectData FieldAccessors #2900

Merged

Conversation

ashley-taylor
Copy link
Contributor

What is the purpose of the change

Improve the performance by removing synchronisation calls.

Running the ReflectRecordTest with 8 threads on a M2 Macbook pro

before change

    Benchmark                  Mode  Cnt        Score        Error  Units
    ReflectRecordTest.decode  thrpt    3  4,157,551.230 ± 831123.819  ops/s
    ReflectRecordTest.encode  thrpt    3  4,926,092.511 ± 455798.533  ops/s

after

    Benchmark                  Mode  Cnt         Score         Error  Units
    ReflectRecordTest.decode  thrpt    3  12,221,568.529 ± 1308392.259  ops/s
    ReflectRecordTest.encode  thrpt    3  12,569,201.802 ± 1898124.234  ops/s

Verifying this change

This change is already covered by existing tests, such as (please describe tests).

org.apache.avo.reflect already tests this code

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions bot added the Java Pull Requests for Java binding label May 8, 2024
@ashley-taylor
Copy link
Contributor Author

to recreate the benchmark, you need to change the threads value here to be greater than 1

.warmupIterations(warmupIterations).measurementIterations(measurementIterations).forks(1).threads(1)

bySchema.put(schema, result);
this.bySchema = bySchema;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without synchronization this.bySchema could be modified several times by now. Would that be a problem ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the object is derived It won't cause an issue. There could be duplicated generation across threads. Causing some wasted compute at the start of processing. But since this.bySchema is always replaced with a new map with no future modifications concurrent read are safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really a Java programmer but the this.bySchema = bySchema assignment seems OK, assuming that the volatile causes a memory barrier that ensures the effects of bySchema.put(schema, result) will be seen by other threads that read this.bySchema after the assignment.

However, I wonder about bySchema.get(schema). Is WeakHashMap.get formally documented to be safe for concurrent calls? Each time WeakHashMap.get is called, WeakHashMap.expungeStaleEntries can delete some entries, and although WeakHashMap.expungeStaleEntries synchronizes those deletions against each other, it doesn't seem to synchronize them against any reads that WeakHashMap.get may be doing on other threads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remark about volatile is valid: that's exactly what it does.

Also a few notes about WeakHashMap:

  1. it has weak keys, so it'll only cleanup entries that can no longer be looked up
  2. its internal hash table is a singly-linked list, and the update to remove an entry is an atomic operation (that never yields an invalid data structure)
  3. the update happens without a memory barrier, which means that the worst case scenario is that the update is not yet available to the other thread, which means that the entry to be removed is read & skipped over in the concurrent call to get

In practical terms, this means that because the we update a copy of the WeakHashMap before it becomes available, and it's only modified to expunge stale entries after it becomes available, the result is effectively thread safe.

@ashley-taylor
Copy link
Contributor Author

@martin-g anything you need to get this merged?

@martin-g
Copy link
Member

@martin-g anything you need to get this merged?

I'll leave it to a Java developer because I am not that familiar with the Java SDK.

@opwvhk opwvhk merged commit e932c94 into apache:main May 31, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java Pull Requests for Java binding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants