-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVRO-3987 [Java] Concurrency improvement for ReflectData FieldAccessors #2900
AVRO-3987 [Java] Concurrency improvement for ReflectData FieldAccessors #2900
Conversation
to recreate the benchmark, you need to change the
|
bySchema.put(schema, result); | ||
this.bySchema = bySchema; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without synchronization this.bySchema
could be modified several times by now. Would that be a problem ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the object is derived It won't cause an issue. There could be duplicated generation across threads. Causing some wasted compute at the start of processing. But since this.bySchema
is always replaced with a new map with no future modifications concurrent read are safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really a Java programmer but the this.bySchema = bySchema
assignment seems OK, assuming that the volatile
causes a memory barrier that ensures the effects of bySchema.put(schema, result)
will be seen by other threads that read this.bySchema
after the assignment.
However, I wonder about bySchema.get(schema)
. Is WeakHashMap.get formally documented to be safe for concurrent calls? Each time WeakHashMap.get is called, WeakHashMap.expungeStaleEntries can delete some entries, and although WeakHashMap.expungeStaleEntries synchronizes those deletions against each other, it doesn't seem to synchronize them against any reads that WeakHashMap.get may be doing on other threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The remark about volatile
is valid: that's exactly what it does.
Also a few notes about WeakHashMap
:
- it has weak keys, so it'll only cleanup entries that can no longer be looked up
- its internal hash table is a singly-linked list, and the update to remove an entry is an atomic operation (that never yields an invalid data structure)
- the update happens without a memory barrier, which means that the worst case scenario is that the update is not yet available to the other thread, which means that the entry to be removed is read & skipped over in the concurrent call to
get
In practical terms, this means that because the we update a copy of the WeakHashMap
before it becomes available, and it's only modified to expunge stale entries after it becomes available, the result is effectively thread safe.
@martin-g anything you need to get this merged? |
I'll leave it to a Java developer because I am not that familiar with the Java SDK. |
What is the purpose of the change
Improve the performance by removing synchronisation calls.
Running the
ReflectRecordTest
with 8 threads on a M2 Macbook probefore change
after
Verifying this change
This change is already covered by existing tests, such as (please describe tests).
org.apache.avo.reflect
already tests this codeDocumentation