fix(db): race condition mdbx abort tx #6798
The new_with_max_read_transaction_duration change is legit, I like it and believe it's fully implemented in #6809 now.
Regarding write transactions: we don't actually need to track them in any way, because we don't enforce any timeouts on them, and they can be running for however long they want. See bullet point 2 in #5228 for the reason why long-running read transactions are bad (and this is why we implemented a hard timeout on them).
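To make the read/write distinction concrete, here is a minimal sketch of a timeout tracker that only ever sees read transactions. `ReadTxTracker`, its methods, and the use of a plain `usize` as the transaction key are all hypothetical and are not the crate's actual types; the point is only that write transactions never enter the map, so nothing can time them out.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical tracker: remembers when each read transaction was opened.
/// Transactions are keyed by an opaque id (a pointer address in practice).
struct ReadTxTracker {
    max_duration: Duration,
    started: HashMap<usize, Instant>,
}

impl ReadTxTracker {
    fn new(max_duration: Duration) -> Self {
        Self { max_duration, started: HashMap::new() }
    }

    /// Register a read transaction. Write transactions never go through
    /// here, since no timeout applies to them.
    fn track(&mut self, txn: usize, opened_at: Instant) {
        self.started.insert(txn, opened_at);
    }

    /// Stop tracking a transaction that finished on its own.
    /// Returns true if it was being tracked.
    fn untrack(&mut self, txn: usize) -> bool {
        self.started.remove(&txn).is_some()
    }

    /// Remove and return every transaction that has been open for longer
    /// than the limit as of `now`, sorted by id.
    fn take_timed_out(&mut self, now: Instant) -> Vec<usize> {
        let max = self.max_duration;
        let mut expired: Vec<usize> = self
            .started
            .iter()
            .filter(|(_, opened)| now.duration_since(**opened) > max)
            .map(|(txn, _)| *txn)
            .collect();
        expired.sort_unstable();
        for txn in &expired {
            self.started.remove(txn);
        }
        expired
    }
}
```

Passing `now` in explicitly keeps the sketch deterministic; a real monitor would presumably call `Instant::now()` on a timer.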
Two main things:
- ReadTransactions.aborted should only keep track of the transactions that were aborted due to a timeout, so we can check against it on MDBX_EBADSIGN and report a nice error.
- We still don't know if MDBX re-uses transaction pointers, and it's important because my previous assumption was that it doesn't.
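The first point can be sketched roughly as follows. This is an illustration under assumptions, not the crate's code: `TxError`, `record_timeout_abort`, and `check` are hypothetical names, transactions are keyed by a plain `usize`, and the `MDBX_EBADSIGN` constant is a local stand-in for `ffi::MDBX_EBADSIGN` whose numeric value is assumed. The set is only written when a timeout abort happens, and only consulted when the raw code is `MDBX_EBADSIGN`.

```rust
use std::collections::HashSet;

/// Stand-in for `ffi::MDBX_EBADSIGN`; the numeric value is assumed here.
const MDBX_EBADSIGN: i32 = -30420;

#[derive(Debug, PartialEq)]
enum TxError {
    /// The transaction was aborted by the timeout monitor.
    ReadTransactionAborted,
    /// Any other non-zero code, passed through unchanged.
    Code(i32),
}

#[derive(Default)]
struct ReadTransactions {
    /// Only transactions aborted because of a timeout are recorded here.
    aborted: HashSet<usize>,
}

impl ReadTransactions {
    /// Called by the (hypothetical) timeout monitor after it aborts `txn`.
    fn record_timeout_abort(&mut self, txn: usize) {
        self.aborted.insert(txn);
    }

    /// Translate a raw return code into a result, upgrading MDBX_EBADSIGN
    /// to a friendlier error when the transaction is known to have timed out.
    fn check(&self, txn: usize, err_code: i32) -> Result<(), TxError> {
        match err_code {
            0 => Ok(()),
            MDBX_EBADSIGN if self.aborted.contains(&txn) => {
                Err(TxError::ReadTransactionAborted)
            }
            other => Err(TxError::Code(other)),
        }
    }
}
```

Note that this lookup is keyed by pointer value, so it is only reliable if the second point holds, i.e. if a pointer recorded as aborted is never handed out again for a live transaction.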
txn_manager.remove_aborted_read_transaction(txn).is_some() &&
    err_code == ffi::MDBX_EBADSIGN
can you please explain this change?
it allows a txn ptr to be reused if mdbx deems it fit to reuse, i.e. error code is 0
if err_code == 0, but the txn stays in aborted txns, then aborting it will lead to Error::ReadTransactionAborted
if the error code is not MDBX_EBADSIGN (maybe even 0, which is success), why do we want to remove it from the list of aborted transactions? MDBX_EBADSIGN is what MDBX usually returns when we close the transaction and someone tries to re-use it later
no wait, got that mixed up.
- txn is aborted because of a timeout, which makes mdbx say "hey this ptr can be assigned"
- open "new" txn and mdbx uses that same ptr, so err code is 0, so that check fails already at err_code == ffi::MDBX_EBADSIGN
- txn stays in aborted list
- user tries to abort the "new" txn, gets Error::TransactionAborted, very confused user
makes more sense to change order of that guard I think, then to remove from the aborted list when user tries to abort by drop or by TxnManagerMessage
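Why the order of the guard matters comes down to `&&` short-circuiting: whichever operand is on the left always runs. The sketch below contrasts the two possible orderings on a plain `HashSet<usize>`, plus a separate cleanup hook for the user-initiated abort path. All function names are hypothetical, and the `MDBX_EBADSIGN` constant is a local stand-in for `ffi::MDBX_EBADSIGN` with an assumed numeric value.

```rust
use std::collections::HashSet;

/// Stand-in for `ffi::MDBX_EBADSIGN`; the numeric value is assumed here.
const MDBX_EBADSIGN: i32 = -30420;

/// Removal first: `&&` evaluates the left side unconditionally, so the
/// entry is dropped from the set whatever the error code turns out to be.
fn remove_then_compare(aborted: &mut HashSet<usize>, txn: usize, err_code: i32) -> bool {
    aborted.remove(&txn) && err_code == MDBX_EBADSIGN
}

/// Comparison first: the set is only touched when the code is
/// MDBX_EBADSIGN, so an entry survives a successful (0) call.
fn compare_then_remove(aborted: &mut HashSet<usize>, txn: usize, err_code: i32) -> bool {
    err_code == MDBX_EBADSIGN && aborted.remove(&txn)
}

/// Explicit cleanup for when the user aborts a transaction, by drop or
/// by message. Returns true if an entry was actually removed.
fn on_user_abort(aborted: &mut HashSet<usize>, txn: usize) -> bool {
    aborted.remove(&txn)
}
```

With an error code of 0, `remove_then_compare` returns false but still clears the entry, while `compare_then_remove` returns false and leaves it in place; in the second case something like `on_user_abort` is needed to keep stale entries from accumulating.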
> user tries to abort the "new" txn, gets Error::TransactionAborted, very confused user

why would that happen? If a user tries to abort a new transaction, it will go through mdbx_txn_abort just fine and return a code 0, I think?
moved this to its own scope #6844
closed in favour of #6850 (review).
Closes #6441.
Adds a list for aborted rw transactions, which follows the pattern for read transactions. Checks if a transaction is already aborted before calling ffi::mdbx_txn_abort(tx_ptr).
TxnManager before starting message listener. Previously, the message listener would always run with read transactions set to None. (cherry-picked to main already)
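The "check before abort" idea from the description can be sketched as below. `AbortGuard` and its fields are hypothetical, and `raw_abort` is only a counting stand-in for `ffi::mdbx_txn_abort(tx_ptr)`; the sketch shows one way to make a second abort of the same transaction a no-op, not how the PR actually implements it.

```rust
use std::collections::HashSet;

/// Hypothetical manager that remembers which transactions were already aborted.
#[derive(Default)]
struct AbortGuard {
    aborted: HashSet<usize>,
    /// Counts calls to the stand-in abort function, for illustration only.
    ffi_abort_calls: usize,
}

impl AbortGuard {
    /// Stand-in for `ffi::mdbx_txn_abort(tx_ptr)`; it only counts invocations.
    fn raw_abort(&mut self, _txn: usize) {
        self.ffi_abort_calls += 1;
    }

    /// Abort `txn` unless it is already recorded as aborted.
    /// Returns true when the underlying abort was actually invoked.
    fn abort(&mut self, txn: usize) -> bool {
        // `insert` returns false when the value was already present.
        if !self.aborted.insert(txn) {
            return false;
        }
        self.raw_abort(txn);
        true
    }
}
```

Doing the membership check and the insertion in a single `insert` call avoids a gap between "check" and "mark"; in concurrent code the set would additionally need to sit behind a lock or a concurrent map.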