cephfs: safeguard localClusterState struct from race conditions #4163

Merged
merged 2 commits into ceph:devel from fix-crash
Oct 10, 2023

Conversation

Rakshith-R
Contributor

@Rakshith-R Rakshith-R commented Oct 5, 2023

Describe what this PR does

This commit uses atomic.Int64 and sync.Map for the members of localClusterState, and safeguards clusterAdditionalInfo map operations with a mutex.
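
For reference, a minimal sketch of the layout this description implies, using the identifiers that appear in the review diff further down; the real struct carries more fields and the package name is only an assumption:

```go
// Sketch only: per-cluster state with atomic/sync.Map members, plus a
// package-level map guarded by a plain mutex.
package core

import (
	"sync"
	"sync/atomic"
)

type localClusterState struct {
	// resizeState is loaded and stored atomically instead of as a plain int.
	resizeState atomic.Int64
	// subVolumeGroupsCreated records which subvolume groups have already
	// been created, keyed by filesystem name.
	subVolumeGroupsCreated sync.Map
}

var (
	// clusterAdditionalInfo caches localClusterState per clusterID.
	clusterAdditionalInfo = make(map[string]*localClusterState)
	// clusterAdditionalInfoMutex serializes writes to the map above.
	clusterAdditionalInfoMutex sync.Mutex
)
```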

Is the change backward compatible?

yes

Are there concerns around backward compatibility?

no

Related issues

Fixes: #4162


Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@mergify mergify bot added the component/cephfs Issues related to CephFS label Oct 5, 2023
@Rakshith-R
Contributor Author

/test ci/centos/mini-e2e/k8s-1.27

@Madhu-1
Collaborator

Madhu-1 commented Oct 5, 2023

This fixes #4162. Feel free to attach the issue to the PR.

Comment on lines 42 to 43
var clusterAdditionalInfo = make(map[string]*localClusterState)
var clusterAdditionalInfoMutex sync.Mutex
Collaborator

Early suggestion: use sync.Map instead of map and sync.Mutex.

Contributor Author

It'll make things more complex.

We'll need to get the variable, convert it to the clusterState type every time, and then store it back.
The race condition happens only when we are writing, so the current code should be good.
The outer map uses a mutex and the members themselves are atomic.
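
For illustration, a rough sketch of the extra boilerplate a sync.Map keyed by clusterID would bring; the accessor below is hypothetical and reuses the localClusterState type sketched earlier:

```go
// Hypothetical sketch: every access through a sync.Map needs a Load, a type
// assertion back to *localClusterState, and a Store/LoadOrStore on the miss path.
var clusterAdditionalInfoSyncMap sync.Map // effectively map[string]*localClusterState

func getClusterState(clusterID string) *localClusterState {
	if v, ok := clusterAdditionalInfoSyncMap.Load(clusterID); ok {
		return v.(*localClusterState) // assert on every read
	}
	// LoadOrStore keeps a concurrently stored entry instead of clobbering it.
	actual, _ := clusterAdditionalInfoSyncMap.LoadOrStore(clusterID, &localClusterState{})

	return actual.(*localClusterState)
}
```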

@Rakshith-R
Contributor Author

/retest ci/centos/mini-e2e/k8s-1.27

@Rakshith-R Rakshith-R linked an issue Oct 6, 2023 that may be closed by this pull request
@Rakshith-R Rakshith-R force-pushed the fix-crash branch 2 times, most recently from 61b19fa to 0f0d56a on October 6, 2023 08:48
@Rakshith-R Rakshith-R requested review from nixpanic, Madhu-1 and a team October 6, 2023 08:49
@Rakshith-R Rakshith-R marked this pull request as ready for review October 6, 2023 08:49
Madhu-1
Madhu-1 previously approved these changes Oct 6, 2023
const (
unknown operationState = iota
unknown int64 = iota
Collaborator

Not sure why this was int64 before; as we have only a few values, we should have used uint32.

Comment on lines 306 to 307
if clusterAdditionalInfo[s.clusterID].resizeState.Load() == unknown ||
clusterAdditionalInfo[s.clusterID].resizeState.Load() == supported {
Collaborator

As an enhancement, this check should also move into a helper function, like in metadata.go.
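
For illustration, a hypothetical helper in that spirit; the method name resizeSupported is not from the codebase, while unknown, supported, and the atomic resizeState come from this PR's diff:

```go
// resizeSupported is a hypothetical wrapper for the repeated resizeState check.
func (s *subVolumeClient) resizeSupported() bool {
	state := clusterAdditionalInfo[s.clusterID].resizeState.Load()

	return state == unknown || state == supported
}
```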

clusterAdditionalInfo = make(map[string]*localClusterState)
// clusterAdditionalInfoMutex is used to synchronize access to
// clusterAdditionalInfo map.
clusterAdditionalInfoMutex = sync.Mutex{}
Member

Would a sync.RWMutex not be better?

If you only mean to protect against concurrent writes, please state it clearly.

@@ -232,7 +238,7 @@ func (s *subVolumeClient) CreateVolume(ctx context.Context) error {
}

// create subvolumegroup if not already created for the cluster.
if !clusterAdditionalInfo[s.clusterID].subVolumeGroupsCreated[s.FsName] {
if _, found := clusterAdditionalInfo[s.clusterID].subVolumeGroupsCreated.Load(s.FsName); !found {
Member

This is still racy. If the subVolumeGroup is not found, there is a timespan where multiple processes can try to create it. This is something that should be protected by a Mutex.

If creating the subVolumeGroup multiple times is not problematic, please leave a comment in the code about it.
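
For context, a sketch of the check-then-act window being pointed out here, and the lock-across-create pattern the comment suggests; ensureSubVolumeGroup and the plain map plus mutex fields are hypothetical (at this point in the PR the field is still a sync.Map):

```go
// Without a lock held across the check and the create, two goroutines can
// both miss in the map and both issue the create:
//
//   goroutine A: Load(fsName) -> not found
//   goroutine B: Load(fsName) -> not found
//   goroutine A: create subvolume group; Store(fsName)
//   goroutine B: create subvolume group; Store(fsName)   // second create
//
// Holding a mutex for the whole check-create-record sequence closes the window.
func ensureSubVolumeGroup(lcs *localClusterState, fsName string, create func() error) error {
	lcs.subVolumeGroupsMutex.Lock()
	defer lcs.subVolumeGroupsMutex.Unlock()

	if lcs.subVolumeGroupsCreated[fsName] {
		return nil
	}
	if err := create(); err != nil {
		return err
	}
	lcs.subVolumeGroupsCreated[fsName] = true

	return nil
}
```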

resizeState operationState
subVolMetadataState operationState
subVolSnapshotMetadataState operationState
resizeState atomic.Int64
Member

I do not understand why using an atomic int is safer here. Can you explain that in the commit message?

riya-singhal31
riya-singhal31 previously approved these changes Oct 6, 2023
@Rakshith-R
Contributor Author

On deeper inspection, atomic indeed didn't serve the purpose, and the subVolumeGroupsCreated handling was still somewhat buggy.

Rewrote the code to use individual RWMutex locks with a number of helpers.

Please take a look again.

@Rakshith-R
Contributor Author

/retest ci/centos/mini-e2e/k8s-1.27

subVolSnapshotMetadataState operationState
resizeState operationStateMutex
subVolMetadataState operationStateMutex
subVolSnapshotMetadataState operationStateMutex
Member

This may be overdoing it; is the subVolumeGroupsRWMutex of this struct not sufficient to protect its members?

Contributor Author

This may be overdoing it; is the subVolumeGroupsRWMutex of this struct not sufficient to protect its members?

No, they are all independent parameters.

  • With read locks, it should add hardly any overhead.

Member

It is not about the performance overhead. I am worried about the complexity that looks unneeded.

@@ -240,7 +247,7 @@ func (s *subVolumeClient) CreateVolume(ctx context.Context) error {
}

// create subvolumegroup if not already created for the cluster.
if !clusterAdditionalInfo[s.clusterID].subVolumeGroupsCreated[s.FsName] {
if !s.isSubVolumeGroupCreated(s.SubvolumeGroup) {
Member

This is still racy. Two or more goroutines can call the function, and both can continue to create the SubVolumeGroup.

In order to prevent races, you will need to take a lock on the clusterAdditionalInfo[s.clusterID] object (a single lock for all members to keep it simple would be fine), or

  1. lock the .subVolumeGroupsRWMutex
  2. create the SubVolumeGroup
  3. unlock the .subVolumeGroupsRWMutex

If access to the different members needs similar serialization, use a single Mutex to update any of the members and have other go routines block until the updating is done.

Contributor Author

This is still racy. Two or more goroutines can call the function, and both can continue to create the SubVolumeGroup.

In order to prevent races, you will need to take a lock on the clusterAdditionalInfo[s.clusterID] object (a single lock for all members to keep it simple would be fine), or

  1. lock the .subVolumeGroupsRWMutex
  2. create the SubVolumeGroup
  3. unlock the .subVolumeGroupsRWMutex

If access to the different members needs similar serialization, use a single Mutex to update any of the members and have other go routines block until the updating is done.

Yes, that is intentional.
The mentioned parallel create will happen only once, and there are no side effects to the subvolumegroup create command being called more than once. The follow-on update will still be guarded by the RWLock, so we're fine.
I don't want to hold locks while we wait for a response from Ceph,
and I don't want other members to wait while one is being updated.
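
For illustration, a sketch of the pattern described here: check under a read lock, issue the idempotent create without holding any lock, then record completion under the write lock. isSubVolumeGroupCreated comes from the diff above; ensureSubVolumeGroupOnce, createSubVolumeGroup and updateSubVolumeGroupCreated are hypothetical stand-ins:

```go
// Sketch of the create path with the accepted trade-off: two goroutines may
// both issue the create, which is harmless because creating an existing
// subvolume group is a no-op on the Ceph side.
func (s *subVolumeClient) ensureSubVolumeGroupOnce(ctx context.Context) error {
	// Fast path under a read lock: the group was already created.
	if s.isSubVolumeGroupCreated(s.SubvolumeGroup) {
		return nil
	}
	// No lock is held while waiting for Ceph to respond.
	if err := s.createSubVolumeGroup(ctx); err != nil {
		return err
	}
	// Record completion under the write lock (hypothetical helper).
	s.updateSubVolumeGroupCreated(s.SubvolumeGroup)

	return nil
}
```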

Member

Sorry, but that is not what the commit message describes:

Multiple go-routines may simultaneously check for
presence of a clusterID's metadata such as
subvolumegroup created state, resize state,
metadata state and snapshot metadata state in the
clusterAdditionalInfo map and update an entry after creation
if it is absent. This set of operation needs to be serialized.

There is no serialization in this commit. This commit adds atomically setting/reading a value. Usually an int is atomic already, and sufficient if there are no calculations done with it. I do not think the mutex per member in the operationState contributes to preventing race conditions.

@Rakshith-R
Contributor Author

👍
Modified the second commit to protect only the subVolumeGroupsCreated map from concurrent creation/writes while allowing multiple readers.
PTAL

@riya-singhal31
Contributor

@Mergifyio queue

@mergify
Contributor

mergify bot commented Oct 10, 2023

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at d516a1d

Multiple goroutines may simultaneously check for a clusterID's
presence in clusterAdditionalInfo and create an entry if it is
absent. This set of operations needs to be serialized.

Therefore, this commit safeguards the clusterAdditionalInfo map
from concurrent writes with a mutex to prevent the above problem.

Signed-off-by: Rakshith R <[email protected]>
Multiple goroutines may simultaneously create the
subVolumeGroupsCreated map or write into it
for a particular group.

This commit safeguards the subVolumeGroupsCreated map
from concurrent creation/writes while allowing for multiple
readers.

Signed-off-by: Rakshith R <[email protected]>
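
A minimal sketch of the reader/writer split this commit message describes, assuming subVolumeGroupsCreated ends up as a plain map guarded by the subVolumeGroupsRWMutex mentioned earlier in the review; isSubVolumeGroupCreated appears in the diff above, while updateSubVolumeGroupCreated is an assumed name for the writer side:

```go
// isSubVolumeGroupCreated takes only a read lock, so concurrent readers do
// not block each other.
func (s *subVolumeClient) isSubVolumeGroupCreated(volGroup string) bool {
	lcs := clusterAdditionalInfo[s.clusterID]
	lcs.subVolumeGroupsRWMutex.RLock()
	defer lcs.subVolumeGroupsRWMutex.RUnlock()

	return lcs.subVolumeGroupsCreated[volGroup]
}

// updateSubVolumeGroupCreated takes the write lock, lazily allocates the map,
// and records that the group exists.
func (s *subVolumeClient) updateSubVolumeGroupCreated(volGroup string) {
	lcs := clusterAdditionalInfo[s.clusterID]
	lcs.subVolumeGroupsRWMutex.Lock()
	defer lcs.subVolumeGroupsRWMutex.Unlock()

	if lcs.subVolumeGroupsCreated == nil {
		lcs.subVolumeGroupsCreated = make(map[string]bool)
	}
	lcs.subVolumeGroupsCreated[volGroup] = true
}
```
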
@mergify mergify bot added the ok-to-test Label to trigger E2E tests label Oct 10, 2023
@ceph-csi-bot
Collaborator

/test ci/centos/upgrade-tests-cephfs

@ceph-csi-bot
Collaborator

/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.26

@ceph-csi-bot
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.27

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.26

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.27

@ceph-csi-bot
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.28

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e/k8s-1.26

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e/k8s-1.27

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.28

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e/k8s-1.28

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Oct 10, 2023
@mergify mergify bot merged commit d516a1d into ceph:devel Oct 10, 2023
34 checks passed
@Rakshith-R
Contributor Author

@Mergifyio backport release-3.9

@mergify
Contributor

mergify bot commented Oct 16, 2023

backport release-3.9

❌ No backport has been created

  • Backport to branch release-3.9 failed

GitHub error: Branch not found

@Rakshith-R
Contributor Author

@Mergifyio backport release-v3.9

@mergify
Contributor

mergify bot commented Oct 16, 2023

backport release-v3.9

🟠 Pending

  • Backport to branch release-v3.9 in progress

Labels
component/cephfs Issues related to CephFS

Successfully merging this pull request may close these issues.

Concurrent Map Writes With CephFS Plugin