cephfs: safeguard localClusterState struct from race conditions #4163

Merged
merged 2 commits into ceph:devel from fix-crash
Oct 10, 2023

Conversation

Rakshith-R
Contributor

@Rakshith-R Rakshith-R commented Oct 5, 2023

Describe what this PR does

This commit uses atomic.Int64 and sync.Map for the members of localClusterState, and safeguards clusterAdditionalInfo map operations with a mutex.
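
For reference, a minimal sketch of the layout this description implies, using the identifiers that appear in the review diff further down; the real struct carries more fields and the package name is only an assumption:

```go
// Sketch only: per-cluster state with atomic/sync.Map members, plus a
// package-level map guarded by a plain mutex.
package core

import (
	"sync"
	"sync/atomic"
)

type localClusterState struct {
	// resizeState is loaded and stored atomically instead of as a plain int.
	resizeState atomic.Int64
	// subVolumeGroupsCreated records which subvolume groups have already
	// been created, keyed by filesystem name.
	subVolumeGroupsCreated sync.Map
}

var (
	// clusterAdditionalInfo caches localClusterState per clusterID.
	clusterAdditionalInfo = make(map[string]*localClusterState)
	// clusterAdditionalInfoMutex serializes writes to the map above.
	clusterAdditionalInfoMutex sync.Mutex
)
```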

Is the change backward compatible?

yes

Are there concerns around backward compatibility?

no

Related issues

Fixes: #4162


Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@mergify mergify bot added the component/cephfs Issues related to CephFS label Oct 5, 2023
@Rakshith-R
Contributor Author

/test ci/centos/mini-e2e/k8s-1.27

@Madhu-1
Collaborator

Madhu-1 commented Oct 5, 2023

This fixes #4162. Feel free to attach the issue to the PR.

Comment on lines 42 to 43
var clusterAdditionalInfo = make(map[string]*localClusterState)
var clusterAdditionalInfoMutex sync.Mutex
Collaborator

Early suggestion: use sync.Map instead of map and sync.Mutex.

Contributor Author

It'll make things more complex.

We'll need to get the variable, convert it to the clusterState type every time, and then store it back.
The race condition happens only when we are writing, so the current code should be good.
The outer map uses a mutex and the members themselves are atomic.
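
For illustration, a rough sketch of the extra boilerplate a sync.Map keyed by clusterID would bring; the accessor below is hypothetical and reuses the localClusterState type sketched earlier:

```go
// Hypothetical sketch: every access through a sync.Map needs a Load, a type
// assertion back to *localClusterState, and a Store/LoadOrStore on the miss path.
var clusterAdditionalInfoSyncMap sync.Map // effectively map[string]*localClusterState

func getClusterState(clusterID string) *localClusterState {
	if v, ok := clusterAdditionalInfoSyncMap.Load(clusterID); ok {
		return v.(*localClusterState) // assert on every read
	}
	// LoadOrStore keeps a concurrently stored entry instead of clobbering it.
	actual, _ := clusterAdditionalInfoSyncMap.LoadOrStore(clusterID, &localClusterState{})

	return actual.(*localClusterState)
}
```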

@Rakshith-R
Contributor Author

/retest ci/centos/mini-e2e/k8s-1.27

@Rakshith-R Rakshith-R linked an issue Oct 6, 2023 that may be closed by this pull request
@Rakshith-R Rakshith-R force-pushed the fix-crash branch 2 times, most recently from 61b19fa to 0f0d56a on October 6, 2023 08:48
@Rakshith-R Rakshith-R requested review from nixpanic, Madhu-1 and a team October 6, 2023 08:49
@Rakshith-R Rakshith-R marked this pull request as ready for review October 6, 2023 08:49
Madhu-1
Madhu-1 previously approved these changes Oct 6, 2023
const (
unknown operationState = iota
unknown int64 = iota
Collaborator

Not sure why this was int64 before; as we have only a few values, we should have used uint32.

Comment on lines 306 to 307
if clusterAdditionalInfo[s.clusterID].resizeState.Load() == unknown ||
clusterAdditionalInfo[s.clusterID].resizeState.Load() == supported {
Collaborator

As an enhancement, this check should also move into a helper function, like in metadata.go.
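
For illustration, a hypothetical helper in that spirit; the method name resizeSupported is not from the codebase, while unknown, supported, and the atomic resizeState come from this PR's diff:

```go
// resizeSupported is a hypothetical wrapper for the repeated resizeState check.
func (s *subVolumeClient) resizeSupported() bool {
	state := clusterAdditionalInfo[s.clusterID].resizeState.Load()

	return state == unknown || state == supported
}
```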

clusterAdditionalInfo = make(map[string]*localClusterState)
// clusterAdditionalInfoMutex is used to synchronize access to
// clusterAdditionalInfo map.
clusterAdditionalInfoMutex = sync.Mutex{}
Member

Would a sync.RWMutex not be better?

If you only mean to protect against concurrent writes, please state it clearly.

@@ -232,7 +238,7 @@ func (s *subVolumeClient) CreateVolume(ctx context.Context) error {
}

// create subvolumegroup if not already created for the cluster.
if !clusterAdditionalInfo[s.clusterID].subVolumeGroupsCreated[s.FsName] {
if _, found := clusterAdditionalInfo[s.clusterID].subVolumeGroupsCreated.Load(s.FsName); !found {
Member

This is still racy. If the subVolumeGroup is not found, there is a timespan where multiple processes can try to create it. This is something that should be protected by a Mutex.

If creating the subVolumeGroup multiple times is not problematic, please leave a comment in the code about it.
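
For context, a sketch of the check-then-act window being pointed out here, and the lock-across-create pattern the comment suggests; ensureSubVolumeGroup and the plain map plus mutex fields are hypothetical (at this point in the PR the field is still a sync.Map):

```go
// Without a lock held across the check and the create, two goroutines can
// both miss in the map and both issue the create:
//
//   goroutine A: Load(fsName) -> not found
//   goroutine B: Load(fsName) -> not found
//   goroutine A: create subvolume group; Store(fsName)
//   goroutine B: create subvolume group; Store(fsName)   // second create
//
// Holding a mutex for the whole check-create-record sequence closes the window.
func ensureSubVolumeGroup(lcs *localClusterState, fsName string, create func() error) error {
	lcs.subVolumeGroupsMutex.Lock()
	defer lcs.subVolumeGroupsMutex.Unlock()

	if lcs.subVolumeGroupsCreated[fsName] {
		return nil
	}
	if err := create(); err != nil {
		return err
	}
	lcs.subVolumeGroupsCreated[fsName] = true

	return nil
}
```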

resizeState operationState
subVolMetadataState operationState
subVolSnapshotMetadataState operationState
resizeState atomic.Int64
Member

I do not understand why using an atomic int is safer here. Can you explain that in the commit message?

riya-singhal31
riya-singhal31 previously approved these changes Oct 6, 2023
@Rakshith-R
Contributor Author

On deeper inspection, atomic indeed didn't serve the purpose, and the subVolumeGroupsCreated handling was still somewhat buggy.

Rewrote the code to use individual RWMutex locks with a number of helpers.

Please take a look again.

@Rakshith-R
Contributor Author

/retest ci/centos/mini-e2e/k8s-1.27

subVolSnapshotMetadataState operationState
resizeState operationStateMutex
subVolMetadataState operationStateMutex
subVolSnapshotMetadataState operationStateMutex
Member

This may be overdoing it; is the subVolumeGroupsRWMutex of this struct not sufficient to protect its members?

Contributor Author

This may be overdoing it; is the subVolumeGroupsRWMutex of this struct not sufficient to protect its members?

No, they are all independent parameters.

  • With read locks, it should add hardly any overhead.

Member

It is not about the performance overhead. I am worried about the complexity that looks unneeded.

@@ -240,7 +247,7 @@ func (s *subVolumeClient) CreateVolume(ctx context.Context) error {
}

// create subvolumegroup if not already created for the cluster.
if !clusterAdditionalInfo[s.clusterID].subVolumeGroupsCreated[s.FsName] {
if !s.isSubVolumeGroupCreated(s.SubvolumeGroup) {
Member

This is still racy. Two or more goroutines can call the function, and both can continue to create the SubVolumeGroup.

In order to prevent races, you will need to take a lock on the clusterAdditionalInfo[s.clusterID] object (a single lock for all members to keep it simple would be fine), or

  1. lock the .subVolumeGroupsRWMutex
  2. create the SubVolumeGroup
  3. unlock the .subVolumeGroupsRWMutex

If access to the different members needs similar serialization, use a single Mutex to update any of the members and have other go routines block until the updating is done.

Contributor Author

This is still racy. Two or more goroutines can call the function, and both can continue to create the SubVolumeGroup.

In order to prevent races, you will need to take a lock on the clusterAdditionalInfo[s.clusterID] object (a single lock for all members to keep it simple would be fine), or

  1. lock the .subVolumeGroupsRWMutex
  2. create the SubVolumeGroup
  3. unlock the .subVolumeGroupsRWMutex

If access to the different members needs similar serialization, use a single Mutex to update any of the members and have other go routines block until the updating is done.

Yes, that is intentional.
The mentioned parallel create will happen only once, and there are no side effects to the subvolumegroup create command being called more than once. The follow-on update will still be guarded by the RWLock, so we're fine.
I don't want to hold locks while we wait for a response from Ceph,
and I don't want other members to wait while one is being updated.
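
For illustration, a sketch of the pattern described here: check under a read lock, issue the idempotent create without holding any lock, then record completion under the write lock. isSubVolumeGroupCreated comes from the diff above; ensureSubVolumeGroupOnce, createSubVolumeGroup and updateSubVolumeGroupCreated are hypothetical stand-ins:

```go
// Sketch of the create path with the accepted trade-off: two goroutines may
// both issue the create, which is harmless because creating an existing
// subvolume group is a no-op on the Ceph side.
func (s *subVolumeClient) ensureSubVolumeGroupOnce(ctx context.Context) error {
	// Fast path under a read lock: the group was already created.
	if s.isSubVolumeGroupCreated(s.SubvolumeGroup) {
		return nil
	}
	// No lock is held while waiting for Ceph to respond.
	if err := s.createSubVolumeGroup(ctx); err != nil {
		return err
	}
	// Record completion under the write lock (hypothetical helper).
	s.updateSubVolumeGroupCreated(s.SubvolumeGroup)

	return nil
}
```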

Member

Sorry, but that is not what the commit message describes:

Multiple go-routines may simultaneously check for
presence of a clusterID's metadata such as
subvolumegroup created state, resize state,
metadata state and snapshot metadata state in the
clusterAdditionalInfo map and update an entry after creation
if it is absent. This set of operation needs to be serialized.

There is no serialization in this commit. This commit adds atomically setting/reading a value. Usually an int is atomic already, and sufficient if there are no calculations done with it. I do not think the mutex per member in the operationState contributes to preventing race conditions.

@Rakshith-R
Contributor Author

👍
Modified the second commit to protect only the subVolumeGroupsCreated map from concurrent creation/writes while allowing multiple readers.
PTAL

@riya-singhal31
Contributor

@Mergifyio queue

@mergify
Contributor

mergify bot commented Oct 10, 2023

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at d516a1d

Multiple goroutines may simultaneously check for a clusterID's
presence in clusterAdditionalInfo and create an entry if it is
absent. This set of operations needs to be serialized.

Therefore, this commit safeguards the clusterAdditionalInfo map
from concurrent writes with a mutex to prevent the above problem.

Signed-off-by: Rakshith R <[email protected]>
Multiple goroutines may simultaneously create the
subVolumeGroupsCreated map or write into it
for a particular group.

This commit safeguards the subVolumeGroupsCreated map
from concurrent creation/writes while allowing for multiple
readers.

Signed-off-by: Rakshith R <[email protected]>
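
A minimal sketch of the reader/writer split this commit message describes, assuming subVolumeGroupsCreated ends up as a plain map guarded by the subVolumeGroupsRWMutex mentioned earlier in the review; isSubVolumeGroupCreated appears in the diff above, while updateSubVolumeGroupCreated is an assumed name for the writer side:

```go
// isSubVolumeGroupCreated takes only a read lock, so concurrent readers do
// not block each other.
func (s *subVolumeClient) isSubVolumeGroupCreated(volGroup string) bool {
	lcs := clusterAdditionalInfo[s.clusterID]
	lcs.subVolumeGroupsRWMutex.RLock()
	defer lcs.subVolumeGroupsRWMutex.RUnlock()

	return lcs.subVolumeGroupsCreated[volGroup]
}

// updateSubVolumeGroupCreated takes the write lock, lazily allocates the map,
// and records that the group exists.
func (s *subVolumeClient) updateSubVolumeGroupCreated(volGroup string) {
	lcs := clusterAdditionalInfo[s.clusterID]
	lcs.subVolumeGroupsRWMutex.Lock()
	defer lcs.subVolumeGroupsRWMutex.Unlock()

	if lcs.subVolumeGroupsCreated == nil {
		lcs.subVolumeGroupsCreated = make(map[string]bool)
	}
	lcs.subVolumeGroupsCreated[volGroup] = true
}
```
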
@mergify mergify bot added the ok-to-test Label to trigger E2E tests label Oct 10, 2023
@ceph-csi-bot
Collaborator

/test ci/centos/upgrade-tests-cephfs

@ceph-csi-bot
Collaborator

/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.26

@ceph-csi-bot
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.27

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.26

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.27

@ceph-csi-bot
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.28

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e/k8s-1.26

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e/k8s-1.27

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.28

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e/k8s-1.28

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Oct 10, 2023
@mergify mergify bot merged commit d516a1d into ceph:devel Oct 10, 2023
34 checks passed
@Rakshith-R
Contributor Author

@Mergifyio backport release-3.9

@mergify
Contributor

mergify bot commented Oct 16, 2023

backport release-3.9

❌ No backport has been created

  • Backport to branch release-3.9 failed

GitHub error: Branch not found

@Rakshith-R
Contributor Author

@Mergifyio backport release-v3.9

@mergify
Contributor

mergify bot commented Oct 16, 2023

backport release-v3.9

🟠 Pending

  • Backport to branch release-v3.9 in progress

Labels
component/cephfs Issues related to CephFS

Successfully merging this pull request may close these issues.

Concurrent Map Writes With CephFS Plugin