feat: support migrate slot range [draft] #2389

Draft
wants to merge 6 commits into base: unstable
Conversation

@PokIsemaine (Contributor) commented Jul 1, 2024

Issue: #2355

This draft PR demonstrates how to support migrating slot ranges.

What I Did

One migration job = one slot range:

  • I encapsulated a SlotRange structure and changed the migration-related class members from a single slot to a slot range.
  • Reference: [NEW] Support slot-based data migration #412. Slot migration includes the following phases: start migration, migrate existing data, migrate incremental data, and end migration. In each phase I modified, the entire slot range must complete before the job moves on to the next phase.
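
The encapsulation described above can be sketched roughly as follows. This is an illustrative Go sketch only; the PR itself modifies the C++ codebase, and all names here are hypothetical:

```go
package main

import "fmt"

// SlotRange is a hypothetical sketch of the range type described above,
// replacing the previous single-slot members in the migration classes.
type SlotRange struct {
	Start int // inclusive
	End   int // inclusive
}

// Contains reports whether slot falls inside the range.
func (r SlotRange) Contains(slot int) bool {
	return r.Start <= slot && slot <= r.End
}

// String renders the range the way the CLUSTERX MIGRATE tests write it,
// e.g. "114-116", or "5" for a single-slot range.
func (r SlotRange) String() string {
	if r.Start == r.End {
		return fmt.Sprintf("%d", r.Start)
	}
	return fmt.Sprintf("%d-%d", r.Start, r.End)
}

func main() {
	r := SlotRange{Start: 114, End: 116}
	fmt.Println(r, r.Contains(115), r.Contains(117)) // prints: 114-116 true false
}
```

Each migration phase would then iterate the whole range before advancing, matching the "entire slot range must complete before moving on" behavior described above.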

TODO

The TODO items below are the points I hope to discuss.

  1. Support multiple slot ranges:

    • Current situation: Only one migration job can be performed at a time, and each migration job corresponds to a slot range.
    • Possible modification: Allow multiple migration jobs, migrating sequentially or in parallel.
  2. Performing multiple migrations consecutively without immediately updating the topology via setslot, referring to the example in TestSlotRangeMigrate:

	t.Run("MIGRATE - Repeat migration cases, but does not immediately update the topology via setslot", func(t *testing.T) {
		// Disjoint
		require.Equal(t, "OK", rdb0.Do(ctx, "clusterx", "migrate", "114-116", id1).Val())
		waitForMigrateSlotRangeState(t, rdb0, "114-116", SlotMigrationStateSuccess)
		require.Equal(t, "OK", rdb0.Do(ctx, "clusterx", "migrate", "117-118", id1).Val())
		waitForMigrateSlotRangeState(t, rdb0, "117-118", SlotMigrationStateSuccess)
		require.Equal(t, "OK", rdb0.Do(ctx, "clusterx", "migrate", "112-113", id1).Val())
		waitForMigrateSlotRangeState(t, rdb0, "112-113", SlotMigrationStateSuccess)

		errMsg := "Can't migrate slot which has been migrated"
		// TODO: 112-118 has already been migrated, but re-migrating a
		// covered range such as 114-116 is not detected.
		// require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "114-116", id1).Err(), errMsg)
		// require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "117-118", id1).Err(), errMsg)

		// Intersection
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "112", id1).Err(), errMsg)
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "112-112", id1).Err(), errMsg)
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "113", id1).Err(), errMsg)
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "113-113", id1).Err(), errMsg)

		// Subset
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "112-113", id1).Err(), errMsg)
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "112-120", id1).Err(), errMsg)
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "110-112", id1).Err(), errMsg)
	})

This situation also seems to exist in the original single-slot migration, and I am not sure whether such operations are reasonable:

migrate A => migrate B => setslot A&B
migrate A => migrate B => migrate A (expected to fail, but allowed to pass) => setslot A&B

// slot range A-C
migrate A-B => migrate C => setslot A-C
migrate A-B => migrate C => migrate B (expected to fail, but allowed to pass) => setslot A-C
  3. More precise migration failure slot range:
    The current implementation marks the entire slot range as failed and cleans it up, and the user later re-migrates the entire range.
    Do we want to support a more precise failure range? For example, splitting it into [start_slot, fail_slot) and [fail_slot, end_slot]. Users could check the status with commands such as cluster info and then re-migrate only the failed sub-range themselves. (Personally, I think this is a bit cumbersome and error-prone.)
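
The undetected-coverage problem from TODO item 2 reduces to interval intersection: a new migration request should be rejected if it overlaps any range already migrated but not yet applied via setslot. A minimal Go sketch of that check (hypothetical names, not the PR's actual code):

```go
package main

import "fmt"

// SlotRange is an inclusive slot interval, e.g. {114, 116} for "114-116".
type SlotRange struct {
	Start, End int
}

// overlapsAny reports whether r intersects any already-migrated range.
// Two inclusive ranges [a,b] and [c,d] intersect iff a <= d && c <= b,
// which also catches the covered and superset cases from the test above.
func overlapsAny(r SlotRange, migrated []SlotRange) bool {
	for _, m := range migrated {
		if r.Start <= m.End && m.Start <= r.End {
			return true
		}
	}
	return false
}

func main() {
	// Ranges migrated in the TestSlotRangeMigrate example: 114-116, 117-118, 112-113.
	migrated := []SlotRange{{114, 116}, {117, 118}, {112, 113}}
	fmt.Println(overlapsAny(SlotRange{114, 116}, migrated)) // covered case the TODO says is missed
	fmt.Println(overlapsAny(SlotRange{112, 120}, migrated)) // superset case
	fmt.Println(overlapsAny(SlotRange{100, 111}, migrated)) // disjoint: allowed
}
```

With a check like this run against the set of migrated-but-not-yet-setslot ranges, the commented-out assertions for 114-116 and 117-118 in the test could be enabled.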
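
The split proposed in TODO item 3 could be sketched as below. This is only an illustration of the idea, with hypothetical names; the PR does not implement it:

```go
package main

import "fmt"

// SlotRange is an inclusive slot interval.
type SlotRange struct {
	Start, End int
}

// splitAtFailure splits a failed migration job's range at failSlot into the
// successfully migrated prefix [Start, failSlot) and the remaining suffix
// [failSlot, End] that the user would re-migrate. doneOK is false when the
// failure happened at the very first slot, i.e. nothing was migrated.
func splitAtFailure(r SlotRange, failSlot int) (done SlotRange, doneOK bool, remain SlotRange) {
	remain = SlotRange{failSlot, r.End}
	if failSlot > r.Start {
		return SlotRange{r.Start, failSlot - 1}, true, remain
	}
	return SlotRange{}, false, remain
}

func main() {
	done, ok, remain := splitAtFailure(SlotRange{100, 110}, 105)
	fmt.Println(done, ok, remain)
}
```

The user could then re-run clusterx migrate with only the remaining sub-range, at the cost of the extra bookkeeping and user-facing complexity noted above.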

Miscellaneous

Other suggestions are welcome, such as more tests, code optimization, better user interaction, etc.

@PokIsemaine PokIsemaine marked this pull request as draft July 1, 2024 05:28