Multiple simultaneous snapshots result in silent failure and/or corruption of at least one snapshot #6226

Closed
brandond opened this issue Jun 19, 2024 · 1 comment
@ShylajaDevadiga
Contributor

Validated on rke2 version v1.30.2-rc5+rke2r1

Environment Details

Infrastructure
Cloud EC2 instance

Node(s) CPU architecture, OS, and Version:

NAME="SLES"
VERSION="15-SP5"
VERSION_ID="15.5"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP5"

Cluster Configuration:

3 servers, 1 agent

Steps to validate the fix

  1. Install rke2
  2. Run rke2 etcd-snapshot save more than once at the same time
  3. Check the rke2-server logs
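
The concurrent step can be scripted. The following is a minimal sketch, not part of the original validation: the helper name `run_twice_concurrently` is hypothetical, and it only launches whatever command it is given twice in the background and reports both exit codes. It makes no claim about which of the two runs should fail.

```shell
#!/usr/bin/env bash
# Sketch: start the same command twice at (nearly) the same time and
# report the exit code of each run. The function name is hypothetical.

run_twice_concurrently() {
  local pid1 pid2 rc1=0 rc2=0

  # Launch both copies in the background so they overlap.
  "$@" &
  pid1=$!
  "$@" &
  pid2=$!

  # Collect each exit status; "|| rcN=$?" keeps a failure from aborting the script.
  wait "$pid1" || rc1=$?
  wait "$pid2" || rc2=$?

  echo "exit codes: $rc1 $rc2"
}

# Example, using the command from this issue (requires an rke2 server node):
#   run_twice_concurrently sudo /usr/local/bin/rke2 etcd-snapshot save
```

Reporting the exit codes explicitly avoids relying on the interactive shell's job-status lines to tell which run failed.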

Reproduction results:
I do not see the expected error message when taking snapshots simultaneously, but I can confirm that no snapshot is created.

# rke2 -v
rke2 version v1.30.1+rke2r1 (e7f87c6dd56fdd76a7dab58900aeea8946b2c008)
go version go1.22.2 X:boringcrypto

# /usr/local/bin/rke2 etcd-snapshot save & /usr/local/bin/rke2 etcd-snapshot save 
[1] 6801
WARN[0000] Unknown flag --write-kubeconfig-mode found in config.yaml, skipping 
WARN[0000] Unknown flag --tls-san found in config.yaml, skipping 
WARN[0000] Unknown flag --profile found in config.yaml, skipping 
WARN[0000] Unknown flag --selinux found in config.yaml, skipping 
WARN[0000] Unknown flag --node-external-ip found in config.yaml, skipping 
WARN[0000] Unknown flag --node-ip found in config.yaml, skipping 
WARN[0000] Unknown flag --node-label found in config.yaml, skipping 
WARN[0000] Unknown flag --node-label found in config.yaml, skipping 
WARN[0000] Unknown flag --node-label found in config.yaml, skipping 
WARN[0000] Unknown flag --write-kubeconfig-mode found in config.yaml, skipping 
WARN[0000] Unknown flag --tls-san found in config.yaml, skipping 
WARN[0000] Unknown flag --profile found in config.yaml, skipping 
WARN[0000] Unknown flag --selinux found in config.yaml, skipping 
WARN[0000] Unknown flag --node-external-ip found in config.yaml, skipping 
WARN[0000] Unknown flag --node-ip found in config.yaml, skipping 
WARN[0000] Unknown flag --node-label found in config.yaml, skipping 
WARN[0000] Unknown flag --node-label found in config.yaml, skipping 
WARN[0000] Unknown flag --node-label found in config.yaml, skipping 
INFO[0001] Snapshot on-demand-ip-172-31-2-16.us-east-2.compute.internal-1719028186 saved. 
[1]+  Done                    /usr/local/bin/rke2 etcd-snapshot save
ip-172-31-7-83:~ # journalctl -u rke2-server |grep "Failed to take etcd snapshot"
ip-172-31-7-83:~ # sudo ls -lrt /var/lib/rancher/rke2/server/db/snapshots
total 0
ip-172-31-7-83:~ # 

Validation results:

/usr/local/bin/rke2 -v
rke2 version v1.30.2-rc5+rke2r1 (3f678f964ad849e24449e49f0c2c44e75d944c9f)
go version go1.22.4 X:boringcrypto
> sudo /usr/local/bin/rke2 etcd-snapshot save & sudo /usr/local/bin/rke2 etcd-snapshot save & sleep 5
[1] 17603
[2] 17604
WARN[0000] Unknown flag --write-kubeconfig-mode found in config.yaml, skipping 
WARN[0000] Unknown flag --tls-san found in config.yaml, skipping 
WARN[0000] Unknown flag --profile found in config.yaml, skipping 
WARN[0000] Unknown flag --selinux found in config.yaml, skipping 
WARN[0000] Unknown flag --node-external-ip found in config.yaml, skipping 
WARN[0000] Unknown flag --node-ip found in config.yaml, skipping 
WARN[0000] Unknown flag --node-label found in config.yaml, skipping 
WARN[0000] Unknown flag --node-label found in config.yaml, skipping 
WARN[0000] Unknown flag --node-label found in config.yaml, skipping 
WARN[0000] Unknown flag --write-kubeconfig-mode found in config.yaml, skipping 
WARN[0000] Unknown flag --tls-san found in config.yaml, skipping 
WARN[0000] Unknown flag --profile found in config.yaml, skipping 
WARN[0000] Unknown flag --selinux found in config.yaml, skipping 
WARN[0000] Unknown flag --node-external-ip found in config.yaml, skipping 
WARN[0000] Unknown flag --node-ip found in config.yaml, skipping 
WARN[0000] Unknown flag --node-label found in config.yaml, skipping 
WARN[0000] Unknown flag --node-label found in config.yaml, skipping 
WARN[0000] Unknown flag --node-label found in config.yaml, skipping 
FATA[0000] see server log for details: Internal error occurred: etcd-snapshot error ID 54174 
INFO[0001] Snapshot on-demand-ip-172-31-9-91.us-east-2.compute.internal-1719026468 saved. 
[1]-  Exit 1                  sudo /usr/local/bin/rke2 etcd-snapshot save
[2]+  Done                    sudo /usr/local/bin/rke2 etcd-snapshot save
> sudo ls -lrt /var/lib/rancher/rke2/server/db/snapshots
total 24096
-rw------- 1 root root 12333088 Jun 22 03:21 on-demand-ip-172-31-9-91.us-east-2.compute.internal-1719026468
-rw------- 1 root root 12333088 Jun 22 03:50 on-demand-ip-172-31-9-91.us-east-2.compute.internal-1719028223
ec2-user@ip-172-31-9-91:~> 
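
The directory check above can also be done in a script. This is a minimal sketch under stated assumptions: the helper name `count_nonempty_files` is hypothetical, and it simply counts regular files larger than zero bytes directly inside a directory. It does not verify that a snapshot is restorable.

```shell
#!/usr/bin/env bash
# Sketch: count non-empty regular files directly inside a directory.
# The function name is hypothetical.

count_nonempty_files() {
  local dir=$1
  # -maxdepth 1: do not descend into subdirectories
  # -size +0c:   only files larger than 0 bytes
  find "$dir" -maxdepth 1 -type f -size +0c | wc -l | tr -d ' '
}

# Example, using the snapshot directory from this issue (run as root):
#   count_nonempty_files /var/lib/rancher/rke2/server/db/snapshots
```

Comparing the count before and after the concurrent run shows how many new snapshot files were written.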

Logs:

ec2-user@ip-172-31-9-91:~> sudo journalctl -u rke2-server |grep "Failed to take etcd snapshot"
ec2-user@ip-172-31-9-91:~>
