Add a Kine fix when rke2 restart apiserver #5931

Merged 1 commit on May 24, 2024

@vitorsavian vitorsavian commented May 20, 2024

Proposed Changes

  • Mount the server directory so that when rke2 restarts, it doesn't get stuck trying to connect to the kine socket

Types of Changes

  • Bugfix

Verification

To verify with SQLite, first create a db file:

touch /var/lib/rancher/rke2/server/db/rke2-sqlite.db

Then, to test rke2, add this setting to config.yaml:

datastore-endpoint: sqlite:///var/lib/rancher/rke2/server/db/rke2-sqlite.db

After the server initializes, restart it and wait for the apiserver to restart.
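
The file created with touch and the path in datastore-endpoint need to refer to the same location. A minimal sketch for cross-checking the two, assuming the on-disk path is simply whatever follows the sqlite:// prefix. The helper name is hypothetical, and how kine itself parses the endpoint is not shown in this PR:

```shell
# Hypothetical helper: derive the on-disk path from a sqlite datastore
# endpoint by stripping the "sqlite://" prefix. This is an assumption about
# the endpoint format, not kine's actual parsing code.
sqlite_path_from_endpoint() {
  printf '%s\n' "${1#sqlite://}"
}

endpoint='sqlite:///var/lib/rancher/rke2/server/db/rke2-sqlite.db'
db_path=$(sqlite_path_from_endpoint "$endpoint")

# Print the path so it can be compared with the file that was created.
echo "$db_path"
```

Comparing the printed path against the argument passed to touch catches a filename mismatch before the server is started.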

Testing

Linked Issues

User-Facing Change

Fix delayed apiserver restart when the apiserver is using kine

Further Comments

@vitorsavian vitorsavian force-pushed the kine-restart-fix branch 2 times, most recently from 1dc785a to 7f1fbe0 Compare May 20, 2024 20:05
@vitorsavian vitorsavian marked this pull request as ready for review May 20, 2024 21:15
@vitorsavian vitorsavian requested a review from a team as a code owner May 20, 2024 21:15

@brandond brandond left a comment

nit on comment, lgtm otherwise

pkg/podexecutor/staticpod.go (outdated)

codecov-commenter commented May 21, 2024

Codecov Report

Attention: Patch coverage is 0%, with 1 line in your changes missing coverage. Please review.

Project coverage is 26.61%. Comparing base (3e0fb75) to head (b1953f3).
Report is 2 commits behind head on master.

Files                          Patch %   Lines
pkg/podexecutor/staticpod.go   0.00%     1 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5931      +/-   ##
==========================================
+ Coverage   26.46%   26.61%   +0.15%     
==========================================
  Files          30       31       +1     
  Lines        2649     2645       -4     
==========================================
+ Hits          701      704       +3     
+ Misses       1903     1895       -8     
- Partials       45       46       +1     
Flag        Coverage Δ
inttests    10.01% <0.00%> (+0.01%) ⬆️
unittests   18.94% <0.00%> (+0.14%) ⬆️


Signed-off-by: Vitor Savian <[email protected]>

Remove unnecessary socket code

Signed-off-by: Vitor Savian <[email protected]>

VestigeJ commented May 28, 2024

Presently, a delay of ~7 minutes is observed after a systemctl restart of the service. Ready to validate backports when they merge.

Environment Details
Reproduced using VERSION=v1.30.1+rke2r1
Validated using COMMIT=3c4642bd1ebef5b2deb6f701f6d76b0d78698184

Infrastructure

  • Cloud

Node(s) CPU architecture, OS, and version:

Linux 5.14.21-150500.53-default x86_64 GNU/Linux
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP5"

Cluster Configuration:

NAME             STATUS   ROLES                  AGE     VERSION
ip-1-1-1-1       Ready    control-plane,master   4h57m   v1.30.1+rke2r1

Config.yaml:

node-external-ip: 1.1.1.1
token: YOUR_TOKEN_HERE
write-kubeconfig-mode: 644
debug: true
embedded-registry: true
datastore-endpoint: sqlite:///var/lib/rancher/rke2/server/db/rke2-sqlite.db

steps

$ curl https://get.rke2.io --output install-"rke2".sh
$ sudo chmod +x install-"rke2".sh
$ sudo groupadd --system etcd && sudo useradd -s /sbin/nologin --system -g etcd etcd
$ sudo modprobe ip_vs_rr
$ sudo modprobe ip_vs_wrr
$ sudo modprobe ip_vs_sh
$ sudo printf "vm.panic_on_oom=0 \nvm.overcommit_memory=1 \nkernel.panic=10 \nkernel.panic_ps=1 \nkernel.panic_on_oops=1 \n" > ~/60-rke2-cis.conf
$ sudo cp 60-rke2-cis.conf /etc/sysctl.d/
$ sudo systemctl restart systemd-sysctl
$ sudo mkdir -p /var/lib/rancher/rke2/server/db/
$ sudo touch /var/lib/rancher/rke2/server/db/rke2-sqlite.db
$ VERSION=v1.30.1+rke2r1
$ sudo INSTALL_RKE2_VERSION=$VERSION INSTALL_RKE2_EXEC=server ./install-rke2.sh
$ go_rke2
$ set_kubefig
$ w2 kg no,po,svc -A
$ date;sudo systemctl restart rke2-server; date
--- log time here ---
$ kgn
$ COMMIT=3c4642bd1ebef5b2deb6f701f6d76b0d78698184
$ sudo INSTALL_RKE2_COMMIT=$COMMIT INSTALL_RKE2_EXEC=server ./install-rke2.sh
$ sudo systemctl restart rke2-server
$ date;sudo systemctl restart rke2-server; date;
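
The `date; ...; date` pattern in the steps above brackets the restart with two timestamps that then have to be subtracted by hand. A small sketch of a wrapper that reports the duration directly; `time_cmd` is a hypothetical helper name, and it assumes a `date` that supports `+%s` (GNU and BSD both do):

```shell
# Hypothetical helper: run a command and print how many whole seconds it took.
# The command's own stdout is sent to stderr so that only the duration is
# printed on stdout. The command's exit status is passed through.
time_cmd() {
  start=$(date +%s)
  status=0
  "$@" >&2 || status=$?
  end=$(date +%s)
  echo $(( end - start ))
  return "$status"
}

# Example (not executed here):
#   time_cmd sudo systemctl restart rke2-server
```

This measures only how long the restart command itself blocks, which is the same quantity the two `date` calls capture.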

Results:

$ date;sudo systemctl restart rke2-server
Tue 28 May 2024 10:01:11 PM UTC
----- elapsed time -----
$ date
Tue 28 May 2024 10:08:01 PM UTC

$ kgn

NAME             STATUS   ROLES                  AGE     VERSION
ip-1-1-1-1       Ready    control-plane,master   4h12m   v1.30.1+rke2r1

New behavior

$ date;sudo systemctl restart rke2-server; date;

Tue 28 May 2024 10:48:59 PM UTC
Tue 28 May 2024 10:49:21 PM UTC
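
For reference, the two timestamp pairs above work out to 410 seconds (6 min 50 s) on v1.30.1+rke2r1 and 22 seconds on the validated commit. A short sketch of the arithmetic; the helper names are illustrative, and it relies on each pair falling on the same day within the same 12-hour half, so only the clock times matter:

```shell
# Convert an HH:MM:SS clock time to seconds. Leading zeros are stripped so
# values like 08 are not misread as octal in shell arithmetic.
to_seconds() {
  IFS=: read -r h m s <<EOF
$1
EOF
  echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# Seconds elapsed between two clock times on the same day.
elapsed() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

old=$(elapsed 10:01:11 10:08:01)   # restart on v1.30.1+rke2r1
new=$(elapsed 10:48:59 10:49:21)   # restart on the validated commit
echo "old: ${old}s, new: ${new}s"  # prints "old: 410s, new: 22s"
```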
