Etcd member down after changing OS from CentOS 7 to Rocky Linux 9

Hi everyone,
I have a k8s cluster (version 1.21.10) with 3 masters (OS: CentOS 7), 2 workers (OS: CentOS 7), and 2 LBs.
I changed the masters’ OS to Rocky Linux 9 one by one. It went well for the first 2 masters, which rejoined my cluster, but once the 3rd master has had its OS changed and has been back in the cluster for 30 minutes or longer, something goes wrong:

  1. usually the 3rd etcd pod keeps going into CrashLoopBackOff (sometimes it is a random one instead);
  2. the machine whose etcd pod is in CrashLoopBackOff becomes very slow (a letter I type appears after 10 s, and ping is more than 1000 ms). I checked the CPU and memory usage, and neither of them is high;
  3. my cluster recovers after deleting the 3rd etcd member (but in my circumstances I need 3 masters).

etcd Version: 3.4.13

Any help is much appreciated!

TBH this problem is with K8s, so you would most likely get a lot more help on the K8s forums. But the topic can remain open here in case anyone here runs K8s and can help.

In OpenShift, when removing a node to re-add it, you need to remove its entry from the etcd database first; only then will the newly bootstrapped node add itself correctly to the etcd database. I expect it could be this, unless you’ve already done that. You can use etcdctl commands to do that.
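
In case it helps, here is a rough sketch of that lookup, not a tested procedure for your cluster. It assumes the default (simple) output of `etcdctl member list` in etcd 3.4, where each line is `ID, status, name, peer URLs, client URLs, is-learner`. The member names, IDs, and addresses in `SAMPLE` are made up, and `find_member_id` and `removal_command` are hypothetical helper names. It only prints the `etcdctl member remove` command so you can check it before running anything.

```python
# Sketch only: the member names, IDs, and addresses below are invented examples.
from typing import Optional


def find_member_id(member_list_output: str, name: str) -> Optional[str]:
    """Return the hex ID of the member called `name`, or None if it is absent.

    Expects the simple output format of `etcdctl member list` (etcd 3.4):
    ID, status, name, peer URLs, client URLs, is-learner
    """
    for line in member_list_output.splitlines():
        fields = [field.strip() for field in line.split(",")]
        # The first three fields are ID, status, and name.
        if len(fields) >= 3 and fields[2] == name:
            return fields[0]
    return None


def removal_command(member_id: str) -> str:
    """Build (but do not run) the command that drops the stale member."""
    return f"etcdctl member remove {member_id}"


# Hypothetical output, as if pasted from `etcdctl member list`.
SAMPLE = """\
8211f1d0f64f3269, started, master1, https://10.0.0.1:2380, https://10.0.0.1:2379, false
91bc3c398fb3c146, started, master2, https://10.0.0.2:2380, https://10.0.0.2:2379, false
fd422379fda50e48, started, master3, https://10.0.0.3:2380, https://10.0.0.3:2379, false
"""

if __name__ == "__main__":
    stale_id = find_member_id(SAMPLE, "master3")
    if stale_id is not None:
        print(removal_command(stale_id))
```

On a TLS-secured cluster, etcdctl will also need `--endpoints`, `--cacert`, `--cert`, and `--key`. If the cluster was built with kubeadm (an assumption, the post does not say), the certificates are usually under /etc/kubernetes/pki/etcd. Run the removal against a healthy member before the rebuilt master tries to join again.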
