
The rejoined node can not send out the events for watchers #8411

Closed
abel-von opened this issue Aug 17, 2017 · 5 comments

Comments

@abel-von

Hi, we are doing some HA tests for etcd and kubernetes, and one of the test cases is to cut the network between one node of the etcd cluster and the leader. As we know, etcd can still work even if some of the nodes are broken. But when we keep the partition up for a long time (maybe one or two hours) and then recover it, we found that the kube-apiserver can not refresh its cache to the newest values in etcd.

After some investigation, we found that it is a problem with the watch mechanism in etcd: the events for key changes are not sent out to the apiserver.

We also found that if the network is cut for only a short time, everything works correctly; the issue only shows up when the network is cut for a long time.

After reproducing and investigating this, we found the cause: the WAL files are purged every 10000 requests, so after the broken node rejoins the cluster, the leader sends it a snapshot instead of raft log entries. The node restores the snapshot into its backend db, but this restore operation is not defined on watchableStore, so the change events are not sent out.
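A stripped-down illustration of that failure mode (this is not etcd's real code, just the shape of the problem): Restore is only defined on the inner store type, so when a snapshot restore goes through the embedded store, the watcher bookkeeping kept by the watchable layer never learns about the new revisions.

package main

import "fmt"

// store stands in for the plain mvcc store: Restore replaces its data and
// jumps the revision forward, but knows nothing about watchers.
type store struct{ rev int64 }

func (s *store) Restore(newRev int64) { s.rev = newRev }

// watchableStore embeds the plain store, so Restore is promoted from there
// and the watcher-side state is bypassed entirely.
type watchableStore struct {
	*store
	syncedAt int64 // revision the watchers have been notified up to
}

func main() {
	ws := &watchableStore{store: &store{rev: 100}, syncedAt: 100}

	// The leader sends a snapshot; the embedded store's Restore runs directly.
	ws.Restore(10100)

	// The watchers still believe they are synced at revision 100, so the
	// intervening events are never delivered.
	fmt.Printf("store rev=%d, watchers synced at rev=%d\n", ws.rev, ws.syncedAt)
}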

@abel-von
Author

The etcd version is 3.1.9.

@xiang90
Contributor

xiang90 commented Aug 17, 2017

@abel-von can you write a simple script/program to reproduce this problem?

@abel-von
Author

@xiang90 I have changed the code of watchable_store.go, and the issue is fixed in my environment. The change adds a Restore function to the watchableStore type, like this:

func (s *watchableStore) Restore(b backend.Backend) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	err := s.store.Restore(b)
	if err != nil {
		return err
	}
	// move every watcher back into the unsynced group so that syncWatchers
	// re-delivers the events applied by the snapshot restore
	s.unsynced = s.synced
	s.synced = newWatcherGroup()
	s.syncWatchers()
	return nil
}

It just syncs the watchers after the restore.

But the master branch is 3.2.x and the mvcc code has changed a lot, so I will check whether the issue still exists there. If it does, I will submit a PR.

The steps to reproduce this issue:

  1. Start up 3 etcd nodes.
  2. Use etcdctl to watch a key on the node that will be partitioned, i.e. etcdctl --endpoints http://{brokennodeip}:2379 watch /aaa (or run the small watch program sketched after these steps).
  3. Cut the network with iptables: iptables -I OUTPUT -d {leaderip} -j DROP and iptables -I INPUT -s {leaderip} -j DROP
  4. Use etcdctl to update the watched key again and again, at least 10000 times:
     while true; do
         etcdctl --endpoints=http://{leaderip}:2379 put /aaa hhh
     done
  5. Recover the network between the node and the leader: iptables -D OUTPUT -d {leaderip} -j DROP and iptables -D INPUT -s {leaderip} -j DROP
  6. We can get the new value of /aaa from the broken node with etcdctl --endpoints http://{brokennodeip}:2379 get /aaa, but the watch started in step 2 never receives the change events.
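If a standalone program is handier than etcdctl for step 2, a minimal watch client along these lines could be used (a sketch assuming the clientv3 Go client; {brokennodeip} is the same placeholder as in the steps above). Leave it running through steps 3-6: with the bug present, no events are printed after the partition heals, even though a plain get on the same member returns the new value.

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://{brokennodeip}:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Watch the same key that the put loop on the leader keeps updating.
	for resp := range cli.Watch(context.Background(), "/aaa") {
		for _, ev := range resp.Events {
			fmt.Printf("%s %q -> %q (mod rev %d)\n",
				ev.Type, ev.Kv.Key, ev.Kv.Value, ev.Kv.ModRevision)
		}
	}
}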

@heyitsanthony
Contributor

OK, so the repro is: watch on some member, partition it, write values into the watched key on another member until a snapshot is triggered, unpartition the member, and wait forever on the watch? I wouldn't be surprised if this breaks in 3.2 too; the restore+watch path isn't very well tested.

abel-von pushed a commit to abel-von/etcd that referenced this issue Aug 17, 2017
abel-von pushed a commit to abel-von/etcd that referenced this issue Aug 18, 2017
abel-von pushed a commit to abel-von/etcd that referenced this issue Aug 19, 2017
gyuho pushed a commit to gyuho/etcd that referenced this issue Aug 21, 2017
gyuho pushed a commit that referenced this issue Aug 21, 2017
jpbetz pushed a commit to jpbetz/etcd that referenced this issue Nov 1, 2017
jpbetz pushed a commit to jpbetz/etcd that referenced this issue Nov 3, 2017
chestack pushed a commit to chestack/etcd that referenced this issue Apr 1, 2019
@chestack

chestack commented Apr 24, 2019

@abel-von, I ran into the same issue in my environment, thanks for your fix.

As you mentioned, it works correctly without this PR if the network is only cut for a short time. So which is the root cause: the node being partitioned for a long time, or enough events happening during the partition (the key being updated 10000 times)?
