-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SDN-4168: Optimize timing for IPsec tests #28797
base: master
Are you sure you want to change the base?
Conversation
9b28169
to
95898fc
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: pperiyasamy The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
95898fc
to
94bd34b
Compare
@pperiyasamy: This pull request references SDN-4168 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Job Failure Risk Analysis for sha: 94bd34b
|
/assign @jcaamano |
94bd34b
to
59a8b3a
Compare
Job Failure Risk Analysis for sha: 59a8b3a
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pperiyasamy I think we need to find a way to test this in CI without merging
return wait.PollUntilContextTimeout(context.Background(), ipsecRolloutWaitInterval, | ||
ipsecRolloutWaitDuration, true, func(ctx context.Context) (bool, error) { | ||
ds, err := getDaemonSet(oc, ovnNamespace, ovnIPsecDsName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a comment here that you expect the daemon set to not be deployed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
test/extended/networking/ipsec.go
Outdated
err := checkDstNodeTraffic(dst) | ||
if err != nil { | ||
return fmt.Errorf("error capturing traffic on the dst node: %v", err) | ||
} | ||
return nil | ||
}) | ||
pingSync.Go(func() error { | ||
wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this wait group is going to help
I would just increase the ping counter if you think we need that
@@ -371,6 +388,7 @@ var _ = g.Describe("[sig-network][Feature:IPsec]", g.Ordered, func() { | |||
o.Expect(err).To(o.HaveOccurred()) | |||
err = pingAndCheckNodeTraffic(config.srcNodeConfig, config.dstNodeConfig, icmp) | |||
o.Expect(err).To(o.HaveOccurred()) | |||
err = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized you probably need to consider using GinkgoHelper()
if you are going to use gomega on helper functions
https://onsi.github.io/gomega/#making-assertions-in-helper-functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The err is expected to happen here. so err = nil
is needed to make defer block clean up test pods.
added GinkgoHelper() in helper functions.
test/extended/networking/ipsec.go
Outdated
o.Expect(nsCertMachineConfig).NotTo(o.BeNil()) | ||
o.Eventually(func() bool { | ||
pools, err := getMachineConfigPoolByLabel(oc, workerRoleMachineConfigLabel) | ||
o.Expect(err).NotTo(o.HaveOccurred()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized your function needs to take a gomega instance as parameter if you plan to use it within Eventually blocks
https://onsi.github.io/gomega/#category-3-making-assertions-eminem-the-function-passed-into-codeeventuallycode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, done.
@@ -511,37 +551,17 @@ var _ = g.Describe("[sig-network][Feature:IPsec]", g.Ordered, func() { | |||
checkNodeTraffic(v1.IPsecModeDisabled) | |||
|
|||
g.By(fmt.Sprintf("configure IPsec in %s mode and validate traffic", mode)) | |||
// Change IPsec mode to External and packet capture on the node's interface | |||
// must be geneve encapsulated ones. | |||
// Change IPsec mode to given mode and do packet capture on the node's interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This err := configureIPsecMode(oc, mode)
is not causing a reboot when ipsec machine config extensions are deployed? Is that because we the CI job starts deploys the cluster with IPSec already enabled? In Full mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is what we discussed last week. We'll have CI job starting cluster with IPsec full mode and then execute the tests with minimal number of node reboots
if !time.Now().Before(certExpirationDate) { | ||
framework.Failf("certficates in the Machine Config are expired, Please consider recreating those certificates") | ||
} | ||
nsCertMachineConfig, err := createIPsecCertsMachineConfig(oc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this does not cause a reboot because they are just ceerts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this still causes reboot of worker nodes and certs are deployed via machine config.
The ipsec test suite takes about 5-6 hours to finish running the tests which doesn't work well with e2e-aws-ovn-ipsec-serial CI lane as it produces many disruptive events and failing all the time. Hence this commit optimizes ipsec tests so that it doesn't cause any cluster reboot when changing ipsec modes. Signed-off-by: Periyasamy Palanisamy <[email protected]>
59a8b3a
to
ff43448
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to find a way to test this in CI without merging
yes, we must merge this CI lane openshift/release#50687 to get this tested. cc @jluhrsen
return wait.PollUntilContextTimeout(context.Background(), ipsecRolloutWaitInterval, | ||
ipsecRolloutWaitDuration, true, func(ctx context.Context) (bool, error) { | ||
ds, err := getDaemonSet(oc, ovnNamespace, ovnIPsecDsName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -371,6 +388,7 @@ var _ = g.Describe("[sig-network][Feature:IPsec]", g.Ordered, func() { | |||
o.Expect(err).To(o.HaveOccurred()) | |||
err = pingAndCheckNodeTraffic(config.srcNodeConfig, config.dstNodeConfig, icmp) | |||
o.Expect(err).To(o.HaveOccurred()) | |||
err = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The err is expected to happen here. so err = nil
is needed to make defer block clean up test pods.
added GinkgoHelper() in helper functions.
if !time.Now().Before(certExpirationDate) { | ||
framework.Failf("certficates in the Machine Config are expired, Please consider recreating those certificates") | ||
} | ||
nsCertMachineConfig, err := createIPsecCertsMachineConfig(oc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this still causes reboot of worker nodes and certs are deployed via machine config.
@@ -511,37 +551,17 @@ var _ = g.Describe("[sig-network][Feature:IPsec]", g.Ordered, func() { | |||
checkNodeTraffic(v1.IPsecModeDisabled) | |||
|
|||
g.By(fmt.Sprintf("configure IPsec in %s mode and validate traffic", mode)) | |||
// Change IPsec mode to External and packet capture on the node's interface | |||
// must be geneve encapsulated ones. | |||
// Change IPsec mode to given mode and do packet capture on the node's interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is what we discussed last week. We'll have CI job starting cluster with IPsec full mode and then execute the tests with minimal number of node reboots
test/extended/networking/ipsec.go
Outdated
o.Expect(nsCertMachineConfig).NotTo(o.BeNil()) | ||
o.Eventually(func() bool { | ||
pools, err := getMachineConfigPoolByLabel(oc, workerRoleMachineConfigLabel) | ||
o.Expect(err).NotTo(o.HaveOccurred()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, done.
/test ? |
@pperiyasamy: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/test e2e-aws-ovn-ipsec-serial |
@pperiyasamy: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Job Failure Risk Analysis for sha: ff43448
|
The ipsec test suite takes about 5-6 hours to finish running the tests which doesn't work well with e2e-aws-ovn-ipsec-serial CI lane as it produces many disruptive events and failing all the time. Hence this commit optimizes ipsec tests so that it doesn't cause any cluster reboot when changing ipsec modes.