wait for kube controllers to be ready before starting informers #17855

Merged
merged 2 commits into from
Jan 4, 2018

Conversation

deads2k
Contributor

@deads2k deads2k commented Dec 18, 2017

Fixes the various controller panic problems.

/assign derekwaynecarr
/assign gnufied

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 18, 2017
@openshift-merge-robot openshift-merge-robot added the vendor-update Touching vendor dir or related files label Dec 18, 2017
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2017
	} else {
		StartInformers(ctx.Stop)
	}
	close(ctx.InformersStarted)
Member

So, as I understand it, we had to make this change because in OpenShift (where StartInformers is not nil) we do not want to close the ctx.InformersStarted channel immediately after calling StartInformers; we want control over that in the OpenShift code.

Do we want to backport this to upstream at all?

Contributor Author

So, as I understand it, we had to make this change because in OpenShift (where StartInformers is not nil) we do not want to close the ctx.InformersStarted channel immediately after calling StartInformers; we want control over that in the OpenShift code.

Correct

Do we want to backport this to upstream at all?

No. In 3.9, we plan to start executing this separately.
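
For readers following the thread, here is a minimal Go sketch of the pattern being discussed: a channel is closed as a one-time "informers started" signal, and when a start override is installed the signal is left to the caller. This is an illustration under assumptions, not the code from this PR. The `controllerContext` type, the `startInformers` helper, and its `defaultStart`/`override` parameters are hypothetical names; only the `Stop` and `InformersStarted` fields mirror identifiers quoted in the review.

```go
package main

import "fmt"

// controllerContext is a hypothetical stand-in that models only the two
// channels mentioned in the review thread.
type controllerContext struct {
	Stop             chan struct{} // closed to shut everything down
	InformersStarted chan struct{} // closed once informers are running
}

func newControllerContext() *controllerContext {
	return &controllerContext{
		Stop:             make(chan struct{}),
		InformersStarted: make(chan struct{}),
	}
}

// startInformers runs the default start function and signals readiness
// itself. When an override is supplied (the "StartInformers is not nil"
// case), it only calls the override and leaves the signal to the caller.
// This split is an assumption made for illustration.
func startInformers(ctx *controllerContext, defaultStart, override func(stop <-chan struct{})) {
	if override == nil {
		defaultStart(ctx.Stop)
		close(ctx.InformersStarted)
		return
	}
	override(ctx.Stop)
}

// informersStarted reports, without blocking, whether the signal was sent.
func informersStarted(ctx *controllerContext) bool {
	select {
	case <-ctx.InformersStarted:
		return true
	default:
		return false
	}
}

func main() {
	noop := func(stop <-chan struct{}) {}

	// Default path: readiness is signalled as soon as informers start.
	upstream := newControllerContext()
	startInformers(upstream, noop, nil)
	fmt.Println("default path signalled:", informersStarted(upstream))

	// Override path: the caller decides when controllers may proceed.
	custom := newControllerContext()
	startInformers(custom, noop, noop)
	fmt.Println("override path signalled:", informersStarted(custom))

	done := make(chan struct{})
	go func() {
		<-custom.InformersStarted // a controller waiting on the signal
		close(done)
	}()
	close(custom.InformersStarted) // caller signals once it is ready
	<-done
	fmt.Println("override path signalled:", informersStarted(custom))
}
```

Closing a channel works well as a broadcast here because every goroutine receiving from it is released at once, and later receivers never block. The channel must be closed exactly once, which is why the sketch gives that responsibility to a single owner on each path.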

@deads2k
Contributor Author

deads2k commented Dec 18, 2017

/retest

1 similar comment
@deads2k
Contributor Author

deads2k commented Dec 18, 2017

/retest

@derekwaynecarr
Member

/lgtm
/approve no-issue

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 18, 2017
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, derekwaynecarr

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@deads2k
Contributor Author

deads2k commented Dec 18, 2017

/retest

2 similar comments
@deads2k
Contributor Author

deads2k commented Dec 19, 2017

/retest

@deads2k
Contributor Author

deads2k commented Dec 19, 2017

/retest

@deads2k
Contributor Author

deads2k commented Dec 20, 2017

@stevekuznetsov are the e2e tests borked?

@deads2k
Contributor Author

deads2k commented Dec 20, 2017

/retest

@stevekuznetsov
Contributor

/retest

No idea -- router tests seem 100% broken in that last run

@stevekuznetsov
Contributor

We're not seeing that in other runs of the job. Could be local to this.

@deads2k
Contributor Author

deads2k commented Dec 21, 2017

We're not seeing that in other runs of the job. Could be local to this.

@stevekuznetsov It's happening on every pull to older branches. See #17896, #17807, and #17849.

@stevekuznetsov
Contributor

@deads2k OK -- can you help root-cause it or determine which merge into the branch broke it? Or ask @knobunc to nominate someone from his team to dig?

@deads2k
Contributor Author

deads2k commented Dec 21, 2017

@deads2k OK -- can you help root-cause it or determine which merge into the branch broke it? Or ask @knobunc to nominate someone from his team to dig?

It wasn't us. Nothing merged since the last successful merge here (#17690) and that passed while running those tests. We merged zero commits and now it breaks every time. It's something in the infrastructure.

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/17690/test_pull_request_origin_end_to_end/6930/

Passed

github.com/openshift/origin/test/end-to-end TestIngressConfiguredRouter
github.com/openshift/origin/test/end-to-end TestRouter
github.com/openshift/origin/test/end-to-end TestRouterBindsPortsAfterSync
github.com/openshift/origin/test/end-to-end TestRouterDuplications
github.com/openshift/origin/test/end-to-end TestRouterHealthzEndpoint
github.com/openshift/origin/test/end-to-end TestRouterPathSpecificity
github.com/openshift/origin/test/end-to-end TestRouterReloadCoalesce
github.com/openshift/origin/test/end-to-end TestRouterServiceUnavailable
github.com/openshift/origin/test/end-to-end TestRouterStatsPort

@deads2k
Contributor Author

deads2k commented Dec 21, 2017

@deads2k OK -- can you help root-cause it or determine which merge into the branch broke it? Or ask @knobunc to nominate someone from his team to dig?
It wasn't us. Nothing merged since the last successful merge here (#17690) and that passed while running those tests. We merged zero commits and now it breaks every time. It's something in the infrastructure.

@stevekuznetsov aka "I didn't ask at random"

@stevekuznetsov
Contributor

stevekuznetsov commented Dec 25, 2017

This is probably related to the Go version on the AMI. We have never really had an approach to versioning the AMIs, and approaches that would maintain N AMIs for each version we needed were shot down in the distant past. Right now I think we should just ignore these tests for the older branches. In the future we need to think critically about whether we can either containerize the test as-is or remove its dependency on Docker so that it can run anywhere.

@openshift openshift deleted a comment from openshift-bot Jan 2, 2018
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@soltysh
Member

soltysh commented Jan 3, 2018

/retest

@mfojtik
Contributor

mfojtik commented Jan 3, 2018

/retest

@deads2k this should work now, I hope.

@deads2k
Contributor Author

deads2k commented Jan 3, 2018

/retest

@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@deads2k
Contributor Author

deads2k commented Jan 3, 2018

/retest

@stevekuznetsov
Contributor

There is no point in re-testing this PR. As explained above we do not have support in the Origin tests for backports to a version of Origin that does not use the same Go version as master. Green-button this or try to get the OSE CI to cooperate, as it has that support.

@deads2k
Contributor Author

deads2k commented Jan 4, 2018

There is no point in re-testing this PR. As explained above we do not have support in the Origin tests for backports to a version of Origin that does not use the same Go version as master. Green-button this or try to get the OSE CI to cooperate, as it has that support.

We updated the tests. Note the green e2e run.

/retest

@mfojtik
Contributor

mfojtik commented Jan 4, 2018

/retest

@openshift-ci-robot

openshift-ci-robot commented Jan 4, 2018

@deads2k: The following tests failed, say /retest to rerun them all:

Test name | Commit | Details | Rerun command
ci/openshift-jenkins/extended_conformance_install | 5f41e97 | link | /test extended_conformance_install
ci/openshift-jenkins/end_to_end | 5f41e97 | link | /test end_to_end

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@deads2k
Contributor Author

deads2k commented Jan 4, 2018

This had all tests green. CI seems globally broken now, but it clears a blocker bug. The two other changes that went in are unrelated. Merging.

@deads2k deads2k merged commit b6e5465 into openshift:release-3.7 Jan 4, 2018
@deads2k deads2k deleted the controller-16-wait branch January 24, 2018 14:38
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. vendor-update Touching vendor dir or related files

9 participants