
Remote Build Cache #1431

Closed
tustvold opened this issue Aug 6, 2019 · 18 comments


tustvold commented Aug 6, 2019

Currently ClonerUsingGitExec creates a new directory within /tmp and checks out remote git repositories to that location. This is not ideal for several reasons:

  • It makes iterating on a kustomize patch for a remote config incredibly slow and arduous, since the repository is checked out afresh every single time
  • /tmp is often tmpfs, so checking out large git repositories to it is potentially ill-advised
  • If kustomize exits with an error, it leaks the files in /tmp, which is especially bad on tmpfs

I would like to propose some way to instruct kustomize to use a specific directory to check out remotes persistently, potentially via an environment variable. This would resolve all of the above and make kustomize a lot less painful to use on large projects.

This would potentially mitigate #1132 and #1147
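A minimal sketch of the proposed behaviour, assuming a hypothetical KUSTOMIZE_CLONE_DIR variable (the name is illustrative, not an existing kustomize option). A local bare repository stands in for the remote so the demo is self-contained:

```shell
# Sketch of the proposal: reuse a persistent per-repo checkout instead of
# a fresh throwaway directory under /tmp on every build.
# KUSTOMIZE_CLONE_DIR is a hypothetical name, not an existing option.
set -e
base=$(mktemp -d)
cd "$base"

# Local bare repo standing in for a remote base.
git init -q --bare upstream.git
git -C upstream.git symbolic-ref HEAD refs/heads/master
git init -q work
git -C work -c user.email=a@b -c user.name=demo commit -q --allow-empty -m init
git -C work push -q ../upstream.git HEAD:master

CACHE_DIR="${KUSTOMIZE_CLONE_DIR:-$base/clone-cache}"
repo_dir="$CACHE_DIR/upstream"
if [ -d "$repo_dir/.git" ]; then
    git -C "$repo_dir" fetch -q origin       # cheap incremental update
else
    mkdir -p "$CACHE_DIR"
    git clone -q "$base/upstream.git" "$repo_dir"   # one-time full clone
fi
echo "persistent checkout at: $repo_dir"
```

Subsequent builds would hit the fetch branch and only transfer new objects, instead of re-cloning into a fresh /tmp directory each time.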

@pyaillet (Contributor)

I think it's already possible to instruct kustomize to use a specific directory via the environment variables that golang itself uses, see here and there.

However, I too would like to see a remote build cache implemented.
I have tried a naive implementation of such a thing locally.

  • Without cache:

kustomize build . > out.nocache 23,98s user 29,99s system 38% cpu 2:19,85 total

  • With cache:

~/Tools/go/src/sigs.k8s.io/kustomize/kustomize build . > out.cache 2,55s user 0,66s system 68% cpu 4,658 total

Would it be acceptable to have a shared temp base folder to handle cleanup globally?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 19, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 18, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor)

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dsyer

dsyer commented May 12, 2020

/reopen
/remove-lifecycle rotten

Please re-open. The build times are excruciating.

@k8s-ci-robot (Contributor)

@dsyer: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen
/remove-lifecycle rotten

Please re-open. The build times are excruciating.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 12, 2020
@ringerc

ringerc commented Sep 20, 2021

See also #1735 #2460

@ringerc

ringerc commented Sep 20, 2021

The option to specify a map of --reference repos would be a big help if people are worried about cache freshness. That way you can clone very quickly while still being guaranteed a fresh result.
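The --reference mechanism works roughly like this; local repositories stand in for the remote and the cache so the sketch is self-contained:

```shell
# Demonstrate git clone --reference: objects are borrowed from a local
# cache repo, while refs are still fetched from the "remote", so the
# result is guaranteed fresh. Local paths stand in for real URLs.
set -e
base=$(mktemp -d)
cd "$base"

# Stand-in "remote" with one commit.
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/master
git init -q seed
git -C seed -c user.email=a@b -c user.name=demo commit -q --allow-empty -m base
git -C seed push -q ../remote.git HEAD:master

# One-time population of the cache (one cache per upstream).
git clone -q --bare remote.git cache.git

# Fast clone: reuses objects from cache.git instead of re-transferring.
git clone -q --reference "$base/cache.git" remote.git checkout
git -C checkout rev-parse --short HEAD
```

With --reference-if-able instead of --reference, a missing cache repo degrades to a normal clone rather than an error.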

@ringerc

ringerc commented Sep 20, 2021

Looking at https://github.com/kubernetes-sigs/kustomize/blob/master/api/internal/git/cloner.go, an alternative would be the ability to specify extra git config lines for the repos initialized by kustomize for fetching. This could be used to inject url.<base>.insteadOf options.

Right now this only appears to be possible using git config --global, which is clumsy and can have unwanted side-effects.

The only workaround I can see so far is to put a wrapper for the git command on the PATH for kustomize, where the wrapper rewrites URIs, injects options, etc. Alternatively, it could set up .git/objects/info/alternates after git init to point to a reference repo. But this is error-prone and very awkward to use.

Ideally kustomize's cloner.go should support:

  • git clone --reference-if-able some-local-path
  • remapping the URIs using a configuration passed to kustomize as a command-line argument and/or as a map in the kustomization.yaml file
  • bases as a map of separate uri and ref keys instead of the combined https://some/uri?ref=foo form, for easier rewriting and var substitution
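The insteadOf remapping can be scoped to a single invocation with git -c, which avoids the --global side effects; the URL and mirror path below are illustrative stand-ins:

```shell
# Redirect a clone of a (fake) remote URL to a local mirror using an
# insteadOf rewrite passed with -c, so no --global config is touched.
# https://example.invalid/repo.git is a placeholder URL.
set -e
base=$(mktemp -d)
cd "$base"

# Local mirror standing in for a cached copy of some upstream.
git init -q --bare mirror.git
git -C mirror.git symbolic-ref HEAD refs/heads/master
git init -q seed
git -C seed -c user.email=a@b -c user.name=demo commit -q --allow-empty -m base
git -C seed push -q ../mirror.git HEAD:master

# The rewrite applies at fetch time; the stored remote URL is unchanged.
git -c url."$base/mirror.git".insteadOf=https://example.invalid/repo.git \
    clone -q https://example.invalid/repo.git redirected
git -C redirected remote get-url origin
```

This is what a per-invocation config hook in cloner.go could do internally, without requiring users to wrap the git binary.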

@edrandall

/reopen
/remove-lifecycle-rotten

@k8s-ci-robot
Copy link
Contributor

@edrandall: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen
/remove-lifecycle-rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@edrandall

The rotten-issue robot is a backlog antipattern, particularly when a project is clearly under-resourced.

@ringerc

ringerc commented May 4, 2022

Also, note that while it's tempting to think you can add a --reference-if-able ~/.kustomize/gitcache or something, where a single bare git repo serves as a cache for multiple upstreams, this will fall down rapidly. Git's logic for finding common commits between a reference repo and an upstream to clone does not scale to very large numbers of heads in the reference repo, and will quickly become extremely slow. In practice you need one reference repo to serve as a cache for each upstream repo.
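One way to keep one cache per upstream is to key the cache directory on the repository URL; this path scheme is purely illustrative, not anything kustomize implements:

```shell
# Derive a per-upstream cache path by hashing the repo URL, so each
# upstream gets its own reference repo (layout is illustrative only).
url="https://github.com/kubernetes-sigs/kustomize"
key=$(printf '%s' "$url" | sha256sum | cut -c1-16)
cache_repo="${XDG_CACHE_HOME:-$HOME/.cache}/kustomize/git/$key"
echo "$cache_repo"
```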

@ringerc

ringerc commented May 4, 2022

There might be a truly nasty way to make this work without patching kustomize, and a clean way to solve this in kustomize.

Use the GIT_ALTERNATE_OBJECT_DIRECTORIES environment variable to essentially inject reference repos into every git invocation under kustomize's control.

Here's a demo showing how the alternates mechanism works:

# normal clone
git clone --bare https://github.com/kubernetes-sigs/kustomize kustomize-cache
# now re-clone with the other repo as a cache. It'll be almost instant.
GIT_ALTERNATE_OBJECT_DIRECTORIES=kustomize-cache/objects git clone https://github.com/kubernetes-sigs/kustomize

Or closer to how kustomize's git fetching works:

REF=master
git init kustomize
mkdir -p kustomize/.git/objects/info
echo "$(realpath kustomize-cache/objects)" >> kustomize/.git/objects/info/alternates
git -C kustomize remote add origin https://github.com/kubernetes-sigs/kustomize
git -C kustomize fetch --no-tags --depth=1 origin $REF

Note it's nearly instant.

Unfortunately I haven't found any improvement when testing with kustomize; in fact, it's slower. Presumably git is sending all the candidate cache refs and none are matching, perhaps because my refs are to specific commit hashes rather than branch heads. I'll look into it soon.

I suspect it would work much better if implemented in https://github.com/kubernetes-sigs/kustomize/blob/master/api/internal/git/cloner.go with a caching-cloner option or something, since then it could set up a GIT_ALTERNATE_OBJECT_DIRECTORIES that points only to a cache for the specific repo, reducing the number of refs that must be checked, or set up a .git/objects/info/alternates file before it does a git fetch.

It could also populate the cache if it doesn't already exist.

@natasha41575 (Contributor)

FWIW we are working on #3980, which I believe will resolve the use cases described in this issue.

@ringerc

ringerc commented May 5, 2022

@natasha41575 That sounds cool and useful. However it will not solve the use cases here IMO.

In particular, for CI systems, pre-commit hooks, and other automation, you often have almost the same kustomizations being built repeatedly, but with a few small variations.

A fully localized, cacheable tree doesn't help much with that unless the cache can be used as a base to be refreshed with only more recent updates.

In those cases what's really desirable is:

  • For git remote bases, a reference repository to use when cloning the remote;
  • For https file remote bases, a cache of downloaded files, with HTTP requests that send If-Modified-Since headers so servers can respond with 304 when the resource is unchanged

Of the two, git repo caching will have the biggest benefits, and is easiest.

I'm experimenting with a hacky PoC in a local kustomize branch to see if I can make it work.

@briceburg

Agree w/ ^^^.

I'd like to use a kustomize repo where we develop and version shared bases and components in a conventional way.

e.g.

resources:
- https://github.com/acme/kustomize//bases/service?ref=v1

components:
- https://github.com/acme/kustomize//components/ingress-public/overlays/dev?ref=v1
- https://github.com/acme/kustomize//components/secretstore/overlays/dev?ref=v1
- ../../components/secrets

... and to avoid the maintenance burden and duplicated code (WET, not DRY?) of vendoring.

Not sure why this issue is closed, can the bot re-open? =)
