Upload the package zip in chunks for GHA #1043

Open · wants to merge 27 commits into main

Conversation

quyykk (Contributor) commented Apr 28, 2023

This fixes microsoft/vcpkg#31072 and microsoft/vcpkg#31132.

It changes the upload to the cache so that it happens in chunks of 450MB instead of all at once, because GitHub rejects uploads bigger than ~500MB (an educated guess) to its cache.

The implementation is a bit hacky, but I haven't found a better solution. Originally it split the file into multiple 450MB chunk files on disk; it now reads 450MB chunks from the file at a time.
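For illustration only, the chunked approach described above looks roughly like the sketch below. `upload_chunk` is a hypothetical helper standing in for the PR's actual curl-based upload path, and the offset/protocol details are assumptions, not the real GitHub cache API.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper standing in for the PR's curl-based upload of one chunk.
// How the offset is communicated to the cache service is not shown here.
static bool upload_chunk(const std::string& url, const char* data, std::size_t size, std::size_t offset)
{
    std::printf("would upload %zu bytes at offset %zu to %s\n", size, offset, url.c_str());
    (void)data;
    return true;
}

// Read the archive in 450MB pieces and upload each piece separately,
// staying under GitHub's (observed) ~500MB per-request cache limit.
static bool upload_in_chunks(const std::string& url, const std::string& file)
{
    constexpr std::size_t chunk_size = 450ull * 1024 * 1024; // 450MB, as in the PR
    std::FILE* f = std::fopen(file.c_str(), "rb");
    if (!f) return false;

    std::vector<char> buffer(chunk_size);
    std::size_t offset = 0;
    bool ok = true;
    while (ok)
    {
        const std::size_t bytes_read = std::fread(buffer.data(), 1, buffer.size(), f);
        if (bytes_read == 0) break; // end of file
        ok = upload_chunk(url, buffer.data(), bytes_read, offset);
        offset += bytes_read;
    }
    std::fclose(f);
    return ok;
}
```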

BillyONeal (Member)

Sorry for the noise, trying to verify that the transition over to GitHub Actions for the PR bot is working...

Outdated review threads on src/vcpkg/archives.cpp (2) and src/vcpkg/base/downloads.cpp (3) were resolved.
quyykk requested a review from BillyONeal on May 23, 2023, 19:59.
autoantwort (Contributor)

> I haven't found a better solution: It splits the file into multiple 450MB chunk files on disk.

You could pass the data via stdin.
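For context, "passing the data via stdin" means the chunk never has to touch the disk: curl reads the request body from its stdin when invoked with `--upload-file -`. Below is a minimal sketch using POSIX `popen`; the helper name and the exact curl invocation are illustrative, not the PR's real code.

```cpp
#include <stdio.h>   // POSIX popen/pclose
#include <cstddef>
#include <string>
#include <vector>

// Write one in-memory chunk to curl's stdin instead of a temporary file.
// `--upload-file -` tells curl to read the upload body from stdin.
static bool upload_chunk_via_stdin(const std::string& url, const std::vector<char>& chunk)
{
    const std::string cmd = "curl -sS --upload-file - \"" + url + "\"";
    FILE* pipe = popen(cmd.c_str(), "w"); // "w" connects to the child's stdin
    if (!pipe) return false;

    const std::size_t written = fwrite(chunk.data(), 1, chunk.size(), pipe);
    const int status = pclose(pipe);
    return written == chunk.size() && status == 0;
}
```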

quyykk (Contributor, Author) commented Jun 4, 2023

> You could pass the data via stdin.

I could but I honestly have no idea how.

autoantwort (Contributor)

> I could but I honestly have no idea how.

There is now #1134 :)

quyykk (Contributor, Author) commented Aug 30, 2023

I've switched it to use stdin instead, and it still works 😄. This PR is ready for review again.

quyykk (Contributor, Author) commented Sep 7, 2023

Seems like the formatting errors are caused by GitHub upgrading their clang-format, and not my fault. 😄

Three more outdated review threads on src/vcpkg/base/downloads.cpp were resolved.
autoantwort (Contributor)

I always build the format target locally as well, so the formatting is always right (as long as you use clang-format 16).

```cpp
                           StringView(buffer.data(), bytes_read));
if (!res.get() || *res.get() != 0 || (code >= 100 && code < 200) || code >= 300)
{
    return msg::format_error(msgCurlFailedToPutHttp,
```
A contributor commented on this snippet:

Theoretically not completely true, but I don't care.

quyykk (Contributor, Author) commented Sep 14, 2023

@BillyONeal friendly ping 😄

DerZade commented Jan 22, 2024

What is the status of this? 🤔

quyykk (Contributor, Author) commented Mar 11, 2024

I fixed the merge conflicts that accumulated. Just needs someone from the team to review 😄

```cpp
base_cmd.string_arg(url);

auto file_ptr = fs.open_for_read(file, VCPKG_LINE_INFO);
std::vector<char> buffer(chunk_size);
```
A contributor commented on this snippet:

I see that the default size is 450 MB, with no limit.
Is there an alternative to reading it into a buffer first just to forward it to another command's stdin?
(Remembering all those Raspi people who struggle to build vcpkg due to low memory...)

quyykk (Contributor, Author) replied:

The original way was to split the file on disk, but I think that's pretty hacky.

But I can decrease the buffer size. I'm not sure what you mean by limit.

Are people really running a GitHub Runner server on a Raspi? lmao 😄

The contributor replied:

> Are people really running a GitHub Runner server on a Raspi lmao

Well, this is only the tool uploading the artifacts. Caching large artifacts is more important when build machine power is low.

quyykk (Contributor, Author) replied:

Ah okay. What buffer size do you think I should use? I can't make it really small, or else the upload will be way slower than it would otherwise be.

The contributor replied:

I don't know. I see the trade-offs and barriers.

  • Can't make curl read chunks directly from (within) a large file.
  • Can't feed (the vcpkg function running) curl with piecewise input, i.e. an IO buffer smaller than the network chunks (a sketch of this follows below).

Changing curl (the tool) is out of scope here.
If the interface remains running curl instead of calling into libcurl, then it would be best to fix the second point.
If this is too intrusive, it might help to give the user a way to change the buffer size, or at least to turn off the buffering in case of trouble.
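One possible shape for that second point, sketched with illustrative names only: keep the network chunk large (one 450MB chunk per curl invocation) but feed it to curl's stdin through a small, fixed-size IO buffer, so peak memory stays low on small machines.

```cpp
#include <stdio.h>   // POSIX popen/pclose
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Stream up to `chunk_size` bytes from `file` into curl's stdin using a small
// reusable IO buffer, so the IO buffer size is independent of the chunk size.
static bool stream_chunk_to_curl(FILE* file, std::size_t chunk_size, const std::string& url)
{
    const std::string cmd = "curl -sS --upload-file - \"" + url + "\"";
    FILE* pipe = popen(cmd.c_str(), "w");
    if (!pipe) return false;

    std::vector<char> io_buffer(1 << 20); // 1 MB IO buffer; could be made configurable
    std::size_t remaining = chunk_size;
    bool ok = true;
    while (ok && remaining != 0)
    {
        const std::size_t want = std::min(remaining, io_buffer.size());
        const std::size_t got = fread(io_buffer.data(), 1, want, file);
        if (got == 0) break; // end of file; the last chunk may be short
        ok = fwrite(io_buffer.data(), 1, got, pipe) == got;
        remaining -= got;
    }
    const int status = pclose(pipe);
    return ok && status == 0;
}
```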

```cpp
std::size_t bytes_read = 0;
for (std::size_t i = 0; i < file_size; i += bytes_read)
{
    bytes_read = file_ptr.read(buffer.data(), sizeof(decltype(buffer)::value_type), chunk_size);
```
A team member commented on this snippet:

I think a whole curl process launch per chunk like this is kind of a problem. I don't see reasonable ways to achieve the effect this PR wants without linking with libcurl.

They added:

I suppose it could still be done like this, but the chunk sizes would have to be bigger than makes sense to hold in a single contiguous memory buffer; there would need to be more than one read/write per curl launch, and so on. That sounds like a lot more work than linking with libcurl.
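For comparison, here is a minimal sketch of what linking with libcurl could buy: a single in-process upload whose body is streamed from the file by a read callback, with no per-chunk process launches and no large staging buffer. The URL and any required headers are omitted, and the helper names are assumptions, not vcpkg code.

```cpp
#include <cstdio>
#include <curl/curl.h>

// Read callback: libcurl pulls the request body straight from the open file,
// so memory use is bounded by libcurl's own small internal buffer.
static size_t read_from_file(char* dest, size_t size, size_t nitems, void* userdata)
{
    return std::fread(dest, size, nitems, static_cast<std::FILE*>(userdata));
}

static bool upload_with_libcurl(const char* url, std::FILE* file, curl_off_t file_size)
{
    // Assumes curl_global_init(CURL_GLOBAL_DEFAULT) was called once at startup.
    CURL* curl = curl_easy_init();
    if (!curl) return false;

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);                   // PUT-style upload
    curl_easy_setopt(curl, CURLOPT_READFUNCTION, read_from_file); // stream body from the file
    curl_easy_setopt(curl, CURLOPT_READDATA, file);
    curl_easy_setopt(curl, CURLOPT_INFILESIZE_LARGE, file_size);

    const CURLcode rc = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return rc == CURLE_OK;
}
```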

Neumann-A (Contributor)

#1422 pulls in libcurl.

Neumann-A mentioned this pull request on Jun 10, 2024.