
[RFC 0122] IPFS CID optionally on narinfo in binary caches #122

Closed
wants to merge 2 commits

Conversation


@lucasew commented Mar 7, 2022

The idea is to optionally provide the CID of the NAR file in the binary cache's narinfo, to reduce bandwidth costs and in some cases increase efficiency by letting users download the binary cache's NAR files over IPFS.

Rendered

This RFC was abandoned by the author: their primary goal was saving upstream bandwidth in a controlled, very limited network with many computers, and simpler solutions using the existing binary cache infrastructure, such as a local cache, were found.

@Ericson2314
Member

We should be able to use the existing CA field for this. That has many other benefits, too. That is what we did in our IPFS Nix work.


IPFS is still not a present reality in the mainstream Nix ecosystem. Although it's not reliable for storing data long term, it can reduce bandwidth costs for both servers and clients; the question is where the NAR file could be obtained on IPFS.

It's not expected that, for example, cache.nixos.org would run an IPFS daemon for seeding, but it could simply calculate the hash using `ipfs add -nq $file` and provide it in the narinfo, so that other nodes can figure out alternative places to download the NAR files from, possibly even closer than a CDN.
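
For illustration, a narinfo carrying such a field might look like this (the `IpfsCid` field name is hypothetical and the values are placeholders; the actual field name would be settled by the RFC):

```
StorePath: /nix/store/<hash>-example-1.0
URL: nar/<filehash>.nar.xz
Compression: xz
FileHash: sha256:<...>
FileSize: <...>
NarHash: sha256:<...>
NarSize: <...>
References: <...>
Sig: cache.example.org-1:<...>
IpfsCid: <CID as printed by ipfs add -nq>
```
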
Contributor

One little concern is that a given file doesn't have exactly one CID. Depending on how you chunk the file you can get effectively unlimited different CIDs. This isn't a problem when the CID distributor starts the seed and the CID stays live on the network, because whatever CID is advertised will be fetched. However, for a case like this it matters a lot, because different settings will result in a would-be seeder generating the wrong CID.

IIUC the current default for `ipfs add` is fixed-size blocks of 262144 bytes each (aka `size-262144`). However, for a nixpkgs cache, where subsequent versions of a derivation may be largely similar, it may make more sense to use a smarter chunker based on a rolling hash.

Anyways, the exact chunking mechanism is bikeshedding, but what do we want to do about this? I see a few main options (a concrete example of the chunker dependence follows the list).

  1. Put the chunker into the narinfo so it can be reproduced. (I don't know if there is a well-defined standard format, but current go-ipfs uses strings like `size-262144` and `rabin-2048-65536-131072`, which are pretty easy to understand and unlikely to be ambiguous.)
  2. Declare a chunker upfront and expect people to use it. (We can revert to 1 in the future by adding the chunker information later).
  3. Convince cache.nixos.org to also run an IPFS node that advertises the CIDs that are advertised in the narinfo files.
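
To make the chunker dependence concrete: go-ipfs exposes the chunker via the `--chunker` flag of `ipfs add`, and hashing the same file with different chunkers yields different CIDs (the file name here is just an example):

```sh
# -n (--only-hash) computes the CID without adding the file to the local node;
# -q (--quiet) prints only the CID.
ipfs add -nq --chunker=size-262144 example.nar
ipfs add -nq --chunker=rabin-2048-65536-131072 example.nar
# The two commands print different CIDs for the same file.
```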

Member

rsync has a pretty interesting algorithm for syncing files (https://stackoverflow.com/questions/1535017/rolling-checksums-in-the-rsync-algorithm); there may be something in that, though it's probably not directly portable to IPFS and its chunking.

I'd vote for 3! Get that working today (or perhaps tomorrow) and think about options 1/2 the day after tomorrow (or at some point in the future).

Thanks for your detailed analysis of this; my understanding of NARs on IPFS has increased!

@kevincox Mar 12, 2022
Contributor

This is basically equivalent to the Rabin chunking. But the biggest problem isn't what algorithm to use but how to know what algorithm was used.

Author

For this we could do it like we already do with hashes, e.g. `sha256:something`.

AFAIK IPFS has symbol-friendly names for the chunking methods.
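
For example, the field value could carry the chunker name as a prefix, analogous to the `sha256:` prefix (the field name and CID here are placeholders):

```
IpfsCid: size-262144:<CID>
```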

Contributor

I really don't care about the chunking algorithm. Please stop discussing this here.

What I care about is that we record the chunking algorithm in a way that someone who wishes to advertise this path can do so.

@edolstra added status: new and status: open for nominations (Open for shepherding team nominations) and removed status: new labels Mar 23, 2022
@edolstra
Member

This RFC is now open for shepherd nominations!

@Ericson2314
Member

I suppose I could shepherd this, but really I want to, and should soon be able to, write a counter-proposal RFC for the work we did in 2020. So perhaps there ought to be one shepherd team for the two "competing" RFCs (though it's really more about prioritizing features than actual disagreement).

@tomberek
Contributor

I'll volunteer as shepherd. (note from RFCSC: need a few more nominations in the next few weeks, otherwise this will be put on standby)

@Ericson2314
Member

Ericson2314 commented Apr 20, 2022

I am now thinking this is probably fine as a complement.

We did a lot of different things in our 2020 IPFS × Nix saga, but the thing I would like to focus on first is distributing and archiving source code. Conversely, this is mainly about build artifacts. Thus, no conflict! I am confident the two approaches will bore the "tunnel" from both ends, and so there will be a grand meeting in the middle eventually.

The one thing I would do is generalize: instead of thinking of IPFS in particular, we think of the "narinfo" (the C++ type ValidPathInfo) as having a list of "auxiliary" content addresses useful for fetching via other systems. They are "auxiliary" in the sense that they don't affect how the store path is computed. In fact, we can retcon today's NAR hash as just another auxiliary content address!
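
A rough sketch of that generalization in today's line-oriented narinfo, assuming a hypothetical repeatable `AuxContentAddress` field (the field name and value syntax are illustrative, not a settled design); under this framing, today's `NarHash` is itself just one such content address:

```
NarHash: sha256:<...>
AuxContentAddress: ipfs:size-262144:<CID>
```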

@Ericson2314
Member

Ericson2314 commented Apr 20, 2022

You might take a look at NixOS/nix#3727, whose merge conflicts I have fixed locally. (Tests, however, are broken. Still debugging, so I didn't push yet.)

That goes a few steps further, trying to put the narinfos in IPFS as IPLD rather than as files, but this should be complementary:

  • We should make a new JSON narinfo format for "regular" binary caches too, as the current line-oriented file makes backwards-compatible evolution too hard; a sketch follows below. (We can simply upload both types of narinfo for compatibility with old versions of Nix if we like.)

If we do that we can also share lots of code between both approaches:

  • All the "how do I talk to IPFS" legwork can of course be shared.
  • The JSON serialization can be shared between a "native" IPLD narinfo and the legacy file version.
  • Code to deal with getting the file data to/from IPFS can be shared.
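
As mentioned above, here is a hypothetical JSON narinfo along these lines, with the auxiliary content addresses from the earlier comment folded in (all field names are illustrative, not a proposed standard):

```json
{
  "storePath": "/nix/store/<hash>-example-1.0",
  "url": "nar/<filehash>.nar.xz",
  "compression": "xz",
  "narHash": "sha256:<...>",
  "references": [],
  "auxContentAddresses": [
    { "system": "ipfs", "chunker": "size-262144", "cid": "<CID>" }
  ]
}
```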

@kamadorueda
Member

I nominate myself as a shepherd

@Ericson2314
Member

Ericson2314 commented Apr 21, 2022

Looks like we have the required number! :)

@lheckemann added the status: in discussion label and removed the status: open for nominations (Open for shepherding team nominations) label May 4, 2022
@Ericson2314
Member

#nix-rfc-122:matrix.org

@edolstra
Member

Any updates on the status of this RFC?

@lucasew
Author

lucasew commented Jun 15, 2022

We (or I) need to build a proof of concept. Maybe we will pivot this RFC to an LRU-based cache proxy approach at first and iterate towards a p2p approach if necessary, but I don't have time to test it now; I am very busy because of the end of the semester.

The plan is to apply that prototype in an organization to reduce internet usage for things people often need, so the prototype should be working by the end of the year, or I definitely will not get my degree by the end of the year xD.

@lheckemann
Member

Sounds good! On behalf of the Steering Committee, I'd like to suggest moving the RFC to draft status until then --- any objections?

@lucasew marked this pull request as draft June 29, 2022 13:29