Road to a unixfs-like in golang with ADLs #258

Open
4 of 10 tasks
warpfork opened this issue Sep 27, 2021 · 6 comments
Labels
effort/weeks (Estimated to take multiple weeks) · exp/expert (Having worked on the specific codebase is important) · kind/architecture (Core architecture of project) · kind/enhancement (A net-new feature or improvement to an existing feature) · kind/tracking (A meta-issue for tracking work) · P1 (High: Likely tackled by core team if no one steps up)

Comments

@warpfork
Collaborator

warpfork commented Sep 27, 2021

This issue is a short checklist/roadmap for the major areas of work needed in order to make something "unixfs-like" happen in golang and be implemented almost entirely in ADLs.

What?

"unixfs-like"

By "unixfs-like", I mean three things: we can reimplement unixfsv1 at feature parity (like what go-ipfs does) this way; we would reuse most of this code and these components when building a "unixfsv2"; and, most importantly, someone else could grab most of the components we'll need here and build their own custom evolution of what they consider a "unixfs-like" with minimal hurdles.

"implemented as ADLs"

By "implemented almost entirely in ADLs" I mean: there should be a few pieces of code which implement key functionality by making complex structures act "like a map-kind node" and "like a bytes-kind node". So: directories? Map. Files? Bytes. Even though either of them can be sharded? Yep. And that should, when composed, make a unixfs-like system almost trivial to assemble.
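
To make "directories? Map. Even though sharded? Yep." a bit more concrete, here is a minimal sketch. It is not go-ipld-prime code: `MapLike`, `shardedDir`, and the hash-bucket layout are all hypothetical stand-ins, and real sharded directories store their shards as linked blocks rather than in-memory maps. The only point is that the caller sees one map, whatever the storage looks like.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
)

// MapLike is a hypothetical, minimal stand-in for the read side of a
// map-kind node. It is NOT the go-ipld-prime Node interface.
type MapLike interface {
	LookupByString(key string) (string, error)
	Length() int
}

// ErrNotFound is returned when a key is absent from the directory.
var ErrNotFound = errors.New("key not found")

// shardedDir spreads entries over several buckets (standing in for the
// blocks of a sharded directory) but presents them as a single map.
type shardedDir struct {
	shards []map[string]string
}

func newShardedDir(shardCount int) *shardedDir {
	d := &shardedDir{shards: make([]map[string]string, shardCount)}
	for i := range d.shards {
		d.shards[i] = map[string]string{}
	}
	return d
}

// shardFor picks a bucket by hashing the key.
func (d *shardedDir) shardFor(key string) map[string]string {
	h := fnv.New32a()
	h.Write([]byte(key))
	return d.shards[int(h.Sum32()%uint32(len(d.shards)))]
}

// put is a helper for building the example; it is not part of MapLike.
func (d *shardedDir) put(name, target string) { d.shardFor(name)[name] = target }

// LookupByString hides the sharding: the caller only ever sees a map.
func (d *shardedDir) LookupByString(key string) (string, error) {
	v, ok := d.shardFor(key)[key]
	if !ok {
		return "", ErrNotFound
	}
	return v, nil
}

// Length sums the entries across all shards.
func (d *shardedDir) Length() int {
	n := 0
	for _, s := range d.shards {
		n += len(s)
	}
	return n
}

func main() {
	d := newShardedDir(4)
	d.put("readme.md", "cid-of-readme")
	d.put("main.go", "cid-of-main")

	var dir MapLike = d // from here on, it is "just a map"
	v, err := dir.LookupByString("main.go")
	fmt.Println(v, err, dir.Length())
}
```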

The desirable outcomes of this are two (or three): one, it means things like pathing and traversals and selectors over them are already defined (because it's just "working over a map", etc); two, it's a very clear abstraction which allows for reuse when someone wants to make a new system. (Also, three, in the longer run -- we hope to make ADLs into a language-agnostic plugin system. But that is not today :) Today we are happy to implement it in one language, get something working, and plan to move on later.)
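
A small sketch of why pathing comes "for free": the walker below is written once against a map-like lookup and never learns whether a directory is plain or split internally. Everything here (`Node`, `plainMap`, `splitDir`, `Walk`) is a hypothetical illustration, not the go-ipld-prime traversal API.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical minimal node with a map-style lookup.
type Node interface {
	LookupByString(key string) (Node, error)
}

// leaf is a terminal value; looking anything up in it fails.
type leaf string

func (leaf) LookupByString(key string) (Node, error) {
	return nil, fmt.Errorf("cannot look up %q in a leaf", key)
}

// plainMap is an ordinary in-memory map node.
type plainMap map[string]Node

func (m plainMap) LookupByString(key string) (Node, error) {
	n, ok := m[key]
	if !ok {
		return nil, fmt.Errorf("no entry %q", key)
	}
	return n, nil
}

// splitDir stands in for an ADL: entries live in two internal maps
// (routed by key length, purely for illustration), but it still
// answers lookups exactly like a map.
type splitDir struct{ even, odd plainMap }

func (d splitDir) LookupByString(key string) (Node, error) {
	if len(key)%2 == 0 {
		return d.even.LookupByString(key)
	}
	return d.odd.LookupByString(key)
}

// Walk resolves a slash-separated path by repeated map lookups.
// It has no knowledge of which Node implementation it is walking.
func Walk(root Node, path string) (Node, error) {
	cur := root
	for _, seg := range strings.Split(path, "/") {
		if seg == "" {
			continue // tolerate leading, trailing, or doubled slashes
		}
		next, err := cur.LookupByString(seg)
		if err != nil {
			return nil, fmt.Errorf("at %q: %w", seg, err)
		}
		cur = next
	}
	return cur, nil
}

func main() {
	root := plainMap{
		"docs": splitDir{
			even: plainMap{"ab": leaf("two")},
			odd:  plainMap{"abc": leaf("three")},
		},
	}
	n, err := Walk(root, "docs/abc")
	fmt.Println(n, err)
}
```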

other sells

It's been suggested that having this work done could significantly improve UX in other systems, like the IPFS Gateways and their rendering of data, and make those improvements so much easier that they become more likely to actually get done. When unixfs systems (including unixfsv1 as a retrofit) are available as ADLs, we can build more consistent and reusable rendering technologies with a lot less code, fewer special cases, less complexity to explain, etc.

Roadmap

Without further ado, some checkboxes...

  • Already have: the write API for map ADLs
    • (it's just NodeBuilder/NodeAssembler)
  • Already have: the read API for map ADLs
    • (it's just Node)
  • Already have: the read implementation of the ADL for directories
  • Already have: pathing and traversals and selectors working over map ADLs
    • (should go without saying, but I'll say it to be clear :))
  • Need: the write implementation of the ADL for directories
  • Need: the read API for bigbytes ADLs
    • this will go here in go-ipld-prime
    • should be implemented by some kind of feature detection. (We already have the API for bytes as slices, but we need one that works with seeking and streaming.)
  • Need: the write API for bigbytes ADLs
    • this will go here in go-ipld-prime
    • should be implemented by some kind of feature detection. (We already have the API for bytes as slices, but we need one that works with seeking and streaming.)
  • Need: the read implementation of ADL for unixfs files
    • ideally, uses the FBL spec and can read many things... but we might end up needing a special one-off for unixfsv1, too.
  • Need: the write implementation of the ADL for unixfs files…
    • this will be fun: probably this should be constructed with chunker functions as a parameter!
  • Want: an API convention for efficient "append" to map ADLs
    • should be something we can do relatively late in the development pipeline, because we should do it with feature detection anyway.
    • should appear here in go-ipld-prime... eventually. (It can be incubated in specific ADL implementation repos, and we finalize it by copying the interface to go-ipld-prime when there's good evidence we've found one that works well.)

Not all of this work has to take place in this repo, and several parts of it can be started without blocking on other parts. It also doesn't all have to be done by the same people. This is just a tracking and roadmap-sharing issue.

@willscott
Member

In terms of getting to a working implementation, we can defer the two 'bigbytes' items, because the existing https://github.com/ipfs/go-ipfs-chunker interface provides a []byte for each individual node as it streams out of the file. As long as we don't keep all in-progress nodes in memory, which is a different problem, the direct handling of io.Reader can probably come after the MVP.

  • the read implementation for unixfs files doesn't exist yet at go-unixfsnode.

@warpfork
Collaborator Author

warpfork commented Sep 27, 2021

We get a lot of value out of having just the directory/map stuff done. I would think it's good to focus fire on that first. Yes.

There's probably no harm in writing the file chunking stuff into the current Node and NodeAssembler APIs that are non-streaming -- because somewhere between 99% and 100% of that code does need to be written anyway.

I think we'll need some "big bytes" interface work (e.g. io.Reader or io.ReadSeeker or similar, and io.Writer for the other direction) to see full success for files, though. Because yeah, the current non-streaming APIs don't really make it possible to have an abstraction and still have good memory usage.

@willscott
Member

Yep, I agree that we do eventually need a cleaner interaction with io.Reader / io.Writer, especially at the ADL level.

The main thing I was pointing out is that initially it'll probably be desirable to use the existing set of chunker implementations. Since those are not fully streaming, that's another nudge to push off figuring out the final streaming designs: we'll end up leaning on that crutch for a while anyway.

@BigLep

BigLep commented Sep 27, 2021

@guseggert: you had an issue/comment where you detailed some of the problems with "huge directories" right? I'd like to link that to this as I believe it will be alleviated. I couldn't find it in my 5 minutes of searching, so maybe I'm imagining things.

@warpfork added the effort/weeks, exp/expert, kind/architecture, kind/enhancement, kind/tracking, and P1 labels on Oct 9, 2021

@guseggert

Yes, the issue is here: ipfs/kubo#8455

@warpfork
Collaborator Author

There's a https://github.com/ipfs/go-unixfsnode/ repo nowadays, which I believe has even more of these features.
