Terminology: Block vs Object (vs Node) #5058

Closed
schomatis opened this issue Jun 1, 2018 · 4 comments · Fixed by #10375
Labels
topic/docs-ipfs Topic docs-ipfs

Comments

@schomatis
Contributor

From the IPFS example Dealing With Blocks:

Blocks vs Objects

In IPFS, a block refers to a single unit of data, identified by its key (hash). A block can be any sort of data, and does not necessarily have any sort of format associated with it. An object, on the other hand, refers to a block that follows the Merkle DAG protobuf data format. It can be parsed and manipulated via the ipfs object command. Any given hash may represent an object or a block.
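As a rough illustration of "identified by its key (hash)", the sketch below derives a CIDv0-style key from raw bytes: a sha2-256 multihash (prefix bytes 0x12 0x20) encoded in base58. The helper names `base58btc` and `block_key` are made up for this example, and the output is not claimed to match what any particular `ipfs` command prints for the same input.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = BASE58_ALPHABET[rem] + out
    # Each leading zero byte is written as a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def block_key(block: bytes) -> str:
    """Return base58btc(0x12 0x20 || sha2-256(block)), the CIDv0 shape."""
    digest = hashlib.sha256(block).digest()
    # 0x12 = sha2-256 hash function code, 0x20 = digest length (32 bytes).
    multihash = bytes([0x12, 0x20]) + digest
    return base58btc(multihash)


key = block_key(b"hello block")
print(key)
```

The key depends only on the bytes of the block, which is why a block needs no format of its own: any byte string can be hashed and addressed this way.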

I think the block distinction is very clearly explained, but I'm having trouble assimilating the Object term, which pretty much seems to be dealing with DAG nodes:

USAGE
  ipfs object - Interact with IPFS objects.

  'ipfs object' is a plumbing command used to manipulate DAG objects
  directly.

  ipfs object get <key>            - Get and serialize the DAG node named by <key>.

Ideally, there would be a single term for the PDU (to abuse the OSI terminology a bit) of each layer that makes up information in IPFS, something like (UnixFS) Object over a (DAG) Node over a (raw) Block (this is just an example off the top of my head). Right now, Object and Node seem to overlap.

@schomatis schomatis added the topic/docs-ipfs Topic docs-ipfs label Jun 1, 2018
@schomatis schomatis added this to the Files API Documentation milestone Jun 1, 2018
@schomatis schomatis self-assigned this Jun 1, 2018
@Mr0grog
Contributor

Mr0grog commented Jun 2, 2018

I'm having trouble assimilating the Object term, which pretty much seems to be dealing with DAG nodes… (UnixFS) Object over a (DAG) Node over a (raw) Block

I think you’ve mostly got it right, at least as far as my understanding goes.

One thing that probably makes this confusing is that IPFS’s old “MerkleDAG” format (which UnixFS is a layer above) is being very slowly replaced by the IPLD format (also a type of [merkle] DAG). An object is specifically a block that is formatted according to the old MerkleDAG format. A UnixFS file node is an object whose Data field is a binary blob encoded with the UnixFS protobuf format.

(Worth noting: determining that an object represents a UnixFS file node tells you how to interpret the object’s Links field, e.g. a directory’s links represent all the file nodes it contains.)

Here’s how I think of it:

                       +------------------------------+
                       |            Block             |
                       |                              |
                       | Package of raw bytes in IPFS |
                       | This is the thing that gets  |
                       | hashed to make a CID.        |
                       +------------------------------+
                                      |
               +----------------------+----------------------+
               |                                             |
+------------------------------+             +------------------------------+
|            Object            |             |          IPLD Node           |
|                              |             |                              |
| Block formatted according to |             | Block formatted according to |
| IPFS "MerkleDAG" format with |             | IPLD format -- it has a JSON |
| a list of `Links` and a blob |             | compatible structure. Any    |
| of unknown `Data`.           |             | key whose value is:          |
+------------------------------+             |   {"/": "<any valid CID>"}   |
               |                             | represents a link to another |
               |                             | node. IPLD doesn't assume    |
               |                             | anything about other keys or |
               |                             | values in the node.          |
               |                             +------------------------------+
               |                                             |
               |                                             |
+------------------------------+             +------------------------------+
|       UnixFS File Node       |             |     UnixFS v2 File Node      |
|                              |             |                              |
| Object where the `Data` blob |             | IPLD node where the keys and |
| is a Protobuf parseable by   |             | values represent things      |
| UnixFS protobuf definition.  |             | according to UnixFS v2       |
| Knowing the Object is a file |             | format. Specifics still to   |
| node also tells you how to   |             | be determined.               |
| use the Object's `Links`.    |             +------------------------------+
+------------------------------+

Or in CLI terms:

# If we have a UnixFS directory stored at QmVUdHfpo9hyC8wXmgd2frRrsp83iRvuL8HWyp1LPzjsPq...

# Blocks are raw data:
$ ipfs block get QmVUdHfpo9hyC8wXmgd2frRrsp83iRvuL8HWyp1LPzjsPq
[...binary data omitted... ]

# Objects are things with `Links` (a list of specialized objects) and `Data` (binary blob):
$ ipfs object get QmVUdHfpo9hyC8wXmgd2frRrsp83iRvuL8HWyp1LPzjsPq | jq
{
  "Links": [
    {
      "Name": "assets",
      "Hash": "QmTAn8ipupd4Hu2XxekSwxpwXqYsrttwqYmf3Eakd5c1KS",
      "Size": 1862319
    },
    {
      "Name": "categories",
      "Hash": "Qmd55L3rJ2BhiXpeZV49VB49Y1TpeqDdRDqrDTPjzCh51X",
      "Size": 27146
    },
    # [...more links omitted...]
  ],
  "Data": "\b\u0001"  # <- Note this is a binary string. At the "object" level, we still don't know how to parse it.
}

# UnixFS file nodes are used by top-level commands like ls, add, cat, get:
$ ipfs ls --headers QmVUdHfpo9hyC8wXmgd2frRrsp83iRvuL8HWyp1LPzjsPq
Hash                                           Size    Name
QmTAn8ipupd4Hu2XxekSwxpwXqYsrttwqYmf3Eakd5c1KS 1862319 assets/
Qmd55L3rJ2BhiXpeZV49VB49Y1TpeqDdRDqrDTPjzCh51X 27146   categories/
QmTLGx8mKCjQoa51Cd89MrzfRanUeCPsmcKbgJXiwouHg7 122471  community/
# [...more entries omitted...]
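
To make the step from "object" to "UnixFS file node" concrete, here is a minimal sketch that decodes the `Data` value shown in the `ipfs object get` output above (`"\b\u0001"`, i.e. the bytes 0x08 0x01) by hand. It assumes the UnixFS protobuf puts its `Type` enum in field 1 and that this field is serialized first, which is what encoders normally do but is not guaranteed by protobuf. The function names are invented for this example.

```python
def read_varint(buf: bytes, pos: int = 0):
    """Decode one protobuf base-128 varint; return (value, next_position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return value, pos
        shift += 7


# Values of the UnixFS `DataType` enum.
UNIXFS_TYPES = {
    0: "Raw",
    1: "Directory",
    2: "File",
    3: "Metadata",
    4: "Symlink",
    5: "HAMTShard",
}


def unixfs_type(data: bytes) -> str:
    """Read the leading Type field of a UnixFS `Data` blob."""
    tag, pos = read_varint(data)
    field_number, wire_type = tag >> 3, tag & 0x07
    if field_number != 1 or wire_type != 0:
        raise ValueError("expected field 1 (Type) encoded as a varint")
    value, _ = read_varint(data, pos)
    return UNIXFS_TYPES.get(value, f"unknown({value})")


# The JSON string "\b\u0001" is the two bytes 0x08 0x01.
data = "\b\u0001".encode("latin-1")
kind = unixfs_type(data)
print(kind)
```

At the object level those two bytes are opaque; only once they are read with the UnixFS definition do they say "this is a directory", which in turn tells you that the `Links` are directory entries.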

Also: the dag commands are the IPLD equivalent of the object commands.

Finally, it’s somewhat important that MerkleDAG nodes and IPLD nodes are a level below UnixFS. The idea here is that you could use them to represent all kinds of immutable DAGs besides files. You could make a database or a blockchain or whatever out of linked, content-addressable items (if you had to go down to the block level to do this, you’d lose the built-in tools for resolving and verifying links). IPLD is replacing MerkleDAG because it should be more flexible for representing different sorts of data while still keeping links identifiable in a standard way.
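
As a sketch of what "keeping links identifiable in a standard way" can mean for a JSON-compatible node, the function below walks a node and reports every `{"/": "<cid>"}` map it finds, without knowing anything about the other keys. The node contents are invented for illustration (the CIDs are borrowed from the listing above), `find_links` is a hypothetical helper, and no CID validation is attempted.

```python
def find_links(node, path=""):
    """Yield (path, cid) for every {"/": "<cid>"} map in a JSON-like node."""
    if isinstance(node, dict):
        # A map whose only key is "/" with a string value is treated as a link.
        if set(node) == {"/"} and isinstance(node["/"], str):
            yield path, node["/"]
            return
        for key, value in node.items():
            yield from find_links(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_links(value, f"{path}/{index}")


# Invented example node: only the link shape matters here.
node = {
    "title": "example",
    "parent": {"/": "QmTAn8ipupd4Hu2XxekSwxpwXqYsrttwqYmf3Eakd5c1KS"},
    "children": [{"/": "Qmd55L3rJ2BhiXpeZV49VB49Y1TpeqDdRDqrDTPjzCh51X"}],
}

links = list(find_links(node))
for link_path, cid in links:
    print(link_path, "->", cid)
```

The same traversal works whether the node describes a file, a commit, or a database row, which is the point of having the link convention live below UnixFS.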

@schomatis
Contributor Author

@Mr0grog First, thank you very much for that diagram, that is exactly the kind of final product I aspire to at the end of this milestone.

I need to take a closer look at the IPLD spec. I agree with your general description: we can clearly distinguish three protocol layers (the three vertical levels in your diagram), but there is a gap between the code base and the specifications that I would like to bridge.

From the code, an IPLD Node is a Go interface, a specification of how a node should behave. Apart from that, there is the ProtoNode structure from the merkledag package, which implements that interface: it behaves as the IPLD spec says it should (in theory at least; in practice Go only guarantees that the structure has the methods defined in the interface).
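
The interface/structure relationship can be sketched outside Go as well. The following is a Python analogy, not the actual go-ipfs code: `Node` is a deliberately reduced, hypothetical stand-in for the interface (the real one has more methods), and `ProtoNode` here is an invented class that merely shares the name. A `runtime_checkable` protocol behaves like the caveat above: the check only confirms that the methods exist, not that they behave as a spec says.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Node(Protocol):
    """Hypothetical, reduced stand-in for the node interface."""

    def raw_data(self) -> bytes: ...

    def links(self) -> list: ...


class ProtoNode:
    """Hypothetical concrete node: a data blob plus a list of links."""

    def __init__(self, data: bytes, links: list):
        self._data = data
        self._links = links

    def raw_data(self) -> bytes:
        return self._data

    def links(self) -> list:
        return list(self._links)


class NotANode:
    """Has raw_data() but no links(), so it does not satisfy Node."""

    def raw_data(self) -> bytes:
        return b""


node = ProtoNode(b"\x08\x01", ["assets", "categories"])
print(isinstance(node, Node))        # structural check: both methods present
print(isinstance(NotANode(), Node))  # links() is missing
```

In this picture the interface and the structure sit on different axes: other structures could satisfy the same interface alongside `ProtoNode`.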

I have more to say about the subject, but first I would like to check if we're talking about the same thing and that the interface and structure I'm pointing to are indeed the two specs you're linking.

One thing that probably makes this confusing is that IPFS’s old “MerkleDAG” format (which UnixFS is a layer above) is being very slowly replaced by the IPLD format (also a type of [merkle] DAG).

Your previous comment makes me suspect they're not (I sense there's something I'm missing here), because I'm pointing to an interface and a structure: the latter implements the former, so one cannot "slowly replace" the other.

@schomatis
Contributor Author

Independent of that, returning to the three layers we agree on, I think it would really help the conversation to also include in the specifications (so I should raise an issue there but first I would like to discuss it here) different names for the "thing that encapsulates data" in each layer.

I think we all agree that Block is at the bottom: raw bytes. After that there's one "thing" that adds links to the blocks so we can join them in a DAG (Node? Object? Thing with links?). On top of that there's pretty much everything else: a file, a git commit, a chat, a filecoin, etc. It would be nice to have a different word at this layer to append to these, e.g., "gadget" (just for the sake of this argument), so we could refer to the "unix file gadget", the "git commit gadget" and similar. Applying this "gadget" suffix would hint to the reader that we're referring to something from the known world but with an underlying IPFS representation. Does that make sense?

@Stebalien
Member

@Mr0grog's diagram looks mostly correct (and that example is a bit out of date). However, technically, the hierarchy is:

Block -> IPLD Node -> UnixFS 1, 2

Or, more explicitly,

Block -> IPLD Node (
           DagCBOR,
           DagPB (merkledag object),
           Raw,
           ...
         ) -> UnixFS 1, 2

That is, we started with:

Block -> Merkledag (objects) -> UnixFS 1

And then replaced the merkledag with a more generalized IPLD.

Now, currently, we don't allow UnixFS 1 nodes to be encoded in CBOR. Technically, we should. However, this restriction may make our lives easier in the future (for complicated reasons I won't get into).
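
One way to picture the `Block -> IPLD Node (DagCBOR, DagPB, Raw, ...)` step is that the identifier itself says which node format a block uses. The sketch below assumes the binary CIDv1 layout (version, codec, then multihash) and the multicodec codes 0x55 (raw), 0x70 (dag-pb) and 0x71 (dag-cbor). It reads each field as a single byte, which only works because all the codes used here are below 0x80. `make_cidv1` and `node_format` are hypothetical helpers, not part of any IPFS library.

```python
import hashlib

# Multicodec codes for the node formats named in the hierarchy above.
CODEC_NAMES = {0x55: "raw", 0x70: "dag-pb", 0x71: "dag-cbor"}


def make_cidv1(codec: int, block: bytes) -> bytes:
    """Build binary CIDv1 bytes: version, codec, sha2-256 multihash."""
    digest = hashlib.sha256(block).digest()
    # 0x01 = CID version 1; 0x12 0x20 = sha2-256 with a 32-byte digest.
    return bytes([0x01, codec, 0x12, 0x20]) + digest


def node_format(cid: bytes) -> str:
    """Name the IPLD node format a binary CIDv1 points at."""
    version, codec = cid[0], cid[1]
    if version != 1:
        raise ValueError("only binary CIDv1 is handled in this sketch")
    return CODEC_NAMES.get(codec, f"unknown(0x{codec:02x})")


block = b"any bytes at all"
for code in (0x70, 0x71, 0x55):
    print(node_format(make_cidv1(code, block)))
```

The block bytes and their hash are identical in all three cases; only the codec field changes how the layer above is expected to interpret them.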

3 participants