Terminology: Block vs Object (vs Node) #5058

schomatis · 2018-06-01T01:06:50Z

From the IPFS example Dealing With Blocks:

Blocks vs Objects

In IPFS, a block refers to a single unit of data, identified by its key (hash). A block can be any sort of data, and does not necessarily have any sort of format associated with it. An object, on the other hand, refers to a block that follows the Merkle DAG protobuf data format. It can be parsed and manipulated via the ipfs object command. Any given hash may represent an object or a block.
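
To make the block side of that distinction concrete, here is a minimal Go sketch of the block layer alone. It assumes a plain hex-encoded SHA-256 digest as the key. Real IPFS identifies blocks by CID (multihash plus codec), and the `Block` type and `Key` method are hypothetical, not the go-ipfs API. The point is that at this layer the bytes are opaque: nothing about their format is known or needed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Block is a hypothetical stand-in for an IPFS block: raw bytes with no
// format attached. Real IPFS uses CIDs, not a bare hex digest.
type Block struct {
	Data []byte
}

// Key returns the hex-encoded SHA-256 digest of the block's bytes.
// Identical bytes always yield the same key (content addressing).
func (b Block) Key() string {
	sum := sha256.Sum256(b.Data)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Any bytes can be a block: a protobuf-encoded object, JSON, or noise.
	raw := Block{Data: []byte{0x08, 0x01}}
	fmt.Println(raw.Key())
}
```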

I think the block distinction is very clearly explained, but I'm having trouble assimilating the Object term, which pretty much seems to be dealing with DAG nodes:

USAGE
  ipfs object - Interact with IPFS objects.

  'ipfs object' is a plumbing command used to manipulate DAG objects
  directly.

  ipfs object get <key>            - Get and serialize the DAG node named by <key>.

Ideally, it would be nice to have a distinct term for the PDU (abusing the OSI terminology a bit) of each of the layers that make up information in IPFS, something like a (UnixFS) Object over a (DAG) Node over a (raw) Block (just an example off the top of my head). Right now, Object and Node seem to overlap.


Mr0grog · 2018-06-02T07:00:08Z

I'm having trouble assimilating the Object term, which pretty much seems to be dealing with DAG nodes… (UnixFS) Object over a (DAG) Node over a (raw) Block

I think you’ve mostly got it right, at least as far as my understanding goes.

One thing that probably makes this confusing is that IPFS’s old “MerkleDAG” format (which UnixFS is a layer above) is being very slowly replaced by the IPLD format (also a type of [merkle] DAG). An object is specifically a block that is formatted according to the old MerkleDAG format. A UnixFS file node is an object whose Data field is a binary blob encoded with the UnixFS protobuf format.

(Worth noting: knowing that an object represents a UnixFS file node also tells you how to interpret the object’s Links field; e.g., a directory’s links point to all the file nodes it contains.)
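
A rough Go model of an old-style MerkleDAG object may help here. It mirrors the `Links` (Name, Hash, Size) and `Data` fields that `ipfs object get` prints. The `Object` and `Link` types and the `LinkNames` helper are hypothetical; the real go-ipfs type is `ProtoNode` in the merkledag package, with a different API. The sketch shows that the object layer knows links exist but treats `Data` as opaque. Only a UnixFS-aware reader decides that, for a directory, the link names are the directory's entries.

```go
package main

import "fmt"

// Link mirrors one entry of the "Links" array in `ipfs object get` output.
// Hypothetical type for illustration.
type Link struct {
	Name string
	Hash string // key of the linked object
	Size uint64 // cumulative size of the linked subtree
}

// Object is a hypothetical model of a MerkleDAG object: a list of links
// plus a data blob whose format the object layer does not interpret.
type Object struct {
	Links []Link
	Data  []byte
}

// LinkNames returns the names of an object's links. Only once a
// UnixFS-aware reader has decided the object is a directory can these
// names be read as directory entries.
func (o Object) LinkNames() []string {
	names := make([]string, 0, len(o.Links))
	for _, l := range o.Links {
		names = append(names, l.Name)
	}
	return names
}

func main() {
	dir := Object{
		Links: []Link{
			{Name: "assets", Hash: "QmTAn8ipupd4Hu2XxekSwxpwXqYsrttwqYmf3Eakd5c1KS", Size: 1862319},
			{Name: "categories", Hash: "Qmd55L3rJ2BhiXpeZV49VB49Y1TpeqDdRDqrDTPjzCh51X", Size: 27146},
		},
		Data: []byte("\b\x01"), // opaque at this layer
	}
	fmt.Println(dir.LinkNames()) // [assets categories]
}
```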

Here’s how I think of it:

                       +------------------------------+
                       |            Block             |
                       |                              |
                       | Package of raw bytes in IPFS |
                       | This is the thing that gets  |
                       | hashed to make a CID.        |
                       +------------------------------+
                                      |
               +----------------------+----------------------+
               |                                             |
+------------------------------+             +------------------------------+
|            Object            |             |          IPLD Node           |
|                              |             |                              |
| Block formatted according to |             | Block formatted according to |
| IPFS "MerkleDAG" format with |             | IPLD format -- it has a JSON |
| a list of `Links` and a blob |             | compatible structure. Any    |
| of unknown `Data`.           |             | key whose value is:          |
+------------------------------+             |   {"/": "<any valid CID>"}   |
               |                             | represents a link to another |
               |                             | node. IPLD doesn't assume    |
               |                             | anything about other keys or |
               |                             | values in the node.          |
               |                             +------------------------------+
               |                                             |
               |                                             |
+------------------------------+             +------------------------------+
|       UnixFS File Node       |             |     UnixFS v2 File Node      |
|                              |             |                              |
| Object where the `Data` blob |             | IPLD node where the keys and |
| is a Protobuf parseable by   |             | values represent things      |
| UnixFS protobuf definition.  |             | according to UnixFS v2       |
| Knowing the Object is a file |             | format. Specifics still to   |
| node also tells you how to   |             | be determined.               |
| use the Object's `Links`.    |             +------------------------------+
+------------------------------+

Or in CLI terms:

# If we have a UnixFS directory stored at QmVUdHfpo9hyC8wXmgd2frRrsp83iRvuL8HWyp1LPzjsPq...

# Blocks are raw data:
$ ipfs block get QmVUdHfpo9hyC8wXmgd2frRrsp83iRvuL8HWyp1LPzjsPq
[...binary data omitted... ]

# Objects are things with `Links` (a list of links to other objects) and `Data` (a binary blob):
$ ipfs object get QmVUdHfpo9hyC8wXmgd2frRrsp83iRvuL8HWyp1LPzjsPq | jq
{
  "Links": [
    {
      "Name": "assets",
      "Hash": "QmTAn8ipupd4Hu2XxekSwxpwXqYsrttwqYmf3Eakd5c1KS",
      "Size": 1862319
    },
    {
      "Name": "categories",
      "Hash": "Qmd55L3rJ2BhiXpeZV49VB49Y1TpeqDdRDqrDTPjzCh51X",
      "Size": 27146
    },
    # [...more links omitted...]
  ],
  "Data": "\b\u0001"  # <- Note this is a binary string. At the "object" level, we still don't know how to parse it.
}

# UnixFS file nodes are used by top-level commands like ls, add, cat, get:
$ ipfs ls --headers QmVUdHfpo9hyC8wXmgd2frRrsp83iRvuL8HWyp1LPzjsPq
Hash                                           Size    Name
QmTAn8ipupd4Hu2XxekSwxpwXqYsrttwqYmf3Eakd5c1KS 1862319 assets/
Qmd55L3rJ2BhiXpeZV49VB49Y1TpeqDdRDqrDTPjzCh51X 27146   categories/
QmTLGx8mKCjQoa51Cd89MrzfRanUeCPsmcKbgJXiwouHg7 122471  community/
# [...more entries omitted...]
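
The `"\b\u0001"` in the object's `Data` field only becomes meaningful once the UnixFS protobuf definition is applied. Byte `0x08` is the protobuf key for field 1 (`Type`) with the varint wire type, and `0x01` is its value, which the UnixFS `DataType` enum maps to `Directory`. Below is a minimal hand-rolled Go decoder for just that leading field. It assumes the enum values Raw=0, Directory=1, File=2, Metadata=3, Symlink=4, HAMTShard=5 from the UnixFS `.proto`. `unixfsType` is a hypothetical helper; a real client would use generated protobuf code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Names for the UnixFS v1 DataType enum (assumed from the UnixFS .proto).
var dataTypeNames = map[uint64]string{
	0: "Raw", 1: "Directory", 2: "File", 3: "Metadata", 4: "Symlink", 5: "HAMTShard",
}

// Protobuf key for field number 1 with wire type 0 (varint).
const typeFieldKey = 1<<3 | 0

// unixfsType decodes only the leading Type field of a UnixFS Data blob.
// It is a sketch, not a protobuf parser: other fields are not handled.
func unixfsType(data []byte) (string, error) {
	key, n := binary.Uvarint(data)
	if n <= 0 {
		return "", errors.New("truncated field key")
	}
	if key != typeFieldKey {
		return "", fmt.Errorf("expected Type field, got key %d", key)
	}
	val, m := binary.Uvarint(data[n:])
	if m <= 0 {
		return "", errors.New("truncated Type value")
	}
	name, ok := dataTypeNames[val]
	if !ok {
		return "", fmt.Errorf("unknown DataType %d", val)
	}
	return name, nil
}

func main() {
	// The Data string from `ipfs object get`: "\b\u0001" is bytes 0x08 0x01.
	t, err := unixfsType([]byte("\b\x01"))
	if err != nil {
		panic(err)
	}
	fmt.Println(t) // Directory
}
```

This matches the example: the hash was described as a UnixFS directory, and `ipfs ls` lists its links as entries.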

Also: the dag commands are the IPLD equivalent of the object commands.

Finally, it’s somewhat important that MerkleDAG nodes and IPLD nodes are a level below UnixFS. The idea here is that you could use them to represent all kinds of immutable DAGs besides files. You could make a database or a blockchain or whatever out of linked, content-addressable items (if you had to go down to the block level to do this, you’d lose the built-in tools for resolving and verifying links). IPLD is replacing MerkleDAG because it should be more flexible for representing different sorts of data while still keeping links identifiable in a standard way.
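
To illustrate what is lost by dropping to the raw block level, here is a toy content-addressed store in Go where each item records its children's keys, so a reader can both resolve a link and verify that the fetched bytes hash to that link. Everything here (`store`, `node`, `put`, `get`, JSON encoding, plain SHA-256 hex keys) is hypothetical and far simpler than MerkleDAG or IPLD; it only shows the resolve-and-verify idea for a tiny two-item chain.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
)

// node is a hypothetical DAG item: an application payload plus the keys
// of the items it links to.
type node struct {
	Payload string   `json:"payload"`
	Links   []string `json:"links"`
}

// store maps a key (hex SHA-256 of the encoded node) to the encoded bytes,
// playing the role of the block layer.
type store map[string][]byte

func hashKey(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// put encodes n (struct field order keeps the JSON deterministic) and
// returns its content address.
func (s store) put(n node) (string, error) {
	b, err := json.Marshal(n)
	if err != nil {
		return "", err
	}
	k := hashKey(b)
	s[k] = b
	return k, nil
}

// get resolves a key and checks that the stored bytes really hash to it
// before decoding, so a tampered block is rejected.
func (s store) get(k string) (node, error) {
	b, ok := s[k]
	if !ok {
		return node{}, errors.New("not found: " + k)
	}
	if hashKey(b) != k {
		return node{}, errors.New("hash mismatch for " + k)
	}
	var n node
	err := json.Unmarshal(b, &n)
	return n, err
}

func main() {
	s := store{}
	// json.Marshal cannot fail for this struct, so errors are ignored here.
	genesis, _ := s.put(node{Payload: "block 0"})
	head, _ := s.put(node{Payload: "block 1", Links: []string{genesis}})

	// Walk from head back to genesis, verifying each hop.
	for k := head; ; {
		n, err := s.get(k)
		if err != nil {
			panic(err)
		}
		fmt.Println(n.Payload)
		if len(n.Links) == 0 {
			break
		}
		k = n.Links[0]
	}
}
```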

schomatis · 2018-06-04T00:27:15Z

@Mr0grog First, thank you very much for that diagram; it is exactly the kind of final product I aspire to at the end of this milestone.

I need to take a closer look at the IPLD spec. I agree with your general description: we can clearly distinguish three protocol layers (the three vertical levels in your diagram). But there is a gap between the code base and the specifications that I would like to bridge.

From the code, an IPLD Node is a Go interface, a specification of how a node should behave. Separately, there is the ProtoNode structure from the merkledag package, which implements that interface and behaves as the IPLD spec says it should (in theory, at least; in practice, Go only guarantees that the structure has the methods defined in the interface).
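
The gap between "implements the interface" and "behaves as the spec says" can be seen with a compile-time assertion. The sketch below uses a deliberately trimmed, hypothetical `Node` interface and `ProtoNode` struct; the real `Node` interface in go-ipld-format and `ProtoNode` in go-merkledag have more methods and different signatures. The `var _ Node = (*ProtoNode)(nil)` line only proves that the method set matches.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down stand-in for an IPLD node interface.
type Node interface {
	RawData() []byte       // the serialized block bytes
	LinkTargets() []string // keys of linked nodes
}

// ProtoNode is a hypothetical stand-in for a MerkleDAG (DagPB) node.
type ProtoNode struct {
	data  []byte
	links []string
}

func (p *ProtoNode) RawData() []byte       { return p.data }
func (p *ProtoNode) LinkTargets() []string { return p.links }

// Compile-time check: the build fails if *ProtoNode lacks a method of
// Node, but the compiler cannot check that the methods behave as a spec
// requires.
var _ Node = (*ProtoNode)(nil)

func main() {
	var n Node = &ProtoNode{data: []byte{0x08, 0x01}, links: []string{"child"}}
	fmt.Println(len(n.RawData()), n.LinkTargets())
}
```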

I have more to say about the subject, but first I would like to check if we're talking about the same thing and that the interface and structure I'm pointing to are indeed the two specs you're linking.

One thing that probably makes this confusing is that IPFS’s old “MerkleDAG” format (which UnixFS is a layer above) is being very slowly replaced by the IPLD format (also a type of [merkle] DAG).

Your previous comment makes me suspect they're not (I sense there's something I'm missing here), because I'm pointing to an interface and a structure: the latter implements the former, so one cannot "slowly replace" the other.

schomatis · 2018-06-04T00:41:48Z

Independently of that, and returning to the three layers we agree on, I think it would really help the conversation to also include in the specifications different names for the "thing that encapsulates data" in each layer (I should raise an issue there, but first I would like to discuss it here).

I think we all agree that Block is at the bottom: raw bytes. Above that there's one "thing" that adds links to blocks so we can join them in a DAG (Node? Object? Thing with links?). On top of that there's pretty much everything else: a file, a git commit, a chat, a filecoin, etc. It would be nice to have a different word for this layer to append to these, e.g., "gadget" (just for the sake of argument). We could then refer to the "unix file gadget", the "git commit gadget" and so on, and the "gadget" suffix would hint to the reader that we're referring to the known world, but with an underlying IPFS representation. Does that make sense?

Stebalien · 2018-06-30T01:05:53Z

@Mr0grog's diagram looks mostly correct (and that example is a bit out of date). However, technically, the hierarchy is:

Block -> IPLD Node -> UnixFS 1, 2

Or, more explicitly,

Block -> IPLD Node (
           DagCBOR,
           DagPB (merkledag object),
           Raw,
           ...
         ) -> UnixFS 1, 2

That is, we started with:

Block -> Merkledag (objects) -> UnixFS 1

And then replaced the merkledag with a more generalized IPLD.

Currently, we don't allow UnixFS 1 nodes to be encoded in CBOR. Technically, we should. However, this restriction may make our lives easier in the future (for complicated reasons I won't get into).
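
The hierarchy above can be sketched as a dispatch on the codec that tells the IPLD layer how to decode a block, followed by a UnixFS 1 check that rejects CBOR, matching the restriction just described. The numeric codes are the multicodec table values for raw (`0x55`), dag-pb (`0x70`) and dag-cbor (`0x71`). The function names are hypothetical, and the sketch assumes raw blocks are acceptable in UnixFS 1 (as file data leaves, e.g. via `ipfs add --raw-leaves`) without enforcing that they only appear as leaves.

```go
package main

import (
	"errors"
	"fmt"
)

// Multicodec codes identifying how a block's bytes decode into an IPLD node.
const (
	codecRaw     = 0x55
	codecDagPB   = 0x70
	codecDagCBOR = 0x71
)

// codecName maps a codec code to its name; unknown codes are rejected.
func codecName(code uint64) (string, error) {
	switch code {
	case codecRaw:
		return "raw", nil
	case codecDagPB:
		return "dag-pb", nil
	case codecDagCBOR:
		return "dag-cbor", nil
	}
	return "", fmt.Errorf("unsupported codec 0x%x", code)
}

// checkUnixFSv1 models the policy that UnixFS 1 nodes are not encoded in
// CBOR, even though the IPLD layer itself could decode dag-cbor.
func checkUnixFSv1(code uint64) error {
	if _, err := codecName(code); err != nil {
		return err
	}
	if code == codecDagCBOR {
		return errors.New("UnixFS 1 nodes may not be encoded as dag-cbor")
	}
	return nil
}

func main() {
	for _, c := range []uint64{codecDagPB, codecRaw, codecDagCBOR} {
		name, _ := codecName(c)
		fmt.Printf("%s: allowed for UnixFS 1 = %v\n", name, checkUnixFSv1(c) == nil)
	}
}
```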

schomatis added the topic/docs-ipfs label Jun 1, 2018

schomatis added this to the Files API Documentation milestone Jun 1, 2018

schomatis self-assigned this Jun 1, 2018

schomatis mentioned this issue Jun 4, 2018

Terminology: MFS vs UnixFS vs Files API #5051

Open

schomatis mentioned this issue Jun 29, 2018

core/commands/ls: wrap NewDirectoryFromNode error #5166

Merged

travis-g mentioned this issue Mar 1, 2019

bias API towards IPLD support travis-g/vault-ipfs#2

Open

Stebalien removed this from the Files API Documentation milestone Apr 29, 2020

schomatis mentioned this issue May 22, 2020

Explain how files are managed in IPFS. ipfs/ipfs-docs#251

Closed

Stebalien unassigned schomatis Apr 22, 2021

PedrobyJoao mentioned this issue Aug 2, 2023

Clarify exactly what IPLD *is.* ipld/ipld#39

Open

lidel mentioned this issue Mar 21, 2024

core/commands!: remove deprecated object APIs #10375

Merged

hacdias closed this as completed in #10375 Mar 22, 2024
