Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipfs object diff assumes links will have different names for the comparison #5278

Closed
schomatis opened this issue Jul 23, 2018 · 10 comments
Closed
Labels
kind/bug A bug in existing code (including security flaws) topic/merkledag Topic merkledag topic/UnixFS Topic UnixFS

Comments

@schomatis
Copy link
Contributor

See #5135 (comment).

It would also be interesting to see how this can be integrated with some UnixFS command that could show not only which IPLD node is different but how it differs at the UnixFS layer.

@schomatis schomatis added topic/merkledag Topic merkledag topic/UnixFS Topic UnixFS labels Jul 23, 2018
@schomatis schomatis added this to the Files API Documentation milestone Jul 23, 2018
@schomatis schomatis self-assigned this Jul 23, 2018
@magik6k
Copy link
Member

magik6k commented Jul 25, 2018

Note: this command will soon be changed to use CoreAPI - #4643

@schomatis
Copy link
Contributor Author

Thanks for the heads-up, I'm mostly interested in the logic inside dagutils.Diff, I don't mind calling it from a different API.

@bjzhang
Copy link

bjzhang commented Jul 30, 2018

Hi, When I try to use “ipfs object diff” command, I found that the diff command could not find the exact difference between two hash of object if the link.Name is empty, relative code in dagutils/diff.go. In the following steps, I create a test_data and test_data_appended which append one line. I expect “ipfs object diff” could say that the only difference is the last object of test_data_append.
Reproduce steps(useless output has been removed, test in origin/master and feat/coreapi-commands):

$ ipfs version
ipfs version 0.4.17
$ dd if=/dev/random of=test_data bs=1024k count=1
$ ipfs add test_data
$ ipfs object get <hash of test_data>
2018/07/30 14:43:14 proto: duplicate proto type registered: basictracer_go.wire.TracerState
{"Links":[{"Name":"","Hash":"QmbaVqMRkcuoc28EJiiHQ5X1E9c1EQSmZ7h6jpKM4MWAas","Size":262158},{"Name":"","Hash":"QmTLj6JAu9EqjAT8wzjhcgktA9SFaeRhikV8fPfirEGC9H","Size":262158},{"Name":"","Hash":"QmRteFHwCk7ViWoiNSvAXGHApYJZZuiShTqNkbEXgR13kh","Size":262158},{"Name":"","Hash":"QmV5o9vMJVvg6rGJZTukpz2BFzrz3xfaD2uriSA6qt9oMS","Size":262158}],"Data":"\u0008\u0002\u0018��@ ��\u0010 ��\u0010 ��\u0010 ��\u0010"}
$ echo 1 >> test_data_appended
$ ipfs add test_data_appended
$ ipfs object get <hash of test_data_appended>
{"Links":[{"Name":"","Hash":"QmbaVqMRkcuoc28EJiiHQ5X1E9c1EQSmZ7h6jpKM4MWAas","Size":262158},{"Name":"","Hash":"QmTLj6JAu9EqjAT8wzjhcgktA9SFaeRhikV8fPfirEGC9H","Size":262158},{"Name":"","Hash":"QmRteFHwCk7ViWoiNSvAXGHApYJZZuiShTqNkbEXgR13kh","Size":262158},{"Name":"","Hash":"QmV5o9vMJVvg6rGJZTukpz2BFzrz3xfaD2uriSA6qt9oMS","Size":262158},{"Name":"","Hash":"QmdytmR4wULMd3SLo6ePF4s3WcRHWcpnJZ7bHhoj3QB13v","Size":10}],"Data":"\u0008\u0002\u0018��@ ��\u0010 ��\u0010 ��\u0010 ��\u0010 \u0002"}
$ ipfs object diff -v <hash of test_data> <hash of test_data_appended>
QmRN6hWdkdv9kPeyGetpzusv3Nn9nQQ9NgRPX5aZk4KmiZ QmemtUpKDXqWacFtsVEKSn669Y3ZoR99sHT6rtZEPFyFbd
2018/07/30 14:44:52 proto: duplicate proto type registered: basictracer_go.wire.TracerState
Changed "" from QmTLj6JAu9EqjAT8wzjhcgktA9SFaeRhikV8fPfirEGC9H to QmbaVqMRkcuoc28EJiiHQ5X1E9c1EQSmZ7h6jpKM4MWAas.
Changed "" from QmRteFHwCk7ViWoiNSvAXGHApYJZZuiShTqNkbEXgR13kh to QmbaVqMRkcuoc28EJiiHQ5X1E9c1EQSmZ7h6jpKM4MWAas.
Changed "" from QmV5o9vMJVvg6rGJZTukpz2BFzrz3xfaD2uriSA6qt9oMS to QmbaVqMRkcuoc28EJiiHQ5X1E9c1EQSmZ7h6jpKM4MWAas.

Not sure if I use this command correctly. Thanks in advance.

@schomatis schomatis added the kind/bug A bug in existing code (including security flaws) label Jul 30, 2018
@schomatis
Copy link
Contributor Author

schomatis commented Jul 30, 2018

Hey @bjzhang, thanks for taking the lead on this one, good catch!

Yes, it seems the Diff method relies on link names to compare IPLD nodes, but as you have seen in the importer package, a file DAG is made up of nodes with empty link names, so another more general method should be designed that doesn't assume that each link should have its own different name. It can still separate links by name (to speed up the comparison), but within each name group it should cross check every CID of the two nodes to see if it finds a match. (This will probably also need to be fixed in the complement ApplyChange function.)

I'll get on this so you can focus on #5135, but ping me if the fix isn't moving fast enough for your needs.

@schomatis schomatis changed the title Review the ipfs object diff command ipfs object diff assumes links will have different names for the comparison Jul 30, 2018
@bjzhang
Copy link

bjzhang commented Jul 31, 2018

@schomatis You are welcome. It is not a big blocker for me, IIUC I could compare the object manually by "ipfs object get". I could help you test it when your patches are available.

@schomatis
Copy link
Contributor Author

So, on second thought, this function won't be useful for the purposes of #5135, we may need a specialized UnixFS function. The algorithm of this function relies on the link names to find where the difference might be (and only then checks the CIDs), the problem is that we can't rely on link names as a UnixFS file doesn't use them, but we also can't rely just on CIDs, as any difference in a node will propagate back to the root.

What we want here is an algorithm that does go down when the CIDs differ (in contrast to this function), to actually check which child is causing the parent's CID to be different from the expected value. That could work when two DAGs are identical except for one node, but I'm not sure how it would play out in more complex scenarios. Let me rethink this a bit more.

@schomatis
Copy link
Contributor Author

@whyrusleeping I'm trying to give some closure to this issue, I wanted to check this command as I thought it might helpful as a debug tool, but I was wrong, we're looking for something different. However, @bjzhang finding is valid, the Diff function relies on link names being different (pretty much because all of the methods in the merkledag package seem to work on that assumption), is that a bug that should be fixed? Or is that something that should be clarified in the function documentation?

@schomatis
Copy link
Contributor Author

@bjzhang In the meanwhile you can use @achingbrain's little beauty unixfs (see #5024 (comment) on how to install it), that command will show you the attributes of the UnixFS node. The problem with the ipfs object get command is that it only shows the DAG basic attributes (link, name, hash, data), but what you're probably interested in is the UnixFS attributes (encoded inside DAG's data) like the Type and Blocksizes (the error you're getting in #5135 now is an invalid UnixFS type, which you won't see it with ipfs object get).

@whyrusleeping
Copy link
Member

@schomatis the diff algorithm could definitely be better. The only real reason for the name requirement was that it produced nice looking diffs for the usecase I was using it for (diffing unixfs structures).

Agree that it makes sense to recursively diff a bit more aggressively

@Stebalien Stebalien removed this from the Files API Documentation milestone Apr 29, 2020
@lidel
Copy link
Member

lidel commented May 2, 2021

Closing: ipfs object is being deprecated (#7936) so we are not investing time into object commands.

Let's continue this in #4801 which suggest generic ipfs dag diff which works with all DAG types, not just dag-pb.

@lidel lidel closed this as completed May 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) topic/merkledag Topic merkledag topic/UnixFS Topic UnixFS
Projects
None yet
Development

No branches or pull requests

6 participants