Unverified input and cache graph data. #133

Open
wlngai opened this issue Jun 29, 2017 · 1 comment

@wlngai
Contributor

wlngai commented Jun 29, 2017

Graphalytics currently assumes that the input graph and the cached graph are correct. In one sense this is acceptable: if either file is corrupted, the corresponding benchmark runs will fail validation. However, users cannot tell why validation failed, because they also assume that the input graph and the cached graph are correct. These datasets can be corrupted accidentally, for example when the caching process is interrupted.

A checksum (e.g. SHA-1) should be computed and verified for these files so that they are fully validated.
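A minimal sketch of what such a check could look like, written in Python for brevity and not taken from the Graphalytics code base. The function names and the `<file>.sha1` sidecar convention are assumptions made for illustration only; the idea is to store a digest once the file is complete and to compare against it before each run.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(path):
    """Store the digest in a sidecar file '<path>.sha1' (hypothetical convention)."""
    sidecar = Path(str(path) + ".sha1")
    sidecar.write_text(sha1_of_file(path) + "\n")
    return sidecar


def verify_checksum(path):
    """Return True only if the sidecar exists and matches the file's current digest."""
    sidecar = Path(str(path) + ".sha1")
    if not sidecar.is_file():
        return False
    return sidecar.read_text().strip() == sha1_of_file(path)
```

If the sidecar is written only after the cache file has been generated completely, an interrupted run leaves no sidecar, so `verify_checksum` returns `False` and the failure can be reported as a corrupted or incomplete file rather than as a validation error.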

@szarnyasg
Member

I just ran into this issue 5.5 years later :). If the program is interrupted while the cache file is being generated, it leaves a partial file behind, and the next execution assumes that file is correct even though its number of rows differs from that of the source file:

$ wc -l /data/gx/graphs/cache/datagen-7_5-fb.e
30759439 /data/gx/graphs/cache/datagen-7_5-fb.e

$ wc -l /data/gx/graphs/datagen-7_5-fb.e
34185747 /data/gx/graphs/datagen-7_5-fb.e
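The manual `wc -l` comparison above could be automated as a cheap sanity check before a cached file is reused. The sketch below is in Python and is not part of Graphalytics; the function names are hypothetical, and it assumes, as the comparison above suggests, that a complete cache file has the same number of lines as its source file.

```python
def count_lines(path, chunk_size=1 << 20):
    """Count newline characters in a file, which is what `wc -l` reports."""
    count = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
    return count


def cache_matches_source(source_path, cache_path):
    """Return True if both files contain the same number of lines."""
    return count_lines(source_path) == count_lines(cache_path)
```

A matching line count does not prove that the contents are identical, so this check would complement a checksum rather than replace it.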

@szarnyasg szarnyasg added the bug label Mar 4, 2023