
Row hash validation failed because of Memory error in Python #652

Closed
nmani1191 opened this issue Dec 21, 2022 · 3 comments

Comments

@nmani1191

nmani1191 commented Dec 21, 2022

I ended up with a memory error when I tried to do a hash-based row validation on a 59 million row table using DVT. The error message looked like this:

"numpy.core._exceptions.MemoryError: Unable to allocate 247. TiB for an array with shape (33977224160767,) and data type int64"

Do we have any suggested practice for doing an actual row-to-row validation between source and target systems that have millions/billions of rows?

I am aware that I can run the row hash validation on only a random/stratified sample of the table, but I am looking for ways to validate the whole table.

@nehanene15
Collaborator

Memory is the biggest constraint for whole-table validations on large tables. The alternatives are either to add more memory to the machine running DVT, or to filter the table and validate it a chunk at a time.

We are working on Issue #619, which would support creating table partitions based on a numeric partition key to address this constraint. We recently released support for running multiple YAMLs from a directory (PR #654), so each YAML can represent a portion of the table and the configs can then be run sequentially. For example, the first config filters on id >= 0 and id < 10; the second config filters on id >= 10 and id < 20; and so on.
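The chunking scheme above can be sketched in plain Python. This is just an illustration, not DVT code: the column name `id` and the half-open ranges mirror the example filters, and each generated predicate would go into the filter of one YAML config:

```python
def chunk_filters(min_id, max_id, chunk_size):
    """Yield SQL filter predicates covering [min_id, max_id] in
    half-open ranges of chunk_size rows each."""
    lower = min_id
    while lower <= max_id:
        upper = lower + chunk_size
        yield f"id >= {lower} AND id < {upper}"
        lower = upper

# Three chunks covering ids 0..29:
filters = list(chunk_filters(0, 29, 10))
# ["id >= 0 AND id < 10", "id >= 10 AND id < 20", "id >= 20 AND id < 30"]
```

Half-open ranges guarantee each row falls into exactly one chunk, so the per-chunk validations together cover the whole table with no overlap.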

@nmani1191
Author

Looking forward to a solution that addresses this constraint.

@nehanene15
Collaborator

This has been merged in PR #653.
Note that V1 only supports a numeric, monotonically increasing partition key. We will be working on supporting other partitioning keys as well.
