I ended up with a memory issue when I tried to do hash-based row validation for a 59 million row table using DVT. The error message looked like this:
"numpy.core._exceptions.MeomoryError: Unable to allocate 247. TiB for an array with shape (33977224160767,) and data type int64"
Do you have any suggestions or recommended practices for doing actual row-to-row validation between source and target systems that have millions or billions of rows?
I am aware that I can run the row hash validation on only a random/stratified sample set from the table, but I am looking for ways to validate the whole table.
Memory is the biggest constraint for whole-table validations on large tables. The alternatives are to either add more memory to the machine running DVT, or to filter the table and validate it in chunks.
We are working on Issue #619, which would support creating table partitions based on a numeric partition key to address this constraint. We recently released support for running multiple YAMLs from a directory (PR #654), so each YAML can represent a portion of the table and the set can be run sequentially. For example, the first config filters on id >= 0 and id < 10; the second on id >= 10 and id < 20; and so on (see the sketch below for one way to generate such filter ranges).
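As a rough illustration of the chunking idea, here is a minimal Python sketch that splits a numeric, monotonically increasing id range into equal-width chunks and prints the filter clause for each one. The helper name `chunk_filters`, the chunk size, and the config file names are illustrative assumptions, not part of the DVT CLI or its YAML schema; the printed clauses would be copied into each per-chunk config's filter.

```python
# Sketch: generate per-chunk filter clauses for a numeric, monotonically
# increasing key. Names and file layout here are illustrative only.

def chunk_filters(min_id: int, max_id: int, chunk_size: int) -> list[str]:
    """Return half-open range filters covering [min_id, max_id]."""
    filters = []
    lower = min_id
    while lower <= max_id:
        upper = lower + chunk_size
        filters.append(f"id >= {lower} and id < {upper}")
        lower = upper
    return filters

if __name__ == "__main__":
    # Example: a 59M-row table split into ~1M-row chunks,
    # assuming ids are roughly dense starting at 0.
    for i, clause in enumerate(chunk_filters(0, 59_000_000, 1_000_000)):
        print(f"config_{i:03d}.yaml -> filter: {clause}")
```

Each resulting YAML then validates a bounded slice of the table, keeping the per-run memory footprint small, and the directory of configs can be run sequentially.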
This has been merged in PR #653. Note that V1 only supports a numeric, monotonically increasing key; we will be working on supporting other partitioning keys as well.