-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error on Snowflake binary primary key random row validations #1146
Comments
Hi @henrihem, the fix to this issue should break up the problem Snowflake IN clauses into a series of IN/OR combinations. I was unable to find a way to use array manipulation like you suggested via the framework we use. Plus this solution is applicable to other SQL engines should we discover similar limitation elsewhere. Let me know if this fix does not resolve your issue. |
Hi @nj1973 There are a few bugs in this fix. I'll report those below. Once I was able to bypass those bugs, the fix for the described issue works as intended. Was able to run random row validations with 1000 rows. So the fix itself seems to work. Issue 1
When clean installing from develop-branch to venv, there is issue that third_party is missing completely. Was able to add the third_party manually and after that is works Issue 2Not sure if this is issue or just intended limitation for random row validations, but with this fix the limit of random rows looks to be around 5000 and 10 000 rows.
If there is some maximum amount of random rows, that should be documented. |
Hi @henrihem, thanks a lot for your detailed comment and continuous collaboration with us! Issue 1 was a side-effect of a configuration file switch that we attempted but we already reverted it at #1156. You should not face it anymore, very much appreciated for letting us know. And about Issue 2 we just merged a bug fix at #1158 that @nj1973 greatly solved. If you could please take a look and try to test it would be great, thank you! |
Sorry, to be clear, my bug fix does not solve issue 2. I just noticed a problem with my original fix while investigating it. We are hitting a Python limit due to the way Ibis builds OR clauses and the high number of them due to the 50 item IN list limit on Snowflake when the items are expressions. We won't have this problem for regular IN lists, only ones on binary columns. I think we might have to accept this is a limitation and think about how we document it. |
When validating tables using random rows we build a list of primary key literal values and then inject that list into the validation queries as an IN list, for example:
For tables with binary primary keys we cannot inject binary literals into a SQL statement and therefore convert the binary values to hex first and cast them back to binary in SQL, for example:
We have an issue on Snowflake for binary primary keys when the random row sample size is > 50:
We need a workaround for Snowflake engines.
cc: @henrihem
The text was updated successfully, but these errors were encountered: