Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fixes bug in get_max_in_list_size #1158

Merged
merged 4 commits into from
Jun 4, 2024

Conversation

nj1973
Copy link
Contributor

@nj1973 nj1973 commented Jun 3, 2024

This PR fixes a bug in the code introduced in #1146 and also caters for Oracle IN list limit of 1000.

Closes #1157

@nj1973 nj1973 requested a review from a team as a code owner June 3, 2024 10:09
@nj1973 nj1973 linked an issue Jun 3, 2024 that may be closed by this pull request
@nj1973
Copy link
Contributor Author

nj1973 commented Jun 3, 2024

/gcbrun

@helensilva14
Copy link
Contributor

/gcbrun

Copy link
Contributor

@helensilva14 helensilva14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job again Neil!

# getattr(..., "cast") expression above is looking for lists where the contents are casts and not simple literals.
return 50
else:
return 16000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious to know: how did you find this other number for the list size? It would be helpful to put some sort of product documentation as a comment for future reference, but I could only find community posts mentioning it like this one: https://community.snowflake.com/s/question/0D5Do00000O3sJ0KAJ/how-o-get-around-th-16k-limit-on-number-of-expressions-in-in-clause-in-sql-generated-by-a-businessobjects-report

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not the same post but I saw another 2 Snowflake community posts.

I don't think we need to document it because this code is ensuring that we don't hit the 16k limit, at 16,000 we'll switch to:

WHERE id IN (1, ..., 16000) OR id IN (16001, ..., n)

We will find other limits for other engines I'm sure. I started to read about SQL Server but that has a memory limit based on the execution plan and not a hard/fixed limit, so I decided to just include the limits I knew about for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it makes sense, it's clearer for me now, thank you!

And for sure, totally agree that we might face it for other engines but at the moment we can stick with the ones we're aware of the values and usage

@helensilva14
Copy link
Contributor

/gcbrun

@helensilva14 helensilva14 merged commit 973e6b6 into develop Jun 4, 2024
5 checks passed
@helensilva14 helensilva14 deleted the 1157-random-row-oracle-1001-ids branch June 4, 2024 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Random row validations fail on Oracle when sampling 1001+ ids
2 participants