-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Add cross engine tests using BigQuery target #843
Conversation
/gcbrun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. I agree with the route you took to disable the homogenous Hive tests; I wonder if making our Dataproc cluster bigger would help with the slow query time.
I wondered the same. But I don't know if increasing the instance core count will automatically increase the cores available to Yarn or whether we need to change config. And I couldn't figure out how to get into the UI. In short, I gave up and disabled the tests :-D |
Prior to this PR all integration tests validate an engine with itself. This is good for proving things like data type and function coverage but doesn't ensure end-to-end comparisons are good.
Based on the assumption that BigQuery is our most mature engine support then we've decided to add cross engine comparisons for other engines to BigQuery. If they all match BigQuery then they should all match each other.
We've used our CORE_DVT_TYPES test table for these comparisons. There will be a requirement for some other cross engine tests, for example for engine specific data types that are not covered by our CORE_DVT_TYPES test table. These can come later via different issues.
It is worth noting that the addition of 3 new Hive tests caused the integration tests to hit the 2 hour timeout. Because of this I've disabled the original Hive to Hive integration tests in favour of the new Hive to BigQuery versions. This is not ideal because there are a small number of columns excluded from the Hive to BigQuery tests that are no longer tested. Happy to take guidance on any better solution.
I've spun a number of new issues out of this work and left comments in the new tests where workarounds have been implemented.