Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Add cross engine tests using BigQuery target #843

Merged
merged 12 commits into from
May 12, 2023

Conversation

nj1973
Copy link
Contributor

@nj1973 nj1973 commented May 11, 2023

Prior to this PR all integration tests validate an engine with itself. This is good for proving things like data type and function coverage but doesn't ensure end-to-end comparisons are good.

Based on the assumption that BigQuery is our most mature engine support then we've decided to add cross engine comparisons for other engines to BigQuery. If they all match BigQuery then they should all match each other.

We've used our CORE_DVT_TYPES test table for these comparisons. There will be a requirement for some other cross engine tests, for example for engine specific data types that are not covered by our CORE_DVT_TYPES test table. These can come later via different issues.

It is worth noting that the addition of 3 new Hive tests caused the integration tests to hit the 2 hour timeout. Because of this I've disabled the original Hive to Hive integration tests in favour of the new Hive to BigQuery versions. This is not ideal because there are a small number of columns excluded from the Hive to BigQuery tests that are no longer tested. Happy to take guidance on any better solution.

I've spun a number of new issues out of this work and left comments in the new tests where workarounds have been implemented.

@nj1973 nj1973 linked an issue May 11, 2023 that may be closed by this pull request
@nj1973
Copy link
Contributor Author

nj1973 commented May 12, 2023

/gcbrun

@nj1973 nj1973 requested a review from nehanene15 May 12, 2023 09:05
Copy link
Collaborator

@nehanene15 nehanene15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I agree with the route you took to disable the homogenous Hive tests; I wonder if making our Dataproc cluster bigger would help with the slow query time.

@nj1973
Copy link
Contributor Author

nj1973 commented May 12, 2023

This looks great. I agree with the route you took to disable the homogenous Hive tests; I wonder if making our Dataproc cluster bigger would help with the slow query time.

I wondered the same. But I don't know if increasing the instance core count will automatically increase the cores available to Yarn or whether we need to change config. And I couldn't figure out how to get into the UI. In short, I gave up and disabled the tests :-D

@nj1973 nj1973 merged commit b3a828c into develop May 12, 2023
@nj1973 nj1973 deleted the test/issue-831-cross-engine-tests branch May 12, 2023 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test: Create cross engine integration tests
2 participants