Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find a solution for multiple BigQuery versions #171

Closed
arostamianfar opened this issue Apr 5, 2018 · 8 comments
Closed

Find a solution for multiple BigQuery versions #171

arostamianfar opened this issue Apr 5, 2018 · 8 comments

Comments

@arostamianfar
Copy link
Contributor

In our integration tests, we rely on google-cloud-bigquery>0.25, however, Beam needs BigQuery google-cloud-bigquery==0.25. Even though the two code paths don't currently connect, we should find some way to handle both versions. We currently have an [int_test] target in our setup.py to handle this case.

We can also ask the Beam team whether actually installing bigquery>0.25 is ok (there is a comment that it's only used for tests?).

@samanvp
Copy link
Member

samanvp commented Jun 20, 2018

An issue was created for this problem but it seems there is no activity on it:
https://issues.apache.org/jira/browse/BEAM-4492

@samanvp
Copy link
Member

samanvp commented Jun 25, 2018

It seems updating version is not trivial because of intertwined dependencies of:

  • google-cloud-bigquery and
  • google-cloud-pubsub

on google-cloud-core

If we add google-cloud-bigquery>0.25 then we will get this conflict:

google-cloud-bigquery 1.3.0 has requirement google-cloud-core<0.29dev,>=0.28.0, but you'll have google-cloud-core 0.25.0 which is incompatible.

which according to this comment is needed by google-cloud-pubsub used in apache_beam source code.

@bashir2
Copy link
Member

bashir2 commented Jun 29, 2018

Re. the comment you mentioned, there is a background why google-cloud-storage is even there as a dependency:

  • google-cloud-storage is only used in integration tests so is not really a dependency of our production code.
  • It was first added to the int_test dependencies in PR Add preprocessor integration tests #206 #237, but it was failing one of the import checks, so @allieychen tried to exclude integration test from this check in that PR.
  • That solution did not work (for reasons that I do not exactly remember) and so google-cloud-storage was moved to general dependencies in PR Integration tests fail #244.

So since we now have a version conflict issue, maybe we should consider that int_test based solution again. @allieychen can you please comment what was eventually wrong with that approach?

@allieychen
Copy link
Contributor

The int_test based solution fails because of one import error (Travis error example). It happens because one integration test script run_preprocessor_tests (it imports google.cloud.storage) is treated as unit test script.

One attempt was made in PR 237. The intention was to exclude some scripts for unit tests. But the changes (check setup.py in PR 237) only exclude the files for building/distributing purposes. So this is not the correct way to exclude files for unit tests.

Another possible way could be removing tests/replacing tests with some other word in the script name.

@arostamianfar
Copy link
Contributor Author

RE unit test update: I think it may be better to update our unit testing framework to not use Nose as there was a recent thread on the Beam dev site as well about this....We can wait and see what Beam decides to do and follow them (https://issues.apache.org/jira/browse/BEAM-3713).

RE BigQuery version: Saman and I discussed this offline, and it may be easier to just use the bq update CLI command for now until https://issues.apache.org/jira/browse/BEAM-4492 is fixed.

@bashir2
Copy link
Member

bashir2 commented Jul 4, 2018

Regardless of whether we change our unit-test framework, we should drop the google-cloud-storage dependency from our production code. In general, we should keep the dependencies to a minimum and as I have commented in Issue #71, make sure pip install . does not produce any warnings/errors. For example, even without the BigQuery update that @samanvp mentioned, we currently have oauth2client version issues and one can argue that we are just lucky that this is working.

@arostamianfar
Copy link
Contributor Author

RE google-cloud-storage dependency: I agree. Could we just use Beam's FileSystem instead? It adds a dependency on Beam, but would be simpler to work with and we'd just let Beam deal with the dependency issues.

RE Issue #71: Also agree, but that's a separate and more general issue to be resolved in #71. This issue is mainly for the particular BQ dependency.

@allieychen
Copy link
Contributor

Resolved after updating Beam version PR 470.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants