
Fix incompatible httplib2 versions #564

Merged 1 commit into googlegenomics:master on Mar 12, 2020
Conversation

@samanvp (Member) commented Mar 12, 2020

Our build is broken due to this error message:

pkg_resources.ContextualVersionConflict: (httplib2 0.17.0 (/home/travis/build/googlegenomics/gcp-variant-transforms/.eggs/httplib2-0.17.0-py2.7.egg), Requirement.parse('httplib2<=0.12.0,>=0.8'), set(['apache-beam']))
Coverage.py warning: No data was collected. (no-data-collected)

For more info please refer to #71 (comment)

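The conflict above can be resolved by pinning httplib2 to the range that apache-beam accepts. A minimal check of that range with `pkg_resources`, using the versions taken from the error message above:

```python
from pkg_resources import Requirement

# apache-beam requires 'httplib2>=0.8,<=0.12.0', but the build picked
# up httplib2 0.17.0 from the .eggs cache, which is outside that range.
beam_requirement = Requirement.parse('httplib2<=0.12.0,>=0.8')

print('0.17.0' in beam_requirement)  # False: the version that broke the build
print('0.12.0' in beam_requirement)  # True: the newest version Beam accepts
```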
@samanvp samanvp requested a review from tneymanov March 12, 2020 19:15
samanvp added a commit to samanvp/gcp-variant-transforms that referenced this pull request Mar 12, 2020
@samanvp samanvp merged commit 8daf1a7 into googlegenomics:master Mar 12, 2020
@samanvp samanvp deleted the fix_build branch March 12, 2020 20:42
samanvp added a commit to samanvp/gcp-variant-transforms that referenced this pull request Mar 18, 2020
samanvp added a commit that referenced this pull request Mar 18, 2020
* Re-enable all tests except the following:
 * option_update_schema_on_append
 * test_non_splittable_gzip
 * test_splittable_gzip

Also add a new test:
* option_append_to_table

* Add --keep_intermediate_avro_files flag

Also refactored vcf_to_bq.py slightly to make it better organized, and added enough checks to the Avro-to-BigQuery load stage for it to recover from failures.
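A minimal sketch of how such a flag could be declared; the flag name mirrors the commit above, but the parser location and help text are assumptions, not the repo's actual code:

```python
import argparse

# Illustrative sketch of the new flag; defaults and wording are assumptions.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--keep_intermediate_avro_files',
    action='store_true',
    help='Do not delete the Avro files written before the BigQuery load, '
         'so a failed load can be retried without rerunning the pipeline.')

args = parser.parse_args(['--keep_intermediate_avro_files'])
print(args.keep_intermediate_avro_files)  # True
```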

* Run AVRO load jobs to BigQuery in parallel

Our tests show that for big inputs (for example, 1000 Genomes), loading Avro files into BigQuery can take up to six minutes per chromosome.
If we ran the 24 jobs serially, the total delay would be significant.
For this reason, this commit modifies the code to run the load jobs in parallel.
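The change can be sketched with a generic parallel runner; `load_fn` stands in for the actual per-chromosome BigQuery load and is an assumption, not the repo's function:

```python
from concurrent.futures import ThreadPoolExecutor

def run_load_jobs_in_parallel(load_fn, chromosomes, max_workers=None):
    # Submit one load job per chromosome, then wait for all of them.
    # f.result() re-raises any exception from its job, so a failed
    # load still surfaces instead of being silently dropped.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(load_fn, chrom) for chrom in chromosomes]
        return [f.result() for f in futures]
```

With 24 chromosomes in flight, the wall-clock time approaches that of the slowest single load rather than the sum of all of them.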

* Remove platinum-no-merge test

It seems the majority of failed Avro load jobs belong to this test.
Still investigating; removed temporarily.

So far we have disabled the following tests (still investigating):
test_non_splittable_gzip
test_splittable_gzip

Temporarily re-enabled:
platinum_no_merge

* Limit the number of concurrent Avro load jobs

To avoid failing load jobs, which seem to be caused by running too many
jobs concurrently.

* Use BigQuery API for loading AVRO files and creating BQ tables

instead of using the `bq load ...` command.

* To sync with PR #564

* Second round of review

* Sync to #566