Avro test #7192

Merged
RoriCremer merged 20 commits into ah_var_store from rc-avro-test
Apr 22, 2021

RoriCremer
Contributor

No description provided.

gatk-bot commented Apr 9, 2021

Travis reported job failures from build 33622
Failures in the following jobs:

Test Type    JDK        Job ID    Logs
unit         openjdk11  33622.13  logs
unit         openjdk8   33622.3   logs

@@ -0,0 +1,100 @@
30,HG00561
Member

probably could use just a few samples, and use a subsetted interval list, so that the test files can be small. there's a small interval list in the CreateVariantIngestFiles integration test resources, if that helps!

Contributor

Yes please! Just take a few (< 5) samples and a few hundred sites.

public class ExtractCohortEngineTest extends CommandLineProgramTest {

private static final String cohortAvroFileName = // is it okay to just grab one?
"src/test/java/org/broadinstitute/hellbender/tools/variantdb/nextgen/AnVIL_WGS_100_exported_cohort/000000000000";
Member

can use getToolTestDataDir() to get the path
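
As a rough illustration of why a helper is preferable to a hard-coded path: the sketch below derives a test-data directory from a package name and a tool name, following the `src/test/resources/<package path>/<tool name>/` layout that the `output.vcf` path quoted elsewhere in this review suggests. `TestDataDirs` and `toolTestDataDir` are hypothetical names used only for this example; this is not the body of GATK's `getToolTestDataDir()`.

```java
/** Hypothetical helper; illustrates a convention-based test data directory. */
final class TestDataDirs {

    /** Assumed root directory for checked-in test resources. */
    static final String TEST_RESOURCES_ROOT = "src/test/resources/";

    private TestDataDirs() {}

    /**
     * Builds "<root>/<package as path>/<tool name>/" so that tests never
     * spell out the full directory by hand.
     */
    static String toolTestDataDir(final String packageName, final String toolName) {
        return TEST_RESOURCES_ROOT + packageName.replace('.', '/') + "/" + toolName + "/";
    }
}
```

With a helper like this, a test only appends the file name, so moving the resources directory means changing one constant rather than every test.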

gatk-bot commented Apr 12, 2021

Travis reported job failures from build 33661
Failures in the following jobs:

Test Type    JDK        Job ID    Logs
unit         openjdk11  33661.13  logs
unit         openjdk8   33661.3   logs

gatk-bot commented Apr 12, 2021

Travis reported job failures from build 33681
Failures in the following jobs:

Test Type    JDK        Job ID    Logs
unit         openjdk11  33681.13  logs
unit         openjdk8   33681.3   logs

gatk-bot commented Apr 14, 2021

Travis reported job failures from build 33742
Failures in the following jobs:

Test Type    JDK        Job ID    Logs
unit         openjdk11  33742.13  logs
unit         openjdk8   33742.3   logs

RoriCremer marked this pull request as ready for review April 14, 2021 21:28

gatk-bot commented Apr 14, 2021

Travis reported job failures from build 33764
Failures in the following jobs:

Test Type    JDK        Job ID    Logs
unit         openjdk11  33764.13  logs
integration  openjdk11  33764.12  logs

Contributor

kcibul commented Apr 16, 2021

One concern I have is the maintainability of the test (having been burned by this in other places myself). When we add a new output field, etc., we need a very easy way to update or regenerate these results. At the very least, some instructions would be helpful (and I imagine someone would follow those as part of a PR).

Member

mmorgantaylor left a comment

Looks good! Some small comments, and I think we shouldn't have a 23 MB test file, but other than that 👍

// if there is an avro file, the BQ specific parameters are unnecessary,
// but they all are required if there is no avro file
if (cohortAvroFileName == null && (projectID == null || cohortTable == null)) {
throw new UserException("a project id and cohort table are required if no avro file is provided");
Member

super small, but for ease of the user, could you provide the argument flags corresponding to project id and cohort table? e.g. "Project id (--project-id) and cohort table (--cohort-extract-table) are required if no avro file is provided."


class ExtractCohortTest extends CommandLineProgramTest {
private final String prefix = getToolTestDataDir();
private final String cohortAvroFileName = prefix + "000000000000.avro";
Member

this file is 22.8 MB - can we make it smaller?

)
private String cohortTable = null;

@Argument(
fullName = "cohort-avro-file-name",
doc = "Path of the cohort avro file",
mutex={"cohort-extract-table"},
Member

i'm so sorry to be this person but can you put spaces before and after the = on all these mutex sets so it's consistent?

// but they all are required if there is no avro file
if (cohortAvroFileName == null && (projectID == null || cohortTable == null)) {
throw new UserException("Project id (--project-id) " +
"and cohort table (--cohort-extract-table) are required if no avro file is provided.");
Member

❤️

Contributor

this is awesome -- could you also add a similar check for the sample-file, sample-table parameters? I think it's the same logic (if you don't give a sample file, you need both a sample-table and a project-id)
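
A minimal sketch of what the suggested pair of checks might look like, written against plain JDK types so it stands alone. `ExtractArgChecks` and its method names are hypothetical, `IllegalArgumentException` stands in for GATK's `UserException`, and the `--sample-table` flag spelling is an assumption inferred from the parameter name in the comment above; the real tool may name it differently.

```java
/** Hypothetical stand-in for the tool's argument validation; not the PR's actual code. */
final class ExtractArgChecks {

    private ExtractArgChecks() {}

    /**
     * Shared rule: if no local file is given, both the project id and the
     * corresponding BigQuery table must be supplied.
     */
    static void requireFileOrTable(final String fileName, final String projectID, final String tableName,
                                   final String tableLabel, final String tableFlag, final String fileLabel) {
        if (fileName == null && (projectID == null || tableName == null)) {
            // IllegalArgumentException is used here in place of UserException.
            throw new IllegalArgumentException("Project id (--project-id) and " + tableLabel
                    + " (" + tableFlag + ") are required if no " + fileLabel + " is provided.");
        }
    }

    /** Applies the same rule to the cohort inputs and to the sample inputs. */
    static void validate(final String cohortAvroFileName, final String projectID, final String cohortTable,
                         final String sampleFileName, final String sampleTableName) {
        requireFileOrTable(cohortAvroFileName, projectID, cohortTable,
                "cohort table", "--cohort-extract-table", "avro file");
        // "--sample-table" is an assumed flag name for this illustration.
        requireFileOrTable(sampleFileName, projectID, sampleTableName,
                "sample table", "--sample-table", "sample file");
    }
}
```

Factoring the rule into one method keeps the two error messages consistent and makes it cheap to add a third file-or-table pair later.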

Contributor

kcibul left a comment

Looks great, Rori! Just make sure that the tests still pass after removing output.vcf from your commit/PR.


@Test
public void testFinalVCFfromAvro() throws Exception {
// To create the expected output file, create a temp table in BQ with the following query
Contributor

awesome

@@ -0,0 +1,3 @@
2,HG00408
Contributor

nit: can this be sorted?
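
If "sorted" here means ascending by the numeric ID in the first column (an assumption; the comment does not say which column), re-sorting the lines of such an `id,sample_name` file could be sketched as follows. `SampleMapSorter` is a hypothetical name used only for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper that orders "id,sample_name" lines by numeric id. */
final class SampleMapSorter {

    private SampleMapSorter() {}

    /** Parses the text before the first comma as a long. */
    static long sampleId(final String line) {
        return Long.parseLong(line.substring(0, line.indexOf(',')).trim());
    }

    /** Returns a sorted copy; the input list is left unchanged. */
    static List<String> sortBySampleId(final List<String> lines) {
        final List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingLong((String line) -> sampleId(line)));
        return sorted;
    }
}
```

Comparing the IDs as numbers rather than as strings matters once IDs have different lengths, since a plain string sort would place `10` before `2`.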

runCommandLine(args);

final File expectedVCF = getTestFile("expected.vcf");
final File outputVCF = getTestFile("output.vcf");
Contributor

Is this going to get the output.vcf that you checked in as part of this commit? It should be the output of the run above (which I think won't live in the same place as getTestFile()). You can tell for sure by removing src/test/resources/org/broadinstitute/hellbender/tools/variantdb/nextgen/ExtractCohort/output.vcf from your PR (which I think shouldn't be committed anyway).
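
One way to avoid reading a checked-in `output.vcf` is to have the tool write to a fresh temporary file and compare that file against `expected.vcf`. The sketch below uses only JDK calls; `TempOutputComparison` and its methods are hypothetical names, and a GATK test would more likely rely on the project's own temp-file and comparison helpers (such as the `IntegrationTestSpec.assertEqualTextFiles` call seen in this PR).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper: per-run output file plus a line-by-line comparison. */
final class TempOutputComparison {

    private TempOutputComparison() {}

    /** Creates an empty file in the system temp directory, removed when the JVM exits. */
    static Path newTempOutput(final String prefix, final String suffix) throws IOException {
        final Path out = Files.createTempFile(prefix, suffix);
        out.toFile().deleteOnExit();
        return out;
    }

    /** Returns true when both files contain the same lines in the same order. */
    static boolean sameLines(final Path actual, final Path expected) throws IOException {
        final List<String> actualLines = Files.readAllLines(actual, StandardCharsets.UTF_8);
        final List<String> expectedLines = Files.readAllLines(expected, StandardCharsets.UTF_8);
        return actualLines.equals(expectedLines);
    }
}
```

Because the output path is created per run, a stale file in the resources directory can no longer make the comparison pass by accident.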

.add("sample-file", sampleFile);

runCommandLine(args);
IntegrationTestSpec.assertEqualTextFiles(outputVCF, expectedVCF);
Member

do you also check the index file?

Contributor Author

this feels like it would be outside the scope of the test, no? a GATK issue?

Member

yeah, totally - i just wasn't sure why you were checking in the expected index file in that case

RoriCremer merged commit d3affb1 into ah_var_store Apr 22, 2021
RoriCremer deleted the rc-avro-test branch April 22, 2021 04:15
This was referenced Mar 17, 2023