Produced VCFs are claimed to be malformed by IGV #57

Open
priesgo opened this issue Oct 24, 2023 · 2 comments

Comments

@priesgo
Member

priesgo commented Oct 24, 2023

When trying to load one of the produced VCFs in IGV, it fails with the following error message:

The provided VCF file is malformed at approximately line number 69: The VCF specification does not allow for whitespace in the INFO field. Offending field value was "DP=29;AF=0.103448;SB=0;DP4=13,13,1,2;INDEL;HRUN=5;ANN=C|frameshift_variant|HIGH|ORF1ab|gene-GU280_gp01|transcript|TRANSCRIPT_gene-GU280_gp01|protein_coding|1/1|c.10122delT|p.S3376fs|10122/21290|10122/21290|3374/7095||WARNING_TRANSCRIPT_MULTIPLE_STOP_CODONS;LOF=(ORF1ab|gene-GU280_gp01|1|1.00);CONS_HMM_SARS_COV_2=0.57215;CONS_HMM_SARBECOVIRUS=0.57215;CONS_HMM_VERTEBRATE_COV=0;PFAM_NAME=Peptidase_C30_CoV;PFAM_DESCRIPTION=Peptidase C30,coronavirus;vafator_af=0.103448;vafator_ac=3;vafator_dp=29",

Apparently, the PFAM_DESCRIPTION field contains whitespace ("Peptidase C30,coronavirus"). A possible solution would affect both the pipeline and the processor. The pipeline would need to generate a valid VCF, for instance by replacing whitespace with underscores. The processor would then need to turn those underscores back into whitespace when loading the data into the database. One possible problem with this implementation is that INFO fields may contain other underscores (e.g. PFAM_NAME=Peptidase_C30_CoV) that we don't want to replace with whitespace.
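A minimal Python sketch of the underscore round-trip described above. The helper names (`escape_spaces`, `unescape_spaces`) are hypothetical and not taken from the actual pipeline or processor; the two values are the Pfam fields from the IGV error message. It shows why blindly replacing underscores with spaces on load is lossy.

```python
# Hypothetical helpers: the pipeline swaps spaces for underscores,
# the processor swaps them back when loading into the database.

def escape_spaces(value: str) -> str:
    """Pipeline side: make the value VCF-safe by replacing spaces."""
    return value.replace(" ", "_")


def unescape_spaces(value: str) -> str:
    """Processor side: turn every underscore back into a space."""
    return value.replace("_", " ")


description = "Peptidase C30,coronavirus"  # PFAM_DESCRIPTION from the error
name = "Peptidase_C30_CoV"                 # PFAM_NAME already has underscores

escaped = escape_spaces(description)
print(escaped)                   # Peptidase_C30,coronavirus
print(unescape_spaces(escaped))  # Peptidase C30,coronavirus (round-trips)

# The processor cannot tell an escaped space from a genuine underscore,
# so any value that already contained underscores gets corrupted.
print(unescape_spaces(escape_spaces(name)))  # Peptidase C30 CoV
```

Restricting the unescaping to PFAM_DESCRIPTION only narrows the problem; a description that itself contains an underscore would still be altered.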

@priesgo
Member Author

priesgo commented Oct 24, 2023

Three options at least:

  • Escape whitespace with something like underscores
  • Escape whitespace with HTML codes and expect that IGV & friends parse this properly
  • Integrate Pfam annotations into the SnpEff reference for SARS-CoV-2 and see what SnpEff does with whitespace
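An escaping scheme avoids the underscore ambiguity if it is reversible, i.e. it also escapes its own escape character. The sketch below swaps in URL-style percent-encoding (not HTML entity codes) using Python's standard `urllib.parse.quote`/`unquote`; VCF 4.3 uses a similar percent-encoding for characters with special meaning (e.g. `%3B` for `;`). The function names are hypothetical, and whether IGV and other consumers decode such values is exactly what this option would still need to verify.

```python
from urllib.parse import quote, unquote


def encode_info_value(value: str) -> str:
    # safe="" encodes everything except letters, digits and "_.-~",
    # so " " becomes %20, "," becomes %2C and "%" itself becomes %25.
    return quote(value, safe="")


def decode_info_value(value: str) -> str:
    # Exact inverse of encode_info_value: existing underscores survive.
    return unquote(value)


description = "Peptidase C30,coronavirus"
name = "Peptidase_C30_CoV"

print(encode_info_value(description))  # Peptidase%20C30%2Ccoronavirus
print(encode_info_value(name))         # Peptidase_C30_CoV (unchanged)
print(decode_info_value(encode_info_value(name)) == name)  # True
```

Because `%` is always encoded, decoding can never confuse an escape sequence with original text, which is the property the underscore approach lacks.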

@priesgo
Member Author

priesgo commented Oct 24, 2023

Fourth option: remove the long Pfam description altogether if it is not used in the dashboard.
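Dropping a single key from an INFO string is straightforward because the VCF spec forbids semicolons inside INFO values. A minimal Python sketch with a hypothetical helper name, run on a shortened version of the offending INFO value; it handles flag entries such as `INDEL` that have no `=`. The matching `##INFO=<ID=PFAM_DESCRIPTION,...>` header line would need removing as well. In practice, a tool such as `bcftools annotate -x INFO/PFAM_DESCRIPTION` can do this without custom code.

```python
def drop_info_keys(info: str, keys_to_drop: set[str]) -> str:
    """Remove INFO entries whose key is in keys_to_drop.

    Entries are separated by ";"; a key is the text before the first "="
    (or the whole entry for flags like INDEL). Returns "." if nothing is
    left, which is how VCF represents an empty INFO column.
    """
    kept = [
        entry
        for entry in info.split(";")
        if entry.split("=", 1)[0] not in keys_to_drop
    ]
    return ";".join(kept) if kept else "."


info = ("DP=29;INDEL;PFAM_NAME=Peptidase_C30_CoV;"
        "PFAM_DESCRIPTION=Peptidase C30,coronavirus;vafator_af=0.103448")
print(drop_info_keys(info, {"PFAM_DESCRIPTION"}))
# DP=29;INDEL;PFAM_NAME=Peptidase_C30_CoV;vafator_af=0.103448
```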
