Pipeline Error #1

Closed
davidepisu opened this issue Sep 10, 2017 · 14 comments

@davidepisu

Got this error while generating the Expression Matrix:

[Sun Sep 10 01:02:04 2017] Finished job 1.
[Sun Sep 10 01:02:04 2017] 4 of 5 steps (80%) done
[Sun Sep 10 01:02:04 2017]
[Sun Sep 10 01:02:04 2017] localrule all:
input: logs/MLW12_hist_out_cell.txt
log: logs/Dropseq_post_align.log
jobid: 0
[Sun Sep 10 01:02:04 2017]
[Sun Sep 10 01:02:04 2017] Finished job 0.
[Sun Sep 10 01:02:04 2017] 5 of 5 steps (100%) done
Mode is generate-plots
Generating multiqc report
[INFO ] multiqc : This is MultiQC v1.2
[INFO ] multiqc : Template : default
[INFO ] multiqc : Searching '/SSD/MLW12/logs'
[INFO ] multiqc : Searching '/SSD/MLW12/summary'
Searching 62 files.. [####################################] 100%
[INFO ] star : Found 2 reports
[INFO ] fastqc : Found 2 reports
[INFO ] multiqc : Compressing plot data
[INFO ] multiqc : Report : MLW12/multiqc_report.html
[INFO ] multiqc : Data : MLW12/multiqc_data
[INFO ] multiqc : MultiQC complete
Extracting expression
[Sun Sep 10 01:02:43 2017] Provided cores: 20
[Sun Sep 10 01:02:43 2017] Rules claiming more threads will be scaled down.
[Sun Sep 10 01:02:43 2017] Job counts:
count jobs
1 all
1 extract_expression
1 extract_umi_per_gene
1 gunzip
4
[Sun Sep 10 01:02:43 2017]
[Sun Sep 10 01:02:43 2017] rule extract_umi_per_gene:
input: MLW12_final.bam
output: logs/MLW12_umi_per_gene.tsv
jobid: 1
wildcards: sample=MLW12
[Sun Sep 10 01:02:43 2017]
[Sun Sep 10 01:02:43 2017] /programs/Drop-seq_tools-1.12/GatherMolecularBarcodeDistributionByGene I=MLW12_final.bam O=logs/MLW12_umi_per_gene.tsv CELL_BC_FILE=summary/MLW12_barcodes.csv
[Sun Sep 10 01:02:43 2017] rule extract_expression:
input: MLW12_final.bam
output: summary/MLW12_expression_matrix.txt.gz
jobid: 3
wildcards: sample=MLW12
[Sun Sep 10 01:02:43 2017]
[Sun Sep 10 01:02:43 2017] /programs/Drop-seq_tools-1.12/DigitalExpression I=MLW12_final.bam O=summary/MLW12_expression_matrix.txt.gz SUMMARY=summary/MLW12_dge.summary.txt CELL_BC_FILE=summary/MLW12_barcodes.csv MIN_BC_READ_THRESHOLD=1
[Sun Sep 10 01:02:44 EDT 2017] org.broadinstitute.dropseqrna.barnyard.DigitalExpression SUMMARY=summary/MLW12_dge.summary.txt OUTPUT=summary/MLW12_expression_matrix.txt.gz INPUT=MLW12_final.bam MIN_BC_READ_THRESHOLD=1 CELL_BC_FILE=summary/MLW12_barcodes.csv OUTPUT_READS_INSTEAD=false CELL_BARCODE_TAG=XC MOLECULAR_BARCODE_TAG=XM GENE_EXON_TAG=GE STRAND_TAG=GS EDIT_DISTANCE=1 READ_MQ=10 USE_STRAND_INFO=true RARE_UMI_FILTER_THRESHOLD=0.0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Sun Sep 10 01:02:44 EDT 2017] org.broadinstitute.dropseqrna.barnyard.GatherMolecularBarcodeDistributionByGene OUTPUT=logs/MLW12_umi_per_gene.tsv INPUT=MLW12_final.bam CELL_BC_FILE=summary/MLW12_barcodes.csv CELL_BARCODE_TAG=XC MOLECULAR_BARCODE_TAG=XM GENE_EXON_TAG=GE STRAND_TAG=GS EDIT_DISTANCE=1 READ_MQ=10 MIN_BC_READ_THRESHOLD=0 USE_STRAND_INFO=true RARE_UMI_FILTER_THRESHOLD=0.0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Sun Sep 10 01:02:44 EDT 2017] Executing as [email protected] on Linux 3.10.0-229.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Picard version: 1.12(d3aeea7_1452606774) IntelDeflater
[Sun Sep 10 01:02:44 EDT 2017] Executing as [email protected] on Linux 3.10.0-229.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Picard version: 1.12(d3aeea7_1452606774) IntelDeflater
[Sun Sep 10 01:02:44 EDT 2017] org.broadinstitute.dropseqrna.barnyard.DigitalExpression done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2022178816
Exception in thread "main" htsjdk.samtools.SAMException: Error opening file: MLW12_barcodes.csv
	at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:501)
	at picard.util.BasicInputParser.filesToInputStreams(BasicInputParser.java:172)
	at picard.util.BasicInputParser.<init>(BasicInputParser.java:78)
	at picard.util.BasicInputParser.<init>(BasicInputParser.java:91)
	at org.broadinstitute.dropseqrna.barnyard.ParseBarcodeFile.readCellBarcodeFile(ParseBarcodeFile.java:13)
	at org.broadinstitute.dropseqrna.barnyard.BarcodeListRetrieval.getCellBarcodes(BarcodeListRetrieval.java:47)
	at org.broadinstitute.dropseqrna.barnyard.DigitalExpression.doWork(DigitalExpression.java:74)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
	at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
	at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:29)
Caused by: java.io.FileNotFoundException: summary/MLW12_barcodes.csv (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:497)
	... 9 more

[Sun Sep 10 01:02:44 EDT 2017] org.broadinstitute.dropseqrna.barnyard.GatherMolecularBarcodeDistributionByGene done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2022178816
Exception in thread "main" htsjdk.samtools.SAMException: Error opening file: MLW12_barcodes.csv
	at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:501)
	at picard.util.BasicInputParser.filesToInputStreams(BasicInputParser.java:172)
	at picard.util.BasicInputParser.<init>(BasicInputParser.java:78)
	at picard.util.BasicInputParser.<init>(BasicInputParser.java:91)
	at org.broadinstitute.dropseqrna.barnyard.ParseBarcodeFile.readCellBarcodeFile(ParseBarcodeFile.java:13)
	at org.broadinstitute.dropseqrna.barnyard.BarcodeListRetrieval.getCellBarcodes(BarcodeListRetrieval.java:47)
	at org.broadinstitute.dropseqrna.barnyard.GatherMolecularBarcodeDistributionByGene.doWork(GatherMolecularBarcodeDistributionByGene.java:55)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
	at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
	at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:29)
Caused by: java.io.FileNotFoundException: summary/MLW12_barcodes.csv (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:497)
	... 9 more

[Sun Sep 10 01:02:44 2017] Error in job extract_expression while creating output file summary/MLW12_expression_matrix.txt.gz.
[Sun Sep 10 01:02:44 2017] Error in job extract_umi_per_gene while creating output file logs/MLW12_umi_per_gene.tsv.
[Sun Sep 10 01:02:44 2017] RuleException:
CalledProcessError in line 21 of /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/extract_expression_single.snake:
Command '/programs/Drop-seq_tools-1.12/DigitalExpression I=MLW12_final.bam O=summary/MLW12_expression_matrix.txt.gz SUMMARY=summary/MLW12_dge.summary.txt CELL_BC_FILE=summary/MLW12_barcodes.csv MIN_BC_READ_THRESHOLD=1' returned non-zero exit status 1.
File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/extract_expression_single.snake", line 21, in __rule_extract_expression
File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 55, in run
[Sun Sep 10 01:02:44 2017] RuleException:
CalledProcessError in line 34 of /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/extract_expression_single.snake:
Command '/programs/Drop-seq_tools-1.12/GatherMolecularBarcodeDistributionByGene I=MLW12_final.bam O=logs/MLW12_umi_per_gene.tsv CELL_BC_FILE=summary/MLW12_barcodes.csv' returned non-zero exit status 1.
File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/extract_expression_single.snake", line 34, in __rule_extract_umi_per_gene
File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 55, in run
[Sun Sep 10 01:02:44 2017] Removing output files of failed job extract_umi_per_gene since they might be corrupted:
logs/MLW12_umi_per_gene.tsv
[Sun Sep 10 01:02:44 2017] Will exit after finishing currently running jobs.
[Sun Sep 10 01:02:44 2017] Exiting because a job execution failed. Look above for error message
Traceback (most recent call last):
File "/programs/dropSeqPipe/bin/dropSeqPipe", line 11, in
load_entry_point('dropSeqPipe==0.23a0', 'console_scripts', 'dropSeqPipe')()
File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/main.py", line 223, in main
shell(extract_expression_single)
File "/usr/local/lib/python3.6/site-packages/snakemake/shell.py", line 88, in new
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/extract_expression_single.snake --cores 20 -pT -d /SSD/MLW12 --configfile /SSD/local.yaml ' returned non-zero exit status 1.

@Hoohm
Owner

Hoohm commented Sep 12, 2017

Normally, the generate-plots mode should create a file in the summary folder, so something went wrong there.
That is odd. Did you get any errors while running the generate-plots mode?

@Hoohm
Owner

Hoohm commented Sep 13, 2017

Can you provide the config.yaml file?
I suspect your data_type value is wrong.
Is it SingleCell instead of singleCell?

@davidepisu
Author

I can't attach the file here. I copied the settings from here: https://github.com/Hoohm/dropSeqPipe/wiki/Create-config-files

Anyway this is my config file:

Samples:
    MLW15:
        fraction: 0.001
        expected_cells: 2000
GENOMEREF: /SSD/ref/genome.fa
REFFLAT: /SSD/ref/annotation.refFlat
RRNAINTERVALS: /SSD/ref/genome.rRNA.intervals
METAREF: /SSD/ref/STAR_INDEX_NO_GTF/
GTF: /SSD/ref/annotation.gtf
SPECIES:
    - HUMAN
GLOBAL:
    5PrimeSmartAdapter: AAGCAGTGGTATCAACGCAGAGT
    data_type: SingleCell
    allowed_aligner_mismatch: 10
    min_count_per_umi: 1
    Cell_barcode:
        start: 1
        end: 12
        min_quality: 10
        num_below_quality: 1
    UMI:
        start: 13
        end: 20
        min_quality: 10
        num_below_quality: 1

@Hoohm
Owner

Hoohm commented Sep 13, 2017

OK, so data_type has to be either bulk or singleCell, and it is case-sensitive.
I will add some checks for this.
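
A minimal sketch of what such a check could look like, in Python. This is not the check that was later added to dropSeqPipe; the function name `check_data_type` and the error wording are hypothetical. It only assumes what is stated above: the value must be exactly `bulk` or `singleCell`, and case matters.

```python
# Hypothetical helper, not part of dropSeqPipe.
ALLOWED_DATA_TYPES = ("bulk", "singleCell")


def check_data_type(value):
    """Return value if it is an allowed data_type, otherwise raise ValueError."""
    if value in ALLOWED_DATA_TYPES:
        return value
    message = "data_type must be one of {}, got {!r}".format(ALLOWED_DATA_TYPES, value)
    # Offer the correct spelling when only the letter case differs.
    for allowed in ALLOWED_DATA_TYPES:
        if str(value).lower() == allowed.lower():
            message += " (did you mean {!r}?)".format(allowed)
    raise ValueError(message)
```

With this sketch, `check_data_type("SingleCell")` raises a `ValueError` that suggests `singleCell`, instead of letting the pipeline run on and fail later on a missing file.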

@davidepisu
Author

Ok, I can try running the pipeline on another sample, setting singleCell instead of SingleCell in the config file.

@Hoohm
Owner

Hoohm commented Sep 13, 2017

Added a check for the data_type value.
Please let me know if that fixed the issue.

@davidepisu
Author

I'm still getting an error, this time at the fastqc step:

/programs/FastQC-0.11.5/ MLW4_R1.fastq.gz MLW4_R2.fastq.gz -t 2 -o logs --extract
/bin/bash: /programs/FastQC-0.11.5/: Is a directory
[Sat Oct 14 23:39:40 2017] Error in job fastqc while creating output file logs/MLW4_R1_fastqc.html.
[Sat Oct 14 23:39:40 2017] RuleException:
CalledProcessError in line 23 of /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/fastqc.snake:
Command '/programs/FastQC-0.11.5/ MLW4_R1.fastq.gz MLW4_R2.fastq.gz -t 2 -o logs --extract' returned non-zero exit status 126.
File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/fastqc.snake", line 23, in __rule_fastqc
File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 55, in run
[Sat Oct 14 23:39:40 2017] Will exit after finishing currently running jobs.
[Sat Oct 14 23:39:40 2017] Exiting because a job execution failed. Look above for error message
Traceback (most recent call last):
File "/programs/dropSeqPipe/bin/dropSeqPipe", line 11, in
load_entry_point('dropSeqPipe==0.23a0', 'console_scripts', 'dropSeqPipe')()
File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/main.py", line 113, in main
shell(fastqc)
File "/usr/local/lib/python3.6/site-packages/snakemake/shell.py", line 88, in new
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/fastqc.snake --cores 60 -pT -d /SSD/MLW4 --configfile /SSD/local.yaml ' returned non-zero exit status 1.

Pipeline has been updated to 0.24

@Hoohm
Owner

Hoohm commented Oct 16, 2017

Oh, I see now.
Your fastqc path is wrong. You probably pointed it at the FastQC folder, something like: /path/to/fastqcFOLDER
It should be: /path/to/fastqc
where fastqc is the executable itself.
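
One way to catch this before any job starts is to test the configured path up front. The following Python sketch is only an illustration under that assumption; `check_executable` is a hypothetical helper, not something dropSeqPipe provides.

```python
import os


def check_executable(path):
    """Return path if it is an executable file, otherwise raise ValueError."""
    if os.path.isdir(path):
        # This is the situation behind "/bin/bash: ...: Is a directory".
        raise ValueError("{} is a directory, expected the executable inside it".format(path))
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        raise ValueError("{} is not an executable file".format(path))
    return path
```

Called with a folder such as /programs/FastQC-0.11.5/, the sketch raises immediately with a message naming the directory, rather than failing inside the shell with exit status 126.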

@davidepisu
Author

Oh ok, now I get the following error:

Mode is generate-plots
Plotting knee plots
Error in file(con, "r") : cannot open the connection
Calls: yaml.load_file -> yaml.load -> paste -> readLines -> file
In addition: Warning message:
In file(con, "r") :
cannot open file '/SSD/MLW4config.yaml': No such file or directory
Execution halted
Traceback (most recent call last):
File "/programs/dropSeqPipe/bin/dropSeqPipe", line 11, in
load_entry_point('dropSeqPipe==0.23a0', 'console_scripts', 'dropSeqPipe')()
File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/main.py", line 180, in main
shell(knee_plot)
File "/usr/local/lib/python3.6/site-packages/snakemake/shell.py", line 88, in new
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'Rscript /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Rscripts/singleCell/knee_plot.R /SSD/MLW4' returned non-zero exit status 1.

@Hoohm
Owner

Hoohm commented Oct 16, 2017

Hello,
I know there is some error handling left to do, but this one is actually pretty straightforward:
cannot open file '/SSD/MLW4config.yaml': No such file or directory
This means you forgot the slash at the end of your -f arg.
You should use -f /SSD/MLW4/ instead of -f /SSD/MLW4
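
The path in the error message, /SSD/MLW4config.yaml, looks like the folder and the file name were glued together without a separator. As an illustration only (this is not the pipeline's actual code, and `config_path` is a hypothetical name), joining the two with `os.path.join` gives the same result with or without the trailing slash on a POSIX system:

```python
import os


def config_path(sample_folder):
    """Build the path to config.yaml inside sample_folder."""
    return os.path.join(sample_folder, "config.yaml")


# Plain string concatenation depends on the trailing slash:
print("/SSD/MLW4" + "config.yaml")    # /SSD/MLW4config.yaml
# os.path.join inserts the separator only when it is missing:
print(config_path("/SSD/MLW4"))       # /SSD/MLW4/config.yaml
print(config_path("/SSD/MLW4/"))      # /SSD/MLW4/config.yaml
```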

@davidepisu
Author

Gotcha, I think the problems are arising from a bad configuration file anyway. Now I get the following:

Mode is generate-plots
Plotting knee plots
Warning message:
In readLines(input, encoding = "UTF-8") :
incomplete final line found on '/SSD/MLW10/config.yaml'
Warning message:
Removed 1425492 rows containing missing values (geom_point).
Plotting base stats
Loading required package: magrittr
Warning message:
In readLines(input, encoding = "UTF-8") :
incomplete final line found on '/SSD/MLW10/config.yaml'
Error in mmm < each : comparison of these types is not implemented
Calls: plotRNAMetrics ... Reduce -> f -> rbind_gtable -> compare_unit -> unit -> comp
Execution halted
Traceback (most recent call last):
File "/programs/dropSeqPipe/bin/dropSeqPipe", line 11, in
load_entry_point('dropSeqPipe==0.23a0', 'console_scripts', 'dropSeqPipe')()
File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/main.py", line 182, in main
shell(base_summary)
File "/usr/local/lib/python3.6/site-packages/snakemake/shell.py", line 88, in new
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'Rscript /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Rscripts/singleCell/rna_metrics.R /SSD/MLW10/' returned non-zero exit status 1.

My config.yaml is as follows:

Samples:
    MLW10:
        fraction: 0.001
        expected_cells: 2000
GENOMEREF: /SSD/ref/genome.fa
REFFLAT: /SSD/ref/annotation.refFlat
RRNAINTERVALS: /SSD/ref/genome.rRNA.intervals
METAREF: /SSD/ref/STAR_INDEX_NO_GTF/
GTF: /SSD/ref/annotation.gtf
SPECIES:
    - HUMAN
GLOBAL:
    5PrimeSmartAdapter: AAGCAGTGGTATCAACGCAGAGT
    data_type: singleCell
    allowed_aligner_mismatch: 10
    min_count_per_umi: 1
    Cell_barcode:
        start: 1
        end: 12
        min_quality: 10
        num_below_quality: 1
    UMI:
        start: 13
        end: 20
        min_quality: 10
        num_below_quality: 1

So I don't see which lines I'm missing.
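
A note on the "incomplete final line found" warning: R's readLines prints it when a file does not end with a newline character, so it does not point to missing config entries, and it is only a warning, separate from the "comparison of these types is not implemented" error. If the warning is a nuisance, a small Python sketch like the one below can append the missing newline; `ensure_trailing_newline` is a hypothetical helper name.

```python
import os


def ensure_trailing_newline(path):
    """Append a newline if the file does not end with one. Return True if changed."""
    with open(path, "rb+") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False                  # empty file, nothing to fix
        handle.seek(-1, os.SEEK_END)      # step back to the last byte
        if handle.read(1) == b"\n":
            return False                  # already ends with a newline
        handle.seek(0, os.SEEK_END)
        handle.write(b"\n")
        return True
```

Running it once on the config file (for example /SSD/MLW10/config.yaml) should be enough; a second call returns False because the file already ends with a newline.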

@Hoohm
Owner

Hoohm commented Nov 2, 2017

@davidepisu the issue should be resolved thanks to @duyck
Did it fix it for you?

@Hoohm
Owner

Hoohm commented Jan 23, 2018

Hello @davidepisu,
could you test it out on the new version and tell me if it's fixed?

@Hoohm
Owner

Hoohm commented Mar 21, 2018

No response so I'll close the issue.
