Genome not found, Picard's command line syntax is changing, violin plot not generated, expression.tsv not generated #84

Closed
abmmki opened this issue Jun 27, 2019 · 8 comments


abmmki commented Jun 27, 2019

Here I am reporting several errors/warnings.
I am using Fedora 30. The guideline (Wiki) says to install miniconda3. I tried that, but it did not pull in all the required dependencies, so I installed them manually from bioconda/conda-forge. However, during the "meta" step it complained about a missing library.so.1 .... (I forgot the exact name; it is not in my records now). So I deleted miniconda, then created a new environment with "conda create" and ran "conda activate myconda". Then I manually installed all the dependent packages, for example

conda install -c bioconda cutadapt

[0] Violin plot and expression.tsv not generated

However, it gave some errors, and in the end everything was generated except the violin plot and the desired expression matrix file "..../sample/read/expression.tsv".
All the other files (expression.long, expression.mtx, barcode.tsv) are there. This happened for both UMI and READ.
The knee plot looks good.
When I generated the final report, it listed the file ".../sample/expression.tsv", but there is actually no such file. Maybe the final report is generated without checking that the files actually exist? It also says there are files like

samples/....../Aligned.merged.bam
This file is also not at that location ......... Well, final.bam is there, but the report doesn't mention it.
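A quick way to see which of the files named in the report are really on disk is to feed the paths through a small existence check. The sketch below is only an illustration: `report_missing` is a hypothetical helper, not part of the pipeline, and it assumes you have already collected the paths from the report into a plain list, one per line.

```shell
# Hypothetical helper (not part of dropSeqPipe): read file paths from
# standard input, one per line, and print those that do not exist.
# Returns 1 if at least one path is missing, 0 otherwise.
report_missing() {
    missing=0
    while IFS= read -r path; do
        # Ignore empty lines in the list.
        if [ -z "$path" ]; then
            continue
        fi
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `report_missing < paths.txt` (where `paths.txt` is a list you wrote yourself) prints one `missing: ...` line per absent file.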
Note that I used my own downloaded genome, with a GFP gene sequence inserted in genome.fa and a matching annotation in annotation.gtf.
The base is Ensembl GRCh37, release 74. However, I used the file Homo_sapiens.GRCh37.74.dna.toplevel.fa.gz to cover additional genome locations such as the immunologic chromosomes.

[1] Genome FASTA file could not be downloaded:

In the next round I did not use my own genome.fa but let the pipeline download the genome (this time with no additional sequence).
I tried human genome 37, Ensembl release 74. The pipeline tried to download the file:

Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz

from the following location:

http://ftp.ensembl.org/pub/release-74/fasta/homo_sapiens/dna/

However, the actual file name there is (with ".74"):
Homo_sapiens.GRCh37.74.dna.primary_assembly.fa.gz

I fixed it manually (not via a regular expression).
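The mismatch above comes down to whether the release number is part of the file name. Here is a minimal sketch of building the name per release. Both function names are hypothetical and are not taken from the pipeline. The cut-off at release 75 is an assumption: only release 74 is confirmed in this report, so check the directory listing for other releases.

```shell
# Hypothetical helpers (not part of dropSeqPipe).
# Build the Ensembl primary-assembly FASTA file name for a release.
ensembl_fasta_name() {
    species="$1"; build="$2"; release="$3"
    # ASSUMPTION: releases up to 75 embed the release number in the file
    # name (confirmed here only for release 74); later releases do not.
    if [ "$release" -le 75 ]; then
        echo "${species}.${build}.${release}.dna.primary_assembly.fa.gz"
    else
        echo "${species}.${build}.dna.primary_assembly.fa.gz"
    fi
}

# Build the full download URL; the directory name is the lower-cased species.
ensembl_fasta_url() {
    dir=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    echo "http://ftp.ensembl.org/pub/release-$3/fasta/${dir}/dna/$(ensembl_fasta_name "$1" "$2" "$3")"
}

ensembl_fasta_url Homo_sapiens GRCh37 74
# prints http://ftp.ensembl.org/pub/release-74/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.74.dna.primary_assembly.fa.gz
```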

[2] I am using Drop-seq_tools-2.3.0/ and the latest Picard tools in it (2.20.2)

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp
INFO 2019-06-27 09:14:35 ReduceGtf

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:

[3] Violin plot not generated:

conda install -c bioconda r-seurat # could not make it **
conda install -c russh r-seurat
..... etc

I tried several conda repositories, and both the latest Seurat 3 and the old Seurat 2.
I am using R 3.5.3, but there was no way to get it installed.
It just says:

PackagesNotFoundError: The following packages are not available from current channels:

  • r-seurat -> r-cowplot
  • r-seurat -> r-future.apply
  • r-seurat -> r-ggridges
  • r-seurat -> r-ica
  • r-seurat -> r-metap
  • r-seurat -> r-pbapply
  • r-seurat -> r-rsvd
  • r-seurat -> r-sctransform[version='>=0.2.0']
  • r-seurat -> r-sdmtools
  • r-seurat -> r-dosnow
  • r-seurat -> r-dtw
  • r-seurat -> r-hdf5r
  • r-seurat -> r-diffusionmap
  • r-seurat -> r-tclust

Well, I installed those packages manually one by one, but Seurat installation was still not possible:

conda install -c bioconda r-seurat

UnsatisfiableError: The following specifications were found to be in conflict:

  • r-seurat

Thanks,

Author

abmmki commented Jun 27, 2019

The missing library is libcrypto.so.1.0.0 (seen at MergeBamAlignment):

ImportError: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory

Author

abmmki commented Jun 27, 2019

rule get_top_barcodes:
localrule multiqc_cutadapt_RNA:
...................
................
Conda environment defines Python version < 3.5. Using Python of the master process to execute script. Note that this cannot be avoided, because the script uses data structures from Snakemake which are Python >=3.5 only.

I don't know why this message appears, because I am using Python 3.6.

Author

abmmki commented Jun 28, 2019

The libcrypto.so.1.0.0 error could be solved by installing pysam from bioconda in the conda environment.

I see that an "expression.tsv" file is generated initially but is then replaced by the long-format file. I would like to keep the wide format for downstream analysis, although I can convert from long to wide using library(tidyr) and other R packages.
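If R is not at hand, the long-to-wide conversion can also be sketched with awk. This is an illustration under assumptions, not the pipeline's own method: `long_to_wide` is a hypothetical helper, and it assumes the long file has a header line followed by three tab-separated columns (gene, cell barcode, count). The real column order of the pipeline's long file may differ, so adjust the field numbers accordingly.

```shell
# Hypothetical helper: pivot a long table (gene, cell, count) into a wide
# gene-by-cell matrix. Gene/cell pairs absent from the input are written as 0.
# ASSUMPTION: tab-separated input with one header line.
long_to_wide() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 { next }    # skip the assumed header line
        {
            # Remember genes and cells in order of first appearance.
            if (!($1 in gene_seen)) { gene_seen[$1] = 1; genes[++ng] = $1 }
            if (!($2 in cell_seen)) { cell_seen[$2] = 1; cells[++nc] = $2 }
            count[$1, $2] = $3
        }
        END {
            line = "GENE"
            for (j = 1; j <= nc; j++) line = line OFS cells[j]
            print line
            for (i = 1; i <= ng; i++) {
                line = genes[i]
                for (j = 1; j <= nc; j++)
                    line = line OFS (((genes[i], cells[j]) in count) ? count[genes[i], cells[j]] : 0)
                print line
            }
        }
    ' "$1"
}
```

Usage would look like `long_to_wide expression.long > expression_wide.tsv`, where the output name is arbitrary. The wide matrix is dense, so it can become very large for many cells.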

So the "final report" also shows the paths and names of temporary files, which is somewhat misleading.

I still need help with the violin plot.

Owner

Hoohm commented Jul 9, 2019

Hey @abmmki

This is gonna be a tough one. The main purpose of conda is to help deal with those problematic package conflicts and all...

The main advice I would give you is to try to create the envs for each step then activate them manually for the specific step.

Is that not working for the violin plots?


grst commented Jul 10, 2019

When installing from bioconda you'll always have to use the conda-forge channel, too.
The problem with your missing R dependencies is most likely that they are only on conda-forge and not on bioconda. I.e. use

conda install -c bioconda -c conda-forge r-seurat

instead of just

conda install -c bioconda r-seurat

Also, what is the reason that you install packages manually?
Snakemake should do that for you.

Author

abmmki commented Jul 27, 2019

OK, Seurat finally installed in the following way (thanks @grst):

conda install -c conda-forge -c r -c bioconda r-seurat
conda install -c conda-forge icu=64.2

I re-ran the last part of the command ......... but now it gives another error:
Registered S3 method overwritten by 'R.oo':
  method        from
  throw.default R.methodsS3
Error in CreateSeuratObject(raw.data = umi_matrix, meta.data = metaData) :
unused argument (raw.data = umi_matrix)
Execution halted

Error in rule violine_plots:
jobid: 55

.................
I see that following files are in place.

barcodes.tsv
expression.mtx
features.tsv

Collaborator

seb-mueller commented Jul 27, 2019

Sorry for the struggle; it's rather difficult to manage versions with evolving software.
In this case, I suspect conda is automatically installing the latest Seurat (version 3), which has now gone live. However, at the moment the pipeline is written for version 2, which version 3 is not compatible with. An update is on the todo list, but for the time being, could you try version 2? The command below might do the trick (I haven't tested it though):

conda install -c conda-forge -c r -c bioconda r-seurat=2
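After pinning, it can help to confirm which major version actually ended up in the environment before re-running the plotting step. The sketch below only compares version strings; `has_major_version` is a hypothetical helper and not something the pipeline provides.

```shell
# Hypothetical helper: succeed only if a version string such as "2.3.4"
# has the expected major version.
has_major_version() {
    version="$1"; expected="$2"
    major=${version%%.*}    # strip everything from the first dot onward
    [ "$major" = "$expected" ]
}

if has_major_version "3.0.2" 2; then
    echo "Seurat 2 found"
else
    echo "not Seurat 2"
fi
# prints "not Seurat 2"
```

Assuming R and Seurat are installed in the active environment, the installed version string could be obtained with `Rscript -e 'cat(as.character(packageVersion("Seurat")))'` and passed in as the first argument.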

@Hoohm, this might also explain the empty Seurat objects you mentioned.
Can you check which version you are using? At this point we probably need to specify the Seurat version here:

TomKellyGenetics added a commit to TomKellyGenetics/dropSeqPipe that referenced this issue Nov 15, 2019
Owner

Hoohm commented Dec 27, 2019

Closing this for now since it seems to be fixed.

Hoohm closed this as completed Dec 27, 2019