Taiji fails during RNA-seq read of quant data #42

Open
dmalzl opened this issue Sep 1, 2023 · 1 comment

dmalzl commented Sep 1, 2023

Hi,

I am currently trying to run Taiji on a set of WT and KO RNA-seq and ATAC-seq data. To avoid interfering with previous analyses, I decided to use the existing gene quantification, which I generated with Subread's featureCounts and postprocessed to match the format detailed in the documentation (I assumed the gene expression values to be raw read counts, judging from the integers used in the format description). The ATAC-seq data is also supplied already aligned and duplicate-filtered.

The pipeline starts up and tries to read the RNA-seq data but fails with the following error:

[ERROR][09-01 14:03] RNA_Read_Input(7785..) Failed: user error (call: remote process died: DiedException "Prelude.read: no parse")
CallStack (from HasCallStack):
  error, called at src/Control/Workflow/Interpreter/Exec.hs:146:37 in SciFlow-0.8.0 IRKsT2ba9M716PeGlwt2FT:Control.Workflow.Interpreter.Exec

I tried to debug it myself but unfortunately couldn't locate the source code for RNA_Read_Input, and I have never worked with Haskell or the workflow manager Taiji uses, so I am quite lost here. Could you please look into it?

Please find the config, the input file, and an example of the RNA-seq quant tables attached (note that I had to change the suffixes to txt because GitHub wouldn't let me upload tsv and yml files). The RNA-seq quant results were produced by counting reads per exon and summing them per Ensembl gene_id. The resulting table was then filtered to keep only genes with at least 1 read in at least one of the samples (3 replicates per condition = 6 samples). The remaining genes were then mapped to their gene_name (i.e. the gene_name attribute in the GTF file).

rnaseq_KO2.txt
taiji_input.txt
taiji_config.txt
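
For reference, the postprocessing described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the script that produced the attached tables: the function name `summarize_counts`, the in-memory row layout, and the choice to sum counts when two gene IDs map to the same gene_name are all assumptions made for the example.

```python
def summarize_counts(exon_counts, gene_names):
    """Sum per-exon counts per gene, filter, and relabel by gene name.

    exon_counts: iterable of (gene_id, [count per sample]) rows, one per exon.
    gene_names:  dict mapping gene_id -> gene_name (hypothetical lookup,
                 e.g. built from the gene_name attribute of a GTF file).
    Returns a dict mapping gene_name -> [summed count per sample].
    """
    # Step 1: sum exon-level counts per gene_id, sample by sample.
    per_gene = {}
    for gene_id, counts in exon_counts:
        totals = per_gene.setdefault(gene_id, [0] * len(counts))
        for i, count in enumerate(counts):
            totals[i] += count

    # Step 2: keep only genes with at least 1 read in at least one sample.
    kept = {g: t for g, t in per_gene.items() if any(c >= 1 for c in t)}

    # Step 3: map gene_id to gene_name. Assumption: IDs without a name keep
    # their ID, and IDs sharing a name have their counts summed.
    by_name = {}
    for gene_id, totals in kept.items():
        name = gene_names.get(gene_id, gene_id)
        merged = by_name.setdefault(name, [0] * len(totals))
        for i, count in enumerate(totals):
            merged[i] += count
    return by_name
```

With two samples, for example, exon rows `("ENSG1", [1, 0])` and `("ENSG1", [2, 3])` collapse into a single gene row `[3, 3]`, while a gene whose exons are all `[0, 0]` is dropped.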


dmalzl commented Sep 1, 2023

Okay, it seems I have solved it myself. The culprit was that I had put the tag information in the format column, which is incorrect and results in a misconfigured run. After renaming the format column to tags, the pipeline now runs without problems.
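
For anyone hitting the same "no parse" error, a quick header check before launching the pipeline might catch this kind of mix-up. The following is only a sketch under the assumption that the columns live in a tab-separated table with a header row; the helper name `missing_columns` and the example header are hypothetical, and the list of columns Taiji actually expects should be taken from its documentation.

```python
import csv
import io


def missing_columns(tsv_text, expected):
    """Return the expected column names that are absent from the TSV header."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # The first row is treated as the header; empty input yields an empty header.
    header = [name.strip() for name in next(reader, [])]
    return [name for name in expected if name not in header]


# Hypothetical table in which the tag values were placed under "format".
example = "type\tid\tformat\nRNA-seq\tKO2\tSomeTag\n"
print(missing_columns(example, ["tags"]))
```

A non-empty result means an expected column is missing, which in my case would have pointed at the misnamed tags column right away.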
