Container & documentation #3

Zethson · 2021-10-25T07:55:57Z

Hi,

would it be possible that you provide a container with a full environment to get the pipeline running, please? A small example dataset would also help a lot.
Moreover, the pipeline parameters and flags etc are not document yet.

Thank you very much!

petervangalen · 2021-12-07T15:07:22Z

Do the recent updates answer your question? I believe the small example dataset is present in tests and can be run along these lines:

maegtk bcall -i data/test_maester.bam -o test_maester -z

Zethson · 2021-12-07T15:13:23Z

Currently doing other things. Will revisit this at a later point. Thank you for replying though!

JZL · 2021-12-14T15:12:15Z

On a similar note, adding the R package dependencies to the FAQ would be helpful. I was moving fast so didn't realize it also needed SummarizedExperiment so it almost finished and then failed

For what it's worth, I didn't do a full container but used conda-forge to install the dependencies for only this project, and it was nice. You could even set up a maegatk conda(-forge) package and it would take care of installing all the binary dependencies for future users

caleblareau · 2021-12-17T17:36:37Z

added some notes on the R package install-- thanks for that note @JZL

I don't have any direct experience making a conda package but would welcome a pull request from anyone for this!

JZL · 2021-12-17T18:02:59Z

Hi,

These are just some scattered thoughts getting it to run. I can make a more formal PR with these changes in the future

To get this running for future people. You also need to install samtools and run bwa index ***.fa first, else it fails.

Additionally:

pysam, bwa mem, and samtools sort support a threads argument which seemed to help speed it up a little. (Although I learned the hard way that if you do this within multi_file_manager it causes a thread bomb an crashes the machine!)
In

maegatk/maegatk/bin/python/split_barcoded_bam.py

Line 39 in 6662c7c

if read_barcode in bc:

within writePassingReads where it checks if read_barcode is in bc and then gets its index, when bc is a list it iteratively checks all the items for each call. Which is a lot of iterative checking for every read of big bams. I didn't profile the difference but I thought I might as well define bcDict = dict((x, i) for i, x in enumerate(bc)) and then use the dictionary to check if the read_barcode is in bc and to get the index in constant time

alecpnkw · 2024-03-29T02:59:14Z

Do the recent updates answer your question? I believe the small example dataset is present in tests and can be run along these lines:
maegtk bcall -i data/test_maester.bam -o test_maester -z

I am having a similar issue when trying to run this small test dataset as described. Is there a specific snakemake version that maegatk has been tested with? Trying to isolate what is going wrong for me. The pipeline seems to run fine for the 'splitting' part, but then fails to submit the snakemake job and perform the remaining analyses:

Thu Mar 28 22:36:19 EDT 2024: maegatk v0.2.0
Thu Mar 28 22:36:19 EDT 2024: Found bam file: tests/data/test_maester.bam for genotyping.
Thu Mar 28 22:36:19 EDT 2024: Will determine barcodes with at least: 100 mitochondrial reads.
Thu Mar 28 22:36:19 EDT 2024: User specified mitochondrial genome matches .bam file
Thu Mar 28 22:36:23 EDT 2024: Finished determining/splitting barcodes for genotyping.
Thu Mar 28 22:36:23 EDT 2024: Genotyping samples with 2 threads

where the above is the end of the log message. If I look in "final/" this is all I see:

[final]$ ls
barcodeQuants.tsv  chrM_refAllele.txt  passingBarcodes.tsv

My environment for running maegatk looks like this and is based on your description in the readme.

name: "maegatk"
channels: 
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - openjdk
  - python >= 3.8
  - snakemake == 6.15.5
  - bwa == 0.7.17
  - samtools >= 1.9
  - freebayes
  - r-base == 4.0.3 # dplyr, data.table, Matrix, GenomicRanges, SummarizedExperiment
  - pip:
    - maegatk == 0.2.0
    - ruamel.yaml <= 0.18.0

Thanks!

noranekonobokkusu · 2024-03-29T15:01:10Z

@alecpnkw do you have any log files in logs/? My snakemake version is 7.18.2, but I don't think maegatk was ever tested with a particular version of it.

If nothing informative comes from logs/, try modifying snakecmd_scatter and snakecmd_gather commands in your cli.py by providing a full path to snakemake in case your system doesn't see it, like

snakecmd_scatter = '/full/path/to/snakemake'+snakeclust+' --snakefile ' + script_dir + '/bin/snake/Snakefile.maegatk.Scatter --cores '+ncores+' --config cfp="'  + y_s + '" --stats '+snake_stats + snake_log_out

If that won't work either, try this:
Following this issue, try upgrading ruamel.yaml to the current version, then make the modifications described below, then remove the output directory and rerun it from scratch:

Replace your entire cli.py with the code from https://github.com/caleblareau/maegatk/blob/master/maegatk/cli.py
Modify oneSample_maegatk.py:
change this
from ruamel import yaml
into this
from ruamel.yaml import YAML
then change this
config = yaml.load(stream, Loader=yaml.Loader)
into this

yaml = YAML(typ='rt')
config = yaml.load(stream)

alecpnkw · 2024-03-29T16:14:11Z

Thanks, knowing that a later snakemake versions runs in helpful. I'd like to try to keep the source the same as v0.2.0 for deployment reasons but happy to test if I can't get anything else to work. Based on the issues you linked I tweaked my environment to change the ruamel.yaml version to 0.17 and now am getting better results (!). It looks like the final output is there (final/maegatk.rds), and final/maegatk.depthTable.txt looks like this:

TGGTAGTGAACC-1  57.35
GGGTTCGCCTCC-1  66.34
TTCCTACGCAAT-1  55.25
CCCGTCGTGGTA-1  53.17
CCACAAAACATG-1  48.12
GTACCCACAGCC-1  43.3
GGCGCTAATGAA-1  67.39
ACAATTAGCACT-1  51.39
CTAACCCGGAAT-1  42.79
GACCGTGCATTT-1  52.64

Still, I am not getting any logs inside logs/rmdupslogs and getting dumped cores that I think correspond to the java jobs. Should I be seeing logs for rmdups in the test output?

noranekonobokkusu · 2024-03-29T16:39:40Z

It seems your test job has finished successfully.
When I suggested checking the logs, I was referring to .log files, like maegatk.snakemake_scatter.log or maegatk.snakemake_gather.log which are usually helpful. I am not the original developer of maegatk so my knowledge on this part is rather limited and I cannot comment on the expected content of rmdupslogs/, but my rmdupslogs/ is empty too (at least for the test dataset).

alecpnkw · 2024-03-29T16:47:01Z

Awesome, thanks. Is there someplace I can access a test final/maegatk.rds file to make sure I'm generating all the intended assay slots and coldata?

Spirometra · 2024-04-08T23:20:03Z

@alecpnkw do you have any log files in logs/? My snakemake version is 7.18.2, but I don't think maegatk was ever tested with a particular version of it.

If nothing informative comes from logs/, try modifying snakecmd_scatter and snakecmd_gather commands in your cli.py by providing a full path to snakemake in case your system doesn't see it, like
snakecmd_scatter = '/full/path/to/snakemake'+snakeclust+' --snakefile ' + script_dir + '/bin/snake/Snakefile.maegatk.Scatter --cores '+ncores+' --config cfp="'  + y_s + '" --stats '+snake_stats + snake_log_out
If that won't work either, try this: Following this issue, try upgrading ruamel.yaml to the current version, then make the modifications described below, then remove the output directory and rerun it from scratch:

Replace your entire cli.py with the code from https://github.com/caleblareau/maegatk/blob/master/maegatk/cli.py

Modify oneSample_maegatk.py:
change this
from ruamel import yaml
into this
from ruamel.yaml import YAML
then change this
config = yaml.load(stream, Loader=yaml.Loader)
into this
yaml = YAML(typ='rt')
config = yaml.load(stream)

Hi there @noranekonobokkusu. I came across similar problems using the test data as you did in issue #5 . First I uninstalled ruamel.yaml (current version 0.18.6) and then installed ruamel.yaml==0.17 and still get the same issues, so back to 0.18.6. I've tried snakemake 6.6.1 and 7.18.2. I tried specifying snakemake path in the updated cli.py file, and the modification to the oneSample_maegatk.py as suggested. Still I get the same error output:

Error in checkGrep(grep(".A.txt", files)) :
Improper folder specification; file missing / extra file present. See documentation
Calls: importMito -> checkGrep
Execution halted

base.maegatk.log does not show errors, however I do not see the .rds file in /final. but in maegatk.snakemake_gather.log I get an error about "Error in rule make_final_sparse_matrices"and in maegatk.snakemake_scatter.log some Java and samtools error.
I've tried version Java 1.7.0_261 and Java/1.8.0_144 (this is with 1.8.0_144).
See attached log files.

base.maegatk.log
maegatk.snakemake_gather (1).log
maegatk.snakemake_scatter (1).log

What would your recommendation be to get this to run? I'd really like to use this tool!

noranekonobokkusu · 2024-05-06T18:56:05Z

Hi @Spirometra sorry for the late reply.
I can see bwa is not available in your path (from .scatter.log):
sh: bwa: command not found
Try adding BWA to your environment and rerunning.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Container & documentation #3

Container & documentation #3

Zethson commented Oct 25, 2021

petervangalen commented Dec 7, 2021

Zethson commented Dec 7, 2021

JZL commented Dec 14, 2021

caleblareau commented Dec 17, 2021

JZL commented Dec 17, 2021

alecpnkw commented Mar 29, 2024

noranekonobokkusu commented Mar 29, 2024

alecpnkw commented Mar 29, 2024

noranekonobokkusu commented Mar 29, 2024

alecpnkw commented Mar 29, 2024

Spirometra commented Apr 8, 2024

noranekonobokkusu commented May 6, 2024

Container & documentation #3

Container & documentation #3

Comments

Zethson commented Oct 25, 2021

petervangalen commented Dec 7, 2021

Zethson commented Dec 7, 2021

JZL commented Dec 14, 2021

caleblareau commented Dec 17, 2021

JZL commented Dec 17, 2021

alecpnkw commented Mar 29, 2024

noranekonobokkusu commented Mar 29, 2024

alecpnkw commented Mar 29, 2024

noranekonobokkusu commented Mar 29, 2024

alecpnkw commented Mar 29, 2024

Spirometra commented Apr 8, 2024

noranekonobokkusu commented May 6, 2024