Integration with nf-core pipelines #34
Dear Gregor, thanks for your interest in checkatlas! I would love to make it compatible with your two nf-core pipelines. Integrating spatial data is actually a planned development, as we are using this type of data more and more in the lab. You are totally right: Nextflow should sit on top of the checkatlas pipeline. I first developed checkatlas as a stand-alone Python program and only added Nextflow in the last few months. Not being a Nextflow expert, I implemented it the fastest way I knew. Sadly, I do not have any grant to fund this project, so I am alone on the development. If you feel that checkatlas would be a great addition to your pipelines, I would love to interact with you and make it "nf-core" compatible. Some help would be more than useful! DM me: becavin AT ipmc DOT cnrs DOT fr
Hi @drbecavin, I have now had a chance to play around with checkatlas a bit more. I now understand that the Nextflow part is optional and that checkatlas also runs quite well without it. To move forward, we mainly need two things:
Once we have that, I would create a module on
Hello @grst, the Nextflow part is essentially there for when you want to calculate extensive metrics for classification, annotation and dimensionality reduction. In these cases it takes a lot of time and one needs to parallelise over the datasets. OK, next week I will work on:
I can also start a draft Nextflow pipeline for the whole checkatlas run (replacing my current nextflow.nf), which you would be able to use and improve. I'll keep you posted.
In that case, consider starting from the nf-core template. It may seem a bit overwhelming in the beginning, but the community has figured out a lot of things that make it easy to run a pipeline seamlessly across different setups. It also makes it easier to publish it as an nf-core pipeline later.
Alright! I will do that!
Hi @drbecavin,
I am an active contributor to the nf-core project and have been working on the scRNA-seq and spatialtranscriptomics pipelines in the past. For both pipelines, we are considering integrating checkatlas to generate MultiQC reports (see nf-core/scrnaseq#80 and nf-core/spatialvi#40).
From what I understood, the checkatlas architecture is rather complex, consisting of
h5ad object and computes various QC metrics
To integrate checkatlas in one of our pipelines, we need to define a Nextflow module that takes h5ad files as input and generates files that can be ingested by a downstream MultiQC process. In addition, we need a standalone container including all required dependencies (see also #25).
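To make the MultiQC side of this more concrete, here is a minimal Python sketch of what "files that can be ingested by MultiQC" could look like. It relies on MultiQC's custom-content convention, where a file named `*_mqc.json` with `id`, `section_name`, `plot_type` and `data` keys is picked up as a report section. The function name `write_multiqc_table`, the section id and the metric names are hypothetical and only for illustration; this is not what checkatlas currently produces.

```python
import json
from pathlib import Path


def write_multiqc_table(metrics, out_dir, section_id="checkatlas_qc"):
    """Write per-sample metrics as a MultiQC custom-content JSON table.

    `metrics` maps a sample name to a dict of metric name -> value.
    The metric names are assumptions, not checkatlas' real output.
    """
    out_path = Path(out_dir) / f"{section_id}_mqc.json"
    payload = {
        "id": section_id,
        "section_name": "checkatlas QC (sketch)",
        "description": "Per-sample QC metrics; names are illustrative.",
        "plot_type": "table",
        "data": metrics,
    }
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(payload, indent=2))
    return out_path


if __name__ == "__main__":
    # Hypothetical values, e.g. computed beforehand from an h5ad file.
    example = {"atlas_A": {"n_cells": 12000, "n_genes": 21000}}
    print(write_multiqc_table(example, "multiqc_input"))
```

A Nextflow process wrapping the Python part would then only need to emit this file as an output, and the downstream MultiQC process could collect it like any other input.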
While it would be totally possible to create a container that contains the Python dependencies, Nextflow + Java, and the R dependencies, it seems a bit convoluted to run a Nextflow workflow that starts a Docker container that runs a Python script that runs a Nextflow workflow that runs another Python script. It's also suboptimal in terms of resource management, because the checkatlas Nextflow workflow running inside the container cannot make use of the cluster/cloud scheduler that the "outer" Nextflow pipeline was configured to run with.
From our perspective, it would be better to separate the Python library from the Nextflow workflow in checkatlas. That way we could have a lightweight container for the Python part, and build a "checkatlas" Nextflow (sub)workflow that can be integrated in both pipelines. If necessary, conversion from Seurat to h5ad would run in a separate process with a separate container -- avoiding manual installation of R packages (mitigating issues like #24). In general, I think it is best to have Nextflow as the outermost layer, to let it handle all dependencies and take advantage of its flexible resource management (local vs. HPC vs. cloud).
Let me know what you think!
Cheers,
Gregor
CC @fasterius @cavenel (nf-core/spatialtranscriptomics), @fmalmeida (nf-core/scrnaseq)