[Review]: Workflows with Nextflow #16

gperu · 2022-12-15T12:29:40Z

Lesson Title

Introduction to Bioinformatics workflows with Nextflow and nf-core

Lesson Repository URL

https://github.com/carpentries-incubator/workflows-nextflow

Lesson Website URL

https://carpentries-incubator.github.io/workflows-nextflow/

Lesson Description

This lesson is a three-day introduction to the workflow manager Nextflow, and nf-core, a community effort to collect a curated set of analysis pipelines built using Nextflow.

Nextflow enables scalable and reproducible scientific workflows using software environments like conda. It allows the adaptation of pipelines written in the most common scripting languages such as Bash, R and Python. Nextflow is a Domain Specific Language (DSL) that simplifies the implementation and the deployment of complex parallel and reactive workflows on clouds and clusters.

This lesson also introduces nf-core: a framework that provides a community-driven, peer reviewed platform for the development of best practice analysis pipelines written in Nextflow.

This lesson motivates the use of Nextflow and nf-core as a development tool for building and sharing computational pipelines that facilitate reproducible (data) science workflows.

Author Usernames

@ggrimes
@ameynert

Zenodo DOI

No response

Differences From Existing Lessons

No response

Confirmation of Lesson Requirements

is the original work of the author(s), or that any content derived from another source is reused with permission and appropriate attribution
aligns with The Carpentries Code of Conduct
is published under a CC-BY or CC0 license
uses The Carpentries lesson template or Carpentries Workbench without significant customisation/adaptation.

JOSE Submission Requirements

the lesson repository includes paper.md and paper.bib files as described in the JOSE submission guide for learning modules

Potential Reviewers

No response

tobyhodges · 2022-12-16T12:52:39Z

Thanks for submitting this lesson to The Carpentries Lab, @gperu, @ggrimes and @ameynert. I'm excited to see it enter review!

I'll be acting as Editor on the submission. I am just about to go on leave for a few weeks, but I will work through the Editor checklist when I get back in January. When I've completed those checks, and anything they bring up has been addressed, I can begin a search for reviewers.

For now, to ensure that the review process runs as smoothly as possible, please make sure you are subscribed to receive notifications from this thread. On the right sidebar of this page you should see a section headed Notifications, with a Customize link. You can click on that and make sure that you have the Subscribed option selected, to receive all notifications from the thread.

You can add a badge to display the status of the review in the README of your lesson repository with the following Markdown:

[![The Carpentries Lab Review Status](http://badges.carpentries-lab.org/16_status.svg)](https://github.com/carpentries-lab/reviews/issues/16)

ggrimes · 2022-12-19T13:11:09Z

@tobyhodges can you please pause this review as I need to update the main https://github.com/carpentries-incubator/workflows-nextflow repo with the latest code which has been developed in my personal github repository https://github.com/ggrimes/workflows-nextflow under the branch agnostric.

tobyhodges · 2023-01-09T17:06:30Z

Thanks for the heads-up, @ggrimes. Please let me know when you are ready for me to begin working through the editorial checklist.

ggrimes · 2024-01-08T08:44:47Z

I am ready to start the review process for this lesson. Please let me know the next steps.

ggrimes · 2024-01-15T14:54:12Z

Potential Reviewers

Here is a list of potential reviewers. Both are experts in Nextflow and nf-core, and both have contributed to training materials.

@christopher-hakkaart
@mribeirodantas

tobyhodges · 2024-02-02T15:44:13Z

Editor Checklist - Introduction to Bioinformatics workflows with Nextflow and nf-core

Accessibility

All figures are also described in image alternative text or elsewhere in the lesson body.

The first three figures in Getting Started with Nextflow are lacking alternative text descriptions. These kinds of diagrams can be difficult to capture properly in a text description, but please give it a go and feel free to ask for help here if you need it.

The purpose of alternative text is to communicate the purpose of the image to someone who cannot see it, so I recommend you focus on the key points/concepts these figures are intended to get across to learners.

[Edit: alt-text has now been added]

The lesson uses appropriate heading levels:
- h2 is used for sections within a page.
- no “jumps” are present between heading levels e.g. h2->h4.
- no page contains more than one h1 element i.e. none of the source files include first-level headings.
The contrast ratio of text in all figures is at least 4.5:1.

Content

The lesson teaches data and/or computational skills that could promote efficient, open, and reproducible research.
All exercises have solutions.
Opportunities for formative assessments are included and distributed throughout the lesson sufficiently to track learner progress. (We aim for at least one formative assessment every 10-15 minutes.)
Any data sets used in the lesson are published under a permissive open license i.e. CC0 or equivalent.

The example data is licensed CC-BY, which is not really appropriate for data but is owned by the lesson author, so I think we can be confident there will not be any problems.

Design

Learning objectives are defined for the lesson and every episode.
The target audience of the lesson is identified specifically and in sufficient detail.

The Learner Profiles are sufficiently detailed.
@ggrimes I recommend you add a link to that page from the introductory text in index.md.

Repository

The lesson repository includes:

a CC-BY or CC0 license.
a CODE_OF_CONDUCT.md file that links to The Carpentries Code of Conduct.
a list of lesson maintainers.
tabs to display Issues and Pull Requests for the project.

Structure

Estimated times are included in every episode for teaching and completing exercises.
Episodes lengths are appropriate for the management of cognitive load throughout the lesson.

Supporting information

The lesson includes:

a list of required prior skills and/or knowledge.
setup and installation instructions.
a glossary of key terms or links out to definitions in an external glossary e.g. Glosario.

I recommend that you open issues on the lesson repositories for each point raised above, to help you maintain a view of what has been addressed as you go along. You do not need to provide a written response to all of the points raised, but please post back here when you think they have been addressed. I can then run through the checklist again and confirm that the lesson is ready to move to review. If you would like to provide additional explanation for any of the points raised, I encourage you to do so.

ggrimes · 2024-02-06T11:27:22Z

Added alt text to all images in first episode issue #106

tobyhodges · 2024-02-07T11:32:02Z

The alternative text descriptions you added are a big improvement, thanks.

I will start contacting reviewers for the lesson.

tobyhodges · 2024-02-23T13:41:53Z

@bobturneruk @HadrienG thank you both for volunteering to review a lesson for The Carpentries Lab. Please can you confirm that you are happy to review this Introduction to Bioinformatics workflows with Nextflow and nf-core lesson?

You can read more about the lesson review process in our Reviewer Guide.

HadrienG · 2024-02-26T11:40:09Z

confirming I'm happy to review this lesson!

bobturneruk · 2024-02-26T13:10:16Z

I am, too.

tobyhodges · 2024-02-26T13:22:52Z

Excellent, thank you both. When you are ready, please post your reviews as replies in this thread. If you have any questions for me during the review, please ask and I will be happy to help.

ggrimes · 2024-02-26T13:49:33Z

Thanks @bobturneruk and @HadrienG for agreeing to review.

ggrimes · 2024-02-26T13:50:22Z

> Excellent, thank you both. When you are ready, please post your reviews as replies in this thread. If you have any questions for me during the review, please ask and I will be happy to help.

@tobyhodges should I let the other potential reviewers know that I no longer need them?

tobyhodges · 2024-02-26T13:52:15Z

> @tobyhodges should I let the other potential reviewers know that I no longer need them?

Yes, I believe we are covered at this point. @HadrienG & @bobturneruk: of course, if circumstances change and you find that you no longer have capacity to review the lesson, do let me know and I can respond accordingly.

HadrienG · 2024-03-19T13:23:32Z

DRAFT REVIEW

Great job folks! I'm well on my way through the review, and thought I'd post this even though I'm not done, so you can start addressing the first comments if you wish. I'll change the header and this comment when I'm done.

Accessibility

The alternative text of all figures is accurate and sufficiently detailed.
- In episode 12, the alt text for the figure depicting the nf-core project is insufficient. I don't think each item of the figure should be described, but a few words about the three main categories would be welcome.
- In episode 12, the alt text for the nfcore_config.png image is insufficient. Since the content of the picture is explained below in the episode, I wonder if something like "a graphical description of different config profiles explained below" would be a better alt text. Feel free to ignore me if the current alt text is in line with The Carpentries' policy :-)
The lesson content does not make extensive use of colloquialisms, region- or culture-specific references, or idioms.
The lesson content does not make extensive use of contractions (“can’t” instead of “cannot”, “we’ve” instead of “we have”, etc).

Content

Episode 3 - Channels

First objective: "How do I get data into Nextflow?" is misleading, we've already done that in episode 2: parametrisation

Episode 4 - Processes

process_rscript.nf: including the ShortRead package in the conda environment would let the learners run the script.
I think the "Associated script" note about python scripts could be an exercise: "move the python code block into its own script"
I don't think it is very useful to learn about "user-specified file names", this paragraph could be removed.

Episode 6 - Workflow

Workflow definition: "[...] Therefore it’s the entry point [...]" is the only mention of "entry point". Maybe the concept needs to be defined somewhere?
collect() should be introduced first in the channels episode, instead of appearing here for the first time.

Episode 7 - Operators

The into operator is deprecated.
Overall, very good chapter!

Episode 9 - Nextflow configuration

Process selectors: there is a mention of "fully qualified name" (NFCORE_RNASEQ:RNA_SEQ:SAMTOOLS_SORT) but scoping is not mentioned or explained previously.

Design

Learning objectives for the lesson and its episodes are clear, descriptive, and measurable. They focus on the skills being taught and not the functions/tools e.g. “filter the rows of a data frame based on the contents of one or more columns,” rather than “use the filter function on a data frame.”
The target audience identified for the lesson is specific and realistic.

Supporting information

The list of required prior skills and/or knowledge is complete and accurate.
The setup and installation instructions are complete, accurate, and easy to follow.
No key terms are missing from the lesson glossary or are not linked to definitions in an external glossary e.g. Glosario.

setup

The conda setup and the standalone setup do not install the same version of Nextflow, which causes issues down the line. The standalone setup installs 20.10, which is assumed throughout the whole lesson, whereas conda installs >=20.10. I would suggest migrating the setup and all episodes to >=22.03. This removes the need for nextflow.enable.dsl=2 in the scripts and would only require a few small changes in some episodes. DSL1 is old, deprecated and, in my opinion, unnecessary to bring up or explain.
It is not crystal clear that the "Nextflow install without conda" part is optional.
nf-core is missing in the environment.yml
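
To illustrate the version mismatch raised above, here is a minimal sketch of a check that could run during setup. It assumes the minimum of 22.03 suggested in this review; the function names `parse_version` and `meets_minimum` are hypothetical and are not part of Nextflow or the lesson.

```python
import re


def parse_version(text):
    """Return the leading numeric parts of a version string as a 3-tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    parts = [int(part) for part in match.group(0).split(".")]
    # Pad short versions such as "22.03" so they compare equal to "22.03.0".
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(installed, minimum="22.03.0"):
    """True when the installed version is at least the (assumed) minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # 20.10.0 is the version the standalone setup installs, per this review.
    for installed in ("20.10.0", "22.03.0", "23.10.1"):
        status = "ok" if meets_minimum(installed) else "too old"
        print(f"{installed}: {status}")
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where "20.10" and "20.9" would sort in the wrong order.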

General

Minor issues and bugs

The points tagged DSL2 are dependent on the above setup issue.

setup

Training software: conda, broken link to environment.yml
The conda env does not work natively on M1/M2 Macs unless Rosetta is installed. I guess this will fix itself at some point? Or should there be a note for Mac users in the setup?
data download: remove the $ so the command can be copy/pasted more easily

getting started

Nextflow core features: broken image
Your first script: "A multi-line Nextflow comment, written using C style block comments, followed by a single line comment." There is no single line comment.
Your first script: 5. and 6. should be swapped
nextflow run word_count.nf gives 0 because the input file does not exist. Should be data/yeast/reads/ref1_1.fq.gz

workflow parametrisation

cp wc.nf wc-params.nf change wc.nf to word_count.nf
be consistent when naming files (word_count vs wf-params)

Channels

Downloading data from SRA does not require an API key for small downloads. I would remove this part since it's so little data.

Processes

This part does not have sub-section in the left navigation menu.
DSL2: the processes should be in workflow blocks
Script: remove the ~~~ left in the script
process_multi_line.nf does not print what is indicated
Why mix the use of echo and printf in bash blocks? Pick one and be consistent
The Input Repeaters challenge is wrongly formatted

Processes part 2

tree is not available on macOS by default. Personally, I'd avoid using it

Workflows

workflow_01.nf is fastqc on the website, salmon in the scripts.

Operators

parse a csv file challenge: ~~~{: .language-groovy } present in the solution

nf-core

nf-core launch rnaseq: outdir is also a required parameter.
DSL2: hlatyping pipeline is too old

ggrimes · 2024-03-19T14:37:45Z

@HadrienG I would be interested in your opinion as to whether the nf-core episode should be included, or removed to leave just the core Nextflow lessons.

HadrienG · 2024-03-20T07:07:05Z

> @HadrienG I would be interested in your opinion as to whether the nf-core episode should be included, or removed to leave just the core Nextflow lessons.

Disclaimer/conflict of interest: I have previously published an nf-core pipeline and I'm a member of the nf-core community.

I think the nf-core episode will be helpful for a lot of people. Although the episode is not teaching nextflow per se, nf-core is a valuable resource, and researchers would benefit being aware that it exists, and learning how to launch pipelines.

Secondly, while there are other collections of Nextflow pipelines out there, the community aspect of nf-core is quite unique, and I don't think including it is especially unfair to another community. What I would perhaps suggest is keeping the nf-core episode but removing it from the lesson title, going with "Introduction to Bioinformatics workflows with Nextflow". You could even rename the nf-core episode "Launching publicly available pipelines" and start the episode with a non nf-core example: nextflow run carpentries/nextflow-example, which would download and run a minimal Nextflow pipeline that you created. That way you illustrate to the learners that they can run any Nextflow pipeline they find on GitHub, and then show nf-core as a community example that has a lot of pipelines available.

bobturneruk · 2024-03-28T11:14:09Z

Hi! I just wanted to say I am working on my review. I've got to ep 7. It's taking a while as I think it deserves to be done in some detail, as overall it seems to be such a well thought out and useful resource. Lots of specific comments to follow once I've had time to look at it all. I'm trying not to look at @HadrienG's report for now.

bobturneruk · 2024-04-09T09:31:12Z

I'm going to post my review below in a moment. I've probably made some mistakes - interpreted things wrongly or misunderstood something technical - and I'm mindful that in some places I'm asking for extra work to be done which means even more of someone's time and effort. Overall I hope it's helpful. I'm happy to discuss further via a call, Slack or on here.

Once again, my overall feeling is that this lesson will be a great help for people getting into Nextflow.

bobturneruk · 2024-04-09T09:33:26Z

Reviewer Checklist

Accessibility

The alternative text of all figures is accurate and sufficiently detailed.
- Large and/or complex figures may not be described completely in the alt text of the image and instead be described elsewhere in the main body of the episode.
The lesson content does not make extensive use of colloquialisms, region- or culture-specific references, or idioms.
The lesson content does not make extensive use of contractions (“can’t” instead of “cannot”, “we’ve” instead of “we have”, etc).
Figures in episode 1 appear blurry and have very small text (Windows 11, Chrome, 100% zoom on browser).
Figures lacking detailed alternative text:

Content

Specific comments follow...

ep1

I get this output from the script, which is different to the course material. Might be good if the output number wasn't 0?

(nf-training) bobturner@tubby:~/nf-training$ nextflow run word_count.nf
N E X T F L O W  ~  version 20.10.0
Launching `word_count.nf` [boring_nobel] - revision: 72656509cb
executor >  local (1)
[14/523859] process > NUM_LINES (1) [100%] 1 of 1 ✔
SRR2584863_1.fastq.gz   0

Perhaps this helps? bobturneruk/workflows-nextflow@6be012f
I think there are some duplicate items in the list of script contents. I suggest bobturneruk/workflows-nextflow@0878421

ep2

The first exercise in ep2 has a lot of content shared directly with the preceding bit of live coding - we've already talked participants through how to add a sleep parameter, then this is repeated as an exercise. An alternative exercise might be to ask for sleep_before and sleep_after parameters?
Might it be easier on learners not to mention YAML and just stick with JSON?

ep3

Can the DSL1 reference be removed? I assume the course is DSL2 (the current DSL) specific and that people doing it are new to Nextflow, so DSL1 might not be relevant to them.
The "reminder" in this section is the first time that lists and maps are explained. file://wsl.localhost/Ubuntu/home/bobturner/workflows-nextflow/site/docs/03-channels.html#the-value-channel-factory Will learners be familiar with those concepts already from the Python or R prerequisites?
I didn't have any chicken data to run the code at the end of this section https://carpentries-incubator.github.io/workflows-nextflow/03-channels.html#the-frompath-channel-factory
I'm not clear on how a tuple is defined here. It's introduced as "a grouping of data, represented as a Groovy List" - could it be called a list? The Nextflow docs really only talk about tuples in the fromFilePairs section "The matching files are emitted as tuples". Maybe there is an important distinction, but I'm a bit confused. I see Tuples are defined here https://www.nextflow.io/docs/latest/process.html#input-type-tuple and in the lesson here https://carpentries-incubator.github.io/workflows-nextflow/05-processes-part2.html#grouped-inputs-and-outputs
I think if users are expected to code-along with the section on fromSRA they will need instructions on how to get an NCBI API key, and time to do this.

ep4

The flag -process.echo is explained before it is used here https://carpentries-incubator.github.io/workflows-nextflow/04-processes-part1.html#process-definition - I don't think it's in earlier material?
I don't think users will be able to run the R code in this section without additional setup steps https://carpentries-incubator.github.io/workflows-nextflow/04-processes-part1.html#script
Could PYSTUFF be called something more specific, please? Same for RSTUFF. It's better for readers if short, descriptive names are used.
Could myscript.py be called something more specific, please?
I suggest that it's always better to include Python or R scripts in separate files to facilitate automated testing.
I suspect this would be a difficult thing to do, but perhaps the Python and R material should be removed in the interests of time?
Is it necessary to introduce this alternative shell: way of using Bash in Nextflow, please? Perhaps better to stick with script:? https://carpentries-incubator.github.io/workflows-nextflow/04-processes-part1.html#shell
Do inputs need to be wrapped in ${...} when used in a script? What is good practice?

ep5

The work directory is mentioned here for the first time, but not explained until ep10. It might be good to offer a very brief explanation of it at this point?
Is it worth making the difference between CPUs, cores and threads clear here? https://carpentries-incubator.github.io/workflows-nextflow/05-processes-part2.html#directives
Will learners be familiar with symbolic links before starting the lesson?

ep6

I get an error when running the first code block:

executor >  local (10)
[e5/e6fded] process > FASTQC (5) [100%] 9 of 9 ✔
[17/30b05f] process > MULTIQC    [100%] 1 of 1, failed: 1 ✘

Error executing process > 'MULTIQC'

Caused by:
  Process `MULTIQC` terminated with an error exit status (1)

Command executed:

  multiqc .

Command exit status:
  1

Command output:
  Searching   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 42/42  

Command error:
             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict
      if isinstance(val, collections.Mapping):
                         ^^^^^^^^^^^^^^^^^^^
  AttributeError: module 'collections' has no attribute 'Mapping'
  ============================================================
  [ERROR  ]         multiqc : Oops! The 'seqyclean' MultiQC module broke... 
    Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues 
    If possible, please include a log file that triggers the error - the last file found was:
      None
  ============================================================
  Module seqyclean raised an exception: Traceback (most recent call last):
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run
      output = mod()
               ^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/seqyclean/seqyclean.py", line 18, in __init__
      super(MultiqcModule, self).__init__(
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in __init__
      config.update({anchor: mod_cust_config.get("custom_config", {})})
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update
      return update_dict(globals(), u)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict
      if isinstance(val, collections.Mapping):
                         ^^^^^^^^^^^^^^^^^^^
  AttributeError: module 'collections' has no attribute 'Mapping'
  ============================================================
  [ERROR  ]         multiqc : Oops! The 'optitype' MultiQC module broke... 
    Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues 
    If possible, please include a log file that triggers the error - the last file found was:
      None
  ============================================================
  Module optitype raised an exception: Traceback (most recent call last):
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run
      output = mod()
               ^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/optitype/optitype.py", line 24, in __init__
      super(MultiqcModule, self).__init__(
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in __init__
      config.update({anchor: mod_cust_config.get("custom_config", {})})
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update
      return update_dict(globals(), u)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict
      if isinstance(val, collections.Mapping):
                         ^^^^^^^^^^^^^^^^^^^
  AttributeError: module 'collections' has no attribute 'Mapping'
  ============================================================
  [WARNING]         multiqc : No analysis results found. Cleaning up..
  [INFO   ]         multiqc : MultiQC complete

Work dir:
  /home/bobturner/nf-training/work/17/30b05f5243e6fd12c0013cc12ea0c8

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

This persists with subsequent examples
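
For context on the traceback above: the `collections.Mapping` alias was removed from the standard library in Python 3.10, and the environment in the error paths is running Python 3.12. The abstract base class now lives only in `collections.abc`. The sketch below shows the difference; `is_mapping` is a hypothetical helper written for illustration, not MultiQC code, and the likely remedy (a newer MultiQC or an older Python pinned in environment.yml) is an assumption rather than something tested here.

```python
import collections
import collections.abc
import sys


def is_mapping(value):
    """Check for dict-like values using the abstract base class in collections.abc."""
    return isinstance(value, collections.abc.Mapping)


# On Python 3.10 and later the old alias is gone, so code that calls
# isinstance(val, collections.Mapping) raises AttributeError, as in the log.
legacy_alias_present = hasattr(collections, "Mapping")

if __name__ == "__main__":
    print("Python", ".".join(str(n) for n in sys.version_info[:3]))
    print("collections.Mapping available:", legacy_alias_present)
    print("is_mapping({'a': 1}):", is_mapping({"a": 1}))
    print("is_mapping([1, 2]):", is_mapping([1, 2]))
```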

ep7

Is it safe to assume that learners will be familiar with regular expressions? If not https://carpentries-incubator.github.io/workflows-nextflow/07-operators.html#regular-expression may be confusing.
There's a lot of really good content in this episode, but will all of it be relevant to new Nextflow users? https://carpentries-incubator.github.io/workflows-nextflow/07-operators.html#grouping-contents-of-a-channel-by-a-key- might be a bit complicated? Or perhaps this is something that users encounter quite early in their Nextflow journey?

ep8

Could this detail be referred to rather than included? Perhaps it is enough to say it's a unique ID? https://carpentries-incubator.github.io/workflows-nextflow/08-reporting.html#task-id
The final exercise in this episode is the same as the preceding content. Maybe best to omit the exercise, or give the markdown example and ask learners to modify the command for given HTML as an exercise?

ep9

When I first tried the Conda exercise I got this error:

Error executing process > 'FASTP (1)'

Caused by:
  Failed to create Conda environment
  command: conda create --mkdir --yes --quiet --prefix /home/bobturner/nf-training/work/conda/env-a7a3a0d820eb46bc41ebf4f72d955e5f bioconda::fastp=0.12.4-0
  status : 1
  message:
    CondaValueError: You have chosen a non-default solver backend (libmamba) but it was not recognized. Choose one of: classic

I guess down to my Conda config. I could fix by:

conda config --set solver classic

I suggest docker.runOptions = '-u $(id -u):$(id -g)' should be explained.
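
As a starting point for that explanation: Docker's `-u` option sets the user and group IDs that the container process runs as, and `$(id -u)` and `$(id -g)` are shell substitutions that expand to the current user's numeric IDs. Running the container as the host user generally means files written to the work directory are owned by that user rather than by root. The sketch below builds the expanded string for the current user on a Unix-like system; `docker_user_option` is a hypothetical name used only for illustration.

```python
import os


def docker_user_option():
    """Build the string that '-u $(id -u):$(id -g)' expands to for the current user."""
    uid = os.geteuid()  # effective user ID, the number `id -u` prints
    gid = os.getegid()  # effective group ID, the number `id -g` prints
    return f"-u {uid}:{gid}"


if __name__ == "__main__":
    # Prints something like "-u 1000:1000"; the numbers depend on the account.
    print(docker_user_option())
```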

ep10

I don't think learners are asked to create a file called wc.nf before they are asked to resume it in the first exercise here. It's maybe called word_count.nf earlier in the lesson?
I'm not completely clear on what "The command wrapped used to run the job." means. As far as I can tell by searching, the definition of .command.run is not in the Nextflow docs. It might be better described as "The full Nextflow bash script used to run .command.sh."?

ep11

Is it the intention that this episode re-explains lots of content from previous episodes? For example parameters and processes are covered as if new content. Perhaps there is scope to remove some of this e.g.

A process is defined by providing three main declarations:

The process inputs,

The process outputs

Finally the command script.

I suggest the "Recap" sections are reviewed - they may be useful but they introduce a different structure to ep11 compared with earlier episodes. They also often imply that content previously covered is first covered in ep11.
"The FASTQC process will not run as the process has not been declared in the workflow scope." - is this the same definition of "scope" as used earlier? Is there a workflow scope as well as a params scope and an aws scope?
I get this error when running script6:

Error executing process > 'MULTIQC'

Caused by:
  Process `MULTIQC` terminated with an error exit status (1)

Command executed:

  multiqc .

Command exit status:
  1

Command output:
  Searching   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 141/141  

Command error:
             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict
      if isinstance(val, collections.Mapping):
                         ^^^^^^^^^^^^^^^^^^^
  AttributeError: module 'collections' has no attribute 'Mapping'
  ============================================================
  [ERROR  ]         multiqc : Oops! The 'seqyclean' MultiQC module broke... 
    Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues 
    If possible, please include a log file that triggers the error - the last file found was:
      None
  ============================================================
  Module seqyclean raised an exception: Traceback (most recent call last):
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run
      output = mod()
               ^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/seqyclean/seqyclean.py", line 18, in __init__
      super(MultiqcModule, self).__init__(
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in __init__
      config.update({anchor: mod_cust_config.get("custom_config", {})})
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update
      return update_dict(globals(), u)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict
      if isinstance(val, collections.Mapping):
                         ^^^^^^^^^^^^^^^^^^^
  AttributeError: module 'collections' has no attribute 'Mapping'
  ============================================================
  [ERROR  ]         multiqc : Oops! The 'optitype' MultiQC module broke... 
    Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues 
    If possible, please include a log file that triggers the error - the last file found was:
      None
  ============================================================
  Module optitype raised an exception: Traceback (most recent call last):
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run
      output = mod()
               ^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/optitype/optitype.py", line 24, in __init__
      super(MultiqcModule, self).__init__(
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in __init__
      config.update({anchor: mod_cust_config.get("custom_config", {})})
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update
      return update_dict(globals(), u)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict
      if isinstance(val, collections.Mapping):
                         ^^^^^^^^^^^^^^^^^^^
  AttributeError: module 'collections' has no attribute 'Mapping'
  ============================================================
  [WARNING]         multiqc : No analysis results found. Cleaning up..
  [INFO   ]         multiqc : MultiQC complete

Work dir:
  /home/bobturner/nf-training/work/13/2a6ef5a3e567c3f56e41d67eefe205

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
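
The traceback above ends in `AttributeError: module 'collections' has no attribute 'Mapping'` under Python 3.12. The abstract base class aliases in `collections` were removed in Python 3.10, so `Mapping` is now only available from `collections.abc`. Below is a minimal sketch of a recursive dictionary update written against `collections.abc`. It is loosely modelled on the `update_dict` frame named in the traceback, but the function body and the `config` example are assumptions for illustration, not MultiQC's actual code.

```python
from collections.abc import Mapping  # `collections.Mapping` was removed in Python 3.10


def update_dict(target, updates):
    """Recursively merge `updates` into `target` and return `target`.

    Hypothetical stand-in for the helper named in the traceback.
    """
    for key, val in updates.items():
        if isinstance(val, Mapping):
            current = target.get(key)
            if not isinstance(current, dict):
                # Nothing mergeable there yet, so start from an empty dict.
                current = {}
            # Merge nested mappings instead of overwriting them wholesale.
            target[key] = update_dict(current, val)
        else:
            target[key] = val
    return target


# Hypothetical configuration values, for illustration only.
config = {"report": {"title": "QC", "comment": None}}
update_dict(config, {"report": {"comment": "test run"}, "verbose": True})
print(config)
```

If this is the cause, the environment is pairing an older MultiQC release with a newer Python than that release supports, so pinning versions in the environment file would be one way to avoid it.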

Ternary operators should be explained more fully if the term is to be used.
Graphviz installation is not part of the setup instructions, but is needed for the last exercise.
I suggest the key points are reviewed to better match the Questions and Objectives of the lesson. They currently contain points covered earlier e.g. "A Workflow can be parameterise using params .".
I don't think this episode introduces very much new content. An option would be to separate out the new content (e.g. "Metrics and Reports") and set up the episode as an extended exercise for learners - end the content prior to https://carpentries-incubator.github.io/workflows-nextflow/11-Simple_Rna-Seq_pipeline.html#define-the-pipeline-parameters and set an exercise to produce the code in script6.nf? Or maybe the extensive recap is intentional.

ep12

Some of this might be really specific to my setup...

I get an error when pulling rnaseq:

nextflow pull nf-core/rnaseq -revision 3.0
Checking nf-core/rnaseq ...
Project config file is malformed -- Cause: No signature of method: nextflow.config.ConfigParser$_parse_closure5.id() is applicable for argument types: (String) values: [[email protected]]
Possible solutions: is(java.lang.Object), is(java.lang.Object), find(), find(), find(groovy.lang.Closure), find(groovy.lang.Closure)

This may be related to the issue "nextflow pull not working if nextflow.config is a symbolic link" (nextflow-io/nextflow#888). I tried making a clean directory to work in. Perhaps it's something to do with installing Nextflow via Conda? It works with my system Nextflow.
Should learners be using the web based launch tool? I guess not.
It might be useful to explain the relationship between a "profile" and a "scope".
I get this error in the final exercise:

nextflow run nf-core/hlatyping -r 1.2.0 -profile test,conda  --max_memory 3G -c nfcore-custom.config
N E X T F L O W  ~  version 23.04.2
Pulling nf-core/hlatyping ...
 downloaded from https://github.com/nf-core/hlatyping.git
Nextflow DSL1 is no longer supported — Update your script to DSL2, or use Nextflow 22.10.x or earlier

Running nextflow run nf-core/hlatyping -r 2.0.0 -profile test,conda --max_memory '3.GB' --outdir out -c nfcore-custom.config may be better, but it needs Conda and I can't run the nf-core commands in Conda.
The Gitter chat is retired, and the Google Groups may no longer be useful. I think these should be replaced with a link to the Nextflow Slack.
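
The final-exercise error above comes down to version pairing: release 1.2.0 of nf-core/hlatyping is evidently a DSL1 pipeline, and the message states that DSL1 needs Nextflow 22.10.x or earlier. Below is a minimal sketch of that rule. The cutoff is taken directly from the error text; both helper names are hypothetical and are not part of Nextflow or nf-core tooling.

```python
def parse_version(text):
    """Turn a version string such as '23.04.2' into a tuple such as (23, 4, 2).

    A suffix such as '-edge' is ignored.
    """
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def supports_dsl1(nextflow_version):
    """Return True if the version is 22.10.x or earlier.

    The cutoff is an assumption read off the error message, not an official API.
    """
    return parse_version(nextflow_version)[:2] <= (22, 10)


# The version in the log above, plus one from the 22.10.x series for comparison.
for version in ("23.04.2", "22.10.6"):
    print(version, "can run DSL1:", supports_dsl1(version))
```

A check like this could help decide whether an exercise should pin the pipeline revision or the Nextflow version.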

Design

Learning objectives for the lesson and its episodes are clear, descriptive, and measurable. They focus on the skills being taught and not the functions/tools e.g. “filter the rows of a data frame based on the contents of one or more columns,” rather than “use the filter function on a data frame.”
The target audience identified for the lesson is specific and realistic.
I'm not sure where the target audience is defined. I think if it's something like "Biological and Biomedical Scientists, Research Software Engineers and associated professionals" it would be great.

Supporting information

The list of required prior skills and/or knowledge is complete and accurate.
The setup and installation instructions are complete, accurate, and easy to follow.
No key terms are missing from the lesson glossary or are not linked to definitions in an external glossary e.g. Glosario.
I think it would be helpful to make it clearer at the beginning of the instructions that this won't work on Windows, or if the participant has Windows, they'll need to use WSL2. Maybe linking to this would be helpful https://learn.microsoft.com/en-us/windows/wsl/install or perhaps that's too involved.
The version of Nextflow specified is two major versions behind the current version (23, on 2024-03-22). Could a more recent version be used?
Python 3.8, listed as a requirement, will be unsupported after quarter 3 of 2024 (https://devguide.python.org/versions/). Perhaps a more recent version could be used?
The supplied environment.yml file (https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml) does not specify a version of Python. If Python version is important, then it should.
For installing Conda, perhaps it would be best to link to the official instructions https://docs.anaconda.com/free/miniconda/miniconda-install/ ?
I suggest re-ordering the instructions and making it clear how the environment.yml file should be obtained, like this: bobturneruk/workflows-nextflow@8b2a7af
I suggest that if VSCode is what's best to use for the lesson, it is made a requirement, not a recommendation. This will mean greater consistency in learner experience.
I suggest removing the instructions to install Nextflow without Conda. Again, I think consistency of setup will lead to less variation in issues experienced by learners.
Installing with Conda went very smoothly for me!
The Glossary only has 3 terms. I think it would benefit from being expanded.
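
To illustrate the point above about environment.yml not specifying a Python version, here is a minimal sketch of a check for whether a package is pinned in a list of conda-style dependency strings. The helper name `pinned_constraint` and the dependency list are hypothetical; the list is not the contents of the lesson's environment.yml. Only the common `name=version` and `channel::name` forms are handled.

```python
import re


def pinned_constraint(dependencies, package):
    """Return the version constraint given for `package` in a list of
    conda-style dependency strings, or None if it is absent or unpinned."""
    for spec in dependencies:
        if not isinstance(spec, str):
            continue  # skip nested sections such as a `pip:` sub-list
        spec = spec.split("::")[-1].strip()  # drop an optional channel prefix
        match = re.match(r"([A-Za-z0-9_.\-]+)\s*(.*)$", spec)
        if match and match.group(1).lower() == package.lower():
            return match.group(2) or None
    return None


# Hypothetical dependency list, NOT the contents of the lesson's environment.yml.
dependencies = ["nextflow=21.04.0", "python", "multiqc"]

for name in ("nextflow", "python"):
    constraint = pinned_constraint(dependencies, name)
    print(name, "pinned to" if constraint else "is unpinned", constraint or "")
```

With an unpinned `python` entry, Conda is free to resolve whichever Python version fits the other packages, which is one way an environment can end up on a newer Python than a tool expects.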

General

Reviewed with Ubuntu 22.04 on WSL2.
I think, strictly speaking, the scripts (e.g. https://carpentries-incubator.github.io/workflows-nextflow/01-getting-started-with-nextflow.html#your-first-script) are not written in GROOVY (as stated) but in Nextflow, or maybe it should be Nextflow DSL2? I guess Groovy is specified to help with syntax highlighting?
I think there needs to be an explanation of the relationship between Groovy and Nextflow DSL2 in the introduction, otherwise it's confusing when Groovy operators and variable types are introduced later on. Could say something like "The Nextflow language is based on another language, Groovy. Some Groovy code can be used directly in Nextflow, but not all Groovy code is valid.".

ggrimes · 2024-04-23T10:58:49Z

Thank you @HadrienG and @bobturneruk for your reviews. Do you have any suggestions about how I should reply to individual review comments?

Thanks,

bobturneruk · 2024-04-23T11:39:41Z

Hi @ggrimes! @tobyhodges - how is this normally handled, please? I don't think GitHub makes this easy. Maybe we could break things down into issues against https://github.com/carpentries-incubator/workflows-nextflow/issues ?

tobyhodges · 2024-04-23T12:51:07Z

Thanks @bobturneruk and @HadrienG for your detailed reviews. @ggrimes you can choose how to respond to individual comments, based on your personal preference. In the past, other lesson developers have found it helpful to open issues to track the improvements suggested in reviews. If you do that, please link to the issues in this thread when you are ready to respond to the reviews: I would like to make it easy for visitors to the thread to find everything related to the review.

Responding to a few points from @bobturneruk:

Interestingly, because the example data is provided by The Carpentries in the lesson repo, it is CC-BY, not CC0.

CC-BY is not an appropriate license for data, and if you are able to change the license on the FigShare record to CC0 I recommend doing so (bonus points if you add a CFF file as well). That said, the mistake is common enough and harmless enough that I am generally happy to let it pass if needed. I think the technically correct thing to do with data included in a lesson repository would be to mention in LICENSE.md that the data is available under CC0. But honestly, we have data files in other lesson repositories that we are not doing this for, and I would certainly be willing to let this go for the purposes of the review.

I think, strictly speaking, the scripts (e.g. https://carpentries-incubator.github.io/workflows-nextflow/01-getting-started-with-nextflow.html#your-first-script) are not written in GROOVY (as stated) but in Nextflow, or maybe it should be Nextflow DSL2? I guess Groovy is specified to help with syntax highlighting?

&

I think there needs to be an explanation of the relationship between Groovy and Nextflow DSL2 in the introduction, otherwise it's confusing when Groovy operators and variable types are introduced later on. Could say something like "The Nextflow language is based on another language, Groovy. Some Groovy code can be used directly in Nextflow, but not all Groovy code is valid.".

As @bobturneruk predicted, GROOVY labels are added to the code blocks because syntax highlighting is being applied by specifying that blocks are groovy in the lesson source. Unfortunately, until a separate specification for nextflow is added to https://github.com/jgm/skylighting this is the best we can get. I agree that a note explaining this early in the lesson would be helpful to anyone following along on their own.

gperu assigned tobyhodges Dec 15, 2022

tobyhodges added the 1/editor-checks label (Editor is conducting initial checks on the lesson before seeking reviewers) on Dec 16, 2022

tobyhodges removed the 1/editor-checks label on Jan 9, 2023

tobyhodges added the paused label (Review has been temporarily paused at the request of the author(s)) on Jan 9, 2023

tobyhodges added the 2/seeking-reviewers label (Editor is looking for reviewers to assign to this lesson) and removed the 1/editor-checks label on Feb 7, 2024

tobyhodges added the 3/reviewer(s)-assigned label (Reviewers have been assigned; review in progress) and removed the 2/seeking-reviewers label on Feb 26, 2024

ggrimes mentioned this issue on Apr 23, 2024: Nextflow review @bobturneruk ep1 (carpentries-incubator/workflows-nextflow#113, open)

tobyhodges added the 4/review(s)-in-awaiting-changes label (One or more reviewers has submitted their review; awaiting response and/or changes from author(s)) and removed the 3/reviewer(s)-assigned label on Jun 24, 2024


