Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Review]: Workflows with Nextflow #16

Open
4 of 5 tasks
gperu opened this issue Dec 15, 2022 · 24 comments
Open
4 of 5 tasks

[Review]: Workflows with Nextflow #16

gperu opened this issue Dec 15, 2022 · 24 comments
Assignees
Labels
4/review(s)-in-awaiting-changes One or more reviewers has submitted their review; awaiting response and/or changes from author(s)

Comments

@gperu
Copy link

gperu commented Dec 15, 2022

Lesson Title

Introduction to Bioinformatics workflows with Nextflow and nf-core

Lesson Repository URL

https://github.com/carpentries-incubator/workflows-nextflow

Lesson Website URL

https://carpentries-incubator.github.io/workflows-nextflow/

Lesson Description

This lesson is a three day introduction to the workflow manager Nextflow, and nf-core, a community effort to collect a curated set of analysis pipelines built using Nextflow.

Nextflow enables scalable and reproducible scientific workflows using software enviroments like conda. It allows the adaptation of pipelines written in the most common scripting languages such as Bash, R and Python. Nextflow is a Domain Specific Language (DSL) that simplifies the implementation and the deployment of complex parallel and reactive workflows on clouds and clusters.

This lesson also introduces nf-core: a framework that provides a community-driven, peer reviewed platform for the development of best practice analysis pipelines written in Nextflow.

This lesson motivates the use of Nextflow and nf-core as a development tool for building and sharing computational pipelines that facilitate reproducible (data) science workflows.

Author Usernames

@ggrimes
@ameynert

Zenodo DOI

No response

Differences From Existing Lessons

No response

Confirmation of Lesson Requirements

JOSE Submission Requirements

Potential Reviewers

No response

@tobyhodges
Copy link
Member

Thanks for submitting this lesson to The Carpentries Lab, @gperu, @ggrimes and @ameynert. I'm excited to see it enter review!

I'll be acting as Editor on the submission. I am just about to go on leave for a few weeks, but I will work through the Editor checklist when I get back in January. When I've completed those checks, and anything they bring up has been addressed, I can begin a search for reviewers.

For now, to ensure that the review process runs as smoothly as possible, please make sure you are subscribed to receive notifications from this thread. On the right sidebar of this page you should see a section headed Notifications, with a Customize link. You can click on that and make sure that you have the Subscribed option selected, to receive all notifications from the thread.

You can add a badge to display the status of the review in the README of your lesson repository with the following Markdown:

[![The Carpentries Lab Review Status](http://badges.carpentries-lab.org/16_status.svg)](https://github.com/carpentries-lab/reviews/issues/16)

@tobyhodges tobyhodges added the 1/editor-checks Editor is conducting initial checks on the lesson before seeking reviewers label Dec 16, 2022
@ggrimes
Copy link

ggrimes commented Dec 19, 2022

@tobyhodges can you please pause this review as I need to update the main https://github.com/carpentries-incubator/workflows-nextflow repo with the latest code which has been developed in my personal github repository https://github.com/ggrimes/workflows-nextflow under the branch agnostric.

@tobyhodges tobyhodges removed the 1/editor-checks Editor is conducting initial checks on the lesson before seeking reviewers label Jan 9, 2023
@tobyhodges
Copy link
Member

Thanks for the heads-up, @ggrimes. Please let me know when you are ready for me to begin working through the editorial checklist.

@tobyhodges tobyhodges added the paused Review has been temporarily paused at the request of the author(s) label Jan 9, 2023
@ggrimes
Copy link

ggrimes commented Jan 8, 2024

I am ready to start the review process for this lesson. Please let me know the next steps

@ggrimes
Copy link

ggrimes commented Jan 15, 2024

Potential Reviewers

Here is a list of potential reviewers, both experts in nextflow and nf-core.
Also have contributed to training materials.

@christopher-hakkaart
@mribeirodantas

@tobyhodges tobyhodges added 2/seeking-reviewers Editor is looking for reviewers to assign to this lesson 1/editor-checks Editor is conducting initial checks on the lesson before seeking reviewers and removed paused Review has been temporarily paused at the request of the author(s) 2/seeking-reviewers Editor is looking for reviewers to assign to this lesson labels Feb 2, 2024
@tobyhodges
Copy link
Member

tobyhodges commented Feb 2, 2024

Editor Checklist - Introduction to Bioinformatics workflows with Nextflow and nf-core

Accessibility

  • All figures are also described in image alternative text or elsewhere in the lesson body.

The first three figures in Getting Started with Nextflow are lacking alternative text descriptions. These kinds of diagrams can be difficult to capture properly in a text description, but please give it a go and feel free to ask for help here if you need it.

The purpose of alternative text is to communicate the purpose of the image to someone who cannot see it, so I recommend you focus on the key points/concepts these figures are intended to get across to learners.

[Edit: alt-text has now been added]

  • The lesson uses appropriate heading levels:
    • h2 is used for sections within a page.
    • no “jumps” are present between heading levels e.g. h2->h4.
    • no page contains more than one h1 element i.e. none of the source files include first-level headings.
  • The contrast ratio of text in all figures is at least 4.5:1.

Content

  • The lesson teaches data and/or computational skills that could promote efficient, open, and reproducible research.
  • All exercises have solutions.
  • Opportunities for formative assessments are included and distributed throughout the lesson sufficiently to track learner progress. (We aim for at least one formative assessment every 10-15 minutes.)
  • Any data sets used in the lesson are published under a permissive open license i.e. CC0 or equivalent.

The example data is licensed CC-BY, which is not really appropriate for data
but is owned by the lesson author, so I think we can be confident there will not be any problems.

Design

  • Learning objectives are defined for the lesson and every episode.
  • The target audience of the lesson is identified specifically and in sufficient detail.

The Learner Profiles are sufficiently detailed.
@ggrimes I recommend you add a link to that page from the introductory text in index.md.

Repository

The lesson repository includes:

  • a CC-BY or CC0 license.
  • a CODE_OF_CONDUCT.md file that links to The Carpentries Code of Conduct.
  • a list of lesson maintainers.
  • tabs to display Issues and Pull Requests for the project.

Structure

  • Estimated times are included in every episode for teaching and completing exercises.
  • Episodes lengths are appropriate for the management of cognitive load throughout the lesson.

Supporting information

The lesson includes:

  • a list of required prior skills and/or knowledge.
  • setup and installation instructions.
  • a glossary of key terms or links out to definitions in an external glossary e.g. Glosario.

I recommend that you open issues on the lesson repositories for each point raised above, to help you maintain a view of what has been addressed as you go along. You do not need to provide a written response to all of the points raised, but please post back here when you think they have been addressed. I can then run through the checklist again and confirm that the lesson is ready to move to review. If you would like to provide additional explanation for any of the points raised, I encourage you to do so.

@ggrimes
Copy link

ggrimes commented Feb 6, 2024

Added alt text to all images in first episode issue #106

@tobyhodges
Copy link
Member

The alternative text descriptions you added are a big improvement, thanks.

I will start contacting reviewers for the lesson.

@tobyhodges tobyhodges added 2/seeking-reviewers Editor is looking for reviewers to assign to this lesson and removed 1/editor-checks Editor is conducting initial checks on the lesson before seeking reviewers labels Feb 7, 2024
@tobyhodges
Copy link
Member

@bobturneruk @HadrienG thank you both for volunteering to review a lesson for The Carpentries Lab. Please can you confirm that you are happy to review this Introduction to Bioinformatics workflows with Nextflow and nf-core lesson?

You can read more about the lesson review process in our Reviewer Guide.

@HadrienG
Copy link

confirming I'm happy to review this lesson!

@bobturneruk
Copy link

I am, too.

@tobyhodges tobyhodges added 3/reviewer(s)-assigned Reviewers have been assigned; review in progress and removed 2/seeking-reviewers Editor is looking for reviewers to assign to this lesson labels Feb 26, 2024
@tobyhodges
Copy link
Member

Excellent, thank you both. When you are ready, please post your reviews as replies in this thread. If you have any questions for me during the review, please ask and I will be happy to help.

@ggrimes
Copy link

ggrimes commented Feb 26, 2024

Thanks @bobturneruk and @HadrienG for agreeing to review .

@ggrimes
Copy link

ggrimes commented Feb 26, 2024

Excellent, thank you both. When you are ready, please post your reviews as replies in this thread. If you have any questions for me during the review, please ask and I will be happy to help.
@tobyhodges should I let the other potential reviewers that i no longer need them?

@tobyhodges
Copy link
Member

tobyhodges commented Feb 26, 2024

@tobyhodges should I let the other potential reviewers that i no longer need them?

Yes, I believe we are covered at this point. @HadrienG & @bobturneruk: of course, if circumstances change and you find that you no longer have capacity to review the lesson, do let me know and I can respond accordingly.

@HadrienG
Copy link

DRAFT REVIEW

Great job folks! I'm well in my way into the review process, and thought I'd post this even if I'm not done, so you can eventually start addressing the first comments. I'll change the header and this comment when I'm done.

Accessibility

  • The alternative text of all figures is accurate and sufficiently detailed.
    • In episode 12, the firgure depicting the nf-core project is insufficient. I don't think each item of the figure should be described, but a few words about the three main categories would be welcome.
    • In episode 12, the alt text for the nfcore_config.png image is insufficient. Since the content of the picture is explained below in the episode, I wonder if something like "a graphical description of different config profiles explained below" is not a better alt text. Feel free to ingore me if the current alt text is in line with the carpentries' policy :-)
  • The lesson content does not make extensive use of colloquialisms, region- or culture-specific references, or idioms.
  • The lesson content does not make extensive use of contractions (“can’t” instead of “cannot”, “we’ve” instead of “we have”, etc).

Content

  • The lesson content:
    • conforms to The Carpentries Code of Conduct.
    • meets the objectives defined by the authors.
    • is appropriate for the target audience identified for the lesson.
    • is accurate.
    • is descriptive and easy to understand.
    • is appropriately structured to manage cognitive load.
    • does not use dismissive language.
  • Tools used in the lesson are open source or, where tools used are closed source/proprietary, there is a good reason for this e.g. no open source alternatives are available or widely-used in the lesson domain.
  • Any example data sets used in the lesson are accessible, well-described, available under a CC0 license, and representative of data typically encountered in the domain.
  • The lesson does not make use of superfluous data sets, e.g. increasing cognitive load for learners by introducing a new data set instead of reusing another that is already present in the lesson.
  • The example tasks and narrative of the lesson are appropriate and realistic.
  • The solutions to all exercises are accurate and sufficiently explained.
  • The lesson includes exercises in a variety of formats.
  • Exercise tasks and formats are appropriate for the expected experience level of the target audience.
  • All lesson and episode objectives are assessed by exercises or another opportunity for formative assessment.
  • Exercises are designed with diagnostic power.

Episode 3 - Channels

  • First objective: "How do I get data into Nextflow?" is misleading, we've already done that in episode 2: parametrisation

Episode 4 - Processes

  • process_rscript.nf: including the ShortRead package in the conda environment would let the learners run the script.
  • I think the "Associated script" note about python scripts could be an exercise: "move the python code block into it's own script"
  • I don't think it is very useful to learn about "user-specified file names", this paragraph could be removed.

Episode 6 - Workflow

  • Workflow definition: "[...] Therefore it’s the entry point [...]" is the only mention of "entry point". Maybe the concept needs to be defined somewhere?
  • collect() should be introduced first in the channels episode, instead of appearing here for the first time.

Episode 7 - Operators

  • The into operator is deprecated.
  • Overall, very good chapter!

Episode 9 - Nextflow configuration

  • Process selectors: there is a mention of "fully qualified name" (NFCORE_RNASEQ:RNA_SEQ:SAMTOOLS_SORT) but scoping is not mentioned or explained previously.

Design

  • Learning objectives for the lesson and its episodes are clear, descriptive, and measurable. They focus on the skills being taught and not the functions/tools e.g. “filter the rows of a data frame based on the contents of one or more columns,” rather than “use the filter function on a data frame.”
  • The target audience identified for the lesson is specific and realistic.

Supporting information

  • The list of required prior skills and/or knowledge is complete and accurate.
  • The setup and installation instructions are complete, accurate, and easy to follow.
  • No key terms are missing from the lesson glossary or are not linked to definitions in an external glossary e.g. Glosario.

setup

  • The conda setup and the standalone setup do not install the same version of nextflow, which causes issues down the line. 20.10 is installed standalone, and is assumed to be used throughout the whole lesson, However, conda installs >=20.10. I would suggest to migrate the setup and all episodes to >=22.03. This removes the need for nextflow.enable.dsl=2 in the scripts, and only would require a few small changes in some episodes. DSL1 is old, deprecated and in my opinion unnecessary to bring up / explain.

  • It is not crystal clear that the "Nextflow install without conda" part is optional.

  • nf-core is missing in the environment.yml

General

Minor issues and bugs

The points tagged DSL2 are dependent on the above setup issue.

setup
  • Traning software:conda, broken link to environment.yml
  • conda env not working natively on M1/M2 macs unless roseta is installed. I guess this will fix itself at some point? or should there be a note for mac users in the setup?
  • data download: remove the $ sot the command can be copy/pasted more easily
getting started
  • Nextflow core features: broken image
  • Your first script: "A multi-line Nextflow comment, written using C style block comments, followed by a single line comment." There is no single line comment.
  • Your first script: 5. and 6. should be swapped
  • nextflow run word_count.nf gives 0 because the input file does not exist. Should be data/yeast/reads/ref1_1.fq.gz
workflow parametrisation
  •  cp wc.nf wc-params.nf change wc.nf to word_count.nf
  • be consistent when naming files (word_count vs wf-params)
Channels
  • Downloading data from SRA does not require an API key for small downloads. I would remove this part it since it's so little data.
Processes
  • This part does not have sub-section in the left navigation menu.
  • DSL2: the processed should be in workflow blocks
  • Script: remove the ~~~ left in the script
  • process_multi_line.nf does not print what is indicated
  • Why mix the use of echo and printf in bash blocks? Pick one and be consistent
  • The Input Repeaters challenge is wrongly formatted
Processes part 2
  •  tree is not available on macOS by default. Personally, I'd avoid using it
Workflows
  •  workflow_01.nf is fastqc on the website, salmon in the scripts.
Operators
  • parse a csv file challenge: ~~~{: .language-groovy } present in the solution
nf-core
  • nf-core lunch rnaseq: outdir is also a required parameter.
  • DSL2: hlatyping pipeline is too old

@ggrimes
Copy link

ggrimes commented Mar 19, 2024

@HadrienG I would be interested in your opinion as to whether the nf-core episode should be included ,or removed and just have core nextflow lessons.

@HadrienG
Copy link

@HadrienG I would be interested in your opinion as to whether the nf-core episode should be included ,or removed and just have core nextflow lessons.

Disclaimer/conflict of interest: I have previously published an nf-core pipeline and I'm a member of the nf-core community.

I think the nf-core episode will be helpful for a lot of people. Although the episode is not teaching nextflow per se, nf-core is a valuable resource, and researchers would benefit being aware that it exists, and learning how to launch pipelines.

Secondly, while there are other collections of nextflow pipelines out there, the community aspect of nf-core is quite unique, and I don't think including them is especially unfair to another community. What I would eventually suggest is keeping the nf-core episode but removing it from the lesson title and go with "Introduction to Bioinformatics workflows with Nextflow". You could even rename the nf-core episode "Launching publicly available pipelines" and start the episode with a non nf-core example: nextflow run carpentries/nextflow-example that would download and run a minimal nextflow piepline that you created. That way you illustrate to the leaners that they can run any nextflow pipeline they find on github, and then show nf-core as a community example that has a lot of pipelines available.

@bobturneruk
Copy link

Hi! I just wanted to say I am working on my review. I've got to ep 7. It's taking a while as I think it deserves to be done in some detail, as overall it seems to be such a well thought out and useful resource. Lots of specific comments to follow once I've had time to look at it all. I'm trying not to look at @HadrienG's report for now.

@bobturneruk
Copy link

I'm going to post my review below in a moment. I've probably made some mistakes - interpreted things wrongly or misunderstood something technical - and I'm mindful that in some places I'm asking for extra work to be done which means even more of someone's time and effort. Overall I hope it's helpful. I'm happy to discuss further via a call, Slack or on here.

Once again, my overall feeling is that this lesson will be a great help for people getting into Nextflow.

@bobturneruk
Copy link

Reviewer Checklist

Accessibility

Content

  • The lesson content:

    • conforms to The Carpentries Code of Conduct.
    • meets the objectives defined by the authors.
    • is appropriate for the target audience identified for the lesson.
    • is accurate.
    • is descriptive and easy to understand.
    • is appropriately structured to manage cognitive load.
    • does not use dismissive language.
  • Tools used in the lesson are open source or, where tools used are closed source/proprietary, there is a good reason for this e.g. no open source alternatives are available or widely-used in the lesson domain.

  • Any example data sets used in the lesson are accessible, well-described, available under a CC0 license, and representative of data typically encountered in the domain.

  • The lesson does not make use of superfluous data sets, e.g. increasing cognitive load for learners by introducing a new data set instead of reusing another that is already present in the lesson.

  • The example tasks and narrative of the lesson are appropriate and realistic.

  • The solutions to all exercises are accurate and sufficiently explained.

  • The lesson includes exercises in a variety of formats.

  • Exercise tasks and formats are appropriate for the expected experience level of the target audience.

  • All lesson and episode objectives are assessed by exercises or another opportunity for formative assessment.

  • Exercises are designed with diagnostic power.

  • Interestingly, because the example data is provided by The Carpentries in the lesson repo, it is CC-BY, not CC0.

  • I don't have the expertise to comment on whether exercises are designed with diagnostic power.

  • I've made a PR with several minor comments which the maintainers might accept or reject individually.

Specific comments follow...

ep1

  • I get this output from the script, which is different to the course material. Might be good if the output number wasn't 0?
(nf-training) bobturner@tubby:~/nf-training$ nextflow run word_count.nf
N E X T F L O W  ~  version 20.10.0
Launching `word_count.nf` [boring_nobel] - revision: 72656509cb
executor >  local (1)
[14/523859] process > NUM_LINES (1) [100%] 1 of 1 ✔
SRR2584863_1.fastq.gz   0

ep2

  • The first exercise in ep2 has a lot of content shared directly with the preceding bit of live coding - we've already talked participants through how to add a sleep parameter, then this is repeated as an exercise. An alternative exercise might be to ask for sleep_before and sleep_after parameters?
  • Might it be easier on learners not to mention YAML and just stick with JSON?

ep3

ep4

ep5

ep6

  • I get an error when running the first code block:
executor >  local (10)
[e5/e6fded] process > FASTQC (5) [100%] 9 of 9 ✔
[17/30b05f] process > MULTIQC    [100%] 1 of 1, failed: 1 ✘









Error executing process > 'MULTIQC'

Caused by:
  Process `MULTIQC` terminated with an error exit status (1)

Command executed:

  multiqc .

Command exit status:
  1

Command output:
  Searching   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 42/42  

Command error:
             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict
      if isinstance(val, collections.Mapping):
                         ^^^^^^^^^^^^^^^^^^^
  AttributeError: module 'collections' has no attribute 'Mapping'
  ============================================================
  [ERROR  ]         multiqc : Oops! The 'seqyclean' MultiQC module broke... 
    Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues 
    If possible, please include a log file that triggers the error - the last file found was:
      None
  ============================================================
  Module seqyclean raised an exception: Traceback (most recent call last):
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run
      output = mod()
               ^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/seqyclean/seqyclean.py", line 18, in __init__
      super(MultiqcModule, self).__init__(
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in __init__
      config.update({anchor: mod_cust_config.get("custom_config", {})})
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update
      return update_dict(globals(), u)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict
      if isinstance(val, collections.Mapping):
                         ^^^^^^^^^^^^^^^^^^^
  AttributeError: module 'collections' has no attribute 'Mapping'
  ============================================================
  [ERROR  ]         multiqc : Oops! The 'optitype' MultiQC module broke... 
    Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues 
    If possible, please include a log file that triggers the error - the last file found was:
      None
  ============================================================
  Module optitype raised an exception: Traceback (most recent call last):
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run
      output = mod()
               ^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/optitype/optitype.py", line 24, in __init__
      super(MultiqcModule, self).__init__(
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in __init__
      config.update({anchor: mod_cust_config.get("custom_config", {})})
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update
      return update_dict(globals(), u)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict
      if isinstance(val, collections.Mapping):
                         ^^^^^^^^^^^^^^^^^^^
  AttributeError: module 'collections' has no attribute 'Mapping'
  ============================================================
  [WARNING]         multiqc : No analysis results found. Cleaning up..
  [INFO   ]         multiqc : MultiQC complete

Work dir:
  /home/bobturner/nf-training/work/17/30b05f5243e6fd12c0013cc12ea0c8

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
  • This persists with subsequent examples

ep7

ep8

ep9

When I first tried the Conda exercise I got this error:

Error executing process > 'FASTP (1)'

Caused by:
  Failed to create Conda environment
  command: conda create --mkdir --yes --quiet --prefix /home/bobturner/nf-training/work/conda/env-a7a3a0d820eb46bc41ebf4f72d955e5f bioconda::fastp=0.12.4-0
  status : 1
  message:
    CondaValueError: You have chosen a non-default solver backend (libmamba) but it was not recognized. Choose one of: classic

I guess down to my Conda config. I could fix by:

conda config --set solver classic
  • I suggest docker.runOptions = '-u $(id -u):$(id -g)' should be explained.

ep10

  • I don't think learners are asked to create a file called wc.nf before they are asked to resume it in the first exercise here. It's maybe called word_count.nf earlier in the lesson?
  • I'm not completely clear on what "The command wrapped used to run the job." means. As far as I can tell by searching, the definition of .command.run is not in the Nextflow docs. It might be better described as "The full Nextflow bash script used to run .command.sh."?

ep11

  • Is it the intention that this episode re-explains lots of content from previous episodes? For example parameters and processes are covered as if new content. Perhaps there is scope to remove some of this e.g.

A process is defined by providing three main declarations:

  1. The process inputs,
  2. The process outputs
  3. Finally the command script.
  • I suggest the "Recap" sections are reviewed - they may be useful but they introduce a different structure to ep11 compared with earlier episodes. They also often imply that content previously covered is first covered in ep11.
  • "The FASTQC process will not run as the process has not been declared in the workflow scope." - is this the same definition of "scope" as used earlier? Is there a workflow scope as well as a params scope and an aws scope?
  • I get this error when running script6:
Error executing process > 'MULTIQC'

Caused by:
  Process `MULTIQC` terminated with an error exit status (1)

Command executed:

  multiqc .

Command exit status:
  1

Command output:
  Searching   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 141/141  

Command error:
             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict
      if isinstance(val, collections.Mapping):
                         ^^^^^^^^^^^^^^^^^^^
  AttributeError: module 'collections' has no attribute 'Mapping'
  ============================================================
  [ERROR  ]         multiqc : Oops! The 'seqyclean' MultiQC module broke... 
    Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues 
    If possible, please include a log file that triggers the error - the last file found was:
      None
  ============================================================
  Module seqyclean raised an exception: Traceback (most recent call last):
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run
      output = mod()
               ^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/seqyclean/seqyclean.py", line 18, in __init__
      super(MultiqcModule, self).__init__(
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in __init__
      config.update({anchor: mod_cust_config.get("custom_config", {})})
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update
      return update_dict(globals(), u)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict
      if isinstance(val, collections.Mapping):
                         ^^^^^^^^^^^^^^^^^^^
  AttributeError: module 'collections' has no attribute 'Mapping'
  ============================================================
  [ERROR  ]         multiqc : Oops! The 'optitype' MultiQC module broke... 
    Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues 
    If possible, please include a log file that triggers the error - the last file found was:
      None
  ============================================================
  Module optitype raised an exception: Traceback (most recent call last):
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run
      output = mod()
               ^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/optitype/optitype.py", line 24, in __init__
      super(MultiqcModule, self).__init__(
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in __init__
      config.update({anchor: mod_cust_config.get("custom_config", {})})
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update
      return update_dict(globals(), u)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict
      if isinstance(val, collections.Mapping):
                         ^^^^^^^^^^^^^^^^^^^
  AttributeError: module 'collections' has no attribute 'Mapping'
  ============================================================
  [WARNING]         multiqc : No analysis results found. Cleaning up..
  [INFO   ]         multiqc : MultiQC complete

Work dir:
  /home/bobturner/nf-training/work/13/2a6ef5a3e567c3f56e41d67eefe205

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
  • Ternary operators should be explained more fully if the term is to be used.
  • Graphviz installation is not part of the setup instructions, but is needed for the last exercise.
  • I suggest the key points are reviewed to better match the Questions and Objectives of the lesson. They currently contain points covered earlier e.g. "A Workflow can be parameterise using params .".
  • I don't think this episode introduces very much new content. An option would be to separate out the new content (e.g. "Metrics and Reports") and set up the episode as an extended exercise for learners - end the content prior to https://carpentries-incubator.github.io/workflows-nextflow/11-Simple_Rna-Seq_pipeline.html#define-the-pipeline-parameters and set an exercise to produce the code in script6.nf? Or maybe the extensive recap is intentional.

ep12

Some of this might be really specific to my setup...

  • I get an error when pulling rnaseq:
nextflow pull nf-core/rnaseq -revision 3.0
Checking nf-core/rnaseq ...
Project config file is malformed -- Cause: No signature of method: nextflow.config.ConfigParser$_parse_closure5.id() is applicable for argument types: (String) values: [[email protected]]
Possible solutions: is(java.lang.Object), is(java.lang.Object), find(), find(), find(groovy.lang.Closure), find(groovy.lang.Closure)
  • Seems this issue may be related nextflow pull not working if nextflow.config is a symbolic link nextflow-io/nextflow#888 - I tried making a clean directory to work in. Perhaps it's something to do with installing Nextflow via Conda? It works with my system Nextflow.
  • Should learners be using the web based launch tool? I guess not.
  • It might be useful to explain the relationship between a "profile" and a "scope".
  • I get this error in the final exercise:
nextflow run nf-core/hlatyping -r 1.2.0 -profile test,conda  --max_memory 3G -c nfcore-custom.config
N E X T F L O W  ~  version 23.04.2
Pulling nf-core/hlatyping ...
 downloaded from https://github.com/nf-core/hlatyping.git
Nextflow DSL1 is no longer supported — Update your script to DSL2, or use Nextflow 22.10.x or earlier
  • nextflow run nf-core/hlatyping -r 2.0.0 -profile test,conda --max_memory '3.GB' --outdir out -c nfcore-custom.config may be better, but needs Conda and I can't run the nf-core commands in Conda.
  • gitter chat is retired. Google Groups might not still be useful. I think these should be replaced with a link to the Nextflow Slack.

Design

  • Learning objectives for the lesson and its episodes are clear, descriptive, and measurable. They focus on the skills being taught and not the functions/tools e.g. “filter the rows of a data frame based on the contents of one or more columns,” rather than “use the filter function on a data frame.”

  • The target audience identified for the lesson is specific and realistic.

  • I'm not sure where the target audience is defined. I think if it's something like "Biological and Biomedical Scientists, Research Software Engineers and associated professionals" it would be great.

Supporting information

  • The list of required prior skills and/or knowledge is complete and accurate.

  • The setup and installation instructions are complete, accurate, and easy to follow.

  • No key terms are missing from the lesson glossary or are not linked to definitions in an external glossary e.g. Glosario.

  • I think it would be helpful to make it clearer at the beginning of the instructions that this won't work on Windows, or if the participant has Windows, they'll need to use WSL2. Maybe linking to this would be helpful https://learn.microsoft.com/en-us/windows/wsl/install or perhaps that's too involved.

  • The version of NextFlow specified is two major versions behind the current version (23, on 2024-03-22). Could a more recent version be used?

  • Python 3.8, listed as a requirement, will be unsupported after quarter 3 of 2024 (https://devguide.python.org/versions/). Perhaps a more recent version could be used?

  • The supplied environment.yml file (https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml) does not specify a version of Python. If Python version is important, then it should.

  • For installing Conda, perhaps it would be best to link to the official instructions https://docs.anaconda.com/free/miniconda/miniconda-install/ ?

  • I suggest re-ordering the instructions and making it clear how the environment.yml file should be obtained like this bobturneruk/workflows-nextflow@8b2a7af

  • I suggest if VSCode it what's best to use for the lesson, it is made a requirement, not a recommendation. This will mean greater consistency in learner experience.

  • I suggest removing the instructions to install Nextflow without Conda. Again, I think consistency of setup will lead to less variation in issues experienced by learners.

  • Installing with Conda went very smoothly for me!

  • The Glossary only has 3 terms. I think it would benefit by being expanded.

General

  • Reviewed with Ubuntu 22.04 on WSL2.
  • I think, strictly speaking, the scripts (e.g. https://carpentries-incubator.github.io/workflows-nextflow/01-getting-started-with-nextflow.html#your-first-script) is not written in GROOVY (as stated) but in Nextflow, or maybe it should be Nextflow DSL2? I guess Groovy is specified to help with syntax highlighting?
  • I think there needs to be an explanation of the relationship between Groovy and Nextflow DSL2 in the introduction, otherwise it's confusing when Groovy operators and variable types are introduced later on. Could say something like "The Nextflow language is based on another language, Groovy. Some Groovy code can be used directly in Nextflow, but not all Groovy code is valid.".

@ggrimes
Copy link

ggrimes commented Apr 23, 2024

Thank you @HadrienG and @bobturneruk for your reviews. Do you have any suggestions about how I should reply to individual review comments?

Thanks,

@bobturneruk
Copy link

Hi @ggrimes! @tobyhodges - how is this normally handled, please? I don't think GitHub makes this easy. Maybe we could break things down into issues against https://github.com/carpentries-incubator/workflows-nextflow/issues ?

@tobyhodges
Copy link
Member

Thanks @bobturneruk and @HadrienG for your detailed reviews. @ggrimes you can choose how to respond to individual comments, based on your personal preference. In the past, other lesson developers have found it helpful to open issues to track the improvements suggested in reviews. If you do that, please link to the issues in this thread when you are ready to respond to the reviews: I would like to make it easy for visitors to the thread to find everything related to the review.

Responding to a few points from @bobturneruk:

Interestingly, because the example data is provided by The Carpentries in the lesson repo, it is CC-BY, not CC0.

CC-BY is not an appropriate license for data and if you are able to adjust the license on the FigShare record to use CC0 I recommend to do so (bonus points if you add a CFF file as well), but the mistake is common enough and harmless enough that I am generally happy to let it pass if needed. I think the technically correct thing to do with data included in a lesson repository would be to mention in LICENSE.md that the data is available CC0. But honestly we have data files in lesson repositories elsewhere that we are not doing this for, and I would certainly be willing to let this go for the purposes of the review.

I think, strictly speaking, the scripts (e.g. https://carpentries-incubator.github.io/workflows-nextflow/01-getting-started-with-nextflow.html#your-first-script) is not written in GROOVY (as stated) but in Nextflow, or maybe it should be Nextflow DSL2? I guess Groovy is specified to help with syntax highlighting?

&

I think there needs to be an explanation of the relationship between Groovy and Nextflow DSL2 in the introduction, otherwise it's confusing when Groovy operators and variable types are introduced later on. Could say something like "The Nextflow language is based on another language, Groovy. Some Groovy code can be used directly in Nextflow, but not all Groovy code is valid.".

As @bobturneruk predicted, GROOVY labels are added to the code blocks because syntax highlighting is being applied by specifying that blocks are groovy in the lesson source. Unfortunately, until a separate specification for nextflow is added to https://github.com/jgm/skylighting this is the best we can get. I agree that a note explaining this early in the lesson would be helpful to anyone following along on their own.

@tobyhodges tobyhodges added 4/review(s)-in-awaiting-changes One or more reviewers has submitted their review; awaiting response and/or changes from author(s) and removed 3/reviewer(s)-assigned Reviewers have been assigned; review in progress labels Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4/review(s)-in-awaiting-changes One or more reviewers has submitted their review; awaiting response and/or changes from author(s)
Projects
None yet
Development

No branches or pull requests

5 participants