Add multiple regions analysis / 5R / SMURF / q2-sidle #702

d4straub · 2024-02-06T09:48:49Z

Addresses #701

Status:

Potential problem:

All processes using Sidle use the container 'docker.io/d4straub/pipesidle:0.1.0-beta', which I built from https://github.com/d4straub/pipesidle/blob/main/Dockerfile and host in my Docker Hub account. The author of the PyPI package doesn't want the software in conda, so that's the way I went for now. Any suggestions are welcome.
Failing linting is due to template update 2.13 (which restructures the pipeline functions in the lib folder; the update is therefore postponed for now).

Ideas postponed:

Optional per-region read trimming definitions (trunclenf & trunclenr)
In case there are multiple regions, remove the region info from the sample names in overall_summary.tsv & *.stats.tsv and aggregate to obtain numbers per sample instead of the current numbers per sample per region.
The pipeline summary report isn't compatible with multi-region analysis; this could be improved in the future.
Use appropriate taxlevels (see taxlevels in conf/ref_databases.conf) & output tab-separated taxonomies
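
The per-sample aggregation idea for overall_summary.tsv & *.stats.tsv could look roughly like the sketch below. It is only an illustration, not code from this PR: the `<sample>_<region>` naming scheme, the separator, and the column names `input` and `filtered` are assumptions made for the example.

```python
from collections import defaultdict


def aggregate_per_sample(rows, region_sep="_"):
    """Sum all numeric columns over regions so that each sample appears once.

    `rows` is a list of dicts with a 'sample' key (hypothetical layout).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        # Assumed naming scheme: "<sample><sep><region>"; strip the region suffix.
        sample = row["sample"].rsplit(region_sep, 1)[0]
        for column, value in row.items():
            if column != "sample":
                totals[sample][column] += int(value)
    return {sample: dict(columns) for sample, columns in totals.items()}


# Hypothetical per-sample-per-region rows, as they might be read with csv.DictReader.
rows = [
    {"sample": "S1_region1", "input": "100", "filtered": "90"},
    {"sample": "S1_region2", "input": "120", "filtered": "100"},
    {"sample": "S2_region1", "input": "80", "filtered": "70"},
]
per_sample = aggregate_per_sample(rows)
print(per_sample)
```

Splitting on the last separator only keeps sample names that themselves contain the separator intact, as long as the region suffix never does.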

PR checklist

github-actions · 2024-02-06T09:51:58Z

`nf-core lint` overall result: Failed ❌

Posted for pipeline commit 8c711a7

+| ✅ 180 tests passed       |+
#| ❔   6 tests were ignored |#
!| ❗   2 tests had warnings |!
-| ❌  10 tests failed       |-

❌ Test failures:

files_exist - File must be removed: lib/Utils.groovy
files_exist - File must be removed: lib/WorkflowMain.groovy
files_exist - File must be removed: lib/NfcoreTemplate.groovy
files_exist - File must be removed: lib/WorkflowAmpliseq.groovy
files_unchanged - .github/CONTRIBUTING.md does not match the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md does not match the template
files_unchanged - .github/workflows/branch.yml does not match the template
files_unchanged - .github/workflows/linting_comment.yml does not match the template
files_unchanged - .github/workflows/linting.yml does not match the template
files_unchanged - pyproject.toml does not match the template

❗ Test warnings:

readme - README did not have a Nextflow minimum version badge.
schema_lint - Parameter input is not defined in the correct subschema (input_output_options)

❔ Tests ignored:

files_exist - File is ignored: conf/igenomes.config
nextflow_config - Config default ignored: params.report_template
nextflow_config - Config default ignored: params.report_css
nextflow_config - Config default ignored: params.report_logo
files_unchanged - File ignored due to lint config: .gitattributes
actions_ci - actions_ci

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-ampliseq_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-ampliseq_logo_light.png
files_exist - File found: docs/images/nf-core-ampliseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-ampliseq_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 2.9.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.extension= /*_R{1,2}_001.fastq.gz
nextflow_config - Config default value correct: params.min_read_counts= 1
nextflow_config - Config default value correct: params.cutadapt_min_overlap= 3
nextflow_config - Config default value correct: params.cutadapt_max_error_rate= 0.1
nextflow_config - Config default value correct: params.trunc_qmin= 25
nextflow_config - Config default value correct: params.trunc_rmin= 0.75
nextflow_config - Config default value correct: params.max_ee= 2
nextflow_config - Config default value correct: params.min_len= 50
nextflow_config - Config default value correct: params.sample_inference= independent
nextflow_config - Config default value correct: params.vsearch_cluster_id= 0.97
nextflow_config - Config default value correct: params.orf_start= 1
nextflow_config - Config default value correct: params.stop_codons= TAA,TAG
nextflow_config - Config default value correct: params.dada_ref_taxonomy= silva=138
nextflow_config - Config default value correct: params.pplace_alnmethod= hmmer
nextflow_config - Config default value correct: params.kraken2_confidence= 0.0
nextflow_config - Config default value correct: params.cut_its= none
nextflow_config - Config default value correct: params.its_partial= 0
nextflow_config - Config default value correct: params.exclude_taxa= mitochondria,chloroplast
nextflow_config - Config default value correct: params.min_frequency= 1
nextflow_config - Config default value correct: params.min_samples= 1
nextflow_config - Config default value correct: params.diversity_rarefaction_depth= 500
nextflow_config - Config default value correct: params.ancom_sample_min_count= 1
nextflow_config - Config default value correct: params.tax_agglom_min= 2
nextflow_config - Config default value correct: params.tax_agglom_max= 6
nextflow_config - Config default value correct: params.report_title= Summary of analysis results
nextflow_config - Config default value correct: params.seed= 100
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-ampliseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-ampliseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-ampliseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Zenodo placeholder was replaced with DOI.
pipeline_todos - No TODO strings found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (310 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: linting.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - 'assets/multiqc_config.yml' contains report_section_order
multiqc_config - 'assets/multiqc_config.yml' contains export_plots
multiqc_config - 'assets/multiqc_config.yml' contains report_comment
multiqc_config - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins.
multiqc_config - 'assets/multiqc_config.yml' contains a matching 'report_comment'.
multiqc_config - 'assets/multiqc_config.yml' contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.13.1
Run at 2024-03-13 15:41:17

d4straub · 2024-02-23T07:23:36Z

Failing linting is due to template update 2.13 (which restructures the pipeline functions in the lib folder; the update is therefore postponed for now).

d4straub · 2024-03-08T14:39:08Z

The implemented test_multiregion seems to run for quite a long time. Here are the processes with the longest walltimes when I run it on my laptop:

| process | min | sec |
| --- | --- | --- |
| NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_ALPHARAREFACTION | 16 | 14 |
| NFCORE_AMPLISEQ:AMPLISEQ:SIDLE_WF:SIDLE_TREERECON | 13 | 21 |
| NFCORE_AMPLISEQ:AMPLISEQ:SIDLE_WF:SIDLE_DBEXTRACT | 5 | 37 |
| NFCORE_AMPLISEQ:AMPLISEQ:SIDLE_WF:SIDLE_DBFILT | 5 | 29 |

SIDLE_DBEXTRACT is repeated for each region, i.e. in this example 5 times.
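
To compare the listed walltimes more easily, they can be converted to seconds and ranked, as in the small sketch below. It uses only the four rows reported above; whether the SIDLE_DBEXTRACT figure is per repetition or a total is not stated, so the sketch deliberately does not multiply it by the number of regions.

```python
# Walltimes as reported above: process name, minutes, seconds (tab-separated).
timings = """\
NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_ALPHARAREFACTION\t16\t14
NFCORE_AMPLISEQ:AMPLISEQ:SIDLE_WF:SIDLE_TREERECON\t13\t21
NFCORE_AMPLISEQ:AMPLISEQ:SIDLE_WF:SIDLE_DBEXTRACT\t5\t37
NFCORE_AMPLISEQ:AMPLISEQ:SIDLE_WF:SIDLE_DBFILT\t5\t29
"""


def parse_walltimes(text):
    """Map the short process name (last ':'-separated part) to seconds."""
    walltimes = {}
    for line in text.strip().splitlines():
        process, minutes, seconds = line.split("\t")
        walltimes[process.rsplit(":", 1)[-1]] = int(minutes) * 60 + int(seconds)
    return walltimes


walltimes = parse_walltimes(timings)
total_seconds = sum(walltimes.values())

# Print the processes from slowest to fastest with their share of the listed total.
for name, secs in sorted(walltimes.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {secs} s ({secs / total_seconds:.0%} of listed walltime)")
```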

Possible changes to reduce runtime:

16min: --skip_alpha_rarefaction, minimal disadvantage (channel connection not tested) --> DONE
13min+: SIDLE_TREERECON, currently cannot be omitted; disadvantage if omitted: the phylogenetic tree isn't constructed, which leads to the omission of downstream processes involving alpha & beta diversity --> DONE*
10min+: SIDLE_DBEXTRACT, using fewer regions, i.e. fewer repetitions of this process, potentially making downstream processes faster
10min+: SIDLE_DBEXTRACT, using a smaller database, e.g. greengenes 85% or similar, disadvantage: minimal! --> DONE, 88% (85% failed due to no kmer matches)

*: Due to the smaller reference taxonomy database and the tiny dataset, the alpha diversity plugin fails. Therefore the complete diversity calculation would need to be omitted, which makes the tree reconstruction obsolete.

edit: Now test_multiregion requires 17 minutes, which is in line with test (19min) & test_sintax (18min).

erikrikarddaniel

Two tiny comments. Otherwise 👍

bin/taxref_reformat_sidle.sh

modules/local/dada2_err.nf

d4straub · 2024-03-18T12:19:41Z

Thanks!

d4straub added 2 commits February 5, 2024 16:26

move primers to meta map

117e426

Merge branch 'dev' of https://github.com/nf-core/ampliseq into add-multiple-regions-analysis

c26cbac

d4straub and others added 7 commits February 6, 2024 16:25

process regions with --input_mutiregion

3f16376

fix when no primers are given

45fefd4

produce per-region ASV tables and fasta

715ae05

adjust container and input

0f04676

add SIDLE workflow from d4straub/pipesidle

aa1ff5a

add sidle reference taxonomy entries & custom input

afbb18f

fix prettier

a8bc971

d4straub mentioned this pull request Feb 12, 2024

(Bio)conda jwdebelius/q2-sidle#34

Open

plugin sidle output to downstream analysis

0185e0c

--sidle_ref_taxonomy greengenes works

5b4d4d0

d4straub mentioned this pull request Mar 8, 2024

Add multiregion test data nf-core/test-datasets#1116

Merged

d4straub added 4 commits March 8, 2024 14:58

Add multiregion test

fdb8d68

update documentation and changelog

74de9fb

Fix prettier

2b72e2a

update README

1617780

d4straub added 10 commits March 8, 2024 16:16

add smaller test database

f68b18a

make sidle_ref_taxonomy entry tree_qza optional

c26a2c2

Fix prettier

6b7a5a4

update multiregion nf.test

2370c40

update multiregion.nf.test.snap

35b0ddc

correct multiregion.nf.test

5bdb033

fix sidle silva ref db

cd30e46

adjust settings based on ref db

cef0875

silva ref db works

d6a6939

cleanup

221a554

d4straub added 3 commits March 13, 2024 16:29

check incompatible params with sidle

dd3698e

fix overzealous check

8c711a7

re-arrange param documentation and rename --input_multiregion to --multiregion

77e03b1

d4straub mentioned this pull request Mar 18, 2024

Remove phytoref #710

Merged

11 tasks

prevent execution with conda

8575090

d4straub marked this pull request as ready for review March 18, 2024 09:24

d4straub self-assigned this Mar 18, 2024

erikrikarddaniel approved these changes Mar 18, 2024

remove empty lines

dc5e8a3

d4straub merged commit 6e727a4 into nf-core:dev Mar 18, 2024
16 of 17 checks passed

d4straub deleted the add-multiple-regions-analysis branch March 18, 2024 12:19
