diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index cd46aca6..12ccbdca 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -17,9 +17,10 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-cmgg/stru - [ ] If you've fixed a bug or added code that should be tested, add tests! - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-cmgg/structural/tree/master/.github/CONTRIBUTING.md) - [ ] Make sure your code lints (`nf-core lint`). -- [ ] Ensure the test suite passes (`nf-test test main.nf.test -profile test,docker`). +- [ ] Ensure the test suite passes (`nf-test test`). - [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir `). - [ ] Usage Documentation in `docs/usage.md` is updated. - [ ] Output Documentation in `docs/output.md` is updated. +- [ ] Parameters Documentation is updated with `nf-core schema docs --format markdown --output docs/parameters.md --force` - [ ] `CHANGELOG.md` is updated. - [ ] `README.md` is updated (including new tool citations and authors/contributors). 
diff --git a/.nf-core.yml b/.nf-core.yml index 4204f64b..81e1d16d 100644 --- a/.nf-core.yml +++ b/.nf-core.yml @@ -1,5 +1,7 @@ lint: files_exist: + - CITATIONS.md + - docs/README.md - CODE_OF_CONDUCT.md - .github/ISSUE_TEMPLATE/config.yml - .github/workflows/awstest.yml @@ -12,6 +14,7 @@ lint: - manifest.homePage files_unchanged: - LICENSE + - .github/PULL_REQUEST_TEMPLATE.md - .github/CONTRIBUTING.md - .github/ISSUE_TEMPLATE/bug_report.yml - .github/workflows/linting.yml @@ -22,6 +25,6 @@ lint: repository_type: pipeline template: author: nvnieuwk - description: A nextflow pipeline for calling structural variants + description: A bioinformatics best-practice analysis pipeline for calling structural variants (SVs), copy number variants (CNVs) and repeat region expansions (RREs) from short DNA reads. name: structural prefix: nf-cmgg diff --git a/CITATIONS.md b/CITATIONS.md deleted file mode 100644 index 8da89510..00000000 --- a/CITATIONS.md +++ /dev/null @@ -1,41 +0,0 @@ -# nf-cmgg/structural: Citations - -## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/) - -> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031. - -## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/) - -> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. - -## Pipeline tools - -- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) - - > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. - -- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) - - > Ewels P, Magnusson M, Lundin S, Käller M. 
MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. - -## Software packaging/containerisation tools - -- [Anaconda](https://anaconda.com) - - > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. - -- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/) - - > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506. - -- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/) - - > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671. - -- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241) - - > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241. - -- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/) - - > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. 
diff --git a/README.md b/README.md index 48bd78c9..121ca4ed 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@ [![GitHub Actions CI Status](https://github.com/nf-cmgg/structural/actions/workflows/ci.yml/badge.svg)](https://github.com/nf-cmgg/structural/actions/workflows/ci.yml) -[![GitHub Actions Linting Status](https://github.com/nf-cmgg/structural/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-cmgg/structural/actions/workflows/linting.yml)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX) +[![GitHub Actions Linting Status](https://github.com/nf-cmgg/structural/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-cmgg/structural/actions/workflows/linting.yml) [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com) [![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.10.0-23aa62.svg)](https://www.nextflow.io/) @@ -10,62 +10,6 @@ ## Introduction -**nf-cmgg/structural** is a bioinformatics best-practice analysis pipeline for calling structural variants from short reads. +**nf-cmgg/structural** is a bioinformatics best-practice analysis pipeline for calling structural variants (SVs), copy number variants (CNVs) and repeat region expansions (RREs) from short DNA reads. The pipeline handles the calling of the variants and postprocessing (filtering, annotating...) -The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. 
Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community! - -![metro map](docs/images/metro_map.png) - -## Usage - -> **Note** -> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how -> to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) -> with `-profile test` before running the workflow on actual data. - -Now, you can run the pipeline using: - -```bash -nextflow run nf-cmgg/structural \ - -profile \ - --input samplesheet.csv \ - --outdir -``` - -> **Warning:** -> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those -> provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; -> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files). - -## Documentation - -The CenterForMedicalGenetics/structural pipeline comes with documentation about the pipeline [usage](https://github.com/nf-cmgg/structural/tree/master/docs/usage.md) and [output](https://github.com/nf-cmgg/structural/tree/master/docs/output.md). - -> [!WARNING] -> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; -> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files). - -## Credits - -nf-cmgg/structural was originally written by Nicolas Vannieuwkerke and Mattias Van Heetvelde. - -## Contributions and Support - -If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md). 
- -## Citations - - - - - - -An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file. - -You can cite the `nf-core` publication as follows: - -> **The nf-core framework for community-curated bioinformatics pipelines.** -> -> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. -> -> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). +Please have a look at the [documentation](https://nf-cmgg.github.io/structural/latest/) on how to run the pipeline diff --git a/assets/schema_input.json b/assets/schema_input.json index 93475717..358cd2d1 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -9,11 +9,15 @@ "properties": { "sample": { "type": "string", - "meta": ["id", "sample"] + "meta": ["id", "sample"], + "pattern": "^\\S+$", + "errorMessage": "The sample name must be a string and cannot contain spaces." }, "family": { "type": "string", - "meta": ["family"] + "meta": ["family"], + "pattern": "^\\S+$", + "errorMessage": "The family name must be a string and cannot contain spaces." 
}, "cram": { "type": "string", diff --git a/conf/test.config b/conf/test.config index face16c4..715a83b6 100644 --- a/conf/test.config +++ b/conf/test.config @@ -32,13 +32,11 @@ params { qdnaseq_male = params.test_data["homo_sapiens"]["genome"]["genome_qdnaseq"] qdnaseq_female = params.test_data["homo_sapiens"]["genome"]["genome_qdnaseq"] igenomes_ignore = true - genomes_ignore = false + genomes_ignore = true genome = 'GRCh38' - genomes_base = "s3://reference-data/genomes" vep_cache = null annotsv_annotations = null - annotate = true concat_output = true // Pipeline parameters diff --git a/docs/CITATIONS.md b/docs/CITATIONS.md new file mode 100644 index 00000000..b38d5279 --- /dev/null +++ b/docs/CITATIONS.md @@ -0,0 +1,99 @@ +# nf-cmgg/structural: Citations + +## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/) + +> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031. + +## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/) + +> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. + +## Pipeline tools + +- [AnnotSV](https://pubmed.ncbi.nlm.nih.gov/29669011/) + + > Geoffroy V, Herenger Y, Kress A, Stoetzel C, Piton A, Dollfus H, Muller J. AnnotSV: an integrated tool for structural variations annotation. Bioinformatics. 2018 Oct 15;34(20):3572-3574. doi: 10.1093/bioinformatics/bty304. PMID: 29669011. + +- [BCFTools](https://pubmed.ncbi.nlm.nih.gov/21903627/) + + > Li H: A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011 Nov 1;27(21):2987-93. doi: 10.1093/bioinformatics/btr509. 
PubMed PMID: 21903627; PubMed Central PMCID: PMC3198575. + +- [bedgovcf](https://github.com/nvnieuwk/bedgovcf) + +- [DELLY](https://academic.oup.com/bioinformatics/article/28/18/i333/245403) + + > Tobias Rausch, Thomas Zichner, Andreas Schlattl, Adrian M. Stütz, Vladimir Benes, Jan O. Korbel, DELLY: structural variant discovery by integrated paired-end and split-read analysis, Bioinformatics, Volume 28, Issue 18, September 2012, Pages i333–i339, https://doi.org/10.1093/bioinformatics/bts378 + +- [EnsemblVEP](https://pubmed.ncbi.nlm.nih.gov/27268795/) + + > McLaren W, Gil L, Hunt SE, et al.: The Ensembl Variant Effect Predictor. Genome Biol. 2016 Jun 6;17(1):122. doi: 10.1186/s13059-016-0974-4. PubMed PMID: 27268795; PubMed Central PMCID: PMC4893825. + +- [ExpansionHunter](https://academic.oup.com/bioinformatics/article/35/22/4754/5499079) + + > Egor Dolzhenko, Viraj Deshpande, Felix Schlesinger, Peter Krusche, Roman Petrovski, Sai Chen, Dorothea Emig-Agius, Andrew Gross, Giuseppe Narzisi, Brett Bowman, Konrad Scheffler, Joke J F A van Vugt, Courtney French, Alba Sanchis-Juan, Kristina Ibáñez, Arianna Tucci, Bryan R Lajoie, Jan H Veldink, F Lucy Raymond, Ryan J Taft, David R Bentley, Michael A Eberle, ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions, Bioinformatics, Volume 35, Issue 22, November 2019, Pages 4754–4756, https://doi.org/10.1093/bioinformatics/btz431 + +- [Gawk](https://www.gnu.org/software/gawk/) + +- [GNU sed](http://www.gnu.org/software/sed/) + +- [GNU tar](https://www.gnu.org/software/tar/) + +- [Jasmine](https://pubmed.ncbi.nlm.nih.gov/36658279/) + + > Kirsche M, Prabhu G, Sherman R, Ni B, Battle A, Aganezov S, Schatz MC. Jasmine and Iris: population-scale structural variant comparison and analysis. Nat Methods. 2023 Mar;20(3):408-417. doi: 10.1038/s41592-022-01753-3. Epub 2023 Jan 19. PMID: 36658279; PMCID: PMC10006329. 
+ +- [Manta](https://pubmed.ncbi.nlm.nih.gov/26647377/) + + > Chen X, Schulz-Trieglaff O, Shaw R, et al.: Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2016 Apr 15;32(8):1220-2. doi: 10.1093/bioinformatics/btv710. PubMed PMID: 26647377. + +- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) + + > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. + +- [ngs-bits](https://github.com/imgag/ngs-bits) + +- [SAMtools](https://pubmed.ncbi.nlm.nih.gov/19505943/) + + > Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002. + +- [QDNAseq](https://pubmed.ncbi.nlm.nih.gov/25236618/) + + > Scheinin I, Sie D, Bengtsson H, van de Wiel MA, Olshen AB, van Thuijl HF, van Essen HF, Eijk PP, Rustenburg F, Meijer GA, Reijneveld JC, Wesseling P, Pinkel D, Albertson DG, Ylstra B. DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly. Genome Res. 2014 Dec;24(12):2022-32. doi: 10.1101/gr.175141.114. Epub 2014 Sep 18. PMID: 25236618; PMCID: PMC4248318. 
+ +- [smoove](https://github.com/brentp/smoove) + +- [svync](https://github.com/nvnieuwk/svync) + +- [Tabix](https://academic.oup.com/bioinformatics/article/27/5/718/262743) + + > Li H, Tabix: fast retrieval of sequence features from generic TAB-delimited files, Bioinformatics, Volume 27, Issue 5, 1 March 2011, Pages 718–719, doi: 10.1093/bioinformatics/btq671. PubMed PMID: 21208982. PubMed Central PMCID: PMC3042176. + +- [Vcfanno](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5) + + > Pedersen, B.S., Layer, R.M. & Quinlan, A.R. Vcfanno: fast, flexible annotation of genetic variants. Genome Biol 17, 118 (2016). https://doi.org/10.1186/s13059-016-0973-5 + +- [WisecondorX](https://academic.oup.com/nar/article/47/4/1605/5253050) + + > Lennart Raman, Annelies Dheedene, Matthias De Smet, Jo Van Dorpe, Björn Menten, WisecondorX: improved copy number detection for routine shallow whole-genome sequencing, Nucleic Acids Research, Volume 47, Issue 4, 28 February 2019, Pages 1605–1614, https://doi.org/10.1093/nar/gky1263 + +## Software packaging/containerisation tools + +- [Anaconda](https://anaconda.com) + + > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. + +- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/) + + > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506. + +- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/) + + > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 
2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671. + +- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241) + + > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241. + +- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/) + + > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. diff --git a/docs/README.md b/docs/README.md deleted file mode 100644 index bcd8d9d0..00000000 --- a/docs/README.md +++ /dev/null @@ -1,8 +0,0 @@ -# nf-cmgg/structural: Documentation - -The nf-cmgg/structural documentation is split into the following pages: - -- [Usage](usage.md) - - An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. -- [Output](output.md) - - An overview of the different results produced by the pipeline and how to interpret them. diff --git a/docs/index.md b/docs/index.md index cad95adc..4dddc4a1 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,10 +1,56 @@ --- title: nf-cmgg/structural -description: A bioinformatics best-practice analysis pipeline for calling structural variants from short reads. +description: A bioinformatics best-practice analysis pipeline for calling structural variants (SVs), copy number variants (CNVs) and repeat region expansions (RREs) from short DNA reads --- ---8<-- "README.md:10:16" +## Introduction + +**nf-cmgg/structural** is a bioinformatics best-practice analysis pipeline for calling structural variants (SVs), copy number variants (CNVs) and repeat region expansions (RREs) from short DNA reads. The pipeline handles the calling of the variants and postprocessing (filtering, annotating...) 
+ +The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all Nextflow pipelines! ![metro map](images/metro_map.png) ---8<-- "README.md:18:" +## Usage + +!!! note + + If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how + to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) + with `-profile test` before running the workflow on actual data. + +Now, you can run the pipeline using: + +```bash +nextflow run nf-cmgg/structural \ + -profile \ + --input samplesheet.csv \ + --outdir +``` + +!!! warning +Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those +provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; +see [docs](https://nf-co.re/usage/configuration#custom-configuration-files). + +## Documentation + +The nf-cmgg/structural pipeline comes with documentation about the pipeline [usage](usage.md) and [output](output.md). + +!!! warning +Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; +see [docs](https://nf-co.re/usage/configuration#custom-configuration-files). 
+ +## Credits + +nf-cmgg/structural was originally written by Nicolas Vannieuwkerke and Mattias Van Heetvelde. + +## Contributions and Support + +If you would like to contribute to this pipeline, please see the [contributing guidelines](https://github.com/nf-cmgg/structural/blob/dev/.github/CONTRIBUTING.md). + +## Citations + + + +An extensive list of references for the tools used by the pipeline can be found in the [`Citations`](CITATIONS.md) section. diff --git a/docs/output.md b/docs/output.md index b2a8f031..a60124b5 100644 --- a/docs/output.md +++ b/docs/output.md @@ -1,4 +1,4 @@ -# nf-cmgg/structural: Output +# Output ## Introduction diff --git a/docs/parameters.md b/docs/parameters.md index d430abf1..ba9b8576 100644 --- a/docs/parameters.md +++ b/docs/parameters.md @@ -1,4 +1,4 @@ -# nf-cmgg/structural pipeline parameters +# Parameters A nextflow pipeline for calling structural variants diff --git a/docs/usage.md b/docs/usage.md index c94bf35b..33979fac 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -1,54 +1,77 @@ -# nf-cmgg/structural: Usage +# Usage -> _Documentation of pipeline parameters is generated automatically from the pipeline schema and can no longer be found in markdown files._ +> _Documentation of pipeline parameters can be found in the [Parameters](parameters.md) section_ ## Introduction - +Try out the pipeline right now with the following command (make sure you have [Docker](https://docs.docker.com/get-docker/) and [Nextflow](https://www.nextflow.io/) installed): + +```bash +nextflow run nf-cmgg/structural -profile test,docker --outdir results +``` ## Samplesheet input -You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.
+You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It can be a [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) (comma-separated values), [TSV](https://en.wikipedia.org/wiki/Tab-separated_values) (tab-separated values), [JSON](https://www.json.org/json-en.html) (JavaScript Object Notation) or [YAML](https://en.wikipedia.org/wiki/YAML) file. ```bash --input '[path to samplesheet file]' ``` -### Multiple runs of the same sample +### Minimum required samplesheet -The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes: +The following samplesheets contain all columns required to run the pipeline for two samples. ```csv title="samplesheet.csv" -sample,fastq_1,fastq_2 -CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz -CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz -CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz +sample,cram +ID01234,/path/to/ID01234.cram +ID56789,/path/to/ID56789.cram ``` -### Full samplesheet - -The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 3 columns to match those defined in the table below. +```tsv title="samplesheet.tsv" +sample cram +ID01234 /path/to/ID01234.cram +ID56789 /path/to/ID56789.cram +``` -A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice.
+```yaml title="samplesheet.yaml" +- sample: ID01234 + cram: /path/to/ID01234.cram +- sample: ID56789 + cram: /path/to/ID56789.cram +``` -```csv title="samplesheet.csv" -sample,fastq_1,fastq_2 -CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz -CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz -CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz -TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz, -TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz, -TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz, -TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz, +```json title="samplesheet.json" +[ + { + "sample": "ID01234", + "cram": "/path/to/ID01234.cram" + }, + { + "sample": "ID56789", + "cram": "/path/to/ID56789.cram" + } +] ``` -| Column | Description | -| --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). | -| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | -| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | +### All samplesheet options + +Following table contains all possible values for the samplesheet. -An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline. +| Column | Description | Type | Required | +| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ | ------------------ | +| `sample` | The custom sample name. 
Cannot contain spaces and doesn't have to be unique. When the same sample has been given multiple times, the CRAM files will be merged. | string | :heavy_check_mark: | +| `family` | The family name of the current sample. All samples in the same family will be merged together at the end of the pipeline. Cannot contain spaces | string | :x: | +| `cram` | Path to the CRAM file to be used by the pipeline for the current sample. | string | :heavy_check_mark: | +| `crai` | Path to the CRAM index file | string | :x: | +| `small_variants` | A VCF containing the SNVs (single nucleotide variants) for the current sample to be used by AnnotSV | string | :x: | +| `sex` | The sex of the sample to be used by QDNAseq. Sex will be imputed when missing (Options: `male` or `female`) | string | :x: | + +See the following samplesheet for a working example (used by the `test` profile of the pipeline): + +```csv title="example_samplesheet.csv" +--8<-- "assets/samplesheet.csv" +``` ## Running the pipeline @@ -63,19 +86,26 @@ This will launch the pipeline with the `docker` configuration profile. See below Note that the pipeline will create the following files in your working directory: ```bash -work # Directory containing the nextflow working files - # Finished results in specified location (defined with --outdir) -.nextflow_log # Log file from Nextflow -# Other nextflow hidden files, eg. history of pipeline runs and old logs. +work #(1)! +results #(2)! +.nextflow_log #(3)! +... #(4)! ``` +1. Directory containing the Nextflow working files + +2. Finished results in specified location (defined with --outdir) + +3. Log file from Nextflow + +4. Other Nextflow hidden files, e.g. history of pipeline runs and old logs. + If you wish to repeatedly use the same parameters for multiple runs, rather than specifying each flag in the command, you can specify these in a params file. Pipeline settings can be provided in a `yaml` or `json` file via `-params-file `.
-:::warning +!!!warning Do not use `-c ` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for [tuning process resource specifications](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources), other infrastructural tweaks (such as output directories), or module arguments (args). -::: The above pipeline run specified with a params file in yaml format: @@ -92,8 +122,6 @@ genome: 'GRCh37' <...> ``` -You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch). - ### Updating the pipeline When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: @@ -112,15 +140,13 @@ This version number will be logged in reports when you run the pipeline, so that To further assist in reproducbility, you can use share and re-use [parameter files](#running-the-pipeline) to repeat pipeline runs with the same settings without having to write out a command with every single parameter. -:::tip +!!!tip If you wish to share such profile (such as upload as supplementary material for academic publications), make sure to NOT include cluster specific paths to files, nor institutional specific profiles. -::: ## Core Nextflow arguments -:::note +!!!note These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen). -::: ### `-profile` @@ -128,9 +154,8 @@ Use this parameter to choose a configuration profile. 
Profiles can give configur Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Apptainer, Conda) - see below. -:::info +!!!info We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported. -::: The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). @@ -140,22 +165,22 @@ They are loaded in sequence, so later profiles can overwrite earlier profiles. If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended, since it can lead to different results on different machines dependent on the computer enviroment. 
 - `test`
-  - A profile with a complete configuration for automated testing
-  - Includes links to test data so needs no other parameters
+  > A profile with a complete configuration for automated testing
+  > Includes links to test data so needs no other parameters
 - `docker`
-  - A generic configuration profile to be used with [Docker](https://docker.com/)
+  > A generic configuration profile to be used with [Docker](https://docker.com/)
 - `singularity`
-  - A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/)
+  > A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/)
 - `podman`
-  - A generic configuration profile to be used with [Podman](https://podman.io/)
+  > A generic configuration profile to be used with [Podman](https://podman.io/)
 - `shifter`
-  - A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/)
+  > A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/)
 - `charliecloud`
-  - A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/)
+  > A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/)
 - `apptainer`
-  - A generic configuration profile to be used with [Apptainer](https://apptainer.org/)
+  > A generic configuration profile to be used with [Apptainer](https://apptainer.org/)
 - `conda`
-  - A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter, Charliecloud, or Apptainer.
+  > A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter, Charliecloud, or Apptainer.

 ### `-resume`

@@ -177,13 +202,13 @@ To change the resource requests, please see the [max resources](https://nf-co.re

 ### Custom Containers

-In some cases you may wish to change which container or conda environment a step of the pipeline uses for a particular tool. By default nf-core pipelines use containers and software from the [biocontainers](https://biocontainers.pro/) or [bioconda](https://bioconda.github.io/) projects. However in some cases the pipeline specified version maybe out of date.
+In some cases you may wish to change which container or conda environment a step of the pipeline uses for a particular tool. By default this pipeline uses containers and software from the [biocontainers](https://biocontainers.pro/) or [bioconda](https://bioconda.github.io/) projects. However in some cases the pipeline specified version maybe out of date.

 To use a different container from the default container or conda environment specified in a pipeline, please see the [updating tool versions](https://nf-co.re/docs/usage/configuration#updating-tool-versions) section of the nf-core website.

 ### Custom Tool Arguments

-A pipeline might not always support every possible argument or option of a particular tool used in pipeline. Fortunately, nf-core pipelines provide some freedom to users to insert additional parameters that the pipeline does not include by default.
+A pipeline might not always support every possible argument or option of a particular tool used in pipeline. Fortunately, this pipeline provides some freedom to users to insert additional parameters that the pipeline does not include by default.

 To learn how to provide additional arguments to a particular tool of the pipeline, please see the [customising tool arguments](https://nf-co.re/docs/usage/configuration#customising-tool-arguments) section of the nf-core website.

diff --git a/mkdocs.yml b/mkdocs.yml
index 7c0147d8..efb31cb1 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -8,6 +8,7 @@ nav:
   - Usage: usage.md
   - Output: output.md
   - Parameters: parameters.md
+  - Citations: CITATIONS.md

 theme:
   name: material
@@ -45,6 +46,12 @@ theme:
     - toc.follow

 markdown_extensions:
+  - attr_list
+  - md_in_html
+  - admonition
+  - pymdownx.emoji:
+      emoji_index: !!python/name:material.extensions.emoji.twemoji
+      emoji_generator: !!python/name:material.extensions.emoji.to_svg
   - pymdownx.highlight:
       anchor_linenums: true
       line_spans: __span
diff --git a/nextflow.config b/nextflow.config
index 3dbb3e7c..3fcb4721 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -278,7 +278,7 @@ manifest {
     name = 'nf-cmgg/structural'
     author = 'Nicolas Vannieuwkerke & Mattias Van Heetvelde'
     homePage = 'https://github.com/nf-cmgg/structural'
-    description = 'A nextflow pipeline for calling structural variants'
+    description = 'A bioinformatics best-practice analysis pipeline for calling structural variants (SVs), copy number variants (CNVs) and repeat region expansions (RREs) from short DNA reads'
     mainScript = 'main.nf'
     nextflowVersion = '!>=23.10.0'
     version = '0.1.0dev'