Talk:Nextflow

From Wikipedia, the free encyclopedia

Regarding "Undisclosed payments" allegation

Thanks for being observant enough to notice that I have made a major overhaul of this article draft. I see why this edit might raise concerns regarding undisclosed payments, but rest assured that I have not received any payments or other benefits in exchange for this edit. I therefore deny those accusations.

Please allow me to elaborate on my background and my motivation for working on this article: While I do not have a user page on the English Wikipedia, I do have one on the German Wikipedia as well as on Wikimedia Commons. I have been contributing to both since 2007, when I was an undergraduate student. I have mostly stopped contributing to Wikipedia by now due to time constraints, but of course continue to cherish the project.

When I learned that an academic collaborator of mine wanted to add an article on Nextflow to Wikipedia, I was thrilled. It had never occurred to me, but I was immediately convinced that this was a good idea, because Nextflow is a very important tool for our research work and students might want to look it up when it is mentioned, e.g., in the methods section of scientific publications. However, I had to agree with Onel5969 that the article in its previous form was too promotional and partly incomprehensible. So I sacrificed a free afternoon (and unintentionally also the evening) to fix what were, in my opinion, the most blatant issues. Admittedly, it was far more work than I initially wanted to put into the edit, but I couldn't stop halfway through, since I eventually ended up changing the entire structure.

I have tried to incorporate the criticisms expressed to the best of my ability, but agree that there should be an external review to ensure that I haven't been too sympathetic to the subject. However, the suspicion that

Conflict of interest statement: I am employed by the Swedish National Genomics Infrastructure, which co-founded the nf-core community. Phil Ewels is a former colleague of mine, and I have met other core Nextflow/nf-core contributors at scientific conferences. I work with Nextflow daily, and it is fundamental to my work. However, I have not received any payments or other benefits for editing the article, and it was my own decision to devote my spare time to the edit. I have no investments in or affiliations with Seqera, the spin-off company that maintains Nextflow. My real name is Matthias Zepper, which you can use to verify this information.

--Curnen (talk) 17:23, 6 December 2022 (UTC)

Additional references

Since the lack of references substantiating the importance of Nextflow as a notable scientific workflow system has been criticized twice, I have spent a whole Saturday gathering the scientific publications of numerous pipelines written in Nextflow. I appreciate that this is too much clutter for the main article, but before anyone criticizes for a third time that sources don't exist, I'd like to keep them here for future reference.

The claim that "Over the last five years, numerous pipelines for many different applications and analyses in the field of genomics have been published" is backed by these references:

[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] Curnen (talk) 10:35, 2 February 2023 (UTC)

References

  1. ^ Song, Zeyuan; Gurinovich, Anastasia; Federico, Anthony; Monti, Stefano; Sebastiani, Paola (2021). "Nf-gwas-pipeline: A Nextflow Genome-Wide Association Study Pipeline". Journal of Open Source Software. 6 (59): 2957. Bibcode:2021JOSS....6.2957S. doi:10.21105/joss.02957. PMC 9137404. PMID 35647481.
  2. ^ Twesigomwe, David; Drögemöller, Britt I.; Wright, Galen E.B.; Siddiqui, Azra; Rocha, Jorge; Lombard, Zané; Hazelhurst, Scott (2021). "StellarPGx: A Nextflow Pipeline for Calling Star Alleles in Cytochrome P450 Genes". Clinical Pharmacology & Therapeutics. 110 (3): 741–749. doi:10.1002/cpt.2173. PMID 33492672. S2CID 231704161.
  3. ^ Hölzer, Martin; Marz, Manja (2021). "PoSeiDon: A Nextflow pipeline for the detection of evolutionary recombination events and positive selection". Bioinformatics. 37 (7): 1018–1020. doi:10.1093/bioinformatics/btaa695. PMID 32735310.
  4. ^ Hu, Kai; Liu, Haibo; Lawson, Nathan D.; Zhu, Lihua Julie (2022). "ScATACpipe: A nextflow pipeline for comprehensive and reproducible analyses of single cell ATAC-seq data". Frontiers in Cell and Developmental Biology. 10: 981859. doi:10.3389/fcell.2022.981859. PMC 9551270. PMID 36238687.
  5. ^ Mpangase, Phelelani; Frost, Jacqueline; Tikly, Mohammed; Ramsay, Michèle; Hazelhurst, Scott (2021). "nf-rnaSeqCount: A Nextflow pipeline for obtaining raw read counts from RNA-seq data". South African Computer Journal. 33 (2). doi:10.18489/sacj.v33i2.830. PMC 9097006. PMID 35574063.
  6. ^ Bao, Xiaoqiong; Zhu, Kaiyu; Liu, Xuefei; Chen, Zhihang; Luo, Ziwei; Zhao, Qi; Ren, Jian; Zuo, Zhixiang (2022). "MeRIPseqPipe: An integrated analysis pipeline for MeRIP-seq data based on Nextflow". Bioinformatics. 38 (7): 2054–2056. doi:10.1093/bioinformatics/btac025. PMID 35022687.
  7. ^ Van De Sande, Bram; Flerin, Christopher; Davie, Kristofer; De Waegeneer, Maxime; Hulselmans, Gert; Aibar, Sara; Seurinck, Ruth; Saelens, Wouter; Cannoodt, Robrecht; Rouchon, Quentin; Verbeiren, Toni; De Maeyer, Dries; Reumers, Joke; Saeys, Yvan; Aerts, Stein (2020). "A scalable SCENIC workflow for single-cell gene regulatory network analysis". Nature Protocols. 15 (7): 2247–2276. doi:10.1038/s41596-020-0336-2. PMID 32561888. S2CID 219935802.
  8. ^ Liu, Xiaochuan; Bienkowska, Jadwiga R.; Zhong, Wenyan (2020). "GeneTEFlow: A Nextflow-based pipeline for analysing gene and transposable elements expression from RNA-Seq data". PLOS ONE. 15 (8): e0232994. Bibcode:2020PLoSO..1532994L. doi:10.1371/journal.pone.0232994. PMC 7458328. PMID 32866155.
  9. ^ Lataretu, Marie; Hölzer, Martin (2020). "RNAflow: An Effective and Simple RNA-Seq Differential Gene Expression Pipeline Using Nextflow". Genes. 11 (12): 1487. doi:10.3390/genes11121487. PMC 7763471. PMID 33322033.
  10. ^ Zhao, Qi; Sun, Yu; Wang, Dawei; Zhang, Hongwan; Yu, Kai; Zheng, Jian; Zuo, Zhixiang (2018). "LncPipe: A Nextflow-based pipeline for identification and analysis of long non-coding RNAs from RNA-Seq data". Journal of Genetics and Genomics. 45 (7): 399–401. doi:10.1016/j.jgg.2018.06.005. PMID 30055874. S2CID 51865348.
  11. ^ Gordon, M. Grace; Inoue, Fumitaka; Martin, Beth; Schubach, Max; Agarwal, Vikram; Whalen, Sean; Feng, Shiyun; Zhao, Jingjing; Ashuach, Tal; Ziffra, Ryan; Kreimer, Anat; Georgakopoulos-Soares, Ilias; Yosef, Nir; Ye, Chun Jimmie; Pollard, Katherine S.; Shendure, Jay; Kircher, Martin; Ahituv, Nadav (2020). "LentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements". Nature Protocols. 15 (8): 2387–2412. doi:10.1038/s41596-020-0333-5. PMC 7550205. PMID 32641802.
  12. ^ Mousavi‐Derazmahalleh, Mahsa; Stott, Audrey; Lines, Rose; Peverley, Georgia; Nester, Georgia; Simpson, Tiffany; Zawierta, Michal; de la Pierre, Marco; Bunce, Michael; Christophersen, Claus T. (2021). "EDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA sequences exploiting Nextflow and Singularity" (PDF). Molecular Ecology Resources. 21 (5): 1697–1704. doi:10.1111/1755-0998.13356. PMID 33580619. S2CID 231910408.
  13. ^ Armstrong, Ellie E.; Campana, Michael G. (2023). "RatesTools: A Nextflow pipeline for detecting de novo germline mutations in pedigree sequence data". Bioinformatics. 39 (1). doi:10.1093/bioinformatics/btac784. PMID 36469327.
  14. ^ Steinig, Eike; Duchêne, Sebastián; Aglua, Izzard; Greenhill, Andrew; Ford, Rebecca; Yoannes, Mition; Jaworski, Jan; Drekore, Jimmy; Urakoko, Bohu; Poka, Harry; Wurr, Clive; Ebos, Eri; Nangen, David; Manning, Laurens; Laman, Moses; Firth, Cadhla; Smith, Simon; Pomat, William; Tong, Steven Y C.; Coin, Lachlan; McBryde, Emma; Horwood, Paul (2022). "Phylodynamic Inference of Bacterial Outbreak Parameters Using Nanopore Sequencing". Molecular Biology and Evolution. 39 (3). doi:10.1093/molbev/msac040. PMC 8963328. PMID 35171290.
  15. ^ Rodríguez-Pérez, Héctor; Ciuffreda, Laura; Flores, Carlos (2022). "NanoRTax, a real-time pipeline for taxonomic and diversity analysis of nanopore 16S rRNA amplicon sequencing data". Computational and Structural Biotechnology Journal. 20: 5350–5354. doi:10.1016/j.csbj.2022.09.024. PMC 9522874. PMID 36212537.
  16. ^ Brandenburg, Jean-Tristan; Clark, Lindsay; Botha, Gerrit; Panji, Sumir; Baichoo, Shakuntala; Fields, Christopher; Hazelhurst, Scott (2022). "H3AGWAS: A portable workflow for genome wide association studies". BMC Bioinformatics. 23 (1): 498. doi:10.1186/s12859-022-05034-w. PMC 9675212. PMID 36402955.
  17. ^ Bremges, Andreas; Fritz, Adrian; McHardy, Alice C. (2020). "CAMITAX: Taxon labels for microbial genomes". GigaScience. 9 (1). doi:10.1093/gigascience/giz154. PMC 6946028. PMID 31909794.
  18. ^ Talenti, Andrea; Prendergast, James (2021). "Nf-LO: A Scalable, Containerized Workflow for Genome-to-Genome Lift over". Genome Biology and Evolution. 13 (9). doi:10.1093/gbe/evab183. PMC 8412297. PMID 34383887.
  19. ^ Cornet, Luc; Ahn, Anne-Catherine; Wilmotte, Annick; Baurain, Denis (2021). "ORPER: A Workflow for Constrained SSU rRNA Phylogenies". Genes. 12 (11): 1741. doi:10.3390/genes12111741. PMC 8623055. PMID 34828348.
  20. ^ Marquet, Mike; Hölzer, Martin; Pletz, Mathias W.; Viehweger, Adrian; Makarewicz, Oliwia; Ehricht, Ralf; Brandt, Christian (2022). "What the Phage: A scalable workflow for the identification and analysis of phage sequences". GigaScience. 11. doi:10.1093/gigascience/giac110. PMC 9673492. PMID 36399058.
  21. ^ Schmal, Matthias; Girod, Crystal; Yaver, Debbie; Mach, Robert L; Mach-Aigner, Astrid R (2022). "A bioinformatic-assisted workflow for genome-wide identification of ncRNAs". NAR Genomics and Bioinformatics. 4 (3): lqac059. doi:10.1093/nargab/lqac059. PMC 9376865. PMID 35979446.
  22. ^ Albanese, Davide; Donati, Claudio (2021). "Large-scale quality assessment of prokaryotic genomes with metashot/Prok-quality". F1000Research. 10: 822. doi:10.12688/f1000research.54418.1. PMC 8804904. PMID 35136576.
  23. ^ Carpanzano, Simone; Santorsola, Mariangela; Lescai, Francesco (2022). "Hgtseq: A Standard Pipeline to Study Horizontal Gene Transfer". International Journal of Molecular Sciences. 23 (23): 14512. doi:10.3390/ijms232314512. PMC 9738810. PMID 36498841.
  24. ^ Hadish, John A.; Biggs, Tyler D.; Shealy, Benjamin T.; Bender, M. Reed; McKnight, Coleman B.; Wytko, Connor; Smith, Melissa C.; Feltus, F. Alex; Honaas, Loren; Ficklin, Stephen P. (2022). "GEMmaker: Process massive RNA-seq datasets on heterogeneous computational infrastructure". BMC Bioinformatics. 23 (1): 156. doi:10.1186/s12859-022-04629-7. PMC 9063052. PMID 35501696.
  25. ^ Cope, Alexander L.; Anderson, Felicity; Favate, John; Jackson, Michael; Mok, Amanda; Kurowska, Anna; Liu, Junchen; MacKenzie, Emma; Shivakumar, Vikram; Tilton, Peter; Winterbourne, Sophie M.; Xue, Siyin; Kavoussanakis, Kostas; Lareau, Liana F.; Shah, Premal; Wallace, Edward W J. (2022). "Riboviz 2: A flexible and robust ribosome profiling data analysis and visualization workflow". Bioinformatics. 38 (8): 2358–2360. doi:10.1093/bioinformatics/btac093. PMC 9004635. PMID 35157051.
  26. ^ Rivera‐Vicéns, Ramón E.; Garcia‐Escudero, Catalina A.; Conci, Nicola; Eitel, Michael; Wörheide, Gert (2022). "TransPi—a comprehensive TRanscriptome ANalysiS PIpeline for de novo transcriptome assembly". Molecular Ecology Resources. 22 (5): 2070–2086. doi:10.1111/1755-0998.13593. PMID 35119207. S2CID 231980763.
  27. ^ Márquez, Yamile; Mantica, Federica; Cozzuto, Luca; Burguera, Demian; Hermoso-Pulido, Antonio; Ponomarenko, Julia; Roy, Scott W.; Irimia, Manuel (2021). "ExOrthist: A tool to infer exon orthologies at any evolutionary distance". Genome Biology. 22 (1): 239. doi:10.1186/s13059-021-02441-9. PMC 8379844. PMID 34416914.
  28. ^ Sensalari, Cecilia; Maere, Steven; Lohaus, Rolf (2022). "Ksrates: Positioning whole-genome duplications relative to speciation events in KS distributions". Bioinformatics. 38 (2): 530–532. doi:10.1093/bioinformatics/btab602. PMID 34406368.
  29. ^ Vromman, Marieke; Anckaert, Jasper; Vandesompele, Jo; Volders, Pieter-Jan (2022). "CIRCprimerXL: Convenient and High-Throughput PCR Primer Design for Circular RNA Quantification". Frontiers in Bioinformatics. 2: 834655. doi:10.3389/fbinf.2022.834655. PMC 9580850. PMID 36304334.
  30. ^ Brandt, Christian; Krautwurst, Sebastian; Spott, Riccardo; Lohde, Mara; Jundzill, Mateusz; Marquet, Mike; Hölzer, Martin (2022). "Corrigendum: poreCov - an Easy to Use, Fast, and Robust Workflow for SARS-CoV-2 Genome Reconstruction via Nanopore Sequencing". Frontiers in Genetics. 13: 875644. doi:10.3389/fgene.2022.875644. PMC 8964395. PMID 35368706.
  31. ^ Van Damme, Renaud; Hölzer, Martin; Viehweger, Adrian; Müller, Bettina; Bongcam-Rudloff, Erik; Brandt, Christian (2021). "Metagenomics workflow for hybrid assembly, differential coverage binning, metatranscriptomics and pathway analysis (MUFFIN)". PLOS Computational Biology. 17 (2): e1008716. Bibcode:2021PLSCB..17E8716V. doi:10.1371/journal.pcbi.1008716. PMC 7899367. PMID 33561126.
  32. ^ Titmuss, Emma; Corbett, Richard D.; Davidson, Scott; Abbasi, Sanna; Williamson, Laura M.; Pleasance, Erin D.; Shlien, Adam; Renouf, Daniel J.; Jones, Steven J. M.; Laskin, Janessa; Marra, Marco A. (2022). "TMBur: A distributable tumor mutation burden approach for whole genome sequencing". BMC Medical Genomics. 15 (1): 190. doi:10.1186/s12920-022-01348-z. PMC 9450342. PMID 36071521.
  33. ^ Vasilopoulou, Christina; Wingfield, Benjamin; Morris, Andrew P.; Duddy, William (2021). "SNPQT: Flexible, reproducible, and comprehensive quality control and imputation of genomic data". F1000Research. 10: 567. doi:10.12688/f1000research.53821.2. PMC 8637247. PMID 34900230.
  34. ^ Senkin, Sergey (2021). "MSA: Reproducible mutational signature attribution with confidence based on simulations". BMC Bioinformatics. 22 (1): 540. doi:10.1186/s12859-021-04450-8. PMC 8567580. PMID 34736398.
  35. ^ Shafranskaya, Daria; Kale, Varsha; Finn, Rob; Lapidus, Alla L.; Korobeynikov, Anton; Prjibelski, Andrey D. (2022). "MetaGT: A pipeline for de novo assembly of metatranscriptomes with the aid of metagenomic data". Frontiers in Microbiology. 13: 981458. doi:10.3389/fmicb.2022.981458. PMC 9651917. PMID 36386613.
  36. ^ Bryzghalov, Oleksii; Makałowska, Izabela; Szcześniak, Michał Wojciech (2021). "lncEvo: Automated identification and conservation study of long noncoding RNAs". BMC Bioinformatics. 22 (1): 59. doi:10.1186/s12859-021-03991-2. PMC 7871587. PMID 33563213.
  37. ^ Farkas, Carlos; Recabal, Antonia; Mella, Andy; Candia-Herrera, Daniel; Olivero, Maryori González; Haigh, Jody Jonathan; Tarifeño-Saldivia, Estefanía; Caprile, Teresa (2022). "Annotate_my_genomes: An easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing". GigaScience. 11. doi:10.1093/gigascience/giac099. PMC 9724561. PMID 36472574.
  38. ^ Kässens, Jan Christian; Wienbrandt, Lars; Ellinghaus, David (2021). "BIGwas: Single-command quality control and association testing for multi-cohort and biobank-scale GWAS/PheWAS data". GigaScience. 10 (6). doi:10.1093/gigascience/giab047. PMC 8239664. PMID 34184051.
  39. ^ Donovan, Paul D.; McHale, Natalie M.; Venø, Morten T.; Prehn, Jochen H M. (2021). "TsRNAsearch: A pipeline for the identification of tRNA and ncRNA fragments from small RNA-sequencing data". Bioinformatics. 37 (23): 4424–4430. doi:10.1093/bioinformatics/btab515. PMID 34255836.
  40. ^ Lexa, Matej; Cechova, Monika; Nguyen, Son Hoang; Jedlicka, Pavel; Tokan, Viktor; Kubat, Zdenek; Hobza, Roman; Kejnovsky, Eduard (2022). "HiC-TE: A computational pipeline for Hi-C data analysis to study the role of repeat family interactions in the genome 3D organization". Bioinformatics. 38 (16): 4030–4032. doi:10.1093/bioinformatics/btac442. PMID 35781332.
  41. ^ Murigneux, Valentine; Roberts, Leah W.; Forde, Brian M.; Phan, Minh-Duy; Nhu, Nguyen Thi Khanh; Irwin, Adam D.; Harris, Patrick N. A.; Paterson, David L.; Schembri, Mark A.; Whiley, David M.; Beatson, Scott A. (2021). "MicroPIPE: Validating an end-to-end workflow for high-quality complete bacterial genome construction". BMC Genomics. 22 (1): 474. doi:10.1186/s12864-021-07767-z. PMC 8235852. PMID 34172000.
  42. ^ Grassi, Luigi; Harris, Claire; Zhu, Jie; Hardman, Colin; Hatton, Diane (2021). "DetectIS: A pipeline to rapidly detect exogenous DNA integration sites using DNA or RNA paired-end sequencing data". Bioinformatics. 37 (22): 4230–4232. doi:10.1093/bioinformatics/btab366. PMC 9502153. PMID 33978747.
  43. ^ Crook, Derrick; Volk, Denis; Yang-Turner, Fan; Xu, Yifei (2020). "NanoSPC: A scalable, portable, cloud compatible viral nanopore metagenomic data processing pipeline". Nucleic Acids Research. 48 (W1): W366–W371. doi:10.1093/nar/gkaa413. PMC 7319573. PMID 32442274.
  44. ^ Miller, Brecca R.; Morse, Alison M.; Borgert, Jacqueline E.; Liu, Zihao; Sinclair, Kelsey; Gamble, Gavin; Zou, Fei; Newman, Jeremy R B.; León-Novelo, Luis G.; Marroni, Fabio; McIntyre, Lauren M. (2021). "Testcrosses are an efficient strategy for identifying cis-regulatory variation: Bayesian analysis of allele-specific expression (BayesASE)". G3 Genes|Genomes|Genetics. 11 (5). doi:10.1093/g3journal/jkab096. PMC 8104932. PMID 33772539.
  45. ^ Guerra-Assunção, José Afonso; Conde, Lucia; Moghul, Ismail; Webster, Amy P.; Ecker, Simone; Chervova, Olga; Chatzipantsiou, Christina; Prieto, Pablo P.; Beck, Stephan; Herrero, Javier (2020). "GenomeChronicler: The Personal Genome Project UK Genomic Report Generator Pipeline". Frontiers in Genetics. 11: 518644. doi:10.3389/fgene.2020.518644. PMC 7541957. PMID 33193602.
  46. ^ Espinosa-Carrasco, Jose; Erb, Ionas; Hermoso Pulido, Toni; Ponomarenko, Julia; Dierssen, Mara; Notredame, Cedric (2018). "Pergola: Boosting Visualization and Analysis of Longitudinal Data by Unlocking Genomic Analysis Tools". iScience. 9: 244–257. Bibcode:2018iSci....9..244E. doi:10.1016/j.isci.2018.10.023. PMC 6231116. PMID 30419504.
  47. ^ Schön, Max E.; Eme, Laura; Ettema, Thijs J G. (2019). "PhyloMagnet: Fast and accurate screening of short-read meta-omics data using gene-centric phylogenetics". Bioinformatics. 36 (6): 1718–1724. doi:10.1093/bioinformatics/btz799. PMC 7703773. PMID 31647547.
  48. ^ Cozzuto, Luca; Liu, Huanle; Pryszcz, Leszek P.; Pulido, Toni Hermoso; Delgado-Tejedor, Anna; Ponomarenko, Julia; Novoa, Eva Maria (2020). "MasterOfPores: A Workflow for the Analysis of Oxford Nanopore Direct RNA Sequencing Datasets". Frontiers in Genetics. 11: 211. doi:10.3389/fgene.2020.00211. PMC 7089958. PMID 32256520.
  49. ^ Roe, David; Kuang, Rui (2020). "Accurate and Efficient KIR Gene and Haplotype Inference from Genome Sequencing Reads with Novel K-mer Signatures". Frontiers in Immunology. 11: 583013. doi:10.3389/fimmu.2020.583013. PMC 7727328. PMID 33324401.