Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Documents

Full text
Final published version, 4.15 MB, PDF document

Julia Koehler Leman
Sergey Lyskov
Steven M. Lewis
Jared Adolf-Bryfogle
Rebecca F. Alford
Kyle Barlow
Ziv Ben-Aharon
Daniel Farrell
Jason Fell
William A. Hansen
Ameya Harmalkar
Jeliazko Jeliazkov
Georg Kuenze
Justyna D. Krys
Ajasja Ljubetič
Amanda L. Loshbaugh
Jack Maguire
Rocco Moretti
Vikram Khipple Mulligan
Morgan L. Nance
Phuong T. Nguyen
Shane Ó Conchúir
Shourya S. Roy Burman
Rituparna Samanta
Shannon T. Smith
Frank Teets
Andrew Watkins
Hope Woods
Brahm J. Yachnin
Christopher D. Bahl
Chris Bailey-Kellogg
David Baker
Rhiju Das
Frank DiMaio
Sagar D. Khare
Tanja Kortemme
Jason W. Labonte
Jens Meiler
William Schief
Ora Schueler-Furman
Justin B. Siegel
Vladimir Yarov-Yarovoy
Brian Kuhlman
Andrew Leaver-Fay
Dominik Gront
Jeffrey J. Gray
Richard Bonneau

Each year vast international resources are wasted on irreproducible research. The scientific community has been slow to adopt standard software engineering practices, despite the increases in high-dimensional data, complexities of workflows, and computational environments. Here we show how scientific software applications can be created in a reproducible manner when simple design goals for reproducibility are met. We describe the implementation of a test server framework and 40 scientific benchmarks, covering numerous applications in Rosetta bio-macromolecular modeling. High performance computing cluster integration allows these benchmarks to run continuously and automatically. Detailed protocol captures are useful for developers and users of Rosetta and other macromolecular modeling tools. The framework and design concepts presented here are valuable for developers and users of any type of scientific software and for the scientific community to create reproducible methods. Specific examples highlight the utility of this framework, and the comprehensive documentation illustrates the ease of adding new tests in a matter of hours.

Original language	English
Article number	6947
Journal	Nature Communications
Volume	12
Number of pages	15
ISSN	2041-1723
DOI	https://doi.org/10.1038/s41467-021-27222-7
Status	Published - 2021

Bibliographical note

Funding Information:
ARO MURI W911NF-16-1-0372 to Watkins; American Heart Association 18POST34080422 to Kuenze; BSF 2015207 to Schueler-Furman, Ben-Aharon; Cancer Research Institute Irvington Postdoctoral Fellowship (CRI 3442) to Roy Burman; Canadian Institutes of Health Research Postdoctoral Fellowship to Yachnin; Cyrus Biotechnology to Lewis; Simons Foundation to Bonneau, Koehler Leman, Mulligan; German Research Foundation KU 3510/1-1 to Kuenze; H2020 MSCA IF CC-LEGO 792305 to Ljubetic; HHMI to Baker; Hertz Foundation Fellowship to Alford; ISF 717/2017 to Schueler-Furman, Ben-Aharon; Lundbeck Foundation Fellowship R272-2017-4528 to Stein; Mistletoe Research Foundation Fellowship to Yachnin; NCN 2018/29/B/ST6/01989 to Gront, Krys; NIAID R01AI113867 to Schief, Adolf-Bryfogle; NIEHS P42ES004699 to Siegel; NIH 1R01GM123089 to Farrell, DiMaio; NIH 2R01GM098977 to Bailey-Kellogg; NIH F31-CA243353 to Smith; NIH F31-GM123616 to Jeliazkov; NIH GM067553 to Maguire; NIH NCI R21 CA219847 and NIH R01 GM121487 to Das, Watkins; NIH NHLBI 2R01HL128537 to Yarov-Yarovoy; NIH NIAID R21 AI156570 and NIH NIBIB R21 EB028342 to Bahl; NIH NIAID U01 AI150739, NIH NIDA R01 DA046138 to Meiler, Moretti; NIH NIGMS R01 GM080403 to Meiler, Moretti and Kuenze; NIH NIGMS R01 GM073151 to Kuhlman, Gray, Leaver-Fay, Lyskov, Moretti, Meiler; NIH NIGMS R01 GM121487 and NIH NIGMS R35 GM122579 to Das; NIH NIGMS 1R01GM132110 and NIH NINDS 1R01NS103954 to Yarov-Yarovoy; NIH NINDS UG3NS114956 to Nguyen, Yarov-Yarovoy; NIH F32 CA189246 to Labonte; NIH R01 GM 076324-11 to Siegel; NIH R01 GM129261 to Woods; NIH R01 GM078221 to Harmalkar, Roy Burman, Jeliazkov, Nance, Samanta, and Gray; NIH R01 GM127578 to Gray and Labonte; NIH R01 GM110089 to Loshbaugh, Kortemme, Barlow; NIH R35 GM131923 to Leaver-Fay, Teets, Kuhlman; NIH R01 GM132565 to Hansen, Khare; NSF 1507736 to Gray, Roy Burman; NSF 1627539 and NSF 1827246 to Siegel; NSF 1805510 to Siegel, Fell; NSF 2031785 to Bahl; NSF DBI‐1564692 to Loshbaugh, Kortemme, Barlow and O’Connor; NSF GRFP Fellowship to Alford; NSF CBET1923691 to Hansen, Khare; Novo Nordisk Foundation NNF18OC0033950 to Tiemann, Stein, Lindorff-Larsen; RosettaCommons Licensing Fund RC8010 to Bahl; RosettaCommons to Hansen, Moretti, Lyskov, Khare, Gray; NIH NRSA T32AI007244 and NIH U19AI117905 to Schief, Adolf-Bryfogle. The authors further thank Matt Mulqueen for expert administration of the multiple benchmark testing servers and cluster, RosettaCommons for hardware and staff support after the NIH ended their software infrastructure program, and companies that license Rosetta, providing support for critical software sustainability practices.

Publisher Copyright:
© 2021, The Author(s).

ID: 288711473