A proteomics sample metadata representation for multiomics integration and big data analysis

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Documents

Publisher's published version, 2.76 MB, PDF document

Chengxin Dai
Anja Füllgrabe
Julianus Pfeuffer
Elizaveta M. Solovyeva
Jingwen Deng
Pablo Moreno
Selvakumar Kamatchinathan
Deepti Jaiswal Kundu
Nancy George
Silvie Fexova
Björn Grüning
Melanie Christine Föll
Johannes Griss
Marc Vaudel
Enrique Audain
Michael Turewicz
Martin Eisenacher
Julian Uszkoreit
Tim Van Den Bossche
Veit Schwämmle
Stefan Schulze
David Bouyssié
Savita Jayaram
Vinay Kumar Duggineni
Patroklos Samaras
Mathias Wilhelm
Meena Choi
Mingxun Wang
Oliver Kohlbacher
Alvis Brazma
Irene Papatheodorou
Nuno Bandeira
Eric W. Deutsch
Juan Antonio Vizcaíno
Mingze Bai
Timo Sachsenberg
Lev I. Levitsky
Yasset Perez-Riverol

The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.

Original language	English
Article number	5854
Journal	Nature Communications
Volume	12
Issue number	1
Number of pages	8
ISSN	2041-1723
DOI	https://doi.org/10.1038/s41467-021-26111-3
Status	Published - 2021

Bibliographical note

Funding Information:
YPR, SK, DJK, and JAV would like to acknowledge funding from the Wellcome Trust grant number 208391/Z/17/Z and EMBL core funding. MLP is supported financially by the Novo Nordisk Foundation (Grant agreement NNF14CC0001). MT and TS are supported by de.NBI, a project of the German Federal Ministry of Education and Research (BMBF) [grant number FKZ 031 A 534 A and FKZ 031 A 535 A]. ME and JU are members of the Center for Protein Diagnostics (PRODI), a grant from the Ministry of Innovation, Science, and Research of North-Rhine Westphalia, Germany. TVDB is supported by the Research Foundation—Flanders (SB grant 1S90918N). EWD acknowledges NIGMS grants R01GM087221, R24GM127667, and NSF grants 1933311 and 1922871. SS was supported by the NSF grant 1817518. CD and MB are supported by the National Key Research and Development Program of China (2017YFC0908404, 2017YFC0908405) and the Natural Science Foundation of Chongqing, China (cstc2018jcyjAX0225). Funds for the overall project were also made available by an ELIXIR Implementation Study. LIL and EMS are supported by the Russian Basic Science Foundation (grant #18-29-13015).

Publisher Copyright:
© 2021, The Author(s).

ID: 283756960