Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning. / Webel, Henry; Niu, Lili; Nielsen, Annelaura Bach; Locard-Paulet, Marie; Mann, Matthias; Jensen, Lars Juhl; Rasmussen, Simon.

In: Nature Communications, Vol. 15, 5405, 2024.

Harvard

Webel, H, Niu, L, Nielsen, AB, Locard-Paulet, M, Mann, M, Jensen, LJ & Rasmussen, S 2024, 'Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning', Nature Communications, vol. 15, 5405. https://doi.org/10.1038/s41467-024-48711-5

APA

Webel, H., Niu, L., Nielsen, A. B., Locard-Paulet, M., Mann, M., Jensen, L. J., & Rasmussen, S. (2024). Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning. Nature Communications, 15, Article 5405. https://doi.org/10.1038/s41467-024-48711-5

Vancouver

Webel H, Niu L, Nielsen AB, Locard-Paulet M, Mann M, Jensen LJ et al. Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning. Nature Communications. 2024;15:5405. https://doi.org/10.1038/s41467-024-48711-5

Author

Webel, Henry ; Niu, Lili ; Nielsen, Annelaura Bach ; Locard-Paulet, Marie ; Mann, Matthias ; Jensen, Lars Juhl ; Rasmussen, Simon. / Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning. In: Nature Communications. 2024 ; Vol. 15.

Bibtex

@article{8258110c7b8045098e1a65064a675b92,
title = "Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning",
abstract = "Imputation techniques provide means to replace missing measurements with a value and are used in almost all downstream analysis of mass spectrometry (MS) based proteomics data using label-free quantification (LFQ). Here we demonstrate how collaborative filtering, denoising autoencoders, and variational autoencoders can impute missing values in the context of LFQ at different levels. We applied our method, proteomics imputation modeling mass spectrometry (PIMMS), to an alcohol-related liver disease (ALD) cohort with blood plasma proteomics data available for 358 individuals. Removing 20 percent of the intensities we were able to recover 15 out of 17 significant abundant protein groups using PIMMS-VAE imputations. When analyzing the full dataset we identified 30 additional proteins (+13.2%) that were significantly differentially abundant across disease stages compared to no imputation and found that some of these were predictive of ALD progression in machine learning models. We, therefore, suggest the use of deep learning approaches for imputing missing values in MS-based proteomics on larger datasets and provide workflows for these.",
keywords = "Proteomics/methods, Deep Learning, Humans, Mass Spectrometry/methods, Supervised Machine Learning, Male",
author = "Henry Webel and Lili Niu and Nielsen, {Annelaura Bach} and Marie Locard-Paulet and Matthias Mann and Jensen, {Lars Juhl} and Simon Rasmussen",
note = "{\textcopyright} 2024. The Author(s).",
year = "2024",
doi = "10.1038/s41467-024-48711-5",
language = "English",
volume = "15",
pages = "5405",
journal = "Nature Communications",
issn = "2041-1723",
publisher = "Nature Publishing Group",

}
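The abstract describes validating imputation by removing 20 percent of the intensities and checking how well they are recovered. The minimal sketch below illustrates that kind of hold-out check. It rests on the following assumptions:

- Data are log-transformed intensities in a samples × protein-groups matrix, with NaN marking missing values.
- A plain low-rank matrix factorization, fitted by alternating least squares, stands in for collaborative filtering.
- It is not the PIMMS implementation.
- The function names `mask_observed` and `impute_mf`, their parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def mask_observed(X, frac, rng):
    """Hide a random fraction of the observed (non-NaN) entries of X.

    Returns a copy of X with those entries set to NaN and a boolean
    matrix marking which entries were hidden.
    """
    observed = np.argwhere(~np.isnan(X))          # (row, col) of every observed value
    n_hide = int(round(frac * len(observed)))
    pick = rng.choice(len(observed), size=n_hide, replace=False)
    hidden = np.zeros(X.shape, dtype=bool)
    hidden[observed[pick, 0], observed[pick, 1]] = True
    X_masked = X.copy()
    X_masked[hidden] = np.nan
    return X_masked, hidden


def impute_mf(X, rank=3, lam=0.1, n_iter=50, seed=0):
    """Hypothetical low-rank (collaborative-filtering style) imputation.

    Centres each protein group on its observed mean, fits Xc ~ U @ V.T on the
    observed entries by alternating ridge regressions, and fills only the
    missing entries with the reconstruction.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    obs = ~np.isnan(X)
    mu = np.nanmean(X, axis=0)                    # per-protein mean of observed values
    Xc = X - mu
    U = rng.normal(scale=0.1, size=(n, rank))     # sample factors
    V = rng.normal(scale=0.1, size=(m, rank))     # protein-group factors
    reg = lam * np.eye(rank)
    for _ in range(n_iter):
        for i in range(n):                        # update each sample given V
            Vi = V[obs[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + reg, Vi.T @ Xc[i, obs[i]])
        for j in range(m):                        # update each protein group given U
            Uj = U[obs[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + reg, Uj.T @ Xc[obs[:, j], j])
    return np.where(obs, X, U @ V.T + mu)


# Synthetic stand-in for log2 LFQ intensities (not real data):
# 100 samples x 40 protein groups with rank-3 structure plus noise.
rng = np.random.default_rng(42)
X_true = (25 + rng.normal(size=(100, 3)) @ rng.normal(size=(3, 40))
          + 0.1 * rng.normal(size=(100, 40)))

# Hide 20 percent of the observed intensities as a ground-truth test set.
X_masked, hidden = mask_observed(X_true, frac=0.2, rng=rng)

X_mf = impute_mf(X_masked, rank=3)
X_median = np.where(np.isnan(X_masked), np.nanmedian(X_masked, axis=0), X_masked)

mae_mf = np.abs(X_mf[hidden] - X_true[hidden]).mean()
mae_median = np.abs(X_median[hidden] - X_true[hidden]).mean()
print(f"MAE low-rank: {mae_mf:.3f}  MAE per-protein median: {mae_median:.3f}")
```

Hiding values that were actually measured provides a ground truth to score against. The per-protein median fill is included only as a simple reference point. Uniform random masking like this does not reproduce the abundance-dependent missingness typical of LFQ data, where low-intensity values drop out more often.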

RIS

TY  - JOUR
T1  - Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning
AU  - Webel, Henry
AU  - Niu, Lili
AU  - Nielsen, Annelaura Bach
AU  - Locard-Paulet, Marie
AU  - Mann, Matthias
AU  - Jensen, Lars Juhl
AU  - Rasmussen, Simon
N1  - © 2024. The Author(s).
PY  - 2024
Y1  - 2024
N2  - Imputation techniques provide means to replace missing measurements with a value and are used in almost all downstream analysis of mass spectrometry (MS) based proteomics data using label-free quantification (LFQ). Here we demonstrate how collaborative filtering, denoising autoencoders, and variational autoencoders can impute missing values in the context of LFQ at different levels. We applied our method, proteomics imputation modeling mass spectrometry (PIMMS), to an alcohol-related liver disease (ALD) cohort with blood plasma proteomics data available for 358 individuals. Removing 20 percent of the intensities we were able to recover 15 out of 17 significant abundant protein groups using PIMMS-VAE imputations. When analyzing the full dataset we identified 30 additional proteins (+13.2%) that were significantly differentially abundant across disease stages compared to no imputation and found that some of these were predictive of ALD progression in machine learning models. We, therefore, suggest the use of deep learning approaches for imputing missing values in MS-based proteomics on larger datasets and provide workflows for these.
AB  - Imputation techniques provide means to replace missing measurements with a value and are used in almost all downstream analysis of mass spectrometry (MS) based proteomics data using label-free quantification (LFQ). Here we demonstrate how collaborative filtering, denoising autoencoders, and variational autoencoders can impute missing values in the context of LFQ at different levels. We applied our method, proteomics imputation modeling mass spectrometry (PIMMS), to an alcohol-related liver disease (ALD) cohort with blood plasma proteomics data available for 358 individuals. Removing 20 percent of the intensities we were able to recover 15 out of 17 significant abundant protein groups using PIMMS-VAE imputations. When analyzing the full dataset we identified 30 additional proteins (+13.2%) that were significantly differentially abundant across disease stages compared to no imputation and found that some of these were predictive of ALD progression in machine learning models. We, therefore, suggest the use of deep learning approaches for imputing missing values in MS-based proteomics on larger datasets and provide workflows for these.
KW  - Proteomics/methods
KW  - Deep Learning
KW  - Humans
KW  - Mass Spectrometry/methods
KW  - Supervised Machine Learning
KW  - Male
U2  - 10.1038/s41467-024-48711-5
DO  - 10.1038/s41467-024-48711-5
M3  - Journal article
C2  - 38926340
VL  - 15
JO  - Nature Communications
JF  - Nature Communications
SN  - 2041-1723
M1  - 5405
ER  - 

ID: 396732953