DeepLoc 2.0: multi-label subcellular localization prediction using protein language models

Publication: Contribution to journal › Journal article › Research › Peer-reviewed

Documents

  • Fulltext

    Publisher's published version, 1.1 MB, PDF document

  • Vineet Thumuluri
  • José Juan Almagro Armenteros
  • Alexander Rosenberg Johansen
  • Henrik Nielsen
  • Ole Winther

The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.

Original language: English
Journal: Nucleic Acids Research
Volume: 50
Issue number: W1
Pages (from-to): W228-W234
ISSN: 0305-1048
DOI
Status: Published - 2022

Bibliographical note

© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.


ID: 306112192