A combined approach for genome wide protein function annotation/prediction.

Research output: Contribution to journal › Conference article › Research › peer-review

Alfredo Benso
Stefano Di Carlo
Hafeez Ur Rehman
Gianfranco Politano
Alessandro Savino
Prashanth Suravajhala

BACKGROUND: Today large-scale genome sequencing technologies are uncovering an increasing number of new genes and proteins that remain uncharacterized. Experimental procedures for determining protein function are low-throughput by nature and thus cannot keep up with the rate at which new proteins are discovered. At the same time, proteins are key players in almost all biological processes, so precise knowledge of their functions is essential for understanding the underlying biological mechanisms. The challenge of annotating uncharacterized proteins, in functional genomics and in biology in general, motivates the use of well-orchestrated computational techniques to predict their functions accurately.

METHODS: We propose a computational flow for functional annotation that assigns the most probable functions to a protein by aggregating heterogeneous information. The information considered includes protein motifs, protein sequence similarity, and protein homology data gathered from interacting proteins, combined with data from highly similar non-interacting proteins (hereinafter called Similactors). Moreover, to increase the predictive power of our model, we also compute and integrate term-specific relationships among functional terms based on the Gene Ontology (GO).
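
To illustrate the idea of aggregating heterogeneous evidence, the following is a minimal sketch and not the authors' implementation. It assumes each evidence source (motifs, sequence similarity, interactors, Similactors) yields a confidence value in [0, 1] per GO term, and it combines them with a simple weighted average. The function name `aggregate_go_scores`, the equal weights in `SOURCE_WEIGHTS`, and the example confidence values are all hypothetical; the abstract does not specify a scoring scheme, and the GO term relationships mentioned above are left out of this sketch.

```python
from collections import defaultdict

# Hypothetical, equal weights per evidence source (not taken from the paper).
SOURCE_WEIGHTS = {
    "motif": 1.0,
    "sequence_similarity": 1.0,
    "interactor": 1.0,
    "similactor": 1.0,
}


def aggregate_go_scores(evidence, weights=SOURCE_WEIGHTS):
    """Rank candidate GO terms by a weighted average of per-source confidences.

    evidence: dict mapping a source name to {go_term: confidence in [0, 1]}.
    Returns a list of (go_term, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for source, term_confidences in evidence.items():
        weight = weights.get(source, 0.0)  # unknown sources contribute nothing
        for term, confidence in term_confidences.items():
            scores[term] += weight * confidence

    # Normalize by the total weight of the sources that were supplied.
    total_weight = sum(weights.get(source, 0.0) for source in evidence)
    if total_weight == 0:
        return []

    # Sort by descending score, breaking ties by GO identifier.
    return sorted(
        ((term, score / total_weight) for term, score in scores.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Made-up confidences for one uncharacterized protein.
example_evidence = {
    "motif": {"GO:0005524": 0.9},
    "sequence_similarity": {"GO:0005524": 0.8, "GO:0004672": 0.6},
    "interactor": {"GO:0004672": 0.5},
    "similactor": {"GO:0005524": 0.7},
}
ranked = aggregate_go_scores(example_evidence)
for term, score in ranked:
    print(f"{term}\t{score:.3f}")
```

A term supported by several independent sources ends up ranked above a term supported by only one or two, which is the intuition behind combining interactor and Similactor data with motif and sequence evidence.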

RESULTS: We tested our method on Saccharomyces cerevisiae and Homo sapiens proteins. The aggregation of different structural and functional evidence with GO relationships outperforms the other methods reported in the literature in terms of precision and accuracy of prediction. The prediction precision and accuracy are 100% for more than half of the input set for both species; overall, we obtained 85.38% precision and 81.95% accuracy for Homo sapiens and 79.73% precision and 80.06% accuracy for Saccharomyces cerevisiae proteins.

Original language	English
Article number	S1
Journal	Proteome Science
Volume	11
Issue number	Suppl 1
Number of pages	12
ISSN	1477-5956
DOIs	https://doi.org/10.1186/1477-5956-11-S1-S1
Publication status	Published - 7 Nov 2013
Externally published	Yes

ID: 136722675
