Mixture Models for Spherical Data with Applications to Protein Bioinformatics

Publication: Contribution to book/anthology/report, Contribution to book/anthology, Research, peer-reviewed

Finite mixture models are fitted to spherical data. Kent distributions are used for the components of the mixture because they allow considerable flexibility. Previous work on such mixtures has used an approximate maximum likelihood estimator for the parameters of a single component. However, the approximation causes problems when using the EM algorithm to estimate the parameters in a mixture model. Hence, the exact maximum likelihood estimator is used here for the individual components. This paper is motivated by a challenging prize problem in structural bioinformatics of how proteins fold. It is known that hydrogen bonds play a key role in the folding of a protein. We explore this hydrogen bond geometry using a data set describing bonds between two amino acids in proteins. An appropriate coordinate system to represent the hydrogen bond geometry is proposed, with each bond represented as a point on a sphere. We fit mixtures of Kent distributions to different subsets of the hydrogen bond data to gain insight into how the secondary structure elements bond together, since the distribution of hydrogen bonds depends on which secondary structure elements are involved.
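The abstract's two technical ingredients, the Kent component density and the EM algorithm for the mixture, can be illustrated with a minimal sketch. It assumes the standard Kent (FB5) form on the unit sphere, with log density kappa*(g1.x) + beta*((g2.x)^2 - (g3.x)^2) minus the log normalizing constant, where the constant is evaluated by its Bessel-function series. All function and parameter names here are hypothetical. The sketch covers only the density and the E-step responsibilities; the exact maximum likelihood M-step used in the paper is not reproduced.

```python
import math


def bessel_i(nu, x, terms=120):
    """Modified Bessel function I_nu(x) from its power series (moderate x > 0)."""
    log_half = math.log(x / 2.0)
    return sum(
        math.exp((2 * m + nu) * log_half
                 - math.lgamma(m + 1) - math.lgamma(m + nu + 1))
        for m in range(terms)
    )


def kent_log_normalizer(kappa, beta, terms=60):
    """Log of the Kent normalizing constant c(kappa, beta), via the series
    2*pi * sum_j G(j+1/2)/G(j+1) * beta^(2j) * (kappa/2)^(-2j-1/2) * I_{2j+1/2}(kappa).
    Assumes kappa > 0 and 0 <= 2*beta < kappa."""
    if not (kappa > 0 and 0 <= 2 * beta < kappa):
        raise ValueError("need kappa > 0 and 0 <= 2*beta < kappa")
    ratio = 2.0 * beta / kappa
    total = 0.0
    for j in range(terms):
        coeff = math.exp(math.lgamma(j + 0.5) - math.lgamma(j + 1))
        total += coeff * ratio ** (2 * j) * bessel_i(2 * j + 0.5, kappa)
    return math.log(2.0 * math.pi * total / math.sqrt(kappa / 2.0))


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def kent_log_pdf(x, g1, g2, g3, kappa, beta, log_c=None):
    """Log density at unit vector x. g1 is the mean direction; g2 and g3 are the
    major and minor axes (assumed orthonormal). Pass log_c to reuse a
    precomputed normalizer."""
    if log_c is None:
        log_c = kent_log_normalizer(kappa, beta)
    return kappa * dot(g1, x) + beta * (dot(g2, x) ** 2 - dot(g3, x) ** 2) - log_c


def responsibilities(x, weights, components):
    """E-step for one point: posterior probability of each mixture component,
    computed with the log-sum-exp trick for numerical stability."""
    logs = [math.log(w) + kent_log_pdf(x, **c) for w, c in zip(weights, components)]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]
```

With beta = 0 the series collapses to its first term and the constant reduces to the von Mises-Fisher value 4*pi*sinh(kappa)/kappa, which gives a quick sanity check. The power series for the Bessel function is only suitable for moderate kappa; a highly concentrated component would need a scaled Bessel routine from a numerical library instead.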

Original language: English
Title: Directional Statistics for Innovative Applications: A Bicentennial Tribute to Florence Nightingale
Publisher: Springer
Publication date: 2022
Pages: 15-32
ISBN (Print): 978-981-19-1043-2
ISBN (Electronic): 978-981-19-1044-9
Status: Published - 2022
Name: Forum for Interdisciplinary Mathematics
ISSN: 2364-6748

Bibliographic note

Funding Information:
Acknowledgements: PMB was funded by a DTG award from the UK Engineering and Physical Sciences Research Council. The authors wish to thank G.J. McLachlan for helpful discussions and the referees for helpful comments. KVM thanks the Leverhulme Trust for an Emeritus Fellowship grant.

Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

ID: 314302529