Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor

Research output: Contribution to journal › Journal article › Research › peer-review

Documents

Fulltext
Final published version, 855 KB, PDF document

Vincent Cohen-Addad
Debarati Das
Evangelos Kipouridis
Nikos Parotsidis
Mikkel Thorup

We consider the numerical taxonomy problem of fitting a positive distance function $D: \binom{S}{2} \to \mathbb{R}_{>0}$ by a tree metric. We want a tree T with positive edge weights and including S among the vertices so that their distances in T match those in D. A nice application is in evolutionary biology where the tree T aims to approximate the branching process leading to the observed distances in D [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is, the sum of distance errors over all pairs of points. We present a deterministic polynomial time algorithm minimizing the total error within a constant factor. We can do this both for general trees and for the special case of ultrametrics with a root having the same distance to all vertices in S. The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was O((log n)(log log n)) by Ailon and Charikar [2005], who wrote “determining whether an O(1) approximation can be obtained is a fascinating question.”
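To make the objective concrete, the following is a minimal sketch of how the total error of one candidate tree could be evaluated, assuming the error of a pair is the absolute difference between its distance in D and its path length in T. It only evaluates a given tree and is not the approximation algorithm from the article. The function names `tree_distances` and `total_error`, the dictionary encoding of D, and the example data are all hypothetical choices made for illustration.

```python
def tree_distances(edges, points):
    """Path lengths between all pairs of `points` in a weighted tree.

    `edges` is a list of (u, v, weight) triples. The tree is assumed to be
    connected and acyclic, so the path between two vertices is unique.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {}
    for src in points:
        # Iterative depth-first traversal accumulating path length from src.
        seen = {src: 0.0}
        stack = [src]
        while stack:
            u = stack.pop()
            for v, w in adj.get(u, []):
                if v not in seen:
                    seen[v] = seen[u] + w
                    stack.append(v)
        for dst in points:
            if src < dst:
                dist[(src, dst)] = seen[dst]
    return dist


def total_error(D, edges):
    """Sum over all pairs in D of |D(u, v) - dist_T(u, v)|."""
    points = sorted({p for pair in D for p in pair})
    T = tree_distances(edges, points)
    return sum(abs(d - T[tuple(sorted(pair))]) for pair, d in D.items())


# Hypothetical input: three points and a star tree with an extra centre "x".
D = {("a", "b"): 3.0, ("a", "c"): 4.0, ("b", "c"): 6.0}
star = [("a", "x", 1.0), ("b", "x", 2.0), ("c", "x", 3.0)]
print(total_error(D, star))
```

In this example the tree reproduces the distances for the pairs (a, b) and (a, c) exactly and is off by 1 for (b, c), so the total error is 1. The vertex `x` illustrates that T may contain vertices outside S.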

Original language	English
Article number	10
Journal	Journal of the ACM
Volume	71
Issue number	2
Number of pages	41
ISSN	0004-5411
DOIs	https://doi.org/10.1145/3639453
Publication status	Published - 2024

Research areas

Approximation algorithms, hierarchical clustering, phylogenetic reconstructions, tree metrics, ultrametrics

