Poster
Sample Complexity of Branch-length Estimation by Maximum Likelihood
David Clancy · Hanbaek Lyu · Sebastien Roch
East Exhibition Hall A-B #E-1306
In evolutionary biology, researchers typically use trees to describe how species evolve from common ancestors. Each branching in the tree represents a point where one species splits into two. Here we are interested in estimating how much genetic change happens along each branch of the tree. One challenge is that we can usually only observe the “leaves” — the current species — and not what happened inside the tree. Simple algorithms based on making one small improvement at a time often work surprisingly well in practice, although establishing this mathematically remains difficult because the landscape of possible choices is complex and full of local traps.Our research makes progress towards this goal. We show that if you collect enough data from the leaves, there’s a universal condition — no matter how big or complex the tree is — where the landscape becomes smooth and well-behaved. In this case, the simple algorithm converges quickly and reliably to the correct answer. This helps provide support for widely used tools in evolutionary analysis and latent tree models in machine learning.