Gabor J. Szekely
DMS Division of Mathematical Sciences
MPS Directorate for Mathematical & Physical Sciences
Start Date:
October 1, 2006
Expires:
July 31, 2009 (Estimated)
Awarded Amount to Date:
$54,454
Investigator(s):
Laura Kubatko lkubatko@stat.ohio-state.edu (Principal Investigator)
Sponsor:
Ohio State University Research Foundation
1960 KENNY RD
Columbus, OH 43210 614/292-3732
NSF Program(s):
STATISTICS
Field Application(s):
0000099 Other Applications NEC
Program Reference Code(s):
OTHR, 9150, 0000
Program Element Code(s):
1269
ABSTRACT
Prop ID: DMS-0505265
Prev Awd: 0104290
PI: Salter, Laura
Institution: University of New Mexico
Title: Gene Tree-Species Tree Relationships Under the Coalescent Process
In this proposal, incongruence between gene trees and species trees is
examined using data from multiple genes in the context of the
coalescent process. First, the investigator will show
that one currently advocated approach for the analysis of data
from multiple genes, the concatenation approach, can be
statistically inconsistent, even when a consistent method of
phylogenetic tree estimation is used. Second, an algorithm for
maximum likelihood (ML) estimation of species trees from data on
multiple genes under the coalescent model will be developed and
implemented, and will be made freely available via the internet.
The availability of a method for ML species tree estimation will
allow for likelihood-based hypothesis testing of phylogeographic
and population genetic hypotheses. Further, methods for assessing
uncertainty in the species tree estimates will be developed by
extending traditional bootstrapping methods in phylogenetics to
the case in which data have been collected for multiple genes
sampled randomly throughout the genome. Finally, tests for the
adequacy of the coalescent model will be developed by examining
whether the observed gene trees are consistent with a given
species tree using several metrics to measure levels of
incongruence.
The inference of the evolutionary history of a collection of organisms
based on the information contained in their DNA sequences is a problem
of fundamental importance in evolutionary biology. The abundance of DNA
sequence data arising from genome sequencing projects has led to
significant challenges in the inference of these phylogenetic
relationships. Among these challenges is the inference of the evolutionary
history of a collection of species based on DNA sequence information
from several distinct genes sampled throughout the genome. This project
studies the effect of the coalescent process on the inference of species
phylogenies using data from multiple genes. This work will first
demonstrate that failure to model the coalescent process can lead to
incorrect inferences of species relationships. The investigator will then
develop methods that can accurately estimate species phylogenies through
explicitly modeling the coalescent process, and will apply these
estimation procedures to construct techniques for hypothesis testing and
for measuring uncertainty in estimated species phylogenies.
--
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
C. Meng and L. Kubatko. "Detecting hybrid speciation in the presence of incomplete lineage sorting using gene tree incongruence: A model," Theoretical Population Biology, v.75, 2009.
Efromovich, S. and L. Salter Kubatko. "Coalescent Time Distributions in Trees of Arbitrary Size," Statistical Applications in Genetics and Molecular Biology, v.7, Issue 1, 2008.
Kubatko, LS; Carstens, BC; Knowles, LL. "STEM: species tree estimation using maximum likelihood for gene trees under coalescence," BIOINFORMATICS, v.25, 2009, p. 971-973.
Kubatko, LS; Degnan, JH. "Inconsistency of phylogenetic estimates from concatenated data under coalescence," SYSTEMATIC BIOLOGY, v.56, 2007, p. 17-24.