Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species tree analysis

John E. McCormack,1,8,9,* Brant C. Faircloth,2,9 Nicholas G. Crawford,3,9 Patricia Adair Gowaty,4,5 Robb T. Brumfield,1,6 & Travis C. Glenn,7,9

1 Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803
2 Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095
3 Department of Biology, Boston University, Boston, MA 02215
4 Smithsonian Tropical Research Institute, MRC 0580-11 Unit 9100, Box 0948, DPO, AA 34002-9998
5 Institute of the Environment, University of California, Los Angeles, CA 90095
6 Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803
7 Department of Environmental Health Science and Georgia Genomics Facility, University of Georgia, Athens, GA 30602
8 Corresponding author: Moore Laboratory of Zoology, Occidental College, 1600 Campus Rd., Los Angeles, CA 90041; Tel: 734-358-6886
9 These authors contributed equally to this work

* jmccormack _at_ lsu *dot* edu

Abstract

Phylogenomics offers the potential to fully resolve the Tree of Life, but increasing genomic coverage also reveals conflicting evolutionary histories among genes, demanding new analytical strategies for elucidating a single history of life. Here, we outline a phylogenomic approach using a novel class of phylogenetic markers derived from ultraconserved elements and flanking DNA. Using species-tree analysis that accounts for discord among hundreds of independent loci, we show that this class of marker is useful for recovering deep-level phylogeny in placental mammals. In broad outline, our phylogeny agrees with recent phylogenomic studies of mammals, including several formerly controversial relationships. Our results also inform two outstanding questions in placental mammal phylogeny involving rapid speciation, where species tree methods are particularly needed. Contrary to most phylogenomic studies, our study supports a first-diverging placental mammal lineage that includes elephants and tenrecs (Afrotheria). The level of conflict among gene histories is consistent with this basal divergence occurring in or near a phylogenetic ‘anomaly zone’ where a failure to account for coalescent stochasticity will mislead phylogenetic inference. Addressing a long-standing phylogenetic mystery, we find support from a high genomic coverage data set for a traditional placement of bats (Chiroptera) sister to a clade containing Perissodactyla, Cetartiodactyla, and Carnivora, and not nested within the latter clade, as has been suggested recently, although support for this relationship was not strong. One of the most remarkable findings of our study is that ultraconserved elements and their flanking DNA are a rich source of phylogenetic information with strong potential for application across Amniotes.

License

Data generated as part of the research listed above are available under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license. We denote data that we have generated using an asterisk (*) below.

Data from other research projects (genome sequences) are available under their respective licenses. The license terms for these data sources are given at the URLs listed in Supplementary Table S1.

The CC0 license does not waive our expectation that you will cite either these data or their associated publication following normal, scholarly practices.

Data

All data are compressed using bz2 or tar.bz2.
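As a convenience, the following is a minimal sketch of unpacking either kind of file with the Python standard library. The function name `unpack` and the file names in the usage note are hypothetical and are not part of the released data; only extract archives obtained from a source you trust.

```python
import bz2
import shutil
import tarfile
from pathlib import Path


def unpack(archive, dest="."):
    """Unpack a .tar.bz2 archive, or decompress a single .bz2 file, into dest.

    Returns the destination directory for a tar archive, or the path of the
    decompressed file for a plain .bz2 file.
    """
    archive = Path(archive)
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    # Check the longer suffix first, because ".tar.bz2" also ends in ".bz2".
    if archive.name.endswith(".tar.bz2"):
        with tarfile.open(archive, "r:bz2") as tar:
            tar.extractall(dest)
        return dest

    if archive.suffix == ".bz2":
        # "probes.fasta.bz2" decompresses to "probes.fasta".
        target = dest / archive.stem
        with bz2.open(archive, "rb") as src, open(target, "wb") as out:
            shutil.copyfileobj(src, out)
        return target

    raise ValueError(f"not a bz2 or tar.bz2 file: {archive}")
```

For example, `unpack("some-download.tar.bz2", "data")` would extract a (hypothetically named) archive into a `data` directory.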

Probes

Databases

McCormack et al. RDB1 is an sqlite database containing UCE and probe location information, created when we were identifying UCE sequences between chicken, lizard, and finch.
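Because the table layout of the sqlite file is not described here, a reasonable first step is to list its tables and columns. The sketch below does this with Python's built-in `sqlite3` module; it assumes nothing about the schema, and the function name `list_tables` is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table name: [column names]} for every table in an sqlite file."""
    # sqlite3.connect() silently creates a missing file, so check first.
    if not Path(db_path).is_file():
        raise FileNotFoundError(db_path)

    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        return {
            name: [col[1] for col in conn.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }
    finally:
        conn.close()
```

Calling `list_tables` on the decompressed database file prints nothing by itself; it returns a dictionary that can be inspected before writing any queries against the real tables.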

McCormack et al. RDB2 is a mysql dumpfile holding Lastz output from alignments of McCormack et al. UCE probes to the genome sequences presented in Table S1. Stephen et al. RDB2 is a mysql dumpfile holding Lastz output from alignments of probes designed from the UCE sequences identified by [Stephen et al.][stephenscite] to the genome sequences presented in Table S1.

Sequence Data

Sequence data are those non-duplicate regions from the genomes presented in Table S1 that matched a probe in McCormack et al. UCE probes, plus approximately 500 bp of flanking sequence on each side of the match. These are the sequences that we aligned to produce the alignment data. Note that the [Stephen et al.][stephenscite] sequence data cover 261 loci, which we combined with the 183-loci data set to form what we refer to below as the 444-loci data set.
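To illustrate what "match plus flanking sequence" means in coordinate terms, here is a small sketch. It is not the code used to build the data set: the function name, the 0-based half-open coordinate convention, and the clamping at sequence ends are all assumptions made for the example.

```python
def flanked_interval(match_start, match_end, seq_length, flank=500):
    """Widen a probe match by `flank` bp on each side.

    Coordinates are assumed to be 0-based and half-open: the match covers
    positions match_start .. match_end - 1 of a sequence of seq_length bp.
    The widened interval is clamped so it never runs off either end.
    """
    if not 0 <= match_start < match_end <= seq_length:
        raise ValueError("match must lie inside the sequence")
    start = max(0, match_start - flank)
    end = min(seq_length, match_end + flank)
    return start, end


# A 120 bp match in the middle of a long scaffold gains 500 bp on each side.
print(flanked_interval(1000, 1120, 10_000))  # (500, 1620)

# Near the ends of a short scaffold, less flanking sequence is available.
print(flanked_interval(100, 220, 600))  # (0, 600)
```

With a sequence held as a Python string `seq`, the region itself would then be `seq[start:end]`.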

Alignment data

We provide alignment data as large archives for specific trees. Each archive contains:

Archives

Genetrees

Checksums

Probes

Databases

Sequence data

Alignment data

Computer code

We are actively updating our computer code for public release and moving items from private into public view. Much of this code is currently available from the following site, although we will likely provide an updated URL shortly:

Acknowledgments

We thank M. Springer for sharing the 20-locus mammal data set and J. Mattick for sharing UCE and exon data. S.P. Hubbell, J. Degnan, M. Sheehan, M. Alfaro, B. Carstens, and three anonymous reviewers provided helpful comments. One reviewer suggested the point about selection potentially improving the phylogenetic utility of UCEs. H. Hoekstra provided access to the Odyssey cluster supported by the Harvard FAS Sciences Division Research Computing Group to conduct phylogenetic analysis. A research grant from Amazon Web Services (Amazon.com) also supported phylogenetic computation. We thank the many scientists, institutions, and funding agencies who have contributed genomic data available via the UCSC Genome Browser.

Author Contributions

J.E.M., B.C.F., N.G.C., and T.C.G. designed the study; B.C.F. designed ultraconserved probes, created data sets, and performed phylogenetic analysis; N.G.C. performed phylogenetic analysis; J.E.M. performed gene-tree frequency analysis; P.A.G. provided analytical resources; J.E.M., B.C.F., N.G.C., R.T.B., and T.C.G. wrote the manuscript. J.E.M., B.C.F., N.G.C., and T.C.G. contributed equally to the study. All authors discussed results and commented on the manuscript.