Why DNA barcoding works as well as it does is an unsolved scientific puzzle. It has long been observed that mitochondrial DNA differences within animal species are generally much smaller than those among species and that, in the landscape of phylogenetic trees, mitochondrial DNA sequences of most species form single clusters distinct from those of other species. As a result, “mtDNA data and traditional taxonomic assignments tend to converge on what may be “real” biotic units in nature” (Avise and Walker 1999 Proc Natl Acad Sci USA 96:992). Although Avise and Walker’s original observation was largely based on terrestrial, temperate zone vertebrates, growing barcode libraries demonstrate similar patterning in diverse invertebrates, vertebrates, and protists, in marine and terrestrial environments, in tropical and temperate zones, and in at least some fungi and plants (see last week’s post on COI barcodes in red algae).
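To make the pattern concrete, here is a minimal sketch of how within-species and among-species differences might be compared. It assumes aligned sequences of equal length and uses the simple proportion of differing sites (p-distance). The sequences, species labels, and function names are hypothetical toy values made up for illustration, not real barcode data or any particular library's method.

```python
from itertools import combinations

# Hypothetical toy alignment: (species label, aligned sequence).
# These are invented for illustration and are not real barcode data.
TOY_BARCODES = [
    ("species_A", "ACGTACGTACGTACGTACGT"),
    ("species_A", "ACGTACGTACGTACGTACGA"),
    ("species_B", "ACGTTCGTACCTACGAACGT"),
    ("species_B", "ACGTTCGTACCTACGAACGC"),
]


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


def within_and_among(records):
    """Split all pairwise p-distances into within-species and among-species lists."""
    within, among = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            within.append(d)
        else:
            among.append(d)
    return within, among


within, among = within_and_among(TOY_BARCODES)
print("largest within-species distance:", max(within))
print("smallest among-species distance:", min(among))
```

In this toy case the largest within-species distance is smaller than the smallest among-species distance, which is the kind of separation the barcode libraries show for most species. Real analyses typically use corrected distance models and far longer sequences; the sketch only illustrates the comparison itself.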
What underlies the usual patterning of small differences within and large differences among most animal species? The unsolved puzzle is how to reconcile these two findings. Large differences among closely related species indicate that mitochondrial DNA undergoes rapid sequence evolution, and there are reasonable mechanistic explanations for why this might be so. On the other hand, rapid sequence evolution should also lead to accumulation of sequence diversity within species over time, and especially within those with large populations. Instead, the data show a relative absence of variation within most species, including those thought to be ancient and those with enormous population sizes. I will set aside two of the usual suspects: population bottlenecks and small effective population size. Population bottlenecks are implausible given the diversity of species showing this pattern. Postulating a small effective population size is a restatement of the finding of absence of variation, not an explanation.
This table-napkin analysis leads me to selective sweeps as the mechanism pruning mitochondrial diversity within species (e.g., Bazin et al 2006 Science 312:570; see also editorial and reader commentary). If selective sweeps restrict mitochondrial diversity, then the question becomes: what is being selected for? Environmental adaptation seems unlikely, as restricted variation is seen in species that are, as best one can tell, morphologically and ecologically unchanged (e.g., see earlier posts on horseshoe crabs, salamanders). It might be that there is little tolerance for genetic variation due to interactions of mitochondrial proteins with other cellular components, but if so there should be species with genetic stasis in mitochondrial DNA, just as there are many species with apparent morphologic stasis. However, in simple distance trees most species show roughly similar genetic distances, which argues against such stasis.
I am intrigued by a time series of influenza A hemagglutinin gene evolution, which reflects competition between virus and host, and wonder if there might be some kind of competition that helps drive mitochondrial sequence evolution forward and at the same time suppresses variation. It is exciting that there will be an EMBO workshop, “Molecular Biodiversity and DNA Barcodes,” in May 2007 in Rome, which may help answer scientific questions posed by DNA barcode data.