The Barcode Blog

A mostly scientific blog about short DNA sequences for species identification and discovery. I encourage your commentary. -- Mark Stoeckle

Goldilocks finds mtDNA COI barcode length “just right” for distinguishing most animal species, asks why

The standard animal barcode, 648 bp of the mitochondrial gene COI, seems “just right” for delimiting most animal species. If it were “too short”, closely related species would not be resolved. If it were “too long”, sequencing effort would be wasted. Here I examine what might underlie the Goldilocks effect.

The following figure looks at how often closely related species (differing by 0.5%, 1%, or 2%) are predicted to have overlapping sequences, as a function of sequence length. Under the assumptions examined below, sequences longer than 600 base pairs distinguish all but the most closely related species, and beyond 800 base pairs there is little gain in sensitivity.

 

The assumptions underlying this table-napkin analysis appear supported by data so far:  

First, mitochondrial DNA sequence differences between closely related species are widely and relatively evenly distributed throughout the protein-coding and ribosomal genes. For example, see an earlier post with percent identity plots comparing whole mitochondrial genomes for congeneric salamanders. Further support is provided by a plot of parallel sequence differences in the two most commonly used mitochondrial genes, COI and cytB.

Second, most closely related animal species have COI sequences that differ by at least 1%. For example, more than 98% of 13,320 congeneric pairs from a wide array of invertebrate and vertebrate species showed greater than 2% sequence difference (Hebert et al. 2003 Proc Biol Sci 270:S96).

Third, intraspecific sequence variation in mtDNA is generally very low, less than 1% in most animal species.
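To make the napkin arithmetic concrete, here is a minimal sketch of one way such a prediction could be computed. It is not necessarily the calculation behind the figure. It leans on the first assumption (differences spread evenly along the gene) and adds a simplification of my own: each site differs independently with probability equal to the overall divergence, so the chance that a fragment shows no differences at all is (1 - d)^L. The function name and the list of lengths are illustrative only.

```python
# Sketch under stated assumptions (not the model behind the figure):
# sites differ independently, each with probability `divergence`,
# so a fragment of `length` bp is identical with probability (1 - d)^L.

def prob_identical(divergence: float, length: int) -> float:
    """Probability that two sequences show zero differences over `length` bp."""
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be a fraction between 0 and 1")
    if length < 0:
        raise ValueError("length must be non-negative")
    return (1.0 - divergence) ** length


if __name__ == "__main__":
    divergences = [0.005, 0.01, 0.02]          # 0.5%, 1%, 2% as in the figure
    lengths = [100, 200, 400, 600, 800, 1000]  # hypothetical fragment lengths

    # Header row: one column per divergence level.
    print("length " + " ".join(f"{d:>9.1%}" for d in divergences))
    for length in lengths:
        row = " ".join(f"{prob_identical(d, length):>9.4f}" for d in divergences)
        print(f"{length:>6} {row}")
```

Under this simplified model, a pair differing by 0.5% has roughly a 5% chance of showing no differences across 600 bp, and the chance falls off quickly at higher divergence. Intraspecific variation (the third assumption) is ignored here; accounting for it would make overlap somewhat more likely than these numbers suggest.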

If most closely related species can be distinguished by short mtDNA sequences, then recognizing the sets of mtDNA sequences that make up species, i.e., species delimitation, should at least sometimes be simple. Using the neighbor-joining tree of mtDNA barcodes below, an untrained person might pick out the groups of sequences that correspond to species. The top 5 groups represent previously unrecognized cryptic species of scorched mussel Brachidontes exustus (Lee and Ó Foighil 2004 Mol Ecol 13:3527).

Goldilocks leaves us with the scientific questions: why are differences within most species so small, and why are the distances between most nearest neighbor species so large?

 

This entry was posted on Friday, August 25th, 2006 at 10:28 pm and is filed under General. You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.


Contact: mark.stoeckle@rockefeller.edu

About this site

This web site is an outgrowth of the Taxonomy, DNA, and Barcode of Life meeting held at Banbury Center, Cold Spring Harbor Laboratory, September 9-12, 2003. It is designed and managed by Mark Stoeckle, Perrin Meyer, and Jason Yung at the Program for the Human Environment (PHE) at The Rockefeller University.

About the Program for the Human Environment

The involvement of the Program for the Human Environment in DNA barcoding dates to Jesse Ausubel's attendance in February 2002 at a conference in Nova Scotia organized by the Canadian Center for Marine Biodiversity. At the conference, Paul Hebert presented for the first time his concept of large-scale DNA barcoding for species identification. Impressed by the potential of this technology to address difficult challenges in the Census of Marine Life, Jesse agreed with Paul to encourage a conference exploring the contribution that taxonomy and DNA could make to the Census as well as to other large-scale terrestrial efforts. In his capacity as a Program Director of the Sloan Foundation, Jesse turned to the Banbury Conference Center of Cold Spring Harbor Laboratory, whose leader Jan Witkowski prepared a strong proposal to explore both the scientific reliability of barcoding and the processes that might bring it to broad application. Concurrently, PHE researcher Mark Stoeckle began to work with the Hebert lab on analytic studies of barcoding in birds.

Our involvement in barcoding now takes three forms: assisting the organizational development of the Consortium for the Barcode of Life and the Barcode of Life Initiative; contributing to the scientific development of the field, especially through studies in birds; and contributing to public understanding of the science and technology of barcoding and its applications through improved visualization techniques and the preparation of brochures and other broadly accessible means, including this website. While the Sloan Foundation continues to support CBOL through a grant to the Smithsonian Institution, it does not provide financial support for barcoding research itself or support to the PHE for its research in this field.