The Barcode Blog

A mostly scientific blog about short DNA sequences for species identification and discovery. I encourage your commentary. -- Mark Stoeckle

Visualizing birds so far

In a 2004 PLoS Biology paper, Hebert and colleagues (I am a co-author) observed that differences in COI barcodes among 260 species of North American birds were generally much larger than those within species, with the result that “distinguishing species was generally straightforward.” In addition, we noted 4 birds with large intraspecific divergences that likely represented overlooked species. Our study included only about 1/50 of the world's approximately 10,000 named bird species and only modest sampling of differences within species (multiple individuals for 130 species; average 2.4, range 2-10), so not surprisingly some scientists wondered whether the findings would generalize to birds in particular and animals in general. In an accompanying commentary, Cicero and Moritz wrote “…a true test of the precision of mtDNA barcodes to assign individuals to species…would require that all members of a genus be examined, rather than a random sample of imprecisely-defined close relatives, and that taxa be included from more than one geographic region.” They concluded their essay:

“But to determine when and where this approach [i.e., DNA barcoding] is applicable, we now need to discover the boundary conditions. The real challenge lies with tropical taxa and those with limited dispersal and thus substantial phylogeographic structure. Such analyses need to be taxonomically broad and need to extend beyond the focal geographic region to ensure that potential sister taxa are evaluated and can be discriminated. There is also the need to examine groups with frequent (possibly cryptic) hybridization, recent radiations, and high rates of gene transfer from mtDNA to the nucleus.”

As of today, the BOLD taxonomy browser at Phylum Chordata, Class Aves (www.boldsystems.org/views/taxbrowser.php?taxid=51) indicates over 24,000 barcoded avian specimens representing over 3,800 avian species, nearing 40% of world avifauna. By my count there are over 30 publications on DNA barcoding in birds, including large surveys in North America, Scandinavia, Argentina, Brazil, and Korea.

In the next few posts, I try to look at what we have learned so far, with an emphasis on visual representation. The short answer to the technical question of barcoding effectiveness in birds is that the early observations are borne out, with a few interesting exceptions. My rough summary is that about 95% of bird species can be distinguished by DNA barcode, the remainder are sorted into pairs or small sets of closely-related species, and about 10% of named species show large divergences that likely represent unrecognized species. This last observation brings up an important point: taxonomy is undergoing constant revision, even in a group as well-studied as birds. For example, over the past 30 years, about 10% of the roughly 2000 bird species on the American Ornithologists’ Union Check-list have had species limits revised, and this process is not near closure. So when we compare sequence data to taxonomic classification, we have to keep in mind that the latter is a moving target.

Looked at more broadly, the central finding is that mtDNA sequence differences in birds partition into distinct clusters. Most mtDNA sequence clusters correspond to a single named species, and the ongoing process of taxonomic revision is tightening the one-to-one correspondence between clusters and species designations. In fact, a person with no knowledge of avian biology could closely approximate species numbers and limits simply by sorting COI barcodes into sequence clusters. Of course, the concordance of species limits and mtDNA sequence differences is not a new observation (see for example Avise et al 1987, 1999; Moore 1995), but it is now backed up by much more data. An important but unsolved question is why mtDNA partitions into narrow clusters in birds and other animals. One or more of the proposed mechanisms may turn out to be correct but none has been proven so far.
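To make "sorting COI barcodes into sequence clusters" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a method taken from any of the studies mentioned: it uses simple single-linkage grouping on the raw proportion of differing sites, and both the 2% threshold and the toy sequences are hypothetical values chosen only for the example.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def cluster_barcodes(seqs, threshold=0.02):
    """Single-linkage clustering: link any two sequences within `threshold`.

    The 2% default is an arbitrary, hypothetical cutoff for illustration.
    Returns a sorted list of clusters, each a list of indices into `seqs`.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy 100-site sequences (invented, not real barcodes).
x = "ACGT" * 25
x2 = "G" + x[1:]                 # 1 difference from x
y = "CCGT" * 10 + "ACGT" * 15    # 10 differences from x
y2 = y[:-1] + "C"                # 1 difference from y

clusters = cluster_barcodes([x, x2, y, y2])
print(clusters)
```

With these toy inputs the four sequences fall into two groups, without any species labels being supplied. Real data would call for a proper distance model and a more careful choice of clustering method and threshold.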

To begin the visualization survey:

Pairwise sequence differences within most bird species are small, usually much less than 1%. (“Pairwise sequence differences” means comparing each individual to every other individual of the same species; for n individuals there are n(n-1)/2 comparisons.) In looking at this data, I think that the absolute scale is important. One of the great benefits of working with a standard barcode region is that we can compare results across diverse taxa. To get into particulars, genetic differences are roughly similar in mtDNA protein coding genes (e.g. COI, cytb), but divergences in the mitochondrial control region are an order of magnitude greater.
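As a small illustration of the pairwise comparison described above, the following Python sketch enumerates every pair among n sequences and computes the simple proportion of differing sites. The function names and toy sequences are invented for the example, and the uncorrected proportion is used only for simplicity.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ignore sites with gaps or ambiguity codes
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def pairwise_distances(seqs):
    """Distances for every unordered pair: n(n-1)/2 values for n sequences."""
    return [p_distance(a, b) for a, b in combinations(seqs, 2)]


# Four toy sequences standing in for four individuals of one species.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
dists = pairwise_distances(toy)

n = len(toy)
print(len(dists), n * (n - 1) // 2)  # both are 6
print(max(dists))
```

For four individuals there are 4 x 3 / 2 = 6 comparisons. Because the number of pairs grows quadratically, heavily sampled species contribute many more comparisons than sparsely sampled ones.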

Here is a look at MAXIMUM intraspecific distances (K2P metric) among some of the larger geographically-based surveys published so far. (Most of this information can also be found in the published papers cited below.) The aim is to see what we can learn from outliers to the general observation of narrow differences within species. For these illustrations, I went to the Public Projects section of BOLD (www.barcodinglife.org), selected a project (Birds of North America Phase II, Kerr 2007; Birds of Argentina Phase I, Kerr et al 2009a; Birds of Scandinavia, Johnsen et al 2010; Birds of the eastern Palearctic, Kerr et al 2009b) and ran a “Nearest Neighbor” analysis with BOLD software, which calculates average and maximum intraspecific distance, as well as the identity of and distance to the “nearest neighbor”. The results were copied and pasted into an Excel spreadsheet, sorted by increasing maximum intraspecific distance, and displayed in a graph as shown below (note the different y-axis scale for eastern Palearctic). The total number of species is noted on the x-axis; yellow marks those with >1% maximum intraspecific distance. The curves are roughly similar among the regions except that the proportion >1% differs.
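The workflow above used BOLD's own software and a spreadsheet. For readers who want to see the arithmetic, here is a rough Python sketch of the same idea: compute the Kimura 2-parameter (K2P) distance for each within-species pair, take the maximum per species, sort in increasing order, and flag values above 1%. This is not BOLD's implementation; the function names, species labels, and toy sequences are all hypothetical.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    if transitions == 0 and transversions == 0:
        return 0.0
    p = transitions / sites
    q = transversions / sites
    # K2P: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)); undefined when the
    # sequences are so divergent that the log argument is not positive.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def max_intraspecific(seqs_by_species):
    """Maximum pairwise K2P distance per species (needs >= 2 sequences)."""
    return {
        species: max(k2p_distance(a, b) for a, b in combinations(seqs, 2))
        for species, seqs in seqs_by_species.items()
        if len(seqs) >= 2
    }


# Toy 200-site sequences (invented for illustration, not real barcodes).
base = "ACGT" * 50
one_change = "G" + base[1:]                # one transition
four_changes = "GCGT" * 4 + "ACGT" * 46    # four transitions

toy = {
    "hypothetical sp. 3": [base, one_change, four_changes],
    "hypothetical sp. 1": [base, base],
    "hypothetical sp. 2": [base, one_change],
}

max_by_species = max_intraspecific(toy)
ranked = sorted(max_by_species.items(), key=lambda item: item[1])
flagged = [name for name, dist in ranked if dist > 0.01]  # the >1% outliers

for name, dist in ranked:
    print(f"{name}: {dist:.4%}")
```

Sorting by the maximum rather than the average keeps a single deeply divergent individual visible, which is the point of looking for outliers. The sorted values can then be plotted as a curve with any charting tool.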

In the next post, I look more carefully at the apparent outliers. What makes them different: biology or taxonomy?

Note added 23 March 2011: Kevin Kerr points out that “eastern Palearctic” refers to the entire region east of Europe, and thus the eastern Palearctic survey referred to above includes sites spanning most of Russia, Kazakhstan, and Mongolia (not just the eastern half of Russia as highlighted in the map).

Note added 24 March 2011: Map corrected to show collecting region for eastern Palearctic survey.

This entry was posted on Monday, March 21st, 2011 at 9:59 pm and is filed under General.

Contact: mark.stoeckle@rockefeller.edu

About this site

This web site is an outgrowth of the Taxonomy, DNA, and Barcode of Life meeting held at Banbury Center, Cold Spring Harbor Laboratory, September 9-12, 2003. It is designed and managed by Mark Stoeckle, Perrin Meyer, and Jason Yung at the Program for the Human Environment (PHE) at The Rockefeller University.

About the Program for the Human Environment

The involvement of the Program for the Human Environment in DNA barcoding dates to Jesse Ausubel's attendance in February 2002 at a conference in Nova Scotia organized by the Canadian Center for Marine Biodiversity. At the conference, Paul Hebert presented for the first time his concept of large-scale DNA barcoding for species identification. Impressed by the potential for this technology to address difficult challenges in the Census of Marine Life, Jesse agreed with Paul on encouraging a conference to explore the contribution taxonomy and DNA could make to the Census as well as other large-scale terrestrial efforts. In his capacity as a Program Director of the Sloan Foundation, Jesse turned to the Banbury Conference Center of Cold Spring Harbor Laboratory, whose leader Jan Witkowski prepared a strong proposal to explore both the scientific reliability of barcoding and the processes that might bring it to broad application. Concurrently, PHE researcher Mark Stoeckle began to work with the Hebert lab on analytic studies of barcoding in birds. Our involvement in barcoding now takes three forms: assisting the organizational development of the Consortium for the Barcode of Life and the Barcode of Life Initiative; contributing to the scientific development of the field, especially through studies in birds; and contributing to public understanding of the science and technology of barcoding and its applications through improved visualization techniques and the preparation of brochures and other broadly accessible materials, including this website. While the Sloan Foundation continues to support CBOL through a grant to the Smithsonian Institution, it does not provide financial support for barcoding research itself or support to the PHE for its research in this field.