The Barcode Blog

A mostly scientific blog about short DNA sequences for species identification and discovery. I encourage your commentary. -- Mark Stoeckle

Avian genetic data legacy relatively modest, jumbled

Birds have been the subject of intense genetic study for over 20 years. How large is the legacy of avian genetic data? Researchers studying species-level differences in birds and other vertebrates have most often used cytochrome b (cyt b). Some have worried that analyzing COI in birds is redundant, as there are already cyt b sequences for many avian species. Of course, even if there were a complete set of cyt b sequences for the approximately 10,000 species of world birds, it would still be beneficial to collect COI barcodes to enable wide comparisons across the diversity of life. Here I look briefly at what is in GenBank for birds. To do so, I prepared a spreadsheet “avian name look-up.xls”, which recognizes 2,347 synonyms, alternate spellings and misspellings, and extinct species, producing an output list of extant bird names harmonized to Clements.
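For readers who want to try this kind of name harmonization themselves, here is a minimal Python sketch of the idea. It is not the spreadsheet's actual logic: the `SYNONYMS` and `EXTINCT` entries and the `harmonize` function are illustrative assumptions, and a real table would need every mapping checked against the Clements list.

```python
# Illustrative entries only (assumptions); the spreadsheet's real table of
# 2,347 variants is not reproduced here.
SYNONYMS = {
    "taeniopyga guttata": "Taeniopygia guttata",  # a misspelling
    "poephila guttata": "Taeniopygia guttata",    # an older generic assignment
}
EXTINCT = {"Ectopistes migratorius"}  # Passenger Pigeon


def harmonize(names):
    """Map name variants to one spelling, drop extinct species, return a sorted unique list."""
    result = set()
    for raw in names:
        # Normalize whitespace and case so lookups do not depend on formatting.
        key = " ".join(raw.split()).lower()
        # Unknown names pass through with standard binomial capitalization.
        canonical = SYNONYMS.get(key, key.capitalize())
        if canonical not in EXTINCT:
            result.add(canonical)
    return sorted(result)
```

Passing in a raw list of organism names from a sequence database would return one entry per extant species, however many spellings appeared in the input.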

There are more than 1 million avian sequences in GenBank, but over 900,000 are from the Jungle Fowl (i.e., chicken, Gallus gallus), and another 85,000 are from Zebra Finch (Taeniopygia guttata) and Wild Turkey (Meleagris gallopavo). That leaves about 67,000 sequences in total representing the rest of the approximately 10,000 species of world birds. According to Clements’ Birds of the World (including updates through 2006), there are 9,919 recognized species. The other world lists are very similar, and differ primarily in whether certain forms are recognized as species or subspecies and in assignment of generic names. I find it surprising there is not a single global taxonomic authority for bird species status, names, spelling, and generic and family classification. As a comparison, medicine would be in great difficulty if there were not a single standard nomenclature for pathogenic bacteria.

62,571 of the remaining 66,969 sequences are in the “CoreNucleotide” database (the others are unnamed genetic loci, either Expressed Sequence Tag (EST) or Genome Survey Sequence (GSS) records, and these will not be considered further here). Only 4,951 bird species are represented by any sequence (50% of world birds), and there are cytochrome b sequences for only 2,751 species (28% of world birds). Of the species with cyt b sequences, 60% are represented by a single sequence.

How does this compare to COI barcode data so far? As tracked on the All Birds Barcoding Initiative website, researchers have collected 8,353 COI barcode records from 1,730 species, including 2 or more sequences from approximately 80% of species analyzed to date.
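The percentages above are simple ratios against the Clements total, and they can be rechecked in a few lines. The sketch below uses only the counts quoted in this post; the variable and function names are my own, and the COI coverage figure and the records-per-species average are derived here rather than taken from any database.

```python
WORLD_SPECIES = 9_919  # Clements, including updates through 2006

# Species counts quoted in this post.
species_counts = {
    "any GenBank sequence": 4_951,
    "GenBank cyt b": 2_751,
    "COI barcode": 1_730,
}


def coverage_percent(n_species, total=WORLD_SPECIES):
    """Share of world bird species represented, rounded to a whole percent."""
    return round(100 * n_species / total)


coverage = {label: coverage_percent(n) for label, n in species_counts.items()}

# Average number of COI barcode records per barcoded species.
records_per_species = 8_353 / 1_730

for label, pct in coverage.items():
    print(f"{label}: {pct}% of world birds")
print(f"COI records per species: {records_per_species:.1f}")
```

By this arithmetic the barcode set covers roughly 17% of world birds so far, with close to five records per species on average.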

Virtues of the DNA barcode data set include that sequences are linked to vouchered museum specimens and their associated collecting data, that sequence records include trace files to confirm sequencing accuracy, and, most important, that all sequences can be directly compared because they derive from a standardized region. GenBank cyt b files include sequences of varying length and position along the gene. An alignment of 1000 avian COI barcodes and 1000 avian cyt b sequences hints at the power of a standardized approach.
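To make the point about standardization concrete, here is a small Python sketch that counts how many pairs of sequences share enough aligned positions to be compared. The fragment coordinates are hypothetical, chosen only to illustrate the effect (a 648-position barcode region and fragments scattered along a gene of about 1,143 positions); they are not drawn from GenBank records, and the 300-position threshold is likewise an assumption.

```python
from itertools import combinations


def shared_positions(a, b):
    """Number of positions covered by both half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def comparable_pairs(fragments, min_overlap):
    """Return (pairs sharing at least min_overlap positions, total pairs)."""
    pairs = list(combinations(fragments, 2))
    ok = sum(1 for a, b in pairs if shared_positions(a, b) >= min_overlap)
    return ok, len(pairs)


# Hypothetical coordinates, for illustration only.
standardized = [(0, 648)] * 4  # every record spans the same region
scattered = [(0, 300), (250, 700), (650, 1143), (0, 1143)]  # varying length and position

print(comparable_pairs(standardized, 300))
print(comparable_pairs(scattered, 300))
```

With a standardized region every pair is comparable, while with scattered fragments only those pairs that happen to overlap sufficiently can be compared, and the shortfall grows quickly as more records are added.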

This entry was posted on Sunday, May 20th, 2007 at 8:59 pm and is filed under General.

Contact: mark.stoeckle@rockefeller.edu

About this site

This web site is an outgrowth of the Taxonomy, DNA, and Barcode of Life meeting held at Banbury Center, Cold Spring Harbor Laboratory, September 9-12, 2003. It is designed and managed by Mark Stoeckle, Perrin Meyer, and Jason Yung at the Program for the Human Environment (PHE) at The Rockefeller University.

About the Program for the Human Environment

The involvement of the Program for the Human Environment in DNA barcoding dates to Jesse Ausubel's attendance in February 2002 at a conference in Nova Scotia organized by the Canadian Center for Marine Biodiversity. At the conference, Paul Hebert presented for the first time his concept of large-scale DNA barcoding for species identification. Impressed by the potential for this technology to address difficult challenges in the Census of Marine Life, Jesse agreed with Paul on encouraging a conference to explore the contribution taxonomy and DNA could make to the Census as well as other large-scale terrestrial efforts. In his capacity as a Program Director of the Sloan Foundation, Jesse turned to the Banbury Conference Center of Cold Spring Harbor Laboratory, whose leader Jan Witkowski prepared a strong proposal to explore both the scientific reliability of barcoding and the processes that might bring it to broad application. Concurrently, PHE researcher Mark Stoeckle began to work with the Hebert lab on analytic studies of barcoding in birds. Our involvement in barcoding now takes 3 forms: assisting the organizational development of the Consortium for the Barcode of Life and the Barcode of Life Initiative; contributing to the scientific development of the field, especially by studies in birds; and contributing to public understanding of the science and technology of barcoding and its applications through improved visualization techniques and preparation of brochures and other broadly accessible means, including this website. While the Sloan Foundation continues to support CBOL through a grant to the Smithsonian Institution, it does not provide financial support for barcoding research itself or support to the PHE for its research in this field.