The Barcode Blog

A mostly scientific blog about short DNA sequences for species identification and discovery. I encourage your commentary. -- Mark Stoeckle

Learning about lichens with DNA

June 1st, 2011

In 1867 Swiss botanist Simon Schwendener was the first to recognize that lichens are symbiotic associations of fungi and algae (or, as subsequently discovered, cyanobacteria) (for more info, try this EOL podcast on lichen–“a tropical rainforest in miniature”). Today about 13,500 species are described (lichens are named for the fungal component), representing 18% of the 74,000 known fungi. It is remarkable that so few fungi have been named, given that estimated diversity is 1.5 million (Hawksworth, 2001). This presumably reflects the difficulty of morphologic diagnosis of often microscopic, unculturable organisms with diverse life forms, and highlights a need for molecular methods. Several recent epidemics causing serious animal and plant mortality have turned out to be caused by newly recognized fungi [including Batrachochytrium dendrobatidis (chytridiomycosis in amphibians), Geomyces destructans (White-nose Syndrome in bats), Cryphonectria parasitica (Chestnut blight), and Ophiostoma spp. (Dutch elm disease)], hinting at the hidden diversity and importance of fungi.

Back to lichens–in March 2011 New Phytologist, researchers from the Royal Botanic Garden (RBG) at Edinburgh and Kew report on DNA barcoding of lichenized fungi using the internal transcribed spacer (ITS) region. ITS has been widely used in fungal taxonomy and has been proposed as a standard barcode region for this group (the standard barcode for animals, COI, has so far been difficult to amplify reliably from the diversity of fungi, owing either to variability at primer binding sites or to introns). ITS refers to 2 regions in the nuclear ribosomal RNA gene complex (5′ external transcribed sequence – 18S rRNA – ITS1 – 5.8S rRNA – ITS2 – 28S rRNA – 3′ external transcribed sequence), which is present in several thousand copies in each cell. Advantages of ITS as a barcode region include availability of broad-range primers that bind to conserved regions in 18S and 28S rRNA; presence of multiple copies per cell, facilitating recovery from small or degraded samples; and the legacy of ITS fungal sequences in GenBank. Disadvantages of ITS as a barcode locus are that it is a non-coding region, making it more difficult to align and compare sequences; that the multiple copies per cell may differ from one another; and that the legacy data include misidentified sequences.

Kelly and colleagues sampled 112 freshly collected and herbarium specimens from one genus (Usnea) including 16 of the 19 species occurring in the British Isles and 248 specimens from native woodland habitats in Britain, comprising “94 species from 55, 28, and 8 genera, families and orders, respectively.” In the latter floristic set, 66.0% of species were represented by 3 or more samples and 77.7% by 2 or more samples. DNA was extracted using the DNeasy Plant Mini Kit and amplifications were performed with sets of standard primers that amplify the entire ITS segment (ITS1-5.8S rRNA-ITS2); nested PCR was performed on “a small number of samples that failed to yield a single discrete product with standard PCR.” If these failed to generate a suitable product for sequencing, then a “thin slice of a single apothecium” was placed directly into the PCR mix and amplified as above or using primers for ITS2 only. The full ITS region was obtained from 80.9% of combined 351 samples (75.9% of Usnea and 83.9% floristic). 22 (6.3%) of products showed heterogeneity on direct sequencing and required cloning to obtain suitable products for sequencing. The commonest reasons for failure were no amplification [7.1% overall, largely with older (>3 y) specimens] and amplification of non-target fungi (2.0% overall, only with field samples from the floristic dataset).

Is there a “barcode gap” (intraspecific<<interspecific distance) among fungal ITS sequences? In this study at least, usually yes. The RBG researchers defined clusters as nodes with bootstrap support (BP) ≥ 70% under the BIONJ method or posterior probability (PP) ≥ 0.95 under Bayesian inference. Under these criteria, species discrimination was 73.3% for the Usnea dataset and 92.1% for the floristic dataset. Simple BLAST analysis was also usually accurate–80% of Usnea species and 92.1% of floristic species were correctly assigned. This bodes well for cataloging the “dark matter” of fungal biodiversity using ITS DNA barcodes. So little is now known, it is exciting to contemplate what will be learned!
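To make the barcode-gap idea concrete, here is a minimal Python sketch. It is not the method used by Kelly and colleagues (they relied on BIONJ and Bayesian trees plus BLAST); it simply computes uncorrected p-distances on toy aligned sequences and compares the largest within-species distance with the smallest between-species distance. The sequences, species labels, and function names are all invented for illustration.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def barcode_gap(records):
    """records: list of (species, sequence) pairs.

    Returns (largest intraspecific distance, smallest interspecific distance).
    Assumes at least one same-species pair and one different-species pair.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra), min(inter)

# Hypothetical toy alignment, 20 sites per sequence
records = [
    ("species_A", "ACGTACGTACGTACGTACGT"),
    ("species_A", "ACGTACGTACGTACGTACGA"),
    ("species_B", "ACGAACCTACGTTCGTACTT"),
    ("species_B", "ACGAACCTACGTTCGTACTA"),
]

max_intra, min_inter = barcode_gap(records)
has_gap = min_inter > max_intra
print(f"max intraspecific: {max_intra:.2f}, min interspecific: {min_inter:.2f}, gap: {has_gap}")
```

With real ITS data the alignment step is the hard part (ITS is non-coding and length-variable), so a sketch like this only applies once sequences have been aligned by some other tool.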

What you can learn from a tiny bit of DNA

May 18th, 2011

Infectious diseases may determine survival of individuals, entire species, and perhaps even large branches on the Tree of Life. Beginning in the late 1970s, rapid declines in amphibian populations around the globe were noted and today about 40% of the world’s 6,671 amphibian species are threatened with extinction (e.g. Stuart et al 2004). The major cause appears to be global dissemination of a pathogenic chytrid fungus, Batrachochytrium dendrobatidis, first reported in 1998 and formally described in 1999.

Although the global pattern is clear, many local population declines remain enigmatic due to absence of histologic data. In addition, the pattern of spread of the fungus and its timing in relation to mortality are not known. In April 2011 Proc Natl Acad Sci USA (open access), researchers from San Francisco State University and University of California, Berkeley, describe a non-invasive, DNA-based method for detecting B. dendrobatidis (Bd) in formalin-preserved specimens. Although exceptions are reported, DNA recovery after formalin treatment usually fails,  so these are remarkable results.

Cheng and colleagues analyzed formalin-preserved salamander and frog specimens collected in Mexico, Guatemala, and Costa Rica in areas where population declines had occurred. Specimens were rinsed in 70% ethanol, then, using a skin swab or dental brush, “stroked 30 times over the ventral surface…from neck to vent” [salamanders] or “on the ventral surface, including the inner thighs, abdomen, and between toes” [frogs]; the swab/brush was then stored in a microfuge tube at 4 °C until processing. DNA was extracted with a standard kit (Prepman Ultra or Qiagen DNeasy), and a 146-bp segment of Bd ITS-1 region was amplified, using 1/80th of recovered DNA for each amplification, run in triplicate using real-time PCR along with positive and negative controls.

Initial trials were done with 29 Bd-infected (as determined by histology) and 9 Bd-uninfected formalin-preserved Batrachoseps salamander specimens. Bd was detected in 24 (84%) of infected specimens and none of the uninfected specimens. They suggest that their success with such unlikely specimens may reflect “(i) the very short length (146 bp) of the target sequence for Bd amplification, (ii) the presence of many copies per Bd cell of the ITS-1 region being targeted in our assay, and (iii) recovery of many cells of Bd in our swabbing technique because Bd grows on the skin surface of the host.”

The researchers then applied this assay  to frogs and salamanders collected in Mexico (n=537), Costa Rica (n=74), and Guatemala (n=615) between 1964 and 2010. They found Bd as early as 1972, with a large increase (>50% prevalence) beginning in 1980, coincident with the observed population declines (see figure above). Combining their results with those of Lips et al 2006 indicated a steady southward movement of Bd from southern Mexico in 1972 to Panama in 2004. They interpret this remarkably slow expansion to mean that the pathogen is spread by the animals themselves, perhaps as they move between the tiny pools of water that collect in the crowns of bromeliads. The near coincident appearance of Bd around the world suggests additional modes of spread, possibly including human activities. I look forward to additional studies that will shed light on the global dissemination of Bd and point to interventions to limit this ongoing disaster for amphibians.

U Adelaide, CBOL to host IBOL 4 (abstracts by 15 May!)

May 8th, 2011

From the conference website:

The Consortium for the Barcode of Life and the University of Adelaide invite you to join us in Adelaide, Australia from 28 November – 3 December 2011 for the Fourth International Barcode of Life Conference. Barcoding has seen extraordinary growth since the Mexico City Conference in November 2009 so join participants from around the world for the biggest barcoding event ever!

The organizers have developed this website to provide potential participants, co-sponsors, and other stakeholders with information about the conference. The conference organizers are also eager to have your feedback as we plan the conference so please share your ideas through Connect, the DNA Barcoding network. You can do this by using the links found throughout this website.

Important Dates

  • Preliminary agenda available: 1 April
  • Online abstract submission system opens: 1 April
  • Sponsorship opportunities open: 1 April
  • Travel bursary applications open: 15 April
  • Online registration and hotel reservation site opens: 1 May
  • Deadline for submission of Abstracts: 15 May
  • Deadline travel bursary applications: 19 May
  • Agenda with speakers available: 1 August

Make a lasting contribution

April 15th, 2011

In December 2010 Mol Ecol, researchers from University of Alaska Museum compare mitochondrial and nuclear DNA differences among 9 pairs of bird populations, subspecies, or species, with a total of 162 individuals from 12 species analyzed. What did they find? Their gloomy conclusion is “our results suggest that using a genetic divergence estimate from part of an organism’s genome does not accurately represent organismal divergence and that commonly used measures are not strongly correlated with the speciation process.” I translate this as “DNA barcoding is not reliable.” Since we already have large surveys demonstrating effectiveness of DNA barcoding in more than two thousand bird species, their findings are surprising. Let’s go to the data.

For mtDNA, Humphries and Winker analyzed 1037 bp of ND2 (why not COI!) and employed Amplified Fragment Length Polymorphism PCR (AFLP) to assess differences in nuclear DNA. AFLP is a widely-used, indirect method of assessing nuclear genome differences that to my knowledge has never been compared to whole genome sequencing. Counting differences among aligned mtDNA sequences is straightforward. For AFLP, interpretation is more complex–in this study the banding patterns were converted to FST (fixation index) values using “AFLP-SURV 1.0 with the Bayesian method with uniform priors and 10,000 random permutations to test for significant levels of differentiation.”

The researchers chose 3 pairs of populations, subspecies, and species in 3 orders of birds that live in Alaska or Russia. The study design had two aims. First, do levels of genetic distance follow taxonomic categories, i.e. are differences among species > subspecies > populations? Contrary to their conclusion, my answer is yes, as there were no mtDNA differences between populations, and differences among subspecies and species ranged from 1.99-5.48%. Two of the three subspecies pairs are already recognized as different species by some authors, and the third pair is divergent enough (3.02%) to likely represent different species. So I conclude there are really just two categories–populations, which had no mtDNA differences, and species or candidate species, which showed a typical range of divergences. It is puzzling that the discussion did not include the possibility that taxonomy is imperfect rather than DNA data being misleading. The second question addressed by this study is: do nuclear DNA distances co-vary with mtDNA differences? The answer turned out to be no, which I find interesting but of uncertain significance. It may be that AFLP analysis of nuclear DNA is not a reliable indicator of divergence time or species status, at least in comparisons across lineages. Here more data is needed. To my mind, AFLP is a little bit like acupuncture–it may work, but we don’t understand why, so it’s hard to be confident in its application. Patterns of variation in the human genome revealed by whole genome sequencing have turned out to be much more complicated than expected, and I expect there will be a flood of data using whole genome sequencing to look at species boundaries.

For a lasting, publicly useful contribution to science, I hope that the many researchers who are analyzing mtDNA differences among animal species will include barcode region COI if not doing so already. The mitochondrial genome evolves in close but not exact parallel, and there is no particular reason to pick one coding region over another. By analyzing barcode region COI and depositing their sequences and associated collecting data in BOLD and GenBank, researchers can amplify the value of their work. For birds, the BOLD COI database helps identify remnants from bird-airplane collisions, leading to improved airline safety. For studies such as this, by analyzing COI the researchers can easily combine their results with existing records, adding power and potentially new insights to their analysis.

To give an idea, I went to BOLD, merged the “Birds of North America” and “Birds of Eastern Palearctic” projects, selected all records for the 12 species in this study, and generated an NJ tree (blue highlighting added to species branches) and a Distribution Map of where the specimens were collected. The highly divergent subspecies pairs are immediately evident. It would be of interest to see where the specimens from this study fit, and this would help build a highly detailed online map of genotype distribution, something that does not yet exist for any animal species. An exciting prospect!

Visualizing birds: 4. Diagnostic differences

March 26th, 2011

In 2004, the American Ornithologists’ Union (AOU) (Banks et al. 45th supplement) recognized most of the smaller-bodied forms of Canada Goose (Branta canadensis) as a separate species, Cackling Goose (B. hutchinsii). It can be difficult to distinguish these birds in the field, including for banders, as there is overlap between some of the smaller forms of canadensis and the larger forms of hutchinsii (see for example David Sibley’s account). Given morphologic approximation in some cases, one might also expect a range of genetic differences between the species, with some Canada Geese being very similar to some Cackling Geese.

Using COI as a genetic flashlight, a surprising finding to me is that sequence differences between species are generally fixed. Individuals of a species do differ from each other in COI, but they usually differ in ways that do not change the distances between species. There are exceptions, particularly with species that hybridize regularly, and perhaps with very young species, but these are a small minority of birds analyzed so far. Stated another way, for most species there are no genetically intermediate forms. One important corollary is that early results with small numbers of individuals are likely to be indicative of results with more comprehensive sampling, which is what has been seen with All Birds Barcoding Initiative (ABBI) to date.

Where’s the data? Here are some illustrations of results so far. For figure at left, I downloaded one B. canadensis and one B. hutchinsii barcode from the public records section of BOLD, and printed the map showing where each specimen was collected (a very useful tool in BOLD). For this and subsequent sequence analyses, I used publicly-available MEGA software to highlight all sites at which the two sequences differed. (In MEGA you click “variable” to highlight and then “export highlighted sites to Excel.” I then used Excel’s “conditional formatting” to color the cells according to letter.) These two sequences differed at 13 out of 653 COI positions, 11 of which were 3rd codon position (codon position may turn out to be interesting later on).
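For readers who prefer a script to the MEGA-plus-Excel route, the same comparison can be roughed out in Python. This is only a sketch: the two short sequences are made up (they are not the Branta barcodes), the function names are hypothetical, and the codon tally assumes the alignment begins at the first position of a codon, which would need checking for real COI data.

```python
def variable_sites(seq_a, seq_b):
    """Return 0-based indices at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]

def codon_position_counts(sites, frame_offset=0):
    """Tally differing sites by codon position (1, 2, or 3).

    frame_offset is the 0-based index of the first site that sits at
    codon position 1; it is assumed to be 0 here.
    """
    counts = {1: 0, 2: 0, 3: 0}
    for i in sites:
        counts[(i - frame_offset) % 3 + 1] += 1
    return counts

# Invented 12-site example: four codons, two third-position changes
seq_a = "ATGGCCTTAGGA"
seq_b = "ATGGCTTTGGGA"

sites = variable_sites(seq_a, seq_b)
counts = codon_position_counts(sites)
print(f"{len(sites)} of {len(seq_a)} sites differ; by codon position: {counts}")
```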

Now what happens if you analyze a larger number of individuals? For next illustration, I used BOLD Taxonomy Browser, navigated to Chordata-Aves-Anseriformes-Branta, downloaded all public sequences, and used MEGA as above to highlight and export all sites that differed among the set (there are several other Branta species; these were deselected for this analysis).

With over 100 individuals for each taxon collected at widely dispersed sites (including some canadensis in Norway and Sweden), variation within both species was observed. Most of this was scattered differences found in one or a few individuals, although there did appear to be a number of canadensis individuals with a shared variant, which might be of interest for further study.

However, the intraspecific variation rarely involved diagnostic sites, with the result that all pairwise comparisons between canadensis and hutchinsii differed at 12 or 13 sites.
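One way to pin down what “diagnostic sites” means is to write it as code. The sketch below uses a deliberately strict definition, which is an assumption on my part: a site counts as diagnostic only if every individual in each species carries the same base there and the two species carry different bases. The short sequences and group names are invented and stand in for real barcode alignments.

```python
def diagnostic_sites(group_a, group_b):
    """Return 0-based sites that are fixed within each group and differ between groups.

    group_a, group_b: lists of aligned, equal-length sequences.
    """
    length = len(group_a[0])
    if any(len(s) != length for s in group_a + group_b):
        raise ValueError("all sequences must be aligned to the same length")
    sites = []
    for i in range(length):
        bases_a = {s[i] for s in group_a}
        bases_b = {s[i] for s in group_b}
        # Fixed within each group (one base each) and not the same base
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            sites.append(i)
    return sites

# Invented toy alignments; site 9 varies within the first group and
# site 0 varies within the second, so neither is diagnostic.
canadensis_like = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"]
hutchinsii_like = ["ACCTACGAAC", "ACCTACGAAC", "GCCTACGAAC"]

diag = diagnostic_sites(canadensis_like, hutchinsii_like)
print("diagnostic sites (0-based):", diag)
```

A looser definition (base sets that merely do not overlap between species) would also be reasonable and would change only the condition inside the loop.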

I close with another slightly more complex example. There are 5 Catharus thrushes in North America. These are relatively small, drab woodland birds with haunting, ethereal songs (you can listen to Hermit Thrush (C. guttatus) song on Cornell Laboratory of Ornithology site). One bird, Bicknell’s Thrush (C. bicknelli) was first recognized by AOU as a species distinct from Gray-cheeked Thrush (C. minimus) in 1998, and distinguishing individuals except by song is difficult even for experts with hand-held birds.

I downloaded all public Catharus barcodes using BOLD Taxonomy browser, and analyzed as described above. In comparing single sequences from the 5 species, these differed at 6 to 52 sites. With larger sample sizes (12-34/species), some intraspecific variation was observed, particularly in Hermit and Swainson’s Thrushes (see NJ tree at left of larger alignment), but diagnostic differences were mostly unchanged, even for very closely related minimus-bicknelli-fuscescens group.

[On a separate note–the nature of intraspecific variation might be of interest–a disproportionate number are singletons (present in one individual in the set) and are codon first or second position substitutions (whereas most interspecific differences among closely-related birds are at codon third position). No doubt evolutionary biologists have investigated this previously, but perhaps not with such a large number and diversity of species with multiple individuals analyzed.]

These figures help illustrate the nucleotide sequence differences that distinguish species. In the language of evolutionary biology, these sequence differences are diagnostic characters. An NJ tree is a powerful shorthand way of representing these differences. In some situations, analyzing the actual diagnostic characters will be important. It might be a useful exercise for the scientific community to compile and display on the web diagnostic differences, at least for groups in which most or all the closely-related species have been surveyed.

Visualizing birds: Part 3. DNA barcode’s-eye view of taxonomic practice

March 23rd, 2011

Who decides what is a species and how do they do so? The primary source of information related to species is the peer-reviewed scientific literature, with standards of evidence presumably applied by appropriate experts before an article is accepted for publication. For most groups of animals, once a new species description or revision is published, then it is considered a valid species.

For birds, there is often an additional layer of review in the form of expert committees and handbook authors. Committees and their geographic domains include the American Ornithologists’ Union (AOU) (North and South America), British Ornithologists’ Union (Britain), International Ornithological Congress (IOC) (world), and Integrated Taxonomic Information System (ITIS) (world), plus many nations maintain their own lists; handbooks include Howard and Moore Complete Checklist of the Birds of the World (most recent edition published in 2003), The Clements’ Checklist of Birds of the World (most recent edition 2007, with updates available online), and Handbook of Birds of the World (first volume covering ostriches to ducks published in 1992, 16th and final volume covering tanagers to blackbirds to be published this year). Phew, it’s tiring just listing the lists! Although by my assessment the various lists are about 90% concordant at species level, they do differ, only partly because some have been updated more recently, so we can observe that experts sometimes disagree on species limits in birds, and conclude that taxonomy, like medical diagnosis, involves human judgment.

Here I focus on the AOU Check-list of North American Birds, picking out just those species that have been revised since the 6th edition (1983). The current Check-list is the 7th edition (1998) plus updates, which have been published annually since 2003. Over this time by my count there have been 274 changes in species definitions; this includes 6 species lumped into 3, and 121 species split into 268 taxa (note: splitting changes both halves–one is new, and the “parent” taxon has been pared down). The 50:1 predominance of splits suggests a bias against lumping species, perhaps analogous to the bias in medical research against negative studies.

For the figure at left, I compiled all revised species for which COI barcodes were available for both sides of a split or lump, which worked out to 68 species by 2010 definitions (all the available sets were splits), picked two representatives for each, and generated an NJ tree. This represents about 1/4 of all revisions so presumably is representative. Species differences according to 2010 or 1983 definitions are highlighted in blue. Red asterisks mark 3 splits not distinguishable by COI barcode.

Viewed through the lens of DNA barcodes, 91% of revisions involved assigning different names to distinct clusters previously grouped under one name. None of the revisions led to taxa with larger intraspecific distances.

Below is another way of looking at the same data–a before and after graph of maximum intraspecific distances (similar to the layout in yesterday’s figure, maximum 2010 distances appear below that for the 1983 “parent” taxon; the yellow line highlights one such set; there are two 3-way splits in the NJ tree, each of which is shown here as two separate splits).

One of my reactions to these figures is that taxonomic revision looks pretty simple! On a more helpful note, I think we can observe that what taxonomists consider species based on traditional biological criteria (differences in morphology, song, range, and relative absence of interbreeding) are generally visible with a “COI flashlight” as distinct clusters. As noted in the first post, why this is so is an important unsolved question.

From the above I surmise that essentially all species with unusually large intraspecific distances will eventually be recognized as comprised of distinct species. (Of course there are exceptions, which are interesting.) This echoes an assessment by Zink in 2004 (Proc Biol Sci 2004 271: 561–564). He noted the widespread discordance of mitochondrial DNA divergences and species-level classifications, concluding “a massive reorganization of classifications is required so that the lowest ranks, be they species or subspecies, reflect evolutionary diversity.” Looking at revisionary progress over the past 30 years, I think we are moving very slowly toward that goal, raising the possibility of a more dedicated effort to speed species-level avian taxonomy.

In closing this post, I look at whether there is anything unusual about divergent taxa that might lead them to be overlooked. After all, the world’s ornithologists have expended a lot of effort to uncover hidden diversity. I see most divergent species as falling into one of two categories: 1) inconspicuous birds, usually small, drab, secretive, or nocturnal species and 2) birds with large breeding ranges, particularly those that extend across different countries or islands.

The first group are difficult for visually-oriented, diurnal humans to distinguish and with the second group it is difficult to assemble sufficient specimens collected at widely dispersed sites. Where’s  the evidence?

For small, drab, secretive birds, in figure at right I look at wrens, which based on results so far, have an exceptional degree of intra-specific diversity (highlighted in blue). In 2010 the Winter Wren Troglodytes troglodytes was split into 3 species by AOU, but there remains a lot of diversity in 5 of the 12 species with 2 or more records, including the newly named Eurasian Winter Wren  T. troglodytes.

As evidence for hidden diversity in species with large ranges, I illustrate the findings from Johnsen et al 2010 “DNA barcoding of Scandinavian birds reveals divergences in trans-Atlantic species”. In this study 78 Scandinavian species had ranges that extended to North America; of these 19 (24%) showed large trans-Atlantic divergences in COI. In the figure, separate NJ trees for N American, Scandinavian, and combined data sets are shown with intraspecific differences highlighted in blue; red asterisks mark species with large trans-Atlantic divergences; green asterisks mark species with large divergences within N America. A small version of the figure is shown; for a larger version, click on the picture.

In the next post, I look at the effect of sample size on intra- and inter-specific distances.

Visualizing birds: Part 2. Distant clusters, unfinished taxonomy

March 22nd, 2011

In 1911, Rutherford proposed correctly that essentially all the mass of an atom is concentrated in a tiny “central charge” (what we now call the nucleus) and that the rest of an atom was essentially empty space, devoid of mass. This comes to mind in looking at results so far with birds, which overwhelmingly show that mtDNA differences are partitioned into tight clusters, and conversely most of the nearby genetic “space” is empty. In the language of evolutionary science, living organisms are narrow discontinuities without intermediate forms.

In yesterday’s post I noted that a minority of avian species exhibit large intra-specific distances. One possibility is that these represent species with a wide and more or less continuous variation, like the distribution of height in humans, for example. A quick perusal of an NJ (neighbor-joining) tree shows this is not the case. Rather, as noted in all published surveys so far, species with large intraspecific distances are composed of distinct clusters. As an alternative to an NJ tree, here is another way of looking at this data. For the illustration at left I took all species in N American project (Kerr et al 2007) with maximum distances of 2% or more, sorted sequences into sets as indicated by the NJ tree, calculated the maximum distances within each component cluster, and graphed these so that maximum distances within component clusters appear below the respective point for the species. In this analysis, all species with large intraspecific distances were composed of 2 clusters with much lower variation. In all cases, large intraspecific values reflected comparisons across the branches of the tree. One way of looking at this is that mtDNA sequence clustering is same in species with high and low maximum distances. What differs is that species with large intraspecific distances include multiple clusters.
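The “large overall distance, small within-cluster distance” pattern can be sketched in Python. Note the substitution: for the figure I sorted sequences into sets by reading the NJ tree, whereas this sketch groups sequences by simple single-linkage at a distance threshold, which is a cruder stand-in. The 2% threshold, the 100-site toy sequences, and every function name here are assumptions made for illustration.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def max_distance(seqs):
    """Largest pairwise p-distance in a set (0.0 if fewer than two sequences)."""
    return max((p_distance(a, b) for a, b in combinations(seqs, 2)), default=0.0)

def single_linkage_clusters(seqs, threshold):
    """Group sequences so that any pair at or below threshold shares a cluster."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)
    clusters = {}
    for i, s in enumerate(seqs):
        clusters.setdefault(find(i), []).append(s)
    return list(clusters.values())

# Helper to build toy variants by swapping bases at chosen positions
SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}

def mutate(seq, positions):
    chars = list(seq)
    for p in positions:
        chars[p] = SWAP[chars[p]]
    return "".join(chars)

base = "ACGT" * 25                      # 100 sites
other = mutate(base, [10, 20, 30, 40])  # 4% from base
species = [base, mutate(base, [0]), other, mutate(other, [50])]

overall = max_distance(species)
clusters = single_linkage_clusters(species, threshold=0.02)
within = [max_distance(c) for c in clusters]
print(f"overall max: {overall:.2f}; clusters: {len(clusters)}; within-cluster max: {within}")
```

In this toy case the “species” has a 6% maximum distance overall, yet each of its two clusters has a maximum of 1%, which is the shape of the result in the figure.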

At right is another way of looking at this. Here I used all species in Argentinian dataset (Kerr et al 2009) with maximum intraspecific distances of 1% or greater. For each species, the graph shows ALL pairwise distances ranked in increasing order, and a yellow line connects lower and upper pairwise values for each species. If species exhibit a range of differences, then there should be a more or less continuous range of pairwise values. On the other hand, if species are composed of clusters, then there will be one set of small pairwise distances from comparisons within clusters, and a set of larger distances from comparisons between clusters. With one exception (the second species from the left) large intraspecific distances reflected the presence of distinct clusters included under a single umbrella species designation.
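The reasoning behind the ranked-distance graph can be expressed as a small Python check: sort all pairwise distances for a species and look for the biggest step between neighbouring values. A clustered species shows one large jump; a species with continuous variation does not. The distance lists below are invented (in percent), and the function name is hypothetical.

```python
def largest_jump(distances):
    """Sort pairwise distances and find the biggest step between neighbours.

    Returns (gap, lower, upper): the size of the largest step and the two
    consecutive ranked values on either side of it.
    """
    ranked = sorted(distances)
    if len(ranked) < 2:
        raise ValueError("need at least two pairwise distances")
    return max((b - a, a, b) for a, b in zip(ranked, ranked[1:]))

# Invented pairwise distances (percent) for two hypothetical species
clustered = [0.0, 0.2, 0.2, 0.3, 2.8, 3.0, 3.1, 3.1]   # two tight clusters
continuous = [0.0, 0.4, 0.8, 1.2, 1.6, 2.0, 2.4, 2.8]  # evenly spread

gap_c, lower_c, upper_c = largest_jump(clustered)
gap_k, _, _ = largest_jump(continuous)
print(f"clustered: jump of {gap_c:.1f} between {lower_c} and {upper_c}")
print(f"continuous: largest jump only {gap_k:.1f}")
```

How large a jump should count as evidence of clusters is a judgment call that this sketch does not try to settle.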

So where are we? Can we conclude that there is a minority of species that are genetically polytypic?  One way to answer this is to look at recent taxonomic revisions in birds, taking advantage of the extremely well-documented historical record in the form of updates to the American Ornithologists’ Union (AOU) Check-list. In the next post I will look at refinements to avian species taxonomy through the lens of COI barcodes.

High school students to explore wilds of New York City with DNA, win prizes

March 22nd, 2011

On March 8, 2011, scientists and science educators at Dolan DNA Learning Center, Cold Spring Harbor Laboratory, announced the “Urban Barcode Project.” From the website:

The Urban Barcode Project (UBP) is a science competition spanning the five boroughs of New York City made possible by funding from the Alfred P. Sloan Foundation. Just as a unique pattern of bars in a universal product code identifies each item for sale in a store, a DNA barcode is a DNA sequence that uniquely identifies each species of living thing. In the project, student research teams use DNA barcoding to explore biodiversity in NYC.

Projects can use DNA barcodes to examine any aspect of the NYC environment, such as:

  • Sampling biodiversity in a park, garden, office, or school.
  • Checking for invasive plant or animal species.
  • Monitoring animal movements or migrations.
  • Identifying exotic or endangered food products in markets.
  • Detecting food or product fraud.

On the website there is a neat 1 min video, a helpful informational brochure, FAQs, and details on $20,000 in prize money!

I am the Scientific Advisor on this project and I think this is a wonderful way for high school students to do science.  And one that is likely to inspire efforts elsewhere.

Visualizing birds so far

March 21st, 2011

In 2004 PLoS Biology, Hebert and colleagues (I am a co-author) observed that differences in COI barcodes among 260 species of North American birds were generally much larger than those within species, with the result that “distinguishing species was generally straightforward.” In addition, we noted 4 birds with large intraspecific divergences that likely represented overlooked species. Our study included only about 1/50 of world birds (out of approximately 10,000 named species) and modest sampling of differences within species (multiple individuals (average 2.4, range 2-10) for 130 species), so not surprisingly some scientists wondered about the generalizability of the findings in birds in particular and animals in general. In an accompanying commentary, Cicero and Moritz wrote “…a true test of the precision of mtDNA barcodes to assign individuals to species…would require that all members of a genus be examined, rather than a random sample of imprecisely-defined close relatives, and that taxa be included from more than one geographic region.” They concluded their essay:

“But to determine when and where this approach [i.e., DNA barcoding] is applicable, we now need to discover the boundary conditions. The real challenge lies with tropical taxa and those with limited dispersal and thus substantial phylogeographic structure. Such analyses need to be taxonomically broad and need to extend beyond the focal geographic region to ensure that potential sister taxa are evaluated and can be discriminated. There is also the need to examine groups with frequent (possibly cryptic) hybridization, recent radiations, and high rates of gene transfer from mtDNA to the nucleus.”

As of today, the BOLD taxonomy browser at Phylum Chordata, Class Aves indicates over 24,000 barcoded avian specimens representing over 3,800 avian species, nearing 40% of world avifauna. By my count there are over 30 publications on DNA barcoding in birds, including large surveys in North America, Scandinavia, Argentina, Brazil, and Korea.

In the next few posts, I try to look at what we have learned so far, with an emphasis on visual representation. The short answer to the technical question of barcoding effectiveness in birds is that the early observations are borne out, with a few interesting exceptions. My rough summary is that about 95% of bird species can be distinguished by DNA barcode; the remainder are sorted into pairs or small sets of closely-related species; and about 10% of named species show large divergences that likely represent unrecognized species. This last observation brings up an important point: taxonomy is undergoing constant revision, even in a group as well-studied as birds. For example, over the past 30 years, about 10% of the roughly 2000 bird species on the American Ornithologists’ Union Check-list have had species limits revised, and this process is not near closure. So when we compare sequence data to taxonomic classification, we have to keep in mind that the latter is a moving target.

Looked at more broadly, the central finding is that mtDNA sequence differences in birds partition into distinct clusters. Most mtDNA sequence clusters correspond to a single named species, and the ongoing process of taxonomic revision is tightening the one-to-one correspondence between clusters and species designations. In fact, a person with no knowledge of avian biology could closely approximate species numbers and limits simply by sorting COI barcodes into sequence clusters. Of course, the concordance of species limits and mtDNA sequence differences is not a new observation (see for example Avise et al 1987, 1999; Moore 1995), but it is now backed up by much more data. An important but unsolved question is why mtDNA partitions into narrow clusters in birds and other animals. One or more of the proposed mechanisms may turn out to be correct but none has been proven so far.
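To illustrate what "sorting barcodes into sequence clusters" could look like in practice, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method used in any of the studies mentioned: the sequences are made-up toy strings, distances are plain proportions of differing sites, clusters are formed by single linkage, and the `threshold` value is hypothetical (set high here only because the toy sequences are very short).

```python
def p_distance(a, b):
    """Proportion of aligned sites that differ between two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)

def cluster(seqs, threshold):
    """Single-linkage clustering: two sequences end up in the same
    cluster if a chain of pairs, each within `threshold`, connects them.
    Returns a sorted list of clusters, each a list of sequence indices."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy data (hypothetical): two tight groups separated by a larger gap.
toy = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGC",
    "TTGTACGTAAATACGTACGT",
    "TTGTACGTAAATACGTACGA",
]
clusters = cluster(toy, threshold=0.05)
```

With real barcodes the interesting part is that the result is not very sensitive to the exact threshold, because the gaps between clusters are typically much wider than the spread within them.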

To begin the visualization survey:

Pairwise sequence differences within most bird species are small, usually much less than 1%. (“Pairwise sequence differences” means comparing each individual to every other individual of the same species; for n individuals there are n(n-1)/2 comparisons.) In looking at this data, I think that the absolute scale is important. One of the great benefits of working with a standard barcode region is that we can compare results across diverse taxa. To get into particulars, genetic differences are roughly similar in mtDNA protein coding genes (e.g. COI, cytb), but divergences in the mitochondrial control region are an order of magnitude greater.
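The pairwise count can be made concrete with a short Python sketch. This is my own illustration with made-up toy sequences; `p_distance` is the simple uncorrected proportion of differing sites, which for very similar sequences is close to, but not the same as, the K2P metric.

```python
from itertools import combinations

def n_pairs(n):
    """Number of pairwise comparisons among n individuals: n(n-1)/2."""
    return n * (n - 1) // 2

def p_distance(a, b):
    """Proportion of aligned sites that differ (uncorrected p-distance)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)

def pairwise_distances(seqs):
    """Distance for every unordered pair of sequences."""
    return [p_distance(a, b) for a, b in combinations(seqs, 2)]

# Toy "individuals" of one species (hypothetical 10-base sequences).
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
dists = pairwise_distances(toy)
```

Four individuals give six comparisons, and ten individuals would give 45, so the number of comparisons grows quickly with sampling depth.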

Here is a look at MAXIMUM intraspecific distances (K2P metric) among some of the larger geographically-based surveys published so far. (Most of this information can also be found in published papers cited below.) The aim is to see what we can learn from outliers to the general observation of narrow differences within species. For these illustrations, I went to the Public Projects section of BOLD, selected a project (Birds of North America Phase II, Kerr 2007; Birds of Argentina Phase I, Kerr et al 2009a; Birds of Scandinavia, Johnsen et al 2010; Birds of the eastern Palearctic, Kerr et al 2009b) and ran a “Nearest Neighbor” analysis with BOLD software, which calculates average and maximum intraspecific distance, as well as identity of and distance to the “nearest neighbor”. The results were copied and pasted into an Excel spreadsheet, sorted by increasing maximum intraspecific distance, and displayed in a graph as shown below (note different y-axis scale for eastern Palearctic). The total number of species is noted on the x-axis; yellow marks those with >1% maximum intraspecific distance. The curves are roughly similar among the regions except that the proportion >1% differs.
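For readers who want to see the arithmetic behind such a summary, here is a rough Python sketch. It is not BOLD's code and makes several assumptions: sequences are already aligned, sites with anything other than A, C, G, or T are skipped, the "nearest neighbor" is taken to be the other species with the smallest distance to any member, and the function names and toy records are hypothetical. The K2P distance itself is the standard formula d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p(a, b):
    """Kimura 2-parameter distance between two aligned sequences."""
    sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    n = len(sites)
    diffs = [(x, y) for x, y in sites if x != y]
    # Transitions stay within purines or within pyrimidines.
    ts = sum(1 for x, y in diffs
             if {x, y} <= PURINES or {x, y} <= PYRIMIDINES)
    tv = len(diffs) - ts
    if not diffs:
        return 0.0
    p, q = ts / n, tv / n
    inner = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if inner <= 0:
        raise ValueError("sequences too divergent for K2P")
    return -0.5 * math.log(inner)

def summarize(records):
    """records: list of (species, sequence) pairs.
    Returns {species: (max_intraspecific, nearest_species, nearest_dist)}."""
    out = {}
    for sp in sorted({name for name, _ in records}):
        own = [s for name, s in records if name == sp]
        others = [(name, s) for name, s in records if name != sp]
        max_intra = max((k2p(a, b) for a, b in combinations(own, 2)),
                        default=0.0)
        nn_dist, nn_name = min(((k2p(a, s), name)
                                for a in own for name, s in others),
                               default=(None, None))
        out[sp] = (max_intra, nn_name, nn_dist)
    return out

# Toy records (hypothetical species names and 20-base sequences).
records = [
    ("sp_A", "ACGTACGTACGTACGTACGT"),
    ("sp_A", "ACGTACGTACGTACGTACGC"),
    ("sp_B", "TTGTACGTAAATACGTACGT"),
]
summary = summarize(records)
# Species that would be marked in yellow: maximum intraspecific distance > 1%.
flagged = sorted(sp for sp, (mx, _, _) in summary.items() if mx > 0.01)
```

Sorting the species by the first element of each tuple and plotting would give a curve of the kind described above; with 20-base toy sequences a single difference already exceeds 1%, so the numbers here are only for showing the mechanics.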

In the next post, I look more carefully at the apparent outliers. What makes them different–biology or taxonomy?

Note added 23 March 2011: Kevin Kerr points out that “eastern Palearctic” refers to entire region east of Europe, and thus the eastern Palearctic survey referred to above includes sites spanning most of Russia, Kazakhstan, and Mongolia (not just the eastern half of Russia as highlighted in map).

Note added 24 March 2011: Map corrected to show collecting region for eastern Palearctic survey.

DNA barcoding maps unknowns in Iraq

March 19th, 2011

Rivers and streams are listening devices for watersheds. The best way to assess watershed health is to survey freshwater life downstream. In particular, benthic (bottom-dwelling) macroinvertebrates (visible without magnification) are widely-used indicators of freshwater quality (see for example US EPA page). A challenge for freshwater biomonitoring programs is to rapidly identify the multitude of benthic invertebrate species potentially present in a water sample, and to repeat that for hundreds or thousands of samples. Now imagine you need to assess freshwater quality where few taxonomists have ever ventured.

In January 2011 J N Am Benthol Soc, researchers from U.S. National Museum of Natural History, The American University of Iraq-Sulaimani, and University of Guelph report on DNA barcoding to facilitate biomonitoring in the headwaters of the Tigris River, Iraq. Geraci and colleagues focused on Trichoptera (caddisflies), a group widely used as water quality indicator species. Trichoptera are small, winged insects (approximately 12,000 named caddisfly species worldwide) related to moths and butterflies, with larval stages that develop in freshwater. Sometimes emulated by trout fishermen making lures, caddisfly larvae construct “mobile homes” by gluing together bits of stone, sticks, or other material, with architectural details that help distinguish species. At the time of this study, the world literature on Iraqi Trichoptera consisted of 3 published reports describing 6 species in 7 genera (some larvae were identified only to genus) based on specimens collected between 1919 and 1987.

As part of a larger “Key Biodiversity Areas” survey (for more info, see note below) conducted from 2007 to 2009 by Nature Iraq Organization, the researchers collected benthic macroinvertebrates at twenty sites in three watersheds of the Tigris River during May-June 2008 and January 2009. Four to six replicate samples were obtained at each site, samples were washed with 70% ethanol using a 0.5 mm mesh in the field and again in the laboratory, and caddisfly larvae were removed and stored in 70% ethanol. At two sites adults were collected and placed in separate vials with 70% ethanol. Larvae were sorted using keys for Nearctic and Palearctic Trichoptera, and adults were identified to genus following a key to European species.

Following morphologic sorting, DNA barcoding using standard primers (LepF1/LepR1) was performed on 144 larvae and 6 adults, focusing on individuals in family Hydropsychidae as these were collected in large numbers. Successful amplification was obtained on the first pass with 81.3% of specimens, which is a nice demonstration of the robust nature of DNA itself and of amplification protocols, as storage conditions were not what is considered optimal (optimal storage for insects usually means dried immediately after collection, whereas these specimens had been stored in 70% ethanol for 2-3 years at the time of analysis).

DNA barcodes of Iraqi specimens were compared to the existing Trichoptera barcode library, which so far includes records for about 2500 named species (~19% of world fauna) and many undescribed species. Combining morphologic and DNA data, the researchers identified 16 species in 11 genera and 9 families, with only one of the putative species matching a previously named organism. There is a lot of Trichoptera taxonomy and molecular phylogeny here, but I will skip to the conclusion. This study demonstrates how DNA barcoding, applied to a “virtually unknown fauna”, can build on an existing barcode library to speed species recognition, establish a practical identification method for general use, and link new discoveries to known genera and families. The authors conclude that “DNA barcoding of benthic macroinvertebrates will be crucial in developing countries that are trying to overcome a lack of knowledge of aquatic-insect taxonomy and trained taxonomists. DNA barcoding will help aquatic scientists in these countries generate the empirical data needed to implement sound bioassessment and monitoring protocols to protect and manage their water resources.”

More generally, I think we can stop looking back longingly at past taxonomic practices and, DNA tools in hand, start helping society and science discover just what is out there, with all the intellectual excitement that entails.

Note added 20 March 2011: Co-author Mohammed Al-Saffar writes: “Key Biodiversity Areas is an ongoing project conducted biannually by Nature Iraq, and we (Nature Iraq, Miami University, Smithsonian NMNH, Guelph University, as well as Clemson University) are in the process of working on the DNA barcoding of all the insects important for monitoring water quality in Iraq such as the Mayflies, Dragonflies, etc.”


About this site

This web site is an outgrowth of the Taxonomy, DNA, and Barcode of Life meeting held at Banbury Center, Cold Spring Harbor Laboratory, September 9-12, 2003. It is designed and managed by Mark Stoeckle, Perrin Meyer, and Jason Yung at the Program for the Human Environment (PHE) at The Rockefeller University.

About the Program for the Human Environment

The involvement of the Program for the Human Environment in DNA barcoding dates to Jesse Ausubel's attendance in February 2002 at a conference in Nova Scotia organized by the Canadian Center for Marine Biodiversity. At the conference, Paul Hebert presented for the first time his concept of large-scale DNA barcoding for species identification. Impressed by the potential for this technology to address difficult challenges in the Census of Marine Life, Jesse agreed with Paul on encouraging a conference to explore the contribution taxonomy and DNA could make to the Census as well as other large-scale terrestrial efforts. In his capacity as a Program Director of the Sloan Foundation, Jesse turned to the Banbury Conference Center of Cold Spring Harbor Laboratory, whose leader Jan Witkowski prepared a strong proposal to explore both the scientific reliability of barcoding and the processes that might bring it to broad application. Concurrently, PHE researcher Mark Stoeckle began to work with the Hebert lab on analytic studies of barcoding in birds. Our involvement in barcoding now takes 3 forms: assisting the organizational development of the Consortium for the Barcode of Life and the Barcode of Life Initiative; contributing to the scientific development of the field, especially by studies in birds; and contributing to public understanding of the science and technology of barcoding and its applications through improved visualization techniques and preparation of brochures and other broadly accessible means, including this website. While the Sloan Foundation continues to support CBOL through a grant to the Smithsonian Institution, it does not provide financial support for barcoding research itself or support to the PHE for its research in this field.