The Barcode Blog

A mostly scientific blog about short DNA sequences for species identification and discovery. I encourage your commentary. -- Mark Stoeckle


FDA certifies barcoding for seafood ID, opening commercial, educational opportunities

January 5th, 2012

Seafood is often mislabeled: in the past year, barcode surveys in Canada (Hanner et al 2011), Ireland (Miller et al 2011), Spain (ICIJ 2011), the United Kingdom, and the United States (Boston Globe, October 2011; Consumer Reports, December 2011) documented 10-50 percent mislabeling of fish items, always as more expensive or more desirable species, including those sold at prominent restaurants and stores. As highlighted in a 2011 Oceana report, mislabeled seafood is commercial fraud, exposes consumers to health risks, and hides unsustainable fishing practices. However, identifying seafood is challenging: hundreds of species from around the world enter the marketplace, often as filets or steaks lacking distinguishing external features. In October 2011, the US Food and Drug Administration (FDA) formally adopted DNA barcoding for seafood identification, the culmination of validation studies conducted by FDA beginning in 2008. The summary states:

“Substituted and/or mislabeled seafood is considered to be misbranded by the FDA and is a violation of Federal law.”

FDA adoption of DNA barcoding as an identification standard opens commercial opportunities. On January 2, the Vancouver Sun reported that Tradex Foods, a Canadian frozen seafood importer, is using DNA barcoding to help eliminate what their spokesperson described as “rampant” mislabeling in the industry. Tradex collects 10 to 30 samples a month at overseas processing facilities and flies these to the US for testing by ACTG, Inc. in Illinois at $70 a sample with a turnaround time of 2-3 days, while the frozen fish itself is in transit by ship. The article reports that the Canadian Food Inspection Agency (CFIA), the federal agency responsible for verifying quality and labeling of seafood imports, expects to begin employing DNA barcoding in 2012. SGS Group, a global testing company whose services include food product safety, recently posted a press release on The Open Press highlighting the need for seafood testing and the FDA adoption of DNA barcoding, as well as the company’s capability. Applied Food Technologies, in Florida, is a molecular diagnostics company for the food industry, specializing in seafood identification, with a turnaround time of 5-10 days according to their website.

Routine testing of food and biologicals such as herbal medicines seems likely to be one of the largest and most visible applications of DNA barcoding. I expect that other companies are in or will enter this market.

I look forward to incorporation of DNA barcoding in forensic certification programs, with applications in marketplace fraud (as with food), illegal trade of wildlife, and murder investigation (dating time of death by identifying insect larvae in corpses). Already effective, DNA barcoding, including for forensic applications, is poised to expand, thanks to strong trends improving speed and sensitivity of DNA recovery and decreasing costs of DNA analysis.

Update 9 Jan 2012: My comments above on food authentication were echoed in the “Will DNA barcoding revolutionise the food industry” article in yesterday’s Metro, distributed free to commuters in 50 UK cities, circulation 1.3 million.

Rapid data release for barcode data

December 20th, 2011

At the Fourth International Barcode of Life Conference in Adelaide, there was general recognition that the initiative’s remarkable success in generating barcodes is outstripping the relatively slow process of releasing experimental data after academic publication. Of approximately 1.4 million barcode records in BOLD at the time, fewer than 300 thousand sequences with species names were publicly available, and the proportion of barcode sequences that are published and have species names appears to be falling further behind over time, as the rate of barcoding specimens increases. Given that privately held sequence data does not contribute to the overarching goal of creating a community resource for society and science, this stimulated many discussions on how to proceed. Many cited the rapid data release policies hammered out by the genomics community as a precedent.

At a 1996 summit in Bermuda, leaders of the scientific community agreed on a groundbreaking set of principles requiring that all DNA sequence data be released in publicly-accessible databases within twenty-four hours after generation. These “Bermuda Principles” (also known as the “Bermuda Accord”) contravened the typical practice in the sciences of making experimental data available only after publication. These principles represent a significant achievement…and have established rapid pre-publication data release as the norm in genomics and other fields.

Human genomics is not the same as biodiversity genomics (for one, barcodes are derived from a multitude of often irreplaceable specimens), but the general principle of rapid data release contributing to a community resource, for what is after all an enterprise funded by society, surely holds.

What follows is one strategy for academic publication AND rapid data release which we hope will encourage others. With the assistance of ZooKeys (open access), GenBank, and BOLD, on December 8, 2011, a brief “Project Description” of a barcode dataset (see below), completed just two weeks earlier, was published coincident with release of sequence data in GenBank and BOLD, with a full descriptive paper summarizing the dataset to follow in the next six months. A set of explicit statements regarding use of early release data (see below) is included in the Project Description.

Title: Project Description: DNA Barcodes of Bird Species in the National Museum of Natural History, Smithsonian Institution, USA

Abstract: The Division of Birds, National Museum of Natural History, Smithsonian Institution in Washington, DC, has obtained and released DNA barcodes for 2,808 frozen tissue samples. Of the 1,403 species represented by these samples, 1,147 species have not been barcoded previously. This data release increases the number of bird species with standard barcodes by 91%. These records meet the data standard of the Consortium for the Barcode of Life and they have the reserved keyword BARCODE in GenBank. The data are now available on GenBank and the Barcode of Life Data Systems.

Excerpt regarding use of early release data:

The authors invite the research community to examine and analyze the data in their current form with the following understandings:

• As with all data released on GenBank, the National Center for Biotechnology Information places no restriction on their use or distribution.

• The authors intend to publish a descriptive paper summarizing the dataset and its implications for bird barcoding and any taxonomic issues arising from the data. Publication of this data release paper is anticipated by 1 June 2012. In accordance with the Fort Lauderdale Principles (Wellcome Trust 2011), the authors ask the community to respect our intent to publish on these topics and not to submit manuscripts for this purpose based on this dataset.

• Use of this dataset for purposes other than those described above is welcome and encouraged, contingent on proper citation of this publication.

• The authors invite members of the community to examine the data and test their accuracy relative to other datasets. We welcome your comments, suggestions and corrections. BOLD 3.0 includes the capability to submit annotations to data submitters and we encourage readers to use this new system to submit observations on this dataset.

• The species determinations are not yet final. Some of the species identifications may change by the time of publication of the data release paper (anticipated by 1 June 2012).

I hope to soon see more public barcode data, following this and other pathways!

DNA Barcoding Prizes for first Nature, Science publications

December 19th, 2011

First proposed in 2003, the DNA barcoding initiative has generated more than 1000 scientific publications, but none so far in the de facto top science journals, Nature and Science. The barcode library contains over 1 million records from over 100 thousand species, suggesting opportunities for new insights into large-scale patterns and processes in biodiversity. Yet so far relatively few papers have attempted synthetic exploration of this unprecedented genetic resource beyond species identification. To encourage high-profile discovery, the Program for the Human Environment is offering $5000 prizes for the first DNA barcoding papers in Nature and Science, as announced earlier this month at the close of the Fourth International Barcode of Life Conference, University of Adelaide, Australia. To qualify, the paper must embrace DNA barcoding either in the title or abstract, and cite CBOL and iBOL in the acknowledgments.

Time away

September 18th, 2011

My apologies for absence of recent posts–I am working to get ready for Adelaide Barcode IV conference and will be away from the Barcode Blog for a while.

Tea time for DNA

July 22nd, 2011

What’s in your favorite tea? The dried and sometimes cooked or fermented bits of plants used to make teas are not easily identified to species by appearance. Over the past year I have been involved in a project testing whether DNA barcoding can identify the ingredients in commercial tea products, working with three New York City high school students and plant experts from Tufts University (Selena Ahmed) and The New York Botanical Garden (Damon Little). Student investigators Katie Gamble, Rohan Kirpekar, and Grace Young collected 146 tea products from 25 NYC locations, representing 33 manufacturers, 17 countries, and 82 plant common names–73 products were regular teas (prepared from Camellia sinensis, the tea plant) and 73 were herbal products prepared from other plant species.

Our findings are published in the 21 July 2011 Scientific Reports (Nature Publishing Group’s open access journal). About 1/3 of herbal teas generated DNA identifications indicating unlisted ingredients, including weeds like annual bluegrass (Poa annua) and white goosefoot (Chenopodium album) and herbal plants like chamomile (Matricaria recutita). Matching DNA ingredients to listed ingredients was sometimes challenging. We observe that “broad-scale adoption of plant DNA barcoding may require algorithms that place search results in context of standard plant names and character-based keys for distinguishing closely-related species.”

We are pleased that our investigation has attracted press coverage including New York Times print and online editions and internationally in 65 news sites and 14 countries, including India and China, world centers of tea production. Most of the DNA work was done at The New York Botanical Garden in senior author Damon Little’s laboratory. For a small subset of samples (10) we did DNA isolation and amplification in my dining room with recycled lab equipment purchased on the internet for about $5000. Samples were mailed to a commercial facility (Macrogen) for DNA sequencing, with results available by email the next day. It cost about $15 a sample including sequencing (unidirectional). More info and pictures on our TeaBOL website!

What’s next? I am excited about enabling wider use of DNA barcoding by high school students, including Cold Spring Harbor’s Urban Barcode Project competition (I am an advisor), open to teams from all New York City schools, with a focus on public institutions. I expect that soon manufacturers of teas and herbal products (and regulators) will incorporate DNA barcode testing into their quality control practices. One of the important tasks for scientists is building up the reference databases. At the time of the study, BOLD (Barcode of Life Database) and GenBank lacked rbcL or matK records for about 1/3 of plant species listed on product labels in our study. More on herbal plant identification: (Lou et al 2010. An integrated web medicinal materials DNA database. BMC Genomics 11, 402; Smillie and Kahn 2010. A comprehensive approach to identifying and authenticating botanical products. Clin Pharm Therapeutics 87, 175).


Taxonomy disentanglement

July 13th, 2011

Veneridae, commonly known as venus clams, are the largest family of heterodont bivalves (clams and cockles), with about 500 named species, all marine, distributed in mostly shallow water areas around the globe. In June 2011 PLoS ONE, researchers from Fisheries College, Ocean University of China apply DNA barcodes to perform what they call “taxonomy disentanglement” on 315 venerid specimens representing about 60 species collected along the coast of mainland China. This qualifies as the largest analysis of DNA barcodes for marine bivalves to date. Chen and colleagues note “species boundaries of these clams are difficult or even impossible to define accurately based solely on morphologic features,” so there is potentially a big role for DNA characters.

The clams were collected over a 6 year period from 2004-2010, stored in 95% ethanol (marine specimens are traditionally stored in formalin, which is an effective preservative but makes it difficult to recover DNA), and deposited as voucher specimens in Fisheries College. DNA was extracted from adductor muscle (some bivalves inherit mitochondrial DNA from both male and female parents, but the male type is restricted to gonadal tissue). Given that not many bivalves have been barcoded, it is of interest to learn what primer pairs were effective (BOLD taxonomy browser lists barcodes for 966 of the approximately 10,000 bivalve species).  Starting with Folmer primers, two additional published sets and 4 sets developed for this study were used if needed, with recovery of COI from all specimens.

I note that genetic differences within Family Veneridae are remarkably large–average pairwise COI K2P distance within the family (not counting conspecific and congeneric comparisons) is around 35% and maximum is over 50%. For comparison, in birds, average and maximum distances within families are about half as large, and even within birds as a whole (Class Aves, i.e., two hierarchical levels above family), average and maximum distances are only 20% and 33%, respectively (I generated bird stats by merging public projects in BOLD and running “Distance Summary.”) I wonder if what we call Families in vertebrates and invertebrates reflect different levels on the evolutionary tree.
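For readers who have not met the K2P (Kimura 2-parameter) distance quoted above, here is a minimal sketch of how it is computed for one pair of aligned sequences. It uses the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P is the proportion of sites differing by a transition and Q the proportion differing by a transversion. The function name and the choice to skip gaps and ambiguous bases are my own assumptions for illustration; this is not the code behind BOLD's “Distance Summary.”

```python
import math

PURINES = {"A", "G"}  # A<->G and C<->T changes are transitions


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Hypothetical helper for illustration: sites where either sequence
    has a gap or an ambiguous base are skipped (an assumption).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # Same chemical class (purine/purine or pyrimidine/pyrimidine)
        # means a transition; otherwise it is a transversion.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return float("inf")  # too divergent for the correction to apply
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

The correction grows faster than the raw proportion of differing sites, which is one reason K2P values at deep levels (such as the 35-50% within Veneridae) should be read as estimates rather than simple percent mismatch.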

Back to the paper. Chen and colleagues used neighbor-joining, maximum-likelihood, and MOTU analysis to examine their data with and without 310 additional venerid sequences downloaded from BOLD/GenBank. All individuals that could be morphologically identified to species possessed distinct (reciprocally monophyletic) COI sequences, with the exception of one species pair. Eleven of the 23 sequences from specimens that could not be identified morphologically formed five monophyletic clusters, likely representing species new to science or unreported in China. The remaining 12 sequences from morphologically puzzling specimens clustered within named species, suggesting these represent morphologically variant specimens. Sorting puzzling specimens into genetic clusters led the authors to recognize previously overlooked diagnostic morphologic characters. A number of records already in BOLD/GenBank before this study clustered with different species, suggesting these specimens were misidentified by submitters or reflected outdated taxonomy.
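To give a feel for what sorting sequences into MOTUs (molecular operational taxonomic units) can look like, here is a rough sketch that groups aligned sequences by single-linkage clustering at a fixed distance cutoff. Everything specific here is an assumption for illustration: the 3% default threshold, the use of uncorrected p-distance, and the function names are mine, and the paper's actual MOTU procedure may differ.

```python
def p_distance(seq1, seq2):
    """Proportion of differing sites, ignoring gaps and ambiguous bases."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def cluster_motus(seqs, threshold=0.03):
    """Single-linkage clustering of aligned sequences.

    seqs: dict mapping specimen id -> aligned sequence.
    threshold: assumed cutoff (3% here, purely illustrative).
    Returns a sorted list of clusters, each a sorted list of ids.
    """
    ids = list(seqs)
    parent = {i: i for i in ids}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Join any two specimens whose distance is within the cutoff.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if p_distance(seqs[a], seqs[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values())
```

With a scheme like this, unidentified specimens either fall into a cluster with named reference specimens or form clusters of their own, which mirrors the two outcomes described above for the 23 puzzling specimens.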

Chen and colleagues conclude that DNA barcoding has a third purpose in addition to species identification (assigning unknown specimens to known species) and species discovery (flagging divergent clusters), namely what they call “taxonomy disentanglement,” which other authors have called iterative or integrative use of barcoding (for example Smith et al,  Extreme diversity of tropical parasitoid wasps exposed by iterative integration of natural history, DNA barcoding, morphology, and collections, 2008 PNAS). I like the term “disentanglement”–it brings to mind the many confusions in existing classifications and specimen labels, many of which can be unknotted with DNA barcodes.

News Flashes

June 14th, 2011

You have 1 more day! Abstract deadline is 12 midnight tomorrow, June 15,  for the Fourth International Barcode of Life Conference, Adelaide, Australia, 28 November-3 December 2011. Online submission form here.

Young scientists to help document what lives on Earth! Coastal Marine BioLabs (CMB), a private, research-based scientific educational organization in Ventura, California, was awarded a 3-year NSF grant to train high school teachers and students in DNA barcoding, with the goal of contributing reference sequences to the Barcode of Life Database. CMB students and their teachers will be part of the International Barcode of Life project, which aims to expand BOLD (currently about 1.2M barcodes from 130K species) to 5M records from 500K species, the largest biodiversity initiative ever. For more on how students are helping build the genetic database of global species diversity, see Sacramento Bee news story and CMB web page.



Barcode of Life Connect tops 1000 members! If you haven’t already, I encourage you to visit and join the Barcode of Life Connect site, a “network to allow DNA barcoding professionals to discuss issues, share profiles, form special interest groups, and more.” The more includes webinars and links to upcoming relevant conferences. The core of the site is the chance to connect with like-minded barcoding professionals, either directly through their profiles or through discussion groups. So far there are 40 groups ranging from “Medicinal Plants” to “Madagascar” and “Portuguese-Speaking Barcoders.”


To get an idea of how barcoding has taken hold around the world, particularly with young scientists, try perusing recent pictures posted by Connect members. I take the liberty of re-posting some images of the investigators and their specimens. Enjoy!



Learning about lichens with DNA

June 1st, 2011

In 1867 Swiss botanist Simon Schwendener was the first to recognize that lichens were symbiotic associations of fungi and algae (or, as subsequently discovered, cyanobacteria) (for more info, try this EOL podcast on lichen, “a tropical rainforest in miniature”). Today about 13,500 species are described (lichens are named for the fungal component), representing 18% of the 74,000 known fungi. It is remarkable that so few fungi have been named, given that estimated diversity is 1.5 million (Hawksworth, 2001). This presumably reflects the difficulty of morphologic diagnosis of often microscopic, unculturable organisms with diverse life forms and highlights a need for molecular methods. Several recent epidemics causing serious animal and plant mortality have turned out to be caused by newly recognized fungi [including Batrachochytrium dendrobatidis (chytridiomycosis in amphibians), Geomyces destructans (White-nose Syndrome in bats), Cryphonectria parasitica (Chestnut blight), and Ophiostoma spp. (Dutch elm disease)], hinting at the hidden diversity and importance of fungi.

Back to lichens: in March 2011 New Phytologist, researchers from Royal Botanic Garden (RBG) at Edinburgh and Kew report on DNA barcoding of lichenized fungi using the internal transcribed spacer (ITS) region. ITS has been widely used in fungal taxonomy and has been proposed as a standard barcode region for this group (the standard barcode for animals, COI, has so far been difficult to reliably amplify from the diversity of fungi, due either to variability at primer binding sites or to introns). ITS refers to 2 regions in the nuclear ribosomal RNA gene complex (in order: 5′ external transcribed sequence, 18s rRNA, ITS1, 5.8s rRNA, ITS2, 28s rRNA, 3′ external transcribed sequence), which is present in several thousand copies in each cell. Advantages of ITS as a barcode region include availability of broad-range primers that bind to conserved regions in 18s and 28s rRNA; presence of multiple copies per cell, facilitating recovery from small or degraded samples; and the legacy of ITS fungal sequences in GenBank. Disadvantages of ITS as a barcode locus are that it is a non-coding region, making it more difficult to align and compare sequences; that the multiple copies per cell may differ from one another; and that the legacy data include misidentified sequences.

Kelly and colleagues sampled 112 freshly collected and herbarium specimens from one genus (Usnea) including 16 of the 19 species occurring in the British Isles and 248 specimens from native woodland habitats in Britain, comprising “94 species from 55, 28, and 8 genera, families and orders, respectively.” In the latter floristic set, 66.0% of species were represented by 3 or more samples and 77.7% by 2 or more samples. DNA was extracted using the DNeasy Plant Mini Kit and amplifications were performed with sets of standard primers that amplify the entire ITS segment (ITS1-5.8s rRNA-ITS2); nested PCR was performed on “a small number of samples that failed to yield a single discrete product with standard PCR.” If these failed to generate a suitable product for sequencing, then a “thin slice of a single apothecium” was placed directly into the PCR mix and amplified as above or using primers for ITS2 only. The full ITS region was obtained from 80.9% of combined 351 samples (75.9% of Usnea and 83.9% floristic). 22 (6.3%) of products showed heterogeneity on direct sequencing and required cloning to obtain suitable products for sequencing. The commonest reasons for failure were no amplification (7.1% overall, largely with older (>3y) specimens) and amplification of non-target fungi (2.0% overall, only with field samples from the floristic dataset).

Is there a “barcode gap” (intraspecific<<interspecific distance) among fungal ITS sequences? In this study at least, usually yes. The RBG researchers defined clusters as nodes with bootstrap support (BP) ≥ 70 under the BIONJ method or posterior probability (PP) ≥ 0.95 under Bayesian inference. Under these criteria, species discrimination was 73.3% for the Usnea dataset and 92.1% for the floristic dataset. Simple BLAST analysis was also usually accurate: 80% of Usnea species and 92.1% of floristic species were correctly assigned. This bodes well for cataloging the “dark matter” of fungal biodiversity using ITS DNA barcodes. So little is now known, it is exciting to contemplate what will be learned!
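The barcode gap idea can be made concrete with a few lines of code. The sketch below takes a list of pairwise distances, each tagged with the species of the two specimens compared, and asks whether the smallest between-species distance exceeds the largest within-species distance. The function name, the input layout, and the example species labels are hypothetical, chosen only for illustration; the study itself used tree-based criteria and BLAST rather than this simple check.

```python
def barcode_gap(pairwise):
    """Summarize within- vs between-species distances.

    pairwise: iterable of (species_a, species_b, distance) tuples,
    one per pair of specimens compared (hypothetical input layout).
    """
    intra = [d for a, b, d in pairwise if a == b]
    inter = [d for a, b, d in pairwise if a != b]
    if not intra or not inter:
        raise ValueError("need both within- and between-species pairs")

    max_intra = max(intra)
    min_inter = min(inter)
    return {
        "max_intra": max_intra,
        "min_inter": min_inter,
        # A gap exists when every between-species distance is larger
        # than every within-species distance.
        "gap": min_inter > max_intra,
    }


# Made-up distances for two placeholder species.
example = [
    ("sp_A", "sp_A", 0.004),
    ("sp_B", "sp_B", 0.010),
    ("sp_A", "sp_B", 0.065),
]
result = barcode_gap(example)
```

Here `result["gap"]` is true because 0.065 exceeds 0.010. When species are closely related or poorly sampled the two ranges can overlap, which is one way a dataset ends up with discrimination below 100%.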

What you can learn from a tiny bit of DNA

May 18th, 2011

Infectious diseases may determine survival of individuals, entire species, and perhaps even large branches on the Tree of Life. Beginning in the late 1970’s, rapid declines in amphibian populations around the globe were noted and today about 40% of the world’s 6,671 amphibian species are threatened with extinction (e.g. Stuart et al 2004). The major cause appears to be global dissemination of a pathogenic chytrid fungus, Batrachochytrium dendrobatidis, first reported in 1998 and formally described in 1999.

Although the global pattern is clear, many local population declines remain enigmatic due to absence of histologic data. In addition, the pattern of spread of the fungus and its timing in relation to mortality are not known. In April 2011 Proc Natl Acad Sci USA (open access), researchers from San Francisco State University and University of California, Berkeley, describe a non-invasive, DNA-based method for detecting B. dendrobatidis (Bd) in formalin-preserved specimens. Although exceptions are reported, DNA recovery after formalin treatment usually fails,  so these are remarkable results.

Cheng and colleagues analyzed formalin-preserved salamander and frog specimens collected in Mexico, Guatemala, and Costa Rica in areas where population declines had occurred. Specimens were rinsed in 70% ethanol, then, using a skin swab or dental brush, “stroked 30 times over the ventral surface…from neck to vent” [salamanders] or “on the ventral surface, including the inner thighs, abdomen, and between toes” [frogs]; the swab/brush was then stored in a microfuge tube at 4 °C until processing. DNA was extracted with a standard kit (Prepman Ultra or Qiagen DNeasy), and a 146-bp segment of Bd ITS-1 region was amplified, using 1/80th of recovered DNA for each amplification, run in triplicate using real-time PCR along with positive and negative controls.

Initial trials were done with 29 Bd-infected (as determined by histology) and 9 Bd-uninfected formalin-preserved Batrachoseps salamander specimens. Bd was detected in 24 (84%) of infected specimens and none of the uninfected specimens. They suggest that their success with such unlikely specimens may reflect “(i) the very short length (146 bp) of the target sequence for Bd amplification, (ii) the presence of many copies per Bd cell of the ITS-1 region being targeted in our assay, and (iii) recovery of many cells  of Bd in our swabbing technique because Bd grows on the skin surface of the host.”

The researchers then applied this assay to frogs and salamanders collected in Mexico (n=537), Costa Rica (n=74), and Guatemala (n=615) between 1964 and 2010. They found Bd as early as 1972, with a large increase (>50% prevalence) beginning in 1980, coincident with the observed population declines. Combining their results with those of Lips et al 2006 indicated a steady southward movement of Bd from southern Mexico in 1972 to Panama in 2004. They interpret this remarkably slow expansion to mean that the pathogen is spread by the animals themselves, perhaps as they move between the tiny pools of water that collect in the crowns of bromeliads. The near coincident appearance of Bd around the world suggests additional modes of spread, possibly including human activities. I look forward to additional studies that will shed light on the global dissemination of Bd and point to interventions to limit this ongoing disaster for amphibians.

U Adelaide, CBOL to host IBOL 4 (abstracts by 15 may!)

May 8th, 2011

From the conference website:

The Consortium for the Barcode of Life and the University of Adelaide invite you to join us in Adelaide, Australia from 28 November – 3 December 2011 for the Fourth International Barcode of Life Conference. Barcoding has seen extraordinary growth since the Mexico City Conference in November 2009 so join participants from around the world for the biggest barcoding event ever!

The organizers have developed this website to provide potential participants, co-sponsors, and other stakeholders with information about the conference. The conference organizers are also eager to have your feedback as we plan the conference so please share your ideas through Connect, the DNA Barcoding network. You can do this by using the links found throughout this website.

Important Dates

  • Preliminary agenda available: 1 April
  • Online abstract submission system opens: 1 April
  • Sponsorship opportunities open: 1 April
  • Travel bursary applications open: 15 April
  • Online registration and hotel reservation site opens: 1 May
  • Deadline for submission of Abstracts: 15 May
  • Deadline travel bursary applications: 19 May
  • Agenda with speakers available: 1 August


About this site

This web site is an outgrowth of the Taxonomy, DNA, and Barcode of Life meeting held at Banbury Center, Cold Spring Harbor Laboratory, September 9-12, 2003. It is designed and managed by Mark Stoeckle, Perrin Meyer, and Jason Yung at the Program for the Human Environment (PHE) at The Rockefeller University.

About the Program for the Human Environment

The involvement of the Program for the Human Environment in DNA barcoding dates to Jesse Ausubel's attendance in February 2002 at a conference in Nova Scotia organized by the Canadian Center for Marine Biodiversity. At the conference, Paul Hebert presented for the first time his concept of large-scale DNA barcoding for species identification. Impressed by the potential for this technology to address difficult challenges in the Census of Marine Life, Jesse agreed with Paul on encouraging a conference to explore the contribution taxonomy and DNA could make to the Census as well as other large-scale terrestrial efforts. In his capacity as a Program Director of the Sloan Foundation, Jesse turned to the Banbury Conference Center of Cold Spring Harbor Laboratory, whose leader Jan Witkowski prepared a strong proposal to explore both the scientific reliability of barcoding and the processes that might bring it to broad application. Concurrently, PHE researcher Mark Stoeckle began to work with the Hebert lab on analytic studies of barcoding in birds. Our involvement in barcoding now takes 3 forms: assisting the organizational development of the Consortium for the Barcode of Life and the Barcode of Life Initiative; contributing to the scientific development of the field, especially by studies in birds; and contributing to public understanding of the science and technology of barcoding and its applications through improved visualization techniques and preparation of brochures and other broadly accessible means, including this website. While the Sloan Foundation continues to support CBOL through a grant to the Smithsonian Institution, it does not provide financial support for barcoding research itself or support to the PHE for its research in this field.