The Barcode Blog

A mostly scientific blog about short DNA sequences for species identification and discovery. I encourage your commentary. -- Mark Stoeckle

DNA barcoding re-tested in Madagascar butterflies

In addition to their regular classes, most US high school students take and re-take a multitude of national standardized tests (and tests to practice for the tests), starting with the PSAT in 10th grade, then various SAT subject tests, AP tests, and the SAT or ACT achievement test (sometimes taken two or more times). Fortunately for students and their parents, this process usually comes to an end once they actually apply to college or university. DNA barcoding, after six years and 500,000 sequences from 50,000 species, is apparently still in the midst of exams!

In Nov 2008 Mol Phylogenet Evol, in “a test of the DNA barcoding approach,” researchers from University of New Orleans, USA; University of Antioquia, Colombia; and Natural History Museum, London, analyze barcode-region COI sequences in a “hyperdiverse” genus (about 70 species) of butterflies endemic to Madagascar. They collected 109 specimens of 6 Heteropsis species, including 2 “undescribed species,” and 1 species from a related genus. To confuse biological databases, Heteropsis is also a genus of flowering plants in the family Araceae.

As an aside, and I know this is a commonplace observation, there needs to be a way of mapping biodiversity that gets around having “described” and “undescribed” species. For one, many of the “undescribed” species that are the focus of biological study, including perhaps those in this paper, will never be formally described. As an analogy for an alternative approach, a first step in astronomy is creating detailed sky maps based on particular wavelengths of the electromagnetic spectrum. Such sky maps are “just data,” in this case recordings of radiation-emitting stellar objects. Then, based on study, astronomers label certain objects as quasars, for example. Of course, this “annotation” does not change the underlying data, and astronomers may later change the labels on some objects based on new information or new understanding.

Following the suggestions of others, I believe some sort of sequence-based map of species-level biodiversity is a necessary way forward. Like the sky map, sequences are “just data” (the “just data” also include collection location, date, voucher specimen, and photographs). Taxonomists would then annotate the “data map” with taxonomic interpretations, assigning species names to particular clusters for example. Species-level taxonomic revisions or conflicting taxonomies are easily accommodated: this simply involves re-labeling a cluster in the former case, or adding alternate names in the latter case. This sequence data map approach explicitly recognizes that species names are hypotheses.
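To make the “data plus annotation” idea concrete, here is a minimal sketch in Python. It is only an illustration of the separation described above, not an existing database schema: the class names, field names, and the specimen and species names in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SpecimenRecord:
    """One observation in the map: "just data", with no species name attached."""
    specimen_id: str       # hypothetical identifier
    coi_sequence: str      # barcode-region COI sequence
    locality: str
    collection_date: str
    voucher: str           # where the voucher specimen is stored


@dataclass
class Cluster:
    """A group of similar sequences; names are a separate, revisable layer."""
    cluster_id: str
    members: list = field(default_factory=list)  # SpecimenRecord objects
    names: list = field(default_factory=list)    # taxonomic hypotheses

    def relabel(self, old_name, new_name):
        """Taxonomic revision: swap one name for another; the data are untouched."""
        self.names = [new_name if n == old_name else n for n in self.names]

    def add_alternate_name(self, name):
        """Conflicting taxonomy: record a second name alongside the first."""
        if name not in self.names:
            self.names.append(name)
```

In this sketch, a cluster of specimens can exist with an empty `names` list, which is one way an “undescribed” species could still carry biological information. Calling `cluster.add_alternate_name("Genus alpha")` and later `cluster.relabel("Genus alpha", "Genus beta")` changes only the annotation, never the records in `members`.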

The present system is the inverse of the above: a taxonomic map (ie species names) is “annotated” with sequences. Under this system, there is no easy way to register biological information about organisms unless they have been already formally described as a species. Without a name or description of diagnostic characters, how does the next researcher know if they are studying the same “undescribed” species unless they examine the original specimens (in this case, stored in Natural History Museum, London)? On the other hand, one could easily report biological findings (eg coloration, larval morphology, food plants) associated with a specimen and its barcode sequence. 

Going back to the astronomical analogy, barcode-region COI is the appropriate “wavelength” for the species-level map of animals. This map will not be perfect. Just as gravitational lensing distorts the positions of some stellar objects, and others are obscured by intergalactic dust, the COI wavelength map will mislead in some areas and be obscured in others, not enabling one to “see” the existence of certain species–e.g. corals with slow mtDNA sequence evolution.

Would this be “DNA taxonomy?” No. First, community standards would ensure that the sequence map is not the arbiter of species status. Just as there are morphologically cryptic species and others that are phenotypically diverse, what we recognize as distinct species might be “hidden” within a single sequence cluster, and on the other hand, some named species might comprise a set of more distantly related sequences. Thus there would not be a fixed numerical determinant (eg distance, characters) of what constitutes a species cluster. Second, the COI-wavelength map would not establish higher-level relationships. Of course the shape and distinctness (or lack thereof) of clusters will change as new sequence data becomes available, as well as the taxonomic annotation, but that is the nature of biological diversity–we just don’t know everything yet! 

Why bother? Taking a sequence-mapping approach, I believe one can accelerate exploration of biodiversity and harness efforts of those outside the taxonomic priesthood. For example, one can predict much of the next ten years of species- and genus-level revisions in avian taxonomy simply on the basis of currently available COI and other mitochondrial DNA data (sequence plus specimen data). I suggest collating and disseminating the available data in a publicly accessible form. This might even help harness “citizen science” by encouraging submission of birds that died of natural causes, or feathers naturally shed or collected in banding/ringing operations (or barcodes of feathers for those with resources and access to sequencing facilities), along with date, GPS coordinates, and digital photo. If so, then legions of devoted birders could help with creating the genetic map, as they are already doing with observational records (see eBird). Because collecting the sequences (and specimen-associated data) that establish the map is separate from the taxonomic process of “naming,” this would not devolve into taxonomic chaos. Rather, like amateur astronomers, citizens could contribute to the observational database on which the sequence map is built. One utility that is needed is an easy graphical interface that collates available mtDNA data on birds, for example, and highlights areas where information is missing either taxonomically or geographically; this sort of display would likely be of interest both to scientists and scientifically-minded amateurs.

Finally, going further out on this limb, depending on community standards, there might be agreement to consider a sufficiently divergent cluster a new species, until proven otherwise by more biological data. So a specimen plus a sequence could potentially be a “described species.” I prefer keeping the everyday designation of “species” rather than, for example, molecular operational taxonomic units (MOTU) or evolutionarily significant units (ESU), but that is a discussion for another time!

Back to the paper. Linares and colleagues found that all 6 Heteropsis spp (including the 2 “undescribed” species) were evolutionarily distinct (ie formed reciprocally monophyletic lineages in Maximum Likelihood and Bayesian analysis of barcode-region COI), and that the mtDNA phylogeny was corroborated by nuclear DNA sequences. Given the large distances among species and the small distances within them, a neighbor-joining tree would likely have shown the same species clusters (although not necessarily the same branching pattern; the COI sequences do not appear to be public on GenBank yet, so I could not try NJ analysis). Unsurprisingly, one species pair showed less than “10X distance” (ie interspecies distance less than 10 times the average intra-species distance). Most barcode studies that include multiple congeneric species have sister species pairs that fall below this threshold. The results were initially confounded by amplification of DNA from Wolbachia (an intracellular parasitic bacterium of insects), leading the authors to design alternate primers. Wolbachia is unevenly distributed in tissues and often concentrated in the reproductive tract, so perhaps the use of abdominal segments for DNA extraction is part of the reason this was a problem.
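For readers who want to see what the “10X distance” comparison amounts to, here is a small Python sketch. It rests on several assumptions of mine and is not the method used in the paper: it uses simple uncorrected p-distance (the proportion of differing sites) rather than a model-corrected distance, it takes the “average intra-species distance” to be the mean of the two species being compared, and the function names and the toy sequences in the note below are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_intra(seqs):
    """Mean pairwise distance within one species (0.0 if fewer than 2 sequences)."""
    pairs = list(combinations(seqs, 2))
    return mean(p_distance(a, b) for a, b in pairs) if pairs else 0.0


def mean_inter(seqs_a, seqs_b):
    """Mean distance over all pairs drawn one from each species."""
    return mean(p_distance(a, b) for a in seqs_a for b in seqs_b)


def passes_10x(seqs_a, seqs_b, factor=10):
    """True if the interspecies distance is at least `factor` times the
    average intraspecies distance (assumed here: mean of the two species)."""
    intra = mean([mean_intra(seqs_a), mean_intra(seqs_b)])
    inter = mean_inter(seqs_a, seqs_b)
    return inter >= factor * intra
```

With two toy 20-base “species,” `["AAAAAAAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAAAAAAC"]` and `["AAAAAAAAAACCCCCCCCCC", "AAAAAAAAAACCCCCCCCCG"]`, each species has an intraspecies distance of 0.05 and the mean interspecies distance is 0.4875, so `passes_10x` returns `False`: a pair that is clearly separated can still fall just under the threshold.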

This entry was posted on Thursday, February 5th, 2009 at 8:26 am and is filed under General.

Contact: mark.stoeckle@rockefeller.edu

About this site

This web site is an outgrowth of the Taxonomy, DNA, and Barcode of Life meeting held at Banbury Center, Cold Spring Harbor Laboratory, September 9-12, 2003. It is designed and managed by Mark Stoeckle, Perrin Meyer, and Jason Yung at the Program for the Human Environment (PHE) at The Rockefeller University.

About the Program for the Human Environment

The involvement of the Program for the Human Environment in DNA barcoding dates to Jesse Ausubel's attendance in February 2002 at a conference in Nova Scotia organized by the Canadian Center for Marine Biodiversity. At the conference, Paul Hebert presented for the first time his concept of large-scale DNA barcoding for species identification. Impressed by the potential for this technology to address difficult challenges in the Census of Marine Life, Jesse agreed with Paul on encouraging a conference to explore the contribution taxonomy and DNA could make to the Census as well as other large-scale terrestrial efforts. In his capacity as a Program Director of the Sloan Foundation, Jesse turned to the Banbury Conference Center of Cold Spring Harbor Laboratory, whose leader Jan Witkowski prepared a strong proposal to explore both the scientific reliability of barcoding and the processes that might bring it to broad application. Concurrently, PHE researcher Mark Stoeckle began to work with the Hebert lab on analytic studies of barcoding in birds. Our involvement in barcoding now takes 3 forms: assisting the organizational development of the Consortium for the Barcode of Life and the Barcode of Life Initiative; contributing to the scientific development of the field, especially by studies in birds; and contributing to public understanding of the science and technology of barcoding and its applications through improved visualization techniques and preparation of brochures and other broadly accessible means, including this website. While the Sloan Foundation continues to support CBOL through a grant to the Smithsonian Institution, it does not provide financial support for barcoding research itself or support to the PHE for its research in this field.