Naming names faster

Species are the units of biodiversity. Discontinuities in biological variation sort organisms into discrete groups that we recognize as species, so gathering data on differences among organisms is the necessary first step in understanding the diversity of life. Here DNA has singular value: all organisms have DNA, and some genetic loci are widely shared, enabling direct comparisons across the diversity of multicellular life. Barcoding targets widely shared gene sequences that nonetheless differ among most closely related species (COI for animals and rbcL+matK for land plants), providing broadly applicable metrics for mapping the discontinuities that represent species. Large-scale DNA barcoding thus offers, for the first time, a macroscopic view of biodiversity.
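Barcode comparisons usually come down to pairwise distances between aligned sequences. The sketch below is a minimal illustration, not any particular pipeline's code: it computes the uncorrected p-distance and the Kimura 2-parameter (K2P) distance, which has been widely used in the barcoding literature. It assumes the two sequences are already aligned to equal length and simply skips gaps and ambiguous bases.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def compared_sites(seq_a: str, seq_b: str) -> list[tuple[str, str]]:
    """Aligned base pairs where both sequences carry an unambiguous A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in BASES and b in BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    return pairs


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites."""
    pairs = compared_sites(seq_a, seq_b)
    return sum(a != b for a, b in pairs) / len(pairs)


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance: corrects for multiple substitutions,
    counting transitions (A<->G, C<->T) separately from transversions."""
    pairs = compared_sites(seq_a, seq_b)
    n = len(pairs)
    transitions = sum(
        1 for a, b in pairs
        if a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    )
    transversions = sum(a != b for a, b in pairs) - transitions
    p, q = transitions / n, transversions / n
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1 * math.sqrt(w2))


print(p_distance("ACGTACGTAC", "ACGTACGTAT"))  # 0.1 (one C->T transition in 10 sites)
print(round(k2p_distance("ACGTACGTAC", "ACGTACGTAT"), 4))  # 0.1116
```

K2P is always at least as large as the p-distance, because it adds back substitutions that are presumed to have been overwritten at the same site.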

This sounds straightforward enough, but naming species, like medical diagnosis, is a process requiring human judgment. A taxonomic expert generally focuses on one or a few species or potential species at a time, sifting through morphological, ecological, behavioral, and DNA data and making inferences about the evolutionary past. It generally takes years or decades between specimen collection and publication of a new species description, and my impression is that most specimens in museum collections, including frozen tissues, have never been scrutinized in enough detail to determine whether they represent new species. Given that a high-throughput laboratory can generate a hundred thousand barcodes in a year, there are opportunities for new workflows.

In a May 2010 Frontiers Zool paper, researchers from Uppsala University, Sweden, and the Technical University of Braunschweig, Germany, examine how we might incorporate the flood of DNA data, outlining an approach they (and others) call “integrative taxonomy”. As current practice in taxonomy already involves integrating different kinds of data (morphology, behavior, range, DNA), I take this term to mean an approach somewhere between one based primarily on morphology (“traditional taxonomy”) and one based primarily on DNA (“DNA taxonomy”), such as that used for eubacteria and archaebacteria. Padial and colleagues review the recently revitalized scientific discussion of species delimitation involving population biology and phylogenetics, noting “what matters for the study of speciation matters for taxonomy as well.” They call for a flexible approach, including the possibility of “recognition of a species on the basis of a single set of characters”, which could be DNA barcodes. Near the end, they address the big challenge, which is that DNA studies, particularly DNA barcoding, “are revealing units that might represent potential new species at a faster pace than results can be followed up for taxonomists.” Padial and co-authors review various protocols used for naming “candidate species” and conclude “standardization of such schemes across taxonomic groups of eukaryotes would be clear progress for data retrieval systems.”

As described in more detail here previously, a starting point for discussion of a preferred format for standardizing provisional names was recently proposed (Schindel and Miller, Systema Naturae 250, Chapter 10), based on the scheme currently used by CHAH (Council of Heads of Australasian Herbaria). This system of “taxon labels” (as distinguished from “taxon names”) meets the criteria of uniqueness, stability, and non-confusion with formal taxon names.
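To make the three criteria concrete, here is a hypothetical sketch of a label registry. The label shape (genus, “sp.”, a descriptor, then collector and voucher number in parentheses) is an assumption loosely modelled on Australian phrase names, not the format proposed by Schindel and Miller, and the class and function names are invented for illustration. Uniqueness comes from refusing to reissue a label, stability from never editing an issued one, and non-confusion from the “sp.” and voucher text, which cannot be mistaken for a Latin binomial.

```python
import re

# Assumed label shape (illustrative only):
#   <Genus> sp. <descriptor> (<collector> <voucher number>)
LABEL_PATTERN = re.compile(r"^[A-Z][a-z]+ sp\. .+ \(.+ \S+\)$")


class LabelRegistry:
    """Hypothetical registry issuing unique provisional labels that are never edited."""

    def __init__(self) -> None:
        self._issued: dict[str, str] = {}  # label -> voucher ID

    def issue(self, genus: str, descriptor: str, collector: str, voucher: str) -> str:
        label = f"{genus} sp. {descriptor} ({collector} {voucher})"
        if not LABEL_PATTERN.match(label):
            raise ValueError(f"malformed label: {label!r}")
        if label in self._issued:
            raise ValueError(f"label already issued: {label!r}")
        self._issued[label] = voucher
        return label

    def voucher_for(self, label: str) -> str:
        """Look up the voucher specimen that anchors a label."""
        return self._issued[label]


def looks_like_label(text: str) -> bool:
    """True for provisional labels; a formal binomial such as 'Genus species' fails."""
    return bool(LABEL_PATTERN.match(text))


registry = LabelRegistry()
print(registry.issue("Examplea", "Hill Creek", "J.Smith", "123"))
# Examplea sp. Hill Creek (J.Smith 123)
```

Tying each label to a voucher means the provisional name can later be resolved against a formal description without losing the link to the specimen.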

At present, our knowledge of biodiversity is built around a catalog of taxon names, annotated with DNA data if available. I imagine the future catalog as a DNA (barcode) map, annotated with taxon names if available. Some parts of the map, such as birds, will be heavily annotated; others, such as nematodes, will have few formal names and will instead carry taxon labels generated by automated clustering algorithms. In some cases, the DNA data will be derived from individual specimens, backed up by museum vouchers, and in other cases it will be generated from environmental sampling. Only then will we begin to see how much biodiversity is unexplored.
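Clustering barcodes into provisional units can be sketched as single-linkage grouping under a distance threshold. This is a simplified stand-in, not the algorithm behind any existing catalog: the 2% cutoff, the p-distance metric, and the “OTU-n” label format are all illustrative assumptions, and the input sequences are assumed to be pre-aligned.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, skipping gaps and ambiguous bases.
    Assumes the sequences are already aligned to equal length."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(a != b for a, b in pairs) / len(pairs)


def cluster_barcodes(barcodes: dict[str, str], threshold: float = 0.02) -> dict[str, str]:
    """Single-linkage clustering: any two specimens within `threshold` end up
    in the same cluster. Returns specimen ID -> provisional label."""
    ids = sorted(barcodes)
    parent = {i: i for i in ids}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if p_distance(barcodes[a], barcodes[b]) <= threshold:
            parent[find(a)] = find(b)

    # Number clusters in the order of their first (sorted) member.
    labels: dict[str, str] = {}
    names: dict[str, str] = {}
    for i in ids:
        root = find(i)
        names.setdefault(root, f"OTU-{len(names) + 1}")
        labels[i] = names[root]
    return labels


ref = "ACGT" * 25
specimens = {"s1": ref, "s2": "G" + ref[1:], "s3": "T" * 100}
print(cluster_barcodes(specimens))  # {'s1': 'OTU-1', 's2': 'OTU-1', 's3': 'OTU-2'}
```

The numbering is only reproducible for a fixed input set: adding specimens can merge clusters and renumber them, which is exactly the instability a real taxon-label scheme has to avoid.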

Arthur L. Singer, Jr. “Easy to Forget, and So Hard to Remember”

Arthur L. Singer, Jr., has given us the honor of posting his 90-page “Easy to Forget, and So Hard to Remember,” covering his career at MIT, the Carnegie and Sloan Foundations, and elsewhere. The memoir embraces subjects ranging from race relations to nuclear war to the origins of public television. Art directly and creatively helped hundreds of people during his career. In December 2005, Art was instrumental in linking Jesse to the MacArthur Foundation to help launch the Encyclopedia of Life.

New scientific newsstand for marine barcoders

Identifying marine life is a major challenge. On land, nearly all animals visible without a microscope are in one of two phyla, Chordata or Arthropoda, the latter most often represented by insects. In contrast, many ancient lineages are present in the oceans. Abundant marine phyla with well-known representatives include Mollusca (molluscs), Porifera (sponges), Cnidaria (corals, jellyfish), Ctenophora (comb jellies), and Echinodermata (sea urchins and others), as well as Chordata (e.g., fish) and Arthropoda (e.g., crabs). Many marine species have strange immature forms (see sea urchin larva above), which may puzzle specialists and non-specialists alike. Even marine vertebrates can be challenging. Using mitochondrial DNA, researchers recently discovered that what were thought to be three families of deep-sea fishes were in fact the larval, male, and female forms of a single family (Johnson Biol Lett 2009). Observation of marine life is difficult except in a few near-shore areas. It is easier for a schoolchild with a pair of binoculars to survey the moon than for a team of oceanographers with expensive equipment to study the deep ocean.

As with the enigmatic fishes described above, routine application of DNA-based identification will advance oceanographic science, and I imagine it will have an even more transformative impact than in terrestrial research. To help establish the DNA reference library, we have the Marine Barcode of Life Initiative (MarBOL), a joint effort of the Census of Marine Life (CoML) and the Consortium for the Barcode of Life (CBOL), which aims to “enhance our capacity to identify marine life” through DNA barcoding. I note that PLoS ONE recently set up “The MarBOL Collection” of papers devoted to marine barcoding, and I look forward to seeing how this scientific “newsstand” develops. In June, PLoS ONE received an impact factor of 4.351, placing it in the top 25% of biology journals and making it a prominent venue for highlighting and disseminating scientific developments.

Commercial opportunities

The most successful technologies generate money. In turn, a commercial market helps drive improvements in cost and speed, enabling wider applications and new scientific knowledge. The rapid completion of the Human Genome Project (HGP) can be seen as a direct result of the Applied Biosystems ABI 3700 DNA analyzer, the first fully automated capillary sequencer, introduced in 1998. The large market for high-throughput sequencing created by HGP funding then helped drive multiple rounds of improvement in cost and speed.

This leads me to thoughts about DNA barcoding. The first exploratory meetings were held in 2003 at the Banbury Center, Cold Spring Harbor Laboratory. Seven years later, DNA barcoding is established as an accurate method for species identification with diverse scientific applications. BOLD, the publicly available library of DNA barcodes, contains over 800,000 records from over 70,000 species. A new international effort, iBOL, is underway to establish DNA barcode libraries for 5 million specimens from 500,000 species by 2015. Like the government-maintained network of GPS satellites, publicly funded DNA barcode libraries appear to offer enormous commercial opportunity, with potential benefits to society and science.

Where is barcoding on this path? So far, I find only a handful of companies or products that provide DNA-based species identification (for example, Therion, SteriSense, FishDNAID, Applied Food Technologies, Ecogenics). Of the few that exist, most are aimed at fish identification and do not take advantage of the large scope and transparent sourcing of DNA barcode libraries. For example, Agilent Technologies recently introduced a “Fish identification system” based on “experimentally-derived [PCR-RFLP] patterns from more than 50 species.” This is wonderful, but the scope is too small and the underlying library is unknown. Agilent is participating in the National Center for Food Safety and Technology, a US government-industry collaboration, so perhaps that will lead to more robust applications. I note that DNA barcode detection of food fraud (not just of fish) was front-page news in the Washington Post in March 2010, and the potential educational market is also large. I look forward to more entrepreneurs, whether at established companies or start-ups!

New site design for PHE

Our new cool-looking PHE website is up and running, thanks especially to Jason Yung and Mark Stoeckle. A brand new publications database ties everything together, thanks to the diligence of Smriti Rao and Iddo Wernick.

Recognizing invasive insects threatening forests

In the late 1860s, a French entomologist, Étienne Léopold Trouvelot, living in Medford, Massachusetts, imported gypsy moths (Lymantria dispar), which he hoped to hybridize with domesticated Asian silkworms (Bombyx mori), thereby creating a new silk-producing strain with improved disease resistance (for history, see the US Forest Service page). The experiment failed (not surprising, given that the two moths belong to different families), the colony escaped from Trouvelot’s backyard, and gypsy moths became established as a major pest of hardwoods in the northeastern US (animated range data from US Forest Service at right). Subsequent introductions of numerous forest pests and pathogens into the US, largely through importation of infested wood products, have had large impacts on the timber industry and local ecosystems alike, leading to the near extinction of the American chestnut and large-scale mortality in elm, hemlock, oak, and other tree species.

The first step in controlling invasive species is detection. In J Entomolog 2010 7:60, researchers from the USDA Forest Service report on DNA barcode identification of the Eurasian woodwasp Sirex noctilio. S. noctilio has been established and spreading in the northeastern US and Canada since at least 2004, and “will likely become a major pest of pines and possibly other conifers in North America.” The wasp attacks living pines, laying eggs along with an inoculum of “phytotoxic mucus” and an exotic [non-native] wood decay fungus (Amylostereum areolatum). The wasp larvae “feed on pine wood decayed by the fungus and on the fungus itself”, weakening or killing the tree.

Wilson and Schiff analyzed COI barcodes of 207 larvae or adults representing 27 woodwasp species or subspecies (including 6 Sirex spp.), following a fairly standard protocol (i.e., 1 leg, DNeasy kit, LCO1490/HCO2198 primers). [As an aside, these primers (Folmer 1994) remain surprisingly widely used for barcoding invertebrates, despite the development of several other effective broad-range primers for the COI barcode region (e.g., see CCDB collected protocols), which perhaps reflects the absence of a large-scale direct comparison.] All species gave distinct barcodes, the minimum interspecific distance was 7.6% (maximum 26.2%), and, remarkably, there was no variation within any named taxon (average 9 individuals per species/subspecies, range 4-23). However, they observed 2.3%-2.8% differences between subspecies of Xeris spectrum and of Sirex juvencus, suggesting that “taxonomic revisions are probably in order to separate these subspecies in each case into separate subspecies.”
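The comparison behind figures like these, the largest within-taxon distance versus the smallest between-taxon distance (often called the “barcode gap”), can be sketched as follows. It pools all specimen pairs in a small labelled library and uses uncorrected p-distance on pre-aligned sequences; both are simplifying assumptions, and the published study's distance model and per-species breakdown may differ. The taxon names and sequences in the usage line are placeholders.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites on pre-aligned sequences (gaps/Ns skipped)."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def barcode_gap_summary(library: dict[str, list[str]]) -> dict[str, float]:
    """Pool all specimen pairs into within-taxon and between-taxon distances."""
    specimens = [(taxon, seq) for taxon, seqs in library.items() for seq in seqs]
    intra: list[float] = []
    inter: list[float] = []
    for (taxon_a, seq_a), (taxon_b, seq_b) in combinations(specimens, 2):
        target = intra if taxon_a == taxon_b else inter
        target.append(p_distance(seq_a, seq_b))
    if not inter:
        raise ValueError("need specimens from at least two taxa")
    max_intra = max(intra, default=0.0)
    return {
        "max_intra": max_intra,
        "min_inter": min(inter),
        "max_inter": max(inter),
        # Positive gap: some single distance threshold separates every taxon.
        "gap": min(inter) - max_intra,
    }


library = {"taxon_A": ["ACGTACGTAC", "ACGTACGTAC"], "taxon_B": ["ACGTACGTTT"]}
print(barcode_gap_summary(library))
# {'max_intra': 0.0, 'min_inter': 0.2, 'max_inter': 0.2, 'gap': 0.2}
```

Pooling is the crudest version of this check; a per-taxon comparison (each taxon's own maximum against its nearest neighbour) is stricter and would flag cases like the subspecies splits noted above more directly.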

In addition to application in forest surveys, Wilson and Schiff note the need for a “standardized diagnostic method of identifying insect larval stages at ports of entry within imported wood products…and in wood used as crates and dunnage for imported goods.” For example, “recent analyses of Sirex larvae intercepted from 1985-2000 by USDA-APHIS personnel at US ports of entry…indicate that only 7 (6.8%) of 103 specimens could be identified to species (Hoebeke et al 2005).” The authors conclude “DNA barcode methods can be used to identify larval states of woodwasps…as easily as free-flying adults,” which “should help prevent future introductions of S. noctilio and other exotic woodwasps.”
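Identifying an intercepted larva then amounts to finding its closest match in a reference library. The sketch below is a plain nearest-neighbour lookup with a hypothetical 2% acceptance threshold; the threshold, the taxon names, and the sequences in the usage lines are placeholders, and sequences are assumed to be pre-aligned. A query with no sufficiently close match is returned unassigned rather than forced onto the nearest species.

```python
from typing import Optional


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites on pre-aligned sequences (gaps/Ns skipped)."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(a != b for a, b in pairs) / len(pairs)


def identify(query: str, reference: dict[str, list[str]],
             max_distance: float = 0.02) -> tuple[Optional[str], float]:
    """Return (taxon, distance) for the closest reference barcode, or
    (None, distance) if even the best match exceeds max_distance."""
    best_taxon: Optional[str] = None
    best_dist = float("inf")
    for taxon, seqs in reference.items():
        for seq in seqs:
            d = p_distance(query, seq)
            if d < best_dist:
                best_taxon, best_dist = taxon, d
    if best_taxon is None or best_dist > max_distance:
        return None, best_dist  # flag for expert examination
    return best_taxon, best_dist


ref_a = "ACGT" * 25
reference = {"species_A": [ref_a], "species_B": ["AC" * 50]}
print(identify("G" + ref_a[1:], reference))  # ('species_A', 0.01)
print(identify("T" * 100, reference))        # (None, 0.75)
```

The unassigned result matters for port inspections: a larva that matches nothing in the library is itself a signal that the reference coverage, or the specimen, needs a closer look.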

PopSci Profile

The July 2010 issue of Popular Science (pp. 54-55) features a profile of Jesse in its Environmental Visionaries series.