Commercial opportunities

The most successful technologies generate money. In turn, a commercial market helps drive improvements in cost and speed, enabling wider applications and new scientific knowledge. The rapid completion of the Human Genome Project (HGP) can be seen as a direct result of the Applied Biosystems ABI 3700 DNA analyzer, the first fully automated capillary sequencer, introduced in 1998. The large market for high-throughput sequencing that resulted from HGP funding then helped drive multiple rounds of improvement in cost and speed.

This leads me to thoughts about DNA barcoding. The first exploratory meetings were held in 2003 at Banbury Center, Cold Spring Harbor Laboratory. Seven years later, DNA barcoding is established as an accurate method for species identification with diverse scientific applications. BOLD, the publicly-available library of DNA barcodes, contains over 800,000 records from over 70,000 species. A new international effort, iBOL, is underway to establish DNA barcode libraries for 5 million specimens from 500,000 species by 2015. Like the government-maintained network of GPS satellites, publicly-funded DNA barcode libraries appear to offer enormous commercial opportunity, with potential benefits to society and science.

Where is barcoding on this path? So far, I find only a handful of companies or products that provide DNA-based species identification (for example, Therion, SteriSense, FishDNAID, Applied Food Technologies, Ecogenics). Of the few that exist, most are aimed at fish identification and do not take advantage of the large scope and transparent sourcing of DNA barcode libraries. For example, Agilent Technologies recently introduced a “Fish identification system” based on “experimentally-derived [PCR-RFLP] patterns from more than 50 species.” This is wonderful, but the scope is too small and the underlying library is unknown. Agilent is participating with the National Center for Food Safety and Technology, a US government-industry collaboration, so perhaps that will lead to more robust applications. I note that DNA barcode detection of food fraud (not just fish) was front-page news in the Washington Post in March 2010, and the potential educational market is also large. I look forward to more entrepreneurs, whether at established companies or start-ups!

New site design for PHE

Our cool-looking new PHE website is up and running, thanks especially to Jason Yung and Mark Stoeckle. A brand new publications database ties everything together, thanks to the diligence of Smriti Rao and Iddo Wernick.

Recognizing invasive insects threatening forests

In the late 1860s, a French entomologist, Étienne Léopold Trouvelot, living in Medford, Massachusetts, imported gypsy moths (Lymantria dispar), which he hoped to hybridize with domesticated Asian silkworms (Bombyx mori), thereby creating a new silk-producing strain with improved disease resistance (for history, see US Forest Service page). The experiment failed (not surprising, given that the two moths are from different families), the colony escaped from Trouvelot’s backyard, and gypsy moths became established as a major pest of hardwoods in the northeastern US (animated range data from US Forest Service at right). Subsequent introductions of numerous forest pests and pathogens into the US, largely through importation of infested wood products, have had large impacts on the timber industry and local ecosystems alike, and have led to the near extinction of American chestnut and to large-scale mortality in elm, hemlock, oak, and other tree species.

The first step in controlling invasive species is detection. In J Entomolog 2010 7:60, researchers from the USDA Forest Service report on DNA barcode identification of the Eurasian woodwasp Sirex noctilio. S. noctilio has been established and spreading in the northeastern US and Canada since at least 2004, and “will likely become a major pest of pines and possibly other conifers in North America.” The wasp attacks living pines, laying eggs along with an inoculum of “phytotoxic mucus” and an exotic [non-native] wood decay fungus (Amylostereum areolatum). The wasp larvae “feed on pine wood decayed by the fungus and on the fungus itself”, weakening or killing the tree.

Wilson and Schiff analyzed COI barcodes of 207 larvae or adults representing 27 woodwasp species or subspecies (including 6 Sirex spp.) following a fairly standard protocol (i.e., 1 leg, DNeasy kit, HCO 2198/LCO 1490 primers). [As an aside, these primers (Folmer 1994) remain surprisingly widely used for barcoding invertebrates, despite the development of several other effective broad-range primers for the COI barcode region (e.g., see CCDB collected protocols), which perhaps reflects the absence of a large-scale direct comparison.] All species gave distinct barcodes, the minimum interspecific distance was 7.6% (maximum 26.2%), and, remarkably, there was no variation within any named taxon (average 9 individuals per species/subspecies, range 4-23). However, they observed 2.3%-2.8% differences between subspecies of Xeris spectrum and Sirex juvencus, suggesting that “taxonomic revisions are probably in order to separate these subspecies in each case into separate subspecies.”
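To make the distance figures concrete, here is a minimal Python sketch of how the largest within-species and smallest between-species distances can be computed from a set of aligned barcodes. It assumes uncorrected p-distances (the proportion of differing sites); the post does not say which distance model Wilson and Schiff used, so this is not a reconstruction of their analysis. The function names and the short toy sequences are invented for illustration and are not real COI data.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences carry an unambiguous base.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcode_gap(records):
    """records: list of (species_name, aligned_sequence) tuples.

    Returns (max intraspecific distance, min interspecific distance).
    The second value is None if only one species is present.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    max_intra = max(intra) if intra else 0.0
    min_inter = min(inter) if inter else None
    return max_intra, min_inter


# Toy 20-base sequences, made up for this example (not real barcodes).
toy_records = [
    ("species_A", "ACGTACGTACGTACGTACGT"),
    ("species_A", "ACGTACGTACGTACGTACGT"),
    ("species_B", "ACGTACGAACGTACCTACGT"),
    ("species_B", "ACGTACGAACGTACCTACGT"),
]
max_intra, min_inter = barcode_gap(toy_records)
print(f"max within species: {max_intra:.1%}, min between species: {min_inter:.1%}")
```

When the smallest between-species distance exceeds the largest within-species distance, as in the woodwasp data above, every specimen sits closer to its own species than to any other.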

In addition to application in forest surveys, Wilson and Schiff note the need for a “standardized diagnostic method of identifying insect larval stages at ports of entry within imported wood products…and in wood used as crates and dunnage for imported goods.” For example, “recent analyses of Sirex larvae intercepted from 1985-2000 by USDA-APHIS personnel at US ports of entry…indicate that only 7 (6.8%) of 103 specimens could be identified to species (Hoebeke et al 2005).” The authors conclude “DNA barcode methods can be used to identify larval states of woodwasps…as easily as free-flying adults,” which “should help prevent future introductions of S. noctilio and other exotic woodwasps.”

PopSci Profile

The July 2010 issue of Popular Science (pp. 54-55) features a profile of Jesse in its Environmental Visionaries series.

New Scientist Interview

The 16 June 2010 issue of the weekly magazine New Scientist publishes a short interview with Jesse about the Census of Marine Life and ideas for the international Quiet Ocean and Dark Sky experiments.

Avian catalogue still incomplete

How many birds in the world? In the tenth edition of Systema Naturae (1758) (copy in US Library of Congress can be viewed or downloaded here, thanks to Biodiversity Heritage Library), Linnaeus listed 564 species collected from all over the world. In 1935, Ernst Mayr estimated 8,500 world birds, and counted more precisely in 1946, arriving at a total of 8,616 species (Auk 63:64-67). Mayr judged “this figure is probably within five per cent, and certainly within ten per cent, of the final figure” and predicted “whatever changes may occur in the future will be due primarily to taxonomic revaluations, that is to shifts from specific to subspecies status and vice versa.” As of today, the IOC World Bird List v2.4 names 10,386 species, plus another 139 accepted or proposed splits, altogether about 20% above Mayr’s 1946 count, now more than 60 years old.

As Mayr predicted, nearly all new birds represent “splits” of existing entities, often elevating described subspecies to species status. Mayr estimated about 28,500 “valid subspecies”; might these represent species? Most splits reflect, at least in part, newly discovered genetic differences in mtDNA. In 2004, Robert Zink examined in detail 41 widely-distributed N American birds, and found an average of 1.9 “historically significant units” per species, i.e., distinct mtDNA clusters, most or all of which likely represent distinct species (Proc R Soc Lond B 271:561). At the same time, he found over 90% of subspecies “lack the population genetic structure indicative of a distinct evolutionary unit.” I conclude that species-level avian taxonomy will benefit from a concerted effort to analyze mtDNA in all world birds, namely, the All Birds Barcoding Initiative (ABBI). Large-scale DNA barcoding surveys so far have found distinct mtDNA clusters in 4-24% of species (e.g., Kerr et al 2007, Kerr et al 2009, Johnsen et al 2010).

In some regions and categories of birds, the proportion of unrecognized species may be even higher. In the August 2010 issue of Biological Conservation, researchers from 8 institutions in Southeast Asia and North America report on “cryptic genetic diversity” in non-migratory Philippine birds that are also apparently widespread in other Southeast Asian countries. Lohman and colleagues analyzed seven of the 72 non-migratory, non-endemic Philippine species in detail, represented by 210 tissue specimens (9-51 specimens/species), collected from 16 countries over 18 years by 54 collectors and held in 13 institutions!

mtDNA analysis revealed genetically distinct clusters in all seven species (minimum Philippine/non-Philippine genetic distance 0.9-8.8% in COI, 2.1-9.4% in cytb). The researchers observe that using a “combination of monophyly, morphological distinctiveness as recognized by current subspecific taxonomy, and a 3% COI distance as a threshold for highlighting possible unrecognized species, six putatively new endemic Philippine species are revealed.” In addition to distinctness of Philippine forms, six of the seven species showed multiple (3-7) geographically-restricted lineages in Southeast Asia, at least some of which are likely to represent new species as well.
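The combined screen quoted above can be sketched in a few lines of Python. This is only an illustration of the logic as described in the quotation, not the authors' procedure: the class and field names are hypothetical, the choice of "at least 3%" rather than "more than 3%" is an assumption, and the three example lineages use invented values that merely fall within the 0.9-8.8% COI range reported above.

```python
from dataclasses import dataclass

COI_THRESHOLD = 0.03  # the 3% COI distance quoted above, as a proportion


@dataclass
class Lineage:
    name: str                # hypothetical label, not a real taxon
    min_coi_distance: float  # minimum Philippine/non-Philippine COI distance
    monophyletic: bool       # Philippine samples form a single clade
    named_subspecies: bool   # already recognized as a distinct subspecies


def is_candidate_species(lineage, threshold=COI_THRESHOLD):
    """Flag a lineage only if all three criteria are met (assumed reading)."""
    return (lineage.monophyletic
            and lineage.named_subspecies
            and lineage.min_coi_distance >= threshold)


# Invented example values, not data from the paper.
toy_lineages = [
    Lineage("lineage_1", 0.009, True, True),   # below the distance threshold
    Lineage("lineage_2", 0.088, True, True),   # meets all three criteria
    Lineage("lineage_3", 0.045, True, False),  # divergent but not a named subspecies
]
candidates = [l.name for l in toy_lineages if is_candidate_species(l)]
print(candidates)
```

Requiring all three criteria together is deliberately conservative: genetic distance alone highlights a lineage for further study, but does not by itself establish a new species.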

As Lohman and colleagues demonstrate, many of the tissue specimens needed to complete the census of world birds are already in museums, awaiting analysis. The world’s avian tissue collections comprise over 300,000 specimens representing over 7,000 species (Stoeckle and Winker Auk 2009), most of which, I surmise, have not been analyzed for any gene. DNA barcoding of existing avian tissue collections will likely lead to many discoveries.

Evidence

What is the evidence that DNA barcoding is a reliable method for species identification?

For this commentary, “DNA barcoding” refers to nucleotide sequencing of PCR-amplified DNA corresponding to an approved barcode region, namely the 5′ portion of COI for animals or rbcL + matK for land plants; and “species identification” refers to assigning the name of a known species to a specimen of unknown identity.

Acceptance by scientific community. For identification of known species, I think it is fair to say that DNA testing in general and DNA barcoding in particular are generally accepted in the scientific community as reliable methods. For example, the Canadian Centre for DNA Barcoding website has a compilation of peer-reviewed publications, which includes over 500 articles published since 2003. The primary limitation to identification is whether the relevant species and close relatives have been documented in the databases at the time they are queried. The BOLD database is strongest for multicellular animals (> 1,000,000 records as of May 2010; see chart), particularly arthropods and chordates. For plants, the general principles are the same, but so far there is much less documentation, as plant barcodes were not agreed upon until last year (Hollingsworth et al PNAS May 2009), and there was not a large set of pre-existing data to work with. Nonetheless, DNA barcoding of plants is ready for practical application and is providing immediately useful information (e.g. “DNA barcoding exposes a case of mistaken identity in the fern horticultural trade” Pryer et al, Mol Ecol Resources April 2010). For fungi, from perusing the database it appears that ITS (internal transcribed spacer) and COI are informally accepted as barcodes. For protists and other domains of life, results so far suggest COI will serve as a primary barcode.

Most articles focus on DNA barcoding in a particular group and assess the accuracy of identification in that group. For example, in “DNA barcoding of commercially important salmon and trout species (Oncorhynchus and Salmo) from North America” (J Agricultural Food Chem 57:8379, 2009), Rasmussen and colleagues analyzed more than 1000 samples representing the 7 commercially important salmonid species from 143 sites across western North America, including Alaska and Canada (to capture possible variation within species). The authors found 100% separation of these species by DNA barcoding, i.e., distances among species were always greater than within species.

Forensic application. DNA barcoding for species identification has been used in legal cases (e.g. Cohen et al J Food Protection 72: 810, 2009). More general evidence is presented by Dawnay et al in “Validation of the barcoding gene COI for use in forensic genetic species identification” (Forensic Sci International 173:1, 2007). The authors conclude “this study demonstrates that the cytochrome c oxidase I gene enables accurate animal species identification where adequate reference sequence data exists.” As with any laboratory method, quality control and quality assurance (QA/QC) measures are essential (e.g. Morin et al J Heredity 101:1, 2010).

DNA barcode identification was designed to be a simple, straightforward method appropriate for wide use, and the results so far amply bear this out, including its use by high school students (e.g., “FDA pressured to combat rising ‘food fraud’,” Lyndsey Layton, Washington Post March 30, 2010). One aspect that needs work, in my opinion, is better explanation of the algorithms used for matching sequences to the databases and of what the results mean. It still takes an expert to make sense of the data. Although the results are often obvious (e.g., 100% sequence identity to 10 barcode records of “Bos taurus (cow)”), interpretation is context dependent: a 100% match has a different meaning if a “neighboring” species differs by, say, 1%, or if a congeneric species is not documented or is represented by a single record, for example. In my experience, identifications are usually straightforward, including recognizing ambiguous identifications. Nonetheless, for DNA barcoding to have the widest use, including in legal settings, it will be helpful to have better documentation of how we arrive at species diagnoses through DNA barcodes.
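As one illustration of what such documentation might spell out, the Python sketch below reports a best match together with the context that affects its meaning: how close the next-nearest species is, and how many reference records support the match. It is a toy under stated assumptions, not the algorithm used by BOLD or any other database. The function names, the default cutoffs (99% similarity, 1% margin, 2 records), and the short sequences are all invented for the example.

```python
def similarity(a, b):
    """Fraction of identical sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identify(query, library, min_similarity=0.99, min_margin=0.01, min_records=2):
    """library: dict mapping species name -> list of aligned reference sequences.

    Returns the best-matching species plus notes that flag context
    in which even a perfect match should be read with caution.
    """
    best = {sp: max(similarity(query, s) for s in seqs)
            for sp, seqs in library.items() if seqs}
    if not best:
        raise ValueError("reference library is empty")
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    top_species, top_sim = ranked[0]
    runner_up_sim = ranked[1][1] if len(ranked) > 1 else None

    notes = []
    if top_sim < min_similarity:
        notes.append("no close match in library")
    if runner_up_sim is not None and top_sim - runner_up_sim < min_margin:
        notes.append("neighboring species nearly as close")
    if len(library[top_species]) < min_records:
        notes.append("best-matching species has few reference records")
    return {"species": top_species, "similarity": top_sim,
            "runner_up_similarity": runner_up_sim, "notes": notes}


# Toy reference library with made-up 20-base sequences (not real barcodes).
toy_library = {
    "species_A": ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGT"],
    "species_B": ["ACGTACGAACGTACCTACGT"],
}
clear_result = identify("ACGTACGTACGTACGTACGT", toy_library)
thin_result = identify("ACGTACGAACGTACCTACGT", toy_library)
print(clear_result)
print(thin_result)
```

Both queries match a reference at 100%, yet only the first comes back without a caveat; the second matches a species represented by a single record, which is exactly the kind of context a non-expert user needs to see stated.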

Why we need DNA ID

a) Culex pipiens, b) Culiseta incidens, c) C. pipiens larvae, d) C. pipiens eggs

Biting insects transmit human and animal diseases, including protozoan (e.g., malaria, leishmaniasis, trypanosomiasis (sleeping sickness, Chagas disease)), filarial (e.g., onchocerciasis, Guinea worm), and viral (e.g., yellow fever, West Nile, dengue) diseases. Control measures rely on identifying the insects, which generally requires expert training.

There are 174 mosquito species and subspecies in North America (“Identification and Geographical Distribution of the Mosquitoes of North America, North of Mexico,” Richard F. Darsie, Jr. and Ronald A. Ward, University Press of Florida, 2005). Many species bite humans, but only a handful are important disease vectors. It takes an expert to identify Culex pipiens (panel A), which is the major vector for West Nile virus in the eastern U.S., and to distinguish it from other species, for example, Culiseta incidens (panel B), which does not transmit human disease. Even experts are challenged by larvae (C) and eggs (D), and the latter are small and easily overlooked (egg raft size shown in inset). Planning and applying control measures is best done before adults emerge, but the early stages are the most difficult to identify.

The reference work cited above includes morphologic keys for identification of adult females and fourth-instar larvae. However, only an expert could make use of these (e.g. “lower mesepimeral setae absent, pale basal band on abdominal tergum II narrowed, or completely interrupted, medially”). If mosquito identification is important for society, then reference DNA barcodes are what is needed, as these enable many more persons to name specimens, regardless of life stage. It does not make sense to rely on reference works for the world’s mosquitos that are incomprehensible to anyone who is not already a mosquito specialist.

Farewell Smriti

The Program for the Human Environment bids a fond farewell to research assistant Smriti Rao, who worked with us beginning in October 2008 and now relocates to San Diego. We trust Smriti will remain part of the extended PHE family.