
Bees conduct floristic survey

As last week’s post on what deep-water sharks eat showed, DNA-based species identification can reveal how animals live, not just what species they are. Diet analysis can also serve as a survey of the prey or food species present in the local environment. In April 2010 Diversity, researchers from Université Grenoble, France, apply standardized DNA identification targeting the P6 loop of the chloroplast trnL intron, combined with massively parallel sequencing, to examine plant DNA in honey. The traditional approach for determining the geographic and botanical origins of honey is microscopic examination of pollen, which requires expert training.

As previously described, the P6 loop of the trnL intron is tiny (10-143 bp) and has highly conserved flanking sequences, enabling successful amplification of DNA from many or most plants, including from degraded samples. The major disadvantages are relatively low taxonomic resolution, which improves if sequences are matched to the local rather than the global flora, and a modest reference library. Interpreting PCR-based results from mixed samples can be complicated, as some sequences may be amplified preferentially over others. To my knowledge, this has not been studied for the trnL P6 approach in general or as applied to honey in particular.
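
To make the local-versus-global point concrete, here is a minimal Python sketch, assuming a toy reference library in which one short P6 sequence is shared by several species. A query that matches several references exactly can only be assigned to the rank they all share; restricting the library to the local flora removes distant look-alikes and can sharpen the assignment. The sequences, taxon names, and the helpers `assign` and `lowest_common_rank` are hypothetical placeholders, not real trnL data or the study's pipeline, and real matching tolerates mismatches rather than requiring identity.

```python
from typing import List, Optional, Tuple

Lineage = Tuple[str, ...]  # (family, genus, species); names below are hypothetical

def lowest_common_rank(lineages: List[Lineage]) -> Lineage:
    """Return the longest lineage prefix shared by every lineage."""
    shared = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return tuple(shared)

def assign(query: str, reference: List[Tuple[str, Lineage]]) -> Optional[Lineage]:
    """Assign a query to the deepest rank shared by all exact reference matches."""
    hits = [lineage for seq, lineage in reference if seq == query]
    if not hits:
        return None  # sequence absent from the reference library
    return lowest_common_rank(hits)

# Hypothetical "global" library: the same short sequence occurs in three species.
GLOBAL_REF = [
    ("ATCCGTATTA", ("Family_A", "Genus_A1", "Species_A1a")),
    ("ATCCGTATTA", ("Family_A", "Genus_A1", "Species_A1b")),
    ("ATCCGTATTA", ("Family_A", "Genus_A2", "Species_A2a")),
    ("ATCCTTTTTA", ("Family_B", "Genus_B1", "Species_B1a")),
]

# Hypothetical "local" library: only one of the look-alike species grows nearby.
LOCAL_REF = [GLOBAL_REF[0], GLOBAL_REF[3]]

print(assign("ATCCGTATTA", GLOBAL_REF))  # ('Family_A',)
print(assign("ATCCGTATTA", LOCAL_REF))   # ('Family_A', 'Genus_A1', 'Species_A1a')
```

The same read resolves only to family against the global library but to species against the local one, which is the trade-off described above.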

Valentini and colleagues extracted DNA from 10 mg samples of two honeys (a commercial “wild flower” blend and one from the local Pyrenean region) using a standard kit (Qiagen), amplified the P6 loop with broad-range primers, and performed pyrosequencing on a Roche Diagnostics GS20 system. Different nucleotide sequence tags were applied to the two samples, enabling both to be analyzed in a single pyrosequencing run; the authors point out that tagging could be expanded to analyze hundreds of samples in a single run. A total of 3,671 and 2,191 sequences represented at least 3 times were obtained from the Pyrenean and mixed wild flower honeys, respectively, and were matched to 22 and 26 plant taxa, respectively. In terms of taxonomic resolution, these were mostly family- or genus-level assignments: 9 families/subfamilies/tribes, 7 genera, and 6 species (Pyrenean), and 14 families/subfamilies/tribes, 8 genera, and 4 species (mixed wild flower). In both samples, the five most abundant taxa comprised about 75% of total sequences.
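
Tagging and read filtering can be illustrated with a short Python sketch, assuming each read begins with a known sample tag. The 4-base tags, the example reads, and the names `demultiplex`, `keep_repeated`, `top_share`, and `MIN_COPIES` are hypothetical; real pipelines also handle sequencing errors in tags and primers, which this sketch ignores. It splits a pooled run by tag, keeps only sequences seen at least 3 times (the cutoff described above), and computes the share of reads held by the k most abundant entries.

```python
from collections import Counter

# Hypothetical 4-base sample tags attached to each honey's PCR products.
TAGS = {"ACGT": "Pyrenean", "TGCA": "wild flower"}
MIN_COPIES = 3  # keep only sequences represented at least 3 times

def demultiplex(reads, tags):
    """Split pooled reads by leading tag, strip the tag, and count each sequence."""
    by_sample = {name: Counter() for name in tags.values()}
    for read in reads:
        for tag, name in tags.items():
            if read.startswith(tag):
                by_sample[name][read[len(tag):]] += 1
                break
        # reads with an unrecognized tag match no branch and are discarded
    return by_sample

def keep_repeated(counts, min_copies=MIN_COPIES):
    """Drop rare sequences, which are often PCR or sequencing artifacts."""
    return {seq: n for seq, n in counts.items() if n >= min_copies}

def top_share(counts, k=5):
    """Fraction of all reads accounted for by the k most abundant entries."""
    total = sum(counts.values())
    top = sum(n for _, n in Counter(counts).most_common(k))
    return top / total

reads = ["ACGTGGGCAA"] * 4 + ["ACGTTTTT"] * 2 + ["TGCAGGGCAA"] * 3 + ["NNNNGGG"]
samples = demultiplex(reads, TAGS)
print(keep_repeated(samples["Pyrenean"]))     # {'GGGCAA': 4}
print(keep_repeated(samples["wild flower"]))  # {'GGGCAA': 3}
```

Applied to per-taxon read counts, `top_share(counts, k=5)` yields the kind of summary quoted above for the five most abundant taxa.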

Valentini and colleagues note that “several of the plant taxa identified were not the result of nectar collection” (moss, fern, pine); these were presumably carried by wind from nearby plants. The fern species identified, Athyrium vidalii, which comprised 1.9% of sequences, is distributed in China, Japan, Korea, and Taiwan, providing evidence of the honey’s geographic origin. Documenting the geographic origin of honey products is of commercial interest.

A primary advantage and rationale for DNA barcoding is that standardizing on one or a few regions enables a comprehensive reference library and broadly applicable testing methods. The trnL P6 target used in the present study is not part of the published community standard of rbcL + matK (A DNA barcode for land plants, Hollingsworth et al PNAS 2009), so it remains to be seen whether it will be widely used. In any case, the authors conclude that their method is “fast, simple to implement, more robust than classical methods” and “opens new perspectives in the analysis of honey diversity.” I look forward to learning more!

Knowing the unknowable

350 years ago Anthony van Leeuwenhoek explored the living world around (and within!) him using tiny, powerful, single-lens microscopes. He discovered “tiny animalcules,” including what we now know as protozoa and bacteria, and detailed structures of plant and animal tissues. In a similar way, DNA study can reveal features of the living world that would otherwise remain unknown.

In July 2010 Deep-Sea Research (not open access), investigators from the National Institute of Water and Atmospheric Research, New Zealand, report on what deepwater sharks eat. Dunn and colleagues analyzed stomach contents of 194 sharks from 6 species (14-50 individuals per species) collected in bottom trawls on the Chatham Rise, a relatively shallow area and important fishing ground that extends 1000 km east of New Zealand, at depths of 200-800 m (note: at these depths the ocean is nearly dark and does not support photosynthesis). FYI, the sharks studied are Kitefin shark (Dalatias licha), Deepwater spiny dogfish (Centrophorus squamosus), Roughskin dogfish (Centroscymnus owstonii), Deepwater dogfish (Centroselachus crepidater), Lord Plunket’s shark (Proscymnodon plunketi), and Eastern school shark (Galeorhinus galeus).

Perhaps reflecting the trawl capture method, the individual sharks were relatively small, ranging from 0.38 to 1.6 m depending on species. Prey items were first subjected to morphologic identification, and DNA barcoding (using standard primers for the full-length 650 bp COI barcode) was performed only if items were visually unrecognizable. Of the 118 sharks with non-empty stomachs, 43 (36%) had prey identified by morphology alone, 28 (24%) by DNA alone, 37 (31%) by both, and in 10 (8%) no items were identifiable by either method. In addition to a variety of fish, predominantly Hoki (Macruronus novaezelandiae), the most abundant and commercially most important fish on the Chatham Rise, prey items included other shark species, shrimp, octopus, and squids.
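
The percentages above can be checked, and one further figure derived, with a few lines of Python using only the counts reported in the post (the helper name `percent_shares` is illustrative). The rounded shares sum to 99% rather than 100% because of rounding, and adding the “DNA alone” and “both” groups shows in how many sharks DNA contributed to prey identification.

```python
# Counts for the 118 sharks with non-empty stomachs, as reported above.
COUNTS = {
    "morphology only": 43,
    "DNA only": 28,
    "both methods": 37,
    "neither method": 10,
}

def percent_shares(counts):
    """Percentage of the total for each category, rounded to whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}

total = sum(COUNTS.values())
shares = percent_shares(COUNTS)
print(total)                 # 118
print(shares)                # {'morphology only': 36, 'DNA only': 24, 'both methods': 31, 'neither method': 8}
print(sum(shares.values()))  # 99, because of rounding

# Sharks in which DNA contributed to at least one identification.
dna_helped = COUNTS["DNA only"] + COUNTS["both methods"]
print(dna_helped, round(100 * dna_helped / total))  # 65 55
```

In other words, DNA contributed to prey identification in about 55% of the sharks with food in their stomachs.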

In this report, Dunn and colleagues describe what sharks living in near-darkness in the deep ocean eat. Without DNA, much of this information would be unknowable. The authors conclude that “DNA barcoding can be used to identify prey, and can greatly increase the rate of data accumulation,” noting “the current cost of survey time vastly outweighs that of DNA barcoding of prey, making DNA barcoding a cost-effective way of increasing sampling rate.”

Naming names faster (addendum)

In yesterday’s post I placed “integrative taxonomy” on a spectrum with morphologic taxonomy at one end and “DNA taxonomy” as applied to eubacteria/archaebacteria at the other. Mehrdad Hajibabaei pointed out that bacterial diversity is not partitioned into species in the same way as animal and plant diversity. Eubacteria/archaebacteria have relatively fluid genomes with frequent exchange of DNA among lineages. Fewer than 10,000 bacterial species have been named, although their diversity certainly exceeds that of all eukaryotes (2 million named species). Thus DNA-based classification of bacteria, at least as presently applied, does not meet the goals of DNA barcoding, which aims to capture species-level differences.

Naming names faster

Species are the units of biodiversity. Discontinuities in biological variation sort organisms into discrete groups that we recognize as species, so gathering data on differences among organisms is the necessary first step in understanding the diversity of life. Here DNA has singular value: all organisms have DNA, and some genetic loci are widely shared, enabling direct comparisons across the diversity of multicellular life. Barcoding targets widely shared gene sequences that nonetheless differ among most closely related species (COI for animals and rbcL + matK for land plants), providing broadly applicable metrics for mapping the discontinuities that represent species. Large-scale DNA barcoding thus offers, for the first time, a macroscopic view of biodiversity.
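
One common way to put a number on these sequence differences in animal barcoding studies is the Kimura two-parameter (K2P) distance, which counts transitions (A↔G, C↔T) and transversions separately and corrects for multiple substitutions at the same site: d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q), where P and Q are the proportions of transition and transversion sites. The post does not name a specific metric, so the choice of K2P here is an assumption; this minimal sketch also assumes two already-aligned sequences of equal length and skips gaps and ambiguous bases.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned, equal-length sequences.

    Sites with a gap or ambiguous base in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

# One transition (A->G) among 10 compared sites.
print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))  # 0.1116
```

Within-species distances for a barcode region are usually much smaller than distances between closely related species; that separation is the “discontinuity” barcoding relies on, though its size varies among groups.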

This sounds straightforward enough, but naming species, like medical diagnosis, is a process requiring human judgment. A taxonomic expert generally focuses on one or a few species or potential species at a time, sifting through morphological, ecological, behavioral, and DNA data and making inferences about the evolutionary past. It generally takes years or decades between specimen collection and publication of a new species description, and my impression is that most specimens in museum collections, including frozen tissues, have never been scrutinized in enough detail to determine whether they represent new species. Given that a high-throughput laboratory can generate a hundred thousand barcodes in a year, there are opportunities for new workflows.

In May 2010 Frontiers Zool, researchers from Uppsala University, Sweden, and Technical University of Braunschweig, Germany, look at how we might incorporate the flood of DNA data, outlining an approach they (and others) call “integrative taxonomy”. Since current practice in taxonomy already involves integrating different kinds of data (morphology, behavior, range, DNA), I take this term to mean an approach somewhere between one primarily based on morphology (“traditional taxonomy”) and one primarily based on DNA (“DNA taxonomy”), such as that used for eubacteria and archaebacteria. Padial and colleagues review the recently revitalized scientific discussion about species delimitation involving population biology and phylogenetics, noting “what matters for the study of speciation matters for taxonomy as well.” They call for a flexible approach including the possibility of “recognition of a species on the basis of a single set of characters”, which could be DNA barcodes. Near the end, they address the big challenge: DNA studies, particularly DNA barcoding, “are revealing units that might represent potential new species at a faster pace than results can be followed up for taxonomists.” Padial and co-authors review various protocols used for naming “candidate species” and conclude “standardization of such schemes across taxonomic groups of eukaryotes would be clear progress for data retrieval systems.”

As described in more detail here previously, a starting point for discussing a standard format for provisional names was recently proposed (Schindel and Miller, Systema Naturae 250, Chapter 10), based on the scheme currently used by CHAH (Council of Heads of Australian Herbaria). This system of “taxon labels” (as distinguished from “taxon names”) meets the criteria of uniqueness, stability, and non-confusion with formal taxon names.

At present, our knowledge of biodiversity is built around a catalog of taxon names, annotated with DNA data if available. I imagine the future catalog as being a DNA (barcode) map, annotated with taxon names if available. Some parts of the map, such as for birds, will be heavily annotated, and others, such as for nematodes, will have few formal names, and instead will have taxon labels generated by automated clustering algorithms. In some cases, the DNA data will be derived from individual specimens, backed up by museum vouchers, and in other cases it will be generated from environmental sampling. Only then will we begin to see how much biodiversity is unexplored.
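
In the simplest case, the “taxon labels generated by automated clustering algorithms” imagined above could come from threshold clustering of barcode sequences. The sketch below uses single-linkage clustering on uncorrected p-distances with a hypothetical 2% cutoff and a hypothetical “TL-0001” label format; it is not the algorithm used by BOLD or any other database, and sensible cutoffs vary among taxonomic groups. Sequences are assumed to be aligned and of equal length.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def cluster_labels(seqs, threshold=0.02, prefix="TL"):
    """Single-linkage clustering: any two sequences within `threshold`
    end up in the same cluster, and each cluster gets one provisional label."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    labels, names = [], {}
    for i in range(len(seqs)):
        root = find(i)
        if root not in names:
            names[root] = f"{prefix}-{len(names) + 1:04d}"
        labels.append(names[root])
    return labels

base = "A" * 100
seqs = [base, "G" + base[1:], "GG" + base[2:], "C" * 10 + base[10:]]
print(cluster_labels(seqs))  # ['TL-0001', 'TL-0001', 'TL-0001', 'TL-0002']
```

Because single linkage merges clusters through chains of close sequences, two sequences farther apart than the cutoff can still share a label; this is one reason such labels are provisional and need taxonomic follow-up.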

Arthur L. Singer, Jr. “Easy to Forget, and So Hard to Remember”

Arthur L. Singer, Jr., has given us the honor of posting his 90-page memoir, “Easy to Forget, and So Hard to Remember,” covering his career at MIT, the Carnegie and Sloan Foundations, and elsewhere. The memoir embraces subjects ranging from race relations to nuclear war to the origins of public television. Art directly and creatively helped hundreds of people during his career. In December 2005, Art was instrumental in linking Jesse to the MacArthur Foundation to help launch the Encyclopedia of Life.

New scientific newsstand for marine barcoders

Identifying marine life is a major challenge. On land, nearly all animals visible without a microscope belong to one of two phyla, Chordata or Arthropoda, the latter most often represented by insects. In contrast, many ancient lineages are present in the oceans. Abundant marine phyla with well-known representatives include Mollusca (molluscs), Porifera (sponges), Cnidaria (corals, jellyfish), Ctenophora (comb jellies), and Echinodermata (sea urchins and others), as well as Chordata (e.g. fish) and Arthropoda (e.g. crabs). Many marine species have strange immature forms (see sea urchin larva above), which may puzzle specialists and others. Even marine vertebrates can be challenging: using mitochondrial DNA, researchers recently discovered that what were thought to be three families of deep-sea fishes were in fact larval, male, and female forms of a single family (Johnson Biol Lett 2009). Observation of marine life is difficult except in a few near-shore areas. It is easier for a schoolchild with a pair of binoculars to survey the moon than for a team of oceanographers with expensive equipment to study the deep ocean.

As with the enigmatic fish described above, routine application of DNA-based identification will advance oceanographic science and, I imagine, will have an even more transformative impact there than in terrestrial research. To help establish the DNA reference library, we have the Marine Barcode of Life Initiative (MarBOL), a joint effort of the Census of Marine Life (CoML) and the Consortium for the Barcode of Life (CBOL), which aims to “enhance our capacity to identify marine life” through DNA barcoding. I note that PLoS ONE recently set up “The MarBOL Collection” of papers devoted to marine barcoding, and I look forward to seeing how this scientific “newsstand” develops. In June, PLoS ONE received an impact factor of 4.351, placing it in the top 25% of biology journals and making it a prominent venue for highlighting and disseminating scientific developments.

Commercial opportunities

The most successful technologies generate money. In turn, a commercial market helps drive improvements in cost and speed, enabling wider applications and new scientific knowledge. The rapid completion of the Human Genome Project (HGP) can be seen as a direct result of the Applied Biosystems ABI 3700 DNA analyzer, the first fully automated capillary sequencer, introduced in 1998. In turn, the large market for high-throughput sequencing created by HGP funding helped drive multiple rounds of improvement in cost and speed.

This leads me to thoughts about DNA barcoding. The first exploratory meetings were held in 2003 at the Banbury Center, Cold Spring Harbor Laboratory. Seven years later DNA barcoding is established as an accurate method for species identification with diverse scientific applications. BOLD, the publicly available library of DNA barcodes, contains over 800,000 records from over 70,000 species. A new international effort, iBOL, is underway to establish DNA barcode libraries for 5 million specimens from 500,000 species by 2015. Like the government-maintained network of GPS satellites, publicly funded DNA barcode libraries appear to offer enormous commercial opportunity, with potential benefits to society and science.

Where is barcoding on this path? So far, I find only a handful of companies and/or products that provide DNA-based species identification (for example, Therion, SteriSense, FishDNAID, Applied Food Technologies, Ecogenics). Of the few that exist, most are aimed at fish identification and do not take advantage of the large scope and transparent sourcing of DNA barcode libraries. For example, Agilent Technologies recently introduced a “Fish identification system” based on “experimentally-derived [PCR-RFLP] patterns from more than 50 species.” This is wonderful, but the scope is too small and the underlying library is unknown. Agilent is participating with the National Center for Food Safety and Technology, a US government-industry collaboration, so perhaps that will lead to more robust applications. I note that DNA barcode detection of food fraud (not just fish) was front-page news in the Washington Post in March 2010, and the potential educational market is also large. I look forward to more entrepreneurs, whether at established companies or start-ups!
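
A PCR-RFLP identification system works by comparing the restriction fragment sizes observed for an unknown sample against patterns recorded for reference species. The minimal sketch below, with an invented two-species library, a hypothetical ±3 bp sizing tolerance, and hypothetical function names, shows why library scope matters: a species absent from the library simply cannot be identified. It is not Agilent's software or its actual matching rules.

```python
# Hypothetical reference library: species -> restriction fragment sizes (bp).
LIBRARY = {
    "Species A": [120, 210, 310],
    "Species B": [95, 240, 305],
}

def pattern_matches(observed, reference, tolerance=3):
    """True if both patterns have the same number of fragments and each
    observed size is within `tolerance` bp of the corresponding reference size."""
    if len(observed) != len(reference):
        return False
    return all(abs(o - r) <= tolerance
               for o, r in zip(sorted(observed), sorted(reference)))

def identify(observed, library=LIBRARY, tolerance=3):
    """Return every library species whose pattern matches the observed one."""
    return [species for species, pattern in library.items()
            if pattern_matches(observed, pattern, tolerance)]

print(identify([309, 119, 212]))  # ['Species A']
print(identify([150, 400]))       # [] (no pattern in the library matches)
```

An empty result says only that the sample matches nothing in the library, not what it is; with a library of 50 species, that outcome is common for anything outside the covered set.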