PLoS ONE paper “Structural Analysis of Biodiversity”

In a 24 February 2010 PLoS ONE paper, “Structural Analysis of Biodiversity”, PHE researcher Mark Stoeckle and colleagues at Mt. Sinai School of Medicine apply their recently developed indicator vector technique to over 16,000 DNA barcode sequences from 12 diverse animal groups, with correct assignment in all 11,000 test cases. The approach generates “Klee diagrams”, which represent affinities among large numbers of nucleotide sequences in condensed, single-page displays. Because indicator vector analysis is computationally efficient, it could be applied to even larger datasets such as the BOLD database (more than 800,000 records and 67,000 species), an exciting prospect.
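
For readers curious about the general idea, here is a rough sketch in the spirit of indicator vector analysis, not the authors' implementation. It assumes aligned, equal-length sequences. Each base is one-hot encoded (four 0/1 entries per position), each species is represented by the unit-length mean of its members' vectors, and a query is assigned to the most similar species vector. The pairwise similarity matrix, shown as a heat map with sequences ordered by taxon, is roughly what a Klee diagram displays. Cosine similarity stands in for the correlation measure, and the function names and toy sequences are hypothetical.

```python
import math
from collections import defaultdict

BASES = "ACGT"

def indicator_vector(seq):
    """One-hot encode an aligned DNA sequence: four 0/1 entries per position.
    Gaps and ambiguous bases contribute all zeros at their position."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def similarity(u, v):
    """Cosine similarity, used here as the correlation measure."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def species_vectors(training):
    """Unit-length mean indicator vector per species.
    training: list of (species, aligned_sequence) tuples."""
    sums, counts = {}, defaultdict(int)
    for species, seq in training:
        v = indicator_vector(seq)
        total = sums.setdefault(species, [0.0] * len(v))
        for i, x in enumerate(v):
            total[i] += x
        counts[species] += 1
    return {sp: normalize([x / counts[sp] for x in total])
            for sp, total in sums.items()}

def assign(query, species_vecs):
    """Assign a query to the species vector it is most similar to."""
    q = indicator_vector(query)
    return max(species_vecs, key=lambda sp: similarity(q, species_vecs[sp]))

def klee_matrix(seqs):
    """Pairwise similarity matrix of sequences; drawn as a heat map with
    sequences ordered by taxon, this resembles a Klee diagram."""
    vecs = [indicator_vector(s) for s in seqs]
    return [[similarity(a, b) for b in vecs] for a in vecs]

# Hypothetical toy data (aligned, equal length).
training = [("sp_A", "ACGTACGTAA"), ("sp_A", "ACGTACGTAG"),
            ("sp_B", "TCGAACGTTT"), ("sp_B", "TCGAACGTTA")]
vecs = species_vectors(training)
print(assign("ACGTACGTAA", vecs))  # sp_A
```

Each species is reduced to a single vector, so classifying a query costs one dot product per species rather than one per reference sequence. This is one reason approaches of this kind can scale to very large libraries.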

Medicinal orchids unmasked

Herbal products make a compelling case for DNA-based identification: how else to recognize dried bits of roots, leaves, stems, bark, and flowers from a multitude of species? In a December 2009 J Nat Med paper, researchers from Ochanomizu University and Showa Pharmaceutical University, Japan, apply the recently agreed-upon standard markers for DNA barcoding land plants, matK and rbcL, to distinguish among Dendrobium species. Dendrobium is a large genus of orchids (about 1,200 species) distributed from East Asia to the Philippines, Australia, and New Zealand. Over 50 Dendrobium species are used in traditional medicines and are thought to have various pharmacologic activities, although the active ingredients have not yet been characterized.

Asahina and colleagues analyzed rbcL and matK from 12 samples representing 5 Dendrobium species and 3 hybrid cultivars whose genetic histories are uncertain. Single primer sets successfully amplified matK and rbcL from all specimens. The researchers cloned the PCR products and sequenced at least 3 clones per species, rather than sequencing amplified products directly (the rationale for the cloning step is not given). They found that matK, but not rbcL, distinguished among the five species, consistent with the general observation that rbcL varies less than matK among closely related species. Results were similar when 22 Dendrobium matK sequences from GenBank were added to the analysis (bringing the species total to 6), with one exception: 1 of 11 D. officinale GenBank matK sequences was unique and, in the NJ diagram, appeared on a branch distant from the other 10. In this modest sampling, there was no intra-specific variation among the original 12 samples; some intra-specific differences were noted in 2 species when compared with GenBank sequences.
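
Whether a marker "distinguishes" species can be framed as a simple check: no sequence is shared between species, so every pair of sequences from different species differs at one or more aligned sites. The sketch below is a minimal version of that check, not the tree-based analysis the authors used. It assumes aligned, equal-length sequences. The species labels and the short "matK-like" and "rbcL-like" sequences are hypothetical stand-ins that mimic the contrast described above, not data from the paper.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of aligned sites that differ (gaps count as ordinary characters)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def discriminates(records):
    """True if every pair of sequences from different species differs at >= 1 site.

    records: list of (species, aligned_sequence) tuples.
    """
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        if sp1 != sp2 and p_distance(s1, s2) == 0:
            return False
    return True

# Hypothetical toy data: the "matK-like" marker carries species-specific sites,
# while the "rbcL-like" marker is identical in two species.
matk_like = [("D. sp1", "ATGCTAGC"), ("D. sp1", "ATGCTAGC"),
             ("D. sp2", "ATGCTGGC"), ("D. sp3", "ATACTAGC")]
rbcl_like = [("D. sp1", "GGCTAACC"), ("D. sp2", "GGCTAACC"),
             ("D. sp3", "GGCTGACC")]

print(discriminates(matk_like))  # True
print(discriminates(rbcl_like))  # False
```

A single shared sequence is enough to make the check fail. This mirrors how a marker with little variation among close relatives, as reported for rbcL here, cannot separate them however many specimens are sequenced.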

This study demonstrates the advantages of the DNA barcoding approach for plant identification. Of course, there is already a lot of interest in DNA identification of herbal plants in general and Dendrobium orchids in particular; for example, I found over a dozen articles describing DNA methods for distinguishing Dendrobium species. However, these methods are limited to identifying species within this one genus, which means one has to have a pretty good idea what the specimen is before applying DNA testing! This highlights the essential advantage of barcoding: a standardized approach can be applied to any unknown, and it makes a comprehensive reference library feasible.

Looking ahead, we want to know more about intra- and inter-specific variation in plants. In animals, the pattern of mitochondrial variation is quite uniform, with intra-specific << inter-specific variation, so that most species form relatively tight clusters distinct from those of other species in NJ diagrams. Results so far in plants generally show little intra-specific variation in chloroplast genes (including rbcL and matK), but a range of distances among closely related species. Assuming these early results are borne out, we then want to know why plants and animals differ. For more on genetic variation in plants and animals, see Rieseberg et al Nature 2006 and Fazekas et al Mol Ecol Res 2009.
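
The animal pattern, with intra-specific << inter-specific variation, is often summarized as a "barcode gap": the largest distance within any species is smaller than the smallest distance between species. The sketch below is a minimal illustration. It uses uncorrected p-distances on aligned sequences, whereas many barcoding studies use Kimura 2-parameter distances. The species labels and sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def barcode_gap(records):
    """Return (max intra-specific, min inter-specific) p-distance.

    records: list of (species, aligned_sequence) tuples covering at least
    two species; a gap exists when the first value is below the second.
    """
    max_intra, min_inter = 0.0, 1.0
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            max_intra = max(max_intra, d)
        else:
            min_inter = min(min_inter, d)
    return max_intra, min_inter

# Hypothetical toy data: two tight species clusters.
records = [
    ("sp_A", "AAAAAAAAAA"), ("sp_A", "AAAAAAAAAT"),
    ("sp_B", "AAAAAGGGAA"), ("sp_B", "AAAAAGGGAT"),
]
intra, inter = barcode_gap(records)
print(intra, inter, intra < inter)  # 0.1 0.3 True
```

This global version is the simplest variant. A finer variant compares each species' maximum internal distance with the distance to its own nearest neighbour, which is where the plant pattern of widely varying distances among close relatives would show up.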

Identifying forensic flies with DNA

In forensic investigations, insect evidence helps establish the time of death, because the various species that colonize corpses exhibit different stages of development depending on elapsed time and temperature. Determining the post-mortem interval (PMI) therefore rests on accurate species identification, including of immature forms. In a December 2009 Int J Legal Med paper, researchers from the University of Wollongong, Australia, test DNA-based identification of Sarcophagidae flies. Immature sarcophagids lack distinguishing features, and identifying adults requires “meticulous examination of subtle morphological differences, including regional hair presence and colour, body pigmentation and bristle length, placement and abundance”, and even then may need genitalic dissection for confirmation. As a result, sarcophagid flies are little used in forensic studies, although, being viviparous, they are “prospectively more reliable for PMI estimations compared with other initial dipteran colonisers” [the latter are mostly egg-laying species (e.g. calliphorid blowflies), whose eggs hatch only if certain environmental conditions are met, adding uncertainty to PMI determinations].
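
The dependence of development on time and temperature is commonly modeled with accumulated degree hours (ADH). Each hour contributes the amount by which the temperature exceeds a species-specific developmental threshold, and a given stage is reached once the running total hits a species-specific value. Because both parameters differ between species, a misidentified specimen gives the wrong interval. The sketch below is minimal. The species names, thresholds and target ADH values are hypothetical placeholders, not published values for any sarcophagid.

```python
def accumulated_degree_hours(hourly_temps_c, base_temp_c):
    """Sum of (T - base) over hours in which T exceeds the developmental threshold."""
    return sum(max(0.0, t - base_temp_c) for t in hourly_temps_c)

def hours_to_reach(target_adh, hourly_temps_c, base_temp_c):
    """Hours needed to accumulate target_adh, or None if the record is too short."""
    total = 0.0
    for hour, t in enumerate(hourly_temps_c, start=1):
        total += max(0.0, t - base_temp_c)
        if total >= target_adh:
            return hour
    return None

# Hypothetical parameters for two species found at the same developmental stage.
SPECIES_PARAMS = {
    "species_X": {"base_temp_c": 10.0, "target_adh": 1200.0},
    "species_Y": {"base_temp_c": 8.0, "target_adh": 2000.0},
}

temps = [22.0] * 24 * 10  # hypothetical: ten days at a constant 22 °C
for name, p in SPECIES_PARAMS.items():
    print(name, hours_to_reach(p["target_adh"], temps, p["base_temp_c"]))
# species_X 100
# species_Y 143
```

With these made-up numbers, the same developmental stage implies about 100 or about 143 hours of development depending on which species the specimen belongs to. This illustrates why reliable species identification matters for PMI estimates.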

The researchers recovered COI barcodes, with no evidence of pseudogenes, from 85 adult specimens representing 16 species, using a single primer pair with degenerate bases previously applied to forensic blowflies (Nelson et al 2007 Med Vet Entomol). In NJ analysis, 14 of the 16 species formed single clusters distinct from other species. The remaining 2 species showed deep divergences, which the authors surmise may indicate cryptic species, a possibility made more plausible by the fact that “taxonomic descriptions of the Australian Sarcophagidae have not been updated since the 1950s”.
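
Degenerate primers use IUPAC ambiguity codes (e.g. R = A or G, Y = C or T, N = any base), so a single primer pair can anneal to templates that vary at some positions across species. The sketch below expands such a primer and scans a template for exact matches. It is minimal. The primer and template strings are hypothetical, not the Nelson et al primers. The exact-match rule also ignores mismatch tolerance, the reverse strand, and the 3′-end effects that matter in real PCR.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(primer):
    """All concrete sequences encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in primer.upper()))]

def degeneracy(primer):
    """Number of distinct sequences in the primer pool."""
    n = 1
    for c in primer.upper():
        n *= len(IUPAC[c])
    return n

def find_sites(primer, template):
    """Start positions (forward strand only) where the template fits the primer exactly."""
    primer, template = primer.upper(), template.upper()
    k = len(primer)
    return [i for i in range(len(template) - k + 1)
            if all(b in IUPAC[c] for c, b in zip(primer, template[i:i + k]))]

primer = "ACYGGN"                                  # hypothetical degenerate primer
print(degeneracy(primer))                          # 8
print(expand(primer)[:3])                          # ['ACCGGA', 'ACCGGC', 'ACCGGG']
print(find_sites(primer, "TTACCGGATTACTGGC"))      # [2, 10]
```

The two hits differ at the Y position (C at one site, T at the other), which is the point of degeneracy: one primer can match templates from species that differ at that base.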

Meiklejohn and colleagues demonstrate the efficacy of COI barcodes as species-level identifiers for Australian sarcophagids. The tight intra-specific clustering in these flies appears indistinguishable from that seen in diverse animal groups, including vertebrates, yet flies are presumably several orders of magnitude more abundant. (As an aside, although the authors report that their sequences and associated specimen data are deposited in BOLD, their data are not visible in “Public Projects”; I hope the authors will amend this.) What, then, limits mitochondrial variation within species? Or, in the language of population genetics, why are effective population sizes for animal species uniformly small and unrelated to census population sizes? As with the nature of dark matter, the explanation(s) await.
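
The puzzle can be made concrete with the standard neutral expectation for a maternally inherited, haploid mitochondrial locus: expected pairwise diversity π ≈ 2·N_f·μ. Here N_f is the female effective population size and μ is the per-site mutation rate per generation. If N_f tracked census size, very abundant animals such as flies would be expected to carry far more mitochondrial variation than the tight clusters actually observed. The sketch below is a back-of-envelope calculation. The mutation rate and population sizes are hypothetical round numbers, and the formula is an infinite-sites approximation that breaks down as values approach saturation.

```python
def expected_mt_diversity(n_female, mu_per_site_per_gen):
    """Neutral, infinite-sites expectation of pairwise diversity for a haploid,
    maternally inherited locus: pi = 2 * N_f * mu."""
    return 2 * n_female * mu_per_site_per_gen

MU = 1e-8  # hypothetical per-site, per-generation mutation rate

for n_f in (1e4, 1e5, 1e6, 1e7):
    pi = expected_mt_diversity(n_f, MU)
    print(f"N_f = {n_f:.0e}: expected pairwise diversity = {pi:.2%}")
```

Because expected diversity grows in direct proportion to N_f, uniformly tight clusters across groups that differ enormously in abundance suggest that something other than census size sets the effective size.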

Addendum 11 Feb 2010: Dr. Meiklejohn reports that the sequences and associated data are scheduled to appear in BOLD and NCBI GenBank as soon as the article appears in the print edition.

Trans-Atlantic DNA survey reveals overlooked avian diversity in scientific heartland

In a January 2010 J Ornithol paper (open access), researchers from Norway's Natural History Museum, the Swedish Museum of Natural History, the University of Guelph, and Rockefeller University (myself) survey mitochondrial differences in 296 species representing 97-98% of Scandinavian breeding birds. Of these, 283 species (95.6%) formed unique clusters. The remaining 13 species formed 5 clusters of 2-4 species with shared or overlapping barcodes, which might reflect young species, hybridization with introgression, and/or a single gene pool. Surprisingly for such a relatively small geographic area, large sequence differences were found within 4 species, all of which have large breeding ranges extending beyond Scandinavia; the authors propose that these represent “a mixture of separate lineages that evolved in allopatry” and advise further sampling to “elucidate the phylogeographic history”.
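
One simple way to ask whether species form "unique clusters" is to group sequences by single-linkage clustering at a distance threshold. Two kinds of exception can then be flagged: clusters containing more than one species (shared or overlapping barcodes) and species split across clusters (deep intra-specific divergence). The sketch below is minimal and is not the published analysis. It assumes aligned sequences, uncorrected p-distances and an arbitrary threshold, and the labels and sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def cluster_ids(seqs, threshold):
    """Single-linkage clustering via union-find: sequences within threshold are joined."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]

def summarize(records, threshold):
    """Return (clusters shared by >1 species, species split across >1 cluster)."""
    ids = cluster_ids([s for _, s in records], threshold)
    species_by_cluster, clusters_by_species = {}, {}
    for (sp, _), cid in zip(records, ids):
        species_by_cluster.setdefault(cid, set()).add(sp)
        clusters_by_species.setdefault(sp, set()).add(cid)
    shared = sorted(sorted(g) for g in species_by_cluster.values() if len(g) > 1)
    divergent = sorted(sp for sp, c in clusters_by_species.items() if len(c) > 1)
    return shared, divergent

# Hypothetical toy data.
records = [
    ("sp_A", "AAAAAAAAAA"), ("sp_A", "AAAAAAAAAC"),
    ("sp_B", "AAAAAAAAAG"),                          # overlaps with sp_A
    ("sp_C", "CCCCCAAAAA"), ("sp_C", "GGGGGTTTTT"),  # two deeply divergent lineages
]
print(summarize(records, threshold=0.1))  # ([['sp_A', 'sp_B']], ['sp_C'])
```

Single linkage is easy to reason about but can chain distinct groups together through intermediate sequences, and any fixed threshold is a judgement call. Flagged cases like these are starting points for further study, as in the paper, rather than conclusions.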

Johnsen et al take advantage of the existing North American barcode library to compare species whose breeding ranges extend across the Atlantic. Of the 78 trans-Atlantic species, 19 (24%) showed intercontinental divergences typical of species-level differences, including 8 species that had not been identified in prior work (data re-compiled in figure below). Most were inland species with discontinuous breeding ranges, but there were unexpected exceptions such as Steller’s eider (Polysticta stelleri), which has what appears to be a continuous circumpolar breeding range. Three of the species formed paraphyletic clusters when combined with North American congeners, suggesting that the intercontinental “conspecifics” are not even each other’s closest relatives.

In my view, this paper demonstrates that a survey approach yields a high rate of discovery and hypothesis generation, and it leads me to question how well we understand diversity in birds, which are generally considered the taxonomically best-known large group of animals. Many of the species in the present study have been known to science for over 250 years and are resident in densely settled, scientifically advanced regions, yet Johnsen and colleagues demonstrate hidden diversity. In 1946, Ernst Mayr compiled a world list of 8,616 species, which he judged to be “within 5 percent and certainly 10% of the final total”. The current IOC World Bird List v 2.3 recognizes 10,322 species (nearly 20% more than Mayr’s total), and there is a steady stream of splits of existing forms, fueled by DNA sequence data. I believe DNA barcoding offers a way to complete this process in a timely manner. If we analyzed multiple individuals from each of the world’s named species, there would still be many areas of uncertainty, but at least the larger differences would be known. It is a scientific embarrassment that we are still discovering lineages that have been reproductively isolated for millions of years, in everyday birds no less!

There are over 300,000 avian tissue samples in the world’s museums, representing over 7,000 species (Stoeckle and Winker, Auk 2009). By my calculation, only a modest number of these have been analyzed to date for species-level differences. For instance, by my count GenBank contains 13,361 cytochrome b sequences representing 4,320 avian species, and the All Birds Barcoding Initiative (ABBI) has so far collected 17,250 sequences representing 2,969 species. A concerted project applying a DNA barcoding approach to the world’s avian tissue collections would offer an unmatched opportunity for large-scale, species-level genetics, with many discoveries and hypothesis-generating findings that would inform various areas of evolutionary science. For instance, population genetics modeling starts with correctly identifying breeding populations (i.e., species). These samples may eventually be analyzed in small batches, assuming they are not lost or destroyed, but the pace of standard research practice brings to mind the story of the Dead Sea Scrolls. Some were published soon after their discovery in 1946, but the rest fell under the control of a committee of scholars and remained hidden, not only from the public but also from other scholars, for more than 40 years. When the monopoly was broken in 1991 (by researchers using a desktop computer to reconstruct texts from published concordances), some complained:

“Dr. Frank M. Cross, a scholar at the Harvard Divinity School who has worked with the scrolls since the 1950’s, said in a telephone interview that the publication of these unauthorized versions, which he described as “pirated,” would have no effect on the pace and publication schedules involving the actual scrolls. He defended his colleagues from the frequent charges of undue secrecy and procrastination, saying the critics did not understand the difficulties of working with the remaining unpublished documents that are mostly a collection of fragile fragments of parchment.” New York Times September 5, 1991

J Ornithol paper

PHE researcher Mark Stoeckle is a co-author of a January 2010 J Ornithol paper comparing Scandinavian and North American birds, which found divergent lineages in 19 (24%) of trans-Atlantic species, demonstrating that DNA barcoding helps discover diversity even in intensively studied groups such as birds.

Names

In Systema Naturae 250 (Andrew Polaszek, ed; CRC Press), a new collection of essays on the state of taxonomy, David Schindel and Scott Miller address how to speed up the “naming” of specimens without causing chaos, in a chapter entitled “Provisional nomenclature: The on-ramp to taxonomic names.” The authors observe the increasing numbers of undescribed and undescribable specimens (e.g. fragments, mixed environmental samples) and propose standardizing provisional names (their preferred designation for these standardized temporary placeholders is “taxon label”). As they note, there are many provisional names in GenBank (e.g. Ocyptamus sp. MZH S143_2004), so this is not a change in usual practice, except that the format of provisional names would be standardized. As a starting point, Schindel and Miller propose a scheme developed by the Council of the Heads of Australian Herbariums (CHAH) and recommend review by Biodiversity Information Standards (TDWG). The CHAH format is:

Genus_name sp. Locality (Voucher identifier) Source

where “(Voucher-specimen identifier) is a two-part field consisting of a collector’s name and the voucher specimen number attached to the exemplar of the taxon concept,” and “Source refers to the name of the concept’s proposer.” If sequence data are treated as identifiers, such labels could, for example, be generated by a clustering algorithm applied to DNA barcodes. Schindel and Miller discuss short- and long-term advantages for taxonomic workflow, academic credit, and scientific sharing.
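
One practical benefit of a fixed format is that labels become machine-readable: software can both generate them and parse them back into fields. The sketch below illustrates this by rendering and parsing a CHAH-style provisional name. The field names, the example values and the regular expression are illustrative assumptions, not part of the CHAH scheme or any TDWG standard. The parser also assumes a single-word genus, a single-token voucher number and no parentheses in the locality.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class ProvisionalName:
    genus: str
    locality: str
    collector: str        # first part of the two-part voucher identifier
    voucher_number: str   # second part of the two-part voucher identifier
    source: str           # the concept's proposer

    def label(self):
        """Render as: Genus sp. Locality (Collector VoucherNumber) Source."""
        return (f"{self.genus} sp. {self.locality} "
                f"({self.collector} {self.voucher_number}) {self.source}")

# Hypothetical parsing rule for the format above.
PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+) sp\. (?P<locality>[^()]+?) "
    r"\((?P<collector>.+) (?P<voucher_number>\S+)\) (?P<source>.+)$"
)

def parse(label):
    """Split a provisional name back into its fields."""
    m = PATTERN.match(label)
    if m is None:
        raise ValueError(f"not a provisional name in the expected format: {label!r}")
    return ProvisionalName(**m.groupdict())

# Hypothetical example values.
name = ProvisionalName("Acacia", "Mount Example", "J.Smith", "1234", "Example Herbarium")
text = name.label()
print(text)  # Acacia sp. Mount Example (J.Smith 1234) Example Herbarium
assert parse(text) == name
```

The round trip from fields to label and back is what lets databases index, merge and later replace provisional names consistently, one of the downstream benefits of standardization.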

A standardized format for provisional names is a simple, powerful proposal with many downstream benefits. I hope TDWG will adopt it!