DNA Barcoding – The Rockefeller University

Exploring unicellular eukaryotic universe with mtDNA

September 14, 2006

Most DNA barcode research to date analyzes multicellular animals, but why stop there? Unicellular eukaryotes or “protists” probably comprise most of Earth’s biomass and encompass more genetic diversity than all multicellular animals and plants combined. In the current J Eukaryot Microbiol (2006 53:385), Denis Lynn and Michaela Struder-Kypke report on mitochondrial cytochrome oxidase I sequences in Tetrahymena, a ciliate protozoan genus related to Paramecium that includes T. thermophila, a model organism and the first free-living unicellular eukaryote genome sequenced. The authors analyzed 14 isolates of T. thermophila from 2 geographically distant locations, and 4 pairs of Tetrahymena sister species selected because they show NO sequence difference in nuclear small subunit ribosomal RNA (SSrRNA) genes. They found less than 1% intraspecific sequence variation among T. thermophila isolates. Differences between species ranged from 1% to 12%, and the sister species pairs that have identical SSrRNA genes showed differences in mtCOI sequences.

This early study suggests further exploration of unicellular eukaryotic biodiversity with COI barcodes will be fruitful.

Growing libraries reinforce mtDNA sequence clustering

September 7, 2006

Growing barcode libraries confirm distinct clustering of mtDNA sequences. In early surveys of mtDNA differences, it seemed possible that as larger numbers of individuals were sampled, there would inevitably be many overlaps between closely-related species. The accumulating barcode data show this supposition is incorrect. Instead, further sampling reinforces the observation that most animal species correspond to distinct mtDNA sequence clusters, as, for example, in the tree of Canada and Cackling Goose mtDNA sequences below (Figure A generated with public data files and software on the Barcode of Life Database site https://www.barcodinglife.org/). Where large differences within species are found, they generally reflect the distinct sequence clusters of geographically restricted populations that have other identifiable biological differences, as in the tree of Winter Wren mtDNA sequences below. Such clusters are probably best regarded as separate species (Figure B adapted from Drovetski et al 2004 Proc R Soc Lond B 271:545; number of individuals sequenced shown in parentheses).

An emerging corollary is that most named subspecies do not represent evolutionarily significant units. Large-scale surveys are revealing many genetically distinct clusters within named species, but these clusters generally do NOT correspond to described subspecies. For example, none of the 39 subspecies of Winter Wren correspond to the geographic clades reported by Drovetski et al., and grouping by subspecies did not account for any variance. Robert Zink reports that “97% of continentally distributed avian subspecies lack the population genetic structure indicative of a distinct evolutionary unit” (Zink 2004 Proc R Soc Lond B 271:564). Regarding mtDNA sequence clusters, Zink states “it is these unnamed units and not named subspecies that should play a major role in guiding conservation efforts and in identifying biological diversity”. Large-scale DNA barcode surveys of multicellular animals and plants can provide a foundation for intelligent conservation efforts.

What needs explanation is the absence of variation

August 31, 2006

Results so far show most animal species correspond to clusters of closely-related mtDNA sequences, distinct from clusters of neighboring species. This patterning is so striking that if a neighbor-joining tree of mtDNA sequences were shown on the SAT, high school students could likely recognize the branches that correspond to species. For example, below is an NJ tree of mtDNA sequences showing cryptic species of long-tailed shrew tenrecs in Madagascar (Olson et al 2004 Biol J Linnean Soc 83:1). For an invertebrate example, see last week’s post.

The remarkably widespread pattern of restricted intraspecific sequence variation in mtDNA in animals calls out for better scientific understanding. To my reading, most of the genetic taxonomic literature is focused on sequence differences because sequence differences are the necessary grist for reconstructing evolutionary history. Absence of variation means no characters, and no ability to generate evolutionary hypotheses. When absence of variation within species is found, it is often given the ad hoc and untestable explanation of being due to a recent population bottleneck. In individual cases this might seem plausible, but the hypothesis becomes absurd when applied to the large number of animal species that show low intraspecific variation. For example 97% of the 263 world cowrie species show constrained intraspecific variation (Meyer and Paulay 2005 PLoS Biol 3: e422); it is nonsensical to suppose all went through recent population bottlenecks. Low intraspecific variation is often said to indicate a small “effective population size”, but this is simply a restatement of the finding. As far as we know, many species are ancient and have enormous population sizes, factors that should permit variation. The burgeoning barcode libraries demonstrating limited intraspecific mtDNA variation in most animal species prompt the question, what erases history within species?

In April 2006 Science, researchers at Universite Montpellier, France (Bazin et al 2006 Science 312:570) report that population size does not influence mitochondrial diversity in animals, and hypothesize that mitochondrial DNA “probably undergoes frequent adaptive evolution”. Perhaps because this report threatens the foundation of many lines of research based on the assumption that mtDNA is a neutral marker whose diversity reflects population size, this study has elicited cautious commentary. Science’s own Perspectives piece concludes weakly “the diversity of mitochondrial DNA does not appear to reflect population size…and may be of only limited utility in understanding ecological, genetic, and evolutionary processes. It is ironic that the lack of recombination, once seen as a great asset of mitochondrial DNA, may be something of a problem in this context”. May be something of a problem? This study and the growing barcode surveys demonstrating limited mitochondrial sequence variation within most animal species overturn the assumptions of population biology and phylogeography and call for a new look at mitochondrial genetics.

Goldilocks finds mtDNA COI barcode length “just right” for distinguishing most animal species, asks why

August 25, 2006

The standard animal barcode, 648 bp of the mitochondrial gene COI, seems “just right” for delimiting most animal species. If it were “too short”, then closely-related species would not be resolved. If it were “too long”, then sequencing effort would be wasted. Here I examine what might underlie the Goldilocks effect.

The following figure looks at how often closely-related species (differing by 0.5%, 1%, or 2%) are predicted to have overlapping sequences. With the assumptions examined below, above 600 base pairs all but the most closely-related species will be distinguished, and above 800 base pairs there is little gain in sensitivity.

The assumptions underlying this table-napkin analysis appear supported by data so far:

First, mitochondrial DNA sequence differences between closely-related species are widely and relatively evenly distributed throughout the protein coding and ribosomal genes. For example, see an earlier post with percent identity plots comparing whole mitochondrial genomes for congeneric salamanders. Further support is provided by a plot of parallel sequence differences in the 2 most commonly utilized mitochondrial genes, COI and cytB.

Second, most closely-related animal species have COI sequences that differ by at least 1%. For example more than 98% of 13,320 congeneric pairs from a wide array of invertebrate and vertebrate species showed greater than 2% sequence difference (Hebert et al 2003 Proc Biol Sci 270:S96).

Third, intraspecific sequence variation in mtDNA is generally very low, less than 1% in most animal species.
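One simple way to reproduce the flavor of this table-napkin analysis is sketched below. It is not necessarily the calculation behind the original figure. It assumes that each site differs independently with probability equal to the overall divergence between two species, and it ignores intraspecific variation, so a fragment "fails" only when it shows zero differences. The function name `prob_no_difference` and the fragment lengths tried are illustrative choices, not taken from the post.

```python
def prob_no_difference(divergence: float, length_bp: int) -> float:
    """Chance that a fragment of length_bp shows zero differences between
    two species whose sequences differ at a fraction `divergence` of sites.

    Assumption of this sketch: sites differ independently and with equal
    probability, so the chance of no difference is (1 - divergence) ** length.
    """
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be a fraction between 0 and 1")
    if length_bp < 0:
        raise ValueError("length_bp must be non-negative")
    return (1.0 - divergence) ** length_bp


# Divergences discussed in the post; fragment lengths are illustrative.
for divergence in (0.005, 0.01, 0.02):
    row = ", ".join(
        f"{n} bp: {prob_no_difference(divergence, n):.4f}"
        for n in (100, 200, 400, 648, 800)
    )
    print(f"{divergence:.1%} divergence -> {row}")
```

Under these assumptions the chance of missing every difference falls steeply with length for species 1% or 2% apart, while species only 0.5% apart remain the hardest case even near full barcode length.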

If most closely-related species can be distinguished by short mtDNA sequences, then recognizing the sets of mtDNA sequences that make up species, ie species delimitation, should at least sometimes be simple. Using the neighbor-joining tree of mtDNA barcodes below, an untrained person might pick out the groups of sequences that correspond to species. The top 5 groups represent previously unrecognized cryptic species of scorched mussel Brachidontes exustus (Lee and Ó Foighil 2004 Mol Ecol 13:3527).

Goldilocks leaves us with the scientific questions: why are differences within most species so small, and why are the distances between most nearest neighbor species so large?

“Tag sequencing” reveals vast microbial diversity

August 19, 2006

A “tag sequencing” approach analogous to mtCOI sequencing for barcoding multicellular organisms reveals vast numbers of very rare, highly divergent deep sea microbes. In August 2006 PNAS (Sogin et al Proc Natl Acad Sci USA 32:12125), researchers from the Marine Biological Laboratory at Woods Hole and the Royal Netherlands Institute for Sea Research report on pooled bacterial samples collected at 550-1,710 meters in the Atlantic Ocean. To enable detection of rare populations, they focused on a short hypervariable region of 16S rRNA (only 79 bases) and analyzed a large number of PCR amplicons (118,000!) using 454 Life Sciences technology. This approach makes it economical to analyze enormous numbers of sequences from pooled environmental samples and avoids possible selection artifacts due to biases in amplifying longer PCR products and in cloning. Remarkably, the very short 79 base pair “tag” captured about 90% of the sequence differences in full-length 1500 base pair 16S rRNA sequences.

Figure from Sogin et al 2006 PNAS.

The results were compared to a V6 hypervariable region database, which contains about 40,000 unique V6 sequences extracted from the nearly 120,000 published bacterial rRNA gene sequences. A small number of sequence tags similar to known bacteria made up most of the samples, including 25% that were identical to sequences in the database and 40% that were no more than 3% different. Overall 75% of “total tags” were less than 10% different from previously sequenced bacteria. The remaining 25% comprised thousands of low abundance, extraordinarily diverse populations. The authors conclude the “rare biosphere” is “an ancient and..nearly inexhaustible source of genomic innovation..[that] at different times in earth’s history..may have had a profound impact on shaping planetary processes.” There is a lot more we will learn through standardized genetic analysis using short sequences, including mtCOI barcodes and V6 rRNA tags, applied to vast numbers of organisms.

Barcode libraries grow on the web

August 13, 2006

The All Birds Barcoding Initiative (ABBI) website barcodingbirds.org provides a continuously updated progress report on barcoding the world’s birds. A live feed matches barcodes deposited in the Barcode of Life Data Systems (BOLD) to a checklist of world birds. Visitors to barcodingbirds.org can view world and regional progress reports, progress by orders and families, and detailed results for individual species, including zoomable Google world maps showing where barcodes were collected. A link out to species pages in the Integrated Taxonomic Information System (ITIS) is provided.

A sibling website fishbol.org provides live updates for the Fish Barcode of Life initiative (FishBOL) which aims to collect barcodes from all fishes, approximately 30,000 species.

In addition to helping researchers scattered across the globe track progress and coordinate efforts, these sites will interest many other people. They link an enormous amount of taxonomic information with growing genetic databases derived from museum collections. The instant Google maps provide an early glimpse of what these sites can do. Future tools will overlay genetic differences in mitochondrial DNA barcodes on top of the geographic map. These “mashups” of traditional taxonomy, widely-accessible species identification through genetic barcode analysis, and user-friendly visualization will have many viewers.

Minimalist DNA barcodes to help with museum specimens

August 7, 2006

Analyzing shorter barcode sequences is an inexpensive way to link museum specimens with degraded DNA to the barcode database. In Molecular Ecology Notes July 2006, Hajibabaei et al first demonstrate in silico that COI sequences as short as 109 base pairs contain enough information to assign most specimens to known species, using simulated “minibarcodes” taken from two full-length barcode datasets. The researchers then analyzed the recovery and performance of various lengths of “minibarcodes” amplified from 33 dried and 91 ethanol-preserved insect specimens ranging in age from 1 to 21 years. As shown in the figure below, although full-length barcodes were recovered from only 24-39% of specimens, there was encouragingly high success amplifying shorter segments.

As expected from the in silico analysis, in most cases species could be distinguished as well as with full-length barcodes, ie sequences formed distinct non-overlapping clusters in an NJ tree. Hajibabaei et al’s results indicate that analyzing shorter minibarcode sequences can link museum specimens with degraded DNA to the gold standard full-length barcode database. Rather than spending time and money optimizing primers and amplification conditions for individual specimens, researchers can apply a general method that recovers a 100-400 bp fragment. The authors point out this approach will be useful “when barcoding reveals several cryptic species within what had been viewed as one species, and it is not morphologically evident which of them matches the holotype” and as “a cost-effective way of building barcode libraries with broad geographical coverage”. They caution that “very short barcode sequences are..valuable for the identification of old specimens from SELECTED NARROW taxonomic arrays” (emphasis added).

I agree a mini-barcode approach can be useful in certain situations, and emphasize their caution that it is not a substitute for a standardized full-length barcode database. First, if widely used, a minimalist approach could easily devolve into a Tower of Babel, with a hodgepodge of non-overlapping minibarcodes that cannot be compared to each other. Second, even if the minibarcodes were standardized so they all overlapped, a simple calculation implies that they would lump together most species with less than 1% sequence difference (in birds, this is about 15% of species). Less than 1% sequence difference means less than 6.5 diagnostic differences with a full-length barcode, and assuming randomly distributed substitutions, a shorter barcode could easily fail to capture any diagnostic differences.
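The "simple calculation" can be made concrete with a small sketch. It assumes a fixed number of diagnostic differences scattered uniformly at random along the 648 bp barcode, and asks how often a minibarcode covering only part of that length contains none of them. The function name `prob_fragment_misses_all` and the choice of 6 differences (just under 1% of 648 bp) are illustrative assumptions rather than figures from Hajibabaei et al.

```python
from math import comb


def prob_fragment_misses_all(
    full_length: int, n_differences: int, fragment_length: int
) -> float:
    """Probability that a fragment contains none of the diagnostic sites.

    Assumption of this sketch: the n_differences sites are placed uniformly
    at random, without replacement, among full_length positions, so every
    site must fall in the (full_length - fragment_length) positions that
    the fragment does not cover.
    """
    if not 0 <= fragment_length <= full_length:
        raise ValueError("fragment_length must lie between 0 and full_length")
    if not 0 <= n_differences <= full_length:
        raise ValueError("n_differences must lie between 0 and full_length")
    outside = full_length - fragment_length
    return comb(outside, n_differences) / comb(full_length, n_differences)


# 6 differences approximates "less than 1%" of a 648 bp barcode.
for fragment in (109, 200, 400, 648):
    p = prob_fragment_misses_all(648, 6, fragment)
    print(f"{fragment} bp fragment misses all 6 differences: {p:.3f}")
```

Under these assumptions a 109 bp minibarcode misses all six differences roughly a third of the time, which illustrates why very short fragments can fail to separate the most closely-related species.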

Some taxonomists begin to worry less

July 31, 2006

In 21 June 2006 Heredity News and Commentary “DNA barcodes: recent successes and future prospects” Dasmahapatra and Mallet describe the DNA barcoding initiative as “plausible and worthwhile” and conclude that “recent studies convincingly demonstrate the efficacy of DNA barcoding to recover biologically significant groupings or species”. Their generally positive review stumbles near the end with a call “to supplement the mtDNA-based barcode with nuclear barcodes.” This is an impractical proposal of uncertain benefit. First of all, routinely adding a “nuclear barcode”, if one were to be found, would be solving a problem that does not exist, as there are few cases so far in which an mtDNA COI barcode does not distinguish closely-related species. Of course these exceptional cases need further taxonomic study “integrating DNA sequencing, morphology, and ecologic studies”. Secondly, although over 30 years of research demonstrate the broad utility of mtDNA in delimiting animal species, no one has yet identified a nuclear locus that can regularly distinguish closely-related species, as Dasmahapatra and Mallet acknowledge.

Most of the topics in their review are analyzed in our 2005 brochure “Barcoding Life, Illustrated”, which outlines the benefits and limitations of DNA barcoding, including a section on “Why barcode animals with mitochondrial DNA?”.

Establishing a DNA standard barcode for land plants

July 26, 2006

Kew Scientist April 2005

Plant researchers from 11 world herbaria are investigating DNA regions for their potential as barcodes for land plant species. From the project rationale: “although the mitochondrial gene region, CO1 (cox1), has already been used with considerable success across a range of animal groups and shows promise in at least some algal groups, it is characterized by relatively low rates of sequence divergence in land plants. Mitochondrial DNA in land plants also undergoes rearrangements, exhibits incorporation of foreign genes and frequent transfer of some genes to the nuclear genome. It is therefore desirable to find an alternative region or, if necessary, regions from one of the other genomes that would be suitable as a barcode.”

This project aims to establish a standard DNA barcode for land plants. Phase 1, completed in December 2005, was a survey of regions that have potential as land plant barcodes. Phase 2, to be completed by January 2007, is to “ground test” the most promising regions in a series of parallel case studies that incorporate representatives of all major land plant lineages.

If this competition is successful, it should be relatively straightforward (ie fast and inexpensive) to compile a comprehensive library of plant DNA barcodes, as there are only about 500,000 known plant species, the world catalog is thought to be essentially complete, and there are several herbaria with large specimen and DNA collections.

mtDNA sequences can define insect species

June 30, 2006

DNA-based species descriptions could enable a catalog of life on Earth. Without some sort of automated approach, I believe this goal is unattainable. Insects are a good place to start testing an automated sequence-based approach, as there are about 1 million insect species already described, and probably several million more to go. In the upcoming August 2006 Systematic Biology, Pons et al examine genus Rivacindela tiger beetles in Australia, providing an explicit test of a DNA sequence-based approach to defining species. They analyzed 468 individuals from 65 sites, using sequence data from 3 mitochondrial genes including the DNA barcode region of COI, and found sequence variation was strongly partitioned among 46 or 47 putative species, using a novel tree-based, quantitative method of species recognition based on fixed unique diagnostic characters. Most (40 to 43) of the species entities were recovered by analyzing the three gene regions separately; COI alone produced the closest match to the full data set. The putative species defined by sequence data exhibited biological properties of species in terms of geographic ranges and known morphologic characters. Average divergence within species was 0.5%, much lower than the average among species of 6.3% and between sister species of 2.2%. The sequence analysis took 3 days on a desktop computer, so if this approach proves useful, it can be a benchmark for testing faster methods.
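The contrast between within-species and among-species divergence can be illustrated with a short sketch. This is not the tree-based method of Pons et al; it is a plain pairwise comparison that assumes the sequences are already aligned to equal length and already carry species labels. The names `p_distance` and `within_and_between`, and the toy sequences, are hypothetical.

```python
from itertools import combinations
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Percent of aligned sites that differ between two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def within_and_between(labeled: list[tuple[str, str]]) -> tuple[float, float]:
    """Mean pairwise divergence within species and between species.

    `labeled` holds (species_label, aligned_sequence) pairs. Raises
    statistics.StatisticsError if either kind of pair is absent.
    """
    within, between = [], []
    for (sp1, seq1), (sp2, seq2) in combinations(labeled, 2):
        target = within if sp1 == sp2 else between
        target.append(p_distance(seq1, seq2))
    return mean(within), mean(between)


# Toy 20-base sequences, invented purely for illustration.
toy = [
    ("species_1", "ACGTACGTACGTACGTACGT"),
    ("species_1", "ACGTACGTACGTACGTACGA"),
    ("species_2", "ACGAACGAACGTACGTTCGT"),
    ("species_2", "ACGAACGAACGTACGTTCGT"),
]
within, between = within_and_between(toy)
print(f"mean within-species divergence:  {within:.1f}%")
print(f"mean between-species divergence: {between:.1f}%")
```

A large gap between the two averages is the pattern described above. A real analysis would work from aligned barcode sequences and would need a method that proposes the species groupings rather than taking them as given.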

Rockefeller University

Program for the Human Environment

Area of Research: DNA Barcoding