New places to find DNA

In the 29 July 2008 Fish Biology, scientists from Macquarie University, Sydney, describe successful recovery of mitochondrial DNA from contemporary and historical shark teeth and jaws. After developing the method on 11 recently collected teeth from Gray nurse shark Carcharias taurus and Ornate wobbegong (excellent name!) Orectolobus halei, Ahonen and Stowe applied it to 20-40 year old museum specimens, including 5 jaws from 3 species and 19 individual teeth from 2 species. They collected approximately 0.02-0.06 g of “tooth powder” by drilling several small holes into a tooth or jaw; DNA was extracted using a standard silica-based method or the Qiagen DNeasy tissue kit.

The authors are interested in historical population sizes for sharks; following the theory that genetic variation within species is an indicator of population size, they picked the hypervariable control region as their target. As an aside, results so far with mitochondrial surveys including DNA barcoding generally show very low variation within most animal species and no relationship between intraspecific variation and census population size. In any case, a 700 bp fragment of mtDNA control region was amplified with a single pair of primers. The two extraction methods gave similar results. DNA was amplified and sequenced from 100% of the contemporary samples and 15/34 (44%) historical samples. 700 bp is a relatively long sequence to amplify from historical samples, suggesting it may be possible to obtain standard COI barcodes (648 bp) from museum skeletons of sharks and bony fish, which would be particularly useful for those species which are rare or otherwise difficult to collect. A standard set of fish primers (see for example Hubert et al June 2008 PLoS ONE) amplifies the COI barcode region from most fish (more than 5,000 species so far, including representatives of all major divisions of Chondrichthyes (cartilaginous fish) and Osteichthyes (bony fish), both marine and freshwater).

To date most fish specimens are preserved in formaldehyde, which makes routine DNA recovery difficult or impossible. If DNA can be recovered from skeletons, there are many museum specimens that might be used. For example, the American Museum of Natural History Ichthyology Department collection includes over 35,000 fish skeletons as compared to about 2,500 tissue samples so far.

DNA differences first step in describing new spider species

In August 2008 Sys Biol (open access!), researchers from East Carolina University apply mtDNA analysis as the necessary first step in defining three new species of trapdoor spider, previously subsumed as a single species Aptostichus atomarius. According to Bond and Stockman, “the genus Aptostichus is species rich, consisting of 30+ species (most undescribed) found predominantly throughout southern California.” I note that the online World Spider Catalog version 9.0, 2008, lists four Aptostichus species, all described between 1891 and 1919, so apparently there is a lot more work to be done, including updating the reference lists. One of the new names is A. stephencolberti, which led to what must be the first appearance of a spider taxonomist on national television (link to TV episode).

The authors describe the challenge for delimiting species in these California trapdoor spiders: “Highly structured, genetically-divergent, yet morphologically homogeneous species (eg nonvagile cryptic species [my note: nonvagile refers to organisms with limited dispersal]), although often ignored or overlooked, provide one of the greatest challenges to delimiting species. Populations, or very small groups of populations constitute divergent genetic lineages but present somewhat of a contradiction because they lack the “requisite” characteristics often used when delimiting species. Morphological approaches to species delimitation in many of these groups grossly oversimplify and underestimate diversity; in short these traditional applications fail if our interests extend beyond what can simply be diagnosed with a visual and/or anthropomorphic-based assessment.”

So on the one hand, these spiders comprise multiple genetically distinct lineages (up to 24% sequence difference in 12S/16S mtDNA) with geographically restricted ranges; on the other hand, they all look more or less alike. How to decide which are species? The authors apply the “cohesion species concept” by asking if the lineages are “genetically and/or ecologically interchangeable.” The authors helpfully provide explicit details of their decision-making process. The short version is that genetically distinct, geographically disjunct lineages are counted as separate species, and parapatric or sympatric lineages are counted as different species only if they are NOT “ecologically interchangeable (EI).” EI is calculated from a defined set of ecological and climatic parameters.
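To make the “short version” concrete, here is a minimal sketch of that decision rule in Python. It encodes only my one-sentence summary above, not the authors' full procedure; the class and field names are hypothetical, and the EI outcome is taken as a given input rather than calculated from ecological and climatic parameters.

```python
from dataclasses import dataclass


@dataclass
class LineagePair:
    """Two lineages being compared (all fields are illustrative assumptions)."""
    genetically_distinct: bool          # e.g. separate, well-supported mtDNA clades
    geographically_disjunct: bool       # True if ranges do not touch or overlap
    ecologically_interchangeable: bool  # outcome of the EI assessment


def separate_species(pair: LineagePair) -> bool:
    """Apply the simplified rule summarized in the paragraph above."""
    if not pair.genetically_distinct:
        # No genetic distinction: nothing to split.
        return False
    if pair.geographically_disjunct:
        # Genetically distinct and geographically disjunct: separate species.
        return True
    # Parapatric or sympatric: separate only if NOT ecologically interchangeable.
    return not pair.ecologically_interchangeable
```

For example, `separate_species(LineagePair(True, False, True))` returns `False`: two genetically distinct lineages that share a range and are ecologically interchangeable stay lumped as one species under this rule.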

Under some criteria, the authors note these spiders could be split into “more than 20” [or even] “~60” groups, which they describe as “an unreasonable number of species-level lineages.” This conjecture may be true; I hope that more scientists apply similarly explicit criteria for species delimitation as described here so we can learn more about how finely divided biodiversity is, in addition to our judgment about what is a “reasonable” number of species. Genetics is a powerful window into biology, of course. In birds the frequency of extra-pair matings (up to 96% pairs and 75% offspring in fairy wrens, for example (Double and Cockburn 2000)) was unsuspected until genetic testing was applied to parents and offspring. 

The genetic framework in this study is based on 1300 bp of 12S/16S mtDNA (167 individuals, 75 locations), plus 905 bp nuclear ITS sequence in a subset of 22 individuals. Looking ahead, I hope that in their next study of spider phylogeography the authors include COI as an mtDNA locus (full-length sequence is 1500 bp, so that would likely have given the same phylogenetic signal as 12S/16S); this would enable the authors and others to combine their data with the reference COI DNA barcode databases.

I close with an observation about spider genetic data. To my eye, there are surprisingly few genetic data on spiders so far. A search in GenBank for Order Araneae (spiders) shows 9,445 sequences (representing any gene) from 1,852 species (4.6% of the world total of 40,432 species (World Spider Catalog)). Looking at mitochondrial genes, there are 2,629 COI sequences from 1,071 species (2.6% world) and 2,268 12S/16S sequences from 1,041 species (2.6% world). Thus it appears that only about 1/40th of the world’s spider species have a uniform gene locus deposited in GenBank, and on average, only about 2 individuals per species have been sequenced. The Spider Tree of Life project plans to sequence 50 loci (including COI and 12S/16S) from about 500 species, so that will help. I hope that arachnologists will follow the approach in this paper and include a standard genetic locus (most usefully COI) as part of species descriptions and analyze multiple individuals per species. Among other applications, this might help identify currently unidentifiable juvenile forms, like the wind-blown “little aeronaut[s]” that arrived on silk threads in vast numbers on the Beagle when it was sixty miles distant from land, November 1, 1832 (Voyage of the Beagle).
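The arithmetic behind those percentages is easy to re-run as the databases grow. The sketch below uses only the counts quoted above (my GenBank search results and the World Spider Catalog total); the variable and function names are my own, and "sequences per species" is used as a rough stand-in for individuals per species.

```python
WORLD_SPIDER_SPECIES = 40_432  # World Spider Catalog total quoted above

# locus -> (number of GenBank sequences, number of species represented)
genbank_counts = {
    "any gene": (9_445, 1_852),
    "COI": (2_629, 1_071),
    "12S/16S": (2_268, 1_041),
}


def percent_of_world(n_species: int, world: int = WORLD_SPIDER_SPECIES) -> float:
    """Share of the world's described spider species, in percent."""
    return 100 * n_species / world


def sequences_per_species(n_sequences: int, n_species: int) -> float:
    """Average number of sequences per represented species."""
    return n_sequences / n_species


for locus, (n_sequences, n_species) in genbank_counts.items():
    print(
        f"{locus}: {percent_of_world(n_species):.1f}% of species, "
        f"{sequences_per_species(n_sequences, n_species):.1f} sequences per species"
    )
```

For COI this gives about 2.6% of species and roughly 2.5 sequences per species, which is where the "1/40th" and "about 2 individuals" figures come from.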

Wired, Scientific American highlight DNA-based future of species identification

In October 2008 Wired reporter Gary Wolf profiles birth and rapid growth of standardized DNA-based species identification (ie DNA barcoding). His article centers around time spent in Costa Rica with Dan Janzen, Winnie Hallwachs, and their band of parataxonomists in Area de Conservacion Guanacaste; additional legwork includes visits to worried taxonomists at University of California Berkeley (“Honestly, I never thought it would get this far,” says Kipling Will), and University of Guelph, Ontario. He concludes with an evocative analogy: “barcodes are not just devices to put names on animals; they are also clever traps to catch all the people in the world whose curiosity impels them toward data as if toward light.”

An article in October 2008 Scientific American, with Sci Am’s trademark excellent illustrations (web version; pdf), examines the hows and whys of the DNA-based future of species identification (I am co-author with Paul Hebert). After discussing the many practical applications for identifying known species, we conclude with our own analogy: “Just as the speed and economy of aerial photography caused it to supplant ground surveys as the first line of land analysis, DNA barcoding can be a rapid, relatively inexpensive first step in species discovery.”

DNA plus morphology speeds taxonomy

In May 2008 PLoS One researchers from California Academy of Sciences and University of Guelph analyze morphology and COI barcodes of Madagascar ants in genera Anochetus and Odontomachus.  Their taxonomic revision is “based on arthropod surveys in Madagascar that included over 6,000 leaf litter samples, 4,000 pitfall traps, and 8,000 additional collecting events…from 1992 to 1996”–phew!  Researchers Fisher and Smith used COI sequence data of 501 individuals to speed their analysis and provide an accessible reference for future work. 

First, COI barcodes enable associating the various caste forms including males and females within species. Second, DNA barcodes provide an additional tool for matching names with type specimens. For example, Meusnier et al have recently applied broad-range primers to amplify a 130 base pair “universal mini-barcode” (this lies within the 648 bp full-length COI barcode sequence). The mini-barcode can more easily be amplified from older museum material with partly degraded DNA, and usually contains enough sequence information to associate older specimens with more recently collected material. Third, distinct genetic clusters within morphologically undifferentiated ant species suggest avenues for future study. Fourth, DNA barcodes establish a method for future workers, not skilled in ant morphology, to identify specimens. For example, not many persons will be able to recognize males of Malagasy Anochetus by “shortest distance between lateral ocellus and margin of compound eye smaller than maximum length of ocellus. Petiolar node as seen from front or rear with lateral corners rounded, without acute spine or sharp tooth.” There are multiple high-resolution photos of each described species posted on AntWeb; I find these just as mysterious as the text descriptions.

As a test of how DNA barcoding might work for the interested ant novice, I collected the tiny specimen at left in Rincon, Puerto Rico, and submitted its COI barcode to the BOLD ID engine. This gave a 100% match to Paratrechina longicornis, and on the corresponding Encyclopedia of Life page I learned that the common name is “Crazy ant” and that it is an invasive species found worldwide; I also found many interesting links, including to AntWeb P. longicornis pages. It was amusing to learn that Crazy ants overran Biosphere II and were one factor leading to the demise of the project (link to NYT article). Of course not all 100+ Paratrechina spp are in BOLD, and there may be a closely related species with a COI barcode sequence similar or identical to that of P. longicornis, so more work is needed to build up the database!

Worried taxonomists discover quality control

In 9 September 2008 Proc Natl Acad Sci USA researchers from Brigham Young University and University of South Carolina report that nuclear pseudogenes, if not excluded from analysis, can confuse COI DNA barcoding studies.  To my reading, this study re-iterates a well-understood hazard and proposes remedies that are already standard in most phylogenetic DNA work including DNA barcoding. 

Pseudogenes, first described by Jacq, Miller, and Brownlee in 1977, are non-functional genes that presumably arose from ancient duplication events and subsequent loss of function through accumulation of mutations. In sequencing studies, pseudogenes of protein coding genes are usually easily distinguished from their functional counterparts as they harbor insertions, deletions, and/or point mutations that interrupt the reading frame.

Pseudogenes derived from mitochondrial DNA, often called numts (nuclear copies of mtDNA), were first reported by Gellissen et al in 1983. A search of NCBI PubMed for “mitochondrial pseudogenes” shows 282 articles and 12 review articles over the past 25 years.

Song and colleagues analyzed mitochondrial COI sequences in grasshoppers (single individuals of four species representing different Acrididae subfamilies) and cave crayfish (119 individuals of four species in genus Orconectes collected at 56 localities in southeastern US). Most of the analyses involved sequencing of cloned PCR products, which adds a level of complexity and is unlike any DNA barcoding study I am aware of. To skip to the conclusion, the authors emphasize that if numts generated by PCR amplification of mtCOI are NOT excluded, they will confuse DNA barcoding or other phylogenetic studies. Since most of the numts generated in this study were easily recognized, I do not understand why they did so much work (in all they sequenced 125 grasshopper clones and 560 crayfish clones) to reach this sensible but obvious conclusion.

First, grasshoppers. The authors amplified a subsegment of the COI barcode region (439 vs 648 bp in full-length barcode region; shorter amplicons are more likely to represent pseudogenes). The amplified products from the four individual grasshoppers were cloned, and 30 clones/species were sequenced, generating an average of 15 unique haplotypes per species. Of these, 97.3% had stop codons, meaning they could be immediately excluded as not representing true mtCOI sequences.  A full-length barcode sequence was amplified from 1 species, and cloned products yielded 19 paralogues (ie obvious pseudogenes).

Second, crayfish. The researchers amplified the full-length COI barcode region from 172 individuals using Folmer primers. “For 93 individuals, we were able to obtain clean COI sequences; however, 79 individuals from southern populations of O. australis and O. barri yielded ambiguous sequences.” To my reading, the next step would be to stop there and find different primers or PCR conditions that did not generate ambiguous sequences (indicating that more than one COI-like template was being amplified). Instead the authors proceeded to clone products from individuals that yielded ambiguous results and also from those with clean sequences “to determine whether numts were present but not being detected without cloning.” Not surprisingly, they found probable numts in all 4 species of crayfish, and interestingly some of the clones did NOT contain stop codons (ie might be mistaken for functional COI sequences). These apparent numts, which might be easily overlooked, came from the 2 species with ambiguous results on sequencing of uncloned products, which I take as further evidence that it would have been better to develop a different COI amplification protocol, assuming the goal is to accurately determine the barcode sequence.

Among other quality control standards in Barcode of Life Database (BOLD), COI sequences with stop codons, such as found in most pseudogenes in this study, are automatically flagged, signalling the researcher to re-check the data.
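For readers who want to see what a stop-codon check amounts to, here is a minimal Python sketch. It is my own illustration, not BOLD's actual implementation. It assumes the invertebrate mitochondrial genetic code (NCBI translation table 5), in which TAA and TAG are the stop codons and TGA codes for tryptophan, and it assumes the reading frame of the amplicon is unknown, so all three forward frames are scanned. The function names are hypothetical.

```python
# Stop codons under the invertebrate mitochondrial code (NCBI translation table 5).
# Assumption: vertebrate mtDNA would need a different set (table 2).
STOP_CODONS = {"TAA", "TAG"}


def stop_positions(seq: str, frame: int = 0) -> list:
    """Return 0-based start positions of in-frame stop codons for one reading frame."""
    seq = seq.upper()
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


def flag_possible_pseudogene(seq: str) -> bool:
    """Flag a sequence if every forward reading frame contains a stop codon.

    A functional COI fragment should have at least one frame with no stops.
    """
    return all(stop_positions(seq, frame) for frame in range(3))
```

A sequence such as `"ATGGCCTGAGCC"` is not flagged, because frame 0 reads through (TGA is not a stop under this code), whereas `"TAAATAGCTAA"` is flagged because each of the three frames hits TAA or TAG. Note the limitation the crayfish results point to: a numt with no stop codons would pass this screen, so it is a first filter and not proof of a true mitochondrial sequence.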

Finally, it may be that some of what the authors call numts instead reflect heteroplasmy, ie differences among individual mitochondrial DNAs. Like static noise generated when you turn the volume up all the way, cloning is likely to reveal various mutations in some of the 10^17 or so mitochondrial genomes present in eukaryotic organisms. Looking ahead, it seems to me that the authors have missed an opportunity to contribute protocols or sequences that could be applied by other researchers to DNA barcoding of grasshoppers or crayfish.

NY Times again per K&L

Sushigate continues! The discovery of inaccurately labeled fish by Kate Stoeckle and Louisa Strauss (see 22 August 2008 What’s New entry) evoked a NY Times sequel story where chefs claimed supreme expertise, but then today Edward Dolnick countered with a wise Op-Ed “Fish or Foul” about the judgment of experts.