DNA Barcoding – Page 22 – The Rockefeller University

Wired, Scientific American highlight DNA-based future of species identification

September 21, 2008

In October 2008 Wired reporter Gary Wolf profiles birth and rapid growth of standardized DNA-based species identification (ie DNA barcoding). His article centers around time spent in Costa Rica with Dan Janzen, Winnie Hallwachs, and their band of parataxonomists in Area de Conservacion Guanacaste; additional legwork includes visits to worried taxonomists at University of California Berkeley (“Honestly, I never thought it would get this far,” says Kipling Will), and University of Guelph, Ontario. He concludes with an evocative analogy: “barcodes are not just devices to put names on animals; they are also clever traps to catch all the people in the world whose curiosity impels them toward data as if toward light.”

An article in October 2008 Scientific American, with Sci Am’s trademark excellent illustrations, (web version; pdf) examines hows and whys of DNA-based future of species identification (I am co-author with Paul Hebert). After discussing the many practical applications for identifying known species, we conclude with our own analogy: “Just as the speed and economy of aerial photography caused it to supplant ground surveys as the first line of land analysis, DNA barcoding can be a rapid, relatively inexpensive first step in species discovery.”

DNA plus morphology speeds taxonomy

September 14, 2008

In May 2008 PLoS One researchers from California Academy of Sciences and University of Guelph analyze morphology and COI barcodes of Madagascar ants in genera Anochetus and Odontomachus. Their taxonomic revision is “based on arthropod surveys in Madagascar that included over 6,000 leaf litter samples, 4,000 pitfall traps, and 8,000 additional collecting events…from 1992 to 1996”–phew! Researchers Fisher and Smith used COI sequence data of 501 individuals to speed their analysis and provide an accessible reference for future work.

First, COI barcodes enable associating the various caste forms including males and females within species. Second, DNA barcodes provide an additional tool for matching names with type specimens. For example, Meusnier et al have recently applied broad-range primers to amplify a 130 base pair “universal mini-barcode” (this lies within the 648 bp full-length COI barcode sequence). The mini-barcode can more easily be amplified from older museum material with partly degraded DNA, and usually contains enough sequence information to associate older specimens with more recently collected material. Third, distinct genetic clusters within morphologically undifferentiated ant species suggest avenues for future study. Fourth, DNA barcodes establish a method for future workers, not skilled in ant morphology, to identify specimens. For example, not many persons will be able to recognize males of Malagasy Anochetus by “shortest distance between lateral ocellus and margin of compound eye smaller than maximum length of ocellus. Petiolar node as seen from front or rear with lateral corners rounded, without acute spine or sharp tooth.” There are multiple high-resolution photos of each described species posted on AntWeb; I find these just as mysterious as the text descriptions.

As a test of how DNA barcoding might work for the interested ant novice, I collected the tiny specimen at left in Rincon, Puerto Rico, and submitted its COI barcode to BOLD ID engine. This gave 100% match to Paratrechina longicornis, and on the corresponding Encyclopedia of Life page, I learned the common name is “Crazy ant”, an invasive species found worldwide, plus found many interesting links including to AntWeb P. longicornis pages. It was amusing to learn that Crazy ants overran Biosphere II and were one factor leading to demise of the project (link to NYT article). Of course not all 100+ Paratrechina spp are in BOLD, and there may be a closely-related species with a COI barcode sequence similar or identical to that of P. longicornis, so more work is needed to build up the database!

Worried taxonomists discover quality control

September 5, 2008

In 9 September 2008 Proc Natl Acad Sci USA researchers from Brigham Young University and University of South Carolina report that nuclear pseudogenes, if not excluded from analysis, can confuse COI DNA barcoding studies. To my reading, this study re-iterates a well-understood hazard and proposes remedies that are already standard in most phylogenetic DNA work including DNA barcoding.

Pseudogenes, first described by Jacq, Miller, and Brownlee in 1977, are non-functional genes that presumably arose from ancient duplication events and subsequent loss of function through accumulation of mutations. In sequencing studies, pseudogenes of protein coding genes are usually easily distinguished from their functional counterparts as they harbor insertions, deletions, and/or point mutations that interrupt the reading frame.

Pseudogenes derived from mitochondrial DNA, often called numts (nuclear copies of mtDNA), were first reported by Gellissen et al in 1983. A search of NCBI PubMed for “mitochondrial pseudogenes” shows 282 articles and 12 review articles over the past 25 years.

Song and colleagues analyzed mitochondrial COI sequences in grasshoppers (single individuals of four species representing different Acrididae subfamilies) and cave crayfish (119 individuals of four species in genus Orconectes collected at 56 localities in southeastern US). Most of the analyses involved sequencing of cloned PCR products, which adds a level of complexity and is unlike any DNA barcoding study I am aware of. To skip to the conclusion, the authors emphasize that if numts generated by PCR amplification of mtCOI are NOT excluded, then they will confuse DNA barcoding or other phylogenetic studies. Since most of the numts generated in this study were easily recognized, I do not understand why they did so much work (in all they sequenced 125 grasshopper clones and 560 crayfish clones) to reach this sensible but obvious conclusion.

First, grasshoppers. The authors amplified a subsegment of the COI barcode region (439 vs 648 bp in full-length barcode region; shorter amplicons are more likely to represent pseudogenes). The amplified products from the four individual grasshoppers were cloned, and 30 clones/species were sequenced, generating an average of 15 unique haplotypes per species. Of these, 97.3% had stop codons, meaning they could be immediately excluded as not representing true mtCOI sequences. A full-length barcode sequence was amplified from 1 species, and cloned products yielded 19 paralogues (ie obvious pseudogenes).

Second, crayfish. The researchers amplified the full-length COI barcode region from 172 individuals using Folmer primers. “For 93 individuals, we were able to obtain clean COI sequences; however, 79 individuals from southern populations of O. australis and O. barri yielded ambiguous sequences.” To my reading, the next step would be to stop there and find different primers or PCR conditions that did not generate ambiguous sequences (indicating that more than one COI-like template was being amplified). Instead the authors proceeded to clone products from individuals that yielded ambiguous results and also from those with clean sequences “to determine whether numts were present but not being detected without cloning.” Not surprisingly, they found probable numts in all 4 species of crayfish, and interestingly some of the clones did NOT contain stop codons (ie might be mistaken for functional COI sequences). These apparent numts, which might be easily overlooked, came from the 2 species with ambiguous results on sequencing of uncloned products, which I take as further evidence that it would have been better to develop a different COI amplification protocol, assuming the goal is to accurately determine the barcode sequence.

Among other quality control standards in Barcode of Life Database (BOLD), COI sequences with stop codons, such as found in most pseudogenes in this study, are automatically flagged, signalling the researcher to re-check the data.
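
As a rough illustration of this kind of check, here is a minimal Python sketch that looks for in-frame stop codons in a COI sequence. It is not BOLD's actual implementation; the function names are hypothetical, and the stop-codon sets are taken from the NCBI genetic code tables for vertebrate and invertebrate mitochondria (note that TGA encodes tryptophan in both, so it is not treated as a stop).

```python
# Hypothetical helper names; not BOLD's implementation.
# Stop codons per NCBI translation tables 2 (vertebrate mitochondrial)
# and 5 (invertebrate mitochondrial). TGA codes for Trp in both tables.
STOP_CODONS = {
    "vertebrate_mito": {"TAA", "TAG", "AGA", "AGG"},
    "invertebrate_mito": {"TAA", "TAG"},
}


def in_frame_stops(seq, frame=0, code="invertebrate_mito"):
    """Return 0-based start positions of stop codons in one reading frame."""
    seq = seq.upper()
    stops = STOP_CODONS[code]
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in stops]


def flag_possible_pseudogene(seq, code="invertebrate_mito"):
    """Flag a sequence when all three forward frames contain a stop codon.

    A functional COI fragment should have at least one open forward frame.
    """
    return all(in_frame_stops(seq, frame, code) for frame in range(3))


if __name__ == "__main__":
    open_frame = "GCAGGAACAGGATGAACAGTA"   # contains TGA, which is Trp here
    interrupted = "TAAATAGGTAAC"           # a stop codon in every frame
    print(flag_possible_pseudogene(open_frame))
    print(flag_possible_pseudogene(interrupted))
```

A check like this only catches numts that carry stop codons; as noted above, some of the crayfish clones did not, so a clean reading frame is necessary but not sufficient.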

Finally, it may be that some of what the authors call numts instead reflect heteroplasmy, ie differences among individual mitochondrial DNAs. Like static noise generated when you turn the volume up all the way, cloning is likely to reveal various mutations in some of the 10^17 or so mitochondrial genomes present in eukaryotic organisms. Looking ahead, it seems to me that the authors have missed an opportunity to contribute protocols or sequences that could be applied by other researchers to DNA barcoding of grasshoppers or crayfish.

CSI for fish: High school students help showcase ease of DNA-based species identification

August 26, 2008

In Pacific Fishing September 2008 (issue available on newsstands, not yet on web) two New York City teenagers, Kate Stoeckle (my daughter!), 19, and classmate Louisa Strauss, 18, apply DNA-based identification to fish sold in their Manhattan neighborhood. The girls purchased 60 items from 14 establishments and sent samples to University of Guelph where graduate student Eugene Wong performed DNA barcode analysis. 14 (25%) of 56 samples with recoverable DNA were mislabeled, in all cases as more expensive or more desirable fish. Mislabeled items were sold at 2 of 4 restaurants and 6 of 10 grocery stores/fish markets.

The frequency of mislabeling and the ease of high-schoolers obtaining DNA-based species identifications captured public interest. Their study was featured on page 1 of the New York Times on August 22, as one of the “quotes of the day” on CNN/Time website, in a live segment on CBS TV Early Show, and in an interview on National Public Radio. It has appeared in over 350 print and news items and blogs from 34 countries in 10 languages so far, with particularly heavy coverage in China, Korea, and Japan, presumably related to dietary importance of fish in general and sushi in particular.

This response demonstrates how powerful the FishBOL database is already (30,665 barcodes from 5,463 species so far, which represents about 20% of world fish), and hints at enormous uses that DNA barcoding will have as technology gets smaller and cheaper.

Summer hiatus

July 28, 2008

I will be away from Blog until mid-August.

Helping reveal relationships among species

July 27, 2008

COI barcodes aim to enable identification of species, assigning unknown specimens to known species, and helping flag genetically divergent organisms that may represent new species. Might barcodes also help understand relationships among species?

Here I look at one example from birds, comparing differences among COI barcodes to a recently revised phylogeny of terns (subfamily Sternini). According to American Ornithologists’ Union Check-list of North American Birds Supplement 47 (2006), “the data show that the genus Sterna as currently defined…is paraphyletic.”… “[W]e follow the recommendation of Bridge et al 2005 to resurrect four generic names currently placed in synonymy with Sterna.” The figure at left, taken from the 2005 paper by Bridge, Jones, and Baker, shows phylogenetic relationships based on 2800 bp of mtDNA from 33 species of terns (Bayesian tree with ML distances and ML bootstrap support indices), and is juxtaposed to an NJ tree of COI barcodes from 29 of the same 33 taxa. The figure is colored according to the revised generic assignments (AOU 2006).

The topology of the COI NJ tree is similar to that of the larger data set tree: all currently recognized genera are reciprocally monophyletic, and most show bootstrap values as high as those in the Bayesian/ML analysis based on the larger data set.

Of course mitochondrial DNA is widely used in analyzing relationships among animal species, including birds. Most of these studies are focused on relatively small groups of species, such as the tern study cited here. With growing DNA barcode libraries it will be increasingly possible to get at least a preliminary look at genetic relationships for large numbers of species (so far 2,393 avian species (24% of world birds) have barcode records in BOLD). This could be exciting!

Mitochondrial DNA’s unique power

July 20, 2008

The DNA barcode for animals is a 648 base pair (bp) fragment from the 5′ end of mitochondrial gene cytochrome c oxidase subunit I (COI). Does this relatively short mitochondrial sequence contain enough information to make evolutionary inferences about species limits, or is it more of a rough survey method that needs to be confirmed by more data including from nuclear genes?

In April 2008 Mol Ecol researchers from University of Minnesota and American Museum of Natural History, New York, analyze utility of mitochondrial as compared to nuclear DNA for inferring recent evolutionary history. Zink and Barrowclough first apply population genetic theory and then look at real data from bird species.

Based on mathematical population genetics, they find “mitochondrial loci are generally a more sensitive indicator of population structure than are nuclear loci,” primarily due to much smaller effective population size (Ne) for mitochondrial as compared to nuclear markers, which leads to more rapid sorting of differences among genetically isolated populations. Analysis of real-world data in 45 studies of differences among and within avian species confirms this expectation, ie either the patterning is consistent between mitochondrial and nuclear genes, or there are shallow mtDNA trees which are not yet reflected in nuclear genes. Reanalysis of one study, which appears to show a split in nuclear but not mitochondrial markers, suggests possible misinterpretation. Regarding other factors that could potentially lead to mistaken inferences about species limits based on mitochondrial DNA (NUMTs, sex-biased gene flow, introgression), experimental data suggests these are rarely important, at least in birds. The authors conclude “mtDNA patterns will prove to be robust indicators of population history and species limits.” Nuclear markers ARE important for deep gene trees, for detecting hybrids, and for “quantitative estimates…of rates of population growth and values of gene flow.”

Regarding length of mitochondrial sequence, this only has to be long enough to capture differences among closely-related species. Most populations that we recognize as species differ from their closest relatives by >1% in mitochondrial coding regions (corresponding to about 0.5 million years or more of reproductive isolation). At this level, even 100 bp is generally sufficient to distinguish most closely-related species, and a 648 bp COI barcode sequence should generally allow resolution of populations/species which have been reproductively isolated for much shorter periods of time.
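
The arithmetic behind these figures can be sketched in a few lines of Python. This is a back-of-envelope calculation, not a method from the Zink and Barrowclough paper: it assumes a simple linear clock of 2% pairwise divergence per million years, which is the rate implied by the "1% is about 0.5 million years" figure above, and it ignores stochastic variation. The function names are illustrative.

```python
# Assumption: a linear molecular clock of 2% pairwise divergence per million
# years, the rate implied by "1% ~ 0.5 million years" in the text above.
RATE_PER_MYR = 0.02


def isolation_time_myr(divergence, rate_per_myr=RATE_PER_MYR):
    """Approximate isolation time (million years) for a pairwise divergence."""
    return divergence / rate_per_myr


def expected_differences(length_bp, divergence):
    """Expected number of differing sites in a fragment of the given length."""
    return length_bp * divergence


def smallest_detectable_divergence(length_bp):
    """Divergence that corresponds to a single differing site."""
    return 1 / length_bp


if __name__ == "__main__":
    for length in (100, 648):
        diffs = expected_differences(length, 0.01)
        floor = smallest_detectable_divergence(length)
        print(
            f"{length} bp: {diffs:.1f} expected differences at 1% divergence; "
            f"one difference = {floor:.2%} divergence, "
            f"about {isolation_time_myr(floor):.2f} million years"
        )
```

Under these assumptions, a 1% divergence yields about one expected difference in 100 bp and about six or seven in 648 bp, while a single difference in 648 bp corresponds to roughly 0.15% divergence, or well under 0.1 million years of isolation.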

Identifying the unidentifiable

July 13, 2008

In July 2008 Wildlife Management researchers from Smithsonian Institution report on identifying otherwise unidentifiable remnants from bird-aircraft collisions (hereafter birdstrikes). Authors Dove et al point out “birdstrikes are a serious safety hazard and a major expense for the industry”. The US Federal Aviation Administration Wildlife Mitigation site shows about 600 incidents a month over the past year, peaking in late summer and early fall, presumably coincident with fall migration. The Smithsonian Institution has been identifying birdstrike species for military and civil aviation industries since the 1960s, analyzing specimens which range from whole carcasses to bits of feathers, tissue, or blood. Prior to availability of DNA testing, identifications relied on expert examination of detailed feather morphology with comparisons to Smithsonian’s vast bird specimen collection.

Of 1,715 birdstrike samples sent to Smithsonian Institution during 4 months in fall 2006, 821 (47.9%) contained only blood or tissue. Of these, 554 (67.5%) had amplifiable mtCOI DNA, and 535 (96.6%) of those with DNA led to species-level identifications based on reference sequences in Barcode of Life Database (BOLD). DNA barcoding identified 128 species representing 14 orders of birds, plus 2 bat species. 19 cases were deemed inconclusive as DNA barcode matched to a set of 2 or more closely-related species with overlapping barcodes, or the recovered sequence did not meet their 98% match criterion when compared to BOLD.
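
The decision logic described here (accept a match at or above 98%, but call it inconclusive if several species tie) can be sketched as follows. This is a simplified illustration, not the BOLD ID engine or the authors' actual procedure: it assumes the query and reference sequences are already aligned to the same length, and the function names and species labels are placeholders.

```python
def percent_identity(a, b):
    """Percent of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def identify(query, references, threshold=98.0):
    """Classify a query against (species, sequence) reference pairs.

    Returns (status, species_list). The 98% default mirrors the match
    criterion mentioned in the text; everything else is an assumption.
    """
    if not references:
        raise ValueError("at least one reference sequence is required")
    scored = [(percent_identity(query, seq), species) for species, seq in references]
    best = max(score for score, _ in scored)
    if best < threshold:
        return "inconclusive: below threshold", []
    top_species = sorted({species for score, species in scored if score == best})
    if len(top_species) > 1:
        return "inconclusive: multiple species", top_species
    return "identified", top_species


if __name__ == "__main__":
    ref_a = "ACGT" * 25            # toy 100 bp reference
    ref_b = "TTT" + ref_a[3:]      # differs from ref_a at 3 of 100 sites
    print(identify(ref_a, [("Species A", ref_a), ("Species B", ref_b)]))
    print(identify(ref_a, [("Species A", ref_a), ("Species B", ref_a)]))
    print(identify(ref_b, [("Species A", ref_a)]))
```

The three toy calls correspond to the three outcomes in the study: a species-level identification, a tie among species with overlapping barcodes, and a best match that falls short of the threshold.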

There was much better success recovering DNA from dry samples (70%) than from samples collected with a wet paper towel (about 23%), which had been the standard method, pointing the way toward improving yield of DNA-based ID. The authors conclude with a call for applying “a combination of morphological and molecular methods such as DNA barcoding for efficient, cost-effective birdstrike identifications”.

Just as in CSI television series, DNA-based identification can make possible what would otherwise be impossible; in this case, identifying birds from bits of tissue and blood and making birdstrike identifications available to those without access to Smithsonian’s experts or vast collections. In addition to helping airlines, birdstrike ID will inform our knowledge of bird migration routes. There are many exciting discoveries ahead.

DNA identifies invasive parasitic wasp

July 7, 2008

Like the creatures Sigourney Weaver battles in Alien, parasitoids are organisms whose larvae develop in other species, usually leading to the death of the host. Insect parasitoids are widely used as biological control agents; sometimes these efforts go awry, threatening non-pest species in local ecosystems. Widespread introduction of tachinid fly parasitoid Compsilura concinnata has failed to control Gypsy moth Lymantria dispar outbreaks in eastern US, but has led to dramatic declines in large, showy Silk moths including the beautiful Luna moth Actias luna (Elkinton and Boettner 2004).

About 10% of named insect species are parasitoids, mostly wasps, but recognizing these often minute insects can be tricky. In November 2007 Conservation Genetics researchers from Czech Academy of Sciences and University of South Bohemia, Czech Republic; Natural History Museum, London; and Imperial College London apply COI DNA sequencing to identify wasps parasitizing Canary Islands Large White butterfly Pieris cheiranthi, which is restricted to local endemic ecosystem of relict laurel forests. Lozan et al reared 55 P. cheiranthi caterpillars from 2 Canary Island sites, and found half of the larvae from forest margin and none from central forest were parasitized with what appeared to be Cotesia glomerata, native to Europe and introduced elsewhere as biocontrol agent although not in Canary Islands.

3 of 600 C. glomerata-like adult wasps reared from Canary Island White larvae and 2 of 700 C. glomerata reared in Czech Republic from European Large White P. brassicae larvae were analyzed and found to have identical 5′ COI DNA sequences (this is the same region selected as a DNA barcode for animals). The authors conclude that European C. glomerata has been accidentally introduced to Canary Islands and is threatening a local endemic butterfly already under pressure from habitat loss. Without mentioning DNA barcoding by name, the authors conclude with a call for “increased effort to sequence morphological Cotesia spp. from a broad geographical range…enabling the regular testing of species hypotheses…and the incorporation of all life stages using a single character set”. I hope that the authors can join forces and enable their sequences and associated metadata (eg collection location, specimen photographs, voucher information) from this and future Cotesia spp work to be usefully combined with growing COI barcode database (>415,000 COI barcode records from >41,000 species in BOLD so far, including 514 records from 89 named and provisional Cotesia spp). Looking ahead, routine application of DNA-based identification to parasitoids will help establish host ranges of potential biocontrol agents and detect inadvertent introduction of broad-range parasitoids that damage local ecosystems.

Freshwater fish DNA data debut

June 22, 2008

In June 2008 PLoS ONE, thirteen researchers from nine Canadian universities, museums, and federal agencies report on mtDNA sequences from 1360 individuals representing 195 (95%) of Canada’s 205 freshwater fish species. Hubert et al follow “best practices” established for DNA barcode records (similar criteria would enhance the value of other genetic reference data as well), namely each sequence is derived from a vouchered specimen and the barcode record includes:

“Bi-directional sequences of at least 500 base-pairs from the approved barcode region of COI, containing no ambiguous sites
Links to electropherogram trace files available in the NCBI Trace Archive
Sequences for the forward and reverse PCR amplification
Species names that refer to documented names in a taxonomic publication or other documentation of the species concept used
Links to voucher specimens using the approved format of institutional acronym:collection code:catalog ID number”
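
Some of these criteria lend themselves to automated checking. The sketch below is only an illustration under stated assumptions: the record is a plain dictionary, the field names (`sequence`, `voucher`, `trace_files`) and the example voucher string are hypothetical, and it is not the validation used by BOLD or GenBank.

```python
import re

# Voucher format from the quoted criteria:
# institutional acronym:collection code:catalog ID number
VOUCHER_PATTERN = re.compile(r"^[^:]+:[^:]+:[^:]+$")


def check_barcode_record(record, min_length=500):
    """Return a list of problems found in a barcode record (empty if none).

    `record` is a dict with hypothetical keys: "sequence", "voucher",
    and "trace_files".
    """
    problems = []
    seq = record.get("sequence", "").upper()
    if len(seq) < min_length:
        problems.append(f"sequence shorter than {min_length} bp")
    if set(seq) - set("ACGT"):
        problems.append("sequence contains ambiguous sites")
    if not VOUCHER_PATTERN.match(record.get("voucher", "")):
        problems.append("voucher is not in acronym:collection:catalog format")
    if not record.get("trace_files"):
        problems.append("no trace files linked")
    return problems


if __name__ == "__main__":
    record = {
        "sequence": "ACGT" * 150,                 # toy 600 bp sequence
        "voucher": "XYZ:Fish:12345",              # made-up voucher string
        "trace_files": ["forward.ab1", "reverse.ab1"],
    }
    print(check_barcode_record(record))
```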

The researchers analyzed an average of 7.6 specimens/species, with an effort to sample across species ranges. A first pass look at genetic distances among and within Canadian freshwater fish shows results similar to those of other animal groups: average variation within species, 0.3%; average minimum distance between congeneric species (nearest neighbor), 8.3%; species with overlapping mtDNA sequences, 7% (4 species pairs and 1 flock of 5 species; one of the overlapping species pairs represents probable introgression). Five species showed divergent clusters differing by 1-2% in different parts of their geographic ranges, and 2 species showed larger divergences (3%, 7%); some or all of these might represent distinct species.
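
For readers curious how such summary figures are derived, here is a minimal Python sketch of the two calculations: mean distance within species and nearest-neighbor distance between species. It makes several simplifying assumptions that differ from a real analysis: sequences are pre-aligned and of equal length, distances are raw proportions of differing sites (barcode studies commonly report Kimura 2-parameter distances instead), and all records are taken to belong to one genus. Function names are illustrative.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within_species(records):
    """Mean p-distance over all pairs of records that share a species name.

    `records` is a list of (species, sequence) pairs. Returns None when no
    species has two or more records.
    """
    dists = [
        p_distance(s1, s2)
        for (sp1, s1), (sp2, s2) in combinations(records, 2)
        if sp1 == sp2
    ]
    return sum(dists) / len(dists) if dists else None


def nearest_neighbor(records):
    """For each species, the minimum p-distance to a record of another species."""
    nearest = {}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        if sp1 == sp2:
            continue
        d = p_distance(s1, s2)
        for sp in (sp1, sp2):
            nearest[sp] = min(d, nearest.get(sp, d))
    return nearest


if __name__ == "__main__":
    toy = [
        ("sp1", "AAAAAAAAAA"),
        ("sp1", "AAAAAAAAAT"),
        ("sp2", "AAAAATTTTT"),
        ("sp2", "AAAAATTTTT"),
    ]
    print(mean_within_species(toy))
    print(nearest_neighbor(toy))
```

A nearest-neighbor distance of zero between two species is what the text calls overlapping sequences.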

A challenge for science publishing is disseminating the large data sets that are increasingly generated. Restricting publication to only those studies with novel findings can lead to a kind of distortion, sometimes with serious consequences. The bias against negative studies, for example, is one factor contributing to the miscalculation of risks of medicines. As biodiversity genetics moves forward, we need ways to ensure high-quality work, give researchers appropriate academic credit, and disseminate results in a timely manner. PLoS ONE describes itself as “an international, peer-reviewed, open-access, online publication…that welcomes reports on primary research from any scientific discipline.” It seems to me that this sort of forum with a focus on quality rather than novelty is needed as a home for publication of large genetic data sets including DNA barcode records. Making this information available in a timely manner will in turn help drive development of analytic and display tools and enable scientific applications, such as identification of fish eggs and larvae shown above.
