DNA Barcoding – Page 19 – The Rockefeller University

www.iBarcode.org: web tools for sequence analysis

July 8, 2009

In 16 June 2009 BMC Bioinformatics, researchers from University of Guelph report on a web platform for DNA barcode analysis, www.iBarcode.org. The site works with aligned barcode files in standard .fas format, such as those produced by MEGA or BOLD. Registration is not required; the site keeps track of files you have uploaded.

According to authors Singer and Hajibabaei, iBarcode is designed to “allow the user to manage their barcode datasets, cull out non-unique sequences, identify haplotypes within a species, and examine the within- to between-species divergences.” iBarcode provides several clever, easy-to-use tools and I look forward to further refinements.
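For readers who want to try the same kind of analysis offline, here is a minimal sketch of the three tasks quoted above: collapsing identical sequences into haplotypes within a species, and comparing the largest within-species distance to the smallest between-species distance. It is not iBarcode's code. The record layout (specimen id, species, aligned sequence), the function names, and the toy sequences are all assumptions made for illustration, and the distance used is a simple uncorrected p-distance.

```python
from collections import defaultdict
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    counting only columns where both have an unambiguous base."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def haplotypes(records):
    """Group specimen ids by (species, identical aligned sequence)."""
    groups = defaultdict(list)
    for specimen, species, seq in records:
        groups[(species, seq.upper())].append(specimen)
    return dict(groups)

def divergence_summary(records):
    """Return (max within-species distance, min between-species distance)."""
    within, between = [], []
    for (_, sp1, s1), (_, sp2, s2) in combinations(records, 2):
        (within if sp1 == sp2 else between).append(p_distance(s1, s2))
    return max(within, default=None), min(between, default=None)

# Hypothetical toy alignment: placeholder names, not real data.
records = [
    ("s1", "Aus bus", "ACGTACGTAC"),
    ("s2", "Aus bus", "ACGTACGTAC"),  # same haplotype as s1
    ("s3", "Aus bus", "ACGTACGTAT"),
    ("s4", "Aus cus", "ACGAACTTAT"),
]
haps = haplotypes(records)
max_within, min_between = divergence_summary(records)
```

When the smallest between-species distance exceeds the largest within-species distance, as in this toy set, the species are separated by what barcoding papers call a barcode gap.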

Lizard mitochondria converge on snakes–why?

June 30, 2009

In 2 June 2009 Proc Natl Acad Sci USA, researchers from 5 American universities report on convergent molecular evolution among agamid lizards and snakes. In constructing a nuclear and mitochondrial DNA phylogeny of squamates (snakes and lizards), Castoe and colleagues noted their data placed agamid lizards as sister to snakes, rather than within the lizard clade Iguania, as supported by prior work including morphology. The apparently aberrant phylogenetic placement was due to similarity among mitochondrial genomes of agamid lizards and snakes; nuclear genes recovered the established tree. Most of the aberrant signals were in first and second codon positions in protein-coding genes, and thus associated with similarity in predicted amino acid sequences among agamids and snakes. These convergent changes were distributed across all 13 mitochondrial protein-coding genes, but were clustered particularly in COXI and ND1.

The authors conclude that there was an ancient adaptive episode in the ancestors of today’s agamid lizards, which led to a snake-like mitochondrial genome. I note this conclusion is based on analyzing just 2 of the more than 350 species in 52 genera in Agamidae. Are these changes universal in Agamidae? There are 2 more complete agamid mitochondrial genomes in GenBank which could be examined; of additional interest would be to see if the same convergent changes are found in the 253 COI sequences from 88 agamid species in 11 genera in BOLD. As in this study, phylogenetic reconstruction usually involves just a few representatives of each lineage, which means that evolutionary patterns may remain invisible. I expect that BOLD will be an increasingly useful resource to expand the scope of phylogenetic studies utilizing mitochondrial DNA.

The conclusion that these findings represent convergent adaptive evolution is strong, yet it is also puzzling, as at first glance there doesn’t seem to be any special morphological or life-style resemblance between snakes and agamids as compared to other lizards. Perhaps we need to keep an open mind for other seemingly unlikely mechanisms, such as eukaryotic horizontal gene transfer.

Poisonous fish revealed

June 16, 2009

What fish is that you are eating? This question has many possible answers. Unlike meats, which are derived from a handful of species, most of which are farmed, there are numerous fish sold for human consumption, most of which are wild. The US FDA Regulatory Fish Encyclopedia and the Canadian Food Inspection Agency lists of approved fish and shellfish include approximately 1700 and 660 names, respectively. And yet DNA surveys regularly turn up fish in the marketplace that are not on any regulatory list, as well as mislabeling of those that are listed, suggesting we may not know what we are eating or what fish stocks are being harvested.

In addition to economic and environmental impact, mislabeling can have public health implications. In April 2009 J Food Protection, government and research scientists report on 2 cases of tetrodotoxin poisoning in Chicago, IL resulting from ingestion of soup prepared from mislabeled puffer fish, sold as “monkfish.” Two additional cases were traced to the same supplier and this led to the recall of several thousand pounds of frozen fish. Morphologic examination of leftover parts and DNA testing of the cooked meat implicated Lagocephalus sp., most likely green rough-backed puffer L. lunaris. Unlike most other toxic puffer species, L. lunaris carries tetrodotoxin in muscle as well as organ tissue, making safe preparation impossible. At the time of the study, there were no reference sequences in BOLD for L. lunaris, so the DNA barcode identification was incomplete. It would be of interest to repeat the database searches (as of today GenBank contains 1 L. lunaris COI sequence and BOLD taxonomy browser lists 2), but for some reason the sequences obtained by the researchers were not published.

DNA testing is the only way to identify many of the fish items in the marketplace. I expect that standardized DNA testing (aka DNA barcoding) will play an increasingly important role in helping protect both consumers and fish.

DNA helps reveal bat diets

June 9, 2009

What do carnivorous animals eat? Predation drives evolution and underlies ecology, yet except for a few easily observed species, it is surprisingly hard to determine what eats what. In June 2009 Mol Ecol, researchers from University of Guelph and University of Western Ontario, Canada, apply DNA testing to help work out the diet of the Eastern red bat Lasiurus borealis. L. borealis is the commonest tree-roosting bat in North America, ranging from Canada and United States east of the Rocky Mountains into Central and northern South America. Like other insectivorous bats, L. borealis uses echolocation to detect night-flying insects. Many moth species have evolved “ears” that detect the ultrasonic sounds emitted by bats and exhibit defensive behaviors in response to echolocation signals, making bats and moths an interesting study in predator-prey co-evolution.

Clare and co-workers applied standardized DNA testing to insect parts in faecal samples collected from 56 mist-net trapped bats. Guano samples were frozen for up to 2 y then soaked in 95% ethanol for 12 h and examined with a dissecting microscope. Prey items including “legs, wings, antennae, eye cases, exoskeletal fragments, eggs” were isolated and stored separately in 96-well plates. DNA extraction, amplification, and sequencing were performed using standard techniques and broad-range insect primers (LepF1/LepR1). COI sequences were compared to the 127,000 reference sequences of North American arthropods in the BOLD database (www.barcodinglife.org) at the time of the study. Test sequences with ≥99% identity to reference sequence(s) and without equivalent similarity to other species in the database were given species-level identifications; those with less than 99% identity to reference sequence(s) were assigned to higher-level taxonomic categories.
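The assignment rule described above can be sketched in a few lines. This is only an illustration, not the authors' pipeline. It assumes the database search has already produced a list of (species, identity) hits, and it reads “without equivalent similarity to other species” as “no second species also reaches the 99% threshold,” which is my interpretation. The function name and the placeholder species are hypothetical.

```python
def assign_identification(hits, threshold=0.99):
    """Classify a query from its reference hits.

    hits: list of (species_name, identity) with identity between 0 and 1.
    Returns ("species", name) when exactly one species reaches the
    threshold, otherwise ("higher-level", None).
    """
    # Species that have at least one reference at or above the threshold.
    matching = {species for species, identity in hits if identity >= threshold}
    if len(matching) == 1:
        return ("species", matching.pop())
    # Either nothing is close enough, or several species are equally close.
    return ("higher-level", None)

# Hypothetical hits with placeholder names, not results from the study.
clear_match = [("Genus alpha", 0.995), ("Genus beta", 0.93)]
ambiguous = [("Genus alpha", 0.995), ("Genus beta", 0.992)]
too_distant = [("Genus alpha", 0.97)]
```

A fuller version would climb to the lowest taxonomic rank shared by the close hits rather than simply returning “higher-level.”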

Clare et al obtained sequence data from 89% of 896 arthropod fragments; 78% of these were identified to species or genus level (the remaining 22% showed sequence similarity to bacteria, fungi, or were unidentifiable or chimeric), with a total of 127 prey species identified (125 insects, mainly lepidoptera including a number of economically important pest species, and 2 spiders). The “molecular scatology” approach documented greater diversity in prey species than prior studies based on morphologic analysis. Most prey were identified only once, with an average of 3.5 species per guano sample. Surprisingly, “more than 60% [of recovered insects] appear to have ears capable of hearing the echolocation hunting calls of L. borealis.” The authors speculate the abundance of eared moths might reflect bats hunting around streetlights, as moths in such brightly-lit environments are thought to use daytime predator-avoidance strategies rather than nocturnal responses to echolocation. There was a notable absence of arctiid and tortricid moths, given their local abundance, suggesting these moths may have alternative predator-avoidance strategies.

This study documents the diversity of L. borealis prey, and hints at how much more we will learn from broad application of standardized DNA analysis to food chains, including such unexpected findings as possible disruptive effects of man-made lighting on local ecosystems.

Biggest tree so far

May 26, 2009

Phylogenetic tree-building programs are the workhorses of evolutionary analysis. Thus it might be surprising that, given there are at least 1.7 million named species of plants and animals, output trees with over 1000 taxa are exceptional. The primary reason is computational: the number of possible arrangements rises faster than exponentially with input taxa (eg for 1000 taxa, ~10^2500 possible trees; Tamura et al 2004), such that standard algorithms, even those that sample a fraction of “tree space,” are too slow. As a result, so far the Tree of Life has been constructed by concatenating multitudes of trees each built with relatively small numbers of taxa. This is unsatisfying and possibly unreliable.
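To get a feel for how quickly tree space grows, the sketch below counts unrooted, strictly bifurcating trees using the standard double-factorial formula (2n-5)!! for n taxa. This is one common counting convention. Rooted or multifurcating trees give different totals, so the exponent it produces for 1000 taxa need not match the figure cited above. The function name is my own.

```python
import math

def unrooted_tree_count(n_taxa):
    """Number of distinct unrooted bifurcating trees for n labelled taxa.

    Uses (2n-5)!! = 3 * 5 * ... * (2n-5), valid for n >= 3.
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

# Small cases can be listed exactly; large ones are best shown as log10.
for n in (4, 5, 10, 20):
    print(n, unrooted_tree_count(n))
print(1000, "taxa: about 10^%d trees" % math.log10(unrooted_tree_count(1000)))
```

Ten taxa already allow about two million trees, which is why programs search heuristically rather than enumerating every arrangement.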

In May 2009 Cladistics, researchers from Argentina and Sweden report on the largest tree to date: 73,060 eukaryotic taxa, essentially everything Goloboff and colleagues could find in GenBank, ranging from algae and protozoans to flowering plants and vertebrates. In addition to size, there were several remarkable features. The tree was constructed from just 13 genes, each of which was sequenced for a subset of the total (750 to ~20,000 taxa), plus 604 morphologic characters that applied across most of the data set. Nearly all (92%) of the cells in the resulting data matrix (73,060 taxa x 9535 characters) were empty due to lack of data. Nonetheless, the parsimony analysis recovered most eukaryotic groups down to the level of order as monophyletic taxa. The analysis utilized TNT software previously developed (and made publicly available) by Goloboff and colleagues and took 2.5 months on 3 desktop computers (total 96 GB RAM, 16 x 3 GHz processors). To manage the flow of data, nearly all steps were automated, from extracting, labeling, and aligning GenBank sequences to analyzing monophyly of groups at various taxonomic levels.

Looking ahead, the authors see biggest challenges not in tree-building, but in alignment software and “that the sequence information required is simply non-existent, and the morphological information is scanty and fragmentary.” I know that a short segment of a single mitochondrial gene is considered insufficient for phylogeny, but it would be interesting to see what TNT could do with 40,777 COI sequences from 6,506 fish species (FishBOL), for example. I imagine that even TNT might have trouble analyzing all 603,002 COI sequences of the 57,159 species represented in BOLD (with many more to come). Phylogenetic trees are established as the goal of evolutionary analysis, but we may need alternate methods for analyzing differences and similarities in very large data sets.

Potatoes challenge taxonomists

May 17, 2009

In 7 May 2009 Amer J Botany, David Spooner, scientist at USDA and University of Wisconsin, applies DNA barcoding to wild potatoes. According to the author, “the taxonomy of sect. Petota [section Petota is a subdivision within genus Solanum which comprises wild and domesticated potatoes] is complicated by interspecific hybridization, introgression, allopolyploidy, a mixture of sexual and asexual reproduction and possible recent species divergences.” As an aside, this one genus Solanum contains over 1500 species, including such seemingly diverse plants as nightshades, horsenettles, tomatoes, and eggplants. While the most speciose bird genera, for example, have fewer than 100 species, Solanum is one of at least 50 plant genera with over 500 species (Pelser et al 2002 Am J Botany). Such large genera are unwieldy for constructing phylogenies and testing DNA-based identification methods: do they reflect biological differences in rates of speciation among genera, or a lack of phylogenetic knowledge?

The above summary of Petota taxonomy is an understatement of the confusion regarding species boundaries in wild potatoes. For one, the apparent number of taxa seems to be shrinking rapidly: “an account of post-1990 taxonomic decisions of many workers published in Spooner and Salas (2006) reduced the 232 species of Hawkes (1990) to 190, but a taxonomic decision in my laboratory is converging on about 110 species.” Second, experts can be perplexed: “members of the complex are so similar that even experienced potato taxonomists…provided different identifications for identical collections numbers of the Solanum brevicaule complex in fully 38% of cases.” Third, genetic analysis (including multiple studies in the author’s laboratory) has been little help so far: “single- to low-copy nuclear restriction fragment length polymorphism (nRFLPs) and random amplified fragment length (RAPD) data…and amplified fragment length polymorphism (AFLP) data failed to clearly differentiate many wild species in the complex.” Independent work by researchers in the Netherlands (Jacobs et al 2008) similarly documents a challenging lack of concordance between genetics and taxonomy in Petota sp. Jacobs and colleagues performed AFLP analysis (this screens the entire nuclear genome) on 951 accessions representing 196 Petota species. Of the 196 taxa, multiple accessions of species clustered together in 58 cases, 38 formed multiple clusters, and 48 were mixed with accessions of other species. Regarding higher-level groupings, these researchers found absence of support for 4 Petota clades proposed by Spooner and colleagues, and conclude that recent speciation and high levels of hybridization will likely challenge attempts to create a genetic taxonomy of wild potatoes.

Given the above background, one might guess that a minimalist approach (ie DNA barcoding) using 2 or 3 plastid regions might not distinguish among Petota species whose underlying taxonomy and genetics are so jumbled. Thus I am puzzled why the author went to the trouble of performing this study, and why, having set out to do so, he analyzed only a single plastid region (the trnH-psbA spacer) when all recent plant barcoding studies I am aware of are based on a combined analysis of 2 or 3 plastid regions. The author also analyzed the nuclear ITS segment (approximately 800 nucleotides containing ITS1, 5.8S rRNA, and ITS2). This is interesting, although for some reason the phylogenetic analysis looked at the ITS segment and trnH-psbA individually rather than in combination. I believe there is general understanding that a single barcode region will not suffice for distinguishing land plants. Lastly, I am puzzled why only 23 of 63 species analyzed were represented by multiple accessions. The author asserts “many barcoding studies lack robust assessments of intraspecific polymorphism or assessments of all species within a genus that are needed to assess the species-specific nature of barcodes;” as a general criticism I believe this comment is incorrect, but it does apply to the present study.

To summarize the study, 104 accessions of 63 Petota species plus 10 accessions of 9 outgroup species were analyzed (the author does not comment as to whether the selections are drawn from the revised total of 110 Petota species as defined in his laboratory). Regarding ITS, 23 species were represented by more than one accession; of these 10 species formed monophyletic lineages, which seems surprisingly good species-level resolution for a single marker in plants. With trnH-psbA, 17 species were represented by more than one accession; of these only 2 formed separate clades (1 of which did not form a distinct clade with ITS); as above, combined analysis was not done. The author dismisses matK on the basis of two previously published sequences for Petota sp. Finally, the trees used parsimony not neighbor-joining, the latter being the usual first-pass method of looking at barcode data. I find this paper a haphazard assessment of DNA barcoding in a taxonomically intensively-studied but poorly understood group.

High rates of horizontal gene transfer in archaea and eubacteria mean that it is not possible to draw clear species boundaries. It may be that relationships among potato species are similarly complex, and that species boundaries are fuzzier than the current taxonomy of morphologically-defined species would suggest. It seems to me that more taxonomic and genetic work is needed on this important group, including better tests of barcoding with combined analysis of 2 or 3 of the standard plastid regions in multiple accessions from a larger number of species. The goal of a standardized minimalist approach to identifying species, including wild potatoes, is important to help move beyond having only experts being able to identify plant species.

A diversity of open access DNA barcoding articles

April 30, 2009

The entire May 2009 Mol Ecol Res “Special Issue on Barcoding Life” is open access, thanks to support from Genome Canada and NSERC. As an aside, Mol Ecol Res publisher Wiley-Blackwell, which puts out over 1400 journals, charges $3000 US per article for open access, as compared to, for example, $1300 in PLoS ONE (all articles open access), and $1200 (plus $70/page) for open access option in Proc Natl Acad Sci USA. If funders mandate open access for publications based on research they support, then either this differential will disappear, or many manuscripts will migrate to lower cost journals. The special barcoding issue is based on Canadian Barcode of Life Network Scientific Symposium held at the Royal Ontario Museum in April 2008 and includes 27 articles on topics ranging from methodology to applications in creatures great and small including fungi and plants.

Most DNA barcoding analyses look at DNA identification through the lens of established taxonomy, ie how well does sequence data capture the species-level taxonomic categories established by morphologic analysis? In the special issue article “DNA barcoding and the mediocrity of morphology” researchers from York University and University of Guelph look at the comparison the other way around: how well does morphology identify the sorts of specimens that can be distinguished by DNA-based methods, barcoding in particular? In Packer and colleagues’ analysis, morphology comes up short “in numerous important situations such as the association of larvae with adults and discrimination among cryptic species.” Taking an example not entirely at random, the authors analyze a key to Agathidium genus slime mold beetles co-authored by a sometime skeptic of barcoding (Miller and Wheeler, 2005) (this key made popular news as 3 of the newly described beetles were named in tribute to then current US government officials: A. bushi, A. cheneyi, A. rumsfeldi). As is common in keys to insect identification, the reliance on adult male characters, usually genitalia, means that females and immature forms often cannot be identified to species (for the 3 USG namesakes, the key states “female not examined” and there is no description of immature forms). Again typical of insect keys, there is no documentation of intraspecific variation in diagnostic characters (for A. cheneyi, “the holotype is the only specimen examined of this species”). As a result, Packer and colleagues note “the morphological equivalent of the barcode gap that enables molecular identification of species cannot be calculated using traditional approaches, and the sample size of illustrations upon which measures of intraspecific variation might be estimated usually averages one per species with zero variance.”

I hope that future keys for slime mold beetles will include DNA barcode sequences. This will enable anyone, scientists and public alike, with access to a DNA sequencer to identify A. cheneyi adults of both sexes, larvae, fragments in the guts of predators, and perhaps eggs in random leaf litter samples.

Coaxing DNA out of ancient insects and sediments

April 25, 2009

Deep space telescopes gather light from the early universe, providing pictures of the unimaginably remote past. What about the biological universe: can we peer back in time? Geochemical evidence suggests life on Earth arose about 3.5 billion years ago and fossils reveal what life looked like as far back as 3.0 billion years, and important fossil discoveries across that whole span of time continue to be made. What about DNA? As Carl Woese first realized, DNA sequences of living organisms contain signatures of their evolutionary relationships, and enable reconstructing history as far back as the origin of replication, even before cells and DNA. At the near end of the time scale, recovery of DNA from historical samples can help identify organisms that lived hundreds, thousands, tens of thousands, or even, in a few cases so far, hundreds of thousands of years ago.

In April 2009 PLoS ONE, ten researchers from university centers in Denmark, United Kingdom, United States, Canada, Russia, and New Zealand report on non-destructive recovery of diagnostic DNA from ancient insect specimens. As an aside, PLoS ONE is an important sea change in scientific publishing. First of all, as described on their website, the journal “features reports of original research from all disciplines within science and medicine. By not excluding papers on the basis of subject area, PLoS ONE facilitates the discovery of the connections between papers whether within or between disciplines.” Second, it puts the judgement of importance in the hands of the scientific community where it belongs:

“Too often a journal’s decision to publish a paper is dominated by what the Editor/s think is interesting and will gain greater readership — both of which are subjective judgments and lead to decisions which are frustrating and delay the publication of your work. PLoS ONE will rigorously peer-review your submissions and publish all papers that are judged to be technically sound. Judgments about the importance of any particular paper are then made after publication by the readership (who are the most qualified to determine what is of interest to them).”

This is so sensible it is surprising it has not happened earlier! There is of course a place for journals like Nature and Science, but I expect that a great deal of scientific publishing will migrate to PLoS ONE, with benefits to the authors and the scientific community.

Back to the paper. Thomsen and colleagues first tested a non-destructive extraction method (Gilbert et al 2007 PLoS ONE 2:e272) on museum beetle specimens. This involves overnight incubation with gentle agitation in a digestion buffer at 55 °C. Remarkably, the specimens emerged none the worse for the wear. The researchers recovered 77-204 bp segments of mtCOI from all 20 beetles, which were collected as early as 1825 (1/3 were over 100 years old). Using a Bayesian approach that generates taxonomic assignments with probability estimates, these short fragments were sufficient for identification to species in most cases; the remainder could be assigned to family or genus level. The researchers then applied this same technique to insect chitin (exoskeleton) fragments preserved in permafrost dating from about 7,000 to over 47,000 years before present (BP). Here only 3 of the 14 (21%) samples (10,000-26,000 y BP) yielded amplifiable DNA, with Bayesian assignments to family or order level. Although the authors appear to have hoped for higher success, this seems pretty remarkable to me. They speculate that destructive sampling might have produced higher yields.

Saving what might be the best for last, Thomsen and colleagues tested non-frozen sediment samples that lacked visible insect parts collected in New Zealand caves and dated 1800 to 3280 years BP. Using a more or less standard extraction protocol developed by some of the authors (Willerslev et al 2003 Science 300:791), 96 bp fragments of COI (1 beetle, 1 butterfly) were recovered from 2 of 3 samples tested. The authors drily note “although the non-frozen sediment DNA approach involves destructive sampling, it has the advantage that the material is the sediment itself, which is usually abundant, and normally not too valuable to process.”

I conclude that if bits of DNA are preserved in ancient dirt then DNA from the past and present must be all around us. Perhaps single molecule sequencing methods will reveal an even greater abundance and diversity of DNA in environmental samples.

Dinoflagellate diversity revealed by DNA

April 12, 2009

Peering into the vast diversity of life beyond multicellular eukaryotes (animals, plants, and fungi) is dizzying. In March 2009 Applied Environ Microbiol researchers from University of Connecticut assess dinoflagellate diversity with mitochondrial DNA sequencing. Dinoflagellates are unicellular, often photosynthetic, mostly marine plankton characteristically having two flagella and encased in a segmented hardened exterior. Dinoflagellate blooms are the cause of red tides, and dinoflagellate toxins ingested by fish and shellfish are the cause of ciguatera and paralytic shellfish poisoning. For unknown reasons, some species are bioluminescent when mechanically stimulated, producing glowing displays when perturbed by waves, fish, or kayakers, for example.

As a first step toward creating a reference library, Lin and colleagues compiled mtDNA sequences from 49 dinoflagellate species representing six orders (this included 20 COI and 60 cytochrome b sequences; 12 of the latter were newly obtained in this study). As there are about 2500 named dinoflagellate species, this is a sparsely-populated reference library so far. In addition, there were multiple samples from just 5 species, so intraspecific variation is not yet well-studied. As an aside, I note that most of the published and new sequences were derived from strains maintained at Provasoli-Guillard National Center for Culture of Marine Phytoplankton (CCMP). There is no explicit mention of CCMP in the paper or GenBank depositions, although a plankton specialist would probably recognize the source from sample designations. More generally, there is no formal documentation of taxonomic identifications (eg collection sources for cultures or photographs for environmental samples and/or individual who performed identifications). Although this is not unusual in taxonomic papers, it seems to me that identifications should be as well documented as, for example, PCR conditions.

In preparing the reference library, the researchers were unable to develop primers that amplified the barcode region of COI efficiently (ie the primers worked with some species and not others) and instead focused on cytochrome b using a primer pair that amplified a 385 bp segment. The primer difficulty is surprising given that COI is usually more conserved than cyt b (including in dinoflagellates), which should make it easier to design broad-range primers.

The researchers then analyzed pooled environmental DNA samples prepared by filtering water specimens collected during different months at 3 marine stations in Long Island Sound and at a freshwater retention pond (Mirror Lake) on the University of Connecticut campus. While PCR products from monospecific cultures were sequenced directly, those from environmental samples were first cloned, and then 20 to 50 clones from each water sample were sequenced (total clones analyzed 450).

Lin and co-workers obtained a large number of distinct haplotypes from the environmental samples; by my inspection of their phylogram nearly all of the clones (>420) were unique. Only a small minority could be assigned to known species or genera. On the technical side, the authors used a complex model of nucleotide substitution (TVM+G) to calculate differences among haplotypes and UPGMA to create trees, so their distance results and trees are not directly comparable to those in most DNA barcoding papers, which use K2P- or p-distances to calculate differences and neighbor-joining to create trees. In any case, according to the authors, the sequence results consistently showed greater diversity than was detected through microscopic analysis, “likely caused by the much higher detection sensitivity of PCR than of microscopic counting and by some genotypes that could not be discriminated morphologically.” The authors conclude “[w]hen a broader cob [cyt b] database becomes available, the taxon-resolving power of this gene would certainly increase.” I hope they or others will also develop efficient primer sets for amplifying COI in addition to cyt b.
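To make the comparison of distance measures concrete, here is a minimal sketch of the two distances most barcoding papers report: the uncorrected p-distance and the Kimura 2-parameter (K2P) distance, which corrects for multiple substitutions and treats transitions and transversions separately. It does not implement the TVM+G model used by the authors. The function names and the toy sequences are assumptions for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def site_proportions(a, b):
    """Return (P, Q): proportions of transitions and transversions
    over aligned columns where both sequences have A, C, G or T."""
    transitions = transversions = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions / compared, transversions / compared

def p_distance(a, b):
    """Uncorrected proportion of differing sites."""
    P, Q = site_proportions(a, b)
    return P + Q

def k2p_distance(a, b):
    """Kimura 2-parameter distance: -1/2 ln((1 - 2P - Q) * sqrt(1 - 2Q)).
    Raises ValueError when sequences are too divergent for the formula."""
    P, Q = site_proportions(a, b)
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))

# Hypothetical pair differing by a single C/T transition in 10 sites.
seq1, seq2 = "ACGTACGTAC", "ACGTACGTAT"
```

For closely related sequences the two measures are nearly equal, and K2P grows larger than p-distance as divergence increases. More parameter-rich models such as the one used in this paper can shift the numbers again, which is why distances from different models should not be compared directly.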

Looking ahead, the reference library can be augmented relatively inexpensively by analyzing mtDNA sequences of the 2400 strains at CCMP. However, the mtDNA diversity in this study suggests dozens of new species from just 4 sampling sites around Connecticut, implying the global total of undescribed species is very large. This suggests a need for some sort of “automated species identifier”: a machine approach that would sort samples into individual cells, then photograph, sequence, apply MOTU-type analysis, for example. In the meantime, it may be necessary to work with pooled sequences from environmental samples, as is done for bacterial communities, without attempting to delineate species.
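As a rough illustration of what the MOTU-type step might look like, the sketch below groups aligned sequences into molecular operational taxonomic units by single-linkage clustering: any two sequences closer than a chosen threshold end up in the same unit. This is one simple way to do it, not the method of any particular paper. The threshold value, the function names, and the toy sequences are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites over columns with unambiguous bases."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def motu_clusters(seqs, threshold):
    """Single-linkage clustering of {name: aligned sequence}.

    Two sequences are linked when their p-distance is <= threshold;
    clusters are the connected groups of linked sequences.
    """
    names = list(seqs)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(seqs[a], seqs[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical clones: c1 and c2 differ at 1 of 20 sites, c3 is distant.
clones = {
    "c1": "A" * 20,
    "c2": "A" * 19 + "C",
    "c3": "C" * 10 + "A" * 10,
}
units = motu_clusters(clones, threshold=0.1)
```

The number of units depends heavily on the threshold, so any estimate of “dozens of new species” from environmental clones is only as firm as the cutoff behind it.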

DNA sorts out bewildering morphology

March 31, 2009

DNA helps flag genetically divergent forms that may represent cryptic species and is equally valuable the other way around: in linking morphologically diverse forms that occur within species. In 20 January 2009 Biol Lett, researchers from National Museum of Natural History, Washington, DC; Australian Museum, Sydney; Virginia Institute of Marine Science; University of Tokyo; and Natural History Museum, Tokyo, solve the mystery of “the most extreme example of ontogenetic morphoses and sexual dimorphism in vertebrates.”

Johnson and colleagues examined specimens of small (body size 4-408 mm) deep water (1000-4000 m) fishes thought to represent 3 families in the order Stephanoberyciformes (whalefish and relatives). The authors analyzed morphology and whole mitochondrial genomes from 34 individuals of 16 species including representatives of all 5 whalefish families. They found three whalefish “families” are one: “Mirapinnidae (tapetails), Megalomycteridae (bignose fishes), and Cetomimidae (whalefishes), are larvae, males and females, respectively of a single family Cetomimidae.” These are strange-looking fish: the males, which do not feed as adults, are sustained by enormous livers, and the minute larvae have streamers up to 75 cm. For fun, see deep ocean video of live female whalefish swimming (and narration of the amazed ichthyologists) in supplementary material. Next up is to link the three life stages of each species; here DNA will help along with meristic data (quantitative features such as number of fins or scales).

Rockefeller University

Program for the Human Environment

Area of Research: DNA Barcoding