Barcoding Nemo

How does one collect tropical reef fish without leaving North America? In July 2009 PLoS ONE, researchers from University of Guelph report on genetic diversity in SE Asian tropical reef fish, collected without plane fares or permits. How did they do it? Steinke and colleagues analyzed “dead on arrival” marine fish imported into Canada for the ornamental pet trade from various locations in SE Asia. A total of 1631 specimens representing 391 named species were frozen and imaged on a flatbed scanner, and a muscle tissue sample was taken from each for COI analysis. This is remarkable on several counts. First, the large number of species: according to an FAO report cited by Steinke, “some 800 marine fish species, representing about 5% of all marine taxa, are involved in this trade, with 70% of sales directed to North America,” with estimated revenue of $200-$300 million annually. Second, this study surveys genetic biodiversity in reef fishes, provides a practical method for identification, and at the same time provides insight into what is probably the major threat to their survival. I am reminded of the near extinction of Common egrets in North America in the late 1800s as a result of hunting for plumes for women’s hats. This led to a popular uprising among women of fashion, who pledged not to wear such clothing, organized what were the first “Audubon Societies,” and successfully petitioned for legislative change, saving egrets and many other birds. Nemo and other reef fish may need a similar campaign.

Back to the study: Steinke and colleagues found distinct barcodes among 384/391 (98.2%) species; 9 species displayed 2 or 3 distinct clusters, most of which were allopatric. Review of these potential “splits” revealed possible inappropriate synonymization in several cases. On the other side, 2 pairs and 1 triplet of species were not distinguished by DNA barcodes using distance. I looked more closely at one of these examples, the butterfly fishes Chaetodon multicinctus and C. punctatofasciatus, to see if there might be diagnostic characters whose signal is swamped by intraspecific variation. As shown in the figure, there are 2 possibly diagnostic differences between this species pair. Of course, this sort of analysis only works for known species, but I wonder how many other species pairs/sets with “overlapping” barcodes have diagnostic differences.
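A character-based check of this kind can be sketched in a few lines. The following is only an illustration, not the analysis behind the figure: it assumes the sequences are already aligned to equal length, it treats a site as diagnostic when the two species share no base at that position, and the toy sequences are made up rather than real Chaetodon data.

```python
def diagnostic_sites(group_a, group_b):
    """Return (position, bases_in_a, bases_in_b) for sites where the groups share no base.

    Assumes all sequences are aligned and of equal length.
    Missing or ambiguous symbols (N, -, ?) are ignored at each site.
    """
    ignore = set("N-?")
    sites = []
    for i in range(len(group_a[0])):
        bases_a = {s[i].upper() for s in group_a} - ignore
        bases_b = {s[i].upper() for s in group_b} - ignore
        # Diagnostic only if both groups have data and no base is shared.
        if bases_a and bases_b and bases_a.isdisjoint(bases_b):
            sites.append((i + 1, "".join(sorted(bases_a)), "".join(sorted(bases_b))))
    return sites


# Hypothetical toy alignment: both species vary at sites 4 and 10,
# but only site 5 separates them cleanly.
species_1 = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
species_2 = ["ACGTGCGTAC", "ACGTGCGTAT", "ACGAGCGTAC"]

print(diagnostic_sites(species_1, species_2))
```

In the toy data the shared variation at sites 4 and 10 contributes to pairwise distance within each species, while the single fixed difference at site 5 is what a distance summary could easily hide.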

Voucher and collection information in GenBank records

A core tenet of the DNA barcoding initiative, beginning with the first workshops in 2003, is that reference sequences should be linked to vouchered specimens stored in museums, so that data can be re-checked. This also provides visibility to collections. For example, “GenBank DQ433554 Crotophaga ani voucher KU 89123 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial” contains voucher information in the title and the record itself, at least for those who know “KU” refers to University of Kansas. The GenBank file contains a “LinkOut” to the BOLD page, which spells out the collection name. The GenBank file (and the BOLD record) could also include a “LinkOut” to the museum itself, although I did not find examples of this feature being used.


More generally, is collection information available in GenBank records? Taking birds as an example, there are 475,273 GenBank avian records; eliminating the five most-represented species (Chicken, Turkey, Mallard, Zebra Finch, Fairy Wren) leaves 108,766 sequences, of which about half (48,915) contain the word “voucher.” This sounds promising, but my unscientific sample suggests most entries in the “voucher” field are cryptic designations that do not identify the institution storing the specimen. I tried searching by acronyms for some of the larger collections. Louisiana State University has the largest avian tissue collection in the world, with about 40,000 specimens; searching “LSU AND aves[organism] AND voucher” returned only 1,148 records, which seems likely to underrepresent the museum’s contribution. Results for some other large collections were higher but still appear too small considering there are 100,000+ avian GenBank records: Burke Museum (UWBM), 3,318; Field Museum (FMNH), 2,593; American Museum of Natural History (AMNH), 1,994; Smithsonian (USNM), 1,920; University of Kansas (KU), 684.
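To make the “cryptic designation” problem concrete, here is a minimal sketch of how voucher strings might be tallied by institution. It is not how the counts above were obtained (those came from GenBank text searches). The acronym table holds only the collections named in this post, and every example voucher string except “KU 89123” is invented for illustration.

```python
import re
from collections import Counter

# Acronyms mentioned in this post; a real survey would need a fuller registry.
KNOWN_COLLECTIONS = {
    "LSU": "Louisiana State University",
    "UWBM": "Burke Museum",
    "FMNH": "Field Museum",
    "AMNH": "American Museum of Natural History",
    "USNM": "Smithsonian",
    "KU": "University of Kansas",
}


def collection_of(voucher):
    """Return the known acronym that begins the voucher string, or None."""
    match = re.match(r"\s*([A-Za-z]+)", voucher)
    if match and match.group(1).upper() in KNOWN_COLLECTIONS:
        return match.group(1).upper()
    return None


def tally(vouchers):
    """Count voucher strings per recognized collection."""
    return Counter(collection_of(v) or "unrecognized" for v in vouchers)


# "KU 89123" is from the record cited above; the rest are hypothetical.
vouchers = ["KU 89123", "FMNH 12345", "B-1234", "isolate 7", "KU:Birds:456"]
print(tally(vouchers))
```

A matcher this strict misses variant acronyms for the same institution, which is one way a text search by acronym can undercount a collection.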

I conclude that researchers and collections will benefit from following the practices promoted by the DNA barcode initiative for GenBank records, including taking advantage of GenBank’s “LinkOut” feature.

www.iBarcode.org: web tools for sequence analysis

In the 16 June 2009 BMC Bioinformatics, researchers from University of Guelph report on a web platform for DNA barcode analysis, www.iBarcode.org. The site works with aligned barcode files in standard .fas format, such as those produced by MEGA or BOLD. Registration is not required; the site keeps track of files you have uploaded.

According to authors Singer and Hajibabaei, iBarcode is designed to “allow the user to manage their barcode datasets, cull out non-unique sequences, identify haplotypes within a species, and examine the within- to between-species divergences.” iBarcode provides several clever, easy-to-use tools and I look forward to further refinements.
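For readers who want a feel for what these operations involve, here is a rough sketch of two of them: collapsing identical sequences into haplotypes within a species, and comparing within- to between-species divergences. This is not iBarcode’s code. It assumes aligned sequences of equal length, uses simple uncorrected p-distance (an assumption; the site may use other distance models), and runs on made-up records.

```python
from collections import defaultdict
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping positions with a gap or N."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in "N-" and y not in "N-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def haplotypes(records):
    """Map species -> {sequence: [ids]} so identical sequences collapse together."""
    groups = defaultdict(lambda: defaultdict(list))
    for seq_id, species, seq in records:
        groups[species][seq.upper()].append(seq_id)
    return {sp: dict(h) for sp, h in groups.items()}


def divergences(records):
    """Return (within-species distances, between-species distances)."""
    within, between = [], []
    for (_, sp1, s1), (_, sp2, s2) in combinations(records, 2):
        (within if sp1 == sp2 else between).append(p_distance(s1, s2))
    return within, between


# Hypothetical records: (id, species, aligned sequence)
records = [
    ("s1", "Species A", "ACGTACGTAC"),
    ("s2", "Species A", "ACGTACGTAC"),
    ("s3", "Species A", "ACGTACGTAT"),
    ("s4", "Species B", "ACGAGCGTCC"),
]

within, between = divergences(records)
print(len(haplotypes(records)["Species A"]), max(within), min(between))
```

When the largest within-species distance is well below the smallest between-species distance, as in this toy set, distance-based identification is straightforward.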

Lizard mitochondria converge on snakes–why?

In the 2 June 2009 Proc Natl Acad Sci USA, researchers from 5 American universities report on convergent molecular evolution among agamid lizards and snakes. In constructing a nuclear and mitochondrial DNA phylogeny of squamates (snakes and lizards), Castoe and colleagues noted that their data placed agamid lizards as sister to snakes, rather than within the lizard clade Iguania, as supported by prior work including morphology. The apparently aberrant phylogenetic placement was due to similarity among mitochondrial genomes of agamid lizards and snakes; nuclear genes recovered the established tree. Most of the aberrant signals were in first and second codon positions in protein-coding genes, and thus associated with similarity in predicted amino acid sequences among agamids and snakes. These convergent changes were distributed across all 13 mitochondrial protein-coding genes, but were clustered particularly in COXI and ND1.

The authors conclude that there was an ancient adaptive episode in the ancestors of today’s agamid lizards, which led to a snake-like mitochondrial genome. I note this conclusion is based on analyzing just 2 of the more than 350 species in 52 genera in Agamidae. Are these changes universal in Agamidae? There are 2 more complete agamid mitochondrial genomes in GenBank which could be examined; of additional interest would be to see if the same convergent changes are found in the 253 COI sequences from 88 agamid species in 11 genera in BOLD. As in this study, phylogenetic reconstruction usually involves just a few representatives of each lineage, which means that evolutionary patterns may remain invisible. I expect that BOLD will be an increasingly useful resource to expand the scope of phylogenetic studies utilizing mitochondrial DNA.

The conclusion that these findings represent convergent adaptive evolution is strong, yet it is also puzzling, as at first glance there doesn’t seem to be any special morphological or life-style resemblance between snakes and agamids as compared to other lizards. Perhaps we need to keep an open mind for other seemingly unlikely mechanisms, such as eukaryotic horizontal gene transfer.

Poisonous fish revealed

What fish is that you are eating? This question has many possible answers. Unlike meats, which are derived from a handful of species, most of which are farmed, there are numerous fish sold for human consumption, most of which are wild. The US FDA Regulatory Fish Encyclopedia and the Canadian Food Inspection Agency lists of approved fish and shellfish include approximately 1700 and  660 names, respectively. And yet DNA surveys regularly turn up fish in the marketplace that are not on any regulatory list, as well as mislabeling of those that are listed, suggesting we may not know what we are eating or what fish stocks are being harvested.

In addition to economic and environmental impact, mislabeling can have public health implications. In April 2009 J Food Protection, government and research scientists report on 2 cases of tetrodotoxin poisoning in Chicago, IL resulting from ingestion of soup prepared from mislabeled puffer fish, sold as “monkfish.” Two additional cases were traced to the same supplier, and this led to the recall of several thousand pounds of frozen fish. Morphologic examination of leftover parts and DNA testing of the cooked meat implicated Lagocephalus sp., most likely Green rough-backed puffer L. lunaris. Unlike most other toxic puffer species, L. lunaris carries tetrodotoxin in muscle as well as organ tissue, making safe preparation impossible. At the time of the study, there were no reference sequences in BOLD for L. lunaris, so the DNA barcode identification was incomplete. It would be of interest to repeat the database searches (as of today GenBank contains 1 L. lunaris COI sequence and the BOLD taxonomy browser lists 2), but for some reason the sequences obtained by the researchers were not published.

DNA testing is the only way to identify many of the fish items in the marketplace. I expect that standardized DNA testing (aka DNA barcoding) will play an increasingly important role in helping protect both consumers and fish.

DNA helps reveal bat diets

What do carnivorous animals eat? Predation drives evolution and underlies ecology, yet except for a few easily observed species, it is surprisingly hard to determine what eats what. In June 2009 Mol Ecol, researchers from University of Guelph and University of Western Ontario, Canada, apply DNA testing to help work out the diet of the Eastern red bat, Lasiurus borealis. L. borealis is the commonest tree-roosting bat in North America, ranging from Canada and the United States east of the Rocky Mountains into Central and northern South America. Like other insectivorous bats, L. borealis uses echolocation to detect night-flying insects. Many moth species have evolved “ears” that detect the ultrasonic sounds emitted by bats and exhibit defensive behaviors in response to echolocation signals, making bats and moths an interesting study in predator-prey co-evolution.

Clare and co-workers applied standardized DNA testing to insect parts in faecal samples collected from 56 mist-net trapped bats. Guano samples were frozen for up to 2 y, then soaked in 95% ethanol for 12 h and examined with a dissecting microscope. Prey items including “legs, wings, antennae, eye cases, exoskeletal fragments, eggs” were isolated and stored separately in 96-well plates. DNA extraction, amplification, and sequencing were performed using standard techniques and broad-range insect primers (LepF1/LepR1). COI sequences were compared to the 127,000 reference sequences of North American arthropods in the BOLD database (www.barcodinglife.org) at the time of the study. Test sequences with ≥99% identity to reference sequence(s) and without equivalent similarity to other species in the database were given species-level identifications; those with less than 99% identity to reference sequence(s) were assigned to higher-level taxonomic categories.
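The assignment rule described above can be written down as a small decision function. This is a sketch of my reading of the rule, not the authors’ pipeline: I assume “equivalent similarity to other species” means a second species also reaching the 99% threshold, and the hit lists and species names are hypothetical.

```python
def assign(hits, threshold=0.99):
    """Decide the identification level for one query sequence.

    hits: list of (species_name, identity) pairs against reference sequences,
          with identity expressed as a fraction between 0 and 1.
    Returns ("species", name) when exactly one species reaches the threshold,
    otherwise ("higher-level", None).
    """
    species_at_threshold = {sp for sp, identity in hits if identity >= threshold}
    if len(species_at_threshold) == 1:
        return ("species", next(iter(species_at_threshold)))
    # Either no close match, or two or more species match equally well
    # (the second case is my assumption about the rule).
    return ("higher-level", None)


# Hypothetical examples
print(assign([("Species A", 0.995), ("Species A", 0.992), ("Species B", 0.95)]))
print(assign([("Species A", 0.97)]))
print(assign([("Species A", 0.995), ("Species B", 0.993)]))
```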

Clare et al. obtained sequence data from 89% of 896 arthropod fragments; 78% of these were identified to species or genus level (the remaining 22% showed sequence similarity to bacteria or fungi, or were unidentifiable or chimeric), with a total of 127 prey species identified (125 insects, mainly Lepidoptera including a number of economically important pest species, and 2 spiders). The “molecular scatology” approach documented greater diversity in prey species than prior studies based on morphologic analysis. Most prey were identified only once, with an average of 3.5 species per guano sample. Surprisingly, “more than 60% [of recovered insects] appear to have ears capable of hearing the echolocation hunting calls of L. borealis.” The authors speculate the abundance of eared moths might reflect bats hunting around streetlights, as moths in such brightly-lit environments are thought to use daytime predator-avoidance strategies rather than nocturnal responses to echolocation. There was a notable absence of arctiid and tortricid moths, given their local abundance, suggesting these moths may have alternative predator-avoidance strategies.

This study documents the diversity of L. borealis prey, and hints at how much more we will learn from broad application of standardized DNA analysis to food chains, including such unexpected findings as possible disruptive effects of man-made lighting on local ecosystems.

Biggest tree so far

Phylogenetic tree-building programs are the workhorses of evolutionary analysis. Thus it might be surprising that, given there are at least 1.7 million named species of plants and animals, output trees with over 1000 taxa are exceptional. The primary reason is computational: the number of possible arrangements rises faster than exponentially with the number of input taxa (e.g., for 1000 taxa, ~10^2500 possible trees; Tamura et al 2004), such that standard algorithms, even those that sample a fraction of “tree space,” are too slow. As a result, so far the Tree of Life has been constructed by concatenating multitudes of trees, each built with relatively small numbers of taxa. This is unsatisfying and possibly unreliable.
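To see how quickly tree space grows, one can use the standard count of unrooted bifurcating trees on n labeled taxa, (2n-5)!! = 3 x 5 x ... x (2n-5). The sketch below computes its base-10 logarithm so the numbers stay manageable. The exact exponent depends on the counting convention (rooted versus unrooted trees, for instance), so it need not match the figure cited above digit for digit; the function name is my own.

```python
import math


def log10_unrooted_trees(n):
    """log10 of (2n-5)!!, the number of unrooted bifurcating trees on n labeled taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    # Product of odd numbers 3, 5, ..., 2n-5, accumulated as a sum of logs.
    return sum(math.log10(k) for k in range(3, 2 * n - 4, 2))


for n in (5, 10, 50, 100, 1000):
    print(n, "taxa: about 10^%.0f trees" % log10_unrooted_trees(n))
```

Five taxa give 15 trees and ten taxa give about two million; by 1000 taxa the exponent itself runs into the thousands, which is why exhaustive search is out of the question.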

In May 2009 Cladistics, researchers from Argentina and Sweden report on the largest tree to date: 73,060 eukaryotic taxa, essentially everything Goloboff and colleagues could find in GenBank, ranging from algae and protozoans to flowering plants and vertebrates. In addition to size, there were several remarkable features. The tree was constructed from just 13 genes, each of which was sequenced for a subset of the total (750 to ~20,000 taxa), plus 604 morphologic characters that applied across most of the data set. Nearly all (92%) of the cells in the resulting data matrix (73,060 taxa x 9535 characters) were empty due to lack of data. Nonetheless, the parsimony analysis recovered most eukaryotic groups down to the level of order as monophyletic taxa. The analysis utilized TNT software previously developed (and made publicly available) by Goloboff and colleagues and took 2.5 months on 3 desktop computers (total 96 GB RAM, 16 x 3 GHz processors). To manage the flow of data, nearly all steps were automated, from extracting, labeling, and aligning GenBank sequences to analyzing monophyly of groups at various taxonomic levels.

Looking ahead, the authors see the biggest challenges not in tree-building, but in alignment software and in the fact “that the sequence information required is simply non-existent, and the morphological information is scanty and fragmentary.” I know that a short segment of a single mitochondrial gene is considered insufficient for phylogeny, but it would be interesting to see what TNT could do with 40,777 COI sequences from 6,506 fish species (FishBOL), for example. I imagine that even TNT might have trouble analyzing all 603,002 COI sequences of the 57,159 species represented in BOLD (with many more to come). Phylogenetic trees are established as the goal of evolutionary analysis, but we may need alternate methods for analyzing differences and similarities in very large data sets.

Jesse awarded an honorary doctorate

Dalhousie University bestows an honorary doctorate on Jesse, really an honor for everyone who has contributed to the work of the ‘Program for the Human Environment’ for the past 20 years. We post Jesse’s Convocation address, titled “Son et lumiere”, discussing environmental dimensions of sound and light.

p.s. On 23 May 2009 Anne McIlroy of the Toronto Globe and Mail reported on Jesse’s address (p. F5)

Oceans speak volumes. Sound spreads widely in the world’s oceans, and the clamour of human activity reaches every cove, says Jesse Ausubel, director of the Human Environment program at Rockefeller University in Manhattan.

“Motors and propellers are noisy; so are jet skis and oil-and-gas exploration. In fact, we make the oceans three decibels noisier each decade”, he says. In a convocation address this week at Dalhousie University in Halifax, Dr. Ausubel proposed turning down the volume for four hours in an International Quiet Ocean Experiment. That would be enough time for thousands of researchers around the world to see how sea creatures respond to pre-industrial noise levels, he says. Would whales, for example, change the frequencies they use to communicate? “If we can quiet things down, would they return to their normal, natural frequency rather than deepening their voices or raising their voices?” he said in an interview.

Dr. Ausubel has experience with ambitious, large-scale scientific projects. He played an important role in creating the Encyclopedia of Life, an online catalogue of the species on Earth, and was also involved in establishing the Census for Marine Life, an international program to chart life in the oceans by 2010.

Scientists from around the world who are interested in his Quiet Ocean Experiment will get together for their first meeting before the end of the year.

Dr. Ausubel acknowledges how difficult it will be to get four noise-free hours. Navies and the world’s maritime industries would have to be on board. “Maybe the time to do it would be Christmas Day,” he says. “We would like to inconvenience people as little as possible”.