A Scalable Method for Analysis and Display of DNA Sequences

Together with colleagues at Mt. Sinai School of Medicine, we report a new mathematical approach to the genetic structure of biodiversity, using indicator vectors calculated from short DNA sequences. Sirovich L, Stoeckle MY, Zhang Y (2009) A Scalable Method for Analysis and Display of DNA Sequences. PLoS ONE 4(10): e7051. This method is scalable to the largest datasets envisioned in this field and provides a macroscopic view of “biodiversity space”. It offers a complement to tree-building techniques and could enable automated classification at various taxonomic levels.

From the Abstract:

The indicator vectors preserved DNA character information and provided quantitative measures of correlations among taxonomic groups. This method is scalable to the largest datasets envisioned in this field, provides a visually-intuitive display that captures relational affinities derived from sequence data across a diversity of life forms, and is potentially a useful complement to current tree-building techniques for studying evolutionary processes based on DNA data.
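As a rough illustration of how sequences can be turned into vectors and compared, here is a minimal sketch. It is not the construction used in the paper: the names `one_hot`, `group_vector`, and `correlation` are hypothetical, and using a normalized mean of one-hot encoded sequences as the group vector is a simplifying assumption made only for this example.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode an aligned DNA sequence as a flat 0/1 vector (4 entries per site)."""
    vec = []
    for base in seq.upper():
        # Ambiguous characters such as N or '-' become four zeros.
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def group_vector(seqs):
    """Assumed stand-in for an indicator vector: unit-length mean encoding of a group."""
    encoded = [one_hot(s) for s in seqs]
    mean = [sum(col) / len(encoded) for col in zip(*encoded)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]

def correlation(u, v):
    """Dot product of two unit-length vectors (1.0 means identical direction)."""
    return sum(a * b for a, b in zip(u, v))

# Toy aligned sequences, invented for illustration only.
group_a = group_vector(["ACGT", "ACGA"])
group_b = group_vector(["TTGT", "TTGA"])
print(round(correlation(group_a, group_b), 3))
```

All sequences must be aligned to the same length. Groups that share more character states at more sites score closer to 1.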

To download zip files containing MatLab code and datasets utilized in this paper, select the following links:

Finding out what small herbivores eat

What do animals eat? For many animals other than large, diurnal, terrestrial species, this is surprisingly hard to study. In August 2009 Frontiers Zool, researchers from Norway and France apply standardized DNA analysis, and compare it with microscopic techniques, to the diets of two arctic voles, Microtus oeconomus (Tundra vole) and Myodes rufocanus (Grey red-backed vole), collected in July and September in northern Norway. Soininen and colleagues analyzed stomach contents of 48 individuals using a microscope and a DNA sequencer, the latter to analyze the amplified P6 loop (length 10-46 bp) of the chloroplast trnL intron. As previously described by some of the same authors (Taberlet et al 2007), the P6 loop is amplifiable from diverse gymnosperms and angiosperms with a single set of primers. Not surprisingly, however, this very short segment often does not provide species-level identification, even within a local flora.


For microhistological analysis, the authors first prepared a photographic guide by collecting samples of all vascular plant species in the study area; the samples were dried, scraped to reveal the epidermis, bleached, and boiled in table vinegar, and then 40x micrographs were taken. Stomach content samples were filtered and bleached, and 1 droplet was examined on a microscope slide, counting 25 bits of identifiable material; if >95% of the material was unidentifiable, a new slide was prepared. In 4 individuals, no slide with an adequate amount of microscopically identifiable material could be made. For DNA analysis, the P6 loop was amplified using tagged primers that identified each individual, the pooled material was analyzed by pyrosequencing, and the sequences were compared to a database of 842 species representing “all widespread and/or ecologically important taxa of the arctic flora”. With the standardized DNA approach (the authors call this DNA barcoding although it does not use recently agreed-upon standard loci) “75% of sequences were identified at least to genus level, whereas with microhistological method, less than 20% of the identified fragments could be specified at this level”.
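The tagged-primer step can be pictured as follows: each read from the pooled run starts with a short tag, and the tag says which vole the read came from. This is a minimal sketch under stated assumptions, not the authors' pipeline. The function name `demultiplex`, the tag sequences, and the individual labels are all hypothetical, and the sketch assumes each read begins with an exact, fixed-length tag (real reads also carry primer sequence and sequencing errors).

```python
from collections import defaultdict

def demultiplex(reads, tag_to_individual, tag_len):
    """Group pooled reads by the tag at their 5' end; drop reads with unknown tags."""
    by_individual = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        individual = tag_to_individual.get(tag)
        if individual is not None:
            by_individual[individual].append(insert)
    return dict(by_individual)

# Hypothetical tags and reads, invented for illustration only.
tags = {"ACAC": "vole_01", "GTGT": "vole_02"}
pooled = ["ACACGGATTC", "GTGTGGATAA", "ACACGGTTTC", "TTTTGGATTC"]
print(demultiplex(pooled, tags, tag_len=4))
```

Once reads are sorted by individual, each insert can be compared against the plant reference database to build a per-animal diet list.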

Because of its greater resolution compared with microscopy, DNA identified more plant species and genera in vole diets (for M. oeconomus, 13 species/9 genera vs 9 species/5 genera; for M. rufocanus, 17 species/8 genera vs 11/7). Both methods showed large variation among individuals. Limitations of the DNA approach include possible overrepresentation of species with chloroplast-rich tissues and the inability of P6 to detect fungi, horsetails, and mosses. Looking ahead, the researchers conclude “DNA-based technology makes it possible to study vole-plant interaction by non-destructive sampling of faeces in the natural habitats of voles”, first identifying rodent species using a mitochondrial DNA marker (and potentially sex and individual identification with Y-chromosome and microsatellite detection) and then analyzing diet. I conclude standardized DNA analysis opens wide avenues for ecology.

Counting zooplankton diversity with DNA

Marine zooplankton comprise an enormous mass of diverse organisms distributed throughout the world’s oceans from deep waters to the surface. Zooplankton include representatives of at least a dozen phyla, some of which are larval forms of much larger animals, and challenge identification with their diversity and tiny size. In the current BMC Genomics (open access), researchers from University of Tokyo and Osaka Medical College, as part of the Census of Marine Zooplankton (CMarZ) program of the Census of Marine Life (CoML), apply single-gene sequencing to the task. Machida and colleagues collected at a Micronesia site using a single pass with a 2m^2 plankton net from a depth of 721 meters to the surface, obtaining 60 mL of zooplankton (large organisms, up to 4 cm, were discarded). Rather than sequencing DNA directly, the researchers isolated mRNA from the pooled sample and constructed a cDNA library from which they analyzed 1,336 inserts. The rationale for these extra steps was to avoid sequencing pseudogenes present in genomic DNA (but not transcribed into mRNA). It would be interesting to know if this strategy was based on experience or is a theoretical precaution.

Machida and colleagues found evidence for 189 species, only 10 of which could be confidently matched to reference sequences. This report demonstrates that this sort of “kitchen blender” approach, which has previously been applied largely to bacterial and archaeal communities, shows promise for assemblages of eukaryotes and reveals surprisingly few organisms have reference sequences in databases. Identified organisms included several copepods as well as presumably larval forms of Sthenoteuthis oualaniensis (Purple-back flying squid) and Coryphaena hippurus (Common dolphinfish)!

Species identification by DNA opens major avenues for ecosystem research. The NJ tree at left suggests that even in the absence of close matches, 500 bp of mtDNA is sufficient to sort most specimens into appropriate higher-level groups. To better understand the changing oceans, we need biological monitoring machines akin to physical instruments for studying weather and climate, which routinely monitor thousands of sites. It seems to me the only practical way to monitor biological “weather” is by repeatedly sampling species assemblages at multiple points, and, particularly in aqueous environments, automated species identification with DNA will be an important analytic method.

DNA data to help save bushmeat animals

Harvesting wild animals for sale as food is a large, mostly illegal business that threatens wild animal populations and puts humans at risk for exotic infections; witness the SARS outbreak in 2003. Regulations and treaties exist, but before these can be enforced, one needs to establish the species origin of bushmeat and other derived marketplace products. Here DNA can help. In 1 September 2009 Conservation Genetics (open access article), researchers from University of Colorado, Barnard College, and American Museum of Natural History describe DNA barcodes for 23 species of South American and Central African primates, ungulates, and reptiles regularly harvested for bushmeat. Equally important as the DNA sequences, Eaton and colleagues report high success (179/204 samples (87.7%)) with primer cocktails first developed for fish DNA barcoding by Ivanova et al 2007, demonstrating these can serve as universal vertebrate primer cocktails. Intraspecific variation was low (mean 0.24%) and differences among congeneric species were generally high (average 9.77%), making assignment to known species straightforward using either tree-based maximum likelihood or character methods.
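The contrast between low within-species and high between-species variation can be computed from aligned sequences with a simple pairwise distance. This is a minimal sketch, not the analysis in the paper: it uses the uncorrected p-distance (fraction of differing sites), compares all species pairs rather than only congeneric ones, and the names `p_distance` and `mean_distances`, the species labels, and the toy sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Uncorrected p-distance: fraction of compared sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences have an unambiguous base.
    compared = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)

def mean_distances(records):
    """records: list of (species, aligned_sequence). Returns (mean within, mean between)."""
    within, between = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (within if sp1 == sp2 else between).append(p_distance(s1, s2))
    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")
    return mean(within), mean(between)

# Toy 10 bp sequences, invented for illustration only.
records = [
    ("species_a", "ACGTACGTAC"),
    ("species_a", "ACGTACGTAT"),
    ("species_b", "ACGAACGTTC"),
]
intra, inter = mean_distances(records)
print(f"within species: {intra:.2%}, between species: {inter:.2%}")
```

When the between-species mean is many times the within-species mean, as in the figures reported above, an unknown sequence will usually sit much closer to its own species than to any other.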
This report is focused on documenting barcodes of bushmeat species, using well-identified vouchered specimens (although 1 vouchered specimen labeled as Melanosuchus niger (Black caiman) was found to be Caiman yacare (Yacare caiman)). The researchers did test a handful of unknown or partially identified specimens; all with recoverable COI sequences could be assigned to known species in the data set using the tree-based or character methods as described. Remarkably, Eaton and colleagues were able to recover COI DNA from 1 of 5 leather goods, which had been impounded by the US Fish and Wildlife Service as likely of CITES species origin. This proved to be Crocodylus niloticus (Nile crocodile). Recovering DNA from leather suggests many unsuspected household items have legible DNA barcodes.

I only wish the research report could have included pictures–there is so much more we might learn. There is an AMNH webpage describing the project which has several interesting images, although these are unlabeled and not referred to by the text. Perhaps we need a “mash-up” utility into which one could insert a scientific paper, which then would pull in relevant material–images, maps, links. Along these lines, there is a very neat Encyclopedia of Life NameLink utility which automatically detects scientific names and inserts hyperlinks to relevant EOL pages–try it!

Coach and the Fly

Are USA carbon dioxide emissions now around their “natural” peak?  Following up on our postings 589 and 585 on 20 April 2009, we post a trio of figures that suggest that the energy system is decarbonizing as expected and in spite of the buzzing of pundits and politicians and policymakers, who remind us of the fly in the classic fable of Jean de la Fontaine (1621-1695), the Coach and the Fly.

The Coach and The Fly

Jean de la Fontaine (translation from French by Jesse Ausubel)

On a climbing, bad, sandy road,
Exposed to the sun on all sides
Six strong horses pulled a coach.
Women, a monk, old people had gotten out.
The team of animals sweated, panted, was spent.
A fly turned up, and approached the horses;
And pretends to excite them by its buzzing.
Stings one, stings another, and thinks at this moment
That she makes the machine go.
She sits on the shaft, on the nose of the coachman;
As soon as the coach moved,
And she saw the people walk,
She attributes uniquely to herself all the glory;
Go, come, hurry; it seems the fly could be
A battle sergeant going to each spot
Making his troops advance, and hasten victory.
She cries she acts alone, and she has all the worries;
That no one helps the horses to pull out of this mess.
The Monk recites his breviary;
He takes his time!  A woman sings;
As if this is a question of singing songs!
Lady Fly goes to buzz in their ears,
And does a hundred equally silly things.
After much work the coach arrives on top.
Let’s breathe now, says the fly immediately:
I have done so much that our people have finally reached the plateau.
So, my good horses, pay me for my pain.
Thus, certain people, eager to impress,
Introduce themselves into affairs:
They play the busybody everywhere,
And, everywhere importuning, they must be chased off.

Tracing invaders with DNA

The horse-chestnut leaf miner moth Cameraria ohridella (link to Encyclopedia of Life species page), first described as an apparent endemic in Macedonia in 1984, has steadily expanded its range over the past 25 years, turning once attractive stands of horse-chestnut trees in many urban areas across Europe into unsightly arrays. The damage results from larvae feeding on the leaf interior (ie “leaf mining”), causing extensive mottling and leaf loss. C. ohridella is an “invasive pest” in Europe and was the subject of an international symposium in Prague in 2004 aimed at identifying biocontrol methods. For such a well-known and important organism, one might expect that scientific information would be readily available. As above, there is an excellent EOL species page, but I was unable to find the original species description online (Deschka G, Dimic N. 1986. Acta Entomologica Jugoslavica, 22, 11-23). A number of museums and universities have print copies of this journal, and I could request a photocopy through Rockefeller University inter-library loan, although of course that service is not available to the public. I did find a complete set of AEJ available from an antique bookseller (the journal ceased publication in 1990) for about $500 US! For wider access, I hope that EOL pages will include links to original species descriptions when available as out-of-copyright or open-access.

This leads to a report in July 2009 Mol Ecol by researchers from France, Switzerland, Hungary, and Canada, using mitochondrial and microsatellite DNA markers to trace the origin of C. ohridella. For this remarkably wide-ranging study, the researchers analyzed 486 specimens from 88 localities in 22 European countries, collecting a single individual per leaf per tree, and if possible, from 30 different trees at each collecting site. To skip to the conclusion, consistent with the historical pattern of spread north and west through Europe, the invasive form of the moth appears to be derived from populations infesting wild horse-chestnut trees in the southern Balkans. The genetic diversity was greatest in natural forests in Macedonia, Greece, and Albania, whereas the individuals collected from all “artificial” habitats (ie planted trees in parks, gardens, and roadsides across Europe) had nearly identical COI barcode sequences, consistent with recent expansion from a single source. The important practical conclusion is that biocontrol agents in the form of natural parasitoids are most likely to be found in wild stands of horse-chestnut in the southern Balkans. I look forward to more studies on detecting and monitoring invasive species with DNA.
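One common way to put a number on “genetic diversity was greatest” is haplotype diversity, the chance that two individuals drawn at random from a site carry different haplotypes. The sketch below uses the standard sample-size-corrected formula h = n/(n-1) × (1 − Σ p_i²). It is an illustration only: the report's own statistics are not described here, and the function name and the haplotype labels are hypothetical.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Haplotype diversity with sample-size correction: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two individuals")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)

# Hypothetical haplotype labels per individual, invented for illustration only.
natural_stand = ["H1", "H2", "H3", "H4"]   # every individual differs
planted_trees = ["H1", "H1", "H1", "H1"]   # a single shared haplotype
print(haplotype_diversity(natural_stand), haplotype_diversity(planted_trees))
```

A value near 1 corresponds to the pattern described for natural Balkan forests, and a value near 0 to the nearly identical sequences found in planted trees across Europe.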

EOL News

The Encyclopedia of Life has issued its 2nd Annual Report as well as a newsy press release which received coverage in at least 24 nations and 12 languages, including a lively article by the Spanish wire service EFE.