Mapping biodiversity with DNA: sectors analyzed so far

As of 28 January 2008, there are 341,825 barcode records from 35,798 species in the Barcode of Life Database (BOLD) www.barcodinglife.org. What sectors of biodiversity have been analyzed so far? Here I follow the daily updated pages publicly available through the “DNA Taxonomy Browser” link on the BOLD home page. One can click downward through the taxonomic hierarchy from phyla to species, with a cogent summary at each level showing barcode records so far, contributing institutions and countries, and collection locations. The summary map shows remarkably good coverage of most terrestrial and coastal regions, and representation of nearly all countries. The open oceans are sparsely sampled so far, and remain an exciting terra incognita for biological exploration, including with DNA barcoding. The global totals of 342K records/36K species work out to about 10 barcodes/species, and the average number of barcodes/species is similar at least down to the class level for most groups I looked at, suggesting a target of roughly 10 specimens per species is being achieved.
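The per-species average quoted above is simple division. A quick check in Python, using only the two totals given in the text (the variable names are mine):

```python
# Totals quoted from BOLD as of 28 January 2008.
records = 341_825
species = 35_798

# Mean number of barcode records per species: roughly 9.5, i.e. "about 10".
barcodes_per_species = records / species
print(f"{barcodes_per_species:.2f} barcodes per species")
```

This is only a global mean; it says nothing about how evenly the records are spread across species.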

The densest records so far are from Phylum Arthropoda (244,297), particularly insects (230,838), and of these mostly Lepidoptera (moths and butterflies) (169,145); and Phylum Chordata (74,720), particularly mammals (27,186), fish (26,752), and birds (12,770). There is broad sampling of other groups, including records from 376 animal orders in 80 classes representing 25 phyla. In addition, there are a few thousand records from fungi (3 phyla, 8 classes), plants (mostly red algae; 3 phyla, 8 classes), and protists (7 phyla, 11 classes), the latter of which DNA barcoding is likely to reveal as an enormous, deeply diverse group.

The first paper proposing DNA barcoding was published in February 2003. The results displayed today on BOLD Taxonomy Browser demonstrate amazing progress in a short time, thanks to the inspiration and hard work of many! 

Standardized DNA analysis gives new vision of biodiversity, disturbing some

By identifying species (leaves) and determining how they are related (branches), taxonomy aims to reconstruct the Tree of Life. To do so taxonomists must distinguish variation within species from that between species, and identify the shared characters that reflect evolutionary ancestry. These tasks require highly-specialized sets of knowledge and skill for each animal and plant group. 

One result is that it has been difficult to compare patterns of diversification between different branches in the Tree. One might ask, how finely and how evenly divided is biodiversity? Are the differences among and within mosquito species (3,400 species), for example, similar to the differences among and within fruit flies (6,200 species) or birds (10,000 species)? Broad application of DNA analysis is beginning to provide some insights. To enable these sorts of comparisons, a standardized locus is needed, as unique genes can resolve local branching patterns, but do not allow easy comparisons between branches.

Large-scale surveys of standardized genetic loci, including COI barcoding, commonly reveal distinct groups within what was thought to be a single species. There is also the converse finding that in some cases COI barcode sequences do not distinguish named species, but generally this strikes me as a relatively minor scientific problem that usually involves very closely-related species pairs and can be solved where needed with more DNA sequence, assuming the underlying taxonomy is correct and the named species really are distinct. The greater scientific challenge is finding multiple groups within what appear morphologically to be single species. In many cases so far, organisms with genetically distinct COI barcode clusters show associated biological differences signalling they represent different species.

In December 2007 Mol Ecol 16:4999, researchers from Museum of Comparative Zoology, Harvard, examine genetic differences in Aoraki denticulata, a tiny (2-3 mm) harvestman (daddy longlegs) found in leaf litter widely through New Zealand. They determined COI barcode region sequences for 119 individuals from 17 localities in the mountainous northern part of South Island. The two described subspecies A. d. denticulata and A. d. major were genetically distinct. The surprising finding was there were at least 14 distinct clusters within A. d. denticulata, with a different group at almost every site, and 2 clusters at 3 of the sites. The differences between mtDNA clusters were as large as or larger than those between other Aoraki species, up to 19.2%, but no morphologic differences were found even with scanning electron microscopy of males. Boyer et al observe “while it is conceivable that some of the geographically widespread populations…represent cryptic species, it is difficult to imagine that morphologically identical individuals from a single sample at a unique geographical point are not conspecific…it is hard to believe that almost every sampled locality would host at least one, if not two, cryptic species.”


Taxonomy needs DNA, and quick, simple ways to analyze it

Lumpsuckers are globular, scaleless marine fish with bony tubercles on head and body, and a ventral sucking disc, derived from specialized pelvic fins, which allows them to adhere to environmental substrates. The genus Eumicrotremus comprises 16 species distributed in the Arctic and northern Atlantic and Pacific oceans; the commonest and most widespread in the north Atlantic is the Spiny lumpsucker E. spinosus, which was first described by Fabricius in 1776. A new subspecies E. s. eggvinii was described in 1956, based on a single specimen, and this was later elevated to species level “on the basis of wrinkled skin, numerous dermal warts and a large sucking disk, in addition to the low number of bony tubercles.”

In August 2007 J Fish Biol 71A: 111, researchers from University of Bergen, Norway, analyze DNA and morphologic characters of E. eggvinii (n=16) and E. spinosus (n=67). All specimens were easily classified by morphologic characters. However, the two species had identical mitochondrial DNA sequences (COI barcode region, COII, cytb) and identical nuclear gene Tmo-4C4. Further genetic testing revealed that E. eggvinii were all males, and E. spinosus were all females. The authors conclude that the two morphologically distinct “species” represent the sexually dimorphic forms of E. spinosus.

In this study by Byrkjedal et al, identical mtDNA sequences suggested synonymy, and this in turn suggested that morphologic divergence might represent sexual dimorphism, confirmed by further genetic testing. To my reading, this study suggests DNA testing needs to be as commonplace in taxonomy as recording size, shape, and coloration, and counting rays in fins and placement of tubercles. Every new species should have a representative DNA sequence as part of the species description. For animals, the standard should be a COI barcode. One of the remaining impediments to widespread adoption is that simple protocols for sequencing the COI barcode region need to be better disseminated. In this study, the researchers were able to recover the COI barcode region using primers designed for invertebrates (Folmer et al 1994), although others have published primer pairs that have greatly increased effectiveness with diverse fish (Ward et al 2005, Ivanova et al 2007). Compiling primer pairs and amplification protocols and displaying this information prominently on the various barcoding web sites will help (see for example the SpongeBOL home page www.spongebarcoding.org link to an illustrated primer primer!). I close with a note that this is post #100 since the first DNA barcode blog entry of March 15, 2006!

Embedding standardized DNA analysis in taxonomic practice

In the 17 September 2007 Zootaxa (open access full article), researchers from Museo Nacional de Ciencias Naturales-CSIC, Madrid, make a plea for routinely incorporating standardized DNA sequence analysis, ie DNA barcoding, into modern taxonomic practice. In their view, “integrative taxonomists should use and produce DNA barcodes.” Of course, this is already happening in many areas, but new practices diffuse slowly through the fragmented world of taxonomy, and so Padial and de la Riva’s exhortation is an important step. With growing DNA barcode libraries and increasingly inexpensive sequencing technologies, DNA testing will likely be the fastest way to sort specimens into species and will enable identification of multiple forms that now go unnamed or misidentified while waiting for an expert, waiting for eggs and larvae to mature, or waiting to find an identifiable adult male or a recognizable fragment in stomach contents.

One might view taxonomic science as an effort to construct detailed, reliable “maps” of species and their historical relationships. Adopting Padial’s and de la Riva’s advice to routinely “use and produce DNA barcodes” will speed taxonomic research and, more importantly, will naturally produce a “map of species” with general scientific and public utility. Few persons can have the requisite knowledge to distinguish larval fish for example, whereas anyone can submit a sample for DNA sequence analysis. In this way, a DNA barcode library is a map of species, one that anyone can read with the right device, a DNA sequencer. Of course, more work is needed to identify the best approaches for assigning sequences to named species and for flagging divergent sequence clusters that might represent new species. With improved analytic software and as more species and specimens per species are analyzed, the reliability of DNA barcode maps will increase. Based on results so far, I expect rapid growth in mail-order identification services, analogous to today’s DNA ancestry companies, that do DNA barcode analysis of submitted specimens, and, as others have envisioned, soon enough there will be table-top or hand-held devices that pinpoint where the specimen in hand belongs on the biodiversity map. Best wishes to all this holiday season!

Reading DNA labels on sponges

Sponges are difficult to identify and classify. Many sponges “have a depauperate suite of morphologic characters and/or are plagued by morphological homoplasies” and vary according to environmental conditions, challenging identification at species level and stymying attempts to reconstruct evolutionary lineages. In Dec 2007 J Marine Biol Assoc UK, researchers from Geoscience Centre Göttingen, Germany, and Queensland Museum, Australia, report on how DNA can help. Wörheide and Erpenbeck describe the nascent Sponge Barcoding Project www.spongebarcoding.org, which aims to collect DNA signature sequences [COI barcodes] from all 8,000 known marine intertidal, deep sea, and freshwater sponge taxa. According to the authors, “DNA barcoding will open up a new dimension and quality in biodiversity research and will become of vital importance for the survival and acknowledgement of sponge taxonomy and increase its reputation over the coming decades.” In addition to assisting species-level identifications, the authors posit the necessity of “DNA-assisted” taxonomy of sponges given the inability to construct convincing higher order classifications with morphologic characters.

An accompanying article analyzes COI barcode results for 166 specimens belonging to 65 species of Caribbean sponges. Similar to findings in other animal groups, the 584 bp COI fragment produced a gene tree similar to that with 28S rRNA, a slowly-evolving nuclear gene. In a maximum likelihood (ML) analysis, some species had overlapping or shared sequences, which the authors point out may mean these are not “good species”, specimen identifications are incorrect, or that these species cannot be distinguished by a COI barcode alone. The sequences are published in GenBank and available individually through the Sponge Barcode website, and I hope the authors will also make their sequence and specimen data available on the Published Projects section of BOLD. This will allow access to the analytic and display software on the BOLD site, enable easy comparison of the sponge data set with that of other animals, and facilitate testing of other methods, particularly for those species which are not distinguished in ML analysis.

DNA plus database of sounds help reveal new bird species

Birds are relatively large, conspicuous, vocal, and mostly diurnal creatures, making them relatively easy for humans to tell apart. Even so, new species continue to be discovered; these usually represent distinct forms within what were thought to be single species. In 27 February 2007 Mol Phylogenetics Evol, Arpad Nyari, University of Kansas, reports on molecular and vocal differentiation in a widespread South American songbird, the Thrush-like Schiffornis Schiffornis turdinus. S. turdinus is a “dull-colored, secretive bird distributed throughout Neotropical humid lowland forests from southeastern Mexico south to northern Bolivia and the Atlantic Forest of southeastern Brazil.” The thirteen recognized subspecies show “subtle differences in plumage hue and intensity, and body size”. How to sort out which forms might represent distinct species?

Nyari analyzed 38 individuals representing 10 of the 13 subspecies, plus 3 congeneric individuals of S. virescens or S. major. Eight of the 41 specimens were from University of Kansas, and the remainder were loaned from 6 museum collections in the US and Brazil, which highlights the distributed nature of avian tissue collections and the benefit of sharing resources. A total of 2475 bp of mtDNA, including COI barcode region, ND2, and cytochrome b, were sequenced as molecular markers. Vocalizations were downloaded from Macaulay Library of Natural Sounds, Cornell University, and spectrographs were analyzed with RAVEN bioacoustic software, which is provided on the Cornell site. The ability to combine these disparate data sets hints at the power of making biological data widely available. Molecular analysis revealed 7 distinct geographically restricted clades, 5 of which had characteristic vocalizations. On this basis, S. turdinus is recommended to be split into 5 species. When analyzed separately, the COI barcode region (615 bp) recovered the same 7 clades as the full 2475 bp. As in other studies, the branching order among clades and congenerics was better shown with the larger data set.

To my reading, this and other studies suggest that large-scale COI barcode screening of avian tissue collections will be a scientifically productive and efficient approach that will speed discovery of new bird species and advance understanding of avian diversity. As in Kerr et al’s recent study of North American birds, it seems likely that many or most of the new bird species awaiting discovery will be found among birds as inconspicuous as S. turdinus.

DNA helps identify African agricultural pests

Among the 35,000 known species of noctuid moths, a number are destructive agricultural pests, including for example Corn earworm Helicoverpa zea and Tobacco budworm Heliothis virescens. Accurate identification is the essential first step in pest management, but morphologic identification can be difficult, particularly of eggs and larval forms.

In September 2007 African Entomology, researchers from South Africa, Australia, and France analyze COI mtDNA of Busseola sp. larvae in Ethiopian sugarcane. The DNA barcode region of COI was amplified from 7 morphologically-indistinguishable larval specimens using standard invertebrate primers (Folmer et al, Mol Marine Biol Biotech 3:294, 1994). Rearing of the larvae was attempted, but none of the collected larvae developed to the adult stage. Sequence analysis revealed two distinct clusters that matched sequences derived from adult B. fusca and B. phaia. Assefa and co-authors conclude “DNA-based methods were found to be a quick, easy and reliable method for identification of species…[and] may then be solutions for conditions in Africa where there is an acute shortage of experts and rearing facilities to keep field-collected insects alive until emergence of adults for morphological identification.”
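The matching step described above, assigning an unidentified larva to whichever adult reference sequence it most resembles, can be sketched in a few lines of Python. This is my own toy illustration, not the authors' method: the function names are hypothetical, the sequences are invented placeholders, and a real identification would use full-length barcodes, a larger reference library, and a divergence threshold.

```python
def mismatch_fraction(a, b):
    """Fraction of positions that differ between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_reference(query, references):
    """Return (name, distance) for the reference sequence nearest to the query."""
    name = min(references, key=lambda n: mismatch_fraction(query, references[n]))
    return name, mismatch_fraction(query, references[name])


# Invented placeholder sequences keyed by adult species name (not real COI data).
references = {
    "Busseola fusca": "ATTGGAGCTTGAGCAGGAATAG",
    "Busseola phaia": "ATTGGTGCATGAGCTGGTATAG",
}
larva = "ATTGGAGCTTGAGCAGGTATAG"

species, distance = closest_reference(larva, references)
print(species, round(distance, 3))
```

With these toy inputs the larva differs from the first reference at 1 of 22 sites and from the second at 3, so it is assigned to the first.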

Web initiative aims to help clear name confusion

“The first part of knowledge is getting the names right.”   Chinese proverb quoted in Evolution of Insects, Grimaldi and Engel, 2005.

Species names are the primary entrance for accessing biological knowledge about organisms. However, the tangled bank of nomenclature created by 250 years of diverse communities of taxonomic specialists working largely in isolation challenges those seeking knowledge. It can be difficult to know what is already known. Identifying even well-studied organisms in backyards, such as North American ants for example, may require graduate-level training. As taxonomic knowledge moves increasingly onto the web, tools that enable non-specialists and specialists alike to access biological knowledge of organisms are beginning to be developed. In my view, the solution will be a combination of information science tools enabling access to biological literature together with a universal library of standardized genetic sequences, ie DNA barcodes, and simple technologies for barcode sequencing. 

An exciting development in taxonomic information science is uBio (Universal Biological Indexer and Organizer) www.ubio.org, “an initiative within the science library community to join international efforts to create and utilize a comprehensive and collaborative catalog of known names of all living (and once-living) organisms. The Taxonomic Name Server (10,699,999 NameBank records so far) catalogs names and classifications to enable tools that can help users find information on living things using any of the names that may be related to an organism.”

The uBio site provides a sophisticated and enjoyable illustrated introduction to the variety of challenges in retrieving information using organism names. Another feature is Nomenclator Zoologicus, a searchable list of the names of genera and subgenera in zoology from the tenth edition of Linnaeus 1758 to the end of 2004, developed with Zoological Society of London. uBio is helping organize and index Encyclopedia of Life (“a web page for every species”) and Biodiversity Heritage Library (1.124 million pages digitized and on the web so far).

I close with an example from birds. Some taxonomic confusion reflects the struggle to integrate older works that use outdated taxon names or species limits with modern knowledge. Other discordances reflect lack of consensus among current experts. Given the intensity of scientific study and public interest in birds, it is surprising there is no single authoritative world checklist, especially since most of the differences at the species level reflect minor disagreements about generic assignment, a few cases of splitting/lumping, or differences in spelling. As one step until there is an expert consensus checklist, for those interested in birds, we have prepared an “ABBI Name Lookup” (Excel, 8 MB) file for harmonizing specimen lists that recognizes 2,462 synonyms, alternate spellings and misspellings, and extinct species.
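The idea behind such a lookup file can be sketched as a simple table that maps each variant name to one preferred form. The snippet below is a hypothetical illustration in Python: the table entries and the choice of "preferred" name are mine, chosen only to show an older generic assignment and a misspelling, and are not taken from the ABBI file.

```python
# Hypothetical lookup table: variant name (lowercase) -> preferred name.
PREFERRED = {
    "parus caeruleus": "Cyanistes caeruleus",      # older generic assignment
    "cyanistes caerulius": "Cyanistes caeruleus",  # misspelling
}


def harmonize(name):
    """Map a name to its preferred form; unknown names pass through unchanged."""
    cleaned = " ".join(name.split())  # collapse stray whitespace
    return PREFERRED.get(cleaned.lower(), cleaned)


specimen_list = ["Parus caeruleus", "cyanistes  caerulius", "Cyanistes caeruleus"]
harmonized = [harmonize(n) for n in specimen_list]
print(harmonized)
```

All three entries in the toy list resolve to the same name, which is the point of harmonizing lists from different sources before comparing them.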

Optimizing PCR primers for amphibian COI sequences

“Amphibians are globally in decline, yet there is still a tremendous amount of unrecognized diversity” observed Vences et al in 2005 Phil Trans R Soc B 360:1859, the first report applying DNA barcoding to amphibian diversity. Vences and colleagues highlighted the pressing need for fast and reliable identification tools, including for eggs and larvae, which are often unrecognizable morphologically.

Here I focus on one technical aspect of DNA barcoding amphibians, namely designing primers that amplify the target sequence from a broad range of species. Previous research had shown remarkable mitochondrial sequence diversity among closely-related amphibians, and even within what appear to be single species, some of which may represent cryptic species. In the 2005 Phil Trans R Soc B paper, researchers used COI primers designed for invertebrates (Folmer et al 1994); surprisingly these “worked in a large proportion of specimens”. They concluded “We support attempts to build up a global and complete cox1 database of [animal] eukaryotes”.

In 2005 Frontiers Zool 2:5 the same group of researchers quantified their PCR amplification success on specimens from 38 individuals representing 20 amphibian species. Using a well-established primer set for vertebrate 16S (Palumbi et al 1991), 38 of 38 (100%) samples amplified; with 3 COI primer sets (1 for invertebrates, 2 for birds), 36 of 38 (95%) amplified, although there was only 50-70% success for the individual COI primer pairs. The authors did not attempt to design new primers for amphibians. They concluded “we strongly advocate use of 16s rRNA as standard DNA marker for vertebrates to complement COI”. This seems reasonable, but the advantages of standardizing on a single gene call for an effort to design primers that amplify COI from amphibians before abandoning the field to 16S or some other marker.

In 2007 Mol Ecol Notes Smith and colleagues from University of Guelph analyzed 83 amphibian COI sequences in GenBank to design new primers. The 3′ ends of the forward and reverse primers bind at 1st or 2nd codon position G-C residues, which they found to be highly conserved among amphibian species, and each primer contains three 2-fold degenerate sites. Using this set, they amplified full-length PCR products from 267 of 377 specimens (71%) representing 39 amphibian species (including Triturus vulgaris), and recovered an additional 34 sequences (9%) using a “mini-barcode” primer set designed for butterflies. The authors comment “many of the specimens…which failed to amplify had been fixed in formalin or were collected more than 15 years ago”, so further work to test these primers on fresh material and a diversity of species is needed.
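A primer with three 2-fold degenerate sites is really a mixture of 2 × 2 × 2 = 8 concrete sequences. The Python sketch below expands a degenerate primer using the standard IUPAC two-base codes and also rechecks the success percentages quoted above. The primer string is a made-up placeholder, not one of the published amphibian primers, and the function name is my own.

```python
from itertools import product

# IUPAC codes for 2-fold degenerate positions.
TWO_FOLD = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def expand_primer(primer):
    """List every concrete sequence encoded by a primer with 2-fold degenerate bases."""
    choices = [TWO_FOLD.get(base, base) for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer with three 2-fold degenerate sites (R, Y, W).
primer = "ACRTTYGGWC"
variants = expand_primer(primer)
print(len(variants))  # 2 * 2 * 2 = 8

# Amplification success reported in the text: 267 and 34 of 377 specimens.
full_length = 267 / 377
mini_barcode = 34 / 377
print(f"{full_length:.0%} full-length, {mini_barcode:.0%} mini-barcode")
```

Degeneracy trades specificity for breadth: each added 2-fold site doubles the number of sequences in the primer pool, which is presumably why the authors limited it to three sites per primer.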

Amphibians are an exciting group. A comprehensive amphibian DNA barcode library will likely provide many, many new insights. I believe further work will help establish robust primer sets for amphibian COI sequences. 

Non-invasive DNA recovery leaves tiny specimens intact

Reference databases of DNA sequences used for species identification, ie DNA barcode libraries, are most powerful when the morphologic specimens are vouchered in a museum collection. This way, when there are puzzling results, DNA and morphologic specimens can be re-examined. However, to date it has been challenging to recover DNA from small organisms without destroying them in the process.

In Mol Ecol Notes 9 August 2007, researchers from US Department of Agriculture and Smithsonian Institution, National Museum of Natural History, describe a uniform protocol for “nondestructive extraction of DNA from terrestrial arthropods” including ticks, spiders, beetles, flies, and bees. Rowley et al report that 1 to 4 h in a guanidinium thiocyanate extraction buffer yielded amplifiable COI DNA from most specimens. Inspection of specimens after extraction, including with phase contrast and scanning electron microscopy, demonstrated preservation of most morphologic characters.

In Mol Ecol Notes 27 June 2007, UK researchers (University College, London, NERC Centre for Ecology and Hydrology, Oxford, and UK Environmental Agency) describe a rapid, non-destructive, chemical-free method for DNA recovery from blackflies, including adult, larval, and pupal forms. Hunter et al report that brief (1 minute) sonication in sterile water yielded 66% success with COI barcode amplification and preserved morphologic details.

These reports are exciting in the methods they describe and in how they highlight the general value of extracting DNA and determining DNA barcode sequences as an integral part of preparing traditional morphologic vouchers.