The Barcode Blog

A mostly scientific blog about short DNA sequences for species identification and discovery. I encourage your commentary. -- Mark Stoeckle

Fungal database a Tower of Babel, needs rebuilding

Early in Michael Crichton’s 1990 novel Jurassic Park, Dr. Henry Wu, chief scientist at Jurassic Park Research Institute, showing visitors around his facility, displays “the actual structure of a small fragment of dinosaur DNA”. Astute readers pointed out that Dr. Wu’s dinosaur genetic resuscitation project was unlikely to succeed, as the sequence in Crichton’s novel was a fragment of the bacterial plasmid pBR322. They discovered this by feeding the “dinosaur sequence” into the online BLAST search engine, which searches the billions of base pairs of nucleotide sequences deposited in the amazing public resource of GenBank and the other international genetic databases, EMBL and DDBJ.

The power of genetic databases as identification tools rests on the quality of sequences and their annotations.  Just as we need regularly updated maps for safe navigation, we need regularly updated genetic databases for accurate identifications.

One of the strengths of GenBank is that it serves as a permanent repository for genetic sequence data. As a result, GenBank is sometimes a permanent repository for faulty data.  In a recent PLoS One paper, researchers from Goteborg University and Chalmers University of Technology, Sweden, and University of Tartu, Estonia, examined the taxonomic reliability of the 51,534 fungal internal transcribed spacer (ITS) sequences in the International Nucleotide Sequence Database (INSD, i.e., GenBank, EMBL, DDBJ). ITS is the most widely used locus for species identification in fungi.

[Image: The Tower of Babel, Pieter Bruegel the Elder, 1563]

The results show a “variegated picture of the taxonomic status of publicly indexed fungal sequences”.  Taxonomic coverage is sparse: of the estimated 1.5 million fungi, fewer than 1% (9,684 species) are represented. Taxonomic data are lacking for many sequences (27% are not identified to species level), and most of the species-level identifications are unverifiable (82% are not linked to voucher specimens, 63% are not tagged with specimen country of origin, and 42% are marked as unpublished). Sequence comparisons suggest mislabeling is common (11% show best matches to congeneric but heterospecific sequences, and another 7% match species of a different genus). Overall, 10-21% of the INSD sequences have incorrect or unsatisfactory annotations.
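To make the coverage figure and the best-match categories above concrete, here is a minimal Python sketch. It is only an illustration, not the method used in the PLoS One paper: the function name `classify_best_match` and the example species pairs are my own assumptions, and it simply compares the "Genus species" label on a query sequence with the label on its best database match.

```python
def classify_best_match(query_name: str, best_hit_name: str) -> str:
    """Label how a query's species name relates to its best match's name.

    Hypothetical helper for illustration; assumes both names are simple
    'Genus species' binomials.
    """
    q_genus, q_species = query_name.lower().split()[:2]
    h_genus, h_species = best_hit_name.lower().split()[:2]
    if q_genus != h_genus:
        return "different genus"
    if q_species != h_species:
        return "congeneric, heterospecific"
    return "conspecific"


# Coverage arithmetic using the two figures quoted in the post.
species_represented = 9_684
estimated_fungal_species = 1_500_000
coverage_percent = 100 * species_represented / estimated_fungal_species
print(f"Species coverage: {coverage_percent:.2f}%")

# Made-up query/best-hit label pairs, chosen only to exercise each category.
examples = [
    ("Amanita muscaria", "Amanita muscaria"),
    ("Amanita muscaria", "Amanita pantherina"),
    ("Amanita muscaria", "Russula emetica"),
]
for query, hit in examples:
    print(f"{query} vs {hit}: {classify_best_match(query, hit)}")
```

A name mismatch flagged this way is only a hint that something may be mislabeled. Deciding which of the two sequences carries the wrong name still requires going back to a vouchered specimen, which is the point of the rebuilding effort described next.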

It seems better to start over than to try to revise this Tower of Babel.  Nilsson et al. conclude “the large body of insufficiently identified fungi in INSD constitutes a silent plea for a wide and generalized sequencing effort of well-identified and -annotated [type] specimens residing in herbaria worldwide.” Toward this end, an All-Fungi Barcoding Initiative Workshop will be held 14-15 May 2007 at the Smithsonian Center for Research and Conservation, Front Royal, Virginia. An international group of researchers aims to hammer out how to build a reliable database, including which gene(s) should be adopted as standard barcode targets.

So far, DNA-based fungal identifications have primarily used ITS. Other nuclear genes have been used in some studies, including the nuclear large ribosomal subunit, beta-tubulin, and elongation factor 1-alpha. It would be excellent if the fungal barcode database could link directly with those being built around the mitochondrial gene COI, which is effective for resolving most protozoan and metazoan (multicellular animal) species examined so far. In this regard it is exciting that a report by Seifert et al. in the 6 March 2007 Proc Natl Acad Sci USA shows that COI provides species-level resolution similar to that for ITS, that amplification was generally straightforward, and that introns in the COI gene were found in only 2 of 370 Penicillium strains.

This entry was posted on Friday, April 27th, 2007 at 11:44 am and is filed under General.


About this site

This web site is an outgrowth of the Taxonomy, DNA, and Barcode of Life meeting held at Banbury Center, Cold Spring Harbor Laboratory, September 9-12, 2003. It is designed and managed by Mark Stoeckle, Perrin Meyer, and Jason Yung at the Program for the Human Environment (PHE) at The Rockefeller University.

About the Program for the Human Environment

The involvement of the Program for the Human Environment in DNA barcoding dates to Jesse Ausubel's attendance in February 2002 at a conference in Nova Scotia organized by the Canadian Center for Marine Biodiversity. At the conference, Paul Hebert presented for the first time his concept of large-scale DNA barcoding for species identification. Impressed by the potential for this technology to address difficult challenges in the Census of Marine Life, Jesse agreed with Paul on encouraging a conference to explore the contribution taxonomy and DNA could make to the Census as well as other large-scale terrestrial efforts. In his capacity as a Program Director of the Sloan Foundation, Jesse turned to the Banbury Conference Center of Cold Spring Harbor Laboratory, whose leader Jan Witkowski prepared a strong proposal to explore both the scientific reliability of barcoding and the processes that might bring it to broad application. Concurrently, PHE researcher Mark Stoeckle began to work with the Hebert lab on analytic studies of barcoding in birds. Our involvement in barcoding now takes 3 forms: assisting the organizational development of the Consortium for the Barcode of Life and the Barcode of Life Initiative; contributing to the scientific development of the field, especially by studies in birds; and contributing to public understanding of the science and technology of barcoding and its applications through improved visualization techniques and preparation of brochures and other broadly accessible means, including this website. While the Sloan Foundation continues to support CBOL through a grant to the Smithsonian Institution, it does not provide financial support for barcoding research itself or support to the PHE for its research in this field.