The Barcode Blog

A mostly scientific blog about short DNA sequences for species identification and discovery. I encourage your commentary. -- Mark Stoeckle

Voucher and collection information in GenBank records

A core tenet of the DNA barcoding initiative, beginning with the first workshops in 2003, is that reference sequences should be linked to vouchered specimens stored in museums so that the data can be re-checked. This linkage also gives collections visibility. For example, “GenBank DQ433554 Crotophaga ani voucher KU 89123 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial” contains voucher information in the title and in the record itself, at least for those who know that “KU” refers to the University of Kansas. The GenBank file contains a “LinkOut” to the BOLD page, which spells out the collection name. The GenBank file (and the BOLD record) could also include a “LinkOut” to the museum itself, although I have not found examples of this feature being used.
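
As a rough illustration of what "knowing that KU refers to the University of Kansas" involves, here is a minimal Python sketch that resolves a voucher string to an institution name. The function `parse_voucher` and the lookup table `INSTITUTIONS` are hypothetical names introduced for this example, and the table is limited to acronyms mentioned in this post. It assumes vouchers arrive either as free text such as "KU 89123" or in a colon-separated form such as "KU:89123"; anything it cannot resolve is returned unchanged.

```python
import re

# Hypothetical lookup table, limited to acronyms mentioned in this post.
INSTITUTIONS = {
    "KU": "University of Kansas",
    "UWBM": "Burke Museum",
    "FMNH": "Field Museum",
    "AMNH": "American Museum of Natural History",
    "USNM": "Smithsonian",
}


def parse_voucher(voucher):
    """Return (institution name or None, specimen id) for a voucher string.

    If no known institution acronym is found, the institution is None and
    the original text is returned untouched as the specimen id.
    """
    text = voucher.strip()
    if ":" in text:
        # Assumed colon-separated form: code first, specimen id last.
        parts = [part.strip() for part in text.split(":")]
        code, specimen_id = parts[0], parts[-1]
    else:
        # Assumed free-text form: leading letters, then the specimen id.
        match = re.match(r"^([A-Za-z]+)[\s-]*(\S.*)$", text)
        if not match:
            return None, text
        code, specimen_id = match.group(1), match.group(2)

    name = INSTITUTIONS.get(code.upper())
    if name is None:
        return None, text
    return name, specimen_id
```

Calling `parse_voucher("KU 89123")` resolves the example record above to the University of Kansas with specimen id "89123". A cryptic designation with no recognizable acronym comes back with `None` for the institution, which is the case a reader of the GenBank record faces today.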

[Image: BOLD Systems specimen record]

More generally, is collection information available in GenBank records? Taking birds as an example, there are 475,273 GenBank avian records; eliminating the five most-represented species (Chicken, Turkey, Mallard, Zebra Finch, Fairy Wren) leaves 108,766 sequences, of which about half (48,915) contain the word “voucher.” This sounds promising, but my unscientific sample suggests that most entries in the “voucher” field are cryptic designations that do not identify the institution storing the specimen. I tried searching by acronyms for some of the larger collections. Louisiana State University has the largest avian tissue collection in the world, with about 40,000 specimens; searching “LSU AND aves[organism] AND voucher” returned only 1,148 records, which seems likely to underrepresent the museum’s contribution. Results for some other large collections were mostly higher but still appear implausibly small considering there are 100,000+ avian GenBank records: Burke Museum (UWBM), 3,318; Field Museum (FMNH), 2,593; American Museum of Natural History (AMNH), 1,994; Smithsonian (USNM), 1,920; University of Kansas (KU), 684.
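
For readers who want to repeat this kind of search, the sketch below builds the same query string for any acronym and turns it into a count request against NCBI's E-utilities `esearch` endpoint. It is an assumption that the web search used here and an `esearch` call with `db=nucleotide` return the same counts, and the helper names (`voucher_query`, `esearch_count_url`, `share`) are made up for this example. The code only constructs the URL and does not contact NCBI; the counts in `REPORTED` are the 2009 figures quoted above.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Counts quoted in this post (2009); not fetched live.
TOTAL_AFTER_EXCLUSIONS = 108_766
WITH_VOUCHER = 48_915
REPORTED = {
    "LSU": 1_148,
    "UWBM": 3_318,
    "FMNH": 2_593,
    "AMNH": 1_994,
    "USNM": 1_920,
    "KU": 684,
}


def voucher_query(acronym):
    """Build the search term used in the post for one collection acronym."""
    return f"{acronym} AND aves[organism] AND voucher"


def esearch_count_url(term):
    """Return an esearch URL that asks only for the number of matches."""
    params = {"db": "nucleotide", "term": term, "rettype": "count"}
    return f"{ESEARCH}?{urlencode(params)}"


def share(count, total=TOTAL_AFTER_EXCLUSIONS):
    """Express a record count as a percentage of the 108,766 records."""
    return 100 * count / total


if __name__ == "__main__":
    print(esearch_count_url(voucher_query("LSU")))
    print(f"voucher: {share(WITH_VOUCHER):.1f}%")
    for acronym, count in REPORTED.items():
        print(f"{acronym}: {share(count):.2f}%")
```

On the quoted figures, records containing "voucher" make up about 45% of the 108,766 sequences, while the LSU search accounts for roughly 1%. The six acronym searches together total 11,657 hits, under 11% of the set, and that sum may count some records more than once if a record mentions two acronyms.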

I conclude that researchers and collections will benefit from following the practices promoted by the DNA barcode initiative for GenBank records, including taking advantage of GenBank’s “LinkOut” feature.

This entry was posted on Friday, July 17th, 2009 at 12:28 pm and is filed under General.

2 Responses to “Voucher and collection information in GenBank records”

  1. Robert Hanner Says:

    This is an interesting post and your efforts to sleuth collection records from GenBank entries are commendable. It is noteworthy that the barcoding community successfully lobbied for the addition of new fields to the INSDC (DDBJ, EMBL & GenBank) feature table to accommodate a structured reference to voucher specimens, GPS coordinates, name of collector, name of specimen identifier, etc., and these fields are now available for any sequence submission. Sadly, with the notable exception of Arctos, few museum databases yet support external linkage with GenBank or BOLD. Even fewer of them support dynamic web services… for those that even have online collection catalogs. Sadly it will be a long time before collection records are digitized and brought into the e-Biosphere, particularly for invertebrates. Even when such collection records are available online through portals such as GBIF, without barcodes to qualify the application of Linnean names associated with each record it is difficult to interpret the validity of the results obtained…

  2. Mark Stoeckle Says:

    This is a helpful reminder that “voucher” and related fields now available in GenBank are an outcome of the barcode initiative, with general benefit to the collection and biodiversity science communities, and that there is a lot of opportunity, and a long way to go, to unlock the scientific and social value of collections.

Contact: mark.stoeckle@rockefeller.edu

About this site

This web site is an outgrowth of the Taxonomy, DNA, and Barcode of Life meeting held at Banbury Center, Cold Spring Harbor Laboratory, September 9-12, 2003. It is designed and managed by Mark Stoeckle, Perrin Meyer, and Jason Yung at the Program for the Human Environment (PHE) at The Rockefeller University.

About the Program for the Human Environment

The involvement of the Program for the Human Environment in DNA barcoding dates to Jesse Ausubel's attendance in February 2002 at a conference in Nova Scotia organized by the Canadian Center for Marine Biodiversity. At the conference, Paul Hebert presented for the first time his concept of large-scale DNA barcoding for species identification. Impressed by the potential for this technology to address difficult challenges in the Census of Marine Life, Jesse agreed with Paul on encouraging a conference to explore the contribution taxonomy and DNA could make to the Census as well as other large-scale terrestrial efforts. In his capacity as a Program Director of the Sloan Foundation, Jesse turned to the Banbury Conference Center of Cold Spring Harbor Laboratory, whose leader Jan Witkowski prepared a strong proposal to explore both the scientific reliability of barcoding and the processes that might bring it to broad application. Concurrently, PHE researcher Mark Stoeckle began to work with the Hebert lab on analytic studies of barcoding in birds. Our involvement in barcoding now takes three forms: assisting the organizational development of the Consortium for the Barcode of Life and the Barcode of Life Initiative; contributing to the scientific development of the field, especially by studies in birds; and contributing to public understanding of the science and technology of barcoding and its applications through improved visualization techniques and preparation of brochures and other broadly accessible means, including this website. While the Sloan Foundation continues to support CBOL through a grant to the Smithsonian Institution, it does not provide financial support for barcoding research itself or support to the PHE for its research in this field.