A mostly scientific blog about short DNA sequences for species identification and discovery. I encourage your commentary. -- Mark Stoeckle

Rapid data release for barcode data

At the Fourth International Barcode of Life Conference in Adelaide, there was general recognition that the initiative’s remarkable success in generating barcodes is outstripping the relatively slow process of releasing experimental data after academic publication. Of approximately 1.4 million barcode records in BOLD at the time, fewer than 300 thousand sequences with species names were publicly available, and the proportion of barcode sequences that are published and have species names appears to be falling further behind over time as the rate of barcoding specimens increases. Given that privately held sequence data do not contribute to the overarching goal of creating a community resource for society and science, this stimulated many discussions on how to proceed. Many cited the rapid data release policies hammered out by the genomics community as a precedent.

At a 1996 summit in Bermuda, leaders of the scientific community agreed on a groundbreaking set of principles requiring that all DNA sequence data be released in publicly-accessible databases within twenty-four hours after generation. These “Bermuda Principles” (also known as the “Bermuda Accord”) contravened the typical practice in the sciences of making experimental data available only after publication. These principles represent a significant achievement…and have established rapid pre-publication data release as the norm in genomics and other fields.

Human genomics is not the same as biodiversity genomics: for one, barcodes are derived from a multitude of often irreplaceable specimens. Still, the general principle that rapid data release contributes to a community resource surely holds for what is, after all, an enterprise funded by society.

What follows is one strategy for academic publication AND rapid data release which we hope will encourage others. With the assistance of ZooKeys (open access), GenBank, and BOLD, on December 8, 2011, a brief “Project Description” of a barcode dataset (see below), completed just two weeks earlier, was published coincident with release of sequence data in GenBank and BOLD, with a full descriptive paper summarizing the dataset to follow in the next six months. A set of explicit statements regarding use of early release data (see below) is included in the Project Description.

Title: Project Description: DNA Barcodes of Bird Species in the National Museum of Natural History, Smithsonian Institution, USA

Abstract: The Division of Birds, National Museum of Natural History, Smithsonian Institution in Washington, DC, has obtained and released DNA barcodes for 2,808 frozen tissue samples. Of the 1,403 species represented by these samples, 1,147 species have not been barcoded previously. This data release increases the number of bird species with standard barcodes by 91%. These records meet the data standard of the Consortium for the Barcode of Life and they have the reserved keyword BARCODE in GenBank. The data are now available on GenBank and the Barcode of Life Data Systems.

Excerpt regarding use of early release data:

The authors invite the research community to examine and analyze the data in their current form with the following understandings:

• As with all data released on GenBank, the National Center for Biotechnology Information places no restriction on their use or distribution.

• The authors intend to publish a descriptive paper summarizing the dataset and its implications for bird barcoding and any taxonomic issues arising from the data. Publication of this data release paper is anticipated by 1 June 2012. In accordance with the Fort Lauderdale Principles (Wellcome Trust 2011), the authors ask the community to respect our intent to publish on these topics and not to submit manuscripts for this purpose based on this dataset.

• Use of this dataset for purposes other than those described above is welcome and encouraged, contingent on proper citation of this publication.

• The authors invite members of the community to examine the data and test their accuracy relative to other datasets. We welcome your comments, suggestions and corrections. BOLD 3.0 includes the capability to submit annotations to data submitters and we encourage readers to use this new system to submit observations on this dataset.

• The species determinations are not yet final. Some of the species identifications may change by the time of publication of the data release paper (anticipated by 1 June 2012).

I hope to soon see more public barcode data, following this and other pathways!

About this site

This web site is an outgrowth of the Taxonomy, DNA, and Barcode of Life meeting held at Banbury Center, Cold Spring Harbor Laboratory, September 9-12, 2003. It is designed and managed by Mark Stoeckle, Perrin Meyer, and Jason Yung at the Program for the Human Environment (PHE) at The Rockefeller University.

About the Program for the Human Environment

The involvement of the Program for the Human Environment in DNA barcoding dates to Jesse Ausubel's attendance in February 2002 at a conference in Nova Scotia organized by the Canadian Center for Marine Biodiversity. At the conference, Paul Hebert presented for the first time his concept of large-scale DNA barcoding for species identification. Impressed by the potential for this technology to address difficult challenges in the Census of Marine Life, Jesse agreed with Paul on encouraging a conference to explore the contribution taxonomy and DNA could make to the Census as well as other large-scale terrestrial efforts. In his capacity as a Program Director of the Sloan Foundation, Jesse turned to the Banbury Conference Center of Cold Spring Harbor Laboratory, whose leader Jan Witkowski prepared a strong proposal to explore both the scientific reliability of barcoding and the processes that might bring it to broad application. Concurrently, PHE researcher Mark Stoeckle began to work with the Hebert lab on analytic studies of barcoding in birds. Our involvement in barcoding now takes three forms: assisting the organizational development of the Consortium for the Barcode of Life and the Barcode of Life Initiative; contributing to the scientific development of the field, especially by studies in birds; and contributing to public understanding of the science and technology of barcoding and its applications through improved visualization techniques and preparation of brochures and other broadly accessible means, including this website. While the Sloan Foundation continues to support CBOL through a grant to the Smithsonian Institution, it does not provide financial support for barcoding research itself or support to the PHE for its research in this field.