Rapid data release for barcode data

At the Fourth International Barcode of Life Conference in Adelaide, there was general recognition that the initiative’s remarkable success in generating barcodes is outstripping the relatively slow process of releasing experimental data after academic publication. Of approximately 1.4 million barcode records in BOLD at the time, fewer than 300,000 sequences with species names were publicly available. As the rate of barcoding specimens increases, the proportion of barcode sequences that are both published and have species names appears to be falling further behind. Privately held sequence data do not contribute to the overarching goal of creating a community resource for society and science, and this gap stimulated many discussions on how to proceed. Many cited the rapid data release policies hammered out by the genomics community as a precedent.

At a 1996 summit in Bermuda, leaders of the scientific community agreed on a groundbreaking set of principles requiring that all DNA sequence data be released in publicly-accessible databases within twenty-four hours after generation. These “Bermuda Principles” (also known as the “Bermuda Accord”) contravened the typical practice in the sciences of making experimental data available only after publication. These principles represent a significant achievement…and have established rapid pre-publication data release as the norm in genomics and other fields.  https://en.wikipedia.org/wiki/Bermuda_Principles

Human genomics is not the same as biodiversity genomics (for one, barcodes are derived from a multitude of often irreplaceable specimens), but the general principle that rapid data release contributes to a community resource, for what is, after all, an enterprise funded by society, surely holds.

What follows is one strategy for combining academic publication AND rapid data release, which we hope will encourage others. With the assistance of ZooKeys (open access), GenBank, and BOLD, on December 8, 2011, a brief “Project Description” of a barcode dataset (see below), completed just two weeks earlier, was published coincident with the release of the sequence data in GenBank and BOLD; a full descriptive paper summarizing the dataset is to follow within the next six months. The Project Description includes a set of explicit statements regarding use of the early-release data (see below).

Title: Project Description: DNA Barcodes of Bird Species in the National Museum of Natural History, Smithsonian Institution, USA

Abstract: The Division of Birds, National Museum of Natural History, Smithsonian Institution in Washington, DC, has obtained and released DNA barcodes for 2,808 frozen tissue samples. Of the 1,403 species represented by these samples, 1,147 species have not been barcoded previously. This data release increases the number of bird species with standard barcodes by 91%. These records meet the data standard of the Consortium for the Barcode of Life and they have the reserved keyword BARCODE in GenBank. The data are now available on GenBank and the Barcode of Life Data Systems.

Excerpt regarding use of early release data:

The authors invite the research community to examine and analyze the data in their current form with the following understandings:

• As with all data released on GenBank, the National Center for Biotechnology Information places no restriction on their use or distribution.

• The authors intend to publish a descriptive paper summarizing the dataset and its implications for bird barcoding and any taxonomic issues arising from the data. Publication of this data release paper is anticipated by 1 June 2012. In accordance with the Fort Lauderdale Principles (Wellcome Trust 2011), the authors ask the community to respect our intent to publish on these topics and not to submit manuscripts for this purpose based on this dataset.

• Use of this dataset for purposes other than those described above is welcome and encouraged, contingent on proper citation of this publication.

• The authors invite members of the community to examine the data and test their accuracy relative to other datasets. We welcome your comments, suggestions and corrections. BOLD 3.0 includes the capability to submit annotations to data submitters and we encourage readers to use this new system to submit observations on this dataset.

• The species determinations are not yet final. Some of the species identifications may change by the time of publication of the data release paper (anticipated by 1 June 2012).

I hope to soon see more public barcode data, following this and other pathways!
