Blog

Rapid data release for barcode data

At the Fourth International Barcode of Life Conference in Adelaide, there was general recognition that the initiative’s remarkable success in generating barcodes is outstripping the relatively slow process of releasing experimental data after academic publication. Of approximately 1.4 million barcode records in BOLD at the time, fewer than 300 thousand sequences with species names were publicly available, and the proportion of barcode sequences that are both published and assigned species names appears to be falling further behind as the rate of barcoding specimens increases. Because privately held sequence data do not contribute to the overarching goal of creating a community resource for society and science, this gap stimulated many discussions on how to proceed. Many cited the rapid data release policies hammered out by the genomics community as a precedent.

At a 1996 summit in Bermuda, leaders of the scientific community agreed on a groundbreaking set of principles requiring that all DNA sequence data be released in publicly-accessible databases within twenty-four hours after generation. These “Bermuda Principles” (also known as the “Bermuda Accord”) contravened the typical practice in the sciences of making experimental data available only after publication. These principles represent a significant achievement…and have established rapid pre-publication data release as the norm in genomics and other fields.  https://en.wikipedia.org/wiki/Bermuda_Principles

Human genomics is not the same as biodiversity genomics: for one, barcodes are derived from a multitude of often irreplaceable specimens. But the general principle that rapid data release contributes to a community resource, for what is, after all, an enterprise funded by society, surely holds.

What follows is one strategy for combining academic publication AND rapid data release, which we hope will encourage others. With the assistance of ZooKeys (open access), GenBank, and BOLD, on December 8, 2011, a brief “Project Description” of a barcode dataset (see below), completed just two weeks earlier, was published coincident with the release of the sequence data in GenBank and BOLD, with a full descriptive paper summarizing the dataset to follow within six months. The Project Description includes a set of explicit statements regarding use of the early release data (see below).

Title: Project Description: DNA Barcodes of Bird Species in the National Museum of Natural History, Smithsonian Institution, USA

Abstract: The Division of Birds, National Museum of Natural History, Smithsonian Institution in Washington, DC, has obtained and released DNA barcodes for 2,808 frozen tissue samples. Of the 1,403 species represented by these samples, 1,147 species have not been barcoded previously. This data release increases the number of bird species with standard barcodes by 91%. These records meet the data standard of the Consortium for the Barcode of Life and they have the reserved keyword BARCODE in GenBank. The data are now available on GenBank and the Barcode of Life Data Systems.

Excerpt regarding use of early release data:

The authors invite the research community to examine and analyze the data in their current form with the following understandings:

• As with all data released on GenBank, the National Center for Biotechnology Information places no restriction on their use or distribution.

• The authors intend to publish a descriptive paper summarizing the dataset and its implications for bird barcoding and any taxonomic issues arising from the data. Publication of this data release paper is anticipated by 1 June 2012. In accordance with the Fort Lauderdale Principles (Wellcome Trust 2003), the authors ask the community to respect our intent to publish on these topics and not to submit manuscripts for this purpose based on this dataset.

• Uses of this dataset for purposes other than those described above are welcome and encouraged, contingent on proper citation of this publication.

• The authors invite members of the community to examine the data and test their accuracy relative to other datasets. We welcome your comments, suggestions and corrections. BOLD 3.0 includes the capability to submit annotations to data submitters and we encourage readers to use this new system to submit observations on this dataset.

• The species determinations are not yet final. Some of the species identifications may change by the time the data release paper is published (anticipated by 1 June 2012).

I hope to soon see more public barcode data, following this and other pathways!

DNA Barcoding Prizes for first Nature, Science publications

First proposed in 2003, the DNA barcoding initiative has generated more than 1000 scientific publications, but none so far in the de facto top science journals, Nature and Science. The barcode library contains over 1 million records from over 100 thousand species, suggesting opportunities for new insights into large-scale patterns and processes in biodiversity. Yet so far relatively few papers have attempted a synthetic exploration of this unprecedented genetic resource beyond species identification. To encourage high-profile discovery, the Program for the Human Environment is offering $5000 prizes for the first DNA barcoding papers in Nature and Science, as announced earlier this month at the close of the Fourth International Barcode of Life Conference, University of Adelaide, Australia. To qualify, a paper must feature DNA barcoding in either the title or the abstract, and cite CBOL and iBOL in the acknowledgments.

ZooKeys

The open access journal ZooKeys announced the first rapid-release, large dataset of bird barcodes, based on 2,808 frozen tissue samples from the Division of Birds, National Museum of Natural History, Smithsonian Institution, Washington, D.C. This is the first barcode dataset explicitly published under the Fort Lauderdale Principles (Wellcome Trust 2003) of early data release adopted by the genomics community. Mark and co-authors hope this is a model that will encourage and allow barcoders to rapidly release data and receive academic credit.

Trinity School TeaBOL

The Trinity School TeaBOL project is featured in the cover article of the December/January issue of The Helix, an Australian science magazine for children aged 10 and up.

Barcode High

The November/December 2011 online issue of The Scientist published an article by Mark on the High School Barcode Project, Barcode High.

Budget Hero

The Alfred P. Sloan and Richard Lounsbery foundations have supported the creation of an interactive game in which players try to reduce the US federal budget deficit. The game has been played more than 150,000 times. It’s fun, more fun than actually reducing the deficit. Try the fantasy version at Budget Hero.

Name the Scientist

The New York Times created an interactive game, “Name the Scientist,” in which Jesse Ausubel is a wrong answer to the fourth question. Richard Dawkins is the right answer. Still, we are flattered to be in such company. Enjoy the game.