On Tuesday 6 November, Jesse Ausubel delivered the Enrico Fermi Colloquium on “Macroscopes for Science” at the University of Florence, Italy, hosted by laser expert Roberto Bini. The hour-long audio of the lecture is available at the link above, and the slides appear as the talk reaches them.
News
Barcode stats reveal progress, challenges, opportunities
As Dirk Steinke’s recent blog post demonstrated, since the seminal 2003 Proc Royal Soc London B Biol Sci paper by Hebert, Cywinska, Ball, and deWaard, barcoders around the world have been publishing scientific papers at a steadily growing pace. For more on the big picture, here I share three barcode statistics visuals prepared for the Third European Congress for the Barcode of Life (ECBOL3) in September.
Q: How many specimens have been barcoded?
A: A lot.
As of September 2012, about 600,000 specimens have barcode records in GenBank, about half of which qualify for the BARCODE keyword (BARCODE[keyword]) under the CBOL data standards. This reminds me of the special challenges barcoding faces as a genomics project: the target number of specimens is enormous, and each requires expert identification and long-term storage in a museum or herbarium.
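Counts like these come from searching GenBank for the BARCODE keyword. Below is a minimal sketch, assuming Biopython's Bio.Entrez module and network access to NCBI, of how such a query can be built and submitted. The helper names barcode_query and count_records are hypothetical, and the Count that NCBI returns is a number of nucleotide records (not specimens or species) on the day you run it, so it will not reproduce the September 2012 figures.

```python
# Sketch: count GenBank nucleotide records carrying the BARCODE keyword.
# Assumes Biopython is installed and NCBI is reachable; helper names are hypothetical.
from typing import Optional


def barcode_query(organism: Optional[str] = None) -> str:
    """Build an Entrez search term for BARCODE-keyword records, optionally for one taxon."""
    term = "BARCODE[keyword]"
    if organism:
        term += f' AND "{organism}"[Organism]'
    return term


def count_records(term: str, email: str) -> int:
    """Return how many nucleotide records match `term` (needs network access)."""
    from Bio import Entrez  # imported here so barcode_query works without Biopython

    Entrez.email = email  # NCBI asks every caller to supply a contact address
    handle = Entrez.esearch(db="nucleotide", term=term, retmax=0)
    result = Entrez.read(handle)
    handle.close()
    return int(result["Count"])


if __name__ == "__main__":
    print(barcode_query("Aves"))  # BARCODE[keyword] AND "Aves"[Organism]
    # Live query (today's counts will differ from the 2012 figures):
    # print(count_records(barcode_query("Aves"), "you@example.org"))
```

Setting retmax=0 asks only for the total count, not the record IDs, which keeps the request light.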
In addition to GenBank/BOLD public records, at the time of the survey BOLD held another 1.2 million barcode records that lack species names. Probably most represent what Rod Page called “dark taxa”: difficult-to-identify specimens from undescribed species. How much effort to devote to barcoding specimens that can’t be, or haven’t yet been, identified to species is an unsolved puzzle. On the one hand, this approach speeds species discovery, as documented in the blog post cited above; on the other hand, many specimens will wait a very long time for the right taxonomist to come along, and in the meantime the sequences alone may not be very useful to science. I should point out that for many dark barcodes, the sequences are public and are labeled with a higher-level taxon (e.g. Vertebrata) and a BIN (see below), and the records include specimen photographs.
One possible solution is assigning “names” based on the barcode sequences themselves, such as the Barcode Index Number (BIN) system instituted in BOLD. This sidesteps the wait for an expert to assign a traditional Latin binomial, but it does not link the sequence to other biological information about the organism the way a species name does. Researchers recently estimated there are about 8.7 million eukaryotic species, of which about 2 million are named (Mora et al PLoS Biol 2011). Given the very large array of undescribed (mostly small) life, how should barcoders proceed? The Human Genome Project seized on what was then a radical and technologically difficult idea: sequencing the whole genome rather than just the expressed genes. Does an analogous approach, sequencing the whole eukaryotic biome of 8.7 million predicted species, make sense? Suppose we had sequences for all these forms: what new knowledge or capabilities would we gain? I favor a stepwise approach focused on barcoding organisms that are already named, particularly those already in collections and those important to society. There will be plenty of species discovery along the way.
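To make the idea of sequence-based “names” concrete, here is a minimal sketch that assigns provisional cluster labels from the sequences alone. It is not BOLD’s BIN algorithm, which uses a more elaborate refined single-linkage method (RESL). This sketch assumes aligned, equal-length sequences, uncorrected p-distance, plain single linkage, and an illustrative 2% threshold; the function names and the labels C1, C2, and so on are hypothetical.

```python
# Simplified stand-in for sequence-based provisional names (BIN-like clusters).
# NOT BOLD's BIN algorithm: single-linkage clustering on uncorrected p-distance
# at an assumed 2% threshold, for aligned, equal-length sequences.
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, skipping gaps and Ns; 1.0 if nothing is comparable."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 1.0


def provisional_clusters(seqs: Dict[str, str], threshold: float = 0.02) -> Dict[str, str]:
    """Give each specimen a hypothetical cluster label ("C1", "C2", ...) by single linkage."""
    ids = list(seqs)
    parent = {i: i for i in ids}  # union-find forest

    def find(i: str) -> str:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Join any pair within the threshold; chains of close pairs merge into one cluster.
    for n, i in enumerate(ids):
        for j in ids[n + 1:]:
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)

    labels: Dict[str, str] = {}
    result: Dict[str, str] = {}
    for i in ids:
        root = find(i)
        labels.setdefault(root, f"C{len(labels) + 1}")  # number clusters in input order
        result[i] = labels[root]
    return result
```

Single linkage chains sequences together, so two specimens more than 2% apart can still share a label through intermediates. That weakness is one reason production systems refine clusters further rather than stopping at a single cutoff.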
Other dark barcodes are simply records to which researchers have assigned a species name but have not posted it publicly. The importance of making sequence data public quickly was recognized at the 4th International Barcode of Life Conference, held in Adelaide last year (for one example of rapid publication of DNA barcode data, see Schindel 2011 ZooKeys). Open access, data sharing, and transparency have been embraced by many scientific fields and their funders, and I hope barcoders already have adopted, or are moving to adopt, these principles.
Q: How many species have been barcoded?
A: A lot.
GenBank holds barcode sequences for about 100,000 species, mostly insects, vertebrates, and plants, and about 40,000 of these qualify for the BARCODE keyword. Nearly all BARCODE records so far are from animals, mostly Lepidoptera and vertebrates.
Q: What groups important to science or society have few barcodes?
A: Quite a few.
These gaps suggest opportunities for scientific progress and grant support. They include human and animal disease vectors, agricultural pests, threatened and endangered species, and notable marine groups.
PowerPoint slides are available here.
Marine Census lecture
The Carnegie Capital Science Evening on 18 October 2012 was devoted to the Census of Marine Life. Jesse Ausubel’s lecture “Every Fish in the Sea: Findings of the 1st Census of Marine Life” is posted on YouTube.
How accurate are BARCODE databases?
DNA barcode databases are a kind of Wikipedia of DNA identifiers, with contributions from thousands of researchers. How accurate are they? How do records that meet the BARCODE standard compare with routine GenBank records? How many BARCODE records represent pseudogenes masquerading as their functional counterparts?
In case you missed it, Kevin Kerr and I recently analyzed sequencing error among 11,000 avian BARCODEs representing 2,700 bird species (PLoS ONE e43992 2012), using a frequency matrix approach to look at patterns of variation. As illustrated below, we found that very low frequency nucleotide variants (VLFs) present in single individuals of a species (labeled “singletons” in the figure) are strongly concentrated at the ends of the barcode segment, consistent with sequencing error.
In contrast, very low frequency variants found in two or more individuals of a species (labeled “shared” in the figure) provided a nice control: these were relatively evenly distributed, consistent with biological origin. Not surprisingly, given that most of the very rare nucleotide variants were associated with amino acid substitutions, very rare amino acid variants showed the same distribution patterns.
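The frequency matrix idea can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the published code: sequences are pre-aligned and equal length, a VLF is a nucleotide whose frequency at a position across the whole dataset falls below a cutoff (0.1% by default), and each VLF is recorded once per species as a singleton (one individual carries it) or shared (two or more do). The function names and the 20-site “end” margin are hypothetical choices.

```python
# Sketch of a position-by-nucleotide frequency matrix and VLF classification.
# Assumptions (ours): aligned, equal-length sequences; VLF = dataset-wide
# frequency below vlf_cutoff; singleton = 1 carrier in a species, shared = 2+.
from collections import Counter
from typing import Dict, List, Tuple


def frequency_matrix(seqs: List[str]) -> List[Counter]:
    """One Counter of nucleotide calls per alignment column."""
    return [Counter(column) for column in zip(*seqs)]


def classify_vlfs(by_species: Dict[str, List[str]],
                  vlf_cutoff: float = 0.001) -> Tuple[List[int], List[int]]:
    """Positions of singleton and shared VLFs (one entry per species/site/base)."""
    all_seqs = [s for group in by_species.values() for s in group]
    matrix = frequency_matrix(all_seqs)
    total = len(all_seqs)
    singletons: List[int] = []
    shared: List[int] = []
    for group in by_species.values():
        for pos, counts in enumerate(matrix):
            within = Counter(s[pos] for s in group)  # carriers inside this species
            for base, n in within.items():
                if base in "ACGT" and counts[base] / total < vlf_cutoff:
                    (singletons if n == 1 else shared).append(pos)
    return singletons, shared


def fraction_near_ends(positions: List[int], length: int, margin: int = 20) -> float:
    """Share of positions lying within `margin` sites of either end of the segment."""
    if not positions:
        return 0.0
    near = sum(1 for p in positions if p < margin or p >= length - margin)
    return near / len(positions)
```

Comparing fraction_near_ends for the singleton and shared lists gives a rough numeric version of the pattern in the figure. Note that with a 0.1% cutoff a variant can only register as a VLF in datasets of more than 1,000 sequences, so the default suits collections on the scale of the 11,000 avian BARCODEs.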
In addition to analyzing sequencing error, we closely examined the small fraction (0.1%) of BARCODEs with multiple very low frequency variants shared among individuals of a species. Based on review of the trace files deposited as part of the BARCODE standard, these unusually divergent versions of COI turned out to be overlooked cryptic pseudogenes lacking stop codons!
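The screen for candidate cryptic pseudogenes can be sketched the same way. The assumptions here are ours rather than the paper’s: a record is flagged when it carries at least two VLFs that another individual of the same species also carries, and the stop-codon check uses the vertebrate mitochondrial code (NCBI translation table 2), whose stops are TAA, TAG, AGA, and AGG. The stop-codon helper shows why these pseudogenes were easy to overlook, since a translation check finds nothing to flag. Function names are hypothetical, and flagged records still need the trace-file review described above.

```python
# Sketch: prioritize records that may be cryptic pseudogenes.
# Assumptions (ours): flag a record carrying >= min_shared VLFs that another
# individual of the same species also carries; VLF = dataset-wide frequency
# below vlf_cutoff. Flagged records still need manual trace-file review.
from collections import Counter
from typing import Dict, List

# Stop codons in the vertebrate mitochondrial code (NCBI translation table 2).
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def has_stop_codon(seq: str, frame: int = 0) -> bool:
    """True if any complete in-frame codon is a vertebrate mitochondrial stop."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(c in VERT_MITO_STOPS for c in codons)


def flag_candidates(by_species: Dict[str, Dict[str, str]],
                    vlf_cutoff: float = 0.001, min_shared: int = 2) -> List[str]:
    """Record IDs with at least min_shared shared VLFs, in input order."""
    all_seqs = [s for recs in by_species.values() for s in recs.values()]
    total = len(all_seqs)
    columns = [Counter(col) for col in zip(*all_seqs)]
    flagged: List[str] = []
    for recs in by_species.values():
        for rec_id, seq in recs.items():
            shared_vlfs = 0
            for pos, base in enumerate(seq):
                if base not in "ACGT" or columns[pos][base] / total >= vlf_cutoff:
                    continue  # not a very low frequency variant
                carriers = sum(1 for other in recs.values() if other[pos] == base)
                if carriers >= 2:  # count includes this record itself
                    shared_vlfs += 1
            if shared_vlfs >= min_shared:
                flagged.append(rec_id)
    return flagged
```

A flagged record for which has_stop_codon returns False in the correct reading frame is exactly the troublesome case: divergent, yet translating cleanly like a functional COI.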
We were able to calculate an error rate for the dataset using the observation that most (94%) second codon positions were >99.9% conserved, which meant that nearly all sequencing errors at second-position sites would be detectable as very low frequency (<0.1%) variants. The calculated upper limit of sequencing error was 8 × 10⁻⁵ errors/nucleotide, which is 1-2 orders of magnitude higher than generally cited for direct Sanger sequencing of amplified DNA, but unlikely to compromise species identification. Overall, we found that about 3% of BARCODEs have one or more errors (average 1.4). To our knowledge, this is the first assessment of sequencing error for a large public sequence database with multiple contributors. It might be useful to annotate records that have probable sequencing errors or that represent cryptic pseudogenes; I believe such annotation is possible in BOLD but not in GenBank.
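The error-rate logic can be written as a short calculation. In this minimal sketch, which is our reading of the approach rather than the published code, we keep only second codon positions whose most common base exceeds 99.9% of calls, treat every minority base at those sites as a possible sequencing error, and divide by the number of bases called there. Because some minority bases may be genuine, the result is an upper bound. The reading-frame offset is an assumed parameter, and the function name is hypothetical.

```python
# Sketch: upper bound on sequencing error from highly conserved 2nd codon positions.
# Assumptions (ours): aligned sequences; `frame` = index of the first base of the
# first full codon; every minority base at a conserved site counts as an error.
from collections import Counter
from typing import List


def error_rate_upper_bound(seqs: List[str], frame: int = 0,
                           conserved: float = 0.999) -> float:
    """Minority-base calls per called nucleotide at conserved second positions."""
    minority = called = 0
    for pos, column in enumerate(zip(*seqs)):
        if (pos - frame) % 3 != 1:  # keep only second codon positions
            continue
        counts = Counter(b for b in column if b in "ACGT")
        n = sum(counts.values())
        if n == 0:
            continue
        top = counts.most_common(1)[0][1]
        if top / n <= conserved:  # site not conserved enough to use
            continue
        minority += n - top
        called += n
    return minority / called if called else 0.0
```

As a rough cross-check using only the figures in the post, and assuming a 648-bp barcode: 3% of records × 1.4 errors ≈ 0.042 errors per record, and 0.042 / 648 ≈ 6.5 × 10⁻⁵ errors/nucleotide, the same order as, and below, the stated upper limit.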
In addition to confirming the high quality of the avian BARCODE database, we were able to demonstrate a significant quality improvement in avian BARCODE and non-BARCODE COI records deposited in GenBank over the past decade, as shown at right (bars indicate 95% confidence intervals).
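The post does not say how the 95% confidence intervals in the figure were computed. One common choice for a proportion, such as the fraction of records with at least one error in a given deposit period, is the Wilson score interval, sketched below as an assumption rather than as the paper’s method. The function name is hypothetical.

```python
# Sketch: Wilson score interval for a binomial proportion (assumed method).
import math
from typing import Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% (z = 1.96) interval for the proportion k/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

Calling wilson_interval(k, n) with each period’s count of records containing errors (k) out of records examined (n) yields bars of the kind shown in the figure. Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly when error counts are small.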
The frequency matrix we describe has potential application for genetic database quality assessment, discovery of cryptic pseudogenes, and studies of low-level variation.
Our results were presented at the Third European Congress for the Barcode of Life (ECBOL3), held at the Royal Flemish Academy of Belgium for Sciences and the Arts (KVAB) in Brussels in September (group photo below).
A PowerPoint based on the PLoS ONE article is available here: freq matrix stoeckle 8nov2012
Ocean Champion
On 26 October 2012, Monmouth University (New Jersey) named Jesse Ausubel the 2012 National Champion of the Ocean. We are honored to join illustrious company and appreciate that the award recognizes the work of the entire Census of Marine Life community. The award ceremony included an excellent seminar organized by Tony MacDonald of the Urban Coastal Institute featuring Admiral (Ret.) Paul Gaffney; Vice Admiral Richard Larrabee, USCG (Ret.), Port Commerce Director, Port Authority of NY/NJ; Lawrence Dickerson, President and CEO, Diamond Offshore Drilling; and Christopher Koch, President and CEO, World Shipping Council. We have posted Jesse’s talk on Wealth from Oceans.
National Ocean Champion awardees
2012 Jesse H. Ausubel, Co-Founder, Census of Marine Life; Alfred P. Sloan Foundation and The Rockefeller University
2011 Jean-Michel Cousteau, Founder, Ocean Futures Society
2010 Carl Safina, President and Co-Founder, Blue Ocean Institute
2009 Lillian C. Borrone, Former Executive Director, Port Authority of NY/NJ
2008 Representative James Saxton (NJ) and Shirley Pomponi, Executive Director, Harbor Branch Oceanographic Institute
2007 Jerry Schubel, President and CEO of the Aquarium of the Pacific, and Ted Ames, Director of the Lobster Hatchery in Stonington, Maine
2006 Robert Gagosian, past President and Director of the Woods Hole Oceanographic Institution
2005 Admiral James Watkins, Chair of the U.S. Commission on Ocean Policy and the Honorable Leon Panetta, Chair of the Pew Ocean Commission
SSC description
Michael Ojovan, now at Imperial College (London), and Russian colleagues have published an excellent compact description of self-sinking capsules for disposal of hazardous waste and probing the deep Earth. See “The self-disposal option: Self-descending tungsten capsules could be used to dispose of heat-generating high-level waste tens of kilometers below the Earth’s surface.”
Switch article in MV Gazette
At the new Martha’s Vineyard Film Center, Jesse hosted a well-attended viewing of the energy documentary Switch (https://www.switchenergyproject.com), on which he advised the film team of Scott Tinker and Harry Lynch. The Vineyard Gazette ran a helpful article about the film, “Shades of gray in energy production.”
Smithsonian exhibit about Census of Marine Life
The National Museum of Natural History of the Smithsonian Institution in Washington DC has added a small but excellent exhibit about the Census of Marine Life to the Sant Hall of Ocean Life. A couple of dozen CoML alumni participated in a ceremony to welcome the exhibit, which includes the splendid gold medal of the International Cosmos Prize awarded in 2011 to the Census Steering Committee.

European Barcode Conference
With about 130 other experts from 28 countries, Mark Stoeckle and Jesse Ausubel attended the 3rd ECBOL conference, themed “Barcoding of Organisms of Policy Concern,” at the Royal Flemish Academy of Belgium for Sciences and the Arts in Brussels. The meeting was organized by the European Consortium for the Barcode of Life (ECBOL), the Royal Belgian Institute of Natural Sciences, and the Royal Museum for Central Africa. Mark presented his work on very low frequency nucleotide variants. Among many excellent presentations were reports on the flora of Wales, on orchids, and on forensic entomology. Thanks to Marc de Meyer, Thierry Backeljau, and Pedro Crous for organizing the meeting.
Forest work reported in The New Republic magazine
Science editor Judith Shulevitz wrote a very good piece covering a lot of our work about forests, farms, and land use titled “Defusing the War of Words Over Organic Food” posted 12 September 2012 in The New Republic magazine.