DNA Barcoding – The Rockefeller University

Identifying forensic flies with DNA

February 9, 2010

In forensic investigations, insect evidence helps establish the time of death, because the various species that colonize corpses pass through developmental stages at rates that depend on time and temperature. Estimating the post-mortem interval (PMI) therefore rests on accurate species identification, including of immature forms. In the December 2009 Int J Legal Med, researchers from the University of Wollongong, Australia, test DNA-based identification of Sarcophagidae (flesh flies). Immature sarcophagids lack distinguishing features, and identifying adults requires “meticulous examination of subtle morphological differences, including regional hair presence and colour, body pigmentation and bristle length, placement and abundance”, and even then genitalic dissection may be needed for confirmation. As a result, sarcophagid flies are little used in forensic work, although, being viviparous, they are “prospectively more reliable for PMI estimations compared with other initial dipteran colonisers” [the latter are mostly egg-laying species (e.g. calliphorid blowflies), whose eggs hatch only if certain environmental conditions are met, adding uncertainty to PMI determinations].

The researchers successfully recovered COI barcodes, with no evidence of pseudogenes, from 85 adult specimens representing 16 species, using a single primer pair with degenerate bases previously applied to forensic blowflies (Nelson et al 2007 Med Vet Entomol). In neighbor-joining (NJ) analysis, 14 of the 16 species formed single clusters distinct from other species; the remaining 2 species showed deep divergences, which the authors surmise may indicate cryptic species, a possibility made more likely given that “taxonomic descriptions of the Australian Sarcophagidae have not been updated since the 1950s”.
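The post does not say which distance model underlies the NJ analysis; barcoding studies commonly build NJ trees from Kimura 2-parameter (K2P) distances, so the sketch below assumes K2P. It computes the distance for one pair of aligned sequences, counting transitions (A↔G, C↔T) separately from transversions. `k2p_distance` is a hypothetical helper name, not code from the paper.

```python
import math

def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter (K2P) distance between two aligned DNA sequences.

    Sites with a gap or ambiguity code in either sequence are skipped.
    """
    purines, pyrimidines = {"A", "G"}, {"C", "T"}
    valid = purines | pyrimidines
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= purines or {a, b} <= pyrimidines:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / compared, transversions / compared
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    # d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

A full analysis would fill a pairwise matrix with such distances and pass it to a neighbor-joining implementation, for example the one in Biopython's `Bio.Phylo.TreeConstruction` module.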

Meiklejohn and colleagues demonstrate the efficacy of COI barcodes as species-level identifiers for Australian sarcophagids. The tight intraspecific clustering in these flies appears identical to that seen in diverse animal groups, including vertebrates, yet flies are presumably several orders of magnitude more abundant. (As an aside, although the authors report that their sequences and associated specimen data are deposited in BOLD, the data are not visible in “Public Projects”; I hope the authors will amend this.) What, then, limits mitochondrial variation within species? Or, in the language of population genetics, why are effective population sizes for animal species uniformly small and unrelated to census population sizes? Like the nature of dark matter, the explanation(s) await discovery.

Addendum 11 Feb 2010: Dr. Meiklejohn reports that the sequences and associated data are scheduled to appear in BOLD and NCBI GenBank as soon as the article appears in the print edition.
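The question about effective population size rests on a standard neutral-theory expectation: for maternally inherited, haploid mitochondrial DNA at mutation-drift equilibrium, mean pairwise diversity per site is θ = 2N_eμ, where N_e is the female effective population size and μ the mutation rate per site per generation. The sketch below simply inverts that relation. The inputs are hypothetical, chosen only to show the order of magnitude: a within-species divergence of 0.5% with μ = 10⁻⁸ implies N_e of about 250,000, presumably far below the census sizes of most fly species.

```python
def implied_female_ne(pi: float, mu: float) -> float:
    """Female effective population size implied by mitochondrial diversity.

    Assumes neutral mutation-drift equilibrium for haploid, maternally
    inherited mtDNA, where expected pairwise diversity theta = 2 * Ne * mu.
    pi: mean pairwise differences per site within a species
    mu: mutation rate per site per generation
    """
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return pi / (2 * mu)

# Hypothetical inputs for illustration: 0.5% within-species divergence,
# mu = 1e-8 per site per generation.
print(f"{implied_female_ne(0.005, 1e-8):,.0f}")  # 250,000
```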

Trans-Atlantic DNA survey reveals overlooked avian diversity in scientific heartland

January 31, 2010

In the January 2010 J Ornithol (open access article), researchers from Norway's Natural History Museum, the Swedish Museum of Natural History, the University of Guelph, and Rockefeller University (myself) survey mitochondrial differences in 296 species representing 97-98% of Scandinavian breeding birds. Of these, 283 (95.6%) formed unique clusters; the remaining 13 species formed 5 clusters of 2-4 species with shared or overlapping barcodes, which might reflect young species, hybridization with introgression, and/or a single gene pool. Surprisingly for such a relatively small geographic area, large sequence differences were found within 4 species, all of which have large breeding ranges extending outside Scandinavia; the authors propose these represent “a mixture of separate lineages that evolved in allopatry” and advise further sampling to “elucidate the phylogeographic history”.
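One way to make “unique clusters” versus “shared or overlapping barcodes” concrete is a barcode-gap check: a species is cleanly separated when its largest within-species distance is smaller than its smallest distance to any other species. The sketch below assumes pairwise distances have already been computed and stored by specimen pair; the function name and data layout are illustrative, and the authors' own criterion (cluster membership in a tree) may differ.

```python
from itertools import combinations

def barcode_gap_report(distances, species_of):
    """For each species, compare max intraspecific to min interspecific distance.

    distances: dict mapping frozenset({id1, id2}) -> pairwise distance
    species_of: dict mapping specimen id -> species name
    Returns {species: (max_intra, min_inter, has_gap)}.
    Singleton species get max_intra = 0.0.
    """
    ids = list(species_of)
    max_intra = {sp: 0.0 for sp in set(species_of.values())}
    min_inter = {sp: float("inf") for sp in max_intra}
    for a, b in combinations(ids, 2):
        d = distances[frozenset((a, b))]
        sa, sb = species_of[a], species_of[b]
        if sa == sb:
            max_intra[sa] = max(max_intra[sa], d)
        else:
            min_inter[sa] = min(min_inter[sa], d)
            min_inter[sb] = min(min_inter[sb], d)
    # has_gap is False when barcodes are shared or overlapping with another species
    return {sp: (max_intra[sp], min_inter[sp], max_intra[sp] < min_inter[sp])
            for sp in max_intra}
```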

Johnsen et al take advantage of the existing barcode library to compare species whose breeding ranges extend across the Atlantic. Of the 78 such species, 19 (about one quarter) showed intercontinental divergences typical of species-level differences, including 8 species that had not been identified in prior work (data re-compiled in figure below). Most were inland species with discontinuous breeding ranges, but there were unexpected exceptions such as Steller’s eider (Polysticta stelleri), which appears to have a continuous circumpolar breeding range. Three of the species formed paraphyletic clusters when combined with North American congeners, suggesting that the intercontinental “conspecifics” are not even each other’s closest relatives.

In my view, this paper demonstrates that a survey approach yields a high rate of discovery and hypothesis generation, and it leads me to question how well we understand diversity in birds, which are generally considered the taxonomically best-known large group of animals. Many of the species in the present study have been known to science for over 250 years and are resident in densely settled, scientifically advanced regions, yet Johnsen and colleagues reveal hidden diversity. In 1946, Ernst Mayr compiled a world list of 8,616 species, which he judged to be “within 5 percent and certainly 10% of the final total”. The current IOC World Bird List v 2.3 recognizes 10,322 species (nearly 20% more than Mayr’s count), and there is a steady stream of splits of existing forms, fueled by DNA sequence data. I believe DNA barcoding offers a way to complete this process in a timely manner. If we analyzed multiple individuals from each of the world’s named species, there would still be many areas of uncertainty, but at least the larger differences would be known. It is a scientific embarrassment that we are still discovering lineages that have been reproductively isolated for millions of years, in everyday birds no less!
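The comparison with Mayr's estimate is simple arithmetic; the short calculation below checks it against his own bounds, using only the two species totals quoted above. It shows that 10,322 exceeds even Mayr's outer 10% bound of about 9,478 species.

```python
# Species totals quoted in the text.
mayr_1946 = 8616
ioc_v2_3 = 10322

upper_5 = mayr_1946 * 1.05    # Mayr's "within 5 percent" bound
upper_10 = mayr_1946 * 1.10   # his outer "certainly 10%" bound
excess = (ioc_v2_3 - mayr_1946) / mayr_1946  # fractional increase over Mayr's list

print(f"5% bound: {upper_5:,.0f}; 10% bound: {upper_10:,.0f}; IOC excess: {excess:.1%}")
# 5% bound: 9,047; 10% bound: 9,478; IOC excess: 19.8%
```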

There are over 300,000 avian tissue samples in the world’s museums, representing over 7,000 species (Stoeckle and Winker, Auk 2009). By my calculation, only a modest fraction of these have been analyzed to date for species-level differences. For instance, by my count GenBank contains 13,361 cytochrome b sequences representing 4,320 avian species, and the All Birds Barcoding Initiative (ABBI) has so far collected 17,250 sequences representing 2,969 species. A concerted project across the world’s avian tissue collections, using a DNA barcoding approach, would offer an unmatched opportunity for large-scale, species-level genetics, with many discoveries and hypothesis-generating findings that will inform various areas of evolutionary science. For instance, population genetics modeling starts with correctly identifying breeding populations (i.e. species). These samples may eventually be analyzed in small batches, assuming they are not lost or destroyed, but the pace of standard research practice brings to mind the story of the Dead Sea Scrolls. Some were published soon after their discovery in 1946, but the rest fell under the control of a committee of scholars and remained hidden not only from the public but also from other scholars for more than 40 years. When the monopoly was broken in 1991 (by researchers using a desktop computer to reconstruct texts from published concordances), some complained:

“Dr. Frank M. Cross, a scholar at the Harvard Divinity School who has worked with the scrolls since the 1950’s, said in a telephone interview that the publication of these unauthorized versions, which he described as “pirated,” would have no effect on the pace and publication schedules involving the actual scrolls. He defended his colleagues from the frequent charges of undue secrecy and procrastination, saying the critics did not understand the difficulties of working with the remaining unpublished documents that are mostly a collection of fragile fragments of parchment.” New York Times September 5, 1991

Names

January 22, 2010

In Systema Naturae 250 (Andrew Polaszek, ed; CRC Press), a new collection of essays on the state of taxonomy, David Schindel and Scott Miller address how to speed up the “naming” of specimens without causing chaos, in a chapter entitled “Provisional nomenclature: The on-ramp to taxonomic names.” The authors observe the increasing numbers of undescribed and undescribable specimens (e.g. fragments, mixed environmental samples) and propose standardizing provisional names (their preferred designation for these standardized temporary placeholders is “taxon label”). As they note, GenBank already contains many provisional names (e.g. Ocyptamus sp. MZH S143_2004), so the proposal does not change usual practice except by standardizing the format. As a starting point, Schindel and Miller propose a scheme developed by the Council of Heads of Australasian Herbaria (CHAH) and recommend review by Biodiversity Information Standards (TDWG). The CHAH format is:

Genus_name sp. Locality (Voucher identifier) Source, where “(Voucher-specimen identifier) is a two-part field consisting of a collector’s name and the voucher specimen number attached to the exemplar of the taxon concept,” and “Source refers to the name of the concept’s proposer.” If sequence data serve as identifiers, such labels could be generated by a clustering algorithm for DNA barcodes, for example. Schindel and Miller discuss short- and long-term advantages for taxonomic workflow, academic credit, and scientific sharing.
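To show what a standardized provisional name looks like in practice, the sketch below assembles a label from the CHAH fields quoted above. The class name and all field values except the genus (borrowed from the GenBank example) are hypothetical placeholders; real values would come from the voucher record and the concept's proposer.

```python
from dataclasses import dataclass

@dataclass
class ProvisionalName:
    """Fields of a CHAH-style provisional name (hypothetical class)."""
    genus: str
    locality: str
    collector: str       # first part of the voucher identifier
    voucher_number: str  # second part of the voucher identifier
    source: str          # proposer of the taxon concept

    def label(self) -> str:
        # Format: Genus_name sp. Locality (Voucher identifier) Source
        return (f"{self.genus} sp. {self.locality} "
                f"({self.collector} {self.voucher_number}) {self.source}")

# Hypothetical placeholder values, except the genus taken from the GenBank example.
name = ProvisionalName("Ocyptamus", "Hilltop", "A.Collector", "1234", "Example Museum")
print(name.label())  # Ocyptamus sp. Hilltop (A.Collector 1234) Example Museum
```

Keeping the fields separate, rather than storing only the assembled string, would let a database validate each part before the label is generated.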

A standardized format for provisional names is a simple, powerful proposal with many downstream benefits. I hope TDWG will adopt it!

Deciphering tropical vines with DNA

January 16, 2010

In the December 2009 Am J Botany, researchers from the University of Texas apply DNA to sort out species limits of tropical vines in the genus Psiguria, part of the cucurbit family (Cucurbitaceae), which includes melons, gourds, cucumbers, pumpkins, and squash. Psiguria species tax morphological identification because of sometimes drastic changes in leaf and flower structure within and over the lifetime of individuals, and the frequent absence of male and/or female flowers (approximately 15% of herbarium sheets contain no flowering parts). The International Plant Names Index (IPNI) lists 17 species in the genus, although some names are applied to type specimens only. As an aside, IPNI is a model for a dynamic taxonomic names database, containing “names and associated basic bibliographical details of seed plants, ferns and fern allies.” From the website: “[IPNI’s]… goal is to eliminate the need for repeated reference to primary sources for basic bibliographic information about plant names. The data are freely available and are gradually being standardized and checked. IPNI will be a dynamic resource, depending on direct contributions by all members of the botanical community. IPNI is the product of a collaboration between The Royal Botanic Gardens, Kew, The Harvard University Herbaria, and the Australian National Herbarium.”

Steele and colleagues analyzed 70 Psiguria specimens representing 6 named species, and 14 from closely related genera, obtained from 9 herbaria in the US, Bolivia, and Germany. The target regions were 8 chloroplast intergenic spacers plus nuclear ITS1, ITS2, and a serine/threonine phosphatase intron that prior work suggested might be helpful, for a total of about 7.7 kb of chloroplast and 2 kb of nuclear DNA. The authors surmise that the standard barcodes for land plants, namely the coding regions of chloroplast genes rbcL and matK, are unlikely to be effective in discriminating Psiguria species. This may well be true, but I hope they will determine rbcL and matK sequences for their well-documented specimens, as this is essential first-pass information for a standardized identification system. Most non-specialists testing an unknown root or leaf will not know whether it is a Psiguria species or even a member of the cucurbit family.

On the basis of combined morphological and molecular data, the researchers conclude that the six Psiguria species are valid, with the caveat that “the molecular results may suggest more than six species” and so “future collections of Psiguria and additional sequencing of molecular markers may contribute to the discovery of additional species.” Finally, Steele and colleagues characterize what they call DNA barcodes, in this case diagnostic nucleotides that uniquely identify the 6 species (1-5 diagnostic nucleotides per species, distributed across 5 chloroplast intergenic spacers). Identifying species-level diagnostic nucleotide characters, together with the relevant primer and amplification protocols, as done here, is a welcome addition to the more usual phylogenetic analysis. However, as mentioned above, for this information to fit into a standardized approach, sequences for the defined markers rbcL and matK are also needed, because those are what will be tested first, except in the minority of situations where the operator already knows the unknown is one of the six Psiguria species.
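The idea of diagnostic nucleotides can be sketched as a scan of an alignment for “pure” single-site characters: positions where every sequence of a species shares one nucleotide that no other species has. This is an illustrative simplification, not necessarily the procedure Steele and colleagues used; compound characters (combinations of sites) are left out, and sites polymorphic within a species are simply skipped.

```python
def diagnostic_sites(alignment):
    """Find single-site diagnostic nucleotides for each species.

    alignment: dict mapping species name -> list of aligned sequences
    (all of equal length). A site is diagnostic for a species when all of
    its sequences share one nucleotide that no sequence of another species has.
    Returns {species: [(position, nucleotide), ...]} with 0-based positions.
    """
    seqs = [s for group in alignment.values() for s in group]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    result = {sp: [] for sp in alignment}
    for pos in range(length):
        for sp, group in alignment.items():
            states = {s[pos] for s in group}
            if len(states) != 1:
                continue  # polymorphic within this species
            (base,) = states
            if base not in "ACGT":
                continue  # skip gaps and ambiguity codes
            others = {s[pos] for other, g in alignment.items() if other != sp for s in g}
            if base not in others:
                result[sp].append((pos, base))
    return result
```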

High school students explore urban environment with DNA

January 4, 2010

What sorts of DNA can be found in an urban environment? Last year I helped supervise Trinity High School students Brenda Tan and Matt Cost in an investigation of New York City apartments, sidewalks, and supermarkets using DNA barcoding. Brenda and Matt spent 4 months collecting and documenting everyday items that might contain DNA, and delivered specimens to the Center for Conservation Genetics, American Museum of Natural History, for testing; 151 (70%) of 217 items yielded DNA barcodes, including a feather duster (ostrich), a hot dog from a street vendor (cow), a dog biscuit (American bison), and a fly in a shipment of grapefruit from Texas (Oriental latrine fly Chrysomya megacephala, an invasive species in the southern U.S.). Among other surprising results, the student investigators found 95 different animal species, 16% of human and pet food items mislabeled, and a genetically distinct mystery cockroach that might be a new subspecies or species. I encourage you to peruse the Rockefeller University DNAHouse site, which includes their narrative and Q+A reports, spreadsheets detailing specimens and results, and high-resolution images, including of the cockroach!


Following the example of the 2008 student-led “Sushigate,” Brenda and Matt’s DNAHouse study is capturing wide public interest, including stories in the New York Times, New York Post, NPR, and NBC TV, and on over 230 media sites in 9 languages and 30 countries so far. If high school students can make original discoveries with important regulatory and scientific implications using DNA barcoding, then wide application to food products, products from protected and regulated species, detection of invasive species, and biodiversity surveys, including by the interested public, is not far off. The most important application for the general public is food, and I expect to see growing attention from regulatory agencies, distributors, retailers, and consumers to identifying mislabeled food products using DNA barcodes.

1/1000 of animal diversity mapped

December 24, 2009

In the 16 December 2009 Biol Lett, researchers from the University of Guelph, the University of British Columbia, and Agriculture and Agri-Food Canada report on COI barcodes for 11,289 individuals representing 1,327 species of Lepidoptera (moths and butterflies) collected in eastern North America. This large collection revealed the same patterns of highly restricted intraspecific variation, uncommon barcode sharing, and overlooked diversity seen in numerous smaller studies: average variation within species was 0.43%, while average variation among congeneric species was 7.7%, 18-fold higher. Only nine cases (0.7%) of barcode sharing between species were observed, while large divergences (>2%) suggesting overlooked taxa were found in 67 (5.1%) of species (in some of these, morphological and ecological differences supporting species status were observed). The survey included multiple individuals per species collected at sites 500 to 2800 km apart, with “no significant increase in genetic distances with geographical separation.” Hebert, deWaard, and Landry conclude “an effective identification system can be constructed for the Lepidoptera fauna of eastern North American without extensive geographical surveys of each species,” and that, given likely similar patterns in most terrestrial and marine fauna, “a comprehensive barcode library for animal life can be assembled rapidly,” with diverse benefits to society and science.
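The headline comparison (0.43% within species versus 7.7% among congeners; 7.7 / 0.43 ≈ 17.9, reported as 18-fold) can be reproduced from a distance table in a few lines. The sketch below pools all specimen pairs, which is an assumption: published summaries may instead average per species first, and that choice can shift the means.

```python
from itertools import combinations
from statistics import mean

def intra_vs_congeneric(distances, taxon_of):
    """Mean within-species and mean between-congener pairwise distances.

    distances: dict frozenset({id1, id2}) -> distance (as a proportion)
    taxon_of: dict specimen id -> (genus, species)
    Returns (mean_intra, mean_congeneric, fold_difference).
    Pairs from different genera are ignored.
    """
    intra, congeneric = [], []
    for a, b in combinations(taxon_of, 2):
        (ga, sa), (gb, sb) = taxon_of[a], taxon_of[b]
        d = distances[frozenset((a, b))]
        if (ga, sa) == (gb, sb):
            intra.append(d)
        elif ga == gb:
            congeneric.append(d)
    mean_intra, mean_cong = mean(intra), mean(congeneric)
    return mean_intra, mean_cong, mean_cong / mean_intra
```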

In my view, this study should lay to rest the early and persistent worries of some taxonomists that single-gene DNA barcoding would distinguish species only in limited situations. For example, in a 2004 PLoS Biol commentary, Moritz and Cicero cautioned “But to determine when and where [DNA barcoding]…is applicable, we now need to discover the boundary conditions.” The 2009 answer is that there are no major restrictions to wide application of DNA barcoding in animals, taxonomically or geographically, and the one regularly encountered limitation is very young species, which represent a small fraction of recognized taxa even in intensively studied groups. At the same time, DNA barcoding speeds taxonomic assessment by flagging genetically distinct forms, many of which turn out to represent unrecognized species, including species that would likely otherwise remain hidden indefinitely. Together with prior work, this study refutes a widely cited (pre-barcoding) estimate that 23% of animal taxa have shared or overlapping mitochondrial DNA sequences (Funk and Omland Annu Rev Ecol Evol Syst 2003); this estimate presumably reflected biases in then-existing databases. In closing, I note that Hebert, deWaard, and Landry offer a new yardstick, namely the fraction of the animal kingdom mapped, lifting our eyes to the goal of a rapid identification system for all eukaryotic life.

Finding out what lies beneath

December 14, 2009

What lives in soil? In the August 2009 Pesq Agropec Bras (open access), an international team of 10 researchers from Canada, France, the US, Taiwan, and Russia examine prospects for speeding assessment of soil animal diversity with COI DNA barcoding. As test sets, Rougerie and colleagues explore taxonomic and sequence diversity in earthworms and collembolans (springtails). These two groups comprise similar numbers of named species (earthworms, 6000; springtails, 7900), but these totals likely underestimate true diversity, particularly for springtails, which are tiny (0.2–6.0 mm), making collection and morphological study challenging.

For earthworms, the researchers analyzed COI sequences from 457 specimens collected in 13 countries around the world (in North America, South America, the Caribbean, Europe, the Middle East, Southeast Asia, and Australia); these represented at least 49 genera in 8 families. About half of the specimens were identified by morphology, representing 87 species; the remainder, mostly from the Philippines and Brazil, could not be identified to species. Applying a threshold approach, the researchers found 192 (10% cutoff) and 211 (4% cutoff) genetic clusters, including two or more divergent clusters in 13 (15%) of the named species. None of the species showed sequence sharing or overlap.
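The post does not describe exactly how the 10% and 4% thresholds were applied; one common reading is single-linkage grouping, where specimens joined by any chain of pairwise distances below the cutoff fall into one cluster. The sketch below implements that assumption with a small union-find structure. Under single linkage, lowering the cutoff can only split clusters, which is consistent with the larger cluster counts reported at 4% than at 10%.

```python
def threshold_clusters(ids, distance, cutoff):
    """Group specimens into clusters by single linkage at a distance cutoff.

    Two specimens share a cluster if a chain of pairwise distances, each
    below `cutoff`, connects them. `distance(a, b)` returns a proportion
    (0.04 for 4%).
    """
    ids = list(ids)
    parent = {i: i for i in ids}

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for x in range(len(ids)):
        for y in range(x + 1, len(ids)):
            if distance(ids[x], ids[y]) < cutoff:
                parent[find(ids[x])] = find(ids[y])  # merge the two clusters
    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```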

For springtails, 695 specimens from a similar global distribution of sites were analyzed, representing 88 genera in 16 families; only 44 species were formally identified using morphological characters, consistent with the “difficult and poorly known taxonomy of these organisms.” Of note, the authors report that “a specific protocol was developed for [collembolans] so that voucher specimens could be recollected after DNA extraction and thus be used for further morphological examination.” Sequence comparisons showed a “typical…bimodal distribution of intra- versus interspecific divergences such as the one also reported in earthworms.” Applying distance thresholds as above gave 215 (10% cutoff) and 227 (4% cutoff) genetic clusters. In conclusion, efficient assessment of soil animal diversity calls for DNA barcoding.

On a separate note, soil animals may be useful for understanding mitochondrial evolution. Even factoring in species diversity, these animals appear to have enormous population sizes, given densities of up to 4 × 10³ earthworms/m² and 1.8 × 10⁶ collembolans/m³, yet they show the typical bimodal pattern of intra- << inter-specific variation noted above. Along these lines, in the November 2009 Nature, Nick Lane explores the whys of mitochondrial evolution and speciation, including a possible “radically new picture of mitochondrial genes being tightly regulated by selection.” Stay tuned!

DNA to help skates

December 2, 2009

In the current Aquatic Conserv Marine Freshwater Ecosys, researchers from the Muséum national d’Histoire naturelle, France, report on 80 years of taxonomic confusion that has contributed to the near extinction of a once-abundant north Atlantic skate. Iglésias and colleagues found that two forms, lumped together in 1926 as the European common skate (Dipturus batis, Linnaeus 1758), in fact represent distinct species with morphological, genetic (mitochondrial genome), and life-history differences.

As the researchers report, this taxonomic oversight obscured the disappearance of one species, the flapper skate (D. cf. intermedia), because it was confused with the less threatened blue skate (D. cf. flossada). Iglésias and colleagues’ marketplace survey revealed additional sources of confusion. They analyzed 4,110 skates landed over a 2-year period from 103 fishing cruises in four main French ports by 41 different French commercial trawlers, and found that five skate species (including the two named above) from two genera are variously lumped together under just two marketplace names, the aforementioned “European common skate (D. batis)” and “longnose skate (D. oxyrinchus)”; according to their analysis, the latter species, formerly common, is also locally extirpated, and most specimens sold under this name represent other species.

For the newly rediscovered blue and flapper skates, the researchers report 20 diagnostic substitutions in an approximately 2,600-nucleotide segment spanning the 12S and 16S rRNA genes. Other than the 10 mitochondrial sequences included in this report, I find only two other D. batis sequences in GenBank (and none as yet under either of the two resurrected names). It is remarkable that so little genetic information has been collected for such recently abundant, commercially important (annual landings in the thousands of tons), and now threatened species. To aid standardized application of molecular identification techniques, I hope the authors will also determine COI barcode sequences for their specimens. Then I look forward to schoolchildren aiding conservation and helping find new species by DNA barcoding specimens from their local fish markets!

Identifying the ocean’s racehorses with DNA

November 19, 2009

Bluefin tuna are enormous (up to 15 ft/4.5 m, 680 kg/1500 lbs), fast (up to 54 km/h, as fast as racehorses) creatures that roam across oceans and return to ancestral waters to spawn. High demand has fueled intensive fishing by international fleets, resulting in population declines of 90% that are pushing all three species toward extinction: Southern (Thunnus maccoyii), Northern (T. thynnus), and Pacific bluefin (T. orientalis). This week in PLoS ONE, researchers from the American Museum of Natural History describe DNA-based identification of bluefin and other tuna species using character analysis of COI barcode sequences. Lowenstein and colleagues’ report provides a basis for routine identification of marketplace items to inform consumers and enable enforcement of regulations, including a proposed listing as endangered under the Convention on International Trade in Endangered Species (CITES).

The eight species in the genus Thunnus are not discriminated by commonly used nuclear loci and differ by about 1% or less in mitochondrial coding regions (e.g., Ward et al 2005 Phil Trans R Soc B), which makes DNA-based identification challenging. To construct a diagnostic key, Lowenstein and colleagues analyzed 89 COI sequences in GenBank representing the eight tuna species and, by visual inspection, found 14 sites that provided 17 “compound characteristic attributes (CAs)” (terminology from Sarkar et al 2008 Mol Ecol Res). Turning marketplace detectives, the AMNH team collected 68 sushi samples from 31 establishments in New York and Denver over a 6-month period in 2008. Nearly one-third (22; 32%) of samples were sold as species contradicted by the molecular data, including items from over half (19; 61%) of the restaurants.
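A character-based key of this kind can be sketched as a lookup: each species is defined by required nucleotide states at particular alignment positions, and a query is assigned to every species whose states it satisfies. Any key passed to this hypothetical function is illustrative; the paper's 14 sites and 17 CAs are not reproduced here. An empty result flags an unknown, and more than one hit flags an ambiguous identification.

```python
def identify_by_characters(query, key):
    """Assign a query sequence using a character-based key.

    key: dict species -> dict {position: nucleotide}, the character states
    that together define the species (a compound attribute). Positions are
    0-based in the same alignment as the query.
    Returns the list of species whose required states all match.
    """
    return [sp for sp, states in key.items()
            if all(pos < len(query) and query[pos] == base
                   for pos, base in states.items())]
```

For example, with a hypothetical key `{"species_A": {2: "T", 7: "C"}, "species_B": {2: "C", 7: "C"}}`, the query `"ACTGGATCA"` is assigned to `["species_A"]` only.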

Lowenstein and colleagues found their character-based identifications were more accurate and precise than those provided by the BOLD ID engine (www.barcodinglife.org), largely because the ID engine uses a 2% cutoff for assigning specimens to species, which encompasses all eight Thunnus species. In addition, BOLD is a workbench for researchers and so contains many as yet unpublished sequences from ongoing studies; these need to be viewed as provisional data. Indeed, in constructing their key, Lowenstein and colleagues set aside 2 of the 89 GenBank tuna sequences because these grouped with other species. These anomalous sequences might reflect hybridization or introgression, which is reported to occur in 2-3% of Atlantic bluefin, for example (Viñas and Tudela 2009 PLoS ONE). In that study, researchers from Universitat de Girona, Spain, and World Wildlife Fund describe a DNA-based method for distinguishing tuna species using the mitochondrial control region and nuclear ITS. Here again the method is validated using published data, in this case 42 GenBank records representing the 8 species. As an aside, I find it remarkable that there are so few records that might enable identification of such commercially important and now endangered species. These two studies establish a scientific and possibly legal standard for tuna identification. Now we begin.
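Why a fixed 2% cutoff cannot separate species that differ by about 1% is easy to see with a generic threshold matcher. The sketch below is not the BOLD ID engine's algorithm; it is a hypothetical function that simply returns every reference species within an uncorrected p-distance cutoff, so when congeners are closer to each other than the cutoff, a query matches several species at once.

```python
def species_within_cutoff(query, references, cutoff=0.02):
    """Return (species, p-distance) for references within `cutoff` of query.

    references: dict species -> reference sequence aligned to the query.
    A generic illustration of threshold matching, not the BOLD ID engine.
    """
    hits = []
    for species, ref in references.items():
        # Compare only sites where both sequences have an unambiguous base.
        sites = [(a, b) for a, b in zip(query.upper(), ref.upper())
                 if a in "ACGT" and b in "ACGT"]
        if not sites:
            continue
        p = sum(a != b for a, b in sites) / len(sites)  # uncorrected p-distance
        if p <= cutoff:
            hits.append((species, p))
    return sorted(hits, key=lambda hit: hit[1])
```

With two toy references differing at 1 of 100 sites, a query identical to one of them returns both species, which is the ambiguity a character-based key is designed to avoid.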

Tracking disease vectors with DNA

November 7, 2009

What hosts sustain arthropod disease vectors when they are not biting humans? In the September 2009 PLoS ONE, researchers from Doñana Research Station, Seville, Spain, report on a “universal DNA barcoding method to identify vertebrate hosts from arthropod bloodmeals.” The investigators collected “wildlife engorged mosquitoes, culicoids [biting midges] and sand flies (Phlebotomiae)…using CDC traps supplied with dry ice to attract ectoparasites through light and CO2.”

To design vertebrate-specific primers that would not amplify the more abundant arthropod DNA, Alcaide and colleagues “downloaded all vertebrate COI sequences (N = 18,298) from the Classes Mammalia, Aves, Amphibia, and Reptilia that were available in the public domain managed by BOLD Systems database in January 2009” and compared these to “6,784 arthropod COI sequences from taxonomic groups that included blood-feeding species.” From this comparison they designed degenerate primers (with multiple possible nucleotides at some positions) that were >99% matched to vertebrate target sequences and >99% mismatched to invertebrate targets. It would be helpful in this and other studies if the description of new primers gave the position of the 3′ end of each primer relative to a reference such as mouse mitochondrial COI. This would make it clear which portion of the COI barcode region is being amplified.
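The matching criterion behind statements like “>99% matched to vertebrate targets” can be sketched with IUPAC ambiguity codes: a degenerate primer position matches any base its code allows. The helper names and the mismatch tolerance below are assumptions, and the sketch treats all positions equally, whereas practical primer design usually weights mismatches near the 3′ end more heavily.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_matches(primer, site, max_mismatches=0):
    """True if `site` matches degenerate `primer` with <= max_mismatches."""
    if len(primer) != len(site):
        raise ValueError("primer and site must be the same length")
    mismatches = sum(base not in IUPAC[p]
                     for p, base in zip(primer.upper(), site.upper()))
    return mismatches <= max_mismatches

def match_rate(primer, sites, max_mismatches=0):
    """Fraction of binding-site sequences matched by the primer."""
    return sum(primer_matches(primer, s, max_mismatches) for s in sites) / len(sites)
```

Computing `match_rate` separately over target (vertebrate) and non-target (arthropod) binding sites gives the two percentages a design like the one described aims to push toward 100% and 0%.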

The first-pass test with these primers gave PCR products for 43 of 100 mosquito bloodmeals, and reamplification with a slightly different primer set yielded sequenceable products in 97 of 100 cases; this reamplification protocol was applied to the other vector species with “satisfactory” results. All but 5 sequences matched vertebrate sequences from museum voucher specimens at >99% identity. For 3 of the 5 uncertain sequences, they used the closest BOLD matches and knowledge of the local fauna to “deduce that these species could be the Iberian hare Lepus granatensis, the red-legged partridge Alectoris rufa and the Egyptian mongoose Herpestes ichneumon.” The other two, lacking close matches, were from ticks collected while still feeding, so the hosts were known. By my count they detected 18 mammalian and 26 avian host species in arthropod bloodmeals; to me this is a remarkable variety given the relatively small number of bloodmeals tested. I look forward to learning more through DNA tracking of biting arthropods.

Rockefeller University

Program for the Human Environment

Area of Research: DNA Barcoding