High school students explore urban environment with DNA

What sorts of DNA can be found in an urban environment? Last year I helped supervise Trinity High School students Brenda Tan and Matt Cost in an investigation of New York City apartments, sidewalks, and supermarkets with DNA barcoding. Brenda and Matt spent 4 months collecting and documenting everyday items that might contain DNA, and delivered specimens to the Center for Conservation Genetics at the American Museum of Natural History for testing; 151 (70%) of 217 items yielded DNA barcodes, including a feather duster (ostrich), a hot dog from a street vendor (cow), a dog biscuit (American bison), and a fly in a shipment of grapefruit from Texas (Oriental latrine fly Chrysomya megacephala, an invasive species in the southern U.S.). Among other surprising results, the student investigators found 95 different animal species, 16% of human and pet food items mislabeled, and a genetically distinct mystery cockroach that might be a new subspecies or species. I encourage you to peruse the Rockefeller University DNAHouse site, which includes their narrative and Q+A reports, spreadsheets detailing specimens and results, and high-resolution images, including one of the cockroach!

Following the example of the 2008 student-led “Sushigate,” Brenda and Matt’s DNAHouse study is capturing wide public interest, including stories in the New York Times, New York Post, NPR, NBC TV, and over 230 media sites in 9 languages and 30 countries so far. If high school students can make original discoveries with important regulatory and scientific implications using DNA barcoding, then wide application to food products, products from protected and regulated species, detection of invasive species, and biodiversity surveys, including by the interested public, is not far off. The most important application for the general public is food, and I expect to see growing attention on the part of regulatory agencies, distributors, retailers, and consumers to identifying mislabeled food products using DNA barcodes.

1/1000 animal diversity mapped

In the 16 December 2009 Biol Lett, researchers from University of Guelph, University of British Columbia, and Agriculture and Agri-Food Canada report on COI barcodes for 11,289 individuals representing 1,327 species of Lepidoptera (moths and butterflies) collected in eastern North America. This large collection revealed the same patterns of highly restricted intraspecific variation, uncommon barcode sharing, and overlooked diversity seen in numerous smaller studies: average variation within species was 0.43%, while the average among congeneric species was 7.7%, 18-fold higher. Only nine cases (0.7%) of barcode sharing between species were observed, and at the same time, large divergences (>2%) suggesting overlooked taxa were found in 67 (5.1%) of species (in some cases morphological and ecological differences supporting species status were observed). The survey included multiple individuals per species collected at sites 500 to 2800 km apart, with “no significant increase in genetic distances with geographical separation.” Hebert, deWaard, and Landry conclude “an effective identification system can be constructed for the Lepidoptera fauna of eastern North American without extensive geographical surveys of each species,” and that, given likely similar patterns in most terrestrial and marine fauna, “a comprehensive barcode library for animal life can be assembled rapidly,” with diverse benefits to society and science.
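The within-species versus between-species comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sequences and species labels are invented toy data, distances are simple uncorrected p-distances (the study may have used a different distance model), and all species in the toy set are treated as congeners.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mean_divergences(records):
    """records: list of (species, sequence) pairs.

    Returns (mean within-species distance, mean between-species distance).
    """
    within, between = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (within if sp1 == sp2 else between).append(p_distance(s1, s2))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)

# Hypothetical toy alignment, not data from the paper.
toy = [
    ("species_A", "ACGTACGTACGTACGTACGT"),
    ("species_A", "ACGTACGTACGTACGTACGA"),
    ("species_B", "ACGTTCGTACCTACGAACGT"),
    ("species_B", "ACGTTCGTACCTACGAACGT"),
]
within, between = mean_divergences(toy)
print(f"within {within:.3f}, between {between:.3f}, ratio {between / within:.1f}")

# The reported averages give the 18-fold figure quoted above.
print(f"reported ratio: {7.7 / 0.43:.1f}")
```

With real barcode data the same two lists of distances are what produce the "barcode gap" between intraspecific and interspecific variation.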

In my view, this study should lay to rest the early and persisting worries of some taxonomists that single gene DNA barcoding would distinguish species only in limited situations. For example, in a 2004 PLoS Biol commentary, Moritz and Cicero cautioned “But to determine when and where [DNA barcoding]…is applicable, we now need to discover the boundary conditions.” The 2009 answer is that there are no major restrictions to wide application of DNA barcoding in animals, taxonomically or geographically, and the one regularly encountered limitation is very young species, which represent a small fraction of recognized taxa even in intensively studied groups. At the same time DNA barcoding speeds taxonomic assessment by flagging genetically distinct forms, many of which are found to represent unrecognized species, including species that would likely otherwise remain hidden indefinitely. Together with prior work this study refutes a widely-cited (pre-barcoding) estimate that 23% of animal taxa have shared or overlapping mitochondrial DNA sequences (Funk and Omland Ann Rev Genet 2003); this estimate presumably reflected biases in then-existing databases. In closing, I note that Hebert, deWaard, and Landry offer a new yardstick, namely the fraction of the animal kingdom mapped, lifting our eyes up to the goal of a rapid identification system for all eukaryotic life.

Finding out what lies beneath

What lives in soil? In the August 2009 Pesq Agropec Bras (open access), an international cohort of 10 researchers from Canada, France, US, Taiwan, and Russia examines prospects for speeding assessment of soil animal diversity with COI DNA barcoding. As test sets, Rougerie and colleagues explore taxonomic and sequence diversity in earthworms and collembolans (springtails). These two groups comprise a similar number of named species (earthworms, 6000; springtails, 7900), but these totals likely underestimate true diversity, particularly for springtails, which are tiny (0.2 to 6.0 mm), making both collection and morphological study challenging.

For earthworms, the researchers analyzed COI sequences from 457 specimens collected in 13 countries around the world (including North America, South America, Caribbean, Europe, Middle East, Southeast Asia, and Australia); these represented at least 49 genera in 8 families. Eighty-seven species were identified by morphology, representing about half of the specimens; the remainder, mostly those from the Philippines and Brazil, could not be identified to species. Applying a threshold approach, the researchers found 192 (10% cutoff) and 211 (4% cutoff) genetic clusters, including two or more divergent clusters in 13 (15%) of the named species. None of the species showed sequence sharing or overlaps.

For springtails, 695 specimens from a similar global distribution of sites were analyzed, representing 88 genera in 16 families; only 44 species were formally identified with morphologic characters, consistent with the “difficult and poorly known taxonomy of these organisms.” Of note, the authors report that “a specific protocol was developed for [collembolans] so that voucher specimens could be recollected after DNA extraction and thus be used for further morphological examination.” Sequence comparisons showed a “typical…bimodal distribution of intra- versus interspecific divergences such as the one also reported in earthworms.” Applying distance thresholds as above gave 215 (10% cutoff) and 227 (4% cutoff) genetic clusters. In conclusion, efficient assessment of soil animal diversity calls out for DNA barcoding.
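To make the threshold approach concrete, here is a minimal sketch of counting genetic clusters at a given distance cutoff. It assumes single-linkage grouping on uncorrected p-distances, which may differ from the exact procedure Rougerie and colleagues used, and the sequences are invented toy data.

```python
def p_distance(a, b):
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def count_clusters(seqs, cutoff):
    """Count single-linkage clusters: any two sequences whose distance is
    at or below the cutoff end up in the same cluster (union-find)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= cutoff:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(seqs))})

# Hypothetical 20-site sequences; each substitution is a 5% difference.
toy = [
    "AAAAAAAAAAAAAAAAAAAA",
    "AAAAAAAAAAAAAAAAAAAC",
    "AAAAAAAAAAAAAAAAAACC",
    "CCCCCCCCCCAAAAAAAAAA",
]
for cutoff in (0.04, 0.10):
    print(f"{cutoff:.0%} cutoff: {count_clusters(toy, cutoff)} clusters")
```

As in the reported results, the stricter 4% cutoff yields more clusters than the 10% cutoff, because fewer pairs are close enough to be merged.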

On a separate note, soil animals may be useful for understanding mitochondrial evolution. Even factoring in species diversity, these animals appear to have enormous population sizes, given densities up to 4 x 10^3 earthworms/m^2 and 1.8 x 10^6 collembolans/m^3, yet show the typical bimodal pattern of intra- << inter-specific variation noted above. Along these lines, in the November 2009 Nature, Nick Lane explores the whys of mitochondrial evolution and speciation, including a possible “radically new picture of mitochondrial genes being tightly regulated by selection.” Stay tuned!

DNA to help skates

In the current Aquatic Conserv Marine Freshwater Ecosys, researchers from Muséum national d’Histoire naturelle, France, report on 80 years of taxonomic confusion that has contributed to near extinction for a once abundant north Atlantic skate. Iglésias and colleagues found that two forms, lumped together in 1926 as European common skate (Dipturus batis, Linnaeus 1758), in fact represent distinct species with morphologic, genetic (in mitochondrial genome), and life history differences.

As the researchers report, this taxonomic oversight obscured the disappearance of one species, the flapper skate (D. cf. intermedia), because it was confused with the less threatened blue skate (D. cf. flossada). Iglésias and colleagues’ marketplace survey revealed additional sources of confusion. They analyzed 4,110 skates landed over a 2-year period from 103 fishing cruises in four main French ports by 41 different French commercial trawlers, and found that five skate species (including the two named above) from two genera are variously lumped together under just two marketplace names, the aforementioned “European common skate (D. batis)” and “longnose skate (D. oxyrinchus);” according to their analysis the latter species, formerly common, is also locally extirpated, and most specimens with this name represent other species.

For the newly rediscovered blue and flapper skates, the researchers report 20 diagnostic substitutions in an approximately 2600-nucleotide segment spanning 12S and 16S rRNA. Other than the 10 mitochondrial sequences included in this report, I find only two other D. batis sequences in GenBank (and none as yet under either of the two resurrected names). It is remarkable that so little genetic information has been collected for such recently abundant, commercially important (annual landings in thousands of tons), and now threatened species. To aid standardized application of molecular identification techniques, I hope the authors will also analyze COI barcode sequences for their specimens. Then I look forward to school children aiding conservation and helping find new species by DNA barcoding specimens from their local fish markets!

Identifying ocean’s racehorses with DNA

Bluefin tuna are enormous (up to 15 ft/4.5 m, 680 kg/1500 lbs), high-speed (up to 54 km/h, as fast as racehorses) creatures that roam across oceans and return to ancestral waters to spawn. High demand has fueled intensive fishing by international fleets, resulting in 90% population declines, with all three species heading towards extinction: Southern (Thunnus maccoyii), Northern (T. thynnus), and Pacific bluefin (T. orientalis). This week in PLoS ONE researchers from the American Museum of Natural History describe DNA-based identification of bluefin and other tuna species using character analysis of COI barcode sequences. Lowenstein and colleagues’ report provides a basis for routine identification of marketplace items to inform consumers and enable enforcement of regulations, including a proposed listing as endangered under the Convention on International Trade in Endangered Species (CITES).

The eight species in genus Thunnus are not discriminated by regularly used nuclear loci and differ by about 1% or less in mitochondrial coding regions (e.g., Ward et al 2005 Phil Trans R Soc B), challenging DNA-based identification. To construct a diagnostic key, Lowenstein and colleagues analyzed 89 COI sequences in GenBank representing the eight tuna species and by visual inspection found 14 sites that provided 17 “compound characteristic attributes (CAs)” (terminology from Sarkar et al 2008 Mol Ecol Res). Turning marketplace detectives, the AMNH team collected 68 sushi samples from 31 establishments in New York and Denver over a 6-month period in 2008. Nearly one-third (22; 32%) of samples were sold as species contradicted by the molecular data, including items from over half (19; 61%) of the restaurants.
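The idea behind a character-based key can be illustrated with a short sketch. It finds only simple diagnostic sites, meaning alignment positions where one taxon carries a fixed base that no other taxon shows; the compound attributes used in the study combine several sites and were found by visual inspection, which this sketch does not reproduce. The taxon names and sequences are invented for illustration.

```python
def simple_diagnostic_sites(alignment):
    """alignment: dict mapping taxon name -> list of aligned sequences.

    Returns dict mapping taxon -> list of (site, base) pairs where the base
    is fixed within that taxon and absent from every other taxon.
    """
    length = len(next(iter(alignment.values()))[0])
    result = {taxon: [] for taxon in alignment}
    for site in range(length):
        # Set of bases observed at this site in each taxon.
        states = {t: {seq[site] for seq in seqs} for t, seqs in alignment.items()}
        for taxon, observed in states.items():
            if len(observed) != 1:
                continue  # variable within the taxon, so not diagnostic
            base = next(iter(observed))
            if all(base not in other for t, other in states.items() if t != taxon):
                result[taxon].append((site, base))
    return result

# Hypothetical toy alignment (sites are 0-indexed).
toy = {
    "taxon_1": ["ACGTAC", "ACGTAC"],
    "taxon_2": ["ACGTTC", "ACGTTC"],
    "taxon_3": ["GCGTAC", "GCGTCC"],
}
for taxon, sites in simple_diagnostic_sites(toy).items():
    print(taxon, sites)
```

In this toy set taxon_1 has no single diagnostic site, yet it is still identifiable by a combination (A at site 0 together with A at site 4), which hints at why compound attributes are useful when species differ at very few positions.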

Lowenstein and colleagues found their character-based identifications were more accurate and precise than those provided by the BOLD ID engine (www.barcodinglife.org), largely because the ID engine uses a 2% cutoff for assigning specimens to species, which encompasses all eight Thunnus spp. In addition, BOLD is a workbench for researchers and so contains many as yet unpublished sequences from ongoing studies; these need to be viewed as provisional data. Indeed, in constructing their key Lowenstein and colleagues set aside 2 of the 89 GenBank tuna sequences as these grouped with other species. These anomalous sequences might reflect hybridization or introgression, which is reported to occur in 2-3% of Atlantic bluefin, for example (Viñas and Tudela 2009 PLoS ONE). In that study, researchers from Universitat de Girona, Spain and World Wildlife Fund describe a DNA-based method for distinguishing tuna species using mitochondrial control region and nuclear ITS. Here again the method is validated using published data, in this case 42 GenBank records representing the 8 species. As an aside, I find it remarkable there are so few records that might enable identification of such commercially-important and now endangered species. These two studies establish a scientific and possible legal standard for tuna identification. Now we begin.

Tracking disease vectors with DNA

What hosts sustain arthropod disease vectors when they are not biting humans? In the September 2009 PLoS ONE, researchers from Doñana Research Station, Seville, Spain, report on a “universal DNA barcoding method to identify vertebrate hosts from arthropod bloodmeals.” The investigators collected “wildlife engorged mosquitoes, culicoids [biting midges] and sand flies (Phlebotomiae)…using CDC traps supplied with dry ice to attract ectoparasites through light and CO2.”

To design vertebrate-specific primers that would not amplify the more abundant arthropod DNA, Alcaide and colleagues “downloaded all vertebrate COI sequences (N = 18,298) from the Classes Mammalia, Aves, Amphibia, and Reptilia that were available in the public domain managed by BOLD Systems database in January 2009” and compared these to “6,784 arthropod COI sequences from taxonomic groups that included blood-feeding species.” From this comparison they designed degenerate (multiple nucleotides at some positions) primers that were >99% matched to vertebrate target sequences and >99% mismatched to invertebrate targets. It would be helpful in this and other studies if the description of new primers gave the position of the 3′ end of each primer relative to a reference such as mouse mitochondrial COI. This would make it clear which portion of the COI barcode region is being amplified.
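A rough sketch of how a degenerate primer can be scored against target and non-target sequences follows. The primer and binding-site sequences here are invented for illustration and are not those from the paper. As a simplification, the primer is compared with same-sense target sites of equal length, ignoring strand orientation, partial mismatches, and the extra weight usually given to the 3′ end.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_matches(primer, site):
    """True if every base in the site is allowed by the primer's IUPAC code."""
    if len(primer) != len(site):
        return False
    return all(base in IUPAC[code] for code, base in zip(primer, site))

def match_rate(primer, sites):
    """Fraction of candidate binding sites that the primer matches perfectly."""
    return sum(primer_matches(primer, s) for s in sites) / len(sites)

# Hypothetical primer with two degenerate positions (R = A/G, Y = C/T).
primer = "ACRTYG"
vertebrate_sites = ["ACATCG", "ACGTTG", "ACATTG"]  # invented targets
arthropod_sites = ["ACTTCG", "TCATCG"]             # invented non-targets

print("vertebrate match rate:", match_rate(primer, vertebrate_sites))
print("arthropod match rate:", match_rate(primer, arthropod_sites))
```

Run against large downloaded sequence sets, the same two rates correspond to the ">99% matched" and ">99% mismatched" criteria described above.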

The first pass test with these primers gave PCR products in 43 of 100 mosquito bloodmeals, and reamplification with a slightly different primer set yielded sequenceable products in 97 of 100 cases; this re-amplification protocol was applied to the other vector species with “satisfactory” results. All except 5 matched at >99% level to vertebrate sequences from museum voucher specimens. For 3 of the uncertain identity sequences, they used the closest BOLD matches and knowledge of local fauna to “deduce that these species could be the Iberian hare Lepus granatensis, the red-legged partridge Alectoris rufa and the Egyptian mongoose Herpestes ichneumon.” The other two without close matches were from ticks collected while still feeding so the hosts were known. By my count they detected 18 mammalian and 26 avian host species in arthropod bloodmeals; to me this is remarkable variety given the relatively small number of bloodmeals tested. I look forward to learning more through DNA tracking of biting arthropods.

Tropical tree identification with DNA

Two groups of researchers explore tropical forest plots with DNA barcodes in the October 2009 PLoS ONE and Proc Natl Acad Sci USA (both open access, the latter Twittered!). Just three months ago a community standard for DNA barcoding land plants was announced, namely the plastid genes rbcL and matK, with species-level identification in 72% of cases tested and identification to “species groups” in the remainder. The two papers mentioned above represent the early roll-out, so we can expect much more will be learned about DNA barcoding in plants in particular and about plant biology in general.

In PLoS ONE, researchers from France, French Guiana, and New York apply DNA barcoding to two 1-hectare plots in the “pristine lowland tropical rainforest” of central French Guiana, which represents one of the largest tracts of intact Amazonian rainforest. Working out of the Nouragues Research Station (“gateway to European rainforest”), Gonzales and colleagues collected leaf and cambium (living outer layer of wood) samples from all trees 10 cm or greater in diameter, with the assistance of professional tree climbers for large trees and use of climbing spikes for smaller specimens. The extreme efforts required to collect morphologically-identifiable specimens highlight the desirability of a DNA-based approach that could be applied nearer to ground level! A total of 1073 trees were sampled, which were sorted into 301 morphospecies; of these, 254 (85%) were “matched to a reference voucher with an acceptable species name…[encompassing] 143 genera and 54 angiosperm families,” so that is a lot of tree diversity! For comparison there are about 1000 native tree species in all of North America.

PCR was carried out for multiple loci: in addition to the above-mentioned standards rbcL and matK, these included plastid genes rpoC1, rpoB, and ycf5, non-coding trnL and psbA-trnH, and nuclear ITS. The researchers also applied DNA barcoding to “juveniles,” i.e. saplings in the same plots, of which just 27% could be identified to species, plus another 45% to morphotype, and 11% to genus (this leaves 17% not identified to genus). Not surprisingly given the diversity of species, sample types, markers, and uncertainties in the underlying taxonomy, the researchers’ results are complex. Regarding tissue types, they obtained amplifiable DNA from most or all leaf and cambium samples, with high success for some markers (e.g., rbcL sequencing rate 93%), supporting ground-level sampling strategies. Regarding markers, they had difficulty amplifying matK (68% success) and ITS (41%). Similar to prior observations, the overall rate for species-level identification using plastid markers plateaued at about 70%; thus two loci capture most of what is available from this genetic compartment.

In Proc Natl Acad Sci USA, researchers from Smithsonian Institution, Smithsonian Tropical Research Institute (STRI), Imperial College, and Harvard University apply DNA barcoding with rbcL, matK, and trnH-psbA to 1035 tree samples representing 296 species in STRI’s 1,000 x 500 m Forest Dynamics Plot on Barro Colorado Island, Panama. They had similar sequencing success to the Guiana study (rbcL, 93%; trnH-psbA, 94%; matK, 69%). Overall success at species-level identification was 92% for rbcL + matK, 95% for rbcL + trnH-psbA, and 98% for all three markers, with the denominator in these comparisons apparently being the number of samples with available sequences. I am uncertain as to why species-level identification was higher in the Panama study than in the Guiana study; the total number of samples and species is similar, so presumably this reflects particular aspects of the species composition such as recent radiations in these locations. Kress and colleagues constructed a supermatrix with these data, generating a “robust community phylogeny for 281 of the 296 species in the plot.” They conclude “DNA barcodes stand poised to serve as an efficient and effective approach to building community phylogenies…[aiding] understanding niche conservation and the dynamics of species composition at landscape and global scales.” Sounds promising!

World species census updated

How many species are there? One widely cited estimate, now 24 years old, is 1.7 million named species (EO Wilson 1985 Science 230:1227). This estimate is updated in detailed form in a September 2009 publication from the Australian Government, “Numbers of Living Species in Australia and the World, 2nd edition” by Arthur Chapman (illustrated report open access for perusing online or as pdf for download). According to Chapman’s analysis, there are 1.9 million published species in the world. Approximately 18,000 new species are described each year, 75% of which are invertebrates, 11% vascular plants, and 7% vertebrates. Chapman estimates the true number of world species is about 11 million. The largest uncertainties, for which it is estimated fewer than 10% of species have been named, are for fungi, single-celled eukaryotes (protoctista, cyanophyta, chromista), and “prokaryotes”, i.e. eubacteria and archaea.

This overview brings to mind pictures of the distribution of matter and dark matter in the universe. On a large scale, is the “density” of species uniform? For example, given there are about 10,000 bird and about 40,000 fish species, do fish take up 4x as much diversity space? We know on a small scale there are some “high-density” closely-related groups of species, like cichlid fishes in Africa, but can we map the distribution of diversity on a larger scale? Large databases of homologous sequences representing diverse species (aka DNA barcodes; as of today, BOLD has over 700,000 records representing over 64,000 species) and new mathematical approaches to calculating diversity from nucleotide sequences (e.g., Sirovich 2009 PLoS ONE; I am co-author) may help provide a biological macroscope (Ausubel PNAS 2009) for understanding the genetic structure of biodiversity, complementary to the historical view expressed in the Tree of Life.

Finding out what small herbivores eat

What do animals eat? For many animals other than large, diurnal, terrestrial species, this is surprisingly hard to study. In the August 2009 Frontiers Zool, researchers from Norway and France apply standardized DNA analysis, and compare it with microscopic techniques, to study the diets of two arctic voles, Microtus oeconomus (Tundra vole) and Myodes rufocanus (Grey red-backed vole), collected in July and September in northern Norway. Soininen and colleagues analyzed stomach contents of 48 individuals using a microscope and a DNA sequencer, the latter to analyze the amplified P6 loop (length 10-46 bp) of the chloroplast trnL intron. As previously described by some of the same authors (Taberlet et al 2007), the P6 loop is amplifiable from diverse gymnosperms and angiosperms with a single set of primers; however, not surprisingly, this very short segment often does not provide species-level identification even with local flora.

For microhistological analysis, the authors first prepared a photographic guide by collecting samples of all vascular plant species in the study area; the samples were dried, scraped to reveal epidermis, bleached, and boiled in table vinegar, and then 40x micrographs were taken. Stomach content samples were filtered and bleached, and 1 droplet was examined on a microscope slide, counting 25 bits of identifiable material; if >95% of material was unidentifiable, a new slide was prepared. In 4 individuals, no slide with an adequate amount of microscopically identifiable material could be made. For DNA analysis, the P6 loop was amplified using tagged primers that identified each individual, the pooled material was analyzed by pyrosequencing, and the sequences were compared to a database of 842 species representing “all widespread and/or ecologically important taxa of the arctic flora”. With the standardized DNA approach (the authors call this DNA barcoding although it does not use the recently agreed-upon standard loci) “75% of sequences were identified at least to genus level, whereas with microhistological method, less than 20% of the identified fragments could be specified at this level”.
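The role of the tagged primers can be sketched as a simple demultiplexing step: after pooled sequencing, each read is assigned back to an individual by the tag at its start. The tag sequences, sample names, and reads below are invented, and the sketch assumes fixed-length tags at the 5′ end with exact matching, which may not reflect the authors' actual tag design or error handling.

```python
from collections import defaultdict

def demultiplex(reads, tag_to_sample):
    """Group reads by the sample whose tag they start with.

    Assumes all tags have the same length and sit at the start of the read.
    Reads with an unrecognized tag are dropped. Returns sample -> list of
    reads with the tag removed.
    """
    tag_len = len(next(iter(tag_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        sample = tag_to_sample.get(tag)
        if sample is not None:
            by_sample[sample].append(insert)
    return dict(by_sample)

# Hypothetical tags and pooled reads.
tags = {"ACGT": "vole_01", "TGCA": "vole_02"}
pooled_reads = [
    "ACGTGGGCAATCC",
    "TGCAGGGCAATCC",
    "ACGTATCCGTA",
    "NNNNGGG",  # unreadable tag, discarded
]
for sample, inserts in demultiplex(pooled_reads, tags).items():
    print(sample, inserts)
```

Each individual's list of inserts can then be compared against the plant reference database to tabulate that animal's diet.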

As a result of greater resolution as compared to microscopy, DNA identified more plant species and genera in vole diets (for M. oeconomus, 13 species/9 genera vs 9 species/5 genera; for M. rufocanus 17 species/8 genera vs 11/7). Both methods showed large variation among individuals. Limitations to DNA approach include possible overrepresentation of species with chloroplast-rich tissues and inability of P6 to detect fungi, horsetails, and mosses. Looking ahead, researchers conclude “DNA-based technology makes it possible to study vole-plant interaction by non-destructive sampling of faeces in the natural habitats of voles”, first identifying rodent species using a mitochondrial DNA marker (and potentially sex and individual identification with Y-chromosome and microsatellite detection) and then diet analysis. I conclude standardized DNA analysis opens wide avenues for ecology.

Counting zooplankton diversity with DNA

Marine zooplankton comprise an enormous mass of diverse organisms distributed throughout the world’s oceans from deep waters to surface. Zooplankton include representatives of at least a dozen phyla, some of which are larval forms of much larger animals, and challenge identification with their diversity and tiny size. In the current BMC Genomics (open access), researchers from University of Tokyo and Osaka Medical College, as part of the Census of Marine Zooplankton (CMarZ) program of the Census of Marine Life (CoML), apply single-gene sequencing to the task. Machida and colleagues collected at a Micronesia site using a single pass with a 2 m^2 plankton net from a depth of 721 meters to the surface, obtaining 60 mL of zooplankton (large organisms, up to 4 cm, were discarded). Rather than direct DNA sequencing, the researchers isolated mRNA from the pooled sample and constructed a cDNA library from which they analyzed 1,336 inserts. The rationale for these extra steps was to avoid sequencing pseudogenes present in genomic DNA (but not transcribed into mRNA). It would be interesting to know if this strategy was based on experience or is a theoretical precaution.

Machida and colleagues found evidence for 189 species, only 10 of which could be confidently matched to reference sequences. This report demonstrates that this sort of “kitchen blender” approach, which has previously been applied largely to bacterial and archaeal communities, shows promise for assemblages of eukaryotes and reveals that surprisingly few organisms have reference sequences in databases. Identified organisms included several copepods as well as presumably larval forms of Sthenoteuthis oualaniensis (Purple-back flying squid) and Coryphaena hippurus (Common dolphinfish)!

Species identification by DNA opens major avenues for ecosystem research. The NJ tree at left suggests that even in the absence of close matches, 500 bp of mtDNA is sufficient to sort most specimens into appropriate higher-level groups. To better understand the changing oceans, we need biological monitoring machines akin to the physical instruments for studying weather and climate, which routinely monitor thousands of sites. It seems to me the only practical way to monitor biological “weather” is by repeatedly sampling species assemblages at multiple points, and, particularly in aqueous environments, automated species identification with DNA will be an important analytic method.