Tropical tree identification with DNA

Two groups of researchers explore tropical forest plots with DNA barcodes in October 2009 papers in PLoS ONE and Proc Natl Acad Sci USA (both open access, the latter Twittered!). Just three months ago, a community standard for DNA barcoding land plants was announced, namely the plastid genes rbcL and matK, with species-level identification in 72% of cases tested and identification to “species groups” in the remainder. The two papers mentioned above represent the early roll-out, so we can expect that much more will be learned about DNA barcoding in plants in particular and about plant biology in general.

In PLoS ONE, researchers from France, French Guiana, and New York apply DNA barcoding to two 1-hectare plots in the “pristine lowland tropical rainforest” of central French Guiana, which represents one of the largest tracts of intact Amazonian rainforest. Working out of the Nouragues Research Station (“gateway to European rainforest”), Gonzalez and colleagues collected leaf and cambium (living outer layer of wood) samples from all trees 10 cm or greater in diameter, with the assistance of professional tree climbers for large trees and use of climbing spikes for smaller specimens. The extreme efforts required to collect morphologically-identifiable specimens highlight the desirability of a DNA-based approach that could be applied nearer to ground level! A total of 1073 trees were sampled, which were sorted into 301 morphospecies; of these, 254 (85%) were “matched to a reference voucher with an acceptable species name…[encompassing] 143 genera and 54 angiosperm families”, so that is a lot of tree diversity! For comparison, there are about 1000 native tree species in all of North America. PCR was carried out for multiple loci: in addition to the above-mentioned standards rbcL and matK, these included plastid genes rpoC1, rpoB, and ycf5, non-coding trnL and psbA-trnH, and nuclear ITS. The researchers also applied DNA barcoding to “juveniles”, i.e. saplings, in the same plots, of which just 27% could be identified morphologically to species, plus another 45% to morphotype and 11% to genus (this leaves 17% not identified to genus). Not surprisingly, given the diversity of species, sample types, and markers, and the uncertainties in the underlying taxonomy, the researchers’ results are complex. Regarding tissue types, they obtained amplifiable DNA from most or all leaf and cambium samples, with high success for some markers (e.g., rbcL sequencing rate 93%), supporting ground-level sampling strategies. Regarding markers, they had difficulty amplifying matK (68% success) and ITS (41%).
Similar to prior observations, the overall rate for species-level identification using plastid markers plateaued at about 70%; thus two loci capture most of what is available from this genetic compartment.

In Proc Natl Acad Sci USA, researchers from the Smithsonian Institution, Smithsonian Tropical Research Institute (STRI), Imperial College, and Harvard University apply DNA barcoding with rbcL, matK, and trnH-psbA to 1035 tree samples representing 296 species in STRI’s 1,000 x 500 m Forest Dynamics Plot on Barro Colorado Island, Panama. They had similar sequencing success to the Guiana study (rbcL, 93%; trnH-psbA, 94%; matK, 69%). Overall success at species-level identification was 92% for rbcL + matK, 95% for rbcL + trnH-psbA, and 98% for all three markers, with the denominator in these comparisons apparently being the number of samples with available sequences. I am uncertain as to why species-level identification was higher in the Panama study than in the Guiana study; the total numbers of samples and species are similar, so presumably this reflects particular aspects of the species composition, such as recent radiations, in these locations. Kress and colleagues constructed a supermatrix with these data, generating a “robust community phylogeny for 281 of the 296 species in the plot.” They conclude “DNA barcodes stand poised to serve as an efficient and effective approach to building community phylogenies…[aiding] understanding niche conservation and the dynamics of species composition at landscape and global scales.” Sounds promising!

World species census updated

How many species are there? One widely cited estimate, now 24 years old, is 1.7 million named species (EO Wilson 1985 Science 230:1227). This estimate is updated in detailed form in a September 2009 publication from the Australian Government, “Numbers of Living Species in Australia and the World, 2nd edition” by Arthur Chapman (illustrated report, open access for perusing online or as a pdf for download). According to Chapman’s analysis, there are 1.9 million published species in the world. Approximately 18,000 new species are described each year, 75% of which are invertebrates, 11% vascular plants, and 7% vertebrates. Chapman estimates the true number of world species is about 11 million. The largest uncertainties, for which it is estimated fewer than 10% of species have been named, are for fungi, single-celled eukaryotes (protoctista, cyanophyta, chromista), and “prokaryotes”, i.e. eubacteria and archaea.
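To put Chapman's figures side by side, here is a small back-of-the-envelope calculation. It uses only the three numbers quoted above; the assumption that the description rate stays constant is mine, not Chapman's, and is almost certainly too simple. On these numbers, roughly one in six species has been described, and finishing the job at the present pace would take about five centuries.

```python
# Figures quoted from Chapman (2009) in the paragraph above.
described = 1_900_000          # published species
estimated_total = 11_000_000   # Chapman's estimate of the true total
new_per_year = 18_000          # approximate new descriptions per year

fraction_described = described / estimated_total
remaining = estimated_total - described

# Assumption (mine): the yearly description rate never changes.
years_at_current_rate = remaining / new_per_year

print(f"described so far: {fraction_described:.0%}")
print(f"undescribed species: {remaining:,}")
print(f"years to finish at current rate: {years_at_current_rate:.0f}")
```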

This overview brings to mind pictures of the distribution of matter and dark matter in the universe. On a large scale, is the “density” of species uniform? For example, given there are about 10,000 bird and about 40,000 fish species, do fish take up 4x as much diversity space? We know on a small scale there are some “high-density” closely-related groups of species, like cichlid fishes in Africa, but can we map the distribution of diversity on a larger scale? Large databases of homologous sequences representing diverse species (aka DNA barcodes; as of today, BOLD has over 700,000 records representing over 64,000 species) and new mathematical approaches to calculating diversity from nucleotide sequences (e.g., Sirovich 2009 PLoS ONE; I am a co-author) may help provide a biological macroscope (Ausubel PNAS 2009) for understanding the genetic structure of biodiversity, complementary to the historical view expressed in the Tree of Life.

A Scalable Method for Analysis and Display of DNA Sequences

Together with colleagues at Mt. Sinai School of Medicine, we report a new mathematical approach to the genetic structure of biodiversity, using indicator vectors calculated from short DNA sequences. Sirovich L, Stoeckle MY, Zhang Y (2009) A Scalable Method for Analysis and Display of DNA Sequences. PLoS ONE 4(10): e7051. This method is scalable to the largest datasets envisioned in this field and provides a macroscopic view of “biodiversity space”. It offers a complement to tree-building techniques and could enable automated classification at various taxonomic levels.

From the Abstract:

The indicator vectors preserved DNA character information and provided quantitative measures of correlations among taxonomic groups. This method is scalable to the largest datasets envisioned in this field, provides a visually-intuitive display that captures relational affinities derived from sequence data across a diversity of life forms, and is potentially a useful complement to current tree-building techniques for studying evolutionary processes based on DNA data.
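For readers who want a feel for what "vectors calculated from short DNA sequences" can look like, here is a minimal Python sketch. It is not the algorithm from the paper: as a stand-in, it one-hot encodes each aligned sequence into a 0/1 vector of length 4N, averages the vectors within a group, and compares groups by cosine similarity. The sequences, the group names, and the helper functions `one_hot`, `group_vector`, and `similarity` are all invented for illustration.

```python
from math import sqrt

BASES = "ACGT"

def one_hot(seq):
    """Encode an aligned DNA sequence as a flat 0/1 vector of length 4 * len(seq)."""
    vec = []
    for base in seq.upper():
        # Gaps or ambiguous characters become all zeros at that position.
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def group_vector(seqs):
    """Average the one-hot vectors of a group, then scale to unit length."""
    vectors = [one_hot(s) for s in seqs]
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    norm = sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]

def similarity(u, v):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical aligned toy sequences, made up for this example.
groups = {
    "group_a": ["ACGTACGT", "ACGTACGA"],
    "group_b": ["ACGTACCA", "ACGTTCCA"],
    "group_c": ["TTGACAGG", "TTGACAGC"],
}
vectors = {name: group_vector(seqs) for name, seqs in groups.items()}

# Print a small similarity matrix, one row per group.
for a in vectors:
    row = "  ".join(f"{similarity(vectors[a], vectors[b]):.2f}" for b in vectors)
    print(a, row)
```

In this toy data, group_a and group_b share most positions and so score much closer to each other than either does to group_c. Because each group is reduced to one fixed-length vector, the cost of comparing groups depends on the number of groups rather than the number of individual sequences, which is one plausible reading of why a vector approach scales well.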

To download zip files containing MatLab code and datasets utilized in this paper, select the following links: