Together with colleagues at Mt. Sinai School of Medicine, we report a new mathematical approach to the genetic structure of biodiversity, using indicator vectors calculated from short DNA sequences: Sirovich L, Stoeckle MY, Zhang Y (2009) A Scalable Method for Analysis and Display of DNA Sequences. PLoS ONE 4(10): e7051. The method scales to the largest datasets envisioned in this field and provides a macroscopic view of “biodiversity space”. It complements tree-building techniques and could enable automated classification at various taxonomic levels.
From the Abstract:
The indicator vectors preserved DNA character information and provided quantitative measures of correlations among taxonomic groups. This method is scalable to the largest datasets envisioned in this field, provides a visually-intuitive display that captures relational affinities derived from sequence data across a diversity of life forms, and is potentially a useful complement to current tree-building techniques for studying evolutionary processes based on DNA data.
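To give a rough feel for how a vector built from aligned DNA sequences can preserve character information and yield a correlation between groups, here is a minimal Python sketch. It is not the algorithm from the paper: it assumes a simple one-hot encoding (four 0/1 entries per base position) and uses the unit-length mean of each group's encoded sequences as a simplified stand-in for an indicator vector. The function names, group names, and toy sequences are all hypothetical.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector, four entries per position.
    Characters other than A, C, G, T (e.g. N) become four zeros."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def group_indicator(seqs):
    """Assumed stand-in for an indicator vector: the unit-length mean of the
    group's one-hot vectors. Sequences are assumed aligned and equal in length."""
    encoded = [one_hot(s) for s in seqs]
    mean = [sum(col) / len(encoded) for col in zip(*encoded)]
    return normalize(mean)

def correlation(u, v):
    """Dot product of two unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy data: three groups of short aligned sequences.
groups = {
    "group_a": ["ACGTAC", "ACGTAT"],
    "group_b": ["ACGTTC", "ACGTTT"],
    "group_c": ["TGCAGG", "TGCAGA"],
}
indicators = {name: group_indicator(seqs) for name, seqs in groups.items()}

for a in groups:
    row = ["%.2f" % correlation(indicators[a], indicators[b]) for b in groups]
    print(a, " ".join(row))
```

In this toy setup, groups that share bases at most positions correlate strongly, while groups with no shared bases at any position have a correlation of zero. The resulting matrix of pairwise correlations is the kind of quantity that could be displayed as a heat map.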
To download zip files containing the MATLAB code and datasets used in this paper, select the following links:
- PLoS_ThreeGroups_v1_2.zip (updated March 2010)
- PLoS_Bird_Analysis_v1_2.zip (updated March 2010)