A Scalable Method for Analysis and Display of DNA Sequences

Together with colleagues at Mt. Sinai School of Medicine, we report a new mathematical approach to the genetic structure of biodiversity, using indicator vectors calculated from short DNA sequences. The method scales to the largest datasets envisioned in this field and provides a macroscopic view of “biodiversity space”. It complements tree-building techniques and could enable automated classification at various taxonomic levels. Citation: Sirovich L, Stoeckle MY, Zhang Y (2009) A Scalable Method for Analysis and Display of DNA Sequences. PLoS ONE 4(10): e7051.

From the Abstract:

The indicator vectors preserved DNA character information and provided quantitative measures of correlations among taxonomic groups. This method is scalable to the largest datasets envisioned in this field, provides a visually-intuitive display that captures relational affinities derived from sequence data across a diversity of life forms, and is potentially a useful complement to current tree-building techniques for studying evolutionary processes based on DNA data.
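As a rough illustration of the general idea of turning aligned DNA sequences into vectors and comparing groups numerically, here is a minimal Python sketch. It is not the paper's algorithm: it assumes a simple one-hot encoding (four 0/1 entries per position), uses a per-group average as a simplified stand-in for an indicator vector, and compares groups by cosine similarity. The sequences and the names group_a, group_b, and group_c are invented for the example.

```python
from math import sqrt

BASES = "ACGT"

def one_hot(seq):
    """Encode an aligned DNA sequence as a flat 0/1 vector, 4 entries per position.
    Characters other than A, C, G, T (e.g. N or gaps) become four zeros."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def group_mean(seqs):
    """Average the one-hot vectors of a group.
    Assumption: a simplified stand-in for an indicator vector, not the published method."""
    vecs = [one_hot(s) for s in seqs]
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical toy data: three groups of aligned 6-base sequences.
groups = {
    "group_a": ["ACGTAC", "ACGTAA"],
    "group_b": ["ACGTTC", "ACGATC"],
    "group_c": ["TTGCAG", "TTGCAC"],
}

means = {name: group_mean(seqs) for name, seqs in groups.items()}
sim_ab = cosine(means["group_a"], means["group_b"])
sim_ac = cosine(means["group_a"], means["group_c"])
```

In this toy data, group_a shares more positions with group_b than with group_c, so sim_ab comes out larger than sim_ac. The MATLAB code distributed with the paper is the reference for the actual calculation.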

To download zip files containing the MATLAB code and datasets used in this paper, select the following links: