Indicator Vector and TreeParser Software for Generating Klee Diagrams
Klee diagrams generated by indicator vector analysis are an alternative to trees for representing nucleotide sequence alignments (Sirovich et al. 2009, 2010). Indicator vectors are digital representations of nucleotide sequences in vector space and can represent single sequences or sets of sequences. Indicator vector correlations, which are similar but not identical to distances, are displayed in a Klee diagram, a heat map of the indicator vector correlation matrix. The indicator vector/Klee method is computationally and visually scalable to large sets of aligned sequences; for example, it readily displays patterns among 10,000 sequences in a single-page figure.
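To make the idea of "sequences in vector space" concrete, here is a minimal Python sketch. It assumes a simple encoding in which each base becomes a 4-element unit vector, and it uses a normalized dot product as the correlation between two encoded sequences. This is an illustration only, not the published indicator vector computation, which also derives vectors for sets of sequences. The function names and the three toy sequences are hypothetical.

```python
import math

# Assumed encoding (illustrative): A, C, G, T each map to a 4-element unit
# vector; gaps and ambiguous bases map to all zeros.
BASES = "ACGT"

def one_hot(seq):
    """Encode an aligned nucleotide sequence as a flat list of 0/1 values."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def correlation(u, v):
    """Normalized dot product of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def correlation_matrix(seqs):
    """Pairwise correlations; a heat map of this matrix is the kind of picture a Klee diagram shows."""
    vectors = [one_hot(s) for s in seqs]
    return [[correlation(u, v) for v in vectors] for u in vectors]

# Toy alignment (hypothetical data): all sequences have the same length.
aligned = ["ACGTAC", "ACGTAA", "TTGTAC"]
matrix = correlation_matrix(aligned)
for row in matrix:
    print(" ".join(f"{x:.2f}" for x in row))
```

With this encoding and no gaps, the correlation between two sequences equals the fraction of alignment positions at which they share the same base, so the matrix is symmetric with ones on the diagonal.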
What follows is a step-by-step protocol for applying indicator vector analysis to a set of aligned nucleotide sequences.
The order of sequences in an alignment does not affect indicator vector calculations. However, for the resulting Klee diagram, it is useful to arrange the sequences so that their order approximates evolutionary relationships. Here we provide a web-based program, TreeParser, that arranges aligned sequences to follow the order of terminals in a phylogenetic tree. TreeParser input files are 1) an aligned FASTA file and 2) a Newick tree file in text format, and the output is a re-ordered .fas file. The example below uses a neighbor-joining (NJ) tree generated in MEGA as the template for ordering sequences.
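The re-ordering step can be sketched in a few lines of Python. This is not TreeParser's own code; it is a minimal illustration that assumes unquoted Newick labels and FASTA header names that match the tree terminals exactly. The function names and the toy data are hypothetical.

```python
import re

def newick_leaf_order(newick):
    """Return terminal labels in the order they appear in a Newick string.

    Assumes unquoted labels that contain no parentheses, commas, colons
    or semicolons. Labels that follow ')' are internal nodes and are skipped.
    """
    pattern = r"(?<=[(,])\s*([^(),:;\s][^(),:;]*)"
    return [name.strip() for name in re.findall(pattern, newick)]

def parse_fasta(text):
    """Parse FASTA text into a dict mapping header name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def reorder_fasta(fasta_text, newick):
    """Return FASTA text with records arranged in tree-terminal order."""
    records = parse_fasta(fasta_text)
    order = newick_leaf_order(newick)
    missing = [n for n in order if n not in records]
    if missing:
        raise KeyError(f"tree terminals not found in FASTA: {missing}")
    return "".join(f">{n}\n{records[n]}\n" for n in order)

# Toy input (hypothetical data): three aligned sequences and a small tree.
fasta_text = ">sp3\nACGT\n>sp1\nACGA\n>sp2\nTCGA\n"
newick = "((sp1:0.1,sp2:0.2):0.05,sp3:0.3);"
reordered = reorder_fasta(fasta_text, newick)
print(reordered)
```

Raising an error on unmatched names is a deliberate choice in this sketch: tree-building programs may alter sequence names on export (for example, by replacing spaces), and a silent mismatch would drop sequences from the re-ordered file.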
Sirovich L, Stoeckle MY, Zhang Y (2009) A scalable method for analysis and display of DNA sequences. PLoS ONE 4:e7051.
Sirovich L, Stoeckle MY, Zhang Y (2010) Structural analysis of biodiversity. PLoS ONE 5:e9266.
Step 1: Run your aligned FASTA file through MEGA to create a phylogenetic tree. Under the File menu, select "Write Tree in Table Format", then save the output as a text file.
Step 2: Upload your FASTA file and the MEGA tree text file to the TreeParser software using the windows below. (Large files may take a few minutes to process.)