New code for analyzing eDNA sequences using the DADA2 pipeline

For the past year, Michael Coogan, now a graduate student in marine science at the University of New Hampshire, has helped Mark Stoeckle and PHE develop improved software for our eDNA studies. A summary follows below. A PDF of the R code is available here. If you have questions, please write to Mark.

The goal is to adapt the DADA2 pipeline to Mark Stoeckle’s 12S experiment. Sample sequences will be identified using a 12S reference file containing sequences of 262 unique vertebrates found around New York.

The starting point is a set of Illumina-sequenced, paired-end fastq files that have been split (demultiplexed) by sample and have already had barcodes and adapters removed. The end product will be a sequence table, analogous to the ubiquitous “OTU table”, which records the number of times each sample sequence was observed in each sample. The key difference between the output of DADA2 and standard OTU analyses is that DADA2 infers sample sequences exactly, rather than clustering them into fuzzy OTUs that can hide and complicate biological variation.
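To make the shape of that sequence table concrete, here is a minimal Python sketch (the project's own code is in R). It assumes the reads in each sample have already been denoised into exact sequences, the step DADA2 performs with `dada()` and `mergePairs()`. It only tallies those sequences into a sample-by-sequence count matrix, roughly what DADA2's `makeSequenceTable()` returns. It then assigns a sequence to a reference entry only on an exact match, in the spirit of DADA2's `assignSpecies()`. The function names `make_sequence_table` and `assign_exact`, the toy sequences, and the reference entry `taxon_1` are hypothetical and are not taken from the 12S study.

```python
from collections import Counter


def make_sequence_table(reads_by_sample):
    """Build a sample-by-sequence count table from already-denoised reads.

    reads_by_sample maps a sample name to a list of exact sequences
    (hypothetical input; DADA2's error-model denoising is NOT done here).
    Returns (samples, sequences, table), where table[i][j] is the number of
    times sequences[j] was observed in samples[i].
    """
    counts = {sample: Counter(reads) for sample, reads in reads_by_sample.items()}
    samples = sorted(counts)

    # Total abundance of each exact sequence across all samples.
    totals = Counter()
    for c in counts.values():
        totals.update(c)

    # Columns ordered by total abundance (most abundant first), ties alphabetical.
    sequences = sorted(totals, key=lambda seq: (-totals[seq], seq))

    # A Counter returns 0 for missing keys, so absent sequences count as 0.
    table = [[counts[s][seq] for seq in sequences] for s in samples]
    return samples, sequences, table


def assign_exact(sequences, reference):
    """Map each sequence to a taxon only if it matches a reference sequence
    exactly; unmatched sequences map to None. reference: sequence -> taxon."""
    return {seq: reference.get(seq) for seq in sequences}


if __name__ == "__main__":
    # Toy data for illustration only.
    reads = {
        "sampleA": ["ACGT", "ACGT", "ACGA"],
        "sampleB": ["ACGT", "TTGA"],
    }
    samples, seqs, table = make_sequence_table(reads)
    print(samples)  # ['sampleA', 'sampleB']
    print(seqs)     # ['ACGT', 'ACGA', 'TTGA']
    print(table)    # [[2, 1, 0], [1, 0, 1]]
    print(assign_exact(seqs, {"ACGT": "taxon_1"}))
    # {'ACGT': 'taxon_1', 'ACGA': None, 'TTGA': None}
```

Because the sequences are kept exact, ACGT and ACGA stay in separate columns even though they differ at a single base. In a real amplicon of a hundred or more bases, a one-base difference leaves two sequences well above 97% similarity, so OTU clustering at that threshold would typically merge them into one OTU.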