This post will argue that including technical replicates – repeated measurements on identical samples – is worthwhile in NGS experiments, and it will guide you through one way to use them productively. To that end, we will look at a human 16S dataset to get an overall sense of the scale of variability between replicates, and then we will explore how this variability may affect common analyses and what we can do about it. This is a fairly long post, so it will be split into three parts: in the first part, we will examine measurement error in an important dataset; in the second, we will explore how this error may impact inferential analyses; and in the third, we will present a solution that partially(!) addresses measurement error in this context.
I was grateful to be invited to Braunschweig as a keynote speaker at the Thünen Symposium on Soil Metagenomics this week. It was a real privilege to have this opportunity to catch up on many frontiers in soil science and to be brought into the discussion on analysis.
However, I heard a lot of frustration from attendees (often junior researchers, often from institutions outside the EU and US) trying to navigate which of the myriad tools to use for analysing sequencing data. I can talk for hours and write hundreds of pages about what I think are the best tools for relative abundance, diversity, network estimation and pangenomic analysis, but I think basic principles are more important. This post is intended to lay out some of these principles...