This post partly continues Part 1, "Prof Noah Fierer Re: Do single species of plants harbor the same species of microorganisms?", where we looked at one of Dr. Fierer's blog posts. In summary, it highlighted that the complicating factor in soil biology is how little we understand about soil microbial communities, and that there is no universal "correct" or "best" microbial community. With this in mind, how do we get value out of DNA sequencing, or really any other biological test? I found a similar train of thought at the Soil Health Innovations Conference, best summarized by this quote:
"All soil tests are wrong, but only some are useful." -Robin Kootz
The next quote comes from a recent opinion paper by Dr. Fierer, "How Microbes Can and Cannot be Used to Assess Soil Health". I would recommend reading it at some point if you are interested in the specifics of DNA sequencing and soil microbial ecology.
There is no ‘optimal’ soil nor a universal set of ideal soil characteristics. Although certain soil indicators may not always be relevant when trying to assess soil health, including soil texture, bulk density, pH, and organic carbon concentrations, their interpretation will always be highly context-dependent.
So what tests are useful? And how do we make them useful if they are "wrong"?
The answer and overarching theme of this post is CONTEXT.
By context, I mean comparing results and looking at similarities, differences, or change. Because there is no "best" soil or microbial community, and we still understand so little, providing context is the best way to get value from a test.
In scientific practice, it is generally understood that no test will be perfectly accurate. However, as long as the test is done the same way each time, all the samples should carry the same bias. If all the results are wrong in the same way, we can still compare them to each other. This is a huge part of microbial ecology right now, because a lot of it is wrong, but that doesn't mean it isn't useful. I'll talk more about this in coming posts.
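To make the "same bias" idea concrete, here is a small Python sketch. The sample names, the values, and the 0.5-unit offset are all hypothetical, and it assumes the simplest kind of bias: a constant amount added to every reading.

```python
# Hypothetical illustration: the "true" values and the bias are made up.
true_values = {"sample_A": 6.8, "sample_B": 7.4, "sample_C": 6.1}

# Assume every measurement reads 0.5 units too high (a constant, additive bias).
BIAS = 0.5
measured = {name: value + BIAS for name, value in true_values.items()}


def ranking(values):
    """Sample names ordered from lowest to highest value."""
    return sorted(values, key=values.get)


def difference(values, a, b):
    """Difference between two samples, rounded to hide floating-point noise."""
    return round(values[a] - values[b], 6)


# Every absolute number is off by the bias...
for name in true_values:
    print(name, "true:", true_values[name], "measured:", round(measured[name], 6))

# ...but comparisons between samples come out the same.
print(ranking(true_values) == ranking(measured))  # True
print(difference(true_values, "sample_B", "sample_A"),
      difference(measured, "sample_B", "sample_A"))
```

A proportional bias (say, every reading 10% high) would also preserve the ranking, though not the raw differences. A bias that changes from sample to sample preserves neither, which is why running the test the same way each time matters.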
Example 1: Time as Context
Sometimes the easiest context to provide for a single producer or system is simply to collect data over time. In the figure below, we can see that the more "context" (time points) we have, the more meaning we can gain.
Above, we see that there was likely some kind of disturbance early on, after which the system stabilized. Perhaps the system had just started up. However, we would never see this pattern properly if the time series were not recorded for long enough or with enough regularity.
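One simple way to put a number on "disturbance, then stabilization" is to treat the later time points as a baseline and flag readings that fall far outside it. The sketch below assumes the last four readings represent the settled system and uses a two-standard-deviation cutoff; both choices, the helper name `flag_departures`, and the readings themselves are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical readings for one test at time points 1 to 8 (made-up numbers).
readings = [5.0, 6.8, 9.0, 6.0, 5.1, 5.0, 4.9, 5.0]


def flag_departures(series, baseline_window=4, threshold=2.0):
    """Return the time points (1-based) that sit far from the settled system.

    Assumes the last `baseline_window` readings represent the stabilized
    system; a reading is flagged when it lies more than `threshold` sample
    standard deviations from that baseline's mean.
    """
    if baseline_window < 2:
        raise ValueError("baseline_window must be at least 2 to estimate spread")
    if len(series) <= baseline_window:
        raise ValueError("not enough time points to separate a baseline")
    baseline = series[-baseline_window:]
    centre, spread = mean(baseline), stdev(baseline)
    return [
        time_point
        for time_point, value in enumerate(series, start=1)
        if abs(value - centre) > threshold * spread
    ]


print(flag_departures(readings))  # [2, 3, 4]
```

With only a few time points there is nothing to use as a baseline, which is why the function refuses to run on a short series. That mirrors the point about recording for long enough: a single snapshot taken at time 3 could not be recognized as unusual.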
To determine whether this test is useful, we could look at the "quality" of the vermicompost at these time points. Was there a noticeable difference in quality at time 3 versus times 1 and 8? If so, I would say this test is worth continuing. If there is no apparent difference in quality or effectiveness, then perhaps we should be looking at a different measurement. The data plotted could come from an abiotic test like pH or moisture, or from a biological test like diversity or bacterial-to-fungal ratio.
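One hedged way to ask whether a test is "worth continuing" is to check whether its readings move together with a quality rating taken at the same time points. The sketch below uses a Pearson correlation, which is a technique I am substituting for illustration rather than a method from the study; the test readings, the 1 to 10 quality scores, and the 0.5 cutoff are all invented.

```python
from math import sqrt

# Hypothetical data: one test's readings and a made-up 1-10 quality rating
# recorded at the same eight time points.
test_values = [5.0, 6.8, 9.0, 6.0, 5.1, 5.0, 4.9, 5.0]
quality_scores = [8, 5, 2, 6, 8, 8, 8, 8]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


r = pearson(test_values, quality_scores)
print(f"r = {r:.2f}")
# Hypothetical rule of thumb: |r| >= 0.5 suggests the test tracks quality.
print("worth continuing" if abs(r) >= 0.5 else "consider a different test")
```

A strong negative r is just as informative as a strong positive one here: it means the test moves away from its usual value when quality drops. With only eight time points, though, a correlation is a rough signal, not proof.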
Example 2: Peers as Context
This example better reflects what the current study is: a comparison of "peers", or a "peer group" (credit to Dr. Kristin Veum, ARS, for the term). We can justify comparing all of your samples to each other because you all have essentially the same goal: producing biologically diverse compost for agricultural use. Would it make sense to compare some vermicompost properties to agricultural soil? Sure, because vermicompost could be applied to soils. But what about comparing vermicompost to ocean water? That doesn't really make sense. My point is that you need to be careful about what you are comparing, and you need to justify the comparison.
Let's look at a similar type of example, but instead of tracking change over time, we are going to add more "peers", or producers of the same product. We are going to look at two different "tests" or variables, one on each axis. For this example, the axis values and units have no real meaning.
Let's start with vermicomposter Bob. By himself, Bob doesn't tell us much.
Bob's vermicomposting friends also have the same tests done, and they compare data.
We see that Bob and his friends have different results, but they can't figure out why. So they get vermicomposters from across the state to do the same tests, and they end up with a chart like this.
Now we start to see some clusters. Let's explore those groups of samples more.
It turns out that all of the vermicomposters in the orange circle use CFTs (continuous flow-through systems), those in the blue circle use outdoor windrows, and those in the green circle use bins. But what about Jim? Jim is new to vermicomposting and uses a CFT. His system is producing "poor quality" vermicompost, and his worms are struggling. You can start to see that, whatever these two tests are, they can now be used to predict vermicomposting style. Additionally, we might be able to say something about "quality".
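The clustering idea can be sketched with a nearest-centroid rule: average each known group's position on the two test axes, then assign a newcomer like Jim to the closest group. This is a minimal illustration, not the study's method; the coordinates for every group and for Jim are invented, and the helper names are hypothetical.

```python
from math import dist
from statistics import mean

# Hypothetical (test_1, test_2) results for producers whose style is known.
# The numbers are made up and, as in the example, have no real units.
known = {
    "CFT": [(2.0, 8.0), (2.5, 7.5), (1.8, 8.3)],
    "windrow": [(7.0, 7.5), (7.4, 8.1), (6.8, 7.9)],
    "bin": [(4.5, 2.0), (5.0, 2.5), (4.2, 1.8)],
}


def centroids(groups):
    """Average position of each group on the test axes."""
    return {
        style: tuple(mean(axis) for axis in zip(*points))
        for style, points in groups.items()
    }


def nearest_style(point, centres):
    """Return the closest group (Euclidean distance) and the distance to it."""
    style = min(centres, key=lambda s: dist(point, centres[s]))
    return style, dist(point, centres[style])


def group_spread(points, centre):
    """Largest distance from any member of a group to its centroid."""
    return max(dist(p, centre) for p in points)


centres = centroids(known)
jim = (3.5, 6.0)  # hypothetical new producer
style, distance = nearest_style(jim, centres)
spread = group_spread(known[style], centres[style])
print(f"Jim looks most like a {style} producer (distance {distance:.2f})")
if distance > spread:
    print(f"...but sits outside the {style} group's usual spread ({spread:.2f})")
```

The second check is one hedged way to say something about "quality": a producer who matches a style but sits outside that group's usual spread may have something different going on. Because `math.dist` accepts any number of coordinates, adding a third test only means adding a third number to each tuple.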
We could do the same analysis with a third test or variable and create a three-dimensional plot as well. With enough samples, you can start to generate plots that look like this.
Courtesy: https://towardsdatascience.com/dimensionality-reduction-for-data-visualization-pca-vs-tsne-vs-umap-be4aa7b1cb29
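Plots like the one credited above come from dimensionality reduction, which squeezes many tests into two or three axes that can be drawn. As a hedged sketch of the simplest such method, principal component analysis (PCA), here is a pure-Python version using power iteration. The three-test dataset is invented and the helper names are hypothetical; a real analysis would more likely use a library such as scikit-learn's `PCA` class.

```python
from math import sqrt

# Hypothetical results of three tests (columns) for six producers (rows).
samples = [
    [2.0, 8.0, 5.1],
    [2.5, 7.5, 5.3],
    [7.0, 7.5, 3.0],
    [7.4, 8.1, 2.8],
    [4.5, 2.0, 6.9],
    [5.0, 2.5, 7.2],
]


def centre_columns(rows):
    """Subtract each column's mean so PCA describes variation between samples."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[value - m for value, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def top_eigenvector(matrix, iterations=1000):
    """Leading eigenvector and eigenvalue of a symmetric matrix via power iteration."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    eigenvalue = sum(a * b for a, b in zip(vec, mat_vec(matrix, vec)))
    return vec, eigenvalue


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centred = centre_columns(rows)
    cov = covariance(centred)
    pc1, lam1 = top_eigenvector(cov)
    # Deflate: remove pc1's contribution so power iteration finds the next axis.
    size = len(cov)
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(size)]
                for i in range(size)]
    pc2, _ = top_eigenvector(deflated)
    scores = [[sum(a * b for a, b in zip(row, pc)) for pc in (pc1, pc2)]
              for row in centred]
    return scores, (pc1, pc2)


scores, components = pca_2d(samples)
for row in scores:
    print([round(v, 2) for v in row])
```

Each printed pair is a producer's position on the two new axes, ready to plot. The signs of the axes are arbitrary, so only the relative positions of the producers carry meaning, which is the same "context" idea as before.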
I hope this isn't too simple or too scary. These are some of the types of analysis we hope to do with your sequencing data. I hope you can start to see that when you begin to explore unknowns, the most powerful tool is usually the context of what you already know.