Introduction

paramedic is a R package for estimating microbial concentration. paramedic uses information from 16S count data (compositional data on all taxa) and absolute data on a subset of taxa (e.g., qPCR or flow cytometry) to estimate the absolute abundance of all taxa. The method accounts for differing taxon detection efficiencies between the two methods, and produces prediction and confidence intervals as well as point estimates of the absolute abundance of all taxa.

The author and maintainer of the paramedic package is Brian Williamson. For details on the method, check out our preprint.

Installation

Currently, the package may only be downloaded and installed from GitHub using the devtools package. Type the following commands in your R console to install devtools and paramedic as needed:

# install.packages("devtools")
# devtools::install_github("statdivlab/paramedic@main")

Quick Start

This section should serve as a quick guide to using the paramedic package – we will cover the main functions using a data example.

First, load the paramedic and rstan packages:

library("rstan")
## Loading required package: StanHeaders
## Loading required package: ggplot2
## rstan (Version 2.21.2, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
library("paramedic")
## Loading required package: Rcpp
## paramedic version 0.1.3.5: Predicting Absolute and Relative Abundance by Modeling Efficiency to Derive Intervals and Concentrations
## Package created on 2021-06-28

Next, we load in some example data:

data(example_16S_data)
data(example_qPCR_data)

example_16S_data is a matrix of q = 433 taxa observed from 16S sequencing in n = 20 samples. We do not have primers to observe the concentration of all taxa, but we can do this for q_obs = 7 taxa. The observed concentration of the 7 taxa in the 20 samples (in 16S gene copies per unit volume) is in example_qPCR_data. These data mimic data observed by the University of Washington (UW) Sexually Transmitted Infections Cooperative Research Center (STICRC).

We are now ready to estimate concentrations. We pass the data to run_paramedic, the workhorse function of the package:

## -------------------------------------------------------------
## Obtain estimates of concentration for the first seven taxa
## -------------------------------------------------------------
## this is a small number of iterations, only for illustration
## also, shows how to use control parameters for rstan::stan
mod <- paramedic::run_paramedic(W = example_16S_data[, 1:8], V = example_qPCR_data,
    n_iter = 30, n_burnin = 25, n_chains = 1, stan_seed = 4747,
    control = list(max_treedepth = 15))
## 
## SAMPLING FOR MODEL 'variable_efficiency' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 8.5e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.85 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1:          three stages of adaptation as currently configured.
## Chain 1:          Reducing each adaptation stage to 15%/75%/10% of
## Chain 1:          the given number of warmup iterations:
## Chain 1:            init_buffer = 3
## Chain 1:            adapt_window = 20
## Chain 1:            term_buffer = 2
## Chain 1: 
## Chain 1: Iteration:  1 / 30 [  3%]  (Warmup)
## Chain 1: Iteration:  3 / 30 [ 10%]  (Warmup)
## Chain 1: Iteration:  6 / 30 [ 20%]  (Warmup)
## Chain 1: Iteration:  9 / 30 [ 30%]  (Warmup)
## Chain 1: Iteration: 12 / 30 [ 40%]  (Warmup)
## Chain 1: Iteration: 15 / 30 [ 50%]  (Warmup)
## Chain 1: Iteration: 18 / 30 [ 60%]  (Warmup)
## Chain 1: Iteration: 21 / 30 [ 70%]  (Warmup)
## Chain 1: Iteration: 24 / 30 [ 80%]  (Warmup)
## Chain 1: Iteration: 26 / 30 [ 86%]  (Sampling)
## Chain 1: Iteration: 28 / 30 [ 93%]  (Sampling)
## Chain 1: Iteration: 30 / 30 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.010518 seconds (Warm-up)
## Chain 1:                0.001636 seconds (Sampling)
## Chain 1:                0.012154 seconds (Total)
## Chain 1:
## Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: The largest R-hat is 1.9, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#r-hat
mod_summ <- summary(mod$stan_fit, probs = c(0.025, 0.975))$summary
mod_samps <- extract(mod$stan_fit)

The output from run_paramedic may be manipulated as any output from rstan::stan might; however, there are certain posterior summaries that are of particular interest in this problem, namely the posterior mean estimates of concentrations and efficiencies, along with interval estimates for these quantities. The next function extracts these point and interval estimators:

## -------------------------------------------------------------
## Extract posterior estimates for the first three taxa
## -------------------------------------------------------------
summs <- paramedic::extract_posterior_summaries(stan_mod = mod_summ,
    stan_samps = mod_samps,
    taxa_of_interest = 1:3, mult_num = 1, level = 0.95,
    interval_type = "wald")

and we may finally print them out:

## -------------------------------------------------------------
## Print out the posterior mean efficiencies and concentrations
## -------------------------------------------------------------
summs$estimates
##            [,1]         [,2]         [,3]
## 1  3.332132e-01 3.689357e+02 2.106022e-01
## 2  1.298799e-01 2.284346e+02 6.373446e-02
## 3  1.555831e+00 1.455176e+02 1.036781e+00
## 4  2.461307e+01 2.225623e-03 1.729249e-02
## 5  7.215584e+01 4.040974e+01 7.283377e-01
## 6  2.593184e+02 1.042068e+00 2.322630e-02
## 7  4.024012e+01 8.902963e-03 2.728638e-02
## 8  1.487867e-03 2.225860e-03 1.564062e+00
## 9  4.014350e-01 7.589127e+03 1.876758e-03
## 10 2.479431e-04 2.225652e-03 1.943784e+00
## 11 1.200725e+02 3.636011e+02 6.481326e+00
## 12 3.687596e+02 3.899757e+01 8.026380e+00
## 13 3.305934e-04 8.645593e+03 2.069066e-05
## 14 4.439689e-02 2.225763e-03 3.342122e+00
## 15 1.652900e-04 4.451806e-03 3.650459e+00
## 16 7.306072e+00 4.452018e-03 1.588019e+00
## 17 1.821682e+02 8.459541e-02 4.385349e-01
## 18 2.479394e-04 2.225623e-03 1.193138e-06
## 19 3.305873e-04 2.225753e-03 1.112984e+00
## 20 5.347797e-01 4.451798e-03 2.236127e+00
summs$est_efficiency
##      e[1]      e[2]      e[3]      e[4]      e[5]      e[6]      e[7]   sigma_e 
## 0.1815300 0.1583407 0.1388907 0.5119174 0.2135972 0.1740900 1.7257625 3.2414754

This output shows that we have point estimates for the true concentration of each taxon within each participant, and the efficiencies of each taxon comparing br16S to qPCR.

Specifying different hierarchical models

For details on the hierarchical modelling options available in paramedic, see the vignette on hierarchical model specificaion.