Introduction

paramedic is a R package for estimating microbial concentration. paramedic uses information from 16S count data (compositional data on all taxa) and absolute data on a subset of taxa (e.g., qPCR or flow cytometry) to estimate the absolute abundance of all taxa. The method accounts for differing taxon detection efficiencies between the two methods, and produces prediction and confidence intervals as well as point estimates of the absolute abundance of all taxa.

The author and maintainer of the paramedic package is Brian Williamson. For details on the method, check out our paper.

Installation

Currently, the package may only be downloaded and installed from GitHub using the devtools package. Type the following commands in your R console to install devtools and paramedic as needed:

# install.packages("devtools")
# devtools::install_github("statdivlab/paramedic@main")

Quick Start

This section should serve as a quick guide to using the paramedic package – we will cover the main functions using a data example.

First, load the paramedic and rstan packages:

## Warning: package 'rstan' was built under R version 4.3.2
## Loading required package: StanHeaders
## Warning: package 'StanHeaders' was built under R version 4.3.2
## 
## rstan version 2.32.5 (Stan version 2.32.2)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
## change `threads_per_chain` option:
## rstan_options(threads_per_chain = 1)
## Do not specify '-march=native' in 'LOCAL_CPPFLAGS' or a Makevars file
## Loading required package: Rcpp
## paramedic version 0.1.3.5: Predicting Absolute and Relative Abundance by Modeling
## Efficiency to Derive Intervals and Concentrations
## Package created on 2023-12-26

Next, we load in some example data:

data(example_16S_data)
data(example_qPCR_data)

example_16S_data is a matrix of q = 433 taxa observed from 16S sequencing in n = 20 samples. We do not have primers to observe the concentration of all taxa, but we can do this for q_obs = 7 taxa. The observed concentration of the 7 taxa in the 20 samples (in 16S gene copies per unit volume) is in example_qPCR_data. These data mimic data observed by the University of Washington (UW) Sexually Transmitted Infections Cooperative Research Center (STICRC).

We are now ready to estimate concentrations. We pass the data to run_paramedic, the workhorse function of the package:

## -------------------------------------------------------------
## Obtain estimates of concentration for the first seven taxa
## -------------------------------------------------------------
## this is a small number of iterations, only for illustration
## also, shows how to use control parameters for rstan::stan
mod <- paramedic::run_paramedic(W = example_16S_data[, 1:8], V = example_qPCR_data,
    n_iter = 30, n_burnin = 25, n_chains = 1, stan_seed = 4747,
    control = list(max_treedepth = 15))
## 
## SAMPLING FOR MODEL 'variable_efficiency' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000154 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.54 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1:          three stages of adaptation as currently configured.
## Chain 1:          Reducing each adaptation stage to 15%/75%/10% of
## Chain 1:          the given number of warmup iterations:
## Chain 1:            init_buffer = 3
## Chain 1:            adapt_window = 20
## Chain 1:            term_buffer = 2
## Chain 1: 
## Chain 1: Iteration:  1 / 30 [  3%]  (Warmup)
## Chain 1: Iteration:  3 / 30 [ 10%]  (Warmup)
## Chain 1: Iteration:  6 / 30 [ 20%]  (Warmup)
## Chain 1: Iteration:  9 / 30 [ 30%]  (Warmup)
## Chain 1: Iteration: 12 / 30 [ 40%]  (Warmup)
## Chain 1: Iteration: 15 / 30 [ 50%]  (Warmup)
## Chain 1: Iteration: 18 / 30 [ 60%]  (Warmup)
## Chain 1: Iteration: 21 / 30 [ 70%]  (Warmup)
## Chain 1: Iteration: 24 / 30 [ 80%]  (Warmup)
## Chain 1: Iteration: 26 / 30 [ 86%]  (Sampling)
## Chain 1: Iteration: 28 / 30 [ 93%]  (Sampling)
## Chain 1: Iteration: 30 / 30 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.019 seconds (Warm-up)
## Chain 1:                0.002 seconds (Sampling)
## Chain 1:                0.021 seconds (Total)
## Chain 1:
## Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
## https://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: The largest R-hat is 1.9, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#r-hat
mod_summ <- summary(mod$stan_fit, probs = c(0.025, 0.975))$summary
mod_samps <- extract(mod$stan_fit)

The output from run_paramedic may be manipulated as any output from rstan::stan might; however, there are certain posterior summaries that are of particular interest in this problem, namely the posterior mean estimates of concentrations and efficiencies, along with interval estimates for these quantities. The next function extracts these point and interval estimators:

## -------------------------------------------------------------
## Extract posterior estimates for the first three taxa
## -------------------------------------------------------------
summs <- paramedic::extract_posterior_summaries(stan_mod = mod_summ,
    stan_samps = mod_samps,
    taxa_of_interest = 1:3, mult_num = 1, level = 0.95,
    interval_type = "wald")

and we may finally print them out:

## -------------------------------------------------------------
## Print out the posterior mean efficiencies and concentrations
## -------------------------------------------------------------
summs$estimates
##            [,1]         [,2]         [,3]
## 1  3.332132e-01 3.689357e+02 2.106022e-01
## 2  1.298799e-01 2.284346e+02 6.373446e-02
## 3  1.555831e+00 1.455176e+02 1.036781e+00
## 4  2.461307e+01 2.225623e-03 1.729249e-02
## 5  7.215584e+01 4.040974e+01 7.283377e-01
## 6  2.593184e+02 1.042068e+00 2.322630e-02
## 7  4.024012e+01 8.902963e-03 2.728638e-02
## 8  1.487867e-03 2.225860e-03 1.564062e+00
## 9  4.014350e-01 7.589127e+03 1.876758e-03
## 10 2.479431e-04 2.225652e-03 1.943784e+00
## 11 1.200725e+02 3.636011e+02 6.481326e+00
## 12 3.687596e+02 3.899757e+01 8.026380e+00
## 13 3.305934e-04 8.645593e+03 2.069066e-05
## 14 4.439689e-02 2.225763e-03 3.342122e+00
## 15 1.652900e-04 4.451806e-03 3.650459e+00
## 16 7.306072e+00 4.452018e-03 1.588019e+00
## 17 1.821682e+02 8.459541e-02 4.385349e-01
## 18 2.479394e-04 2.225623e-03 1.193138e-06
## 19 3.305873e-04 2.225753e-03 1.112984e+00
## 20 5.347797e-01 4.451798e-03 2.236127e+00
summs$est_efficiency
##      e[1]      e[2]      e[3]      e[4]      e[5]      e[6]      e[7]   sigma_e 
## 0.1815300 0.1583407 0.1388907 0.5119174 0.2135972 0.1740900 1.7257625 3.2414754

This output shows that we have point estimates for the true concentration of each taxon within each participant, and the efficiencies of each taxon comparing br16S to qPCR.

Specifying different hierarchical models

For details on the hierarchical modelling options available in paramedic, see the vignette on hierarchical model specificaion.

Running analyses in parallel

RStan supports parallelization, which we can use within paramedic to speed up computation. In the following example, we re-analyze the data referenced above, but run the models in parallel. Depending on the number of cores on your machine, this can result in significant reductions in computation time.

# parallel::detectCores() will run with multiple cores if available
# follow the steps in the RStan documentation
# (https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started#how-to-use-rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
## -------------------------------------------------------------
## Obtain estimates of concentration for the first seven taxa
## -------------------------------------------------------------
## this is a small number of iterations, only for illustration
## also, shows how to use control parameters for rstan::stan
mod <- paramedic::run_paramedic(W = example_16S_data[, 1:8], V = example_qPCR_data,
    n_iter = 30, n_burnin = 25, n_chains = 1, stan_seed = 4747,
    control = list(max_treedepth = 15))
## 
## SAMPLING FOR MODEL 'variable_efficiency' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000174 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.74 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1:          three stages of adaptation as currently configured.
## Chain 1:          Reducing each adaptation stage to 15%/75%/10% of
## Chain 1:          the given number of warmup iterations:
## Chain 1:            init_buffer = 3
## Chain 1:            adapt_window = 20
## Chain 1:            term_buffer = 2
## Chain 1: 
## Chain 1: Iteration:  1 / 30 [  3%]  (Warmup)
## Chain 1: Iteration:  3 / 30 [ 10%]  (Warmup)
## Chain 1: Iteration:  6 / 30 [ 20%]  (Warmup)
## Chain 1: Iteration:  9 / 30 [ 30%]  (Warmup)
## Chain 1: Iteration: 12 / 30 [ 40%]  (Warmup)
## Chain 1: Iteration: 15 / 30 [ 50%]  (Warmup)
## Chain 1: Iteration: 18 / 30 [ 60%]  (Warmup)
## Chain 1: Iteration: 21 / 30 [ 70%]  (Warmup)
## Chain 1: Iteration: 24 / 30 [ 80%]  (Warmup)
## Chain 1: Iteration: 26 / 30 [ 86%]  (Sampling)
## Chain 1: Iteration: 28 / 30 [ 93%]  (Sampling)
## Chain 1: Iteration: 30 / 30 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.019 seconds (Warm-up)
## Chain 1:                0.002 seconds (Sampling)
## Chain 1:                0.021 seconds (Total)
## Chain 1:
## Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
## https://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: The largest R-hat is 1.9, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#r-hat
mod_summ <- summary(mod$stan_fit, probs = c(0.025, 0.975))$summary
mod_samps <- extract(mod$stan_fit)