Run a Stan algorithm to estimate concentrations from combined absolute and relative abundance data

Estimate concentrations (and efficiencies, if applicable) by combining absolute and relative abundance data.

run_paramedic(
  W,
  V,
  X = V[, 1, drop = FALSE],
  k = 0,
  n_iter = 10500,
  n_burnin = 10000,
  n_chains = 4,
  stan_seed = 4747,
  centered = FALSE,
  inits_lst = NULL,
  sigma_beta = sqrt(50),
  sigma_Sigma = sqrt(50),
  alpha_sigma = 2,
  kappa_sigma = 1,
  alpha_phi = 0,
  beta_phi = 0,
  sigma_xi = 1,
  ...
)

Arguments

W: The relative abundance data, e.g., from broad range 16S sequencing with "universal" primers. Expects data (e.g., matrix, data.frame, tibble) with sample identifiers in the first column. Sample identifiers must be the same between W and V, and the column must have the same name in W and V.
V: The absolute abundance data, e.g., from taxon-specific absolute primers. Expects data (e.g., matrix, data.frame, tibble) with sample identifiers in the first column. Sample identifiers must be the same between W and V, and the column must have the same name in W and V.
X: The covariate data. Expects data (e.g., matrix, data.frame, tibble) with sample identifiers in the first column. Sample identifiers must be the same between W, V, and X, and the column must have the same name in W, V, and X. If X only consists of the subject identifiers, then no covariates are used.
k: the number of batches that the relative abundance data W were analyzed in. If k = 0 (the default), then batch effects are not considered.
n_iter: The total number of iterations per chain to be run by the Stan algorithm. Defaults to 10500.
n_burnin: The total number of warmups per chain to be run by the Stan algorithm. Defaults to 10000.
n_chains: The total number of chains to be run by the Stan algorithm. Defaults to 4.
stan_seed: The random number seed to initialize.
centered: If TRUE, uses a centered parameterization on the true concentrations $\mu$; if FALSE (the default), uses a non-centered parameterization.
inits_lst: An optional list of initial values of the parameters. Must be a named list; see stan.
sigma_beta: Hyperparameter specifying the prior variance on $\beta_0$. Defaults to $\sqrt{50}$.
sigma_Sigma: Hyperparameter specifying the prior variance on $\Sigma$. Defaults to $\sqrt{50}$.
alpha_sigma: Hyperparameter specifying the shape parameter of the prior distribution on $\sigma_e$. Defaults to 2.
kappa_sigma: Hyperparameter specifying the scale parameter of the prior distribution on $\sigma_e$. Defaults to 1.
alpha_phi: Hyperparameter specifying the shape parameter of the prior distribution on $\phi$. Defaults to 0; a negative binomial model can be specified if both alpha_phi and beta_phi are nonzero.
beta_phi: Hyperparameter specifying the rate parameter of the prior distribution on $\phi$. Defaults to 0; a negative binomial model can be specified if both alpha_phi and beta_phi are nonzero.
sigma_xi: Hyperparameters specifying the variance of efficiencies over batches. Only used if k is greater than zero. Defaults to 1.
...: other arguments to pass to sampling (e.g., control).

Value

An object of class stanfit.

Details

We fit a hierarchical model in Stan to the data, with goal to estimate true concentration for all taxa. There are two available hierarchical models. The two only differ in the way that the true concentration is parameterized. While these hierarchical models are mathematically identical, they are fit differently by Stan. The first hierarchical model is a "centered" model, where $$\beta_0 ~ N(0, \sqrt{50})$$ $$\log \Sigma ~ N(0, \sqrt{50})$$ $$\log \mu ~ N(\beta_0 + X\beta_1, \Sigma).$$ We call this model variable_efficiency_centered.stan. The second hierarchical model is a "noncentered" model, where $$\beta_0 ~ N(0, \sqrt{50})$$ $$\log \Sigma ~ N(0, \sqrt{50})$$ $$\log \gamma ~ N(0, I)$$ $$\log \mu = \sqrt{\Sigma} \log \gamma + \beta_0 + X\beta_1.$$ We call this model variable_efficiency.stan. In most cases, we suggest using the noncentered model. However, if the model does not converge in a reasonable amount of time using the noncentered model, consider trying the centered model.

Examples

# load the package, read in example data
library("paramedic")
data(example_16S_data)
data(example_qPCR_data)

# run paramedic (with an extremely small number of iterations, for illustration only)
# on only the first 10 taxa
mod <- run_paramedic(W = example_16S_data[, 1:10], V = example_qPCR_data,
n_iter = 30, n_burnin = 25, n_chains = 1, stan_seed = 4747)
#> 
#> SAMPLING FOR MODEL 'variable_efficiency' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 0.000125 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.25 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: WARNING: There aren't enough warmup iterations to fit the
#> Chain 1:          three stages of adaptation as currently configured.
#> Chain 1:          Reducing each adaptation stage to 15%/75%/10% of
#> Chain 1:          the given number of warmup iterations:
#> Chain 1:            init_buffer = 3
#> Chain 1:            adapt_window = 20
#> Chain 1:            term_buffer = 2
#> Chain 1: 
#> Chain 1: Iteration:  1 / 30 [  3%]  (Warmup)
#> Chain 1: Iteration:  3 / 30 [ 10%]  (Warmup)
#> Chain 1: Iteration:  6 / 30 [ 20%]  (Warmup)
#> Chain 1: Iteration:  9 / 30 [ 30%]  (Warmup)
#> Chain 1: Iteration: 12 / 30 [ 40%]  (Warmup)
#> Chain 1: Iteration: 15 / 30 [ 50%]  (Warmup)
#> Chain 1: Iteration: 18 / 30 [ 60%]  (Warmup)
#> Chain 1: Iteration: 21 / 30 [ 70%]  (Warmup)
#> Chain 1: Iteration: 24 / 30 [ 80%]  (Warmup)
#> Chain 1: Iteration: 26 / 30 [ 86%]  (Sampling)
#> Chain 1: Iteration: 28 / 30 [ 93%]  (Sampling)
#> Chain 1: Iteration: 30 / 30 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.016 seconds (Warm-up)
#> Chain 1:                0.002 seconds (Sampling)
#> Chain 1:                0.018 seconds (Total)
#> Chain 1: 
#> Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
#> https://mc-stan.org/misc/warnings.html#bfmi-low
#> Warning: Examine the pairs() plot to diagnose sampling problems
#> Warning: The largest R-hat is 2.5, indicating chains have not mixed.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#r-hat

Run a Stan algorithm to estimate concentrations from combined absolute and relative abundance data

Arguments

Value

Details

See also

Examples