R/predict_paramedic.R
predict_paramedic.Rd
Predict concentrations (and efficiencies, if applicable), absolute abundances, and relative abundances based on the posterior distributions from a previously fitted model resulting from a call to run_paramedic
.
predict_paramedic(
W,
V,
X = V[, 1, drop = FALSE],
beta_0,
beta_1 = array(0, dim = c(nrow(beta_0), ncol(X) - 1, ncol(W) - 1)),
Sigma,
sigma_e = matrix(0, nrow = nrow(Sigma), ncol = ncol(Sigma)),
phi = matrix(1, nrow = nrow(Sigma), ncol = nrow(X)),
k = 0,
n_iter = 10500,
n_burnin = 10000,
n_chains = 4,
stan_seed = 4747,
alpha_sigma = 2,
kappa_sigma = 1,
alpha_phi = 0,
beta_phi = 0,
sigma_xi = 1,
...
)
The new relative abundance data, e.g., from broad range 16S sequencing with "universal" primers. Expects data (e.g., matrix, data.frame, tibble) with sample identifiers in the first column. Sample identifiers must be the same between W and V, and the column must have the same name in W and V.
The new absolute abundance data, e.g., from taxon-specific absolute primers. Expects data (e.g., matrix, data.frame, tibble) with sample identifiers in the first column. Sample identifiers must be the same between W and V, and the column must have the same name in W and V.
The new covariate data. Expects data (e.g., matrix, data.frame, tibble) with sample identifiers in the first column. Sample identifiers must be the same between W, V, and X, and the column must have the same name in W, V, and X. If X only consists of the subject identifiers, then no covariates are used.
The posterior distribution on \(\beta_0\) (the intercept in the distribution on concentrations \(\mu\)), resulting from a call to run_paramedic
.
The posterior distribution on \(\beta_1\) (the slope in the distribution on concentrations \(\mu\); only used if X consists of more than just sample identifiers), resulting from a call to run_paramedic
.
The posterior distribution on \(\Sigma\) (the variance in the distribution on concentrations \(\mu\)).
The posterior distribution on \(\sigma_e\) (the variance of the efficiencies); only used if both alpha_sigma
and kappa_sigma
below are nonzero.
The posterior distribution on \(\phi\) (the overdispersion parameter on V); only used if both alpha_phi
and beta_phi
below are nonzero.
the number of batches that the relative abundance data W were analyzed in. If k = 0 (the default), then batch effects are not considered. (currently not used)
The total number of iterations per chain to be run by the Stan algorithm. Defaults to 10500.
The total number of warmups per chain to be run by the Stan algorithm. Defaults to 10000.
The total number of chains to be run by the Stan algorithm. Defaults to 4.
The random number seed to initialize.
Hyperparameter specifying the shape parameter of the prior distribution on \(\sigma_e\). Defaults to 2.
Hyperparameter specifying the scale parameter of the prior distribution on \(\sigma_e\). Defaults to 1.
Hyperparameter specifying the shape parameter of the prior distribution on \(\phi\). Defaults to 0; a negative binomial model can be specified if both alpha_phi
and beta_phi
are nonzero.
Hyperparameter specifying the rate parameter of the prior distribution on \(\phi\). Defaults to 0; a negative binomial model can be specified if both alpha_phi
and beta_phi
are nonzero.
Hyperparameters specifying the variance of efficiencies over batches. Only used if k
is greater than zero. Defaults to 1. (currently not used)
other arguments to pass to sampling
(e.g., control).
An object of class stanfit
.
Using the posterior distributions from a call to run_paramedic
, generate predicted values.
stan
and sampling
for specific usage of the stan
and sampling
functions; run_paramedic
for usage of the run_paramedic
function.
# load the package, read in example data
library("paramedic")
data(example_16S_data)
data(example_qPCR_data)
# generate a train/test split
set.seed(1234)
folds <- sample(1:2, size = nrow(example_16S_data), replace = TRUE)
# run paramedic (with an extremely small number of iterations, for illustration only)
# on only the first 10 taxa, and on a subset of the data
mod <- run_paramedic(W = example_16S_data[folds == 1, 1:10],
V = example_qPCR_data[folds == 1, ], n_iter = 30, n_burnin = 25,
n_chains = 1, stan_seed = 4747)
#>
#> SAMPLING FOR MODEL 'variable_efficiency' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 5.3e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.53 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: WARNING: There aren't enough warmup iterations to fit the
#> Chain 1: three stages of adaptation as currently configured.
#> Chain 1: Reducing each adaptation stage to 15%/75%/10% of
#> Chain 1: the given number of warmup iterations:
#> Chain 1: init_buffer = 3
#> Chain 1: adapt_window = 20
#> Chain 1: term_buffer = 2
#> Chain 1:
#> Chain 1: Iteration: 1 / 30 [ 3%] (Warmup)
#> Chain 1: Iteration: 3 / 30 [ 10%] (Warmup)
#> Chain 1: Iteration: 6 / 30 [ 20%] (Warmup)
#> Chain 1: Iteration: 9 / 30 [ 30%] (Warmup)
#> Chain 1: Iteration: 12 / 30 [ 40%] (Warmup)
#> Chain 1: Iteration: 15 / 30 [ 50%] (Warmup)
#> Chain 1: Iteration: 18 / 30 [ 60%] (Warmup)
#> Chain 1: Iteration: 21 / 30 [ 70%] (Warmup)
#> Chain 1: Iteration: 24 / 30 [ 80%] (Warmup)
#> Chain 1: Iteration: 26 / 30 [ 86%] (Sampling)
#> Chain 1: Iteration: 28 / 30 [ 93%] (Sampling)
#> Chain 1: Iteration: 30 / 30 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 0.009 seconds (Warm-up)
#> Chain 1: 0.002 seconds (Sampling)
#> Chain 1: 0.011 seconds (Total)
#> Chain 1:
#> Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
#> https://mc-stan.org/misc/warnings.html#bfmi-low
#> Warning: Examine the pairs() plot to diagnose sampling problems
#> Warning: The largest R-hat is 1.9, indicating chains have not mixed.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#r-hat
# generate predictions on test data
ext_mod <- rstan::extract(mod)
#> Error in (function (classes, fdef, mtable) { methods <- .findInheritedMethods(classes, fdef, mtable) if (length(methods) == 1L) return(methods[[1L]]) else if (length(methods) == 0L) { cnames <- paste0("\"", vapply(classes, as.character, ""), "\"", collapse = ", ") stop(gettextf("unable to find an inherited method for function %s for signature %s", sQuote(fdef@generic), sQuote(cnames)), domain = NA) } else stop("Internal error in finding inherited methods; didn't return a unique method", domain = NA)})(list("paramedic"), new("nonstandardGenericFunction", .Data = function (object, ...) { standardGeneric("extract")}, generic = structure("extract", package = "rstan"), package = "rstan", group = list(), valueClass = character(0), signature = "object", default = NULL, skeleton = (function (object, ...) stop(gettextf("invalid call in method dispatch to '%s' (no default method)", "extract"), domain = NA))(object, ...)), <environment>): unable to find an inherited method for function 'extract' for signature '"paramedic"'
beta0_post <- ext_mod$beta_0
#> Error in eval(expr, envir, enclos): object 'ext_mod' not found
sigma_post <- ext_mod$Sigma
#> Error in eval(expr, envir, enclos): object 'ext_mod' not found
sigmae_post <- ext_mod$sigma_e
#> Error in eval(expr, envir, enclos): object 'ext_mod' not found
pred_mod <- predict_paramedic(W = example_16S_data[folds == 2, 1:10],
V = example_qPCR_data[folds == 2, ], n_iter = 75, n_burnin = 25,
beta_0 = beta0_post, Sigma = sigma_post, sigma_e = sigmae_post,
n_chains = 1, stan_seed = 1234)
#> Error in eval(expr, envir, enclos): object 'sigma_post' not found