A fast approximation to emuFit from radEmu.

fastEmuFit(
  reference_set = "data_driven",
  reference_set_size = 50,
  reference_set_covariate = NULL,
  Y,
  X = NULL,
  formula = NULL,
  data = NULL,
  test_kj = NULL,
  cluster = NULL,
  penalize = TRUE,
  B = NULL,
  fitted_model = NULL,
  refit = TRUE,
  return_wald_p = FALSE,
  compute_cis = TRUE,
  verbose = FALSE,
  ...
)

Arguments

reference_set

The reference set to use in the identifiability constraint. This can be given as a numeric vector of column indices of the Y matrix, or as a character vector of column names of the Y matrix. If no reference set is provided, the default "data_driven" is used, and fastEmuFit will identify a reference set of typical taxa of size reference_set_size. If "data_driven_ss" or "data_driven_thin" is given, a data-driven reference set will be determined using sample splitting or Poisson thinning, respectively.
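The two named techniques, sample splitting and Poisson thinning, can be illustrated in base R. The sketch below is not fastEmu's internal implementation, and it does not show how fastEmuFit uses the resulting parts; it only demonstrates, on simulated counts, what each technique does to a count matrix. The thinning proportion of 0.5, the equal-sized split, and all object names are assumptions chosen for illustration.

```r
set.seed(1)
n <- 8
J <- 5
# Simulated n x J count matrix (hypothetical data).
Y <- matrix(rpois(n * J, lambda = 20), nrow = n, ncol = J,
            dimnames = list(paste0("sample", 1:n), paste0("taxon", 1:J)))

# Sample splitting: divide the rows (samples) into two disjoint sets.
split_rows <- sample(seq_len(n), size = n / 2)
Y_split1 <- Y[split_rows, , drop = FALSE]
Y_split2 <- Y[-split_rows, , drop = FALSE]

# Poisson thinning: divide every count binomially into two parts.
# If a count is Poisson, the two parts are independent Poisson counts.
Y_thin1 <- matrix(rbinom(length(Y), size = as.vector(Y), prob = 0.5),
                  nrow = n, ncol = J, dimnames = dimnames(Y))
Y_thin2 <- Y - Y_thin1
```

Sample splitting keeps whole samples intact but halves the sample size of each part, whereas thinning keeps every sample in both parts at reduced sequencing depth.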

reference_set_size

The size of the reference set if it is data-driven, default is set to 50. We recommend a reference set of size 30-100 for the best balance of computational efficiency and estimation precision.

reference_set_covariate

If the reference set is data-driven, the covariates with respect to which it should be chosen. By default, this will be all covariates in the model (ignoring the intercept). However, if a model includes a main covariate that will be tested and several precision variables, we recommend choosing the reference set with respect to the main covariate of interest. This argument should be a vector of numbers that correspond to column indices in the X design matrix. If you don't know which columns correspond to each covariate in your design matrix, run the function radEmu::make_design_matrix() to see the design matrix for your model.
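As a rough illustration of looking up a column index, the sketch below uses base R's stats::model.matrix() as a stand-in for radEmu::make_design_matrix(). That the two produce the same column layout is an assumption, so check the output of radEmu::make_design_matrix() for a real model. The covariates group and age are hypothetical.

```r
# Hypothetical sample data: one main covariate (group) and one
# precision variable (age).
covariates <- data.frame(
  group = factor(c("A", "A", "B", "B", "A", "B")),
  age   = c(34, 51, 29, 62, 45, 38)
)

# Build a design matrix with base R (stand-in for the radEmu helper).
X <- stats::model.matrix(~ group + age, data = covariates)
colnames(X)
# "(Intercept)" "groupB" "age"

# Column index of the main covariate, suitable as a value for
# reference_set_covariate under the assumption stated above.
main_covariate_col <- which(colnames(X) == "groupB")
```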

Y

an n x J matrix or data frame of nonnegative observations, or a phyloseq or TreeSummarizedExperiment object containing an OTU table and sample data.

X

an n x p matrix or data frame of covariates (optional; provide either X, or formula and data)

formula

a one-sided formula specifying the form of the mean model to be fit

data

an n x p data frame containing variables given in formula

test_kj

a data frame whose rows give coordinates (in category j and covariate k) of elements of B to construct hypothesis tests for. If test_kj is not provided, all elements of B save the intercept row will be tested.
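A test_kj data frame can be assembled with base R. The sketch below assumes the columns are named k (covariate, i.e. row of B) and j (category, i.e. column of B), following the convention used in radEmu's examples, and that row 1 of B is the intercept. The dimensions p and J are hypothetical.

```r
p <- 3   # hypothetical number of rows of B (intercept + 2 covariates)
J <- 12  # hypothetical number of categories (taxa)

# Test the first covariate after the intercept (k = 2) for taxa 1 to 5.
test_some <- data.frame(k = 2, j = 1:5)

# Every non-intercept element of B, written out explicitly.
test_all <- expand.grid(k = 2:p, j = seq_len(J))
```

Restricting test_kj to the parameters of scientific interest reduces the number of score tests that are run, since each row is tested separately.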

cluster

a numeric vector giving cluster membership for each row of Y to be used in computing GEE test statistics. Default is NULL, in which case rows of Y are treated as independent.
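When cluster membership is stored as labels (for example, subject identifiers for repeated measurements), it can be converted to the numeric vector described above. This is a minimal base R sketch; subject_id is a hypothetical variable whose entries are in the same order as the rows of Y.

```r
# Hypothetical subject identifiers, one per row of Y.
subject_id <- c("s03", "s01", "s03", "s02", "s01", "s02")

# Map each distinct label to an integer code; rows sharing a code
# belong to the same cluster.
cluster <- as.integer(factor(subject_id))
cluster
# 3 1 3 2 1 2
```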

penalize

logical: should Firth penalty be used in fitting model? Default is TRUE.

B

starting value of coefficient matrix (p x J). If not provided, B will be initialized as a zero matrix.

fitted_model

a fitted model produced by a call to fastEmu::fastEmuFit or radEmu::emuFit; to be provided if score tests are to be run without refitting the full unrestricted model. Default is NULL.

refit

logical: if B or fitted_model is provided, should the full model be fit (TRUE) or should the fitting step be skipped (FALSE), e.g., if score tests are to be run on an already fitted model? Default is TRUE.
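The sketch below shows one way fitted_model and refit might be combined: fit once, then run a further score test without refitting the full model. It uses only arguments from the usage section above, on simulated data, and the calls are skipped if fastEmu is not installed. Reusing fit$reference_set in the second call, the small reference_set_size, and all object names are assumptions made for illustration, not documented requirements.

```r
set.seed(2)
n <- 20
J <- 30
sample_names <- paste0("sample", 1:n)

# Hypothetical two-group design and simulated counts.
covariates <- data.frame(group = factor(rep(c("A", "B"), each = n / 2)),
                         row.names = sample_names)
Y <- matrix(rpois(n * J, lambda = 30), nrow = n, ncol = J,
            dimnames = list(sample_names, paste0("taxon", 1:J)))

if (requireNamespace("fastEmu", quietly = TRUE)) {
  # First call: fit the full model and test taxon 1 for the group effect.
  fit <- fastEmu::fastEmuFit(Y = Y, formula = ~ group, data = covariates,
                             reference_set_size = 10,
                             test_kj = data.frame(k = 2, j = 1))

  # Second call: reuse the fit and skip the fitting step, testing taxon 2.
  # Passing the earlier reference set back in is an assumption intended
  # to keep the identifiability constraint the same across calls.
  fit_more <- fastEmu::fastEmuFit(Y = Y, formula = ~ group, data = covariates,
                                  reference_set = fit$reference_set,
                                  fitted_model = fit, refit = FALSE,
                                  test_kj = data.frame(k = 2, j = 2))
}
```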

return_wald_p

logical: return p-values from Wald tests? Default is FALSE. These can only be returned if the full model is fit.

compute_cis

logical: compute and return Wald CIs? Default is TRUE. These can only be returned if the full model is fit.

verbose

provide updates as model is being fitted and score tests are run? Defaults to FALSE.

...

Additional arguments to radEmu::emuFit. See possible arguments with ?radEmu::emuFit.

Value

A list that includes all elements of an emuFit object from radEmu::emuFit(), as well as additional elements. See the documentation in ?radEmu::emuFit for a full description of the elements in an emuFit object. The emuFit object includes the matrix coef, which provides estimates for all parameters, and score statistics and p-values for all parameters that were tested.

The returned object also includes reference_set and reference_set_names, which give the categories (taxa) that were used as the reference set of "typical taxa" for the identifiability constraint, as column indices of the Y matrix and as category names, respectively.

Other elements of the list correspond to score tests: included_categories gives the set of categories used for the reduced model for each score test, and score_test_hyperparams provides the hyperparameters related to estimation under the null hypothesis for each score test. If return_null_B or return_score_components is set to TRUE, then null_B or score_components is also returned; these give, for each score test, the estimated B values under the null hypothesis and the components of the robust score test, respectively.
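The relationship between reference_set (column indices) and reference_set_names (category names) can be sketched in base R. The values below are made up for illustration and do not come from a fitted model.

```r
# Hypothetical count matrix with named taxa (contents are irrelevant here).
Y <- matrix(0, nrow = 3, ncol = 6,
            dimnames = list(NULL, paste0("taxon", 1:6)))

# Hypothetical reference set, as column indices of Y.
reference_set <- c(2, 5, 6)

# Indices to names, and names back to indices.
reference_set_names <- colnames(Y)[reference_set]
recovered_indices <- match(reference_set_names, colnames(Y))
```

Either form is accepted by the reference_set argument described above, so a reference set from one fit can be supplied to a later call by index or by name.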