A fast approximation to emuFit from radEmu.

fastEmuFit(
  reference_set = "data_driven",
  reference_set_size = 30,
  Y,
  X = NULL,
  formula = NULL,
  data = NULL,
  test_kj = NULL,
  cluster = NULL,
  penalize = TRUE,
  B = NULL,
  fitted_model = NULL,
  refit = TRUE,
  fastEmu_refit = FALSE,
  return_wald_p = FALSE,
  compute_cis = TRUE,
  run_score_tests = TRUE,
  verbose = FALSE,
  ...
)

Arguments

reference_set

The reference set to use in the identifiability constraint. A user-supplied reference set can be given as a vector of column indices of the Y matrix or as a vector of column names of the Y matrix. If no reference set is provided, the default is "data_driven", and fastEmuFit identifies a reference set of typical taxa of size reference_set_size. If "data_driven_ss" or "data_driven_thin" is given, a data-driven reference set is determined using sample splitting or Poisson thinning, respectively. The reference set can be either a single object or a list of length p, with one element for each row of the coefficient matrix B.
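A user-supplied reference set given by names and one given by indices describe the same columns of Y. The base-R sketch below uses a made-up count matrix to show that mapping, and what a per-row list of length p might look like. The matrix, the taxon names, and the choice of p = 2 are illustrative assumptions, not part of fastEmu.

```r
# Toy count matrix: 4 samples (rows) x 5 taxa (columns); values are illustrative only
Y <- matrix(c(5, 0, 3, 8, 1,
              2, 4, 0, 6, 3,
              7, 1, 2, 9, 0,
              3, 3, 5, 4, 2),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, paste0("taxon_", letters[1:5])))

# A reference set specified by column names of Y ...
ref_names <- c("taxon_a", "taxon_c", "taxon_d")
# ... refers to the same taxa as these column indices of Y
ref_idx <- match(ref_names, colnames(Y))

# Hypothetical list form: one reference set per row of B
# (p = 2 here: an intercept row and one covariate row)
p <- 2
ref_list <- list(ref_idx, c(1L, 4L))
stopifnot(length(ref_list) == p)
```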

reference_set_size

The size of the reference set when it is data-driven. Default is 30. We recommend a reference set of size 30 to 100 for the best balance of computational efficiency and estimation precision.

Y

an n x J matrix or data frame of nonnegative observations, or a phyloseq or TreeSummarizedExperiment object containing an OTU table and sample data.

X

an n x p matrix or data frame of covariates. Optional: provide either X, or both formula and data.

formula

a one-sided formula specifying the form of the mean model to be fit

data

a data frame with n rows containing the variables used in formula
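The number of rows p of B is the number of columns of the design matrix, which can differ from the number of variables in data (a factor with three levels contributes two columns). The minimal base-R sketch below, with made-up covariates, shows how a one-sided formula expands into an n x p matrix via model.matrix(); that radEmu builds X from formula and data in exactly this way is an assumption here, although it is the usual R convention.

```r
# Hypothetical sample data: n = 6 samples, one continuous and one categorical covariate
sample_data <- data.frame(
  ph = c(6.1, 6.8, 7.2, 5.9, 7.0, 6.5),
  site = factor(c("A", "A", "B", "B", "C", "C"))
)

# One-sided formula for the mean model (nothing on the left-hand side)
mean_formula <- ~ ph + site

# Design matrix: intercept, ph, and indicator columns for sites B and C (A is the baseline)
X <- model.matrix(mean_formula, sample_data)
dim(X)       # n x p, here 6 x 4
colnames(X)  # "(Intercept)" "ph" "siteB" "siteC"
```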

test_kj

a data frame whose rows give the coordinates (covariate k and category j) of the elements of B for which hypothesis tests should be constructed. If test_kj is not provided, all elements of B except those in the intercept row will be tested.
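A short base-R sketch of building test_kj, assuming the two columns are named k and j (the naming used in radEmu's examples; check ?radEmu::emuFit). The dimensions p and J are illustrative. The second data frame reproduces the documented default of testing every element outside the intercept row.

```r
# Illustrative dimensions of B
p <- 4  # rows: intercept plus three covariate columns
J <- 5  # columns: one per category (taxon)

# Test a single element: the first non-intercept covariate (k = 2) for category 3
test_one <- data.frame(k = 2, j = 3)

# Same set of tests as the default: all elements of B except the intercept row (k = 1)
test_all <- expand.grid(k = 2:p, j = 1:J)
nrow(test_all)  # (p - 1) * J = 15
```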

cluster

a numeric vector giving cluster membership for each row of Y to be used in computing GEE test statistics. Default is NULL, in which case rows of Y are treated as independent.
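For repeated samples from the same unit, cluster gives every row of Y from that unit the same value, in the same order as the rows of Y. The sketch below assumes a hypothetical longitudinal design with rows sorted by subject; if rows are not sorted this way, the vector must be built to match the actual row order.

```r
# Hypothetical design: 4 subjects, each sampled at 3 visits, so n = 12 rows of Y,
# with the rows of Y ordered subject by subject
n_subjects <- 4
n_visits <- 3
cluster <- rep(seq_len(n_subjects), each = n_visits)
cluster  # 1 1 1 2 2 2 3 3 3 4 4 4
```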

penalize

logical: should a Firth penalty be used when fitting the model? Default is TRUE.

B

starting value of the coefficient matrix (p x J). If not provided, B will be initialized as a zero matrix.
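A minimal base-R sketch of a starting value with the required shape. The zero matrix mirrors the documented default; the dimensions are illustrative, and any user-supplied start (for example, an earlier estimate) would need the same p x J shape.

```r
# Illustrative dimensions: p rows (covariates, including the intercept), J categories
p <- 3
J <- 5

# The documented default starting value: a p x J matrix of zeros
B_start <- matrix(0, nrow = p, ncol = J)

# Check the shape before passing a custom start
stopifnot(all(dim(B_start) == c(p, J)))
```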

fitted_model

a fitted model produced by a call to fastEmu::fastEmuFit or radEmu::emuFit; to be provided if score tests are to be run without refitting the full unrestricted model. Default is NULL.

refit

logical: if B or fitted_model is provided, should estimation be rerun in the radEmu estimation step? Default is TRUE.

fastEmu_refit

logical: if fitted_model was produced by a call to fastEmu::fastEmuFit, should the estimation and reference set steps be rerun? If FALSE, these steps are skipped, e.g. when score tests are to be run on an already fitted fastEmuFit model. Default is FALSE.

return_wald_p

logical: return p-values from Wald tests? Default is FALSE. These can only be returned if estimate_full_model is TRUE.

compute_cis

logical: compute and return Wald CIs? Default is TRUE. These can only be returned if estimate_full_model is TRUE.

run_score_tests

logical: perform robust score testing? Default is TRUE.

verbose

logical: provide updates as the model is being fitted and score tests are run? Default is FALSE.

...

Additional arguments passed to radEmu::emuFit. See the possible arguments with ?radEmu::emuFit.

Value

A list that includes all elements of an emuFit object returned by radEmu::emuFit(), as well as additional elements. See ?radEmu::emuFit for a full description of the elements of an emuFit object. The emuFit object includes the matrix coef, which gives estimates for all parameters, along with score statistics and p-values for every parameter that was tested. The returned object also includes reference_set and reference_set_names, which identify the categories (taxa) used as the reference set of "typical taxa" in the identifiability constraint, as column indices of the Y matrix and as category names, respectively. The remaining elements relate to the score tests: included_categories gives the set of categories used in the reduced model for each score test, and score_test_hyperparams gives the hyperparameters used for estimation under the null hypothesis for each score test. If return_null_B or return_score_components was set to TRUE, null_B or score_components is also returned, giving, for each score test, the estimated B under the null hypothesis or the components of the robust score test, respectively.