A fast approximation to emuFit from radEmu.
fastEmuFit(
reference_set = "data_driven",
reference_set_size = 50,
reference_set_covariate = NULL,
Y,
X = NULL,
formula = NULL,
data = NULL,
test_kj = NULL,
cluster = NULL,
penalize = TRUE,
B = NULL,
fitted_model = NULL,
refit = TRUE,
return_wald_p = FALSE,
compute_cis = TRUE,
verbose = FALSE,
...
)
reference_set: The reference set to use in the identifiability constraint. The user can supply a reference set either as a vector of column indices of the Y matrix or as a vector of names matching column names of the Y matrix. If no reference set is provided, this defaults to "data_driven", and fastEmuFit will identify a reference set of typical taxa of size reference_set_size. If set to "data_driven_ss" or "data_driven_thin", a data-driven reference set will be determined using sample splitting or Poisson thinning, respectively.
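The two user-supplied forms of a reference set (column names or column indices of Y) can be checked against each other before fitting. The following is a minimal base R sketch; the count matrix and the chosen taxa are made up for illustration and do not come from the package.

```r
# Toy count matrix: 6 samples (rows) by 8 taxa (columns); values are simulated.
set.seed(1)
Y <- matrix(rpois(48, lambda = 10), nrow = 6,
            dimnames = list(paste0("sample", 1:6), paste0("taxon", 1:8)))

# A reference set given by column names (hypothetical choice of taxa) ...
ref_names <- c("taxon2", "taxon5", "taxon7")

# ... and the equivalent reference set given by column indices of Y.
ref_idx <- match(ref_names, colnames(Y))

# Guard against names that are not columns of Y before passing either form on.
stopifnot(!anyNA(ref_idx))
ref_idx
```

Either ref_names or ref_idx could then be passed as reference_set.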
reference_set_size: The size of the reference set if it is data-driven. Default is 50. We recommend a reference set of size 30-100 for the best balance of computational efficiency and estimation precision.
reference_set_covariate: If the reference set is data-driven, the covariates with respect to which it should be chosen. By default, this is all covariates in the model (ignoring the intercept). However, if a model includes a main covariate that will be tested and several precision variables, we recommend choosing the reference set with respect to the main covariate of interest. This argument should be a vector of numbers that correspond to column indices of the X design matrix. If you don't know which columns correspond to each covariate in your design matrix, run radEmu::make_design_matrix() to see the design matrix for your model.
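As a rough illustration of locating a covariate's column index, the sketch below uses base R's model.matrix() as a stand-in for radEmu::make_design_matrix(). It assumes the two produce the same column layout for a simple formula, which is not confirmed here. The data frame and variable names are hypothetical.

```r
# Hypothetical sample data: one covariate of interest and one precision variable.
dat <- data.frame(
  treatment = factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt")),
  age = c(34, 51, 47, 29, 62, 40)
)

# Base R stand-in for inspecting the design matrix implied by a one-sided formula.
X <- model.matrix(~ treatment + age, data = dat)
colnames(X)

# Column index of the main covariate, a candidate value for reference_set_covariate.
main_col <- which(colnames(X) == "treatmenttrt")
main_col
```

Here the intercept occupies column 1, so the treatment indicator is the column to choose the reference set against.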
Y: an n x J matrix or data frame of nonnegative observations, or a phyloseq or TreeSummarizedExperiment object containing an OTU table and sample data.
X: an n x p matrix or data frame of covariates (optional; provide either X, or formula and data)
formula: a one-sided formula specifying the form of the mean model to be fit
data: an n x p data frame containing variables given in formula
test_kj: a data frame whose rows give coordinates (in category j and covariate k) of elements of B to construct hypothesis tests for. If test_kj is not provided, all elements of B save the intercept row will be tested.
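A small sketch of how such a data frame of coordinates might be built in base R follows. The column names k and j are an assumption taken from the wording of the description above, and the dimensions p and J are hypothetical.

```r
# Dimensions of a hypothetical problem: p = 3 design columns (intercept,
# treatment, age) and J = 8 categories.
p <- 3
J <- 8

# Test only the treatment effect (row k = 2 of B) for categories 1, 4 and 6.
test_kj <- data.frame(k = 2, j = c(1, 4, 6))

# For comparison: every non-intercept row of B for every category,
# mirroring the documented behavior when test_kj is not provided.
all_kj <- expand.grid(k = 2:p, j = 1:J)

nrow(test_kj)
nrow(all_kj)
```

Restricting test_kj to the parameters of interest keeps the number of score tests small.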
cluster: a numeric vector giving cluster membership for each row of Y to be used in computing GEE test statistics. Default is NULL, in which case rows of Y are treated as independent.
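When cluster membership is recorded as labels (for example subject identifiers), it can be converted to the numeric vector described above. This is a minimal base R sketch with made-up subject labels.

```r
# Hypothetical study: 3 subjects, each sampled twice, so rows of Y come in pairs.
subject_id <- c("A", "A", "B", "B", "C", "C")

# Convert subject labels to a numeric cluster membership vector,
# with one entry per row of Y.
cluster <- as.numeric(factor(subject_id))
cluster
```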
penalize: logical: should a Firth penalty be used when fitting the model? Default is TRUE.
B: starting value of the coefficient matrix (p x J). If not provided, B will be initialized as a zero matrix.
fitted_model: a fitted model produced by a call to fastEmu::fastEmuFit or radEmu::emuFit; to be provided if score tests are to be run without refitting the full unrestricted model. Default is NULL.
refit: logical: if B or fitted_model is provided, should the full model be fit (TRUE) or should the fitting step be skipped (FALSE), e.g., if score tests are to be run on an already fitted model. Default is TRUE.
return_wald_p: logical: return p-values from Wald tests? Default is FALSE. These can only be returned if the full model is fit (refit is TRUE).
compute_cis: logical: compute and return Wald CIs? Default is TRUE. These can only be returned if the full model is fit (refit is TRUE).
verbose: logical: provide updates as the model is being fitted and score tests are run? Default is FALSE.
...: Additional arguments to radEmu::emuFit. See possible arguments with ?radEmu::emuFit.
A list that includes all elements of an emuFit object from radEmu::emuFit(), as well as additional elements. See the documentation in ?radEmu::emuFit for a full description of the elements in an emuFit object. The emuFit object includes the matrix coef, which provides estimates for all parameters, along with score statistics and p-values for all parameters that were tested.
The returned object also includes reference_set and reference_set_names, which identify the categories (taxa) used as the reference set of "typical taxa" for the identifiability constraint, as column indices of the Y matrix and as category names, respectively. Other elements of the list correspond to score tests: included_categories gives the set of categories used for the reduced model for each score test, and score_test_hyperparams provides the hyperparameters related to estimation under the null hypothesis for each score test. If return_null_B or return_score_components was set to TRUE, then null_B or score_components will also be returned; these respectively give, for each score test, the estimated B values under the null hypothesis and the components of the robust score test.
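The sketch below shows how a call and the returned elements described above might fit together. It is based only on the usage and argument descriptions on this page and has not been checked against a particular package version. The data are simulated, the names dat, group and taxon1 to taxon60 are hypothetical, and the fit is only attempted if fastEmu is installed.

```r
# Simulated toy data (not from the package): 20 samples by 60 categories.
set.seed(42)
n <- 20
J <- 60
dat <- data.frame(group = rep(c(0, 1), each = n / 2))
Y <- matrix(rpois(n * J, lambda = 20), nrow = n,
            dimnames = list(rownames(dat), paste0("taxon", 1:J)))

# Test the group effect (row k = 2 of B) for the first category only.
# The column names k and j are assumed from the test_kj description.
test_kj <- data.frame(k = 2, j = 1)

# Only attempt the fit when fastEmu is available.
if (requireNamespace("fastEmu", quietly = TRUE)) {
  fit <- fastEmu::fastEmuFit(
    Y = Y,
    formula = ~ group,
    data = dat,
    reference_set = "data_driven",
    reference_set_size = 30,
    test_kj = test_kj
  )
  print(head(fit$coef))            # estimates, plus score results for tested parameters
  print(fit$reference_set_names)   # categories chosen as the reference set
}
```

A reference set of size 30 is used here only because the toy data have 60 categories; it sits at the lower end of the recommended 30-100 range.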