A function to choose a data-driven reference set.




The output from radEmu::emuFit when applied to data with a constraint over all taxa.


The size of the reference set if it is data-driven. The default is 50. We recommend a reference set of 30 to 100 taxa for the best balance of computational efficiency and estimation precision.


If the reference set is data-driven, the covariates relative to which it should be chosen. By default, these are all covariates in the model (excluding the intercept). However, if a model includes a main covariate that will be tested and several precision variables, we recommend choosing the reference set with respect to the main covariate of interest only. This argument should be a vector of integers giving the corresponding column indices of the design matrix X.
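
To find the column index of a covariate, it can help to inspect the column names of the design matrix. The sketch below uses only base R; the data frame, the formula, and the name covariate_idx are hypothetical and chosen for illustration.

```r
# Hypothetical data: one covariate of interest (group) and one
# precision variable (age).
dat <- data.frame(
  group = factor(c("A", "A", "B", "B", "A", "B")),
  age   = c(31, 45, 38, 52, 29, 41)
)

# Design matrix with an intercept, as built by base R's model.matrix().
X <- model.matrix(~ group + age, data = dat)
colnames(X)
# "(Intercept)" "groupB" "age"

# Column index of the covariate of interest (the intercept is column 1).
covariate_idx <- which(colnames(X) == "groupB")
covariate_idx
# 2
```

Here the reference set would be chosen relative to column 2 only, leaving the precision variable age out of the selection.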


The constraint function (by default the smoothed median).


A list with three components. The first is the reference set: the reference_set_size taxa whose estimated log fold-differences for the specified covariate(s), taken relative to the smoothed median log fold-difference over all taxa, have the smallest L2 norm. The second is a vector of length p (the number of columns in the design matrix) of differences between the constraint function evaluated over the log fold-differences of all taxa and the constraint function evaluated over the log fold-differences of the reference set. The third is the estimated B matrix, with each row shifted according to the constraint over the new reference set.
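
The selection rule described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the B matrix is simulated, the plain median stands in for the smoothed median, and the sign of the difference vector and the direction of the row shift are assumptions. All object names are hypothetical.

```r
set.seed(1)

# Hypothetical estimates: p = 3 design-matrix columns, J = 8 taxa.
p <- 3
J <- 8
B <- matrix(rnorm(p * J), nrow = p,
            dimnames = list(c("(Intercept)", "x1", "x2"),
                            paste0("taxon", seq_len(J))))

covariate_idx <- 2               # choose the reference set relative to x1
reference_set_size <- 3
constraint_fn <- stats::median   # stand-in, NOT the smoothed median

# Constraint over all taxa, one value per row of B.
constraint_all <- apply(B, 1, constraint_fn)

# Center the selected rows at their all-taxa constraint value.
centered <- B[covariate_idx, , drop = FALSE] - constraint_all[covariate_idx]

# L2 norm per taxon across the selected covariates.
taxon_norm <- sqrt(colSums(centered^2))

# Taxa with the smallest norms form the reference set.
reference_set <- order(taxon_norm)[seq_len(reference_set_size)]

# Constraint over the reference set, and its difference from the
# all-taxa constraint (sign convention assumed here).
constraint_ref <- apply(B[, reference_set, drop = FALSE], 1, constraint_fn)
constraint_diff <- constraint_all - constraint_ref

# Shift each row so the constraint over the reference set is zero
# (direction of the shift assumed here).
B_shifted <- B - constraint_ref
```

With more than one index in covariate_idx, the same code computes each taxon's norm across all selected rows, so a taxon is favored only if it is close to the all-taxa constraint for every selected covariate.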