Produces point estimates, interval estimates, and p values for an arbitrary
functional (mean, geometric mean, proportion, odds, hazard) of a
variable of class integer
, or numeric
when
regressed on an arbitrary number of covariates. Multiple Partial F-tests can
be specified using the U
function.
Usage
regress(
fnctl,
formula,
data,
intercept = TRUE,
weights = rep(1, nrow(data.frame(data))),
subset = rep(TRUE, nrow(data.frame(data))),
robustSE = TRUE,
conf.level = 0.95,
exponentiate = fnctl != "mean",
replaceZeroes,
useFdstn = TRUE,
suppress = FALSE,
na.action,
method = "qr",
qr = TRUE,
singular.ok = TRUE,
contrasts = NULL,
init = NULL,
ties = "efron",
offset,
control = list(...),
...
)
Arguments
- fnctl
a character string indicating the functional (summary measure of the distribution) for which inference is desired. Choices include
"mean"
,"geometric mean"
,"odds"
,"rate"
,"hazard"
.- formula
an object of class
formula
as might be passed tolm
,glm
, orcoxph
. Functions of variables, specified usingdummy
orpolynomial
may also be included informula
.- data
a data frame, matrix, or other data structure with matching names to those entered in
formula
.- intercept
a logical value indicating whether a intercept exists or not. Default value is
TRUE
for all functionals. Intercept may also be removed if a "-1" is present informula
. If "-1" is present informula
butintercept = TRUE
is specified, the model will fit without an intercept. Note that whenfnctl = "hazard"
, the intercept is always set toFALSE
because Cox proportional hazards regression models do not explicitly estimate an intercept.- weights
vector indicating optional weights for weighted regression.
- subset
vector indicating a subset to be used for all inference.
- robustSE
a logical indicator that standard errors (and confidence intervals) are to be computed using the Huber-White sandwich estimator. The default is TRUE.
- conf.level
a numeric scalar indicating the level of confidence to be used in computing confidence intervals. The default is 0.95.
- exponentiate
a logical indicator that the regression parameters should be exponentiated. This is by default true for all functionals except the mean.
- replaceZeroes
if not
FALSE
, this indicates a value to be used in place of zeroes when computing a geometric mean. IfTRUE
, a value equal to one-half the lowest nonzero value is used. If a numeric value is supplied, that value is used. Defaults toTRUE
whenfnctl = "geometric mean"
. This parameter is alwaysFALSE
for all other values offnctl
.- useFdstn
a logical indicator that the F distribution should be used for test statistics instead of the chi squared distribution even in logistic regression models. When using the F distribution, the degrees of freedom are taken to be the sample size minus the number of parameters, as it would be in a linear regression model.
- suppress
if
TRUE
, and a model which requires exponentiation (for instance, regression on the geometric mean) is computed, then a table with only the exponentiated coefficients and confidence interval is returned. Otherwise, two tables are returned - one with the original unexponentiated coefficients, and one with the exponentiated coefficients.- na.action, qr, singular.ok, offset, contrasts, control
optional arguments that are passed to the functionality of
lm
orglm
.- method
the method to be used in fitting the model. The default value for
fnctl = "mean"
andfnctl = "geometric mean"
is"qr"
, and the default value forfnctl = "odds"
andfnctl = "rate"
is"glm.fit"
. This argument is passed into the lm() or glm() function, respectively. You may optionally specifymethod = "model.frame"
, which returns the model frame and does no fitting.- init
a numeric vector of initial values for the regression parameters for the hazard regression. Default initial value is zero for all variables.
- ties
a character string describing method for breaking ties in hazard regression. Only
efron
,breslow
, orexact
is accepted. See more details in the documentation for this argument in the survival::coxph function. Default toefron
.- ...
additional arguments to be passed to the
lm
function call
Value
An object of class uRegress is returned. Parameter estimates, confidence intervals, and p values are contained in a matrix $augCoefficients.
Details
Regression models include linear regression (for the ``mean'' functional), logistic regression with logit link (for the ``odds'' functional), Poisson regression with log link (for the ``rate'' functional), linear regression of a log-transformed outcome (for the ``geometric mean'' functional), and Cox proportional hazards regression (for the hazard functional).
Currently, for the hazard functional, only `coxph` syntax is supported; in other words, using `dummy`, `polynomial`,
and U
functions will result in an error when `fnctl = hazard`.
Note that the only possible link function in `regress` with `fnctl = odds"` is the logit link. Similarly, the only possible link function in `regress` with `fnctl = "rate"` is the log link.
Objects created using the
U
function can also be passed in. If the
U
call involves a partial formula of the form
~ var1 + var2
, then regress
will return a multiple-partial
F-test involving var1
and var2
. If an F-statistic will already be
calculated regardless of the U
specification,
then any naming convention specified via name ~ var1
will be ignored.
The multiple partial tests must be the last terms specified in the model (i.e. no other predictors can
follow them).
Examples
# Loading dataset
data(mri)
# Linear regression of atrophy on age
regress("mean", atrophy ~ age, data = mri)
#>
#> Call:
#> regress(fnctl = "mean", formula = atrophy ~ age, data = mri)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -36.870 -8.589 -0.870 7.666 51.203
#>
#> Coefficients:
#> Estimate Naive SE Robust SE 95%L 95%H
#> [1] Intercept -16.06 6.256 6.701 -29.22 -2.907
#> [2] age 0.6980 0.08368 0.09002 0.5213 0.8747
#> F stat df Pr(>F)
#> [1] Intercept 5.75 1 0.0168
#> [2] age 60.12 1 < 0.00005
#>
#> Residual standard error: 12.36 on 733 degrees of freedom
#> Multiple R-squared: 0.08669, Adjusted R-squared: 0.08545
#> F-statistic: 60.12 on 1 and 733 DF, p-value: 2.988e-14
#>
# Linear regression of atrophy on sex and height and their interaction,
# with a multiple-partial F-test on the height-sex interaction
regress("mean", atrophy ~ height + sex + U(hs=~height:sex), data = mri)
#>
#> Call:
#> regress(fnctl = "mean", formula = atrophy ~ height + sex + U(hs = ~height:sex),
#> data = mri)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -28.092 -8.799 -0.935 8.100 48.862
#>
#> Coefficients:
#> Estimate Naive SE Robust SE 95%L 95%H
#> [1] Intercept 47.60 16.28 14.75 18.64 76.57
#> [2] height -0.09272 0.1026 0.09296 -0.2752 0.08978
#> [3] sexMale 36.80 23.93 23.60 -9.543 83.14
#> [4] height:sexMale -0.1690 0.1442 0.1411 -0.4460 0.1079
#> F stat df Pr(>F)
#> [1] Intercept 10.41 1 0.0013
#> [2] height 0.99 1 0.3189
#> [3] sexMale 2.43 1 0.1194
#> [4] height:sexMale 1.44 1 0.2312
#>
#> Residual standard error: 12.51 on 731 degrees of freedom
#> Multiple R-squared: 0.06687, Adjusted R-squared: 0.06304
#> F-statistic: 16.9 on 3 and 731 DF, p-value: 1.274e-10
#>
# Logistic regression of sex on atrophy
mri$sex_bin <- ifelse(mri$sex == "Female", 1, 0)
regress("odds", sex_bin ~ atrophy, data = mri)
#>
#> Call:
#> regress(fnctl = "odds", formula = sex_bin ~ atrophy, data = mri)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.636 -1.129 0.741 1.110 1.985
#>
#> Coefficients:
#>
#> Raw Model:
#> Estimate Naive SE Robust SE F stat df Pr(>F)
#> [1] Intercept 1.430 0.2372 0.2421 34.87 1 < 0.00005
#> [2] atrophy -0.03962 6.296e-03 6.504e-03 37.12 1 < 0.00005
#>
#> Transformed Model:
#> e(Est) e(95%L) e(95%H) F stat df Pr(>F)
#> [1] Intercept 4.177 2.597 6.719 34.87 1 < 0.00005
#> [2] atrophy 0.9612 0.9490 0.9735 37.12 1 < 0.00005
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 1018.91 on 734 degrees of freedom
#> Residual deviance: 975.38 on 733 degrees of freedom
#> AIC: 979.38
#>
#> Number of Fisher Scoring iterations: 4
#>
# Cox regression of age on survival
library(survival)
regress("hazard", Surv(obstime, death)~age, data=mri)
#>
#> Call:
#> regress(fnctl = "hazard", formula = Surv(obstime, death) ~ age,
#> data = mri)
#>
#>
#> Coefficients:
#>
#> Raw Model:
#> Estimate Naive SE Robust SE F stat df Pr(>F)
#> [1] age 0.06795 0.01354 0.01412 23.17 1 < 0.00005
#>
#> Transformed Model:
#> e(Est) e(95%L) e(95%H) F stat df Pr(>F)
#> [1] age 1.070 1.041 1.100 23.17 1 < 0.00005
#> n = 735, number of events= 133
#> Overall significance test:
#> Likelihood ratio test= 22.33 on 1 df, p=2.292e-06
#> Wald test = 25.17 on 1 df, p=5.24e-07
#> Score (logrank) test = 25.44 on 1 df, p=4.554e-07
#>
#>