
Produces a table of relevant descriptive statistics for an arbitrary number of variables of class integer, numeric, Surv, Date, or factor. Descriptive statistics can be obtained within strata, and the user can specify that only a subset of the data be used. Descriptive statistics include the count of observations, the count of cases with missing values, the mean, standard deviation, geometric mean, minimum, and maximum. The user can also request estimates of arbitrary quantiles and of the proportions of observations falling within specified ranges.

Usage

descrip(
  ...,
  strata = NULL,
  subset = NULL,
  probs = c(0.25, 0.5, 0.75),
  geomInclude = FALSE,
  replaceZeroes = FALSE,
  restriction = Inf,
  above = NULL,
  below = NULL,
  labove = NULL,
  rbelow = NULL,
  lbetween = NULL,
  rbetween = NULL,
  interval = NULL,
  linterval = NULL,
  rinterval = NULL,
  lrinterval = NULL
)

Arguments

...

an arbitrary number of variables for which descriptive statistics are desired. The arguments can be vectors, matrices, or lists. Individual columns of a matrix or elements of a list may be of class numeric, factor, Surv, or Date. Factor variables are converted to integers. Character vectors will be coerced to numeric. Variables may be of different lengths, unless strata or subset are non-NULL. A single data.frame or tibble may also be entered, in which case each variable in the object will be described.

strata

a vector, matrix, or list of stratification variables. Descriptive statistics will be computed within strata defined by each unique combination of the stratification variables, as well as in the combined sample. If strata is supplied, all variables must have the same length as the stratification variables.
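Conceptually, stratified description computes each statistic once in the combined sample and once per observed combination of the stratification variables. The following is a minimal base-R sketch of that idea, assuming the mean is the only statistic and that strata is a vector or a list of vectors; `strata_means_sketch()` is a hypothetical helper, not a function in this package.

```r
# Hypothetical helper (not part of this package): the mean of x in the
# combined sample and within each observed combination of stratification
# variables (a vector, or a list of vectors).
strata_means_sketch <- function(x, strata) {
  groups <- if (is.list(strata)) {
    interaction(strata, drop = TRUE)   # one level per observed combination
  } else {
    factor(strata)
  }
  within <- sapply(split(x, groups), mean, na.rm = TRUE)
  c(All = mean(x, na.rm = TRUE), within)
}

x <- c(2, 4, 6, 8)
sex <- c("F", "F", "M", "M")
strata_means_sketch(x, sex)   # All = 5, F = 3, M = 7
```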

subset

a vector indicating a subset to be used for all descriptive statistics. If subset is supplied, all variables must have the same length as subset.

probs

a vector of probabilities between 0 and 1 indicating quantile estimates to be included in the descriptive statistics. Default is to compute 25th, 50th (median) and 75th percentiles.

geomInclude

if not FALSE (the default), includes the geometric mean in the descriptive statistics.

replaceZeroes

if not FALSE (the default), this indicates a value to be used in place of zeroes when computing a geometric mean. If TRUE, a value equal to one-half the lowest nonzero value is used. If a numeric value is supplied, that value is used for all variables.
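The geometric mean is the exponential of the mean of the logged values, so a single zero makes it undefined. A minimal sketch of the documented replaceZeroes rule follows; `geom_mean_sketch()` is a hypothetical helper, and it only mirrors the behavior described here rather than the package's implementation.

```r
# Hypothetical helper (not part of this package) sketching the documented
# replaceZeroes rule for a geometric mean of the nonmissing values.
geom_mean_sketch <- function(x, replaceZeroes = FALSE) {
  x <- x[!is.na(x)]
  if (isTRUE(replaceZeroes)) {
    # TRUE: substitute one-half of the lowest nonzero value
    x[x == 0] <- min(x[x != 0]) / 2
  } else if (is.numeric(replaceZeroes)) {
    # numeric: substitute the supplied value
    x[x == 0] <- replaceZeroes
  }
  # any remaining nonpositive value leaves the geometric mean undefined
  if (any(x <= 0)) return(NA_real_)
  exp(mean(log(x)))
}

geom_mean_sketch(c(0, 2, 8))                        # NA
geom_mean_sketch(c(0, 2, 8), replaceZeroes = TRUE)  # zero becomes 1; (1 * 2 * 8)^(1/3)
```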

restriction

a value used for computing restricted means, standard deviations, and geometric means with censored time-to-event data. The default value of Inf places the restriction at the highest observation. Note that the same value is used for all variables of class Surv.
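A restricted mean is the area under the Kaplan-Meier survival curve up to the restriction time. The sketch below computes that area with survival::survfit, assuming integration starts at time 0 and that restriction = Inf means "restrict at the highest observation"; `restricted_mean_sketch()` is a hypothetical helper and is not the package's own code.

```r
library(survival)

# Hypothetical helper (not part of this package): area under the
# Kaplan-Meier curve from time 0 up to tau, i.e. a restricted mean.
restricted_mean_sketch <- function(time, status, tau = Inf) {
  fit <- survfit(Surv(time, status) ~ 1)
  # tau = Inf is treated as "restrict at the highest observation"
  if (is.infinite(tau)) tau <- max(time)
  # S(t) = 1 before the first observed time, then steps to fit$surv
  t_grid <- c(0, fit$time[fit$time < tau], tau)
  s_grid <- c(1, fit$surv[fit$time < tau])
  sum(s_grid * diff(t_grid))
}

# With no censoring and tau at the largest time, this equals the sample mean
restricted_mean_sketch(1:4, rep(1, 4))   # 2.5
```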

above

a vector of values used to dichotomize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values greater than each element of above.

below

a vector of values used to dichotomize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values less than each element of below.

labove

a vector of values used to dichotomize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values greater than or equal to each element of labove.

rbelow

a vector of values used to dichotomize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values less than or equal to each element of rbelow.
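The four one-sided thresholds differ only in the comparison used: above is `>`, below is `<`, labove is `>=`, and rbelow is `<=`. A minimal sketch, assuming the proportions use the nonmissing observations as the denominator; `threshold_props_sketch()` is a hypothetical helper.

```r
# Hypothetical helper (not part of this package): proportion of nonmissing
# values satisfying `op` against each cutpoint.
# op: ">" (above), "<" (below), ">=" (labove), "<=" (rbelow)
threshold_props_sketch <- function(x, cut, op = c(">", "<", ">=", "<=")) {
  op <- match.arg(op)
  x <- x[!is.na(x)]
  cmp <- match.fun(op)
  vapply(cut, function(k) mean(cmp(x, k)), numeric(1))
}

x <- c(1, 2, 3, 4, NA)
threshold_props_sketch(x, c(2, 3), ">")    # 0.50 0.25  (above)
threshold_props_sketch(x, c(2, 3), ">=")   # 0.75 0.50  (labove)
```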

lbetween

a vector of values that, after -Inf and Inf are appended, serves as cutpoints to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between successive elements of lbetween, with the left-hand endpoint included in each interval.

rbetween

a vector of values that, after -Inf and Inf are appended, serves as cutpoints to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between successive elements of rbetween, with the right-hand endpoint included in each interval.
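In base R, the lbetween/rbetween categorization corresponds to cut() with breaks `c(-Inf, cutpoints, Inf)`: `right = FALSE` gives left-closed intervals [a, b) and `right = TRUE` gives right-closed intervals (a, b]. A minimal sketch, assuming proportions are taken over nonmissing observations; `between_props_sketch()` is a hypothetical helper.

```r
# Hypothetical helper (not part of this package): proportion of nonmissing
# values in each interval formed by c(-Inf, cutpoints, Inf).
between_props_sketch <- function(x, cutpoints, left_closed = TRUE) {
  x <- x[!is.na(x)]
  # right = FALSE -> [a, b) intervals (lbetween)
  # right = TRUE  -> (a, b] intervals (rbetween)
  bins <- cut(x, breaks = c(-Inf, cutpoints, Inf), right = !left_closed)
  prop.table(table(bins))
}

x <- c(1, 2, 3, 4, 5)
between_props_sketch(x, c(2, 4), left_closed = TRUE)    # 0.2 0.4 0.4
between_props_sketch(x, c(2, 4), left_closed = FALSE)   # 0.4 0.4 0.2
```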

interval

a two-column matrix of values in which each row defines an interval of interest used to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between the two elements of each row, with neither endpoint included in the interval.

linterval

a two-column matrix of values in which each row defines an interval of interest used to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between the two elements of each row, with the left-hand endpoint included in the interval.

rinterval

a two-column matrix of values in which each row defines an interval of interest used to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between the two elements of each row, with the right-hand endpoint included in the interval.

lrinterval

a two-column matrix of values in which each row defines an interval of interest used to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between the two elements of each row, with both endpoints included in the interval.
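The four interval matrices differ only in which endpoints count as inside: interval (neither), linterval (left), rinterval (right), and lrinterval (both). A minimal sketch, assuming each row is (lower, upper) and proportions use nonmissing observations; `interval_props_sketch()` and its `closed` argument are hypothetical.

```r
# Hypothetical helper (not part of this package): proportion of nonmissing
# values inside each row (lower, upper) of a two-column interval matrix.
# closed: "none" (interval), "left" (linterval),
#         "right" (rinterval), "both" (lrinterval)
interval_props_sketch <- function(x, intervals,
                                  closed = c("none", "left", "right", "both")) {
  closed <- match.arg(closed)
  x <- x[!is.na(x)]
  intervals <- matrix(intervals, ncol = 2)
  apply(intervals, 1, function(r) {
    lo_ok <- if (closed %in% c("left", "both")) x >= r[1] else x > r[1]
    hi_ok <- if (closed %in% c("right", "both")) x <= r[2] else x < r[2]
    mean(lo_ok & hi_ok)
  })
}

iv <- rbind(c(1, 3), c(2, 5))
interval_props_sketch(1:5, iv, "none")   # 0.2 0.4
interval_props_sketch(1:5, iv, "both")   # 0.6 0.8
```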

Value

An object of class uDescriptives is returned. Descriptive statistics for each variable in the entire subsetted sample, as well as within each stratum if any is defined, are contained in a matrix with rows corresponding to variables and strata and columns corresponding to the descriptive statistics. Descriptive statistics include

  • N: the number of observations.

  • Msng: the number of observations with missing values.

  • Mean: the mean of the nonmissing observations (this is potentially a restricted mean for right-censored time-to-event data).

  • Std Dev: the standard deviation of the nonmissing observations (this is potentially a restricted standard deviation for right-censored time-to-event data).

  • Geom Mn: the geometric mean of the nonmissing observations (this is potentially a restricted geometric mean for right-censored time-to-event data). Nonpositive values in the variable will generate NA, unless replaceZeroes was specified.

  • Min: the minimum value of the nonmissing observations (this is potentially restricted for right-censored time-to-event data).

  • Quantiles: columns corresponding to the quantiles specified by probs (these are potentially restricted for right-censored time-to-event data).

  • Max: the maximum value of the nonmissing observations (this is potentially restricted for right-censored time-to-event data).

  • Proportions: columns corresponding to the proportions as specified by above, below, labove, rbelow, lbetween, rbetween, interval, linterval, rinterval, and lrinterval.

  • restriction: the threshold for restricted means, standard deviations, and geometric means.

  • firstEvent: the time of the first event for censored time-to-event variables.

  • lastEvent: the time of the last event for censored time-to-event variables.

  • isDate: an indicator that the variable is a Date object.
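For an uncensored numeric variable, the core entries of a row can be reproduced with base R. The sketch below assumes N counts all observations (missing included) and the remaining statistics use only the nonmissing values, as described above; `describe_row_sketch()` is a hypothetical helper and omits Geom Mn, proportions, and the Surv/Date columns.

```r
# Hypothetical helper (not part of this package): N, Msng, Mean, Std Dev,
# Min, quantiles, and Max for one uncensored numeric variable.
describe_row_sketch <- function(x, probs = c(0.25, 0.5, 0.75)) {
  ok <- x[!is.na(x)]
  c(N = length(x),
    Msng = sum(is.na(x)),
    Mean = mean(ok),
    `Std Dev` = sd(ok),
    Min = min(ok),
    quantile(ok, probs = probs),   # named "25%", "50%", "75%"
    Max = max(ok))
}

describe_row_sketch(c(1, 2, 3, 4, NA))
```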

Details

This function depends on the survival R package. If survival is not already loaded, execute library(survival), installing the package first with install.packages("survival") if necessary. Quantiles for uncensored data are computed using the default method in quantile(). For variables of class factor, descriptive statistics will be computed using the integer coding for factors. For variables of class Surv, estimated proportions and quantiles will be computed from Kaplan-Meier estimates, as will restricted means, restricted standard deviations, and restricted geometric means. For variables of class Date, estimated proportions will be labeled using the Julian date, that is, the number of days since January 1, 1970.
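The conversions mentioned above are standard base-R behaviors; this short illustration shows them directly and does not call into this package.

```r
# Factors are summarized through their integer codes, in level order
f <- factor(c("low", "high", "high", "medium"),
            levels = c("low", "medium", "high"))
as.integer(f)         # 1 3 3 2
mean(as.integer(f))   # 2.25

# Dates are stored as days since 1970-01-01
d <- as.Date(c("1970-01-01", "1970-01-11"))
as.numeric(d)         # 0 10
julian(d)             # same day counts, with an "origin" attribute

# quantile() uses type = 7 unless told otherwise
x <- c(3, 1, 4, 1, 5)
identical(quantile(x), quantile(x, type = 7))   # TRUE
```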

Examples


# Read in the data
data(mri) 

# Create the table 
descrip(mri)
#>             N     Msng  Mean       Std Dev    Min        25%        Mdn      
#>     ptid:     735     0   368.0      212.3     1.000      184.5      368.0   
#>  mridate:     735     0 1992-05-09   111.9   1991-10-19 1992-01-10 1992-07-05
#>      age:     735     0   74.57      5.451     65.00      71.00      74.00   
#>      sex:     735     0   1.498      0.5003    1.000      1.000      1.000   
#>     race:     735     0   3.509      0.9580    1.000      4.000      4.000   
#>   weight:     735     0   159.9      30.74     74.00      138.5      158.0   
#>   height:     735     0   165.8      9.710     139.0      158.0      165.9   
#>  packyrs:     735     1   19.60      27.11     0.0000     0.0000     6.500   
#>  yrsquit:     735     0   9.661      14.10     0.0000     0.0000     0.0000  
#>    alcoh:     735     0   2.109      4.852     0.0000     0.0000    0.01920  
#>  physact:     735     0   1.922      2.052     0.0000     0.5538     1.312   
#>      chf:     735     0  0.05578     0.2297    0.0000     0.0000     0.0000  
#>      chd:     735     0   0.3347     0.6862    0.0000     0.0000     0.0000  
#>   stroke:     735     0   0.2367     0.6207    0.0000     0.0000     0.0000  
#> diabetes:     735     0   0.1075     0.3099    0.0000     0.0000     0.0000  
#>  genhlth:     735     0   2.588      0.9382    1.000      2.000      3.000   
#>      ldl:     735    10   125.8      33.60     11.00      102.0      125.0   
#>      alb:     735     2   3.994      0.2690    3.200      3.800      4.000   
#>      crt:     735     2   1.064      0.3030    0.5000     0.9000     1.000   
#>      plt:     735     7   246.0      65.80     92.00      201.8      239.0   
#>      sbp:     735     0   131.1      19.66     78.00      118.0      130.0   
#>      aai:     735     9   1.103      0.1828    0.3171     1.027      1.112   
#>      fev:     735    10   2.207      0.6875    0.4083     1.745      2.158   
#>     dsst:     735    12   41.06      12.71     0.0000     32.00      40.00   
#>  atrophy:     735     0   35.98      12.92     5.000      27.00      35.00   
#>    whgrd:     735     1   2.007      1.410     0.0000     1.000      2.000   
#>   numinf:     735     0   0.6109     0.9895    0.0000     0.0000     0.0000  
#>   volinf:     735     1   3.223      17.36     0.0000     0.0000     0.0000  
#>  obstime:     735     0    1804      392.3     68.00       1837       1879   
#>    death:     735     0   0.1810     0.3852    0.0000     0.0000     0.0000  
#>              75%        Max      
#>     ptid:     551.5      735.0   
#>  mridate:   1992-08-12 1992-10-12
#>      age:     78.00      99.00   
#>      sex:     2.000      2.000   
#>     race:     4.000      4.000   
#>   weight:     179.0      264.0   
#>   height:     173.2      190.5   
#>  packyrs:     33.75      240.0   
#>  yrsquit:     18.50      56.00   
#>    alcoh:     1.144      35.00   
#>  physact:     2.513      13.81   
#>      chf:     0.0000     1.000   
#>      chd:     0.0000     2.000   
#>   stroke:     0.0000     2.000   
#> diabetes:     0.0000     1.000   
#>  genhlth:     3.000      5.000   
#>      ldl:     147.0      247.0   
#>      alb:     4.200      5.000   
#>      crt:     1.200      4.000   
#>      plt:     285.0      539.0   
#>      sbp:     142.0      210.0   
#>      aai:     1.207      1.728   
#>      fev:     2.649      4.471   
#>     dsst:     50.00      82.00   
#>  atrophy:     44.00      84.00   
#>    whgrd:     3.000      9.000   
#>   numinf:     1.000      5.000   
#>   volinf:    0.09420     197.0   
#>  obstime:      2044       2159   
#>    death:     0.0000     1.000