Title: | Analysis of Simulation Studies Including Monte Carlo Error |
---|---|
Description: | Summarise results from simulation studies and compute Monte Carlo standard errors of commonly used summary statistics. This package is modelled on the 'simsum' user-written command in 'Stata' (White I.R., 2010 <https://www.stata-journal.com/article.html?article=st0200>), further extending it with additional performance measures and functionality. |
Authors: | Alessandro Gasparini [aut, cre] , Ian R. White [aut] |
Maintainer: | Alessandro Gasparini <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.13.0.9000 |
Built: | 2024-11-12 03:56:59 UTC |
Source: | https://github.com/ellessenne/rsimsum |
autoplot
can produce a series of plot to summarise results of simulation studies. See vignette("C-plotting", package = "rsimsum")
for further details.
## S3 method for class 'multisimsum' autoplot( object, par, type = "forest", stats = "nsim", target = NULL, fitted = TRUE, scales = "fixed", top = TRUE, density.legend = TRUE, zoom = 1, zip_ci_colours = "yellow", ... )
## S3 method for class 'multisimsum' autoplot( object, par, type = "forest", stats = "nsim", target = NULL, fitted = TRUE, scales = "fixed", top = TRUE, density.legend = TRUE, zoom = 1, zip_ci_colours = "yellow", ... )
object |
An object of class |
par |
The parameter results to plot. |
type |
The type of the plot to be produced. Possible choices are: |
stats |
Summary statistic to plot, defaults to |
target |
Target of summary statistic, e.g. 0 for |
fitted |
Superimpose a fitted regression line, useful when |
scales |
Should scales be fixed ( |
top |
Should the legend for a nested loop plot be on the top side of the plot? Defaults to |
density.legend |
Should the legend for density and hexbin plots be included? Defaults to |
zoom |
A numeric value between 0 and 1 signalling that a zip plot should zoom on the top x% of the plot (to ease interpretation). Defaults to 1, where the whole zip plot is displayed. |
zip_ci_colours |
A string with (1) a hex code to use for plotting coverage probability and its Monte Carlo confidence intervals (the default, with value |
... |
Not used. |
A ggplot
object.
data("frailty", package = "rsimsum") ms <- multisimsum( data = frailty, par = "par", true = c(trt = -0.50, fv = 0.75), estvarname = "b", se = "se", methodvar = "model", by = "fv_dist", x = TRUE ) library(ggplot2) autoplot(ms, par = "trt") autoplot(ms, par = "trt", type = "lolly", stats = "cover") autoplot(ms, par = "trt", type = "zip") autoplot(ms, par = "trt", type = "est_ba")
data("frailty", package = "rsimsum") ms <- multisimsum( data = frailty, par = "par", true = c(trt = -0.50, fv = 0.75), estvarname = "b", se = "se", methodvar = "model", by = "fv_dist", x = TRUE ) library(ggplot2) autoplot(ms, par = "trt") autoplot(ms, par = "trt", type = "lolly", stats = "cover") autoplot(ms, par = "trt", type = "zip") autoplot(ms, par = "trt", type = "est_ba")
autoplot
can produce a series of plot to summarise results of simulation studies. See vignette("C-plotting", package = "rsimsum")
for further details.
## S3 method for class 'simsum' autoplot( object, type = "forest", stats = "nsim", target = NULL, fitted = TRUE, scales = "fixed", top = TRUE, density.legend = TRUE, zoom = 1, zip_ci_colours = "yellow", ... )
## S3 method for class 'simsum' autoplot( object, type = "forest", stats = "nsim", target = NULL, fitted = TRUE, scales = "fixed", top = TRUE, density.legend = TRUE, zoom = 1, zip_ci_colours = "yellow", ... )
object |
An object of class |
type |
The type of the plot to be produced. Possible choices are: |
stats |
Summary statistic to plot, defaults to |
target |
Target of summary statistic, e.g. 0 for |
fitted |
Superimpose a fitted regression line, useful when |
scales |
Should scales be fixed ( |
top |
Should the legend for a nested loop plot be on the top side of the plot? Defaults to |
density.legend |
Should the legend for density and hexbin plots be included? Defaults to |
zoom |
A numeric value between 0 and 1 signalling that a zip plot should zoom on the top x% of the plot (to ease interpretation). Defaults to 1, where the whole zip plot is displayed. |
zip_ci_colours |
A string with (1) a hex code to use for plotting coverage probability and its Monte Carlo confidence intervals (the default, with value |
... |
Not used. |
A ggplot
object.
data("MIsim", package = "rsimsum") s <- rsimsum::simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method", x = TRUE ) library(ggplot2) autoplot(s) autoplot(s, type = "lolly") autoplot(s, type = "est_hex") autoplot(s, type = "zip", zoom = 0.5) # Nested loop plot: data("nlp", package = "rsimsum") s1 <- rsimsum::simsum( data = nlp, estvarname = "b", true = 0, se = "se", methodvar = "model", by = c("baseline", "ss", "esigma") ) autoplot(s1, stats = "bias", type = "nlp")
data("MIsim", package = "rsimsum") s <- rsimsum::simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method", x = TRUE ) library(ggplot2) autoplot(s) autoplot(s, type = "lolly") autoplot(s, type = "est_hex") autoplot(s, type = "zip", zoom = 0.5) # Nested loop plot: data("nlp", package = "rsimsum") s1 <- rsimsum::simsum( data = nlp, estvarname = "b", true = 0, se = "se", methodvar = "model", by = c("baseline", "ss", "esigma") ) autoplot(s1, stats = "bias", type = "nlp")
autoplot method for summary.multisimsum objects
## S3 method for class 'summary.multisimsum' autoplot( object, par, type = "forest", stats = "nsim", target = NULL, fitted = TRUE, scales = "fixed", top = TRUE, density.legend = TRUE, zoom = 1, zip_ci_colours = "yellow", ... )
## S3 method for class 'summary.multisimsum' autoplot( object, par, type = "forest", stats = "nsim", target = NULL, fitted = TRUE, scales = "fixed", top = TRUE, density.legend = TRUE, zoom = 1, zip_ci_colours = "yellow", ... )
object |
An object of class |
par |
The parameter results to plot. |
type |
The type of the plot to be produced. Possible choices are: |
stats |
Summary statistic to plot, defaults to |
target |
Target of summary statistic, e.g. 0 for |
fitted |
Superimpose a fitted regression line, useful when |
scales |
Should scales be fixed ( |
top |
Should the legend for a nested loop plot be on the top side of the plot? Defaults to |
density.legend |
Should the legend for density and hexbin plots be included? Defaults to |
zoom |
A numeric value between 0 and 1 signalling that a zip plot should zoom on the top x% of the plot (to ease interpretation). Defaults to 1, where the whole zip plot is displayed. |
zip_ci_colours |
A string with (1) a hex code to use for plotting coverage probability and its Monte Carlo confidence intervals (the default, with value |
... |
Not used. |
A ggplot
object.
data("frailty", package = "rsimsum") ms <- multisimsum( data = frailty, par = "par", true = c(trt = -0.50, fv = 0.75), estvarname = "b", se = "se", methodvar = "model", by = "fv_dist", x = TRUE ) sms <- summary(ms) library(ggplot2) autoplot(sms, par = "trt")
data("frailty", package = "rsimsum") ms <- multisimsum( data = frailty, par = "par", true = c(trt = -0.50, fv = 0.75), estvarname = "b", se = "se", methodvar = "model", by = "fv_dist", x = TRUE ) sms <- summary(ms) library(ggplot2) autoplot(sms, par = "trt")
autoplot method for summary.simsum objects
## S3 method for class 'summary.simsum' autoplot( object, type = "forest", stats = "nsim", target = NULL, fitted = TRUE, scales = "fixed", top = TRUE, density.legend = TRUE, zoom = 1, zip_ci_colours = "yellow", ... )
## S3 method for class 'summary.simsum' autoplot( object, type = "forest", stats = "nsim", target = NULL, fitted = TRUE, scales = "fixed", top = TRUE, density.legend = TRUE, zoom = 1, zip_ci_colours = "yellow", ... )
object |
An object of class |
type |
The type of the plot to be produced. Possible choices are: |
stats |
Summary statistic to plot, defaults to |
target |
Target of summary statistic, e.g. 0 for |
fitted |
Superimpose a fitted regression line, useful when |
scales |
Should scales be fixed ( |
top |
Should the legend for a nested loop plot be on the top side of the plot? Defaults to |
density.legend |
Should the legend for density and hexbin plots be included? Defaults to |
zoom |
A numeric value between 0 and 1 signalling that a zip plot should zoom on the top x% of the plot (to ease interpretation). Defaults to 1, where the whole zip plot is displayed. |
zip_ci_colours |
A string with (1) a hex code to use for plotting coverage probability and its Monte Carlo confidence intervals (the default, with value |
... |
Not used. |
A ggplot
object.
data("MIsim", package = "rsimsum") s <- rsimsum::simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method", x = TRUE ) ss <- summary(s) library(ggplot2) autoplot(ss) autoplot(ss, type = "lolly")
data("MIsim", package = "rsimsum") s <- rsimsum::simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method", x = TRUE ) ss <- summary(s) library(ggplot2) autoplot(ss) autoplot(ss, type = "lolly")
dropbig
is useful to identify replications with large point estimates or standard errors. Large values are defined as standardised values above a given threshold, as defined when calling dropbig
. Regular standardisation using mean and standard deviation is implemented, as well as robust standardisation using median and inter-quartile range. Further to that, the standardisation process is stratified by data-generating mechanism if by
factors are defined.
dropbig( data, estvarname, se = NULL, methodvar = NULL, by = NULL, max = 10, semax = 100, robust = TRUE )
dropbig( data, estvarname, se = NULL, methodvar = NULL, by = NULL, max = 10, semax = 100, robust = TRUE )
data |
A |
estvarname |
The name of the variable containing the point estimates. |
se |
The name of the variable containing the standard errors of the point estimates. |
methodvar |
The name of the variable containing the methods to compare. For instance, methods could be the models compared within a simulation study. Can be |
by |
A vector of variable names to compute performance measures by a list of factors. Factors listed here are the (potentially several) data-generating mechanisms used to simulate data under different scenarios (e.g. sample size, true distribution of a variable, etc.). Can be |
max |
Specifies the maximum acceptable absolute value of the point estimates, after standardisation. Defaults to 10. |
semax |
Specifies the maximum acceptable absolute value of the standard error, after standardisation. Defaults to 100. |
robust |
Specifies whether to use robust standardisation (using median and inter-quartile range) rather than normal standardisation (using mean and standard deviation). Defaults to |
The same data.frame
given as input with an additional column named .dropbig
identifying rows that are classified as large (.dropbig = TRUE
) according to the specified criterion.
data("frailty", package = "rsimsum") frailty2 <- subset(frailty, par == "fv") # Using low values of max, semax for illustration purposes: dropbig( data = frailty2, estvarname = "b", se = "se", methodvar = "model", by = "fv_dist", max = 2, semax = 2 ) # Using regular standardisation: dropbig( data = frailty2, estvarname = "b", se = "se", methodvar = "model", by = "fv_dist", max = 2, semax = 2, robust = FALSE )
data("frailty", package = "rsimsum") frailty2 <- subset(frailty, par == "fv") # Using low values of max, semax for illustration purposes: dropbig( data = frailty2, estvarname = "b", se = "se", methodvar = "model", by = "fv_dist", max = 2, semax = 2 ) # Using regular standardisation: dropbig( data = frailty2, estvarname = "b", se = "se", methodvar = "model", by = "fv_dist", max = 2, semax = 2, robust = FALSE )
A dataset from a simulation study comparing frailty flexible parametric models fitted using penalised likelihood to semiparametric frailty models. Both models are fitted assuming a Gamma and a log-Normal frailty. One thousand datasets were simulated, each containing a binary treatment variable with a log-hazard ratio of -0.50. Clustered survival data was simulated assuming 50 clusters of 50 individuals each, with a mixture Weibull baseline hazard function and a frailty following either a Gamma or a Log-Normal distribution. The comparison involves estimates of the log-treatment effect, and estimates of heterogeneity (i.e. the estimated frailty variance).
frailty frailty2
frailty frailty2
A data frame with 16,000 rows and 6 variables:
i
Simulated dataset number.
b
Point estimate.
se
Standard error of the point estimate.
par
The estimand. trt
is the log-treatment effect, fv
is the variance of the frailty.
fv_dist
The true frailty distribution.
model
Method used (Cox, Gamma
, Cox, Log-Normal
, RP(P), Gamma
, or RP(P), Log-Normal
).
An object of class data.frame
with 16000 rows and 7 columns.
frailty2
is a version of the same dataset with the model
column split into two columns, m_baseline
and m_frailty
.
data("frailty", package = "rsimsum") data("frailty2", package = "rsimsum")
data("frailty", package = "rsimsum") data("frailty2", package = "rsimsum")
Extract data slots from an object of class simsum
, summary.simsum
, multisimsum
, or summary.multisimsum
.
get_data(x, stats = NULL, ...)
get_data(x, stats = NULL, ...)
x |
An object of class |
stats |
Summary statistics to include; can be a scalar value or a vector. Possible choices are:
|
... |
Ignored. |
A data.frame
containing summary statistics from a simulation study.
data(MIsim) x <- simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method" ) get_data(x) # Extracting only bias and coverage: get_data(x, stats = c("bias", "cover")) xs <- summary(x) get_data(xs)
data(MIsim) x <- simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method" ) get_data(x) # Extracting only bias and coverage: get_data(x, stats = c("bias", "cover")) xs <- summary(x) get_data(xs)
Reports whether x is a multisimsum object
is.multisimsum(x)
is.multisimsum(x)
x |
An object to test. |
Reports whether x is a simsum object
is.simsum(x)
is.simsum(x)
x |
An object to test. |
Reports whether x is a summary.multisimsum object
is.summary.multisimsum(x)
is.summary.multisimsum(x)
x |
An object to test. |
Reports whether x is a summary.simsum object
is.summary.simsum(x)
is.summary.simsum(x)
x |
An object to test. |
Create tables in LaTeX, HTML, Markdown, or reStructuredText from objects of class simsum
, summary.simsum
, multisimsum
, summary.multisimsum
.
## S3 method for class 'simsum' kable(x, stats = NULL, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'summary.simsum' kable(x, stats = NULL, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'multisimsum' kable(x, stats = NULL, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'summary.multisimsum' kable(x, stats = NULL, digits = max(3, getOption("digits") - 3), ...) kable(x, ...)
## S3 method for class 'simsum' kable(x, stats = NULL, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'summary.simsum' kable(x, stats = NULL, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'multisimsum' kable(x, stats = NULL, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'summary.multisimsum' kable(x, stats = NULL, digits = max(3, getOption("digits") - 3), ...) kable(x, ...)
x |
An object of class |
stats |
Summary statistics to include. See |
digits |
Maximum number of digits for numeric columns; |
... |
Further arguments passed to |
A dataset from a simulation study comparing different ways to handle missing covariates when fitting a Cox model (White and Royston, 2009).
One thousand datasets were simulated, each containing normally distributed covariates and
and time-to-event outcome.
Both covariates have 20% of their values deleted independently of all other variables so the data became missing completely at random (Little and Rubin, 2002).
Each simulated dataset was analysed in three ways.
A Cox model was fit to the complete cases (
CC
).
Then two methods of multiple imputation using chained equations (van Buuren, Boshuizen, and Knook, 1999) were used.
The MI_LOGT
method multiply imputes the missing values of and
with the outcome included as
and
, where
is the survival time and
is the event indicator.
The
MI_T
method is the same except that is replaced by
in the imputation model.
The results are stored in long format.
MIsim MIsim2
MIsim MIsim2
A data frame with 3,000 rows and 4 variables:
dataset
Simulated dataset number.
method
Method used (CC
, MI_LOGT
or MI_T
).
b
Point estimate.
se
Standard error of the point estimate.
An object of class tbl_df
(inherits from tbl
, data.frame
) with 3000 rows and 5 columns.
MIsim2
is a version of the same dataset with the method
column split into two columns, m1
and m2
.
White, I.R., and P. Royston. 2009. Imputing missing covariate values for the Cox model. Statistics in Medicine 28(15):1982-1998 doi:10.1002/sim.3618
data("MIsim", package = "rsimsum") data("MIsim2", package = "rsimsum")
data("MIsim", package = "rsimsum") data("MIsim2", package = "rsimsum")
multisimsum
is an extension of simsum()
that can handle multiple estimated parameters at once.
multisimsum
calls simsum()
internally, each estimands at once.
There is only one new argument that must be set when calling multisimsum
: par
, a string representing the column of data
that identifies the different estimands.
Additionally, with multisimsum
the argument true
can be a named vector, where names correspond to each estimand (see examples).
Otherwise, constant values (or values identified by a column in data
) will be utilised.
See vignette("E-custom-inputs", package = "rsimsum")
for more details.
multisimsum( data, par, estvarname, se = NULL, true = NULL, methodvar = NULL, ref = NULL, by = NULL, ci.limits = NULL, df = NULL, dropbig = FALSE, x = FALSE, control = list() )
multisimsum( data, par, estvarname, se = NULL, true = NULL, methodvar = NULL, ref = NULL, by = NULL, ci.limits = NULL, df = NULL, dropbig = FALSE, x = FALSE, control = list() )
data |
A |
par |
The name of the variable containing the methods to compare.
Can be |
estvarname |
The name of the variable containing the point estimates. Note that some column names are forbidden: these are listed below in the Details section. |
se |
The name of the variable containing the standard errors of the point estimates. Note that some column names are forbidden: these are listed below in the Details section. |
true |
The true value of the parameter; this is used in calculations of bias, relative bias, coverage, and mean squared error and is required whenever these performance measures are requested.
|
methodvar |
The name of the variable containing the methods to compare.
For instance, methods could be the models compared within a simulation study.
Can be |
ref |
Specifies the reference method against which relative precision will be calculated.
Only useful if |
by |
A vector of variable names to compute performance measures by a list of factors. Factors listed here are the (potentially several) data-generating mechanisms used to simulate data under different scenarios (e.g. sample size, true distribution of a variable, etc.).
Can be |
ci.limits |
Can be used to specify the limits (lower and upper) of confidence intervals used to calculate coverage and bias-eliminated coverage.
Useful for non-Wald type estimators (e.g. bootstrap).
Defaults to |
df |
Can be used to specify that a column containing the replication-specific number of degrees of freedom that will be used to calculate confidence intervals for coverage (and bias-eliminated coverage) assuming t-distributed critical values (rather than normal theory intervals).
See |
dropbig |
Specifies that point estimates or standard errors beyond the maximum acceptable values should be dropped. Defaults to |
x |
Set to |
control |
A list of parameters that control the behaviour of
|
The following names are not allowed for estvarname
, se
, methodvar
, by
, par
: stat
, est
, mcse
, lower
, upper
, :methodvar
.
An object of class multisimsum
.
data("frailty", package = "rsimsum") ms <- multisimsum( data = frailty, par = "par", true = c(trt = -0.50, fv = 0.75), estvarname = "b", se = "se", methodvar = "model", by = "fv_dist" ) ms
data("frailty", package = "rsimsum") ms <- multisimsum( data = frailty, par = "par", true = c(trt = -0.50, fv = 0.75), estvarname = "b", se = "se", methodvar = "model", by = "fv_dist" ) ms
A dataset from a simulation study with 150 data-generating mechanisms, useful to illustrate nested loop plots. This simulation study aims to compare the Cox model and flexible parametric models in a variety of scenarios: different baseline hazard functions, sample size, and varying amount of heterogeneity unaccounted for in the model (simulated as white noise with a given variance). A Cox model and a Royston-Parmar model with 5 degrees of freedom are fit to each replication.
nlp
nlp
A data frame with 30,000 rows and 10 variables:
dgm
Data-generating mechanism, 1 to 150.
i
Simulated dataset number.
model
Method used, with 1 the Cox model and 2 the RP(5) model.
b
Point estimate for the log-hazard ratio.
se
Standard error of the point estimate.
baseline
Baseline hazard function of the simulated dataset.
ss
Sample size of the simulated dataset.
esigma
Standard deviation of the white noise.
pars
(Ancillary) Parameters of the baseline hazard function.
Further details on this simulation study can be found in the R script used to generate this dataset, available on GitHub: https://github.com/ellessenne/rsimsum/blob/master/data-raw/nlp-data.R
Cox D.R. 1972. Regression models and life-tables. Journal of the Royal Statistical Society, Series B (Methodological) 34(2):187-220. doi:10.1007/978-1-4612-4380-9_37
Royston, P. and Parmar, M.K. 2002. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21(15):2175-2197 doi:10.1002/sim.1203
Rücker, G. and Schwarzer, G. 2014. Presenting simulation results in a nested loop plot. BMC Medical Research Methodology 14:129 doi:10.1186/1471-2288-14-129
data("nlp", package = "rsimsum")
data("nlp", package = "rsimsum")
The function nsim
computes the number of simulations to perform based on the accuracy of an estimate of interest, using the following equation:
where is the specified level of accuracy of the estimate of interest you are willing to accept (i.e. the permissible difference from the true value
),
is the
quantile of the standard normal distribution,
is the
quantile of the standard normal distribution with
being the power to detect a specific difference from the true value as significant, and
is the variance of the parameter of interest.
nsim(alpha, sigma, delta, power = 0.5)
nsim(alpha, sigma, delta, power = 0.5)
alpha |
Significance level. Must be a value between 0 and 1. |
sigma |
Variance for the parameter of interest. Must be greater than 0. |
delta |
Specified level of accuracy of the estimate of interest you are willing to accept. Must be greater than 0. |
power |
Power to detect a specific difference from the true value as significant. Must be a value between 0 and 1. Defaults to 0.5, e.g. a power of 50%. |
A scalar value representing the number of simulations to perform based on the accuracy required.
Burton, A., Douglas G. Altman, P. Royston. et al. 2006. The design of simulation studies in medical statistics. Statistics in Medicine 25: 4279-4292 doi:10.1002/sim.2673
# Number of simulations required to produce an estimate to within 5% # accuracy of the true coefficient of 0.349 with a 5% significance level, # assuming the variance of the estimate is 0.0166 and 50% power: nsim(alpha = 0.05, sigma = sqrt(0.0166), delta = 0.349 * 5 / 100, power = 0.5) # Number of simulations required to produce an estimate to within 1% # accuracy of the true coefficient of 0.349 with a 5% significance level, # assuming the variance of the estimate is 0.0166 and 50% power: nsim(alpha = 0.05, sigma = sqrt(0.0166), delta = 0.349 * 1 / 100, power = 0.5)
# Number of simulations required to produce an estimate to within 5% # accuracy of the true coefficient of 0.349 with a 5% significance level, # assuming the variance of the estimate is 0.0166 and 50% power: nsim(alpha = 0.05, sigma = sqrt(0.0166), delta = 0.349 * 5 / 100, power = 0.5) # Number of simulations required to produce an estimate to within 1% # accuracy of the true coefficient of 0.349 with a 5% significance level, # assuming the variance of the estimate is 0.0166 and 50% power: nsim(alpha = 0.05, sigma = sqrt(0.0166), delta = 0.349 * 1 / 100, power = 0.5)
Print method for multisimsum objects
## S3 method for class 'multisimsum' print(x, ...)
## S3 method for class 'multisimsum' print(x, ...)
x |
An object of class |
... |
Ignored. |
data(frailty) ms <- multisimsum( data = frailty, par = "par", true = c( trt = -0.50, fv = 0.75 ), estvarname = "b", se = "se", methodvar = "model", by = "fv_dist" ) ms data("frailty", package = "rsimsum") frailty$true <- ifelse(frailty$par == "trt", -0.50, 0.75) ms <- multisimsum(data = frailty, par = "par", estvarname = "b", true = "true") ms
data(frailty) ms <- multisimsum( data = frailty, par = "par", true = c( trt = -0.50, fv = 0.75 ), estvarname = "b", se = "se", methodvar = "model", by = "fv_dist" ) ms data("frailty", package = "rsimsum") frailty$true <- ifelse(frailty$par == "trt", -0.50, 0.75) ms <- multisimsum(data = frailty, par = "par", estvarname = "b", true = "true") ms
Print method for simsum objects
## S3 method for class 'simsum' print(x, ...)
## S3 method for class 'simsum' print(x, ...)
x |
An object of class |
... |
Ignored. |
data("MIsim") x <- simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method" ) x MIsim$true <- 0.5 x <- simsum(data = MIsim, estvarname = "b", true = "true", se = "se") x
data("MIsim") x <- simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method" ) x MIsim$true <- 0.5 x <- simsum(data = MIsim, estvarname = "b", true = "true", se = "se") x
Print method for summary.multisimsum
objects
## S3 method for class 'summary.multisimsum' print(x, digits = 4, mcse = TRUE, ...)
## S3 method for class 'summary.multisimsum' print(x, digits = 4, mcse = TRUE, ...)
x |
An object of class |
digits |
Number of significant digits used for printing. Defaults to 4. |
mcse |
Should Monte Carlo standard errors be reported?
If |
... |
Ignored. |
data(frailty) ms <- multisimsum( data = frailty, par = "par", true = c( trt = -0.50, fv = 0.75 ), estvarname = "b", se = "se", methodvar = "model", by = "fv_dist" ) sms <- summary(ms, stats = c("bias", "cover", "mse")) sms # Printing less significant digits: print(sms, digits = 3) # Printing confidence intervals: print(sms, digits = 3, mcse = FALSE) # Printing values only: print(sms, mcse = NULL)
data(frailty) ms <- multisimsum( data = frailty, par = "par", true = c( trt = -0.50, fv = 0.75 ), estvarname = "b", se = "se", methodvar = "model", by = "fv_dist" ) sms <- summary(ms, stats = c("bias", "cover", "mse")) sms # Printing less significant digits: print(sms, digits = 3) # Printing confidence intervals: print(sms, digits = 3, mcse = FALSE) # Printing values only: print(sms, mcse = NULL)
Print method for summary.simsum
objects
## S3 method for class 'summary.simsum' print(x, digits = 4, mcse = TRUE, ...)
## S3 method for class 'summary.simsum' print(x, digits = 4, mcse = TRUE, ...)
x |
An object of class |
digits |
Number of significant digits used for printing. Defaults to 4. |
mcse |
Should Monte Carlo standard errors be reported?
If |
... |
Ignored. |
data("MIsim") x <- simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method" ) xs <- summary(x) xs # Printing less significant digits: print(xs, digits = 2) # Printing confidence intervals: print(xs, mcse = FALSE) # Printing values only: print(xs, mcse = NULL)
data("MIsim") x <- simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method" ) xs <- summary(x) xs # Printing less significant digits: print(xs, digits = 2) # Printing confidence intervals: print(xs, mcse = FALSE) # Printing values only: print(xs, mcse = NULL)
A dataset from a simulation study assessing the impact of misspecifying the baseline hazard in survival models on regression coefficients.
One thousand datasets were simulated, each containing a binary treatment variable with a log-hazard ratio of -0.50.
Survival data was simulated for two different sample sizes, 50 and 250 individuals, and under two different baseline hazard functions, exponential and Weibull.
Consequently, a Cox model (Cox, 1972), a fully parametric exponential model, and a Royston-Parmar (Royston and Parmar, 2002) model with two degrees of freedom were fit to each simulated dataset.
See vignette("B-relhaz", package = "rsimsum")
for more information.
relhaz
relhaz
A data frame with 1,200 rows and 6 variables:
dataset
Simulated dataset number.
n
Sample size of the simulate dataset.
baseline
Baseline hazard function of the simulated dataset.
model
Method used (Cox
, Exp
, or RP(2)
).
theta
Point estimate for the log-hazard ratio.
se
Standard error of the point estimate.
Cox D.R. 1972. Regression models and life-tables. Journal of the Royal Statistical Society, Series B (Methodological) 34(2):187-220. doi:10.1007/978-1-4612-4380-9_37
Royston, P. and Parmar, M.K. 2002. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21(15):2175-2197 doi:10.1002/sim.1203
data("relhaz", package = "rsimsum")
data("relhaz", package = "rsimsum")
simsum()
computes performance measures for simulation studies in which each simulated data set yields point estimates by one or more analysis methods.
Bias, relative bias, empirical standard error and precision relative to a reference method can be computed for each method.
If, in addition, model-based standard errors are available then simsum()
can compute the average model-based standard error, the relative error in the model-based standard error, the coverage of nominal confidence intervals, the coverage under the assumption that there is no bias (bias-eliminated coverage), and the power to reject a null hypothesis.
Monte Carlo errors are available for all estimated quantities.
simsum( data, estvarname, se = NULL, true = NULL, methodvar = NULL, ref = NULL, by = NULL, ci.limits = NULL, df = NULL, dropbig = FALSE, x = FALSE, control = list() )
simsum( data, estvarname, se = NULL, true = NULL, methodvar = NULL, ref = NULL, by = NULL, ci.limits = NULL, df = NULL, dropbig = FALSE, x = FALSE, control = list() )
data |
A |
estvarname |
The name of the variable containing the point estimates. Note that some column names are forbidden: these are listed below in the Details section. |
se |
The name of the variable containing the standard errors of the point estimates. Note that some column names are forbidden: these are listed below in the Details section. |
true |
The true value of the parameter; this is used in calculations of bias, relative bias, coverage, and mean squared error and is required whenever these performance measures are requested.
|
methodvar |
The name of the variable containing the methods to compare.
For instance, methods could be the models compared within a simulation study.
Can be |
ref |
Specifies the reference method against which relative precision will be calculated.
Only useful if |
by |
A vector of variable names to compute performance measures by a list of factors. Factors listed here are the (potentially several) data-generating mechanisms used to simulate data under different scenarios (e.g. sample size, true distribution of a variable, etc.).
Can be |
ci.limits |
Can be used to specify the limits (lower and upper) of confidence intervals used to calculate coverage and bias-eliminated coverage.
Useful for non-Wald type estimators (e.g. bootstrap).
Defaults to |
df |
Can be used to specify that a column containing the replication-specific number of degrees of freedom that will be used to calculate confidence intervals for coverage (and bias-eliminated coverage) assuming t-distributed critical values (rather than normal theory intervals).
See |
dropbig |
Specifies that point estimates or standard errors beyond the maximum acceptable values should be dropped. Defaults to |
x |
Set to |
control |
A list of parameters that control the behaviour of
|
The following names are not allowed for any column in data
that is passed to simsum()
: stat
, est
, mcse
, lower
, upper
, :methodvar
, :true
.
An object of class simsum
.
White, I.R. 2010. simsum: Analyses of simulation studies including Monte Carlo error. The Stata Journal 10(3): 369-385. https://www.stata-journal.com/article.html?article=st0200
Morris, T.P., White, I.R. and Crowther, M.J. 2019. Using simulation studies to evaluate statistical methods. Statistics in Medicine, doi:10.1002/sim.8086
Gasparini, A. 2018. rsimsum: Summarise results from Monte Carlo simulation studies. Journal of Open Source Software 3(26):739, doi:10.21105/joss.00739
data("MIsim", package = "rsimsum") s <- simsum(data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method", ref = "CC") # If 'ref' is not specified, the reference method is inferred s <- simsum(data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method")
data("MIsim", package = "rsimsum") s <- simsum(data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method", ref = "CC") # If 'ref' is not specified, the reference method is inferred s <- simsum(data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method")
The summary()
method for objects of class multisimsum
returns confidence intervals for performance measures based on Monte Carlo standard errors.
## S3 method for class 'multisimsum' summary(object, ci_level = 0.95, df = NULL, stats = NULL, ...)
## S3 method for class 'multisimsum' summary(object, ci_level = 0.95, df = NULL, stats = NULL, ...)
object |
An object of class |
ci_level |
Significance level for confidence intervals based on Monte Carlo standard errors. Ignored if a |
df |
Degrees of freedom of a t distribution that will be used to calculate confidence intervals based on Monte Carlo standard errors.
If |
stats |
Summary statistics to include; can be a scalar value or a vector (for multiple summary statistics at once). Possible choices are:
|
... |
Ignored. |
An object of class summary.multisimsum
.
multisimsum()
, print.summary.multisimsum()
data(frailty) ms <- multisimsum( data = frailty, par = "par", true = c( trt = -0.50, fv = 0.75 ), estvarname = "b", se = "se", methodvar = "model", by = "fv_dist" ) sms <- summary(ms) sms
data(frailty) ms <- multisimsum( data = frailty, par = "par", true = c( trt = -0.50, fv = 0.75 ), estvarname = "b", se = "se", methodvar = "model", by = "fv_dist" ) sms <- summary(ms) sms
The summary()
method for objects of class simsum
returns confidence intervals for performance measures based on Monte Carlo standard errors.
## S3 method for class 'simsum' summary(object, ci_level = 0.95, df = NULL, stats = NULL, ...)
## S3 method for class 'simsum' summary(object, ci_level = 0.95, df = NULL, stats = NULL, ...)
object |
An object of class |
ci_level |
Significance level for confidence intervals based on Monte Carlo standard errors. Ignored if a |
df |
Degrees of freedom of a t distribution that will be used to calculate confidence intervals based on Monte Carlo standard errors. If |
stats |
Summary statistics to include; can be a scalar value or a vector (for multiple summary statistics at once). Possible choices are:
Defaults to |
... |
Ignored. |
An object of class summary.simsum
.
simsum()
, print.summary.simsum()
data("MIsim") object <- simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method" ) xs <- summary(object) xs
data("MIsim") object <- simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method" ) xs <- summary(object) xs
Extract a tidy dataset with results from an object of class simsum
, summary.simsum
, multisimsum
, or summary.multisimsum
.
## S3 method for class 'simsum' tidy(x, stats = NULL, ...) ## S3 method for class 'summary.simsum' tidy(x, stats = NULL, ...) ## S3 method for class 'multisimsum' tidy(x, stats = NULL, ...) ## S3 method for class 'summary.multisimsum' tidy(x, stats = NULL, ...)
## S3 method for class 'simsum' tidy(x, stats = NULL, ...) ## S3 method for class 'summary.simsum' tidy(x, stats = NULL, ...) ## S3 method for class 'multisimsum' tidy(x, stats = NULL, ...) ## S3 method for class 'summary.multisimsum' tidy(x, stats = NULL, ...)
x |
An object of class |
stats |
Summary statistics to include; can be a scalar value or a vector. Possible choices are:
|
... |
Ignored. |
A data.frame
containing summary statistics from a simulation study.
data(MIsim) x <- simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method" ) tidy(x) # Extracting only bias and coverage: tidy(x, stats = c("bias", "cover")) xs <- summary(x) tidy(xs)
data(MIsim) x <- simsum( data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method" ) tidy(x) # Extracting only bias and coverage: tidy(x, stats = c("bias", "cover")) xs <- summary(x) tidy(xs)
A dataset from a simulation study with 4 data-generating mechanisms, useful to illustrate custom input of confidence intervals to calculate coverage probability. This simulation study aims to compare the t-test assuming pooled or unpooled variance in violation (or not) of the t-test assumptions: normality of data, and equality (or not) or variance between groups. The true value of the difference between groups is -1.
tt
tt
A data frame with 4,000 rows and 8 variables:
diff
The difference in mean between groups estimated by the t-test;
se
Standard error of the estimated difference;
conf.low
, conf.high
Confidence interval for the difference in mean as reported by the t-test;
df
The number of degrees of freedom assumed by the t-test;
repno
Identifies each replication, between 1 and 500;
dgm
Identifies each data-generating mechanism: 1 corresponds to normal data with equal variance between the groups, 2 is normal data with unequal variance, 3 and 4 are skewed data (simulated from a Gamma distribution) with equal and unequal variance between groups, respectively;
method
Analysis method: 1 represents the t-test with pooled variance, while 2 represents the t-test with unpooled variance.
Further details on this simulation study can be found in the R script used to generate this dataset, available on GitHub: https://github.com/ellessenne/rsimsum/blob/master/data-raw/tt-data.R
data("tt", package = "rsimsum")
data("tt", package = "rsimsum")