---
title: "Custom input values for confidence intervals and true values"
author: "Alessandro Gasparini"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Custom input values for confidence intervals and true values}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r setup, include = FALSE}
options(width = 150)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center", fig.height = 6, fig.width = 6,
  out.width = "75%"
)
```

# Single Estimand

`rsimsum` supports custom input values for the true value of the estimand and for confidence intervals limits (used to calculate coverage probability).

To illustrate this feature, we can use the `tt` dataset (bundled with `rsimsum`):

```{r packages}
library(rsimsum)
data("tt", package = "rsimsum")
head(tt)
```

This includes the results of a simulation study assessing robustness of the t-test when estimating the difference between means.
The t-test assumes a t distribution, hence confidence intervals for the estimated mean are generally based on the t distribution.
See for instance the example from the t-test documentation (`?t.test`):

```{r}
t.test(extra ~ group, data = sleep)
```

We can incorporate custom confidence intervals by passing the name of two columns in `data` as the `ci.limits` argument:

```{r}
s1 <- simsum(data = tt, estvarname = "diff", true = -1, se = "se", ci.limits = c("conf.low", "conf.high"), methodvar = "method", by = "dgm")
summary(s1, stats = "cover")
```

By doing so, we can incorporate different types of confidence intervals in the analysis of Monte Carlo simulation studies.
Compare with the default setting:

```{r}
s2 <- simsum(data = tt, estvarname = "diff", true = -1, se = "se", methodvar = "method", by = "dgm")
summary(s2, stats = "cover")
```

The `ci.limits` is also useful when using non-symmetrical confidence intervals, e.g. when using bootstrapped confidence intervals.

A pair of values can also be passed to `rsimsum` as the `ci.limits` argument:

```{r}
s3 <- simsum(data = tt, estvarname = "diff", true = -1, se = "se", ci.limits = c(-1.5, -0.5), methodvar = "method", by = "dgm")
summary(s3, stats = "cover")
```

If you have a better example of the utility of this method please get in touch: I'd love to hear from you!

By default, `simsum` will calculate confidence intervals using normal-theory, Wald-type intervals.
It is possible to use t-based critical values by providing a column for the (replication-specific) degrees of freedom (analogously as passing confidence bounds to `ci.limits`):

```{r}
s4 <- simsum(data = tt, estvarname = "diff", true = -1, se = "se", df = "df", methodvar = "method", by = "dgm")
```

Given that the confidence intervals in (`conf.low`, `conf.high`) are obtained by using critical values from a t distribution, the results of `s4` will be equivalent to the results of `s1`:

```{r}
all.equal(tidy(s1), tidy(s4))
```

We can pass a column of values for `true` as well:

```{r}
tt$true <- -1
s5 <- simsum(data = tt, estvarname = "diff", true = "true", se = "se", ci.limits = c("conf.low", "conf.high"), methodvar = "method", by = "dgm")
summary(s5, stats = "cover")
```

Compare with the default settings:

```{r}
summary(s2, stats = "cover")
```

Finally, we could have multiple columns identifying methods as well.
This uses the `MIsim` and `MIsim2` datasets, which are bundled with {rsimsum}:

```{r}
data("MIsim", package = "rsimsum")
data("MIsim2", package = "rsimsum")
head(MIsim)
head(MIsim2)
```

The syntax when calling `simsum()` is pretty much the same:

```{r}
s6 <- simsum(data = MIsim, estvarname = "b", true = 0.50, se = "se", methodvar = "method")
s7 <- simsum(data = MIsim2, estvarname = "b", true = 0.50, se = "se", methodvar = c("m1", "m2"))
```

See the inferred methods:

```{r}
print(s6)
print(s7)
```

And of course, the estimated performance measures are the same:

```{r}
all.equal(tidy(s6)$est, tidy(s7)$est)
```

# Multiple Estimands at Once

`multisimsum` can be as flexible as `simsum`.
Remember the default behaviour:

```{r}
data("frailty", package = "rsimsum")
ms1 <- multisimsum(
  data = frailty,
  par = "par", true = c(trt = -0.50, fv = 0.75),
  estvarname = "b", se = "se", methodvar = "model",
  by = "fv_dist"
)
summary(ms1, stats = "bias")
```

In this example, we pass the true values of each estimand as the named vector `c(trt = -0.50, fv = 0.75)`.

Say instead we stored the true value of each estimand as a column in our dataset:

```{r}
frailty$true <- ifelse(frailty$par == "trt", -0.50, 0.75)
head(frailty)
```

With this data structure, we can pass a string value to `multisimsum` that will identify the `true` column in our dataset:

```{r}
ms2 <- multisimsum(
  data = frailty,
  par = "par", true = "true",
  estvarname = "b", se = "se", methodvar = "model",
  by = "fv_dist"
)
summary(ms2, stats = "bias")
```

We can confirm that we obtain the same results with the two approaches:

```{r}
identical(tidy(ms1), tidy(ms2))
```

This approach is particularly useful when the true value might vary across replications (e.g. when it depends on the simulated dataset).

Of course, it can be combined with custom confidence interval limits for coverage as well:

```{r}
frailty$conf.low <- frailty$b - qt(1 - 0.05 / 2, df = 10) * frailty$se
frailty$conf.high <- frailty$b + qt(1 - 0.05 / 2, df = 10) * frailty$se

ms3 <- multisimsum(
  data = frailty,
  par = "par", true = "true",
  estvarname = "b", se = "se", methodvar = "model",
  by = "fv_dist",
  ci.limits = c("conf.low", "conf.high")
)
summary(ms3, stats = "cover")
```

This will be completely different than before:

```{r}
summary(ms2, stats = "cover")
```

Multiple columns identifying methods are supported with `multisimsum()` as well; examples are omitted here, but it works analogously as with `simsum()`.