
# Quick preview

## Dataset

Data used in VT are modeled by $\left\{Y, T, X_1, \ldots, X_{p-2}\right\}$, where $p$ is the number of variables.

  • $Y$ is a binary outcome. In R, $Y$ is a factor, and the second level of this factor is taken as the desirable event ($Y = 1$).
  • $T$ is the treatment variable: $T = 1$ means active treatment, $T = 0$ means control treatment. In R, $T$ is numeric.
  • $X_i$ are the covariables; they can be categorical, continuous, or binary.

NOTE: if you run VT with interactions, categorical covariables must be transformed into binary variables.

Type ?formatRCTDataset for details.

Related functions/classes in the aVirtualTwins package: VT.object(), vt.data(), formatRCTDataset().
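
As a quick illustration, here is a minimal sketch of building such a dataset with the package. It assumes the sepsis example data shipped with aVirtualTwins and the vt.data() arguments documented in the package (outcome field, treatment field, interactions); see ?vt.data and ?formatRCTDataset for the authoritative details.

```r
# Minimal sketch, assuming the sepsis example data shipped with aVirtualTwins
# and the vt.data() arguments documented in the package (see ?vt.data).
library(aVirtualTwins)

data(sepsis)  # binary outcome "survival", treatment "THERAPY", plus covariables

# Format the data frame and wrap it into a VT.object.
# interactions = TRUE recodes categorical covariables as binary dummies,
# which is required when running VT with treatment/covariable interactions.
vt.o <- vt.data(sepsis, "survival", "THERAPY", interactions = TRUE)
```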

## Method

VT is a two-step method, but each step can be carried out in several ways.

let $\hat{P_{1i}} = P(Y_i = 1|T_i = 1, X_i)$
let $\hat{P_{0i}} = P(Y_i = 1|T_i = 0, X_i)$
let $X = \left\{X_1, \ldots, X_{p-2}\right\}$

### First Step

  • Grow a random forest with data $\left\{Y, T, X\right\}$.
  • Grow a random forest with treatment/covariable interactions, i.e. $\left\{Y, T, X, X I(T_i=0), X I(T_i=1)\right\}$.
  • Grow two random forests, one for each treatment (a sketch of this option follows below):
    • The first with data $\left\{Y, X\right\}$ where $T_i = 0$
    • The second with data $\left\{Y, X\right\}$ where $T_i = 1$
  • Build your own model

From one of these methods you can estimate $\hat{P_{1i}}$ and $\hat{P_{0i}}$.

Related functions/classes in the aVirtualTwins package: VT.difft(), vt.forest().
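
To make the first step concrete, here is a minimal sketch of the "two forests" option using the randomForest package directly on simulated data. It is an illustration of the underlying idea only, not the package's own code path (vt.forest() handles this step inside aVirtualTwins); the variable names and the simulated data below are hypothetical.

```r
# Minimal sketch of the "two forests" option, using the randomForest package
# directly on simulated data (illustration only: vt.forest() wraps this step
# inside aVirtualTwins; all names and data below are hypothetical).
library(randomForest)

set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n),
                x2 = runif(n),
                x3 = factor(sample(c("a", "b"), n, replace = TRUE)))
trt <- rbinom(n, 1, 0.5)                                 # plays the role of T (0/1)
Y <- factor(rbinom(n, 1, plogis(-0.5 + 0.8 * X$x1 + 0.6 * trt * (X$x2 > 0.5))))

# One forest per treatment arm
rf1 <- randomForest(x = X[trt == 1, ], y = Y[trt == 1])  # treated subjects
rf0 <- randomForest(x = X[trt == 0, ], y = Y[trt == 0])  # control subjects

# Estimate P1i and P0i for every subject, whatever treatment they received
P1 <- predict(rf1, newdata = X, type = "prob")[, 2]      # P(Y = 1 | T = 1, X)
P0 <- predict(rf0, newdata = X, type = "prob")[, 2]      # P(Y = 1 | T = 0, X)
```

With these two vectors, every subject has an estimate of both probabilities, which is exactly what the second step needs.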

### Second Step

Define $Z_i = \hat{P_{1i}} - \hat{P_{0i}}$

  • Use a regression tree to explain $Z$ by the covariables $X$. Subjects whose predicted $Z_i$ is greater than some threshold $c$ define a subgroup.
  • Use a classification tree on the new variable $Z^{*}$, defined by $Z^{*}_i = 1$ if $Z_i > c$ and $Z^{*}_i = 0$ otherwise.

The idea is to identify which covariables from $X$ describe the variation of $Z$.

Related function in the aVirtualTwins package: vt.tree().
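
Continuing the sketch above, the second step can be illustrated with rpart. This is only a simplified, assumed stand-in for vt.tree(); the threshold value and tree depth are arbitrary choices for the illustration.

```r
# Minimal sketch of the second step with rpart (illustration only;
# inside the package this step is handled by vt.tree()).
library(rpart)

Z <- P1 - P0          # estimated individual treatment effect (from the sketch above)
thresh <- 0.05        # arbitrary threshold c, chosen for illustration

# Option 1: regression tree explaining Z by the covariables X
tree.reg <- rpart(Z ~ ., data = cbind(X, Z = Z), maxdepth = 3)

# Option 2: classification tree on Z* = 1 if Z > c, 0 otherwise
Zstar <- factor(as.integer(Z > thresh))
tree.cls <- rpart(Zstar ~ ., data = cbind(X, Zstar = Zstar),
                  method = "class", maxdepth = 3)

# Terminal nodes with a high predicted Z (or predicted Z* = 1)
# point to the covariables that define candidate subgroups.
print(tree.reg)
```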
