rcITR

rcITR-vignette

Introduction

The package rcITR is an individualized treatment rule (ITR) discovery tool that estimates ITRs under a constrained optimization framework. The main functions are rcDT() and rcRF(). The rcDT() function constructs a risk controlled decision tree (rcDT) and rcRF() constructs a risk controlled random forest (rcRF). The aim of each model is to maximize an expected benefit (Y) while constraining risk (R) at a pre-determined threshold. Each model is constructed to optimize a Lagrangian objective function which takes the form, \[ L(d) = E(Y) - \lambda(E(R) - \tau). \] Here, \(d\) is a treatment decision rule of the form \(d(X): X \rightarrow A\) for predictors \(X\) and treatments \(A\in \{0,1\}\). \(E(Y)\) and \(E(R)\) are estimated using the so-called “value” function and correspond to the expected benefit and risk if rule \(d\) were used. For example, the estimated value of treatment recommendation \(d\) would be \[E(Y) =N^{-1} \sum_{i=1}^{N} \frac{\textbf{I}_{a_i=d(\textbf{x}_i)}}{\hat{p}(a_i|\textbf{x}_i)}y_i \] where \(\textbf{x}_i\) is the covariate vector, \(a_i\) is the original treatment assignment, \(d(\textbf{x}_i)\) is the proposed treatment rule, and \(\hat{p}(a_i|\textbf{x}_i)\) is the probability of receiving treatment given the covariates. The estimated ``value" associated with the risk outcome is simiarly estimated. The pre-determined risk threshold \(\tau\) is set with knowledge of the expected range of risk values in the population of interest and \(\lambda\) is the penalty for rule \(d\) exceeding the risk threshold.

The input dataset needs to be a data.frame() with the following: (1) benefit score column, preferably labeled as y, (2) risk score column, preferably labeled as r, (3) binary treatment column, preferably labeled trt having treatment indicator scores of 0 or +1, (4) propensity score column, preferably labeled prtx, (5) columns of predictors

Constructing an rcDT Model

A single rcDT tree structrue can be constructed using the function rcDT. By default the number of observations allowed in a terminal node is 20 (min.ndsz = 20), there must be at least 5 observations from each treatment group in a terminal node (n0 = 5), and the maximum tree depth is set at 15 (max.depth = 15). The initial objective value of the root node is the maximum of \(\hat{L}(d)\) with all subjects given treatment or all patients given control. The root node is split only made if there is a binary partition on a predictor for which the Lagrangian objective increases. The same is true for additional splits so that the tree cannot grow larger unless there is an increase in overall objective value of the tree given by the split. The function requires the user to input values for \(\tau\) and \(\lambda\).

rcDT Example

We will use an example dataset generated from the function generateData() which simulates RCT data from the model with

\[ Y = 1 - X_3 + X_4 + 3\text{I}_{x \in S}\text{I}_{A = 1} + 3\text{I}_{x \notin S}\text{I}_{A = 0} + \epsilon_Y. \] \[ R = 2 + 2X_3 + X_4 + (2X_2 + 0.1)(2A-1) + \epsilon_R. \] where \(S = \{X_1\le 0.7\hspace{2mm} \cap\hspace{2mm} X_2\le 0.7\}\), \(\epsilon_Y\sim N(0,1)\), and \(\epsilon_R\sim N(0,0.5)\). Covariates \(X_1 - X_{10} \sim \text{Unif}(0,1)\). The simulated data has 1000 observations.

set.seed(123)
dat <- generateData(n = 1000)
str(dat)

## 'data.frame':    1000 obs. of  14 variables:
##  $ x1  : num  0.274 0.594 0.16 0.853 0.848 0.478 0.774 0.295 0.066 0.441 ...
##  $ x2  : num  0.16 0.145 0.149 0.514 0.493 0.616 0.447 0.056 0.005 0.222 ...
##  $ x3  : num  0.206 0.943 0.379 0.626 0.184 0.659 0.735 0.496 0.869 0.832 ...
##  $ x4  : num  0.304 0.833 0.594 0.807 0.294 0.141 0.889 0.008 0.569 0.968 ...
##  $ x5  : num  0.249 0.989 0.717 0.652 0.241 0.087 0.297 0.542 0.781 0.272 ...
##  $ x6  : num  0.44 0.397 0.372 0.529 0.074 0.717 0.243 0.844 0.995 0.105 ...
##  $ x7  : num  0.93 0.591 0.08 0.845 0.03 0.815 0.391 0.143 0.667 0.619 ...
##  $ x8  : num  0.578 0.489 0.742 0.226 0.749 0.825 0.1 0.276 0.021 0.596 ...
##  $ x9  : num  0.798 0.879 0.242 0.56 0.905 0.008 0.163 0.26 0.773 0.071 ...
##  $ x10 : num  0.311 0.325 0.87 0.329 0.126 0.356 0.931 0.875 0.82 0.022 ...
##  $ trt : num  0 1 0 1 1 0 1 1 1 0 ...
##  $ y   : num  -0.259 2.597 -0.302 2.04 -0.105 ...
##  $ r   : num  2.44 5.06 3.4 5.58 3.56 ...
##  $ prtx: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Figure 1 presents the risk and efficacy score distributions for the original treated and control groups. Risk scores are lower on average for the control arm of the study and so we may want to set the risk threshold to be around the center of the distribution of risk scores for the control arm. Note that efficacy scores do not differ much between the two groups. We will set \(\tau = 2.75\), and use a penalty of \(\lambda = 1\) for the demonstration.

Figure 1. Risk score distribution for simulated data

Below is the tree structure generated for the simulated data corresponding the a risk threshold of \(\tau = 2.75\) and penalty \(\lambda = 1\).

tre1 <- rcDT(data = dat, 
             split.var = 1:10,
             risk.threshold = 2.75,
             lambda = 1,
             efficacy = "y",
             risk = "r",
             col.trt = "trt",
             col.prtx = "prtx")
tre1$tree

##   node size n.1 n.0  trt.effect var vname cut.1 cut.2     score
## 1    0 1000 493 507 -0.06270569   2    x2     l 0.288 0.6925219
## 2   01  304 139 165  1.65841703   1    x1     l   0.7 0.9232185
## 4  011  223 104 119  3.16167696  NA  <NA>  <NA>  <NA>        NA
## 5  012   81  35  46 -2.52519851  NA  <NA>  <NA>  <NA>        NA
## 3   02  696 354 342 -0.81243281  NA  <NA>  <NA>  <NA>        NA

The output contains a summary of the tree structure. The node column begins with the root node 0 and each subsequent number indicates the direction of the split, with 1 indicating the left (less than or equal to) node and 2 indicating the right (greater than) node. The first row indicates that the covariate \(X_2\) is selected as the first splitting variable with a cut point of cut.2 = 0.375. The decision is to send treatment to the left node (cut.1 = "l"). size, n.1, and n.0 indicate there are observations in the root node, with originally treated and originally on control. The second row with node = 01 contains information from the left child node with interpretations similar to those described for the root node. The splitting information denoted by NA indicates a terminal node, 01111 for instance.

If we use the same risk threshold of \(\tau = 2.75\) but with a stricter penalty \(\lambda = 2\), we obtain the following tree structre.

tre2 <- rcDT(data = dat, 
             split.var = 1:10,
             risk.threshold = 2.75,
             lambda = 2,
             efficacy = "y",
             risk = "r",
             col.trt = "trt",
             col.prtx = "prtx")
tre2$tree

##   node size n.1 n.0  trt.effect var vname cut.1 cut.2     score
## 1    0 1000 493 507 -0.06270569   2    x2     l 0.222 0.8998027
## 2   01  239 105 134  1.64864910   1    x1     l   0.7 1.0689384
## 5  011  178  81  97  3.17618787  NA  <NA>  <NA>  <NA>        NA
## 4  012   61  24  37 -2.94499308  NA  <NA>  <NA>  <NA>        NA
## 3   02  761 388 373 -0.60015177  NA  <NA>  <NA>  <NA>        NA

Note that the tree structures generated for different penalty values are very different. The estimated risk in the training set when \(\lambda = 1\) is 2.53 and when \(\lambda = 2\) the estimated risk is 2.47. Thus we see that selection of \(\lambda\) is important as both penalties controls risk below \(\tau = 2.75\), but setting \(\lambda = 2\) may over control risk and result a loss of potential benefit.

Pruning a Tree

The function prune() determines the series of optimally pruned subtrees from a large tree. A branch of the large tree is deemed the “weakest” branch if trimming the branch from the large tree results in the smallest decrease in objective function value. This is repeated until the null tree is reached. Below is the result from the pruning procedure for \(\tau = 2.75\) and \(\lambda = 1\). Each row corresponds to the sequence of pruning to be performed in the input tree. For instance, in this case we would first trim off descendant nodes of 0111 and declare 0111 a terminal node. The column V gives the sequence of objective function values for each pruning event.

pruned1 <- prune(tre = tre1, 
                 a = 0, 
                 risk.threshold = 2.75, 
                 lambda = 1)

##   subtree node.rm size.tree size.tmnl    alpha     V Benefit  Risk
## 1       1      01         5         3    0.231 0.923   0.747 2.574
## 2       2       0         3         2    0.383 0.693   0.555 2.612
## 3       3      NA         1         1 9999.000 0.310   0.031 2.471

The first row represents the entire tree (subtree 1) and provides the weakest link (node.rm), number of total nodes in the subtree (size.tree), number of terminal nodes in the subtree (size.tmnl), the \(\alpha\) parameter, the Lagrangian “value” for the subtree (V), the benefit in the training data (Benefit), and the risk (Risk). Here, the risk is controlled in all subtrees and the third subtree also gives the greatest benefit.

Cross Validation for Model Selection

One issue with the approach above for model selection is the risk of overfitting through using the training data alone for model selection. The function rcDT.select() will perform n-fold cross validation to select the optimal tuning parameter (\(\alpha\)). The function returns the optimal model, selected tuning parameter, and several summary measures.

rcDT.fit <- rcDT.select(data = dat, 
                        split.var = 1:10,
                        risk.threshold = 2.75, 
                        efficacy = "y",
                        risk = "r",
                        col.trt = "trt",
                        col.prtx = "prtx",
                        nfolds = 5,
                        verbose = FALSE)

Note that the optimal tree presented here is selected using 5-fold cross validation and is subtree 1 from the pruning procedure. The final model has a training set benefit estimated as 3.53 and risk estimated as 2.71.

Constructing a Risk Controlled Random Forest (rcRF) Model

A single tree which is trained using all available data may be overfitted, having poor predictive ability in an external dataset. Hence, we make a decision rule using a forest of rcDT predictors. Each rcDT model is constructed to be more variable, but the aggregation of the trees in the mitigates this variance. An rcRF is contructed using the function rcRF() and requires similar inputs to the rcDT() function. To randomized the growth of trees in the forest a subset of predictors, mtry, is selected as potential splitting variables at each split which defaults to the maximum of 1/3 the number of splitting variables and 1. By default the number of trees contructed, ntree, is 500. Each tree is grown using a bootstrap sample taken from the input dataset without replacement. The function returns the bootstrap samples used in tree construction, the trees, the model parameters, and several summaries for the in- and out-of-bag samples. We leverage the out-of-bag sample, i.e. observations not in the bootstrap sample, to obtain risk control estimates which are less biased than those obtained directly from training data. Hence, several values of \(\lambda\) may need to be used in order to final an optimized model for a given risk threshold \(\tau\). The final treatment recommendation is given by taking a majority vote from the individual models in the forest.

Below is the function call for constructing an rcRF model. In the next section we compare modeling results from rcDT and rcRF procedures graphically.

set.seed(2)
rcRF.fit <- rcRF.select(data = dat, 
                        split.var = 1:10, 
                        efficacy = "y", 
                        risk = "r",
                        col.trt = "trt",
                        col.prtx = "prtx",
                        risk.threshold = 2.75,
                        ntree = 100,
                        verbose = FALSE)

Predictions for rcDT and rcRF Models

Prediction from rcDT and rcRF models can be obtained using the predict.ITR() function. The output from predict.ITR() is a list of several summaries from the model. If the model from which predictions are desired is an rcDT model, then of interest is trt.pred which gives the treatment recommendation vector for the prediction data. If the model is an rcRF, then trt.pred given the treatment recommendation based on majority voting. Also, SummaryTreat gives the probability of being recommended to active treatment (\(a = 1\)) and tree.votes provides a matrix of votes from each tree.

preds.rcDT <- predict.ITR(fit = rcDT.fit$best.tree, 
                          new.data = dat, 
                          split.var = 1:10)$trt.pred
preds.rcRF <- predict.ITR(fit = rcRF.fit$best.fit, 
                          new.data = dat, 
                          split.var = 1:10)$trt.pred

Figure 2. Prediction Comparisons for rcDT and rcRF Models.

Variable Importance

Finally, we include a function to calculate the importance of a predictor relative to the total Lagrangian objective, the benefit scores, and the risk scores. This is accomplished by using the function Variable.Importance.ITR(). All three importance measures are calculated and returned. Here the sort argument is set to FALSE so that the importances are not individually ordered.

VI <- Variable.Importance.ITR(rcRF.fit = rcRF.fit$best.fit, 
                              sort = FALSE)
do.call(cbind, VI)

##              VI VI.Efficacy     VI.Risk
## x1  0.285692680 0.347036411 0.030529479
## x2  0.663776169 0.598648287 0.899783370
## x3  0.007945324 0.008587843 0.011958512
## x4  0.007106902 0.007387884 0.012183190
## x5  0.005588163 0.005883192 0.007575996
## x6  0.010194106 0.011382871 0.010402283
## x7  0.006541436 0.007380822 0.009161867
## x8  0.006534948 0.006326655 0.010181776
## x9  0.004240365 0.004529210 0.005312606
## x10 0.002379908 0.002836826 0.002910921

The variable importance results would be interpreted as \(X_1\) and \(X_2\) being important contributors to the overall rule (VI). Both \(X_1\) and \(X_2\) contribute similarly to efficacy (benefit) scores, but \(X_2\) is strongly predictive of risk scores, i.e. important in keeping the risk constrained.

References

[In Submission] Doubleday, K., Zhou, H., Fu, H., Zhou, J. (2020), “Risk Controlled Decision Trees and Random Forests for Precision Medicine”.