E Nested cross-validation

Nested cross-validation (CV) consists of two loops, an inner CV and an outer CV, which serve different purposes. The inner CV reproduces the traditional CV used for tuning model hyper-parameters, while the outer CV holds out one fold that takes no part in tuning or fitting and uses it to evaluate model performance. The aim of nested CV is to separate hyper-parameter tuning from model evaluation. Unlike traditional evaluation methods, nested CV assesses the modeling procedure rather than a single fitted model. In this step we therefore did not evaluate a model with one specific hyper-parameter setting; instead, we obtained a validated performance estimate for the traditional CV procedure. In the modeling process itself, traditional CV was used to find the optimal hyper-parameters, which were then used to fit the final models on the whole sample.

Figure E.1: Procedure of nested CV. (VAL: validation data sets)

Figure E.1 illustrates the process of nested cross-validation. For clarity of presentation, the figure shows three outer folds rather than the five used in this study, and the inner CV is simplified to four folds.

The process is described below:

Step 1: The data set is split into three folds; each fold serves as the outer validation set once.

Step 2: The two remaining folds (the training data set) are split into four folds, on which an inner CV is performed to tune the hyper-parameters.

Step 3: The optimal hyper-parameter setting is then used to fit the model on the two white folds (training data set) and to predict on the yellow outer fold (validation data set).

Step 4: This procedure is repeated three times, once for each yellow fold, and the predicted risks are then combined across the whole data set to generate a validated performance.
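The four steps can be sketched in plain Python to make the fold bookkeeping explicit. This is an illustrative sketch under stated assumptions, not the model or software used in this study: the k-nearest-neighbour risk predictor, the Brier score as the performance measure, the synthetic data, and all function names (`k_fold_indices`, `knn_risk`, `brier`, `tune_k`, `nested_cv`) are hypothetical choices made only so that the example runs with the standard library.

```python
import math
import random


def k_fold_indices(indices, n_folds, rng):
    """Shuffle the indices and deal them into n_folds disjoint folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def knn_risk(train, x_new, k):
    """Predicted risk: mean outcome of the k training rows nearest to x_new."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x_new))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def brier(pairs):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


def tune_k(data, idx, grid, n_folds, rng):
    """Traditional CV restricted to idx: return the k with the lowest Brier score."""
    folds = k_fold_indices(idx, n_folds, rng)
    scores = {}
    for k in grid:
        pairs = []
        for val in folds:
            held_out = set(val)
            train = [data[i] for i in idx if i not in held_out]
            pairs += [(knn_risk(train, data[i][0], k), data[i][1]) for i in val]
        scores[k] = brier(pairs)
    return min(grid, key=lambda k: scores[k])


def nested_cv(data, grid, n_outer=3, n_inner=4, seed=0):
    """Return the pooled Brier score, the k chosen per outer fold, and all predictions."""
    rng = random.Random(seed)
    all_idx = list(range(len(data)))
    predictions, chosen = {}, []
    # Step 1: each outer fold serves as the validation set exactly once.
    for outer_val in k_fold_indices(all_idx, n_outer, rng):
        held_out = set(outer_val)
        outer_train = [i for i in all_idx if i not in held_out]
        # Step 2: inner CV on the training folds only, to tune the hyper-parameter.
        best_k = tune_k(data, outer_train, grid, n_inner, rng)
        chosen.append(best_k)
        # Step 3: fit on all training folds with the tuned setting, predict the outer fold.
        train = [data[i] for i in outer_train]
        for i in outer_val:
            predictions[i] = knn_risk(train, data[i][0], best_k)
    # Step 4: pool the out-of-fold predictions over the whole data set and score them once.
    pooled = [(predictions[i], data[i][1]) for i in all_idx]
    return brier(pooled), chosen, predictions


# Synthetic example data (an assumption): one predictor x and a binary outcome y.
data_rng = random.Random(42)
data = []
for _ in range(120):
    x = data_rng.uniform(-3, 3)
    p = 1 / (1 + math.exp(-2 * x))
    data.append((x, 1 if data_rng.random() < p else 0))

score, chosen, predictions = nested_cv(data, grid=[1, 5, 15])
print("validated Brier score:", round(score, 3))
print("k selected in each outer fold:", chosen)
```

The selected hyper-parameter may differ between outer folds. This is expected, because the pooled score describes the whole procedure (tune by inner CV, then fit) rather than any one hyper-parameter setting.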