Chapter 5 Conclusions
This thesis developed prediction models that estimate the risk of an individual childhood cancer survivor developing POI by prespecified ages. The problems of missing data, censoring, and overfitting were carefully addressed. The validated model performance confirmed that the developed models discriminate well between survivors who developed POI and those who did not, and that they provide accurate estimates of the absolute risk of developing POI before age 28. In this chapter, I summarize the contributions of this thesis (section 5.1), discuss its limitations (section 5.2), and make recommendations for future work (section 5.3).
5.1 Summary
In Chapter 2, the data from the CCSS were cleaned and explored. A new variable, age at event, was derived from ovarian status and other menstrual history information. To ensure the accuracy of the outcome variable, I discussed the derivation algorithm extensively with a pediatric endocrinologist.
The problem of missing data was addressed in Chapter 3 by employing multiple imputation. The details of the imputation, including selection of the imputation model, determination of the number of iterations, and post-processing, were described. Furthermore, special consideration was given to implementing multiple imputation and model performance evaluation properly together, so that information from the validation sets did not ‘leak’ into the training sets.
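As a minimal illustration of this leakage-avoidance principle, the sketch below fits an imputation model on the training fold only and then applies it, without refitting, to the validation fold. It uses scikit-learn's IterativeImputer as a single-imputation stand-in for the multiple imputation procedure of Chapter 3; the data and variable names are hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))             # hypothetical predictor matrix
X[rng.random(X.shape) < 0.1] = np.nan     # roughly 10% of the values set to missing

for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    imputer = IterativeImputer(random_state=0)
    X_train = imputer.fit_transform(X[train_idx])  # imputation model learned from the training fold only
    X_val = imputer.transform(X[val_idx])          # applied, never refit, to the validation fold
```

The same discipline extends to multiple imputation: each imputed training data set is generated without reference to the validation observations that will later be used to score the model.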
Chapter 3 also addressed the problem of censoring by assigning inverse probability of censoring weights (IPCW) to individuals. This method accounted for censored subjects by assigning weights greater than 1 to those whose ovarian status was observed at a prespecified age. Since the censoring process was associated with covariates, a random survival forest was used to estimate the probability of remaining uncensored. The competing risk was incorporated into the weight formula.
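For concreteness, one standard form of such a weight is shown below; the notation is illustrative rather than copied from Chapter 3, and the exact handling of the competing risk there may differ.

\[
\hat{w}_i \;=\; \frac{\Delta_i}{\hat{G}\!\left(\min(T_i,\, t^{*}) \mid X_i\right)},
\]

where \(t^{*}\) is the prespecified age, \(T_i\) is the age at POI or at the competing event, \(X_i\) are the covariates, \(\Delta_i\) indicates that the subject's status at \(\min(T_i, t^{*})\) was observed (so censored subjects receive weight 0), and \(\hat{G}(\cdot \mid X_i)\) is the covariate-conditional probability of remaining uncensored, estimated here with the random survival forest. Because \(\hat{G}\) is below 1, the observed subjects receive weights greater than 1, compensating for those who were censored.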
In Chapter 4, two modern machine learning algorithms, EN-ALR and XGBoost, were employed, and an “Ensemble” algorithm was used to combine the strengths of the two. The “random search” hyperparameter tuning strategy was used to find optimal hyperparameter settings. To avoid an over-optimistic assessment of model performance, nested CV was employed to give an honest evaluation. The “Ensemble” method achieved the best performance. Its good discrimination and calibration results (when the age threshold was below 28) suggested that the final models could be used to predict POI in new data.
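As a minimal sketch of how random search can be nested inside an outer CV loop so that hyperparameter tuning never sees the outer validation folds, consider the following; the data, estimator, and parameter ranges are hypothetical placeholders (scikit-learn's gradient boosting classifier stands in for XGBoost) rather than the settings used in Chapter 4.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold, RandomizedSearchCV, cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # placeholder data

param_distributions = {                      # illustrative search space, not the thesis grid
    "n_estimators": randint(100, 500),
    "learning_rate": uniform(0.01, 0.2),
    "max_depth": randint(2, 6),
}

inner = RandomizedSearchCV(                  # inner loop: random-search hyperparameter tuning
    GradientBoostingClassifier(random_state=0),
    param_distributions,
    n_iter=20,
    scoring="roc_auc",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)

outer_scores = cross_val_score(              # outer loop: honest estimate of generalization performance
    inner, X, y,
    scoring="roc_auc",
    cv=KFold(n_splits=5, shuffle=True, random_state=1),
)
print(outer_scores.mean(), outer_scores.std())
```

Each outer fold refits the entire tuning procedure from scratch, so the reported performance reflects how the tuned model would behave on data it has never influenced.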
5.2 Limitations
Approximately 16% of the female CCSS participants (Figure 2.2) did not complete a questionnaire containing the menstrual history section. In this research, we assumed that those who did not participate in the surveys had a similar pattern of POI to those who did. However, a risk of bias would arise if the reason for not participating was associated with menstrual status.
As a retrospective cohort study, the CCSS sent out surveys containing menstrual history sections approximately every seven years. This means many participants had to recall health conditions from many years earlier, introducing a risk of recall bias. In particular, when individuals reported the age at which they stopped menstruating, an event that may have occurred many years before the survey, an inaccurate age might be reported. This would affect the outcomes used to build the models.
The validity of the multiple imputation and IPCW methods relies on the assumption that data are missing at random and the assumption that the event and censoring processes are independent given the observed covariates. Although both assumptions are reasonable in this research, they cannot be verified because the missing predictors and censored outcomes are unobserved.
In terms of the final model performance, although the AUC and average precision (AP) indicated good performance, the calibration curves began to deviate from the diagonal line when the age threshold exceeded 28, implying that the estimated risk of developing POI by ages over 28 needs to be improved.
5.3 Future work
Future work would benefit from the release of more data from the CCSS, which might alleviate the problem of censoring and thereby improve long-term risk prediction. Model performance could also be improved by tuning a wider range of hyperparameter settings; for the XGBoost model in particular, many more hyperparameters could be tuned. Furthermore, within the “Ensemble” method, other machine learning algorithms such as neural networks, support vector machines, and random forests could also contribute to improving ensemble performance. Finally, although the model performance was carefully evaluated in this research, it may not reflect performance in other childhood cancer populations; an external validation study would be ideal in the future.