Development and validation of an MRI radiomics-based interpretable machine learning model for predicting the progression-free survival in locally advanced nasopharyngeal carcinoma
Introduction
Nasopharyngeal carcinoma (NPC) is a malignant tumor arising from the mucosal epithelium of the nasopharynx (1). Its geographical distribution is distinctive, with cases concentrated primarily in East Asia, Southeast Asia, and North Africa (2). At the time of initial diagnosis, over 70% of patients are diagnosed with locally advanced NPC (LANPC) (3). Some patients with early NPC attain complete clinical remission following radiotherapy (4). Nevertheless, approximately 30% to 40% of NPC patients exhibit distant metastasis or local recurrence, and their median overall survival is often less than 20 months (5). Several randomized clinical trials have demonstrated that concurrent chemoradiotherapy can further improve the progression-free survival (PFS) or overall survival rates in LANPC patients compared to radiotherapy alone (6-8). Given the tumor heterogeneity of NPC, not all patients benefit from adjuvant chemotherapy after concurrent chemoradiotherapy (9). Additionally, they may encounter significantly increased risks of toxic side effects (10). For patients experiencing treatment failure, the prognosis is often poor (11). Consequently, accurately predicting the PFS in LANPC patients is crucial for clinicians to formulate personalized treatment strategies.
Currently, magnetic resonance imaging (MRI) plays an important role in the restaging and prognostic assessment of NPC (12). T2-weighted imaging (T2WI) is preferentially used to evaluate NPC morphological and signal characteristics, such as uneven internal signals and irregular borders. Contrast-enhanced T1-weighted imaging (CET1WI) has the advantage of revealing nasopharyngeal anatomy. Incorporating T2WI and CET1WI can more accurately determine the extent of tumor invasion and growth activity and is, therefore, particularly valuable in assisting clinicians in distinguishing between tissue fibrosis and tumors that recur after radiotherapy (13,14). However, MRI-based visual evaluation may overlook crucial information that cannot be perceived by the human eye, leading to bias in assessing efficacy and prognosis; thus, a novel diagnostic method is required (15).
Radiomics is a high-throughput technology that extracts quantitative features from standard medical images and has been widely applied in diagnosing, predicting, and evaluating the prognosis of various diseases (16,17). Several studies have demonstrated the feasibility of using radiomics methods to predict PFS in LANPC patients (18-22). However, these studies were limited by small sample sizes (fewer than 200 patients) and lacked independent external validation cohorts, which posed challenges to the generalization performance of the models. Machine learning has been widely used in the field of radiomics and has shown strong predictive performance (19,23). Nevertheless, most previous studies primarily focused on applying machine learning techniques to assess the predictive value of radiomics for prognosis, without conducting interpretability analysis or feature-level image visualization to uncover the underlying logic and internal decision-making process of the models (24-26). Therefore, conducting multicenter interpretable radiomics analyses will be the focus of future research (27,28).
This study aimed to construct and validate an MRI radiomics-based interpretable machine learning model to accurately predict the PFS in LANPC patients, using SHapley Additive exPlanation (SHAP) and image visualization methods to assist clinicians in formulating personalized treatment strategies. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1860/rc).
Methods
Study design
The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committees of the Guangxi Medical University Cancer Hospital (No. KY-2022-303), Wuzhou Red Cross Hospital (No. LL-2022-59), and The Second Affiliated Hospital of Guangxi Medical University (No. KY-2022-0788). Informed consent was waived due to the retrospective nature of the study. This retrospective study enrolled 1,098 pathologically and clinically diagnosed LANPC patients from three hospitals between January 2015 and April 2020 who underwent pretreatment MRI with T2WI and CET1WI sequences. Patients from hospitals I (Guangxi Medical University Cancer Hospital) and II (Wuzhou Red Cross Hospital) were randomly allocated to the training and internal validation cohorts in a ratio of 7:3. Patients from hospital III (The Second Affiliated Hospital of Guangxi Medical University) constituted the external validation cohort. Patient recruitment and study design are shown in Figure 1. Details of the inclusion and exclusion criteria for LANPC patients are provided in Appendix 1.
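The 7:3 random allocation of patients from hospitals I and II can be sketched as follows. This is a minimal illustration that assumes a simple shuffled split; the function name `split_cohort`, the seed, and the placeholder patient IDs are hypothetical, and the source does not state whether the allocation was stratified.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=42):
    """Randomly split patient IDs into training and internal validation sets."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# Placeholder IDs standing in for the 1,000 patients from hospitals I and II
pooled_ids = [f"P{i:04d}" for i in range(1000)]
train_ids, internal_ids = split_cohort(pooled_ids)
print(len(train_ids), len(internal_ids))
```

With 1,000 pooled patients this yields cohorts of 700 and 300, matching the sizes reported in the Results.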

The clinicopathological data of the LANPC patients included sex, age, survival status, survival time, family history, body mass index, smoking history, T-stage, N-stage, WHO classification, white blood cell count, neutrophil count, platelet count, neutrophil to lymphocyte ratio, hemoglobin level, induction chemotherapy (IC), Epstein-Barr virus (EBV)-DNA, and albumin levels. For missing data, continuous variables were filled with the mean value, and categorical variables were filled with the mode value.
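The imputation rule in the preceding paragraph (mean for continuous variables, mode for categorical variables) can be illustrated with a short sketch. The helper `impute` and the toy values are hypothetical; the source does not state whether fill values were computed on the training cohort only, so this example simply uses all observed values.

```python
from statistics import mean, mode

def impute(values, kind):
    """Fill None entries with the mean (continuous) or the mode (categorical)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if kind == "continuous" else mode(observed)
    return [fill if v is None else v for v in values]

# Toy data: one continuous and one categorical factor, each with a missing entry
albumin = [41.0, None, 39.0, 43.0]
smoking = ["No", "No", None, "Yes"]

albumin_filled = impute(albumin, "continuous")
smoking_filled = impute(smoking, "categorical")
```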
Treatment and follow-up
All patients received three cycles of IC, followed by concurrent chemoradiotherapy. Additionally, patients underwent regular follow-up every 1–3 months for the first 2 years, every 6 months from years 3 to 5, and annually thereafter. The follow-up period lasted for at least 3 years. The prognostic outcome was PFS, defined as the duration from the initial therapy to imaging or pathological evidence of disease progression, or to censoring at the final follow-up.
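As an illustration of this endpoint definition, the sketch below converts dates into the (time, event) pair that survival models expect. The function name `pfs_record`, the example dates, and the days-to-months conversion factor are assumptions, not details taken from the study.

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # assumed conversion; the source does not specify one

def pfs_record(start, progression, last_follow_up):
    """Return (PFS in months, event flag) for one patient.

    The event flag is 1 if progression was documented and 0 if the
    patient was censored at the last follow-up.
    """
    if progression is not None:
        end, event = progression, 1
    else:
        end, event = last_follow_up, 0
    return (end - start).days / DAYS_PER_MONTH, event

# Hypothetical patients: one with documented progression, one censored
progressed = pfs_record(date(2016, 3, 1), date(2017, 9, 1), date(2020, 3, 1))
censored = pfs_record(date(2016, 3, 1), None, date(2020, 3, 1))
```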
Image preprocessing and tumor segmentation
Prior to antineoplastic therapy, all patients underwent MRI with T2WI and CET1WI sequences. The detailed magnetic resonance (MR) scanning protocols and acquisition parameters are provided in Appendix 2. To minimize center effects arising from the different hospitals and scanners, all original T2WI and CET1WI images were preprocessed. First, the N4 bias field correction technique was used to correct intensity inhomogeneity in the MR images (29). Subsequently, a linear interpolation algorithm was used to resample the images to a voxel spacing of 1×1×1 mm³ (x, y, z). Finally, image intensity normalization scaled the intensity values of all images to a standard range of 0 to 100, minimizing intensity differences between scanners.
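The final normalization step can be sketched as a linear rescaling of voxel intensities to the range 0 to 100. This assumes plain min-max scaling, which is one plausible reading of the text; the source does not give the exact formula, and the function name and toy intensities are hypothetical. Bias field correction and resampling are not shown.

```python
def rescale_intensity(voxels, new_min=0.0, new_max=100.0):
    """Linearly map voxel intensities onto [new_min, new_max]."""
    lo, hi = min(voxels), max(voxels)
    if hi == lo:
        # A constant image has no contrast to preserve
        return [new_min for _ in voxels]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in voxels]

# Toy intensities from a flattened image
scaled = rescale_intensity([120.0, 480.0, 300.0, 840.0])
```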
All MR images were retrieved from the picture archiving and communication system and exported to the ITK-SNAP software (version 3.8.0). Two radiologists (readers A and B), each with 8 years of experience in head and neck MRI diagnosis, independently and manually segmented the volumes of interest (VOIs) of the primary tumors on the T2WI and CET1WI images in ITK-SNAP, referring to the coronal and sagittal images and without knowledge of the tumor profiles in the original MR images (30). Unrelated regions such as air and bones were excluded through manual adjustment. All segmented VOIs were confirmed by an experienced radiologist (with 25 years of expertise in diagnosing head and neck MRI), and any inconsistencies were resolved through consensus discussion. All radiologists were blinded to patients’ clinicopathological information.
Radiomics feature extraction and standardization
The open-source Python package Pyradiomics was utilized to automatically extract features from the segmented VOIs in the MR images (31). For each sequence, a total of 944 radiomic features were extracted, including 18 first-order statistical, 14 shape-based, 75 texture, 93 squared transform, and 744 wavelet features. These features are described in the Pyradiomics documentation (https://pyradiomics.readthedocs.io/en/v3.1.0/).
Given the potential batch effects from different hospitals and scanners, the dataset from hospital II in the training cohort served as the reference cohort. The ComBat harmonization algorithm was then employed to pool the radiomics data from hospitals I and III, with the aim of eliminating the batch effect (32). Subsequently, the data were standardized using the z-score method to eliminate dimensional differences among the radiomics features. The overall radiomics workflow is illustrated in Figure 2.
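The z-score step can be sketched as follows. The example assumes that the mean and standard deviation are estimated on the training cohort and then reused for the validation cohorts, which is common practice but is not stated in the source; the function names and toy values are hypothetical, and ComBat harmonization is not shown.

```python
from statistics import mean, pstdev

def fit_zscore(train_values):
    """Estimate mean and (population) standard deviation on the training data."""
    mu, sigma = mean(train_values), pstdev(train_values)
    if sigma == 0:
        raise ValueError("constant feature cannot be standardized")
    return mu, sigma

def apply_zscore(values, mu, sigma):
    """Standardize values with previously fitted parameters."""
    return [(v - mu) / sigma for v in values]

# Toy values of a single radiomics feature
train_feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma = fit_zscore(train_feature)
validation_z = apply_zscore([5.0, 9.0], mu, sigma)
```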

Stability assessment of radiomics features
To evaluate the repeatability of VOI segmentation, MR images of 50 patients were randomly selected from the training cohort. Readers A and B independently segmented the images, and the inter-class correlation coefficient was calculated to examine inter-reader consistency. One month later, reader A re-segmented the VOIs of these 50 patients, and the intra-class correlation coefficient was computed to evaluate intra-reader consistency. Radiomics features with inter-class and intra-class correlation coefficients both exceeding 0.75 were considered to have good reproducibility and were retained for subsequent dimensionality reduction analyses.
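A reproducibility filter of this kind can be sketched with a two-way random-effects, absolute-agreement, single-rater coefficient, often written ICC(2,1). The source does not say which ICC form was used, so this choice is an assumption, as are the function name and the toy feature values.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings holds one row per subject and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)   # between-subject mean square
    ms_cols = ss_cols / (k - 1)   # between-rater mean square
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical values of one feature from reader A and reader B on five patients
feature_by_reader = [[10.1, 10.3], [12.0, 11.8], [8.5, 8.7], [15.2, 15.0], [9.9, 10.0]]
icc = icc_2_1(feature_by_reader)
keep_feature = icc > 0.75  # threshold stated in the text
```

The same function applies to the intra-reader check by treating the two segmentation sessions of reader A as the two columns.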
Application of eXtreme Gradient Boosting (XGBoost) in prognostic modeling
XGBoost is a relatively new ensemble learning algorithm that has been widely used in classification and regression tasks, but its application in prognostic analysis is relatively limited (33-35). By incorporating a Cox loss function into the XGBoost algorithm, we developed a prognostic machine learning model suitable for survival data to predict PFS in LANPC patients. Detailed information on XGBoost is presented in Appendix 3. During the model training phase, we fine-tuned the hyperparameters using grid search to optimize model performance. In addition, the Harrell concordance index (C-index) was used as the primary evaluation metric to assess model performance across the training cohort and multiple validation cohorts. The optimal hyperparameter configuration was determined according to the comprehensive performance of the model in the training, internal validation, and external validation cohorts.
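To make the evaluation metric concrete, the sketch below computes Harrell's C-index: among comparable patient pairs, the share in which the patient with the higher predicted risk progresses earlier. It is a simplified illustration (pairs with tied times are skipped, tied risks count as half), not the implementation used in the study, and the toy data are hypothetical.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when patient i had an event and
    times[i] < times[j]. Tied risk scores count as 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort: PFS in months, event flags, and predicted risk scores
times = [5, 10, 15, 20]
events = [1, 0, 1, 0]
c_perfect = harrell_c_index(times, events, [2.0, 1.5, 1.0, 0.2])
c_mixed = harrell_c_index(times, events, [1.0, 1.5, 2.0, 0.2])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the C-index values in the Results should be read.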
Construction of radiomics model
Rigorous radiomics feature selection was performed separately for the T2WI and CET1WI sequences. Initially, univariate Cox regression analysis was used to select radiomics features with statistical significance (P<0.05). The least absolute shrinkage and selection operator (LASSO) Cox stepwise regression algorithm was then applied to identify features highly correlated with PFS. LASSO minimizes a penalized objective function in which larger values of the regularization parameter λ shrink the feature coefficients further toward zero. A 10-fold cross-validation strategy was employed to identify the most predictive features with non-zero coefficients. Furthermore, the XGBoost algorithm with a Cox loss function was utilized to construct prognostic models based on the most predictive features derived from the T2WI sequence, the CET1WI sequence, and their combination. The prediction probability of the dual-sequence XGBoost model was regarded as the radiomics score (radscore).
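For reference, the LASSO-penalized Cox objective is usually written in the following standard form. The notation is an assumption introduced here rather than taken from the source: $x_i$ is the radiomics feature vector of patient $i$, $\delta_i$ the event indicator, $t_i$ the observed time, $R(t_i)$ the set of patients still at risk at $t_i$, and $\beta$ the coefficient vector; the scaling constant in front of the likelihood term varies between software packages.

```latex
\hat{\beta} = \arg\min_{\beta}
\left\{
  -\frac{1}{n} \sum_{i:\,\delta_i = 1}
  \left[
    x_i^{\top}\beta
    - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right)
  \right]
  + \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
\right\}
```

The first term is the negative Cox partial log-likelihood, and the second is the L1 penalty that drives some coefficients exactly to zero as λ increases, which is why features with non-zero coefficients at the cross-validated λ are the ones retained.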
Development and validation of clinicopathological and combined models
For all the clinicopathological factors mentioned above, univariate Cox regression analysis was first performed, and the significant factors (P<0.05) from the univariate analysis were included in the multivariate Cox regression analysis. Independent prognostic factors were defined as those that were significant (P<0.05) in both univariate and multivariate Cox regression analyses. Subsequently, the XGBoost algorithm with a Cox loss function was used to construct both the combined XGBoost model and the clinicopathological model by integrating independent prognostic factors with or without radscore, respectively. The hyperparameters of the combined XGBoost model were optimized using grid search, and the optimized hyperparameter configuration is shown in Table S1.
Interpretability analysis of prognostic models
The SHAP approach helps clinicians understand the internal predictive process of the XGBoost model by quantifying, both globally and locally, the average marginal contribution of each feature across all feature combinations (36). Specifically, the SHAP summary plot revealed the contribution and direction of key radiomics features in predicting PFS, while the SHAP waterfall plot elucidated the personalized contributions of radscore and key clinicopathological factors for different patients. Detailed information on SHAP is presented in Appendix 4. Image visualization technology enables pixel-level radiomics visualization of nasopharyngeal tumors. By selecting representative radiomics features from the SHAP summary plot, we performed pixel-level visualization within the maximum tumor layer. Visual feature heatmaps were drawn by normalizing and pseudo-colorizing the computed contribution of the feature value at each pixel point.
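The idea of an average marginal contribution over all feature combinations can be illustrated by computing exact Shapley values for a small toy model. This sketch is not the SHAP package's tree algorithm: it enumerates every feature subset, replaces absent features with a single baseline value (an assumption), and uses a hypothetical scoring function `toy_model`.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values of each feature for one prediction."""
    n = len(instance)

    def value(subset):
        # Features in the subset take the patient's value; others the baseline
        x = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(x):
    # Hypothetical risk function with one additive term and one interaction
    return 2 * x[0] + x[1] * x[2]

instance, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
```

The contributions sum to the difference between the prediction for the patient and the prediction for the baseline, which is the additive property that waterfall plots rely on.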
Statistical analysis
Categorical statistics are expressed as percentages (%), while quantitative statistics are represented as median (interquartile range) or mean ± standard deviation. The C-index, time-dependent receiver operating characteristic (ROC) curve, and calibration curve were utilized to assess the predictive performance of each prognostic model. Statistical differences in the C-index were tested using the compareC package. The optimal prognostic cutoff value in the training cohort was calculated using the minimum log-rank P value. Kaplan-Meier survival curves and log-rank tests were used to assess the prognostic differences between the different risk groups. The SHAP analysis was implemented through the SHAP package (https://github.com/slundberg/shap). A two-sided P value <0.05 was considered significant. All statistical analyses were performed using Python version 3.7.3 and R version 4.3.1.
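The minimum log-rank P value approach to choosing a cutoff can be sketched as follows: every candidate threshold splits patients into two groups, a two-group log-rank test is computed, and the threshold with the smallest P value is kept. The function names, the minimum group size, and the toy data are assumptions; the study's own implementation is not described in the source.

```python
from math import erfc, sqrt

def logrank_p(times, events, in_high):
    """Two-sided log-rank P value comparing high- and low-risk groups."""
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [k for k, tk in enumerate(times) if tk >= t]
        n = len(at_risk)
        n_high = sum(in_high[k] for k in at_risk)
        d = sum(1 for k in at_risk if times[k] == t and events[k])
        d_high = sum(1 for k in at_risk
                     if times[k] == t and events[k] and in_high[k])
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    return erfc(sqrt(chi2 / 2))  # chi-square survival function, 1 degree of freedom

def best_cutoff(times, events, scores, min_group=2):
    """Return (cutoff, P value) minimizing the log-rank P value."""
    best = None
    for c in sorted(set(scores)):
        high = [s > c for s in scores]
        if min(sum(high), len(high) - sum(high)) < min_group:
            continue  # skip splits that leave a group too small
        p = logrank_p(times, events, high)
        if best is None or p < best[1]:
            best = (c, p)
    return best

# Toy cohort: risk scores, PFS in months, event flags
scores = [0.1, 0.2, 0.3, 0.4, 2.0, 2.1, 2.2, 2.3]
times = [50, 52, 54, 56, 5, 6, 7, 8]
events = [0, 0, 0, 0, 1, 1, 1, 1]
cutoff, p_value = best_cutoff(times, events, scores)
```

Because many thresholds are tested, the selected P value is optimistic, which is one reason the cutoff is derived in the training cohort and then applied unchanged to the validation cohorts.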
Results
Characteristics of patients
The clinicopathological factors of the patients in the three cohorts are shown in Table 1. This study recruited 1,098 patients with pathologically and clinically diagnosed LANPC from three hospitals and assigned them to three independent cohorts. Patients from hospitals I and II were randomly allocated to the training (n=700) and internal validation (n=300) cohorts at a ratio of 7:3, and patients from hospital III constituted the external validation cohort (n=98). In the training, internal, and external validation cohorts, the median follow-up times for patients were 56.0 (43.0, 69.0), 55.0 (43.0, 67.3), and 42.0 (37.0, 45.7) months, respectively.
Table 1
Factors | Training cohort (n=700) | Internal validation cohort (n=300) | P value | External validation cohort (n=98) |
---|---|---|---|---|
IC | | | 0.308 | |
Response | 554 (79.1) | 228 (76.0) | | 84 (85.7) |
Non-response | 146 (20.9) | 72 (24.0) | | 14 (14.3) |
Sex | | | 0.506 | |
Female | 162 (23.1) | 76 (25.3) | | 21 (21.4) |
Male | 538 (76.9) | 224 (74.7) | | 77 (78.6) |
History | | | 0.805 | |
No | 642 (91.7) | 273 (91.0) | | 82 (83.7) |
Yes | 58 (8.3) | 27 (9.0) | | 16 (16.3) |
Smoke | | | 0.190 | |
No | 553 (79.0) | 225 (75.0) | | 59 (60.2) |
Yes | 147 (21.0) | 75 (25.0) | | 39 (39.8) |
T stage | | | 0.692 | |
T1 | 27 (3.9) | 8 (2.7) | | 0 (0.0) |
T2 | 168 (24.0) | 78 (26.0) | | 13 (13.3) |
T3 | 214 (30.6) | 95 (31.7) | | 45 (45.9) |
T4 | 291 (41.6) | 119 (39.7) | | 40 (40.8) |
N stage | | | 0.822 | |
N0 | 9 (1.3) | 3 (1.0) | | 2 (2.0) |
N1 | 102 (14.6) | 46 (15.3) | | 16 (16.3) |
N2 | 386 (55.1) | 157 (52.3) | | 39 (39.8) |
N3 | 203 (29.0) | 94 (31.3) | | 41 (41.8) |
WHO type | | | >0.999 | |
I–II | 31 (4.4) | 14 (4.7) | | 8 (8.2) |
III | 669 (95.6) | 286 (95.3) | | 90 (91.8) |
EBV-DNA | | | 0.793 | |
Negative | 328 (46.9) | 144 (48.0) | | 74 (75.5) |
Positive | 372 (53.1) | 156 (52.0) | | 24 (24.5) |
Progression-free follow-up time (months) | 56.0 (43.0, 69.0) | 55.0 (43.0, 67.3) | 0.769 | 42.0 (37.0, 45.7) |
Age (years) | 47.0 (38.0, 54.0) | 46.0 (38.0, 54.0) | 0.971 | 47.0 (37.0, 56.0) |
BMI (kg/m²) | 22.2 (20.2, 24.6) | 22.1 (20.1, 24.5) | 0.866 | 22.4 (20.1, 24.6) |
WBC (×10⁹/L) | 7.1 (5.9, 8.6) | 7.4 (5.9, 8.7) | 0.459 | 7.0 (5.8, 8.9) |
Hemoglobin (g/L) | 138.0 (126.0, 148.0) | 135.0 (126.0, 147.0) | 0.225 | 134.5 (122.0, 146.0) |
Platelet (×10⁹/L) | 266.0 (227.0, 317.0) | 273.5 (227.5, 324.5) | 0.226 | 269.0 (226.2, 318.0) |
Neutrophil (×10⁹/L) | 4.8 (3.7, 6.2) | 4.8 (3.6, 6.1) | 0.933 | 4.3 (3.7, 5.8) |
NLR (%) | 2.0 (1.5, 2.4) | 2.0 (1.6, 2.8) | 0.011 | 1.7 (1.2, 2.2) |
Albumin (g/L) | 41.1 (38.5, 44.0) | 41.1 (38.3, 44.1) | 0.784 | 41.1 (38.2, 43.5) |
Categorical factors are reported as number of patients (percentage); continuous factors are reported as median (IQR). BMI, body mass index; EBV, Epstein-Barr virus; IC, induction chemotherapy; IQR, interquartile range; LANPC, locally advanced nasopharyngeal carcinoma; NLR, neutrophil lymphocyte ratio; WBC, white blood cell; WHO, World Health Organization.
Construction and validation of prognostic prediction model
After data harmonization and feature selection (Figure S1), the nine most predictive features were identified from the T2WI sequence and ten from the CET1WI sequence (Tables S2-S4). The dual-sequence XGBoost model outperformed the two single-sequence models and the corresponding Cox model (DeLong test, P<0.05; Table S5), with C-index values of 0.743 [95% confidence interval (CI): 0.700–0.786], 0.663 (95% CI: 0.586–0.740), and 0.657 (95% CI: 0.485–0.829) in the training, internal, and external validation cohorts, respectively. Three independent prognostic factors were identified using Cox regression: IC (P<0.01), EBV-DNA (P<0.05), and albumin (P<0.001), as detailed in Table S6. A clinicopathological XGBoost model was constructed with C-index values of 0.607 (95% CI: 0.560–0.654), 0.646 (95% CI: 0.569–0.723), and 0.636 (95% CI: 0.474–0.798) in the three cohorts, respectively. Notably, the combined XGBoost model achieved a higher C-index than both the clinicopathological XGBoost model (DeLong test, P<0.001) and the dual-sequence XGBoost model (DeLong test, P=0.196; not statistically significant in the training cohort), with C-index values of 0.762 (95% CI: 0.720–0.804), 0.729 (95% CI: 0.662–0.796), and 0.752 (95% CI: 0.640–0.864) in the three cohorts, respectively (Table 2). Time-dependent ROC and calibration curves are shown in Figure 3A,3B, respectively.
Table 2
Model | Training cohort, C-index (95% CI) | Training cohort, P value | Internal validation cohort, C-index (95% CI) | Internal validation cohort, P value | External validation cohort, C-index (95% CI) | External validation cohort, P value |
---|---|---|---|---|---|---|
Clinicopathological XGBoost | 0.607 (0.560–0.654) | <0.001 | 0.646 (0.569–0.723) | 0.072 | 0.636 (0.474–0.798) | 0.243 |
Dual-sequence XGBoost | 0.743 (0.700–0.786) | 0.196 | 0.663 (0.586–0.740) | 0.026 | 0.657 (0.485–0.829) | 0.083 |
Combined XGBoost | 0.762 (0.720–0.804) | Ref. | 0.729 (0.662–0.796) | Ref. | 0.752 (0.640–0.864) | Ref. |
C-index, concordance index; CI, confidence interval; ref., reference model; XGBoost, eXtreme Gradient Boosting.

Risk stratification analysis of prognostic model
The risk cutoff threshold determined by the minimum log-rank P value method was 1.134, and all LANPC patients were stratified into low- and high-risk groups according to the risk cutoff threshold. Kaplan-Meier curves showed prognostic significance in the training [hazard ratio (HR) =2.276; P<0.0001], internal (HR =1.645; P=0.0025), and external validation cohorts (HR =1.905; P=0.0057) (Figure 3C). The median survival time, 3-, and 5-year survival rates of LANPC patients in the low- and high-risk groups from the training, internal, and external validation cohorts are presented in Table 3. Furthermore, prognostic subgroup analyses of IC, EBV-DNA, and tumor-node-metastasis (TNM) staging demonstrated that the model performed stably across different subgroups (all log-rank P<0.05; Figure S2).
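The stratification and survival-rate estimates described above can be sketched with a plain Kaplan-Meier estimator. The cutoff value comes from the text; everything else, including the toy patients, the function names, and the choice to treat scores strictly above the cutoff as high risk, is an assumption made for illustration.

```python
CUTOFF = 1.134  # risk threshold reported in the text

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for tk in times if tk >= t)
        failures = sum(1 for tk, ek in zip(times, events) if tk == t and ek)
        surv *= 1 - failures / at_risk
        curve.append((t, surv))
    return curve

def survival_at(curve, horizon):
    """Read the estimated survival probability at a given time horizon."""
    prob = 1.0
    for t, s in curve:
        if t <= horizon:
            prob = s
    return prob

# Hypothetical toy cohort: (model risk score, PFS in years, event flag)
patients = [(0.4, 5.0, 0), (0.9, 4.0, 0), (1.0, 3.0, 1), (0.7, 6.0, 0),
            (1.5, 1.0, 1), (2.0, 2.0, 1), (1.3, 4.5, 0), (1.8, 2.5, 1)]

groups = {"low": [], "high": []}
for score, time, event in patients:
    groups["high" if score > CUTOFF else "low"].append((time, event))

three_year = {}
for name, rows in groups.items():
    group_times, group_events = zip(*rows)
    three_year[name] = survival_at(kaplan_meier(group_times, group_events), 3.0)
```

In this toy cohort the 3-year estimate is 0.75 for the low-risk group and 0.25 for the high-risk group; the real estimates for each cohort are those listed in Table 3.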
Table 3
Cohort | Risk group | Number | Median time (95% CI), years | P value | 3-year survival (95% CI) | 5-year survival (95% CI) |
---|---|---|---|---|---|---|
Training cohort | Low | 622 | 4.833 (4.167–5.812) | Ref. | 0.912 (0.890–0.934) | 0.878 (0.852–0.904) |
Training cohort | High | 78 | 2.458 (1.333–4.500) | <0.0001 | 0.462 (0.351–0.573) | 0.308 (0.206–0.410) |
Internal validation cohort | Low | 272 | 4.583 (3.833–5.604) | Ref. | 0.890 (0.853–0.927) | 0.864 (0.823–0.905) |
Internal validation cohort | High | 28 | 4.292 (1.604–5.688) | 0.0025 | 0.714 (0.547–0.881) | 0.679 (0.506–0.852) |
External validation cohort | Low | 94 | 3.500 (3.104–3.812) | Ref. | 0.915 (0.859–0.971) | 0.894 (0.832–0.956) |
External validation cohort | High | 4 | 3.083 (2.396–3.479) | 0.0057 | 0.500 (0.010–0.990) | 0.500 (0.010–0.990) |
CI, confidence interval; ref., reference cohort.
SHAP explanation and feature visualization
As illustrated in Figure 4A, the SHAP summary plot showed the contribution of the top ten radiomics features to the output of the dual-sequence XGBoost model. Specifically, higher SHAP values were associated with poorer prognosis, while lower SHAP values indicated better prognosis. Among these features, the SHAP values of LargeAreaEmphasis [gray level size zone matrix (GLSZM), f1] and ZoneVariance (GLSZM, f2) exhibited a roughly positive relationship with their respective feature values. Therefore, f1 and f2 were identified as hazardous features for LANPC prognosis. In contrast, SmallAreaEmphasis (GLSZM, f3), ranked third in terms of contribution, displayed a roughly inverse relationship between its SHAP value and feature value, suggesting that f3 is a protective feature for LANPC prognosis. In the case analysis, the SHAP waterfall plot revealed that radscore was the largest contributor to the predictions of the combined XGBoost model. Additionally, the SHAP dependence scatter plot illustrated the relationship between feature values and their corresponding SHAP values (Figure S3).

Two LANPC patients with similar clinicopathological stages but different risk levels were selected for pixel-level radiomics visualization analysis. Compared with patient 1, patient 2 exhibited a significantly enhanced signal intensity in the largest layer of tumor on T2WI sequence images. Furthermore, the tumor signals in both T2WI and CET1WI sequence images of patient 2 demonstrated more significant irregularity and uneven distribution, indicating higher heterogeneity within the tumor region (Figure 4B). In contrast, the signal distribution within the tumor region of patient 1 was relatively uniform.
Discussion
In this study, we established and validated an interpretable machine learning model integrating MRI radiomics information and important clinicopathological factors to predict PFS in LANPC patients. The application of the SHAP algorithm elucidated the inference procedure of the model and the contribution of each predictor. Additionally, image visualization technology clarified the underlying reasons for differences in prognosis among cases by using pixel-level visual information, thereby providing valuable insights for medical decision support.
Previous studies have shown that radiomics analysis based on MRI or computed tomography (CT) images exhibits high accuracy in predicting PFS in NPC patients (18-22,37). Notably, the research conducted by de Oliveira et al. (38), which employed texture analysis through multi-slice spiral CT, further substantiates the significant clinical utility of radiomics features in tumor characterization and treatment decision-making processes. These findings provide a robust theoretical foundation for the development of the prognostic model presented in this study. However, these studies were limited by small sample sizes and a lack of multicenter data validation, which restricted the generalization performance of the models. In contrast, this study enrolled 1,098 LANPC patients and included an independent external validation cohort. Moreover, this study employed the XGBoost algorithm with a Cox loss function as a tool for prognostic analysis of LANPC, constructing and validating a prognostic model suitable for survival data. As a powerful machine learning algorithm, XGBoost with the Cox loss function is capable of capturing the complex relationships between variables and survival outcomes (33-35,39). Compared to the Cox regression analysis in this study, XGBoost demonstrated superior performance (C-index 0.743 vs. 0.682). Additionally, when compared to a clinical-radiomics model developed by previous researchers in multicenter cohorts (40), the XGBoost model in this study showed better performance in the external validation cohort (C-index 0.752 vs. 0.717). This improvement is likely attributable to the careful design of the image preprocessing, feature selection, and modeling strategies in this study.
Some studies have used 2-year PFS as the survival endpoint to construct prognostic models and successfully predict patients’ PFS (22), but the follow-up period was relatively short. In contrast, this study analyzed the 3- and 5-year survival rates of LANPC patients, providing longer follow-up data. Furthermore, the Kaplan-Meier survival analysis accurately classified patients into low- and high-risk groups, with the model’s performance remaining stable across all subgroups (all log-rank P<0.05), further validating the reliability of the XGBoost prognostic model.
Three independent prognostic factors were identified through Cox regression analysis, namely IC, EBV-DNA, and albumin, which is consistent with the findings of previous studies (41-43). It is worth noting that T- and N-stage were not independent prognostic factors in this study. There are two possible reasons. First, all the patients enrolled in this study had LANPC, so their clinical stages fell within a narrow range. Second, T1 and N0 disease accounted for only 3.9% and 1.3% of the training cohort, respectively (Table 1), and although clinical staging was effective, there was still a large bias. Considering the wide application of the TNM staging system in clinical research, this study still constructed a TNM staging model.
Although most previous studies have used machine learning algorithms to construct prognostic models, some of them did not thoroughly analyze the importance of features, which made the models difficult to interpret (24-26). While some studies employed traditional feature importance ranking methods to quantify the importance of features, these methods were unable to reveal the positive or negative contributions of features to patient prognosis (15,34). In contrast, this study introduced SHAP and image visualization techniques for interpretability analysis of the XGBoost model. The SHAP summary plot revealed the extent to which each radiomics feature contributed to the predictions of the dual-sequence XGBoost model. In this study, the LargeAreaEmphasis feature made the greatest contribution to the model’s prediction and was identified as a hazardous feature through univariate Cox regression analysis (HR =2.427; P=0.007). Previous studies have highlighted the role of the LargeAreaEmphasis feature in quantifying tumor heterogeneity and potentially characterizing the prognosis of local recurrence (44,45). In the case analysis, the SHAP waterfall plot revealed that radscore was the most important factor affecting the prognosis of LANPC patients, indicating that radiomics information based on MR images played an important role in the prognosis prediction of the combined XGBoost model. In addition, based on SHAP interpretability analysis, this study incorporated image visualization technology to achieve pixel-level radiomics visualization of tumor regions. Notably, the same radiomics features may exhibit different prognostic manifestations in patients with similar clinicopathological stages. For example, the visualization of the ZoneVariance feature showed that patients with poorer prognosis exhibited high signal intensity with uneven distribution, while patients with better prognosis displayed a uniform low signal distribution. This suggests that tumor heterogeneity may be one of the reasons for poor prognosis in LANPC patients. The SmallAreaEmphasis feature was identified as a prognostic protective feature (HR =0.462; P=0.031), which is consistent with previous studies (45). SHAP and image visualization technologies bridge the gap between machine learning and human understanding, revealing the underlying causes of prognostic differences among individuals, thereby enhancing clinician confidence in decision-making.
There are certain limitations in this study. First, manual segmentation of a large number of medical images inevitably introduces errors. Therefore, the future adoption of semi-automatic or automatic segmentation methods across multicenter imaging data is crucial to enhancing the generalizability of models. Second, there is currently no unified approach to data standardization in the field of radiomics. Although this study employed multiple methods to reduce batch effects across different hospitals and scanners, barriers may still exist between machine learning models and radiomics data. Third, the clinical application of radiomics models lacks sufficient biological explanations. Hence, exploring the biological significance of radiomics features will be a key step in enabling radiomics to independently assist in clinical diagnosis and prognosis prediction. Lastly, as this study had a multicenter retrospective design, there may be heterogeneity in demographic and clinical baseline characteristics among the cohorts. Therefore, future research should conduct larger-scale, multicenter prospective cohort validations to further assess the applicability and stability of the model across different clinical scenarios and populations.
Conclusions
In conclusion, we developed and validated an interpretable machine learning model integrating MRI information and clinicopathologic factors that accurately predicted the PFS in LANPC patients. SHAP and image visualization technology provided quantitative contribution values and image-level radiomics information, which could enhance the confidence of clinicians in decision-making.
Acknowledgments
We are grateful for the support of MediAI Hub, an advanced medical image analysis software developed and maintained by MediaLab, a non-commercial research group. Special thanks to the R&D team of MediAI Hub for their continuous efforts and innovations. We acknowledge the use of icons from Alibaba IconFont in this work, which greatly enhanced the visual presentation of our research.
Footnote
Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-1860/rc
Funding: This study was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1860/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committees of the Guangxi Medical University Cancer Hospital (No. KY-2022-303), Wuzhou Red Cross Hospital (No. LL-2022-59), and The Second Affiliated Hospital of Guangxi Medical University (No. KY-2022-0788). Informed consent was waived due to the retrospective nature of the study.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Hong S, Zhang Y, Yu G, Peng P, Peng J, Jia J, et al. Gemcitabine Plus Cisplatin Versus Fluorouracil Plus Cisplatin as First-Line Therapy for Recurrent or Metastatic Nasopharyngeal Carcinoma: Final Overall Survival Analysis of GEM20110714 Phase III Study. J Clin Oncol 2021;39:3273-82.
- Chua DT, Ma J, Sham JS, Mai HQ, Choy DT, Hong MH, Lu TX, Min HQ. Long-term survival after cisplatin-based induction chemotherapy and radiotherapy for nasopharyngeal carcinoma: a pooled data analysis of two phase III trials. J Clin Oncol 2005;23:1118-24.
- Ouyang FS, Guo BL, Zhang B, Dong YH, Zhang L, Mo XK, Huang WH, Zhang SX, Hu QG. Exploration and validation of radiomics signature as an independent prognostic biomarker in stage III-IVb nasopharyngeal carcinoma. Oncotarget 2017;8:74869-79.
- Chan ATC, Hui EP, Ngan RKC, Tung SY, Cheng ACK, Ng WT, Lee VHF, Ma BBY, Cheng HC, Wong FCS, Loong HHF, Tong M, Poon DMC, Ahuja AT, King AD, Wang K, Mo F, Zee BCY, Chan KCA, Lo YMD. Analysis of Plasma Epstein-Barr Virus DNA in Nasopharyngeal Cancer After Chemoradiation to Identify High-Risk Patients for Adjuvant Chemotherapy: A Randomized Controlled Trial. J Clin Oncol 2018; Epub ahead of print.
- Wang FH, Wei XL, Feng J, Li Q, Xu N, Hu XC, et al. Efficacy, Safety, and Correlative Biomarkers of Toripalimab in Previously Treated Recurrent or Metastatic Nasopharyngeal Carcinoma: A Phase II Clinical Trial (POLARIS-02). J Clin Oncol 2021;39:704-12.
- Lin JC, Jan JS, Hsu CY, Liang WM, Jiang RS, Wang WY. Phase III study of concurrent chemoradiotherapy versus radiotherapy alone for advanced nasopharyngeal carcinoma: positive effect on overall and progression-free survival. J Clin Oncol 2003;21:631-7.
- Kwong DL, Sham JS, Au GK, Chua DT, Kwong PW, Cheng AC, Wu PM, Law MW, Kwok CC, Yau CC, Wan KY, Chan RT, Choy DD. Concurrent and adjuvant chemotherapy for nasopharyngeal carcinoma: a factorial study. J Clin Oncol 2004;22:2643-53.
- Al-Sarraf M, LeBlanc M, Giri PG, Fu KK, Cooper J, Vuong T, Forastiere AA, Adams G, Sakr WA, Schuller DE, Ensley JF. Chemoradiotherapy versus radiotherapy in patients with advanced nasopharyngeal cancer: phase III randomized Intergroup study 0099. J Clin Oncol 1998;16:1310-7.
- Wee J, Tan EH, Tai BC, Wong HB, Leong SS, Tan T, Chua ET, Yang E, Lee KM, Fong KW, Tan HS, Lee KS, Loong S, Sethi V, Chua EJ, Machin D. Randomized trial of radiotherapy versus concurrent chemoradiotherapy followed by adjuvant chemotherapy in patients with American Joint Committee on Cancer/International Union against cancer stage III and IV nasopharyngeal cancer of the endemic variety. J Clin Oncol 2005;23:6730-8.
- Low WK, Toh ST, Wee J, Fook-Chong SM, Wang DY. Sensorineural hearing loss after radiotherapy and chemoradiotherapy: a single, blinded, randomized study. J Clin Oncol 2006;24:1904-9.
- León X, Hitt R, Constenla M, Rocca A, Stupp R, Kovács AF, Amellal N, Bessa EH, Bourhis J. A retrospective analysis of the outcome of patients with recurrent and/or metastatic squamous cell carcinoma of the head and neck refractory to a platinum-based chemotherapy. Clin Oncol (R Coll Radiol) 2005;17:418-24.
- King AD, Woo JKS, Ai QY, Chan JSM, Lam WKJ, Tse IOL, Bhatia KS, Zee BCY, Hui EP, Ma BBY, Chiu RWK, van Hasselt AC, Chan ATC, Lo YMD, Chan KCA. Complementary roles of MRI and endoscopic examination in the early detection of nasopharyngeal carcinoma. Ann Oncol 2019;30:977-82.
- Li S, Zhang W, Liang B, Huang W, Luo C, Zhu Y, Kou KI, Ruan G, Liu L, Zhang G, Li H. A Rulefit-based prognostic analysis using structured MRI report to select potential beneficiaries from induction chemotherapy in advanced nasopharyngeal carcinoma: A dual-centre study. Radiother Oncol 2023;189:109943.
- King AD, Ai QYH, Lam WKJ, Tse IOL, So TY, Wong LM, Tsang JYM, Leung HS, Zee BCY, Hui EP, Ma BBY, Vlantis AC, van Hasselt AC, Chan ATC, Woo JKS, Chan KCA. Early detection of nasopharyngeal carcinoma: performance of a short contrast-free screening magnetic resonance imaging. J Natl Cancer Inst 2024;116:665-72.
- Pei W, Wang C, Liao H, Chen X, Wei Y, Huang X, Liang X, Bao H, Su D, Jin G. MRI-based random survival Forest model improves prediction of progression-free survival to induction chemotherapy plus concurrent Chemoradiotherapy in Locoregionally Advanced nasopharyngeal carcinoma. BMC Cancer 2022;22:739.
- Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J, Sanduleanu S, Larue RTHM, Even AJG, Jochems A, van Wijk Y, Woodruff H, van Soest J, Lustberg T, Roelofs E, van Elmpt W, Dekker A, Mottaghy FM, Wildberger JE, Walsh S. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol 2017;14:749-62.
- Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, Zegers CM, Gillies R, Boellard R, Dekker A, Aerts HJ. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer 2012;48:441-6.
- Zhang B, Tian J, Dong D, Gu D, Dong Y, Zhang L, Lian Z, Liu J, Luo X, Pei S, Mo X, Huang W, Ouyang F, Guo B, Liang L, Chen W, Liang C, Zhang S. Radiomics Features of Multiparametric MRI as Novel Prognostic Factors in Advanced Nasopharyngeal Carcinoma. Clin Cancer Res 2017;23:4259-69.
- Bao D, Zhao Y, Li L, Lin M, Zhu Z, Yuan M, Zhong H, Xu H, Zhao X, Luo D. A MRI-based radiomics model predicting radiation-induced temporal lobe injury in nasopharyngeal carcinoma. Eur Radiol 2022;32:6910-21.
- Zeng F, Lin KR, Jin YB, Li HJ, Quan Q, Su JC, Chen K, Zhang J, Han C, Zhang GY. MRI-based radiomics models can improve prognosis prediction for nasopharyngeal carcinoma with neoadjuvant chemotherapy. Magn Reson Imaging 2022;88:108-15.
- Xu H, Lv W, Zhang H, Yuan Q, Wang Q, Wu Y, Lu L. Multimodality radiomics analysis based on [18F]FDG PET/CT imaging and multisequence MRI: application to nasopharyngeal carcinoma prognosis. Eur Radiol 2023;33:6677-88.
- Mao J, Fang J, Duan X, Yang Z, Cao M, Zhang F, Lu L, Zhang X, Wu X, Ding Y, Shen J. Predictive value of pretreatment MRI texture analysis in patients with primary nasopharyngeal carcinoma. Eur Radiol 2019;29:4105-13.
- Wei L, Osman S, Hatt M, El Naqa I. Machine learning for radiomics-based multimodality and multiparametric modeling. Q J Nucl Med Mol Imaging 2019;63:323-38.
- Hu Q, Wang G, Song X, Wan J, Li M, Zhang F, Chen Q, Cao X, Li S, Wang Y. Machine Learning Based on MRI DWI Radiomics Features for Prognostic Prediction in Nasopharyngeal Carcinoma. Cancers (Basel) 2022;14:3201.
- Gu B, Meng M, Xu M, Feng DD, Bi L, Kim J, Song S. Multi-task deep learning-based radiomic nomogram for prognostic prediction in locoregionally advanced nasopharyngeal carcinoma. Eur J Nucl Med Mol Imaging 2023;50:3996-4009.
- Zhong L, Dong D, Fang X, Zhang F, Zhang N, Zhang L, Fang M, Jiang W, Liang S, Li C, Liu Y, Zhao X, Cao R, Shan H, Hu Z, Ma J, Tang L, Tian J. A deep learning-based radiomic nomogram for prognosis and treatment decision in advanced nasopharyngeal carcinoma: A multicentre study. EBioMedicine 2021;70:103522.
- Rudin C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nat Mach Intell 2019;1:206-15.
- Severn C, Suresh K, Görg C, Choi YS, Jain R, Ghosh D. A Pipeline for the Implementation and Visualization of Explainable Machine Learning for Medical Imaging Using Radiomics Features. Sensors (Basel) 2022;22:5205.
- Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA, Gee JC. N4ITK: improved N3 bias correction. IEEE Trans Med Imaging 2010;29:1310-20.
- Yushkevich PA, Piven J, Hazlett HC, Smith RG, Ho S, Gee JC, Gerig G. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 2006;31:1116-28.
- van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, Beets-Tan RGH, Fillion-Robin JC, Pieper S, Aerts HJWL. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res 2017;77:e104-7.
- Orlhac F, Lecler A, Savatovski J, Goya-Outi J, Nioche C, Charbonneau F, Ayache N, Frouin F, Duron L, Buvat I. How can we combat multicenter variability in MR radiomics? Validation of a correction procedure. Eur Radiol 2021;31:2272-80.
- Dong B, Zhang H, Duan Y, Yao S, Chen Y, Zhang C. Development of a machine learning-based model to predict prognosis of alpha-fetoprotein-positive hepatocellular carcinoma. J Transl Med 2024;22:455.
- Li C, Liu M, Zhang Y, Wang Y, Li J, Sun S, Liu X, Wu H, Feng C, Yao P, Jia Y, Zhang Y, Wei X, Wu F, Du C, Zhao X, Zhang S, Qu J. Novel models by machine learning to predict prognosis of breast cancer brain metastases. J Transl Med 2023;21:404.
- Sheridan RP, Wang WM, Liaw A, Ma J, Gifford EM. Extreme Gradient Boosting as a Method for Quantitative Structure-Activity Relationships. J Chem Inf Model 2016;56:2353-60.
- Nohara Y, Matsumoto K, Soejima H, Nakashima N. Explanation of machine learning models using shapley additive explanation and application for real data in hospital. Comput Methods Programs Biomed 2022;214:106584.
- Yang H, Liang Z, Liang J, Cao D, Cao Q, Zhao F, Zhang W, Kou KI, Cui C, Liu L, Li H, Peng Z, Zhu S. A magnetic resonance imaging-based lymph node regression grading scheme for nasopharyngeal carcinoma after radiotherapy. Quant Imaging Med Surg 2024;14:5513-25.
- de Oliveira LAP, Lopes DLG, Gomes JPP, da Silveira RV, Nozaki DVA, Santos LF, Castellano G, de Castro Lopes SLP, Costa ALF. Enhanced Diagnostic Precision: Assessing Tumor Differentiation in Head and Neck Squamous Cell Carcinoma Using Multi-Slice Spiral CT Texture Analysis. J Clin Med 2024;13:4038.
- Clift AK, Dodwell D, Lord S, Petrou S, Brady M, Collins GS, Hippisley-Cox J. Development and internal-external validation of statistical and machine learning models for breast cancer prognostication: cohort study. BMJ 2023;381:e073800.
- Zhang B, Luo C, Zhang X, Hou J, Liu S, Gao M, Zhang L, Jin Z, Chen Q, Yu X, Zhang S. Integrative Scoring System for Survival Prediction in Patients With Locally Advanced Nasopharyngeal Carcinoma: A Retrospective Multicenter Study. JCO Clin Cancer Inform 2023;7:e2200015.
- Wu S, Yuan X, Huang H, Li Y, Cui L, Lin D, Lu W, Feng H, Chen Z, Liu X, Tan J, Wang F. Nomogram incorporating Epstein-Barr virus DNA and a novel immune-nutritional marker for survival prediction in nasopharyngeal carcinoma. BMC Cancer 2023;23:1217.
- Peng H, Chen L, Zhang Y, Li WF, Mao YP, Liu X, Zhang F, Guo R, Liu LZ, Tian L, Lin AH, Sun Y, Ma J. The Tumour Response to Induction Chemotherapy has Prognostic Value for Long-Term Survival Outcomes after Intensity-Modulated Radiation Therapy in Nasopharyngeal Carcinoma. Sci Rep 2016;6:24835.
- Zhao R, Liang Z, Chen K, Zhu X. Nomogram Based on Hemoglobin, Albumin, Lymphocyte and Platelet Score to Predict Overall Survival in Patients with T3-4N0-1 Nasopharyngeal Carcinoma. J Inflamm Res 2023;16:1995-2006.
- Takeda K, Takanami K, Shirata Y, Yamamoto T, Takahashi N, Ito K, Takase K, Jingu K. Clinical utility of texture analysis of 18F-FDG PET/CT in patients with Stage I lung cancer treated with stereotactic body radiotherapy. J Radiat Res 2017;58:862-9.
- Orhan K, Driesen L, Shujaat S, Jacobs R, Chai X. Development and Validation of a Magnetic Resonance Imaging-Based Machine Learning Model for TMJ Pathologies. Biomed Res Int 2021;2021:6656773.