A three-classification machine learning model for non-invasive prediction of molecular subtypes in diffuse glioma: a two-center study
Introduction
Adult-type diffuse glioma is one of the most prevalent adult primary brain tumors and is characterized by high heterogeneity and poor prognosis (1). According to the 2021 World Health Organization classification of tumors of the central nervous system, 5th edition (WHO CNS 5), molecular characteristics constitute an essential part of the diagnostic process for gliomas and provide an integrated phenotypic and genotypic diagnosis (2). Distinguishing the three main molecular subtypes of diffuse gliomas, namely astrocytoma, isocitrate dehydrogenase (IDH)-mutant (IDHmut); oligodendroglioma, IDH-mutant and 1p/19q-codeleted (IDHmut-codel); and glioblastoma, IDH-wildtype (IDHwt), is of paramount importance, as each subtype exhibits distinct biological behavior, prognosis, and response to therapy (3,4). For instance, IDH-mutant gliomas respond better to alkylating agents, whereas 1p/19q-codeleted oligodendrogliomas exhibit prolonged survival with combined chemoradiotherapy (5,6), which underscores the necessity of an accurate molecular diagnosis for guiding treatment strategies and predicting outcomes. This genotype-based classification system not only enhances our understanding of the biological diversity of gliomas but also facilitates the development of personalized medicine and the discovery of novel therapeutic targets within these molecular subtypes (3,4,7,8). Despite these advances, current clinical workflows rely heavily on invasive tissue sampling by surgery or biopsy for molecular profiling. Such sampling carries the risk of postoperative complications and of sampling bias due to tumor heterogeneity, is impractical for inoperable patients, and delays treatment planning. New non-invasive and convenient biomarkers are therefore needed to classify molecular subtypes and thereby improve clinical treatment strategies and prognostic prediction.
Preoperative magnetic resonance imaging (MRI) evaluation plays a crucial role in the management of gliomas. In the past decade, machine learning (ML) and deep learning (DL) methods have emerged as promising tools for decoding glioma heterogeneity. Many studies have shown that integrating ML methods with MRI and clinical data can improve the non-invasive prediction of molecular markers of glioma (9-11). For example, in Wu et al.’s study, a hybrid model integrating age and tumor location features as well as DL features from MRI yielded improved performance [area under the receiver operating characteristic (ROC) curve (AUC) =0.878] for predicting the IDH mutation status of gliomas (12). Lu et al. used an ML model based on multimodal MRI radiomics to classify subtypes of gliomas, and demonstrated that the combination of MRI phenotypes and histological information could improve the classification of molecular subtypes (13). However, previous studies have primarily focused on binary classification of a single marker, either IDH mutation or 1p/19q co-deletion status. Although useful, such models do not fully capture the heterogeneity of gliomas and do not cover the updated WHO CNS 5 molecular typing. In addition, limited data and a lack of external validation or model explanation hinder clinical reliability (14). Specifically, the ability to differentiate among the three molecular subtypes (IDHmut, IDHmut-codel, and IDHwt) of adult-type diffuse glioma according to WHO CNS 5 criteria in a common clinical setting is still limited. The complexity of gliomas, particularly their molecular diversity, necessitates a more nuanced approach. This limitation underscores the need for a diagnostic tool that can accurately classify gliomas into these three distinct subtypes, which would hold greater clinical value and practicality by providing more precise prognostic information and guiding targeted therapy decisions.
Recently, DL neural networks based on the Transformer architecture have emerged as a powerful class of models and have achieved great success in glioma segmentation and classification (12,15-18). The Vision Transformer (ViT) applies the self-attention mechanism (the cornerstone of the Transformer model) to sequences of image patches, allowing the capture of global dependencies in image data, which is particularly useful in medical imaging where context is crucial for accurate diagnosis and analysis (19,20). In addition, the Swin Transformer, which introduces a hierarchical structure and a shifted-window approach to the Transformer architecture, has been shown to be effective in capturing long-range spatial dependencies within images, making it valuable for the analysis of medical images where regional details can be significant (21). Recently, Xu et al. developed a multitask framework based on the ViT to predict molecular expressions (including IDH, MGMT, Ki67, and P53) of glioma using MRI, and achieved high accuracy (AUCs =0.976–0.984), outperforming convolutional neural network (CNN)-based models (17). Notably, a recent study showed that the Swin Transformer outperformed the CNN-based ResNet model in predicting the IDH-mutation status of glioma (12). However, these studies did not directly address the WHO CNS 5 classification, which requires an integrated genotypic diagnosis (IDH and 1p/19q status) for diffuse glioma.
In light of this, and to further improve the prediction accuracy for the three molecular subtypes of adult-type diffuse glioma defined by the WHO CNS 5 criteria in clinical applications, our study aimed to develop a hybrid three-classification ML model combining demographic variables, conventional MRI (CM) features, and radiomics and Swin Transformer-based DL (RSTD) features. We employed the SHapley Additive exPlanations (SHAP) (22) and gradient-weighted class activation mapping (Grad-CAM) (23) methods to interpret the ML model. This approach aligns with calls for “explainable artificial intelligence (AI)” in neuro-oncology (24) and is intended to improve the interpretability of, and clinical trust in, the prediction model. We present this article in accordance with the TRIPOD+AI reporting checklist (25,26) (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2461/rc).
Methods
Study design
This study was registered at the Chinese Clinical Trials Registry (ChiCTR2400082552). The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Review Board of Huashan Hospital (No. KY2024-013), and the requirement for individual consent for this retrospective analysis was waived. The First Affiliated Hospital of Anhui Medical University was also informed of and approved the study. Consecutive patients with pathologically verified adult-type diffuse glioma at Huashan Hospital (Center 1) between January 2021 and July 2023 were retrospectively enrolled, constituting the primary cohort. The external validation cohort came from The First Affiliated Hospital of Anhui Medical University (Center 2) and covered January 2018 to July 2023. Patients from Center 1 were randomly divided into training (n=180) and internal validation (n=78) sets at a ratio of 7:3. The detailed flow diagram of the patient selection process is presented in Figure 1.
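The 7:3 random division described above can be sketched as follows. This is only an illustration: the patient identifiers, the seed, and the use of a floor when computing the training size are assumptions, since the source reports only the ratio and the resulting set sizes.

```python
import random


def split_train_validation(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs and split them into training and internal validation sets."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # assumption: the source does not report a seed
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # floor of 0.7 * n (assumed rounding rule)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers for the 258 Center 1 patients.
center1_ids = [f"C1-{i:03d}" for i in range(258)]
train_ids, internal_ids = split_train_validation(center1_ids)
print(len(train_ids), len(internal_ids))
```

With 258 patients, flooring 0.7 × 258 reproduces the reported set sizes of 180 and 78.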

MRI acquisition protocols
In Center 1, MRI examinations involved four 3.0-T magnetic resonance (MR) scanners (Discovery MR750 and MR750w, GE Healthcare, Chicago, IL, USA; Verio and Prisma, Siemens Healthcare, Erlangen, Germany). In Center 2, two 3.0-T MR scanners (Signa HDxt and Discovery MR750w, GE Healthcare) and one 1.5-T MR scanner (Ingenia, Philips Medical Systems, Amsterdam, Netherlands) were used. Details of the MRI protocol are presented in Tables S1,S2. Two MRI sequences including T2-fluid-attenuated inversion recovery (T2-FLAIR) and three-dimensional T1-weighted contrast-enhanced (3D T1C) were acquired and analyzed in this study (17,27).
Histopathological characteristics
Tumor grade was retrospectively classified based on pathological and molecular detection reports according to the 2021 WHO CNS 5 classification criteria (2). The final diagnosis was made by integrating histopathological and genetic detection results. IDH1/2 mutation was tested using Sanger sequencing for codon 132 of IDH1 and codons 140 and 172 of IDH2. The chromosome 1p/19q status was determined using fluorescence in situ hybridization (FISH).
Image preprocessing and tumor segmentation
To improve the reproducibility of radiomics analysis, the N4 bias field correction and z normalization of all MR images were conducted by using PyRadiomics (28). The T2-FLAIR images were registered to T1C images, with linear interpolation being used. A radiologist (M.Z., with 5 years of experience in radiology) manually segmented tumors of all cases on the axial 3D T1C images by using the open-source ITK-SNAP software (version 3.8.0) (http://www.itksnap.org). For each case, the entire tumor region, comprising contrast-enhancing, non-enhancing, and necrotic areas, was manually delineated on each slice [T1-weighted (T1W), T2-weighted (T2W), FLAIR, diffusion-weighted imaging (DWI), and apparent diffusion coefficient (ADC) images were referenced], and a volume of interest (VOI) was then generated. Subsequently, another senior radiologist (HF, with 14 years of experience in radiology) individually re-checked and validated the VOIs to improve the reproducibility and accuracy. Both radiologists were blinded to the clinical, molecular, and pathological data of patients. The modeling workflow is shown in Figure 2.

CM features assessment and selection
The Visually Accessible Rembrandt Images (VASARI) features [defined on clinical MRI including T1W image (T1WI), T2W image (T2WI), T2-FLAIR, 3D T1C, and DWI sequences] (29), which have been demonstrated to be meaningful and reproducible non-invasive MRI biomarkers of glioma molecular profile and survival (30), were assessed by the two radiologists (M.Z. and F.H.). Disagreements were resolved by a third senior neuroradiologist (J.Z.; with 30 years of experience in radiology). The details of the VASARI features are listed in Table S3. All three radiologists were blinded to the clinical, molecular, and pathological data of patients. Variance analysis and least absolute shrinkage and selection operator (LASSO) regression were used to select VASARI features associated with molecular subtypes. A threshold of P<0.05 was used for the variance analysis, and 10-fold cross-validation was used to find the optimal regularization parameter for LASSO.
Radiomic features extraction and selection
The open-source “PyRadiomics” package (28) was used to extract radiomics features from T2-FLAIR and 3D T1C images. Grayscale discretization employed a fixed bin width of 25 to ensure consistency and reproducibility of radiomic features, and images were resampled to a uniform voxel size of 1×1×1 mm³. Radiomics features were extracted from the VOI for each patient and comprised shape, first-order, and texture features, the latter including gray-level co-occurrence matrix, gray-level run-length matrix, gray-level size zone matrix, gray-level dependence matrix, and neighboring gray tone difference matrix features. A total of 386 radiomic features were extracted for each patient. Volume and shape features were extracted only once, from the original T1C image. The z-score method was used to normalize the features using parameters estimated from the training set, and the same normalization parameters were applied to the internal and external validation sets.
Feature selection was performed in the training cohort. The variance analysis and LASSO regression with 10-fold cross-validation were employed for radiomic features selection.
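The train-only normalization described above, in which z-score parameters come from the training set and are reused unchanged for the validation sets, can be sketched as follows. The feature values are toy numbers, and the use of the population standard deviation is an assumption, as the source does not state which estimator was used.

```python
from statistics import mean, pstdev


def fit_zscore(train_rows):
    """Estimate per-feature (mean, std) from the training rows only."""
    params = []
    for column in zip(*train_rows):
        mu = mean(column)
        sigma = pstdev(column)  # assumption: population standard deviation
        params.append((mu, sigma if sigma > 0 else 1.0))  # guard constant features
    return params


def apply_zscore(rows, params):
    """Normalize any cohort with parameters that were fitted on the training set."""
    return [[(x - mu) / sigma for x, (mu, sigma) in zip(row, params)] for row in rows]


# Toy feature matrices (rows = patients, columns = features); values are illustrative.
train = [[10.0, 200.0], [12.0, 220.0], [14.0, 240.0]]
external = [[11.0, 260.0]]

params = fit_zscore(train)
train_z = apply_zscore(train, params)
external_z = apply_zscore(external, params)  # no statistics are taken from this cohort
```

Reusing the training parameters keeps information from the validation cohorts out of preprocessing, so a validation feature can legitimately fall outside the range seen in training.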
Swin Transformer network development and features extraction
The T1C and T2-FLAIR sequences were processed separately as single-channel inputs using the Swin Transformer architecture, and the model was trained from scratch on the training set. The Swin Transformer architecture uses a hierarchical design containing four stages (21), summarized in Figure 3. In stage 1, the input region of interest (ROI) was converted into slices, which were input in batches (matrix 224×224) and subdivided by a patch partition layer into non-overlapping 4×4 patches; the partitioned patches were then processed by a linear embedding layer that projected their feature dimension to C, followed by Swin Transformer blocks. Stage 2 performed downsampling via a patch merging layer, which combined each group of adjacent 2×2 patches into one. As the network deepened, the Swin Transformer blocks extracted hierarchical representations akin to those in CNNs. This four-stage process generated the final representation. A global average pooling layer was applied to the output feature map of the last stage, and a linear classifier then output the prediction features.
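The feature-map sizes implied by this four-stage design can be worked out with a few lines of arithmetic. The sketch below assumes the standard Swin Transformer behavior, in which each patch merging step halves the token grid per side and doubles the channel dimension. The embedding dimension of 96 (the Swin-T value in the original Swin Transformer paper) is an assumption, because the source does not report C.

```python
def swin_stage_shapes(image_size=224, patch_size=4, embed_dim=96, num_stages=4):
    """Return (stage, tokens_per_side, channels) for a Swin-style hierarchy."""
    side = image_size // patch_size  # 224 / 4 = 56 tokens per side after patch partition
    channels = embed_dim             # linear embedding projects each patch to C channels
    shapes = []
    for stage in range(1, num_stages + 1):
        if stage > 1:
            side //= 2     # patch merging fuses each 2x2 group of neighbouring patches
            channels *= 2  # and doubles the channel dimension
        shapes.append((stage, side, channels))
    return shapes


for stage, side, channels in swin_stage_shapes():
    print(f"stage {stage}: {side}x{side} tokens, {channels} channels")
```

Global average pooling over the last stage then yields one feature vector whose length equals the final channel count (8C under these assumptions).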

The Swin Transformer model was trained for 200 epochs (by which point training and testing performance had plateaued and training iterations were sufficient) with a batch size of 32. Variance analysis and LASSO regression were also used to select the Swin Transformer DL features associated with the molecular subtypes of diffuse glioma, and 10-fold cross-validation was used to find the optimal regularization parameter for LASSO.
ML implementation and model construction
To identify the best-performing model for our three-classification task, we evaluated six ML classifiers: k-nearest neighbor (kNN), light gradient-boosting machine (LightGBM), random forest (RF), support vector machine (SVM), stochastic gradient descent (SGD), and extreme gradient boosting (XGBoost). These classifiers cover a wide range of methodological properties, including ensemble learning (XGBoost and LightGBM), tree-based models (RF), linear models (SGD), kernel methods (SVM), and distance-based models (kNN), and have been widely adopted in medical image analysis of glioma (31). ML models were implemented using Python 3.11 (https://www.python.org). Further, 10-fold cross-validation was used during model training to reduce overfitting. For each ML classifier, hyperparameters were tuned using cross-validated diagnostic performance as the primary metric; the classifier with the highest mean accuracy was selected for final model construction. The internal and external validation sets were not used for parameter tuning or model training; the internal validation set was reserved solely for final model evaluation, whereas the external validation set was used to test model generalizability. The hyperparameter optimization of the ML algorithms is presented in Table S4. Accuracy, precision, recall, and F1-score were calculated to evaluate the classification performance of the models.
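For a three-class task, the four reported metrics can be computed from the true and predicted labels as sketched below. Macro-averaging of precision, recall, and F1-score across the three subtypes is an assumption, since the source does not state the averaging scheme, and the labels in the example are toy data rather than study results.

```python
def three_class_metrics(y_true, y_pred, labels):
    """Return accuracy and macro-averaged precision, recall, and F1-score."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0  # 0 when the class is never predicted
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1))
    # Macro average: unweighted mean over classes (assumed averaging scheme).
    macro = tuple(sum(m[i] for m in per_class) / len(labels) for i in range(3))
    return accuracy, macro


labels = ["IDHmut", "IDHmut-codel", "IDHwt"]
y_true = ["IDHwt", "IDHwt", "IDHwt", "IDHmut", "IDHmut", "IDHmut-codel"]  # toy labels
y_pred = ["IDHwt", "IDHwt", "IDHmut", "IDHmut", "IDHmut", "IDHmut"]
accuracy, (macro_precision, macro_recall, macro_f1) = three_class_metrics(y_true, y_pred, labels)
```

In this toy case the minority class is never predicted, so accuracy (4/6) stays well above macro recall (about 0.56), which illustrates why macro-averaged metrics are informative when subtypes are imbalanced.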
Three types of prediction models, namely the CM model, the RSTD model, and the combined model (combining CM features, RSTD features, age, and gender), were each trained using the above six ML classifiers. The best-performing ML classifier was then chosen as the algorithm for final model construction and validated in the external validation set. To explain the ML model, the SHAP method was used to interpret feature contributions (22). In addition, Grad-CAM (23) maps were generated to visualize and interpret the Swin Transformer model.
Statistical analysis
Statistical analyses were performed using Python 3.11 (https://www.python.org) or RStudio software (R-4.2.3; R Foundation for Statistical Computing, Vienna, Austria). One-way analysis of variance (ANOVA) or Kruskal-Wallis tests were used to compare continuous variables in the training, internal, and external validation datasets using SPSS 26.0 (IBM Corp., Armonk, NY, USA), whereas the Chi-squared test was used to compare categorical variables. All statistical analyses were two-sided, with statistical significance set at P<0.05. The overall performance of the three-classification model was evaluated by ROC curve analysis and the calculation of micro- and macro-AUC (32). Moreover, accuracy, sensitivity, and specificity were calculated to measure the classification performance of models. The “roc.test” function of the “pROC” package was used to perform the DeLong test to compare ROC curves and calculate the corresponding P values.
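The difference between the micro- and macro-AUC can be illustrated with a small pure-Python sketch. It assumes the common one-vs-rest convention: the macro-AUC is the unweighted mean of the three per-class AUCs, and the micro-AUC pools every (patient, class) indicator and score pair into a single binary problem. The class encoding and the probabilities are hypothetical.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def micro_macro_auc(y_true, proba, n_classes=3):
    """One-vs-rest micro- and macro-averaged AUC for a multi-class classifier."""
    onehot = [[int(t == c) for c in range(n_classes)] for t in y_true]
    per_class = [
        binary_auc([row[c] for row in onehot], [p[c] for p in proba])
        for c in range(n_classes)
    ]
    macro = sum(per_class) / n_classes
    micro = binary_auc([v for row in onehot for v in row], [v for p in proba for v in p])
    return micro, macro


# Hypothetical encoding: 0 = IDHmut, 1 = IDHmut-codel, 2 = IDHwt; toy probabilities.
y_true = [0, 0, 1, 2]
proba = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
micro_auc, macro_auc = micro_macro_auc(y_true, proba)
print(micro_auc, macro_auc)  # 0.96875 1.0
```

Here every class is perfectly ranked on its own (macro-AUC of 1.0), yet the pooled ranking contains one misordered pair, so the micro-AUC is slightly lower. The two averages can therefore differ even on the same predictions.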
Results
Patient demographics
A total of 306 patients, 258 from Center 1 and 48 from Center 2, were finally included (Figure 1). The baseline characteristics of the cohorts are shown in Table 1. The whole cohort included 116 females and 190 males, with a mean age of 50.6 years (range, 23–82 years); among them, 78 (25.5%) patients had IDHmut, 45 (14.7%) had IDHmut-codel, and 183 (59.8%) had IDHwt. There were no significant differences in patients’ baseline characteristics among the training, internal validation, and external validation cohorts (all P>0.05).
Table 1
Characteristics | Whole cohort (n=306) | Training cohort (n=180) | Internal validation cohort (n=78) | External validation cohort (n=48) | P value† |
---|---|---|---|---|---|
Age (years) | 51±14 | 50±15 | 50±14 | 54±13 | 0.143 |
Gender (female) | 116 (37.9) | 65 (36.1) | 35 (44.9) | 16 (33.3) | 0.322 |
Molecular subtype | | | | | 0.765 |
IDHmut | 78 (25.5) | 43 (23.9) | 23 (29.5) | 12 (25) | |
IDHmut-codel | 45 (14.7) | 28 (15.6) | 10 (12.8) | 7 (14.6) | |
IDHwt | 183 (59.8) | 109 (60.6) | 45 (57.7) | 29 (60.4) | |
2021 WHO grade | | | | | 0.748 |
Grade 2 | 68 (22.2) | 36 (20) | 19 (24.4) | 12 (25) | |
Grade 3 | 51 (16.7) | 33 (18.3) | 14 (17.9) | 4 (8.3) | |
Grade 4 | 187 (61.1) | 111 (61.7) | 45 (57.7) | 32 (66.7) | |
Tumor location | | | | | 0.513 |
Frontal | 138 (45.1) | 77 (42.8) | 40 (51.3) | 21 (43.8) | |
Temporal | 63 (20.6) | 40 (22.2) | 11 (14.1) | 12 (25) | |
Insular | 27 (8.8) | 15 (8.3) | 8 (10.3) | 4 (8.3) | |
Parietal | 34 (11.1) | 19 (10.6) | 9 (11.5) | 6 (12.5) | |
Occipital | 15 (4.9) | 6 (3.3) | 5 (6.4) | 4 (8.3) | |
Others (thalamus/corpus callosum/brainstem/cerebellum/basal ganglia) | 29 (9.5) | 23 (12.8) | 5 (6.4) | 1 (2.1) | |
†, comparisons among the training, internal validation, and external validation cohorts; P values were calculated using the Chi-squared test or Fisher’s exact test for categorical variables, and one-way ANOVA or the Kruskal-Wallis test for continuous variables according to normality. Data are presented as mean ± SD or n (%). ANOVA, analysis of variance; IDH, isocitrate dehydrogenase; IDHmut, astrocytoma, IDH-mutant; IDHmut-codel, oligodendroglioma, IDH-mutant and 1p/19q-codeleted; IDHwt, glioblastoma, IDH-wildtype; SD, standard deviation; WHO, World Health Organization.
Feature selection
For each patient, a total of 20 CM features were assessed by the two radiologists, and a total of 386 radiomics and 1,024 Swin Transformer DL features were extracted from the T1C and T2-FLAIR images. After the variance analysis and 10-fold cross-validation LASSO analysis, six CM features were selected for constructing the CM model, and 28 RSTD features (including 19 radiomics features and 9 Swin Transformer DL features) were selected for the RSTD model, the details of which are shown in Figure S1 and Table S5. These selected features were then used to develop a model using different ML algorithms. The combined model was constructed using all of the 36 selected features (6 CM features, 28 RSTD features, as well as age and gender).
Model performance
The model performance in the training and internal validation sets is presented in Table 2. For the three-classification task of molecular subtypes of gliomas in the internal validation set, the CM model reached an average accuracy of 0.792, 0.729, 0.708, 0.705, 0.750, and 0.771 for kNN, LightGBM, RF, SVM, SGD, and XGBoost, respectively; the RSTD model reached an average accuracy of 0.750, 0.791, 0.769, 0.771, 0.756, and 0.813, respectively; and the combined model reached an average accuracy of 0.781, 0.822, 0.809, 0.787, 0.796, and 0.853, respectively. XGBoost showed improvements in most metrics compared with the other ML classifiers, and Wilcoxon tests indicated that these improvements were significant. XGBoost can effectively integrate heterogeneous features by modeling complex, non-linear relationships through gradient-boosted trees. It also supports class-weighted loss functions and can be trained on synthetic minority over-sampling technique (SMOTE)-augmented samples, which helps to balance gradient updates across minority subtypes.
Table 2
Dataset | ML model | CM model: Accuracy | CM model: Precision | CM model: Recall | CM model: F1-score | RSTD model: Accuracy | RSTD model: Precision | RSTD model: Recall | RSTD model: F1-score | Combined model: Accuracy | Combined model: Precision | Combined model: Recall | Combined model: F1-score |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Train | kNN | 0.817 | 0.731 | 0.716 | 0.717 | 0.844 | 0.797 | 0.785 | 0.790 | 0.844 | 0.790 | 0.793 | 0.790 |
Train | LightGBM | 0.744 | 0.618 | 0.588 | 0.572 | 0.883 | 0.882 | 0.877 | 0.879 | 0.933 | 0.942 | 0.901 | 0.918 |
Train | RF | 0.716 | 0.443 | 0.537 | 0.482 | 0.867 | 0.859 | 0.850 | 0.854 | 0.939 | 0.935 | 0.922 | 0.928 |
Train | SVM | 0.750 | 0.471 | 0.578 | 0.513 | 0.844 | 0.797 | 0.785 | 0.790 | 0.933 | 0.919 | 0.888 | 0.902 |
Train | SGD | 0.717 | 0.446 | 0.537 | 0.484 | 0.822 | 0.755 | 0.755 | 0.755 | 0.839 | 0.787 | 0.764 | 0.774 |
Train | XGBoost | 0.811 | 0.731 | 0.691 | 0.706 | 0.894 | 0.877 | 0.866 | 0.871 | 0.944 | 0.956 | 0.907 | 0.928 |
Validation | kNN | 0.792 | 0.679 | 0.675 | 0.675 | 0.750 | 0.630 | 0.584 | 0.598 | 0.781 | 0.691 | 0.635 | 0.664 |
Validation | LightGBM | 0.729 | 0.544 | 0.552 | 0.544 | 0.791 | 0.689 | 0.692 | 0.684 | 0.822 | 0.758 | 0.754 | 0.755 |
Validation | RF | 0.708 | 0.427 | 0.505 | 0.462 | 0.769 | 0.630 | 0.624 | 0.620 | 0.809 | 0.730 | 0.674 | 0.690 |
Validation | SVM | 0.705 | 0.458 | 0.535 | 0.487 | 0.771 | 0.817 | 0.592 | 0.632 | 0.787 | 0.817 | 0.602 | 0.632 |
Validation | SGD | 0.750 | 0.467 | 0.560 | 0.505 | 0.756 | 0.667 | 0.628 | 0.639 | 0.796 | 0.702 | 0.657 | 0.668 |
Validation | XGBoost | 0.771 | 0.486 | 0.588 | 0.525 | 0.813 | 0.800 | 0.718 | 0.751 | 0.853 | 0.820 | 0.728 | 0.751 |
All ML algorithms have used the optimal hyperparameters. CM, conventional MRI; kNN, k-nearest neighbor; LightGBM, light gradient-boosting machine; ML, machine learning; MRI, magnetic resonance imaging; RF, random forest; RSTD, radiomics and Swin Transformer-based deep learning; SGD, stochastic gradient descent; SVM, support vector machine; XGBoost, extreme gradient boosting.
Validation results
Among the six ML classifiers, we chose XGBoost as our algorithm for model construction due to its superior performance in the training and internal validation cohorts. The ROC curves of the three model types using XGBoost are presented in Figure 4. In the internal validation set, the micro-AUCs of the CM model, RSTD model, and combined model were 0.822, 0.874, and 0.905, respectively; and the macro-AUCs were 0.788, 0.846, and 0.878, respectively. In the external validation set, the micro-AUCs of the CM model, RSTD model, and combined model were 0.733, 0.844, and 0.911, respectively, and the macro-AUCs were 0.689, 0.818, and 0.891, respectively. Additionally, the AUCs, accuracies, sensitivities, and specificities of the three models in predicting IDHmut, IDHmut-codel, and IDHwt were evaluated, respectively (Table 3).

Table 3
Models | Molecular subtypes | Training set (n=180): AUC | Training set: ACC | Training set: SEN | Training set: SPE | Internal validation set (n=78): AUC | Internal validation set: ACC | Internal validation set: SEN | Internal validation set: SPE | External validation set (n=48): AUC | External validation set: ACC | External validation set: SEN | External validation set: SPE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CM model | IDHmut | 0.792 | 0.769 | 0.832 | 0.750 | 0.674 | 0.659 | 0.783 | 0.618 | 0.580 | 0.636 | 0.917 | 0.444 |
CM model | IDHmut-codel | 0.851 | 0.799 | 0.864 | 0.789 | 0.763 | 0.814 | 1.000 | 0.603 | 0.711 | 0.714 | 0.857 | 0.634 |
CM model | IDHwt | 0.916 | 0.887 | 0.930 | 0.834 | 0.920 | 0.868 | 0.867 | 0.879 | 0.765 | 0.736 | 0.690 | 0.842 |
RSTD model | IDHmut | 0.928 | 0.945 | 0.921 | 0.801 | 0.817 | 0.833 | 0.652 | 0.927 | 0.667 | 0.727 | 0.667 | 0.806 |
RSTD model | IDHmut-codel | 0.940 | 0.905 | 0.830 | 0.895 | 0.831 | 0.833 | 1.000 | 0.574 | 0.850 | 0.718 | 1.000 | 0.585 |
RSTD model | IDHwt | 0.962 | 0.932 | 0.925 | 0.944 | 0.863 | 0.831 | 0.844 | 0.818 | 0.866 | 0.833 | 0.931 | 0.842 |
Combined model | IDHmut | 0.990 | 0.940 | 0.932 | 0.949 | 0.866 | 0.833 | 0.739 | 0.891 | 0.826 | 0.818 | 0.667 | 0.917 |
Combined model | IDHmut-codel | 0.992 | 0.979 | 0.990 | 0.967 | 0.807 | 0.778 | 0.800 | 0.706 | 0.864 | 0.833 | 0.857 | 0.829 |
Combined model | IDHwt | 0.988 | 0.955 | 0.940 | 0.970 | 0.934 | 0.864 | 0.822 | 0.909 | 0.940 | 0.896 | 0.897 | 0.895 |
ACC, accuracy; AUC, area under the ROC curve; CM, conventional MRI; IDH, isocitrate dehydrogenase; IDHmut, astrocytoma, IDH-mutant; IDHmut-codel, oligodendroglioma, IDH-mutant and 1p/19q-codeleted; IDHwt, glioblastoma, IDH-wildtype; MRI, magnetic resonance imaging; ROC, receiver operating characteristic; RSTD, radiomics and Swin Transformer-based deep learning; SEN, sensitivity; SPE, specificity.
The comparative results between the combined model and the CM or RSTD model are presented in Table 4. In the training and validation sets, the combined model showed higher AUCs than the CM model in most classification tasks (P<0.05). In addition, in the internal validation set, the combined model showed a higher AUC than the RSTD model for predicting glioblastoma (IDHwt; P=0.037); in the external validation set, the combined model showed a higher AUC than the RSTD model for predicting astrocytoma (IDHmut; P=0.028).
Table 4
Model comparison | Molecular subtypes | P value†: training set | P value†: internal validation set | P value†: external validation set |
---|---|---|---|---|
Combined model vs. CM model | IDHmut | 0.001 | 0.003 | 0.042 |
Combined model vs. CM model | IDHmut-codel | <0.001 | 0.442 | 0.158 |
Combined model vs. CM model | IDHwt | 0.023 | 0.696 | 0.018 |
Combined model vs. RSTD model | IDHmut | 0.048 | 0.176 | 0.028 |
Combined model vs. RSTD model | IDHmut-codel | 0.137 | 0.573 | 0.962 |
Combined model vs. RSTD model | IDHwt | 0.165 | 0.037 | 0.150 |
†, calculated by DeLong test. CM, conventional MRI; IDH, isocitrate dehydrogenase; IDHmut, astrocytoma, IDH-mutant; IDHmut-codel, oligodendroglioma, IDH-mutant and 1p/19q-codeleted; IDHwt, glioblastoma, IDH-wildtype; MRI, magnetic resonance imaging; RSTD, radiomics and Swin Transformer-based deep learning.
Model explanation
The SHAP approach was used to interpret the contribution of each feature to the XGBoost prediction models. The summary plot of SHAP values of the top 20 features for the combined model is shown in Figure 5. Features are displayed in descending order of their average SHAP values, which were used to assess each feature’s contribution to the model. To visualize and interpret the importance of different regions in the model’s tumor classification process, Figure 6 presents the Grad-CAM maps of T1C images of representative patients from the internal and external validation sets. Grad-CAM revealed distinct attention patterns for each molecular subtype, with red regions contributing most to the classification task. Specifically, for IDHwt, the model prioritized contrast-enhancing tumor parenchyma (red), which is histologically associated with microvascular proliferation and cell proliferation. In contrast, IDHmut and IDHmut-codel exhibited high attention (red) in non-enhancing hypointense tumor parenchyma. The model also attended to the morphological characteristics of tumors. These Grad-CAM results on T1C images indicate that the model’s classification relied on tumor morphology and enhancement pattern, in accordance with radiologists’ visual evaluation and the biological behavior of gliomas.


The Pearson correlation analysis showed that the correlation between RSTD features was lower than 0.5 (P<0.05, Figure S2A). The heatmap showed the distribution of the selected RSTD features within the three molecular subtype groups in Center 1 and Center 2, respectively (Figure S2B,S2C).
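A redundancy check of this kind can be sketched as a pairwise Pearson correlation screen. The feature columns below are toy values rather than study data, and the helper names are hypothetical; only the 0.5 threshold comes from the text above.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.5):
    """Return index pairs of feature columns whose |r| reaches the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(columns), 2)
        if abs(pearson_r(a, b)) >= threshold
    ]


# Toy feature columns: the third is a rescaled copy of the first, the second is uncorrelated.
features = [[1, 2, 3, 4], [1, -1, -1, 1], [2, 4, 6, 8]]
flagged = highly_correlated_pairs(features)
print(flagged)  # [(0, 2)]
```

An empty list from such a screen would correspond to the situation reported above, in which no pair of selected RSTD features reaches a correlation of 0.5.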
Discussion
To the best of our knowledge, this is the first study to integrate CM features, RSTD features, and demographic characteristics into a three-class ML model for noninvasively predicting the adult-type diffuse glioma molecular subtypes under the 2021 WHO CNS 5 criteria. This directly addresses a critical diagnostic need in neuro-oncology and simplifies the clinical workflows. Our combined model synergized demographic data, morphological features, and radiomic local details with the global information of Swin Transformer model, achieving robust diagnostic performance in validation sets (AUCs =0.878–0.911). This strategy could improve the traditional diagnostic pattern of radiologists which is based on subjective visual assessment, and reduce diagnostic uncertainty. These findings also confirmed the hypothesis that the hybrid method, by combining ML and radiomics DL, can obtain high performance (33). Importantly, for nonsurgical patients (e.g., frail patients or tumors in unresectable areas), the model can provide actionable molecular insights to guide targeted therapies. By aligning with WHO CNS 5’s genotype-driven framework, our approach is expected to optimize treatment personalization and prognostication, demonstrating tangible clinical utility in precision neuro-oncology.
Accurate preoperative molecular classification for glioma is crucial for making patient treatment strategies and assessing prognosis. Previous studies in this field have predominantly been focused on binary classification tasks, such as identification of IDH mutation or 1p/19q codeletion status (34,35). However, it is difficult to identify the subtypes using a single molecular status due to the heterogeneity of glioma. Our study introduced a three-classification ML model that aligns with the more nuanced WHO CNS 5 classification system for adult-type diffuse gliomas. The capability of this model to conduct tasks based on commonly used clinical MRI sequences streamlines workflow and diminishes dependency on additional procedures, making the model readily applicable for a variety of clinical settings where MRI is routinely performed. Our model provides a more precise preoperative non-invasive molecular marker of adult-type diffuse gliomas, which can guide neurosurgeons in planning the extent of resection and the administration of targeted therapies. This granular level of molecular classification is crucial for personalized medicine, as each subtype may respond differently to various treatments and has distinct prognostic implications (36).
ML is a powerful computational approach that excels at managing complex and voluminous data, as it can process datasets with substantial variability and discern intricate connections among variables with adaptability and thorough training. To date, numerous studies have concentrated on the molecular prediction of gliomas (31,37,38). However, the results of previous studies have lacked generalizability and interpretability (31), which may have limited their use by clinicians. Another advantage of our study is that we applied the SHAP method to explain the “black-box” nature of the ML model. The model is elucidated through a global model explanation that describes its overall functionality, as well as a local explanation that specifies how individual prediction is generated for the individual patient by inputting personalized data. From the SHAP summary dot plot in our results, the features such as tumor enhancement quality, enhancing margin thickness, age, and some RSTD features contributed high impact for the specific classification task of the model. In addition, the Grad-CAM on T1C indicates that the non-enhancing tumor parenchyma in astrocytoma and oligodendroglioma, as well as the enhancing tumor parenchyma in glioblastoma, are considered the most significant (red) in the classification task. These areas received more attention from the model than the necrosis components and adjacent extruded normal brain tissue. Our findings supported that non-enhancing tumors were activated when the tumor had no enhancement and can be recognized by a DL model (39). Through these interpretations, we demonstrated that our ML model attentions are consistent with tumor morphology and enhancement patterns; these characteristics accord with the biological behavior of gliomas, and are well recognized by radiologists. 
This explainable, straightforward, and convenient ML-based prediction tool could improve clinical trust and shows promise for supporting clinical decision-making in glioma classification.
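To make the distinction between a local and a global explanation concrete, the following minimal sketch computes exact Shapley values for a toy three-feature scoring function. It is an illustration under stated assumptions, not the model or data of this study: the function `toy_score`, the feature names, the baseline, and the patient values are all hypothetical, and in practice the SHAP method estimates these quantities efficiently for real models rather than enumerating every feature coalition.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names, used only to label the toy example.
FEATURES = ["enhancement_quality", "margin_thickness", "age"]


def toy_score(x):
    """Hypothetical scoring function standing in for a trained model."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def global_importance(f, samples, baseline):
    """Global view: mean absolute Shapley value per feature across samples."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, value in enumerate(shapley_values(f, x, baseline)):
            totals[i] += abs(value)
    return [t / len(samples) for t in totals]


# Made-up inputs, for illustration only.
baseline = [0.0, 0.0, 0.0]
patients = [[1.0, 2.0, 4.0], [0.0, 1.0, 6.0], [1.0, 0.0, 2.0]]

local = shapley_values(toy_score, patients[0], baseline)      # one patient
overall = global_importance(toy_score, patients, baseline)    # whole cohort

for name, value in zip(FEATURES, local):
    print(f"{name}: {value:+.3f}")
```

The local values sum to the difference between the patient's score and the baseline score, which is the property that lets a single prediction be decomposed into per-feature contributions; averaging their absolute values over patients gives the kind of ranking that a summary plot displays.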
Recently, Pei et al. constructed a three-class RF model by integrating MRI-based radiomic features and perfusion parameters, and successfully discriminated the three molecular subtypes (IDHmut, IDHwt, and IDHmut-codel), with AUCs of 0.778–0.861 (40). Their study provided an important reference for the three-classification problem of gliomas; however, it was limited by its single-center data and lacked external validation and model explanation. In addition, Cheng et al. used a hybrid CNN-ViT model for both glioma segmentation and IDH status prediction, and demonstrated promising performance for IDH-mutation prediction (AUC: 90.37–91.04%), outperforming single-task learning counterparts and existing state-of-the-art methods (15). Their study demonstrated the successful application of a Transformer-based hybrid model in the IDH prediction of gliomas. Although their study did not address IDH and 1p/19q status concurrently, it shares with ours the use of a hybrid model for the molecular prediction of gliomas, with the aim of identifying molecular subtypes non-invasively and ultimately aiding clinical decision-making.
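Because AUC is defined for a binary contrast, a three-class result is usually reported as one AUC per subtype or as an average of these. The sketch below shows one common convention, one-vs-rest AUC with an unweighted (macro) mean, computed from the rank-based definition of AUC. It is an assumption that this convention matches any of the cited studies; the labels and predicted probabilities are made up for illustration.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, proba, classes):
    """Per-class one-vs-rest AUCs and their unweighted (macro) mean."""
    per_class = {}
    for k, name in enumerate(classes):
        labels = [1 if y == name else 0 for y in y_true]
        scores = [row[k] for row in proba]
        per_class[name] = binary_auc(labels, scores)
    macro = sum(per_class.values()) / len(classes)
    return per_class, macro


CLASSES = ["IDHmut", "IDHmut-codel", "IDHwt"]

# Made-up labels and predicted probabilities, for illustration only.
y_true = ["IDHmut", "IDHmut", "IDHmut-codel", "IDHmut-codel", "IDHwt", "IDHwt"]
proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
    [0.5, 0.1, 0.4],
]

per_class, macro = one_vs_rest_auc(y_true, proba, CLASSES)
for name in CLASSES:
    print(f"{name}: AUC = {per_class[name]:.3f}")
print(f"macro AUC = {macro:.3f}")
```

Reporting the per-class values alongside the average is useful because a high macro AUC can hide one subtype that is discriminated noticeably worse than the others.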
A recent study that utilized Swin Transformer and ResNet networks to extract DL features from a single T2WI sequence demonstrated that the Swin Transformer model outperformed the ResNet model in predicting IDH mutation in gliomas (12). In contrast, we extracted the Swin Transformer-based DL features from T1C and T2-FLAIR sequences, which also demonstrated good performance. T2WI is sensitive in depicting the extent of brain edema and differentiating between tumor and peritumoral regions (41). However, T1C imaging offers superior contrast between tissue types and reflects disruption of the blood-brain barrier, which is often compromised in gliomas and leads to enhancement patterns that can be indicative of specific molecular subtypes (42). The combination of the T1C and T2-FLAIR sequences in our study allows for a more nuanced understanding of the tumor’s structural and vascular characteristics and is sensitive for detecting edema and tumor cell infiltration, both of which are integral to the accurate classification of glioma subtypes. Moreover, our study combined complex image features with detailed spatial hierarchies from the Swin Transformer and high-dimensional quantitative features from radiomics. This fusion of two complementary methodologies yields a more comprehensive representation of the tumor’s characteristics and can capture the intricate details of the tumor microenvironment, which enhances the model’s predictive accuracy, as evidenced by the improved AUC values in both the internal and external validation sets.
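As a rough illustration of feature-level fusion, the sketch below concatenates per-patient vectors from three sources and standardizes them with statistics learned on the training set only. It is a simplified sketch and not the pipeline of this study: the function names, the z-score choice, the feature dimensions, and all values are assumptions made for the example, and steps such as feature selection and classifier training are omitted.

```python
from statistics import mean, pstdev


def fuse(radiomics, deep, clinical):
    """Concatenate per-patient feature vectors from the three sources."""
    return [r + d + c for r, d, c in zip(radiomics, deep, clinical)]


def fit_zscore(rows):
    """Column-wise mean and standard deviation from the training rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; use 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows, means, stds):
    """Standardize rows using previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Made-up training features, for illustration only.
radiomics_train = [[10.0, 0.2], [14.0, 0.4], [12.0, 0.6]]              # 2 radiomic features
deep_train = [[0.1, -0.3, 0.5], [0.2, 0.1, 0.5], [0.3, 0.5, 0.5]]      # 3 deep features
clinical_train = [[45.0], [60.0], [30.0]]                              # age

fused_train = fuse(radiomics_train, deep_train, clinical_train)
means, stds = fit_zscore(fused_train)
fused_train_z = apply_zscore(fused_train, means, stds)

# A new (e.g., external) patient is scaled with the training statistics,
# never with statistics computed on the validation data itself.
new_patient = fuse([[13.0, 0.5]], [[0.25, 0.0, 0.5]], [[52.0]])
new_patient_z = apply_zscore(new_patient, means, stds)

print(len(fused_train[0]), "fused features per patient")
```

Fitting the scaling on the training set and reusing it for external data keeps features with very different ranges (for example, age versus a texture statistic) comparable without leaking information from the validation cohort.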
Limitations
First, the retrospective nature of the study and the relatively limited sample size may restrict the generalizability of our findings to diverse populations. Second, the reliance on manual ROI annotation may introduce inter-reader variability. Third, although we performed standard image pre-processing and feature standardization, protocol heterogeneity across institutions (e.g., slice thickness, contrast timing) remains a barrier to clinical adoption, as evidenced by reduced validation performance. Additionally, although the correlation analysis in our study demonstrated weak intercorrelations between the selected features, and the internal and external validations exhibited comparable performance, overfitting remains a significant challenge that may occur during model development. Moreover, in-house software support is needed to simplify the implementation of the model in clinical practice. Thus, future prospective studies incorporating multicenter and histological data will be necessary to validate and optimize the model in real-world clinical workflows, ensuring robustness across diverse settings. In addition, domain adaptation techniques, such as adversarial training or style transfer, may help to mitigate cross-site distribution shifts.
Conclusions
Our study presented a three-classification ML model that integrated CM features, RSTD features, and demographic characteristics for the molecular prediction of adult-type diffuse gliomas, and achieved promising performance. This successful application of an ML algorithm in conjunction with radiomics and DL features highlights the potential for further exploration and development of hybrid models for the preoperative non-invasive prediction of molecular subtypes of glioma. Such a model could reduce the reliance on invasive biopsy procedures by providing a non-invasive means of obtaining critical molecular diagnostic information, especially for patients who cannot be treated surgically. Before it can become a practical tool for supporting clinical decision-making, further validation in larger, multicenter studies is necessary to establish its effectiveness and reliability across diverse patient populations.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-2461/rc
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2461/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Review Board of Huashan Hospital (No. KY2024-013) and the requirement for individual consent for this retrospective analysis was waived. The First Affiliated Hospital of Anhui Medical University was also informed of and agreed to the study.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Ostrom QT, Cioffi G, Waite K, Kruchko C, Barnholtz-Sloan JS. CBTRUS Statistical Report: Primary Brain and Other Central Nervous System Tumors Diagnosed in the United States in 2014-2018. Neuro Oncol 2021;23:iii1-iii105. [Crossref] [PubMed]
- Louis DN, Perry A, Wesseling P, Brat DJ, Cree IA, Figarella-Branger D, Hawkins C, Ng HK, Pfister SM, Reifenberger G, Soffietti R, von Deimling A, Ellison DW. The 2021 WHO Classification of Tumors of the Central Nervous System: a summary. Neuro Oncol 2021;23:1231-51. [Crossref] [PubMed]
- Rong L, Li N, Zhang Z. Emerging therapies for glioblastoma: current state and future directions. J Exp Clin Cancer Res 2022;41:142. [Crossref] [PubMed]
- van den Bent MJ, Smits M, Kros JM, Chang SM. Diffuse Infiltrating Oligodendroglioma and Astrocytoma. J Clin Oncol 2017;35:2394-401. [Crossref] [PubMed]
- Cairncross G, Wang M, Shaw E, Jenkins R, Brachman D, Buckner J, Fink K, Souhami L, Laperriere N, Curran W, Mehta M. Phase III trial of chemoradiotherapy for anaplastic oligodendroglioma: long-term results of RTOG 9402. J Clin Oncol 2013;31:337-43. [Crossref] [PubMed]
- Yan H, Parsons DW, Jin G, McLendon R, Rasheed BA, Yuan W, Kos I, Batinic-Haberle I, Jones S, Riggins GJ, Friedman H, Friedman A, Reardon D, Herndon J, Kinzler KW, Velculescu VE, Vogelstein B, Bigner DD. IDH1 and IDH2 mutations in gliomas. N Engl J Med 2009;360:765-73. [Crossref] [PubMed]
- Sicklick JK, Kato S, Okamura R, Schwaederle M, Hahn ME, Williams CB, De P, Krie A, Piccioni DE, Miller VA, Ross JS, Benson A, Webster J, Stephens PJ, Lee JJ, Fanta PT, Lippman SM, Leyland-Jones B, Kurzrock R. Molecular profiling of cancer patients enables personalized combination therapy: the I-PREDICT study. Nat Med 2019;25:744-50. [Crossref] [PubMed]
- Lasocki A, Roberts-Thomson SJ, Gaillard F. Radiogenomics of adult intracranial gliomas after the 2021 World Health Organisation classification: a review of changes, challenges and opportunities. Quant Imaging Med Surg 2023;13:7572-81. [Crossref] [PubMed]
- Jian A, Jang K, Manuguerra M, Liu S, Magnussen J, Di Ieva A. Machine Learning for the Prediction of Molecular Markers in Glioma on Magnetic Resonance Imaging: A Systematic Review and Meta-Analysis. Neurosurgery 2021;89:31-44. [Crossref] [PubMed]
- Zhang S, Yin L, Ma L, Sun H. Artificial Intelligence Applications in Glioma With 1p/19q Co-Deletion: A Systematic Review. J Magn Reson Imaging 2023;58:1338-52. [Crossref] [PubMed]
- Zhu Z, Shen J, Liang X, Zhou J, Liang J, Ni L, Wang H, Ye M, Chen S, Yang H, Chen Q, Li X, Zhang W, Lu J, Ge D, Fu L, Zhu Y, Zhang X, Sun Y, Zhang B. Radiomics for predicting grades, isocitrate dehydrogenase mutation, and oxygen 6-methylguanine-DNA methyltransferase promoter methylation of adult diffuse gliomas: combination of structural MRI, apparent diffusion coefficient, and susceptibility-weighted imaging. Quant Imaging Med Surg 2024;14:9276-89. [Crossref] [PubMed]
- Wu J, Xu Q, Shen Y, Chen W, Xu K, Qi XR. Swin Transformer Improves the IDH Mutation Status Prediction of Gliomas Free of MRI-Based Tumor Segmentation. J Clin Med 2022;11:4625. [Crossref] [PubMed]
- Lu CF, Hsu FT, Hsieh KL, Kao YJ, Cheng SJ, Hsu JB, Tsai PH, Chen RJ, Huang CC, Yen Y, Chen CY. Machine Learning-Based Radiomics for Molecular Subtyping of Gliomas. Clin Cancer Res 2018;24:4429-36. [Crossref] [PubMed]
- Abdel Razek AAK, Alksas A, Shehata M, AbdelKhalek A, Abdel Baky K, El-Baz A, Helmy E. Clinical applications of artificial intelligence and radiomics in neuro-oncology imaging. Insights Imaging 2021;12:152. [Crossref] [PubMed]
- Cheng J, Liu J, Kuang H, Wang J. A Fully Automated Multimodal MRI-Based Multi-Task Learning for Glioma Segmentation and IDH Genotyping. IEEE Trans Med Imaging 2022;41:1520-32. [Crossref] [PubMed]
- Ma C, Wang L, Song D, Gao C, Jing L, Lu Y, Liu D, Man W, Yang K, Meng Z, Zhang H, Xue P, Zhang Y, Guo F, Wang G. Multimodal-based machine learning strategy for accurate and non-invasive prediction of intramedullary glioma grade and mutation status of molecular markers: a retrospective study. BMC Med 2023;21:198. [Crossref] [PubMed]
- Xu Q, Xu QQ, Shi N, Dong LN, Zhu H, Xu K. A multitask classification framework based on vision transformer for predicting molecular expressions of glioma. Eur J Radiol 2022;157:110560. [Crossref] [PubMed]
- Usuzaki T, Inamori R, Shizukuishi T, Morishita Y, Takagi H, Ishikuro M, Obara T, Takase K. Predicting isocitrate dehydrogenase status among adult patients with diffuse glioma using patient characteristics, radiomic features, and magnetic resonance imaging: Multi-modal analysis by variable vision transformer. Magn Reson Imaging 2024;111:266-76. [Crossref] [PubMed]
- Han K, Wang Y, Chen H, Chen X, Guo J, Liu Z, Tang Y, Xiao A, Xu C, Xu Y, Yang Z, Zhang Y, Tao D. A survey on vision transformer. IEEE Trans Pattern Anal Mach Intell 2022;45:87-110. [Crossref] [PubMed]
- Wang Y, Huang R, Song S, Huang Z, Huang G. Not all images are worth 16x16 words: Dynamic transformers for efficient image recognition. Adv Neural Inf Process Syst 2021;34:11960-73.
- Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B. Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021:10012-22.
- Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems 30 (NIPS 2017). 2017.
- Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision. 2017:618-26.
- Hollon T, Jiang C, Chowdury A, Nasir-Moin M, Kondepudi A, Aabedi A, et al. Artificial-intelligence-based molecular classification of diffuse gliomas using rapid, label-free optical imaging. Nat Med 2023;29:828-32. [Crossref] [PubMed]
- Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024;385:e078378. [Crossref] [PubMed]
- Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015;350:g7594. [Crossref] [PubMed]
- Li Y, Zheng K, Li S, Yi Y, Li M, Ren Y, Guo C, Zhong L, Yang W, Li X, Yao L. A transformer-based multi-task deep learning model for simultaneous infiltrated brain area identification and segmentation of gliomas. Cancer Imaging 2023;23:105. [Crossref] [PubMed]
- van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, Beets-Tan RGH, Fillion-Robin JC, Pieper S, Aerts HJWL. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res 2017;77:e104-7. [Crossref] [PubMed]
- Cancer Imaging Archive. VASARI research project. 2019. (Accessed 29 July 2019). Available online: https://wiki.cancerimagingarchive.net/display/Public/VASARI+Research+Project
- Gutman DA, Cooper LA, Hwang SN, Holder CA, Gao J, Aurora TD, et al. MR imaging predictors of molecular profile and survival: multi-institutional study of the TCGA glioblastoma data set. Radiology 2013;267:560-9. [Crossref] [PubMed]
- Luo J, Pan M, Mo K, Mao Y, Zou D. Emerging role of artificial intelligence in diagnosis, classification and clinical management of glioma. Semin Cancer Biol 2023;91:110-23. [Crossref] [PubMed]
- Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Information Processing & Management 2009;45:427-37. [Crossref]
- Raghavendra U, Gudigar A, Paul A, Goutham TS, Inamdar MA, Hegde A, Devi A, Ooi CP, Deo RC, Barua PD, Molinari F, Ciaccio EJ, Acharya UR. Brain tumor detection and screening using artificial intelligence techniques: Current trends and future perspectives. Comput Biol Med 2023;163:107063. [Crossref] [PubMed]
- Zhao J, Huang Y, Song Y, Xie D, Hu M, Qiu H, Chu J. Diagnostic accuracy and potential covariates for machine learning to identify IDH mutations in glioma patients: evidence from a meta-analysis. Eur Radiol 2020;30:4664-74. [Crossref] [PubMed]
- Zhou H, Chang K, Bai HX, Xiao B, Su C, Bi WL, Zhang PJ, Senders JT, Vallières M, Kavouridis VK, Boaro A, Arnaout O, Yang L, Huang RY. Machine learning reveals multimodal MRI patterns predictive of isocitrate dehydrogenase and 1p/19q status in diffuse low- and high-grade gliomas. J Neurooncol 2019;142:299-307. [Crossref] [PubMed]
- Brat DJ, Verhaak RG, Aldape KD, Yung WK, Salama SR, et al. Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas. N Engl J Med 2015;372:2481-98. [Crossref] [PubMed]
- Choi YS, Bae S, Chang JH, Kang SG, Kim SH, Kim J, Rim TH, Choi SH, Jain R, Lee SK. Fully automated hybrid approach to predict the IDH mutation status of gliomas via deep learning and radiomics. Neuro Oncol 2021;23:304-13. [Crossref] [PubMed]
- Truong NCD, Bangalore Yogananda CG, Wagner BC, Holcomb JM, Reddy D, Saadat N, Hatanpaa KJ, Patel TR, Fei B, Lee MD, Jain R, Bruce RJ, Pinho MC, Madhuranthakam AJ, Maldjian JA. Two-Stage Training Framework Using Multicontrast MRI Radiomics for IDH Mutation Status Prediction in Glioma. Radiol Artif Intell 2024;6:e230218. [Crossref] [PubMed]
- Lee JO, Ahn SS, Choi KS, Lee J, Jang J, Park JH, Hwang I, Park CK, Park SH, Chung JW, Choi SH. Added prognostic value of 3D deep learning-derived features from preoperative MRI for adult-type diffuse gliomas. Neuro Oncol 2024;26:571-80. [Crossref] [PubMed]
- Pei D, Guan F, Hong X, Liu Z, Wang W, Qiu Y, et al. Radiomic features from dynamic susceptibility contrast perfusion-weighted imaging improve the three-class prediction of molecular subtypes in patients with adult diffuse gliomas. Eur Radiol 2023;33:3455-66. [Crossref] [PubMed]
- Jain R, Johnson DR, Patel SH, Castillo M, Smits M, van den Bent MJ, Chi AS, Cahill DP. "Real world" use of a highly reliable imaging sign: "T2-FLAIR mismatch" for identification of IDH mutant astrocytomas. Neuro Oncol 2020;22:936-43. [Crossref] [PubMed]
- Ellingson BM, Bendszus M, Boxerman J, Barboriak D, Erickson BJ, Smits M, et al. Consensus recommendations for a standardized Brain Tumor Imaging Protocol in clinical trials. Neuro Oncol 2015;17:1188-98. [PubMed]