Enhancing BRAF V600E mutation prediction in thyroid cancer through interpretable deep learning models combining clinical and ultrasound-based radiomics features
Introduction
Thyroid cancer is the most prevalent endocrine malignancy worldwide and the predominant cancer type in adolescents and adults under 40 years of age (1-3). Fine-needle aspiration (FNA) cytology classifies 25% of nodules as indeterminate, and these still require diagnostic surgery (4). Histopathology remains the reference standard for the diagnosis of thyroid cancer, but obtaining it is invasive and costly and may lead to overtreatment of low-risk lesions. We aimed to evaluate whether a BRAF V600E mutation prediction model based on clinical and ultrasound features can serve as a triage test to reduce unnecessary FNA procedures. For this purpose, the model would be applied before FNA: patients with a high probability of BRAF V600E positivity would proceed directly to surgery, whereas those with a low probability would avoid invasive workup. Establishing the diagnostic accuracy of this noninvasive strategy is therefore essential to refining current thyroid nodule management guidelines.
The substitution of glutamic acid for valine at codon 600 in exon 15 (T1799A), known as the BRAF V600E mutation, is the most frequent genetic alteration in BRAF (5). The BRAF V600E mutation constitutively activates the mitogen-activated protein kinase (MAPK) signaling pathway, promoting uncontrolled cell proliferation and survival while suppressing apoptosis and cellular senescence, which are key mechanisms enabling tumor evasion of growth control (5,6). BRAF V600E mutations are particularly prevalent in papillary thyroid carcinomas (PTCs), with reported incidence rates ranging from 29% to 83% (7). A previous study reported that BRAF-activated arylsulfatase I (ARSI) suppresses epiregulin-mediated ferroptosis to promote BRAF V600E-mutant PTC progression and sorafenib resistance (8). Other evidence suggests that BRAF V600E mutation-positive thyroid cancer is associated with aggressive clinicopathological features, including macroscopic extrathyroidal extension, lymph node metastasis, and high-risk histological characteristics (9-12). Beyond its diagnostic utility, the BRAF V600E mutation has been linked to radioiodine resistance and an altered tumor immune microenvironment (12). Moreover, several studies have consistently linked this mutation to poorer prognostic outcomes (13-16).
BRAF V600E mutation status is determined by tissue biopsy or surgery followed by gene sequencing, which is invasive and resource-intensive. Although molecular testing on cytology samples has shown high sensitivity and specificity, this approach nonetheless requires invasive FNA and specialized laboratory infrastructure (17). Ultrasound is the first-line imaging modality for thyroid nodule assessment. However, conventional sonographic features used for mutation prediction have limited reproducibility due to operator dependency and interobserver variability (18-20). Artificial intelligence (AI)-based methodologies have been increasingly applied to thyroid nodule diagnosis, showing promising results in improving accuracy and reducing interobserver variability (21-23). Recent studies in radiomics and deep learning (DL) have demonstrated the potential of these methods for noninvasive BRAF V600E prediction, but these studies involved small sample sizes or achieved suboptimal predictive performance (24-28). Moreover, the “black-box” issue of machine learning (ML) models remains a challenge, as it limits the clinical applicability of the existing models. Shapley additive explanations (SHAP) analysis, grounded in the theoretical framework of Shapley values, has been introduced to elucidate feature importance from a game-theoretic perspective. It is now a widely adopted and scientifically validated methodology for interpreting ML models (29), achieving promising results across various types of cancer (30,31). However, there is a lack of research on applying the SHAP methodology to explain BRAF V600E mutation prediction models in patients with thyroid cancer.
This study aimed to estimate the diagnostic accuracy of a combined model, integrating clinical, ultrasound, radiomics, and DL features, in predicting the BRAF V600E mutation in thyroid nodules. First, the efficacy of six ML models, including logistic regression (LR), support vector machine (SVM), random forest (RF), decision tree (DT), k-nearest neighbors (KNN), and extreme gradient boosting (XGBoost), was compared to identify the optimal radiomics model. Second, a ResNet50-32x4d model was fine-tuned to construct the DL model. Subsequently, the XGBoost algorithm was used to integrate clinical and image data for combined model construction. Finally, a SHAP algorithm was applied to elucidate the significance of features in the predictive model and to identify nonlinear relationships among the risk predictors. We hypothesized that the combined model would achieve an area under the curve (AUC) ≥0.80 and significantly reduce the number of unnecessary FNAs without missing >5% of BRAF V600E mutation-positive nodules. We present this article in accordance with the STARD reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2026-1-0299/rc).
Methods
Study population
The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments and was approved by the Ethics Committee of Guangdong Provincial People’s Hospital. The requirement for informed consent was waived due to the retrospective nature of the analysis.
Participants formed a consecutive series of all eligible patients with thyroid nodules admitted to Guangdong Provincial People’s Hospital between January 2018 and August 2023.
The inclusion criteria were as follows: (I) preoperative thyroid ultrasound performed within one month before surgical resection; (II) histopathological confirmation of thyroid cancer; and (III) definite BRAF V600E mutation status validated by tissue-based quantitative real-time PCR (qPCR). The exclusion criteria were as follows: (I) BRAF V600E mutation status could not be determined; (II) no preoperative thyroid ultrasound images were acquired within one month before surgery; (III) biopsy was performed before the ultrasound examination; (IV) image quality was poor; (V) the lesion could not be located on imaging; and (VI) the lesion could not be clearly delineated. Potentially eligible participants were identified from the institutional thyroid nodule registry on the basis of a scheduled thyroid surgery. The schematic diagram for patient enrollment is provided in Figure 1. A total of 1,257 lesions from 1,202 patients were included in this study. Since each thyroid nodule exhibits distinct imaging and pathological characteristics, we treated each lesion as an independent unit. To prevent data leakage, a 7:3 random split was applied at the patient level, with 70% of patients allocated to the training cohort and 30% to the test cohort. All lesions from each patient were retained within their assigned cohort.
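The patient-level split described above can be illustrated with a minimal sketch. The function name, the lesion and patient identifiers, and the random seed are all hypothetical; the article does not report how the split was implemented, so this only shows the general idea of assigning whole patients, rather than individual lesions, to a cohort.

```python
import random


def split_by_patient(lesion_to_patient, train_fraction=0.7, seed=0):
    """Assign whole patients (never single lesions) to the train or test cohort."""
    patients = sorted(set(lesion_to_patient.values()))
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    rng.shuffle(patients)
    n_train = round(train_fraction * len(patients))
    train_patients = set(patients[:n_train])

    train, test = [], []
    for lesion, patient in lesion_to_patient.items():
        # Every lesion follows its patient, so no patient spans both cohorts.
        (train if patient in train_patients else test).append(lesion)
    return train, test


# Hypothetical toy data: 10 patients, one of whom (P0) has two lesions.
lesions = {f"L{i}": f"P{i}" for i in range(10)}
lesions["L10"] = "P0"
train_lesions, test_lesions = split_by_patient(lesions)
```

Because the split is made over patients, the lesion-level proportions only approximate 7:3, which matches the cohort sizes reported later (880 and 377 lesions).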
Ultrasound acquisition and evaluation
Four ultrasound devices were used: the Preirus/Aloka (Hitachi Medical Systems, Tokyo, Japan), Aplio 400 (Toshiba Medical Systems, Otawara, Japan), Resona 8 (Mindray Bio-Medical Electronics, Shenzhen, China), and EPIQ 7 (Philips Healthcare, Best, the Netherlands). No adverse events were attributed to the ultrasound examination. All examinations were performed by radiologists with ≥3 years of experience in thyroid ultrasound. Two radiologists [Reader 1 (Y.Y.) and Reader 2 (Y.G.), with 5 and 15 years of experience in thyroid ultrasound, respectively] independently assessed sonographic features, including shape, margin, composition, echogenicity, echogenic foci, and blood flow, with reference to the American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS). Discrepancies were resolved by a senior radiologist [Reader 3 (C.H.), with 20 years of experience in thyroid ultrasound]. These radiologists were blinded to the BRAF V600E reference-standard results and to all clinical information apart from age and sex.
Clinicopathological data and clinical model construction
The clinical parameters (e.g., sex, age, and number of lesions) and pathological data (e.g., pathologic category, BRAF V600E mutation status, and lymph node involvement or metastases) were collected from medical records and the pathological reports of resected specimens. The American Thyroid Association (ATA) and the National Comprehensive Cancer Network (NCCN) thyroid guidelines recognize BRAF V600E as a decisive marker for papillary carcinoma and recommend molecular testing on surgical specimens (32,33). Our institution already includes this test, ensuring complete data availability. A thyroid cancer eight-gene detection kit (Rigen Bio, Shanghai, China), including oncogene mutations (BRAF V600E, HRASQ61R, KRASG12C/G12V/Q61R, NRASQ61R, and TERTC228T/250T) and chromosome rearrangements (CCDC6-RET, PAX8-PPARG, and ETV6-NTRK3), was used for thyroid gene mutation testing in accordance with the manufacturer’s protocol. The reference standard (tissue PCR for BRAF V600E) yields a binary result: mutant-positive or mutant-negative. Therefore, no cutoff definition or exploratory analysis was required. Pathologists who evaluated BRAF V600E mutation status were unaware of the index test probability scores or clinical ultrasound details. Significant clinical factors and sonographic features (P<0.05) were initially identified via univariate logistic regression and subsequently incorporated into a multivariate logistic regression model for validation and clinical model construction.
Tumor segmentation
Tumor regions of interest (ROIs) were manually delineated by Reader 4 (L.Z., with 6 years’ experience in thyroid ultrasound) using ITK-SNAP software version 3.8.0 (http://www.itksnap.org). Reader 5 (S.L., 10 years’ experience in thyroid ultrasound) independently reviewed all primary delineations. For multifocal lesions, only the largest mass was used for image segmentation and analysis. Interobserver reproducibility was assessed through independent segmentation of 30 randomly selected cases by Reader 4 and Reader 5. The details of the image preprocessing procedure were as follows: (I) intensity normalization with 64 discrete gray-level bins to minimize noise and probe variability; and (II) standardization of all images to reduce the differences in acquisitions across multiple devices.
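As an illustration of gray-level discretization into 64 bins, the following sketch maps ROI intensities to bin indices 1 to 64 using equal-width bins over the ROI's own intensity range. This is an assumption: the article does not state the exact discretization scheme or the PyRadiomics settings used, and the function name and toy intensities are hypothetical.

```python
def discretize(intensities, n_bins=64):
    """Map raw intensities to bin indices 1..n_bins (equal-width bins)."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        # A constant region carries no texture; place everything in bin 1.
        return [1] * len(intensities)
    width = (hi - lo) / n_bins
    # The maximum intensity would fall into bin n_bins + 1, so clip it.
    return [min(int((v - lo) / width) + 1, n_bins) for v in intensities]


# Hypothetical 8-bit pixel values from one ROI.
roi = [0, 10, 127, 128, 255]
bins = discretize(roi)
```

Texture matrices such as the GLCM or GLSZM are then computed on the bin indices rather than on raw pixel values, which reduces sensitivity to noise.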
Development of the DL models
Python software version 3.8.8 (https://www.python.org; Python Software Foundation, Wilmington, DE, USA) was used to construct the DL models. A ResNet50-32x4d model pretrained on ImageNet was fine-tuned with the thyroid ROIs (34). Augmentation strategies included random rotation (±45°), flipping, and cropping. The model was trained for 50 epochs with the AdamW optimizer at an initial learning rate of 5e-5 and a batch size of 128, and the learning rate was decayed via the cosine annealing algorithm. The final fully connected layer was replaced by a dropout layer, a batch normalization layer, and a binary classification layer to obtain the final predictive score (35).
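The cosine annealing decay can be written as $\eta_t = \eta_{\min} + \tfrac{1}{2}(\eta_0 - \eta_{\min})\,(1 + \cos(\pi t / T))$. The sketch below evaluates this formula with the reported initial learning rate (5e-5) and number of epochs (50). The minimum learning rate of 0, the per-epoch update, and the function name are assumptions; the article does not report them.

```python
import math


def cosine_annealing_lr(epoch, base_lr=5e-5, total_epochs=50, min_lr=0.0):
    """Learning rate at a given epoch under cosine annealing (no restarts)."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term


# One learning rate per epoch for epochs 0..49.
schedule = [cosine_annealing_lr(e) for e in range(50)]
```

The rate starts at the initial value, falls slowly at first, reaches half the initial value at the midpoint (epoch 25), and approaches the minimum near the end of training.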
Feature selection and development of the radiomics models
A total of 944 radiomics features (including original, log-transformed, and wavelet filtered features) were extracted via the PyRadiomics package (version 3.0.1; http://www.radiomics.io/pyradiomics.html) implemented in Python software version 3.8.8. To mitigate bias and overfitting, feature normalization was performed via the Z-score method. Feature selection comprised three steps: (I) significance filtering, in which the Kruskal-Wallis test identified 711 features that differed significantly (P<0.05) between BRAF V600E mutation-positive and mutation-negative lesions; (II) redundancy reduction, in which features with pairwise Pearson correlation coefficients >0.50 were deemed redundant and removed; and (III) elastic net logistic regression, to which the remaining 38 features were subjected, yielding 15 nonredundant predictors (Table 1 and Figure 2).
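The Z-score normalization and the correlation-based redundancy step can be sketched as follows. The greedy keep-first order, the use of the absolute correlation, the function names, and the toy feature values are assumptions made for illustration; the article reports only the 0.50 threshold.

```python
import math


def zscore(values):
    """Standardize a feature to zero mean and unit (sample) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def drop_redundant(features, threshold=0.50):
    """Greedy filter: keep a feature only if |r| <= threshold with all kept ones."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy features measured on five lesions.
raw = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10.5],   # nearly collinear with f1
    "f3": [3, 1, 4, 1, 5],      # only weakly correlated with f1
}
features = {name: zscore(vals) for name, vals in raw.items()}
kept = drop_redundant(features)
```

With a greedy filter, which member of a correlated pair survives depends on the feature order, so in practice the features are often ranked (for example, by univariate P value) before filtering.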
Table 1
| Feature | Coefficient | OR |
|---|---|---|
| wavelet.HHH_glszm_ZoneEntropy | −0.177 | 0.838 |
| square_ngtdm_Busyness | −0.127 | 0.881 |
| wavelet.LHL_glcm_MCC | −0.089 | 0.915 |
| wavelet.HHH_glszm_SmallAreaHighGrayLevelEmphasis | −0.057 | 0.945 |
| wavelet.LLH_glrlm_ShortRunEmphasis | −0.053 | 0.948 |
| wavelet.LLL_glcm_MaximumProbability | −0.050 | 0.952 |
| wavelet.HLH_glrlm_HighGrayLevelRunEmphasis | −0.047 | 0.954 |
| wavelet.HLH_glszm_ZoneEntropy | −0.038 | 0.963 |
| wavelet.LHH_glszm_HighGrayLevelZoneEmphasis | −0.004 | 0.996 |
| wavelet.LLH_ngtdm_Contrast | 0.005 | 1.005 |
| wavelet.HLH_gldm_LargeDependenceHighGrayLevelEmphasis | 0.053 | 1.055 |
| wavelet.LHH_glcm_Autocorrelation | 0.062 | 1.064 |
| original_shape_Elongation | 0.134 | 1.143 |
| square_glszm_SmallAreaLowGrayLevelEmphasis | 0.170 | 1.185 |
| wavelet.LHH_glcm_SumEntropy | 0.221 | 1.248 |
glcm, gray level co-occurrence matrix; gldm, gray level dependence matrix; glrlm, gray level run length matrix; glszm, gray level size zone matrix; H, high-pass filter; L, low-pass filter; MCC, maximal correlation coefficient; ngtdm, neighborhood gray tone difference matrix; OR, odds ratio.
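The odds ratios in Table 1 follow from the coefficients through $\mathrm{OR} = e^{\beta}$. A short check on three rows of the table is given below; because the features were Z-score normalized, each OR is presumably the change in odds per one standard deviation increase in the feature, which is an interpretation rather than something the article states.

```python
import math

# Coefficients copied from Table 1 (three of the 15 selected features).
coefficients = {
    "wavelet.HHH_glszm_ZoneEntropy": -0.177,
    "original_shape_Elongation": 0.134,
    "square_glszm_SmallAreaLowGrayLevelEmphasis": 0.170,
}

# Odds ratio = exp(coefficient), rounded to three decimals as in the table.
odds_ratios = {name: round(math.exp(beta), 3) for name, beta in coefficients.items()}
```

A negative coefficient gives an OR below 1 (lower odds of mutation as the feature increases), and a positive coefficient gives an OR above 1.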
Six ML classifiers (LR, SVM, RF, DT, KNN, and XGBoost) were employed to predict the mutation status, and their parameters are summarized in Table 2. Hyperparameters were optimized via grid search and fivefold cross-validation. The performance of the classifiers was compared to identify the optimal ML model. Prediction probabilities [radiomics scores (radscores)] were generated from the optimal ML model.
Table 2
| Model | Parameter |
|---|---|
| SVM | {‘gamma’: 0.01, ‘kernel’: ‘linear’} |
| RF | {‘depth’: 2, ‘features’: 2} |
| DT | {‘criterion’: ‘gini’, ‘splitter’: ‘best’, ‘max_features’: 2, ‘max_depth’: 5} |
| KNN | {‘algorithm’: ‘auto’, ‘leaf_size’: 1} |
| LR | Default {solver: lbfgs; penalty: L2; C: 1.0} |
| XGBoost | {‘max_depth’: 3, ‘colsample_bytree’: 0.5, ‘subsample’: 0.6, ‘min_child_weight’: 2} |
LR used default parameters; no tuning was performed. DT, decision tree; KNN, k-nearest neighbors; LR, logistic regression; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.
Construction of the image-level and lesion-level combined models
The XGBoost model was employed to integrate the significant clinical data and imaging data (radiomics radscores and DL predictive scores) for combined model construction under the following hyperparameters: “n_estimators”, 9; “max_depth”, 2; “colsample_bytree”, 0.7; “subsample”, 0.6; and “min_child_weight”, 1. One or more ultrasound images were archived for each patient, and given the nature of the retrospective study design, we applied a data-stitching strategy to average the image-level outcomes for the lesion-level outputs (36).
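The averaging of image-level outputs into lesion-level outputs can be sketched as below. The function name, the lesion identifiers, and the probabilities are hypothetical; the article states only that image-level outcomes were averaged per lesion.

```python
from collections import defaultdict


def lesion_level_scores(image_scores):
    """Average image-level probabilities into one score per lesion.

    image_scores: iterable of (lesion_id, predicted_probability) pairs.
    """
    grouped = defaultdict(list)
    for lesion_id, prob in image_scores:
        grouped[lesion_id].append(prob)
    return {lesion: sum(probs) / len(probs) for lesion, probs in grouped.items()}


# Hypothetical example: lesion_A has three archived images, lesion_B has one.
image_scores = [
    ("lesion_A", 0.80),
    ("lesion_A", 0.60),
    ("lesion_A", 0.70),
    ("lesion_B", 0.30),
]
lesion_scores = lesion_level_scores(image_scores)
```

Averaging reduces the influence of a single atypical image plane, which is one plausible reason for lesion-level metrics exceeding image-level metrics.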
Model explanation
To overcome the black-box issue of ML and DL models, SHAP and gradient-weighted class activation mapping (Grad-CAM) were employed to enhance the transparency and interpretability of our model’s decision-making process. The SHAP method quantifies each feature’s average marginal contribution to the ML model prediction via SHAP values. In this study, we used a beeswarm plot and a waterfall plot to visualize SHAP values globally and locally, respectively. The SHAP module in Python software was applied to calculate the SHAP values for both the training and test cohorts. Unless otherwise specified, the plots in this article represent the SHAP values of the test cohort. In the Grad-CAM technique, a heatmap highlights the regions that are most influential in the prediction task, which allows us to visualize and identify the areas that the model relied on for decision-making. The overall workflow of this study is shown in Figure 3.
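To make the idea of a Shapley value concrete, the sketch below computes exact Shapley values by brute force for a toy scoring function with three inputs. It is not the SHAP library and not the study's model: the toy function, its weights, the feature values, and the baseline are hypothetical, and "absent" features are simply set to a baseline value. The SHAP package uses far more efficient algorithms for tree models.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    names = list(x)
    phi = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)       # start with every feature "absent"
        prev = predict(current)
        for name in order:
            current[name] = x[name]    # reveal one feature at a time
            new = predict(current)
            phi[name] += new - prev    # marginal contribution in this ordering
            prev = new
    return {name: total / len(orderings) for name, total in phi.items()}


def toy_model(f):
    # Hypothetical score with an interaction between the two imaging scores.
    return (0.5 * f["dl_score"] + 0.3 * f["radscore"]
            + 0.2 * f["dl_score"] * f["radscore"] - 0.1 * f["tumor_size"])


x = {"dl_score": 0.9, "radscore": 0.8, "tumor_size": 1.0}
baseline = {"dl_score": 0.5, "radscore": 0.5, "tumor_size": 1.0}
phi = shapley_values(toy_model, x, baseline)
```

The values sum to the difference between the prediction for `x` and the baseline prediction, which is the additivity property that a waterfall plot displays for a single patient.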
Statistical analysis
All statistical analyses were conducted in Python and R software version 4.1.2 (https://www.r-project.org; The R Foundation for Statistical Computing, Vienna, Austria). The clinicopathological and sonographic features of BRAF-mutant and BRAF wild-type lesions were compared using the Kruskal-Wallis test, Mann-Whitney U test, or χ² test. Normally distributed data are presented as the mean ± standard deviation (SD) and nonnormally distributed data as the median and interquartile range. The performance of the predictive models at both the image and lesion levels was assessed via the receiver operating characteristic (ROC) curve, the AUC, accuracy, sensitivity, and specificity. The optimal cutoff values were determined exploratorily through maximization of the Youden index. The DeLong test was employed to evaluate the differences in the AUCs between the combined model and the other models. Statistical significance was defined as a two-tailed P value <0.05.
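The AUC and the Youden-index cutoff can be illustrated on a small example. The labels and scores below are hypothetical, and the functions are a plain-Python sketch rather than the routines used in the study. The AUC is computed as the probability that a randomly chosen positive lesion scores higher than a randomly chosen negative one, and the Youden index is $J = \text{sensitivity} + \text{specificity} - 1$.

```python
def auc_mann_whitney(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(labels, scores):
    """Return (cutoff, J) maximizing J; a score >= cutoff is called positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cut)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:   # on ties, keep the lowest cutoff
            best = (cut, j)
    return best


# Hypothetical toy data: 1 = mutation-positive, 0 = mutation-negative.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = auc_mann_whitney(labels, scores)
cutoff, j_stat = youden_cutoff(labels, scores)
```

In this toy example two cutoffs give the same maximal J; the sketch keeps the lower one, which favors sensitivity. A triage test intended to miss few mutation-positive nodules might reasonably prefer that choice.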
Results
Patient characteristics
The study cohort comprised 6,703 ultrasound images from 1,257 lesions of 1,202 patients with thyroid cancer (Figure 1). The patient cohort consisted of 339 (28.2%) males and 863 (71.8%) females, with a mean age of 42.04±11.88 years. The lesion cohort consisted of 893 (71.0%) BRAF V600E mutation-positive (BRAF_MU) and 364 (29.0%) mutation-negative (BRAF_WT) lesions. Histologically, 1,222 (97.2%) were PTCs, 16 (1.3%) were follicular thyroid carcinomas, and 19 (1.5%) were other subtypes (Table 3). Patients with BRAF_MU exhibited lower rates of cervical lateral lymph node metastasis and a smaller tumor size than those with BRAF_WT in both the training and test sets (P<0.05). Sonographic features including the margin, echogenicity, and echogenic foci of thyroid lesions were significantly different between the BRAF_MU and BRAF_WT groups in the training and test sets (P<0.05). In the training cohort, patients with BRAF_MU were older, less frequently presented with central lymph node metastasis, and more frequently presented with vertical growth, solid composition, and mainly central vascularity (P<0.05). The demographic, clinical, and sonographic characteristics of patients in the training and test cohorts are summarized in Tables 3 and 4.
Table 3
| Feature | Level | Training set (n=880), BRAF_WT (n=255) | Training set, BRAF_MU (n=625) | Training set, P | Test set (n=377), BRAF_WT (n=109) | Test set, BRAF_MU (n=268) | Test set, P |
|---|---|---|---|---|---|---|---|
| Age (years) | | 39.00 (30.00 to 50.00) | 41.00 (34.00 to 50.00) | 0.046 | 41.00 (31.00 to 54.00) | 42.00 (34.00 to 51.00) | 0.621 |
| Sex | Male | 60 (23.5) | 184 (29.4) | 0.090 | 29 (26.6) | 82 (30.6) | 0.518 |
| | Female | 195 (76.5) | 441 (70.6) | | 80 (73.4) | 186 (69.4) | |
| Multifocality | Negative | 196 (76.9) | 517 (82.7) | 0.055 | 86 (78.9) | 209 (78.0) | 0.954 |
| | Positive | 59 (23.1) | 108 (17.3) | | 23 (21.1) | 59 (22.0) | |
| Pathological category | Papillary carcinoma | 232 (91.0) | 624 (99.8) | <0.001 | 98 (89.9) | 268 (100) | <0.001 |
| | Follicular carcinoma | 10 (3.9) | 1 (0.2) | | 5 (4.6) | 0 (0) | |
| | Other malignancy | 13 (5.1) | 0 (0) | | 6 (5.5) | 0 (0) | |
| Central lymph node metastasis | Negative | 155 (60.8) | 427 (68.3) | 0.039 | 69 (63.3) | 185 (69.0) | 0.340 |
| | Positive | 100 (39.2) | 198 (31.7) | | 40 (36.7) | 83 (31.0) | |
| Lateral lymph node metastasis | Negative | 224 (87.8) | 597 (95.5) | <0.001 | 96 (88.1) | 257 (95.9) | 0.010 |
| | Positive | 31 (12.2) | 28 (4.5) | | 13 (11.9) | 11 (4.1) | |
Data are presented as median (interquartile range) or n (%). BRAF_MU, BRAF V600E mutation-positive; BRAF_WT, BRAF V600E mutation-negative.
Table 4
| Feature | Level | Training set (n=880), BRAF_WT (n=255) | Training set, BRAF_MU (n=625) | Training set, P | Test set (n=377), BRAF_WT (n=109) | Test set, BRAF_MU (n=268) | Test set, P |
|---|---|---|---|---|---|---|---|
| Tumor size (cm) | | 1.10 (0.70 to 1.65) | 0.80 (0.60 to 1.20) | <0.001 | 1.00 (0.70 to 2.00) | 0.80 (0.60 to 1.10) | <0.001 |
| Tumor location | Left upper | 24 (9.4) | 68 (10.9) | 0.520 | 14 (12.8) | 38 (14.2) | 0.844 |
| | Left middle | 49 (19.2) | 147 (23.5) | | 16 (14.7) | 52 (19.4) | |
| | Left lower | 43 (16.9) | 78 (12.5) | | 16 (14.7) | 34 (12.7) | |
| | Right upper | 24 (9.4) | 64 (10.2) | | 8 (7.3) | 23 (8.6) | |
| | Right middle | 59 (23.1) | 147 (23.5) | | 32 (29.4) | 63 (23.5) | |
| | Right lower | 49 (19.2) | 107 (17.1) | | 20 (18.3) | 48 (17.9) | |
| | Isthmus | 7 (2.8) | 14 (2.3) | | 3 (2.8) | 10 (3.7) | |
| Shape | Wider than tall | 129 (50.6) | 201 (32.2) | <0.001 | 45 (41.3) | 84 (31.3) | 0.085 |
| | Taller than wide | 126 (49.4) | 424 (67.8) | | 64 (58.7) | 184 (68.7) | |
| Margin | Smooth | 6 (2.4) | 2 (0.3) | 0.011 | 3 (2.7) | 0 (0) | 0.004 |
| | Ill-defined | 214 (83.9) | 509 (81.4) | | 90 (82.6) | 205 (76.5) | |
| | Lobulated or irregular | 22 (8.6) | 80 (12.8) | | 10 (9.2) | 53 (19.8) | |
| | Extrathyroidal extension | 13 (5.1) | 34 (5.5) | | 6 (5.5) | 10 (3.7) | |
| Composition | Spongiform | 2 (0.8) | 0 (0) | 0.014 | 0 (0) | 0 (0) | 1.000 |
| | Mixed cystic and solid | 6 (2.3) | 5 (0.8) | | 2 (1.8) | 3 (1.1) | |
| | Solid or almost completely solid | 247 (96.9) | 620 (99.2) | | 107 (98.2) | 265 (98.9) | |
| Echogenicity | Very hypoechoic | 17 (6.7) | 82 (13.1) | <0.001 | 4 (3.7) | 29 (10.8) | <0.001 |
| | Hypoechoic | 212 (83.1) | 520 (83.2) | | 92 (84.4) | 231 (86.2) | |
| | Hyperechoic or isoechoic | 26 (10.2) | 23 (3.7) | | 13 (11.9) | 8 (3.0) | |
| Echogenic foci | None | 54 (21.2) | 218 (34.9) | <0.001 | 27 (24.8) | 82 (30.6) | 0.033 |
| | Large comet-tail artifacts | 0 (0) | 1 (0.2) | | 1 (0.9) | 0 (0) | |
| | Macrocalcifications | 9 (3.5) | 8 (1.2) | | 5 (4.6) | 4 (1.5) | |
| | Peripheral (rim) calcifications | 0 (0) | 1 (0.2) | | 0 (0) | 1 (0.4) | |
| | Punctate echogenic foci | 138 (54.1) | 324 (51.8) | | 55 (50.5) | 153 (57.1) | |
| | Mixed | 54 (21.2) | 73 (11.7) | | 21 (19.3) | 28 (10.4) | |
| Blood flow | None | 10 (3.9) | 32 (5.1) | 0.003 | 4 (3.7) | 17 (6.3) | 0.560 |
| | Only peripheral blood flow | 12 (4.7) | 14 (2.2) | | 5 (4.6) | 6 (2.2) | |
| | Mainly peripheral blood flow | 80 (31.4) | 156 (25.0) | | 32 (29.4) | 71 (26.5) | |
| | Mainly central blood flow | 67 (26.3) | 240 (38.4) | | 29 (26.6) | 80 (29.9) | |
| | Mixed blood flow | 86 (33.7) | 183 (29.3) | | 39 (35.8) | 94 (35.1) | |
| Ultrasound assessment of lymph node metastasis | Negative | 179 (70.2) | 499 (79.8) | 0.003 | 74 (67.9) | 214 (79.9) | 0.019 |
| | Positive | 76 (29.8) | 126 (20.2) | | 35 (32.1) | 54 (20.1) | |
Data are presented as median (interquartile range) or n (%). BRAF_MU, BRAF V600E mutation-positive; BRAF_WT, BRAF V600E mutation-negative.
Performance of the clinical model
After the univariate logistic regression and multivariate logistic regression analyses, sex, age, tumor size, and multifocality were identified as predictive factors (Table 5). At the image level, these four factors were combined to construct a clinical model to predict the BRAF V600E mutation. The AUC, accuracy, sensitivity, and specificity of the model were 0.645 (95% CI: 0.630–0.660), 61.1%, 62.0%, and 60.3%, respectively, in the training set, while they were 0.644 (95% CI: 0.611–0.677), 58.5%, 59.2%, and 56.9%, respectively, in the test set. At the lesion level, the AUC, accuracy, sensitivity, and specificity of the clinical model were 0.636 (95% CI: 0.594–0.675), 61.3%, 62.4%, and 58.4%, respectively, in the training set, while they were 0.631 (95% CI: 0.570–0.695), 58.9%, 60.4%, and 55.0%, respectively, in the test set.
Table 5
| Feature | Univariate OR (95% CI) | Univariate P | Multivariate OR (95% CI) | Multivariate P |
|---|---|---|---|---|
| Sex | 0.71 (0.63–0.80) | <0.001 | 0.61 (0.54–0.69) | <0.001 |
| Age | 1.02 (1.01–1.02) | <0.001 | 1.01 (1.00–1.01) | <0.001 |
| Tumor size | 0.53 (0.49–0.57) | <0.001 | 0.54 (0.50–0.58) | <0.001 |
| Multifocality | 0.60 (0.52–0.68) | <0.001 | 0.66 (0.57–0.75) | <0.001 |
CI, confidence interval; OR, odds ratio.
Performance and comparison of the radiomics models
Among the six classifiers, XGBoost achieved the highest AUC at both the image and lesion levels. At the image level, the AUC, accuracy, sensitivity, and specificity were 0.772 (95% CI: 0.759–0.784), 71.0%, 72.4%, and 69.8%, respectively, in the training set, while they were 0.721 (95% CI: 0.690–0.751), 67.2%, 68.2%, and 64.8%, respectively, in the test set (Table 6 and Figure 4A,4B). At the lesion level, the AUC, accuracy, sensitivity, and specificity were 0.809 (95% CI: 0.776–0.844), 76.5%, 79.5%, and 69.0%, respectively, in the training set, while they were 0.745 (95% CI: 0.686–0.805), 70.3%, 73.9%, and 61.5%, respectively, in the test set (Table 7 and Figure 4C,4D).
Table 6
| Group | Model | AUC (95% CI) | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|
| Training set | LR | 0.705 (0.692–0.718) | 0.635 | 0.872 | 0.429 |
| | SVM | 0.628 (0.613–0.643) | 0.595 | 0.734 | 0.474 |
| | RF | 0.673 (0.658–0.686) | 0.628 | 0.668 | 0.593 |
| | DT | 0.692 (0.678–0.706) | 0.638 | 0.762 | 0.531 |
| | KNN | 0.697 (0.683–0.712) | 0.645 | 0.732 | 0.570 |
| | XGBoost | 0.772 (0.759–0.784) | 0.710 | 0.724 | 0.698 |
| Test set | LR | 0.712 (0.682–0.743) | 0.738 | 0.855 | 0.452 |
| | SVM | 0.619 (0.587–0.652) | 0.640 | 0.706 | 0.479 |
| | RF | 0.670 (0.639–0.701) | 0.631 | 0.645 | 0.595 |
| | DT | 0.655 (0.622–0.684) | 0.672 | 0.752 | 0.476 |
| | KNN | 0.610 (0.578–0.641) | 0.631 | 0.694 | 0.479 |
| | XGBoost | 0.721 (0.690–0.751) | 0.672 | 0.682 | 0.648 |
AUC, area under the curve; CI, confidence interval; DT, decision tree; KNN, k-nearest neighbors; LR, logistic regression; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.
Table 7
| Group | Model | AUC (95% CI) | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|
| Training set | LR | 0.731 (0.693–0.769) | 0.714 | 0.766 | 0.584 |
| | SVM | 0.643 (0.602–0.685) | 0.692 | 0.786 | 0.463 |
| | RF | 0.708 (0.673–0.746) | 0.697 | 0.741 | 0.588 |
| | DT | 0.774 (0.738–0.809) | 0.748 | 0.781 | 0.667 |
| | KNN | 0.761 (0.725–0.797) | 0.744 | 0.811 | 0.580 |
| | XGBoost | 0.809 (0.776–0.844) | 0.765 | 0.795 | 0.690 |
| Test set | LR | 0.730 (0.671–0.792) | 0.714 | 0.757 | 0.606 |
| | SVM | 0.614 (0.550–0.682) | 0.645 | 0.731 | 0.431 |
| | RF | 0.689 (0.624–0.757) | 0.695 | 0.728 | 0.615 |
| | DT | 0.703 (0.644–0.761) | 0.682 | 0.754 | 0.505 |
| | KNN | 0.631 (0.567–0.701) | 0.668 | 0.757 | 0.450 |
| | XGBoost | 0.745 (0.686–0.805) | 0.703 | 0.739 | 0.615 |
AUC, area under the curve; CI, confidence interval; DT, decision tree; KNN, k-nearest neighbors; LR, logistic regression; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.
Performance of the DL and combined models
In the test cohort, the DL model predicted the BRAF V600E mutation better than the radiomics model did at both the image and lesion levels, with AUCs of 0.782 and 0.807, respectively (Tables 8,9 and Figures 5,6). At the lesion level, the combined model achieved the highest AUC in both the training and test sets, significantly exceeding the AUCs of the clinical and radiomics models (P<0.05). The AUC, accuracy, sensitivity, and specificity of the combined model were 0.845 (95% CI: 0.815–0.875), 80.2%, 83.2%, and 72.9%, respectively, in the training set, and they were 0.814 (95% CI: 0.769–0.861), 78.2%, 84.0%, and 64.2%, respectively, in the test set (Table 9 and Figure 6). However, at both the image and lesion levels, the AUC of the combined model in the test set was not significantly different from that of the DL model.
Table 8
| Group | Model | AUC (95% CI) | Accuracy | Sensitivity | Specificity | P (DeLong test) |
|---|---|---|---|---|---|---|
| Training set | Clinical model | 0.645 (0.630–0.660) | 0.611 | 0.620 | 0.603 | <0.001 |
| | Radiomics model | 0.772 (0.759–0.784) | 0.710 | 0.724 | 0.698 | <0.001 |
| | DL model | 0.741 (0.728–0.754) | 0.683 | 0.688 | 0.679 | <0.001 |
| | Combined model | 0.818 (0.807–0.829) | 0.749 | 0.786 | 0.717 | – |
| Test set | Clinical model | 0.644 (0.611–0.677) | 0.585 | 0.592 | 0.569 | <0.001 |
| | Radiomics model | 0.721 (0.690–0.751) | 0.672 | 0.682 | 0.648 | <0.001 |
| | DL model | 0.782 (0.755–0.807) | 0.696 | 0.706 | 0.671 | 0.077 |
| | Combined model | 0.797 (0.769–0.823) | 0.736 | 0.764 | 0.669 | – |
AUC, area under the curve; CI, confidence interval; DL, deep learning.
Table 9
| Group | Model | AUC (95% CI) | Accuracy | Sensitivity | Specificity | P (DeLong test) |
|---|---|---|---|---|---|---|
| Training set | Clinical model | 0.636 (0.594–0.675) | 0.613 | 0.624 | 0.584 | <0.001 |
| | Radiomics model | 0.809 (0.776–0.844) | 0.765 | 0.795 | 0.690 | 0.003 |
| | DL model | 0.770 (0.733–0.805) | 0.733 | 0.765 | 0.655 | <0.001 |
| | Combined model | 0.845 (0.815–0.875) | 0.802 | 0.832 | 0.729 | – |
| Test set | Clinical model | 0.631 (0.570–0.695) | 0.589 | 0.604 | 0.550 | <0.001 |
| | Radiomics model | 0.745 (0.686–0.805) | 0.703 | 0.739 | 0.615 | <0.001 |
| | DL model | 0.807 (0.765–0.850) | 0.753 | 0.817 | 0.596 | 0.629 |
| | Combined model | 0.814 (0.769–0.861) | 0.782 | 0.840 | 0.642 | – |
AUC, area under the curve; CI, confidence interval; DL, deep learning.
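The relationship between the reported accuracy, sensitivity, and specificity can be checked with a short calculation. The confusion-matrix counts below are not reported in the article; they are back-calculated here, as an assumption, from the lesion-level test-set figures for the combined model (268 BRAF_MU and 109 BRAF_WT lesions; sensitivity 0.840 and specificity 0.642).

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Back-calculated (not reported) counts: 225 + 43 = 268 mutation-positive
# lesions and 70 + 39 = 109 mutation-negative lesions in the test set.
metrics = diagnostic_metrics(tp=225, fn=43, tn=70, fp=39)
rounded = {name: round(value, 3) for name, value in metrics.items()}
```

Under these assumed counts, the three rounded values agree with the test-set row for the combined model in Table 9, and roughly 16% of mutation-positive lesions fall below the cutoff.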
Visual interpretation of the combined models
In Figure 7, the features are sorted by importance from top to bottom, with a string of colored dots representing individual patients with thyroid cancer. The DL features rank at the top, followed by radscores and clinical variables. Patients with high DL scores or radscores (red dots) tended to have a high positive impact on the BRAF V600E mutation prediction. The other features, including sex, age, lesion size, and multifocality, provide additional insights into the clinical factors of the lesion and their relevance to BRAF V600E mutation status. The waterfall plot was used to illustrate interpretability at the individual level. Figure 8 provides two representative predictive examples, demonstrating how SHAP could help clinicians reach a more accurate diagnosis based on the individual characteristics of each patient. In the heatmaps in Figure 8, the red regions are those with a significant impact on the model’s decision-making.
Discussion
In this study, we found that the XGBoost model had the highest performance in predicting the BRAF V600E mutation among the six ML models. Logistic analysis of clinical factors showed that sex, age, lesion size, and multifocality were independent predictors of the BRAF V600E mutation. The predictive performance of the clinical-image combined model was significantly better than that of the clinical model and the XGBoost model alone and was comparable to that of the DL model. The radiomics, DL, and combined models performed better at the lesion level than at the image level. The SHAP algorithm improved the interpretability of the clinical-image combined model and visualized the BRAF V600E mutation prediction process at the patient level. Our model’s performance is driven by the high prevalence of BRAF V600E in PTC and its near absence in non-PTC subtypes. Consequently, this model is not generalizable to non-PTC thyroid cancers and should be used exclusively for the preoperative assessment of suspected PTC. This constraint reflects biological reality, not methodological bias.
We evaluated the performance of six ML models in predicting BRAF V600E mutation and identified the XGBoost model as the top performer. This finding underscores the robustness of XGBoost in handling complex datasets and its ability to capture intricate patterns that are critical for accurate mutation prediction. The superior performance of XGBoost aligns with previous studies that have highlighted its efficacy in various prediction tasks relevant to thyroid nodules, particularly in scenarios involving high-dimensional data (37-43).
Various methods were adopted to examine the relationship between thyroid ultrasound images and BRAF V600E mutation status. Several previous studies have found that demographic and ultrasound features recognized by the radiologists can predict BRAF V600E mutation status (44-46). The identification of these predictors not only enhances our understanding of the mutation’s epidemiology but also provides a foundation for incorporating clinical factors into predictive models. Other studies adopted radiomics methods to predict BRAF V600E mutation status based on the radiomics features extracted from thyroid ultrasound images (24-28,47). However, these studies employed relatively small sample sizes or reported low-to-moderate performances unable to meet the needs of clinical diagnosis. A few previous studies clarified the association of BRAF V600E mutation with features derived from novel ultrasound technologies, such as elasticity ultrasound (26,28) and contrast-enhanced ultrasound (18). These technologies are neither easily deployable in primary hospitals nor definitively recommended by current guidelines for thyroid cancer. For instance, Wu et al. constructed relatively effective radiomics and deep transfer learning models to predict BRAF V600E mutation status (48). In another study, a clinical-image fusion model demonstrated superior predictive performance compared to a clinical model, radiomics model, and DL model alone, but an explainable method to predict BRAF V600E mutation status at the individual level was lacking (49).
The combined model in our study, which integrated the clinical, radiomics, and DL models, yielded AUCs of 0.845 and 0.814 in the training and test sets, respectively. This suggests that the integration of clinical and imaging data provides a more comprehensive representation of the underlying biological processes, leading to improved prediction accuracy. The combined model’s ability to leverage both structured clinical data and high-dimensional imaging features likely contributes to its enhanced performance, highlighting the importance of multimodal data integration in precision medicine. However, at the image and lesion levels, the AUCs of the combined models in the test sets were not significantly different from those of the DL models. Our results are similar to those of Xiang et al., who demonstrated that the AUC of a combined DL model was not significantly different from that of a DL-B model based on B-mode images (P=0.98) (50). The small sample size and substantial lesion heterogeneity might have limited the power to detect statistically significant AUC differences between the combined and DL models. Nevertheless, the observed lack of improvement is not a failure of the combined model but an important empirical finding. First, this does not imply that the additional features carry no relevant signal; rather, in this specific cohort and task, the ResNet50 architecture alone was able to capture information that was at least as discriminative as the hand-crafted radiomic features and the selected clinical variables. This represents a valuable negative result, suggesting that for similar homogeneous datasets and standardized imaging protocols, end-to-end deep learning may suffice, thereby simplifying future clinical deployment. Second, the combined model may offer improved generalizability across different imaging protocols or populations, a possibility that was not testable in our single-cohort study (51).
Finally, the combined model can provide explainable predictors. In clinical decision-making, interpretability is indispensable for building physician trust in AI-assisted systems, meeting regulatory requirements, and avoiding errors that may arise from black-box models (52).
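To make the comparison of AUCs between two models on the same test set concrete, the following is a minimal sketch of a paired percentile bootstrap for the AUC difference. It is an illustration only: the statistical test used in this study is not restated here, the bootstrap is swapped in for simplicity, and the function names `auc` and `bootstrap_auc_difference` as well as all parameters are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """Paired bootstrap of AUC(a) - AUC(b).

    Returns (observed difference, (lower, upper)) with an approximate
    95% percentile interval. Both models are scored on the same cases,
    so each resample draws case indices once and reuses them for both.
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = auc(labels, scores_a) - auc(labels, scores_b)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:  # resample lacks one class, so AUC is undefined
            continue
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auc(y, a) - auc(y, b))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return observed, (lower, upper)
```

An interval that includes zero is compatible with no difference between the two models. With a small test set the interval tends to be wide, which mirrors the limited statistical power noted above.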
Our results indicate that models operating at the lesion level outperform those at the image level. This observation emphasizes the importance of considering the lesion-level context in the prediction of BRAF V600E mutations. Lesion-level models likely capture a broader range of relevant features that may not be apparent at the image level. Although we employed a data-stitching strategy given our standardized single-center acquisition protocol and empirical validation of equivalent performance, we acknowledge that attention-based multiple-instance learning represents an important methodological advancement for multicenter studies with variable imaging practices (53).
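As a simple point of reference for moving from image-level to lesion-level predictions, the sketch below averages the predicted probabilities of all images belonging to the same lesion. This mean pooling is an assumption chosen for illustration; it is neither the data-stitching strategy used in this study nor attention-based multiple-instance learning, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def aggregate_to_lesion_level(image_predictions):
    """Average image-level probabilities per lesion.

    image_predictions: iterable of (lesion_id, probability) pairs,
    one pair per ultrasound image.
    Returns a dict mapping lesion_id to the mean probability.
    """
    grouped = defaultdict(list)
    for lesion_id, prob in image_predictions:
        grouped[lesion_id].append(prob)
    return {lesion: sum(probs) / len(probs) for lesion, probs in grouped.items()}


# Hypothetical example: lesion "A" has two images, lesion "B" has one.
lesion_scores = aggregate_to_lesion_level([("A", 0.25), ("A", 0.75), ("B", 0.9)])
```

An attention-based approach would replace the equal weights in the mean with weights learned per image, so that more informative views contribute more to the lesion-level score.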
The application of the SHAP algorithm in this study substantially improved the interpretability of the clinical-image combined model. This study identified DL, radscore, sex, age, tumor size, and multifocality as the six most important predictors of BRAF V600E mutation. By visualizing the contribution of each feature to the prediction process, the SHAP algorithm provided individualized insights into the factors driving BRAF V600E mutation predictions. Grad-CAM was used in this study to visualize the regions that contributed the most to the combined model prediction. This interpretability is crucial for clinical adoption, as it allows clinicians to understand and trust the model’s predictions, facilitating its integration into decision-making processes.
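The idea behind per-patient feature attributions can be illustrated with a minimal sketch that computes exact Shapley values by enumerating all feature subsets. This is not the pipeline used in this study: the function name, the baseline convention for "absent" features, and the toy linear model are assumptions for illustration, and practical SHAP implementations compute or approximate these quantities far more efficiently.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating feature subsets.

    predict: function mapping a list of feature values to a number.
    x: the sample to explain.
    baseline: reference values used for features treated as absent.
    The cost grows as 2**len(x), so this is only feasible for few features.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` take the sample's value; the rest take the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    # Hypothetical linear score over three features, for illustration only.
    return 2.0 * features[0] + 3.0 * features[1] - 1.0 * features[2]


contributions = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

For a linear model, each contribution equals the coefficient times the feature's deviation from the baseline, and the contributions sum to the difference between the prediction for the sample and the prediction for the baseline. This additive property is what allows an individual prediction to be decomposed feature by feature.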
Certain limitations of this study should be addressed. First, although the analysis included numerous samples, all the patients in the training and test cohorts were recruited from a single center, and further validation with external data was lacking. Second, we employed a retrospective design, which inevitably introduced selection bias. A prospective cohort study is needed in the future to evaluate how such bias may affect the model’s outcomes. Finally, the manual segmentation approach is inherently inefficient and operator-dependent, introducing unquantified selection bias and limiting clinical scalability. This time-intensive process, requiring substantial expert effort per lesion, presents a significant barrier to real-world implementation and reproducibility across a range of clinical settings. We explicitly recognize that these limitations could be effectively overcome through the integration of advanced automated segmentation models incorporating convolutional neural networks or transformer architectures. Such automation would not only eliminate operator dependency and drastically reduce the processing time but also enable the standardized, objective delineation essential for large-scale clinical deployment. We have prioritized advancing this transition in our ongoing research agenda.
Conclusions
Our study demonstrates the potential of ML and DL, particularly XGBoost and clinical-image combined models, in predicting BRAF V600E mutation with high accuracy. The integration of clinical and imaging data, coupled with lesion-level analysis and interpretability tools such as SHAP, represents a significant advancement in the field of mutation prediction.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the STARD reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2026-1-0299/rc
Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2026-1-0299/dss
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2026-1-0299/coif). C.H. reports support from Noncommunicable Chronic Diseases-National Science and Technology Major Project. Y.G. reports support from Guangdong Medical Science and Technology Research Foundation. The other authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Ethics Committee of Guangdong Provincial People’s Hospital. Informed consent was waived in this retrospective study.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA Cancer J Clin 2023;73:17-48. [Crossref] [PubMed]
- Miller KD, Fidler-Benaoudia M, Keegan TH, Hipp HS, Jemal A, Siegel RL. Cancer statistics for adolescents and young adults, 2020. CA Cancer J Clin 2020;70:443-59. [Crossref] [PubMed]
- Boucai L, Zafereo M, Cabanillas ME. Thyroid Cancer: A Review. JAMA 2024;331:425-35. [Crossref] [PubMed]
- Azaryan I, Maxwell C, Tran DH, Sipos JA, Endo M. Molecular testing for the management of indeterminate thyroid nodules. Endocr Relat Cancer 2026;33:e250258. [Crossref] [PubMed]
- Hertzman Johansson C, Egyhazi Brage S. BRAF inhibitors in cancer therapy. Pharmacol Ther 2014;142:176-82. [Crossref] [PubMed]
- Pisapia P, Pepe F, Iaccarino A, Sgariglia R, Nacchio M, Russo G, Gragnano G, Malapelle U, Troncone G. BRAF: A Two-Faced Janus. Cells 2020;9:2549. [Crossref] [PubMed]
- Baloch ZW, Asa SL, Barletta JA, Ghossein RA, Juhlin CC, Jung CK, LiVolsi VA, Papotti MG, Sobrinho-Simões M, Tallini G, Mete O. Overview of the 2022 WHO Classification of Thyroid Neoplasms. Endocr Pathol 2022;33:27-63. [Crossref] [PubMed]
- Chen X, Chen X, Xie W, Ge H, He H, Zhang A, Zheng J. BRAF-activated ARSI suppressed EREG-mediated ferroptosis to promote BRAF(V600E) (mutant) papillary thyroid carcinoma progression and sorafenib resistance. Int J Biol Sci 2025;21:128-42. [Crossref] [PubMed]
- Silver JA, Bogatchenko M, Pusztaszeri M, Forest VI, Hier MP, Yang JW, Tamilia M, Payne RJ. BRAF V600E mutation is associated with aggressive features in papillary thyroid carcinomas ≤ 1.5 cm. J Otolaryngol Head Neck Surg 2021;50:63. [Crossref] [PubMed]
- Liu X, Yan K, Lin X, Zhao L, An W, Wang C, Liu X. The association between BRAF (V600E) mutation and pathological features in PTC. Eur Arch Otorhinolaryngol 2014;271:3041-52. [Crossref] [PubMed]
- Dell'Aquila M, Fiorentino V, Martini M, Capodimonti S, Cenci T, Lombardi CP, Raffaelli M, Pontecorvi A, Fadda G, Pantanowitz L, Larocca LM, Rossi ED. How limited molecular testing can also offer diagnostic and prognostic evaluation of thyroid nodules processed with liquid-based cytology: Role of TERT promoter and BRAF V600E mutation analysis. Cancer Cytopathol 2021;129:819-29. [Crossref] [PubMed]
- Pizzimenti C, Fiorentino V, Ieni A, Rossi ED, Germanà E, Giovanella L, Lentini M, Alessi Y, Tuccari G, Campennì A, Martini M, Fadda G. BRAF-AXL-PD-L1 Signaling Axis as a Possible Biological Marker for RAI Treatment in the Thyroid Cancer ATA Intermediate Risk Category. Int J Mol Sci 2023;24:10024. [Crossref] [PubMed]
- Xing M, Alzahrani AS, Carson KA, Shong YK, Kim TY, Viola D, et al. Association between BRAF V600E mutation and recurrence of papillary thyroid cancer. J Clin Oncol 2015;33:42-50. [Crossref] [PubMed]
- Ye Z, Xia X, Xu P, Liu W, Wang S, Fan Y, Guo M. The Prognostic Implication of the BRAF V600E Mutation in Papillary Thyroid Cancer in a Chinese Population. Int J Endocrinol 2022;2022:6562149. [Crossref] [PubMed]
- Cong R, Ouyang H, Zhou D, Li X, Xia F. BRAF V600E mutation in thyroid carcinoma: a large-scale study in Han Chinese population. World J Surg Oncol 2024;22:259. [Crossref] [PubMed]
- Liu R, Zhu G, Tan J, Shen X, Xing M. Genetic trio of BRAF and TERT alterations and rs2853669TT in papillary thyroid cancer aggressiveness. J Natl Cancer Inst 2024;116:694-701. [Crossref] [PubMed]
- Fiorentino V, Giordano W, Pizzimenti C, Zuccalà V, Ieni A, Molinario C, Cannavò S, Campennì A, Tralongo P, Martini M, Giuffrè G, Larocca LM, Fadda G, Rossi ED. Molecular profiling of thyroid nodules on cytologic samples: Findings from an Italian multi-institutional cohort. Cancer Cytopathol 2026;134:e70065. [Crossref] [PubMed]
- Liu Y, He L, Yin G, Cheng L, Zeng B, Cheng J, Yang L. Association analysis and the clinical significance of BRAF gene mutations and ultrasound features in papillary thyroid carcinoma. Oncol Lett 2019;18:2995-3002. [Crossref] [PubMed]
- Xu JM, Chen YJ, Dang YY, Chen M. Association Between Preoperative US, Elastography Features and Prognostic Factors of Papillary Thyroid Cancer With BRAF(V600E) Mutation. Front Endocrinol (Lausanne) 2019;10:902. [Crossref] [PubMed]
- Lv Y, He X, Yang F, Guo L, Qi M, Zhang J, Wang H. Correlation of conventional ultrasound features and related factors with BRAFV600E gene mutation in papillary thyroid carcinoma. Lin Chuang Er Bi Yan Hou Tou Jing Wai Ke Za Zhi 2021;35:925-9. [Crossref] [PubMed]
- Sujini GN, Balakrishna S. Automated thyroid nodule classification in ultrasound imaging using a hybrid vision transformer and Wasserstein GAN with gradient penalty. Sci Rep 2025;15:40786. [Crossref] [PubMed]
- Fiorentino V, Pizzimenti C, Franchina M, Micali MG, Russotto F, Pepe L, Militi GB, Tralongo P, Pierconti F, Ieni A, Martini M, Tuccari G, Rossi ED, Fadda G. The minefield of indeterminate thyroid nodules: could artificial intelligence be a suitable diagnostic tool? Diagn Histopathol 2023;29:396-401.
- Jerbi F, Aboudi N, Khlifa N. Automatic classification of ultrasound thyroids images using vision transformers and generative adversarial networks. Sci Afr 2023;20:e01679.
- Yoon J, Lee E, Koo JS, Yoon JH, Nam KH, Lee J, Jo YS, Moon HJ, Park VY, Kwak JY. Artificial intelligence to predict the BRAFV600E mutation in patients with thyroid cancer. PLoS One 2020;15:e0242806. [Crossref] [PubMed]
- Kwon MR, Shin JH, Park H, Cho H, Hahn SY, Park KW. Radiomics Study of Thyroid Ultrasound for Predicting BRAF Mutation in Papillary Thyroid Carcinoma: Preliminary Results. AJNR Am J Neuroradiol 2020;41:700-5. [Crossref] [PubMed]
- Wang YG, Xu FJ, Agyekum EA, Xiang H, Wang YD, Zhang J, Sun H, Zhang GL, Bo XS, Lv WZ, Wang X, Hu SD, Qian XQ. Radiomic Model for Determining the Value of Elasticity and Grayscale Ultrasound Diagnoses for Predicting BRAF(V600E) Mutations in Papillary Thyroid Carcinoma. Front Endocrinol (Lausanne) 2022;13:872153. [Crossref] [PubMed]
- Tang J, Jiang S, Ma J, Xi X, Li H, Wang L, Zhang B. Nomogram based on radiomics analysis of ultrasound images can improve preoperative BRAF mutation diagnosis for papillary thyroid microcarcinoma. Front Endocrinol (Lausanne) 2022;13:915135. [Crossref] [PubMed]
- Agyekum EA, Wang YG, Xu FJ, Akortia D, Ren YZ, Chambers KH, Wang X, Taupa JO, Qian XQ. Predicting BRAFV600E mutations in papillary thyroid carcinoma using six machine learning algorithms based on ultrasound elastography. Sci Rep 2023;13:12604. [Crossref] [PubMed]
- Ponce-Bobadilla AV, Schmitt V, Maier CS, Mensing S, Stodtmann S. Practical guide to SHAP analysis: Explaining supervised machine learning model predictions in drug development. Clin Transl Sci 2024;17:e70056. [Crossref] [PubMed]
- Hou F, Zhu Y, Zhao H, Cai H, Wang Y, Peng X, Lu L, He R, Hou Y, Li Z, Chen T. Development and validation of an interpretable machine learning model for predicting the risk of distant metastasis in papillary thyroid cancer: a multicenter study. EClinicalMedicine 2024;77:102913. [Crossref] [PubMed]
- Teng X, Han K, Jin W, Ma L, Wei L, Min D, Chen L, Du Y. Development and validation of an early diagnosis model for bone metastasis in non-small cell lung cancer based on serological characteristics of the bone metastasis mechanism. EClinicalMedicine 2024;72:102617. [Crossref] [PubMed]
- Ringel MD, Sosa JA, Baloch Z, Bischoff L, Bloom G, Brent GA, Brock PL, Chou R, Flavell RR, Goldner W, Grubbs EG, Haymart M, Larson SM, Leung AM, Osborne JR, Ridge JA, Robinson B, Steward DL, Tufano RP, Wirth LJ. 2025 American Thyroid Association Management Guidelines for Adult Patients with Differentiated Thyroid Cancer. Thyroid 2025;35:841-985. [Crossref] [PubMed]
- Haddad RI, Bischoff L, Applewhite M, Bernet V, Blomain E, Brito M, Busaidy NL, Campbell M, DeLozier O, Duh QY, Ehya H, Grady E, Guo T, Haymart M, Hunt JP, Kandeel F, Kotwal A, Lamonica DM, Lorch J, Mandel SJ, Markovina S, Mydlarz W, Nabell L, Raeburn CD, Rezaee R, Ridge JA, Ritter H, Roth MY, Salgado SA, Scheri RP, Shah JP, Sipos JA, Sippel R, Sturgeon C, Wirth LJ, Wong RJ, Worden F, Yeh MW, Darlow S, Cassara CJ, Sliker B. NCCN Guidelines(R) Insights: Thyroid Carcinoma, Version 1.2025. J Natl Compr Canc Netw 2025;23.
- Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition; 2009: IEEE.
- Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15:1929-58.
- Qian X, Pei J, Zheng H, Xie X, Yan L, Zhang H, Han C, Gao X, Zhang H, Zheng W, Sun Q, Lu L, Shung KK. Prospective assessment of breast cancer risk from multimodal multiview ultrasound images via clinically applicable deep learning. Nat Biomed Eng 2021;5:522-32. [Crossref] [PubMed]
- Xiong Z, Shi Y, Zhang Y, Duan S, Ding Y, Zheng Q, Jiao Y, Yan J. Ultrasound radiomics based XGBoost model to differential diagnosis thyroid nodules and unnecessary biopsy rate: Individual application of SHapley additive exPlanations. J Clin Ultrasound 2024;52:305-14. [Crossref] [PubMed]
- Arabi M, Nazari M, Salahshour A, Jenabi E, Hajianfar G, Khateri M, Shayesteh SP. A machine learning-based sonomics for prediction of thyroid nodule malignancies. Endocrine 2023;82:326-34. [Crossref] [PubMed]
- Mao Y, Lan H, Lin W, Liang J, Huang H, Li L, Wen J, Chen G. Machine learning algorithms are comparable to conventional regression models in predicting distant metastasis of follicular thyroid carcinoma. Clin Endocrinol (Oxf) 2023;98:98-109. [Crossref] [PubMed]
- Li Z, Nie W, Liu Q, Lin M, Li X, Zhang J, Liu T, Deng Y, Li S. A prognostic model for thermal ablation of benign thyroid nodules based on interpretable machine learning. Front Endocrinol (Lausanne) 2024;15:1433192. [Crossref] [PubMed]
- Fu R, Deng S, Hu Y, Luo P, Yang H, Teng H, Zeng D, Ren J. Preoperative Evaluation of Cervical Lymph Node Metastasis in Patients With Hashimoto's Thyroiditis Combined With Thyroid Papillary Carcinoma Using Machine Learning and Radiomics-Based Features: A Preliminary Study. Sichuan Da Xue Xue Bao Yi Xue Ban 2024;55:1026-33. [Crossref] [PubMed]
- Huang Y, Mao Y, Xu L, Wen J, Chen G. Exploring risk factors for cervical lymph node metastasis in papillary thyroid microcarcinoma: construction of a novel population-based predictive model. BMC Endocr Disord 2022;22:269. [Crossref] [PubMed]
- Zhu H, Luo H, Li Y, Zhang Y, Wu Z, Yang Y. The superior value of radiomics to sonographic assessment for ultrasound-based evaluation of extrathyroidal extension in papillary thyroid carcinoma: a retrospective study. Radiol Oncol 2024;58:386-96. [Crossref] [PubMed]
- Pelizzo MR, Dobrinja C, Casal Ide E, Zane M, Lora O, Toniato A, Mian C, Barollo S, Izuzquiza M, Guerrini J, De Manzini N, Merante Boschin I, Rubello D. The role of BRAF(V600E) mutation as poor prognostic factor for the outcome of patients with intrathyroid papillary thyroid carcinoma. Biomed Pharmacother 2014;68:413-7. [Crossref] [PubMed]
- Kabaker AS, Tublin ME, Nikiforov YE, Armstrong MJ, Hodak SP, Stang MT, McCoy KL, Carty SE, Yip L. Suspicious ultrasound characteristics predict BRAF V600E-positive papillary thyroid carcinoma. Thyroid 2012;22:585-9. [Crossref] [PubMed]
- Li HL, Zhang B. Correlations of Ultrasound Features With Gene Mutations and Pathologic Subtypes in Papillary Thyroid Carcinoma. Zhongguo Yi Xue Ke Xue Yuan Xue Bao 2024;46:747-55. [Crossref] [PubMed]
- Shi H, Ding K, Yang XT, Wu TF, Zheng JY, Wang LF, Zhou BY, Sun LP, Zhang YF, Zhao CK, Xu HX. Prediction of BRAF and TERT status in PTCs by machine learning-based ultrasound radiomics methods: A multicenter study. J Clin Transl Endocrinol 2025;40:100390. [Crossref] [PubMed]
- Wu F, Lin X, Chen Y, Ge M, Pan T, Shi J, Mao L, Pan G, Peng Y, Zhou L, Zheng H, Luo D, Zhang Y. Breaking barriers: noninvasive AI model for BRAF(V600E) mutation identification. Int J Comput Assist Radiol Surg 2025;20:935-47. [Crossref] [PubMed]
- Yu Y, Zhao C, Guo R, Zhang Y, Li X, Liu N, Lu Y, Han X, Tang X, Mao R, Peng C, Yu J, Zhou J. Deep learning model based on ultrasound images predicts BRAF V600E mutation in papillary thyroid carcinoma. iScience 2025;28:112482. [Crossref] [PubMed]
- Xiang H, Wang X, Xu M, Zhang Y, Zeng S, Li C, Liu L, Deng T, Tang G, Yan C, Ou J, Lin Q, He J, Sun P, Li A, Chen H, Heng PA, Lin X. Deep Learning-assisted Diagnosis of Breast Lesions on US Images: A Multivendor, Multicenter Study. Radiol Artif Intell 2023;5:e220185. [Crossref] [PubMed]
- Nawaz A, Edinat A, Rana MRR, Ali T, Mustafa G, Tahir S, Lee SW. The role of multimodality in clinical disease diagnosis: advances, challenges, and opportunities. Front Public Health 2026;14:1788454. [Crossref] [PubMed]
- Hassan MM, Tahsin A, Alam MGR, Alzamil D, Garg S, Uddin MZ, Choudhury N, Fortino G. Explainable multimodal fusion for breast carcinoma diagnosis: A systematic review, open problems, and future directions. Comput Methods Programs Biomed 2026;274:109152. [Crossref] [PubMed]
- Vafaeezadeh M, Behnam H, Gifani P. Ultrasound Image Analysis with Vision Transformers-Review. Diagnostics (Basel) 2024;14:542. [Crossref] [PubMed]


