Enhancing BRAF V600E mutation prediction in thyroid cancer through interpretable deep learning models combining clinical and ultrasound-based radiomics features

Lijie Zhang; Chunwang Huang; Zefeng Chen; Yuanlin Ying; Nan Jiang; Xiaozhu Zhong; Fenghuan Chen; Yuping Guo; Siwei Luo

doi:10.21037/qims-2026-1-0299

Original Article

Lijie Zhang¹#, Chunwang Huang¹#, Zefeng Chen², Yuanlin Ying¹, Nan Jiang³, Xiaozhu Zhong⁴, Fenghuan Chen⁴, Yuping Guo¹, Siwei Luo¹

¹Department of Ultrasound, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou, China; ²School of Artificial Intelligence, Sun Yat-sen University, Zhuhai, China; ³School of Computer Science, Guangdong University of Foreign Studies South China Business College, Guangzhou, China; ⁴Department of Ultrasound, The Third Affiliated Hospital of Southern Medical University, Academy of Orthopedics, Guangzhou, China

Contributions: (I) Conception and design: L Zhang, C Huang, S Luo; (II) Administrative support: C Huang, S Luo; (III) Provision of study materials or patients: L Zhang, Y Ying, X Zhong, F Chen, S Luo; (IV) Collection and assembly of data: L Zhang, Y Ying, X Zhong, F Chen, Z Chen, N Jiang, S Luo; (V) Data analysis and interpretation: L Zhang, Z Chen, N Jiang, S Luo; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Siwei Luo, MD. Department of Ultrasound, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Southern Medical University, 106 Zhongshan Second Road, Guangzhou 510080, China. Email: luosiwei@gdph.org.cn.

Background: BRAF V600E mutation, the most prevalent driver alteration in papillary thyroid carcinoma, is associated with aggressive clinicopathological features, including macroscopic extrathyroidal extension, lymph node metastasis, and high-risk histological features. BRAF V600E mutation is determined by tissue biopsy/surgery and gene sequencing, which are invasive and costly. The study aimed to develop an interpretable prediction model based on clinical and ultrasound characteristics via radiomics and deep learning (DL) methods to noninvasively predict the BRAF V600E mutation in patients with thyroid cancer.

Methods: A total of 6,703 ultrasound images from 1,257 lesions in 1,202 patients with thyroid cancer were retrospectively collected. Since multiple ultrasound images were available for each lesion, the lesion-level prediction was derived as the average of the image-level outputs. Univariate and multivariate logistic regression were adopted to construct the clinical model. Six machine learning models were compared to identify the optimal radiomics model. A ResNeXt50-32x4d model was fine-tuned to build the DL model. The extreme gradient boosting (XGBoost) algorithm was employed to integrate the optimal radiomics score (radscore), DL scores, and clinical factors for combined model construction. The Shapley additive explanations (SHAP) algorithm and gradient-weighted class activation mapping technique were applied for interpretability.

Results: Multivariate analysis identified the significant predictive variables to be sex [odds ratio (OR) =0.61; 95% confidence interval (CI): 0.54–0.69; P<0.001], age (OR =1.01; 95% CI: 1.00–1.01; P<0.001), tumor size (OR =0.54; 95% CI: 0.50–0.58; P<0.001), and multifocality (OR =0.66; 95% CI: 0.57–0.75; P<0.001). Among the six machine learning models, the XGBoost model demonstrated the best performance, with an area under the curve (AUC) of 0.809 and 0.745 in the training and test sets at the lesion level, respectively. The DL model outperformed the XGBoost model, achieving an AUC of 0.807 in the test set at the lesion level. The combined model exhibited comparable performance to that of the DL model, with AUCs of 0.845 and 0.814 in training and test sets at the lesion level, respectively. SHAP analysis revealed that DL scores and radscores were key contributors in predicting mutation status.

Conclusions: The combined model integrating clinical and ultrasound data can effectively predict BRAF V600E mutation status in patients with thyroid cancer.

Keywords: Thyroid cancer; ultrasound; BRAF V600E mutation; SHAP

Submitted Feb 11, 2026. Accepted for publication May 20, 2026. Published online Jun 10, 2026.

Introduction

Thyroid cancer is the most prevalent endocrine malignancy worldwide and the predominant cancer type in adolescents and adults under 40 years of age (1-3). Fine-needle aspiration (FNA) cytology classifies 25% of nodules as indeterminate, and these nodules still require diagnostic surgery (4). Histopathology remains the reference standard for the diagnosis of thyroid cancer, but it is invasive and costly and may lead to overtreatment of low-risk lesions. We aimed to evaluate whether a BRAF V600E mutation prediction model based on clinical and ultrasound features can serve as a triage test to reduce unnecessary FNA procedures. For this purpose, the model would be applied before FNA: patients with a high probability of BRAF V600E positivity would proceed directly to surgery, whereas those with a low probability would avoid invasive workup. Establishing the diagnostic accuracy of this noninvasive strategy is therefore essential to refining current thyroid nodule management guidelines.

The substitution of glutamic acid for valine at codon 600 in exon 15 (T1799A), known as the BRAF V600E mutation, is the most frequent genetic alteration in BRAF (5). BRAF V600E mutation constitutively activates the mitogen-activated protein kinase (MAPK) signaling pathway to promote uncontrolled cell proliferation and survival while suppressing apoptosis and cellular senescence, key mechanisms that enable tumors to evade growth control (5,6). BRAF V600E mutations are particularly prevalent in papillary thyroid carcinomas (PTCs), with reported incidence rates ranging from 29% to 83% (7). A previous study reported that BRAF-activated arylsulfatase I (ARSI) suppresses epiregulin-mediated ferroptosis to promote BRAF V600E-mutant PTC progression and sorafenib resistance (8). Other evidence suggests that BRAF V600E mutation-positive thyroid cancer is associated with aggressive clinicopathological features, including macroscopic extrathyroidal extension, lymph node metastasis, and high-risk histological characteristics (9-12). Beyond its diagnostic utility, BRAF V600E mutation has been linked to radioiodine resistance and an altered tumor immune microenvironment (12). Moreover, several studies have consistently linked this mutation to poorer prognostic outcomes (13-16).

BRAF V600E mutation is determined by tissue biopsy or surgery and gene sequencing, which are invasive and resource-intensive. Although molecular testing on cytology samples has shown high sensitivity and specificity, this approach nonetheless requires invasive FNA and specialized laboratory infrastructure (17). Ultrasound is the first-line imaging modality for thyroid nodule assessment, but conventional sonographic features have limited reproducibility for mutation prediction because of operator dependency and interobserver variability (18-20). Artificial intelligence (AI)-based methodologies have been increasingly applied to thyroid nodule diagnosis, showing promising results in improving accuracy and reducing interobserver variability (21-23). Recent radiomics and deep learning (DL) studies have demonstrated potential for noninvasive BRAF V600E prediction, but they are limited by small sample sizes or suboptimal predictive performance (24-28). Moreover, the “black-box” nature of machine learning (ML) models remains a challenge, as it limits the clinical applicability of the existing models. Shapley additive explanations (SHAP) analysis, grounded in the theoretical framework of Shapley values, has been introduced to elucidate feature importance from a game-theoretic perspective. It is now a widely adopted and scientifically validated methodology for interpreting ML models (29) and has achieved promising results across various types of cancer (30,31). However, there is a lack of research applying the SHAP methodology to explain BRAF V600E mutation prediction models in patients with thyroid cancer.

This study aimed to estimate the diagnostic accuracy of a combined model, integrating clinical, ultrasound, radiomics, and DL features, in predicting the BRAF V600E mutation in thyroid nodules. First, the efficacy of six ML models, including logistic regression (LR), support vector machine (SVM), random forest (RF), decision tree (DT), k-nearest neighbors (KNN), and extreme gradient boosting (XGBoost), was compared to identify the optimal radiomics model. Second, a ResNeXt50-32x4d model was fine-tuned to construct the DL model. Subsequently, the XGBoost algorithm was used to integrate clinical and image data for combined model construction. Finally, the SHAP algorithm was applied to elucidate the significance of features in the predictive model and to identify nonlinear relationships among the risk predictors. We hypothesized that the combined model would achieve an area under the curve (AUC) ≥0.80 and significantly reduce the number of unnecessary FNAs without missing >5% of BRAF V600E mutation-positive nodules. We present this article in accordance with the STARD reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2026-1-0299/rc).

Methods

Study population

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments and was approved by the Ethics Committee of Guangdong Provincial People’s Hospital. The requirement for informed consent was waived due to the retrospective nature of the analysis.

Participants formed a consecutive series of all eligible patients with thyroid nodules admitted to Guangdong Provincial People’s Hospital between January 2018 and August 2023.

The inclusion criteria were as follows: (I) preoperative thyroid ultrasound performed within one month before surgical resection; (II) histopathological confirmation of thyroid cancer; and (III) definite BRAF V600E mutation status validated by tissue-based testing using quantitative real-time PCR (qPCR). The exclusion criteria were as follows: (I) BRAF V600E mutation status could not be determined; (II) preoperative thyroid ultrasound images acquired within one month before surgery were lacking; (III) biopsy was performed before the ultrasound examination; (IV) image quality was poor; (V) lesions could not be located on imaging; and (VI) lesions could not be clearly delineated. Potentially eligible participants were identified from the institutional thyroid nodule registry on the basis of a scheduled thyroid surgery. The schematic diagram for patient enrollment is provided in Figure 1. A total of 1,257 lesions from 1,202 patients were included in this study. Since each thyroid nodule exhibits distinct imaging and pathological characteristics, we treated each lesion as an independent unit. To prevent data leakage, a 7:3 random split was applied at the patient level, with 70% of patients allocated to the training cohort and 30% to the test cohort. All lesions from each patient were retained within their assigned cohort.
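The patient-level split described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_by_patient`, the seed, and the toy identifiers are hypothetical, and only the 7:3 ratio and the rule that all lesions follow their patient come from the text.

```python
import random

def split_by_patient(lesion_to_patient, train_fraction=0.7, seed=0):
    """Randomly assign whole patients to train/test; lesions follow their patient."""
    patients = sorted(set(lesion_to_patient.values()))
    rng = random.Random(seed)          # fixed seed (an assumption) for reproducibility
    rng.shuffle(patients)
    n_train = round(train_fraction * len(patients))
    train_patients = set(patients[:n_train])
    train, test = [], []
    for lesion, patient in lesion_to_patient.items():
        (train if patient in train_patients else test).append(lesion)
    return train, test

# Hypothetical toy data: 10 patients; patients P0 and P1 each have two lesions.
lesion_to_patient = {f"L{i}": f"P{i}" for i in range(10)}
lesion_to_patient.update({"L10": "P0", "L11": "P1"})
train_lesions, test_lesions = split_by_patient(lesion_to_patient)
```

Because the assignment is made per patient rather than per lesion or per image, no patient can contribute lesions to both cohorts, which is the leakage the text aims to prevent.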

Figure 1 Flowchart of patient inclusion and exclusion criteria in the study. BRAF_MU, BRAF V600E mutation-positive; BRAF_WT, BRAF V600E mutation-negative; US, ultrasound.

Ultrasound acquisition and evaluation

Four ultrasound devices were used: the Preirus/Aloka (Hitachi Medical Systems, Tokyo, Japan), Aplio 400 (Toshiba Medical Systems, Otawara, Japan), Resona 8 (Mindray Bio-Medical Electronics, Shenzhen, China), and EPIQ 7 (Philips Healthcare, Best, the Netherlands). No adverse events were attributed to the ultrasound examination. All examinations were performed by radiologists with ≥3 years of experience in thyroid ultrasound. Two radiologists [Reader 1 (Y.Y.) and Reader 2 (Y.G.), with 5 and 15 years of experience in thyroid ultrasound, respectively] independently assessed sonographic features, including shape, margin, composition, echogenicity, echogenic foci, and blood flow, with reference to the American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS). Discrepancies were resolved by a senior radiologist [Reader 3 (C.H.), with 20 years of experience in thyroid ultrasound]. These radiologists were blinded to the BRAF V600E reference-standard results and to clinical information other than age and sex.

Clinicopathological data and clinical model construction

The clinical parameters (e.g., sex, age, and number of lesions) and pathological data (e.g., pathologic category, BRAF V600E mutation status, and lymph node involvement or metastases) were collected from medical records and the pathological reports of resected specimens. The American Thyroid Association (ATA) and the National Comprehensive Cancer Network (NCCN) thyroid guidelines recognize BRAF V600E as a decisive marker for papillary carcinoma and recommend molecular testing on surgical specimens (32,33). Our institution already includes this test, ensuring complete data availability. A thyroid cancer eight-gene detection kit (Rigen Bio, Shanghai, China), including oncogene mutations (BRAF V600E, HRAS Q61R, KRAS G12C/G12V/Q61R, NRAS Q61R, and TERT C228T/250T) and chromosome rearrangements (CCDC6-RET, PAX8-PPARG, and ETV6-NTRK3), was used for thyroid gene mutation testing in accordance with the manufacturer’s protocol. The reference standard (tissue PCR for BRAF V600E) yields a binary result: mutant-positive or mutant-negative. Therefore, no cutoff definition or exploratory analysis was required. Pathologists who evaluated BRAF V600E mutation status were unaware of the index test probability scores or clinical ultrasound details. Significant clinical factors and sonographic features (P<0.05) were initially identified via univariate logistic regression and subsequently incorporated into a multivariate logistic regression model for validation and clinical model construction.

Tumor segmentation

Tumor regions of interest (ROIs) were manually delineated by Reader 4 (L.Z., with 6 years’ experience in thyroid ultrasound) using ITK-SNAP software version 3.8.0 (http://www.itksnap.org). Reader 5 (S.L., 10 years’ experience in thyroid ultrasound) independently reviewed all primary delineations. For multifocal lesions, only the largest mass was used for image segmentation and analysis. Interobserver reproducibility was assessed through independent segmentation of 30 randomly selected cases by Reader 4 and Reader 5. The details of the image preprocessing procedure were as follows: (I) intensity normalization with 64 discrete gray-level bins to minimize noise and probe variability; and (II) standardization of all images to reduce the differences in acquisitions across multiple devices.

Development of the DL models

Python software version 3.8.8 (https://www.python.org; Python Software Foundation, Wilmington, DE, USA) was used to construct the DL models. A ResNeXt50-32x4d model pretrained on ImageNet was fine-tuned with the thyroid ROIs (34). Augmentation strategies included random rotation (±45°), flipping, and cropping. The model was trained with the AdamW optimizer at an initial learning rate of 5e-5 and a batch size of 128, and the learning rate was decayed via the cosine annealing algorithm over 50 epochs. The final fully connected layer was replaced with a dropout layer, a batch normalization layer, and a binary classification layer to obtain the final predictive score (35).
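To make the learning-rate schedule concrete, the following sketch evaluates the standard cosine annealing formula (without restarts) for the stated initial rate of 5e-5 and 50 epochs. The minimum learning rate (`lr_min=0.0`) and the per-epoch update granularity are assumptions, as the text does not report them; the function name is hypothetical.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=50, lr_init=5e-5, lr_min=0.0):
    """Learning rate at a given epoch under cosine annealing (no restarts)."""
    cos_term = 1 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_init - lr_min) * cos_term

# Learning rate at the start of each epoch, plus the end point (epoch 50).
schedule = [cosine_annealing_lr(e) for e in range(51)]
```

Under these assumptions, the rate starts at 5e-5, passes through half that value at epoch 25, and reaches the minimum at epoch 50.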

Feature selection and development of the radiomics models

A total of 944 radiomics features (including original, log-transformed, and wavelet filtered features) were extracted via the PyRadiomics package (version 3.0.1; http://www.radiomics.io/pyradiomics.html) implemented in Python software version 3.8.8 (https://www.python.org). To mitigate bias and overfitting, feature normalization was performed via the Z-score method. Feature selection comprised three steps: (I) significance filtering, in which the Kruskal-Wallis test identified 711 features with significant differences (P<0.05) between BRAF V600E mutation-positive and mutation-negative lesions; (II) redundancy reduction, in which features with pairwise Pearson correlation coefficients >0.50 were deemed redundant and removed; and (III) elastic-net logistic regression, in which the remaining 38 features were subjected to elastic-net logistic regression, yielding 15 nonredundant predictors (Table 1 and Figure 2).
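The redundancy-reduction step (II) can be sketched as a greedy correlation filter. This is an illustration only: the text does not state which member of a correlated pair was retained or whether the absolute correlation was used, so the first-come ordering, the use of |r|, and the toy feature names below are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_redundant(features, threshold=0.5):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy features (not radiomics values from the study).
features = {
    "f_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f_b": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],    # nearly 2 * f_a, so redundant with f_a
    "f_c": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # alternating, weakly correlated with f_a
}
selected = drop_redundant(features)
```

In this toy case `f_b` is discarded because it is almost perfectly correlated with `f_a`, while `f_c` is retained.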

Table 1

Elastic-net logistic regression coefficients of the 15 retained features

Feature	Coefficient	OR
wavelet.HHH_glszm_ZoneEntropy	−0.177	0.838
square_ngtdm_Busyness	−0.127	0.881
wavelet.LHL_glcm_MCC	−0.089	0.915
wavelet.HHH_glszm_SmallAreaHighGrayLevelEmphasis	−0.057	0.945
wavelet.LLH_glrlm_ShortRunEmphasis	−0.053	0.948
wavelet.LLL_glcm_MaximumProbability	−0.050	0.952
wavelet.HLH_glrlm_HighGrayLevelRunEmphasis	−0.047	0.954
wavelet.HLH_glszm_ZoneEntropy	−0.038	0.963
wavelet.LHH_glszm_HighGrayLevelZoneEmphasis	−0.004	0.996
wavelet.LLH_ngtdm_Contrast	0.005	1.005
wavelet.HLH_gldm_LargeDependenceHighGrayLevelEmphasis	0.053	1.055
wavelet.LHH_glcm_Autocorrelation	0.062	1.064
original_shape_Elongation	0.134	1.143
square_glszm_SmallAreaLowGrayLevelEmphasis	0.170	1.185
wavelet.LHH_glcm_SumEntropy	0.221	1.248

glcm, gray level co-occurrence matrix; gldm, gray level dependence matrix; glrlm, gray level run length matrix; glszm, gray level size zone matrix; H, high-pass filter; L, low-pass filter; MCC, maximal correlation coefficient; ngtdm, neighborhood gray tone difference matrix; OR, odds ratio.
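The OR column in Table 1 is consistent with the exponentiated regression coefficient, OR = exp(coefficient). Because the features were Z-score normalized, each OR can be read as the change in odds per one-standard-deviation increase in the feature. The short check below recomputes three rows copied from Table 1; small differences are expected because the published coefficients are rounded to three decimals.

```python
import math

# (coefficient, reported OR) pairs copied from Table 1
table1 = {
    "wavelet.HHH_glszm_ZoneEntropy": (-0.177, 0.838),
    "original_shape_Elongation": (0.134, 1.143),
    "wavelet.LHH_glcm_SumEntropy": (0.221, 1.248),
}

# Odds ratio implied by each logistic regression coefficient
recomputed = {name: math.exp(coef) for name, (coef, _) in table1.items()}
```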

Figure 2 Elastic-net logistic regression analysis for feature selection. AUC, area under the curve.

Six ML classifiers (LR, SVM, RF, DT, KNN, and XGBoost) were employed to predict the mutation status, and their parameters are summarized in Table 2. Hyperparameters were optimized via grid search and fivefold cross-validation. The efficacy of the ML classifiers was compared to identify the optimal ML model. Prediction probabilities [radiomics scores (radscores)] were generated from the optimal ML model.

Table 2

The best parameters of the six machine learning models

Model	Parameter
SVM	{‘gamma’: 0.01, ‘kernel’: ‘linear’}
RF	{‘depth’: 2, ‘features’: 2}
DT	{‘criterion’: ‘gini’, ‘splitter’: ‘best’, ‘max_features’: 2, ‘max_depth’: 5}
KNN	{‘algorithm’: ‘auto’, ‘leaf_size’: 1}
LR	Default {solver: lbfgs; penalty: L2; C: 1.0}
XGBoost	{‘max_depth’: 3, ‘colsample_bytree’: 0.5, ‘subsample’: 0.6, ‘min_child_weight’: 2}

LR used default parameters; no tuning was performed. DT, decision tree; KNN, k-nearest neighbors; LR, logistic regression; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.

Construction of the image-level and lesion-level combined models

The XGBoost model was employed to integrate the significant clinical data and imaging data (radiomics radscores and DL predictive scores) for combined model construction under the following hyperparameters: “n_estimators”, 9; “max_depth”, 2; “colsample_bytree”, 0.7; “subsample”, 0.6; and “min_child_weight”, 1. Because one or more ultrasound images were archived for each lesion in this retrospective study, we applied a data-stitching strategy in which the image-level outputs were averaged to obtain the lesion-level output (36).
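The averaging of image-level outputs into a lesion-level output can be sketched as follows. The function name, lesion identifiers, and probabilities are hypothetical; only the averaging rule itself is taken from the text.

```python
from collections import defaultdict

def lesion_level_scores(image_predictions):
    """Average image-level probabilities over all images of the same lesion."""
    grouped = defaultdict(list)
    for lesion_id, prob in image_predictions:
        grouped[lesion_id].append(prob)
    return {lesion: sum(p) / len(p) for lesion, p in grouped.items()}

# Hypothetical image-level outputs: (lesion ID, predicted probability)
image_predictions = [("A", 0.80), ("A", 0.60), ("A", 0.70), ("B", 0.20), ("B", 0.40)]
scores = lesion_level_scores(image_predictions)
```

Here lesion A, with three images, receives a lesion-level score of about 0.70, and lesion B, with two images, a score of about 0.30.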

Model explanation

To overcome the black-box issue of ML and DL models, SHAP and gradient-weighted class activation mapping (Grad-CAM) were employed to enhance the transparency and interpretability of our model’s decision-making process. The SHAP method fairly attributes the ML model prediction to individual features, quantifying each feature’s average contribution via SHAP values. In this study, we used beeswarm plots and waterfall plots to visualize SHAP values globally and locally, respectively. The SHAP module in Python software was applied to calculate the SHAP values for both the training and test cohorts. Unless otherwise specified, the plots in this article represent the SHAP values of the test cohort. In the Grad-CAM technique, a heatmap highlights the regions that are most relevant to the prediction task, which allows us to visualize and identify the areas that the model used for decision-making. The overall workflow of this study is shown in Figure 3.
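The idea behind SHAP values can be illustrated with a brute-force computation of exact Shapley values for a single sample. This sketch is not the SHAP package, which relies on faster model-specific algorithms; it enumerates every feature subset and replaces "absent" features with a baseline value. The toy model, its coefficients, the sample, and the baseline are all hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample; absent features take baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy model with an interaction between the first two inputs
def toy_model(z):
    dl_score, radscore, size_cm = z
    return 0.5 * dl_score + 0.3 * radscore + 0.2 * dl_score * radscore - 0.1 * size_cm

x = [0.9, 0.7, 0.8]          # sample to explain
baseline = [0.5, 0.5, 1.0]   # reference values standing in for "feature absent"
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that waterfall plots display.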

Figure 3 Flowchart of development and interpretation of a clinical-ultrasound fusion model. AUC, area under the curve; DL, deep learning; DT, decision tree; KNN, k-nearest neighbors; LR, logistic regression; RF, random forest; SVM, support vector machine; US, ultrasound; XGBoost, extreme gradient boosting.

Statistical analysis

All statistical analyses were conducted in Python and R software version 4.1.2 (https://www.r-project.org; The R Foundation for Statistical Computing, Vienna, Austria). We compared the clinicopathological and sonographic features between BRAF-mutant and BRAF wild-type lesions using the Kruskal-Wallis test, the Mann-Whitney U test, or the χ² test. Normally distributed data are presented as the mean ± standard deviation (SD) and nonnormally distributed data as the median and interquartile range. The performance of the predictive models at both the image and lesion levels was assessed via the receiver operating characteristic (ROC) curve, the AUC, accuracy, sensitivity, and specificity. The optimal cutoff values were determined exploratorily through maximization of the Youden index. The DeLong test was employed to evaluate the differences in the AUCs between the combined model and the other models. Statistical significance was defined as a two-tailed P value <0.05.
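As an illustration of the cutoff selection and AUC described above, the sketch below computes the Youden index J = sensitivity + specificity − 1 at every candidate threshold and the AUC as the probability that a positive case scores higher than a negative one. The labels and scores are hypothetical, and the convention that a score at or above the threshold is called positive is an assumption.

```python
def roc_points(labels, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        points.append((t, tp / pos, tn / neg))
    return points

def youden_cutoff(labels, scores):
    """Threshold maximizing J = sensitivity + specificity - 1."""
    return max(roc_points(labels, scores), key=lambda p: p[1] + p[2] - 1)

def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = mutation-positive) and predicted probabilities
labels = [0, 0, 0, 1, 1, 0, 1, 1]
scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.65, 0.70, 0.90]
cutoff, sensitivity, specificity = youden_cutoff(labels, scores)
area = auc(labels, scores)
```

For this toy data the Youden-optimal cutoff is 0.40, which gives a sensitivity of 1.00 and a specificity of 0.75, and the AUC is 0.875.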

Results

Patient characteristics

The study cohort comprised 6,703 ultrasound images from 1,257 lesions of 1,202 patients with thyroid cancer (Figure 1). The patient cohort consisted of 339 (28.2%) males and 863 (71.8%) females, with a mean age of 42.04±11.88 years. The lesion cohort consisted of 893 (71.0%) BRAF V600E mutation-positive (BRAF_MU) and 364 (29.0%) mutation-negative (BRAF_WT) lesions. Histologically, 1,222 (97.2%) were PTCs, 16 (1.3%) were follicular thyroid carcinomas, and 19 (1.5%) were other subtypes (Table 3). Patients with BRAF_MU exhibited lower rates of cervical lateral lymph node metastasis and a smaller tumor size compared to those with BRAF_WT in both the training and test sets (P<0.05). Sonographic features including the margin, echogenicity, and echogenic foci of thyroid lesions were significantly different between the BRAF_MU and BRAF_WT groups in the training and test sets (P<0.05). In the training cohort, patients with BRAF_MU were older, less frequently presented with central lymph node metastasis, and more frequently presented with vertical growth, solid composition, and mainly central vascularity (P<0.05). The clinicopathological and ultrasonographic features of the lesions in the training and test cohorts are summarized in Tables 3 and 4, respectively.

Table 3

Association of BRAF V600E mutation with the clinicopathological features of lesions

Feature	Level	Training set (n=880)			Test set (n=377)
Feature	Level	BRAF_WT (n=255)	BRAF_MU (n=625)	P	BRAF_WT (n=109)	BRAF_MU (n=268)	P
Age (years)		39.00 (30.00 to 50.00)	41.00 (34.00 to 50.00)	0.046	41.00 (31.00 to 54.00)	42.00 (34.00 to 51.00)	0.621
Sex	Male	60 (23.5)	184 (29.4)	0.090	29 (26.6)	82 (30.6)	0.518
Sex	Female	195 (76.5)	441 (70.6)	0.090	80 (73.4)	186 (69.4)	0.518
Multifocality	Negative	196 (76.9)	517 (82.7)	0.055	86 (78.9)	209 (78.0)	0.954
Multifocality	Positive	59 (23.1)	108 (17.3)	0.055	23 (21.1)	59 (22.0)	0.954
Pathological category	Papillary carcinoma	232 (91.0)	624 (99.8)	<0.001	98 (89.9)	268 (100)	<0.001
	Follicular carcinoma	10 (3.9)	1 (0.2)		5 (4.6)	0 (0)
	Other malignancy	13 (5.1)	0 (0)		6 (5.5)	0 (0)
Central lymph node metastasis	Negative	155 (60.8)	427 (68.3)	0.039	69 (63.3)	185 (69.0)	0.340
Central lymph node metastasis	Positive	100 (39.2)	198 (31.7)	0.039	40 (36.7)	83 (31.0)	0.340
Lateral lymph node metastasis	Negative	224 (87.8)	597 (95.5)	<0.001	96 (88.1)	257 (95.9)	0.010
Lateral lymph node metastasis	Positive	31 (12.2)	28 (4.5)	<0.001	13 (11.9)	11 (4.1)	0.010

Data are presented as median (interquartile range) or n (%). BRAF_MU, BRAF V600E mutation-positive; BRAF_WT, BRAF V600E mutation-negative.

Table 4

Association of BRAF V600E mutation with the ultrasonographic features of lesions

Feature	Level	Training set (n=880)			Test set (n=377)
Feature	Level	BRAF_WT (n=255)	BRAF_MU (n=625)	P	BRAF_WT (n=109)	BRAF_MU (n=268)	P
Tumor size (cm)		1.10 (0.70 to 1.65)	0.80 (0.60 to 1.20)	<0.001	1.00 (0.70 to 2.00)	0.80 (0.60 to 1.10)	<0.001
Tumor location	Left upper	24 (9.4)	68 (10.9)	0.520	14 (12.8)	38 (14.2)	0.844
	Left middle	49 (19.2)	147 (23.5)		16 (14.7)	52 (19.4)
	Left lower	43 (16.9)	78 (12.5)		16 (14.7)	34 (12.7)
	Right upper	24 (9.4)	64 (10.2)		8 (7.3)	23 (8.6)
	Right middle	59 (23.1)	147 (23.5)		32 (29.4)	63 (23.5)
	Right lower	49 (19.2)	107 (17.1)		20 (18.3)	48 (17.9)
	Isthmus	7 (2.8)	14 (2.3)		3 (2.8)	10 (3.7)
Shape	Wider than tall	129 (50.6)	201 (32.2)	<0.001	45 (41.3)	84 (31.3)	0.085
Shape	Taller than wide	126 (49.4)	424 (67.8)	<0.001	64 (58.7)	184 (68.7)	0.085
Margin	Smooth	6 (2.4)	2 (0.3)	0.011	3 (2.7)	0 (0)	0.004
	Ill-defined	214 (83.9)	509 (81.4)		90 (82.6)	205 (76.5)
	Lobulated or irregular	22 (8.6)	80 (12.8)		10 (9.2)	53 (19.8)
	Extrathyroidal extension	13 (5.1)	34 (5.5)		6 (5.5)	10 (3.7)
Composition	Spongiform	2 (0.8)	0 (0)	0.014	0 (0)	0 (0)	1.000
	Mixed cystic and solid	6 (2.3)	5 (0.8)		2 (1.8)	3 (1.1)
	Solid or almost completely solid	247 (96.9)	620 (99.2)		107 (98.2)	265 (98.9)
Echogenicity	Very hypoechoic	17 (6.7)	82 (13.1)	<0.001	4 (3.7)	29 (10.8)	<0.001
	Hypoechoic	212 (83.1)	520 (83.2)		92 (84.4)	231 (86.2)
	Hyperechoic or isoechoic	26 (10.2)	23 (3.7)		13 (11.9)	8 (3.0)
Echogenic foci	None	54 (21.2)	218 (34.9)	<0.001	27 (24.8)	82 (30.6)	0.033
	Large comet-tail artifacts	0 (0)	1 (0.2)		1 (0.9)	0 (0)
	Macrocalcifications	9 (3.5)	8 (1.2)		5 (4.6)	4 (1.5)
	Peripheral (rim) calcifications	0 (0)	1 (0.2)		0 (0)	1 (0.4)
	Punctate echogenic foci	138 (54.1)	324 (51.8)		55 (50.5)	153 (57.1)
	Mixed	54 (21.2)	73 (11.7)		21 (19.3)	28 (10.4)
Blood flow	None	10 (3.9)	32 (5.1)	0.003	4 (3.7)	17 (6.3)	0.560
	Only peripheral blood flow	12 (4.7)	14 (2.2)		5 (4.6)	6 (2.2)
	Mainly peripheral blood flow	80 (31.4)	156 (25.0)		32 (29.4)	71 (26.5)
	Mainly central blood flow	67 (26.3)	240 (38.4)		29 (26.6)	80 (29.9)
	Mixed blood flow	86 (33.7)	183 (29.3)		39 (35.8)	94 (35.1)
Ultrasound assessment of lymph node metastasis	Negative	179 (70.2)	499 (79.8)	0.003	74 (67.9)	214 (79.9)	0.019
Ultrasound assessment of lymph node metastasis	Positive	76 (29.8)	126 (20.2)	0.003	35 (32.1)	54 (20.1)	0.019

Data are presented as median (interquartile range) or n (%). BRAF_MU, BRAF V600E mutation-positive; BRAF_WT, BRAF V600E mutation-negative.

Performance of the clinical model

After the univariate logistic regression and multivariate logistic regression analyses, sex, age, tumor size, and multifocality were identified as predictive factors (Table 5). At the image level, these four factors were combined to construct a clinical model to predict the BRAF V600E mutation. The AUC, accuracy, sensitivity, and specificity of the model were 0.645 (95% CI: 0.630–0.660), 61.1%, 62.0%, and 60.3%, respectively, in the training set, while they were 0.644 (95% CI: 0.611–0.677), 58.5%, 59.2%, and 56.9%, respectively, in the test set. At the lesion level, the AUC, accuracy, sensitivity, and specificity of the clinical model were 0.636 (95% CI: 0.594–0.675), 61.3%, 62.4%, and 58.4%, respectively, in the training set, while they were 0.631 (95% CI: 0.570–0.695), 58.9%, 60.4%, and 55.0%, respectively, in the test set.

Table 5

Logistic regression for the assessment of factors in predicting the BRAF V600E mutation

Feature	Univariate		Multivariate
Feature	OR (95% CI)	P	OR (95% CI)	P
Sex	0.71 (0.63–0.80)	<0.001	0.61 (0.54–0.69)	<0.001
Age	1.02 (1.01–1.02)	<0.001	1.01 (1.00–1.01)	<0.001
Tumor size	0.53 (0.49–0.57)	<0.001	0.54 (0.50–0.58)	<0.001
Multifocality	0.60 (0.52–0.68)	<0.001	0.66 (0.57–0.75)	<0.001

CI, confidence interval; OR, odds ratio.

Performance and comparison of the radiomics models

Among six classifiers, XGBoost demonstrated superior performance at both the image and lesion levels. At the image level, the AUC, accuracy, sensitivity, and specificity were 0.772 (95% CI: 0.759–0.784), 71.0%, 72.4%, and 69.8%, respectively, in the training set, while they were 0.721 (95% CI: 0.690–0.751), 67.2%, 68.2%, and 64.8%, respectively, in the test set (Table 6 and Figure 4A,4B). At the lesion level, the AUC, accuracy, sensitivity, and specificity were 0.809 (95% CI: 0.776–0.844), 76.5%, 79.5%, and 69.0%, respectively, in the training set, while they were 0.745 (95% CI: 0.686–0.805), 70.3%, 73.9%, and 61.5%, respectively, in the test set (Table 7 and Figure 4C,4D).

Table 6

Performance of machine learning models in predicting the BRAF V600E mutation at the image level

Group	Model	AUC (95% CI)	Accuracy	Sensitivity	Specificity
Training set	LR	0.705 (0.692–0.718)	0.635	0.872	0.429
	SVM	0.628 (0.613–0.643)	0.595	0.734	0.474
	RF	0.673 (0.658–0.686)	0.628	0.668	0.593
	DT	0.692 (0.678–0.706)	0.638	0.762	0.531
	KNN	0.697 (0.683–0.712)	0.645	0.732	0.570
	XGBoost	0.772 (0.759–0.784)	0.710	0.724	0.698
Test set	LR	0.712 (0.682–0.743)	0.738	0.855	0.452
	SVM	0.619 (0.587–0.652)	0.640	0.706	0.479
	RF	0.670 (0.639–0.701)	0.631	0.645	0.595
	DT	0.655 (0.622–0.684)	0.672	0.752	0.476
	KNN	0.610 (0.578–0.641)	0.631	0.694	0.479
	XGBoost	0.721 (0.690–0.751)	0.672	0.682	0.648

AUC, area under the curve; CI, confidence interval; DT, decision tree; KNN, k-nearest neighbors; LR, logistic regression; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.

Figure 4 The performance of six machine learning models (LR, SVM, RF, DT, KNN, and XGBoost) in the training and test sets at the image and lesion levels. (A,B) The ROC curves of the training and test sets at the image level, respectively. (C,D) The ROC curves of the training and test sets at the lesion level, respectively. AUC, area under the curve; CI, confidence interval; DT, decision tree; KNN, k-nearest neighbors; LR, logistic regression; RF, random forest; ROC, receiver operating characteristic; SVM, support vector machine; XGBoost, extreme gradient boosting.

Table 7

Performance of machine learning models in predicting the BRAF V600E mutation at the lesion level

Group	Model	AUC (95% CI)	Accuracy	Sensitivity	Specificity
Training set	LR	0.731 (0.693–0.769)	0.714	0.766	0.584
	SVM	0.643 (0.602–0.685)	0.692	0.786	0.463
	RF	0.708 (0.673–0.746)	0.697	0.741	0.588
	DT	0.774 (0.738–0.809)	0.748	0.781	0.667
	KNN	0.761 (0.725–0.797)	0.744	0.811	0.580
	XGBoost	0.809 (0.776–0.844)	0.765	0.795	0.690
Test set	LR	0.730 (0.671–0.792)	0.714	0.757	0.606
	SVM	0.614 (0.550–0.682)	0.645	0.731	0.431
	RF	0.689 (0.624–0.757)	0.695	0.728	0.615
	DT	0.703 (0.644–0.761)	0.682	0.754	0.505
	KNN	0.631 (0.567–0.701)	0.668	0.757	0.450
	XGBoost	0.745 (0.686–0.805)	0.703	0.739	0.615

AUC, area under the curve; CI, confidence interval; DT, decision tree; KNN, k-nearest neighbors; LR, logistic regression; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.

Performance of the DL and combined models

In the test cohort, the DL model predicted the BRAF V600E mutation better than did the radiomics model at both the image and lesion levels, with AUCs of 0.782 and 0.807, respectively (Tables 8,9 and Figures 5,6). At the lesion level, the combined model achieved a significantly higher AUC than did the clinical and radiomics models in both the training and test sets (P<0.05). The AUC, accuracy, sensitivity, and specificity of the combined model were 0.845 (95% CI: 0.815–0.875), 80.2%, 83.2%, and 72.9%, respectively, in the training set, and they were 0.814 (95% CI: 0.769–0.861), 78.2%, 84.0%, and 64.2%, respectively, in the test set (Table 9 and Figure 6). However, in the test set, the AUC of the combined model was not significantly different from that of the DL model at either the image level (P=0.077) or the lesion level (P=0.629).

Table 8

Performance of the independent and combined models in predicting the BRAF V600E mutation at the image level

Group	Model	AUC (95% CI)	Accuracy	Sensitivity	Specificity	P (DeLong test)
Training set	Clinical model	0.645 (0.630–0.660)	0.611	0.620	0.603	<0.001
	Radiomics model	0.772 (0.759–0.784)	0.710	0.724	0.698	<0.001
	DL model	0.741 (0.728–0.754)	0.683	0.688	0.679	<0.001
	Combined model	0.818 (0.807–0.829)	0.749	0.786	0.717	–
Test set	Clinical model	0.644 (0.611–0.677)	0.585	0.592	0.569	<0.001
	Radiomics model	0.721 (0.690–0.751)	0.672	0.682	0.648	<0.001
	DL model	0.782 (0.755–0.807)	0.696	0.706	0.671	0.077
	Combined model	0.797 (0.769–0.823)	0.736	0.764	0.669	–

AUC, area under the curve; CI, confidence interval; DL, deep learning.
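The P values in Table 8 come from the DeLong test, which compares two correlated AUCs obtained on the same cases. The following is a minimal sketch of that test under stated assumptions: the function names and toy scores are hypothetical, it is not the software used in the study, and it requires at least two positive and two negative cases.

```python
import math


def _placements(pos, neg):
    """Structural components: V10 for each positive case, V01 for each negative case."""
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _sample_var(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test; returns (auc_a, auc_b, z, p)."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    v10_a, v01_a = _placements([scores_a[i] for i in pos_idx],
                               [scores_a[i] for i in neg_idx])
    v10_b, v01_b = _placements([scores_b[i] for i in pos_idx],
                               [scores_b[i] for i in neg_idx])
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    # Variance of the AUC difference, computed from per-case component differences.
    d10 = [a - b for a, b in zip(v10_a, v10_b)]
    d01 = [a - b for a, b in zip(v01_a, v01_b)]
    var = _sample_var(d10) / len(d10) + _sample_var(d01) / len(d01)
    if var == 0.0:
        return auc_a, auc_b, float("nan"), float("nan")  # test undefined
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return auc_a, auc_b, z, p


# Toy example: two hypothetical models scored on the same eight cases.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_model_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_model_b = [0.6, 0.7, 0.3, 0.5, 0.8, 0.4, 0.2, 0.1]
toy_result = delong_test(toy_labels, toy_model_a, toy_model_b)
```

Because both models are evaluated on the same cases, the test works on paired per-case differences rather than treating the two AUCs as independent, which is why a sizeable AUC gap can still be nonsignificant in a small sample.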

Table 9

Performance of the independent and combined models in predicting the BRAF V600E mutation at the lesion level

Group	Model	AUC (95% CI)	Accuracy	Sensitivity	Specificity	P (DeLong test)
Training set	Clinical model	0.636 (0.594–0.675)	0.613	0.624	0.584	<0.001
	Radiomics model	0.809 (0.776–0.844)	0.765	0.795	0.690	0.003
	DL model	0.770 (0.733–0.805)	0.733	0.765	0.655	<0.001
	Combined model	0.845 (0.815–0.875)	0.802	0.832	0.729	–
Test set	Clinical model	0.631 (0.570–0.695)	0.589	0.604	0.550	<0.001
	Radiomics model	0.745 (0.686–0.805)	0.703	0.739	0.615	<0.001
	DL model	0.807 (0.765–0.850)	0.753	0.817	0.596	0.629
	Combined model	0.814 (0.769–0.861)	0.782	0.840	0.642	–

AUC, area under the curve; CI, confidence interval; DL, deep learning.
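As a consistency check on Table 9, accuracy is a prevalence-weighted average of sensitivity and specificity, so the mutation prevalence implied by each row can be back-calculated. The sketch below is a derived estimate that is sensitive to three-decimal rounding; the function name is an assumption, and the cohort composition reported in the patient characteristics takes precedence over these values.

```python
def implied_prevalence(accuracy, sensitivity, specificity):
    """Solve accuracy = sens * p + spec * (1 - p) for the prevalence p."""
    if sensitivity == specificity:
        raise ValueError("prevalence is not identifiable when sensitivity equals specificity")
    return (accuracy - specificity) / (sensitivity - specificity)


# (accuracy, sensitivity, specificity) copied from the lesion-level rows of Table 9.
table9_rows = {
    "combined, training": (0.802, 0.832, 0.729),
    "combined, test": (0.782, 0.840, 0.642),
    "DL, test": (0.753, 0.817, 0.596),
}
implied = {name: implied_prevalence(*row) for name, row in table9_rows.items()}
# Each value comes out near 0.71, so these rows are mutually consistent.
```

Agreement across rows indicates that the reported accuracy, sensitivity, and specificity were computed on the same case mix within each set.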

Figure 5 The performance of the clinical, radiomics, deep learning, and combined models in the training and test sets at the image level. (A,B) The ROC curves of the training and test sets at the image level, respectively. (C,D) The confusion matrices of the combined model in the training and test sets at the image level, respectively. AUC, area under the curve; CI, confidence interval; DL, deep learning; ROC, receiver operating characteristic.

Figure 6 The performance of the clinical, radiomics, deep learning, and combined models in the training and test sets at the lesion level. (A,B) The ROC curves of the training and test sets at the lesion level, respectively. (C,D) The confusion matrices of the combined model in the training and test sets at the lesion level, respectively. AUC, area under the curve; CI, confidence interval; DL, deep learning; ROC, receiver operating characteristic.

Visual interpretation of the combined models

In the SHAP summary plot (Figure 7), the features are sorted by importance from top to bottom, and each colored dot represents an individual patient with thyroid cancer. The DL score ranks at the top, followed by the radscore and the clinical variables. Patients with high DL scores or radscores (red dots) tended to have a strong positive impact on the BRAF V600E mutation prediction. The other features, including sex, age, lesion size, and multifocality, provide additional insight into the clinical factors of the lesion and their relevance to BRAF V600E mutation status. Waterfall plots were used to illustrate interpretability at the individual level. Figure 8 provides two representative predictive examples, demonstrating how SHAP could help clinicians reach a more accurate diagnosis based on the individual characteristics of each patient. In the heatmaps in Figure 8, the red regions are those with the greatest impact on the model’s decision-making.

Figure 7 The SHAP summary plot of the combined model. The SHAP summary plot shows the impact of the top six features on the model predictions for BRAF V600E mutation status. Individual dots represent patients, and colors represent the different levels of influence on the model output. DL, deep learning; radscore, radiomics score; SHAP, Shapley additive explanations.

Figure 8 Two representative predictive examples. The images on the left show the ultrasound image of a patient with thyroid cancer. The images in the middle are the heatmaps generated via the Grad-CAM technique. The heatmaps in red highlight the significant regions for mutation status prediction. The images on the right are waterfall plots. Features are ranked from top to bottom according to their impact on the prediction for the individual. The gray numbers on the left side of the y-axis reflect the exact value of each feature. f(x) is the model prediction for the individual, and E[f(x)] is the average prediction. (A) In a 51-year-old female, all features contributed positively (red) toward the BRAF V600E mutation prediction. This resulted in a significant shift of the sample’s Shapley value toward the high-risk direction, with the SHAP algorithm outputting a value above the baseline. As a result, the integrated model accurately predicted this patient as having the BRAF V600E mutation. (B) In a 48-year-old male, the deep learning and radiomics features both significantly contributed to shifting the sample’s Shapley values toward the BRAF V600E wild-type direction (blue). Consequently, the SHAP algorithm assigned a Shapley value below the baseline, leading the integrated model to correctly predict this patient as having the BRAF V600E wild type. DL, deep learning; Grad-CAM, gradient-weighted class activation mapping; radscore, radiomics score; SHAP, Shapley additive explanations.
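The additive decomposition shown in the waterfall plots, in which feature contributions move the output from E[f(x)] to f(x), can be illustrated with exact Shapley values on a toy model. This is a sketch under loud assumptions: the logistic model, its weights, the feature encoding, and the single reference patient used as the baseline are all hypothetical and do not reflect the fitted combined model. The SHAP library also typically averages over a background dataset rather than one reference case.

```python
import math
from itertools import permutations

FEATURES = ["dl_score", "radscore", "sex", "age", "lesion_size", "multifocality"]
# Hypothetical weights chosen only for illustration; not the study's coefficients.
WEIGHTS = {"dl_score": 2.0, "radscore": 1.5, "sex": 0.4,
           "age": 0.3, "lesion_size": 0.5, "multifocality": 0.2}
BIAS = -2.0


def predict(x):
    """Toy logistic model returning a mutation probability."""
    z = BIAS + sum(WEIGHTS[f] * x[f] for f in FEATURES)
    return 1.0 / (1.0 + math.exp(-z))


def shapley_values(x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    phi = {f: 0.0 for f in FEATURES}
    orderings = list(permutations(FEATURES))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for f in order:
            current[f] = x[f]          # switch feature f from baseline to patient value
            updated = predict(current)
            phi[f] += updated - previous
            previous = updated
    return {f: total / len(orderings) for f, total in phi.items()}


# Hypothetical patient and reference case (binary variables coded 0/1).
toy_patient = {"dl_score": 0.8, "radscore": 0.7, "sex": 1,
               "age": 1, "lesion_size": 1, "multifocality": 1}
toy_baseline = {"dl_score": 0.5, "radscore": 0.5, "sex": 0,
                "age": 0, "lesion_size": 0, "multifocality": 0}
toy_phi = shapley_values(toy_patient, toy_baseline)
```

By construction, the contributions sum to the difference between the patient's prediction and the baseline prediction, which is the property the waterfall plot visualizes. Exhaustive enumeration is feasible here only because there are six features (720 orderings).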

Discussion

In this study, we found that the XGBoost model had the highest performance in predicting the BRAF V600E mutation among the six ML models. Logistic regression analysis of clinical factors showed that sex, age, lesion size, and multifocality were independent predictors of the BRAF V600E mutation. The predictive performance of the clinical-image combined model was significantly better than that of the clinical model or the XGBoost model alone and was comparable to that of the DL model. The models performed better at the lesion level than at the image level. The SHAP algorithm improved the interpretability of the clinical-image combined model and visualized the BRAF V600E mutation prediction process at the patient level. Our model’s performance is driven by the high prevalence of BRAF V600E in PTC and its near absence in non-PTC subtypes. Consequently, this model is not generalizable to non-PTC thyroid cancers and should be used exclusively for the preoperative assessment of suspected PTC. This constraint reflects biological reality, not methodological bias.

We evaluated the performance of six ML models in predicting BRAF V600E mutation and identified the XGBoost model as the top performer. This finding underscores the robustness of XGBoost in handling complex datasets and its ability to capture intricate patterns that are critical for accurate mutation prediction. The superior performance of XGBoost aligns with previous studies that have highlighted its efficacy in various prediction tasks relevant to thyroid nodules, particularly in scenarios involving high-dimensional data (37-43).

Various methods have been adopted to examine the relationship between thyroid ultrasound images and BRAF V600E mutation status. Several previous studies have found that demographic and ultrasound features recognized by radiologists can predict BRAF V600E mutation status (44-46). The identification of these predictors not only enhances our understanding of the mutation’s epidemiology but also provides a foundation for incorporating clinical factors into predictive models. Other studies adopted radiomics methods to predict BRAF V600E mutation status based on radiomics features extracted from thyroid ultrasound images (24-28,47). However, these studies employed relatively small sample sizes or reported low-to-moderate performance that was insufficient for clinical diagnosis. A few previous studies clarified the association of BRAF V600E mutation with features derived from novel ultrasound technologies, such as ultrasound elastography (26,28) and contrast-enhanced ultrasound (18). These technologies are neither easily deployable in primary hospitals nor definitively recommended by current guidelines for thyroid cancer. Among DL-based approaches, Wu et al. constructed relatively effective radiomics and deep transfer learning models to predict BRAF V600E mutation status (48). In another study, a clinical-image fusion model demonstrated superior predictive performance compared to a clinical model, radiomics model, and DL model alone, but an explainable method to predict BRAF V600E mutation status at the individual level was lacking (49).

The combined model in our study, which integrates the clinical model, radiomics model, and DL model, yielded lesion-level AUCs of 0.845 and 0.814 in the training and test sets, respectively. This suggests that the integration of clinical and imaging data provides a more comprehensive representation of the underlying biological processes, leading to improved prediction accuracy. The combined model’s ability to leverage both structured clinical data and high-dimensional imaging features likely contributes to its enhanced performance, highlighting the importance of multimodal data integration in precision medicine. However, at both the image and lesion levels, the AUC of the combined model in the test set was not significantly different from that of the DL model. Our results are similar to those of Xiang et al., who demonstrated that the AUC of a combined DL model was not significantly different from that of a DL-B model based on B-mode images (P=0.98) (50). The small sample size and substantial lesion heterogeneity might have limited the power to detect statistically significant AUC differences between the combined and DL models. Nevertheless, the observed lack of improvement is not a failure of the combined model but an important empirical finding. First, it does not imply that the additional features carry no relevant signal; rather, in this specific cohort and task, the ResNet50 architecture alone was able to capture information that was at least as discriminative as the hand-crafted radiomic features and the selected clinical variables. This represents a valuable negative result, suggesting that for similar homogeneous datasets and standardized imaging protocols, end-to-end deep learning may suffice, thereby simplifying future clinical deployment. Second, the combined model may offer improved generalizability across different imaging protocols or populations, a possibility that was not testable in our single-cohort study (51). Finally, the combined model can provide explainable predictors. In clinical decision-making, interpretability is indispensable for building physician trust in AI-assisted systems, meeting regulatory requirements, and avoiding errors that may arise from black-box models (52).

Our results indicate that models operating at the lesion level outperform those at the image level. This observation emphasizes the importance of considering the lesion-level context in the prediction of BRAF V600E mutations. Lesion-level models likely capture a broader range of relevant features that may not be apparent at the image level. Although we employed a data-stitching strategy given our standardized single-center acquisition protocol and empirical validation of equivalent performance, we acknowledge that attention-based multiple-instance learning represents an important methodological advancement for multicenter studies with variable imaging practices (53).
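To make the contrast between image-level and lesion-level prediction concrete, the sketch below pools several image-level probabilities into one lesion-level score. It is an illustration only and is not the data-stitching strategy used in this study: the function names, lesion identifiers, and attention logits are hypothetical, and in attention-based multiple-instance learning the attention logits would be learned by a network rather than supplied by hand.

```python
import math
from collections import defaultdict


def mean_pool(probs):
    """Simple average of the image-level probabilities for one lesion."""
    return sum(probs) / len(probs)


def attention_pool(probs, attention_logits):
    """Softmax-weighted average; larger logits give an image more influence."""
    peak = max(attention_logits)
    weights = [math.exp(a - peak) for a in attention_logits]  # numerically stable softmax
    total = sum(weights)
    return sum(p * w / total for p, w in zip(probs, weights))


def lesion_level_scores(image_records):
    """Group (lesion_id, probability) pairs and mean-pool within each lesion."""
    grouped = defaultdict(list)
    for lesion_id, prob in image_records:
        grouped[lesion_id].append(prob)
    return {lesion_id: mean_pool(probs) for lesion_id, probs in grouped.items()}


# Hypothetical image-level outputs for two lesions.
toy_records = [("lesion_1", 0.2), ("lesion_1", 0.4), ("lesion_1", 0.9), ("lesion_2", 0.3)]
toy_lesion_scores = lesion_level_scores(toy_records)
toy_attention_score = attention_pool([0.2, 0.4, 0.9], [0.0, 0.0, 2.0])
```

With equal attention logits the weighted pool reduces to the mean; with unequal logits it shifts toward the images judged most informative, which is the motivation for attention-based pooling when image quality or views vary across centers.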

The application of the SHAP algorithm in this study improved the interpretability of the clinical-image combined model. This study identified the DL score, radscore, sex, age, lesion size, and multifocality as the six most important predictors of the BRAF V600E mutation. By visualizing the contribution of each feature to the prediction process, the SHAP algorithm provided individualized insights into the factors driving BRAF V600E mutation predictions. Grad-CAM was used in this study to visualize the image regions that contributed the most to the prediction. This interpretability is crucial for clinical adoption, as it allows clinicians to understand and trust the model’s predictions, facilitating its integration into decision-making processes.
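The core Grad-CAM computation can be sketched in a few lines. In a real pipeline the activations and gradients would be taken from the last convolutional layer of the network (for example, via framework hooks); here they are tiny hand-written arrays, and the function name and values are assumptions for illustration only.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from K channel maps (each H x W) and their gradients.

    Each channel is weighted by the spatial mean of its gradient, the weighted
    maps are summed, negative values are clipped (ReLU), and the result is
    scaled to [0, 1].
    """
    n_rows, n_cols = len(activations[0]), len(activations[0][0])
    alphas = [sum(sum(row) for row in grad) / (n_rows * n_cols) for grad in gradients]
    cam = [
        [max(0.0, sum(a * act[i][j] for a, act in zip(alphas, activations)))
         for j in range(n_cols)]
        for i in range(n_rows)
    ]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: two 2 x 2 channels; the second channel has a negative gradient.
toy_activations = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 3.0], [2.0, 1.0]]]
toy_gradients = [[[0.5, 0.5], [0.5, 0.5]], [[-0.25, -0.25], [-0.25, -0.25]]]
toy_cam = grad_cam(toy_activations, toy_gradients)
```

The clipping step keeps only regions that push the prediction toward the target class, which is why the red areas of the heatmaps mark evidence for the predicted status rather than against it.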

Certain limitations of this study should be addressed. First, although the analysis included numerous samples, all the patients in the training and test cohorts were recruited from a single center, and validation with external data was lacking. Second, we employed a retrospective design, which inevitably introduced selection bias. A prospective cohort study is needed to evaluate how this bias affects the model’s performance. Finally, the manual segmentation approach is inherently inefficient and operator-dependent, introducing unquantified variability and limiting clinical scalability. This time-intensive process, requiring substantial expert effort per lesion, presents a significant barrier to real-world implementation and reproducibility across a range of clinical settings. These limitations could be overcome through the integration of automated segmentation models based on convolutional neural networks or transformer architectures. Such automation would not only reduce operator dependency and processing time but also enable the standardized, objective delineation essential for large-scale clinical deployment. We have prioritized this transition in our ongoing research.

Conclusions

Our study demonstrates the potential of ML and DL, particularly XGBoost and clinical-image combined models, in predicting the BRAF V600E mutation, with the combined model reaching a lesion-level AUC of 0.814 in the test set. The integration of clinical and imaging data, coupled with lesion-level analysis and interpretability tools such as SHAP, represents an advancement in the field of mutation prediction.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the STARD reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2026-1-0299/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2026-1-0299/dss

Funding: This work was supported by Guangdong Medical Science and Technology Research Foundation (No. B2023002) and Noncommunicable Chronic Diseases-National Science and Technology Major Project (No. 2024ZD0525600). Funding source was not involved in the study design, data collection, analysis, interpretation, writing of the report, or decision to submit the article for publication.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2026-1-0299/coif). C.H. reports support from Noncommunicable Chronic Diseases-National Science and Technology Major Project. Y.G. reports support from Guangdong Medical Science and Technology Research Foundation. The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Ethics Committee of Guangdong Provincial People’s Hospital. Informed consent was waived in this retrospective study.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA Cancer J Clin 2023;73:17-48. [Crossref] [PubMed]
Miller KD, Fidler-Benaoudia M, Keegan TH, Hipp HS, Jemal A, Siegel RL. Cancer statistics for adolescents and young adults, 2020. CA Cancer J Clin 2020;70:443-59. [Crossref] [PubMed]
Boucai L, Zafereo M, Cabanillas ME. Thyroid Cancer: A Review. JAMA 2024;331:425-35. [Crossref] [PubMed]
Azaryan I, Maxwell C, Tran DH, Sipos JA, Endo M. Molecular testing for the management of indeterminate thyroid nodules. Endocr Relat Cancer 2026;33:e250258. [Crossref] [PubMed]
Hertzman Johansson C, Egyhazi Brage S. BRAF inhibitors in cancer therapy. Pharmacol Ther 2014;142:176-82. [Crossref] [PubMed]
Pisapia P, Pepe F, Iaccarino A, Sgariglia R, Nacchio M, Russo G, Gragnano G, Malapelle U, Troncone G. BRAF: A Two-Faced Janus. Cells 2020;9:2549. [Crossref] [PubMed]
Baloch ZW, Asa SL, Barletta JA, Ghossein RA, Juhlin CC, Jung CK, LiVolsi VA, Papotti MG, Sobrinho-Simões M, Tallini G, Mete O. Overview of the 2022 WHO Classification of Thyroid Neoplasms. Endocr Pathol 2022;33:27-63. [Crossref] [PubMed]
Chen X, Chen X, Xie W, Ge H, He H, Zhang A, Zheng J. BRAF-activated ARSI suppressed EREG-mediated ferroptosis to promote BRAF(V600E) (mutant) papillary thyroid carcinoma progression and sorafenib resistance. Int J Biol Sci 2025;21:128-42. [Crossref] [PubMed]
Silver JA, Bogatchenko M, Pusztaszeri M, Forest VI, Hier MP, Yang JW, Tamilia M, Payne RJ. BRAF V600E mutation is associated with aggressive features in papillary thyroid carcinomas ≤ 1.5 cm. J Otolaryngol Head Neck Surg 2021;50:63. [Crossref] [PubMed]
Liu X, Yan K, Lin X, Zhao L, An W, Wang C, Liu X. The association between BRAF (V600E) mutation and pathological features in PTC. Eur Arch Otorhinolaryngol 2014;271:3041-52. [Crossref] [PubMed]
Dell'Aquila M, Fiorentino V, Martini M, Capodimonti S, Cenci T, Lombardi CP, Raffaelli M, Pontecorvi A, Fadda G, Pantanowitz L, Larocca LM, Rossi ED. How limited molecular testing can also offer diagnostic and prognostic evaluation of thyroid nodules processed with liquid-based cytology: Role of TERT promoter and BRAF V600E mutation analysis. Cancer Cytopathol 2021;129:819-29. [Crossref] [PubMed]
Pizzimenti C, Fiorentino V, Ieni A, Rossi ED, Germanà E, Giovanella L, Lentini M, Alessi Y, Tuccari G, Campennì A, Martini M, Fadda G. BRAF-AXL-PD-L1 Signaling Axis as a Possible Biological Marker for RAI Treatment in the Thyroid Cancer ATA Intermediate Risk Category. Int J Mol Sci 2023;24:10024. [Crossref] [PubMed]
Xing M, Alzahrani AS, Carson KA, Shong YK, Kim TY, Viola D, et al. Association between BRAF V600E mutation and recurrence of papillary thyroid cancer. J Clin Oncol 2015;33:42-50. [Crossref] [PubMed]
Ye Z, Xia X, Xu P, Liu W, Wang S, Fan Y, Guo M. The Prognostic Implication of the BRAF V600E Mutation in Papillary Thyroid Cancer in a Chinese Population. Int J Endocrinol 2022;2022:6562149. [Crossref] [PubMed]
Cong R, Ouyang H, Zhou D, Li X, Xia F. BRAF V600E mutation in thyroid carcinoma: a large-scale study in Han Chinese population. World J Surg Oncol 2024;22:259. [Crossref] [PubMed]
Liu R, Zhu G, Tan J, Shen X, Xing M. Genetic trio of BRAF and TERT alterations and rs2853669TT in papillary thyroid cancer aggressiveness. J Natl Cancer Inst 2024;116:694-701. [Crossref] [PubMed]
Fiorentino V, Giordano W, Pizzimenti C, Zuccalà V, Ieni A, Molinario C, Cannavò S, Campennì A, Tralongo P, Martini M, Giuffrè G, Larocca LM, Fadda G, Rossi ED. Molecular profiling of thyroid nodules on cytologic samples: Findings from an Italian multi-institutional cohort. Cancer Cytopathol 2026;134:e70065. [Crossref] [PubMed]
Liu Y, He L, Yin G, Cheng L, Zeng B, Cheng J, Yang L. Association analysis and the clinical significance of BRAF gene mutations and ultrasound features in papillary thyroid carcinoma. Oncol Lett 2019;18:2995-3002. [Crossref] [PubMed]
Xu JM, Chen YJ, Dang YY, Chen M. Association Between Preoperative US, Elastography Features and Prognostic Factors of Papillary Thyroid Cancer With BRAF(V600E) Mutation. Front Endocrinol (Lausanne) 2019;10:902. [Crossref] [PubMed]
Lv Y, He X, Yang F, Guo L, Qi M, Zhang J, Wang H. Correlation of conventional ultrasound features and related factors with BRAFV600E gene mutation in papillary thyroid carcinoma. Lin Chuang Er Bi Yan Hou Tou Jing Wai Ke Za Zhi 2021;35:925-9. [Crossref] [PubMed]
Sujini GN, Balakrishna S. Automated thyroid nodule classification in ultrasound imaging using a hybrid vision transformer and Wasserstein GAN with gradient penalty. Sci Rep 2025;15:40786. [Crossref] [PubMed]
Fiorentino V, Pizzimenti C, Franchina M, Micali MG, Russotto F, Pepe L, Militi GB, Tralongo P, Pierconti F, Ieni A, Martini M, Tuccari G, Rossi ED, Fadda G. The minefield of indeterminate thyroid nodules: could artificial intelligence be a suitable diagnostic tool? Diagn Histopathol 2023;29:396-401.
Jerbi F, Aboudi N, Khlifa N. Automatic classification of ultrasound thyroids images using vision transformers and generative adversarial networks. Sci Afr 2023;20:e01679.
Yoon J, Lee E, Koo JS, Yoon JH, Nam KH, Lee J, Jo YS, Moon HJ, Park VY, Kwak JY. Artificial intelligence to predict the BRAFV600E mutation in patients with thyroid cancer. PLoS One 2020;15:e0242806. [Crossref] [PubMed]
Kwon MR, Shin JH, Park H, Cho H, Hahn SY, Park KW. Radiomics Study of Thyroid Ultrasound for Predicting BRAF Mutation in Papillary Thyroid Carcinoma: Preliminary Results. AJNR Am J Neuroradiol 2020;41:700-5. [Crossref] [PubMed]
Wang YG, Xu FJ, Agyekum EA, Xiang H, Wang YD, Zhang J, Sun H, Zhang GL, Bo XS, Lv WZ, Wang X, Hu SD, Qian XQ. Radiomic Model for Determining the Value of Elasticity and Grayscale Ultrasound Diagnoses for Predicting BRAF(V600E) Mutations in Papillary Thyroid Carcinoma. Front Endocrinol (Lausanne) 2022;13:872153. [Crossref] [PubMed]
Tang J, Jiang S, Ma J, Xi X, Li H, Wang L, Zhang B. Nomogram based on radiomics analysis of ultrasound images can improve preoperative BRAF mutation diagnosis for papillary thyroid microcarcinoma. Front Endocrinol (Lausanne) 2022;13:915135. [Crossref] [PubMed]
Agyekum EA, Wang YG, Xu FJ, Akortia D, Ren YZ, Chambers KH, Wang X, Taupa JO, Qian XQ. Predicting BRAFV600E mutations in papillary thyroid carcinoma using six machine learning algorithms based on ultrasound elastography. Sci Rep 2023;13:12604. [Crossref] [PubMed]
Ponce-Bobadilla AV, Schmitt V, Maier CS, Mensing S, Stodtmann S. Practical guide to SHAP analysis: Explaining supervised machine learning model predictions in drug development. Clin Transl Sci 2024;17:e70056. [Crossref] [PubMed]
Hou F, Zhu Y, Zhao H, Cai H, Wang Y, Peng X, Lu L, He R, Hou Y, Li Z, Chen T. Development and validation of an interpretable machine learning model for predicting the risk of distant metastasis in papillary thyroid cancer: a multicenter study. EClinicalMedicine 2024;77:102913. [Crossref] [PubMed]
Teng X, Han K, Jin W, Ma L, Wei L, Min D, Chen L, Du Y. Development and validation of an early diagnosis model for bone metastasis in non-small cell lung cancer based on serological characteristics of the bone metastasis mechanism. EClinicalMedicine 2024;72:102617. [Crossref] [PubMed]
Ringel MD, Sosa JA, Baloch Z, Bischoff L, Bloom G, Brent GA, Brock PL, Chou R, Flavell RR, Goldner W, Grubbs EG, Haymart M, Larson SM, Leung AM, Osborne JR, Ridge JA, Robinson B, Steward DL, Tufano RP, Wirth LJ. 2025 American Thyroid Association Management Guidelines for Adult Patients with Differentiated Thyroid Cancer. Thyroid 2025;35:841-985. [Crossref] [PubMed]
Haddad RI, Bischoff L, Applewhite M, Bernet V, Blomain E, Brito M, Busaidy NL, Campbell M, DeLozier O, Duh QY, Ehya H, Grady E, Guo T, Haymart M, Hunt JP, Kandeel F, Kotwal A, Lamonica DM, Lorch J, Mandel SJ, Markovina S, Mydlarz W, Nabell L, Raeburn CD, Rezaee R, Ridge JA, Ritter H, Roth MY, Salgado SA, Scheri RP, Shah JP, Sipos JA, Sippel R, Sturgeon C, Wirth LJ, Wong RJ, Worden F, Yeh MW, Darlow S, Cassara CJ, Sliker B. NCCN Guidelines(R) Insights: Thyroid Carcinoma, Version 1.2025. J Natl Compr Canc Netw 2025;23.
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2009.
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15:1929-58.
Qian X, Pei J, Zheng H, Xie X, Yan L, Zhang H, Han C, Gao X, Zhang H, Zheng W, Sun Q, Lu L, Shung KK. Prospective assessment of breast cancer risk from multimodal multiview ultrasound images via clinically applicable deep learning. Nat Biomed Eng 2021;5:522-32. [Crossref] [PubMed]
Xiong Z, Shi Y, Zhang Y, Duan S, Ding Y, Zheng Q, Jiao Y, Yan J. Ultrasound radiomics based XGBoost model to differential diagnosis thyroid nodules and unnecessary biopsy rate: Individual application of SHapley additive exPlanations. J Clin Ultrasound 2024;52:305-14. [Crossref] [PubMed]
Arabi M, Nazari M, Salahshour A, Jenabi E, Hajianfar G, Khateri M, Shayesteh SP. A machine learning-based sonomics for prediction of thyroid nodule malignancies. Endocrine 2023;82:326-34. [Crossref] [PubMed]
Mao Y, Lan H, Lin W, Liang J, Huang H, Li L, Wen J, Chen G. Machine learning algorithms are comparable to conventional regression models in predicting distant metastasis of follicular thyroid carcinoma. Clin Endocrinol (Oxf) 2023;98:98-109. [Crossref] [PubMed]
Li Z, Nie W, Liu Q, Lin M, Li X, Zhang J, Liu T, Deng Y, Li S. A prognostic model for thermal ablation of benign thyroid nodules based on interpretable machine learning. Front Endocrinol (Lausanne) 2024;15:1433192. [Crossref] [PubMed]
Fu R, Deng S, Hu Y, Luo P, Yang H, Teng H, Zeng D, Ren J. Preoperative Evaluation of Cervical Lymph Node Metastasis in Patients With Hashimoto's Thyroiditis Combined With Thyroid Papillary Carcinoma Using Machine Learning and Radiomics-Based Features: A Preliminary Study. Sichuan Da Xue Xue Bao Yi Xue Ban 2024;55:1026-33. [Crossref] [PubMed]
Huang Y, Mao Y, Xu L, Wen J, Chen G. Exploring risk factors for cervical lymph node metastasis in papillary thyroid microcarcinoma: construction of a novel population-based predictive model. BMC Endocr Disord 2022;22:269. [Crossref] [PubMed]
Zhu H, Luo H, Li Y, Zhang Y, Wu Z, Yang Y. The superior value of radiomics to sonographic assessment for ultrasound-based evaluation of extrathyroidal extension in papillary thyroid carcinoma: a retrospective study. Radiol Oncol 2024;58:386-96. [Crossref] [PubMed]
Pelizzo MR, Dobrinja C, Casal Ide E, Zane M, Lora O, Toniato A, Mian C, Barollo S, Izuzquiza M, Guerrini J, De Manzini N, Merante Boschin I, Rubello D. The role of BRAF(V600E) mutation as poor prognostic factor for the outcome of patients with intrathyroid papillary thyroid carcinoma. Biomed Pharmacother 2014;68:413-7. [Crossref] [PubMed]
Kabaker AS, Tublin ME, Nikiforov YE, Armstrong MJ, Hodak SP, Stang MT, McCoy KL, Carty SE, Yip L. Suspicious ultrasound characteristics predict BRAF V600E-positive papillary thyroid carcinoma. Thyroid 2012;22:585-9. [Crossref] [PubMed]
Li HL, Zhang B. Correlations of Ultrasound Features With Gene Mutations and Pathologic Subtypes in Papillary Thyroid Carcinoma. Zhongguo Yi Xue Ke Xue Yuan Xue Bao 2024;46:747-55. [Crossref] [PubMed]
Shi H, Ding K, Yang XT, Wu TF, Zheng JY, Wang LF, Zhou BY, Sun LP, Zhang YF, Zhao CK, Xu HX. Prediction of BRAF and TERT status in PTCs by machine learning-based ultrasound radiomics methods: A multicenter study. J Clin Transl Endocrinol 2025;40:100390. [Crossref] [PubMed]
Wu F, Lin X, Chen Y, Ge M, Pan T, Shi J, Mao L, Pan G, Peng Y, Zhou L, Zheng H, Luo D, Zhang Y. Breaking barriers: noninvasive AI model for BRAF(V600E) mutation identification. Int J Comput Assist Radiol Surg 2025;20:935-47. [Crossref] [PubMed]
Yu Y, Zhao C, Guo R, Zhang Y, Li X, Liu N, Lu Y, Han X, Tang X, Mao R, Peng C, Yu J, Zhou J. Deep learning model based on ultrasound images predicts BRAF V600E mutation in papillary thyroid carcinoma. iScience 2025;28:112482. [Crossref] [PubMed]
Xiang H, Wang X, Xu M, Zhang Y, Zeng S, Li C, Liu L, Deng T, Tang G, Yan C, Ou J, Lin Q, He J, Sun P, Li A, Chen H, Heng PA, Lin X. Deep Learning-assisted Diagnosis of Breast Lesions on US Images: A Multivendor, Multicenter Study. Radiol Artif Intell 2023;5:e220185. [Crossref] [PubMed]
Nawaz A, Edinat A, Rana MRR, Ali T, Mustafa G, Tahir S, Lee SW. The role of multimodality in clinical disease diagnosis: advances, challenges, and opportunities. Front Public Health 2026;14:1788454. [Crossref] [PubMed]
Hassan MM, Tahsin A, Alam MGR, Alzamil D, Garg S, Uddin MZ, Choudhury N, Fortino G. Explainable multimodal fusion for breast carcinoma diagnosis: A systematic review, open problems, and future directions. Comput Methods Programs Biomed 2026;274:109152. [Crossref] [PubMed]
Vafaeezadeh M, Behnam H, Gifani P. Ultrasound Image Analysis with Vision Transformers-Review. Diagnostics (Basel) 2024;14:542. [Crossref] [PubMed]

Cite this article as: Zhang L, Huang C, Chen Z, Ying Y, Jiang N, Zhong X, Chen F, Guo Y, Luo S. Enhancing BRAF V600E mutation prediction in thyroid cancer through interpretable deep learning models combining clinical and ultrasound-based radiomics features. Quant Imaging Med Surg 2026;16(7):562. doi: 10.21037/qims-2026-1-0299
