Combined clinicopathological and ultrasonographic features for predicting extrathyroidal extension in thyroid cancer
Introduction
Thyroid cancer, a prevalent endocrine malignancy, requires careful evaluation of extrathyroidal extension (ETE), as it critically influences prognosis and therapeutic planning. According to the 8th edition American Joint Committee on Cancer (AJCC) staging system, ETE is classified into minimal ETE (tumor invasion into perithyroidal soft tissues) and gross ETE (invasion of strap muscles or critical cervical structures, including subcutaneous soft tissues, larynx, trachea, esophagus, recurrent laryngeal nerve (RLN), prevertebral fascia, or encasement of carotid/mediastinal vessels) (1-3). Gross ETE correlates with elevated risks of lymph node metastasis, distant metastasis, surgical complexity, recurrence, and reduced survival (4,5). According to the National Comprehensive Cancer Network (NCCN) guidelines, version 2.2022, ETE is recognized as an indication for total thyroidectomy (6). Therefore, preoperative assessment of ETE is critical for surgeons to determine optimal surgical strategies.
Imaging modalities for ETE evaluation include magnetic resonance imaging (MRI), computed tomography (CT), and ultrasonography (US). US has been widely adopted because of its cost-effectiveness, lack of radiation exposure, real-time capability, and high resolution (7-9). Previous studies (3,10-12) proposed US criteria for assessing invasion of strap muscles, trachea, and RLN, offering diagnostic references for ETE (13). However, these US indicators have limitations, including subjectivity and low sensitivity (SEN) (3.7–71.4%) in detecting minimal ETE, which hinder clinical precision (14).
To address these gaps, we developed a novel predictive model integrating diverse ultrasonographic features and clinicopathological characteristics. This multimodal approach comprehensively evaluates anatomical and biological factors influencing ETE, aiming to enhance preoperative diagnostic accuracy (ACC) and clinical utility. We present this article in accordance with the TRIPOD reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1-2654/rc).
Methods
Study population
This retrospective study enrolled 435 thyroid cancer patients who underwent thyroidectomy at The First Affiliated Hospital, Jiangxi Medical College, Nanchang University (Institution 1) between January 2022 and February 2025. Inclusion criteria were: (I) patients receiving total or hemithyroidectomy for thyroid cancer; (II) preoperative thyroid US evaluation; (III) postoperative histopathological confirmation of thyroid cancer with documented ETE or non-ETE outcomes. Exclusion criteria included: (I) patients with recurrent thyroid cancer; (II) incomplete surgical or pathological records; (III) poor-quality US images (e.g., severe artifacts or low resolution); (IV) evidence of distant metastasis; (V) absence of preoperative serological testing; (VI) incomplete clinical data or missing follow-up records. Based on postoperative histopathology as the gold standard, patients were categorized into non-ETE and ETE groups. The cohort was randomly divided into training (70%) and validation (30%) sets. An external test cohort comprising 70 patients from Huashan Hospital, Fudan University (Institution 2), collected between November 2023 and September 2024, was established for independent validation (Figure 1).
For this external cohort, ultrasonographic features were retrospectively extracted from stored images by two certified radiologists with over 6 years of experience, both of whom were trained to apply the same standardized definitions and classification criteria used at Institution 1 (as detailed in section “Ultrasonographic data acquisition” and Figure S1). This study was registered at the Chinese Clinical Trial Registry (registration number: ChiCTR2500113286).
The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Review Board of The First Affiliated Hospital, Jiangxi Medical College, Nanchang University (No. IIT-2024-724). Huashan Hospital, Fudan University was also informed of and agreed to the study. Individual consent for this retrospective analysis was waived.
Clinicopathological data collection
Clinical and pathological data, including gender, age, multifocality, lymph node status, and ETE subtypes, were retrieved from medical records. Based on prior classifications (1-3), ETE is categorized into minimal ETE—defined as tumor extension beyond the thyroid capsule into perithyroidal soft tissues (fat and/or fibrous connective tissue)—and gross ETE—characterized by visual evidence of tumor invasion into strap muscles, subcutaneous soft tissues, larynx, trachea, esophagus, RLN, and/or prevertebral fascia.
Preoperative laboratory measurements included free triiodothyronine (fT3, 2.50–3.90 pg/mL), free thyroxine (fT4, 0.60–1.60 ng/dL), thyroid-stimulating hormone (TSH, 0.30–5.60 µIU/mL), thyroglobulin antibody (Tg-Ab, 0.0–60.0 IU/mL), and thyroid peroxidase antibody (TPO-Ab, 0.0–60.0 IU/mL), quantified using a Siemens ADVIA Centaur chemiluminescence analyzer. Serum thyroglobulin (Tg, 1.40–78 ng/mL) levels were measured via a Roche Cobas E602 automated immunoassay analyzer. Values exceeding detection limits were estimated using interval grouping (15,16).
Ultrasonographic data acquisition
All patients underwent preoperative color Doppler US using standardized devices, including Siemens S2000 (Siemens Healthineers), Supersonic Aixplorer (Supersonic Imagine), Mindray Resona 7/DC-8 (Mindray Bio-Medical Electronics), Philips EPIQ7 (Philips Healthcare), and Toshiba Aplio 500 (Canon Medical Systems). Examinations were performed by certified radiologists with over 6 years of experience. Patients were positioned supine with the neck fully exposed. Ultrasonographic features evaluated included nodule contact with the thyroid capsule, tumor-tracheal angle, contact with the tracheoesophageal groove (TEG), and cervical lymph node status (Figure S1). The definitions and classification criteria for key ultrasonographic features are as follows:
- Contact of the nodule with the thyroid capsule: (I) no contact (the nodule does not abut the thyroid capsule at any point); (II) capsular abutment (the nodule margin is in contact with the capsule, and the capsular structure remains intact); (III) capsular bulging (the nodule compresses the capsule, causing its outward convexity while maintaining capsular continuity); (IV) invasion of strap muscles (the nodule breaches the capsule and invades the strap muscles, resulting in loss of clear demarcation or disappearance of muscle fiber boundaries).
- Angle between Tumor and Trachea: based on the contact interface between the tumor and the tracheal wall, the angle formed between them is measured on two-dimensional ultrasound sections: (I) no contact (no adjacency between the tumor and the tracheal wall); (II) acute angle (angle <90°, presenting as a pointed contact between the tumor and the tracheal wall); (III) right angle (angle =90°, presenting as a perpendicular contact); (IV) obtuse angle (angle >90°, presenting as a broad-based contact).
- Contact of the nodule with the TEG: categorized into 3 types based on the anatomical relationship with the TEG and depth of involvement: (I) no contact (the nodule is not adjacent to the TEG region); (II) abutting TEG (the nodule margin is in contact with the fascia or mucosal surface of the TEG, with intact anatomical structures within the groove and no evidence of compression, displacement, or destruction); (III) protrusion into TEG (the nodule protrudes into the TEG, causing local fascial displacement, narrowing of the space, or invasion of the soft tissues within the groove, manifesting as local heterogeneous echotexture and/or ill-defined borders).
For each thyroid nodule, tumor diameter was measured in anteroposterior, transverse, and longitudinal planes, with the maximum diameter used to define tumor size. Additionally, results from the Chinese thyroid imaging reporting and data system (C-TIRADS) were recorded (17,18) (Table S1).
Feature selection and model development
The entire cohort was randomly divided into training (n=304) and validation (n=131) sets at a 7:3 ratio using computer-generated random numbers. In the training set, univariate analysis was first performed to compare clinical and pathological data between the ETE and non-ETE groups, identifying variables with statistical significance (P<0.05). Subsequently, these significant clinicopathological features and ultrasonographic characteristics (including parameters from the C-TIRADS) were incorporated into a modeling framework. A combined clinical-ultrasonographic model was developed using backward stepwise multivariate logistic regression based on the Akaike Information Criterion (AIC). Similarly, separate clinical and ultrasonographic models were constructed using clinicopathological features and ultrasonographic features alone, respectively.
To mitigate sampling bias from random splits and enhance model stability, we performed 10 additional independent training-validation splits (maintaining a 7:3 ratio) using computer-generated random numbers. Using the stable variable set established in the original split, we repeated the performance evaluation for each split, employing a comprehensive set of metrics including area under the curve (AUC), ACC, SEN, and specificity (SPE).
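The splitting procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a simple unstratified shuffle with Python's seeded `random.Random`, and the function names `split_cohort` and `repeated_splits`, the seeds, and the integer patient IDs are all hypothetical.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs with a seeded generator and cut at train_fraction."""
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    # Truncate rather than round: 435 * 0.7 gives 304 training patients.
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def repeated_splits(patient_ids, n_repeats=10, train_fraction=0.7, base_seed=1):
    """Return n_repeats independent splits, each drawn with its own seed."""
    return [
        split_cohort(patient_ids, train_fraction, base_seed + k)
        for k in range(n_repeats)
    ]


# Hypothetical IDs standing in for the 435 Institution 1 patients.
ids = list(range(435))
train_ids, valid_ids = split_cohort(ids)
splits = repeated_splits(ids)
print(len(train_ids), len(valid_ids), len(splits))
```

With 435 patients, truncating 435 × 0.7 yields the 304/131 partition reported in Table 1. Whether the original split was stratified by ETE status is not stated, so stratification is omitted here.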
Model comparison and nomogram development
The predictive performance of the three models was systematically evaluated and compared by constructing receiver operating characteristic (ROC) curves, calculating the AUC, and applying DeLong’s test. A nomogram was developed to graphically represent the optimal model for ETE prediction in thyroid cancer. Calibration curves were used to assess the model’s goodness-of-fit, while decision curve analysis (DCA) was performed to quantify its clinical utility.
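As a rough illustration of two quantities used in this comparison, the sketch below computes the AUC as a rank statistic and the net benefit at a single threshold probability, using the standard decision-curve definition NB = TP/n − (FP/n) × pt/(1 − pt). The scores and labels are toy values invented for the example, not study data, and DeLong's test is not implemented here.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive (ETE) case scores higher
    than a random negative (non-ETE) case; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(scores, labels, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy predicted probabilities and outcomes (1 = ETE, 0 = non-ETE); illustrative only.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]

toy_auc = auc(toy_scores, toy_labels)
toy_nb = net_benefit(toy_scores, toy_labels, threshold=0.5)
print(toy_auc, toy_nb)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the treat-all and treat-none strategies.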
Subgroup analysis
To validate the robustness and generalizability of the combined model across diverse clinical scenarios, subgroup analyses were conducted to evaluate its discriminatory performance across four stratifications: tumor size (microcarcinoma vs. non-microcarcinoma), extent of invasion (minimal ETE vs. gross ETE), Hashimoto’s thyroiditis status (Hashimoto’s vs. non-Hashimoto’s), and thyroid functional status (euthyroid vs. thyroid dysfunction).
Model interpretability
The SHapley Additive exPlanations (SHAP) method was employed to enhance model interpretability. Rooted in the Shapley value framework of cooperative game theory, this approach quantifies the contribution of each feature to model predictions, providing clear insights into prediction mechanisms and key feature roles (19). Additionally, SHAP-derived visualization tools not only evaluate individual variable contributions but also improve understanding of the decision-making process, thereby enhancing model transparency and credibility.
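To make the Shapley attribution concrete, the sketch below computes exact Shapley values for a small model by enumerating feature coalitions. It is an assumption-laden illustration rather than the study's pipeline: the three coefficients and the intercept are hypothetical, and features outside a coalition are simply set to a single baseline value, whereas SHAP implementations typically average over background data.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi


# Hypothetical log-odds model with three binary predictors (not the fitted model).
coefs = [1.2, 0.8, -0.5]
intercept = -1.0


def log_odds(features):
    return intercept + sum(c * f for c, f in zip(coefs, features))


x = [1, 1, 0]          # instance to explain
baseline = [0, 0, 0]   # reference patient
phi = shapley_values(log_odds, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that force and waterfall plots rely on.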
Statistical analysis
Statistical analyses were performed using R (version 4.4.2). Normally distributed continuous variables were presented as mean ± standard deviation, and intergroup comparisons were performed using one-way ANOVA. Non-normally distributed continuous data were presented as median (interquartile range), and group differences were evaluated using the Kruskal-Wallis test. Categorical variables were summarized as frequencies (percentages) and analyzed using χ2 tests or Fisher’s exact test. Subgroup differences in AUC were evaluated via the Hanley & McNeil Z-test, with statistical significance defined as a two-tailed P<0.05. Model fit was assessed using the AIC, and calibration curves were used to evaluate prediction reliability. Predictive performance and clinical utility were quantified using ROC curves, AUC, DCA, ACC, SPE, and SEN.
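For readers unfamiliar with the Hanley and McNeil approach, the following sketch shows the standard error formula and the resulting Z-test for two AUCs estimated on independent samples. The AUC values and group sizes in the example call are hypothetical placeholders, not values from this study.

```python
from math import erfc, sqrt


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return sqrt(variance)


def compare_independent_aucs(auc_a, n_pos_a, n_neg_a, auc_b, n_pos_b, n_neg_b):
    """Z statistic and two-tailed P value for two AUCs from independent samples."""
    se_a = hanley_mcneil_se(auc_a, n_pos_a, n_neg_a)
    se_b = hanley_mcneil_se(auc_b, n_pos_b, n_neg_b)
    z = (auc_a - auc_b) / sqrt(se_a ** 2 + se_b ** 2)
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical subgroups: AUC 0.80 vs. 0.90, each with 100 positives and 100 negatives.
z, p = compare_independent_aucs(0.80, 100, 100, 0.90, 100, 100)
print(z, p)
```

The test assumes that the two subgroups do not share patients; correlated AUCs measured on the same cases call for a paired method such as DeLong's test.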
Results
Demographic and baseline characteristics
A total of 505 patients with histologically confirmed thyroid carcinoma were included: 435 from Institution 1 (training and validation cohorts) and 70 from Institution 2 (external test cohort). Overall, patients were aged 12–75 years (mean age: 44.91±11.68 years), comprising 106 males and 399 females. Among them, 311 patients had ETE, including 245 with minimal ETE and 66 with gross ETE, while 194 had no ETE. The training cohort consisted of 304 patients (62 males, 242 females; mean age: 44.85±11.45 years), with 194 ETE and 110 non-ETE cases. The validation cohort included 131 patients (25 males, 106 females; mean age: 46.06±10.94 years), comprising 78 ETE and 53 non-ETE cases. The external test cohort (Institution 2) enrolled 70 patients aged 17–74 years (mean age: 43.04±13.78 years), with 19 males and 51 females, including 39 ETE (28 minimal, 11 gross) and 31 non-ETE cases. Baseline characteristics across cohorts are summarized in Table 1. The baseline patient characteristics demonstrated balanced distributions across the training, validation, and external test cohorts, with no statistically significant differences observed in any comparisons (all P>0.05).
Table 1
| Characteristic | Training set (n=304) | Validation set (n=131) | External test set (n=70) | P |
|---|---|---|---|---|
| Gender | | | | 0.377 |
| Male | 62 (20.39) | 25 (19.08) | 19 (27.14) | |
| Female | 242 (79.61) | 106 (80.92) | 51 (72.86) | |
| Age (years) | 44.85±11.45 | 46.06±10.94 | 43.04±13.78 | 0.216 |
| <55 | 240 (78.95) | 100 (76.34) | 54 (77.14) | |
| ≥55 | 64 (21.05) | 31 (23.66) | 16 (22.86) | |
| Longest diameter ≥2 cm | | | | 0.418 |
| No | 251 (82.57) | 102 (77.86) | 59 (84.29) | |
| Yes | 53 (17.43) | 29 (22.14) | 11 (15.71) | |
| Multifocality | | | | 0.805 |
| No | 157 (51.64) | 70 (53.44) | 34 (48.57) | |
| Yes | 147 (48.36) | 61 (46.56) | 36 (51.43) | |
| Bilaterality | | | | 0.643 |
| No | 202 (66.45) | 93 (70.99) | 48 (68.57) | |
| Yes | 102 (33.55) | 38 (29.01) | 22 (31.43) | |
| Isthmus | | | | 0.428 |
| No | 270 (88.82) | 120 (91.60) | 60 (85.71) | |
| Yes | 34 (11.18) | 11 (8.40) | 10 (14.29) | |
| Central lymph node metastasis | | | | 0.930 |
| No | 140 (46.05) | 61 (46.56) | 34 (48.57) | |
| Yes | 164 (53.95) | 70 (53.44) | 36 (51.43) | |
| Lateral lymph node metastasis | | | | 0.996 |
| No | 240 (78.95) | 103 (78.63) | 55 (78.57) | |
| Yes | 64 (21.05) | 28 (21.37) | 15 (21.43) | |
| Hashimoto’s thyroiditis | | | | 0.446 |
| No | 164 (53.95) | 62 (47.33) | 36 (51.43) | |
| Yes | 140 (46.05) | 69 (52.67) | 34 (48.57) | |
| Contact of the nodule with the thyroid capsule | | | | 0.638 |
| No contact | 63 (20.72) | 26 (19.85) | 18 (25.71) | |
| Capsular contact | 143 (47.04) | 54 (41.22) | 34 (48.57) | |
| Contour bulging | 75 (24.67) | 40 (30.53) | 14 (20.00) | |
| Replacement of strap muscle | 23 (7.57) | 11 (8.40) | 4 (5.71) | |
| Angle between tumor and trachea | | | | 0.853 |
| No contact | 169 (55.59) | 78 (59.54) | 39 (55.71) | |
| Acute angle | 63 (20.72) | 19 (14.50) | 14 (20.00) | |
| Right angle | 17 (5.59) | 8 (6.11) | 5 (7.14) | |
| Obtuse angle | 55 (18.09) | 26 (19.85) | 12 (17.14) | |
| Contact of the nodule with the TEG | | | | 0.911 |
| No contact | 168 (55.26) | 76 (58.02) | 39 (55.71) | |
| Abutting TEG | 78 (25.66) | 34 (25.95) | 20 (28.57) | |
| Protrusion into TEG | 58 (19.08) | 21 (16.03) | 11 (15.71) | |
| C-TIRADS 3 | | | | 0.371 |
| No | 288 (94.74) | 124 (94.66) | 69 (98.57) | |
| Yes | 16 (5.26) | 7 (5.34) | 1 (1.43) | |
| C-TIRADS 4A | | | | 0.489 |
| No | 203 (66.78) | 80 (61.07) | 47 (67.14) | |
| Yes | 101 (33.22) | 51 (38.93) | 23 (32.86) | |
| C-TIRADS 4B | | | | 0.833 |
| No | 180 (59.21) | 81 (62.31) | 42 (60.00) | |
| Yes | 124 (40.79) | 49 (37.69) | 28 (40.00) | |
| C-TIRADS 4C | | | | 0.203 |
| No | 267 (87.83) | 118 (90.08) | 57 (81.43) | |
| Yes | 37 (12.17) | 13 (9.92) | 13 (18.57) | |
| C-TIRADS 5 | | | | 0.887 |
| No | 278 (91.45) | 119 (90.84) | 65 (92.86) | |
| Yes | 26 (8.55) | 12 (9.16) | 5 (7.14) | |
| fT3 (pg/mL) | 3.14 (2.90–3.42) | 3.18 (2.92–3.45) | 3.18 (2.90–3.54) | 0.529 |
| fT4 (ng/dL) | 1.26 (1.12–1.40) | 1.26 (1.14–1.40) | 1.29 (1.18–1.37) | 0.606 |
| TSH (μIU/mL) | 1.74 (1.21–2.63) | 1.82 (1.12–2.62) | 1.81 (1.20–2.85) | 0.623 |
| TG-Ab (IU/mL) | 17.50 (14.60–36.45) | 20.30 (14.05–150.50) | 17.88 (13.41–74.56) | 0.568 |
| TPO-Ab (IU/mL) | 24.09 (15.28–35.32) | 24.20 (15.40–50.41) | 24.24 (18.13–45.87) | 0.481 |
| Tg (ng/mL) | 16.15 (6.62–40.15) | 15.20 (2.68–41.65) | 13.72 (3.87–28.06) | 0.351 |
Data are presented as number (%), median (interquartile range), or mean ± standard deviation. C-TIRADS, Chinese thyroid imaging reporting and data system; fT3, free triiodothyronine; fT4, free thyroxine; TEG, tracheoesophageal groove; TG-Ab, thyroglobulin antibody; Tg, thyroglobulin; TPO-Ab, thyroid peroxidase antibody; TSH, thyroid-stimulating hormone.
Clinicopathological model development
Univariate analysis revealed statistically significant differences between non-ETE and ETE groups in age ≥55 years, central lymph node metastasis, lateral lymph node metastasis, isthmic tumor, bilateral tumor, tumor multifocality, and larger tumor diameter (≥2 cm) (all P<0.05; Table 2). Multivariate logistic regression identified age ≥55 years, central lymph node metastasis, lateral lymph node metastasis, bilateral tumor, and larger tumor diameter (≥2 cm) as independent risk factors for ETE. The clinicopathological model demonstrated an AUC of 0.756 [95% confidence interval (CI): 0.703–0.809], ACC of 0.641, SEN of 0.505, and SPE of 0.882 in the training cohort. In the validation cohort, the model achieved an AUC of 0.792 (0.716–0.868), ACC of 0.725, SEN of 0.603, and SPE of 0.906. For the external test cohort, performance metrics included an AUC of 0.835 (0.741–0.930), ACC of 0.757, SEN of 0.718, and SPE of 0.807.
Table 2
| Variable | Univariate OR (95% CI) | Univariate P | Multivariate OR (95% CI) | Multivariate P | |
|---|---|---|---|---|---|
| Gender (female vs. male) | 0.804 (0.438–1.440) | 0.471 | |||
| Age (≥55 vs. <55 years) | 2.661 (1.410–5.344) | 0.004 | 4.921 (2.124–12.160) | <0.001 | |
| Longest diameter (≥2 vs. <2 cm) | 4.573 (2.108–11.450) | <0.001 | |||
| Multifocality (present vs. absent) | 1.801 (1.122–2.911) | 0.015 | |||
| Bilaterality (present vs. absent) | 2.409 (1.426–4.179) | 0.001 | 1.899 (0.959–3.816) | 0.067 | |
| Isthmus (present vs. absent) | 4.848 (1.848–16.660) | 0.004 | |||
| Central lymph node metastasis (present vs. absent) | 2.743 (1.701–4.470) | <0.001 | 1.759 (0.926–3.366) | 0.085 | |
| Lateral lymph node metastasis (present vs. absent) | 6.122 (2.853–15.220) | <0.001 | 2.538 (0.974–7.274) | 0.066 | |
| Hashimoto’s thyroiditis (present vs. absent) | 1.038 (0.650–1.664) | 0.875 | |||
| Contact of the nodule with the thyroid capsule (present vs. absent) | 3.191 (2.253–4.663) | <0.001 | 2.524 (1.655–3.964) | <0.001 | |
| Angle between tumor and trachea (present vs. absent) | 3.684 (2.521–5.817) | <0.001 | 1.682 (0.982–3.089) | 0.072 | |
| Contact of the nodule with the TEG (present vs. absent) | 5.508 (3.471–9.335) | <0.001 | 2.234 (1.030–5.140) | 0.048 | |
| C-TIRADS | |||||
| C-TIRADS 3 (present vs. absent) | 0.420 (0.146–1.160) | 0.095 | |||
| C-TIRADS 4A (present vs. absent) | 0.354 (0.214–0.579) | <0.001 | |||
| C-TIRADS 4B (present vs. absent) | 1.336 (0.828–2.172) | 0.238 | |||
| C-TIRADS 4C (present vs. absent) | 3.297 (1.420–9.007) | 0.010 | 3.556 (1.218–11.690) | 0.026 | |
| C-TIRADS 5 (present vs. absent) | 7.624 (2.203–48.040) | 0.006 | 4.981 (1.196–34.340) | 0.049 | |
| Serum markers | |||||
| fT3 (pg/mL) | 0.938 (0.645–1.376) | 0.735 | |||
| fT4 (ng/dL) | 0.810 (0.421–1.510) | 0.491 | |||
| TSH (μIU/mL) | 0.991 (0.915–1.075) | 0.783 | |||
| TG-Ab (IU/mL) | 1.000 (1.000–1.001) | 0.177 | |||
| TPO-Ab (IU/mL) | 1.001 (1.000–1.003) | 0.138 | |||
| Tg (ng/mL) | 1.001 (0.999–1.004) | 0.293 | |||
C-TIRADS, Chinese thyroid imaging reporting and data system; CI, confidence interval; fT3, free triiodothyronine; fT4, free thyroxine; OR, odds ratio; TEG, tracheoesophageal groove; TG-Ab, thyroglobulin antibody; Tg, thyroglobulin; TPO-Ab, thyroid peroxidase antibody; TSH, thyroid-stimulating hormone.
Ultrasonographic model development
Univariate analysis identified statistically significant differences between non-ETE and ETE groups in contact of the nodule with the TEG, contact of the nodule with the thyroid capsule, angle between tumor and trachea, and C-TIRADS categories (4A, 4C, and 5) (all P<0.05; Table 2). Contact of the nodule with the thyroid capsule, contact of the nodule with the TEG, and C-TIRADS 4A were independently significant (P<0.05) and incorporated into the ultrasonographic prediction model. In the training cohort, the ultrasound model achieved an AUC of 0.828 (95% CI: 0.783–0.873), ACC of 0.734, SEN of 0.695, and SPE of 0.800. In the validation cohort, the model yielded an AUC of 0.794 (95% CI: 0.720–0.868), ACC of 0.679, SEN of 0.641, and SPE of 0.736. Notably, performance declined in the external test cohort, with an AUC of 0.598 (95% CI: 0.463–0.733), ACC of 0.571, SEN of 0.590, and SPE of 0.548.
Development of the combined clinicopathological-ultrasonographic model
The clinicopathological-ultrasonographic model was constructed by integrating clinical-pathological risk factors with ultrasonographic predictors. Based on univariate analysis and multivariate stepwise logistic regression results from the training cohort (Table 2), the following variables were independently selected for inclusion in the model: age ≥55 years, central lymph node metastasis, lateral lymph node metastasis, bilateral tumor, angle between tumor and trachea, contact of the nodule with the thyroid capsule, contact of the nodule with the TEG, and C-TIRADS 4C/5 classifications.
In the training cohort, the combined model achieved an AUC of 0.872 (95% CI: 0.834–0.910), significantly outperforming the clinical model (AUC =0.756, P<0.05) and the ultrasound model (AUC =0.828, P<0.05). The combined model also achieved the highest ACC (0.786) and SEN (0.763), with an SPE of 0.827. In the validation cohort, the combined model achieved a higher AUC (0.835) than the clinicopathological model (AUC =0.792) and the ultrasound model (AUC =0.794), although these differences did not reach statistical significance (P=0.06 and P=0.11, respectively). Furthermore, the combined model showed a better balance between SEN (0.756) and SPE (0.736) than either standalone model. In the external test cohort, the combined model achieved the highest AUC of 0.858, significantly outperforming the ultrasound model (AUC =0.598, P<0.05), with no statistically significant difference compared to the clinical model (AUC =0.835, P=0.71). Additionally, it demonstrated superior performance across other metrics, including ACC (0.843), SEN (0.846), and SPE (0.839) (Table 3). Therefore, the combined model was identified as the optimal model. The supplementary analysis of 10 repeated splits confirmed the stable performance of the combined clinicopathological-ultrasonographic model, with the following key performance metrics: a mean AUC of 0.873, ACC of 0.789, SEN of 0.772, and SPE of 0.820 in the training cohorts; and a mean AUC of 0.839, ACC of 0.743, SEN of 0.736, and SPE of 0.758 in the validation cohorts. These results are highly consistent with the original single-split outcomes, further supporting the model’s robustness and reliability. Detailed data are provided in Table S2.
Table 3
| Model | AUC (95% CI) | ACC | SEN | SPE | P value (vs. clinicopathological) | P value (vs. US) |
|---|---|---|---|---|---|---|
| Training cohort | ||||||
| Clinicopathological | 0.756 (0.703–0.809) | 0.641 | 0.505 | 0.882 | – | – |
| US | 0.828 (0.783–0.873) | 0.734 | 0.695 | 0.800 | – | – |
| Clinicopathological-US | 0.872 (0.834–0.910) | 0.786 | 0.763 | 0.827 | <0.05 | <0.05 |
| Validation cohort | ||||||
| Clinicopathological | 0.792 (0.716–0.868) | 0.725 | 0.603 | 0.906 | – | – |
| US | 0.794 (0.720–0.868) | 0.679 | 0.641 | 0.736 | – | – |
| Clinicopathological-US | 0.835 (0.768–0.902) | 0.748 | 0.756 | 0.736 | 0.06 | 0.11 |
| External test cohort | ||||||
| Clinicopathological | 0.835 (0.741–0.930) | 0.757 | 0.718 | 0.807 | – | – |
| US | 0.598 (0.463–0.733) | 0.571 | 0.590 | 0.548 | – | – |
| Clinicopathological-US | 0.858 (0.767–0.949) | 0.843 | 0.846 | 0.839 | 0.71 | <0.05 |
ACC, accuracy; AUC, area under the curve; CI, confidence interval; SEN, sensitivity; SPE, specificity; US, ultrasound.
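The three threshold metrics in Table 3 are linked: accuracy is the prevalence-weighted average of sensitivity and specificity. The short sketch below recomputes ACC for the combined model from the reported SEN, SPE, and the ETE and non-ETE counts of each cohort, as a consistency check on the table. The function name and dictionary layout are illustrative choices, not part of the original analysis.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Accuracy as the prevalence-weighted mean of sensitivity and specificity."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


# (ETE, non-ETE) counts per cohort, as reported in the Results.
cohort_sizes = {"training": (194, 110), "validation": (78, 53), "external": (39, 31)}

# (SEN, SPE) of the combined model per cohort, from Table 3.
combined_rates = {
    "training": (0.763, 0.827),
    "validation": (0.756, 0.736),
    "external": (0.846, 0.839),
}

recomputed_acc = {}
for cohort, (n_pos, n_neg) in cohort_sizes.items():
    sen, spe = combined_rates[cohort]
    recomputed_acc[cohort] = round(accuracy_from_rates(sen, spe, n_pos, n_neg), 3)

print(recomputed_acc)
```

The recomputed values agree with the ACC column of Table 3 (0.786, 0.748, and 0.843).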
A nomogram visualizing the combined model is presented in Figure 2. Across all three cohorts, the combined model demonstrated strong concordance between predicted probabilities and actual ETE status (Figure 2) and yielded a higher overall net benefit across most threshold probability ranges (Figure 3).
Model interpretability
To enhance the interpretability and clinical applicability of our combined model, comprehensive Shapley value analyses were performed at both global and individual levels. For global interpretability, a beeswarm plot (Figure 4) illustrates the relative importance of clinicopathological and ultrasonographic features, with clinicopathological features accounting for 44.4% (4/9) of the predictors and ultrasonographic features for 55.6% (5/9). Yellow and purple colors denote high and low feature values, respectively, while Shapley values >0 indicate positive contributions to predictions. For individual interpretability, two representative cases (Figure 5) demonstrate how each feature positively or negatively influenced predictions in specific instances. The baseline value in the figure reflects the probability of the base prediction under reference conditions, while f(x) represents the final predicted probability adjusted by feature contributions.
To identify key predictors of ETE and clarify their directional influence, we further conducted a variable importance analysis using a Random Forest model and performed interpretability analysis with SHAP values. The variable importance results, based on both Mean Decrease in Gini and Mean Decrease in ACC, indicated that Contact of the nodule with the thyroid capsule was the most important predictor for ETE (Figures S2,S3), followed by the Angle between Tumor and Trachea and Contact of the nodule with the TEG. The SHAP dependence plots (Figure S4) validated this ranking and illustrated the direction of association between feature values and model output.
Subgroup analysis
To validate the performance of the combined model across subgroups, we evaluated its discriminatory ability in the following stratified cohorts: microcarcinomas (≤1 cm) vs. non-microcarcinomas, minimal ETE vs. gross ETE, Hashimoto’s thyroiditis vs. non-Hashimoto’s, and euthyroid vs. dysfunctional thyroid status. The combined model demonstrated robust performance across all subgroups. In the microcarcinoma (≤1 cm) and non-microcarcinoma subgroups, the model achieved AUCs of 0.772 (95% CI: 0.723–0.836) and 0.893 (95% CI: 0.847–0.936), with ACC 0.729, SEN 0.605, SPE 0.817 and ACC 0.857, SEN 0.875, SPE 0.720, respectively. For minimal ETE and gross ETE subgroups, the AUCs were 0.824 (95% CI: 0.783–0.860) and 0.963 (95% CI: 0.935–0.984), accompanied by ACC 0.751, SEN 0.708, SPE 0.804 and ACC 0.851, SEN 0.973, SPE 0.804. In Hashimoto’s thyroiditis and non-Hashimoto’s thyroiditis subgroups, the model yielded AUCs of 0.875 (95% CI: 0.826–0.915) and 0.843 (95% CI: 0.794–0.887), with ACC 0.804, SEN 0.792, SPE 0.839 and ACC 0.763, SEN 0.752, SPE 0.781. Finally, in euthyroid and thyroid dysfunction subgroups, the AUCs were 0.852 (95% CI: 0.816–0.885) and 0.845 (95% CI: 0.710–0.956), alongside ACC 0.780, SEN 0.765, SPE 0.803 and ACC 0.788, SEN 0.818, SPE 0.727 (Table 4).
Table 4
| Subgroup | AUC (95% CI) | Accuracy | Sensitivity | Specificity | P |
|---|---|---|---|---|---|
| Cohort 1 | |||||
| Gross ETE | 0.963 (0.935–0.984) | 0.851 | 0.973 | 0.804 | <0.001 |
| Minimal ETE | 0.824 (0.783–0.860) | 0.751 | 0.708 | 0.804 | |
| Cohort 2 | |||||
| Microcarcinomas | 0.772 (0.723–0.836) | 0.729 | 0.605 | 0.817 | 0.001 |
| Non-microcarcinomas | 0.893 (0.847–0.936) | 0.857 | 0.875 | 0.720 | |
| Cohort 3 | |||||
| Hashimoto’s thyroiditis | 0.875 (0.826–0.915) | 0.804 | 0.792 | 0.839 | 0.325 |
| Non-Hashimoto’s thyroiditis | 0.843 (0.794–0.887) | 0.763 | 0.752 | 0.781 | |
| Cohort 4 | |||||
| Euthyroid | 0.852 (0.816–0.885) | 0.780 | 0.765 | 0.803 | 0.544 |
| Thyroid dysfunction | 0.845 (0.710–0.956) | 0.788 | 0.818 | 0.727 | |
P values correspond to the comparison of the AUCs between subgroups, calculated using the Hanley-McNeil Z-test. AUC, area under the curve; CI, confidence interval; ETE, extrathyroidal extension.
The combined model exhibited a significantly lower AUC in the microcarcinoma subgroup than in the non-microcarcinoma subgroup (AUC: 0.772 vs. 0.893, P=0.001), whereas it showed superior discrimination for gross ETE over minimal ETE (AUC: 0.963 vs. 0.824, P<0.001). No significant differences in diagnostic performance were observed between Hashimoto’s and non-Hashimoto’s subgroups (P=0.325) or between euthyroid and thyroid dysfunction groups (P=0.544).
Discussion
Accurate preoperative identification of ETE in thyroid cancer patients is critical for prognostic prediction and personalized surgical planning. ETE is a well-recognized risk factor for tumor recurrence and poor clinical outcomes in thyroid cancer (20,21). The presence of ETE complicates surgical approach selection and increases the risk of incomplete resection, which in turn leads to elevated morbidity and mortality rates (22). Additionally, overestimating ETE may lead to unnecessarily extensive surgeries, resulting in severe complications (23). Thus, ETE status is a pivotal determinant of surgical extent (3). However, current detection of ETE relies on postoperative histopathological examination (24,25), which introduces inherent delays. In this study, we developed a multimodal radiopathological model integrating clinicopathological and ultrasonographic features, demonstrating robust performance in discriminating non-ETE from ETE cases, with AUCs of 0.872 in the training cohort, 0.835 in the validation cohort, and 0.858 in the external test cohort.
Multimodal data models integrating diverse information are increasingly applied in clinical practice, enhancing medical decision-making efficiency by rapidly and accurately capturing complex tumor characteristics through multi-data fusion (19). The integration of multimodal data has emerged as a promising strategy for predicting ETE in thyroid cancer. In this study, the combined clinicopathological-ultrasonographic model, which synthesizes clinicopathological risk factors (e.g., age ≥55 years, lymph node metastasis, bilateral tumor) and ultrasonographic features (contact of the nodule with the thyroid capsule, contact of the nodule with the TEG, C-TIRADS classification), outperformed standalone clinicopathological or ultrasonographic models. This aligns with findings by Lu et al. (26), who developed a nomogram integrating clinical features and US-based radiomic signatures (AUC: 0.810). Similarly, Jiang et al. (16) established a combined model using US and contrast-enhanced US images from 216 thyroid cancer patients (AUC: 0.843). Our results are consistent with these studies but offer distinct advantages. Unlike the single-center design of Lu et al. (26), our study validated model stability using an external test cohort from Huashan Hospital, Fudan University (AUC =0.858). Compared to models relying on contrast-enhanced ultrasound (CEUS), our approach integrates conventional ultrasound features with C-TIRADS classifications, eliminates the need for specialized imaging techniques, and demonstrates robust performance across heterogeneous subgroups, including Hashimoto’s thyroiditis and thyroid dysfunction (AUC range: 0.772–0.963). This framework offers enhanced clinical generalizability and broader applicability in diverse clinical scenarios.
Previous studies have highlighted the value of US features, such as invasion of strap muscles, trachea, and RLN, for preoperative ETE diagnosis (3,10-12). Our findings corroborate this evidence, demonstrating that contact of the nodule with the thyroid capsule and contact of the nodule with the TEG were independent risk factors for ETE in multivariate logistic regression, with odds ratios (ORs) of 2.5 and 2.2, respectively, consistent with prior reports. Furthermore, we integrated tumor margin characteristics (e.g., angle between tumor and trachea, contact of the nodule with the TEG) from US imaging with the C-TIRADS system and clinicopathological factors, establishing a comprehensive evaluation framework that incorporates anatomical and biological tumor features. The combined model outperformed both the standalone clinicopathological model (AUC =0.756) and the standalone ultrasound model (AUC =0.828). While prior studies achieved high SPE in ETE identification, their SEN was limited (43.0%). In contrast, our combined model maintained comparable diagnostic AUC and SPE while significantly improving SEN (76.3% vs. 43.0%), further validating the effectiveness and clinical utility of the proposed multimodal framework.
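As a brief illustration of the quantities compared in this paragraph, the following Python sketch computes SEN, SPE, and ACC from confusion-matrix counts and shows how an OR rescales a baseline probability. The counts, the baseline probability of 0.20, and the function names are hypothetical and are not taken from the study data; only the OR of 2.5 is the value reported for capsule contact.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sen = tp / (tp + fn)                    # true-positive rate among ETE cases
    spe = tn / (tn + fp)                    # true-negative rate among non-ETE cases
    acc = (tp + tn) / (tp + fn + tn + fp)   # overall proportion classified correctly
    return sen, spe, acc


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Convert a baseline probability to odds, multiply by the OR, convert back."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical counts, chosen only for illustration (not study data).
sen, spe, acc = diagnostic_metrics(tp=45, fn=15, tn=120, fp=20)
print(f"SEN={sen:.3f} SPE={spe:.3f} ACC={acc:.3f}")

# Hypothetical baseline ETE probability of 0.20 combined with the reported OR of 2.5.
p_exposed = apply_odds_ratio(0.20, 2.5)
print(f"Probability with the risk factor present: {p_exposed:.3f}")
```

An OR multiplies odds rather than probabilities, so the change in absolute probability depends on the baseline risk assumed.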
The performance decline of the standalone ultrasound model during external validation aligns with the common multicenter challenge of radiological practice heterogeneity. This stems from inter-observer variability in assessing key features such as contact of the nodule with the TEG and angle between tumor and trachea, combined with inter-site differences in hardware, imaging protocols, and acquisition parameters. Moreover, as the ultrasonographic features in the external cohort were retrospectively extracted from stored images rather than prospectively assessed using a unified protocol, this methodological difference may have further contributed to the observed performance decline. Future generalization may be improved through cross-center standardization, inclusion of device covariates, and data calibration techniques.
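Inter-observer variability of a categorical US feature is commonly summarized with Cohen's kappa. The sketch below is an illustrative assumption, not an analysis reported in this study: the two rating lists are hypothetical readings of one binary feature (e.g., contact of the nodule with the TEG, 1 = present) by two readers, and the function name is ours.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters corrected for chance agreement."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same non-empty set of cases.")
    n = len(ratings_a)
    # Observed agreement: fraction of cases on which the raters give the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal frequencies, summed over labels.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label for every case
    return (observed - expected) / (1.0 - expected)


# Hypothetical readings of ten nodules by two readers (not study data).
reader_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
reader_b = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]
kappa = cohen_kappa(reader_a, reader_b)
print(f"kappa = {kappa:.2f}")
```

Reporting kappa per feature and per site would indicate which US features are most affected by reader disagreement and therefore most in need of standardized definitions.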
Notably, the ultrasonographic manifestations of ETE in thyroid cancer are significantly influenced by disease-specific contexts and hormonal profiles (27,28). For instance, Hashimoto’s thyroiditis induces inflammatory changes in thyroid tissue, which may obscure ultrasound imaging features and complicate ETE assessment. Thyroid dysfunction, associated with thyroid cancer progression, may alter tumor biology through hormonal fluctuations, thereby modifying invasion patterns. Additionally, microcarcinomas (≤1 cm) and non-microcarcinomas exhibit distinct differences in growth patterns and metastatic potential, which may lead to variations in diagnostic ACC. Therefore, this study rigorously assessed the applicability of the integrated model across four distinct subgroups.
In the tumor size subgroup analysis, the integrated model demonstrated a lower AUC in the microcarcinoma (≤1 cm) subgroup compared to the non-microcarcinoma subgroup (0.772 vs. 0.893, P<0.05). This discrepancy may be attributed to the richer morphological and contextual information available for analysis in larger tumors, which further highlights the significant challenges faced by the model in accurately identifying ETE within small-sized thyroid lesions. Regarding the extent of ETE, the model demonstrated exceptional diagnostic performance in identifying gross ETE (AUC: 0.963), a critical factor for high-risk surgical decision-making. In the minimal ETE subgroup, the model achieved an AUC of 0.824, retaining significant discriminative ability and demonstrating preserved SEN to subtle invasion features, despite a statistically significant difference in AUC compared to the gross ETE subgroup (P<0.05).
Furthermore, the model maintained stable performance in subgroups stratified by Hashimoto’s thyroiditis and thyroid functional status. For Hashimoto’s vs. non-Hashimoto’s subgroups, AUCs were 0.875 and 0.843, respectively. Similarly, euthyroid vs. thyroid dysfunction subgroups showed AUCs of 0.852 and 0.845, with no statistical differences (P>0.05). These findings collectively demonstrate that the model’s efficacy remains unaffected by underlying thyroid autoimmune diseases or dysfunction, reinforcing its reliability in heterogeneous clinical populations.
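The subgroup comparisons above contrast AUCs estimated in non-overlapping patient groups. The following Python sketch shows one simple way to make such a comparison: the AUC is computed as a Mann-Whitney statistic, and two independent AUCs are compared with a z-test based on the Hanley-McNeil standard error. This is an assumption for illustration; the test actually used in the study is not stated in this section, and the subgroup sizes and toy scores below are hypothetical.

```python
import math


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC = P(score of a random ETE case > score of a random non-ETE case); ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc * auc)
           + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    return math.sqrt(var)


def compare_independent_aucs(auc_a, n_pos_a, n_neg_a, auc_b, n_pos_b, n_neg_b):
    """Two-sided z-test for the difference between AUCs from independent samples."""
    se_a = hanley_mcneil_se(auc_a, n_pos_a, n_neg_a)
    se_b = hanley_mcneil_se(auc_b, n_pos_b, n_neg_b)
    z = (auc_a - auc_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Toy predicted probabilities (hypothetical) to show the AUC computation itself.
toy_auc = auc_mann_whitney([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
print(f"toy AUC = {toy_auc:.3f}")

# Reported AUCs for microcarcinoma vs. non-microcarcinoma; subgroup sizes are hypothetical.
z, p_value = compare_independent_aucs(0.772, 60, 140, 0.893, 80, 120)
print(f"z = {z:.2f}, two-sided P = {p_value:.4f}")
```

Because the subgroups contain different patients, an unpaired test is appropriate here; methods for correlated ROC curves, such as the DeLong test, apply instead when two models are evaluated on the same patients.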
To elucidate the internal decision-making mechanisms of the integrated model, SHAP analysis was employed for interpretability enhancement. Global and individual SHAP analyses revealed the specific contributions of each feature to model predictions. The beeswarm plot identified five ultrasound-derived features, with contact of the nodule with the TEG, contact of the nodule with the thyroid capsule, and angle between tumor and trachea emerging as the top three predictors. These features quantitatively characterize spatial relationships between tumors and adjacent anatomical structures, objectively mapping invasion pathways to provide standardized, anatomically grounded insights for ETE prediction. Among clinical features, four key factors were integrated, with age ≥55 years emerging as the most influential predictor, consistent with prior studies (16,29). The inclusion of clinical variables not only complements the anatomical information from US but also enriches the predictive framework by incorporating tumor biological behavior. The established role of age ≥55 years as a validated ETE risk factor, combined with ultrasonographic markers, enables the model to synergize tumor biological aggressiveness with anatomical evidence of invasion. This integration further underscores the efficacy and clinical utility of the proposed multimodal model.
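To make the idea of per-feature SHAP contributions concrete, the sketch below uses the closed form that holds for a linear model with independent features: the contribution of feature j is its coefficient multiplied by the deviation of the patient's value from the background mean, expressed in log-odds units. It assumes a logistic-regression-type model; the study's actual model and SHAP explainer may differ, and every coefficient, background mean, and patient value below is hypothetical.

```python
import math


def linear_shap(coefs, x, background_means):
    """SHAP values of a linear model under feature independence (log-odds scale)."""
    return {name: coefs[name] * (x[name] - background_means[name]) for name in coefs}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model: intercept and coefficients are illustrative, not fitted values.
intercept = -2.0
coefs = {"age_ge_55": 0.6, "capsule_contact": 0.9, "teg_contact": 0.8}
# Hypothetical background: mean of each binary feature in a reference cohort.
background_means = {"age_ge_55": 0.3, "capsule_contact": 0.5, "teg_contact": 0.2}
# Hypothetical patient: aged 55 or older, capsule contact present, no TEG contact.
patient = {"age_ge_55": 1, "capsule_contact": 1, "teg_contact": 0}

phi = linear_shap(coefs, patient, background_means)
base_value = intercept + sum(coefs[k] * background_means[k] for k in coefs)
patient_logit = intercept + sum(coefs[k] * patient[k] for k in coefs)

# Rank features by absolute contribution, as a beeswarm plot does across patients.
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.2f} log-odds")
print(f"predicted probability = {sigmoid(patient_logit):.3f}")
```

The contributions sum to the difference between the patient's log-odds and the background log-odds, which is the additivity property that allows SHAP values to be read as a decomposition of an individual prediction.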
However, this study has several limitations: (I) as a retrospective multicenter study, potential selection bias may exist. Future studies will incorporate larger multicenter datasets to enhance sample representativeness; (II) subgroup analyses identified performance variations in tumor size and invasion severity cohorts, where the model exhibited differential diagnostic ACC despite maintaining acceptable AUC values. This underscores potential limitations in its generalizability, which will be addressed in future studies through the integration of multimodal imaging and histopathological data.
Conclusions
The combined model integrating clinical and ultrasonographic features enables precise preoperative prediction of ETE in thyroid cancer patients. This approach not only mitigates overtreatment (e.g., unnecessary extensive resections) but also provides critical guidance for individualized surgical strategies, with the potential to improve prognostic outcomes and patient quality of life.
Acknowledgments
We would like to thank LySono Research Platform for providing significant resources and technical assistance for our research.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1-2654/rc
Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1-2654/dss
Funding: This study was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1-2654/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Review Board of The First Affiliated Hospital, Jiangxi Medical College, Nanchang University (No. IIT-2024-724). Huashan Hospital and Fudan University were also informed of and agreed to the study. Individual consent for this retrospective analysis was waived.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Su HK, Wenig BM, Haser GC, Rowe ME, Asa SL, Baloch Z, Du E, Faquin WC, Fellegara G, Giordano T, Ghossein R, LiVolsi VA, Lloyd R, Mete O, Ozbek U, Rosai J, Suster S, Thompson LD, Turk AT, Urken ML. Inter-Observer Variation in the Pathologic Identification of Minimal Extrathyroidal Extension in Papillary Thyroid Carcinoma. Thyroid 2016;26:512-7.
- Danilovic DLS, Castroneves LA, Suemoto CK, Elias LO, Soares IC, Camargo RY, Correa FA, Hoff AO, Marui S. Is There a Difference Between Minimal and Gross Extension into the Strap Muscles for the Risk of Recurrence in Papillary Thyroid Carcinomas? Thyroid 2020;30:1008-16.
- Chung SR, Baek JH, Choi YJ, Sung TY, Song DE, Kim TY, Lee JH. Sonographic Assessment of the Extent of Extrathyroidal Extension in Thyroid Cancer. Korean J Radiol 2020;21:1187-95.
- Bulfamante AM, Lori E, Bellini MI, Bolis E, Lozza P, Castellani L, Saibene AM, Pipolo C, Fuccillo E, Rosso C, Felisati G, De Pasquale L. Advanced Differentiated Thyroid Cancer: A Complex Condition Needing a Tailored Approach. Front Oncol 2022;12:954759.
- Qin Y, Sun W, Wang Z, Dong W, He L, Zhang T, Lv C, Zhang H. RBM47/SNHG5/FOXO3 axis activates autophagy and inhibits cell proliferation in papillary thyroid carcinoma. Cell Death Dis 2022;13:270.
- Haddad RI, Bischoff L, Ball D, Bernet V, Blomain E, Busaidy NL, et al. Thyroid Carcinoma, Version 2.2022, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw 2022;20:925-51.
- Detweiler K, Elfenbein DM, Mayers D. Evaluation of Thyroid Nodules. Surg Clin North Am 2019;99:571-86.
- Jiao WP, Zhang L. Using Ultrasonography to Evaluate the Relationship between Capsular Invasion or Extracapsular Extension and Lymph Node Metastasis in Papillary Thyroid Carcinomas. Chin Med J (Engl) 2017;130:1309-13.
- Xu S, Ni X, Zhou W, Zhan W, Zhang H. Development and validation of a novel diagnostic tool for predicting the malignancy probability of thyroid nodules: A retrospective study based on clinical, B-mode, color doppler and elastographic ultrasonographic characteristics. Front Endocrinol (Lausanne) 2022;13:966572.
- Lamartina L, Bidault S, Hadoux J, Guerlain J, Girard E, Breuskin I, Attard M, Suciu V, Baudin E, Al Ghuzlan A, Leboulleux S, Hartl D. Can preoperative ultrasound predict extrathyroidal extension of differentiated thyroid cancer? Eur J Endocrinol 2021;185:13-22.
- Kwak JY, Kim EK, Youk JH, Kim MJ, Son EJ, Choi SH, Oh KK. Extrathyroid extension of well-differentiated papillary thyroid microcarcinoma on US. Thyroid 2008;18:609-14.
- Moon SJ, Kim DW, Kim SJ, Ha TK, Park HK, Jung SJ. Ultrasound assessment of degrees of extrathyroidal extension in papillary thyroid microcarcinoma. Endocr Pract 2014;20:1037-43.
- Qi Q, Huang X, Zhang Y, Cai S, Liu Z, Qiu T, Cui Z, Zhou A, Yuan X, Zhu W, Min X, Wu Y, Wang W, Zhang C, Xu P. Ultrasound image-based deep learning to assist in diagnosing gross extrathyroidal extension thyroid cancer: a retrospective multicenter study. EClinicalMedicine 2023;58:101905.
- Grani G, Cera G, Conzo G, Del Gatto V, di Gioia CRT, Maranghi M, Lucia P, Cantisani V, Metere A, Melcarne R, Borcea MC, Scorziello C, Menditto R, Summa M, Biffoni M, Durante C, Giacomelli L. Preoperative Ultrasonography in the Evaluation of Suspected Familial Non-Medullary Thyroid Cancer: Are We Able to Predict Multifocality and Extrathyroidal Extension? J Clin Med 2021;10:5277.
- Wang H, Zhao S, Yao J, Yu X, Xu D. Factors influencing extrathyroidal extension of papillary thyroid cancer and evaluation of ultrasonography for its diagnosis: a retrospective analysis. Sci Rep 2023;13:18344.
- Jiang L, Guo S, Zhao Y, Cheng Z, Zhong X, Zhou P. Predicting Extrathyroidal Extension in Papillary Thyroid Carcinoma Using a Clinical-Radiomics Nomogram Based on B-Mode and Contrast-Enhanced Ultrasound. Diagnostics (Basel) 2023;13:1734.
- Zhou J, Yin L, Wei X, Zhang S, Song Y, Luo B, et al. 2020 Chinese guidelines for ultrasound malignancy risk stratification of thyroid nodules: the C-TIRADS. Endocrine 2020;70:256-79.
- Chen Q, Lin M, Wu S. Validating and Comparing C-TIRADS, K-TIRADS and ACR-TIRADS in Stratifying the Malignancy Risk of Thyroid Nodules. Front Endocrinol (Lausanne) 2022;13:899575.
- Xiao W, Zhou W, Yuan H, Liu X, He F, Hu X, Ye X, Qin X. A radiopathomics model for predicting large-number cervical lymph node metastasis in clinical N0 papillary thyroid carcinoma. Eur Radiol 2025;35:4587-98.
- Li G, Li R, Song L, Chen W, Jiang K, Tang H, Wei T, Li Z, Gong R, Lei J, Zhu J. Implications of Extrathyroidal Extension Invading Only the Strap Muscles in Papillary Thyroid Carcinomas. Thyroid 2020;30:57-64.
- Haugen BR. 2015 American Thyroid Association Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer: What is new and what has changed? Cancer 2017;123:372-81.
- Kamaya A, Tahvildari AM, Patel BN, Willmann JK, Jeffrey RB, Desser TS. Sonographic Detection of Extracapsular Extension in Papillary Thyroid Cancer. J Ultrasound Med 2015;34:2225-30.
- Gambardella C, Offi C, Romano RM, De Palma M, Ruggiero R, Candela G, Puziello A, Docimo L, Grasso M, Docimo G. Transcutaneous laryngeal ultrasonography: a reliable, non-invasive and inexpensive preoperative method in the evaluation of vocal cords motility-a prospective multicentric analysis on a large series and a literature review. Updates Surg 2020;72:885-92.
- Wan F, He W, Zhang W, Zhang Y, Zhang H, Guang Y. Preoperative prediction of extrathyroidal extension: radiomics signature based on multimodal ultrasound to papillary thyroid carcinoma. BMC Med Imaging 2023;23:96.
- Lee CY, Kim SJ, Ko KR, Chung KW, Lee JH. Predictive factors for extrathyroidal extension of papillary thyroid carcinoma based on preoperative sonography. J Ultrasound Med 2014;33:231-8.
- Lu WJ, Mao L, Li J, OuYang LY, Chen JY, Chen SY, Lin YY, Wu YW, Chen SN, Qiu SD, Chen F. Three-dimensional ultrasound-based radiomics nomogram for the prediction of extrathyroidal extension features in papillary thyroid cancer. Front Oncol 2023;13:1046951.
- Shin JH, Baek JH, Chung J, Ha EJ, Kim JH, Lee YH, et al. Ultrasonography Diagnosis and Imaging-Based Management of Thyroid Nodules: Revised Korean Society of Thyroid Radiology Consensus Statement and Recommendations. Korean J Radiol 2016;17:370-95.
- Chakrabarty N, Mahajan A, Basu S, D'Cruz AK. Comprehensive Review of the Imaging Recommendations for Diagnosis, Staging, and Management of Thyroid Carcinoma. J Clin Med 2024;13:2904.
- Liu WL, Guan Q, Wen D, Ma B, Xu WB, Hu JQ, Wei WJ, Li DS, Wang Y, Xiang J, Liao T, Ji QH. PRDM16 Inhibits Cell Proliferation and Migration via Epithelial-to-Mesenchymal Transition by Directly Targeting Pyruvate Carboxylase in Papillary Thyroid Cancer. Front Cell Dev Biol 2021;9:723777.

