Prediction of thyroid function based on CT radiomics: a two-center study
Introduction
The thyroid gland, a critical endocrine organ in the human body, exerts a significant influence on the body’s metabolic processes, and overall growth and development. Beyond these essential roles, its impact extends critically to the cardiovascular system. Recent research has reported that thyroid hormones affect the cardiovascular system directly by acting on cardiac and vascular cells to regulate contractility and vascular tone, and indirectly by modifying key risk factors such as blood pressure and lipid profiles, as well as influencing coagulation and fibrinolysis (1).
Thyroid dysfunction is a common clinical condition that includes both hyperthyroidism and hypothyroidism. Hyperthyroidism is a clinical syndrome characterized by excessive thyroid hormone production by the thyroid gland, leading to increased metabolic activity and heightened excitability of the neuro-circulatory-digestive systems [thyroid-stimulating hormone (TSH) decreased and free thyroxine (FT4) increased]. Hypothyroidism is defined as a systemic hypometabolic syndrome caused by low thyroid hormone levels or thyroid hormone resistance due to various reasons (TSH increased and FT4 decreased). These conditions not only affect the quality of life of patients, but can also lead to serious complications (2).
Traditional diagnostic methods rely on clinical symptoms, physical signs, and laboratory tests. However, these methods may not always be sufficiently sensitive or specific. In recent years, with the rapid development of medical imaging technology, computed tomography (CT) imaging has become increasingly important in the diagnosis and evaluation of thyroid disorders.
Radiomics, an emerging research field, provides new perspectives and methods for disease prediction and classification by extracting high-throughput feature information from medical images, and combining machine learning and deep learning algorithms (3). Previous studies have shown that CT radiomics performed well in the differentiation of benign and malignant thyroid nodules (4-7). However, the following question arises: Could CT radiomics, which is expected to offer accurate and efficient diagnosis, also be applied to thyroid dysfunction research?
This study sought to explore whether CT radiomics can feasibly and effectively predict thyroid dysfunction, enabling the early identification and treatment of patients with clinically insignificant symptoms. We collected thyroid CT imaging data from two different medical centers and combined it with patients’ clinical information to construct a comprehensive database, which may improve the generalizability and clinical applicability of the models (8,9). We present this article in accordance with the TRIPOD + AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-525/rc).
Methods
Patient enrollment
This retrospective study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of The Second Affiliated Hospital of Soochow University (No. JD-HG-2024-059), and the requirement for individual consent for this retrospective analysis was waived. Changshu Hospital Affiliated to Nanjing University of Chinese Medicine was also informed of and approved the study.
Initially, the data of 1,250 participants were collected from The Second Affiliated Hospital of Soochow University between January 2018 and September 2023, as were the data of 450 participants from Changshu Hospital Affiliated to Nanjing University of Chinese Medicine during the same period. Thyroid function was measured by chemiluminescence immunoassay. Patients were included in the study if they met the following inclusion criteria: (I) had complete clinical data; (II) had a time interval between the CT scan and thyroid function test of less than 3 months; (III) had undergone a CT scan covering the entire thyroid tissue; and (IV) had not undergone any therapeutic interventions for thyroid conditions. Patients were excluded from the study if they met any of the following exclusion criteria: (I) had a history of thyroid surgery; (II) had cervical lesions involving the thyroid; and/or (III) had unavailable CT images (Figure 1). Ultimately, 391 patients with hyperthyroidism, 393 patients with hypothyroidism, and 397 normal control participants from The Second Affiliated Hospital of Soochow University were included in the study. These patients were randomly divided into a training dataset (n=827) and an internal validation dataset (n=354) at a ratio of 7:3. Additionally, 415 patients (104 with hyperthyroidism, 110 with hypothyroidism, and 201 normal control participants) from Changshu Hospital Affiliated to Nanjing University of Chinese Medicine were used as the external validation dataset.
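The 7:3 random split described above can be sketched as follows. This is a minimal illustration that assumes scikit-learn's `train_test_split`; the patient identifiers, the label coding, and the random seed are hypothetical, and the split is not stratified because the text describes only a random division.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Group sizes reported for the primary center.
# Label coding is hypothetical: 0 = hyperthyroidism, 1 = normal, 2 = hypothyroidism.
labels = np.repeat([0, 1, 2], [391, 397, 393])
patient_ids = np.arange(labels.size)  # hypothetical identifiers

# Hold out 354 of 1,181 patients (about 30%) for internal validation.
# The seed is an assumption chosen only to make the sketch reproducible.
train_ids, val_ids, y_train, y_val = train_test_split(
    patient_ids, labels, test_size=354, random_state=0
)
print(len(train_ids), len(val_ids))
```

Passing `stratify=labels` would keep the three groups in equal proportion across the two datasets; whether the study did so is not stated.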
Measurement and selection of the clinical variables
The basic demographic information of the participants, including age and gender, was directly retrieved from the electronic medical record systems of the hospitals. The morphological features of the thyroid, including the left and right lobe anteroposterior diameter, left and right lobe inner-and-outer diameter, and the left and right lobe CT values were measured independently by two radiologists (with 7 and 13 years of experience, respectively). Both radiologists were blinded to the diagnostic reports. The mean values of these features were calculated. A multigroup significance test with Bonferroni correction was performed on the training dataset, and the clinical variables showing significant differences among the three groups were selected for model development.
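One way to read the selection rule above is as pairwise comparisons among the three groups with a Bonferroni adjustment, keeping a variable only when every pair differs. The sketch below assumes Welch t-tests from SciPy and uses simulated ages rather than study data; the function name and the "all pairs significant" rule are illustrative assumptions.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def pairwise_bonferroni(groups, alpha=0.05):
    """Welch t-test for every pair of groups, with Bonferroni-adjusted P values."""
    pairs = list(combinations(groups, 2))
    adjusted = {}
    for a, b in pairs:
        p = stats.ttest_ind(groups[a], groups[b], equal_var=False).pvalue
        # Bonferroni: multiply by the number of comparisons and cap at 1.
        adjusted[(a, b)] = min(1.0, p * len(pairs))
    keep = all(p < alpha for p in adjusted.values())
    return adjusted, keep


# Simulated ages only (not study data), drawn to resemble the group summaries in Table 2.
rng = np.random.default_rng(0)
age = {
    "hypo": rng.normal(65.0, 13.9, 283),
    "hyper": rng.normal(58.5, 16.7, 284),
    "normal": rng.normal(43.7, 12.7, 260),
}
adjusted_p, keep_variable = pairwise_bonferroni(age)
```

Categorical variables such as sex would need a Chi-squared test in place of the t-test.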
Region of interest (ROI) annotation
The ITK-SNAP software (RRID:SCR_002010) was used to annotate the thyroid regions on the CT images. A radiologist with 7 years of experience manually segmented the ROI on the slice containing the largest left and right thyroid lobe, as well as on the two adjacent slices above and below (total slices: 5). These segmentations were subsequently reviewed and refined by a senior radiologist with over 13 years of experience. Both radiologists were blinded to the diagnostic reports.
Extraction and selection of radiomics features
Radiomics features were extracted from the labeled ROIs using the IBSI-compliant PyRadiomics package (RRID:SCR_026019). Prior to feature extraction, the CT images were re-sampled to a uniform voxel size of 1 mm × 1 mm × 1 mm. A total of 1,746 radiomics features were derived from each ROI by using a range of filters (i.e., Exponential, Gradient, Logarithm, Square, Square Root, and Wavelet). These features comprised 17 shape features, 323 first-order intensity statistics, and 1,406 high-order texture features [gray-level dependence matrix (GLDM; n=266), gray-level co-occurrence matrix (GLCM; n=437), gray-level size zone matrix (GLSZM; n=304), gray-level run length matrix (GLRLM; n=304), and neighboring gray-tone difference matrix (NGTDM; n=95)].
To identify the radiomics features most strongly associated with thyroid status, a systematic three-step feature selection process was employed. First, the intraclass correlation coefficient (ICC) was computed to evaluate feature reproducibility against variations in contour delineation. Only the features with ICC values >0.75 were considered stable and retained. Next, a Pearson correlation analysis was performed, and features with |r| values >0.95 were excluded to minimize redundancy. Finally, a multigroup t-test with Bonferroni correction was performed to identify features demonstrating statistically significant differences across groups, which were subsequently selected for model development.
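The first two filtering steps can be sketched as below. This is an illustration under stated assumptions: the ICC values are taken as precomputed inputs (the ICC form is not specified in the text), and the greedy rule for choosing which member of a correlated pair to drop is a hypothetical choice.

```python
import numpy as np


def filter_features(X, icc, icc_min=0.75, r_max=0.95):
    """Return column indices that pass the ICC and Pearson redundancy filters."""
    stable = [j for j in range(X.shape[1]) if icc[j] > icc_min]
    if not stable:
        return []
    corr = np.atleast_2d(np.abs(np.corrcoef(X[:, stable], rowvar=False)))
    kept = []  # positions within `stable`
    for pos in range(len(stable)):
        # Greedy rule (an assumption): drop a feature when it is highly
        # correlated with any feature that has already been kept.
        if all(corr[pos, k] <= r_max for k in kept):
            kept.append(pos)
    return [stable[pos] for pos in kept]


# Synthetic demonstration data, not radiomics features from the study.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X_demo = np.column_stack([
    base,                                  # feature 0
    base + 0.01 * rng.normal(size=200),    # near-duplicate of feature 0
    rng.normal(size=200),                  # independent feature
    rng.normal(size=200),                  # independent but unstable feature
])
icc_demo = np.array([0.90, 0.90, 0.80, 0.50])
selected = filter_features(X_demo, icc_demo)
print(selected)  # prints [0, 2]
```

The surviving columns would then go to the group-wise significance test described as the third step.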
Model construction and evaluation
Three predictive models were proposed in this study. The radiomics model was developed using a support vector machine (SVM) classifier based on the selected radiomics features, while the optimal parameters for the SVM classifier were determined through a grid-search approach. The clinical model was constructed using the logistic regression classifier based on the selected clinical variables. In addition, a combined model integrating the predicted risk scores for hyperthyroidism, normal thyroid function, and hypothyroidism from the radiomics model, as well as the selected clinical variables, was constructed using a logistic regression classifier. All the models were implemented using the scikit-learn package (RRID:SCR_002577). Median imputation was applied for the data with a missing rate less than 10%, and the clinical variables with a missing rate higher than 10% were not used for model development.
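The relationship between the radiomics model and the combined model can be sketched with scikit-learn as follows. The data are synthetic, and the hyperparameter grid, kernel defaults, scaling step, and cross-validation settings are assumptions; the text specifies only an SVM tuned by grid search, logistic regression, and median imputation.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-ins: 21 radiomics features and 4 clinical variables
# (age plus three anteroposterior diameters). Not study data.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 50)
X_rad = rng.normal(size=(150, 21)) + 0.8 * y[:, None]
X_clin = rng.normal(size=(150, 4)) + 0.5 * y[:, None]
X_clin[rng.random(X_clin.shape) < 0.05] = np.nan  # a few missing values

# Radiomics model: SVM with grid search (the grid itself is hypothetical).
radiomics_model = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]},
    cv=3,
)
radiomics_model.fit(X_rad, y)
rad_scores = radiomics_model.predict_proba(X_rad)  # one risk score per class

# Combined model: radiomics risk scores plus clinical variables,
# with median imputation for the missing clinical values.
combined_model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)
X_comb = np.hstack([rad_scores, X_clin])
combined_model.fit(X_comb, y)
combined_scores = combined_model.predict_proba(X_comb)
```

In this sketch the second-stage model is trained on in-sample SVM scores, which can be optimistic; out-of-fold scores are a common alternative, although the text does not say which approach was used.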
These predictive models were constructed in the training dataset, and their discriminative performance and clinical utility were assessed and compared in both the internal and external validation datasets. A flowchart of the study is shown in Figure 2.
Statistical analysis
All the statistical analyses were conducted using MedCalc software (RRID:SCR_015044) and SPSS software (RRID:SCR_002865). The differences in continuous variables between two groups were calculated using the t-test, while the categorical variables were compared using the Chi-squared test.
Performance in the three-class classification (hyperthyroidism, normal thyroid function, and hypothyroidism) was assessed using overall accuracy, the F1 score, and Cohen’s kappa. In addition, a receiver operating characteristic (ROC) curve analysis was employed to evaluate model performance in the one-versus-rest binary classifications of hyperthyroidism, normal thyroid function, and hypothyroidism by calculating the area under the curve (AUC) for each. DeLong’s test was applied to compare the AUC values between pairs of models. The sensitivity (SEN), specificity (SPE), positive predictive value (PPV), and negative predictive value (NPV) were calculated at the optimal cut-off points, which were determined by maximizing the Youden index. For each binary classification, the goodness-of-fit of the predictive models was assessed using the Hosmer-Lemeshow test and visualized through calibration curves. Additionally, a decision curve analysis (DCA) was performed to evaluate and compare the clinical utility of the models by calculating the net benefits across a reasonable range of threshold probabilities.
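The three-class metrics and the Youden-index cut-off can be sketched with scikit-learn as below, using a toy example rather than study data. The use of `average="weighted"` is an inference: the average F1 values in Table 3 appear consistent with support-weighted averaging, but the text does not state the averaging method.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    f1_score,
    roc_auc_score,
    roc_curve,
)


def three_class_summary(y_true, y_pred):
    """Overall accuracy, support-weighted F1, and Cohen's kappa."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_weighted": f1_score(y_true, y_pred, average="weighted"),
        "kappa": cohen_kappa_score(y_true, y_pred),
    }


def one_vs_rest_summary(y_true, prob, positive):
    """AUC and the cut-off that maximizes the Youden index J = SEN + SPE - 1."""
    y_bin = (np.asarray(y_true) == positive).astype(int)
    fpr, tpr, thresholds = roc_curve(y_bin, prob)
    best = int(np.argmax(tpr - fpr))
    return {
        "auc": roc_auc_score(y_bin, prob),
        "cutoff": thresholds[best],
        "sen": tpr[best],
        "spe": 1 - fpr[best],
    }


# Toy labels and scores for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
prob_class1 = [0.1, 0.4, 0.8, 0.7, 0.2, 0.3]

multi = three_class_summary(y_true, y_pred)
binary = one_vs_rest_summary(y_true, prob_class1, positive=1)
```

DeLong's test and the Hosmer-Lemeshow test are not part of scikit-learn and are left out of this sketch.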
A heatmap was generated using “HemI” software (version 1.0). The calibration analysis was performed using the R programming language (RRID:SCR_001905) with the “rms” package, while the DCA was conducted using the “rmda” package. A two-sided P value less than 0.05 was considered statistically significant.
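For readers unfamiliar with DCA, the net benefit at a threshold probability can be written out directly. The sketch below is a plain NumPy illustration of the standard net-benefit definition with toy numbers; it is not the code of the "rmda" package, and the function names are hypothetical.

```python
import numpy as np


def net_benefit(y_true, prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    y_true = np.asarray(y_true)
    pred_pos = np.asarray(prob) >= threshold
    n = y_true.size
    tp = np.sum(pred_pos & (y_true == 1))
    fp = np.sum(pred_pos & (y_true == 0))
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the treat-all strategy (treat-none is always 0)."""
    prevalence = np.mean(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example for illustration only.
y_demo = [1, 1, 0, 0, 0]
p_demo = [0.9, 0.6, 0.4, 0.7, 0.1]
nb_model = net_benefit(y_demo, p_demo, threshold=0.5)
nb_all = net_benefit_treat_all(y_demo, threshold=0.5)
```

A decision curve plots these quantities over a range of thresholds; a model is useful at a given threshold when its net benefit exceeds both the treat-all value and zero.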
Results
Patient characteristics
The clinical characteristics of the participants and the CT imaging features in the training, internal validation, and external validation datasets are listed in Table 1. The prevalence of thyroid nodules and the thyroid CT values of the participants in the external validation dataset were significantly higher than those of the participants in the training and internal validation datasets. No significant difference was observed among the three datasets in terms of the other clinical characteristics and CT imaging features.
Table 1
| Clinical variable | Training | Internal validation | External validation |
|---|---|---|---|
| Patient characteristics | | | |
| Total No. of patients | 827 | 354 | 415 |
| Age (years) | 56.1±17.0 | 54.4±16.3 | 58.8±16.0 |
| Sex (female/male) | 517/310 | 221/133 | 247/168 |
| Thyroid nodules (no/yes) | 697/130 | 308/46 | 180/235 |
| Thyroid CT imaging features | | | |
| Right anteroposterior diameter (mm) | 22.7±9.7 | 22.1±7.1 | 23.3±6.4 |
| Left anteroposterior diameter (mm) | 21.5±7.5 | 21.5±8.5 | 21.8±6.5 |
| Average anteroposterior diameter (mm) | 22.1±7.8 | 21.8±7.1 | 22.6±6.1 |
| Right inner-and-outer diameters (mm) | 14.1±5.6 | 13.8±5.4 | 15.0±5.7 |
| Left inner-and-outer diameters (mm) | 13.5±5.5 | 13.2±5.5 | 14.6±6.9 |
| Average inner-and-outer diameters (mm) | 13.8±5.3 | 13.5±5.2 | 14.8±5.7 |
| Right lobe thyroid CT value (HU) | 82.5±21.8 | 84.4±22.9 | 99.0±34.1 |
| Left lobe thyroid CT value (HU) | 83.2±22.5 | 84.7±22.7 | 103.2±82.9 |
| Average lobe thyroid CT value (HU) | 82.9±21.4 | 84.6±22.1 | 101.1±50.7 |
Data are presented as number or mean ± SD. CT, computed tomography; HU, Hounsfield unit; SD, standard deviation.
Selection of clinical variables
The clinical variables of the hyperthyroidism patients, hypothyroidism patients, and the participants with normal thyroid function were compared pairwise. Age and the thyroid anteroposterior diameters (i.e., the right, left, and average anteroposterior diameters) on the CT images differed significantly in all three pairwise comparisons, and were used for the construction of the clinical model and the combined model (Table 2).
Table 2
| Clinical variable | Hypo | Hyper | Normal | P (hypo vs. hyper) | P (hypo vs. normal) | P (hyper vs. normal) |
|---|---|---|---|---|---|---|
| Patient characteristics | | | | | | |
| Total No. of patients | 283 | 284 | 260 | | | |
| Age (years) | 65.0±13.9 | 58.5±16.7 | 43.7±12.7 | <0.001 | <0.001 | <0.001 |
| Sex (female/male) | 205/78 | 171/113 | 141/119 | 0.002 | <0.001 | 0.159 |
| Thyroid nodules (no/yes) | 250/33 | 205/79 | 242/18 | <0.001 | <0.001 | 0.059 |
| Thyroid CT imaging features | | | | | | |
| Right anteroposterior diameter (mm) | 18.9±5.1 | 28.5±7.5 | 20.6±12.4 | <0.001 | 0.029 | <0.001 |
| Left anteroposterior diameter (mm) | 17.9±5.0 | 27.4±8.5 | 19.1±3.9 | <0.001 | 0.001 | <0.001 |
| Average anteroposterior diameter (mm) | 18.4±4.8 | 27.9±7.6 | 19.9±6.9 | <0.001 | 0.003 | <0.001 |
| Right inner-and-outer diameters (mm) | 12.1±4.3 | 18.2±6.5 | 11.9±2.5 | <0.001 | 0.576 | <0.001 |
| Left inner-and-outer diameters (mm) | 11.6±4.0 | 17.3±6.6 | 11.4±2.5 | <0.001 | 0.505 | <0.001 |
| Average inner-and-outer diameters (mm) | 11.8±3.9 | 17.8±6.1 | 11.7±2.3 | <0.001 | 0.513 | <0.001 |
| Right lobe thyroid CT value (HU) | 77.4±22.4 | 75.6±20.5 | 95.5±16.1 | 0.320 | <0.001 | <0.001 |
| Left lobe thyroid CT value (HU) | 77.3±22.2 | 76.3±21.0 | 97.3±17.2 | 0.600 | <0.001 | <0.001 |
| Average lobe thyroid CT value (HU) | 77.3±21.3 | 76.0±20.2 | 96.4±16.0 | 0.431 | <0.001 | <0.001 |
Data are presented as number or mean ± SD. CT, computed tomography; HU, Hounsfield unit; SD, standard deviation.
Selection of radiomics features
The ICC analysis and Pearson correlation analysis identified 362 stable and non-redundant radiomics features. Following the multigroup t-test, the 21 radiomics features most strongly associated with thyroid status were selected for model development. The results of comparisons across the three groups (i.e., the hyperthyroidism, normal thyroid function, and hypothyroidism groups) and a heatmap of these selected features, based on standardized feature values, are presented in Table S1 and Figure 3 for the training, internal validation, and external validation datasets.
Model performance evaluation
The confusion matrices for the three-class classification of hyperthyroidism, normal thyroid function, and hypothyroidism are shown in Figure 4. The combined model demonstrated the best classification performance, with an overall accuracy and average F1 score of 80.0% and 0.779 in the training dataset, 74.3% and 0.739 in the internal validation dataset, and 70.1% and 0.697 in the external validation dataset. Good agreement between the predicted thyroid status and the ground truth was also observed, with Cohen’s kappa values for the combined model of 0.670, 0.609, and 0.531 in the training, internal validation, and external validation datasets, respectively. The overall accuracy, F1 score, and kappa results for all three models are summarized in Table 3.
Table 3
| Dataset | Model | Accuracy | F1 (hypo) | F1 (normal) | F1 (hyper) | F1 (average) | Weighted kappa |
|---|---|---|---|---|---|---|---|
| Training | Clinical | 68.2% | 0.696 | 0.673 | 0.675 | 0.682 | 0.588 |
| | Radiomics | 74.6% | 0.691 | 0.789 | 0.754 | 0.743 | 0.662 |
| | Combined | 80.0% | 0.750 | 0.827 | 0.765 | 0.779 | 0.703 |
| Internal validation | Clinical | 66.4% | 0.595 | 0.686 | 0.707 | 0.664 | 0.567 |
| | Radiomics | 71.5% | 0.627 | 0.809 | 0.663 | 0.709 | 0.621 |
| | Combined | 74.3% | 0.670 | 0.833 | 0.689 | 0.739 | 0.655 |
| External validation | Clinical | 53.3% | 0.469 | 0.581 | 0.536 | 0.540 | 0.424 |
| | Radiomics | 67.2% | 0.409 | 0.821 | 0.615 | 0.660 | 0.556 |
| | Combined | 70.1% | 0.514 | 0.831 | 0.632 | 0.697 | 0.591 |
The ROC curves of the predictive models for the binary classifications of hyperthyroidism/non-hyperthyroidism, normal/abnormal, and hypothyroidism/non-hypothyroidism are presented in Figure 5. The AUC values of the clinical model, radiomics model, and combined model for the binary classification of hyperthyroidism were 0.860, 0.880, and 0.906 in the training dataset, 0.859, 0.852, and 0.884 in the internal validation dataset, and 0.781, 0.839, and 0.861 in the external validation dataset, respectively. DeLong’s test showed that the AUC value of the combined model was significantly higher than that of the clinical model in the training and external validation datasets (both P<0.001), but not in the internal validation dataset (P=0.130) (Table 4).
The ability of the models to distinguish between hypothyroidism and non-hypothyroidism was similar; the AUC values of the combined model were 0.895, 0.864, and 0.770 in the training, internal validation, and external validation datasets, which were higher than those of the radiomics model (0.842 in the training dataset, 0.813 in the internal validation dataset, and 0.715 in the external validation dataset) and those of the clinical model (0.849 in the training dataset, 0.797 in the internal validation dataset, and 0.688 in the external validation dataset).
For the binary classification of normal thyroid function, both the radiomics model and the combined model outperformed the clinical model with statistically significant results in all three datasets. The AUC values of the clinical model, radiomics model, and combined model were 0.851, 0.925, and 0.957 in the training dataset, 0.823, 0.920, and 0.950 in the internal validation dataset, and 0.784, 0.892, and 0.911 in the external validation dataset, respectively. The detailed SEN, SPE, PPV, and NPV of these models are presented in Tables 4-6.
Table 4
| Dataset | Model | AUC (95% CI) | P value | SEN, % | SPE, % | PPV, % | NPV, % |
|---|---|---|---|---|---|---|---|
| Training | Clinical | 0.860 (0.834–0.883) | Reference | 69.4 | 88.0 | 75.2 | 84.6 |
| | Radiomics | 0.880 (0.856–0.902) | 0.147 | 77.5 | 85.3 | 73.3 | 87.9 |
| | Combined | 0.906 (0.884–0.925) | <0.001 | 85.9 | 81.0 | 70.3 | 91.7 |
| Internal validation | Clinical | 0.859 (0.818–0.893) | Reference | 72.0 | 89.1 | 74.0 | 88.0 |
| | Radiomics | 0.852 (0.810–0.887) | 0.759 | 72.9 | 84.2 | 66.7 | 87.8 |
| | Combined | 0.884 (0.845–0.915) | 0.130 | 87.9 | 77.3 | 62.7 | 93.6 |
| External validation | Clinical | 0.781 (0.738–0.820) | Reference | 79.8 | 67.9 | 45.4 | 90.9 |
| | Radiomics | 0.839 (0.800–0.873) | 0.029 | 91.4 | 66.2 | 47.5 | 95.8 |
| | Combined | 0.861 (0.824–0.893) | <0.001 | 80.8 | 76.5 | 53.5 | 92.2 |
AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value; SEN, sensitivity; SPE, specificity.
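The PPV and NPV in these tables depend on the prevalence of the target class as well as on sensitivity and specificity. As a worked check, the sketch below applies the standard Bayes relations to the clinical model's hyperthyroidism results in the external validation dataset (SEN 79.8%, SPE 67.9%, prevalence 104/415); the function name is hypothetical.

```python
def predictive_values(sen, spe, prevalence):
    """PPV and NPV from sensitivity, specificity, and prevalence."""
    ppv = sen * prevalence / (sen * prevalence + (1 - spe) * (1 - prevalence))
    npv = spe * (1 - prevalence) / (spe * (1 - prevalence) + (1 - sen) * prevalence)
    return ppv, npv


# 104 hyperthyroidism cases among 415 external validation patients.
prevalence = 104 / 415
ppv, npv = predictive_values(sen=0.798, spe=0.679, prevalence=prevalence)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```

The result is close to the reported 45.4% and 90.9%; small differences are to be expected because the inputs are rounded to one decimal place.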
Table 5
| Dataset | Model | AUC (95% CI) | P value | SEN, % | SPE, % | PPV, % | NPV, % |
|---|---|---|---|---|---|---|---|
| Training | Clinical | 0.849 (0.823–0.873) | Reference | 79.5 | 78.9 | 66.2 | 88.1 |
| | Radiomics | 0.842 (0.815–0.866) | 0.705 | 69.6 | 86.6 | 73.0 | 84.6 |
| | Combined | 0.895 (0.872–0.915) | <0.001 | 88.0 | 78.7 | 68.2 | 92.6 |
| Internal validation | Clinical | 0.797 (0.751–0.838) | Reference | 67.3 | 81.6 | 62.2 | 84.7 |
| | Radiomics | 0.813 (0.769–0.853) | 0.627 | 70.0 | 82.0 | 63.6 | 85.8 |
| | Combined | 0.864 (0.824–0.898) | <0.001 | 84.6 | 77.5 | 62.8 | 91.7 |
| External validation | Clinical | 0.688 (0.641–0.732) | Reference | 55.5 | 55.5 | 31.0 | 77.5 |
| | Radiomics | 0.715 (0.669–0.758) | 0.459 | 76.4 | 59.3 | 40.4 | 87.4 |
| | Combined | 0.770 (0.727–0.810) | <0.001 | 76.4 | 64.9 | 44.0 | 88.4 |
AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value; SEN, sensitivity; SPE, specificity.
Table 6
| Dataset | Model | AUC (95% CI) | P value | SEN, % | SPE, % | PPV, % | NPV, % |
|---|---|---|---|---|---|---|---|
| Training | Clinical | 0.851 (0.825–0.875) | Reference | 87.3 | 70.7 | 57.8 | 92.4 |
| | Radiomics | 0.925 (0.905–0.942) | <0.001 | 88.1 | 83.4 | 70.9 | 93.8 |
| | Combined | 0.957 (0.941–0.970) | <0.001 | 90.0 | 90.3 | 81.0 | 95.2 |
| Internal validation | Clinical | 0.823 (0.780–0.862) | Reference | 86.1 | 67.3 | 62.4 | 88.5 |
| | Radiomics | 0.920 (0.887–0.946) | <0.001 | 94.9 | 77.0 | 72.2 | 96.0 |
| | Combined | 0.950 (0.922–0.970) | <0.001 | 94.2 | 82.5 | 77.2 | 95.7 |
| External validation | Clinical | 0.784 (0.742–0.823) | Reference | 75.6 | 67.3 | 68.5 | 74.6 |
| | Radiomics | 0.892 (0.858–0.920) | <0.001 | 79.6 | 87.9 | 86.0 | 82.1 |
| | Combined | 0.911 (0.879–0.937) | <0.001 | 87.6 | 81.8 | 81.9 | 87.5 |
AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value; SEN, sensitivity; SPE, specificity.
Clinical utility analysis
All the models demonstrated good agreement between the predicted probabilities and the observed rates in both the internal and external validation datasets. The Hosmer-Lemeshow test showed no significant deviation from an ideal fit for the clinical model, radiomics model, or combined model (all P values >0.05). The corresponding calibration curves are presented in Figures S1,S2.
The DCA further demonstrated the clinical utility of the predictive models; all the models exhibited a higher net benefit than both the treat-none and treat-all strategies across almost the entire range of threshold probabilities. Additionally, the combined model consistently showed a higher net benefit than the clinical model across the majority of the threshold probabilities (Figure 6).
Discussion
This preliminary study developed a CT radiomics model for predicting thyroid function. Combining the model with clinical factors further improved the diagnostic efficiency of the model in predicting hyperthyroidism and hypothyroidism.
Ultrasonography is considered the optimal imaging modality for examining the thyroid gland, but it depends heavily on operator experience, and its ability to visualize deep neck structures is limited (10). Thyroid CT scanning has been used as a supplementary examination, but its use is limited by its low diagnostic sensitivity. Moreover, neither conventional ultrasonography nor conventional CT can directly assess thyroid function.
The recent emergence of radiomics has led to a number of advances; radiomics can mine a large amount of deep data from images and improve the accuracy of image-based diagnoses (11). Extensive research has been conducted on the use of this technology in multiple diseases, including gastrointestinal tract diseases, pulmonary nodules, breast tumors, and other tumors (12-17). Preliminary studies have explored the application of radiomics to thyroid diseases, and shown its ability to distinguish between different thyroid lesions (18-23). However, research on the diagnostic value of imaging in the prediction of thyroid function, which mainly uses ultrasonography and conventional non-contrast CT, is limited (24-28). Therefore, we sought to apply CT radiomics to develop models to predict thyroid function.
Our data revealed significantly lower mean thyroid CT values in the hyperthyroid (76.0±20.2 HU) and hypothyroid (77.3±21.3 HU) groups than the euthyroid group (96.4±16.0 HU) (both P<0.001), aligning with previous studies linking thyroid dysfunction to reduced tissue radiodensity (29-32). Notably, no significant differences were observed between the hyperthyroid and hypothyroid groups (P=0.431), which may reflect shared pathological disruptions despite distinct etiologies.
In hyperthyroidism, accelerated hormone synthesis depletes iodine-rich colloid stores in follicles and triggers immune-mediated inflammation, diluting functional thyroid parenchyma (29). The decreased CT density of the thyroid gland in patients with hypothyroidism is attributed to the replacement of thyroid follicular cells, the infiltration of inflammatory cells, and subsequent fibrosis (32). These pathological changes result in a reduction in thyroid CT density. Such findings suggest that reduced thyroid CT values could serve as a non-specific indicator of thyroid impairment, prompting further clinical investigation into underlying dysfunction.
In this study, a clinical model consisting of age and the thyroid anteroposterior diameters was developed. The age of the patients in the three groups (normal thyroid function, hyperthyroidism, and hypothyroidism) was statistically different, similar to the findings of previous studies (33-37). This may be related to several factors: (I) physiological degeneration: with age, thyroid tissue may undergo gradual degenerative changes, leading to a decrease in thyroid hormone synthesis and secretion, thus increasing the risk of hypothyroidism; (II) autoimmune factors: as we age, the body’s immune system function may change, leading to an increased incidence of autoimmune thyroid disease; and (III) environmental and lifestyle factors: older patients face more environmental and habitual challenges (e.g., smoking, drinking, and diet), which affect thyroid function, and increase the risk of hypothyroidism or hyperthyroidism.
In our study, we found that the anteroposterior diameter of the thyroid was strongly correlated with thyroid function. It is speculated that the thyroid exhibits compensatory hyperplasia in response to hypothyroidism, and that hyperthyroidism or hypothyroidism is often associated with various complications, such as hyperthyroidism-associated ophthalmopathy and hypothyroidism-associated myxedema, which indirectly affect thyroid structure and function through immune responses, inflammatory states, and metabolic abnormalities.
This study clarified the correlation between radiomics features and hyperthyroidism/hypothyroidism, and screened out 21 key radiomics features. These features may reflect the microscopic structure and functional changes of the thyroid gland under disease conditions, providing strong information support for subsequent diagnostic models.
The SVM algorithm, a machine learning method that performs well on complex classification problems, was then used to build the radiomics model. Through training and validation, we compared the clinical, radiomics, and combined models. The radiomics model performed well in the three-class classification of hyperthyroidism, normal thyroid function, and hypothyroidism, and achieved higher overall accuracy than the clinical model in the training dataset and both validation datasets, reflecting the value of radiomics in auxiliary diagnosis. The diagnostic efficacy of the combined model was further improved, with the highest average F1 and kappa values in the internal and external validation datasets, while the ROC curves showed good binary classification performance. Therefore, when patients undergo CT examination and thyroid abnormalities are detected, radiomics analysis may be considered to predict thyroid function and to help detect patients with underlying or atypical clinical symptoms of thyroid dysfunction.
Although the results of this study were encouraging, it still had some limitations. First, etiological stratification of hyperthyroidism cases was not performed; however, we intend to develop a CT radiomics-based predictive model to automatically differentiate between etiologies (e.g., Graves’ disease and toxic nodular goiter). Second, an analysis of TSH, a continuous variable in thyroid function, was omitted; thus, longitudinal studies need to be conducted to quantify dynamic relationships between TSH fluctuations and radiomics feature evolution.
Despite these limitations, the findings of this two-center investigation underscore the potential of the approach for clinical application. The most immediate clinical application lies in opportunistic screening; by developing an automated tool that can be integrated with the existing Picture Archiving and Communication Systems (PACS), the model could analyze the thyroid gland incidentally captured in scans performed for other indications (e.g., cervical spine or chest CT), thereby identifying individuals at high risk of thyroid dysfunction without additional radiation exposure or cost. This would facilitate early detection and prompt further endocrine laboratory evaluation. While the two-center design enhances the generalizability of our findings compared to a single-center study, future efforts must still focus on larger-scale, multi-center, and prospective validation to rigorously assess the model’s robustness across a broader range of populations and CT scanner platforms. Concurrently, further research is needed to address the technical challenges of seamless integration into clinical workflows and to explore the biological underpinnings of the key radiomic features to enhance the model’s interpretability. Ultimately, studies investigating the impact of this tool on diagnostic efficiency and patient clinical benefit will be crucial to definitively establish its clinical utility.
Conclusions
The predictive model based on plain CT radiomics showed good efficacy in the diagnosis of hyperthyroidism/hypothyroidism, and its accuracy was further improved when it was combined with clinical information. Our findings provide novel ideas and methods for the auxiliary diagnosis of thyroid diseases, and our models may have clinical application value.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-525/rc
Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-525/dss
Funding: The work was financially supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-525/coif). R.Y. reports that this work was supported by PreResearch Fund Project of The Second Affiliated Hospital of Soochow University (No. SDFEYLC2445). X.N. reports that this work was supported by Jiangsu Research Hospital Society Infection Imaging Research Special Fund Project (No. GY202308). D.J. reports that this work was supported by the Project of State Key Laboratory of Radiation Medicine and Protection, Soochow University (No. GZK12023041). The other authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted according to the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of The Second Affiliated Hospital of Soochow University (No. JD-HG-2024-059) and the requirement for individual consent for this retrospective analysis was waived. Changshu Hospital Affiliated to Nanjing University of Chinese Medicine was also informed of and approved the study.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Jakubiak GK, Pawlas N, Morawiecka-Pietrzak M, Zalejska-Fiolka J, Stanek A, Cieślar G. Relationship of Thyroid Volume and Function with Ankle-Brachial Index, Toe-Brachial Index, and Toe Pressure in Euthyroid People Aged 18-65. Medicina (Kaunas) 2024;60:1445. [Crossref] [PubMed]
- Wu W, Fang X, Li J, Zhang A, Zou Y, Zheng X. Application of dual-source computed tomography in the diagnosis of thyroid cancer and evaluation of biological behaviors. J Clin Ultrasound 2023;51:195-202. [Crossref] [PubMed]
- Gao SY, Zhang XY, Wei W, Li XT, Li YL, Xu M, Sun YS, Zhang XP. Identification of benign and malignant thyroid nodules by in vivo iodine concentration measurement using single-source dual energy CT: A retrospective diagnostic accuracy study. Medicine (Baltimore) 2016;95:e4816. [Crossref] [PubMed]
- Song Z, Li Q, Zhang D, Li X, Yu J, Liu Q, Li Z, Huang J, Zhang X, Tang Z. Nomogram based on spectral CT quantitative parameters and typical radiological features for distinguishing benign from malignant thyroid micro-nodules. Cancer Imaging 2023;23:13. [Crossref] [PubMed]
- Kong D, Zhang J, Shan W, Duan S, Guo L. CT radiomics model for differentiating malignant and benign thyroid nodules. Chinese Journal of Radiology 2020;54:187-91.
- Chen DW, Lang BHH, McLeod DSA, Newbold K, Haymart MR. Thyroid cancer. Lancet 2023;401:1531-44. [Crossref] [PubMed]
- Jiang L, Liu D, Long L, Chen J, Lan X, Zhang J. Dual-source dual-energy computed tomography-derived quantitative parameters combined with machine learning for the differential diagnosis of benign and malignant thyroid nodules. Quant Imaging Med Surg 2022;12:967-78. [Crossref] [PubMed]
- Chen C, Liu Y, Yao J, Wang K, Zhang M, Shi F, Tian Y, Gao L, Ying Y, Pan Q, Wang H, Wu J, Qi X, Wang Y, Xu D. Deep learning approaches for differentiating thyroid nodules with calcification: a two-center study. BMC Cancer 2023;23:1139. [Crossref] [PubMed]
- Liu Y, Chen C, Wang K, Zhang M, Yan Y, Sui L, Yao J, Zhu X, Wang H, Pan Q, Wang Y, Liang P, Xu D. The auxiliary diagnosis of thyroid echogenic foci based on a deep learning segmentation model: A two-center study. Eur J Radiol 2023;167:111033. [Crossref] [PubMed]
- Superficial Organ and Vascular Ultrasound Group, Society of Ultrasound in Medicine, Chinese Medical Association; Chinese Artificial Intelligence Alliance for Thyroid and Breast Ultrasound. 2020 Chinese Guidelines for Ultrasound Malignancy Risk Stratification of Thyroid Nodules: The C-TIRADS. Chin J Ultrasonogr 2021;30:185-200.
- Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images Are More than Pictures, They Are Data. Radiology 2016;278:563-77. [Crossref] [PubMed]
- Sohn JH, Fields BKK. Radiomics and Deep Learning to Predict Pulmonary Nodule Metastasis at CT. Radiology 2024;311:e233356. [Crossref] [PubMed]
- Miranda J, Horvat N, Araujo-Filho JAB, Albuquerque KS, Charbel C, Trindade BMC, Cardoso DL, de Padua Gomes de Farias L, Chakraborty J, Nomura CH. The Role of Radiomics in Rectal Cancer. J Gastrointest Cancer 2023;54:1158-80. [Crossref] [PubMed]
- Xia T, Zhao B, Li B, Lei Y, Song Y, Wang Y, Tang T, Ju S. MRI-Based Radiomics and Deep Learning in Biological Characteristics and Prognosis of Hepatocellular Carcinoma: Opportunities and Challenges. J Magn Reson Imaging 2024;59:767-83. [Crossref] [PubMed]
- Qi YJ, Su GH, You C, Zhang X, Xiao Y, Jiang YZ, Shao ZM. Radiomics in breast cancer: Current advances and future directions. Cell Rep Med 2024;5:101719. [Crossref] [PubMed]
- Warkentin MT, Al-Sawaihey H, Lam S, Liu G, Diergaarde B, Yuan JM, Wilson DO, Atkar-Khattra S, Grant B, Brhane Y, Khodayari-Moez E, Murison KR, Tammemagi MC, Campbell KR, Hung RJ. Radiomics analysis to predict pulmonary nodule malignancy using machine learning approaches. Thorax 2024;79:307-15. [Crossref] [PubMed]
- Yang J, Cai H, Liu N, Huang J, Pan Y, Zhang B, Tong M, Zhang Z. Application of radiomics in ischemic stroke. J Int Med Res 2024;52:3000605241238141. [Crossref] [PubMed]
- Tuna IS. Editorial Comment: CT Radiomics-An Emerging Tool for Thyroid Nodule Evaluation. AJR Am J Roentgenol 2024;223:e2431393. [Crossref] [PubMed]
- Xu H, Wang X, Guan C, Tan R, Yang Q, Zhang Q, Liu A, Liu Q. Value of Whole-Thyroid CT-Based Radiomics in Predicting Benign and Malignant Thyroid Nodules. Front Oncol 2022;12:828259. [Crossref] [PubMed]
- Lin S, Gao M, Yang Z, Yu R, Dai Z, Jiang C, Yao Y, Xu T, Chen J, Huang K, Lin D. CT-Based Radiomics Models for Differentiation of Benign and Malignant Thyroid Nodules: A Multicenter Development and Validation Study. AJR Am J Roentgenol 2024;223:e2431077. [Crossref] [PubMed]
- Li Z, Zhong Y, Lv Y, Zheng J, Hu Y, Yang Y, Li Y, Sun M, Liu S, Guo Y, Zhang M, Zhou L. A CT based radiomics analysis to predict the CN0 status of thyroid papillary carcinoma: a two-center study. Cancer Imaging 2024;24:62. [Crossref] [PubMed]
- Gurun E, Cakir IM, Ozturk M. Radiomics of Thyroid Malignancy: Going Beyond the Picture. Acad Radiol 2023;30:2169-71. [Crossref] [PubMed]
- Wu X, Yu P, Jia C, Mao N, Che K, Li G, Zhang H, Mou Y, Song X. Radiomics Analysis of Computed Tomography for Prediction of Thyroid Capsule Invasion in Papillary Thyroid Carcinoma: A Multi-Classifier and Two-Center Study. Front Endocrinol (Lausanne) 2022;13:849065. [Crossref] [PubMed]
- Jeong SH, Hong HS, Lee JY. The association between thyroid echogenicity and thyroid function in pediatric and adolescent Hashimoto's thyroiditis. Medicine (Baltimore) 2019;98:e15055. [Crossref] [PubMed]
- Li ZT, Wang M, Pan DM, Ren ZJ, Liu HM, Li Q. Dual-source CT evaluation on correlation between thyroid function and thyroid iodine concentration. Chin J Med Imaging Technol 2020;36:610-3.
- Shi XD, Liu H. Clinical study on relationship between thyroid CT value and thyroid function. Chinese Journal of CT and MRI 2020;18:36-8.
- Ji Y. The diagnostic value of thyroid ultrasonography combined with thyroid function index in benign and malignant thyroid nodules. Journal of Clinical Research 2015;32:2330-3.
- Chen JG, Lu JF, Liu CD, Ruan DB. Study on the definition of ultrasonic thyroid parameters and their relationship with thyroid function in pregnant women. Chinese Journal of Medical Ultrasound (Electronic Edition) 2018;15:822-5.
- Lee Y. Dual-energy computed tomography-based volumetric thyroid iodine quantification: correlation with thyroid hormonal status, pathologic diagnosis, and phantom validation. Diagn Interv Radiol 2025;31:226-33. [Crossref] [PubMed]
- Scheepers MHMC, Al-Difaie ZJJ, Bouvy ND, Havekes B, Postma AA. Four-Dimensional Dual-Energy Computed Tomography-Derived Parameters and Their Correlation with Thyroid Gland Functional Status. Tomography 2025;11:22. [Crossref] [PubMed]
- Chaudhary P, Pamnani J, Rana K, Khandekar AK. Exploring the Association Between Thyroid Density Assessed by Non-contrast Computed Tomography and Serum Thyroid-Stimulating Hormone (TSH) Levels in Hypothyroid Patients. Cureus 2023;15:e48653. [Crossref] [PubMed]
- Kikuchi T, Hanaoka S, Nakao T, Nomura Y, Yoshikawa T, Alam MA, Mori H, Hayashi N. Relationship between Thyroid CT Density, Volume, and Future TSH Elevation: A 5-Year Follow-Up Study. Life (Basel) 2023;13:2303. [Crossref] [PubMed]
- Zhang R, Dong J, Li Y, Xiao S, Qiu L. A Horizontal and Longitudinal Study on the Changes of Aging Thyroid Function in Elderly Male Population. Discov Med 2024;36:827-35. [Crossref] [PubMed]
- Qiu L, Wang DC, Xu T, Cheng YQ, Sun Q, Hu YY, Liu HC, Lu SY, Yang G, Wang Z. Effects of sex, age and season on thyroid hormone reference interval. National Medical Journal of China 2018;20:1582-7. [Crossref] [PubMed]
- Yao C, Wu M, Liu M, Chen X, Zhu H, Xiong C, Wang D, Xiang Y, Suo G, Wang J, Sun H, Yuan C, Xia Y. Age- and sex-specific reference intervals for thyroid hormones in a Chinese pediatrics: a prospective observational study of 1,279 healthy children. Transl Pediatr 2021;10:2479-88. [Crossref] [PubMed]
- Walsh JP. Thyroid Function across the Lifespan: Do Age-Related Changes Matter? Endocrinol Metab (Seoul) 2022;37:208-19. [Crossref] [PubMed]
- Cai YZ, He DD, Wang YY, Liu XY, Xu XL, Dong LJ, Liu N, Yu DD, Wang N. Correlation between different ages and pubertal development stages and reference intervals of thyroid function indicators in adolescent females. Fudan University Journal of Medical Sciences 2024;51:566-73.



