Prediction of thyroid function based on CT radiomics: a two-center study
Original Article

Rui Yu1#, Yanhuan Tan2#, Jinpeng Hou1#, Yiting Chen3, Xiaoqiong Ni1, Liang Xu1, Guohua Fan1, Dan Jin1,4

1Department of Radiology, The Second Affiliated Hospital of Soochow University, Suzhou, China; 2Department of Radiology, Changshu Hospital Affiliated to Nanjing University of Chinese Medicine, Suzhou, China; 3Suzhou Medical College, Soochow University, Suzhou, China; 4State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China

Contributions: (I) Conception and design: D Jin; (II) Administrative support: L Xu, G Fan; (III) Provision of study materials or patients: R Yu, Y Tan; (IV) Collection and assembly of data: X Ni, Y Chen; (V) Data analysis and interpretation: R Yu, Y Tan, J Hou; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

Correspondence to: Dan Jin, MD. Department of Radiology, The Second Affiliated Hospital of Soochow University, 1055 Sanxiang Road, Suzhou 215004, China; State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China. Email: jindan0512@126.com.

Background: In clinical practice, some patients with hyperthyroidism or hypothyroidism may lack typical clinical manifestations. Because computed tomography (CT) is widely used, this study sought to explore whether CT-based radiomics could help to identify such patients. Specifically, this study aimed to investigate the value of radiomics based on non-contrast CT in the auxiliary diagnosis of hyperthyroidism/hypothyroidism.

Methods: The data of 1,181 patients at The Second Affiliated Hospital of Soochow University were retrospectively collected. The patients were randomly divided into a training group (n=827) and an internal validation group (n=354) at a ratio of 7:3. Additionally, the data of 415 patients from Changshu Hospital Affiliated to Nanjing University of Chinese Medicine were collected to serve as the external validation set. Radiomics features were extracted from non-contrast CT images, and the features related to hyperthyroidism/hypothyroidism were selected by a significance analysis and Pearson correlation analysis. A clinical model was then built based on clinical indicators, and a combined model was developed by integrating radiomics-predicted risk values with clinical indicators. The model’s three-class results were analyzed using a confusion matrix, and its binary classification diagnosis results were assessed using receiver operating characteristic (ROC) curves.

Results: A total of 21 radiomics features and four clinical features were screened for modeling. For the three-class diagnosis of thyroid function, the average F1 scores of the clinical, radiomics, and combined models were 0.664, 0.709, and 0.739 in the internal validation set and 0.540, 0.660, and 0.697 in the external validation set, respectively; the corresponding kappa values were 0.493, 0.556, and 0.609 in the internal validation set and 0.310, 0.483, and 0.531 in the external validation set. The ROC curve analysis showed that the combined model outperformed the clinical and radiomics models in the binary classification diagnosis in both the internal and external validation sets.

Conclusions: The predictive model based on non-contrast CT radiomics had good diagnostic efficacy for hyperthyroidism/hypothyroidism. Its accuracy was further improved by combining clinical information. This model could alert clinicians to patients lacking typical clinical manifestations.

Keywords: Thyroid function; non-contrast computed tomography (non-contrast CT); radiomics


Submitted Mar 02, 2025. Accepted for publication Sep 24, 2025. Published online Oct 21, 2025.

doi: 10.21037/qims-2025-525


Introduction

The thyroid gland, a critical endocrine organ in the human body, exerts a significant influence on the body’s metabolic processes, and overall growth and development. Beyond these essential roles, its impact extends critically to the cardiovascular system. Recent research has reported that thyroid hormones affect the cardiovascular system directly by acting on cardiac and vascular cells to regulate contractility and vascular tone, and indirectly by modifying key risk factors such as blood pressure and lipid profiles, as well as influencing coagulation and fibrinolysis (1).

Thyroid dysfunction is a common clinical condition that includes both hyperthyroidism and hypothyroidism. Hyperthyroidism is a clinical syndrome characterized by excessive thyroid hormone production by the thyroid gland, leading to increased metabolic activity and heightened excitability of the neuro-circulatory-digestive systems [decreased thyroid-stimulating hormone (TSH) and increased free thyroxine (FT4)]. Hypothyroidism is defined as a systemic hypometabolic syndrome caused by low thyroid hormone levels or thyroid hormone resistance due to various reasons (increased TSH and decreased FT4). These conditions not only affect the quality of life of patients, but can also lead to serious complications (2).

Traditional diagnostic methods rely on clinical symptoms, physical signs, and laboratory tests. However, these methods may not always be sufficiently sensitive or specific. In recent years, with the rapid development of medical imaging technology, computed tomography (CT) imaging has become increasingly important in the diagnosis and evaluation of thyroid disorders.

Radiomics, an emerging research field, provides new perspectives and methods for disease prediction and classification by extracting high-throughput feature information from medical images, and combining machine learning and deep learning algorithms (3). Previous studies have shown that CT radiomics performed well in the differentiation of benign and malignant thyroid nodules (4-7). However, the following question arises: Could CT radiomics, which is expected to offer accurate and efficient diagnosis, also be applied to thyroid dysfunction research?

This study sought to explore whether CT radiomics can feasibly and effectively predict thyroid dysfunction, enabling the early identification and treatment of patients with clinically insignificant symptoms. We collected thyroid CT imaging data from two different medical centers and combined it with patients’ clinical information to construct a comprehensive database, which may improve the generalizability and clinical applicability of the models (8,9). We present this article in accordance with the TRIPOD + AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-525/rc).


Methods

Patient enrollment

This retrospective study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of The Second Affiliated Hospital of Soochow University (No. JD-HG-2024-059), and the requirement for individual consent for this retrospective analysis was waived. Changshu Hospital Affiliated to Nanjing University of Chinese Medicine was also informed of the study and approved it.

Initially, the data of 1,250 participants were collected from The Second Affiliated Hospital of Soochow University between January 2018 and September 2023, as were the data of 450 participants from Changshu Hospital Affiliated to Nanjing University of Chinese Medicine during the same period. Thyroid function was detected by chemiluminescence immunoassay. Patients were included in the study if they met the following inclusion criteria: (I) had complete clinical data; (II) had a time interval between the CT scan and thyroid function test of less than 3 months; (III) had undergone a CT scan covering the entire thyroid tissue; and (IV) had not undergone any therapeutic interventions for thyroid conditions. Patients were excluded from the study if they met any of the following exclusion criteria: (I) had a history of thyroid surgery; (II) had cervical lesions involving the thyroid; and/or (III) had unavailable CT images (Figure 1). Ultimately, 391 patients with hyperthyroidism, 393 patients with hypothyroidism, and 397 normal control participants from our hospital were included in the study. These patients were randomly divided into a training dataset (n=827) and an internal validation dataset (n=354) at a ratio of 7:3. Additionally, 415 patients (104 with hyperthyroidism, 110 with hypothyroidism, and 201 normal control participants) from another institute were used as the external validation dataset.

Figure 1 Patient enrollment flowchart. CT, computed tomography.
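
The 7:3 random split described above can be sketched in a few lines of Python. This is only an illustration: the source does not report the random seed, the software used for the split, or whether the split was stratified, so the function name `split_7_3`, the seed, and the rounding rule are assumptions.

```python
import random


def split_7_3(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle IDs reproducibly and cut them into training and validation lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # the seed is an assumption, not from the source
    n_train = round(train_fraction * len(ids))  # assumed rounding rule
    return ids[:n_train], ids[n_train:]


# 391 hyperthyroid + 393 hypothyroid + 397 control participants = 1,181
cohort = range(391 + 393 + 397)
train_ids, validation_ids = split_7_3(cohort)
print(len(train_ids), len(validation_ids))  # 827 354
```

With 1,181 participants, rounding 0.7 × 1,181 = 826.7 gives 827 training cases and leaves 354 for internal validation, which matches the reported dataset sizes.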

Measurement and selection of the clinical variables

The basic demographic information of the participants, including age and gender, was directly retrieved from the electronic medical record systems of the hospitals. The morphological features of the thyroid, including the left and right lobe anteroposterior diameter, left and right lobe inner-and-outer diameter, and the left and right lobe CT values were measured independently by two radiologists (with 7 and 13 years of experience, respectively). Both radiologists were blinded to the diagnostic reports. The mean values of these features were calculated. A multigroup significance test with Bonferroni correction was performed on the training dataset, and the clinical variables showing significant differences among the three groups were selected for model development.

Region of interest (ROI) annotation

The ITK-SNAP software (RRID:SCR_002010) was used to annotate the thyroid regions on the CT images. A radiologist with 7 years of experience manually segmented the ROI on the slice showing the largest cross-section of the left and right thyroid lobes, as well as on the two adjacent slices above and the two adjacent slices below (5 slices in total). These segmentations were subsequently reviewed and refined by a senior radiologist with over 13 years of experience. Both radiologists were blinded to the diagnostic reports.

Extraction and selection of radiomics features

Radiomics features were extracted from the labeled ROIs using the IBSI-compliant PyRadiomics package (RRID:SCR_026019). Prior to feature extraction, the CT images were re-sampled to a uniform voxel size of 1 mm × 1 mm × 1 mm. A total of 1,746 radiomics features were derived from each ROI by using a range of filters (i.e., Exponential, Gradient, Logarithm, Square, Square Root, and Wavelet). These features comprised 17 shape features, 323 first-order intensity statistics, and 1,406 high-order texture features [gray-level dependence matrix (GLDM; n=266), gray-level co-occurrence matrix (GLCM; n=437), gray-level size zone matrix (GLSZM; n=304), gray-level run length matrix (GLRLM; n=304), and neighboring gray-tone difference matrix (NGTDM; n=95)].

To identify the radiomics features most strongly associated with thyroid status, a systematic three-step feature selection process was employed. First, the intraclass correlation coefficient (ICC) was computed to evaluate feature reproducibility against variations in contour delineation. Only the features with ICC values >0.75 were considered stable and retained. Next, a Pearson correlation analysis was performed, and features with |r| values >0.95 were excluded to minimize redundancy. Finally, a multigroup t-test with Bonferroni correction was performed to identify features demonstrating statistically significant differences across groups, which were subsequently selected for model development.
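
The redundancy step of this selection process can be illustrated with a minimal Python sketch. The source states only that features with |r| values >0.95 were excluded; which member of a correlated pair was retained is not reported, so the greedy rule below (keep the first feature encountered) and the toy feature names are assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_redundant(features, threshold=0.95):
    """Greedy filter: keep a feature only if |r| <= threshold against every kept feature."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson_r(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature values for five patients
toy = {
    "glcm_contrast": [1.0, 2.0, 3.0, 4.0, 5.0],
    "glcm_contrast_scaled": [2.1, 4.0, 6.2, 8.1, 9.9],  # nearly a multiple of the first
    "firstorder_mean": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(drop_redundant(toy))  # ['glcm_contrast', 'firstorder_mean']
```

For a real feature matrix, `numpy.corrcoef` or the pandas `DataFrame.corr` method would compute the same coefficients far more efficiently; the pure-Python version is shown only to make the rule explicit.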

Model construction and evaluation

Three predictive models were proposed in this study. The radiomics model was developed using a support vector machine (SVM) classifier based on the selected radiomics features, while the optimal parameters for the SVM classifier were determined through a grid-search approach. The clinical model was constructed using the logistic regression classifier based on the selected clinical variables. In addition, a combined model integrating the predicted risk scores for hyperthyroidism, normal thyroid function, and hypothyroidism from the radiomics model, as well as the selected clinical variables, was constructed using a logistic regression classifier. All the models were implemented using the scikit-learn package (RRID:SCR_002577). Median imputation was applied for the data with a missing rate less than 10%, and the clinical variables with a missing rate higher than 10% were not used for model development.
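
The missing-data rule in this paragraph can be written as a short Python sketch. The handling of a missing rate of exactly 10% is not specified in the text; the sketch drops such variables, which is an assumption, as are the function name `impute_or_drop` and the toy variable names.

```python
from statistics import median


def impute_or_drop(columns, max_missing_rate=0.10):
    """Median-impute variables with a missing rate below the limit; drop the others."""
    cleaned = {}
    for name, values in columns.items():
        missing_rate = sum(v is None for v in values) / len(values)
        if missing_rate >= max_missing_rate:
            continue  # too sparse: not used for model development
        fill = median(v for v in values if v is not None)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


toy = {
    # 1 of 11 values missing (about 9.1%): imputed
    "age": [56, 61, None, 43, 70, 58, 49, 66, 52, 47, 63],
    # 3 of 11 values missing (about 27.3%): dropped
    "hypothetical_variable": [1.2, None, None, 0.9, 1.1, None, 1.0, 1.3, 0.8, 1.4, 1.0],
}
cleaned = impute_or_drop(toy)
print(cleaned["age"][2], list(cleaned))  # 57.0 ['age']
```

In practice, the median would usually be estimated on the training dataset and reused for the validation datasets to avoid information leakage; the source does not state how this was handled.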

These predictive models were constructed in the training dataset, and their discriminative performance and clinical utility were assessed and compared in both the internal and external validation datasets. A flowchart of the study is shown in Figure 2.

Figure 2 Illustration of study design. DCA, decision curve analysis; EMR, electronic medical record; GLCM, gray-level co-occurrence matrix; GLDM, gray-level dependence matrix; GLRLM, gray-level run length matrix; GLSZM, gray-level size zone matrix; GT, ground truth; ICC, intraclass correlation coefficient; NGTDM, neighboring gray-tone difference matrix; ROC, receiver operating characteristic; ROI, region of interest.

Statistical analysis

All the statistical analyses were conducted using MedCalc software (RRID:SCR_015044) and SPSS software (RRID:SCR_002865). The differences in continuous variables between two groups were calculated using the t-test, while the categorical variables were compared using the Chi-squared test.

Performance in the three-class classification (hyperthyroidism, normal thyroid function, and hypothyroidism) was assessed using overall accuracy, the F1 score, and Cohen’s kappa. In addition, a receiver operating characteristic (ROC) curve analysis was employed to evaluate model performance in binary classifications for hyperthyroidism, normal thyroid function, and hypothyroidism by calculating the area under the curve (AUC) for each. DeLong’s test was applied to compare the AUC values of two models. The sensitivity (SEN), specificity (SPE), positive predictive value (PPV), and negative predictive value (NPV) were calculated at the optimal cut-off points, which were determined by maximizing the Youden index. For the binary classification of hyperthyroidism, normal thyroid function, and hypothyroidism, the goodness-of-fit of the predictive models was assessed using the Hosmer-Lemeshow test and visualized through calibration curves. Additionally, a decision curve analysis (DCA) was performed to evaluate and compare the clinical utility of the models by calculating the net benefits across a reasonable range of threshold probabilities.
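
The cut-off selection described above can be illustrated with a minimal Python sketch. The Youden index is J = SEN + SPE − 1, and the optimal cut-off is the score that maximizes J. The function names, the convention that a score at or above the cut-off is called positive, the tie-breaking rule (the lowest cut-off wins), and the toy data are assumptions; the statistical software used in the study may differ in these details.

```python
def confusion_at(scores, labels, cutoff):
    """Count TP, FP, TN, FN when a score >= cutoff is called positive."""
    tp = sum(s >= cutoff and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= cutoff and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < cutoff and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < cutoff and y == 0 for s, y in zip(scores, labels))
    return tp, fp, tn, fn


def youden_cutoff(scores, labels):
    """Return the cut-off maximizing J = SEN + SPE - 1, with SEN, SPE, PPV, and NPV."""
    best = None
    for cutoff in sorted(set(scores)):
        tp, fp, tn, fn = confusion_at(scores, labels, cutoff)
        sen, spe = tp / (tp + fn), tn / (tn + fp)
        j = sen + spe - 1
        if best is None or j > best["J"]:
            ppv = tp / (tp + fp)  # tp + fp >= 1 because the cut-off is an observed score
            npv = tn / (tn + fn) if (tn + fn) else float("nan")
            best = {"cutoff": cutoff, "J": j, "SEN": sen, "SPE": spe, "PPV": ppv, "NPV": npv}
    return best


# Toy predicted probabilities and binary ground truth (1 = disease present)
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
best = youden_cutoff(scores, labels)
print(best["cutoff"], best["SEN"], best["SPE"], best["PPV"], best["NPV"])  # 0.4 1.0 0.75 0.8 1.0
```

Because PPV and NPV depend on disease prevalence, values obtained at the same cut-off can differ between datasets with different case mixes, even when SEN and SPE are similar.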

A heatmap was generated using “HemI” software (version 1.0). The calibration analysis was performed using the R programming language (RRID:SCR_001905) with the “rms” package, while the DCA was conducted using the “rmda” package. A two-sided P value less than 0.05 was considered statistically significant.


Results

Patient characteristics

The clinical characteristics of the participants and the CT imaging features in the training, internal validation, and external validation datasets are listed in Table 1. The prevalence of thyroid nodules and the thyroid CT values of the participants in the external validation dataset were significantly higher than those of the participants in the training and internal validation datasets. No significant difference was observed among the three datasets in terms of the other clinical characteristics and CT imaging features.

Table 1

Comparison of the patient characteristics across the training, internal validation, and external validation datasets

Clinical variable Training Internal validation External validation
Patient characteristics
   Total No. of patients 827 354 415
   Age (years) 56.1±17.0 54.4±16.3 58.8±16.0
   Sex (female/male) 517/310 221/133 247/168
   Thyroid nodules (no/yes) 697/130 308/46 180/235
Thyroid CT imaging features
   Right anteroposterior diameter (mm) 22.7±9.7 22.1±7.1 23.3±6.4
   Left anteroposterior diameter (mm) 21.5±7.5 21.5±8.5 21.8±6.5
   Average anteroposterior diameter (mm) 22.1±7.8 21.8±7.1 22.6±6.1
   Right inner-and-outer diameters (mm) 14.1±5.6 13.8±5.4 15.0±5.7
   Left inner-and-outer diameters (mm) 13.5±5.5 13.2±5.5 14.6±6.9
   Average inner-and-outer diameters (mm) 13.8±5.3 13.5±5.2 14.8±5.7
   Right lobe thyroid CT value (HU) 82.5±21.8 84.4±22.9 99.0±34.1
   Left lobe thyroid CT value (HU) 83.2±22.5 84.7±22.7 103.2±82.9
   Average lobe thyroid CT value (HU) 82.9±21.4 84.6±22.1 101.1±50.7

Data are presented as number or mean ± SD. CT, computed tomography; HU, Hounsfield unit; SD, standard deviation.

Selection of clinical variables

The clinical variables of the hyperthyroidism patients, hypothyroidism patients, and the participants with normal thyroid function in the training dataset were compared pairwise. Age and the thyroid anteroposterior diameters (i.e., the right, left, and average anteroposterior diameters) in the CT images showed significant differences in all pairwise comparisons among the three groups, and were used for the construction of the clinical model and the combined model (Table 2).

Table 2

Selection of clinical variables

Clinical variable Hypo Hyper Normal P (hypo vs. hyper) P (hypo vs. normal) P (hyper vs. normal)
Patient characteristics
   Total No. of patients 283 284 260
   Age (years) 65.0±13.9 58.5±16.7 43.7±12.7 <0.001 <0.001 <0.001
   Sex (female/male) 205/78 171/113 141/119 0.002 <0.001 0.159
   Thyroid nodules (no/yes) 250/33 205/79 242/18 <0.001 <0.001 0.059
Thyroid CT imaging features
   Right anteroposterior diameter (mm) 18.9±5.1 28.5±7.5 20.6±12.4 <0.001 0.029 <0.001
   Left anteroposterior diameter (mm) 17.9±5.0 27.4±8.5 19.1±3.9 <0.001 0.001 <0.001
   Average anteroposterior diameter (mm) 18.4±4.8 27.9±7.6 19.9±6.9 <0.001 0.003 <0.001
   Right inner-and-outer diameters (mm) 12.1±4.3 18.2±6.5 11.9±2.5 <0.001 0.576 <0.001
   Left inner-and-outer diameters (mm) 11.6±4.0 17.3±6.6 11.4±2.5 <0.001 0.505 <0.001
   Average inner-and-outer diameters (mm) 11.8±3.9 17.8±6.1 11.7±2.3 <0.001 0.513 <0.001
   Right lobe thyroid CT value (HU) 77.4±22.4 75.6±20.5 95.5±16.1 0.320 <0.001 <0.001
   Left lobe thyroid CT value (HU) 77.3±22.2 76.3±21.0 97.3±17.2 0.600 <0.001 <0.001
   Average lobe thyroid CT value (HU) 77.3±21.3 76.0±20.2 96.4±16.0 0.431 <0.001 <0.001

Data are presented as number or mean ± SD. CT, computed tomography; HU, Hounsfield unit; SD, standard deviation.

Selection of radiomics features

The ICC analysis and Pearson correlation analysis identified 362 stable and non-redundant radiomics features. Following the multigroup t-test, the 21 radiomics features most strongly associated with thyroid status were selected for model development. The results of comparisons across the three groups (i.e., the hyperthyroidism, normal thyroid function, and hypothyroidism groups) and a heatmap of these selected features, based on standardized feature values, are presented in Table S1 and Figure 3 for the training, internal validation, and external validation datasets.

Figure 3 Heatmap of the selected radiomics features.

Model performance evaluation

The confusion matrices for the three-class classification of hyperthyroidism, normal thyroid function, and hypothyroidism are shown in Figure 4. The combined model demonstrated the best classification performance, achieving an overall accuracy of 80.0% and an average F1 score of 0.779 in the training dataset, 74.3% and 0.739 in the internal validation dataset, and 70.1% and 0.697 in the external validation dataset. Good agreement between the predicted thyroid status and the ground truth was also observed, with Cohen’s kappa values for the combined model of 0.670, 0.609, and 0.531 in the training, internal validation, and external validation datasets, respectively. The overall accuracy, F1 score, and kappa results for all three models are summarized in Table 3.

Figure 4 Confusion matrices of the predictive models in the training dataset (A-C), internal validation dataset (D-F), and external validation dataset (G-I). GT, ground truth.

Table 3

Detailed three-category classification performance of the clinical, radiomics, and combined models

Dataset Model Accuracy F1 (hypo) F1 (normal) F1 (hyper) F1 (average) Weighted kappa
Training Clinical 68.2% 0.696 0.673 0.675 0.682 0.588
Radiomics 74.6% 0.691 0.789 0.754 0.743 0.662
Combined 80.0% 0.750 0.827 0.765 0.779 0.703
Internal validation Clinical 66.4% 0.595 0.686 0.707 0.664 0.567
Radiomics 71.5% 0.627 0.809 0.663 0.709 0.621
Combined 74.3% 0.670 0.833 0.689 0.739 0.655
External validation Clinical 53.3% 0.469 0.581 0.536 0.540 0.424
Radiomics 67.2% 0.409 0.821 0.615 0.660 0.556
Combined 70.1% 0.514 0.831 0.632 0.697 0.591
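
As a minimal sketch of how the three-class metrics relate to a confusion matrix, the Python code below computes the overall accuracy, per-class F1 scores, and unweighted Cohen’s kappa. The function names are assumptions, and the weighted kappa reported in Table 3 would additionally require a disagreement-weight matrix that the source does not describe. The final lines check one row of Table 3: weighting the per-class F1 scores by the training group sizes in Table 2 reproduces the reported average, which suggests (but does not prove) that the average F1 score is support-weighted.

```python
def three_class_metrics(cm):
    """cm[i][j] = number of cases whose true class is i and predicted class is j."""
    k = len(cm)
    n = sum(map(sum, cm))
    true_totals = [sum(cm[i]) for i in range(k)]
    pred_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Per-class F1 = 2TP / (2TP + FP + FN); the denominator equals true total + predicted total.
    f1 = [2 * cm[i][i] / (true_totals[i] + pred_totals[i]) for i in range(k)]
    accuracy = sum(cm[i][i] for i in range(k)) / n
    chance = sum(t * p for t, p in zip(true_totals, pred_totals)) / n ** 2
    kappa = (accuracy - chance) / (1 - chance)  # unweighted Cohen's kappa
    return accuracy, f1, kappa


def weighted_average(values, weights):
    """Average of values weighted by the number of cases in each class."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Table 3, training dataset, radiomics model; class order: hypo, normal, hyper.
# Weights are the training group sizes from Table 2 (283, 260, 284).
f1_average = weighted_average([0.691, 0.789, 0.754], [283, 260, 284])
print(round(f1_average, 3))  # 0.743
```

The same weighting gives 0.682 for the clinical model and 0.779 for the combined model in the training dataset, again in agreement with Table 3.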

The ROC curves of the predictive models for the binary classification of hyperthyroidism/non-hyperthyroidism, normal/abnormal thyroid function, and hypothyroidism/non-hypothyroidism are presented in Figure 5. The AUC values of the clinical model, radiomics model, and combined model for the binary classification of hyperthyroidism were 0.860, 0.880, and 0.906 in the training dataset, 0.859, 0.852, and 0.884 in the internal validation dataset, and 0.781, 0.839, and 0.861 in the external validation dataset, respectively. DeLong’s test showed that the AUC value of the combined model was significantly higher than that of the clinical model in the training and external validation datasets (both P<0.001), but not in the internal validation dataset (P=0.130) (Table 4).

Figure 5 ROC curve analysis of the clinical, radiomics, and combined models for the binary classification of hyperthyroidism/non-hyperthyroidism (A-C), hypothyroidism/non-hypothyroidism (D-F), and normal thyroid function/abnormal thyroid function (G-I) in the training, internal validation, and external validation datasets, respectively. AUC, area under the curve; ROC, receiver operating characteristic.
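
The AUC values reported for these one-versus-rest classifications can be interpreted as the probability that a randomly chosen patient with the condition receives a higher predicted score than a randomly chosen patient without it. The Python sketch below computes the AUC directly from that definition; the function name, class labels, and toy scores are assumptions used only for illustration.

```python
def auc_pairwise(scores, labels):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# One-versus-rest labels: hyperthyroidism (1) against the other two classes (0)
classes = ["normal", "hypo", "normal", "hyper", "hypo", "hyper", "hyper", "hyper"]
labels = [1 if c == "hyper" else 0 for c in classes]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]  # toy predicted risk of hyperthyroidism
print(auc_pairwise(scores, labels))  # 0.9375
```

In the toy data, 15 of the 16 positive-negative pairs are ranked correctly, giving 15/16 = 0.9375. The `roc_auc_score` function in scikit-learn returns the same quantity for binary labels.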

A similar pattern was observed for distinguishing hypothyroidism from non-hypothyroidism; the AUC values of the combined model were 0.895, 0.864, and 0.770 in the training, internal validation, and external validation datasets, respectively, which were higher than those of the radiomics model (0.842 in the training dataset, 0.813 in the internal validation dataset, and 0.715 in the external validation dataset) and those of the clinical model (0.849 in the training dataset, 0.797 in the internal validation dataset, and 0.688 in the external validation dataset).

For the binary classification of normal thyroid function, both the radiomics model and the combined model outperformed the clinical model with statistically significant results in all three datasets. The AUC values of the clinical model, radiomics model, and combined model were 0.851, 0.925, and 0.957 in the training dataset, 0.823, 0.920, and 0.950 in the internal validation dataset, and 0.784, 0.892, and 0.911 in the external validation dataset, respectively. The detailed SEN, SPE, PPV, and NPV of these models are presented in Tables 4-6.

Table 4

Detailed performance of the clinical, radiomics, and combined models for the binary classification of hyperthyroidism

Dataset Model AUC (95% CI) P value SEN, % SPE, % PPV, % NPV, %
Training Clinical 0.860 (0.834–0.883) Reference 69.4 88.0 75.2 84.6
Radiomics 0.880 (0.856–0.902) 0.147 77.5 85.3 73.3 87.9
Combined 0.906 (0.884–0.925) <0.001 85.9 81.0 70.3 91.7
Internal validation Clinical 0.859 (0.818–0.893) Reference 72.0 89.1 74.0 88.0
Radiomics 0.852 (0.810–0.887) 0.759 72.9 84.2 66.7 87.8
Combined 0.884 (0.845–0.915) 0.130 87.9 77.3 62.7 93.6
External validation Clinical 0.781 (0.738–0.820) Reference 79.8 67.9 45.4 90.9
Radiomics 0.839 (0.800–0.873) 0.029 91.4 66.2 47.5 95.8
Combined 0.861 (0.824–0.893) <0.001 80.8 76.5 53.5 92.2

AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value; SEN, sensitivity; SPE, specificity.

Table 5

Detailed performance of the clinical, radiomics, and combined models for the binary classification of hypothyroidism

Dataset Model AUC (95% CI) P value SEN, % SPE, % PPV, % NPV, %
Training Clinical 0.849 (0.823–0.873) Reference 79.5 78.9 66.2 88.1
Radiomics 0.842 (0.815–0.866) 0.705 69.6 86.6 73.0 84.6
Combined 0.895 (0.872–0.915) <0.001 88.0 78.7 68.2 92.6
Internal validation Clinical 0.797 (0.751–0.838) Reference 67.3 81.6 62.2 84.7
Radiomics 0.813 (0.769–0.853) 0.627 70.0 82.0 63.6 85.8
Combined 0.864 (0.824–0.898) <0.001 84.6 77.5 62.8 91.7
External validation Clinical 0.688 (0.641–0.732) Reference 55.5 55.5 31.0 77.5
Radiomics 0.715 (0.669–0.758) 0.459 76.4 59.3 40.4 87.4
Combined 0.770 (0.727–0.810) <0.001 76.4 64.9 44.0 88.4

AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value; SEN, sensitivity; SPE, specificity.

Table 6

Detailed performance of the clinical, radiomics, and combined models for the binary classification of normal thyroid function

Dataset Model AUC (95% CI) P value SEN, % SPE, % PPV, % NPV, %
Training Clinical 0.851 (0.825–0.875) Reference 87.3 70.7 57.8 92.4
Radiomics 0.925 (0.905–0.942) <0.001 88.1 83.4 70.9 93.8
Combined 0.957 (0.941–0.970) <0.001 90.0 90.3 81.0 95.2
Internal validation Clinical 0.823 (0.780–0.862) Reference 86.1 67.3 62.4 88.5
Radiomics 0.920 (0.887–0.946) <0.001 94.9 77.0 72.2 96.0
Combined 0.950 (0.922–0.970) <0.001 94.2 82.5 77.2 95.7
External validation Clinical 0.784 (0.742–0.823) Reference 75.6 67.3 68.5 74.6
Radiomics 0.892 (0.858–0.920) <0.001 79.6 87.9 86.0 82.1
Combined 0.911 (0.879–0.937) <0.001 87.6 81.8 81.9 87.5

AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value; SEN, sensitivity; SPE, specificity.

Clinical utility analysis

All the models demonstrated good consistency between the predicted probabilities and actual rates in both the internal and external validation datasets. The Hosmer-Lemeshow statistics for the clinical model, radiomics model, and combined model were non-significant, indicating no significant deviation from an ideal fit (all P values >0.05). The corresponding calibration curves are presented in Figures S1,S2.

The DCA further demonstrated the clinical utility of the predictive models; all the models exhibited a higher net benefit than both the treat-none and treat-all strategies across almost the entire range of threshold probabilities. Additionally, the combined model consistently showed a higher net benefit than the clinical model across the majority of the threshold probabilities (Figure 6).

Figure 6 Decision curve analysis of the clinical, radiomics, and combined models for the binary classification of hyperthyroidism/non-hyperthyroidism (A,D), hypothyroidism/non-hypothyroidism (B,E), and normal thyroid function/abnormal thyroid function (C,F) in the internal validation and external validation datasets.
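
The net benefit plotted in a decision curve can be sketched as follows. At a threshold probability pt, the standard definition is net benefit = TP/n − (FP/n) × pt/(1 − pt); the treat-all strategy calls every patient positive, and the treat-none strategy has a net benefit of 0. The function names, the convention that a probability at or above the threshold is called positive, and the toy data are assumptions; the study itself used the R “rmda” package.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of acting on patients whose predicted probability >= threshold."""
    n = len(labels)
    tp = sum(p >= threshold and y == 1 for p, y in zip(probabilities, labels))
    fp = sum(p >= threshold and y == 0 for p, y in zip(probabilities, labels))
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit when every patient is treated as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy predicted probabilities and binary ground truth (1 = disease present)
probabilities = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
model_nb = net_benefit(probabilities, labels, 0.5)
treat_all_nb = net_benefit_treat_all(labels, 0.5)
print(model_nb, treat_all_nb)  # 0.25 0.0
```

In this toy example, the model has a higher net benefit than both the treat-all strategy (0.0) and the treat-none strategy (0) at a threshold of 0.5. Repeating the calculation over a range of thresholds yields the decision curve.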

Discussion

This preliminary study developed a CT radiomics model for predicting thyroid function. Combining the model with clinical factors further improved the diagnostic efficiency of the model in predicting hyperthyroidism and hypothyroidism.

Ultrasonography is considered the optimal imaging modality for examining the thyroid gland, but it depends heavily on the experience of the operator, and its ability to visualize deep neck structures is limited (10). Thyroid CT scanning has been used as a supplementary examination, but its use is limited by its low diagnostic sensitivity. Moreover, neither ultrasonography nor CT can directly assess thyroid function.

The recent emergence of radiomics has led to a number of advances; radiomics can mine a large amount of deep data from images and improve the accuracy of image-based diagnoses (11). Extensive research has been conducted on the use of this technology in multiple diseases, including gastrointestinal tract diseases, pulmonary nodules, breast tumors, and other tumors (12-17). Preliminary studies have explored the application of radiomics to thyroid diseases, and shown its ability to distinguish between different thyroid lesions (18-23). However, research on the diagnostic value of imaging in the prediction of thyroid function, which mainly uses ultrasonography and conventional non-contrast CT, is limited (24-28). Therefore, we sought to apply CT radiomics to develop models to predict thyroid function.

Our data revealed significantly lower mean thyroid CT values in the hyperthyroid (76.0±20.2 HU) and hypothyroid (77.3±21.3 HU) groups than in the euthyroid group (96.4±16.0 HU) (both P<0.001), aligning with previous studies linking thyroid dysfunction to reduced tissue radiodensity (29-32). Notably, no significant difference was observed between the hyperthyroid and hypothyroid groups (P=0.431), which may reflect shared pathological disruptions despite distinct etiologies.

In hyperthyroidism, accelerated hormone synthesis depletes iodine-rich colloid stores in follicles and triggers immune-mediated inflammation, diluting functional thyroid parenchyma (29). The decreased CT density of the thyroid gland in patients with hypothyroidism is attributed to the replacement of thyroid follicular cells, the infiltration of inflammatory cells, and subsequent fibrosis (32). These pathological changes result in a reduction in thyroid CT density. Such findings suggest that reduced thyroid CT values could serve as a non-specific indicator of thyroid impairment, prompting further clinical investigation into underlying dysfunction.

In this study, a clinical model consisting of age and the thyroid anteroposterior diameters was developed. Age differed significantly among the three groups (normal thyroid function, hyperthyroidism, and hypothyroidism), which is consistent with the findings of previous studies (33-37). This may be related to several factors: (I) physiological degeneration: with age, thyroid tissue may undergo gradual degenerative changes, leading to a decrease in thyroid hormone synthesis and secretion, thus increasing the risk of hypothyroidism; (II) autoimmune factors: with aging, the body’s immune system function may change, leading to an increased incidence of autoimmune thyroid disease; and (III) environmental and lifestyle factors: older patients face more environmental and habitual challenges (e.g., smoking, drinking, and diet), which affect thyroid function, and increase the risk of hypothyroidism or hyperthyroidism.

In our study, we found that the anteroposterior diameter of the thyroid was strongly correlated with thyroid function. It is speculated that the thyroid exhibits compensatory hyperplasia in response to hypothyroidism, and that hyperthyroidism or hypothyroidism is often associated with various complications, such as hyperthyroidism-associated ophthalmopathy and hypothyroidism-associated myxedema, which indirectly affect thyroid structure and function through immune responses, inflammatory states, and metabolic abnormalities.

This study clarified the correlation between radiomics features and hyperthyroidism/hypothyroidism, and screened out 21 key radiomics features. These features may reflect the microscopic structure and functional changes of the thyroid gland under disease conditions, providing strong information support for subsequent diagnostic models.

The SVM algorithm, a machine learning method that performs well in complex classification problems, was then used to build the radiomics model. Through training and validation, we examined the clinical, radiomics, and combined models. The radiomics model performed well in the three-class diagnosis of thyroid function, and performed better than the clinical model in both the training and validation sets, reflecting the value of radiomics in auxiliary diagnosis. The diagnostic efficacy of the combined model was further improved, with the highest F1 and kappa values in the internal and external validation sets, while the ROC curves showed excellent binary diagnostic performance. Therefore, based on these results, when patients undergo CT examination and thyroid abnormalities are detected, radiomics analysis may be considered to predict thyroid function and to help detect patients with underlying thyroid dysfunction or atypical clinical symptoms.

Although the results of this study were encouraging, it still had some limitations. First, etiological stratification of hyperthyroidism cases was not performed; however, we intend to develop a CT radiomics-based predictive model to automatically differentiate between etiologies (e.g., Graves’ disease and toxic nodular goiter). Second, an analysis of TSH, a continuous variable in thyroid function, was omitted; thus, longitudinal studies need to be conducted to quantify dynamic relationships between TSH fluctuations and radiomics feature evolution.

Despite these limitations, the findings of this two-center investigation underscore the clinical potential of the approach. The most immediate clinical application lies in opportunistic screening; by developing an automated tool that can be integrated with existing Picture Archiving and Communication Systems (PACS), the model could analyze the thyroid gland incidentally captured in scans performed for other indications (e.g., cervical spine or chest CT), thereby identifying individuals at high risk of thyroid dysfunction without additional radiation exposure or cost. This would facilitate early detection and prompt further endocrine laboratory evaluation. While the two-center design enhances the generalizability of our findings compared to a single-center study, future efforts must still focus on larger-scale, multi-center, prospective validation to rigorously assess the model’s robustness across a broader range of populations and CT scanner platforms. Concurrently, research is needed to address the technical challenges of seamless integration into clinical workflows and to explore the biological underpinnings of the key radiomic features to enhance the model’s interpretability. Ultimately, studies investigating the impact of this tool on diagnostic efficiency and patient clinical benefit will be crucial to definitively establish its clinical utility.


Conclusions

The predictive model based on plain CT radiomics showed good efficacy in the diagnosis of hyperthyroidism/hypothyroidism, and its accuracy was further improved when it was combined with clinical information. Our findings provide novel ideas and methods for the auxiliary diagnosis of thyroid diseases, and our models may have clinical application value.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-525/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-525/dss

Funding: The work was financially supported by the Project of State Key Laboratory of Radiation Medicine and Protection, Soochow University (No. GZK12023041), PreResearch Fund Project of The Second Affiliated Hospital of Soochow University (No. SDFEYLC2445), and Jiangsu Research Hospital Society Infection Imaging Research Special Fund Project (No. GY202308).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-525/coif). R.Y. reports that this work was supported by PreResearch Fund Project of The Second Affiliated Hospital of Soochow University (No. SDFEYLC2445). X.N. reports that this work was supported by Jiangsu Research Hospital Society Infection Imaging Research Special Fund Project (No. GY202308). D.J. reports that this work was supported by the Project of State Key Laboratory of Radiation Medicine and Protection, Soochow University (No. GZK12023041). The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted according to the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of The Second Affiliated Hospital of Soochow University (No. JD-HG-2024-059) and individual consent for this retrospective analysis was waived. Changshu Hospital Affiliated to Nanjing University of Chinese Medicine was also informed and approved the study.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Jakubiak GK, Pawlas N, Morawiecka-Pietrzak M, Zalejska-Fiolka J, Stanek A, Cieślar G. Relationship of Thyroid Volume and Function with Ankle-Brachial Index, Toe-Brachial Index, and Toe Pressure in Euthyroid People Aged 18-65. Medicina (Kaunas) 2024;60:1445. [Crossref] [PubMed]
  2. Wu W, Fang X, Li J, Zhang A, Zou Y, Zheng X. Application of dual-source computed tomography in the diagnosis of thyroid cancer and evaluation of biological behaviors. J Clin Ultrasound 2023;51:195-202. [Crossref] [PubMed]
  3. Gao SY, Zhang XY, Wei W, Li XT, Li YL, Xu M, Sun YS, Zhang XP. Identification of benign and malignant thyroid nodules by in vivo iodine concentration measurement using single-source dual energy CT: A retrospective diagnostic accuracy study. Medicine (Baltimore) 2016;95:e4816. [Crossref] [PubMed]
  4. Song Z, Li Q, Zhang D, Li X, Yu J, Liu Q, Li Z, Huang J, Zhang X, Tang Z. Nomogram based on spectral CT quantitative parameters and typical radiological features for distinguishing benign from malignant thyroid micro-nodules. Cancer Imaging 2023;23:13. [Crossref] [PubMed]
  5. Kong D, Zhang J, Shan W, Duan S, Guo L. CT radiomics model for differentiating malignant and benign thyroid nodules. Chinese Journal of Radiology 2020;54:187-91.
  6. Chen DW, Lang BHH, McLeod DSA, Newbold K, Haymart MR. Thyroid cancer. Lancet 2023;401:1531-44. [Crossref] [PubMed]
  7. Jiang L, Liu D, Long L, Chen J, Lan X, Zhang J. Dual-source dual-energy computed tomography-derived quantitative parameters combined with machine learning for the differential diagnosis of benign and malignant thyroid nodules. Quant Imaging Med Surg 2022;12:967-78. [Crossref] [PubMed]
  8. Chen C, Liu Y, Yao J, Wang K, Zhang M, Shi F, Tian Y, Gao L, Ying Y, Pan Q, Wang H, Wu J, Qi X, Wang Y, Xu D. Deep learning approaches for differentiating thyroid nodules with calcification: a two-center study. BMC Cancer 2023;23:1139. [Crossref] [PubMed]
  9. Liu Y, Chen C, Wang K, Zhang M, Yan Y, Sui L, Yao J, Zhu X, Wang H, Pan Q, Wang Y, Liang P, Xu D. The auxiliary diagnosis of thyroid echogenic foci based on a deep learning segmentation model: A two-center study. Eur J Radiol 2023;167:111033. [Crossref] [PubMed]
  10. Superficial Organ and Vascular Ultrasound Group, Society of Ultrasound in Medicine, Chinese Medical Association; Chinese Artificial Intelligence Alliance for Thyroid and Breast Ultrasound. 2020 Chinese Guidelines for Ultrasound Malignancy Risk Stratification of Thyroid Nodules: The C-TIRADS. Chin J Ultrasonogr 2021;30:185-200.
  11. Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images Are More than Pictures, They Are Data. Radiology 2016;278:563-77. [Crossref] [PubMed]
  12. Sohn JH, Fields BKK. Radiomics and Deep Learning to Predict Pulmonary Nodule Metastasis at CT. Radiology 2024;311:e233356. [Crossref] [PubMed]
  13. Miranda J, Horvat N, Araujo-Filho JAB, Albuquerque KS, Charbel C, Trindade BMC, Cardoso DL, de Padua Gomes de Farias L, Chakraborty J, Nomura CH. The Role of Radiomics in Rectal Cancer. J Gastrointest Cancer 2023;54:1158-80. [Crossref] [PubMed]
  14. Xia T, Zhao B, Li B, Lei Y, Song Y, Wang Y, Tang T, Ju S. MRI-Based Radiomics and Deep Learning in Biological Characteristics and Prognosis of Hepatocellular Carcinoma: Opportunities and Challenges. J Magn Reson Imaging 2024;59:767-83. [Crossref] [PubMed]
  15. Qi YJ, Su GH, You C, Zhang X, Xiao Y, Jiang YZ, Shao ZM. Radiomics in breast cancer: Current advances and future directions. Cell Rep Med 2024;5:101719. [Crossref] [PubMed]
  16. Warkentin MT, Al-Sawaihey H, Lam S, Liu G, Diergaarde B, Yuan JM, Wilson DO, Atkar-Khattra S, Grant B, Brhane Y, Khodayari-Moez E, Murison KR, Tammemagi MC, Campbell KR, Hung RJ. Radiomics analysis to predict pulmonary nodule malignancy using machine learning approaches. Thorax 2024;79:307-15. [Crossref] [PubMed]
  17. Yang J, Cai H, Liu N, Huang J, Pan Y, Zhang B, Tong M, Zhang Z. Application of radiomics in ischemic stroke. J Int Med Res 2024;52:3000605241238141. [Crossref] [PubMed]
  18. Tuna IS. Editorial Comment: CT Radiomics-An Emerging Tool for Thyroid Nodule Evaluation. AJR Am J Roentgenol 2024;223:e2431393. [Crossref] [PubMed]
  19. Xu H, Wang X, Guan C, Tan R, Yang Q, Zhang Q, Liu A, Liu Q. Value of Whole-Thyroid CT-Based Radiomics in Predicting Benign and Malignant Thyroid Nodules. Front Oncol 2022;12:828259. [Crossref] [PubMed]
  20. Lin S, Gao M, Yang Z, Yu R, Dai Z, Jiang C, Yao Y, Xu T, Chen J, Huang K, Lin D. CT-Based Radiomics Models for Differentiation of Benign and Malignant Thyroid Nodules: A Multicenter Development and Validation Study. AJR Am J Roentgenol 2024;223:e2431077. [Crossref] [PubMed]
  21. Li Z, Zhong Y, Lv Y, Zheng J, Hu Y, Yang Y, Li Y, Sun M, Liu S, Guo Y, Zhang M, Zhou L. A CT based radiomics analysis to predict the CN0 status of thyroid papillary carcinoma: a two-center study. Cancer Imaging 2024;24:62. [Crossref] [PubMed]
  22. Gurun E, Cakir IM, Ozturk M. Radiomics of Thyroid Malignancy: Going Beyond the Picture. Acad Radiol 2023;30:2169-71. [Crossref] [PubMed]
  23. Wu X, Yu P, Jia C, Mao N, Che K, Li G, Zhang H, Mou Y, Song X. Radiomics Analysis of Computed Tomography for Prediction of Thyroid Capsule Invasion in Papillary Thyroid Carcinoma: A Multi-Classifier and Two-Center Study. Front Endocrinol (Lausanne) 2022;13:849065. [Crossref] [PubMed]
  24. Jeong SH, Hong HS, Lee JY. The association between thyroid echogenicity and thyroid function in pediatric and adolescent Hashimoto's thyroiditis. Medicine (Baltimore) 2019;98:e15055. [Crossref] [PubMed]
  25. Li ZT, Wang M, Pan DM, Ren ZJ, Liu HM, Li Q. Dual-source CT evaluation on correlation between thyroid function and thyroid iodine concentration. Chin J Med Imaging Technol 2020;36:610-3.
  26. Shi XD, Liu H. Clinical study on relationship between thyroid CT value and thyroid function. Chinese Journal of CT and MRI 2020;18:36-8.
  27. Ji Y. The diagnostic value of thyroid ultrasonography combined with thyroid function index in benign and malignant thyroid nodules. Journal of Clinical Research 2015;32:2330-3.
  28. Chen JG, Lu JF, Liu CD, Ruan DB. Study on the definition of ultrasonic thyroid parameters and their relationship with thyroid function in pregnant women. Chinese Journal of Medical Ultrasound (Electronic Edition) 2018;15:822-5.
  29. Lee Y. Dual-energy computed tomography-based volumetric thyroid iodine quantification: correlation with thyroid hormonal status, pathologic diagnosis, and phantom validation. Diagn Interv Radiol 2025;31:226-33. [Crossref] [PubMed]
  30. Scheepers MHMC, Al-Difaie ZJJ, Bouvy ND, Havekes B, Postma AA. Four-Dimensional Dual-Energy Computed Tomography-Derived Parameters and Their Correlation with Thyroid Gland Functional Status. Tomography 2025;11:22. [Crossref] [PubMed]
  31. Chaudhary P, Pamnani J, Rana K, Khandekar AK. Exploring the Association Between Thyroid Density Assessed by Non-contrast Computed Tomography and Serum Thyroid-Stimulating Hormone (TSH) Levels in Hypothyroid Patients. Cureus 2023;15:e48653. [Crossref] [PubMed]
  32. Kikuchi T, Hanaoka S, Nakao T, Nomura Y, Yoshikawa T, Alam MA, Mori H, Hayashi N. Relationship between Thyroid CT Density, Volume, and Future TSH Elevation: A 5-Year Follow-Up Study. Life (Basel) 2023;13:2303. [Crossref] [PubMed]
  33. Zhang R, Dong J, Li Y, Xiao S, Qiu L. A Horizontal and Longitudinal Study on the Changes of Aging Thyroid Function in Elderly Male Population. Discov Med 2024;36:827-35. [Crossref] [PubMed]
  34. Qiu L, Wang DC, Xu T, Cheng YQ, Sun Q, Hu YY, Liu HC, Lu SY, Yang G, Wang Z. Effects of sex, age and season on thyroid hormone reference interval. National Medical Journal of China 2018;20:1582-7. [Crossref] [PubMed]
  35. Yao C, Wu M, Liu M, Chen X, Zhu H, Xiong C, Wang D, Xiang Y, Suo G, Wang J, Sun H, Yuan C, Xia Y. Age- and sex-specific reference intervals for thyroid hormones in a Chinese pediatrics: a prospective observational study of 1,279 healthy children. Transl Pediatr 2021;10:2479-88. [Crossref] [PubMed]
  36. Walsh JP. Thyroid Function across the Lifespan: Do Age-Related Changes Matter? Endocrinol Metab (Seoul) 2022;37:208-19. [Crossref] [PubMed]
  37. Cai YZ, He DD, Wang YY, Liu XY, Xu XL, Dong LJ, Liu N, Yu DD, Wang N. Correlation between different ages and pubertal development stages and reference intervals of thyroid function indicators in adolescent females. Fudan University Journal of Medical Sciences 2024;51:566-73.
Cite this article as: Yu R, Tan Y, Hou J, Chen Y, Ni X, Xu L, Fan G, Jin D. Prediction of thyroid function based on CT radiomics: a two-center study. Quant Imaging Med Surg 2025;15(12):12190-12204. doi: 10.21037/qims-2025-525
