Comparative study of different worldwide versions of the thyroid risk stratification system in patients with thyroid nodules in China
Original Article

Jia-Jia Wang1#, Hai-Qing Huang2#, Ruo-Ting Zheng3#, Zhi-Hui Lin1, Mu-Min Wu2, Shao-Zhi Cai2, Xian-Ying Liao1, Dong-Ming Guo1, Zhe Chen1,2

1Department of Interventional Ultrasound, Cancer Hospital of Shantou University Medical College, Shantou, China; 2Department of Ultrasound, Cancer Hospital of Shantou University Medical College, Shantou, China; 3Department of Ultrasound, The Second Affiliated Hospital of Shantou University Medical College, Shantou, China

Contributions: (I) Conception and design: Z Chen, JJ Wang, HQ Huang; (II) Administrative support: Z Chen, HQ Huang; (III) Provision of study materials or patients: JJ Wang, RT Zheng, SZ Cai; (IV) Collection and assembly of data: JJ Wang, RT Zheng, SZ Cai, XY Liao; (V) Data analysis and interpretation: JJ Wang, RT Zheng, ZH Lin, MM Wu, Z Chen; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

Correspondence to: Zhe Chen, MMed. Department of Interventional Ultrasound, Cancer Hospital of Shantou University Medical College, 1 Xuecheng Road, Shantou 515041, China; Department of Ultrasound, Cancer Hospital of Shantou University Medical College, Shantou, China. Email: sdycsyxk@163.com.

Background: In China, there is not yet a uniform consensus on the risk stratification system for thyroid nodules. Therefore, the purpose of this study was to compare the effectiveness of the Chinese Thyroid Imaging Reporting and Data System (C-TIRADS), American College of Radiology TIRADS (ACR-TIRADS), European TIRADS (EU-TIRADS), and Korean TIRADS (K-TIRADS) in distinguishing benign from malignant thyroid nodules and in guiding fine-needle aspiration (FNA).

Methods: A total of 1,174 thyroid nodules from 1,174 patients (mean age, 45.86±13.66 years) were included in our study. The ultrasound features of each nodule were evaluated and categorized according to the four thyroid risk stratification systems. The systems were compared using the area under the receiver operating characteristic curve (AUROC): the optimal cut-off category of each system was determined, and the diagnostic performance of the corresponding FNA recommendations was then compared.

Results: Of the 1,174 thyroid nodules, 699 were benign and 475 were malignant. The cut-off categories with the best diagnostic performance were C-TIRADS 4B [area under the curve (AUC) =0.864], ACR-TIRADS 5 (AUC =0.882), EU-TIRADS 5 (AUC =0.861), and K-TIRADS 5 (AUC =0.856). The AUROC of ACR-TIRADS 5 was significantly greater than those of C-TIRADS 4B, EU-TIRADS 5, and K-TIRADS 5 (vs. C-TIRADS 4B, P=0.0191; vs. EU-TIRADS 5, P=0.0031; vs. K-TIRADS 5, P=0.0001). The AUROCs of C-TIRADS 4B, EU-TIRADS 5, and K-TIRADS 5 did not differ significantly (C-TIRADS 4B vs. EU-TIRADS 5, P=0.4184; C-TIRADS 4B vs. K-TIRADS 5, P=0.2388; EU-TIRADS 5 vs. K-TIRADS 5, P=0.4659). The FNA recommendations of C-TIRADS 4B showed the best FNA diagnostic performance, with an AUROC of 0.658, which was significantly higher than those of the FNA recommendations of ACR-TIRADS 5, EU-TIRADS 5, and K-TIRADS 5 (all P<0.0001).

Conclusions: All four thyroid risk stratification systems, developed in different geographical settings, have high diagnostic performance. FNAs based on C-TIRADS recommendations have significantly better diagnostic performance and may be more suitable for patients with thyroid nodules in China.

Keywords: Thyroid nodule; ultrasound features; fine-needle aspiration (FNA); risk stratification system


Submitted Mar 02, 2025. Accepted for publication Nov 03, 2025. Published online Dec 31, 2025.

doi: 10.21037/qims-2025-527


Introduction

The detection of thyroid nodules has increased dramatically over the past two decades as ultrasound screening has become widely available. The vast majority of thyroid nodules do not require intervention; however, a small percentage are malignant. Therefore, accurate and effective risk assessment is crucial (1). In this context, different versions of risk stratification systems for thyroid nodules have been developed in different geographical regions worldwide, including Europe, North America, Korea, and China (2-5). Because each system was developed from the nodule characteristics of its local population, the ultrasound features, malignancy risk categories, and biopsy indications may differ accordingly (6). However, all of these systems share the same goal: to identify malignancy risk and guide clinicians in thyroid nodule management.

In China, there is not yet a uniform consensus on a risk stratification system for thyroid nodules, so medical institutions in different regions may apply different systems. Worldwide, current risk stratification systems for thyroid nodules can be broadly classified into two types. The first is the pattern-based system, which defines several ultrasound patterns with different risks of malignancy; examples are the Korean Thyroid Imaging Reporting and Data System (K-TIRADS), developed by the Korean Society of Radiology and the Korean Society of Thyroid Radiology in 2016 (2), and the European TIRADS (EU-TIRADS), developed by the European Thyroid Association and the European Association of Radiology in 2017 (3). The second is the point-based system, which assigns points to individual ultrasound features and sums them to obtain a risk category; examples are the American College of Radiology TIRADS (ACR-TIRADS), created by the American College of Radiology in 2017 (4), and the Chinese TIRADS (C-TIRADS), created by the Superficial Organ and Vascular Ultrasound Group of the Chinese Medical Association in 2020 (5). According to several previous studies, systems developed in different geographical settings may perform slightly differently for the same nodule. In addition, the systems use different size thresholds for fine-needle aspiration (FNA), which also affects their final diagnostic performance (7-10). To date, there is no consensus on which system is best suited to patients in China. Therefore, this study aimed to compare the applicability of these geographically distinct versions of the thyroid risk stratification system in patients with thyroid nodules in China. We present this article in accordance with the STARD reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-527/rc).
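To illustrate how a counting system differs from a pattern-based one, the sketch below scores a nodule the way we understand the 2020 C-TIRADS guideline (5) to work: +1 for each suspicious feature (taller-than-wide shape, solid composition, marked hypoechogenicity, punctate echogenic foci, and a suspicious margin, which we read as an ill-defined or irregular margin or extrathyroidal extension), −1 for comet-tail artefacts, with the net score mapped to categories 2 to 5. The feature labels are hypothetical and the score-to-category mapping is our assumption, to be checked against the guideline; categories 1 (no nodule) and 6 (biopsy-proven malignancy) are outside this sketch.

```python
# Hypothetical feature labels chosen for this sketch (not taken from the guideline).
POSITIVE_FEATURES = (
    "taller_than_wide",
    "solid",
    "markedly_hypoechoic",
    "punctate_echogenic_foci",
    "suspicious_margin",  # assumed: ill-defined/irregular margin or extrathyroidal extension
)
NEGATIVE_FEATURE = "comet_tail_artefact"


def c_tirads_score(features: set[str]) -> int:
    """Net count: +1 per suspicious feature present, -1 for comet-tail artefacts."""
    score = sum(1 for feature in POSITIVE_FEATURES if feature in features)
    if NEGATIVE_FEATURE in features:
        score -= 1
    return score


def c_tirads_category(score: int) -> str:
    """Map the net score to a category (assumed mapping, see lead-in)."""
    if score <= -1:
        return "2"
    if score == 0:
        return "3"
    if score == 1:
        return "4A"
    if score == 2:
        return "4B"
    if score <= 4:
        return "4C"
    return "5"


# Features from the Figure 3 caption: solid, hypoechoic (not markedly),
# wider-than-tall, irregular margin, punctate echogenic foci.
figure3 = {"solid", "suspicious_margin", "punctate_echogenic_foci"}
print(c_tirads_category(c_tirads_score(figure3)))
```

Because every feature carries the same weight here, the category depends only on how many suspicious features are present, which is the main contrast with the weighted ACR-TIRADS points.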


Methods

Compliance with ethical standards

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the institutional ethics board of Cancer Hospital of Shantou University Medical College (No. 2024055), and the requirement for individual consent was waived for this retrospective analysis.

Study population

Patients with thyroid nodules who underwent surgical resection or core needle biopsy at the Second Affiliated Hospital of Shantou University Medical College from January 2020 to December 2023 were included. Thyroid nodules were included if the following information was complete: (I) definitive histopathological findings based on surgical resection or core needle biopsy; and (II) clear ultrasound images retained at the Second Affiliated Hospital of Shantou University Medical College within 1 month before surgery. Thyroid nodules were excluded if (I) there were no definitive histopathological findings or (II) no clear or complete ultrasound image data had been retained at the Second Affiliated Hospital of Shantou University Medical College within 1 month before surgery. When a patient had more than one nodule, only one nodule with typical images and a definitive final histology was included. In total, 1,196 patients with 1,196 thyroid nodules underwent surgical resection or core needle biopsy. Of these, 22 nodules in 22 patients were excluded because of blurred ultrasound images or a lack of final pathological confirmation. Ultimately, 1,174 thyroid nodules from 1,174 patients were included in the study (Figure 1).

Figure 1 Flowchart showing the recruitment of study participants.

The ultrasound images were acquired with a Logiq E9 system (GE Healthcare, Chicago, USA) using a high-frequency linear probe (ML6-15) and an RS80A system (SAMSUNG Healthcare, Seoul, Korea) using a high-frequency linear probe (L3-12A). All neck and thyroid examinations included clear grayscale and colour Doppler images in both transverse and longitudinal sections, with measurements documented on the images. Two radiologists with more than 10 years of experience retrospectively re-reviewed, analysed, and scored the ultrasound images; both were blinded to the postoperative pathology and to any prior risk stratification of the nodules. In the event of a discrepancy between the two radiologists’ judgements, a radiologist with more than 30 years of experience made the final decision. The two radiologists recorded the ultrasound features required by the four risk stratification systems described above: nodule size, composition (spongiform, cystic, or almost completely cystic; mixed cystic and solid; solid or almost completely solid), echogenicity (anechoic; hyperechoic or isoechoic; hypoechoic; markedly hypoechoic), shape (wider-than-tall; taller-than-wide), margin (smooth; lobulated or irregular; extrathyroidal extension), and echogenic foci (none or comet-tail artefacts; macrocalcification; peripheral calcification; punctate echogenic foci). Each nodule was then assigned a malignancy risk category according to the methodology of each risk stratification system, and the corresponding FNA recommendation was determined (Figures 2,3).
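For the point-based ACR-TIRADS, the step of assigning a risk level and an FNA recommendation can be sketched as follows. The point values and FNA size thresholds are those we take from the 2017 ACR TI-RADS white paper (4) and should be verified against it; all identifiers are hypothetical, treating a 1-point total as TR1 is our assumption (the ACR chart lists no 1-point level), and the example nodules use the features in the Figure 2 and 3 captions, with no echogenic foci assumed for Figure 2. This is an illustration, not the worksheet the radiologists used.

```python
from dataclasses import dataclass, field

# Points per feature (ACR TI-RADS 2017, as we read it). Spongiform nodules are
# grouped with cystic ones; ACR's rule of adding no further points for
# spongiform nodules is not enforced in this sketch.
COMPOSITION = {"cystic_or_spongiform": 0, "mixed_cystic_solid": 1, "solid": 2}
ECHOGENICITY = {"anechoic": 0, "hyper_or_isoechoic": 1, "hypoechoic": 2,
                "markedly_hypoechoic": 3}
SHAPE = {"wider_than_tall": 0, "taller_than_wide": 3}
MARGIN = {"smooth": 0, "lobulated_or_irregular": 2, "extrathyroidal_extension": 3}
ECHOGENIC_FOCI = {"none": 0, "macrocalcification": 1,
                  "peripheral_calcification": 2, "punctate": 3}

# Maximum diameter (mm) at or above which FNA is recommended; never for TR1/TR2.
FNA_THRESHOLD_MM = {3: 25.0, 4: 15.0, 5: 10.0}


@dataclass
class Nodule:
    composition: str
    echogenicity: str
    shape: str
    margin: str
    size_mm: float
    # Echogenic foci are additive in ACR-TIRADS, so several may be listed.
    foci: frozenset = field(default_factory=frozenset)


def acr_points(nodule: Nodule) -> int:
    """Sum the points of all five feature categories."""
    return (COMPOSITION[nodule.composition]
            + ECHOGENICITY[nodule.echogenicity]
            + SHAPE[nodule.shape]
            + MARGIN[nodule.margin]
            + sum(ECHOGENIC_FOCI[f] for f in nodule.foci))


def acr_level(points: int) -> int:
    """Map total points to TR1-TR5."""
    if points <= 1:  # assumption: 0-1 points -> TR1
        return 1
    if points == 2:
        return 2
    if points == 3:
        return 3
    if points <= 6:
        return 4
    return 5


def acr_recommends_fna(nodule: Nodule) -> bool:
    """FNA if the level has a size threshold and the nodule reaches it."""
    threshold = FNA_THRESHOLD_MM.get(acr_level(acr_points(nodule)))
    return threshold is not None and nodule.size_mm >= threshold


fig2 = Nodule("solid", "hyper_or_isoechoic", "wider_than_tall", "smooth", 35.0)
fig3 = Nodule("solid", "hypoechoic", "wider_than_tall", "lobulated_or_irregular",
              7.0, frozenset({"punctate"}))
print(acr_level(acr_points(fig2)), acr_recommends_fna(fig2))
print(acr_level(acr_points(fig3)), acr_recommends_fna(fig3))
```

The second example shows why size thresholds matter: a 7 mm nodule can reach the highest risk level yet still fall below the FNA threshold.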

Figure 2 A solid nodule in the right lobe of the thyroid gland, with a maximum diameter of 3.5 cm, a smooth margin, a wider-than-tall shape, and isoechogenicity.
Figure 3 A solid nodule in the left lobe of the thyroid gland, with a maximum diameter of 0.7 cm, an irregular margin, a wider-than-tall shape, hypoechogenicity, and punctate echogenic foci.

Data and statistical analysis

Continuous variables with a normal distribution are expressed as mean ± standard deviation (SD), and those with a non-normal distribution as median (interquartile range). Nominal and ordinal variables are expressed as frequencies and proportions. Nonparametric data were analysed using the Mann-Whitney U test, and parametric data using the unpaired t-test. Fisher’s exact test or the chi-square test was used to compare categorical variables, rates among multiple groups, and component ratios.

The malignancy risk of each nodule was categorized according to each of the four risk stratification systems. The final histopathological findings were then used to determine the proportions of benign and malignant nodules in each risk category. In addition, the percentage of malignant nodules among those recommended for FNA by each system was calculated. Receiver operating characteristic (ROC) curves were then drawn for the intermediate- and high-risk categories (categories 4 and 5) of each system, and the cut-off category with the largest AUROC was taken as the optimal cut-off for that system. The DeLong test was used to compare the AUROCs at these cut-off categories between systems. The AUROCs were also compared between nodule size groups (<10 mm and ≥10 mm). Finally, the AUROCs of the FNA recommendations of each system were compared using the DeLong test. All statistical analyses were performed with IBM SPSS Statistics for Windows (version 26.0; IBM Corp., Armonk, NY, USA) and MedCalc software (version 18.2.1; Mariakerke, Belgium). Two-sided P values <0.05 were considered statistically significant.
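Because all four systems were applied to the same nodules, their AUROCs are correlated, which is why a paired method such as the DeLong test is needed. The pure-Python sketch below implements DeLong's nonparametric covariance estimate for two correlated AUCs; it illustrates the technique and is not the MedCalc routine used in this study, and the function names and toy data are hypothetical. Scores may be ordinal (for example, a category rank) or binary (1 if the nodule is at or above the cut-off category).

```python
import math


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the malignant score is higher, 0.5 for a tie."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _structural_components(pos, neg):
    """Return the AUC plus DeLong's placement values V10 (per malignant) and V01 (per benign)."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(pos), v10, v01


def _cov(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test for two AUCs measured on the same nodules.

    labels: 1 = malignant, 0 = benign; higher scores = more suspicious.
    Returns (auc_a, auc_b, p_value).
    """
    def split(scores):
        pos = [s for s, lab in zip(scores, labels) if lab == 1]
        neg = [s for s, lab in zip(scores, labels) if lab == 0]
        return pos, neg

    auc_a, v10_a, v01_a = _structural_components(*split(scores_a))
    auc_b, v10_b, v01_b = _structural_components(*split(scores_b))
    m, n = len(v10_a), len(v01_a)
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if var <= 0:
        raise ValueError("zero variance: the DeLong test is undefined for these data")
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2))


# Toy data: an ordinal score and a binary cut-off on six hypothetical nodules.
labels = [1, 1, 1, 0, 0, 0]
ordinal = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1]
binary = [1, 1, 0, 0, 1, 0]
print(delong_test(labels, ordinal, binary))
```

The pairwise loops cost O(m·n) per system, which is still fast for 475 malignant and 699 benign nodules.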


Results

Clinical baseline characteristics and ultrasound features of the nodules

Among the 1,174 thyroid nodules (201 males and 973 females; mean age, 45.86±13.66 years), 699 were benign (120 males and 579 females; mean age, 45.92±13.95 years) and 475 were malignant (81 males and 394 females; mean age, 45.77±13.25 years). There were no statistically significant differences in age or sex between patients with benign and malignant thyroid nodules (age, P=0.640; sex, P=0.959). By maximum diameter, 109 nodules (35 benign and 74 malignant) were <5 mm, 297 (123 benign and 174 malignant) were ≥5 and <10 mm, 193 (104 benign and 89 malignant) were ≥10 and <15 mm, 170 (105 benign and 65 malignant) were ≥15 and <20 mm, and 405 (332 benign and 73 malignant) were ≥20 mm. The size distribution differed significantly between benign and malignant nodules (P<0.001). More detailed information is presented in Table 1.

Table 1

Clinical baseline characteristics and ultrasound features of the nodules

Basic characteristics Total, n (%) Benign, n (%) Malignant, n (%) P value
Age (years) 45.86±13.66 45.92±13.95 45.77±13.25 0.640
Sex 0.959
   Male 201 (17.1) 120 (17.2) 81 (17.1)
   Female 973 (82.9) 579 (82.8) 394 (82.9)
Nodule size (mm) <0.001
   <5 109 (9.3) 35 (5.0) 74 (15.6)
   ≥5 and <10 297 (25.3) 123 (17.6) 174 (36.6)
   ≥10 and <15 193 (16.4) 104 (14.9) 89 (18.7)
   ≥15 and <20 170 (14.5) 105 (15.0) 65 (13.7)
   ≥20 405 (34.5) 332 (47.5) 73 (15.4)
Composition <0.001
   Spongiform, cystic, or almost completely cystic 169 (14.4) 169 (24.2) 0
   Mixed cystic and solid 170 (14.5) 156 (22.3) 14 (2.9)
   Solid or almost completely solid 835 (71.1) 374 (53.5) 461 (97.1)
Echogenicity <0.001
   Anechoic 71 (6.0) 71 (10.2) 0
   Hyperechoic or isoechoic 427 (36.4) 403 (57.6) 24 (5.0)
   Hypoechoic 597 (50.9) 211 (30.2) 386 (81.3)
   Markedly hypoechoic 79 (6.7) 14 (2.0) 65 (13.7)
Shape <0.001
   Wider-than-tall 928 (79.0) 668 (95.6) 260 (54.7)
   Taller-than-wide 246 (21.0) 31 (4.4) 215 (45.3)
Margin <0.001
   Smooth 912 (77.7) 687 (98.3) 225 (47.4)
   Lobulated or irregular 105 (8.9) 12 (1.7) 93 (19.6)
   Extra-thyroidal extension 157 (13.4) 0 157 (33.0)
Echogenic foci <0.001
   None or comet-tail artifacts 839 (59.4) 699 (82.0) 140 (25.0)
   Macrocalcifications 199 (14.1) 85 (10.0) 114 (20.3)
   Peripheral calcifications 29 (2.1) 14 (1.6) 15 (2.7)
   Punctate echogenic foci 345 (24.4) 54 (6.4) 291 (52.0)

Data are presented as mean ± standard deviation or n (%).
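As a check on Table 1, the sex comparison can be reproduced from the counts with an uncorrected Pearson chi-square test; with one degree of freedom the upper-tail probability is erfc(√(χ²/2)). The standard-library sketch below (hypothetical function name) gives P≈0.959 for the male/female by benign/malignant table, matching the reported value.

```python
import math


def pearson_chi2_2x2(table):
    """Uncorrected Pearson chi-square for a 2x2 table; returns (chi2, p)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Table 1 counts: rows = male, female; columns = benign, malignant.
sex_by_histology = [[120, 81], [579, 394]]
chi2, p = pearson_chi2_2x2(sex_by_histology)
print(round(chi2, 4), round(p, 3))
```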

Malignancy rates and malignant detection of FNAs in each category

The malignancy rates and FNA malignant detection rates in each category are summarised in Table 2. For C-TIRADS categories 2, 3, 4A, 4B, 4C, and 5, the number of nodules, malignancy rate, and FNA malignant detection rate were 45, 0.0%, and not applicable (NA); 252, 1.2%, and NA; 370, 16.5%, and 7.7%; 210, 64.3%, and 67.0%; 284, 93.0%, and 98.5%; and 13, 92.3%, and 100.0%, respectively. For ACR-TIRADS categories 1 to 5, they were 66, 0.0%, and NA; 245, 0.4%, and NA; 130, 2.3%, and 0.0%; 256, 24.2%, and 17.8%; and 477, 85.7%, and 88.5%, respectively. For EU-TIRADS categories 2 to 5, they were 171, 0.0%, and NA; 281, 1.8%, and 0.6%; 213, 28.2%, and 18.1%; and 509, 80.6%, and 81.8%, respectively. For K-TIRADS categories 2 to 5, they were 197, 0.0%, and 0.0%; 269, 2.2%, and 1.0%; 266, 33.5%, and 29.3%; and 442, 86.0%, and 89.1%, respectively.

Table 2

Malignancy rates and malignant detection of FNA in each category

Category Total nodules Benign, n (%) Malignant, n (%) Size ≥10 mm, n (%) Size <10 mm, n (%) FNA total, n (%) FNA malignant detection, n (%)
C-TIRADS 1,174 699 (59.5) 475 (40.5) 768 (65.4) 406 (34.6) 520 (44.3) 300 (57.7)
   2 45 45 (100.0) 0 (0.0) 43 (95.6) 2 (4.4) 0 NA
   3 252 249 (98.8) 3 (1.2) 229 (90.9) 23 (9.1) 0 NA
   4A 370 309 (83.5) 61 (16.5) 255 (68.9) 115 (31.1) 194 (52.4) 15 (7.7)
   4B 210 75 (35.7) 135 (64.3) 84 (40.0) 126 (60.0) 115 (54.8) 77 (67.0)
   4C 284 20 (7.0) 264 (93.0) 150 (52.8) 134 (47.2) 202 (71.1) 199 (98.5)
   5 13 1 (7.7) 12 (92.3) 7 (53.8) 6 (46.2) 9 (69.2) 9 (100.0)
ACR-TIRADS 1,174 699 (59.5) 475 (40.5) 768 (65.4) 406 (34.6) 382 (32.5) 217 (56.8)
   1 66 66 (100.0) 0 63 (95.5) 3 (4.5) 0 NA
   2 245 244 (99.6) 1 (0.4) 224 (91.4) 21 (8.6) 0 NA
   3 130 127 (97.7) 3 (2.3) 115 (88.5) 15 (11.5) 65 (50.0) 0
   4 256 194 (75.8) 62 (24.2) 139 (54.3) 117 (45.7) 90 (35.2) 16 (17.8)
   5 477 68 (14.3) 409 (85.7) 227 (47.6) 250 (52.4) 227 (47.6) 201 (88.5)
EU-TIRADS 1,174 699 (59.5) 475 (40.5) 768 (65.4) 406 (34.6) 482 (41.1) 216 (44.8)
   2 171 171 (100.0) 0 158 (92.4) 13 (7.6) 0 NA
   3 281 276 (98.2) 5 (1.8) 248 (88.3) 33 (11.7) 163 (58.0) 1 (0.6)
   4 213 153 (71.8) 60 (28.2) 115 (54.0) 98 (46.0) 72 (33.8) 13 (18.1)
   5 509 99 (19.4) 410 (80.6) 247 (48.5) 262 (51.5) 247 (48.5) 202 (81.8)
K-TIRADS 1,174 699 (59.5) 475 (40.5) 768 (65.4) 406 (34.6) 619 (52.7) 226 (36.5)
   2 197 197 (100.0) 0 180 (91.4) 17 (8.6) 64 (32.5) 0
   3 269 263 (97.8) 6 (2.2) 236 (87.7) 33 (12.3) 203 (75.5) 2 (1.0)
   4 266 177 (66.5) 89 (33.5) 150 (56.4) 116 (43.6) 150 (56.4) 44 (29.3)
   5 442 62 (14.0) 380 (86.0) 202 (45.7) 240 (54.3) 202 (45.7) 180 (89.1)

ACR-TIRADS, Thyroid Imaging Reporting and Data System of the American College of Radiology; C-TIRADS, Chinese Thyroid Imaging Reporting and Data System; EU-TIRADS, European Thyroid Imaging Reporting and Data System; FNA, fine-needle aspiration; K-TIRADS, Korean Thyroid Imaging Reporting and Data System; NA, not applicable.
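The "malignant detection" column is the proportion of malignant nodules among those recommended for FNA, so the pooled value for each system equals the PPV of that system's FNA recommendations in Table 5 (57.7%, 300/520, for C-TIRADS). The sketch below (hypothetical names) recomputes the per-category rates and the pooled value from the C-TIRADS rows of Table 2.

```python
# C-TIRADS rows of Table 2:
# category -> (nodules, malignant nodules, FNA recommended, malignant among FNA)
C_TIRADS_TABLE2 = {
    "2": (45, 0, 0, 0),
    "3": (252, 3, 0, 0),
    "4A": (370, 61, 194, 15),
    "4B": (210, 135, 115, 77),
    "4C": (284, 264, 202, 199),
    "5": (13, 12, 9, 9),
}


def summarise_fna(table):
    """Per-category malignancy and FNA detection rates, plus the pooled FNA PPV."""
    rates = {
        cat: (mal / n, fna_mal / fna if fna else None)  # None = not applicable
        for cat, (n, mal, fna, fna_mal) in table.items()
    }
    total_fna = sum(row[2] for row in table.values())
    total_fna_malignant = sum(row[3] for row in table.values())
    return rates, total_fna, total_fna_malignant / total_fna


rates, total_fna, pooled = summarise_fna(C_TIRADS_TABLE2)
print(total_fna, round(pooled, 3))
```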

The cut-off category in each risk stratification system

The optimal cut-off category of each risk stratification system was identified from its ROC analysis. The cut-off categories with the best diagnostic performance were C-TIRADS 4B [area under the curve (AUC) =0.864], ACR-TIRADS 5 (AUC =0.882), EU-TIRADS 5 (AUC =0.861), and K-TIRADS 5 (AUC =0.856). More detailed information is presented in Table 3.

Table 3

The cut-off category in each risk stratification system

Cut-off of category SEN, % (n/N) SPE, % (n/N) PPV, % (n/N) NPV, % (n/N) ACC, % (n/N) AUC (95% CI)
C-TIRADS 4A 99.4 (472/475) 42.1 (294/699) 53.8 (472/877) 99.0 (294/297) 65.2 (766/1,174) 0.707 (0.680–0.733)
C-TIRADS 4B 86.5 (411/475) 86.3 (603/699) 81.1 (411/507) 90.4 (603/667) 86.4 (1,014/1,174) 0.864 (0.843–0.883)
C-TIRADS 4C 58.1 (276/475) 97.0 (678/699) 92.9 (276/297) 77.3 (678/877) 81.3 (954/1,174) 0.776 (0.751–0.799)
C-TIRADS 5 2.5 (12/475) 99.9 (698/699) 92.3 (12/13) 60.1 (698/1,161) 60.5 (710/1,174) 0.512 (0.483–0.541)
ACR-TIRADS 4 99.2 (471/475) 62.5 (437/699) 64.3 (471/733) 99.1 (437/441) 77.3 (908/1,174) 0.808 (0.785–0.831)
ACR-TIRADS 5 86.1 (409/475) 90.3 (631/699) 85.7 (409/477) 90.5 (631/697) 88.6 (1,040/1,174) 0.882 (0.862–0.900)
EU-TIRADS 4 99.0 (470/475) 64.0 (447/699) 65.1 (470/722) 98.9 (447/452) 78.1 (917/1,174) 0.814 (0.791–0.836)
EU-TIRADS 5 86.3 (410/475) 85.8 (600/699) 80.6 (410/509) 90.2 (600/665) 86.0 (1,010/1,174) 0.861 (0.840–0.880)
K-TIRADS 4 98.7 (469/475) 65.8 (460/699) 66.2 (469/708) 98.7 (460/466) 79.1 (929/1,174) 0.823 (0.800–0.844)
K-TIRADS 5 80.0 (380/475) 91.1 (637/699) 86.0 (380/442) 87.0 (637/732) 86.6 (1,017/1,174) 0.856 (0.834–0.875)

ACC, accuracy; ACR-TIRADS, Thyroid Imaging Reporting and Data System of the American College of Radiology; AUROC, area under the receiver operating characteristic curve; C-TIRADS, Chinese Thyroid Imaging Reporting and Data System; CI, confidence interval; EU-TIRADS, European Thyroid Imaging Reporting and Data System; K-TIRADS, Korean Thyroid Imaging Reporting and Data System; NPV, negative predictive value; PPV, positive predictive value; SEN, sensitivity; SPE, specificity.
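Every column in Table 3 follows from the 2×2 counts at the cut-off. For a single dichotomised cut-off, the empirical ROC curve has one interior point, so its trapezoidal area reduces to (SEN + SPE)/2; for C-TIRADS 4B this is (411/475 + 603/699)/2 ≈ 0.864. The sketch below (hypothetical function name) recomputes that row; the same function applied to the FNA counts in Table 5 gives the C-TIRADS FNA AUC of 0.658.

```python
def cutoff_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Diagnostic indices for one dichotomised cut-off (positive = at or above it)."""
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    return {
        "SEN": sen,
        "SPE": spe,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "ACC": (tp + tn) / (tp + fp + fn + tn),
        # Trapezoidal area under a ROC curve with a single interior point.
        "AUC": (sen + spe) / 2,
    }


# C-TIRADS 4B (Table 3): 411 malignant and 96 benign nodules at or above 4B,
# 64 malignant and 603 benign nodules below it.
row = cutoff_metrics(tp=411, fp=96, fn=64, tn=603)
print({name: round(value, 3) for name, value in row.items()})
```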

Comparison of diagnostic performance for each cut-off category

Comparisons of the four risk stratification systems’ best cut-off categories are shown in Figures 4-6 and Table 4. The AUROC of ACR-TIRADS 5 was significantly greater than those of C-TIRADS 4B, EU-TIRADS 5, and K-TIRADS 5 (vs. C-TIRADS 4B, P=0.0191; vs. EU-TIRADS 5, P=0.0031; vs. K-TIRADS 5, P=0.0001). Comparisons of the AUROCs of C-TIRADS 4B, EU-TIRADS 5, and K-TIRADS 5 revealed no statistically significant differences (C-TIRADS 4B vs. EU-TIRADS 5, P=0.4184; C-TIRADS 4B vs. K-TIRADS 5, P=0.2388; EU-TIRADS 5 vs. K-TIRADS 5, P=0.4659).

Figure 4 Receiver operating characteristic curves of C-TIRADS, ACR-TIRADS, EU-TIRADS, and K-TIRADS at their optimal cut-off categories. ACR-TIRADS, Thyroid Imaging Reporting and Data System of the American College of Radiology; C-TIRADS, Chinese Thyroid Imaging Reporting and Data System; EU-TIRADS, European Thyroid Imaging Reporting and Data System; K-TIRADS, Korean Thyroid Imaging Reporting and Data System.
Figure 5 Receiver operating characteristic curves of C-TIRADS, ACR-TIRADS, EU-TIRADS, and K-TIRADS at their optimal cut-off categories for nodules <10 mm. ACR-TIRADS, Thyroid Imaging Reporting and Data System of the American College of Radiology; C-TIRADS, Chinese Thyroid Imaging Reporting and Data System; EU-TIRADS, European Thyroid Imaging Reporting and Data System; K-TIRADS, Korean Thyroid Imaging Reporting and Data System; NS, nodule size.
Figure 6 Receiver operating characteristic curves of C-TIRADS, ACR-TIRADS, EU-TIRADS, and K-TIRADS at their optimal cut-off categories for nodules ≥10 mm. ACR-TIRADS, Thyroid Imaging Reporting and Data System of the American College of Radiology; C-TIRADS, Chinese Thyroid Imaging Reporting and Data System; EU-TIRADS, European Thyroid Imaging Reporting and Data System; K-TIRADS, Korean Thyroid Imaging Reporting and Data System; NS, nodule size.

Table 4

Comparison of diagnostic performance at each cut-off category

Cut-off of category SEN, % (n/N) SPE, % (n/N) PPV, % (n/N) NPV, % (n/N) ACC, % (n/N) AUC (95% CI) AUCs P value
C-TIRADS 4B 86.5 (411/475) 86.3 (603/699) 81.1 (411/507) 90.4 (603/667) 86.4 (1,014/1,174) 0.864 (0.843–0.883) 0.0191a, 0.4184b, 0.2388c
   NS <10 mm 85.1 (211/248) 65.2 (103/158) 79.3 (211/266) 73.6 (103/140) 77.3 (314/406) 0.751 (0.706–0.793) 0.0367a1, 0.7621b1, 0.0736c1, <0.001
   NS ≥10 mm 88.1 (200/227) 92.4 (500/541) 83.0 (200/241) 94.9 (500/527) 91.1 (700/768) 0.903 (0.879–0.923) 0.1119a2, 0.8821b2, 0.0096c2
ACR-TIRADS 5 86.1 (409/475) 90.3 (631/699) 85.7 (409/477) 90.5 (631/697) 88.6 (1,040/1,174) 0.882 (0.862–0.900) 0.0031d, 0.0001e
   NS <10 mm 83.9 (208/248) 73.4 (116/158) 83.2 (208/250) 74.4 (116/156) 79.8 (324/406) 0.786 (0.743–0.825) 0.0145d1, 0.3460e1, <0.001
   NS ≥10 mm 88.5 (201/227) 95.2 (515/541) 88.5 (201/227) 95.2 (515/541) 93.2 (716/768) 0.919 (0.897–0.937) 0.0974d2, 0.0001e2
EU-TIRADS 5 86.3 (410/475) 85.8 (600/699) 80.6 (410/509) 90.2 (600/665) 86.0 (1,010/1,174) 0.861 (0.840–0.880) 0.4659f
   NS <10 mm 83.9 (208/248) 65.8 (104/158) 79.4 (208/262) 72.2 (104/144) 76.8 (312/406) 0.748 (0.703–0.790) 0.0260f1, <0.001
   NS ≥10 mm 89.0 (202/227) 91.7 (496/541) 81.8 (202/247) 95.2 (496/521) 90.9 (698/768) 0.903 (0.880–0.923) 0.0114f2
K-TIRADS 5 80.0 (380/475) 91.1 (637/699) 86.0 (380/442) 87.0 (637/732) 86.6 (1,017/1,174) 0.856 (0.834–0.875)
   NS <10 mm 80.6 (200/248) 74.7 (118/158) 83.3 (200/240) 71.1 (118/166) 78.3 (318/406) 0.777 (0.733–0.816) <0.001
   NS ≥10 mm 79.3 (180/227) 95.9 (519/541) 89.1 (180/202) 91.7 (519/566) 91.0 (699/768) 0.876 (0.851–0.899)

a, C-TIRADS 4B vs. ACR-TIRADS 5; b, C-TIRADS 4B vs. EU-TIRADS 5; c, C-TIRADS 4B vs. K-TIRADS 5; d, ACR-TIRADS 5 vs. EU-TIRADS 5; e, ACR-TIRADS 5 vs. K-TIRADS 5; f, EU-TIRADS 5 vs. K-TIRADS 5. a1-f1, the same comparisons as a-f, restricted to nodules with NS <10 mm; a2-f2, the same comparisons as a-f, restricted to nodules with NS ≥10 mm. The unlabelled final P value in each NS <10 mm row compares NS <10 vs. ≥10 mm within that cut-off category. ACC, accuracy; ACR-TIRADS, Thyroid Imaging Reporting and Data System of the American College of Radiology; AUROC, area under the receiver operating characteristic curve; C-TIRADS, Chinese Thyroid Imaging Reporting and Data System; CI, confidence interval; EU-TIRADS, European Thyroid Imaging Reporting and Data System; K-TIRADS, Korean Thyroid Imaging Reporting and Data System; NPV, negative predictive value; NS, nodule size; PPV, positive predictive value; SEN, sensitivity; SPE, specificity.
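The <10 mm and ≥10 mm subgroups contain different nodules, so their AUROCs are independent and can be compared with a z-test on the difference. The sketch below estimates each standard error with the Hanley-McNeil approximation; this is our assumption, since the paper does not state how MedCalc computed the standard errors for this comparison, and the function names are hypothetical. Applied to the C-TIRADS 4B values in Table 4, it gives a P value well below 0.001, in line with the reported result.

```python
import math


def hanley_mcneil_se(auc: float, n_malignant: int, n_benign: int) -> float:
    """Approximate standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_malignant - 1) * (q1 - auc ** 2)
           + (n_benign - 1) * (q2 - auc ** 2)) / (n_malignant * n_benign)
    return math.sqrt(var)


def compare_independent_aucs(auc1, n_mal1, n_ben1, auc2, n_mal2, n_ben2):
    """Two-sided z-test for AUCs estimated on two disjoint groups of nodules."""
    se1 = hanley_mcneil_se(auc1, n_mal1, n_ben1)
    se2 = hanley_mcneil_se(auc2, n_mal2, n_ben2)
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z, math.erfc(abs(z) / math.sqrt(2))


# C-TIRADS 4B in Table 4: <10 mm (248 malignant, 158 benign)
# vs. >=10 mm (227 malignant, 541 benign).
z, p = compare_independent_aucs(0.751, 248, 158, 0.903, 227, 541)
print(f"z = {z:.2f}, P = {p:.1e}")
```

The Hanley-McNeil formula was derived for continuous ratings, so for dichotomised cut-offs it is only an approximation of the true standard error.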

Comparison of diagnostic performance for each FNA recommendation in each cut-off category

The FNA recommendations of C-TIRADS 4B showed the best FNA diagnostic performance, with an AUROC of 0.658, which was significantly higher than those of the FNA recommendations of ACR-TIRADS 5, EU-TIRADS 5, and K-TIRADS 5 (all P<0.0001). In addition, the AUROC of the FNA recommendations of ACR-TIRADS 5 was significantly higher than those of EU-TIRADS 5 (P<0.0001) and K-TIRADS 5 (P=0.0133). However, there was no significant difference between the FNA recommendations of EU-TIRADS 5 and K-TIRADS 5 (P=0.8275). More detailed information is presented in Table 5 and Figure 7.

Table 5

Comparison of diagnostic performance of the FNA recommendations at each cut-off category

Cut-off of category SEN, % (n/N) SPE, % (n/N) PPV, % (n/N) NPV, % (n/N) ACC, % (n/N) AUC (95% CI) AUCs P value
C-TIRADS 4B 63.2 (300/475) 68.5 (479/699) 57.7 (300/520) 73.2 (479/654) 66.4 (779/1,174) 0.658 (0.630–0.686) <0.0001a, <0.0001b, <0.0001c
ACR-TIRADS 5 45.7 (217/475) 76.4 (534/699) 56.8 (217/382) 67.4 (534/792) 64.0 (751/1,174) 0.610 (0.582–0.638) <0.0001d, 0.0133e
EU-TIRADS 5 45.5 (216/475) 61.9 (433/699) 44.8 (216/482) 62.6 (433/692) 55.3 (649/1,174) 0.537 (0.508–0.566) 0.8275f
K-TIRADS 5 47.6 (226/475) 43.8 (306/699) 36.5 (226/619) 55.1 (306/555) 45.3 (532/1,174) 0.543 (0.514–0.572)

a, C-TIRADS 4B vs. ACR-TIRADS 5; b, C-TIRADS 4B vs. EU-TIRADS 5; c, C-TIRADS 4B vs. K-TIRADS 5; d, ACR-TIRADS 5 vs. EU-TIRADS 5; e, ACR-TIRADS 5 vs. K-TIRADS 5; f, EU-TIRADS 5 vs. K-TIRADS 5. ACC, accuracy; ACR-TIRADS, Thyroid Imaging Reporting and Data System of the American College of Radiology; AUROC, area under the receiver operating characteristic curve; C-TIRADS, Chinese Thyroid Imaging Reporting and Data System; CI, confidence interval; EU-TIRADS, European Thyroid Imaging Reporting and Data System; FNA, fine-needle aspiration; K-TIRADS, Korean Thyroid Imaging Reporting and Data System; NPV, negative predictive value; PPV, positive predictive value; SEN, sensitivity; SPE, specificity.

Figure 7 Receiver operating characteristic curves of the FNA recommendations of C-TIRADS, ACR-TIRADS, EU-TIRADS, and K-TIRADS at their optimal cut-off categories. ACR-TIRADS, Thyroid Imaging Reporting and Data System of the American College of Radiology; C-TIRADS, Chinese Thyroid Imaging Reporting and Data System; EU-TIRADS, European Thyroid Imaging Reporting and Data System; FNA, fine-needle aspiration; K-TIRADS, Korean Thyroid Imaging Reporting and Data System.

Discussion

Our results revealed significant differences between benign and malignant nodules in composition, echogenicity, shape, margin, and echogenic foci, suggesting that these features are associated with malignancy in patients from China. In our study, four versions of the thyroid risk stratification system, developed in different geographical settings, were used to evaluate these five ultrasound features. Although the systems evaluate similar ultrasound features, they assign different weights to them, so their calculations differ. Because the systems differ in which features they regard as suspicious, the resulting risk stratification also varies.

Our results showed that the optimal cut-off categories of the four risk stratification systems were C-TIRADS 4B, ACR-TIRADS 5, EU-TIRADS 5, and K-TIRADS 5, which is in general agreement with previous studies (11,12). The data in Table 2 show that the malignancy rate rises sharply at these categories, which is why they serve as the optimal cut-offs for each system. At these cut-offs, the diagnostic performance of ACR-TIRADS was slightly better than that of the other three systems, whereas there was no difference among the other three. This may be because ACR-TIRADS uses a weighted scoring system that assigns different scores to different ultrasound features, which may allow greater diagnostic accuracy (13). However, the ACR-TIRADS evaluation takes longer and requires more experience from evaluators. This can reduce evaluation efficiency where resources are limited or doctors lack experience, thereby affecting the timeliness and accuracy of diagnosis. The other three systems also have high diagnostic performance, each with an AUROC of approximately 0.86, and their assessment is relatively rapid because only a few key ultrasound features need to be identified. Overall, all four systems demonstrated an appropriate ability to assess thyroid nodules in patients from China.

In the subgroup analysis, the diagnostic performance of all four risk stratification systems was lower for subcentimetre nodules than for larger nodules, which is consistent with previous findings (14). We speculate that this may be related to image resolution: the ultrasound features of larger nodules may be easier to recognize accurately, which would explain why nodules ≥1 cm showed better diagnostic performance in our study. However, some previous studies have shown that the diagnostic performance of these systems does not vary with nodule size (15,16). We suggest that these differing results may be related to case selection bias.

FNA recommendations are a crucial part of any risk stratification system, and their diagnostic performance matters as much as that of the risk categories themselves. The better the diagnostic performance of the FNA recommendations, the more accurately nodules are diagnosed and the fewer unnecessary FNAs are performed. Our study revealed that the diagnostic performance of the C-TIRADS FNA recommendations was significantly better than that of the other three stratification systems. Except for specificity, the C-TIRADS FNA recommendations had the highest sensitivity, PPV, NPV, and accuracy among the four systems, which is consistent with a previous study (17). In China, although patients may know that most thyroid malignancies are indolent, they still show a high level of concern when test results suggest a high risk. In particular, patients may be more inclined to undergo surgery when a risk stratification system recommends FNA but no definitive pathological result can be obtained. Such overtreatment of thyroid nodules has significantly increased the socioeconomic burden in China and may differ from the management of thyroid nodules in other countries. In cases where a diagnosis cannot be made, patients may be more willing to undergo follow-up examinations.

Compared with the other three systems, the C-TIRADS has better diagnostic performance for its FNA recommendations, which may be related to the range of nodules selected for FNA. In the C-TIRADS, multifocality, subcapsular location, and tracheal or recurrent laryngeal nerve invasion are recognized as predictors of poor prognosis in papillary thyroid cancer (18-22). When these adverse prognostic factors are present, the system recommends FNA for category 4A nodules larger than 10 mm and for category 4B, 4C, and 5 nodules larger than 5 mm. In the same situations, the other three risk stratification systems, developed in different geographic settings, keep more conservative recommendations. These factors may, to some extent, explain the better diagnostic performance of the C-TIRADS FNA recommendations. In summary, the C-TIRADS evaluation process is relatively simple and rapid and is adaptable to evaluators with different levels of experience. Moreover, it has high diagnostic performance, especially for its FNA recommendations. Therefore, we believe that, given China's specific clinical setting and patient attitudes, the C-TIRADS is more suitable for clinical use.
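The adverse-factor branch of the C-TIRADS FNA rule described above, and one way to score the resulting recommendations, can be sketched in Python. This is a minimal, partial sketch: it encodes only the thresholds stated in this paragraph (category 4A larger than 10 mm; categories 4B, 4C, and 5 larger than 5 mm, when an adverse prognostic factor is present) and returns `None` for every case the paragraph does not cover. It also assumes that the unnecessary FNA rate is the proportion of FNA-recommended nodules that prove benign, which is one commonly used definition rather than one restated in this section. The names `fna_recommended_adverse` and `fna_summary` and the toy records are hypothetical; the complete rule is given in the C-TIRADS guideline (5).

```python
from dataclasses import dataclass

# Categories that, per the text, use a 5 mm threshold when adverse factors exist.
FIVE_MM_CATEGORIES = {"4B", "4C", "5"}


@dataclass(frozen=True)
class Nodule:
    category: str         # C-TIRADS category label, e.g. "4A"
    size_mm: float        # maximum diameter on ultrasound
    adverse_factor: bool  # multifocality, subcapsular location, or suspected
                          # tracheal / recurrent laryngeal nerve invasion
    malignant: bool       # reference standard (cytology or surgical pathology)


def fna_recommended_adverse(n):
    """Partial rule: only the adverse-factor branch described in the text.

    Returns True/False when that branch applies, or None when the case is
    outside the scope of this sketch (no adverse factor, or another category).
    """
    if not n.adverse_factor:
        return None
    if n.category == "4A":
        return n.size_mm > 10
    if n.category in FIVE_MM_CATEGORIES:
        return n.size_mm > 5
    return None


def fna_summary(nodules):
    """Score FNA recommendations against the reference standard."""
    pairs = [(fna_recommended_adverse(n), n.malignant) for n in nodules]
    pairs = [(rec, mal) for rec, mal in pairs if rec is not None]
    tp = sum(rec and mal for rec, mal in pairs)      # FNA advised, malignant
    fp = sum(rec and not mal for rec, mal in pairs)  # FNA advised, benign
    fn = sum(not rec and mal for rec, mal in pairs)  # FNA not advised, malignant
    recommended = tp + fp
    nan = float("nan")
    return {
        "n_evaluated": len(pairs),
        "n_fna_recommended": recommended,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "ppv": tp / recommended if recommended else nan,
        # Assumed definition: benign nodules among those recommended for FNA.
        "unnecessary_fna_rate": fp / recommended if recommended else nan,
    }


# Toy records for illustration only (not study data).
toy = [
    Nodule("4A", 12, True, True),    # advised, malignant
    Nodule("4A", 8, True, True),     # not advised, malignant
    Nodule("4B", 6, True, False),    # advised, benign
    Nodule("5", 7, True, True),      # advised, malignant
    Nodule("4C", 4, True, False),    # not advised, benign
    Nodule("4A", 20, False, False),  # no adverse factor: out of scope
]
summary = fna_summary(toy)
```

Returning `None` instead of `False` for uncovered cases keeps the sketch from silently inventing thresholds the text does not state; a full implementation would replace those branches with the guideline's remaining rules.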

There are several limitations to our study. First, this was a retrospective analysis of static images, which may affect the assessment of ultrasound features, especially the nodule margins and the overall internal structure; real-time dynamic assessment may compensate for this shortcoming. Second, the proportion of malignant nodules was greater in this study than in other studies, which may be related to the fact that ours is a tertiary referral hospital with a high prevalence of thyroid tumours. Finally, this was a single-centre retrospective study, and cases without a biopsy or surgical diagnosis were not included, which may have introduced selection bias. Therefore, future studies involving more centres and larger numbers of cases are needed.


Conclusions

Overall, all four thyroid risk stratification systems, developed in different geographical settings, have high diagnostic performance. Although the diagnostic performance of the ACR-TIRADS may be slightly better than that of the other three systems, FNA based on C-TIRADS recommendations has significantly better diagnostic performance and may be more suitable for patients with thyroid nodules in China.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the STARD reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-527/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-527/dss

Funding: This work was supported by the Science and Technology Planning Project of Shantou (Nos. 240507196498846 and 240416216497338).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-527/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the institutional ethics board of Cancer Hospital of Shantou University Medical College (No. 2024055), and individual consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Haugen BR, Alexander EK, Bible KC, Doherty GM, Mandel SJ, Nikiforov YE, Pacini F, Randolph GW, Sawka AM, Schlumberger M, Schuff KG, Sherman SI, Sosa JA, Steward DL, Tuttle RM, Wartofsky L. 2015 American Thyroid Association Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer: The American Thyroid Association Guidelines Task Force on Thyroid Nodules and Differentiated Thyroid Cancer. Thyroid 2016;26:1-133.
  2. Shin JH, Baek JH, Chung J, Ha EJ, Kim JH, Lee YH, et al. Ultrasonography Diagnosis and Imaging-Based Management of Thyroid Nodules: Revised Korean Society of Thyroid Radiology Consensus Statement and Recommendations. Korean J Radiol 2016;17:370-95.
  3. Russ G, Bonnema SJ, Erdogan MF, Durante C, Ngu R, Leenhardt L. European Thyroid Association Guidelines for Ultrasound Malignancy Risk Stratification of Thyroid Nodules in Adults: The EU-TIRADS. Eur Thyroid J 2017;6:225-37.
  4. Tessler FN, Middleton WD, Grant EG, Hoang JK, Berland LL, Teefey SA, Cronan JJ, Beland MD, Desser TS, Frates MC, Hammers LW, Hamper UM, Langer JE, Reading CC, Scoutt LM, Stavros AT. ACR Thyroid Imaging, Reporting and Data System (TI-RADS): White Paper of the ACR TI-RADS Committee. J Am Coll Radiol 2017;14:587-95.
  5. Zhou J, Yin L, Wei X, Zhang S, Song Y, Luo B, et al. 2020 Chinese guidelines for ultrasound malignancy risk stratification of thyroid nodules: the C-TIRADS. Endocrine 2020;70:256-79.
  6. Russ G, Trimboli P, Buffet C. The New Era of TIRADSs to Stratify the Risk of Malignancy of Thyroid Nodules: Strengths, Weaknesses and Pitfalls. Cancers (Basel) 2021;13:4316.
  7. Fu C, Cui Y, Li J, Yu J, Wang Y, Si C, Cui K. Effect of the categorization method on the diagnostic performance of ultrasound risk stratification systems for thyroid nodules. Front Oncol 2023;13:1073891.
  8. Huh S, Yoon JH, Lee HS, Moon HJ, Park VY, Kwak JY. Comparison of diagnostic performance of the ACR and Kwak TIRADS applying the ACR TIRADS’ size thresholds for FNA. Eur Radiol 2021;31:5243-50.
  9. Huh S, Lee HS, Yoon J, Kim EK, Moon HJ, Yoon JH, Park VY, Kwak JY. Diagnostic performances and unnecessary US-FNA rates of various TIRADS after application of equal size thresholds. Sci Rep 2020;10:10632.
  10. Na DG, Paik W, Cha J, Gwon HY, Kim SY, Yoo RE. Diagnostic performance of the modified Korean Thyroid Imaging Reporting and Data System for thyroid malignancy according to nodule size: a comparison with five society guidelines. Ultrasonography 2021;40:474-85.
  11. Qi Q, Zhou A, Guo S, Huang X, Chen S, Li Y, Xu P. Explore the Diagnostic Efficiency of Chinese Thyroid Imaging Reporting and Data Systems by Comparing With the Other Four Systems (ACR TI-RADS, Kwak-TIRADS, KSThR-TIRADS, and EU-TIRADS): A Single-Center Study. Front Endocrinol (Lausanne) 2021;12:763897.
  12. Chen Q, Lin M, Wu S. Validating and Comparing C-TIRADS, K-TIRADS and ACR-TIRADS in Stratifying the Malignancy Risk of Thyroid Nodules. Front Endocrinol (Lausanne) 2022;13:899575.
  13. Dong W, Wu Y, Cai T, Wang X. Comparison of diagnostic performance and FNA management of the ACR-TIRADS and Chinese-TIRADS based on surgical histological evidence. Quant Imaging Med Surg 2023;13:1711-22.
  14. Gao L, Xi X, Jiang Y, Yang X, Wang Y, Zhu S, Lai X, Zhang X, Zhao R, Zhang B. Comparison among TIRADS (ACR TI-RADS and KWAK- TI-RADS) and 2015 ATA Guidelines in the diagnostic efficiency of thyroid nodules. Endocrine 2019;64:90-6.
  15. Mendes GF, Garcia MR, Falsarella PM, Rahal A, Cavalcante FA Junior, Nery DR, Garcia RG. Fine needle aspiration biopsy of thyroid nodule smaller than 1.0 cm: accuracy of TIRADS classification system in more than 1000 nodules. Br J Radiol 2018;91:20170642.
  16. Ha SM, Kim JK, Baek JH. Detection of Malignancy Among Suspicious Thyroid Nodules <1 cm on Ultrasound with Various Thyroid Image Reporting and Data Systems. Thyroid 2017;27:1307-15.
  17. Cai Y, Yang R, Yang S, Lu L, Ma R, Xiao Z, Lin N, Huang Y, Chen L. Comparison of the C-TIRADS, ACR-TIRADS, and ATA guidelines in malignancy risk stratification of thyroid nodules. Quant Imaging Med Surg 2023;13:4514-25.
  18. Al Afif A, Williams BA, Rigby MH, Bullock MJ, Taylor SM, Trites J, Hart RD. Multifocal Papillary Thyroid Cancer Increases the Risk of Central Lymph Node Metastasis. Thyroid 2015;25:1008-12.
  19. Kim H, Jung HJ, Lee SY, Kwon TK, Kim KH, Sung MW, Hun Hah J. Prognostic factors of locally invasive well-differentiated thyroid carcinoma involving the trachea. Eur Arch Otorhinolaryngol 2016;273:1919-26.
  20. Chen W, Lei J, You J, Lei Y, Li Z, Gong R, Tang H, Zhu J. Predictive factors and prognosis for recurrent laryngeal nerve invasion in papillary thyroid carcinoma. Onco Targets Ther 2017;10:4485-91.
  21. Genpeng L, Jianyong L, Jiaying Y, Ke J, Zhihui L, Rixiang G, Lihan Z, Jingqiang Z. Independent predictors and lymph node metastasis characteristics of multifocal papillary thyroid cancer. Medicine (Baltimore) 2018;97:e9619.
  22. Jiang J, Lu H. Immediate Surgery Might Be a Better Option for Subcapsular Thyroid Microcarcinomas. Int J Endocrinol 2019;2019:3619864.
Cite this article as: Wang JJ, Huang HQ, Zheng RT, Lin ZH, Wu MM, Cai SZ, Liao XY, Guo DM, Chen Z. Comparative study of different worldwide versions of the thyroid risk stratification system in patients with thyroid nodules in China. Quant Imaging Med Surg 2026;16(1):67. doi: 10.21037/qims-2025-527
