Establishing predictive models for malignant and inflammatory pulmonary nodules using clinical data and CT imaging features
Introduction
With the rise in global health awareness and the widespread adoption of early lung cancer screening, the detection of pulmonary nodules has become increasingly common (1-4). Lung cancer is the leading cause of cancer-related death worldwide (4) and is the most common cancer in China (5). In 2022, lung cancer accounted for 22.0% of all new malignant tumor cases and 28.5% of all deaths attributed to malignant tumors (5). Thus, the accurate qualitative diagnosis of pulmonary nodules is an important clinical issue that needs to be addressed (6,7). However, the pathological basis of pulmonary nodules is notably intricate and diverse (8), which adds to the diagnostic challenge they pose. Solid pulmonary nodules, in particular, are more difficult to diagnose than other nodule types (9). Timely detection, precise diagnosis, and prompt intervention are paramount in improving patient prognosis and the survival rate (10).
Computed tomography (CT) has emerged as the cornerstone imaging modality for lung nodule diagnosis (6,11-14) and plays an indispensable role in the assessment of lung nodules. Despite the use of established guidelines based on CT examinations, such as the Fleischner Society guidelines and the American College of Chest Physicians guidelines, existing models for predicting the benign and malignant nature of pulmonary nodules still lack optimal accuracy (15), potentially leading to delayed diagnosis and unnecessary invasive procedures (16).
Artificial intelligence (AI) has demonstrated promise in augmenting the interpretation of lung nodule malignancy. However, due to various limitations, such as those related to the development of universal AI algorithms, current AI models cannot be fully integrated into clinical practice, and their widespread clinical application is difficult (17). Standardized big-data studies on the application of AI for the adjunctive diagnosis of pulmonary nodules remain relatively scarce (11). Further, the accuracy of these models has yet to surpass that of manual interpretation (18-20). By contrast, traditional logistic regression models are both interpretable and easy to apply, which gives them greater value in clinical use.
Guidelines issued by professional societies provide clinicians and institutions with a framework for nodule management while allowing flexibility for individual patient decision making (12). Notably, the overwhelming majority (at least 95%) of pulmonary nodules identified are benign and commonly comprise granulomas or intrapulmonary lymph nodes (11). Research has established a positive correlation between nodule size on CT images and the likelihood of malignancy, such that nodule size serves as a primary determinant of nodule classification (21,22). Consequently, there is a pressing need to enhance the ability of physicians to differentiate between benign granulomas and malignant nodules, particularly in solid nodules.
The primary objective of this study was to establish a methodology that assists clinical diagnosticians in predicting the malignancy of solid nodules through the use of clinical imaging models. Specifically, we conducted comprehensive quantitative and qualitative analyses of nodule characteristics based on clinical imaging parameters. Our analyses involved stratifying nodules by size and establishing a simple and convenient clinical imaging model. We present this article in accordance with the TRIPOD reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2338/rc).
Methods
Study population
This multicenter retrospective study was conducted in accordance with the Declaration of Helsinki (as revised in 2013), and was approved by the respective Ethics Committees of Tongde Hospital of Zhejiang Province (No. 2022-029-JY), Anqing Municipal Hospital (No. 83230471), Taizhou Municipal Hospital (No. LWYJ2023059), and Shaoxing People’s Hospital (No. 2019-K-Y-257-01). The requirement of individual informed consent for this retrospective analysis was waived. The study focused on lung nodules detected incidentally on CT between 2014 and 2023. The dataset comprised the data of 5,201 patients who underwent surgical resection or percutaneous biopsy with confirmed pathologic results. The above-mentioned healthcare facilities maintain electronic health records and have standardized variable definitions for demographics, smoking history, surgical history, and tumor indicators. Patients were excluded from the study if they met any of the following exclusion criteria: (I) had ground-glass nodules on CT scans; (II) had a lesion diameter >3 cm; (III) had an interval of more than one month between the preoperative CT scan and surgery or biopsy; (IV) had benign tumors; (V) had metastatic tumors; and/or (VI) had blurry CT findings (Figure 1).
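The exclusion criteria listed above can be read as a simple eligibility filter. The following Python sketch is only an illustration: the `Candidate` record, its field names, the pathology labels, and the reading of "one month" as 30 days are all assumptions introduced here, not definitions taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one patient with a pathologically confirmed nodule."""
    ground_glass: bool          # ground-glass nodule on CT
    diameter_mm: float          # lesion diameter in millimetres
    days_ct_to_procedure: int   # interval between preoperative CT and the procedure
    pathology: str              # assumed labels: "malignant", "inflammatory", "benign_tumor", "metastatic"
    blurry_ct: bool             # CT findings too blurry to assess


def is_eligible(c: Candidate, max_interval_days: int = 30) -> bool:
    """Return True when none of exclusion criteria (I) to (VI) applies."""
    excluded = (
        c.ground_glass                                   # (I) ground-glass nodule
        or c.diameter_mm > 30                            # (II) diameter >3 cm
        or c.days_ct_to_procedure > max_interval_days    # (III) assumed 30-day month
        or c.pathology == "benign_tumor"                 # (IV) benign tumor
        or c.pathology == "metastatic"                   # (V) metastatic tumor
        or c.blurry_ct                                   # (VI) blurry CT findings
    )
    return not excluded


# Three made-up candidates: only the first passes every criterion.
cohort = [
    Candidate(False, 18.0, 12, "malignant", False),
    Candidate(True, 9.0, 5, "malignant", False),
    Candidate(False, 34.0, 20, "inflammatory", False),
]
eligible = [c for c in cohort if is_eligible(c)]
print(len(eligible))
```

Keeping each criterion on its own line makes it easy to count how many patients each rule removes, which is the information a flow chart such as Figure 1 typically reports.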

CT image acquisition
The chest CT scans were conducted using nine CT scanners (SOMATOM Definition Flash, FORCE CT, Sensation 16, and Definition AS 40, Siemens Medical, Forchheim, Germany; Revolution, Optima 680, and LightSpeed VCT XT, GE Medical, Waukesha, WI, USA; Brilliance CT 64, Philips Medical, Best, Netherlands; and Aquilion ONE TSX-301C, Canon Medical, Otawara, Japan). All patients were placed in the supine position with their arms raised and instructed to hold their breath at the end of a deep inhalation during scanning. The scan covered the area from the thoracic inlet to the level of the adrenal glands. The scanning parameters were set as follows: tube voltage: 120 kV; automatic tube current modulation technology; collimation: 0.6 mm or 0.625 mm (adjustable); field of view (FOV) range: 360 mm × 360 mm to 400 mm × 400 mm; matrix: 512×512 (fixed); and rotation time: 0.6–0.8 seconds per rotation.
Image analysis
Two radiologists, who had 18 and 15 years of diagnostic radiography experience, respectively, and who were blinded to the patients’ clinical information, independently evaluated the CT characteristics of the lung nodules. Consistent evaluations were accepted directly; in cases of disagreement, a third senior radiologist made the final evaluation. All features were processed using z-score normalization to ensure the comparability of the quantitative and qualitative data. Intraclass correlation coefficients (ICCs) and the kappa test were used to evaluate the interobserver stability and consistency of the continuous and categorical imaging features, respectively. After the observer analyses, a total of 15 imaging features with ICCs >0.75 (continuous variables) or kappa values >0.80 (categorical variables) were used in the subsequent modeling. All the CT images were displayed at standard lung [window width: 1,200–1,500 Hounsfield units (HU), window level: −600 to −400 HU] and mediastinal (window width: 400 HU, window level: 40 HU) window settings.
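The z-score normalization and the kappa agreement check mentioned above can be illustrated with a short Python sketch. It assumes the population standard deviation for the z-score and the unweighted Cohen's kappa for two raters; the study does not state which variants were used, and the ratings below are made up.

```python
from collections import Counter
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are identical")
    return [(v - mu) / sigma for v in values]


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / n**2
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Made-up presence (1) or absence (0) ratings of one sign in ten nodules.
reader_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
reader_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohen_kappa(reader_1, reader_2)
print(round(kappa, 3))
```

A feature would then be retained only when its agreement statistic clears the chosen threshold (kappa >0.80 for categorical features in this study).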
The following characteristics were assessed: (I) the long diameter and short diameter (SD) of the nodules; (II) the intrapulmonary conditions, including emphysema/bullae, heterogeneous ventilation or perfusion, bronchiectasis or interstitial lung disease, and multiple lung comorbidity (MLC); (III) nodule signs and surroundings, such as location, lobulation, spiculation, airspace, air bronchogram, pleural tag, calcification, pleural effusion, the rimmed sign, satellite nodules, the halo sign, the cutting sign, mixed signs, and multiple lesions; and (IV) the quantitative CT characteristics of nodules, such as the maximum, minimum, mean, and standard deviation of the CT values. The “rimmed sign” was defined as a ratio of more than 3, or a difference of more than 100 HU, between the mean CT value of the nodule edge and that of the nodule interior. “Multiple lesions” were defined as the presence of several nodules within the lungs, of which the largest nodule was selected for analysis. “Satellite nodules” were defined as small solid nodules surrounding a larger nodule (11). The “halo sign” was defined as ground-glass opacity surrounding a pulmonary nodule or mass on lung window settings (Figure 2). The “cutting sign” was defined as a straight lesion edge resembling a knife cut with a diameter of >5 mm (Figure 2).
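The rimmed sign definition is a two-part threshold rule, which a small Python sketch can make concrete. The function name and parameters are hypothetical, and the sketch assumes that the edge is the denser region (edge divided by interior, edge minus interior); the text does not state the direction explicitly. The ratio is only evaluated for a positive interior value, because a ratio of HU values is not meaningful when the denominator is zero or negative.

```python
def has_rimmed_sign(edge_mean_hu: float,
                    interior_mean_hu: float,
                    ratio_threshold: float = 3.0,
                    diff_threshold_hu: float = 100.0) -> bool:
    """Return True when the edge-to-interior contrast meets either threshold.

    Assumption: the sign is positive when the edge is denser than the interior.
    """
    difference_met = (edge_mean_hu - interior_mean_hu) > diff_threshold_hu
    ratio_met = (
        interior_mean_hu > 0
        and (edge_mean_hu / interior_mean_hu) > ratio_threshold
    )
    return difference_met or ratio_met


# Made-up mean CT values (HU) for the edge and interior of three nodules.
print(has_rimmed_sign(150.0, 30.0))  # difference of 120 HU exceeds 100 HU
print(has_rimmed_sign(40.0, 10.0))   # ratio of 4 exceeds 3
print(has_rimmed_sign(60.0, 50.0))   # neither threshold is met
```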

Descriptive statistics and statistical analysis
The patients were allocated to the following four groups based on nodule size: Group 1: nodules ≤10 mm; Group 2: nodules >10 and ≤20 mm; Group 3: nodules >20 and ≤30 mm; and Group 4: all nodules. The patient and nodule characteristics are presented as the frequency and percentage for the categorical variables and the mean ± standard deviation or median for the continuous variables. The independent sample t-test was used for continuous variables conforming to a normal distribution; otherwise, the Mann-Whitney U test was used. For the qualitative variables, Pearson’s χ2 test, Spearman’s χ2 test, and Fisher’s exact test were employed. A prediction model was built for each of the four groups using binary logistic regression. Receiver operating characteristic (ROC) curves were used to evaluate the binary classification ability of the prediction models, and the area under the curve (AUC) was used to quantify their classification performance. Differences in the AUC between models were tested for statistical significance using DeLong’s test. A P value <0.05 was considered statistically significant.
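The size stratification and the AUC can be sketched in a few lines of Python. The sketch assumes that grouping is based on the long diameter in millimetres, which the text does not state explicitly, and it computes the AUC through its rank interpretation (the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen inflammatory nodule) rather than through any particular statistics package. The scores are made up.

```python
def size_group(long_diameter_mm: float) -> int:
    """Map a nodule diameter to Group 1, 2, or 3 (Group 4 is the pooled cohort)."""
    if not 0 < long_diameter_mm <= 30:
        raise ValueError("only nodules up to 30 mm were analyzed")
    if long_diameter_mm <= 10:
        return 1
    if long_diameter_mm <= 20:
        return 2
    return 3


def auc(scores_malignant, scores_inflammatory):
    """Area under the ROC curve via pairwise comparison; ties count as one half."""
    wins = sum(
        (m > i) + 0.5 * (m == i)
        for m in scores_malignant
        for i in scores_inflammatory
    )
    return wins / (len(scores_malignant) * len(scores_inflammatory))


# Made-up predicted probabilities of malignancy.
malignant_scores = [0.9, 0.8, 0.4]
inflammatory_scores = [0.5, 0.3]
print(size_group(10.0), size_group(10.5), size_group(30.0))
print(auc(malignant_scores, inflammatory_scores))
```

The pairwise form is quadratic in the number of cases, which is acceptable for cohorts of this size; a rank-based implementation would be preferable for much larger datasets.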
Results
After the inclusion and exclusion criteria were applied, 4,253 of the 5,201 patients were excluded from the analysis, as illustrated in the flow chart presented in Figure 1. The remaining 948 patients (536 males), who presented with malignant or inflammatory solid pulmonary nodules, were enrolled in the study and categorized into the following two groups: malignant (n=638) and inflammatory (n=310). The mean age of the patients with malignant nodules was 64.3±9.8 years, and that of the patients with inflammatory nodules was 56.0±11.9 years. The malignant nodules were further categorized as follows: adenocarcinoma (n=503), squamous cell carcinoma (n=46), mucinous adenocarcinoma (n=35), adenosquamous carcinoma (n=28), and other types of malignant nodules (n=26). The inflammatory nodules included granuloma (n=126), tuberculosis (n=76), fungal infection (n=58), and other types of inflammatory nodules (n=50).
Demographic characteristics of patients with malignant and inflammatory lung nodules
The demographic characteristics of the patients included in the study are detailed in Table 1. A statistically significant difference was observed in the age distribution between the patients with malignant and inflammatory lung nodules. Specifically, in Group 1, the mean age of the patients with malignant nodules was significantly higher than that of those with inflammatory nodules (malignant: 62.0±8.8 years, inflammatory: 55.9±11.1 years) (P=0.001). Similar significant differences in age were also noted in Group 2 (malignant: 64.2±10.1 years, inflammatory: 55.5±12.3 years) and Group 4 (malignant: 64.3±9.8 years, inflammatory: 56.0±11.9 years).
Table 1
Variable | Group 1 Mal (n=42) | Group 1 Inf (n=109) | Group 1 P value (single) | Group 1 P value (multiple) | Group 2 Mal (n=297) | Group 2 Inf (n=133) | Group 2 P value (single) | Group 2 P value (multiple) | Group 3 Mal (n=299) | Group 3 Inf (n=68) | Group 3 P value (single) | Group 3 P value (multiple) | Group 4 Mal (n=638) | Group 4 Inf (n=310) | Group 4 P value (single) | Group 4 P value (multiple) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Gender | | | 0.231 | | | | 0.024 | 0.136 | | | 0.473 | | | | 0.101 | |
Male | 28 (66.7) | 61 (56.0) | | | 155 (52.2) | 85 (63.9) | | | 166 (55.5) | 41 (60.3) | | | 349 (54.7) | 187 (60.3) | | |
Female | 14 (33.3) | 48 (44.0) | | | 142 (47.8) | 48 (36.1) | | | 133 (44.5) | 27 (39.7) | | | 289 (45.3) | 123 (39.7) | | |
Age (years) | 62.0±8.8 | 55.9±11.1 | 0.001 | 0.004 | 64.2±10.1 | 55.5±12.3 | <0.001 | <0.001 | 64.8±9.6 | 57.0±12.6 | <0.001 | 0.246 | 64.3±9.8 | 56.0±11.9 | <0.001 | <0.001 |
Smoke | | | 0.308 | | | | 0.298 | | | | 0.285 | | | | 0.135 | |
Never | 32 (76.2) | 94 (86.2) | | | 237 (79.8) | 106 (79.7) | | | 230 (76.9) | 58 (85.3) | | | 499 (78.2) | 258 (83.2) | | |
Current | 7 (16.7) | 9 (8.3) | | | 39 (13.1) | 22 (16.5) | | | 46 (15.4) | 7 (10.3) | | | 92 (14.4) | 38 (12.3) | | |
Former | 3 (7.1) | 6 (5.5) | | | 21 (7.1) | 5 (3.8) | | | 23 (7.7) | 3 (4.4) | | | 47 (7.4) | 14 (4.5) | | |
Surgical history | | | <0.001 | 0.003 | | | 0.639 | | | | 0.814 | | | | 0.349 | |
No | 29 (69.0) | 103 (94.5) | | | 270 (90.9) | 119 (89.5) | | | 271 (90.6) | 61 (89.7) | | | 570 (89.3) | 283 (91.3) | | |
Yes | 13 (31.0) | 6 (5.5) | | | 27 (9.1) | 14 (10.5) | | | 28 (9.4) | 7 (10.3) | | | 68 (10.7) | 27 (8.7) | | |
Tumor indicator | | | 0.048 | 0.272 | | | 0.075 | | | | 0.024 | 0.397 | | | <0.001 | 0.678 |
Normal | 28 (66.7) | 89 (81.7) | | | 215 (72.4) | 107 (80.5) | | | 195 (65.2) | 54 (79.4) | | | 438 (68.7) | 250 (80.6) | | |
Abnormal | 14 (33.3) | 20 (18.3) | | | 82 (27.6) | 26 (19.5) | | | 104 (34.8) | 14 (20.6) | | | 200 (31.3) | 60 (19.4) | | |
Data are presented as the mean ± standard deviation for the continuous variables (compared using the independent sample t-test) or as n (%) for the qualitative variables (compared using Pearson’s χ2 test, Spearman’s χ2 test, or Fisher’s exact test). Group 1: nodules ≤10 mm; Group 2: nodules >10 and ≤20 mm; Group 3: nodules >20 and ≤30 mm; and Group 4: all nodules. Mal, malignant; Inf, inflammatory.
CT qualitative features of patients with malignant and inflammatory lung nodules
The analysis of the qualitative CT features revealed notable differences between the malignant and inflammatory lung nodules across the groups. Notably, lobulation was significantly more frequent in the malignant nodules than the inflammatory nodules in all groups (P<0.001). In Group 2, the inflammatory nodules were more prevalent in the right lower lobe (P<0.05). Additionally, calcification was more commonly observed in the inflammatory nodules in Groups 2 and 4 (P<0.001). Satellite nodules were also more frequently observed in the inflammatory nodules than the malignant nodules in Groups 2, 3, and 4 (P<0.001). The halo sign and cutting sign were more common in the inflammatory nodules in Groups 3 and 4 (P<0.05). Similarly, mixed signs were less prevalent in the malignant nodules in Groups 2 and 4 (P<0.001). No statistically significant differences were observed in the remaining qualitative CT characteristics between the malignant and inflammatory lung nodules (Table 2).
Table 2
Variable | Group 1 Mal (n=42) | Group 1 Inf (n=109) | Group 1 P value (single) | Group 1 P value (multiple) | Group 2 Mal (n=297) | Group 2 Inf (n=133) | Group 2 P value (single) | Group 2 P value (multiple) | Group 3 Mal (n=299) | Group 3 Inf (n=68) | Group 3 P value (single) | Group 3 P value (multiple) | Group 4 Mal (n=638) | Group 4 Inf (n=310) | Group 4 P value (single) | Group 4 P value (multiple) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Emphysema/bullae | 9 (21.4) | 16 (14.7) | 0.317 | | 77 (25.9) | 23 (17.3) | 0.050 | | 84 (28.1) | 4 (5.9) | <0.001 | 0.004 | 170 (26.6) | 43 (13.9) | <0.001 | 0.242 |
HVP | 5 (11.9) | 19 (17.4) | 0.405 | | 43 (14.5) | 15 (11.3) | 0.369 | | 40 (13.4) | 9 (13.2) | 0.975 | | 88 (13.8) | 43 (13.9) | 0.974 | |
Bronchiectasis or ILD | 3 (7.1) | 0 | 0.005 | 0.999 | 11 (3.7) | 4 (3.0) | 0.937 | | 20 (6.7) | 2 (2.9) | 0.372 | | 34 (5.3) | 6 (1.9) | 0.015 | 0.596 |
Multiple lung comorbidity | 1 (2.4) | 2 (1.8) | 0.830 | | 15 (5.1) | 6 (4.5) | 0.810 | | 19 (6.4) | 0 | 0.067 | | 35 (5.5) | 8 (2.6) | 0.044 | 0.028 |
Location | | | 0.869 | | | | 0.014 | 0.034 | | | 0.056 | | | | 0.076 | |
Right upper lobe | 16 (38.1) | 32 (29.4) | | | 75 (25.3) | 39 (29.3) | | | 78 (26.1) | 18 (26.5) | | | 169 (26.5) | 89 (28.7) | | |
Right middle lobe | 5 (11.9) | 17 (15.6) | | | 23 (7.7) | 3 (2.3) | | | 23 (7.7) | 9 (13.2) | | | 51 (8.0) | 29 (9.4) | | |
Right lower lobe | 7 (16.7) | 18 (16.5) | | | 58 (19.5) | 41 (30.8) | | | 65 (21.7) | 9 (13.2) | | | 130 (20.4) | 68 (21.9) | | |
Left upper lobe | 6 (14.3) | 17 (15.6) | | | 80 (26.9) | 26 (19.5) | | | 83 (27.8) | 13 (19.1) | | | 169 (26.5) | 56 (18.1) | | |
Left lower lobe | 8 (19.0) | 25 (22.9) | | | 61 (20.5) | 24 (18.0) | | | 50 (16.7) | 19 (27.9) | | | 119 (18.7) | 68 (21.9) | | |
Lobulation | 19 (45.2) | 11 (10.1) | <0.001 | <0.001 | 233 (78.5) | 48 (36.1) | <0.001 | <0.001 | 259 (86.6) | 37 (54.4) | <0.001 | <0.001 | 511 (80.1) | 96 (31.0) | <0.001 | <0.001 |
Spiculation | 12 (28.6) | 22 (20.2) | 0.269 | | 124 (41.8) | 22 (16.5) | <0.001 | 0.361 | 208 (69.6) | 28 (41.2) | <0.001 | 0.048 | 397 (62.2) | 106 (34.2) | <0.001 | 0.340 |
Airspace | 6 (14.3) | 5 (4.6) | 0.088 | | 50 (16.8) | 12 (9.0) | 0.033 | 0.172 | 61 (20.4) | 7 (10.3) | 0.053 | | 117 (18.3) | 24 (7.7) | <0.001 | 0.021 |
Air bronchogram | 4 (9.5) | 3 (2.8) | 0.180 | | 51 (17.2) | 15 (11.3) | 0.117 | | 65 (21.7) | 20 (29.4) | 0.176 | | 120 (18.8) | 38 (12.3) | 0.011 | 0.197 |
Pleural tag | 22 (52.4) | 68 (62.4) | 0.262 | | 228 (76.8) | 106 (79.7) | 0.500 | | 271 (90.6) | 60 (88.2) | 0.548 | | 521 (81.7) | 234 (75.5) | 0.027 | 0.292 |
Calcification | 1 (2.4) | 10 (9.2) | 0.276 | | 6 (2.0) | 16 (12.0) | <0.001 | <0.001 | 15 (5.0) | 8 (11.8) | 0.073 | | 22 (3.4) | 34 (11.0) | <0.001 | <0.001 |
Pleural effusion | 1 (2.4) | 0 | 0.108 | | 3 (1.0) | 0 | 0.135 | | 9 (3.0) | 0 | 0.310 | | 13 (2.0) | 0 | 0.026 | 0.998 |
Rimmed sign | 15 (35.7) | 18 (16.5) | 0.011 | 0.140 | 97 (32.7) | 31 (23.3) | 0.050 | | 68 (22.7) | 14 (20.6) | 0.700 | | 180 (28.2) | 63 (20.3) | 0.009 | 0.020 |
Satellite nodules | 0 | 5 (4.6) | 0.366 | | 4 (1.3) | 27 (20.3) | <0.001 | <0.001 | 5 (1.7) | 26 (38.2) | <0.001 | <0.001 | 9 (1.4) | 58 (18.7) | <0.001 | <0.001 |
Halo sign | 1 (2.4) | 9 (8.3) | 0.349 | | 0 | 9 (6.8) | <0.001 | 0.998 | 4 (1.3) | 10 (14.7) | <0.001 | <0.001 | 5 (0.8) | 28 (9.0) | <0.001 | <0.001 |
Cutting sign | 0 | 3 (2.8) | 0.279 | | 3 (1.0) | 3 (2.3) | 0.567 | | 1 (0.3) | 3 (4.4) | 0.013 | 0.018 | 4 (0.6) | 9 (2.9) | 0.011 | 0.028 |
Mixed signs | 0 | 3 (2.8) | 0.279 | | 1 (0.3) | 14 (10.5) | <0.001 | 0.009 | 2 (0.7) | 11 (16.2) | <0.001 | 0.619 | 3 (0.5) | 28 (9.0) | <0.001 | <0.001 |
Multiple lesions | 0 | 16 (14.7) | 0.020 | 0.998 | 7 (2.4) | 28 (21.1) | <0.001 | <0.001 | 3 (1.0) | 19 (27.9) | <0.001 | <0.001 | 10 (1.6) | 63 (20.3) | <0.001 | <0.001 |
Data are presented as the number of lesions. Data in parentheses are the percentage. Group 1: nodules ≤10 mm; Group 2: nodules >10 and ≤20 mm; Group 3: nodules >20 and ≤30 mm; and Group 4: all nodules. CT, computed tomography; Mal, malignant; Inf, inflammatory; HVP, heterogeneous ventilation or perfusion; ILD, interstitial lung disease.
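As an illustration of how a between-group comparison of a qualitative feature can be computed from such counts, the following Python sketch applies Pearson's χ2 test (without continuity correction) to a 2×2 table, using the rimmed sign counts for Group 1 (15 of 42 malignant and 18 of 109 inflammatory nodules). The function name is hypothetical, and whether the authors used this exact variant for this row is an assumption. For one degree of freedom the P value follows from the complementary error function, so no external package is needed.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square statistic and P value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are sign present and sign absent.
    No continuity correction is applied (an assumption about the method).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, chi2 is a squared standard normal variable.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rimmed sign, Group 1: malignant 15 present / 27 absent, inflammatory 18 / 91.
chi2, p_value = pearson_chi2_2x2(15, 42 - 15, 18, 109 - 18)
print(round(chi2, 2), round(p_value, 3))
```

Under these assumptions the P value comes out at about 0.01, in line with the tabulated single-factor value of 0.011 for this feature. For tables with very small expected counts, Fisher's exact test would be the more appropriate choice.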
CT quantitative features of patients with malignant and inflammatory lung nodules
Significant differences were observed in the quantitative CT features between the malignant and inflammatory lung nodules, particularly in Groups 2 and 4. In these groups, the inflammatory nodules had shorter short diameters (SDs) than the malignant nodules (P<0.001). Further, in Group 3, the minimum CT value was lower in the inflammatory nodules (−101.3±104.6 HU) than the malignant nodules (−65.6±76.9 HU), and this difference was statistically significant (P<0.001) (Table 3).
Table 3
Variable | Group 1 Mal (n=42) | Group 1 Inf (n=109) | Group 1 P value (single) | Group 1 P value (multiple) | Group 2 Mal (n=297) | Group 2 Inf (n=133) | Group 2 P value (single) | Group 2 P value (multiple) | Group 3 Mal (n=299) | Group 3 Inf (n=68) | Group 3 P value (single) | Group 3 P value (multiple) | Group 4 Mal (n=638) | Group 4 Inf (n=310) | Group 4 P value (single) | Group 4 P value (multiple) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LD (mm) | 8.8±1.5 | 7.7±1.7 | <0.001 | 0.807 | 15.8±2.9 | 14.9±2.9 | 0.002 | 0.098 | 25.3±2.9 | 25.0±2.7 | 0.439 | | 19.8±6.1 | 14.6±6.8 | <0.001 | 0.604 |
SD (mm) | 7.2±1.8 | 6.1±1.8 | 0.001 | 0.087 | 12.5±3.0 | 11.2±2.8 | <0.001 | <0.001 | 19.4±3.6 | 18.6±4.5 | 0.114 | | 15.4±5.2 | 11.1±5.5 | <0.001 | <0.001 |
CTmax value (HU) | 219.2±93.0 | 167.8±99.6 | 0.004 | 0.118 | 195.0±86.1 | 167.0±84.9 | 0.002 | 0.202 | 181.0±87.0 | 190.8±98.2 | 0.412 | | 190.0±87.5 | 172.5±94.5 | 0.006 | 0.175 |
CTmin value (HU) | −9.8±89.8 | −19.0±72.0 | 0.513 | | −61.0±82.6 | −48.0±66.7 | 0.111 | | −65.6±76.9 | −101.3±104.6 | 0.009 | <0.001 | −59.8±81.5 | −49.5±83.6 | 0.071 | |
CTmean value (HU) | 94.3±64.9 | 71.5±47.3 | 0.042 | 0.904 | 64.6±44.9 | 57.4±39.5 | 0.110 | | 55.2±31.0 | 45.6±24.2 | 0.006 | 0.673 | 62.1±41.8 | 59.8±40.9 | 0.408 | |
CTsd value (HU) | 69.0±34.5 | 61.5±39.8 | 0.289 | | 67.6±34.3 | 58.5±34.0 | 0.011 | 0.885 | 58.5±33.5 | 67.9±40.8 | 0.079 | | 63.4±34.2 | 61.6±37.7 | 0.466 | |
Data are presented as mean ± standard deviation, and the statistical values are the independent sample t-test results. Group 1: nodules ≤10 mm; Group 2: nodules >10 and ≤20 mm; Group 3: nodules >20 and ≤30 mm; and Group 4: all nodules. CT, computed tomography; CTmax, CT density maximum; CTmin, CT density minimum; CTmean, CT density mean; CTsd, CT density standard deviation; inf, inflammatory; LD, long diameter; mal, malignant; SD, short diameter.
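A two-sample comparison of a quantitative feature can be reproduced approximately from the summary statistics alone. The Python sketch below computes a t statistic and degrees of freedom for the Group 3 minimum CT value (malignant: −65.6±76.9 HU, n=299; inflammatory: −101.3±104.6 HU, n=68). It uses Welch's unequal-variance form, which is an assumption made here; the article states only that the independent sample t-test was used. The function name is hypothetical.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Inputs are group means, sample standard deviations, and sample sizes.
    """
    var_of_mean1 = sd1 ** 2 / n1
    var_of_mean2 = sd2 ** 2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(var_of_mean1 + var_of_mean2)
    dof = (var_of_mean1 + var_of_mean2) ** 2 / (
        var_of_mean1 ** 2 / (n1 - 1) + var_of_mean2 ** 2 / (n2 - 1)
    )
    return t_stat, dof


# Group 3 minimum CT value: malignant versus inflammatory nodules.
t_stat, dof = welch_t_from_summary(-65.6, 76.9, 299, -101.3, 104.6, 68)
print(round(t_stat, 2), round(dof, 1))
```

Converting the statistic to a P value requires the t distribution, which is not in the Python standard library; a statistics package would normally be used for that step.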
Feature selection and model construction
To construct each model, independent risk factors were screened from 29 features using binary logistic regression. Ultimately, four models were constructed from different combinations of two clinical features (age and surgical history) and 15 imaging features (lobulation, multiple lesions, satellite lesions, the halo sign, calcification, mixed signs, the cutting sign, airspace, the rimmed sign, MLC, SD, location, spiculation, the minimum CT value, and the standard deviation of the CT values), and all four models had AUC values >0.86.
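For readers less familiar with binary logistic regression, the Python sketch below shows how a fitted model of this kind turns feature values into a predicted probability of malignancy. The intercept and coefficients are hypothetical placeholders chosen only for illustration; they are not the coefficients estimated in this study, and the three features shown merely mirror the composition of Model 1.

```python
import math


def predicted_probability(intercept: float, coefficients: dict, features: dict) -> float:
    """Logistic model: probability = 1 / (1 + exp(-(intercept + sum(beta * x))))."""
    linear_predictor = intercept + sum(
        beta * features[name] for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients (NOT taken from the article).
hypothetical_intercept = -4.0
hypothetical_coefficients = {
    "age_years": 0.05,         # per year of age
    "surgical_history": 1.5,   # 1 = yes, 0 = no
    "lobulation": 2.0,         # 1 = present, 0 = absent
}

# A made-up patient: 62 years old, no surgical history, lobulated nodule.
patient = {"age_years": 62, "surgical_history": 0, "lobulation": 1}
probability = predicted_probability(
    hypothetical_intercept, hypothetical_coefficients, patient
)
print(round(probability, 3))
```

Because each coefficient acts on the log-odds scale, its exponential can be read as an odds ratio, which is one reason such models are regarded as interpretable.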
Models 1–4 were based on Groups 1–4, respectively. Model 1 incorporated only three features, comprising two clinical features (age and surgical history) and one imaging feature (lobulation), and had an AUC value of 0.861 [95% confidence interval (CI): 0.803–0.921]. Model 2, which was used to identify malignancy in nodules measuring 1–2 cm, included 8 features (lobulation, age, multiple lesions, satellite lesions, calcification, mixed signs, SD, and location), and had an AUC value of 0.902 (95% CI: 0.873–0.931), a sensitivity of 0.747, a specificity of 0.880, and an accuracy of 0.828. Model 3, which was used to identify malignancy in nodules measuring 2–3 cm, included 8 features (lobulation, multiple lesions, satellite lesions, the halo sign, the cutting sign, airspace, spiculation, and the minimum CT value), and had an AUC value of 0.943 (95% CI: 0.914–0.972), a sensitivity of 0.873, a specificity of 0.897, and an accuracy of 0.905. Model 4, which was used to predict malignancy in all nodules, included one clinical feature (age) and 11 imaging features (lobulation, multiple lesions, satellite lesions, the halo sign, calcification, mixed signs, the cutting sign, airspace, the rimmed sign, MLC, and the standard deviation of the CT values), and had an AUC of 0.921 (95% CI: 0.903–0.940), a sensitivity of 0.831, a specificity of 0.868, and an accuracy of 0.847 (Table 4 and Figure 3). A confusion matrix was used to show the diagnostic efficiency of the different models, and the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of the models were calculated (Figure 4). The weight ratio of each feature in predicting inflammatory and malignant nodules in the models was calculated (Figure 4). Notably, the model features of lobulation, age, multiple lesions, and satellite lesions had greater weighting in the models (Figure 5).
Table 4
Model type | AUC (95% CI) | Standard error | Accuracy (%) | Sensitivity (%) | Specificity (%) | Z value | P value |
---|---|---|---|---|---|---|---|
Model 1 | 0.861 (0.803–0.921) | 0.030 | 73.5 | 81.0 | 78.9 | −1.916 | 0.055 |
Model 2 | 0.902 (0.873–0.931) | 0.015 | 82.8 | 74.7 | 88.0 | −1.086 | 0.277 |
Model 3 | 0.943 (0.914–0.972) | 0.015 | 90.5 | 87.3 | 89.7 | 1.258 | 0.209 |
Model 4 | 0.921 (0.903–0.940) | 0.009 | 84.7 | 83.1 | 86.8 | NA | NA |
Model 1: two clinical features (age and surgical history) and one imaging feature (lobulation). Model 2: 8 features (lobulation, age, multiple lesions, satellite lesions, calcification, mixed signs, short diameter, and location); Model 3: 8 features (lobulation, multiple lesions, satellite lesions, the halo sign, the cutting sign, airspace, spiculation, and the minimum CT value); Model 4: one clinical feature (age) and 11 imaging features (lobulation, multiple lesions, satellite lesions, the halo sign, calcification, mixed signs, the cutting sign, airspace, the rimmed sign, MLC, and the standard deviations of the CT values). AUC, area under the curve; CI, confidence interval; CT, computed tomography; MLC, multiple lung comorbidity; NA, not applicable.
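The diagnostic indices reported for the models are all derived from the four cells of a confusion matrix. The Python sketch below shows the standard definitions, treating malignant as the positive class; the counts are made up for illustration and do not correspond to any model in Table 4.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard indices from a 2x2 confusion matrix (positive class = malignant)."""
    return {
        "sensitivity": tp / (tp + fn),              # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),              # inflammatory nodules correctly cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
    }


# Made-up counts: 100 malignant and 50 inflammatory nodules.
metrics = diagnostic_metrics(tp=80, fn=20, tn=45, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, so it always lies between the two, whereas the predictive values also depend on the proportion of malignant nodules in the cohort.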



Discussion
This study found that all four models had high efficacy in differentiating between inflammatory and malignant lung nodules (all models had AUC values >0.86). Specifically, Model 3, which included eight features for differentiation and which assigned greater weight to the four features of lobulation, age, the presence of multiple lesions, and the presence of satellite lesions, exhibited superior discriminative ability, particularly in nodules ranging from 2 to 3 cm. This model enables rapid and efficient differential diagnosis using fewer imaging features, thereby enhancing diagnostic efficiency. Thus, in this study, we established a novel clinical imaging model characterized by simplicity and clarity that can provide diagnosticians with a reliable basis for diagnosis.
We identified 17 features for the differential diagnosis of lung nodules. These features are consistent with factors such as age, spiculation, and calcification reported in previous studies by Yi et al., Wu et al., and Tan et al. (9,23,24). Further, we found that smoking status and gender were not associated with the diagnosis of inflammatory and malignant lung nodules (25). Smoking history was not included as a characteristic in our models due to the low percentage of squamous carcinomas in the patient cohort (46/638) and the weaker association between smoking history and adenocarcinomas and other malignant tumors relative to squamous carcinomas. Using features such as spiculation and airspace, our models were able to effectively differentiate between solid lung adenocarcinomas and tuberculous granulomatous nodules. Additionally, consistent with previous findings (24), we found that satellite nodules were a characteristic manifestation of inflammatory granulomas. Moreover, our study reaffirmed the importance of nodule size and nodule growth rate as crucial predictors of malignancy (6). Nodule diameter was more effective at distinguishing between inflammatory and malignant nodules when the nodules were small, and the differences in diameter might be caused by differing growth patterns.
Model 1 only included three features (age, surgical history, and lobulation). For nodules <1 cm, lobulation served as the primary identifier between inflammatory and malignant nodules, which is consistent with Chen et al.’s findings for patients with solitary sub-centimeter solid nodules (26). As lesions decrease in size, distinguishing between the two types of lesions becomes more challenging because less information is available. We found that lobulation and satellite lesions served as common differentiating features in Models 2 and 3. In nodules of 1–2 cm, malignant nodules progress faster than inflammatory nodules, resulting in more pronounced lobulation, whereas satellite lesions occur more often in inflammatory nodules. These two imaging features had significant value in differentiating between malignant and inflammatory nodules. Model 4, which included 12 features, was developed to predict malignancy in lung nodules measuring <3 cm, and had an AUC value of 0.921. Despite its broader diagnostic scope, Model 4 did not significantly outperform the other models (Table 4). Lobulation was consistently effective across all models in discriminating between inflammatory and malignant nodules. Unlike inflammatory nodules, malignant nodules grow unevenly in all directions. All models showed strong performance, achieving AUC values >0.86. Model 3 had the highest AUC value. Model 4 was able to effectively differentiate between the inflammatory and malignant nodules across all sizes; however, the simplicity of the other models facilitated the rapid identification of inflammatory and malignant lung nodules, enabling the accurate diagnosis of patients with potential malignant lesions.
The Mayo Clinic Model (MCM) holds considerable significance in clinical settings for assessing the lung cancer risk associated with pulmonary nodules. In one training set, the AUC of the Mayo model was 0.8328 (27). The MCM enjoys widespread international recognition and is extensively cited in the literature. Gould et al. established a clinical model of 375 confirmed solid pulmonary nodules, which had good accuracy and an AUC of the ROC curve of 0.79 (95% CI: 0.74 to 0.84) (28). We constructed Model 4 to predict malignant nodules, and it had an AUC of 0.921, and thus surpassed the performance of the aforementioned models. By incorporating additional valuable imaging features, we enhanced the accuracy of nodule characterization. Our models could provide clinicians and radiologists with essential imaging evidence that could aid in diagnostic decision-making.
AI is advancing rapidly, especially in the realm of lung nodule detection. However, extensive external validation is necessary to ensure models can accurately distinguish between benign and malignant nodules (29). Many AI systems are not yet available commercially for clinical use (2); thus, diagnosticians remain pivotal in nodule characterization. Some studies have indicated that validated risk calculator scores (ranging from 0.70 to 0.80) are effective at identifying malignant lung nodules (30-32). In our study, the AUC of all models exceeded 0.86, demonstrating strong model performance. Our models exhibited superior discriminatory power in this regard.
Our study had a number of advantages. First, all the patients included in the study underwent either surgical resection or percutaneous biopsy and had a clear pathological diagnosis. Second, we developed four new models using patients from multiple centers, an approach rarely employed in other similar studies. Third, modeling by nodule size group meets the needs of precision medicine and could better guide clinical diagnosis and differentiation. Our models could reduce the misdiagnosis of malignant nodules, and the relevant clinical and imaging data are easily available. We presented the risk coefficients of the independent risk factors in different models in the form of scale plots (Figure 4). The main limitations of this study include insufficient data and a lack of a proper external validation cohort. Despite the standardization of features, our images were obtained from different hospitals using different equipment; therefore, there are still certain differences in the qualitative and quantitative CT features. These limitations indicate areas for future research. In this study, nodule size was classified and modeled; in a subsequent study, nodule growth will be analyzed. We also intend to employ CT imaging deep-learning technology to determine the nature of pulmonary nodules. By effectively combining these aspects, we aim to conduct valuable research into the qualitative diagnosis of lung nodules.
Conclusions
The lung nodule grouping models based on clinical data and chest CT features were better able to directly determine whether the solid pulmonary nodules were inflammatory or malignant. Such grouping models could enable convenient and rapid diagnosis. In the future, we intend to explore the whole process related to the scientific management of lung nodules to achieve individualized precision medicine for lung nodules.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-2338/rc
Funding: This study was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2338/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This retrospective study was conducted in accordance with the Declaration of Helsinki (as revised in 2013) and was approved by the respective Ethics Committees of Tongde Hospital of Zhejiang Province (No. 2022-029-JY), Anqing Municipal Hospital (No. 83230471), Taizhou Municipal Hospital (No. LWYJ2023059), and Shaoxing People’s Hospital (No. 2019-K-Y-257-01). The requirement of individual informed consent for this retrospective analysis was waived.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Adams SJ, Stone E, Baldwin DR, Vliegenthart R, Lee P, Fintelmann FJ. Lung cancer screening. Lancet 2023;401:390-408. [Crossref] [PubMed]
- Wolf AMD, Oeffinger KC, Shih TY, Walter LC, Church TR, Fontham ETH, Elkin EB, Etzioni RD, Guerra CE, Perkins RB, Kondo KK, Kratzer TB, Manassaram-Baptiste D, Dahut WL, Smith RA. Screening for lung cancer: 2023 guideline update from the American Cancer Society. CA Cancer J Clin 2024;74:50-81. [Crossref] [PubMed]
- Yang Y, Liu J, Sun C, Shi Y, Hsing JC, Kamya A, Keller CA, Antil N, Rubin D, Wang H, Ying H, Zhao X, Wu YH, Nguyen M, Lu Y, Yang F, Huang P, Hsing AW, Wu J, Zhu S. Nonalcoholic fatty liver disease (NAFLD) detection and deep learning in a Chinese community-based population. Eur Radiol 2023;33:5894-906. [Crossref] [PubMed]
- Schwartz SM. Epidemiology of Cancer. Clin Chem 2024;70:140-9. [Crossref] [PubMed]
- Han B, Zheng R, Zeng H, Wang S, Sun K, Chen R, Li L, Wei W, He J. Cancer incidence and mortality in China, 2022. J Natl Cancer Cent 2024;4:47-53. [Crossref] [PubMed]
- Yao Y, Wang X, Guan J, Xie C, Zhang H, Yang J, Luo Y, Chen L, Zhao M, Huo B, Yu T, Lu W, Liu Q, Du H, Liu Y, Huang P, Luan T, Liu W, Hu Y. Metabolomic differentiation of benign vs malignant pulmonary nodules with high specificity via high-resolution mass spectrometry analysis of patient sera. Nat Commun 2023;14:2339. [Crossref] [PubMed]
- Warkentin MT, Al-Sawaihey H, Lam S, Liu G, Diergaarde B, Yuan JM. 2022.
- Li Y, Wang X, Zhang J, Zhang S, Jiao J. Applications of artificial intelligence (AI) in researches on non-alcoholic fatty liver disease (NAFLD): A systematic review. Rev Endocr Metab Disord 2022;23:387-400. [Crossref] [PubMed]
- Yi L, Peng Z, Chen Z, Tao Y, Lin Z, He A, Jin M, Peng Y, Zhong Y, Yan H, Zuo M. Identification of pulmonary adenocarcinoma and benign lesions in isolated solid lung nodules based on a nomogram of intranodal and perinodal CT radiomic features. Front Oncol 2022;12:924055. [Crossref] [PubMed]
- McLoud TC, Little BP. Thoracic Radiology: Recent Developments and Future Trends. Radiology 2023;306:e223121. [Crossref] [PubMed]
- Mazzone PJ, Lam L. Evaluating the Patient With a Pulmonary Nodule: A Review. JAMA 2022;327:264-73. [Crossref] [PubMed]
- Huang YS, Wang TC, Huang SZ, Zhang J, Chen HM, Chang YC, Chang RF. An improved 3-D attention CNN with hybrid loss and feature fusion for pulmonary nodule classification. Comput Methods Programs Biomed 2023;229:107278. [Crossref] [PubMed]
- Venkadesh KV, Aleef TA, Scholten ET, Saghir Z, Silva M, Sverzellati N, Pastorino U, van Ginneken B, Prokop M, Jacobs C. Prior CT Improves Deep Learning for Malignancy Risk Estimation of Screening-detected Pulmonary Nodules. Radiology 2023;308:e223308. [Crossref] [PubMed]
- Jirapatnakul A, Yip R, Myers KJ, Cai S, Henschke CI, Yankelevitz D. Assessing the impact of nodule features and software algorithm on pulmonary nodule measurement uncertainty for nodules sized 20 mm or less. Quant Imaging Med Surg 2024;14:5057-71. [Crossref] [PubMed]
- Peng M. Classification of pulmonary nodules in the era of precision medicine. Lancet Digit Health 2023;5:e633-4. [Crossref] [PubMed]
- Kammer MN, Mahapatra S, Paez R, Chen H, Kaizer A, Deppen S, et al. EP01.05-009 Simulation-Based Sample Size Estimation for an Early Detection of Lung Cancer Clinical Utility Trial in Indeterminate Pulmonary Nodules. J Thorac Oncol 2022;17:S185-S6. [Crossref]
- Zhang R, Hong M, Cai H, Liang Y, Chen X, Liu Z, Wu M, Zhou C, Bao C, Wang H, Yang S, Hu Q. Predicting the pathological invasiveness in patients with a solitary pulmonary nodule via Shapley additive explanations interpretation of a tree-based machine learning radiomics model: a multicenter study. Quant Imaging Med Surg 2023;13:7828-41. [Crossref] [PubMed]
- Silva M, Schaefer-Prokop CM, Jacobs C, Capretti G, Ciompi F, van Ginneken B, Pastorino U, Sverzellati N. Detection of Subsolid Nodules in Lung Cancer Screening: Complementary Sensitivity of Visual Reading and Computer-Aided Diagnosis. Invest Radiol 2018;53:441-9. [Crossref] [PubMed]
- Armato SG 3rd, Drukker K, Li F, Hadjiiski L, Tourassi GD, Engelmann RM, Giger ML, Redmond G, Farahani K, Kirby JS, Clarke LP. LUNGx Challenge for computerized lung nodule classification. J Med Imaging (Bellingham) 2016;3:044506. [Crossref] [PubMed]
- Lancaster H, Zheng S, Aleshina O, Yu D, Chernina V, Heuvelmans M, et al. Inter-Reader Agreement When Using Artificial Intelligence for Classification of Solid Pulmonary Nodules in Ultra-Low Dose CT Baseline Lung Cancer Screening. Chest 2022;161:A565. [Crossref]
- McWilliams A, Tammemagi MC, Mayo JR, Roberts H, Liu G, Soghrati K, et al. Probability of cancer in pulmonary nodules detected on first screening CT. N Engl J Med 2013;369:910-9. [Crossref] [PubMed]
- Takahashi Y, Dungubat E, Kusano H, Fukusato T. Artificial intelligence and deep learning: New tools for histopathological diagnosis of nonalcoholic fatty liver disease/nonalcoholic steatohepatitis. Comput Struct Biotechnol J 2023;21:2495-501. [Crossref] [PubMed]
- Wu Z, Huang T, Zhang S, Cheng D, Li W, Chen B. A prediction model to evaluate the pretest risk of malignancy in solitary pulmonary nodules: evidence from a large Chinese southwestern population. J Cancer Res Clin Oncol 2021;147:275-85. [Crossref] [PubMed]
- Tan H, Wang Y, Jiang Y, Li H, You T, Fu T, Peng J, Tan Y, Lu R, Peng B, Huang W, Xiong F. A study on the differential of solid lung adenocarcinoma and tuberculous granuloma nodules in CT images by Radiomics machine learning. Sci Rep 2023;13:5853. [Crossref] [PubMed]
- Li L, Ye Z, Yang S, Yang H, Jin J, Zhu Y, Tao J, Chen S, Xu J, Liu Y, Liang W, Wang B, Yang M, Huang Q, Chen Z, Li W, Fan JB, Liu D. Diagnosis of pulmonary nodules by DNA methylation analysis in bronchoalveolar lavage fluids. Clin Epigenetics 2021;13:185. [Crossref] [PubMed]
- Chen X, Feng B, Chen Y, Liu K, Li K, Duan X, Hao Y, Cui E, Liu Z, Zhang C, Long W, Liu X. A CT-based radiomics nomogram for prediction of lung adenocarcinomas and granulomatous lesions in patient with solitary sub-centimeter solid nodules. Cancer Imaging 2020;20:45. [Crossref] [PubMed]
- Lockhart ME, Smith AD. Fatty Liver Disease: Artificial Intelligence Takes on the Challenge. Radiology 2020;295:351-2. [Crossref] [PubMed]
- Gould MK, Ananth L, Barnett PG; Veterans Affairs SNAP Cooperative Study Group. A clinical model to estimate the pretest probability of lung cancer in patients with solitary pulmonary nodules. Chest 2007;131:383-8. [Crossref] [PubMed]
- Li D, Mikela Vilmun B, Frederik Carlsen J, Albrecht-Beste E, Ammitzbøl Lauridsen C, Bachmann Nielsen M, Lindskov Hansen K. The Performance of Deep Learning Algorithms on Automatic Pulmonary Nodule Detection and Classification Tested on Different Datasets That Are Not Derived from LIDC-IDRI: A Systematic Review. Diagnostics (Basel) 2019;9:207. [Crossref] [PubMed]
- Balekian AA, Silvestri GA, Simkovich SM, Mestaz PJ, Sanders GD, Daniel J, Porcel J, Gould MK. Accuracy of clinicians and models for estimating the probability that a pulmonary nodule is malignant. Ann Am Thorac Soc 2013;10:629-35. [Crossref] [PubMed]
- MacMahon H, Li F, Jiang Y, Armato SG 3rd. Accuracy of the Vancouver Lung Cancer Risk Prediction Model Compared With That of Radiologists. Chest 2019;156:112-9. [Crossref] [PubMed]
- Du W, He B, Luo X, Chen M. Diagnostic Value of Artificial Intelligence Based on CT Image in Benign and Malignant Pulmonary Nodules. J Oncol 2022;2022:5818423. [Crossref] [PubMed]