Development of a nomogram for predicting the risk of lymph node metastasis in non-small cell lung cancer
Introduction
Lung cancer is one of the leading causes of cancer-related death worldwide (1,2). Approximately 80% of lung cancers are non-small cell lung cancers (NSCLCs) (3), and lymph node metastasis (LNM) is an important factor for determining patient prognosis and treatment strategy. LNM indicates that the disease is in an advanced stage, and is associated with a higher risk of recurrence, a lower survival rate, and a poor prognosis for patients with NSCLC (4).
With the widespread use of low-dose spiral computed tomography (CT) in health examinations and disease diagnosis, the incidence of NSCLC is on the rise (5). Segmentectomy is becoming increasingly popular in the treatment of early-stage NSCLC (6). However, in some case reports of NSCLC, the predictive accuracy of preoperative lymph node (LN) staging by CT scan is only 78.7% (7). The thorough removal of metastatic LNs during surgery is crucial for improving patient survival rates. Therefore, the accurate prediction of LN status is essential for the formulation of personalized treatment plans and the improvement of patient prognosis.
Significant advancements have been made in the study of LNM in NSCLC. Wang et al. found that NSCLC with ground-glass opacity (GGO) nodules of 10 mm or less in diameter do not exhibit LNM and may not require systematic LN dissection (8). Therefore, this study incorporated indicators such as mixed ground-glass opacity (mGGO) and the short diameter of the mass. Ma et al. and Shimada et al. used deep-learning methods to identify LNM in NSCLC; however, their models did not include the patient’s underlying lung diseases, and difficulties arose in their clinical application (9,10). Thus, developing a non-invasive tool with strong clinical applicability that can predict LN status preoperatively is of the utmost importance.
In this study, we comprehensively analyzed the clinical, laboratory, and imaging characteristics of NSCLC patients to explore the risk factors for LNM in NSCLC patients. Using binary logistic regression, we developed and validated a nomogram with high accuracy and clinical applicability for predicting LNM in NSCLC patients. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2016/rc).
Methods
The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committees of Tongde Hospital of Zhejiang Province Affiliated to Zhejiang Chinese Medical University (Tongde Hospital of Zhejiang Province) (No. 2022-012-JY, center 1), The First Affiliated Hospital of Bengbu Medical University (No. LWSL202300145, center 2), and Shaoxing People’s Hospital (No. 2015-118, center 3). As this was a retrospective study, the requirement of informed consent was waived.
We conducted a retrospective analysis of NSCLC patients from centers 1 and 2 (January 2014 to October 2023) and validated the results externally with patients from center 3 (January 2023 to December 2024).
Patients were included in the study if they met the following inclusion criteria: (I) had NSCLC confirmed by surgical pathology and had their LNM status determined; (II) had complete clinical and pathological data available; (III) had not received neoadjuvant chemotherapy or radiotherapy before surgery; and (IV) had no atelectasis or active pulmonary inflammation. Patients were excluded from the study if they met any of the following exclusion criteria: (I) had blurred CT imaging or the presence of artifacts; (II) had adenocarcinoma in situ; (III) had atypical adenomatous hyperplasia; and/or (IV) had a time interval exceeding one month between the imaging examination and pathological results. The study flowchart is provided in Figure 1.
Clinical data
The clinical data and laboratory test results of the patients selected for inclusion in the study were retrieved from the respective information systems of the three central hospitals, including age, gender, smoking history, high blood pressure, diabetes mellitus, cardiovascular and cerebrovascular diseases, hepatitis/cirrhosis, and tumor markers.
CT scanning protocol
CT imaging was conducted using the following equipment: Optima CT540, Optima CT680 (GE Healthcare, Chicago, IL, USA), Lightspeed 16 (GE), SOMATOM Definition Flash (Siemens Healthineers, Forchheim, Germany), and Incisive CT Power (Philips Healthcare, Best, Netherlands). Post-processing reconstruction produced CT images with slice thicknesses ranging from 0.625–1.5 mm. The CT scanning parameters were as follows: tube voltage: 100–120 kV, tube current: 250–300 mA, slice thickness: 1.0–7.5 mm, and matrix size: 512×512.
Image analysis
Two radiologists with five years of experience each assessed the CT images of each patient. The morphological characteristics of the enrolled patients included emphysema or bullae, heterogeneous ventilation or perfusion (HVP), interstitial lung disease, bronchiectasis, multiple lung comorbidity, mGGO, location, morphology, lobulation, spiculation, airspace, air bronchogram, pleural tags, obstructive inflammation, calcification, and pleural effusion. They also measured each lesion’s long diameter and short diameter (SD), and determined the maximum attenuation, minimum attenuation, mean attenuation, and standard deviation of attenuation. If any discrepancies arose, a senior radiologist with 19 years of experience re-evaluated the CT images.
Emphysema or bullae refers to the abnormal and persistent dilatation of the distal parts of the respiratory bronchioles, accompanied by the destruction of alveolar walls and small bronchioles, which manifests on CT as cystic lucencies without lung markings (11,12). HVP refers to uneven lung tissue density, presenting as mottled areas with alternating high and low attenuation. mGGO refers to nodules with mixed patterns/subsolid nodules that include areas of GGO interspersed with solid components. The air bronchogram sign refers to the presence of dilated bronchi filled with air within the lesion.
Statistical analysis
The statistical analysis was conducted using SPSS 26 (IBM, USA) and R statistical software (R 4.3.3). The categorical variables were compared using the Pearson Chi-squared or Fisher’s exact test, and are represented as the number (percentage). The normally distributed continuous variables were compared using the independent samples t-test, while those not normally distributed were assessed using the Mann-Whitney U test. The CT imaging, laboratory tests, and clinical data were compared with the pathological results. A univariate analysis was performed initially to identify significant predictors (P<0.05). Feature selection was then performed using recursive feature elimination with cross-validation, implemented with the scikit-learn package in Python (3.9), to further reduce the number of predictors. Predictors with a P value less than 0.05 in the univariate analysis were included in the multivariate regression analysis; a P value less than 0.05 indicated statistical significance. Multivariate logistic regression analyses were performed to estimate the strength of association using the odds ratios (ORs) and their 95% confidence intervals (CIs) (Figure S1).
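As an illustration of recursive feature elimination with cross-validation, the following minimal sketch uses scikit-learn's `RFECV` on synthetic data. The estimator (logistic regression), the five stratified folds, the AUC scoring, and the synthetic dataset are all assumptions made for the example; the study does not report these settings, and this is not the authors' actual pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for 15 candidate predictors (NOT the study data);
# the class imbalance loosely mimics a minority of LN-positive cases.
X, y = make_classification(
    n_samples=500, n_features=15, n_informative=5, n_redundant=2,
    weights=[0.85, 0.15], random_state=0,
)

# Assumed settings: logistic regression, 5 stratified folds, AUC scoring.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,  # drop one feature per elimination round
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
    min_features_to_select=1,
)
selector.fit(X, y)

# Column indices of the predictors retained by the procedure.
selected = np.flatnonzero(selector.support_)
print("number of features kept:", selector.n_features_)
print("retained column indices:", selected.tolist())
```

The retained columns would then be passed to the multivariate logistic regression; with real data, categorical predictors would need to be encoded numerically first.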
Predictive model development and validation
Construction of nomogram
A binary logistic regression analysis was used to identify independent risk factors, and a nomogram was constructed using R statistical software. The receiver operating characteristic (ROC) curve was plotted to determine the area under the curve (AUC). Each independent risk factor was assigned a score based on the regression model, and the sum of all these scores was calculated to estimate the predictive probability of LN positivity in patients with NSCLC.
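The scoring principle can be sketched as follows. Each predictor's contribution to the linear predictor is rescaled so that the predictor with the widest possible effect spans 0 to 100 points, and the total points are mapped back to a probability with the logistic function. The coefficients are the B values reported in the multivariate analysis of this study; the intercept, the 0 to 60 mm range for SD, and the example patient are hypothetical values chosen only for illustration, so the resulting probability is not a study result.

```python
import math

# B values reported in the multivariate logistic regression.
COEF = {
    "mGGO": 0.869,
    "obstructive_inflammation": 0.720,
    "SD_mm": 0.048,
    "HVP": -0.933,
    "emphysema_bullae": -0.423,
}
# Value ranges; the SD range (0-60 mm) is an ASSUMPTION for illustration.
RANGE = {
    "mGGO": (0, 5),
    "obstructive_inflammation": (0, 1),
    "SD_mm": (0, 60),
    "HVP": (0, 1),
    "emphysema_bullae": (0, 1),
}
INTERCEPT = -5.0  # HYPOTHETICAL: the intercept is not reported in the article


def min_contribution(name):
    """Smallest contribution of a predictor over its range (maps to 0 points)."""
    lo, hi = RANGE[name]
    return min(COEF[name] * lo, COEF[name] * hi)


def span(name):
    """Width of a predictor's contribution over its range."""
    lo, hi = RANGE[name]
    return abs(COEF[name]) * (hi - lo)


# The predictor with the widest span is assigned the full 0-100 point axis.
UNITS_PER_POINT = max(span(n) for n in COEF) / 100.0


def points(name, value):
    """Nomogram points for one predictor value."""
    return (COEF[name] * value - min_contribution(name)) / UNITS_PER_POINT


def predicted_probability(patient):
    """Total points and predicted probability for a full set of predictors."""
    total = sum(points(n, v) for n, v in patient.items())
    baseline = INTERCEPT + sum(min_contribution(n) for n in COEF)
    linear_predictor = baseline + total * UNITS_PER_POINT
    return total, 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical patient: solid nodule, obstructive inflammation, SD of 25 mm.
patient = {"mGGO": 5, "obstructive_inflammation": 1, "SD_mm": 25,
           "HVP": 0, "emphysema_bullae": 0}
total_points, probability = predicted_probability(patient)
print(f"total points: {total_points:.1f}, probability: {probability:.3f}")
```

Because HVP and emphysema or bullae have negative coefficients, their absence rather than their presence earns points under this scheme.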
Nomogram performance
Calibration curve, decision curve, and clinical impact curve (CIC) analyses were conducted to assess the predictive performance of the nomogram. The ROC curve was employed to evaluate the model’s discriminative ability. Calibration, which measures the agreement between predicted probabilities and actual outcomes, was assessed using the Hosmer-Lemeshow test, and a Brier score of less than 0.25 was taken to indicate good calibration (13). Internal validation was conducted using a bootstrap method with 1,000 repetitions, and a calibration plot was ultimately generated. The clinical effectiveness of the predictive nomogram was evaluated using a decision curve analysis (DCA) based on the net benefit at various threshold probabilities. The total benefit to the population was assessed using the CIC (14).
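Two of these quantities have simple closed forms, sketched below in plain Python. The Brier score is the mean squared difference between predicted probability and outcome; a constant prediction of 0.5 always scores 0.25, which is why 0.25 serves as the benchmark. The net benefit at threshold probability pt is TP/n − (FP/n) × pt/(1 − pt). The ten outcomes and probabilities are made-up toy values, not study data, and the function names are illustrative.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat all' strategy at the same threshold."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy outcomes (1 = LN positive) and predicted probabilities (NOT study data).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.10, 0.10, 0.20, 0.15, 0.30, 0.70, 0.60, 0.45, 0.35]

print("Brier score:", round(brier_score(y_true, y_prob), 3))
print("Brier score of a constant 0.5:", brier_score(y_true, [0.5] * len(y_true)))
for pt in (0.1, 0.3, 0.5):
    print(
        f"pt={pt}: model {net_benefit(y_true, y_prob, pt):.3f}, "
        f"treat all {net_benefit_treat_all(y_true, pt):.3f}, treat none 0.000"
    )
```

A decision curve is obtained by evaluating these functions over a grid of thresholds and plotting the three strategies together.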
Graph plot
All figures in this article were created using Adobe Illustrator 2024, Prism 8.0, and FIGDRAW 2.0. The “rms” package in R software was used to plot the nomogram and calibration curves, while the “rmda” package was used to generate the DCA and CIC results.
Results
Patient characteristics
A total of 2,725 patients (1,153 male and 1,572 female with an average age of 61.57±10.55 years) from centers 1 and 2 were enrolled in the study. Of the patients, 467 (17.14%) had emphysema/bullae, 347 (12.73%) had HVP, 364 (13.36%) had obstructive inflammation, and 804 (29.5%) had an mGGO score of 5. The average SD of tumor size on the CT images was 17.32±10.53 mm. Additionally, a total of 112 patients [65 male and 47 female with an average age of 62.70±10.25 years, n (LN–) =97, n (LN+) =15] from center 3 were enrolled in the study. The training dataset included 2,180 (80%) patients [n (LN–) =1,854, n (LN+) =326], and the test dataset included 545 patients (20%) [n (LN–) =470, n (LN+) =75].
Of the 2,837 patients included in the study, there were 2,174 patients in tumor stage 1 (T1), 468 in tumor stage 2 (T2), 147 in tumor stage 3 (T3), and 48 in tumor stage 4 (T4); 2,421 in node stage 0 (N0), 175 in node stage 1 (N1), 60 in node stage 1–2 (N1–2), and 181 in node stage 2 (N2); and 2,805 in metastasis stage 0 (M0), and 32 in metastasis stage X (Mx). The demographic characteristics and imaging feature data of the patients are presented in Table 1. LNM was the most common in zones 10–14, occurring in 37.4% of cases, followed by zones 2–4 (31.9%), and zones 7–9 (21.2%), and was least frequent in zones 5–6 (9.8%) (Figure 2A,2B).
Table 1
| Characteristics | Training: LN– (n=1,854) | Training: LN+ (n=326) | Training: P (Uni) | Training: P (Mul) | Test: LN– (n=470) | Test: LN+ (n=75) | Test: P (Mul) | Exvad: LN– (n=97) | Exvad: LN+ (n=15) | Exvad: P (Mul) |
|---|---|---|---|---|---|---|---|---|---|---|
| Gender‡ | | | <0.001* | NA | | | NA | | | NA |
| Female | 1,113 (60.0) | 150 (46.0) | | | 281 (59.8) | 28 (37.3) | | 44 (45.4) | 3 (20.0) | |
| Male | 741 (40.0) | 176 (54.0) | | | 189 (40.2) | 47 (62.7) | | 53 (54.6) | 12 (80.0) | |
| Age† (years) | 60.1±10.7 | 61.7±9.7 | 0.016* | NA | 58.9±11.4 | 62.3±9.2 | | 62.1±10.8 | 63.3±9.7 | |
| TNM‡ | | | | | | | | | | |
| Pathological T staging | | | NA | NA | | | NA | | | NA |
| 1 | 1,535 (82.8) | 118 (36.2) | | | 395 (84.0) | 24 (32.0) | | 91 (93.8) | 11 (73.3) | |
| 2 | 246 (13.3) | 130 (39.9) | | | 56 (11.9) | 28 (37.3) | | 5 (5.2) | 3 (20.0) | |
| 3 | 60 (3.2) | 60 (18.4) | | | 11 (2.3) | 14 (18.7) | | 1 (1.0) | 1 (6.7) | |
| 4 | 13 (0.7) | 18 (5.5) | | | 8 (1.7) | 9 (12.0) | | 0 | 0 | |
| Pathological N staging | | | NA | NA | | | NA | | | NA |
| 0 | 1,854 (100.0) | 0 | | | 470 (100.00) | 0 | | 97 (100.0) | 0 | |
| 1 | 0 | 138 (42.3) | | | 0 | 31 (41.3) | | 0 | 6 (40.0) | |
| 1–2 | 0 | 49 (15.0) | | | 0 | 10 (13.3) | | 0 | 1 (6.7) | |
| 2 | 0 | 139 (42.6) | | | 0 | 34 (45.3) | | 0 | 8 (53.3) | |
| Pathological M staging | | | NA | NA | | | NA | | | NA |
| M0 | 1,854 (100.0) | 303 (92.94) | | | 470 (100.0) | 67 (89.33) | | 97 (100.0) | 14 (93.3) | |
| Mx | 0 | 23 (7.06) | | | 0 | 8 (10.66) | | 0 | 1 (6.7) | |
| Smoke‡ | | | 0.008* | 0.073 | | | NA | | | NA |
| Never | 1,563 (84.3) | 235 (72.1) | | | 410 (87.2) | 50 (66.7) | | 71 (73.2) | 5 (33.3) | |
| Current | 176 (9.5) | 50 (15.3) | | | 41 (8.7) | 9 (12.0) | | 21 (21.6) | 7 (46.7) | |
| Former | 115 (6.2) | 41 (12.6) | | | 19 (4.0) | 16 (21.3) | | 5 (5.2) | 3 (20.0) | |
| HBP‡ | 141 (7.6) | 23 (7.1) | 0.728 | NA | 33 (7.0) | 5 (6.7) | NA | 41 (42.3) | 6 (40.0) | NA |
| DM‡ | 45 (2.4) | 11 (3.4) | 0.319 | NA | 13 (2.8) | 5 (6.7) | NA | 10 (10.3) | 3 (20.0) | NA |
| Cardiovascular diseases‡ | 42 (2.3) | 12 (3.7) | 0.129 | NA | 12 (2.6) | 1 (1.3) | NA | 3 (3.1) | 1 (6.7) | NA |
| Tumor indicator‡ | | | <0.001* | NA | | | NA | | | NA |
| Normal | 1,293 (69.7) | 169 (51.8) | | | 335 (71.3) | 40 (53.3) | | 94 (96.9) | 13 (86.7) | |
| Abnormal | 559 (30.2) | 157 (48.2) | | | 135 (28.7) | 35 (46.7) | | 3 (3.1) | 2 (13.3) | |
| Hepatitis cirrhosis‡ | 9 (0.5) | 2 (0.6) | 0.506 | NA | 2 (0.4) | 0 | NA | 3 (3.1) | 0 | NA |
| Multiple comorbidities‡ | 40 (2.2) | 9 (2.8) | 0.498 | NA | 7 (1.5) | 3 (4) | NA | 7 (7.2) | 1 (6.7) | NA |
| Emphysema/bullae‡ | 307 (16.6) | 71 (21.8) | 0.022* | 0.002* | 67 (14.3) | 22 (29.3) | 0.882 | 19 (19.6) | 5 (33.3) | 0.5575 |
| HVP‡ | 256 (13.8) | 30 (9.2) | 0.023* | 0.002* | 57 (12.1) | 4 (5.3) | 0.054 | 17 (17.5) | 3 (20.0) | 0.1744 |
| ILD‡ | 29 (1.6) | 14 (4.3) | 0.001* | NA | 8 (1.7) | 3 (4.0) | NA | 3 (3.1) | 1 (6.7) | NA |
| Bronchiectasis‡ | 18 (1.0) | 6 (1.8) | 0.165 | NA | 2 (0.4) | 0 | NA | 3 (3.1) | 2 (13.3) | NA |
| MLC‡ | 66 (3.6) | 20 (6.1) | 0.043* | NA | 15 (3.2) | 2 (2.7) | NA | 8 (8.2) | 3 (20.0) | NA |
| mGGO‡ | | | <0.001* | <0.001* | | | <0.001* | | | 0.0190* |
| 0 (0%) | 585 (31.6) | 0 | | | 149 (31.7) | 0 | | 22 (22.7) | 0 | |
| 1 (≤25%) | 470 (25.1) | 4 (1.2) | | | 124 (26.4) | 2 (2.7) | | 11 (11.3) | 0 | |
| 2 (≤50%) | 135 (7.3) | 7 (2.1) | | | 37 (7.9) | 1 (1.3) | | 13 (13.4) | 0 | |
| 3 (≤75%) | 128 (6.9) | 19 (5.8) | | | 36 (7.7) | 5 (6.7) | | 14 (14.4) | 0 | |
| 4 (<100%) | 130 (7.0) | 43 (13.2) | | | 40 (8.5) | 6 (8.0) | | 10 (10.3) | 1 (6.7) | |
| 5 (100%) | 406 (21.9) | 253 (77.6) | | | 84 (17.9) | 61 (81.3) | | 27 (27.8) | 14 (93.3) | |
| Location‡ | | | 0.021* | 0.126 | | | NA | | | NA |
| Right upper lobe | 600 (32.4) | 83 (25.5) | | | 155 (33.0) | 20 (26.7) | | 34 (35.1) | 1 (6.7) | |
| Right middle lobe | 152 (8.2) | 24 (7.4) | | | 48 (10.2) | 7 (9.3) | | 8 (8.2) | 3 (20.0) | |
| Right lower lobe | 368 (19.8) | 62 (19.0) | | | 79 (16.8) | 19 (25.3) | | 20 (20.6) | 2 (13.3) | |
| Left upper lobe | 481 (25.9) | 94 (28.8) | | | 109 (23.2) | 14 (18.7) | | 29 (29.9) | 4 (26.7) | |
| Left lower lobe | 253 (13.6) | 63 (19.3) | | | 79 (16.8) | 15 (20.0) | | 6 (6.2) | 5 (33.3) | |
| Morphology‡ | | | 0.001* | NA | | | NA | | | NA |
| Round | 83 (4.5) | 5 (1.5) | | | 17 (3.6) | 1 (1.3) | | 22 (22.7) | 4 (26.7) | |
| Oval | 225 (12.1) | 23 (7.1) | | | 66 (14.0) | 2 (2.7) | | 7 (7.2) | 0 | |
| Irregular | 1,546 (83.4) | 298 (91.4) | | | 387 (82.3) | 72 (96.0) | | 68 (70.1) | 11 (73.3) | |
| Lobulation‡ | 1,240 (66.9) | 285 (87.4) | <0.001* | NA | 319 (67.9) | 62 (82.7) | NA | 62 (63.9) | 12 (80.0) | NA |
| Spiculation‡ | | | <0.001* | 0.112 | | | NA | | | NA |
| No | 1,125 (60.7) | 90 (27.6) | | | 299 (63.6) | 30 (40.0) | | 56 (57.7) | 9 (60.0) | |
| Short | 550 (29.7) | 164 (50.3) | | | 117 (24.9) | 28 (37.3) | | 33 (34.0) | 3 (20.0) | |
| Long | 179 (9.7) | 72 (22.1) | | | 54 (11.5) | 17 (22.7) | | 8 (8.2) | 3 (20.0) | |
| Airspace‡ | 352 (19.0) | 66 (20.2) | 0.114 | NA | 88 (18.7) | 17 (22.7) | NA | 9 (9.3) | 3 (20.0) | NA |
| Air bronchogram‡ | 434 (23.4) | 103 (31.6) | 0.002* | 0.103 | 101 (21.5) | 20 (26.7) | NA | 17 (17.5) | 5 (33.3) | NA |
| Pleural tags‡ | | | <0.001* | 0.286 | | | NA | | | NA |
| Type 0 | 638 (34.4) | 36 (11.0) | | | 168 (35.7) | 12 (16.0) | | 54 (55.7) | 7 (46.7) | |
| Type I | 714 (38.5) | 102 (31.3) | | | 180 (38.3) | 36 (48.0) | | 10 (10.3) | 3 (20.0) | |
| Type II | 323 (17.4) | 104 (31.9) | | | 73 (15.5) | 20 (26.7) | | 8 (8.2) | 0 | |
| Type III | 179 (9.7) | 84 (25.8) | | | 49 (10.4) | 7 (9.3) | | 13 (13.4) | 1 (6.7) | |
| Obstructive inflammation‡ | 180 (9.7) | 106 (32.5) | <0.001* | 0.027* | 52 (11.1) | 26 (34.7) | 0.478 | 5 (5.2) | 6 (40.0) | 0.1327 |
| Calcification‡ | 23 (1.2) | 9 (2.8) | 0.038* | NA | 5 (1.1) | 2 (2.7) | NA | 2 (2.1) | 0 | NA |
| Pleural effusion‡ | 1 (0.1) | 16 (4.9) | <0.001* | NA | 2 (0.4) | 5 (6.7) | NA | 3 (3.1) | 0 | NA |
| LD† (mm) | 18.7±10.6 | 31.7±14.2 | <0.001* | 0.671 | 18.9±10.8 | 35.2±18.1 | NA | 19.5±17.7 | 35.2±15.9 | NA |
| SD† (mm) | 13.9±7.9 | 23.5±10.5 | <0.001* | 0.012* | 13.9±8.4 | 25.9±13.9 | 0.001* | 14.5±14.6 | 25.3±11.6 | 0.0517 |
| CTmax† (HU) | 50.1±222.5 | 186.2±90.7 | <0.001* | 0.091 | 50.3±216.2 | 170.9±84.7 | NA | –6.3±240.1 | 200.8±84.0 | NA |
| CTmin† (HU) | –251.3±251 | –84.9±83.4 | <0.001* | 0.080 | –239.0±240.0 | –81.7±90.9 | NA | –339.7±301.9 | –151.3±104.8 | NA |
| CTmean† (HU) | –95.4±216.1 | 47.6±24.6 | <0.001* | 0.121 | –89.9±204.9 | 42.8±27.2 | NA | –169.9±249.2 | 28.4±9.9 | NA |
| CTsd† (HU) | 88.4±51.9 | 62.0±35.8 | <0.001* | 0.862 | 86.9±51.8 | 57.3±31.3 | NA | 86.2±52.4 | 125.7±231.6 | NA |
†, data are expressed as the mean ± standard deviation, and the statistical values are the independent sample t-test results; ‡, the data are qualitative variables expressed as n (%), and the statistical values are Pearson’s χ2 test, Spearman’s χ2 test, and Fisher’s exact test results; *, statistical significance (P<0.05). CT, computed tomography; CTmax, maximum computed tomography attenuation value (HU) within a defined region of interest; CTmean, mean computed tomography attenuation value (HU) across all voxels in a defined region of interest; CTmin, minimum computed tomography attenuation value (HU) within a defined region of interest; CTsd, standard deviation of computed tomography attenuation values (HU) in a defined region of interest; DM, diabetes mellitus; Exvad, external verification; HBP, high blood pressure; HVP, heterogeneous ventilation or perfusion; HU, Hounsfield units; ILD, interstitial lung disease; LD, long diameter; LN, lymph node; M, metastasis; MLC, multiple lung comorbidity; Mul, multivariate analysis; mGGO, mixed ground-glass opacity; N, node; NA, not applicable; SD, short diameter; T, tumor; Uni, univariate analysis.
Figure 2C provides the results of four patients: patient A had a pathological LN-negative result but a CT imaging LN-positive result; patient B had a pathological LN-positive result and a CT imaging LN-positive result; patient C had a pathological LN-negative result and a CT imaging LN-negative result; patient D had a pathological LN-positive result but a CT imaging LN-negative result.
The AUC of the ROC curve for the training dataset was 0.893 (95% CI: 0.873–0.914), and the AUC for the test dataset was 0.907 (95% CI: 0.879–0.935) (Figure 3A). The accuracy, sensitivity, and specificity of the test dataset were 86.2%, 89.3%, and 80.9%, respectively. The accuracy, sensitivity, and specificity of the training dataset were 76.8%, 91.6%, and 74.2%, respectively (Table 2).
Table 2
| Evaluation index | Model (training) | Model (test) | Model (Exvad) |
|---|---|---|---|
| AUC (95% CI) | 0.893 (0.873–0.914) | 0.907 (0.879–0.935) | 0.923 (0.870–0.976) |
| Accuracy (%) | 76.8 | 86.2 | 83.0 |
| Sensitivity (%) | 91.6 | 89.3 | 93.3 |
| Specificity (%) | 74.2 | 80.9 | 81.4 |
| Brier score | 0.092 | 0.081 | 0.074 |
| C-index | 0.887 | 0.907 | 0.923 |
| R-square (95% CI) | 0.431 (0.244–0.520) | 0.465 (0.249–0.553) | 0.528 (0.255–0.571) |
The AUC with a 95% CI, along with the accuracy, sensitivity, specificity, and Brier score of the model were computed. The nomogram-related C-index and R-squared value were evaluated within the model. AUC, area under the curve; CI, confidence interval; C-index, concordance index.
A total of 15 factors were included in the multivariate logistic regression analysis, and it was ultimately determined that mGGO (OR =2.385, 95% CI: 1.993–2.854; P<0.001; B value =0.869), obstructive inflammation (OR =2.055, 95% CI: 1.288–3.281; P=0.003; B value =0.720), SD (OR =1.049, 95% CI: 1.016–1.084; P=0.004; B value =0.048), HVP (OR =0.393, 95% CI: 0.196–0.789; P=0.009; B value =–0.933), and emphysema or bullae (OR =0.655, 95% CI: 0.389–1.103; P=0.049; B value =–0.423) were independent predictors of LNM (Figure 3B-3E).
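The relationship between the reported B values and ORs can be checked directly, since in logistic regression the OR for a one-unit increase in a predictor is exp(B). The short sketch below recomputes each OR from its B value using only the figures reported above; an OR above 1 corresponds to higher odds of LNM and an OR below 1 to lower odds. The dictionary keys are illustrative labels.

```python
import math

# (B value, reported OR) pairs from the multivariate analysis.
reported = {
    "mGGO": (0.869, 2.385),
    "obstructive inflammation": (0.720, 2.055),
    "SD": (0.048, 1.049),
    "HVP": (-0.933, 0.393),
    "emphysema or bullae": (-0.423, 0.655),
}

for name, (b, or_reported) in reported.items():
    or_from_b = math.exp(b)  # OR = exp(B)
    direction = "higher" if or_from_b > 1 else "lower"
    print(
        f"{name}: exp({b:+.3f}) = {or_from_b:.3f} "
        f"(reported OR {or_reported:.3f}; {direction} odds of LNM)"
    )
```

Small differences in the third decimal place reflect rounding of the published B values.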
The nomogram for predicting the probability of LNM in lung cancer was plotted using R statistical software (Figure 4A,4B). The Brier scores were 0.092 for the training dataset, and 0.081 for the test dataset, both of which were less than 0.25, indicating good model calibration. The concordance index (C-index) values for the model were 0.887 for the training dataset and 0.907 for the test dataset. The R-squared values were 0.431 (95% CI: 0.244–0.520) and 0.465 (95% CI: 0.249–0.553) for the training and test datasets, respectively (Table 2).
In both the training and test datasets, the calibration curve showed good agreement between the predicted and observed probabilities (Figure 5), and the DCA results showed that within the threshold probability range of 0.01 to 0.60, the nomogram provided a greater net benefit than the “treat all” or “no treatment” strategies (Figure 6) (15).
The CIC results showed the clinical effectiveness of the nomogram (Figure 7). The number of patients classified as high risk closely matched the number with high-risk outcomes when the threshold probability was greater than 0.4.
In the Exvad set, the model had an AUC of 0.923 (95% CI: 0.870–0.976), an accuracy of 83.0%, a sensitivity of 93.3%, a specificity of 81.4%, a Brier score of 0.074, a C-index of 0.923, and an R-squared value of 0.528 (95% CI: 0.255–0.571) (Figure S2).
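For reference, the accuracy, sensitivity, and specificity follow from a 2×2 confusion matrix as sketched below. The cell counts are not reported in the article; they are back-calculated here as the only whole-number counts consistent with the reported external validation percentages and group sizes [n (LN+) =15, n (LN–) =97], and should be read as an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy (as percentages) from a 2x2 table."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# INFERRED counts for the external validation set (not reported in the article):
# 14 of 15 LN-positive and 79 of 97 LN-negative patients classified correctly.
sens, spec, acc = classification_metrics(tp=14, fn=1, tn=79, fp=18)
print(f"sensitivity {sens:.1f}%, specificity {spec:.1f}%, accuracy {acc:.1f}%")
```

Because accuracy is a prevalence-weighted average of sensitivity and specificity, it lies close to the specificity when LN-negative patients dominate the sample, as they do here.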
Discussion
This study retrospectively analyzed a large dataset of lung cancer patients and developed a model to predict their LNM status. The results indicated that the model had a high AUC and high accuracy. The calibration curve, DCA, and CIC results also supported the clinical applicability of the model, which showed excellent predictive performance and satisfactory calibration.
As a predictive tool, a nomogram provides a visual representation of the probability of an event occurring to assist in clinical decision making (16). Its principle involves integrating multiple variables into a simple and user-friendly scoring system, which is why it is widely applied in clinical practice.
In 2020, Wang et al. identified mGGO, the tumor diameter, serum carcinoembryonic antigen levels, and visceral invasion as risk factors for predicting LNM in lung cancer (17). Xue et al. identified age, tumor diameter, and Ki-67 as risk factors for LNM (18). The present study found that mGGO, obstructive inflammation, emphysema or bullae, SD, and HVP are independent risk factors for LNM. Notably, mGGO and SD are common high-risk factors; however, our identification of pulmonary disease-related factors (i.e., obstructive inflammation, emphysema or bullae, and HVP) represents a novel contribution. These findings provide significant prognostic information for the treatment strategy of early-stage lung cancer patients.
The clinical significance of mGGO lies in its potential to indicate tumors’ invasiveness and growth characteristics. In our study, the degree of solid components within the nodules was significantly associated with the risk of LNM, which is consistent with previous research findings that emphasize the importance of mGGO in assessing early LNM in lung cancer (19).
Additionally, the presence of obstructive inflammation and emphysema or bullae often suggests chronic obstructive pulmonary disease, which provides a favorable microenvironment for tumor invasion and metastasis. This finding suggests that in clinical practice, more attention should be paid to the effects of these chronic inflammatory conditions on LNM in lung cancer.
The correlation between LNM and SD, as an indicator of tumor size, was validated in this study. Tumor size is a significant factor affecting its metastatic potential, with smaller tumors potentially carrying a lower risk of lymphatic spread (16,19,20).
HVP is commonly used to assess lung function and diagnose pneumonia. In terms of tumor prediction, it reflects the unevenness of blood flow and ventilation within the lung lobes caused by tumor compression. In our model, HVP was negatively correlated with the risk of LNM, suggesting that it may play a protective role in tumor development.
The correlation between pathological T and N staging with risk factors like tumor size (SD), emphysema/bullae, obstructive inflammation, and mGGO was significant. Higher T and N stages were correlated with larger tumors, more emphysema, increased obstructive inflammation, and higher mGGO expression. No significant correlation was found between T staging and HVP. The patients with squamous cell carcinoma had larger tumor sizes, more emphysema, higher obstructive inflammation, and stronger mGGO 5 expression, while those with invasive adenocarcinoma had smaller tumors and lower mGGO expression. These findings underscore the relevance of staging and risk factors in understanding tumor behavior and prognosis (see Appendix 1).
Our predictive model demonstrated high accuracy, sensitivity, and specificity in the training and test datasets. The high AUC of the ROC curve confirmed the model’s predictive power. Additionally, through DCA and CIC, we showed the clinical application value of the model, which could provide strong support for clinical decision making (15).
This study provides valuable insights into the risk factors and prediction of LNM, and it established a predictive tool; however, it also had some limitations. First, as a retrospective study, there is the potential of selection bias and issues with data integrity. Second, our study population did not undergo follow-up observation, which limits the accuracy of our model’s long-term predictions. Finally, with the continuous advancement of medical imaging technology and biomarker detection, future studies may identify additional factors related to LNM, such as hematological tests and pathological and genetic examinations (21-24), necessitating updates and improvements to the existing model.
Conclusions
Our study provides a robust tool for predicting LNM in lung cancer that could aid in guiding clinical treatment decisions and improving patient outcomes. In our future research, we will continue to explore and refine this predictive model to achieve more accurate results.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-2016/rc
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2016/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committees of Tongde Hospital of Zhejiang Province Affiliated to Zhejiang Chinese Medical University (Tongde Hospital of Zhejiang Province) (No. 2022-012-JY, center 1), The First Affiliated Hospital of Bengbu Medical University (No. LWSL202300145, center 2), and Shaoxing People’s Hospital (No. 2015-118, center 3). As this was a retrospective study, the requirement of informed consent was waived.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Jin W, Huang K, Zhang M, Gao W, Luo Q, Ye X, Yuan Z. Global, regional, and national cancer burdens of respiratory and digestive tracts in 1990-2044: A cross-sectional and age-period-cohort forecast study. Cancer Epidemiol 2024;91:102583.
2. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, Jemal A. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63.
3. Majernikova SM. Risk and safety profile in checkpoint inhibitors on non-small-cel lung cancer: A systematic review. Hum Vaccin Immunother 2024;20:2365771.
4. Detterbeck FC, Boffa DJ, Kim AW, Tanoue LT. The Eighth Edition Lung Cancer Stage Classification. Chest 2017;151:193-203.
5. Ji D, Sun R, Wu Z. Effects of uniportal thoracoscopic pulmonary segmentectomy and lobectomy on patients with early-stage non-small-cell lung cancer and risk factors of postoperative complications. Am J Transl Res 2023;15:4369-79.
6. Lu Y, Ma T, Wang L, et al. Advances in Lymph Node Metastasis and Lymph Node Dissection in Early Non-small Cell Lung Cancer. Zhongguo Fei Ai Za Zhi 2019;22:520-5.
7. Xue M, Liu J, Li Z, Lu M, Zhang H, Liu W, Tian H. The role of adenocarcinoma subtypes and immunohistochemistry in predicting lymph node metastasis in early invasive lung adenocarcinoma. BMC Cancer 2024;24:139.
8. Wang T, Ma S, Yan T, et al. Clinical Study of Surgical Treatment of Non-small Cell Lung Cancer 10 mm or Less in Diameter Under Video-assisted Thoracoscopy. Zhongguo Fei Ai Za Zhi 2016;19:216-9.
9. Ma X, Xia L, Chen J, Wan W, Zhou W. Development and validation of a deep learning signature for predicting lymph node metastasis in lung adenocarcinoma: comparison with radiomics signature and clinical-semantic model. Eur Radiol 2023;33:1949-62.
10. Shimada Y, Kudo Y, Maehara S, Fukuta K, Masuno R, Park J, Ikeda N. Artificial intelligence-based radiomics for the prediction of nodal metastasis in early-stage lung cancer. Sci Rep 2023;13:1028.
11. Ai J, Gao H, Shi G, Lan Y, Hu S, Wang Z, Liu L, Wei Y. A clinical nomogram for predicting occult lymph node metastasis in patients with non-small-cell lung cancer ≤2 cm. Interdiscip Cardiovasc Thorac Surg 2024;39:ivae098.
12. Yuan L, Guo T, Hu C, Yang W, Tang X, Cheng H, Xiang Y, Qu X, Liu H, Qin X, Qin L, Liu C. Clinical characteristics and gene mutation profiles of chronic obstructive pulmonary disease in non-small cell lung cancer. Front Oncol 2022;12:946881.
13. Wang X, Claggett BL, Tian L, Malachias MVB, Pfeffer MA, Wei LJ. Quantifying and Interpreting the Prediction Accuracy of Models for the Time of a Cardiovascular Event-Moving Beyond C Statistic: A Review. JAMA Cardiol 2023;8:290-5.
14. Shi H, Xu Z, Cheng G, Ji H, He L, Zhu J, Hu H, Xie Z, Ao W, Wang J. CT-based radiomic nomogram for predicting the severity of patients with COVID-19. Eur J Med Res 2022;27:13.
15. Tong C, Miao Q, Zheng J, Wu J. A novel nomogram for predicting the decision to delayed extubation after thoracoscopic lung cancer surgery. Ann Med 2023;55:800-7.
16. Jianlong B, Pinyi Z, Xiaohong W, Su Z, Sainan P, Jinfeng N, Shidong X. Risk factors for lymph node metastasis and surgical scope in patients with cN0 non-small cell lung cancer: a single-center study in China. J Cardiothorac Surg 2021;16:304.
17. Wang Y, Jing L, Wang G. Risk factors for lymph node metastasis and surgical methods in patients with early-stage peripheral lung adenocarcinoma presenting as ground glass opacity. J Cardiothorac Surg 2020;15:121.
18. Xue X, Zang X, Liu Y, Lin D, Jiang T, Gao J, Wu C, Ma X, Deng H, Yu Z, Pan L, Xue Z. Independent risk factors for lymph node metastasis in 2623 patients with Non-Small cell lung cancer. Surg Oncol 2020;34:256-60.
19. Zhang W, Mu G, Huang J, Bian C, Wang H, Gu Y, Xia Y, Chen L, Yuan M, Wang J. Lymph node metastasis and its risk factors in T1 lung adenocarcinoma. Thorac Cancer 2023;14:2993-3000.
20. Moulla Y, Gradistanac T, Wittekind C, Eichfeld U, Gockel I, Dietrich A. Predictive risk factors for lymph node metastasis in patients with resected non-small cell lung cancer: a case control study. J Cardiothorac Surg 2019;14:11.
21. Park SY, Ardura MI, Zhang SX. Diagnostic limitations and challenges in current clinical guidelines and potential application of metagenomic sequencing to manage pulmonary invasive fungal infections in patients with haematological malignancies. Clin Microbiol Infect 2024;30:1139-46.
22. Berezowska S, Keyter M, Bouchaab H, Weissferdt A. Pathology of Surgically Resected Lung Cancers Following Neoadjuvant Therapy. Adv Anat Pathol 2024;31:324-32.
23. Chen L, Chen B, Zhao Z, Shen L. Using artificial intelligence based imaging to predict lymph node metastasis in non-small cell lung cancer: a systematic review and meta-analysis. Quant Imaging Med Surg 2024;14:7496-512.
24. Xie Z, Yang Y, Niu Z, Mao G, Zhu X, Xu Z, Yang D, Wang H, Wang J. Preoperative computed tomography semantic features in predicting lymph node metastasis of part-solid nodules in non-small cell lung cancer: a multicenter retrospective study. Quant Imaging Med Surg 2024;14:5151-63.

