A novel computed tomography-based multi-parameter decision tree algorithm model for preoperatively predicting the risk of lymph node metastasis in surgically resectable synchronous multiple primary lung cancer
Original Article

Wenbiao Zhang1#, Huiyun Ma1#, Ying Zhu2#, Wenjing Gou3, Baocong Liu1, Qiong Li1, Shuangjiang Li4

1Department of Radiology, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-Sen University Cancer Center, Guangzhou, China; 2Department of Radiology, the First Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China; 3Department of Radiology, Sichuan Provincial People’s Hospital, Sichuan Academy of Medical Sciences, Chengdu, China; 4Department of Endoscopy and Laser, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-Sen University Cancer Center, Guangzhou, China

Contributions: (I) Conception and design: S Li, Q Li; (II) Administrative support: S Li, Q Li; (III) Provision of study materials or patients: W Zhang, H Ma, Y Zhu, W Gou, B Liu, Q Li; (IV) Collection and assembly of data: W Zhang, H Ma, Y Zhu, W Gou, B Liu, Q Li; (V) Data analysis and interpretation: W Zhang, S Li; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Shuangjiang Li, MD. Department of Endoscopy and Laser, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-Sen University Cancer Center, 651 Dongfeng Road East, Guangzhou 510060, China. Email: lisj@sysucc.org.cn; Qiong Li, MD. Department of Radiology, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-Sen University Cancer Center, 651 Dongfeng Road East, Guangzhou 510060, China. Email: liqiong@sysucc.org.cn.

Background: Chest thin-section computed tomography (TS-CT) has the potential to provide evidence for the prediction of lymph node metastasis (LNM) in synchronous multiple primary lung cancer (SMPLC). The present study aims to develop and validate a new CT-based multi-parametric decision tree algorithm (CT-DTA) model capable of accurate risk evaluation for LNM in SMPLC preoperatively.

Methods: A total of 235 patients with surgically resected SMPLC from Sun Yat-Sen University Cancer Center (SYSUCC), the First Affiliated Hospital of Sun Yat-Sen University (FAH-SYSU) and Sichuan Provincial People’s Hospital (SPPH) were finally included. We initially retrieved all the CT-derived quantitative signs in the training cohort (139 cases from SYSUCC) and selected those with statistical significance to build a DTA model. The discriminative power of CT-DTA model for the occurrence of LNM was further externally validated among the validation cohort (96 patients from FAH-SYSU and SPPH). In addition, the performance of CT-DTA model was also assessed across different subgroups of the entire cohort.

Results: Five key quantitative covariables measured on chest TS-CT constituted a CT-DTA model with seven leaf nodes, and the long-axis diameter of the solid portion was the most dominant risk contributor to LNM. The CT-DTA model achieved satisfactory predictive accuracy, with an area under the curve >0.80 in both the training cohort (0.905; P<0.001) and the validation cohort (0.812; P<0.001). Moreover, the CT-DTA model was an independent predictor for risk stratification of LNM in both the training cohort (odds ratio: 12.01; P=0.003) and the validation cohort (odds ratio: 8.11; P=0.033). Its predictive performance remained stable across nearly all of the subgroups stratified by clinicopathological characteristics.

Conclusions: This CT-DTA model could serve as a noninvasive, user-friendly and practicable risk prediction tool to aid treatment decision-making in surgically resectable SMPLC.

Keywords: Synchronous multiple primary lung cancer (SMPLC); chest computed tomography (chest CT); decision tree algorithm (DTA); lymph node metastasis (LNM)


Submitted Nov 04, 2024. Accepted for publication Mar 14, 2025. Published online May 21, 2025.

doi: 10.21037/qims-24-2440


Introduction

Rationale

Non-small cell lung cancer (NSCLC) is one of the most prevalent malignancies and the leading cause of cancer-related deaths worldwide (1). With advances in multi-slice spiral computed tomography (CT) and other medical imaging techniques, multiple primary lung cancer (MPLC), which refers to primary lung cancer in which ≥2 lesions occur simultaneously or successively in different locations of the lung within the same individual, has been increasingly diagnosed over recent decades (2). Thoracic surgeons have recently paid attention both to the benefits of surgical resection for synchronous MPLC (SMPLC) and to minimizing the injuries caused by over-surgery. Accumulating evidence has revealed that lymph node metastasis (LNM) is among the principal reasons for treatment failure and unfavorable prognosis following lung cancer surgery (3). Moreover, the scope of lymphadenectomy usually depends on the probability of LNM (4). Therefore, to improve the surgical outcomes of SMPLC, it is crucial to achieve precise preoperative risk prediction of LNM, since this can not only avoid aggressive lymph node dissection in low-risk patients but also allow postoperative care and adjuvant therapy to be optimized in high-risk patients.

As the most common imaging test in daily practice, chest CT has also been widely demonstrated to have the potential to offer a series of qualitative and quantitative features which are of clinical significance for risk evaluation of worse prognosis in NSCLC (5,6). However, to our knowledge, there is no investigation yet on the key indicators derived from chest CT for the occurrence of LNM in SMPLC. Accordingly, an easy-to-use and practicable CT-based multi-parameter scoring system based on accurate risk stratification of LNM in surgically resectable SMPLC will be particularly valuable for decision-making.

Objectives

The decision tree algorithm (DTA) is a non-parametric supervised learning method that creates a model predicting the value of a target endpoint by learning simple decision rules inferred from the data features (7,8). It has been reported that DTA models can provide user-friendly information that aids treatment decision-making in clinical settings (7,8). Thus, the purpose of this multi-center study was to develop a novel DTA model based on chest CT imaging features of pulmonary nodules for precise risk evaluation of LNM in patients scheduled to undergo surgical treatment for SMPLC. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2440/rc).


Methods

Study design, study protocol and settings

This retrospective cohort study was performed on the independent datasets of surgical patients with SMPLC prospectively collected from Sun Yat-Sen University Cancer Center (SYSUCC), the First Affiliated Hospital of Sun Yat-Sen University (FAH-SYSU) and Sichuan Provincial People’s Hospital (SPPH) between December 2011 and June 2020.

This study was approved by the Institutional Review Board of Sun Yat-sen University Cancer Center (No. B2022-293-Y01). All participating institutions were informed and agreed to the study. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The participants were required to give informed consent before taking part.

Participants

Recruitment and study groups

We initially reviewed the clinicopathological characteristics and CT imaging records of 317 consecutive patients pathologically diagnosed with SMPLC following surgery at SYSUCC, FAH-SYSU and SPPH. Of these, 235 patients met the eligibility criteria, as shown in Figure 1. The 139 patients from SYSUCC were assigned to the training cohort and the remaining 96 patients from FAH-SYSU and SPPH to the validation cohort. A CT-based multi-parameter DTA model (CT-DTA) was built on the training cohort, and its predictive capacity for the risk of LNM was further evaluated in the external validation cohort.

Figure 1 Flowchart of the study design and patient enrollment. CART, classification and regression tree; CT, computed tomography; CT-DTA, CT-based multi-parametric decision tree algorithm model; LNM, lymph node metastasis; ROC, receiver operating characteristic; SMPLC, synchronous multiple primary lung cancer.

Eligibility criteria

The following criteria were utilized to judge the candidate’s suitability for inclusion or exclusion:

  • Patients who were postoperatively and pathologically confirmed with SMPLC according to the 2013 American College of Chest Physicians (ACCP) criteria and the American Joint Committee on Cancer (AJCC)/Union for International Cancer Control (UICC) tumor-node-metastasis (TNM) classification system (8th edition) were included (9,10). Patients whose complete details of pathological reports were unavailable would not be considered in order to guarantee the accuracy and objectivity of the data analyzed;
  • Patients who underwent curative-intent anatomical or wedge resections of the lung with systematic mediastinal lymph node dissection were eligible;
  • The clinicopathological and chest thin-section CT (TS-CT, slice thickness ≤1 mm) imaging details must be completely obtained within 3 months before surgery;
  • Patients who had received neoadjuvant therapy would be excluded to avoid potential confounding effects on metastatic cells within mediastinal or hilar lymph nodes;
  • Patients with concurrent malignancies would be excluded to avoid potential selection bias caused by mediastinal LNM originated from additional primary tumors out of the lungs.

Measurement and definitions of outcome data

Clinicopathological characteristics

We recorded the following clinicopathological characteristics in compliance with the joint standardization of variable definitions and terminology from the Society of Thoracic Surgeons and the European Society of Thoracic Surgeons (11):

  • Baseline information estimated: age, gender, and smoking status;
  • Preoperative comorbidities estimated: respiratory comorbidities (chronic obstructive pulmonary disease, emphysema, tuberculosis, pneumonia, asthma, bronchiectasis and interstitial lung diseases), cardiovascular comorbidities (hypertension, coronary artery disease, valvular heart disease, cerebrovascular disease, cardiac arrhythmias, chronic heart failure and peripheral artery disease) and diabetes mellitus;
  • Pathological features estimated: pleural invasion, lympho-vascular invasion, T-stage, N-stage and TNM stage. All of above results were estimated by our experienced pathologists according to the ACCP 2013 and the 8th edition AJCC/UICC criteria (9,10).

Chest TS-CT imaging parameters

Radiological assessment of the chest TS-CT images of each included patient, with both lung window (width: 1,500 HU; level: −450 HU) and mediastinal window (width: 350 HU; level: 40 HU) settings, was performed independently and in a blinded manner by two radiologists, each with more than 10 years of experience. In case of disagreement between the two primary radiologists, a third radiologist with 20 years of experience was invited to adjudicate the final decision.

The following TS-CT imaging features of each nodule of SMPLC were carefully measured and recorded: the type of nodule (12), long-axis diameter of the lesion, long-axis diameter of the solid portion in the lesion, consolidation tumor ratio (CTR), spiculation, lobulation, pleural indentation, bubble-like vacuole and air bronchogram. For each kind of TS-CT sign, we then combined the results of all existing nodules into a single patient-level covariable that was incorporated into the CT-DTA model (Table S1). Qualitative and quantitative evaluation criteria for all the above parameters are detailed in Table S1 and described in our previous studies (7,13). Representative TS-CT images are displayed in Figure S1.
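The combination of per-lesion findings into one patient-level covariable can be illustrated with a short sketch. It assumes the CTR threshold of 0.90 and the three categories that appear in Table 1; the function name and input format are hypothetical and are not taken from the authors' workflow or from Table S1.

```python
def combine_ctr(ctr_values, threshold=0.90):
    """Collapse per-lesion CTR values into one patient-level category.

    Assumption: the categories mirror the Table 1 labels
    (all lesions below threshold / 1 lesion at or above / >=2 lesions at or above).
    """
    if len(ctr_values) < 2:
        raise ValueError("SMPLC implies at least two lesions per patient")
    # Count lesions whose CTR reaches the threshold.
    n_high = sum(1 for ctr in ctr_values if ctr >= threshold)
    if n_high == 0:
        return f"All lesions <{threshold:.2f}"
    if n_high == 1:
        return f"1 lesion >={threshold:.2f}"
    return f">=2 lesions >={threshold:.2f}"


# Hypothetical patient with one part-solid and one pure solid nodule.
category = combine_ctr([0.45, 1.00])
```

The same counting pattern would apply to the binary signs (spiculation, lobulation and so on), with "present" replacing the threshold comparison.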

Outcome of interest

The primary endpoint in our study was the pathologically diagnosed LNM, which refers to cancer cells spread from SMPLC to mediastinal and/or hilar lymph nodes (11). The occurrence of LNM would be judged by an inter-institutional telepathology consultation if necessary.

Cutoffs of imaging parameters

The value maximizing the Youden index (sensitivity + specificity − 1) for the prediction of LNM in the training cohort was taken as the optimal cut-point for each continuous variable: CTR, long-axis diameter of the maximal lesion and long-axis diameter of the solid portion in the maximal lesion. These threshold values were then applied as the grouping criteria for each lesion of SMPLC (Table S1).
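As an illustration of cut-point selection by the Youden index, the sketch below scans every observed value as a candidate threshold and keeps the one with the largest J = sensitivity + specificity − 1. The data are made-up toy values, and the rule "positive when value ≥ cutoff" is an assumption; the study itself derived its cut-points with statistical software.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when value >= cutoff; labels are 1 (event) or 0.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
        true_neg = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy measurements (arbitrary units) and event status; not study data.
measurements = [5, 10, 15, 20, 26, 30, 35, 40]
events = [0, 0, 0, 0, 1, 0, 1, 1]
cutoff, j = youden_cutoff(measurements, events)
```

With ties in J, this sketch keeps the smallest candidate cutoff; other tie-breaking rules are equally defensible.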

Statistical analysis

Statistical differences between groups

We used Pearson’s Chi-squared test, the Chi-squared test with Yates’ continuity correction or Fisher’s exact test to compare categorical variables, and the Mann-Whitney U test to compare continuous data [reported as mean ± standard deviation (SD) and as median with interquartile range (IQR)]. Statistical significance was defined as P<0.050 in a two-sided test.
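For readers who want to check a categorical comparison by hand, the following sketch computes an uncorrected Pearson Chi-squared test for a 2×2 table using only the Python standard library. It is an illustrative calculation rather than the authors' SPSS procedure; the counts are the LNM rows of Table 1, and the result is close to the P value of 0.64 reported there.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Table 1, LNM status as (N0, N1-2): training cohort, then validation cohort.
chi2, p = pearson_chi2_2x2(123, 16, 83, 13)
```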

Predictive factor analysis

The correlations between all the evaluated characteristics and risk of LNM were initially investigated through univariable logistic regression analysis. Thereafter, all the clinicopathological and imaging covariables with P<0.20 were included in multivariable logistic regression models, and then, odds ratios (ORs) with 95% confidence intervals (CIs) were generated to determine which factors could play significantly predictive roles for the development of LNM. Ultimately, we utilized the Hosmer-Lemeshow test to measure the goodness-of-fit of each multivariable logistic regression model.
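For a single binary predictor, the univariable logistic regression OR and its Wald 95% CI coincide with the cross-product ratio of the 2×2 table and the usual log-scale interval, so the reported values can be checked by hand. The sketch below is an illustration rather than the authors' SPSS output; it uses the lymphovascular invasion counts from Table 2, and the function name is hypothetical.

```python
import math


def odds_ratio_wald(exp_events, exp_nonevents, unexp_events, unexp_nonevents, z=1.96):
    """Odds ratio with a Wald confidence interval from 2x2 counts."""
    odds_ratio = (exp_events * unexp_nonevents) / (exp_nonevents * unexp_events)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / exp_events + 1 / exp_nonevents
                          + 1 / unexp_events + 1 / unexp_nonevents)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Table 2, lymphovascular invasion present: 7 with LNM, 7 without;
# lymphovascular invasion absent: 9 with LNM, 116 without.
or_lvi, ci_low, ci_high = odds_ratio_wald(7, 7, 9, 116)
```

These counts give an OR of about 12.89 with limits near 3.70 and 44.90, in line with the univariable row for lymphovascular invasion in Table 3.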

All the above statistical methods were accomplished using the IBM SPSS 27.0 software (IBM SPSS Statistics, Version 22.0., IBM Corp, Armonk, NY, USA).

Establishment and validation of DTA model

A DTA model was built from both qualitative and quantitative TS-CT parameters in the training cohort, all of which showed statistical significance in both the group comparisons and the logistic regression analyses. The eligible imaging parameters, expressed as categories or dichotomized at their optimal cutoff values, were input to establish a DTA model whose tree-growing methodology was set as follows: classification and regression tree (CART) algorithm; ‘Gini’ impurity criterion; a minimum of 5 samples required to split an internal node; a minimum of 2 samples required at a leaf node; and a maximum tree depth of 30. The tree-growing process was terminated when the next cycle of splitting made no substantial contribution (7).
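To make the ‘Gini’ criterion concrete, the sketch below computes the impurity of a node and the impurity decrease achieved by one candidate split. The counts are the training-cohort spiculation rows of Table 2 (absent versus present in ≥1 lesion); choosing this split is purely illustrative and is not a claim about the order of splits in the published tree.

```python
def gini(counts):
    """Gini impurity of a node given its class counts, e.g. (n_N0, n_LNM)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`."""
    n = sum(parent)
    weighted = sum(left) / n * gini(left) + sum(right) / n * gini(right)
    return gini(parent) - weighted


# Training cohort, counts as (N0, LNM), taken from Table 2.
parent = (123, 16)
no_spiculation = (79, 0)             # spiculation absent
any_spiculation = (32 + 12, 12 + 4)  # spiculation in 1 lesion or in >=2 lesions
decrease = gini_decrease(parent, no_spiculation, any_spiculation)
```

CART evaluates this decrease for every candidate split and keeps the largest one, subject to the minimum sample sizes and maximum depth listed above.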

The predictive performance of the CT-DTA model with regard to the risk of LNM was further externally validated in the validation cohort. We conducted receiver operating characteristic (ROC) analysis to estimate the capacity of the CT-DTA model and the main clinicopathological characteristics to discriminate between patients with and without LNM. Their areas under the curve (AUCs) were calculated and compared using the DeLong test. A risk prediction model with an AUC >0.80 and P<0.001 was considered clinically useful (14). In addition, we plotted calibration curves with bootstrap repetitions to assess the agreement between the probability of LNM predicted by the CT-DTA model and the observed probability of LNM. Finally, we used the Shapley Additive Explanation (SHAP) method to explain the importance of each covariable to the occurrence of LNM (15).
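The AUC has a simple rank interpretation: it is the probability that a randomly chosen patient with LNM receives a higher predicted risk than a randomly chosen patient without LNM, with ties counted as one half. The sketch below implements that definition on toy scores; it is not the software routine used in the study, and the data are invented.

```python
def auc_rank(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, which matters for tree models because all patients
    in the same leaf node share one predicted probability.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy predicted probabilities and outcomes (1 = event); not study data.
toy_auc = auc_rank([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```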

All the above statistical methods involved in CT-DTA modeling and validation were accomplished using the JMP Pro 16.0 software (SAS Institute, Cary, NC, USA) and R Studio 4.2.3 (R Foundation for Statistical Computing, Vienna, Austria).

Subgroup analyses

In addition, the efficiency of CT-DTA model was also assessed across all of the subgroups stratified by clinicopathological characteristics. A ROC analysis on each subgroup of the entire cohort was employed to measure the predictive accuracy of CT-DTA model for the emergence of LNM. The predictive independence of CT-DTA model in each subgroup of the entire cohort was further confirmed by a multivariable logistic regression analysis.


Results

Clinicopathological and imaging characteristics

The clinicopathological parameters of all the included patients are listed in Table 1. The majority of SMPLC cases comprised two nodules (84.3%) located in the ipsilateral lung (66.0%). Adenocarcinoma was the most common histological subtype of all the lesions in SMPLC (92.3%), 60.0% of which were identified as Tis–1-stage tumors, and 29 patients (12.3%) were diagnosed with LNM (N1–2-stage). No significant difference was found in the incidence of LNM between the training cohort and the validation cohort (P=0.64).

Table 1

Estimated characteristics between training cohort and validation cohort

Estimated characteristics Entire cohort (n=235) Training cohort (n=139) Validation cohort (n=96) P value
Clinicopathological parameters
   Age (years)
    Mean ± SD 61.5±10.0 62.7±9.9 59.9±10.0 0.051
    Median [IQR] 63 [55–68] 65 [56–69] 61 [52–68]
   Gender 0.35
    Female 131 (55.7) 74 (53.2) 57 (59.4)
    Male 104 (44.3) 65 (46.8) 39 (40.6)
   Smoking status 0.73
    Never 166 (70.6) 97 (69.8) 69 (71.9)
    Current/former 69 (29.4) 42 (30.2) 27 (28.1)
   Respiratory comorbidity 0.45
    Absent 222 (94.5) 130 (93.5) 92 (95.8)
    Present 13 (5.5) 9 (6.5) 4 (4.2)
   Cardio-cerebrovascular comorbidity 0.074
    Absent 193 (82.1) 109 (78.4) 84 (87.5)
    Present 42 (17.9) 30 (21.6) 12 (12.5)
   Diabetes mellitus 0.069
    Absent 217 (92.3) 132 (95.0) 85 (88.5)
    Present 18 (7.7) 7 (5.0) 11 (11.5)
   Number of the lesions 0.75
    2 198 (84.3) 118 (84.9) 80 (83.3)
    ≥3 37 (15.7) 21 (15.1) 16 (16.7)
   Location of the lesions <0.001
    Ipsilateral side: right & same lobe 24 (10.2) 5 (3.6) 19 (19.8)
    Ipsilateral side: left & same lobe 10 (4.3) 3 (2.2) 7 (7.3)
    Ipsilateral side: right & different lobes 87 (37.0) 56 (40.3) 31 (32.3)
    Ipsilateral side: left & different lobes 34 (14.5) 23 (16.5) 11 (11.5)
    Contralateral side 80 (34.0) 52 (37.4) 28 (29.2)
   Surgical procedure 0.068
    Sub-lobar resections 55 (23.4) 34 (24.5) 21 (21.9)
    Lobectomy 36 (15.3) 15 (10.8) 21 (21.9)
    Lobectomy + sub-lobar resection 125 (53.2) 83 (59.7) 42 (43.8)
    Bi-lobectomy & pneumonectomy 19 (8.1) 7 (5.0) 12 (12.5)
   Histology 0.27
    AC + AC 217 (92.3) 126 (90.6) 91 (94.8)
    AC + SCC 14 (6.0) 11 (7.9) 3 (3.1)
    SCC + SCC 4 (1.7) 2 (1.4) 2 (2.1)
   Pleural invasion 0.79
    Absent 179 (76.2) 105 (75.5) 74 (77.1)
    Present 56 (23.8) 34 (24.5) 22 (22.9)
   Lymphovascular invasion 0.56
    Absent 209 (88.9) 125 (89.9) 84 (87.5)
    Present 26 (11.1) 14 (10.1) 12 (12.5)
   T stage of the maximal lesion <0.001
    Tis–1 141 (60.0) 81 (58.3) 60 (62.5)
    T2 67 (28.5) 52 (37.4) 15 (15.6)
    T3–4 27 (11.5) 6 (4.3) 21 (21.9)
   T stages of the lesions 0.67
    Tis–1 + Tis–1 141 (60.0) 81 (58.3) 60 (62.5)
    Tis–1 + T2–4 81 (34.5) 51 (36.7) 30 (31.3)
    T2–4 + T2–4 13 (5.5) 7 (5.0) 6 (6.3)
   Lymph node metastasis 0.64
    No (N0) 206 (87.7) 123 (88.5) 83 (86.5)
    Yes (N1–2) 29 (12.3) 16 (11.5) 13 (13.5)
   TNM stage 0.29
    0–I 182 (77.4) 111 (79.9) 71 (74.0)
    II–IV 53 (22.6) 28 (20.1) 25 (26.0)
Imaging parameters on chest computed tomography
   Type of nodule 0.022
    Pure GGN + pure GGN 25 (10.6) 12 (8.6) 13 (13.5)
    Pure GGN + GGO-predominant nodule 13 (5.5) 9 (6.5) 4 (4.2)
    Pure GGN + solid-predominant nodule 37 (15.7) 16 (11.5) 21 (21.9)
    Pure GGN + pure solid nodule 20 (8.5) 10 (7.2) 10 (10.4)
    GGO-predominant nodule + GGO-predominant nodule 2 (0.9) 2 (1.4) 0
    GGO-predominant nodule + solid-predominant nodule 17 (7.2) 15 (10.8) 2 (2.1)
    GGO-predominant nodule + pure solid nodule 10 (4.3) 8 (5.8) 2 (2.1)
    Solid-predominant nodule + solid-predominant nodule 26 (11.1) 13 (9.4) 13 (13.5)
    Solid-predominant nodule + pure solid nodule 48 (20.4) 29 (20.9) 19 (19.8)
    Pure solid nodule + pure solid nodule 37 (15.7) 25 (18.0) 12 (12.5)
   Consolidation tumor ratio 0.043
    All lesions <0.90 107 (45.5) 55 (39.6) 52 (54.2)
    1 lesion ≥0.90 84 (35.7) 52 (37.4) 32 (33.3)
    ≥2 lesions ≥0.90 44 (18.7) 32 (23.0) 12 (12.5)
   Presence of spiculation <0.001
    Absent 133 (56.6) 79 (56.8) 54 (56.3)
    1 lesion present 56 (23.8) 44 (31.7) 12 (12.5)
    ≥2 lesions present 46 (19.6) 16 (11.5) 30 (31.3)
   Presence of lobulation 0.87
    Absent 14 (6.0) 9 (6.5) 5 (5.2)
    1 lesion present 70 (29.8) 40 (28.8) 30 (31.3)
    ≥2 lesions present 151 (64.3) 90 (64.7) 61 (63.5)
   Presence of bubble-like vacuole 0.013
    Absent 144 (61.3) 95 (68.3) 49 (51.0)
    1 lesion present 74 (31.5) 38 (27.3) 36 (37.5)
    ≥2 lesions present 17 (7.2) 6 (4.3) 11 (11.5)
   Presence of air bronchogram 0.018
    Absent 121 (51.5) 63 (45.3) 58 (60.4)
    Normally present 69 (29.4) 43 (30.9) 26 (27.1)
    1 lesion pathologically present 40 (17.0) 28 (20.1) 12 (12.5)
    ≥2 lesions pathologically present 5 (2.1) 5 (3.6) 0
   Presence of pleural indentation 0.030
    Absent 73 (31.1) 41 (29.5) 32 (33.3)
    1 lesion present 117 (49.8) 78 (56.1) 39 (40.6)
    ≥2 lesions present 45 (19.1) 20 (14.4) 25 (26.0)
   Long-axis diameter of the maximal lesion (mm) 0.37
    Mean ± SD 25.5±13.7 26.0±14.0 24.8±13.3
    Median [IQR] 23 [16–31] 24 [17–31] 22 [15–31]
   Long-axis diameter of solid portion in the maximal lesion (mm)
    Mean ± SD 19.0±16.9 20.8±17.0 16.4±16.6 0.020
    Median [IQR] 17 [5–28] 19 [9–29] 13 [4–26]

Data were presented as n (%) if not otherwise specified. AC, adenocarcinoma; GGN, ground-glass nodule; GGO, ground-glass opacity; IQR, interquartile range; SCC, squamous cell carcinoma; SD, standard deviation.

The details of the imaging parameters are listed in Table 1. In the training cohort, the majority of patients had no TS-CT sign of spiculation (56.8%), bubble-like vacuole (68.3%) or abnormal air bronchogram (76.2%) in any lesion. In contrast, pleural indentation (70.5%) and lobulation (93.5%) were each present in ≥1 lesion in most patients. The mean long-axis diameters of the maximal lesion and of the solid portion in the maximal lesion were 26.0±14.0 mm and 20.8±17.0 mm, respectively. Differences in TS-CT imaging features between the training cohort and the validation cohort are also detailed in Table 1.

Derivation of CT-DTA model

Prediction of LNM by TS-CT imaging parameters

With respect to the occurrence of LNM, the optimal cutoff points of CTR, long-axis diameter of the maximal lesion and long-axis diameter of its solid portion, as suggested by the maximum Youden indices, were 0.90, 30 mm and 27 mm, respectively (Table S1). As shown in Table 2, we found significant differences in the type of nodule (P=0.021), CTR (P<0.001), presence of spiculation (P<0.001) and lobulation (P=0.012), and long-axis diameters of the lesion (P<0.001) and the solid portion (P<0.001) between patients with and without LNM in the training cohort. Moreover, in the univariable logistic regression analysis based on the training cohort, the number of pure solid nodules (PSN; P=0.001), CTR (P=0.001), presence of spiculation (P<0.001) and lobulation (P=0.039), and long-axis diameters of the lesion (P=0.011) and the solid portion (P=0.001) were found to be significantly associated with an increased risk of LNM (Table 3). After adjustment for all covariables with P<0.20, a multivariable logistic regression analysis demonstrated that none of the above six imaging parameters was independently predictive of LNM in the training cohort, as detailed in Table 3 (model A).

Table 2

Differences in estimated characteristics between patients with and without lymph node metastasis in the training cohort

Estimated characteristics Training cohort (n=139) Without lymph node metastasis (N0: n=123) With lymph node metastasis (N1–2: n=16) P value
Clinicopathological parameters
   Age (years)
    Mean ± SD 62.7±9.9 62.9±10.2 61.1±7.7 0.49
    Median [IQR] 65 [56–69] 65 [56–69] 63 [57–66]
   Gender 0.18
    Female 74 (53.2) 68 (55.3) 6 (37.5)
    Male 65 (46.8) 55 (44.7) 10 (62.5)
   Smoking status 0.034
    Never 97 (69.8) 90 (73.2) 7 (43.8)
    Current/former 42 (30.2) 33 (26.8) 9 (56.3)
   Respiratory comorbidity 1.0
    Absent 130 (93.5) 115 (93.5) 15 (93.8)
    Present 9 (6.5) 8 (6.5) 1 (6.3)
   Cardio-cerebrovascular comorbidity 0.21
    Absent 109 (78.4) 94 (76.4) 15 (93.8)
    Present 30 (21.6) 29 (23.6) 1 (6.3)
   Diabetes mellitus 0.71
    Absent 132 (95.0) 116 (94.3) 16 (100)
    Present 7 (5.0) 7 (5.7) 0
   Number of the lesions 0.95
    2 118 (84.9) 105 (85.4) 13 (81.3)
    ≥3 21 (15.1) 18 (14.6) 3 (18.8)
   Location of the lesions 0.68
    Ipsilateral side: right & same lobe 5 (3.6) 5 (4.1) 0
    Ipsilateral side: left & same lobe 3 (2.2) 2 (1.6) 1 (6.3)
    Ipsilateral side: right & different lobes 56 (40.3) 50 (40.7) 6 (37.5)
    Ipsilateral side: left & different lobes 23 (16.5) 20 (16.3) 3 (18.8)
    Contralateral side 52 (37.4) 46 (37.4) 6 (37.5)
   Surgical procedure 0.45
    Sub-lobar resections 34 (24.5) 30 (24.4) 4 (25.0)
    Lobectomy 15 (10.8) 12 (9.8) 3 (18.8)
    Lobectomy + sub-lobar resection 83 (59.7) 74 (60.2) 9 (56.3)
    Bi-lobectomy & pneumonectomy 7 (5.0) 7 (5.7) 0
   Histology 0.63
    AC + AC 126 (90.6) 112 (91.1) 14 (87.5)
    AC + SCC 11 (7.9) 9 (7.3) 2 (12.5)
    SCC + SCC 2 (1.4) 2 (1.6) 0
   Pleural invasion 0.027
    Absent 105 (75.5) 97 (78.9) 8 (50.0)
    Present 34 (24.5) 26 (21.1) 8 (50.0)
   Lymphovascular invasion <0.001
    Absent 125 (89.9) 116 (94.3) 9 (56.3)
    Present 14 (10.1) 7 (5.7) 7 (43.8)
   T stage of the maximal lesion <0.001
    Tis–1 81 (58.3) 79 (64.2) 2 (12.5)
    T2 52 (37.4) 38 (30.9) 14 (87.5)
    T3–4 6 (4.3) 6 (4.9) 0
   T stages of the lesions <0.001
    Tis–1 + Tis–1 81 (58.3) 79 (64.2) 2 (12.5)
    Tis–1 + T2–4 51 (36.7) 41 (33.3) 10 (62.5)
    T2–4 + T2–4 7 (5.0) 3 (2.4) 4 (25.0)
Imaging parameters on chest computed tomography
   Type of nodule 0.021
    Pure GGN + pure GGN 12 (8.6) 12 (9.8) 0
    Pure GGN + GGO-predominant nodule 9 (6.5) 9 (7.3) 0
    Pure GGN + solid-predominant nodule 16 (11.5) 16 (13.0) 0
    Pure GGN + pure solid nodule 10 (7.2) 9 (7.3) 1 (6.3)
    GGO-predominant nodule + GGO-predominant nodule 2 (1.4) 2 (1.6) 0
    GGO-predominant nodule + solid-predominant nodule 15 (10.8) 15 (12.2) 0
    GGO-predominant nodule + pure solid nodule 8 (5.8) 6 (4.9) 2 (12.5)
    Solid-predominant nodule + solid-predominant nodule 13 (9.4) 12 (9.8) 1 (6.3)
    Solid-predominant nodule + pure solid nodule 29 (20.9) 23 (18.7) 6 (37.5)
    Pure solid nodule + pure solid nodule 25 (18.0) 19 (15.4) 6 (37.5)
   Consolidation tumor ratio <0.001
    All lesions <0.90 55 (39.6) 55 (44.7) 0
    1 lesion ≥0.90 52 (37.4) 44 (35.8) 8 (50.0)
    ≥2 lesions ≥0.90 32 (23.0) 24 (19.5) 8 (50.0)
   Presence of spiculation <0.001
    Absent 79 (56.8) 79 (64.2) 0
    1 lesion present 44 (31.7) 32 (26.0) 12 (75.0)
    ≥2 lesions present 16 (11.5) 12 (9.8) 4 (25.0)
   Presence of lobulation 0.012
    Absent 9 (6.5) 9 (7.3) 0
    1 lesion present 40 (28.8) 39 (31.7) 1 (6.3)
    ≥2 lesions present 90 (64.7) 75 (61.0) 15 (93.8)
   Presence of bubble-like vacuole 0.46
    Absent 95 (68.3) 84 (68.3) 11 (68.8)
    1 lesion present 38 (27.3) 33 (26.8) 5 (31.3)
    ≥2 lesions present 6 (4.3) 6 (4.9) 0
   Presence of air bronchogram 0.083
    Absent 63 (45.3) 53 (43.1) 10 (62.5)
    Normally present 43 (30.9) 42 (34.1) 1 (6.3)
    1 lesion pathologically present 28 (20.1) 24 (19.5) 4 (25.0)
    ≥2 lesions pathologically present 5 (3.6) 4 (3.3) 1 (6.3)
   Presence of pleural indentation 0.97
    Absent 41 (29.5) 36 (29.3) 5 (31.3)
    1 lesion present 78 (56.1) 69 (56.1) 9 (56.3)
    ≥2 lesions present 20 (14.4) 18 (14.6) 2 (12.5)
   Long-axis diameter of the maximal lesion (mm) <0.001
    Mean ± SD 26.0±14.0 24.8±13.7 35.1±12.8
    Median [IQR] 24 [17–31] 23 [16–29] 33 [27–40]
   Long-axis diameter of solid portion in the maximal lesion (mm)
    Mean ± SD 20.8±17.0 18.9±16.6 35.1±12.8 <0.001
    Median [IQR] 19 [9–29] 17 [7–26] 33 [27–40]

Data were presented as n (%) if not otherwise specified. AC, adenocarcinoma; GGN, ground-glass nodule; GGO, ground-glass opacity; IQR, interquartile range; SCC, squamous cell carcinoma; SD, standard deviation.

Table 3

Univariable and multivariable logistic regression analysis of the predictive factors for lymph node metastasis in patients of the training cohort

Estimated characteristics Univariable analysis Multivariable analysis (model A) Multivariable analysis (model B)
OR (95% CI) P value OR (95% CI) P value OR (95% CI) P value
Age (years) (per 1 year increased) 0.982 (0.932–1.034) 0.49
Gender (male vs. female) 2.06 (0.71–6.02) 0.19 1.78 (0.10–31.05) 0.69 2.36 (0.13–44.21) 0.57
Smoking status (current/former vs. never) 3.51 (1.21–10.17) 0.021 1.13 (0.078–16.32) 0.93 0.82 (0.043–15.33) 0.89
Preoperative comorbidity (present vs. absent) 0.60 (0.18–1.97) 0.40
Number of the lesions (≥3 vs. 2) 1.35 (0.35–5.20) 0.67
Location of the lesions (contralateral vs. ipsilateral) 1.00 (0.34–2.95) 0.99
Surgical procedure (≥1 lobectomy vs. lobectomy vs. sub-lobar resections) 0.87 (0.49–1.58) 0.69
Histology (AC + AC vs. AC + SCC vs. SCC + SCC) 1.16 (0.29–4.59) 0.84
Pleural invasion (present vs. absent) 3.73 (1.28–10.89) 0.016 1.25 (0.25–6.19) 0.79 1.18 (0.25–5.72) 0.83
Lymphovascular invasion (present vs. absent) 12.89 (3.70–44.90) <0.001 7.04 (1.26–39.19) 0.026 7.70 (1.35–43.87) 0.021
T stage of the maximal lesion (T3–4 vs. T2 vs. Tis–1) 3.51 (1.48–8.33) 0.004 0.22 (0.013–3.50) 0.28 0.25 (0.017–3.67) 0.31
T stages of the lesions (T2–4 + T2–4 vs. Tis–1 + T2–4 vs. Tis–1 + Tis–1) 7.42 (2.75–20.02) <0.001 12.67 (1.13–142.14) 0.040 13.08 (1.45–118.37) 0.022
Pure solid nodules (≥2 lesions vs. 1 lesion vs. absent) 3.26 (1.58–6.73) 0.001 1.43 (0.22–9.13) 0.71
CTR (≥2 lesions ≥0.90 vs. 1 lesion ≥0.90 vs. all lesions <0.90) 3.91 (1.75–8.73) 0.001 0.43 (0.041–4.46) 0.48
Presence of spiculation (≥2 lesions vs. 1 lesion vs. absent) 4.47 (2.08–9.62) <0.001 1.73 (0.45–6.71) 0.43
Presence of lobulation (≥2 lesions vs. 1 lesion vs. absent) 8.26 (1.12–61.14) 0.039 4.77 (0.29–77.61) 0.27 2.31 (0.18–30.11) 0.52
Presence of bubble-like vacuole (≥2 lesions vs. 1 lesion vs. absent) 0.84 (0.32–2.22) 0.72
Presence of air bronchogram (pathologically vs. normally vs. absent) 0.90 (0.49–1.66) 0.73
Presence of pleural indentation (≥2 lesions vs. 1 lesion vs. absent) 0.91 (0.40–2.04) 0.81
Long-axis diameter of the maximal lesion (mm) (per 1 mm increased) 1.041 (1.009–1.074) 0.011 Insufficient data 0.99
Long-axis diameter of solid portion in the maximal lesion (per 1 mm increased) 1.048 (1.018–1.078) 0.001 Insufficient data 0.99
CT-based multi-parameter decision tree algorithm model (per split proceed) 2.21 (1.59–3.08) <0.001 2.09 (1.29–3.37) 0.003

Model A: the multivariable binary logistic regression model was established on the original parameters estimated on chest CT images and other clinicopathological characteristics with P<0.20 in the univariable analysis (Hosmer-Lemeshow test P=0.79). Model B: the multivariable binary logistic regression model was established on the novel CT-based multi-parameter decision tree algorithm model and other clinicopathological characteristics with P<0.20 in the univariable analysis (Hosmer-Lemeshow test P=0.88). AC, adenocarcinoma; CI, confidence interval; CT, computed tomography; CTR, consolidation tumor ratio; OR, odds ratio; SCC, squamous cell carcinoma; SMPLC, synchronous multiple primary lung cancer.

Construction of CT-DTA model

We used the categorical data of all six imaging parameters above with univariable P<0.050 to train a DTA model. As illustrated in Figure 2, the model finally generated from the training cohort, named the CT-DTA model, consisted of the presence of spiculation, the long-axis diameters of the lesions and of the solid portion in the lesions, CTR, and the number of PSNs.

Figure 2 A CT-based multi-parametric decision tree algorithm model by combining presence of spiculation, consolidation tumor ratio, number of pure solid nodules, and long-axis diameters of the lesions and solid portion in the lesions together according to classification and regression tree algorithm. CT, computed tomography; CTR, consolidation tumor ratio; LNM, lymph node metastasis.
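As a rough illustration of how a classification and regression tree chooses one binary split, the sketch below scores candidate thresholds on a single feature by weighted Gini impurity. The toy data, the feature name `solid_mm` and the function names are hypothetical and are not taken from the study cohorts; the fitted model in Figure 2 combines five CT parameters and was not necessarily grown with exactly this criterion.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    n = len(values)
    distinct = sorted(set(values))
    best = (None, gini(labels))  # no split: impurity of the parent node
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best


# Hypothetical toy data: solid-portion diameter (mm) and LNM status (1 = present).
solid_mm = [4, 6, 8, 10, 12, 15, 18, 22, 25, 30]
lnm = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]

threshold, impurity = best_split(solid_mm, lnm)
print(threshold, round(impurity, 3))  # prints 16.5 0.15
```

Applying the same search recursively to each child node, over all candidate features, yields a tree whose terminal nodes correspond to the leaf nodes described here.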

This CT-DTA model contains seven leaf nodes, with predicted probabilities of LNM ranging from 0.1% to 45.4%. The importance of each CT-based covariable contributing to the risk of LNM, as estimated by SHAP values, is visualized in Figure 3. The long-axis diameter of the solid portion was the most important risk factor for LNM, followed by the presence of spiculation, CTR, the number of PSNs, and the long-axis diameter of the lesions.

Figure 3 Feature importance estimated by SHAP values for contributing to the occurrence of LNM in SMPLC. (A) Feature importance matrix plot; (B) SHAP summary plot. LNM, lymph node metastasis; SHAP, Shapley Additive Explanation; SMPLC, synchronous multiple primary lung cancer.
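SHAP values build on Shapley values, which for a handful of features can be computed exactly by enumerating every feature coalition. The sketch below does this for a hypothetical three-feature rule set (`toy_risk`), with made-up background patients; it is not the published CT-DTA model, and it is a brute-force illustration rather than the algorithm used by any SHAP software.

```python
from itertools import combinations
from math import factorial


def toy_risk(x):
    """Hypothetical tree on (solid_mm, spiculation, ctr); NOT the published model."""
    solid_mm, spiculation, ctr = x
    if solid_mm <= 16.5:
        return 0.02
    if spiculation >= 1:
        return 0.45
    return 0.30 if ctr >= 0.9 else 0.10


def coalition_value(model, x, background, subset):
    """Mean prediction when features in `subset` take x's values and the
    remaining features are taken from each background row in turn."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley attribution of model(x) relative to the background mean."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the coalition that feature i joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                with_i = coalition_value(model, x, background, set(s) | {i})
                without_i = coalition_value(model, x, background, set(s))
                phi[i] += weight * (with_i - without_i)
    return phi


# Hypothetical background patients and one patient to explain.
background = [(5, 0, 0.2), (12, 0, 0.5), (20, 1, 1.0), (25, 0, 0.95)]
patient = (22, 1, 1.0)

phi = shapley_values(toy_risk, patient, background)
baseline = sum(toy_risk(r) for r in background) / len(background)
print([round(p, 4) for p in phi], round(baseline, 4))
```

The attributions sum to the difference between the patient's prediction and the background mean, which is the additivity property that makes such values usable for ranking features as in Figure 3.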

Validation of CT-DTA model

Predictive performance of CT-DTA model

Figure 4 shows the AUCs of the CT-DTA model for predicting the risk of LNM in both the training cohort and the validation cohort. The CT-DTA model discriminated well between patients with and without LNM in the training cohort, with an AUC of 0.905 (95% CI: 0.851–0.958; P<0.001). Its discriminative power was then externally validated in the validation cohort, where the model retained substantial predictive value for the risk of LNM, with an AUC of 0.812 (95% CI: 0.699–0.926; P<0.001). The predictive accuracy of the CT-DTA model did not differ significantly between the training cohort and the validation cohort (DeLong test P=0.14).

Figure 4 ROC analyses on the predictive accuracy of CT-DTA model indicated by AUC for the risk of LNM in both the training cohort and the validation cohort. AUC, area under the curve; CT, computed tomography; CT-DTA, CT-based multi-parametric decision tree algorithm model; LNM, lymph node metastasis; ROC, receiver operating characteristic.
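Because a decision tree assigns one probability per leaf, many patients share the same score, so ties have to be handled when the AUC is computed. A minimal sketch of the rank-based (Mann-Whitney) formulation, using hypothetical leaf probabilities and outcomes rather than study data, could look like this:

```python
def auc_with_ties(scores, labels):
    """AUC as the probability that a random positive case outranks a random
    negative case, counting ties as one half (Mann-Whitney U formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical leaf probabilities (one value per leaf) and LNM status.
leaf_prob = [0.001, 0.001, 0.05, 0.05, 0.05, 0.24, 0.24, 0.32, 0.454, 0.454]
lnm = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]

auc = auc_with_ties(leaf_prob, lnm)
print(auc)  # prints 0.8125
```

The pairwise loop is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients; confidence intervals and the DeLong comparison reported above require additional variance estimation that this sketch omits.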

By plotting calibration curves with bootstrap repetitions in both the training cohort and the validation cohort, we also confirmed good agreement between the incidences predicted by the CT-DTA model and the observed incidences of LNM, as shown in Figure 5.

Figure 5 Calibration curves revealed a good consistency between the incidence predicted by CT-DTA model and the real-world incidence of LNM in both the training cohort and the validation cohort. CT, computed tomography; CT-DTA, CT-based multi-parametric decision tree algorithm model; LNM, Lymph node metastasis.
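For a tree model, calibration can be examined leaf by leaf: the probability predicted for a leaf is compared with the event rate observed among the patients falling into it, and bootstrap resampling indicates how variable that observed rate is. The following simplified sketch uses hypothetical predictions and outcomes, with an assumed 200 resamples and a fixed seed; it is not the procedure behind Figure 5.

```python
import random
from collections import defaultdict


def observed_by_leaf(pred, outcome):
    """Map each distinct predicted probability (one per leaf) to the observed event rate."""
    groups = defaultdict(list)
    for p, y in zip(pred, outcome):
        groups[p].append(y)
    return {p: sum(ys) / len(ys) for p, ys in sorted(groups.items())}


def bootstrap_rates(pred, outcome, n_boot=200, seed=0):
    """Percentile 95% interval of the observed rate per leaf over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(pred)
    samples = defaultdict(list)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        rates = observed_by_leaf([pred[i] for i in idx], [outcome[i] for i in idx])
        for p, r in rates.items():
            samples[p].append(r)
    intervals = {}
    for p, rs in samples.items():
        rs.sort()
        lo = rs[int(0.025 * (len(rs) - 1))]
        hi = rs[int(0.975 * (len(rs) - 1))]
        intervals[p] = (lo, hi)
    return intervals


# Hypothetical cohort: 20 patients in a 5% leaf, 10 patients in a 40% leaf.
pred = [0.05] * 20 + [0.40] * 10
outcome = [1] + [0] * 19 + [1] * 4 + [0] * 6

observed = observed_by_leaf(pred, outcome)
intervals = bootstrap_rates(pred, outcome)
for p in observed:
    print(p, observed[p], intervals[p])
```

Points lying close to the diagonal (predicted equal to observed) indicate good calibration; wide bootstrap intervals in sparsely populated leaves signal that agreement there is less certain.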

Predictive significance of CT-DTA model

In the training cohort, we ran the multivariable logistic regression analysis again after replacing the raw data of the TS-CT signs with our CT-DTA model (model B in Table 3). The CT-DTA model proved to be an independent risk factor for LNM (OR: 2.09; 95% CI: 1.29–3.37; P=0.003; Table 3).

The predictive independence of the CT-DTA model was further externally validated by multivariable logistic regression analyses in the validation cohort. As detailed in Table 4, none of the individual imaging characteristics was significantly associated with the occurrence of LNM in the validation cohort (model A). When these raw variables were replaced by our CT-DTA model, which integrates them into a single grading system, a new multivariable logistic regression analysis (model B in Table 4) demonstrated that the CT-DTA model could independently predict the development of LNM (OR: 1.53; 95% CI: 1.02–2.31; P=0.041).
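The relation between a reported OR, its 95% CI and the P value can be checked with a short calculation. The sketch below assumes the interval is a Wald interval on the log-odds scale, which the text does not state explicitly, and recovers the coefficient, standard error, z statistic and two-sided P value from the rounded figures for model B in the validation cohort. The function name is illustrative only.

```python
from math import erfc, log, sqrt


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover the log-odds coefficient, its standard error, the Wald z statistic
    and the two-sided P value from a reported OR and 95% CI (Wald assumption)."""
    beta = log(odds_ratio)
    se = (log(ci_high) - log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p_two_sided = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p_two_sided


# Rounded values as reported for the CT-DTA model (per split) in model B.
beta, se, z, p = wald_from_ci(1.53, 1.02, 2.31)
print(round(beta, 3), round(se, 3), round(z, 2), round(p, 3))
```

Under this assumption the recovered P value is approximately 0.041, in line with the reported figure; small discrepancies are to be expected because the inputs are rounded.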

Table 4

Univariable and multivariable logistic regression analysis of the predictive factors for lymph node metastasis in patients of the validation cohort

Estimated characteristics Univariable analysis Multivariable analysis (model A) Multivariable analysis (model B)
OR (95% CI) P value OR (95% CI) P value OR (95% CI) P value
Age (years) (per 1 year increased) 1.015 (0.955–1.078) 0.64
Gender (male vs. female) 1.86 (0.57–6.03) 0.30
Smoking status (current/former vs. never) 0.74 (0.19–2.92) 0.66
Preoperative comorbidity (present vs. absent) 0.95 (0.24–3.77) 0.94
Number of the lesions (≥3 vs. 2) 1.62 (0.39–6.68) 0.51
Location of the lesions (contralateral vs. ipsilateral) 1.09 (0.31–3.89) 0.89
Surgical procedure (≥1 lobectomy vs. lobectomy vs. sub-lobar resections) 1.45 (0.65–3.23) 0.36
Histology (AC + AC vs. AC + SCC vs. SCC + SCC) 1.04 (0.19–5.86) 0.96
Pleural invasion (present vs. absent) 12.12 (3.24–45.27) <0.001 Insufficient data 1.0 5.55 (0.051–607.82) 0.78
Lymphovascular invasion (present vs. absent) 31.60 (7.03–141.97) <0.001 27.93 (1.85–421.25) 0.016 41.12 (3.26–519.21) 0.004
T stage of the maximal lesion (T3–4 vs. T2 vs. Tis–1) 3.99 (1.86–8.57) <0.001 Insufficient data 1.0 1.47 (0.092–23.40) 0.79
T stages of the lesions (T2–4 + T2–4 vs. Tis–1 + T2–4 vs. Tis–1 + Tis–1) 3.69 (1.49–9.10) 0.005 0.45 (0.020–10.36) 0.62 0.24 (0.020–2.93) 0.26
Pure solid nodules (≥2 lesions vs. 1 lesion vs. absent) 4.15 (1.76–9.76) 0.001 Insufficient data 1.0
CTR (≥2 lesions ≥0.90 vs. 1 lesion ≥0.90 vs. all lesions <0.90) 2.87 (1.28–6.44) 0.010 Insufficient data 1.0
Presence of spiculation (≥2 lesions vs. 1 lesion vs. absent) 3.74 (1.67–8.39) 0.001 2.70 (0.66–11.07) 0.17
Presence of lobulation (≥2 lesions vs. 1 lesion vs. absent) 2.14 (0.62–7.43) 0.23
Presence of bubble-like vacuole (≥2 lesions vs. 1 lesion vs. absent) 1.24 (0.54–2.83) 0.62
Presence of air bronchogram (pathologically vs. normally vs. absent) 1.23 (0.56–2.72) 0.61
Presence of pleural indentation (≥2 lesions vs. 1 lesion vs. absent) 3.16 (1.30–7.68) 0.011 1.97 (0.39–9.85) 0.41 3.02 (0.68–13.42) 0.15
Long-axis diameter of the maximal lesion (mm) (per 1 mm increased) 1.044 (1.005–1.085) 0.027 1.000 (0.84–1.19) 1.0
Long-axis diameter of solid portion in the maximal lesion (per 1 mm increased) 1.036 (1.004–1.068) 0.026 0.97 (0.84–1.13) 0.72
CT-based multi-parameter decision tree algorithm model (per split proceed) 1.71 (1.25–2.36) 0.001 1.53 (1.02–2.31) 0.041

Model A: the multivariable binary logistic regression model established on the original parameters estimated on chest CT images and other clinicopathological characteristics with P<0.20 in the univariable analysis (Hosmer-Lemeshow test P=0.91). Model B: the multivariable binary logistic regression model established on the novel CT-based multi-parameter decision tree algorithm model and other clinicopathological characteristics with P<0.20 in the univariable analysis (Hosmer-Lemeshow test P=0.21). AC, adenocarcinoma; CI, confidence interval; CT, computed tomography; CTR, consolidation tumor ratio; OR, odds ratio; SCC, squamous cell carcinoma; SMPLC, synchronous multiple primary lung cancer.

Risk stratification according to CT-DTA model

To support therapeutic decision-making through accurate preoperative risk stratification of LNM, we classified the two independent cohorts into low-risk (predicted probability 0.14–24.40%) and high-risk (predicted probability 31.96–45.40%) populations, using as the cut-point the leaf node with the maximum Youden index (0.71) in the training cohort (Figure 6A). By this criterion, there were 110 (79.1%) low-risk and 29 (20.9%) high-risk patients in the training cohort, and 80 (83.3%) low-risk and 16 (16.7%) high-risk patients in the validation cohort.

Figure 6 Risk stratification according to the CT-DTA model. (A) The leaf node in CT-DTA model with the maximum value of Youden index (0.71) was determined as the optimal cut-point. (B) Significant differences in the incidence of LNM between low-risk and high-risk patients stratified by CT-DTA model in both the training cohort and the validation cohort. AUC, area under curve; CI, confidence interval; CT, computed tomography; CT-DTA, CT-based multi-parametric decision tree algorithm model; LNM, lymph node metastasis.
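Choosing a cut-point by the Youden index amounts to evaluating J = sensitivity + specificity − 1 at each candidate leaf probability and keeping the maximum. The sketch below shows the calculation on hypothetical leaf probabilities and outcomes; the numbers are invented for illustration and do not reproduce the value of 0.71 reported for the training cohort.

```python
def youden_by_cutpoint(scores, labels):
    """For each distinct score c, classify 'score >= c' as high-risk and
    return a list of (c, sensitivity, specificity, J)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    results = []
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < c and y == 0)
        sens = tp / n_pos
        spec = tn / n_neg
        results.append((c, sens, spec, sens + spec - 1.0))
    return results


# Hypothetical leaf probabilities (one value per leaf) and LNM status.
leaf_prob = [0.001, 0.001, 0.05, 0.05, 0.05, 0.24, 0.24, 0.32, 0.454, 0.454]
lnm = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]

results = youden_by_cutpoint(leaf_prob, lnm)
best = max(results, key=lambda r: r[3])
print(best)  # prints (0.454, 0.5, 1.0, 0.5)
```

Patients whose leaf probability is at or above the selected cut-point would be labeled high-risk and the remainder low-risk, which is the dichotomy used for the comparisons that follow.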

A significant difference in the incidence of LNM was observed between low-risk (3.6%) and high-risk (41.4%) patients in the training cohort (P<0.001; Figure 6B). A multivariable logistic regression analysis then showed that the CT-DTA model, when analyzed as a risk stratification tool, was a strong independent risk factor for LNM (OR: 12.01; 95% CI: 2.32–62.32; P=0.003; Table 5). Risk stratification by the CT-DTA model was further externally validated: high-risk patients (43.8%) had a significantly higher incidence of LNM than low-risk patients (7.5%) in the validation cohort (P<0.001; Figure 6B). Finally, as a risk stratification tool, the CT-DTA model retained its independent predictive value for the risk of LNM in the validation cohort in a multivariable logistic regression analysis (OR: 8.11; 95% CI: 1.19–55.30; P=0.033; Table 5).

Table 5

Multivariable logistic regression analyses on the significance of CT-DTA model as a risk stratification tool for lymph node metastasis in patients with SMPLC

Estimated characteristics OR (95% CI) (multivariable analysis) P value
Training cohort
   Risk stratification by CT-DTA model (high-risk vs. low-risk) 12.01 (2.32–62.32) 0.003
   Gender (male vs. female) 1.91 (0.13–29.04) 0.64
   Smoking status (current/former vs. never) 1.08 (0.064–18.40) 0.96
   Pleural invasion (present vs. absent) 1.34 (0.30–5.96) 0.70
   Lymphovascular invasion (present vs. absent) 7.48 (1.45–38.65) 0.016
   T stage of the maximal lesion (T3–4 vs. T2 vs. Tis–1) 5.35 (0.38–76.92) 0.21
   T stages of the lesions (T2–4 + T2–4 vs. Tis–1 + T2–4 vs. Tis–1 + Tis–1) 15.74 (1.88–131.46) 0.011
   Presence of lobulation (≥2 lesions vs. 1 lesion vs. absent) 2.73 (0.26–28.17) 0.40
Validation cohort
   Risk stratification by CT-DTA model (high-risk vs. low-risk) 8.11 (1.19–55.30) 0.033
   Pleural invasion (present vs. absent) 13.28 (0.092–1,924.86) 0.31
   Lymphovascular invasion (present vs. absent) 21.88 (3.45–138.67) 0.001
   T stage of the maximal lesion (T3–4 vs. T2 vs. Tis–1) 1.31 (0.070–24.41) 0.86
   T stages of the lesions (T2–4 + T2–4 vs. Tis–1 + T2–4 vs. Tis–1 + Tis–1) 6.94 (0.43–111.11) 0.17
   Presence of pleural indentation (≥2 lesions vs. 1 lesion vs. absent) 3.01 (0.68–13.36) 0.15

The multivariable binary logistic regression model based on the training cohort: Hosmer-Lemeshow test P=0.84; the multivariable binary logistic regression model based on the validation cohort: Hosmer-Lemeshow test P=0.75. CI, confidence interval; CT-DTA, computed tomography-based multi-parametric decision tree algorithm; OR, odds ratio; SMPLC, synchronous multiple primary lung cancer.

Subgroup analyses on the entire cohort

As indicated by the AUC values from subgroup ROC analyses, the predictive accuracy of the CT-DTA model for the risk of LNM remained reliable across all the subgroups of age, gender, smoking status, preoperative comorbidity, location of lesions, histology, pleural invasion, lympho-vascular invasion, and T-stages of the lesions (Figure 7).

Figure 7 Subgroup ROC analyses regarding the predictive accuracy of CT-DTA model for risk of LNM based on the entire cohort of patients with SMPLC. AC, adenocarcinoma; AUC, area under the curve; CI, confidence interval; CT, computed tomography; CT-DTA, CT-based multi-parametric decision tree algorithm model; LNM, lymph node metastasis; ROC, receiver operating characteristic; SMPLC, synchronous multiple primary lung cancer.

A forest plot depicting the ORs of the CT-DTA model from subgroup multivariable logistic regression analyses is shown in Figure 8. After controlling for confounding effects from other clinicopathological covariables, the CT-DTA model remained an independent risk factor for LNM across all the subgroups stratified by gender, smoking status, histology, pleural invasion, lympho-vascular invasion, and T-stages of the lesions. Furthermore, the CT-DTA model could also independently predict the risk of LNM among elderly patients, patients without any underlying comorbidity, and those with lesions distributed on the ipsilateral side of the lung (Figure 8).

Figure 8 Subgroup multivariable logistic regression analyses regarding the predictive significance of CT-DTA model for risk of LNM based on the entire cohort of patients with SMPLC. AC, adenocarcinoma; CI, confidence interval; CT, computed tomography; CT-DTA, CT-based multi-parametric decision tree algorithm model; LNM, lymph node metastasis; OR, odds ratio; SMPLC, synchronous multiple primary lung cancer.

Discussion

Key results and interpretations

To our knowledge, this is the first study to employ a DTA modeling technique for multi-parametric risk assessment based on imaging features measured on chest TS-CT specifically in surgically resectable SMPLC. In this multi-center study, we established a novel and non-invasive CT-DTA model that integrates five key preoperative determinants from chest TS-CT, namely CTR, the presence of spiculation, the number of PSNs, and the long-axis diameters of the lesions and of the solid portion in the lesions, to predict the incidence of LNM before surgery for SMPLC. Having been externally validated through a series of analyses, the CT-DTA model performed well and may have the potential to alert thoracic surgeons to a high risk of LNM in advance.

One of our aims was to introduce a novel DTA model into the conventional radiological evaluation used in current clinical practice for surgically resectable SMPLC. For the first time, this study offers multivariable results on CT-derived features possibly related to the risk of LNM in surgical patients with SMPLC, yet none of these features showed independent predictive value. We therefore developed a DTA model incorporating five critical imaging parameters that were statistically significant in the univariable analyses, and validated it as a clinically useful risk assessment tool by ROC analysis, calibration curves and multivariable logistic regression analysis for discerning which patients were prone to develop N1–2 stage LNM. The seven leaf nodes of the CT-DTA model were generated from binary splits of five pivotal CT-based imaging features of pulmonary nodules: CTR, the presence of spiculation, the number of PSNs, and the long-axis diameters of the lesions and of the solid portion in the lesions. The following evidence from recent investigations may help to explain the predictive strength of our CT-DTA model.

Firstly, a marginal spiculation sign on chest CT can be indicative of fibrotic stroma or desmoplasia, which is characterized by remodeling of the tumor microenvironment through desmoplastic reaction, proliferation of fibroblasts and dense deposition of extracellular matrix, all of which participate in converting normal stroma into tumor stroma and thereby enhance the growth and viability of cancer cells (16). During oncogenesis, cancer-associated fibroblasts produce a variety of tumor growth factors, cytokines, chemokines and immune modulators and play an essential role in tissue fibrosis and desmoplasia (17). Therefore, the presence of spiculation is generally associated with LNM or distant metastasis from primary NSCLC, even at an early stage when the disease is newly diagnosed. Secondly, it has been well demonstrated by a meta-analysis that a high proportion of solid component in a nodule on chest CT is significantly associated with unfavorable pathological characteristics and poorer overall survival in NSCLC (18,19). CTR ≥0.90, especially when present in all the existing nodules, also proved to be a key contributor to the occurrence of LNM in this CT-DTA model. Moreover, as previously reported, we have clarified the substantial prognostic roles of the long-axis diameter of the lesion and of PSNs for spread through air spaces and worse overall survival in single or multiple lung adenocarcinoma (13,20). Choi et al. (21) also emphasized the importance of lymph node evaluation in pure-solid NSCLC no more than 2 cm in long-axis diameter, since such a PSN almost always has a strong linkage to invasive pathological features.

Another highlight of our multi-center study was that both ROC analysis and multivariable logistic regression analysis were performed on the clinical significance of the CT-DTA model in each patient subgroup. The CT-DTA model was well validated for predicting the risk of LNM across nearly all the subgroups classified by clinicopathological characteristics, especially in those traditionally regarded as low-risk populations, such as female non-smokers, patients without any underlying comorbidity or pleural or lympho-vascular invasion, and patients diagnosed at an early tumor stage. As for the subgroups that failed to generate significant OR statistics, we speculate that the limited sample size within those subgroups may have reduced the statistical power of the analyses.

Clinical implications

Our findings support the use of a novel, easy-to-use and well-validated CT-DTA model for risk stratification prior to curative-intent resection for SMPLC, to identify more accurately the patients at higher risk of LNM. The predicted probability can be read directly from the CT-DTA model, whose imaging features can be measured conveniently and non-invasively on chest TS-CT in routine practice. Moreover, the accuracy of preoperative LNM prediction in SMPLC may be improved with the assistance of the CT-DTA model. This CT-DTA model is therefore proposed to aid thoracic surgeons in more accurate risk evaluation and to facilitate decision-making on more individualized treatment plans following surgery to limit potential adverse events.

Limitations

Despite the above findings, several limitations of this study should be acknowledged. First, it was designed as a retrospective cohort study based on three prospectively maintained datasets with external validation and internal subgroup analyses. Owing to its retrospective nature, potential selection bias, such as variations in clinical pathways and treatment options across centers, might have weakened the strength of evidence for the CT-DTA model as a reliable risk prediction tool. Second, because SMPLC is a rare subtype of NSCLC, the sample size was relatively small, although patients were enrolled from three high-volume tertiary centers, and this might have reduced the statistical power. A prospective study covering many more tertiary centers is therefore needed, and our research team plans such a project on the basis of the present study. Third, our CT-DTA model cannot predict the nodal station or the number of lymph nodes involved by metastatic cancer cells. Finally, qualitative evaluation of radiological features partly depends on the expertise of radiologists, and such a subjective factor might have introduced confounding.


Conclusions

In conclusion, this study has proposed a novel user-friendly and non-invasive DTA model based on multiple radiological features easily obtained on chest TS-CT for accurate risk prediction of LNM before surgical resection for SMPLC. The CT-DTA model can serve as a practically useful tool to improve the predictive performance of traditional risk assessment and aid thoracic surgeons in decision-making of necessary lymphadenectomy and adjunctive therapies for potential high-risk patients with SMPLC who intend to undergo radical surgery. Larger-scale multi-center prospective studies are warranted to further validate the CT-DTA model for clinical utility.


Acknowledgments

We give special thanks to Dr. Chuanmiao Xie, from Department of Radiology, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-Sen University Cancer Center, for his great assistance in this study. We also give special thanks to Mrs. Hong Xie and Mrs. Peng Wang, from Department of Medical English, West China School of Medicine, Sichuan University, for their English language editing of this manuscript.


Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-2440/rc

Funding: This work was supported by grants from the National Key Research and Development Program of China (No. 2023YFF1204303) and the Youth Fund of Guangzhou Municipal Science and Technology Project (No. 2024A04J4243).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2440/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was approved by the Institutional Review Board of Sun Yat-sen University Cancer Center (No. B2022-293-Y01). All participating institutions were informed of and agreed to the study. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The participants were required to give informed consent before taking part.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Jia X, Wang Y, Zhang H, Sun D. Current status and quality of prognosis prediction models of non-small cell lung cancer constructed using computed tomography (CT)-based radiomics: a systematic review and radiomics quality score 2.0 assessment. Quant Imaging Med Surg 2024;14:6978-89.
  2. Liu Z, Wang L, Gao S, Xue Q, Tan F, Li Z, Gao Y. Plasma metabolomics study in screening and differential diagnosis of multiple primary lung cancer. Int J Surg 2023;109:297-312.
  3. Jiang C, Zhang Y, Fu F, Deng P, Chen H. A Shift in Paradigm: Selective Lymph Node Dissection for Minimizing Oversurgery in Early Stage Lung Cancer. J Thorac Oncol 2024;19:25-35.
  4. Zhang R, Wang G, Lin Y, Wen Y, Huang Z, Zhang X, Yu X, Wang W, Xi K, Cerfolio RJ, D'Journo XB, Ruetzler K, Depypere L, Filosso PL, Zhang L. written on behalf of AME Thoracic Surgery Collaborative Group. Extent of resection and lymph node evaluation in early stage metachronous second primary lung cancer: a population-based study. Transl Lung Cancer Res 2020;9:33-44.
  5. Xie Z, Yang Y, Niu Z, Mao G, Zhu X, Xu Z, Yang D, Wang H, Wang J. Preoperative computed tomography semantic features in predicting lymph node metastasis of part-solid nodules in non-small cell lung cancer: a multicenter retrospective study. Quant Imaging Med Surg 2024;14:5151-63.
  6. Xie X, Yan H, Liu K, Guan W, Luo K, Ma Y, Xu Y, Zhu Y, Wang M, Shen W. Value of dual-layer spectral detector CT in predicting lymph node metastasis of non-small cell lung cancer. Quant Imaging Med Surg 2024;14:749-64.
  7. Luo Y, Li S, Ma H, Zhang W, Liu B, Xie C, Li Q. CT-based decision tree model for predicting EGFR mutation status in synchronous multiple primary lung cancers. J Thorac Dis 2023;15:1196-209.
  8. Wan F, He W, Zhang W, Zhang H, Zhang Y, Guang Y. Application of decision tree algorithms to predict central lymph node metastasis in well-differentiated papillary thyroid carcinoma based on multimodal ultrasound parameters: a retrospective study. Quant Imaging Med Surg 2023;13:2081-97.
  9. Kozower BD, Larner JM, Detterbeck FC, Jones DR. Special treatment issues in non-small cell lung cancer: Diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 2013;143:e369S-99S.
  10. Detterbeck FC, Boffa DJ, Kim AW, Tanoue LT. The Eighth Edition Lung Cancer Stage Classification. Chest 2017;151:193-203.
  11. Fernandez FG, Falcoz PE, Kozower BD, Salati M, Wright CD, Brunelli A. The Society of Thoracic Surgeons and the European Society of Thoracic Surgeons general thoracic surgery databases: joint standardization of variable definitions and terminology. Ann Thorac Surg 2015;99:368-76.
  12. Zhang Y, Li G, Li Y, Liu Q, Yu Y, Ma Y, Pan Y, Zhang Y, Hu H, Sun Y, Zhang Y, Xiang J, Chen H. Imaging Features Suggestive of Multiple Primary Lung Adenocarcinomas. Ann Surg Oncol 2020;27:2061-70.
  13. Ma H, Li S, Zhu Y, Zhang W, Luo Y, Liu B, Gou W, Xie C, Li Q. A Novel Prognostic Score Based on Multiple Quantitative Parameters of Chest CT for Patients with Synchronous Multiple Primary Lung Cancer: Is Solid Component Size a Better Prognostic Indicator? Ann Surg Oncol 2023;30:3769-78.
  14. Grant SW, Collins GS, Nashef SAM. Statistical Primer: developing and validating a risk prediction model. Eur J Cardiothorac Surg 2018;54:203-8.
  15. Wang Y, Zhang L, Jiang Y, Cheng X, He W, Yu H, Li X, Yang J, Yao G, Lu Z, Zhang Y, Yan S, Zhao F. Multiparametric magnetic resonance imaging (MRI)-based radiomics model explained by the Shapley Additive exPlanations (SHAP) method for predicting complete response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer: a multicenter retrospective study. Quant Imaging Med Surg 2024;14:4617-34.
  16. Kim H, Park CM. Tumor-associated prognostic factors extractable from chest CT scans in patients with lung cancer. Transl Lung Cancer Res 2023;12:1133-9.
  17. Yang H, Sun B, Ma W, Fan L, Xu K, Jia Y, Xu J, Wang Z, Yao F. Multi-scale characterization of tumor-draining lymph nodes in resectable lung cancer treated with neoadjuvant immune checkpoint inhibitors. EBioMedicine 2022;84:104265.
  18. Jing W, Liu M, Li W, Li D, Wu Y, Lv F. Prognostic implication of consolidation-to-tumor ratio in early lung adenocarcinoma: a retrospective cross-sectional study. Quant Imaging Med Surg 2024;14:3366-80.
  19. Nie Y, Wang X, Yang F, Zhou Z, Wang J, Chen K. Surgical Prognosis of Synchronous Multiple Primary Lung Cancer: Systematic Review and Meta-Analysis. Clin Lung Cancer 2021;22:341-350.e3.
  20. Liu BC, Ma HY, Huang J, Luo YW, Zhang WB, Deng WW, Liao YT, Xie CM, Li Q. Does dual-layer spectral detector CT provide added value in predicting spread through air spaces in lung adenocarcinoma? A preliminary study. Eur Radiol 2024;34:4176-86.
  21. Choi S, Yoon DW, Shin S, Kim HK, Choi YS, Kim J, Shim YM, Cho JH. Importance of Lymph Node Evaluation in ≤2-cm Pure-Solid Non-Small Cell Lung Cancer. Ann Thorac Surg 2024;117:586-93.
Cite this article as: Zhang W, Ma H, Zhu Y, Gou W, Liu B, Li Q, Li S. A novel computed tomography-based multi-parameter decision tree algorithm model for preoperatively predicting the risk of lymph node metastasis in surgically resectable synchronous multiple primary lung cancer. Quant Imaging Med Surg 2025;15(6):4972-4994. doi: 10.21037/qims-24-2440
