Inter-observer consistency on subsolid nodule follow-up recommendation based on National Comprehensive Cancer Network (NCCN) guidelines in low-dose computed tomography (LDCT) lung cancer screening
Original Article

Quanyang Wu1#, Lina Zhou1#, Wei Tang1, Yao Huang1, Jianwei Wang1, Linlin Qi1, Zewei Zhang2, Hongjia Li2, Shuluan Chen1, Jiaxing Zhang1, Shijun Zhao1, Ning Wu1,2

1Department of Diagnostic Radiology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China; 2Department of Nuclear Medicine (PET-CT Center), National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Contributions: (I) Conception and design: Q Wu, L Zhou, S Zhao, N Wu; (II) Administrative support: None; (III) Provision of study materials or patients: Q Wu, Z Zhang; (IV) Collection and assembly of data: Q Wu, L Zhou, S Zhao, N Wu; (V) Data analysis and interpretation: Q Wu, L Zhou, S Zhao, N Wu; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Ning Wu, MD. Department of Diagnostic Radiology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 17, Pan Jia Yuan Nan Li, Beijing 100021, China; Department of Nuclear Medicine (PET-CT Center), National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 17, Pan Jia Yuan Nan Li, Chaoyang District, Beijing, China. Email: cjr.wuning@vip.163.com; Shijun Zhao, MD. Department of Diagnostic Radiology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 17, Pan Jia Yuan Nan Li, Chaoyang District, Beijing 100021, China. Email: shijunzhao@aliyun.com.

Background: Follow-up management of pulmonary nodules is a crucial component of lung cancer screening. Consistency in follow-up recommendations is essential for effective lung cancer screening. This study aimed to assess inter-observer agreement on National Comprehensive Cancer Network (NCCN) guideline-based follow-up recommendation for subsolid nodules from low-dose computed tomography (LDCT) screening.

Methods: LDCT reports for lung cancer screening from 2014 to 2017 were retrospectively collected from the Radiology Information System using keyword searches focused on subsolid nodules. A total of 110 LDCT cases containing subsolid nodules were identified. Two senior radiologists provided the standardized follow-up recommendations, which were categorized into four groups (0-, 3-, 6-, and 12-month). To ensure overall balance and representativeness of the follow-up categories, 60 scans from 60 participants were included (distribution ratio 1:1:2:2). Five observers then assigned each case to a follow-up recommendation group following NCCN guidelines. Fleiss’ kappa statistic was used to evaluate inter-observer agreement.

Results: The overall accuracy rate for follow-up recommendations among the five observers was 72.3%. The overall agreement rate of chest radiologists was significantly higher than that of radiology residents (P<0.01). The overall agreement among the five observers was moderate, with a Fleiss’ kappa of 0.437. For all paired readers, the mean Cohen’s kappa value was 0.603 (95% confidence interval (CI): 0.489–0.716). Chest radiologists demonstrated substantial agreement, with a Cohen’s kappa of 0.655 (95% CI: 0.503–0.807), whereas the mean Cohen’s kappa among radiology residents was 0.533 (95% CI: 0.501–0.565). The majority of discrepant cases (73.5%) involved the same risk-dominant nodule. A higher proportion of part-solid nodules was a risk factor for discrepancies. Of the 600 paired readings, major and substantial discrepancies were observed in 27.5% (165/600) and 4.8% (29/600), respectively.

Conclusions: For subsolid nodules, observers’ follow-up recommendation categories based on NCCN guidelines achieved moderate consistency. Disagreements were mainly caused by differences in the measurement and type classification of identical risk-dominant nodules. Part-solid nodules contributed to discrepancies in follow-up recommendations. Major and substantial management discrepancies occurred in 27.5% and 4.8% of the paired evaluations, respectively.

Keywords: Lung cancer; observer variation; low-dose computed tomography (LDCT); subsolid nodule


Submitted Dec 25, 2023. Accepted for publication Aug 08, 2024. Published online Aug 28, 2024.

doi: 10.21037/qims-23-1824


Introduction

The baseline results of the International Early Lung Cancer Action Program (I-ELCAP) show that computed tomography (CT) screening can identify a substantial proportion of early curable lung cancers, leading to the rapid initiation of large-scale international screening studies (1). The National Lung Screening Trial (NLST) and Dutch-Belgian Randomized Lung Cancer Screening Trial (NELSON) indicate that low-dose computed tomography (LDCT) lung cancer screening can reduce mortality rates in individuals at high risk of lung cancer (2,3), leading to recommendations from multiple organisations, including the US Preventive Services Task Force, to screen individuals at high risk of lung cancer using LDCT (4).

LDCT screening can detect many nodules, and standardized reporting and management recommendations for CT lung screening facilitate outcome monitoring. Interpreting LDCT lung cancer screening images is a time-consuming and labor-intensive task for radiologists, and achieving consistency in follow-up recommendations is a common challenge faced by radiologists worldwide. The National Comprehensive Cancer Network (NCCN) has issued guidelines for the classification and management of lung cancer screening findings. These guidelines aim to provide healthcare professionals with standardized procedures for lung cancer screening, diagnosis, and management, based on the latest clinical evidence and expert consensus. The NCCN lung cancer screening guidelines encompass eligibility criteria, screening methods, interpretation of results, follow-up management, and an assessment of risks and benefits. The NCCN guidelines primarily use nodule type and size as criteria to differentiate between high-risk and low-risk nodules (5).

Multiple studies have conducted in-depth analyses of the natural growth history of subsolid pulmonary nodules (2,3,6). Of the 607 cases of early-stage lung cancer identified by CT screening in the I-ELCAP cohort, the proportion of subsolid nodules was approximately 1/3 (33.4%) (7,8). The NELSON baseline study found that subsolid nodules were approximately twice as likely to be lung cancer as solid nodules (4.5% vs. 2.3%) (9). These findings highlight the importance of carefully monitoring and assessing subsolid nodules. Many persistent non-solid nodules represent preinvasive lesions of invasive adenocarcinoma (10). Approximately 2.1% to 16.7% of non-solid nodules develop solid components during subsequent follow-up (11,12). Meanwhile, overdiagnosis may result in patient anxiety and unwarranted treatments. Thus, offering rational and uniform follow-up recommendations is of paramount importance for nodule management.

It is widely recognised that visual evaluation of nodule types and manual measurement of nodule diameters are susceptible to considerable observer variability (13,14). This inherent variability introduces challenges in accurately characterising and measuring nodules, potentially causing discrepancies in interpretation and management decisions. Previous studies have indicated varying degrees of observer concordance when differentiating nodule density types, and considerable differences between readers in assigning nodule categories for screening using the Lung CT Screening Reporting and Data System (Lung-RADS), with 8% of patients receiving different management recommendations from different radiologists (15). Additionally, research datasets from different populations may lead to variations in results. Considering the high proportion of subsolid nodules in lung cancer cases in Asian populations (16,17), it is important to evaluate the inter-observer consistency of follow-up recommendations in subsolid nodule cases.

This study assessed the inter-observer consistency of the NCCN guideline-based follow-up recommendation among observers for subsolid nodules from baseline LDCT screening.


Methods

NCCN nodule classification and study subgroups

Nodules within the dataset were classified according to the NCCN guidelines (Version 2023) (5). Observers were required to independently select the risk-dominant nodule and document its density type, size, and location. The classification of nodule type (solid, part-solid, or non-solid) and the calculation of nodule size (average diameter) were based on the principles outlined in the NCCN guidelines. The follow-up recommendation for each case was determined by the risk-dominant nodule and assigned to one of four categories: repeat scan at 0 months, 3 months, 6 months, or 12 months. For the convenience of the study, diagnostic/standard chest CT, positron emission tomography-computed tomography (PET-CT) examination, and biopsy recommended in the NCCN guidelines were defined as 0-month follow-up.

Data collection and study groups

During the period of 2014 to 2017, LDCT images and basic information were collected from participants in lung cancer screening at the Department of Cancer Prevention of the National Cancer Center. All LDCT diagnostic reports were reviewed by senior radiologists. Screening results were categorized as positive or negative. We reviewed radiology reports from the positive nodule population containing specific terminology associated with subsolid nodules, including phrases such as ground-glass nodules, non-solid nodules, subsolid nodules, and part-solid nodules. Firstly, we included subsolid nodules mentioned in the reports (positive screening) for which further evaluation within 6 months, diagnostic/standard chest CT, biopsy, or PET-CT was recommended. Reports describing typical ground-glass opacities linked to inflammation were excluded, leaving a total of 68 cases containing subsolid nodules. Secondly, 200 LDCT reports were randomly selected from 4,969 participants with negative screening, all of which recommended annual repeat screening. Of these 200 scans, 42 showed subsolid nodules on LDCT. Thirdly, to develop a reference standard for follow-up management, two senior radiologists (20 and 15 years of chest experience) independently evaluated the 110 (68+42) cases initially included in the study and matched each case with a follow-up management strategy according to NCCN guidelines; disagreements were resolved by another chief radiologist (30 years of chest experience). The numbers of cases in the 0-, 3-, 6-, and 12-month follow-up categories were 12, 16, 40, and 42, respectively.
Finally, to ensure balance and representativeness in all follow-up categories, we constructed a dataset that included all NCCN follow-up categories and distributed them in a ratio of 1:1:2:2 (randomly selected) among the categories (0, 3, 6, and 12 months). Therefore, the final dataset used for inter-observer consistency study included 60 (10+10+20+20) scans corresponding to 60 participants (Figure 1). To avoid potential selection bias, we compared an inclusion group (n=60) with an exclusion group (n=50), and found no significant differences between the two groups at either the participant level or the nodule level (Table S1).
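The balanced draw described above can be sketched as follows. This is only an illustration: the case identifiers and the random seed are hypothetical, and the source does not state which tool or seed was used for the random selection.

```python
import random

# Reference-standard pool sizes per follow-up category in months (from the text).
POOL_SIZES = {0: 12, 3: 16, 6: 40, 12: 42}
# Target sample sizes giving the 1:1:2:2 ratio (10 + 10 + 20 + 20 = 60).
TARGET_SIZES = {0: 10, 3: 10, 6: 20, 12: 20}


def draw_balanced_dataset(pools, targets, seed=0):
    """Randomly select the target number of cases from each category pool."""
    rng = random.Random(seed)  # the seed is an assumption, for reproducibility only
    return {cat: sorted(rng.sample(pools[cat], targets[cat])) for cat in targets}


# Hypothetical case identifiers; the real identifiers are not given in the source.
pools = {cat: [f"m{cat}_{i:03d}" for i in range(n)] for cat, n in POOL_SIZES.items()}
selected = draw_balanced_dataset(pools, TARGET_SIZES)

n_included = sum(len(cases) for cases in selected.values())
n_excluded = sum(POOL_SIZES.values()) - n_included
print(n_included, n_excluded)
```

The included and excluded counts correspond to the two groups compared in Table S1.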

Figure 1 Flowchart for construction of subsolid nodule dataset. *, subsolid nodules referred to nodules that radiologists recommended follow-up within 6 months or required biopsy or/and diagnostic chest CT/PET-CT; **, subsolid nodules referred to nodules that radiologists recommended annual screening; ***, the final dataset included all NCCN follow-up categories and distributed them in a ratio of 1:1:2:2 among the categories (0, 3, 6 and 12 months). LDCT, low-dose computed tomography; PET-CT, positron emission tomography-computed tomography; NCCN, National Comprehensive Cancer Network.

The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by ethics committee of Cancer Hospital, Chinese Academy of Medical Sciences (No. NCC2020C-243) and individual consent for this retrospective analysis was waived.

LDCT scan parameters

CT scans were performed using 64- and 128-detector row scanners from multiple manufacturers, including General Electric Medical Systems (Discovery CT750 HD or Optima CT660) and Siemens Medical Systems (SOMATOM go.Top or Definition Edge), utilizing a low-dose scanning protocol. The scanning parameters included a tube voltage of 120 kVp, automatic tube current ranging from a minimum of 30 mA to a maximum of 300 mA, and a gantry rotation time of 0.5 seconds, with a slice thickness of 5 mm. The reconstructed image slice thickness was set to 1.0 or 1.25 mm. The noise index was set at 40 for GE systems or 20–30 mAs for Siemens systems. The image data were then transferred to the Picture Archiving and Communication System (PACS, Carestream, Philips, Amsterdam, The Netherlands) for storage and retrieval.

Observer and reading method

Five observers participated in this study: two chest radiologists with 5–20 years of experience in chest diagnosis who regularly interpreted LDCT results for lung cancer screening, and three radiology residents. In addition, two senior radiologists with 15–20 years of diagnostic experience independently evaluated the same cases and provided the standard follow-up recommendations. All observers were familiar with the NCCN guidelines and had access to a hardcopy of the guidelines during the reading period. Computer-aided detection (CAD) tools were prohibited. LDCT evaluation was performed on images with a slice thickness of either 1.0 or 1.25 mm. For each participant, the observers independently selected the risk-dominant nodule, recorded its type, size, and location, and calculated its size according to the guidelines as the mean of the long-axis diameter and the perpendicular diameter (lung window). Each observer then individually formulated a follow-up recommendation for each participant in accordance with the guidelines.

Statistical analyses

Using the reference standard for comparison, a table of follow-up recommendation was made for each observer, and the consistency rate was calculated. To assess the impact of inconsistent classification on nodule management, the inconsistent cases were categorized into three groups: minor inconsistency, major inconsistency, and significant inconsistency. Minor inconsistency was defined as a difference of ≤3 months; major inconsistency was defined as a difference of >3 to <9 months; significant inconsistency was defined as a difference of ≥9 months.
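The grading rule above can be written as a small function. This is a minimal sketch of the definitions as stated in the text; the function name and the label strings are illustrative choices, not taken from the source.

```python
def discrepancy_grade(months_a, months_b):
    """Grade the gap between two follow-up recommendations given in months."""
    gap = abs(months_a - months_b)
    if gap == 0:
        return "concordant"
    if gap <= 3:
        return "minor"        # difference of <=3 months
    if gap < 9:
        return "major"        # difference of >3 to <9 months
    return "significant"      # difference of >=9 months


CATEGORIES = (0, 3, 6, 12)
grades = {(a, b): discrepancy_grade(a, b)
          for a in CATEGORIES for b in CATEGORIES if a < b}
for pair, grade in grades.items():
    print(pair, grade)
```

With the four study categories, the only possible gaps are 3, 6, 9, and 12 months, so every pair of differing recommendations falls into exactly one grade.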

Fisher’s exact test was used to determine the significance of association between two categorical variables in a contingency table. Fleiss’ kappa statistic was used to measure inter-observer agreement for NCCN follow-up recommendation among multiple observers. The Cohen’s kappa statistic was employed to determine the inter-observer agreement between pairs of observers. The kappa values were interpreted based on the criteria established by Landis and Koch (18).
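For readers unfamiliar with the multi-rater statistic, the following is a minimal sketch of the standard Fleiss' kappa formula together with the Landis and Koch verbal bands. The analysis in this study was run in SPSS; the function names and the toy ratings below are hypothetical and serve only to illustrate the calculation.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of cases, each holding one rating per observer."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    per_case_agreement = []
    for case in ratings:
        counts = Counter(case)
        totals.update(counts)
        # Proportion of observer pairs that agree on this case.
        agreeing = sum(c * (c - 1) for c in counts.values())
        per_case_agreement.append(agreeing / (n_raters * (n_raters - 1)))
    p_observed = sum(per_case_agreement) / n_cases
    # Chance agreement from the overall share of each category.
    p_expected = sum((totals[c] / (n_cases * n_raters)) ** 2 for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch(kappa):
    """Verbal label for a kappa value using the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical follow-up categories (months) from five observers for four cases.
toy_ratings = [
    [12, 12, 12, 12, 12],
    [6, 6, 6, 3, 12],
    [3, 3, 6, 6, 0],
    [0, 0, 0, 0, 3],
]
toy_kappa = fleiss_kappa(toy_ratings, categories=(0, 3, 6, 12))
print(round(toy_kappa, 3), landis_koch(toy_kappa))
```

Fleiss' kappa treats the categories as unordered, so a 3-month versus 6-month disagreement counts the same as a 0-month versus 12-month disagreement; this is one reason it can be lower than a weighted pairwise kappa on the same readings.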

The subsolid nodule dataset was partitioned into two subsets according to the degree of agreement among observers. Subset A encompassed cases accurately classified by a minimum of four observers, whereas subset B included cases accurately classified by three or fewer observers. Fisher’s exact test was used to compare the proportion of nodule types between subset A and B to identify nodules with significant risk.

Statistical significance was defined as P<0.05. Analysis of the data was conducted with the assistance of the Statistical Package for the Social Sciences (SPSS, version 27, IBM Corp., Armonk, NY, USA).


Results

Demographics results

The basic demographic and main characteristics related to lung cancer for the 60 participants are presented in Table 1. The mean age of the participants was 56 years, and 50.0% (30/60) were male. A history of smoking was present in 38.3% (23/60) of the participants. The risk-dominant nodules were mainly located in the upper lobe (53.3%, 32/60). Further details, including education level, family history of tumours, respiratory system diseases, and occupational exposure history, are provided in Table 1.

Table 1

Demographic information and nodule characteristics of the 60 participants

Subject characteristics Value
Age* (years) 56.0±7.40
   40–49 10 (16.7)
   50–59 35 (58.3)
   60–69 13 (21.7)
   ≥70 2 (3.3)
Sex
   Male 30 (50.0)
   Female 30 (50.0)
Smoking status
   Current smoker 18 (30.0)
   Former smoker 5 (8.3)
   Never smoker 31 (51.7)
   Unknown 6 (10.0)
Passive smoking
   No 9 (15.0)
   Yes 45 (75.0)
   Unknown 6 (10.0)
Education
   High school or below 16 (26.7)
   Associate degree 11 (18.3)
   Bachelor 20 (33.3)
   Master or above 5 (8.3)
   Unknown 8 (13.3)
COPD
   No 51 (85.0)
   Unknown 9 (15.0)
Asthma
   No 51 (85.0)
   Unknown 9 (15.0)
Family history of respiratory disease
   No 10 (16.7)
   Yes 44 (73.3)
   Unknown 6 (10.0)
Family history of lung cancer
   No 41 (68.3)
   Yes 13 (21.7)
   Unknown 6 (10.0)
Family history of other malignancies
   No 32 (53.3)
   Yes 22 (36.7)
   Unknown 6 (10.0)
Asbestos exposure or occupational exposure
   No 52 (86.7)
   Yes 2 (3.3)
   Unknown 6 (10.0)
Diabetes
   No 50 (83.3)
   Yes 2 (3.3)
   Unknown 8 (13.3)
Nodule type
   0-month follow-up 10
    Part-solid nodule 10 (100.0)
    Non-solid nodule 0
   3-month follow-up 10
    Part-solid nodule 10 (100.0)
    Non-solid nodule 0
   6-month follow-up 20
    Part-solid nodule 18 (90.0)
    Non-solid nodule 2 (10.0)
   12-month follow-up 20
    Part-solid nodule 2 (10.0)
    Non-solid nodule 18 (90.0)
Nodule size*
   0-month follow-up 18.6±3.35
   3-month follow-up 12.2±3.32
   6-month follow-up 10.7±4.77
   12-month follow-up 9.1±3.46
Nodule location
   Upper lobe 32 (53.3)
   Middle lobe 6 (10.0)
   Lower lobe 22 (36.7)

*, age and nodule size are presented as mean ± standard deviation, while the remaining data are presented as number (%). COPD, chronic obstructive pulmonary disease.

Accuracy of NCCN follow-up management

Five observers provided detailed follow-up recommendations for 60 LDCT cases (Figure 2, Table 2). Compared with the standard follow-up recommendation, the overall agreement rate across the five observers was 72.3%; the agreement rate was 88.0% for negative screenings (12-month follow-up) and 64.5% for positive screenings. Among the positive screenings, the agreement rate was highest for the 0-month follow-up group (82.0%) and lowest for the 3-month follow-up group (48.0%). The proportion of consistent readings differed significantly between negative and positive screening results (P<0.01). Among positive screenings, minor and major inconsistencies accounted for 14.5% and 21.0% of readings, respectively. Minor inconsistencies were most frequent in the 3-month follow-up group (38.0%), whereas major inconsistencies were most frequent in the 6-month follow-up group (30.0%).

Figure 2 Accuracy of NCCN guideline management categorisations. This image presented the follow-up recommendations from five observers. The first two columns on the left showed the follow-up categories and the number of cases: 12-month follow-up (20 cases), 6-month follow-up (20 cases), 3-month follow-up (10 cases), and 0-month follow-up (10 cases). The five columns on the right represented the five observers, with the numbers in the table indicating the suggested follow-up intervals. Light blue indicated concordance, blue indicated minor discordance, and dark blue indicated major discordance. NCCN, National Comprehensive Cancer Network.

Table 2

Concordance rates for the classification of management recommendations for observers according to the NCCN guidelines

Group Concordance Minor discordance Major discordance P value
Negative screenings 88 (88.0%) 0 12 (12.0%)
   12-month follow-up 88 (88.0%) 0 12 (12.0%)
Positive screenings 129 (64.5%) 29 (14.5%) 42 (21.0%) <0.01*
   6-month follow-up 64 (64.0%) 6 (6.0%) 30 (30.0%)
   3-month follow-up 24 (48.0%) 19 (38.0%) 7 (14.0%)
   0-month follow-up 41 (82.0%) 4 (8.0%) 5 (10.0%)

*, comparison of concordance between positive screening group and negative screening group. NCCN, National Comprehensive Cancer Network.
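The percentages in Table 2 follow directly from the reading counts, where each reference category contributes its number of cases times five observers. The short sketch below recomputes them from the tabulated counts; the variable and function names are illustrative.

```python
# Reading counts from Table 2, keyed by reference follow-up category (months).
table2 = {
    12: {"concordant": 88, "minor": 0, "major": 12},
    6: {"concordant": 64, "minor": 6, "major": 30},
    3: {"concordant": 24, "minor": 19, "major": 7},
    0: {"concordant": 41, "minor": 4, "major": 5},
}


def rate(rows, key):
    """Percentage of readings in `rows` that fall under `key`."""
    total = sum(sum(row.values()) for row in rows)
    return 100 * sum(row[key] for row in rows) / total


all_rows = list(table2.values())
positive_rows = [table2[m] for m in (6, 3, 0)]

overall_concordance = rate(all_rows, "concordant")
positive_concordance = rate(positive_rows, "concordant")
positive_minor = rate(positive_rows, "minor")
positive_major = rate(positive_rows, "major")
print(f"{overall_concordance:.1f} {positive_concordance:.1f} "
      f"{positive_minor:.1f} {positive_major:.1f}")
```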

Influence of experience on degree of discordance

We performed subgroup analyses according to diagnostic experience. The overall agreement rate of the chest radiologists was significantly higher than that of the radiology residents (P<0.01). In both the positive and negative screening groups, the agreement rate of the chest radiologists was significantly higher than that of the radiology residents (P<0.05 and P<0.01, respectively). For both chest radiologists and radiology residents, the agreement rate was higher in the negative screening group than in the positive screening group (both P<0.01). Among discordant readings, the proportion of major disagreements was lower for chest radiologists than for radiology residents, but the difference between the two groups was not significant (Table 3).

Table 3

Consistency of observers stratified by diagnostic experience

Group Concordance Minor discordance Major discordance P value
Chest radiologists 99 (82.5%) 7 (5.8%) 14 (11.7%) <0.01*
   Negative screenings 40 (100%) 0 0
   Positive screenings 59 (73.8%) 7 (8.8%) 14 (17.5%)
Radiology residents 118 (65.6%) 22 (12.2%) 40 (22.2%) <0.01**
   Negative screenings 48 (80%) 0 12 (20.0%)
   Positive screenings 70 (58.3%) 22 (18.3%) 28 (23.3%)

*, differences in concordance between negative and positive screening among chest radiologists; **, differences in concordance between negative and positive screening among radiology residents.

Inter-observer agreement

The observers demonstrated moderate agreement for LDCT follow-up recommendations per the NCCN guidelines, with a Fleiss’ kappa of 0.437 (95% confidence interval (CI): 0.388–0.487) across all five observers. Among the three radiology residents, the Fleiss’ kappa value was 0.350 (95% CI: 0.260–0.440). Pairwise weighted Cohen’s kappa values ranged from 0.419 (95% CI: 0.264–0.574) to 0.777 (95% CI: 0.665–0.889), with a mean of 0.603 (95% CI: 0.489–0.716). The Cohen’s kappa between the two chest radiologists was 0.655 (95% CI: 0.503–0.807), indicating substantial agreement. The radiology residents had a mean weighted kappa of 0.533 (95% CI: 0.501–0.565). The results are summarised in Table 4.

Table 4

Inter-observer agreement in kappa values

Reader Fleiss’ kappa (95% CI) Cohen’s kappa (95% CI)
All readers 0.437 (0.388–0.487) 0.603 (0.489–0.716)
Chest radiologists NA 0.655 (0.503–0.807)
Radiology residents 0.350 (0.260–0.440) 0.533 (0.501–0.565)

Fleiss’ kappa: Fleiss’s kappa statistic was used to measure inter-observer agreement for NCCN follow-up recommendation among multiple observers. Cohen’s kappa: Cohen’s kappa statistic was used to determine the inter-observer agreement between pairs of observers. CI, confidence interval; NA, not applicable; NCCN, National Comprehensive Cancer Network.
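The pairwise statistic can be sketched in the same way. The text reports weighted Cohen's kappa but does not state the weighting scheme, so the example below assumes linear weights on the rank of the ordered categories (0, 3, 6, 12 months); quadratic weights or weights based on the month values would give different numbers. The function name and toy ratings are hypothetical.

```python
def weighted_cohen_kappa(rater_a, rater_b, categories):
    """Linear-weighted Cohen's kappa for two raters over ordered categories."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Joint distribution of the two raters' category assignments.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Disagreement weight grows linearly with the distance between ranks (assumed).
    w = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    observed_dis = sum(w[i][j] * observed[i][j] for i in range(k) for j in range(k))
    expected_dis = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - observed_dis / expected_dis


# Hypothetical recommendations (months) from two observers for four cases.
observer_1 = [0, 3, 6, 12]
observer_2 = [3, 3, 6, 6]
toy_weighted_kappa = weighted_cohen_kappa(observer_1, observer_2, (0, 3, 6, 12))
print(round(toy_weighted_kappa, 3))
```

Because partial credit is given to near-misses, a weighted pairwise kappa is generally not directly comparable with the unweighted Fleiss' kappa reported for the same readers.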

Causes of follow-up management disagreement

We analyzed the results of 600 evaluations conducted by five observers in paired assessments (10 pairs × 60 cases =600) and found that 39.0% (234/600) of the follow-up recommendation had differences, of which major discordance accounted for 27.5% (165/600) and substantial management discordance accounted for 4.8% (29/600). Instances with inconsistencies were separated into two categories: discordance linked to the same risk-dominant nodule, and discordance linked to distinct risk-dominant nodules (Figures 3,4).
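The count of 600 paired readings and the reported shares can be reproduced with a few lines. In this sketch the example case readings are hypothetical; the three counts are those reported in the text.

```python
from itertools import combinations

N_OBSERVERS = 5
N_CASES = 60


def discordant_pairs(case_readings):
    """Observer pairs whose follow-up recommendations differ for one case."""
    return [(i, j) for i, j in combinations(range(len(case_readings)), 2)
            if case_readings[i] != case_readings[j]]


n_pairs = len(list(combinations(range(N_OBSERVERS), 2)))
n_paired_readings = n_pairs * N_CASES

# Hypothetical recommendations (months) for a single case by observers 0..4.
example_case = [6, 6, 6, 3, 12]
example_discordant = discordant_pairs(example_case)

# Share of all paired readings for the counts reported in the text.
reported = {"any difference": 234, "major": 165, "substantial": 29}
shares = {label: round(100 * count / n_paired_readings, 1)
          for label, count in reported.items()}
print(n_pairs, n_paired_readings, len(example_discordant), shares)
```

A single dissenting observer in a case already produces four discordant pairs, which is why the pairwise discordance rate can look high relative to the per-observer accuracy.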

Figure 3 A risk-dominant nodule categorised differently by five observers because of different feature judgements. The nodule is displayed in normal view and magnified view. The three rows show the axial (top), coronal (middle) and sagittal (bottom) planes. The arrows indicate the risk-dominant nodules categorised by different observers. On LDCT, the nodule was classified as a part-solid nodule by four observers, three of whom categorised it for 6-month follow-up (part-solid nodule with solid component <6 mm) and one of whom categorised it for 3-month follow-up (part-solid nodule with solid component >6 to <8 mm). The remaining observer classified it as a non-solid nodule and recommended annual screening. LDCT, low-dose computed tomography.

Figure 4 Management discrepancies caused by differences in risk-dominant nodule selection in two cases. (A-C) In the same case, different observers selected different nodules as the risk-dominant nodule. (D) In another case, four observers selected this nodule as the risk-dominant nodule. The arrows indicate the risk-dominant nodules categorised by different observers. (A) One observer regarded the first nodule as the risk-dominant nodule and recommended annual screening. (B) Three observers regarded the second nodule as the risk-dominant nodule (part-solid nodule with solid component <6 mm) and categorised it for 6-month follow-up. (C) One observer regarded the third nodule as the risk-dominant nodule (part-solid nodule with solid component >6 to <8 mm) and categorised it for 3-month follow-up. (D) In another case, four observers categorised a part-solid nodule adjacent to the mediastinal pleura in the lower lobe of the left lung for 3–6 months follow-up, whereas the remaining observer, who apparently did not detect this nodule, selected another small nodule as the risk-dominant nodule and recommended annual screening.

Most cases with discrepancies involved the same risk-dominant nodules, representing 73.5% (172/234) of the instances. Differences in nodule management due to variations in nodule measurements occurred in 38.0% (89/234) of paired cases, while discrepancies arising from differences in nodule type accounted for 35.5% (83/234). Overall, major management discrepancies were found in 47.0% (110/234) of paired cases, and substantial discrepancies were present in 6.0% (14/234) of paired cases.

Conversely, 26.5% (62/234) of the discrepancies arose from selecting alternative risk-dominant nodules, with a substantial discrepancy rate of 6.4% (15/234).

Impact of observer variability on screening performance

For the five observers, the average percentage of cases read as positive screening was 58.0% (range, 46.7% to 75.0%). Correspondingly, the average percentage read as negative screening was 42.0% (range, 25.0% to 53.3%). On average, observer pairs disagreed on whether a case screened positive or negative in 10 of 60 cases (16.7%), with a range of 6 to 15 cases.

Risk-dominant nodule-based analysis

Subset A (cases exhibiting high inter-observer agreement) and subset B (cases exhibiting low inter-observer agreement) were compared in terms of risk-dominant nodule type. Subset B contained 23 risk-dominant nodules, of which 20 were part-solid and 3 were non-solid. Subset A contained 37 risk-dominant nodules, of which 20 were part-solid and 17 were non-solid. The low-agreement group exhibited a significantly higher proportion of part-solid nodules than the high-agreement group (P<0.01).
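The comparison above is a 2×2 contingency test, which can be sketched with a plain-Python Fisher's exact test built on the hypergeometric distribution. This is an illustration only: the study used SPSS, the function name is hypothetical, and the source does not state whether a one- or two-sided exact value was reported, so the printed value need not coincide with the reported threshold.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Rows: subset B (low agreement), subset A (high agreement).
# Columns: part-solid, non-solid risk-dominant nodules (counts from the text).
p_value = fisher_exact_two_sided(20, 3, 20, 17)
print(f"two-sided p = {p_value:.4f}")
```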


Discussion

The effectiveness of lung cancer screening programs, both in terms of diagnosis and cost-effectiveness, depends on the precise and consistent differentiation between high-risk and low-risk nodules. Effective screening for lung cancer requires accurate and consistent nodule management. The current NCCN (Version 2023) guidelines for follow-up recommendation rely on the visual classification of nodule types and manual measurement of nodule size. Previous studies have reported considerable variability between observers in both reading tasks (19,20), but there is limited research on the differences in follow-up recommendation between observers and a lack of further exploration of the consistency of follow-up recommendation in subsolid nodules. Our study quantified the variability in follow-up recommendation among observers based on the NCCN guidelines for LDCT in subsolid nodules, analysed potential influencing factors, and assessed the impact of divergent management recommendations on test performance.

We found moderate agreement among observers of the NCCN follow-up recommendation classification for LDCT lung cancer screening in a dataset with subsolid nodules, with a Fleiss’ kappa value of 0.437 (95% CI: 0.388–0.487) among five observers and a mean weighted Cohen’s kappa value of 0.603 (95% CI: 0.489–0.716) between pairs of observers. This highlights the value of the classification system in interpreting and managing LDCT screening results and supports the widespread and universal application of the management system. Chest radiologists demonstrated greater classification accuracy than radiology residents. Compared with chest radiologists (Cohen’s kappa =0.655), the inter-observer agreement among radiology residents was lower, with a Cohen’s kappa value of 0.533. Our findings emphasize the variability in diagnostic experience among radiologists regarding patient follow-up management, indicating that interpreting screening CT may require a certain level of experience and specialised training to ensure the accuracy and consistency of the interpretation results.

We compared our results with previous research. van Riel et al. (13) studied inter-observer consistency of NLST data, reporting an average pairwise weighted Cohen’s kappa value of 0.67. The consistency in the study was higher than ours. The possible reason is that van Riel’s study sample included all types of nodules, while our study specifically focused on subsolid nodules. Our findings also confirmed that the proportion of part-solid nodules increased significantly in the low follow-up recommendation agreement group compared to the high follow-up recommendation agreement group. The presence and size of solid components in some subsolid nodules are considered decisive morphological factors for follow-up strategy, which is also the current recommendation of various guidelines. However, the accuracy and reproducibility of manual pulmonary nodule measurements are limited. When the measurement value approaches the decision threshold, the clinical repercussions of this inaccuracy are most pronounced. This will significantly impact reproducibility, thereby affecting the consistency of management strategies.

Variability in follow-up recommendations can arise when observers designate either an identical or a distinct risk-dominant nodule. In this study, the former accounted for the majority of discrepancies (73.5%), with major discrepancies in nodule management in 47.0% of discrepant paired cases and substantial discrepancies in 6.0%. Although disagreements caused by different risk-dominant nodules were less frequent than those involving identical risk-dominant nodules (26.5% vs. 73.5%), the rate of substantial management disagreements was slightly higher (6.4% vs. 6.0%). These discrepancies were caused by differences in nodule measurement, classification, and detection. In the evaluation process, we did not require observers to annotate all detected nodules but allowed them to decide which nodules to measure, so observers selectively annotated what they deemed to be risk-dominant nodules. Hence, disparities in nodule detection contributed to the variability observed among readers. CAD technology based on artificial intelligence has the potential to enhance the consistency of nodule assessment (21). Another possible cause of this inconsistency was that the threshold interval between the nodule classifications was too small. A part-solid nodule with a 5.4 mm solid component was recommended for follow-up in 6 months (nodule ≥6 mm with solid component <6 mm), while another part-solid nodule with a 7.5 mm solid component was categorized for 3-month follow-up. Variations arising from manual measurements and the selection of measurement planes by different observers inevitably result in minor discrepancies, potentially leading to divergent nodule management decisions.
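The narrow threshold interval can be illustrated with a short sketch. It encodes only the two part-solid bands quoted in the text (solid component <6 mm for 6-month follow-up, and up to <8 mm for 3-month follow-up); treating a value of exactly 6.0 mm as belonging to the 3-month band is an assumption, other bands are deliberately left out, and the repeated measurements are hypothetical.

```python
def part_solid_follow_up(solid_component_mm):
    """Follow-up interval (months) for a part-solid nodule >=6 mm.

    Only the two bands quoted in the text are encoded; other sizes return None.
    """
    if solid_component_mm < 6:
        return 6
    if solid_component_mm < 8:
        return 3  # boundary handling at exactly 6.0 mm is an assumption
    return None  # outside the bands quoted in the text


# Hypothetical measurements of the same solid component by five observers.
measurements_mm = [5.4, 5.8, 6.1, 6.3, 5.9]
recommendations = [part_solid_follow_up(m) for m in measurements_mm]
print(recommendations)
```

A spread of less than 1 mm across observers is enough to split the recommendations between the 6-month and 3-month categories when the true value lies near the 6 mm cut-off.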

This study has some limitations. Firstly, we evaluated the consistency of observer follow-up recommendations in a setting restricted to subsolid nodules. The dataset was balanced and included nodules managed with four different strategies, so this setting may differ from actual clinical practice. Secondly, the sample was relatively small; a larger sample would be needed to enhance the generalizability of the results. Finally, the number and margin morphology of nodules on LDCT scans might also affect inter-observer consistency, but these factors were difficult to control during clinical evaluation.


Conclusions

In subsolid nodules, our findings indicated that category evaluation of follow-up recommendations based on the NCCN guidelines achieved moderate consistency. Disagreements were mainly caused by differences in the measurement and type classification of identical risk-dominant nodules. Major discordance and substantial management disagreements occurred in 27.5% and 4.8% of the paired evaluations, respectively, and part-solid nodules were a contributor to discrepancies in follow-up recommendation.


Acknowledgments

Funding: This work was supported by the National Key R&D Program of China (No. 2020AAA0109500) and CAMS Innovation Fund for Medical Sciences (No. 2021-I2M-C&T-B-063).


Footnote

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-23-1824/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the ethics committee of Cancer Hospital, Chinese Academy of Medical Sciences (No. NCC2020C-243), and the requirement for individual consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Henschke CI, McCauley DI, Yankelevitz DF, Naidich DP, McGuinness G, Miettinen OS, Libby DM, Pasmantier MW, Koizumi J, Altorki NK, Smith JP. Early Lung Cancer Action Project: overall design and findings from baseline screening. Lancet 1999;354:99-105. [Crossref] [PubMed]
  2. Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, Fagerstrom RM, Gareen IF, Gatsonis C, Marcus PM, Sicks JD. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395-409. [Crossref] [PubMed]
  3. de Koning HJ, van der Aalst CM, de Jong PA, Scholten ET, Nackaerts K, Heuvelmans MA, et al. Reduced Lung-Cancer Mortality with Volume CT Screening in a Randomized Trial. N Engl J Med 2020;382:503-13. [Crossref] [PubMed]
  4. Moyer VA; U.S. Preventive Services Task Force. Screening for lung cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med 2014;160:330-8. [Crossref] [PubMed]
  5. Wood DE, Kazerooni EA, Aberle D, Berman A, Brown LM, Eapen GA, et al. NCCN Guidelines® Insights: Lung Cancer Screening, Version 1.2022. J Natl Compr Canc Netw 2022;20:754-64. [Crossref] [PubMed]
  6. Zhang H, Wang D, Li W, Tian Z, Ma L, Guo J, Wang Y, Sun X, Ma X, Ma L, Zhu L. Artificial intelligence system-based histogram analysis of computed tomography features to predict tumor invasiveness of ground-glass nodules. Quant Imaging Med Surg 2023;13:5783-95. [Crossref] [PubMed]
  7. Henschke CI, Yip R, Smith JP, Wolf AS, Flores RM, Liang M, Salvatore MM, Liu Y, Xu DM, Yankelevitz DF; International Early Lung Cancer Action Program Investigators. CT Screening for Lung Cancer: Part-Solid Nodules in Baseline and Annual Repeat Rounds. AJR Am J Roentgenol 2016;207:1176-84. [Crossref] [PubMed]
  8. Yankelevitz DF, Yip R, Smith JP, Liang M, Liu Y, Xu DM, Salvatore MM, Wolf AS, Flores RM, Henschke CI; International Early Lung Cancer Action Program Investigators Group. CT Screening for Lung Cancer: Nonsolid Nodules in Baseline and Annual Repeat Rounds. Radiology 2015;277:555-64. [Crossref] [PubMed]
  9. Henschke CI, Salvatore M, Cham M, Powell CA, DiFabrizio L, Flores R, Kaufman A, Eber C, Yip R, Yankelevitz DF; International Early Lung Cancer Action Program Investigators. Baseline and annual repeat rounds of screening: implications for optimal regimens of screening. Eur Radiol 2018;28:1085-94. [Crossref] [PubMed]
  10. Gardiner N, Jogai S, Wallis A. The revised lung adenocarcinoma classification-an imaging guide. J Thorac Dis 2014;6:S537-46. [Crossref] [PubMed]
  11. Kakinuma R, Noguchi M, Ashizawa K, Kuriyama K, Maeshima AM, Koizumi N, Kondo T, Matsuguma H, Nitta N, Ohmatsu H, Okami J, Suehisa H, Yamaji T, Kodama K, Mori K, Yamada K, Matsuno Y, Murayama S, Murata K. Natural History of Pulmonary Subsolid Nodules: A Prospective Multicenter Study. J Thorac Oncol 2016;11:1012-28. [Crossref] [PubMed]
  12. Bak SH, Lee HY, Kim JH, Um SW, Kwon OJ, Han J, Kim HK, Kim J, Lee KS. Quantitative CT Scanning Analysis of Pure Ground-Glass Opacity Nodules Predicts Further CT Scanning Change. Chest 2016;149:180-91. [Crossref] [PubMed]
  13. van Riel SJ, Sánchez CI, Bankier AA, Naidich DP, Verschakelen J, Scholten ET, de Jong PA, Jacobs C, van Rikxoort E, Peters-Bax L, Snoeren M, Prokop M, van Ginneken B, Schaefer-Prokop C. Observer Variability for Classification of Pulmonary Nodules on Low-Dose CT Images and Its Effect on Nodule Management. Radiology 2015;277:863-71. [Crossref] [PubMed]
  14. Ridge CA, Yildirim A, Boiselle PM, Franquet T, Schaefer-Prokop CM, Tack D, Gevenois PA, Bankier AA. Differentiating between Subsolid and Solid Pulmonary Nodules at CT: Inter- and Intraobserver Agreement between Experienced Thoracic Radiologists. Radiology 2016;278:888-96. [Crossref] [PubMed]
  15. van Riel SJ, Jacobs C, Scholten ET, Wittenberg R, Winkler Wille MM, de Hoop B, Sprengers R, Mets OM, Geurts B, Prokop M, Schaefer-Prokop C, van Ginneken B. Observer variability for Lung-RADS categorisation of lung cancer screening CTs: impact on patient management. Eur Radiol 2019;29:924-31. [Crossref] [PubMed]
  16. Fan L, Wang Y, Zhou Y, Li Q, Yang W, Wang S, Shan F, Zhang X, Shi J, Chen W, Liu SY. Lung Cancer Screening with Low-Dose CT: Baseline Screening Results in Shanghai. Acad Radiol 2019;26:1283-91. [Crossref] [PubMed]
  17. Yi CA, Lee KS, Shin MH, Cho YY, Choi YH, Kwon OJ, Shin KE. Low-dose CT screening in an Asian population with diverse risk for lung cancer: A retrospective cohort study. Eur Radiol 2015;25:2335-45. [Crossref] [PubMed]
  18. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.
  19. Singh S, Pinsky P, Fineberg NS, Gierada DS, Garg K, Sun Y, Nath PH. Evaluation of reader variability in the interpretation of follow-up CT scans at lung cancer screening. Radiology 2011;259:263-70. [Crossref] [PubMed]
  20. Gierada DS, Pilgram TK, Ford M, Fagerstrom RM, Church TR, Nath H, Garg K, Strollo DC. Lung cancer: interobserver agreement on interpretation of pulmonary findings at low-dose CT screening. Radiology 2008;246:265-72. [Crossref] [PubMed]
  21. Chen J, Cao R, Jiao S, Dong Y, Wang Z, Zhu H, Luo Q, Zhang L, Wang H, Yin X. Application value of a computer-aided diagnosis and management system for the detection of lung nodules. Quant Imaging Med Surg 2023;13:6929-41. [Crossref] [PubMed]
Cite this article as: Wu Q, Zhou L, Tang W, Huang Y, Wang J, Qi L, Zhang Z, Li H, Chen S, Zhang J, Zhao S, Wu N. Inter-observer consistency on subsolid nodule follow-up recommendation based on National Comprehensive Cancer Network (NCCN) guidelines in low-dose computed tomography (LDCT) lung cancer screening. Quant Imaging Med Surg 2024;14(9):6543-6555. doi: 10.21037/qims-23-1824
