Artificial intelligence model outperformed experienced clinicians in differentiating the aetiology of pneumonia on chest computed tomography: a retrospective study
Original Article


Wenting Jin1#, Ying Shao2#, Jue Pan1, Meixia Wang3, Tongjie Gu4, Wei Shen5, Xi Ouyang2, Zhi Qiao6, Dongdong Gu2, Zhen Qian6, Yaozong Gao2*, Bijie Hu1,7*

1Department of Infectious Diseases, Zhongshan Hospital, Fudan University, Shanghai, China; 2R&D Department, Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China; 3Zhongshan Hospital (Xiamen), Fudan University, Xiamen, China; 4Department of Respiratory Medicine, Ningbo No. 2 Hospital, Ningbo, China; 5Department of Respiratory Medicine, Cixi No. 3 Hospital, Ningbo, China; 6Institute of Intelligent Diagnostics, Beijing United-Imaging Research Institute of Intelligent Imaging, Beijing, China; 7Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China

Contributions: (I) Conception and design: B Hu, Y Gao, W Jin, J Pan; (II) Administrative support: B Hu, J Pan, Y Gao; (III) Provision of study materials or patients: W Jin; (IV) Collection and assembly of data: W Jin, Y Shao; (V) Data analysis and interpretation: M Wang, Y Shao, W Jin; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

*These authors contributed equally to this work.

Correspondence to: Yaozong Gao, PhD. R&D Department, Shanghai United Imaging Intelligence Co., Ltd., No. 701, Yunjin Road, Xuhui District, Shanghai 200232, China. Email: yaozong.gao@uii-ai.com; Bijie Hu, PhD. Department of Infectious Diseases, Zhongshan Hospital, Fudan University, 180 Fenglin Road, Xuhui District, Shanghai 200032, China; Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China. Email: hu.bijie@zs-hospital.sh.cn.

Background: Rapid and precise aetiological diagnosis is crucial for managing pneumonia. We aimed to develop and validate deep learning (DL) models for differentiating ten pneumonia aetiologies on chest computed tomography images.

Methods: We enrolled 1,091 pneumonia patients with 1 of 10 definite aetiological diagnoses between October 1st, 2015 and June 30th, 2022 in this retrospective study. We trained and validated two DL models: a classic 3D-DenseNet model (DenseNet) and a novel large vision model (LVM). The models were tested on an external dataset of 183 nonoverlapping patients. Model performance was assessed using the area under the curve (AUC) of the Top1 diagnosis and the accuracy of the Top1, Top2, and Top3 diagnoses. Comparisons were also performed between the DL models and eight experienced radiologists and pulmonologists.

Results: The LVM combined with the non-imaging model (LVM+) had greater average prediction performance than DenseNet combined with the non-imaging model (DenseNet+), the radiologists’ results with non-imaging data (radiologists+) and the pulmonologists’ results with non-imaging data (pulmonologists+), with Top1 AUCs of 0.872, 0.851, 0.643 and 0.644, respectively. The Top1, Top2, and Top3 accuracies of LVM+ were 0.527, 0.701 and 0.820, respectively, similarly outperforming DenseNet+, radiologists+ and pulmonologists+. The two models performed similarly in the external test set, with Top1 AUCs of 0.743 for DenseNet and 0.775 for LVM. The classification-related confusion matrices of LVM/DenseNet with or without the non-imaging model showed a significant advantage in identifying non-tuberculous mycobacterium pulmonary disease (PNTM), pulmonary tuberculosis (PTB) and Pneumocystis jirovecii pneumonia (PJP).

Conclusions: This study presents a comprehensive classification closely aligned with pneumonia diagnosis in realistic clinical settings. We expect this method to be applied clinically to foster novel approaches to improve the accuracy in diagnosing pneumonia.

Keywords: Pneumonia; multi-pathogen classification; artificial intelligence (AI); deep learning model (DL model); chest computed tomography (chest CT)


Submitted Oct 03, 2024. Accepted for publication Nov 04, 2025. Published online Dec 31, 2025.

doi: 10.21037/qims-24-2129


Introduction

Pneumonia is the most prevalent infection and a common infection-related cause of death; the number of deaths attributed to pneumonia has increased since 1990, reaching more than 2.5 million deaths globally across all age groups in 2019 (1). Rapid, precise aetiological diagnosis and timely targeted antimicrobial therapy are crucial for reducing unnecessary antibiotic use, improving patient prognosis and reducing mortality (2). However, treatment failure is often linked to the challenge of identifying, or correctly speculating about, the responsible pathogens (3).

Chest computed tomography (CT), an important technique for diagnosing pneumonia, can show pneumonia lesions with great detail (4). Different pathogens produce infection lesions with different characteristics, including lesion shape, location, number, size and speed of change. Some signs are pathognomonic for a specific disease (4-6). However, identifying the aetiology through chest CT is highly challenging for inexperienced clinicians.

Artificial intelligence (AI) facilitates management by enabling the rapid analysis of extensive images and the identification of intricate patterns beyond human discernment (7,8). However, most existing studies on the application of AI/deep learning (DL) models in the field of pneumonia have focused on identifying coronavirus disease 2019 (COVID-19) and predicting its severity (9-15). Other studies have attempted to discriminate up to three or four types of pneumonia, mainly including bacterial pneumonia (BP), fungal pneumonia (FP) and viral pneumonia (VP) (16-20), but studies evaluating the efficacy of AI/DL models in enhancing the prediction of the aetiology of pneumonia are rare. Therefore, we aimed to establish two DL models to discriminate pneumonias of ten aetiologies based on chest CT images with/without non-imaging data. We then sought to evaluate and compare their performance with that of experienced radiologists and pulmonologists in terms of the first-, second-, and third-most likely diagnoses. We present this article in accordance with the CLEAR reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2129/rc).


Methods

Study design

Our retrospective observational study involved patients treated at Zhongshan Hospital, Fudan University between October 1st, 2015 and June 30th, 2022. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. It was registered on ChiCTR (registration number, ChiCTR2000035669) and approved by the Ethics Committee of Zhongshan Hospital (No. B2020-384R). Informed consent was obtained from all patients. Figure 1A shows an overview of the participant selection procedure and study design. We adopted two methods: a conventional convolutional neural network (CNN), DenseNet (21), and a pretrained large vision model (LVM).

Figure 1 Study design and workflow of patient selection (A) and the data distribution of 10 different classifications (B). AP, atypical pneumonia; BP, bacterial pneumonia; CT, computed tomography; GBT, Gradient Boosting Tree; LC, lung cancer; PA, pulmonary aspergillosis; PC, pulmonary cryptococcosis; PJP, Pneumocystis jirovecii pneumonia; PM, pneumonia mimics; PN, pulmonary nocardiosis; PNTM, non-tuberculous mycobacterium pulmonary disease; PTB, pulmonary tuberculosis.

Classification of pneumonia

We selected 8 pneumonias with different proven aetiologies, namely, BP, atypical pneumonia (AP), pulmonary nocardiosis (PN), pulmonary tuberculosis (PTB), non-tuberculous mycobacterium pulmonary disease (PNTM), pulmonary cryptococcosis (PC), pulmonary aspergillosis (PA), and Pneumocystis jirovecii pneumonia (PJP), each diagnosed according to the gold standard for the respective pathogen. We also included lung cancer (LC) and other pneumonia mimics (PM) (diagnostic criteria listed in Appendix 1).

Chest CT scans

Chest CT images of the patients were collected according to a standard chest imaging protocol using three different scanners (Aquilion CX, Canon, Tokyo, Japan; uCT960 Research, United Imaging, Shanghai, China; United Imaging 960, United Imaging, Shanghai, China). The main acquisition parameters were as follows: tube voltage =90–140 kV; automatic tube current modulation; pitch =1; and matrix =512×512. All acquisitions were reconstructed with high-resolution thorax kernels. Multiclass classification was conducted using reconstructed lung window sequences with a thickness of 5 mm.

Non-imaging data

Non-imaging data were collected from only 461 of the 1,091 patients as part of routine clinical practice. The Investigator Initiated Trial-Electronic Data Capture System was used to collect the data. All categorical information was quantified, and all continuous data were normalized to the range of [0, 1] according to the minimum and maximum values in the training data. The detailed preprocessing methods used are listed in Table S1. The non-imaging information was then paired with the corresponding imaging data.

Cross-validation

To ensure consistency in data partitioning for both the imaging and non-imaging models during training, we performed a strict stratified fivefold cross-validation split on the 461 patients. The remaining 630 patients with only imaging data available were added to the imaging training set of each fold. An illustration of the respiratory pathogen distribution in the training, validation and testing partitions is shown in Figure 1B.
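The partitioning scheme above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the helper name `make_folds` and its arguments are hypothetical, and scikit-learn's `StratifiedKFold` is assumed as the splitting utility.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def make_folds(paired_labels, imaging_only_ids, n_splits=5, seed=0):
    """Stratified 5-fold split on the 461 paired patients; the 630
    imaging-only patients are appended to every fold's imaging training
    set, while the non-imaging model sees paired data only."""
    paired_ids = np.arange(len(paired_labels))
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    folds = []
    for train_idx, val_idx in skf.split(paired_ids, paired_labels):
        folds.append({
            "imaging_train": np.concatenate([paired_ids[train_idx],
                                             imaging_only_ids]),
            "nonimaging_train": paired_ids[train_idx],
            "val": paired_ids[val_idx],
        })
    return folds
```

Stratifying on the aetiology label keeps the class proportions of each fold close to those of the full paired cohort, which matters here because several classes have few paired cases.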

Development of the DenseNet classification model for imaging data

To focus on the informative chest CT regions, the left and right lungs were automatically segmented by VB-Net (22), and the lung field was extracted on the basis of their bounding box. The image block was then resized to 192×192×192 voxels to ensure consistency in lung region sizes across images.

The 3D-DenseNet network was used for image classification with a dual-channel input. Each channel contained the pre-processed lung field crop, with one normalized to the lung window [L/W: −400 HU/1,500 HU] and the other to the standard window [L/W: 40 HU/350 HU]. The labelled pathogen categories were used as the training labels. The model was developed in the PyTorch framework with focal loss (23) as the optimization function. The Adaptive Moment Estimation (Adam) optimizer with an initial learning rate of 0.001 was employed. Training started from scratch and ran for 1,000 epochs, with accuracy evaluated every 20 epochs and the epoch with the highest accuracy selected as the final model. Data augmentation strategies such as flipping, rotation and scaling were applied during training to improve network robustness (see Figure 2A).
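The dual-window normalization can be illustrated as below. The helper names `normalize_window` and `dual_channel` are hypothetical, but the window settings are those stated above (lung L/W: −400/1,500 HU; standard L/W: 40/350 HU).

```python
import numpy as np

def normalize_window(hu, level, width):
    """Clip a HU volume to [level - width/2, level + width/2]
    and rescale it linearly to [0, 1]."""
    lo, hi = level - width / 2.0, level + width / 2.0
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)

def dual_channel(hu):
    """Stack the lung window (L/W: -400/1,500 HU) and the standard
    mediastinal window (L/W: 40/350 HU) as a two-channel input."""
    return np.stack([normalize_window(hu, -400.0, 1500.0),
                     normalize_window(hu, 40.0, 350.0)], axis=0)
```

The lung window preserves parenchymal detail (ground-glass opacity, consolidation), while the standard window preserves soft-tissue contrast; stacking both lets the network see each voxel at two contrast settings.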

Figure 2 DenseNet network model and LVM structure. LVM, large vision model; ReLU, rectified linear unit.

Development and application of the LVM for imaging data

Inspired by the success of large transformer models in language tasks (24), the Swin Transformer (25) has emerged as a new advancement in image analysis. It uses a hierarchical structure with shifted windows. In medical imaging, a variant called Swin UNETR (26) replaces the encoder part of U-Net with the Swin Transformer, which boosts performance by combining the benefits of both models.

First, we introduced a pretrained LVM (27) with the Swin Transformer as the encoder. This model was trained on a large 3D CT segmentation dataset comprising 36,419 volumetric scans with 64,674 expert-annotated masks spanning 83 distinct anatomical and pathological segmentation tasks. The 1,091 CT scans in this study were excluded from the LVM’s pretraining set. Annotations were meticulously curated from both in-house and public sources: 33,913 anonymized clinical CT volumes were collected, and an additional 2,506 volumes were aggregated from ten publicly available benchmarks. All data were rigorously annotated by radiologists or sourced from established challenges with standardized labelling protocols. Notably, the pre-training curriculum includes several lung-related tasks highly relevant to pneumonia analysis, specifically lung segmentation, lung nodule segmentation, and lung tumour segmentation, which provide the model with a robust contextual understanding of pulmonary anatomy and subtle lesion characteristics. These tasks enhance the model’s capacity to identify and localize abnormal regions within the lung parenchyma, thereby establishing a strong semantic and spatial foundation that is directly transferable to downstream pneumonia classification, where accurate localization and characterization of infected lung areas are critical (28-36). This pre-training strategy enables the model to leverage cross-task knowledge, facilitating superior generalization even with limited task-specific data.

Here, we directly use the Swin Transformer encoder from this pretrained LVM to construct a classification network. During training, we mainly keep the Swin Transformer encoder parameters fixed, fine-tuning only a small subset to maintain the model’s excellent generalization. We also incorporate multiscale feature maps from different Swin Transformer blocks to capture comprehensive information. Finally, we aggregate this information using global average pooling for the final classification. The training process for the LVM-based classification network follows the same settings and environment as the vanilla image DL model discussed earlier (refer to Figure 2B).
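The frozen-encoder design with multiscale pooling can be sketched in PyTorch as below. This is a minimal sketch, not the authors' implementation: the encoder is a stand-in module assumed to return a list of 5D feature maps (as a Swin Transformer backbone can be configured to do), and all class and argument names are illustrative.

```python
import torch
import torch.nn as nn

class FrozenEncoderClassifier(nn.Module):
    """Sketch: a pretrained encoder is kept frozen, its multiscale
    feature maps are global-average-pooled, concatenated, and fed
    to a linear head for 10-way classification."""
    def __init__(self, encoder, feat_dims, n_classes=10):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():   # freeze pretrained weights
            p.requires_grad = False
        self.head = nn.Linear(sum(feat_dims), n_classes)

    def forward(self, x):
        feats = self.encoder(x)               # list of (B, C, D, H, W) maps
        pooled = [f.mean(dim=(2, 3, 4)) for f in feats]  # global avg pooling
        return self.head(torch.cat(pooled, dim=1))
```

Keeping the encoder frozen (here entirely; the paper fine-tunes a small subset) preserves the pretrained representation while only the lightweight head adapts to the classification task.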

Development of the non-imaging classification model and combination with the two imaging models

The non-imaging classification model was built on the Gradient Boosting Tree (37) (GBT) algorithm, using the “GradientBoostingClassifier” from sklearn (version 1.3.2) with 100 trees and a max depth of 2. The model took quantized and normalized non-imaging data as input, trained with labelled pathogen categories.
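With the stated hyperparameters, the non-imaging classifier can be reproduced in scikit-learn roughly as follows; the feature matrix here is synthetic and only stands in for the quantized, [0, 1]-normalized clinical variables.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 patients, 12 hypothetical clinical features in [0, 1]
X = rng.random((200, 12))
# 10 aetiology classes; ensure every class appears at least once
y = np.concatenate([np.arange(10), rng.integers(0, 10, 190)])

# Hyperparameters as stated: 100 trees, max depth 2
clf = GradientBoostingClassifier(n_estimators=100, max_depth=2).fit(X, y)
proba = clf.predict_proba(X)   # per-class probabilities fed to the ensemble
```

Shallow trees (depth 2) keep each weak learner simple, which limits overfitting on the relatively small paired cohort.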

The ensemble probability is defined as the product of the imaging probability P_imaging and the non-imaging probability P_non-imaging. The final output probability is obtained by normalizing each ensemble probability, dividing it by the sum of all ensemble probabilities. The formula is as follows:

P_i = (P_imaging,i × P_non-imaging,i) / Σ_j (P_imaging,j × P_non-imaging,j)

where P_i is the normalized probability for the i-th class, P_imaging,i is the imaging probability for the i-th class, and P_non-imaging,i is the non-imaging probability for the i-th class. The summation in the denominator runs over all possible classes j, ensuring that the normalized probabilities P_i sum to 1.

Predictions from the imaging and non-imaging models were combined using an ensemble strategy to generate the final predictions for the 10 types. Ensemble methods combine multiple machine learning techniques to reduce variance and bias or improve predictions. The class probabilities from both models were multiplied, normalized as above, and used to make Top1, Top2, and Top3 predictions for further analysis. To scientifically define “uncertain” predictions, we employed a data-driven strategy on the internal validation set to automatically determine the optimal prediction entropy threshold. This method balances classification performance and practical application requirements (see Appendix 1 and Figure S1).
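A minimal sketch of the normalized-product ensemble and Top-k selection; the function names are illustrative, not the authors' code.

```python
import numpy as np

def ensemble_probs(p_imaging, p_non_imaging):
    """Normalized product of the imaging and non-imaging class
    probabilities; rows are samples, columns are the 10 classes."""
    prod = p_imaging * p_non_imaging
    return prod / prod.sum(axis=1, keepdims=True)

def top_k_predictions(p, k=3):
    """Class indices of the k most probable diagnoses, most likely first."""
    return np.argsort(-p, axis=1)[:, :k]
```

Because the product is renormalized, a class must score reasonably under both models to rank highly; one confident model cannot fully override a near-zero probability from the other.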

Performance comparisons with radiologists and pulmonologists

Four experienced radiologists and four experienced pulmonologists, each with more than 10 years of clinical experience, were recruited from 8 outside hospitals. The clinicians independently determined the first, second and third most likely diagnoses among the 10 classes using our specially developed UII labelling software, as shown in Figure S2. They first determined and labelled the diagnoses by reading only the CT images of the 461 patients, and then considered the CT images alongside the non-imaging data of the same patients after a 1–2-week washout period. The comparison indicators were the area under the curve (AUC) of the Top1 diagnosis and the accuracy of the Top1, Top2, and Top3 diagnoses. A Top1 diagnosis was considered correct if the first diagnosis was correct; a Top2 diagnosis, if either the first or the second diagnosis was correct; and a Top3 diagnosis, if any of the first, second or third diagnoses was correct.

Performance evaluation and statistical analyses

Receiver operating characteristic (ROC) curves were generated for each category based on the corresponding probabilities, and the AUC was subsequently calculated. In addition, 95% confidence intervals (95% CIs) were determined using the adjusted Wald method. The top-k accuracy was defined such that a prediction was considered correct if the k categories with the highest confidence predicted by the model included the true category. The formula for calculating the top-k accuracy was as follows:

Accuracy_Top-k = (1/n) × Σ_{i=1..n} 1(y_i ∈ ŷ_i,top-k)

where y_i is the true label of the i-th sample, ŷ_i,top-k is the set of the k categories predicted with the highest confidence, 1(·) is the indicator function, and n is the number of samples.
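The definition translates directly into code; `top_k_accuracy` below is an illustrative helper, not the authors' implementation.

```python
import numpy as np

def top_k_accuracy(probs, y_true, k):
    """Fraction of samples whose true label appears among the k classes
    with the highest predicted probability."""
    top_k = np.argsort(-probs, axis=1)[:, :k]           # k best classes per row
    hits = (top_k == np.asarray(y_true)[:, None]).any(axis=1)
    return hits.mean()
```

By construction, top-k accuracy is non-decreasing in k, which is why the Top3 accuracy reported later is always at least as high as the Top1 accuracy.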

Statistical analysis was performed with Python software version 3.7.0 (Python Software Foundation, Wilmington, DE, USA). A two-sided P value <0.05 was considered to indicate statistical significance. One-way analysis of variance (ANOVA) and the Pearson χ2 test were used to compare continuous and categorical variables, respectively, among the groups. After conducting a comparison among the ten aetiologies, post-hoc analysis with Bonferroni correction was performed to examine differences between each pair of groups.
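The adjusted Wald interval for the accuracy estimates can be sketched as the Agresti-Coull form below; whether the study used exactly this variant (adding z²/2 pseudo-successes and z²/2 pseudo-failures) is an assumption on my part.

```python
import math

def adjusted_wald_ci(successes, n, z=1.96):
    """Adjusted Wald (Agresti-Coull) 95% CI for a proportion:
    shift the estimate toward 0.5 by adding z^2/2 pseudo-successes
    and pseudo-failures, then apply the usual Wald half-width."""
    n_adj = n + z * z
    p_adj = (successes + z * z / 2.0) / n_adj
    half = z * math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)
```

Unlike the plain Wald interval, this form does not collapse to zero width when the observed proportion is 0 or 1, which matters for the rarer aetiology classes.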


Results

Demographic characteristics

The data of patients with a confirmed pneumonia aetiology (n=1,091) were used for DL model training; the respiratory pathogen distribution of these 1,091 patients is shown in Figure 1B. Of these, 461 patients also had available clinical data; their baseline characteristics according to aetiology are presented in Table 1. For the external cohort, 183 patients were enrolled, comprising 21 BP, 29 AP, 12 PN, 19 PTB, 22 PNTM, 18 PC, 15 PA, 8 PJP, 22 LC, and 17 PM patients.

Table 1

Baseline characteristics of all participants

Characteristics BP AP PN PTB PNTM PC PA PJP LC PM P value
461 cases with chest CT images and non-imaging data
   N 93 21 27 38 32 37 45 48 43 77
   Age (years) 62 [49, 70.5] 45 [32, 66] 62 [52, 72] 46 [34.75, 65.25] 68.5 [57.5, 73] 53 [41, 66] 57 [45.5, 66.5] 54 [41.5, 64.75] 67 [59, 76] 60 [44, 70] <0.0001
   Female 33 (35.5) 6 (28.6) 13 (48.1) 10 (26.3) 25 (78.1) 11 (29.7) 18 (40.0) 19 (39.6) 22 (51.2) 39 (50.6) 0.0003
   Blood
    WBC (×109/L) 8.6±4.1 7.3±2.9 10.4±4.9 6.7±2.8 5.8±1.8 6.6±2.1 9.2±3.5 7.5±5.4 7.7±3.9 8.0±4.1 <0.0001
    Neutrophil (×109/L) 6.4±3.8 5.6±3.6 8.5±4.8 4.6±2.4 3.9±1.6 4.4±1.9 6.9±3.5 6.0±4.2 5.6±3.7 6.5±7.0 0.0007
    C-reactive protein (mg/L) 72.7±87.1 160.2±148.5 101.2±165.0 24.7±28.2 23.2±45.2 7.0±14.0 37.2±63.1 71.4±79.6 28.3±61.7 33.5±40.5 <0.0001
    Erythrocyte sedimentation rate (mm/h) 50.6±32.7 54.1±25.4 63.3±29.9 39.4±28.0 36.8±33.0 22.4±20.8 49.0±39.6 40.2±28.7 32.0±23.4 50.4±37.2 <0.0001
    Procalcitonin (ng/mL) 0.24±0.62 1.28±2.45 0.85±2.20 0.10±0.24 0.05±0.08 0.03±0.04 0.07±0.09 0.36±0.45 0.14±0.32 0.11±0.27 <0.0001
    Carcinoembryonic antigen (ng/mL) 2.5±1.8 2.8±2.8 3.9±4.6 3.1±3.8 2.2±1.9 2.2±1.5 4.1±8.9 6.3±5.7 22.7±68.9 3.5±3.5 <0.0001
    Positive T-SPOT.TB 18 (12.0) 0 4 (10.5) 35 (94.6) 2 (3.6) 20 (38.5) 6 (11.1) 2 (2.9) 15 (25.4) 11 (8.1) <0.0001
   Symptoms
    Fever 64 (69.6) 20 (95.2) 20 (76.9) 19 (50.0) 19 (59.4) 22 (48.9) 6 (16.2) 39 (81.3) 4 (9.3) 38 (49.4) <0.0001
    Cough 70 (75.3) 18 (85.7) 30 (58.1) 34 (77.3) 15 (40.5) 22 (45.8) 52 (67.5) 24 (92.3) 26 (81.3) 22 (57.9) <0.0001
    Sputum 55 (75.3) 14 (85.7) 23 (92.3) 17 (57.9) 23 (81.3) 31 (77.3) 12 (40.5) 17 (45.8) 17 (58.1) 35 (67.5) <0.0001
    Tachypnea 15 (16.1) 5 (23.8) 7 (28.0) 7 (18.4) 2 (6.5) 8 (17.8) 1 (2.7) 17 (35.4) 10 (23.3) 28 (36.8) 0.0003
    Dyspnea 2 (2.2) 0 6 (22.2) 3 (7.9) 0 2 (4.4) 0 7 (14.6) 2 (4.8) 6 (7.8) 0.0010
     Chest pain 10 (10.9) 2 (9.5) 5 (19.2) 5 (13.2) 2 (6.3) 3 (7.0) 3 (8.6) 2 (4.2) 3 (7.0) 5 (6.5) 0.6058
    Bloody phlegm 10 (10.8) 0 3 (11.1) 2 (5.4) 6 (18.8) 13 (28.9) 1 (2.7) 0 4 (9.3) 1 (1.3) <0.0001
    Hemoptysis 5 (5.4) 0 5 (20.0) 2 (5.3) 8 (25.8) 13 (30.2) 1 (2.7) 0 2 (4.7) 0 <0.0001
630 cases with only chest CT images
   N 0 13 0 135 250 136 1 84 11 0
   Age (years) 56 [35, 65.5] 53 [31, 63] 64 [52.75, 70] 56 [44.25, 65] 58.5 [46, 69] 58 [56, 68]
   Female 5 (38.5) 87 (64.4) 109 (43.6) 83 (61.0) 1 (100.0) 54 (64.3) 7 (64.6)

Data are presented as median [interquartile range], n (%), or mean ± standard deviation unless otherwise specified. AP, atypical pneumonia; BP, bacterial pneumonia; CT, computed tomography; LC, lung cancer; PA, pulmonary aspergillosis; PC, pulmonary cryptococcosis; PJP, Pneumocystis jirovecii pneumonia; PM, pneumonia mimic; PN, pulmonary nocardiosis; PTB, pulmonary tuberculosis; PNTM, non-tuberculous mycobacterium pulmonary disease; WBC, white blood cell.

Performance of the DenseNet and LVM with and without the non-imaging classifier

The ROC curves for the Top1 diagnoses are shown in Figure 3. The Top1 AUCs were better for the models incorporating the non-imaging model (DenseNet+ and LVM+) than for those that did not (DenseNet and LVM): 0.785 vs. 0.739 and 0.872 vs. 0.851, respectively. A similar trend was observed in the accuracy, as shown in Figure 4 and Table S2. The accuracies of the Top1 and Top2 diagnoses for both LVM and LVM+ were significantly greater than those for DenseNet and DenseNet+. The accuracy of the Top3 diagnosis for LVM+ reached as high as 0.820, greater than that of DenseNet+ (0.798), although the difference was not significant. In terms of both the AUC and accuracy, LVM+ showed the highest average prediction performance. Figures 5,6 show the calibration curves of DenseNet+ and LVM+ across all ten types. Overall, LVM+ demonstrated notably superior performance compared with DenseNet+: its average bias-corrected expected calibration error (ECE) (0.044) was lower than that of DenseNet+ (0.049), and its average Brier score (0.068) was also lower than that of DenseNet+ (0.075).

Figure 3 The area under the ROC and the AUC for the two deep learning models, the performance of four different radiologists and four different pulmonologists. Note: + means with non-imaging data. AP, atypical pneumonia; AUC, area under the curve; BP, bacterial pneumonia; LVM, large vision model; PA, pulmonary aspergillosis; PC, pulmonary cryptococcosis; PJP, Pneumocystis jirovecii pneumonia; PM, pneumonia mimics; PN, pulmonary nocardiosis; PNTM, non-tuberculous mycobacterium pulmonary disease; PTB, pulmonary tuberculosis; Pu, pulmonologists; Ra, radiologists; ROC, receiver operating characteristic curve.
Figure 4 The accuracy of Top1, Top2 and Top3 diagnoses of two deep learning models, radiologists and pulmonologists with or without non-imaging data. Data in parentheses represent 95% confidence interval. Note: + means with non-imaging data. LVM, large vision model.
Figure 5 Calibration curves of the DenseNet+ model across all ten classes. For each class, the ideal calibration is shown as a dashed black line. The apparent and bias-corrected calibration curves are represented by red square and blue circle lines, respectively. The shaded blue area denotes the 95% confidence interval of the bias-corrected estimate. The expected calibration error and Brier score are reported for each class. Note: + means with non-imaging data. AP, atypical pneumonia; BP, bacterial pneumonia; CI, confidence interval; ECE, expected calibration error; LC, lung cancer; PA, pulmonary aspergillosis; PC, pulmonary cryptococcosis; PJP, Pneumocystis jirovecii pneumonia; PM, pneumonia mimics; PN, pulmonary nocardiosis; PNTM, non-tuberculous mycobacterium pulmonary disease; PTB, pulmonary tuberculosis.
Figure 6 Calibration curves of the LVM+ model across all ten classes. For each class, the ideal calibration is shown as a dashed black line. The apparent and bias-corrected calibration curves are represented by red square and blue circle lines, respectively. The shaded blue area denotes the 95% confidence interval of the bias-corrected estimate. The Expected Calibration Error and Brier score are reported for each class. Note: + means with non-imaging data. AP, atypical pneumonia; BP, bacterial pneumonia; CI, confidence interval; ECE, expected calibration error; LC, lung cancer; LVM, large vision model; PA, pulmonary aspergillosis; PC, pulmonary cryptococcosis; PJP, Pneumocystis jirovecii pneumonia; PM, pneumonia mimics; PN, pulmonary nocardiosis; PNTM, non-tuberculous mycobacterium pulmonary disease; PTB, pulmonary tuberculosis.

Comparison of the performance of the DenseNet and LVM with that of radiologists/pulmonologists

The Top1 AUCs for the LVM (0.785) and LVM+ (0.872) models were greater than the averages of the experienced radiologists (0.620/0.643, with/without non-imaging data) and pulmonologists (0.627/0.644, with/without non-imaging data). DenseNet/DenseNet+ also outperformed the radiologists and pulmonologists, but the advantages were not as obvious as those of LVM/LVM+ (shown in Figure 3). The LVM/LVM+ models also demonstrated strong advantages in terms of the accuracies of the Top1, Top2 and Top3 diagnoses; in particular, inclusion of the non-imaging classifiers further improved the advantages of DL models over clinicians, as shown in Figure 4.

Performance for the ten different pneumonia aetiologies

Figure 7 shows the differences among the ten different classifications. For the imaging-only models, the Top1 AUCs were high in identifying PJP, PNTM and AP: 0.875, 0.838, and 0.854 for DenseNet and 0.929, 0.903, and 0.819 for LVM, respectively. Still better diagnostic performance was achieved when the models were combined with the non-imaging classifier; notably, the Top1 AUCs in identifying PNTM and PJP were greater than those of the imaging-only models (0.984 and 0.916 for DenseNet+ and 0.995 and 0.946 for LVM+, respectively). Surprisingly, the AUC in identifying PTB (0.946 for DenseNet+ and 0.946 for LVM+) surpassed that for AP. The classification-related confusion matrices of LVM and LVM+ help visualize model performance for the ten different aetiologies, as shown in Figure 8. LVM+ correctly predicted more than 90% of PNTM cases in terms of the Top1 diagnosis, surpassing both pulmonologists+ and radiologists+. LVM alone showed good performance in identifying PJP, with no obvious improvement when LVM+ was used. The non-imaging classifier also improved the ability of the LVM to identify BP and PM. Similar trends were observed for DenseNet and DenseNet+. Both DL models achieved better diagnostic performance in identifying PM and LC than did the pulmonologists and radiologists.

Figure 7 The area under the ROC and the AUC of two models in identifying 10 different classifications. Note: + means with non-imaging data. AP, atypical pneumonia; AUC, area under the curve; BP, bacterial pneumonia; CI, confidence interval; LC, lung cancer; LVM, large vision model; PA, pulmonary aspergillosis; PC, pulmonary cryptococcosis; PJP, Pneumocystis jirovecii pneumonia; PM, pneumonia mimics; PN, pulmonary nocardiosis; PNTM, non-tuberculous mycobacterium pulmonary disease; PTB, pulmonary tuberculosis; ROC, receiver operating characteristic curve.
Figure 8 The classification-related confusion matrices of the Top1, Top2 and Top3 diagnoses of 10 different classifications for DenseNet (A-F), the large vision model (G-L), four pulmonologists (M-R) and four radiologists (S-X), with and without the non-imaging classifier. Note: + means with non-imaging data. AP, atypical pneumonia; BP, bacterial pneumonia; LC, lung cancer; LVM, large vision model; PA, pulmonary aspergillosis; PC, pulmonary cryptococcosis; PJP, Pneumocystis jirovecii pneumonia; PM, pneumonia mimics; PN, pulmonary nocardiosis; PNTM, non-tuberculous mycobacterium pulmonary disease; PTB, pulmonary tuberculosis.

Performance of the DenseNet model and LVM in the external test cohort

The performance in the external test cohort, summarized in Figure 9, was comparable to that in the internal cohort. The Top1 AUCs of DenseNet and LVM without the non-imaging model were 0.743 and 0.775, respectively, close to those of the internal cohort (Figure 9A,9B). Compared with DenseNet, LVM demonstrated significantly greater accuracies for the Top1 and Top2 diagnoses. The accuracies of the Top1, Top2 and Top3 diagnoses for both models were not inferior to the internal results. The classification-related confusion matrices are displayed in Figure 9C-9H.

Figure 9 The area under the ROC curve (A) and the accuracy of Top1, Top2 and Top3 diagnoses of two deep learning models without non-imaging data (B). Data in parentheses represent 95% confidence interval. The classification-related confusion matrix of the Top1, Top2 and Top3 diagnosis of 10 different classifications with large vision model with and without non-image classifier (C-H). Note: + means with non-imaging classifier. AP, atypical pneumonia; AUC, area under the curve; BP, bacterial pneumonia; CI, confidence interval; LC, lung cancer; LVM, large vision model; PA, pulmonary aspergillosis; PC, pulmonary cryptococcosis; PJP, Pneumocystis jirovecii pneumonia; PM, pneumonia mimics; PN, pulmonary nocardiosis; PNTM, non-tuberculous mycobacterium pulmonary disease; PTB, pulmonary tuberculosis; ROC, receiver operating characteristic.

Discussion

Pneumonia is a common cause of morbidity and mortality in adults and imposes a heavy burden on global public health (1). Despite the increased rate of aetiological confirmation resulting from considerable improvements in microbial molecular diagnostic technology in recent years, the clinical classification of various pneumonias is still challenging. Chest CT is one of the most important imaging modalities for differential diagnosis (38). Nonetheless, the imaging characteristics of different types of pneumonia may overlap, while the same type of pneumonia may manifest with different appearances. We sought to apply advancements in AI to bridge this gap. To our knowledge, this study classifies the greatest number of pneumonia aetiologies and best matches the diagnostic and treatment scenarios encountered in clinical practice.

We applied two DL models based on chest CT images with/without basic clinical data to identify the 3 most likely diagnoses. Our results indicated that both DL models outperformed pulmonologists and radiologists. Incorporation of clinical data only slightly increased the diagnostic accuracy of the clinicians, but that of the two DL models was significantly improved. The Top1 AUC of LVM+ reached 0.872, and the accuracy of the Top3 diagnosis reached 0.820. This means that there is more than an 80% possibility of predicting the pathogen accurately in the first 3 diagnoses output by the model, allowing greater confidence in providing a targeted diagnostic strategy rather than needing to cast a wide net. Moreover, our results supported the selection of the LVM+ due to its lower average Brier score and superior calibration across the majority of classes. This model is therefore capable of providing more accurate and reliable probabilistic predictions. Undoubtedly, the effectiveness of the models differs depending on the aetiology of the pneumonia.

The results of previous articles on the classification of pneumonia have shown that the sensitivity and specificity of AI/DL for VP are satisfactory, while those for BP or FP are relatively unsatisfactory (12,16,17,19). Our results in identifying BP are similar to those of a previous study. However, neither the average diagnostic precision of the DL model nor that of the clinicians for PN was satisfactory. The low percentage of positive cultures and the non-specificity of the CT images of PN result in low clinical awareness among clinicians. Unsurprisingly, the performance of the DL model was not superior to that of either the radiologists or the pulmonologists, likely because of the small PN training set. Although the DL models showed advantages in identifying AP via imaging data alone, these advantages were not obvious after the non-imaging classifier was incorporated, probably because most cases were of Chlamydia psittaci pneumonia (which demonstrates a strong inflammatory response and is difficult to distinguish from BP).

PTB is a common cause of community-acquired pneumonia globally, especially in China. A meta-analysis (39) revealed that only six studies developed AI/DL-based models for PTB using chest CT instead of chest X-ray, with pooled sensitivities ranging from 0.750 to 0.993. Shi et al. (19) added PTB to their 4-class pneumonia DL model and reported a sensitivity and specificity of 0.849 and 0.951, respectively. PNTM shares a series of imaging features with PTB, and the two diseases are easily confused because both show acid-fast bacilli in sputum smears; however, the treatment recommendations for the two are completely different. Although the incidence of PNTM is not as high as that of PTB (40,41), the rates of misdiagnosis and missed diagnosis should be noted. Wang et al. (42) showed the efficacy of 3D-ResNet as a rapid auxiliary diagnostic tool for PNTM and PTB, with AUCs ranging from 0.68 to 0.95. Surprisingly, our results showed that the identification performance of LVM/LVM+ was excellent for PNTM. This may be because PNTM had the greatest number of training images, consistent with the knowledge that a larger sample size generally yields better DL model performance. Moreover, PNTM demonstrates relatively slow progression, minimal systemic toxicity, absent or mild inflammatory marker elevation, and negative T-SPOT.TB assays; incorporating this comprehensive clinical profile into the non-imaging classifier significantly enhanced its discriminatory function. The advantage of DenseNet+/LVM+ in diagnosing PTB may be due to the use of a TB-specific test, the T-SPOT.TB, which was initially included to improve the differentiation of the radiologically similar PTB and PNTM.

Failure to consider FP early can lead to antibacterial agent abuse and patient deterioration. Different types of FP have different imaging characteristics. The poor performance of previous AI/DL studies may be because they did not subdivide the fungal species among the pneumonias, their inclusion criteria did not follow the gold standard for each species, and their FP cohorts were relatively small. We attempted to separate the most common FPs, including PC, PA and PJP. Our results showed a high AUC and high accuracy in identifying PJP, with its specific characteristics, and the DL model still outperformed the clinicians. For PJP, nucleic acid detection of respiratory tract specimens is preferable, while the G test and silver hexamine staining may be considered alternatives. The DL model did not show an advantage over clinicians in detecting PC despite having the third-largest number of training images, which may be due to the overlap in CT features among PC, LC, PA and BP. Cryptococcal capsular antigen in peripheral blood can detect PC in ~70% of patients, and testing for it is encouraged when the DL model suggests PC as a possibility. This indicator was not included in our DL models mainly because its high specificity could lead to overfitting. Our results did not demonstrate a performance superior to that of the clinicians in identifying PA. This lack of advantage may be attributed to the overlapping imaging features between chronic PA and PTB/PNTM, coupled with a limited training dataset.

Researchers have also developed models to distinguish LC from BP, FP and VP. In fact, it is more difficult to distinguish other non-infectious diseases mimicking pneumonia, as clinicians are typically unfamiliar with their manifestations. Our model did not show a significant advantage in identifying LC but showed a surprising advantage in identifying PM over both radiologists and pulmonologists. These results highlight the value of the model in the differential diagnosis of PM.

However, some limitations of this study need to be mentioned. First, this was a retrospective cohort study with a small external validation cohort. Case selection for each class was based mainly on the proportion of confirmed cases, resulting in an imbalanced distribution. A larger multicentre trial with an even distribution across types needs to be conducted to enhance generalizability. Second, VP was not included in this study, mainly because it can generally be differentiated from other types with more confidence and because many AI/DL models for discriminating between VP and COVID-19 pneumonia have already been developed with satisfactory performance. Including VP classification nevertheless remains essential for future optimization efforts, as it ensures clinical relevance by representing the most prevalent form of lung infection. To improve the model’s ability to handle pneumonia types beyond these ten, the following strategies can be adopted in our future work: (I) add an “unknown/other” category; (II) introduce an uncertainty-based rejection mechanism; (III) support continuous incremental learning. Third, our DL model included only images from one chest CT scan taken before diagnosis rather than consecutive CT scans, which may weaken the accuracy of pathogen determination. Moreover, non-single-pathogen infections were excluded, which may cause interference in practical scenarios involving co-infections; this is also a key focus of our future optimization efforts. Fourth, the performance of the DL models varies among types, and the analysis identified specific areas for improvement. The LVM+ exhibited overconfidence (excessively high predicted probabilities) for PA. Furthermore, both models performed suboptimally on BP and PM, suggesting that these categories present the greatest challenge and should be the focus of future work. Subsequent efforts will involve targeted data collection for these classes and algorithmic refinements to continuously enhance model performance.
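The uncertainty-based rejection mechanism mentioned among the future strategies can take a very simple form: abstain from a diagnosis whenever the model's maximum class probability falls below a threshold, and refer such cases for further work-up. The sketch below is illustrative only; the 0.6 threshold is an assumed value for demonstration, not a parameter of this study's models.

```python
import numpy as np

def predict_with_rejection(probs, threshold=0.6):
    """Return the predicted class index per case, or -1 ('refer for further
    work-up') when the maximum class probability falls below the threshold."""
    conf = probs.max(axis=1)   # model confidence = highest class probability
    pred = probs.argmax(axis=1)
    return np.where(conf >= threshold, pred, -1)

# Toy probabilities for 2 cases over 3 classes (illustrative values)
probs = np.array([[0.90, 0.05, 0.05],   # confident -> keep the prediction
                  [0.40, 0.35, 0.25]])  # uncertain -> reject (-1)
print(predict_with_rejection(probs, threshold=0.6))  # -> [ 0 -1]
```

In practice the threshold would be tuned on a validation set to trade off coverage (fraction of cases the model answers) against accuracy on the accepted cases, and rejected cases could be routed to the "unknown/other" pathway.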


Conclusions

Our results demonstrated that the DL models outperformed experienced clinicians, and this advantage was even more pronounced with the LVM/LVM+. The LVM+ not only showed better calibration across most diagnostic categories but also achieved higher overall prediction accuracy, particularly for mycobacterial infections (including PNTM and PTB), for which its advantage over the clinicians was greatest. In the future, we hope our optimized model can be applied in different facilities via software connected to a picture archiving and communication system in radiology departments, providing Top3 diagnoses for radiologists prior to the final report and providing valuable information for clinicians to more confidently submit patients for microbiological examination and recommend empirical medication (Figure 10).

Figure 10 Model diagram of AI involvement in pneumonia diagnosis. AFB, acid-fast bacilli; AI, artificial intelligence; CT, computed tomography; ER, emergency room; HIS, hospital information system; LIS, laboratory information system; LVM, large vision model; MTB, Mycobacterium tuberculosis; OPD, outpatient department; PACS, picture archiving and communication system; PN, pulmonary nocardiosis; PNTM, non-tuberculous mycobacterium pulmonary disease; PTB, pulmonary tuberculosis; RIF, rifampicin.

Acknowledgments

We would like to thank all the physicians from different hospitals (Dr. Xingwei Zhang, Qingle Wang, Qiong Li, Haiyan Ge, Jinjiang Shen, Haidong Huang, Jun Xu and Yujuan Cui) who participated in labelling different types of pneumonia. An abstract of this research was previously presented as an oral presentation at the 28th Conference of Asia Pacific Society of Respirology (APSR) and as a poster presentation at IDWeek 2024.


Footnote

Reporting Checklist: The authors have completed the CLEAR reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-2129/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-24-2129/dss

Funding: The study was supported by the Clinical Research Plan of Shanghai Hospital Development Center (SHDC) (grant No. SHDC2020CR2031B); Shanghai Hospital Development Center Foundation (grant No. SHDC22024315); and the Fund of the Institute of Hospital Infection and Control, Fudan University (grant Nos. 2024XKPT37 and 2025XKPT66).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2129/coif). Y.S., X.O., and D.G. are from Shanghai United Imaging Intelligence Co., Ltd. Zhi Qiao and Zhen Qian are from Beijing United-Imaging Research Institute of Intelligent Imaging. The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Ethics Committee of Zhongshan Hospital (No. B2020-384R). Informed consent was taken from all the patients.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Age-sex differences in the global burden of lower respiratory infections and risk factors, 1990-2019: results from the Global Burden of Disease Study 2019. Lancet Infect Dis 2022;22:1626-47. [Crossref] [PubMed]
  2. Qu J, Zhang J, Chen Y, Huang Y, Xie Y, Zhou M, et al. Aetiology of severe community acquired pneumonia in adults identified by combined detection methods: a multi-centre prospective study in China. Emerg Microbes Infect 2022;11:556-66. [Crossref] [PubMed]
  3. Menendez R, Torres A. Treatment failure in community-acquired pneumonia. Chest 2007;132:1348-55. [Crossref] [PubMed]
  4. Animesh R, Vyas S. Chest CT Scan Signs: A Few Noteworthy Additions. Chest 2018;153:1516-7. [Crossref] [PubMed]
  5. Jin WT, Ma YY, Wang MR, Yao YM, Huang YN, Zhang Y, Wang QQ, Li B, Mu Q, Su Y, Cai SS, Li N, Luo Y, Pan J, Hu BJ. Evaluation and identification of pathogens of pulmonary infection based on chest CT imaging features. Chin J Clin Med 2020;27:543-8.
  6. Kottlors J, Fervers P, Geißen S, Gertz RJ, Bremm J, Rinneburger M, Weisthoff M, Shahzad R, Maintz D, Persigehl T. Morphological appearance of the B.1.1.7 mutation of the novel coronavirus 2 (SARS-CoV-2) in chest CT. Quant Imaging Med Surg 2023;13:1058-70. [Crossref] [PubMed]
  7. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer 2018;18:500-10. [Crossref] [PubMed]
  8. Theodosiou AA, Read RC. Artificial intelligence, machine learning and deep learning: Potential resources for the infection clinician. J Infect 2023;87:287-94. [Crossref] [PubMed]
  9. Meng F, Kottlors J, Shahzad R, Liu H, Fervers P, Jin Y, et al. AI support for accurate and fast radiological diagnosis of COVID-19: an international multicenter, multivendor CT study. Eur Radiol 2023;33:4280-91. [Crossref] [PubMed]
  10. Nguyen D, Kay F, Tan J, Yan Y, Ng YS, Iyengar P, Peshock R, Jiang S. Deep Learning-Based COVID-19 Pneumonia Classification Using Chest CT Images: Model Generalizability. Front Artif Intell 2021;4:694875. [Crossref] [PubMed]
  11. Mahin M, Tonmoy S, Islam R, Tazin T, Monirujjaman Khan M, Bourouis S. Classification of COVID-19 and Pneumonia Using Deep Transfer Learning. J Healthc Eng 2021;2021:3514821. [Crossref] [PubMed]
  12. Lassau N, Ammari S, Chouzenoux E, Gortais H, Herent P, Devilder M, et al. Integrating deep learning CT-scan model, biological and clinical variables to predict severity of COVID-19 patients. Nat Commun 2021;12:634. [Crossref] [PubMed]
  13. Wang G, Liu X, Shen J, Wang C, Li Z, Ye L, et al. A deep-learning pipeline for the diagnosis and discrimination of viral, non-viral and COVID-19 pneumonia from chest X-ray images. Nat Biomed Eng 2021;5:509-21. [Crossref] [PubMed]
  14. Machnicki S, Patel D, Singh A, Talwar A, Mina B, Oks M, Makkar P, Naidich D, Mehta A, Hill NS, Brown KK, Raoof S. The Usefulness of Chest CT Imaging in Patients With Suspected or Diagnosed COVID-19: A Review of Literature. Chest 2021;160:652-70. [Crossref] [PubMed]
  15. Javaheri T, Homayounfar M, Amoozgar Z, Reiazi R, Homayounieh F, Abbas E, et al. CovidCTNet: an open-source deep learning approach to diagnose covid-19 using small cohort of CT images. NPJ Digit Med 2021;4:29. [Crossref] [PubMed]
  16. Wang F, Li X, Wen R, Luo H, Liu D, Qi S, Jing Y, Wang P, Deng G, Huang C, Du T, Wang L, Liang H, Wang J, Liu C. Pneumonia-Plus: a deep learning model for the classification of bacterial, fungal, and viral pneumonia based on CT tomography. Eur Radiol 2023;33:8869-78. [Crossref] [PubMed]
  17. Zhang YH, Hu XF, Ma JC, Wang XQ, Luo HR, Wu ZF, Zhang S, Shi DJ, Yu YZ, Qiu XM, Zeng WB, Chen W, Wang J. Clinical Applicable AI System Based on Deep Learning Algorithm for Differentiation of Pulmonary Infectious Disease. Front Med (Lausanne) 2021;8:753055. [Crossref] [PubMed]
  18. Venkataramana L, Prasad DVV, Saraswathi S, Mithumary CM, Karthikeyan R, Monika N. Classification of COVID-19 from tuberculosis and pneumonia using deep learning techniques. Med Biol Eng Comput 2022;60:2681-91. [Crossref] [PubMed]
  19. Shi C, Shao Y, Shan F, Shen J, Huang X, Chen C, Lu Y, Zhan Y, Shi N, Wu J, Wang K, Gao Y, Shi Y, Song F. Development and validation of a deep learning model for multicategory pneumonia classification on chest computed tomography: a multicenter and multireader study. Quant Imaging Med Surg 2023;13:8641-56. [Crossref] [PubMed]
  20. Shao J, Ma J, Yu Y, Zhang S, Wang W, Li W, Wang C. A multimodal integration pipeline for accurate diagnosis, pathogen identification, and prognosis prediction of pulmonary infections. Innovation (Camb) 2024;5:100648. [Crossref] [PubMed]
  21. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ, editors. Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA. IEEE; 2017:2261-9. doi: 10.1109/CVPR.2017.243.
  22. Han M, Zhang Y, Zhou Q, Rong C, Zhan Y, Zhou X, Gao Y. Large-scale evaluation of V-Net for organ segmentation in image guided radiation therapy. Proceedings of SPIE 2019;10951. [Crossref]
  23. Lin TY, Goyal P, Girshick R, He K, Dollár P. Focal Loss for Dense Object Detection. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy. IEEE; 2017:2999-3007. doi: 10.1109/ICCV.2017.324.
  24. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA. 2017.
  25. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE; 2021.
  26. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells W, Frangi A. editors. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham; 2015:234-41.
  27. Ouyang X, Gu D, Li X, Zhou W, Chen Q, Zhan Y, Zhou XS, Shi F, Xue Z, Shen D. Towards a general computed tomography image segmentation model for anatomical structures and lesions. Commun Eng 2024;3:143. [Crossref] [PubMed]
  28. Podobnik G, Strojan P, Peterlin P, Ibragimov B, Vrtovec T. HaN-Seg: The head and neck organ-at-risk CT and MR segmentation dataset. Med Phys 2023;50:1917-27. [Crossref] [PubMed]
  29. Antonelli M, Reinke A, Bakas S, Farahani K, Kopp-Schneider A, Landman BA, et al. The Medical Segmentation Decathlon. Nat Commun 2022;13:4128. [Crossref] [PubMed]
  30. Bilic P, Christ P, Li HB, Vorontsov E, Ben-Cohen A, Kaissis G, et al. The Liver Tumor Segmentation Benchmark (LiTS). Med Image Anal 2023;84:102680. [Crossref] [PubMed]
  31. Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J Digit Imaging 2013;26:1045-57. [Crossref] [PubMed]
  32. Luo X, Fu J, Zhong Y, Liu S, Han B, Astaraki M, et al. SegRap2023: A benchmark of organs-at-risk and gross tumor volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma. Med Image Anal 2025;101:103447. [Crossref] [PubMed]
  33. Luo X, Liao W, Xiao J, Chen J, Song T, Zhang X, Li K, Metaxas DN, Wang G, Zhang S. WORD: A large scale dataset, benchmark and clinical applicable study for abdominal organ segmentation from CT image. Med Image Anal 2022;82:102642. [Crossref] [PubMed]
  34. Rister B, Yi D, Shivakumar K, Nobashi T, Rubin DL. CT-ORG, a new dataset for multiple organ segmentation in computed tomography. Sci Data 2020;7:381. [Crossref] [PubMed]
  35. Ji Y, Bai H, Ge C, et al. AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation. Advances in Neural Information Processing Systems 35 (NeurIPS 2022). 2022;35:36722-32.
  36. Landman B, Xu Z, Igelsias J, et al. Miccai multi-atlas labeling beyond the cranial vault-workshop and challenge. Proceedings of the MICCAI Multi-Atlas Labeling Beyond Cranial Vault-Workshop Challenge. 2015;5:12.
  37. Friedman JH. Greedy function approximation: A gradient boosting machine. Annals of Statistics 2001;29:1189-232.
  38. Raju S, Ghosh S, Mehta AC. Chest CT Signs in Pulmonary Disease: A Pictorial Review. Chest 2017;151:1356-74. [Crossref] [PubMed]
  39. Zhan Y, Wang Y, Zhang W, Ying B, Wang C. Diagnostic Accuracy of the Artificial Intelligence Methods in Medical Imaging for Pulmonary Tuberculosis: A Systematic Review and Meta-Analysis. J Clin Med 2022;12:303. [Crossref] [PubMed]
  40. Tan Y, Deng Y, Yan X, Liu F, Tan Y, Wang Q, et al. Nontuberculous mycobacterial pulmonary disease and associated risk factors in China: A prospective surveillance study. J Infect 2021;83:46-53. [Crossref] [PubMed]
  41. Ratnatunga CN, Lutzky VP, Kupz A, Doolan DL, Reid DW, Field M, Bell SC, Thomson RM, Miles JJ. The Rise of Non-Tuberculosis Mycobacterial Lung Disease. Front Immunol 2020;11:303. [Crossref] [PubMed]
  42. Wang L, Ding W, Mo Y, Shi D, Zhang S, Zhong L, Wang K, Wang J, Huang C, Zhang S, Ye Z, Shen J, Xing Z. Distinguishing nontuberculous mycobacteria from Mycobacterium tuberculosis lung disease from CT images using a deep learning framework. Eur J Nucl Med Mol Imaging 2021;48:4293-306. [Crossref] [PubMed]
Cite this article as: Jin W, Shao Y, Pan J, Wang M, Gu T, Shen W, Ouyang X, Qiao Z, Gu D, Qian Z, Gao Y, Hu B. Artificial intelligence model outperformed experienced clinicians in differentiating the aetiology of pneumonia on chest computed tomography: a retrospective study. Quant Imaging Med Surg 2026;16(1):77. doi: 10.21037/qims-24-2129
