Ensemble deep learning model based on CT scans: differentiating and subtype-classifying pancreatic inflammations and tumors, and predicting pancreatic lesion invasiveness

Xuhang Pan; Qian Yang; Maofen Shi; Yupeng He; Kun Qin; Jinchao Zhu; Tianle Zhang; Hao Wu; Rui Du; Min Sun; Suping Chen; Hongyi Yang; Yuhui Fang; Suming Zhang; Bo Yang

doi:10.21037/qims-2025-aw-2192

Original Article

Xuhang Pan¹#, Qian Yang²#, Maofen Shi¹, Yupeng He¹, Kun Qin¹, Jinchao Zhu³, Tianle Zhang⁴, Hao Wu⁵, Rui Du⁶, Min Sun⁷, Suping Chen⁸, Hongyi Yang¹, Yuhui Fang¹, Suming Zhang¹, Bo Yang¹

¹Institute of Medical Imaging, Department of Radiology, Taihe Hospital, Hubei University of Medicine, Shiyan, China; ²Department of Radiology, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; ³Department of Pathology, The Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; ⁴Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing, China; ⁵Department of Radiology, Mayo Clinic, Rochester, MN, USA; ⁶Department of Radiology, Wuhan Children's Hospital, Tongji Medical College, Huazhong University of Science & Technology, Wuhan, China; ⁷Department of General Surgery, Taihe Hospital, Hubei University of Medicine, Shiyan, China; ⁸Advanced Application Team, GE Healthcare, Shanghai, China

Contributions: (I) Conception and design: B Yang, S Zhang; (II) Administrative support: B Yang; (III) Provision of study materials or patients: B Yang, Q Yang, J Zhu; (IV) Collection and assembly of data: X Pan, Q Yang, M Shi, R Du, M Sun, S Chen, H Yang, Y Fang; (V) Data analysis and interpretation: X Pan, Q Yang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Bo Yang, PhD; Suming Zhang, MD. Institute of Medical Imaging, Department of Radiology, Taihe Hospital, Hubei University of Medicine, 32 South Renmin Road, Maojian District, Shiyan 442000, China. Email: yangbo1108@whu.edu.cn; 304607346@qq.com.

Background: Pancreatic diseases, including pancreatitis and tumors, are often difficult to distinguish on imaging due to overlapping morphological features. The accurate classification of pancreatic lesions and preoperative prediction of invasiveness are critical for clinical decision-making and prognostic evaluation. The objective of this study was to develop an ensemble deep learning (DL) model based on computed tomography (CT) images for differentiating and subtyping pancreatic inflammations and tumors, as well as for predicting lesion invasiveness.

Methods: This multi-center research included 6,740 patients’ pancreatic CT images. An ensemble DL model integrating DeepLabV3, nnUNet-MS, and Adaptive Pyramidal Shifted Window-Swin-Transformer, respectively responsible for pancreatic segmentation, lesion segmentation, and diagnosis, was developed. Segmentation performance was evaluated using Dice and Intersection over Union (IoU); lesion differentiating and sub-classifying performance were assessed using sensitivity and accuracy. Gradient-weighted Class Activation Mapping (Grad-CAM) was conducted for conservatively treated pancreatic ductal adenocarcinoma (PDAC) to predict 12-month invasiveness, with imaging follow-up as reference standard.

Results: The DeepLabV3 module had good pancreas segmentation performance (internal test set: Dice: 0.983; IoU: 0.971; external test set Dice: 0.981; IoU: 0.969). The nnUNet-MS module demonstrated superior lesion segmentation performance (internal validation set: Dice 0.941, IoU 0.932; external test set: Dice 0.942, IoU 0.930) to five other open-source nnUNet algorithms. The ensemble DL model showed high accuracy in both differentiating inflammatory and tumor lesions (internally 95.1%, externally 95.8%), as well as sub-classifying five inflammation subtypes and six tumor subtypes (internally 88.0% and externally 87.5%). It exhibited high sensitivity for detecting PDAC [93.9% (95% CI: 91.0–96.0%) internally and 92.9% (95% CI: 90.1–95.1%) externally]. In terms of predicting 12-month PDAC invasiveness, the volume difference between predicted and actual tumor progression ranged from 0.09 to 0.38 cm³.

Conclusions: The integrated DL model demonstrated excellent performance in differentiating and subtype classifying pancreatic inflammation and tumor lesions, and in predicting the invasiveness of pancreatic lesions.

Keywords: Deep learning (DL); pancreatic neoplasms; computed tomography (CT); computer-aided diagnostics

Submitted Oct 22, 2025. Accepted for publication Mar 24, 2026. Published online Apr 14, 2026.

Introduction

Pancreatic specialists often face complex and diverse lesions when diagnosing pancreatic diseases. Conditions such as mass-forming pancreatitis (MFP) and pancreatic ductal adenocarcinoma (PDAC) are frequently misdiagnosed. Moreover, several severe conditions, including PDAC and acute necrotizing pancreatitis (ANP) with hemorrhage, progress rapidly and have poor prognoses. Approximately 90% of cancer cases have been diagnosed after metastasis beyond the pancreas, highlighting the need for rapid and accurate diagnostic assessments to guide clinical treatment (1,2). Although surgical pathology has shown high accuracy, it represents an irreversible treatment option (3). Clinically, various imaging techniques have been combined for diagnostic purposes; however, the processes are complex, and diagnostic criteria have remained inconsistent. Computed tomography (CT), which can clearly display a tumor’s size, location, density, and blood supply (4,5), has been recommended for the differential diagnosis and subtype classification of pancreatic lesions, which is crucial for avoiding unnecessary biopsies and surgeries and thereby facilitates timely and effective patient management.

Recent studies have indicated that analysis techniques based on deep learning (DL) and computer vision demonstrate significant potential in diagnosing pancreatic lesions, allowing for effective analysis based on single-modality CT images. However, several issues associated with these studies were identified: (I) lack of data on rare types of pancreatic diseases leading to the neglect of MFP and PDAC with acute pancreatitis (AP); (II) more focus was placed on the differential diagnosis of tumor-like lesions using enhanced or plain CT images, particularly PDAC subtype analysis (6-10); (III) previous integrated DL algorithm models with deep architectures consumed considerable computing power, yet yielded unsatisfactory accuracy for inflammation and complex complications (6-10). In our previous study, we showcased the ability of multimodal dilated residual convolution to capture details. Although this research was based on inflammatory conditions and limited the analysis of lesions to the organ level while incorporating the analysis of rare complications, we aimed to establish a comprehensive, richer database to achieve multi-task intelligent analysis of pancreatic diseases (11). At the same time, we sought to develop a feature extractor based on fixed dimensions and dilated residual convolution to further optimize the inherent bulkiness of integrated DL algorithm groups. This effort was directed toward facilitating the early detection of pancreatic lesions and improving the efficiency of CT detection.

To address the clinical demand for accurate, multi-task subtyping of pancreatic lesions, this study aimed to develop a computer-aided diagnostic tool integrating advanced DL models for pancreatic segmentation, tumor delineation, lesion classification, and tumor invasiveness assessment. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-aw-2192/rc).

Methods

Ethical approval

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This retrospective study was approved by the Institutional Review Board of Shiyan Taihe Hospital (THH) (approval No. 2025KS13). The following institutions participated in this study: THH, Hubei Cancer Hospital (HCH), Shanghai Ninth People’s Hospital (SNH), and Hubei Provincial People’s Hospital (HPH). All participating institutions reviewed, approved, and agreed to the conduct of this research protocol. The requirement for informed consent for this retrospective analysis was waived. The clinical information is presented in Table 1.

Table 1

Dataset characteristics

Characteristic	Internal training HCH (N=2,890)	Internal validation HCH (N=1,020)	External testing THH (N=1,092)	External testing HPH (N=692)	External testing SNH (N=462)
Disease type
Normal, n (%)	819 (28)	113 (11)	0 (0)	143 (21)	121 (26)
Tumor cases, n (%)
PDAC	512 (18)	278 (27)	338 (31)	0 (0)	109 (23)
PNET	163 (7)	72 (7)	0 (0)	0 (0)	23 (4)
SPT	86 (3)	31 (3)	0 (0)	0 (0)	41 (8)
IPMN	73 (3)	51 (5)	98 (9)	0 (0)	91 (20)
MCN	61 (2)	23 (2)	189 (17)	0 (0)	32 (7)
SCN	78 (3)	37 (4)	0 (0)	0 (0)	68 (15)
Inflammatory cases, n (%)
AP	279 (9)	89 (9)	142 (13)	166 (24)	0 (0)
CP	321 (11)	130 (13)	153 (14)	118 (17)	0 (0)
AIP	43 (1)	17 (2)	65 (6)	62 (9)	0 (0)
PP	125 (4)	62 (7)	0 (0)	265 (38)	0 (0)
PA	330 (11)	117 (10)	107 (10)	0 (0)	0 (0)
Reference standard
Surgical pathology, n (%)	2,382 (74)	599 (74)	467 (43)	549 (79)	0 (0)
Clinical diagnosis, n (%)	2,170 (69)	581 (72)	1,092 (100)	692 (100)	462 (100)
Characteristics of tumor cases
Female, n (%)	192 (35)	78 (44)	139 (41)	0 (0)	27 (25)
AP with PDAC, n (%)	22 (4)	2 (1)	7 (2)	0 (0)	0 (0)
Age (years), median (IQR)	63 (56–68)	61 (58–70)	59 (55–63)	0 (0–0)	60 (57–63)
Enhanced CT:unenhanced (1:x)	1:0.112	1:0.095	1:0.032	0:0	1:0.041
Lesion location, n (%)
Head	603 (51)	131 (50)	387 (62)	0 (0)	180 (53)
Neck	35 (3)	24 (9)	7 (1)	0 (0)	4 (1)
Body	544 (46)	108 (41)	231 (37)	0 (0)	157 (46)
T stage, n (%)
T0	1,917 (66)	528 (52)	467 (43)	692 (100)	121 (26)
T1	86 (3)	40 (4)	55 (5)	0 (0)	55 (12)
T2	202 (7)	163 (16)	163 (15)	0 (0)	74 (16)
T3	173 (6)	132 (13)	131 (12)	0 (0)	83 (18)
T4	115 (4)	71 (7)	87 (8)	0 (0)	110 (24)
Missing data	397 (14)	86 (9)	189 (17)	0 (0)	19 (4)
Characteristics of inflammatory cases
Female, n (%)	551 (46)	104 (31)	173 (37)	236 (43)	0 (0)
Age (years), median (IQR)	56 (54–59)	55 (53–59)	57 (49–57)	54 (49–57)	0 (0–0)
MF-AP, n (%)	11 (3)	2 (2)	7 (4)	3 (1)	0 (0)
AIP, n (%)	8 (19)	1 (1)	6 (9)	4 (6)	0 (0)
Enhanced CT:unenhanced (x:1)	0.131:1	0.127:1	0.091:1	0.112:1	0:0
CT characteristics, n (%)
CT scanner	Revolution ACE	Revolution ACE	Optima CT 680, Lightspeed VCT, Revolution CT	Optima CT 680, Lightspeed VCT, Revolution CT	Apex CT, Optima CT 680
kVp (kV)
Inflammation	120	120	120	120	−
Tumor	120	120	120	−	120
Slice thickness (mm)
Inflammation	3.0	0.625	5.0	1.2	−
Tumor	3.0	0.625	5.0	−	3.0

AIP, autoimmune pancreatitis; AP, acute pancreatitis; CP, chronic pancreatitis; CT, computed tomography; HCH, Hubei Cancer Hospital; HPH, Hubei Provincial People’s Hospital; IPMN, intraductal papillary mucinous neoplasm; IQR, interquartile range; MCN, mucinous cystic neoplasm; MF-CP|AIP, mass-forming CP and AIP; PA, pancreatic abscesses; PDAC, pancreatic ductal adenocarcinoma; PNET, pancreatic neuroendocrine tumor; PP, pancreatic pseudocyst; SCN, serous cystic neoplasm; SNH, Shanghai Ninth People’s Hospital; SPT, solid pseudopapillary tumor; THH, Taihe Hospital.

Dataset description

This multicenter study comprised three strategically partitioned cohorts to ensure robust model development and validation. An internal training cohort was utilized for the primary construction of the integrated DL framework. Model efficacy was subsequently benchmarked using an internal testing cohort, which included a specialized sub-cohort dedicated to validating the differential diagnostic performance in distinguishing diverse pancreatic lesions. Finally, to rigorously evaluate cross-institutional robustness, an external multicenter testing cohort was employed to assess the model’s generalizability across heterogeneous clinical environments. The comprehensive study workflow and cohort distribution are delineated in Figure 1 and Appendix 1.

Figure 1 Data flow diagrams. (A) Data flow for the internal training set and internal test set. (B) Data flow for the external test set. (C) Data flow for the rare data used in the fourth stage. AI, artificial intelligence; AIP, autoimmune pancreatitis; AP, acute pancreatitis; CP, chronic pancreatitis; CT, computed tomography; Grad-CAM, Gradient-weighted Class Activation Mapping; HCH, Hubei Cancer Hospital; HPH, Hubei Provincial People’s Hospital; IPMN, intraductal papillary mucinous neoplasm; MCN, mucinous cystic neoplasm; MF-CP|AIP, mass-forming CP and AIP; MICCAI, Medical Image Computing and Computer Assisted Intervention; PA, pancreatic abscesses; PDAC, pancreatic ductal adenocarcinoma; PNET, pancreatic neuroendocrine tumor; PP, pancreatic pseudocyst; SCN, serous cystic neoplasm; SNH, Shanghai Ninth People’s Hospital; SPT, solid pseudopapillary tumor; TCIA, The Cancer Imaging Archive; THH, Taihe Hospital.

Internal training dataset

The internal training dataset consisted of patients from HCH. There were 973 patients with tumorous lesions [including 512 cases of PDAC, 163 cases of pancreatic neuroendocrine tumors (PNET), 86 cases of solid pseudopapillary tumors (SPT), 73 cases of intraductal papillary mucinous neoplasms (IPMN), 61 cases of mucinous cystic neoplasms (MCN), and 78 cases of serous cystic neoplasms (SCN)]. Additionally, there were 1,098 patients with inflammatory lesions [including 279 cases of AP and 321 cases of chronic pancreatitis (CP), 43 cases of autoimmune pancreatitis (AIP), 125 cases of pancreatic pseudocysts (PP), and 330 cases of pancreatic abscesses (PA)], as well as 819 patients with normal pancreatic conditions. These patients were treated between 2011 and 2024 and diagnosed preoperatively with either inflammatory or tumorous lesions. Tumorous patients underwent preoperative pathological examination. Preoperative CT scans were used for training the artificial intelligence (AI) model (Figure 1A).

Internal validation and invasiveness prediction dataset

The internal validation dataset was also sourced from HCH and included 492 patients with tumorous lesions (including 278 cases of PDAC, 72 cases of PNET, 31 cases of SPT, 51 cases of IPMN, 23 cases of MCN, and 37 cases of SCN), 415 patients with inflammatory lesions (including 89 cases of AP, 130 cases of CP, 17 cases of AIP, 62 cases of PP, and 117 cases of PA), and 113 patients with normal pancreatic conditions. These patients were treated between 2020 and 2023 and were pathologically and radiologically diagnosed with either inflammatory or tumorous lesions. Contrast-enhanced CT scans were performed for all patients and used for AI model testing and radiological analysis (Figure 1A).

To enhance the interpretability and clinical applicability of the ensemble DL model, a rare cohort from HCH was included, and these cases were analyzed using Gradient-weighted Class Activation Mapping (Grad-CAM) technology to assess the invasiveness of the lesions (Figure 1C). The rare cohort consisted of 22 cases of mass-forming CP and AIP (MF-CP|AIP), 24 cases of AP with PDAC, and 11 cases of conservatively treated PDAC (C-PDAC). The C-PDAC cases had 12 months of follow-up, with CT scans taken every six weeks, and the final contrast-enhanced CT images (at 12 months) were used as the reference standard for tumor progression.

External multicenter testing dataset

The external testing dataset included data from three centers (THH, SNH, and HPH), along with data from three public datasets: Medical Image Computing and Computer Assisted Intervention Society (MICCAI), Kaggle, and The Cancer Imaging Archive (TCIA). The inclusion criteria required histopathological confirmation through surgery or biopsy. Normal controls were randomly selected from patients with normal pancreatic CT scans. The multi-center testing dataset comprised 2,651 patients with pancreatic lesions (including 1,031 cases of PDAC, 23 cases of PNET, 41 cases of SPT, 189 cases of IPMN, 221 cases of MCN, 68 cases of SCN, 308 cases of AP, 271 cases of CP, 127 cases of AIP, 265 cases of PP, and 107 cases of PA) and 264 patients with a normal pancreas. This dataset was used for independent validation without any adjustments or modifications to the model during the process (Figure 1B).

Ensemble DL model

This ensemble DL model integrated four stages of DL algorithms, unified through ensemble learning techniques. The process began by localizing the pancreas, followed by its segmentation. Next, the model identified and segmented potential lesions within the pancreas. Finally, it produced a comprehensive diagnosis of the entire set of CT images alongside a detailed regional analysis (12). The outputs of this DL model were threefold: segmentation masks for the pancreas and cancerous lesions, diagnostic results for pancreatic lesions, and regional information (Figure 2A, Algorithm 1, and Appendix 1).

Figure 2 Ensemble deep learning algorithm AI model diagram. (A) Algorithm framework of the main architecture. (B) Algorithm framework for pancreatic segmentation using DeepLabV3. (C) Algorithm framework for lesion segmentation using nnUNet-MS, where the feature extractor is fixed to extract image features at different scales. (D) Algorithm framework for lesion diagnosis using the AP-Swin-Transformer, where the image size is fixed. AI, artificial intelligence; AP, adaptive pyramidal; CP, chronic pancreatitis; DCNN, deep convolutional neural network; IPMN, intraductal papillary mucinous neoplasm; MCN, mucinous cystic neoplasm; PDAC, pancreatic ductal adenocarcinoma; PNET, pancreatic neuroendocrine tumor; ROI, region of interest.

Algorithm 1 Integrated multi-task pipeline for pancreatic disease analysis

• Input: Multi-center CT images (plain and contrast-enhanced scans)

• Step 1: Pancreas Localization (Phase I)

Execute DeepLabV3 with ASPP to generate the initial pancreas mask

Define the ROI by cropping images based on localized contours to reduce background noise

• Step 2: Lesion Segmentation (Phase II)

Feed the cropped ROI into nnUNet-MS

Perform fine-grained lesion segmentation using the adaptive multi-scale feature extractor

• Step 3: Multi-task Classification (Phase III)

Input the segmented features into AP-Swin-Transformer

Utilize shifted window self-attention and pyramidal modules to output diagnostic labels (11 disease subtypes)

• Step 4: Invasiveness Prediction (Phase IV)

Apply Grad-CAM to analyze peri-pancreatic, intra-pancreatic, and intra-tumoral focus

Calculate predicted tumor volume (V_1pred) and evaluate 12-month invasiveness using computer vision methods

• Output: Segmentation masks, diagnostic sub-types, and tumor progression predictions

ASPP, Atrous Spatial Pyramid Pooling; CT, computed tomography; Grad-CAM, Gradient-weighted Class Activation Mapping; ROI, region of interest.
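The ROI-cropping part of Step 1 in Algorithm 1 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `crop_to_mask`, the `margin` parameter, and the toy volume are assumptions made for illustration, and the actual cropping rule used in the study is described only as "based on localized contours".

```python
import numpy as np

def crop_to_mask(volume, mask, margin=8):
    """Crop `volume` to the bounding box of the nonzero voxels in `mask`.

    `margin` (in voxels) is a hypothetical padding around the box; the
    source does not state how much context is kept around the pancreas.
    """
    if volume.shape != mask.shape:
        raise ValueError("volume and mask must have the same shape")
    coords = np.argwhere(mask > 0)
    if coords.size == 0:
        # No pancreas voxels found: fall back to the full volume.
        return volume, tuple(slice(0, s) for s in volume.shape)
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, volume.shape)
    slices = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
    return volume[slices], slices

# Toy CT-like volume (slices, rows, columns) with a box-shaped "pancreas" mask.
volume = np.random.default_rng(0).normal(size=(40, 64, 64))
mask = np.zeros(volume.shape, dtype=bool)
mask[10:20, 20:30, 25:40] = True

roi, slices = crop_to_mask(volume, mask, margin=4)
print(roi.shape)
```

Returning the slice tuple alongside the cropped array lets a later stage map a lesion mask predicted on the ROI back into the coordinates of the full scan.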

Pancreas localization

The goal of the first stage was to segment the pancreas. Given the capability of atrous convolution and atrous spatial pyramid pooling to handle fine features of different sizes, a DeepLabV3 model was trained in this stage to segment the entire pancreatic region (13). The goal was to obtain accurate pancreatic contours in CT scans of normal pancreases and of pancreases with inflammatory or tumor-like lesions (Figure 2B).

Lesion segmentation

The purpose of the second stage was to segment lesions within the pancreas. Deeper networks and enhanced detail-capturing capabilities were required at this stage, thereby enabling the handling of more complex classification environments. nnUNet was able to automatically adjust the depth and architecture of the network according to different lesion contours to meet the segmentation needs of complex pancreatic lesions (14,15). Based on nnUNet, the parameters of the feature extractor were fixed to optimize feature extraction at different scales, thereby achieving good lesion segmentation results (Figure 2C).

Lesion diagnosis

The third stage focused on diagnosing various types and subtypes of pancreatic diseases on CT images. An Adaptive Pyramidal Shifted Window Transformer (AP-Swin-Transformer) model with multiple classifiers was developed, with the capability of identifying normal pancreas, inflammatory lesions, cancerous lesions, and their subtypes. At this stage, the basic information of pancreatic CT was first obtained based on the attention mechanism and shifted window mechanism of Swin-Transformer. Subsequently, adaptive modules and pyramidal modules were utilized to capture and analyze complex features in CT images that are imperceptible to the human eye across different scale slices (16), as exemplified in Figure 2D.
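The window mechanism of a Swin-style transformer can be illustrated on a single 2D feature map. The sketch below shows only the generic regular and shifted window partitioning. It omits the attention computation and the masking that Swin applies to wrapped-around positions, and it does not attempt to reproduce the adaptive or pyramidal modules of the AP-Swin-Transformer. The function names and the 8×8 toy map are assumptions.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W) map into non-overlapping (window, window) tiles."""
    h, w = x.shape
    if h % window or w % window:
        raise ValueError("H and W must be divisible by the window size")
    x = x.reshape(h // window, window, w // window, window)
    return x.transpose(0, 2, 1, 3).reshape(-1, window, window)

def shifted_window_partition(x, window):
    """Cyclically shift the map by half a window, then partition it.

    The shift makes the new tiles straddle the borders of the regular
    tiles, so information can be exchanged across neighbouring windows.
    """
    shift = window // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window)

feature_map = np.arange(64).reshape(8, 8)  # toy stand-in for a feature map
regular = window_partition(feature_map, 4)
shifted = shifted_window_partition(feature_map, 4)
print(regular.shape, shifted.shape)
```

Alternating the two partitions between successive layers is what allows attention restricted to local windows to propagate information across the whole image.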

Regional informatic assessment

The fourth phase aimed to visualize the model-focused regions using Grad-CAM and to predict the progressed lesion (17). The regions of model focus were divided into three parts: peri-pancreatic, intra-pancreatic, and intra-tumoral. The prediction maps for these three parts, along with the initial tumor volume (V_1true), predicted tumor volume (V_1pred), and final tumor volume (V_2true), were generated by using computer vision methods (clustering + dilation operators). Additionally, the predicted tumor contour and volume were compared with the reference standard for C-PDAC cases.
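Each of the three volumes above reduces to counting the voxels of a binary mask and scaling by the physical voxel size. The following sketch shows that conversion only; it does not reproduce the clustering and dilation steps, whose parameters are not given. The function name `mask_volume_cm3` and the 0.8 mm in-plane spacing are assumptions (the 3.0 mm slice thickness is one of the values listed in Table 1).

```python
import numpy as np

def mask_volume_cm3(mask, spacing_mm):
    """Volume of a binary mask in cm^3, given voxel spacing in millimetres."""
    voxel_mm3 = float(np.prod(spacing_mm))       # volume of one voxel in mm^3
    n_voxels = int(np.count_nonzero(mask))
    return n_voxels * voxel_mm3 / 1000.0         # 1 cm^3 = 1000 mm^3

# Toy lesion mask: 5 x 20 x 20 = 2000 voxels.
mask = np.zeros((20, 64, 64), dtype=bool)
mask[5:10, 10:30, 10:30] = True

# Spacing order matches the array axes: (slice, row, column).
volume = mask_volume_cm3(mask, spacing_mm=(3.0, 0.8, 0.8))
print(f"{volume:.2f} cm^3")
```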

Statistical analysis

Dice and Intersection over Union (IoU) were used for evaluating segmentation performance; sensitivity and accuracy were calculated to assess diagnostic performance. To enhance interpretability, visual heatmaps of CT scans were generated using Grad-CAM technology to examine the model’s attention regions in complex diseases. To evaluate tumor invasiveness, Dice was first calculated for different pancreatic regions within continuous PDAC cases and represented in raincloud plots. Finally, Bland-Altman analysis was conducted to assess the agreement between the predicted (PPV) and actual tumor progression volumes (APV). The PPV and APV were defined as follows:

$\mathrm{PPV} = V_{1\mathrm{pred}} - V_{1\mathrm{true}}$ [1]

$\mathrm{APV} = V_{2\mathrm{true}} - V_{1\mathrm{true}}$ [2]
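Equations [1] and [2] and the subsequent Bland-Altman analysis can be sketched as follows. The volumes are made-up toy values, not study data, and the 1.96 × SD limits of agreement are the conventional choice rather than a detail confirmed by the text.

```python
import numpy as np

def progression_volumes(v1_true, v1_pred, v2_true):
    """Return (PPV, APV) per equations [1] and [2]."""
    v1_true = np.asarray(v1_true, dtype=float)
    v1_pred = np.asarray(v1_pred, dtype=float)
    v2_true = np.asarray(v2_true, dtype=float)
    ppv = v1_pred - v1_true   # predicted progression volume
    apv = v2_true - v1_true   # actual progression volume
    return ppv, apv

def bland_altman(a, b):
    """Bias and 95% limits of agreement between two paired measurements."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)     # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical volumes in cm^3 for four cases (illustration only).
v1_true = [10.0, 12.0, 8.0, 15.0]
v1_pred = [12.0, 15.0, 9.0, 19.0]
v2_true = [12.2, 14.8, 9.1, 19.3]

ppv, apv = progression_volumes(v1_true, v1_pred, v2_true)
bias, lower, upper = bland_altman(ppv, apv)
print(f"bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f})")
```

Because both quantities subtract the same baseline, the per-case difference PPV − APV equals $V_{1\mathrm{pred}} - V_{2\mathrm{true}}$, so the agreement analysis compares the predicted volume directly with the final measured volume.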

Results

Pancreas segmentation

The lightweight DeepLabV3 module showed good performance in segmenting the pancreatic contour (internal test set: Dice: 0.983; IoU: 0.971; external test set Dice: 0.981; IoU: 0.969).

Lesion segmentation

As shown in Table 2, the nnUNet-MS module outperformed the five open-source nnUNet variants on both the internal validation set (Dice: 0.941; IoU: 0.932) and the external testing set (Dice: 0.942; IoU: 0.930).

Table 2

The effect of different nnUNet models on lesion segmentation

Segmentation model	Internal validation set Dice	Internal validation set IoU	External testing set Dice	External testing set IoU
nnUNet-MS	0.941	0.932	0.942	0.930
nnUNetv2	0.937	0.928	0.938	0.924
nnUNet ResEnc M	0.921	0.917	0.923	0.901
nnUNet ResEnc L	0.927	0.923	0.925	0.906
nnUNet ResEnc XL	0.931	0.921	0.933	0.925
nnUNet Trans	0.915	0.894	0.911	0.891

Dice, Dice similarity coefficient; IoU, Intersection over Union.
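For reference, the two overlap metrics reported in Table 2 can be computed from a pair of binary masks as in the sketch below. The function name and toy masks are illustrative, and treating two empty masks as a perfect match is a common convention rather than something stated in the text.

```python
import numpy as np

def dice_and_iou(pred, truth):
    """Dice similarity coefficient and Intersection over Union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = int(np.logical_and(pred, truth).sum())
    total = int(pred.sum()) + int(truth.sum())
    if total == 0:
        return 1.0, 1.0  # assumed convention: two empty masks agree perfectly
    union = total - intersection
    return 2.0 * intersection / total, intersection / union

# Toy masks: two 2x3 rectangles overlapping in a 2x2 region.
pred = np.zeros((4, 4), dtype=bool)
truth = np.zeros((4, 4), dtype=bool)
pred[0:2, 0:3] = True
truth[0:2, 1:4] = True

dice, iou = dice_and_iou(pred, truth)
print(f"Dice={dice:.3f}, IoU={iou:.3f}")
```

For a single pair of masks the two metrics are linked by IoU = Dice / (2 − Dice), so they always rank segmentations of the same case in the same order.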

Lesion classification

The ensemble DL model showed high accuracy both in differentiating inflammatory from tumor lesions [internally 95.1% (866/911), externally 95.8% (2,540/2,651)] and in sub-classifying five inflammation subtypes and six tumor subtypes [internally 88.0% (802/911), externally 87.5% (2,320/2,651)]. Across the six tumor subtypes, sensitivity ranged from 72% to 93.9% internally and from 71% to 92.9% externally. Sensitivity was highest for PDAC (93.9% internally and 92.9% externally) and lowest for SCN (72% internally and 71% externally). Across T stages, sensitivity ranged from 87% to 95% internally and from 87% to 96% externally; for T1-staged tumors specifically, the accuracy was 88% internally and 87% externally. Across the five inflammation subtypes, sensitivity ranged from 74% to 93% internally and from 76% to 91% externally. Overall, diagnostic efficacy was highest for PDAC and CP and lowest for SCN and AIP. The model also showed high diagnostic efficacy on both unenhanced and contrast-enhanced CT scans.

For severe lesions (those requiring immediate intervention, surgery, or late-stage follow-up with radiotherapy or chemotherapy: PDAC, PNET, SPT, and ANP), the predicted probability distribution was concentrated in the range of 0.6–0.8 with a high peak, indicating a relatively clear diagnosis. However, the distributions for PDAC and PNET overlapped, resulting in some misclassified cases. The internal validation set and the external testing set showed similar patterns. For moderate lesions (high-risk cases still requiring surgery and medium-to-low-risk cases requiring monitoring: AP, IPMN, MCN, SCN, and PA), the probability distribution was concentrated in the range of 0.2–0.6. In the internal validation set, the peak was low with no overlap between distributions, indicating a clear diagnosis; in the external testing set, the peak was high and the diagnosis was likewise clear.

Receiver operating characteristic (ROC) analysis demonstrated high performance of the model in PDAC detection, with areas under the curve (AUCs) of 0.95 [95% confidence interval (CI): 0.931–0.967] for the internal validation dataset and 0.94 (95% CI: 0.915–0.945) for the external testing dataset (Figure 3).
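The accuracy and sensitivity figures in this section follow from a confusion matrix in the usual way, as the sketch below illustrates. The 3×3 matrix is a toy example, not the study's confusion matrix from Figure 3E. The final lines simply redo the arithmetic for the case counts quoted in the text.

```python
import numpy as np

def per_class_sensitivity(cm):
    """Sensitivity (recall) per class; rows are true classes, columns predictions."""
    cm = np.asarray(cm, dtype=float)
    row_totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), row_totals,
                     out=np.zeros_like(row_totals), where=row_totals > 0)

def overall_accuracy(cm):
    """Fraction of all cases on the diagonal of the confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    return float(np.trace(cm) / cm.sum())

# Toy three-class confusion matrix (illustration only).
toy_cm = [[90, 6, 4],
          [5, 40, 5],
          [2, 8, 40]]
sensitivity = per_class_sensitivity(toy_cm)
accuracy = overall_accuracy(toy_cm)
print(sensitivity, accuracy)

# Arithmetic check of the accuracies quoted in the text (counts from the text).
reported = {
    "differentiation, internal": (866, 911),
    "differentiation, external": (2540, 2651),
    "sub-classification, internal": (802, 911),
    "sub-classification, external": (2320, 2651),
}
percentages = {k: round(100 * n / d, 1) for k, (n, d) in reported.items()}
print(percentages)
```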

Figure 3 The results of using the ensemble algorithm to diagnose pancreatic CT lesions. (A) Bar chart of model accuracy for tumorous and inflammatory lesions. (B) Classification accuracy of the model across different T stages and imaging phases, with T stages shown on the left and imaging phases on the right. Ridge plot of classification probabilities for (C) severe lesions and (D) moderate lesions in the internal validation set (left) and the external test set (right). (E) The complete confusion matrix, clearly demonstrating the model’s classification performance, with the internal validation set on the left and the external validation set on the right. AIP, autoimmune pancreatitis; AP, acute pancreatitis; CP, chronic pancreatitis; EnCT, enhanced computed tomography; I, inflammatory; IPMN, intraductal papillary mucinous neoplasm; MCN, mucinous cystic neoplasm; PA, pancreatic abscesses; PDAC, pancreatic ductal adenocarcinoma; PNET, pancreatic neuroendocrine tumor; PP, pancreatic pseudocyst; SCN, serous cystic neoplasm; SPT, solid pseudopapillary tumor; T, tumorous; UnCT, unenhanced computed tomography.

Heatmap generation

Using Grad-CAM, visualization heatmaps were generated to highlight the model’s focus areas on MF-CP|AIP, AP with PDAC, and C-PDAC (Figure 4A). The Dice coefficients between non-contrast and contrast-enhanced images were 0.65±0.042 in the intra-tumoral region, 0.53±0.033 in the peri-pancreatic region, and 0.49±0.031 in the intra-pancreatic region (Figure 4B). In predicting 12-month C-PDAC invasiveness, the difference between the predicted and reference tumor progression volumes ranged from 0.09 to 0.38 cm³ (Figure 4C).

Figure 4 The model’s attention regions for rare and specific diseases and its potential for predicting tumor progression changes. (A) Attention heatmap of the model for rare and special types of pancreatic diseases, including both plain and enhanced CT images. The red arrows indicate the pancreas tumor. (B) Correlation of the model in predicting different pancreatic regions during tumor progression. (C) Model heatmap for predicting tumor progression, comparing the consistency between the actual tumor proliferative volume and the predicted tumor proliferative volume. AIP, autoimmune pancreatitis; AP, acute pancreatitis; CP, chronic pancreatitis; CT, computed tomography; EnCT, enhanced computed tomography; Grad-CAM, Gradient-weighted Class Activation Mapping; MF-CP|AIP, mass-forming CP and AIP; PDAC, pancreatic ductal adenocarcinoma; SD, standard deviation; UnCT, unenhanced computed tomography.

Discussion

In our study, an ensemble DL model including DeepLabV3, nnUNet-MS, and AP-Swin-Transformer algorithms based on non-contrast and contrast-enhanced CT scans was developed, showing excellent pancreas and pancreatic lesion segmentation performances. With multicenter datasets comprising complex pancreas diseases, the model was found to be capable of detecting and differentiating inflammatory and tumor lesions and their subtypes (inflammation: AP, CP, AIP, PP, PA; tumor: PDAC, PNET, SPT, IPMN, SCN, MCN) with excellent efficacy. Moreover, for MFP, PDAC with AP, and C-PDAC, Grad-CAM and deep network techniques could highlight the lesions and complex comorbidities, enhancing the interpretability; finally, the model demonstrated potential in predicting the invasiveness of C-PDAC tumors.

This multi-task diagnosis among various pancreatic diseases has long been regarded as complex and redundant by radiologists (18,19). In critically severe patients, distinguishing between MFP and pancreatic cancer has proven challenging due to the similar imaging characteristics on single CT scans. In such cases, magnetic resonance cholangiopancreatography (MRCP) and tumor marker indices were often employed for comprehensive assessment. Furthermore, for tumor-like cystic-solid lesions, radiologists consider factors such as necrosis, as well as age and gender, in their diagnostic process (20-22). In actual pancreatic lesion diagnosis, CT scans were used to assess lesion location, size, morphology, density, enhancement, and surrounding invasion, whereas MRCP was applied to distinguish between inflammatory and tumorous obstructions (23-25). These imaging results, combined with tumor marker indices and clinical indicators, were recognized as the recommended method for diagnosing pancreatic lesions.

In previous studies, the pancreas has been classified based on organ contours, and AI technology has been applied to the differential diagnosis of two types of pancreatic tumors and complex lesions in non-contrast CT scans. However, due to the exclusion of extra-organ information and the complexity of comprehensive disease diagnosis, this issue remained challenging to fully resolve (26,27). To address this, an integrated DL algorithm was developed, comprising four core components: pancreas localization, lesion detection, disease diagnosis, and invasiveness prediction, with the aim of assisting radiologists in diagnosis and enhancing diagnostic efficiency. To predict the invasiveness of pancreatic lesions, CT scans were divided into three regions (peripancreatic, intrapancreatic, and intratumoral) to investigate the impact of information from different regions in monitoring the invasiveness of pancreatic lesions. This partitioning approach helped to assess the contribution of each region in invasion prediction, thus providing more interpretable and practical diagnostic support.

The ensemble DL model enabled precise diagnosis of pancreatic diseases from CT scans and improved the efficiency of the diagnostic process. The model demonstrated effective generalization across different centers and for various diseases. This ability was attributed to several factors:

Comprehensive dataset creation. In collaboration with multiple hospitals, an extensive CT dataset was constructed, including both enhanced and plain scans, and all pathologies were confirmed. This dataset covered a wide range of pancreatic lesions, both common and rare, providing a diverse and representative foundation for the model.
Enhanced model performance through multi-task integration. The multi-task integrated network model ensured precise execution at every stage, significantly improving the model’s overall performance. This comprehensive approach made the model more robust and reliable in addressing the complexities of pancreatic lesion diagnosis.
Adaptive residual optimization model. A pyramid residual convolutional adaptive feature extractor was designed, enabling fine multi-scale feature extraction and model parameter optimization. This approach improved the accuracy of rare lesion detection and enhanced model efficiency.
Disease diagnosis through transformer-based networks. A transformer-based classification network was used in the diagnostic component of the integrated algorithm. Its self-attention mechanisms not only enabled accurate diagnosis of common pancreatic diseases but also captured extrapancreatic and extratumoral features that are often missed in traditional radiology. These external features, which vary in scale, provided vital diagnostic information that improved the model’s performance.
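To make the role of self-attention concrete, the following is a minimal sketch of single-head scaled dot-product self-attention over a set of patch embeddings. It is a generic illustration, not the AP-Swin-Transformer itself: Swin Transformer computes attention within shifted local windows, which is omitted here, and the projection matrices `w_q`, `w_k`, and `w_v` are random placeholders rather than trained weights.

```python
import numpy as np


def self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray):
    """Single-head scaled dot-product self-attention.

    x: (n_tokens, d_model) patch embeddings.
    Returns the attended features and the (n_tokens, n_tokens) attention map.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))            # 6 hypothetical image patches
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
features, attn = self_attention(tokens, w_q, w_k, w_v)
print(features.shape, attn.shape)
```

Each row of the attention map weights every other patch, so a patch inside the lesion can draw on patches outside the organ contour. This is the property the text refers to when describing extrapancreatic and extratumoral features.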

In the fourth step of the integrated algorithm, the regional analysis, it was noted that DL classification models often lack interpretability: in clinical practice, a single output label explains little. Pancreatic cancer is a rapidly progressing tumorous lesion that requires quick and accurate diagnosis and monitoring (27,28). Rare data (such as MFP and PDAC with AP) were collected, and the model’s interpretability was demonstrated through Grad-CAM and a two-stage classification algorithm within the integrated model. By comparing the suspected areas on the first enhanced CT scan with the contours on preoperative CT, this integrated model not only improved the efficiency of pancreatic lesion diagnosis on CT but also provided a new approach for monitoring the invasiveness of pancreatic tumors.
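The Grad-CAM computation mentioned above can be sketched in a few lines. In practice the activations and gradients come from a forward and backward pass through the trained network; here they are passed in as plain arrays, and the function name and toy inputs are illustrative assumptions rather than the study's implementation.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from one layer's feature maps.

    activations: (channels, H, W) feature maps of the chosen layer.
    gradients:   (channels, H, W) gradients of the class score w.r.t. those maps.
    """
    weights = gradients.mean(axis=(1, 2))               # per-channel importance
    cam = np.tensordot(weights, activations, axes=1)    # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                          # keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam              # scale to [0, 1]


# Toy input: channel 0 supports the class, channel 1 opposes it.
acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 0.0], [0.0, 1.0]]])
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
heatmap = grad_cam(acts, grads)
print(heatmap)
```

The heatmap has the spatial resolution of the chosen layer and is normally upsampled to the CT slice before being overlaid for comparison with lesion contours.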

The strength of our proposed ensemble DL framework lies in its multi-task integrated architecture, which mirrors the clinical diagnostic logic used by radiologists to navigate complex pancreatic pathologies. Unlike conventional single-task models, which often suffer from ‘information loss’ by focusing solely on organ-level features, our model links anatomical localization, lesion segmentation, and multi-task diagnosis. Specifically, by integrating nnUNet-MS to exclude extra-pancreatic noise and AP-Swin-Transformer to capture global long-range dependencies, the system recovers critical diagnostic information, such as peri-pancreatic and intra-tumoral heterogeneity, that is frequently missed in traditional radiology. This comprehensive approach not only helps to address the long-standing challenge of differentiating MFP from pancreatic cancer but also provides practical decision support through Grad-CAM-based visual explanations and quantitative invasiveness predictions. Consequently, this integrated pipeline has the potential to transform CT assessment from a redundant process into an efficient tool for early detection and longitudinal monitoring of tumor progression.

Despite the model’s outstanding performance, several misjudgments and false positives were observed during external validation for certain rare inflammatory or tumor-like cystic lesions. These errors could be attributed to several factors:

Data imbalance. Plain CT was often preferred for inflammatory lesions, whereas cancerous lesions relied more on contrast-enhanced CT. Additionally, the difficulty of collecting certain datasets introduced a classification bias in the model’s categorization of pancreatic lesions. The numbers of AIP and SCN cases were relatively small (SCN: 78 cases for internal training, 37 for internal validation, and 68 for external testing; AIP: 43 cases for internal training, 17 for internal validation, and 127 for external testing). Furthermore, in the regional analyses, the limited data available for PDAC patients undergoing conservative treatment constrained comprehensive validation of the model. In the future, such limitations could be addressed through collaborative efforts to mitigate the model’s data distribution bias and enhance its generalizability to rare diseases.
Specificity in AP diagnosis. Although the model showed high accuracy in diagnosing AP, it did not differentiate among the condition’s subtypes. Hemorrhagic ANP, which carries a significant mortality risk, was particularly challenging, as contrast-enhanced CT plays a critical role in its diagnosis. Although both contrast-enhanced and non-contrast data were used for model training, our study focused primarily on multi-task diagnosis on non-contrast CT. Moreover, because treatment is urgent and often begins before CT can be performed, data on such cases remained scarce.
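The split imbalance for the two rare classes can be made explicit with a short calculation. The counts below are the ones reported above; the helper `split_summary` and the chosen summary quantities are illustrative assumptions.

```python
# Case counts for the two rare classes, as reported in the text.
counts = {
    "SCN": {"train": 78, "val": 37, "test": 68},
    "AIP": {"train": 43, "val": 17, "test": 127},
}


def split_summary(c: dict) -> dict:
    """Total cases, share used for training, and external-test/train ratio."""
    total = sum(c.values())
    return {
        "total": total,
        "train_share": c["train"] / total,
        "test_to_train": c["test"] / c["train"],
    }


summary = {name: split_summary(c) for name, c in counts.items()}
for name, s in summary.items():
    print(f"{name}: total={s['total']}, "
          f"train share={s['train_share']:.2f}, "
          f"test/train={s['test_to_train']:.2f}")
```

For AIP, fewer than a quarter of the cases were available for training and the external test set is roughly three times the size of the training set, which is consistent with the classification bias described above.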

It should be noted that all cases were drawn from the same country and largely from the same ethnic background (primarily Asian). This demographic homogeneity, coupled with the geographic and ethnic factors that may influence disease prevalence and presentation, limits our ability to fully assess the model’s generalizability across diverse populations. Moreover, given the preliminary nature of this study, we focused on the model’s performance in segmentation, diagnosis, and interpretability; optimization for real-world clinical deployment and cross-institutional reproducibility has not yet been addressed.

Conclusions

The ensemble DL model was shown to be highly flexible and broadly generalizable, capable of efficiently detecting and classifying pancreatic lesions on both plain and contrast-enhanced CT scans; it also showed potential for predicting and tracking tumor progression.

Acknowledgments

We would like to thank Drs. Wen Chen and Yijun Tang from Taihe Hospital for generously sharing the clinical data of pancreatic lesions and providing professional insights into the pathological classification of pancreatic inflammations and tumors.

Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-aw-2192/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-aw-2192/dss

Funding: This work was supported by the Natural Science Foundation of Hubei Province (No. 2022CFB853) and the Young and Middle-aged Talent Project of the Education Department of Hubei Province (No. Q202220110).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-aw-2192/coif). S.C. is an employee of GE Healthcare. The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The retrospective study was approved by the Institutional Review Board of Shiyan Taihe Hospital (THH) (approval No. 2025KS13). The following institutions participated in this study: Hubei Cancer Hospital, Shiyan Taihe Hospital, Shanghai Ninth People’s Hospital, Hubei Provincial People’s Hospital. All participating institutions reviewed, approved, and agreed to the conduct of this research protocol. Informed consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Klatte DCF, Boekestijn B, Wasser MNJM, Feshtali Shahbazi S, Ibrahim IS, Mieog JSD, Luelmo SAC, Morreau H, Potjer TP, Inderson A, Boonstra JJ, Dekker FW, Vasen HFA, van Hooft JE, Bonsing BA, van Leerdam ME. Pancreatic Cancer Surveillance in Carriers of a Germline CDKN2A Pathogenic Variant: Yield and Outcomes of a 20-Year Prospective Follow-Up. J Clin Oncol 2022;40:3267-77. [Crossref] [PubMed]
Klein AP. Pancreatic cancer epidemiology: understanding the role of lifestyle and inherited risk factors. Nat Rev Gastroenterol Hepatol 2021;18:493-502. [Crossref] [PubMed]
Chen PT, Wu T, Wang P, Chang D, Liu KL, Wu MS, Roth HR, Lee PC, Liao WC, Wang W. Pancreatic Cancer Detection on CT Scans with Deep Learning: A Nationwide Population-based Study. Radiology 2023;306:172-82. [Crossref] [PubMed]
Stoffel EM, Brand RE, Goggins M. Pancreatic Cancer: Changing Epidemiology and New Approaches to Risk Assessment, Early Detection, and Prevention. Gastroenterology 2023;164:752-65. [Crossref] [PubMed]
Bakasa W, Viriri S. Pancreatic Cancer Survival Prediction: A Survey of the State-of-the-Art. Comput Math Methods Med 2021;2021:1188414. [Crossref] [PubMed]
Cao K, Xia Y, Yao J, Han X, Lambert L, Zhang T, et al. Large-scale pancreatic cancer detection via non-contrast CT and deep learning. Nat Med 2023;29:3033-43. [Crossref] [PubMed]
Bhayana R, Nanda B, Dehkharghanian T, Deng Y, Bhambra N, Elias G, Datta D, Kambadakone A, Shwaartz CG, Moulton CA, Henault D, Gallinger S, Krishna S. Large Language Models for Automated Synoptic Reports and Resectability Categorization in Pancreatic Cancer. Radiology 2024;311:e233117. [Crossref] [PubMed]
McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, et al. International evaluation of an AI system for breast cancer screening. Nature 2020;577:89-94. [Crossref] [PubMed]
Si K, Xue Y, Yu X, Zhu X, Li Q, Gong W, Liang T, Duan S. Fully end-to-end deep-learning-based diagnosis of pancreatic tumors. Theranostics 2021;11:1982-90. [Crossref] [PubMed]
Wang H, Lei C, Zhao D, Gao L, Gao J. DeepHipp: accurate segmentation of hippocampus using 3D dense-block based on attention mechanism. BMC Med Imaging 2023;23:158.
Pan X, Jiao K, Li X, Feng L, Tian Y, Wu L, Zhang P, Wang K, Chen S, Yang B, Chen W. Artificial intelligence-based tools with automated segmentation and measurement on CT images to assist accurate and fast diagnosis in acute pancreatitis. Br J Radiol 2024;97:1268-77.
Viriyasaranon T, Chun JW, Koh YH, Cho JH, Jung MK, Kim SH, Kim HJ, Lee WJ, Choi JH, Woo SM. Annotation-Efficient Deep Learning Model for Pancreatic Cancer Diagnosis and Classification Using CT Images: A Retrospective Diagnostic Study. Cancers (Basel) 2023;15:3392.
Bovcon B, Kristan M. WaSR-A Water Segmentation and Refinement Maritime Obstacle Detection Network. IEEE Trans Cybern 2022;52:12661-74.
Isensee F, Petersen J, Klein A, Zimmerer D, Jaeger PF, Kohl S, Wasserthal J, Kohler G, Norajitra T, Wirkert S, Maier-Hein KH. nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation. arXiv:1809.10486.
Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 2021;18:203-11.
Liu Z, Lin YT, Cao Y, Hu H, Wei YX, Zhang Z, Lin S, Guo BN. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv:2103.14030.
Jin C, Yu H, Ke J, Ding P, Yi Y, Jiang X, Duan X, Tang J, Chang DT, Wu X, Gao F, Li R. Predicting treatment response from longitudinal images using multi-task deep learning. Nat Commun 2021;12:1851. [Crossref] [PubMed]
Dietrich CF, Sahai AV, D'Onofrio M, Will U, Arcidiacono PG, Petrone MC, Hocke M, Braden B, Burmester E, Möller K, Săftoiu A, Ignee A, Cui XW, Iordache S, Potthoff A, Iglesias-Garcia J, Fusaroli P, Dong Y, Jenssen C. Differential diagnosis of small solid pancreatic lesions. Gastrointest Endosc 2016;84:933-40. [Crossref] [PubMed]
Cohen JD, Li L, Wang Y, Thoburn C, Afsari B, Danilova L, et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 2018;359:926-30. [Crossref] [PubMed]
Park HJ, Shin K, You MW, Kyung SG, Kim SY, Park SH, Byun JH, Kim N, Kim HJ. Deep Learning-based Detection of Solid and Cystic Pancreatic Neoplasms at Contrast-enhanced CT. Radiology 2023;306:140-9. [Crossref] [PubMed]
Park J, Artin MG, Lee KE, Pumpalova YS, Ingram MA, May BL, Park M, Hur C, Tatonetti NP. Deep learning on time series laboratory test results from electronic health records for early detection of pancreatic cancer. J Biomed Inform 2022;131:104095. [Crossref] [PubMed]
Bartoli M, Barat M, Dohan A, Gaujoux S, Coriat R, Hoeffel C, Cassinotto C, Chassagnon G, Soyer P. CT and MRI of pancreatic tumors: an update in the era of radiomics. Jpn J Radiol 2020;38:1111-24.
Klein EA, Richards D, Cohn A, Tummala M, Lapham R, Cosgrove D, Chung G, Clement J, Gao J, Hunkapiller N, Jamshidi A, Kurtzman KN, Seiden MV, Swanton C, Liu MC. Clinical validation of a targeted methylation-based multi-cancer early detection test using an independent validation set. Ann Oncol 2021;32:1167-77.
Chu LC, Park S, Soleimani S, Fouladi DF, Shayesteh S, He J, Javed AA, Wolfgang CL, Vogelstein B, Kinzler KW, Hruban RH, Afghani E, Lennon AM, Fishman EK, Kawamoto S. Classification of pancreatic cystic neoplasms using radiomic feature analysis is equivalent to an experienced academic radiologist: a step toward computer-augmented diagnostics for radiologists. Abdom Radiol (NY) 2022;47:4139-50.
Cai J, Chen H, Lu M, Zhang Y, Lu B, You L, Zhang T, Dai M, Zhao Y. Advances in the epidemiology of pancreatic cancer: Trends, risk factors, screening, and prognosis. Cancer Lett 2021;520:1-11.
Singhi AD, Koay EJ, Chari ST, Maitra A. Early Detection of Pancreatic Cancer: Opportunities and Challenges. Gastroenterology 2019;156:2024-40.
Pereira SP, Oldfield L, Ney A, Hart PA, Keane MG, Pandol SJ, Li D, Greenhalf W, Jeon CY, Koay EJ, Almario CV, Halloran C, Lennon AM, Costello E. Early detection of pancreatic cancer. Lancet Gastroenterol Hepatol 2020;5:698-710.
Lu MY, Chen TY, Williamson DFK, Zhao M, Shady M, Lipkova J, Mahmood F. AI-based pathology predicts origins for cancers of unknown primary. Nature 2021;594:106-10. [Crossref] [PubMed]

Cite this article as: Pan X, Yang Q, Shi M, He Y, Qin K, Zhu J, Zhang T, Wu H, Du R, Sun M, Chen S, Yang H, Fang Y, Zhang S, Yang B. Ensemble deep learning model based on CT scans: differentiating and subtype-classifying pancreatic inflammations and tumors, and predicting pancreatic lesion invasiveness. Quant Imaging Med Surg 2026;16(5):416. doi: 10.21037/qims-2025-aw-2192
