Early identification of abnormal pulmonary infectious diseases using unsupervised anomaly detection

Rong Liu; Yuhe Zhu; Zhangwen Lyu; Yite Gao; Yinwei Zhan; Yuefu Zhan

doi:10.21037/qims-2025-1285

Original Article

Early identification of abnormal pulmonary infectious diseases using unsupervised anomaly detection

Rong Liu¹, Yuhe Zhu¹, Zhangwen Lyu¹, Yite Gao¹, Yinwei Zhan¹, Yuefu Zhan^2,3,4

¹School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China; ²Department of Radiology, The Third People’s Hospital of Longgang, Clinical Institute of Shantou University Medical College (The Third People’s Hospital of Longgang District Shenzhen), Shenzhen, China; ³Department of Radiology, The Seventh People’s Hospital of Chongqing (Affiliated Central Hospital of Chongqing University of Technology), Chongqing, China; ⁴School of Pediatrics, Hainan Medical University, Haikou, China

Contributions: (I) Conception and design: R Liu, Yuefu Zhan; (II) Administrative support: Yinwei Zhan; (III) Provision of study materials or patients: Yuefu Zhan; (IV) Collection and assembly of data: R Liu; (V) Data analysis and interpretation: R Liu, Y Zhu, Z Lyu, Y Gao; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Yinwei Zhan, PhD. School of Computer Science and Technology, Guangdong University of Technology, No. 100 Waihuan Xi Road, Guangzhou Higher Education Mega Center, Panyu District, Guangzhou 510006, China. Email: ywzhan@gdut.edu.cn; Yuefu Zhan, PhD. Department of Radiology, The Third People’s Hospital of Longgang, Clinical Institute of Shantou University Medical College (The Third People’s Hospital of Longgang District Shenzhen), No. 278 Songbai Road, Henggang Subdistrict, Longgang District, Shenzhen 518116, China; Department of Radiology, The Seventh People’s Hospital of Chongqing (Affiliated Central Hospital of Chongqing University of Technology), Chongqing, China; School of Pediatrics, Hainan Medical University, Haikou, China. Email: zyfradiology@hainmc.edu.cn.

Background: The early identification of abnormal pulmonary infectious diseases (APIDs) could effectively control the large-scale spread of such diseases. This study proposed a deep learning-based method for the early identification of APIDs.

Methods: Unsupervised anomaly detection (UAD) refers to the identification of abnormal samples of which its distribution differs from that of normal samples using a training set comprised of only normal samples. Building on this principle, we proposed a method for the early identification of APIDs. First, we established a pulmonary infection computed tomography (PICT) image sequence dataset, which included computed tomography (CT) image sequences of various common pulmonary infections, as well as two known abnormal cases [coronavirus disease 2019 (COVID-19) and melioidosis pneumonia]. Under our framework, only common infection sequences were used to train the UAD network, while both common and abnormal sequences were used in testing to assess the capability of the network to identify deviations. This approach not only detected the two known abnormal cases but was also able to detect unknown APIDs. To enhance the detection accuracy (ACC) of our approach, we developed the local reconstruction autoencoder (LRAE), which focuses on local regions in PICT images to effectively distinguish between common and abnormal infection areas.

Results: Comprehensive experiments on the PICT dataset were conducted using metrics such as the area under the curve (AUC), F1-score, and ACC, and the results revealed the effectiveness and superiority of the LRAE compared to existing UAD methods. Specifically, the AUC, F1-score, and ACC of the LRAE in detecting COVID-19 CT image sequences were 0.8269, 0.7242, and 0.7801, respectively; while those for the melioidosis pneumonia CT image sequences were 0.8716, 0.6415, and 0.8146, respectively.

Conclusions: This work offers a robust solution for the early identification of both known and emerging APIDs. The developed LRAE showed remarkable performance in detecting abnormal PICT image sequences.

Keywords: Early identification; abnormal pulmonary infectious diseases (APIDs); computed tomography image sequences (CT image sequences); deep learning; unsupervised anomaly detection (UAD)

Submitted Jun 04, 2025. Accepted for publication Sep 18, 2025. Published online Nov 19, 2025.

doi: 10.21037/qims-2025-1285

Introduction

In 2019, coronavirus disease 2019 (COVID-19) emerged and spread rapidly across the globe, significantly affecting public health and everyday life worldwide (1,2). Its rapid progression to a global pandemic was largely attributed to its high infectivity and the initial lack of effective early identification methods. The COVID-19 pandemic has highlighted a critical gap in our ability to promptly recognize and respond to novel, highly infectious pulmonary diseases that deviate from the clinical patterns of conventional pneumonia. To better describe and classify such diseases, we introduce the term abnormal pulmonary infectious diseases (APIDs).

In this study, APIDs are defined as a distinct subset of pulmonary infections characterized by one or more of the following: (I) high transmissibility with potential for regional or global outbreaks; (II) infections caused by atypical or emerging pathogens rarely seen in routine clinical pneumonia cases; and (III) a level of public health threat due to rapid progression, antimicrobial resistance, or biothreat potential. Representative examples include COVID-19 [caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)] and melioidosis pneumonia (caused by Burkholderia pseudomallei). These diseases differ from more common pulmonary infections such as bacterial, fungal, mycobacterial, or viral pneumonia, which generally follow predictable clinical courses and are better managed by existing diagnostic and therapeutic frameworks. As the emergence of novel APIDs remains a constant threat, it is imperative to develop robust early identification methods to enable timely public health responses and minimize the risk of widespread outbreaks.

To date, research has primarily focused on the analysis of known specific APIDs. For example, Dang et al. (3) proposed the CAD-Unet for the precise segmentation of COVID-19 lesion regions in computed tomography (CT) images. Yang et al. (4) designed a feature fusion method based on a deformable convolutional neural network (CNN) to classify COVID-19, non-COVID-19 pneumonia, and normal chest X-rays. Chen et al. (5) developed a staging method based on CT scoring to predict the progression of melioidosis pneumonia. However, the early identification or detection of unknown APIDs remains challenging.

The unsupervised anomaly detection (UAD) method is trained exclusively on normal samples, and during inference, it detects abnormal samples whose distribution differs from that of normal samples (6), potentially enabling the identification of unknown or rare anomalies. Consequently, UAD offers a promising approach for the early identification of novel APIDs.

Pulmonary CT is a medical imaging technique that uses CT scanners to examine the lungs, providing detailed information about pulmonary structures and lesions. It is a crucial tool for diagnosing pulmonary infections (7,8). Lesions caused by different categories of infections exhibit distinct characteristics in CT images. Lesions that differ from those associated with common pulmonary infections are likely indicative of an abnormal pulmonary infection. Based on this insight and the principle of UAD, we proposed a method for the early identification of APIDs. We first created a pulmonary infection CT (PICT) image sequence dataset, which comprised CT image sequences of various common pulmonary infections, as well as two known abnormal pulmonary infections (i.e., COVID-19 and melioidosis pneumonia). The common infection sequences were labeled as normal samples, while the abnormal infection sequences were labeled as abnormal samples. Details on the composition of the PICT dataset are provided in Table 1, and examples of CT images for various infections are provided in Figure 1.

Table 1

Detailed composition of PICT image sequence dataset

Pulmonary infection type	Pulmonary infection category	Sequence number in training set	Sequence number in validation set	Sequence number in test set
Common pulmonary infection	Bacterial pneumonia	100	50	100
	Fungal pneumonia	69	34	69
	Mycobacterial pneumonia	9	8	8
	Mycoplasma pneumonia	24	12	24
	Tuberculosis	100	50	100
	Viral pneumonia	12	11	11
	Mixed bacterial and mycoplasma pneumonia	6	5	5
	Mixed bacterial and viral pneumonia	3	3	3
	Mixed bacterial, fungal and viral pneumonia	2	2	2
Abnormal pulmonary infection	COVID-19	0	100	201
Abnormal pulmonary infection	Melioidosis pneumonia	0	44	88

COVID-19, coronavirus disease 2019; PICT, pulmonary infection computed tomography.

Figure 1 Examples of CT images for various pulmonary infections. The area outlined in red indicates the region of infection. COVID-19, coronavirus disease 2019; CT, computed tomography.

The PICT training set exclusively comprised CT image sequences of common pulmonary infections and was used to train a UAD network. Conversely, the test set comprised CT image sequences of both common and the two known abnormal pulmonary infections. The ability of the network to accurately detect abnormal infection sequences in the test set revealed its strong anomaly detection capabilities and potential to identify CT image sequences of other unknown abnormal pulmonary infections, enabling the early identification of novel APIDs. Notably, while some studies have focused on UAD in pulmonary medical images—for example, Lu et al. (9) proposed a heterogeneous autoencoder (AE) for differentiating between normal and pneumonia chest X-rays, and Liu et al. (10) introduced the Skip-ST model to distinguish between normal and pneumonia lung CT images—these methods can only detect the presence of pulmonary infection, and cannot determine whether an infection is of a common or anomalous type. Conversely, our study sought to distinguish between common and abnormal pulmonary infections to enable the early identification of both known and previously unseen APIDs.

In CT images of pulmonary infections, lesions caused by different infection categories exhibit distinct imaging characteristics, providing a basis for differentiating between common and abnormal infections. However, due to the subtle nature of these differences, delineating between the two is challenging (11). This is particularly evident for UAD, as relying solely on CT sequences of common infections to learn how to identify abnormal infections presents a significant challenge.

In this study, we proposed a novel UAD method, the local reconstruction autoencoder (LRAE). This method focuses on local regions in pulmonary CT images, adeptly capturing subtle imaging differences between common and abnormal infections, thereby achieving effective differentiation and significantly enhancing the accuracy (ACC) of UAD in PICT image sequences.

The main contributions of this study are as follows:

Developing an early recognition method for APIDs. This method classifies pulmonary infections as common or abnormal, and formulates the task of identifying abnormal infections as a UAD problem, making it applicable not only for detecting known conditions such as COVID-19 and melioidosis pneumonia but also for the early identification of previously unknown APIDs.
Creating the PICT image sequence dataset to support research into early identification of unknown APIDs. This dataset comprises CT image sequences of various common pulmonary infections, as well as sequences of two known abnormal infections, COVID-19 and melioidosis pneumonia.
Developing a novel UAD method, known as the LRAE. By focusing on local regions in CT images of pulmonary infections, this method can effectively differentiate between common and abnormal infection areas.

UAD

UAD aims to identify abnormal samples based on training data that contains only normal samples. Classical UAD methods typically use machine learning techniques to establish a decision boundary based on the feature distribution of available normal samples to distinguish between normal and abnormal data (12-14); however, these methods often struggle to effectively represent complex data distributions.

In recent years, most UAD methods have been based on deep learning, for which reconstruction-based methods represent one of the main approaches. Reconstruction-based methods assume that a deep neural network trained exclusively on normal samples will effectively reconstruct normal samples during inference but will fail to reconstruct anomalies, resulting in significant reconstruction errors that facilitate anomaly detection. AEs encode input images into a latent space and then decode the corresponding latent representations to reconstruct the images, forming the foundation of many reconstruction-based methods. The denoising AE (15) and adversarially regularized autoencoder (ARAE) (16) introduce noise during training, enhancing the robustness of the AE in image reconstruction. Adversarially learned one-class classifier (ALOCC) (17), one-class generative adversarial network (OCGAN) (18), and fast anomaly detection generative adversarial network (f-AnoGAN) (19) combine the AE with the generative adversarial network (GAN) (20) framework to improve the ability of the AE to reconstruct normal images. Sun et al. (21) integrated mutual information maximization and shuffle attention (22) into an adversarial autoencoder (AAE) (23) for UAD.

The ability of conventional AEs to reconstruct the fine details of images is limited. Conversely, the skip connections in U-Net (24) can effectively preserve and transmit high-resolution detail information. Consequently, many AE networks in recent UAD research have incorporated skip connections to enhance image reconstruction quality. Skip-GANomaly (25) employs an AE with skip connections for image reconstruction, enabling it to fully capture the distribution characteristics of normal images at various scales. Puzzle-AE (26) directly adopts U-Net as its AE and introduces a discriminator for generative adversarial training, achieving high-quality image reconstruction. Omni-frequency channel-selection reconstruction generative adversarial network (OCR-GAN) (27) separates the input image into components of different frequencies and reconstructs each component individually based on Skip-GANomaly. Skip-GANomaly++ (28) integrates residual connections (29) into Skip-GANomaly, effectively enhancing anomaly detection performance. Additionally, a heterogeneous AE (9) has been proposed that uses a CNN as its encoder and a hybrid CNN-Transformer network as its decoder. This design enables the model to capture the intrinsic features of normal data while accentuating differences in abnormal samples. In addition to AEs, variational autoencoders (VAEs) (30) are also used for image reconstruction in UAD. Zimmerer et al. (31) used a VAE to achieve unsupervised anomaly localization in medical images.

To achieve accurate anomaly detection, reconstruction-based methods must balance their generalizability to effectively reconstruct normal samples while inhibiting the reconstruction of anomalies. Overfitting leads to an abundance of false positives (FPs), while overgeneralization results in an excess of false negatives (FNs), as illustrated in Figure 2. In CT images of pulmonary infections, the differences among lesions caused by different types of infections are subtle, making it challenging for these methods to determine an appropriate level of generalizability to differentiate between common and abnormal infection areas. In addition, many reconstruction-based methods add random noise to training samples to help the network extract robust features. However, in CT images of pulmonary infections, variations in features such as texture and density within lesion areas are often subtle and can be easily obscured or distorted by noise.

Figure 2 Both overfitting and overgeneralization can lead to poor performance in reconstruction-based UAD methods. The red circle represents the range of accurate reconstruction for the corresponding reconstruction-based UAD method. UAD, unsupervised anomaly detection.

Another major category of deep learning methods for UAD is based on deep feature embedding. Such methods construct a density estimation model for the distribution of normal samples used in training, and assume that during inference, normal samples have a higher likelihood than abnormal samples under this model. Knowledge distillation (32) is one of the main techniques for deep feature embedding. Bergmann et al. (33) were the first to apply knowledge distillation to UAD. Salehi et al. (34) further proposed a multi-resolution feature distillation strategy for UAD. Deng et al. (35) introduced a simple yet efficient reverse distillation (RD) paradigm that further improved UAD performance. Building on RD, Tien et al. (36) proposed an improved method termed RD++, while Liu et al. (10) introduced a novel knowledge distillation paradigm based on RD, referred to as direct reverse knowledge distillation (DRKD). Deep feature embedding-based methods require pre-training on large-scale datasets [e.g., ImageNet (37)], and are thus less feasible than reconstruction-based approaches. Moreover, the substantial semantic gap between the pre-training dataset and the target dataset for UAD limits their interpretability.

In addition to the aforementioned approaches, methods for anomaly detection in multivariate time series have continued to evolve. Audibert et al. (38) proposed a UAD method for multivariate time series, called UAD for multivariate time series (USAD), based on an adversely trained AE architecture. Pietron et al. (39) designed a scalable, multi-level neuroevolution framework known as anomaly detection neuroevolution (AD-NEv), which effectively captures local anomalous patterns in time series data. Building on this, the authors (40) further introduced an improved model, AD-NEv++, and demonstrated that incorporating extended AE structures, such as skip connections and dense connections, can enhance reconstruction performance. Further, Garg et al. (41) conducted a systematic and comprehensive evaluation of unsupervised and semi-supervised deep learning methods for anomaly detection and diagnosis in multivariate time series from cyber-physical systems, providing methodological insights and benchmarking references for future research.

Puzzle-AE and motivation

Unlike deep feature embedding-based methods that require pre-training, reconstruction-based methods are trained end-to-end from scratch. Thus, we focused on using the latter to implement UAD on PICT image sequences. Puzzle-AE (26), inspired by the puzzle-solving (42) pretext task, divides each image into four segments in both training and evaluation, randomly shuffling their order, and treats the image reconstruction as a jigsaw puzzle task, as illustrated in Figure 3. This approach prevents noise from obscuring detailed information in the lesion areas of PICT images.

Figure 3 The schematic diagram of puzzle-autoencoder.

However, due to the coarse granularity of Puzzle-AE, it struggles to focus on the fine details of lesion areas in PICT images, resulting in overgeneralization and difficulties in effectively distinguishing between common and abnormal infection regions. Thud, we proposed a novel pretext task for UAD called the diagonally opposite patches swap (DOPS) mechanism. This mechanism first divides the image into small patches, each of which is further subdivided into four sub-patches. Then, by swapping the positions of the sub-patches in each patch, it disrupts the local structure of the image, thereby transforming the image reconstruction task into the restoration of local regions, as shown in Figure 4. By leveraging finer granularity, the DOPS mechanism guides the AE used for image reconstruction to focus on local regions of the image and their detailed features, thereby preventing overgeneralization in the AE and enabling effective differentiation between common and abnormal pulmonary infection areas. Further, we proposed a local feature fusion (LFF) module to improve the reconstruction ability of the AE for the local regions disrupted by the DOPS mechanism in the PICT images. Since abnormal infection areas are absent during training, this module is unable to effectively facilitate the reconstruction of these areas. Thus, this module magnifies the reconstruction error differences between common and abnormal infection areas, further enhancing the capacity of the AE to distinguish between them.

Figure 4 The DOPS mechanism transforms the image reconstruction task into a local region restoration task. Compared to the puzzle-making method used in the puzzle-autoencoder, the DOPS mechanism offers a finer level of granularity, enabling more precise focus on lesion areas in PICT images and enhancing the model’s ability to perceive lesion features. The red box indicates the area of infection. DOPS, diagonally opposite patches swap; PICT, pulmonary infection computed tomography.

By integrating the AE with the DOPS mechanism and the LFF module, we proposed an innovative UAD method called the LRAE. The LRAE can accurately distinguish between common and abnormal pulmonary infection areas, and thus has promise in the early identification of novel APIDs. We present this article in accordance with the CLEAR reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1285/rc).

Methods

The overall architecture of the LRAE

In this study, each sample used for training or evaluation was a sequence of PICT images, and the same network architecture as that of the three-dimensional U-Net (43) was adopted for the AE used to reconstruct images in the sequences. The entire framework of the LRAE is shown in Figure 5. Each patch in every image of the original sequence X is disrupted by the DOPS mechanism to obtain the image sequence with disrupted local regions $\tilde{X}$ , which is then fed into the AE ϕ to produce the reconstructed image sequence $\hat{X}$ . The DOPS mechanism reformulates the image reconstruction task as a local region restoration task, thereby enabling ϕ to more effectively attend to local regions and fine-grained features in the image. Since ϕ is only trained to reconstruct common infection sequences disrupted by the DOPS mechanism, it can effectively reconstruct common infection areas, but struggles to reconstruct abnormal infection areas. The LFF module is integrated into each stage of ϕ’s encoding process. By fusing local features in the feature maps, this module effectively improves ϕ’s ability to reconstruct disrupted common infection sequences.

Figure 5 The entire framework of the proposed LRAE. LRAE, local reconstruction autoencoder.

After training, the LRAE is capable of accurately reconstructing common infection areas while failing to effectively reconstruct abnormal areas, resulting in larger reconstruction errors in those areas, thereby enabling the detection of abnormal infection sequences and APIDs.

The DOPS mechanism

In PICT images, the differences between common and abnormal infection areas are often subtle, making it challenging for reconstruction-based methods to effectively reconstruct common infection areas while suppressing the reconstruction of abnormal infection areas. This is primarily because existing methods: (I) do not pay sufficient attention to local areas and fine-grained features in images; and (II) tend to overgeneralize, making it difficult to capture the fine-grained distinctions required for this task. To address these issues, we proposed a DOPS mechanism, which disrupts each local patch of the image, thereby transforming the image reconstruction task into a local region restoration task. This encourages the AE to focus more effectively on each local region of the image and its detailed features. Meanwhile, the fine-grained design of the DOPS mechanism helps prevent the AE from overgeneralizing during the reconstruction of pulmonary infection areas.

Suppose the original PICT image sequence X is of F images $X_{f} \in ℝ^{H \times W}, f = 0, 1, \dots, F - 1$ . Each image X_f, of height H and width W, is divided into M×N rectangular patches of height 2a and width 2b, with $M = ⌊ H / (2 a) ⌋$ and $N = ⌊ W / (2 b) ⌋$ . Then, as illustrated in Figure 6, the DOPS scheme D¹ (D²) is defined on X by dividing each patch into 2×2 sub-patches, and swapping the diagonal (anti-diagonal) pairs; that is:

${\tilde{X}}^{i} = D^{i} (X), i = 1, 2$ [1]

${\tilde{X}}_{f}^{1} (h, w) = {\begin{array}{l} X_{f} (h + a, w + b), & if 2 a m \leq h < 2 a m + a and 2 b n \leq w < 2 b n + b \\ X_{f} (h - a, w - b), & if 2 a m + a \leq h < 2 a m + 2 a and 2 b n + b \leq w < 2 b n + 2 b \\ X_{f} (h, w), & otherwise \end{array}$ [2]

${\tilde{X}}_{f}^{2} (h, w) = {\begin{array}{l} X_{f} (h + a, w + b), & if 2 a m \leq h < 2 a m + a and 2 b n + b \leq w < 2 b n + 2 b \\ X_{f} (h - a, w + b), & if 2 a m + a \leq h < 2 a m + 2 a and 2 b n \leq w < 2 b n + b \\ X_{f} (h, w), & otherwise \end{array}$ [3]

Figure 6 The schematic diagram of the DOPS mechanism. DOPS, diagonally opposite patches swap.

During training, the entire original sequence X is randomly operated on the DOPS D¹ or D² before being fed into the AE. Neither randomly shuffling the order of sub-patches nor swapping the sub-patches along the main and anti-diagonal directions is employed to disrupt each patch in every image of the sequence, as these methods would excessively disrupt the structure of the images in the sequence, making it difficult for the AE to reconstruct the entire sequence. In the DOPS mechanism, each patch represents a local region of the image, with its size determined by the parameters 2a and 2b. By swapping the positions of sub-patches in each patch, the DOPS mechanism introduces perturbations to the local structures of the image. The AE is then trained to restore these disrupted patches, thereby transforming the global image reconstruction task into a set of local region restoration tasks. As the AE is only trained to reconstruct the disrupted common infection sequences, it can effectively reconstruct common infection areas while failing to reconstruct abnormal ones.

LFF module

To improve the ability of the AE to reconstruct local regions disrupted by the DOPS mechanism in common PICT image sequences, we further proposed a LFF module. As shown in Figure 5, this module is integrated into each stage of the AE’s encoding process. At each encoding stage, this module performs three parallel LFF operations on the feature map. Each operation fuses local features at the same scale, but the three operations operate at different scales. Specifically, the fused features are derived from individual patches, 2×2 patch regions, and 4×4 patch regions of the input, with all patches defined based on the DOPS mechanism, as illustrated in Figure 7. The LFF module effectively enhances the AE’s overall perception of intra-patch structures and inter-patch relationships by fusing multi-scale local features from individual patches, 2×2 patches, and 4×4 patches. This, in turn, improves the reconstruction ACC of details in patches disrupted by the DOPS mechanism.

Figure 7 During the encoding stage of the AE, the LFF module fuses multi-scale local features of the feature map I corresponding to the PICT image sequence X. These local features are respectively extracted from individual patches, 2×2 patch regions, and 4×4 patch regions in X. AE, autoencoder; LFF, local feature fusion; PICT, pulmonary infection computed tomography.

The schematic diagram of the LFF module is shown in Figure 8. Specifically, for the given feature map $I \in ℝ^{C \times O \times P \times Q}$ , where C, O, P, and Q denote the number of channels, depth, height, and width, respectively, three convolution kernels of different sizes are applied in parallel to fuse its local features, resulting in three separate feature maps:

$I_{r} = {Conv}_{(3, k_{r}^{1}, k_{r}^{2})}^{(1, k_{r}^{1}, k_{r}^{2})} (I)$ [4]

$I_{s} = {Conv}_{(3, k_{s}^{1}, k_{s}^{2})}^{(1, k_{s}^{1}, k_{s}^{2})} (I)$ [5]

$I_{t} = {Conv}_{(3, k_{t}^{1}, k_{t}^{2})}^{(1, k_{t}^{1}, k_{t}^{2})} (I)$ [6]

Figure 8 The schematic diagram of the LFF module. Conv, convolution; DeConv, deconvolution; LFF, local feature fusion.

where,

$k_{r}^{1} = \frac{2 a P}{H}, k_{r}^{2} = \frac{2 b Q}{W}$ [7]

$k_{s}^{1} = \frac{4 a P}{H}, k_{s}^{2} = \frac{4 b Q}{W}$ [8]

$k_{t}^{1} = \frac{8 a P}{H}, k_{t}^{2} = \frac{8 b Q}{W}$ [9]

$C o n v_{A}^{B} (\cdot)$ represents a three-dimensional convolution operation with a kernel size of A and a stride of B, $I_{r} \in ℝ^{C \times O \times \frac{P}{k_{r}^{1}} \times \frac{Q}{k_{r}^{2}}}$ , $I_{s} \in ℝ^{C \times O \times \frac{P}{k_{s}^{1}} \times \frac{Q}{k_{s}^{2}}}$ , and $I_{t} \in ℝ^{C \times O \times \frac{P}{k_{t}^{1}} \times \frac{Q}{k_{t}^{2}}}$ . Therefore, I_r, I_s, and I_t respectively fuse the features of each patch, each 2×2 patch region, and each 4×4 patch region in the PICT image sequence. Subsequently, three deconvolution operations are used to restore the height and width of I_r, I_s, and I_t to match those of I, resulting in I_R, I_S, and I_T, respectively:

$I_{R} = {DeConv}_{(1, k_{r}^{1}, k_{r}^{2})}^{(1, k_{r}^{1}, k_{r}^{2})} (I_{r})$ [10]

$I_{S} = {DeConv}_{(1, k_{s}^{1}, k_{s}^{2})}^{(1, k_{s}^{1}, k_{s}^{2})} (I_{s})$ [11]

$I_{T} = {DeConv}_{(1, k_{t}^{1}, k_{t}^{2})}^{(1, k_{t}^{1}, k_{t}^{2})} (I_{t})$ [12]

where $D e C o n v_{J}^{K} (\cdot)$ represents a three-dimensional deconvolution operation with a kernel size of J and a stride of K. Then, we aggregate I_R, I_S, and I_T:

$I_{U} = {Conv}_{(1, 1, 1)}^{(1, 1, 1)} (Concat [I_{R}, I_{S}, I_{T}])$ [13]

The resulting feature map $I_{U} \in ℝ^{C \times O \times P \times Q}$ comprehensively integrates features of each patch, each 2×2 patch region, and each 4×4 patch region in the sequence. Finally, we obtain the output feature map:

$I_{o u t} = I + I_{U}$ [14]

I_out is used instead of I as the input to the subsequent encoding stage.

Since the training dataset contains only CT image sequences of common pulmonary infections, the LFF module lacks the ability to effectively learn the features of abnormal infection areas. As a result, its feature fusion performance in these areas is limited, making it unable to effectively enhance the AE’s reconstruction ability for such areas.

Loss function

The LRAE is trained from scratch in an end-to-end manner using only common PICT image sequences. To train the AE ϕ to accurately reconstruct common sequences disrupted by the D¹ or D² operation in the DOPS mechanism, the loss function is defined as:

$\begin{matrix} L_{rec} = E_{X \sim p_{x}} {‖ ϕ (D^{i} (X)) - X ‖}_{2}, i = 1 or 2 \end{matrix}$ [15]

where p_x denotes the distribution of common sequences, ${‖ \cdot ‖}_{2}$ is $l_{2}$ error, and either the D¹ or D² operation is selected at random.

Anomaly score

During evaluation, the PICT image sequence is separately disrupted using the D¹ and D² operations of the DOPS mechanism, generating two distinct disrupted sequences. Each of these is then fed into the trained AE to produce two reconstructed sequences, which exhibit subtle differences. We compute the $l_{2}$ errors between the original sequence X and each of the corresponding reconstructed sequences, and their average S serves as the anomaly score for X:

$S = \frac{1}{2} \sum_{i = 1}^{2} {‖ ϕ (D^{i} (X)) - X ‖}_{2}$ [16]

As it is trained only on common PICT image sequences and not abnormal image sequences, the LRAE cannot effectively reconstruct abnormal sequences and thus will achieve a higher anomaly score for the abnormal sequence during evaluation.

Dataset

To implement the anomaly detection of pulmonary infections, we compiled a PICT image sequence dataset. All the CT image sequences in PICT were collected from The First Affiliated Hospital of Hainan Medical University and The Third People’s Hospital of Longgang District Shenzhen in China, and carefully classified by pathologists. The specific composition of the PICT dataset is shown in Table 1. The PICT dataset was divided into a training set, a validation set, and a test set, where each case corresponds to a single CT image sequence. Rather than using cross-validation, the dataset was randomly split in this manner, as in the UAD task, the training set comprises only normal samples. In this context, cross-validation would merely partition the normal samples and would thus have little value in evaluating the ability of the model to detect anomalies. The training set comprised 325 sequences of common infections, comprising a total of 12,491 slices. The validation set comprised 175 sequences of common infections (6,658 slices) and 144 sequences of abnormal infections, including 100 sequences of COVID-19 (4,435 slices), and 44 sequences of melioidosis pneumonia (1,501 slices). The test set comprised 322 sequences of common infections (12,169 slices) and 289 sequences of abnormal infections, including 201 sequences of COVID-19 (8,873 slices) and 88 sequences of melioidosis pneumonia (3,039 slices).

The CT scans were acquired with a slice thickness of 5 mm and a tube voltage of 120 kVp. The CT images in the PICT dataset originally had a resolution 512×512 and were downscaled to 256×256 to meet our requirements. Each CT sequence contained 15 to 63 slices and was padded to 64 slices using images with pixel values of 0, added evenly at both ends, to ensure consistent input length for training and evaluation. During evaluation, the padded images were excluded from the anomaly score calculation. Additionally, all image pixel values were scaled to the range [0, 1] to standardize the model input.

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the ethics committees of The First Affiliated Hospital of Hainan Medical University (No. 2024-KYL-088) and The Third People’s Hospital of Longgang District Shenzhen (No. AF/SC-08/01.2), and the requirement of individual consent for this retrospective analysis was waived.

Implementation

All experiments were conducted using the PyTorch 1.13.1 (44) deep learning framework on a system equipped with an Intel Core i9-12900K CPU and an NVIDIA GeForce RTX 3090 GPU, running Ubuntu 22.04.1. The model was trained using the Adam (45) optimizer with a learning rate set to 0.0001. To improve generalization, data augmentation strategies such as random horizontal flipping, rotation (±10 degrees), and intensity scaling (±10%) were employed. Hyperparameters, including the patch size in the DOPS mechanism, were manually tuned based on the performance of the LRAE in the validation set. Specifically, based on our research experience, we set 2a=2b=16 to yield the best results.

Evaluation metrics

In this article, TP and TN are abbreviations for true positive and true negative, respectively. The area under the curve (AUC), F1-score, which was calculated as F1-score = 2TP/(2TP + FN + FP), and ACC, which was calculated as ACC = (TP + TN)/(TP + TN + FN + FP), were used for model evaluation. The AUC represents the AUC of the receiver operating characteristic (ROC) curve across different operating thresholds. The Y-axis of the ROC curve is the TP rate (TPR), which was calculated as TPR = TP/(TP + FN), and its X-axis is the FP rate (FPR), which was calculated as FPR = FP/(FP + TN). As an evaluation metric for UAD, the AUC is more comprehensive than the F1-score and ACC. However, when dealing with imbalanced datasets, the AUC and F1-score are more reliable than ACC. The operating threshold for the F1-score and ACC was determined based on the optimal F1-score value.

Results

Comparison with other methods

To validate the effectiveness of the proposed LRAE in UAD for the PICT image sequences, using the PICT dataset, we conducted comparative experiments between the LRAE and the following eight state-of-the-art UAD methods: f-AnoGAN (19), VAE (31), Skip-GANomaly (25), Puzzle-AE (26), ARAE (16), AAE (21), OCR-GAN (27), and Skip-GANomaly++ (28). All the comparative methods are based on CNNs. Specifically, ARAE is based on a conventional convolutional AE, VAE uses a CNN-based variational AE architecture, f-AnoGAN and AAE are adversarial AEs built on CNNs, while Skip-GANomaly, Puzzle-AE, OCR-GAN, and Skip-GANomaly++ are based on CNN AEs with skip connections. Given that our training dataset contained only CT sequences of common pulmonary infections, it was not feasible to train supervised classification models that require labels for both common and abnormal infection types. Moreover, supervised models are not equipped to handle previously unseen infection patterns, making them unsuitable for the detection of novel APIDs. Therefore, performance comparisons with existing supervised methods were not conducted in this study. The experimental results are presented in Table 2.

“Common + COVID-19” refers to the first test, which assesses the ability of the model to detect COVID-19 CT image sequences using both common pulmonary infection and COVID-19 CT image sequences;
“Common + melioidosis” refers to the second test, which assesses the ability of the model to detect melioidosis pneumonia CT image sequences using common pulmonary infection and melioidosis pneumonia CT image sequences;
“Common + COVID-19 + melioidosis” refers to the third test, which assesses the ability of the model to detect two types of abnormalities using common pulmonary infection, COVID-19, and melioidosis pneumonia CT image sequences, with the latter two categorized as abnormal infections.

Table 2

Comparison of the anomaly detection performance of the proposed method and the other methods using the PICT dataset

Methods	Common + COVID-19			Common + melioidosis			Common + COVID-19 + melioidosis
Methods	AUC	F1-score	ACC	AUC	F1-score	ACC	AUC	F1-score	ACC
f-AnoGAN	0.7379	0.6604	0.6520	0.8073	0.5603	0.7244	0.7590	0.7305	0.6727
VAE	0.6611	0.6184	0.5966	0.7974	0.5483	0.7146	0.7027	0.7034	0.6481
Skip-GANomaly	0.7684	0.6737	0.7055	0.7410	0.5000	0.6976	0.7601	0.7252	0.6874
Puzzle-AE	0.7225	0.6232	0.6577	0.8141	0.5703	0.7317	0.7504	0.7172	0.7005
ARAE	0.7151	0.6618	0.6424	0.7838	0.5546	0.7415	0.7360	0.7222	0.6727
AAE	0.7459	0.6763	0.7017	0.7854	0.5188	0.6878	0.7579	0.7340	0.6809
OCR-GAN	0.7934	0.7044	0.7304	0.8111	0.5537	0.7366	0.7987	0.7524	0.7414
Skip-GANomaly++	0.7854	0.6851	0.7170	0.8452	0.5939	0.7732	0.8036	0.7535	0.7430
LRAE (ours)	0.8269^†	0.7242^†	0.7801^†	0.8716^†	0.6415^†	0.8146^†	0.8405^†	0.7796^†	0.7741^†

^†, the best performance. AAE, adversarial autoencoder; ACC, accuracy; AE, autoencoder; ARAE, adversarially regularized autoencoder; AUC, area under the curve; COVID-19, coronavirus disease 2019; f-AnoGAN, fast anomaly detection generative adversarial network; LRAE, local reconstruction autoencoder; OCR-GAN, omni-frequency channel-selection reconstruction generative adversarial network; PICT, pulmonary infection computed tomography; VAE, variational autoencoder.

As shown in Table 2, the LRAE achieved the highest values across all metrics in the three tests, indicating the effectiveness of the LRAE in detecting abnormal PICT image sequences. Notably, the LRAE significantly outperformed Puzzle-AE in detecting both COVID-19 CT image sequences and melioidosis pneumonia CT image sequences. Thus, compared to Puzzle-AE, the finer granularity of the LRAE allows it to focus more precisely on local regions of images, thereby more effectively distinguishing between common and abnormal pulmonary infection areas. In the first test, OCR-GAN performed second best; however, in the second test, Skip-GANomaly++ surpassed OCR-GAN, achieving the second-best performance. This indicates that other methods struggle to consistently maintain outstanding performance in detecting both COVID-19 and melioidosis pneumonia CT image sequences. Conversely, the LRAE achieved the best performance in all three tests, demonstrating stability and good generalizability, which is essential for detecting unknown abnormal PICT image sequences.

Notably, in the three tests, the F1-score of all methods reached its lowest value in the second test and its highest value in the third test. This was mainly because the number of abnormal samples in the second test was the lowest, at only 88, while the number of normal samples was 322, resulting in an imbalanced dataset. This imbalance caused the precision to remain low even when the FPR was low, thereby reducing the F1-score. However, in the third test, the abnormal samples included all the abnormal samples from the first two tests, totaling 289, which was roughly equal to the number of normal samples. This mitigated the sample imbalance issue, leading to improved precision and a corresponding increase in the F1-score.

The ROC curves of the LRAE and the compared methods in the third test are shown in Figure 9. Notably, at low FPRs, the TPR of the LRAE was significantly higher than that of the other methods, indicating its significant performance advantage under low FP conditions. Although the TPR of the LRAE began to fall below that of some methods as the FPR increased, its TPR at this stage already exceeded 0.9, which is sufficient to meet most requirements in anomaly detection. Additionally, the AUC value of the ROC curve of the LRAE was higher than that of the other methods, further demonstrating its overall performance superiority.

Figure 9 ROC curves of our method and the compared methods in the third test. AAE, adversarial autoencoder; AE, autoencoder; ARAE, adversarially regularized autoencoder; AUC, area under the curve; f-AnoGAN, fast anomaly detection generative adversarial network; LRAE, local reconstruction autoencoder; OCR-GAN, omni-frequency channel-selection reconstruction generative adversarial network; ROC, receiver operating characteristic; VAE, variational autoencoder.

We also evaluated the sensitivity (SEN) and specificity (SPE) of the LRAE and the comparative methods. The SEN is equivalent to the TPR, while the SPE equals 1 − FPR. Since the ROC curve depicts the relationship between TPR and FPR across different thresholds, the SEN and SPE correspond to the coordinates of a specific threshold on the ROC curve. To mitigate the impact of data imbalance, we selected the threshold that maximizes min(SEN, SPE) as the final threshold for calculating both metrics. The detailed results are presented in Table 3. Notably, the LRAE achieved the highest SEN and SPE values in all three tests, indicating that the LRAE not only sensitively detected abnormal PICT image sequences but also effectively avoided misclassifying common PICT image sequences as abnormal. These results further validated the effectiveness of the LRAE.

Table 3

Comparison of the SEN and SPE of the LRAE and other methods using the PICT dataset

Methods	Common + COVID-19		Common + melioidosis		Common + COVID-19 + melioidosis
Methods	SEN	SPE	SEN	SPE	SEN	SPE
f-AnoGAN	0.6866	0.6894	0.7273	0.7236	0.7093	0.705
VAE	0.6269	0.6273	0.7159	0.7112	0.6505	0.646
Skip-GANomaly	0.6915	0.6925	0.7045	0.6957	0.6955	0.6957
Puzzle-AE	0.6667	0.6646	0.7386	0.7329	0.699	0.6925
ARAE	0.6617	0.6522	0.75	0.7391	0.6747	0.677
AAE	0.6965	0.6894	0.7159	0.6957	0.6955	0.6925
OCR-GAN	0.7313	0.7267	0.7614	0.7298	0.7336	0.7298
Skip-GANomaly++	0.7313	0.7298	0.7727	0.7733	0.7439	0.7391
LRAE (ours)	0.7612^†	0.7609^†	0.7841^†	0.7857^†	0.7682^†	0.7702^†

^†, the best performance. AAE, adversarial autoencoder; AE, autoencoder; ARAE, adversarially regularized autoencoder; COVID-19, coronavirus disease 2019; f-AnoGAN, fast anomaly detection generative adversarial network; LRAE, local reconstruction autoencoder; OCR-GAN, omni-frequency channel-selection reconstruction generative adversarial network; PICT, pulmonary infection computed tomography; SEN, sensitivity; SPE, specificity; VAE, variational autoencoder.

Ablation study

The patch size in the DOPS mechanism directly affects the granularity of the proposed method, which in turn influences its generalizability on the test data. In this study, the method had to achieve a suitable level of generalization to effectively reconstruct common pulmonary infection areas while hindering the reconstruction of abnormal areas. To determine the optimal patch size for the DOPS mechanism, we experimented with various patch sizes and integrated the DOPS mechanism with these sizes into the AE for comparative experiments on the PICT dataset. The experimental results are shown in Table 4. Notably, in all three tests, the AE integrated with the DOPS mechanism using a 16×16 patch size achieved the best performance across all metrics. Therefore, this study selected 16×16 as the optimal patch size for the DOPS mechanism.

Table 4

Experimental results of the AE integrated with the DOPS mechanism with different patch sizes using the PICT dataset

Test dataset	Metric	Patch size
Test dataset	Metric	8×8	16×16	32×32
Common + COVID-19	AUC	0.7873	0.7959^†	0.7777
	F1-score	0.6822	0.6983^†	0.6712
	ACC	0.7400	0.7572^†	0.7228
Common + melioidosis	AUC	0.8144	0.8380^†	0.8225
	F1-score	0.5650	0.6009^†	0.5792
	ACC	0.7634	0.7829^†	0.7732
Common + COVID-19 + melioidosis	AUC	0.7956	0.8087^†	0.7914
	F1-score	0.7422	0.7500^†	0.7397
	ACC	0.7021	0.7381^†	0.7316

^†, the best performance. ACC, accuracy; AE, autoencoder; AUC, area under the curve; COVID-19, coronavirus disease 2019; DOPS, diagonally opposite patches swap; PICT, pulmonary infection computed tomography.

We incorporated the LFF module into the AE integrated with the DOPS mechanism to construct the LRAE. The performance comparison between the LRAE (AE + DOPS mechanism with a patch size of 16×16 + LFF module) and the AE + DOPS mechanism with a patch size of 16×16 using the PICT dataset is shown in Table 5. Notably, the LFF module significantly improved the performance of the latter, thereby demonstrating the effectiveness of the LFF module.

Table 5

Comparison of the performance of the LRAE (AE + DOPS mechanism with a patch size of 16×16 + LFF module) and AE + DOPS mechanism with a patch size of 16×16 using the PICT dataset

Test dataset	Metric	AE + DOPS mechanism with a patch size of 16×16	LRAE (AE + DOPS mechanism with a patch size of 16×16 + LFF module)
Common + COVID-19	AUC	0.7959	0.8269^†
	F1-score	0.6983	0.7242^†
	ACC	0.7572	0.7801^†
Common + melioidosis	AUC	0.8380	0.8716^†
	F1-score	0.6009	0.6415^†
	ACC	0.7829	0.8146^†
Common + COVID-19 + melioidosis	AUC	0.8087	0.8405^†
	F1-score	0.7500	0.7796^†
	ACC	0.7381	0.7741^†

^†, the best performance. AUC, area under the curve; ACC, accuracy; AE, autoencoder; COVID-19, coronavirus disease 2019; DOPS, diagonally opposite patches swap; LFF, local feature fusion; LRAE, local reconstruction autoencoder; PICT, pulmonary infection computed tomography.

Analysis of histogram

To further validate the effectiveness of the proposed LRAE in detecting abnormal PICT image sequences, we visualized the anomaly scores of common and abnormal PICT image sequences across the three tests, as shown in Figure 10. Notably, the anomaly scores of common sequences were generally low, while those of abnormal sequences were significantly higher. This indicates that the proposed LRAE was able to effectively distinguish between common and abnormal sequences based on their anomaly scores.

Figure 10 Comparison of the anomaly scores between common and abnormal PICT image sequences across the three tests. (A) The first test. (B) The second test. (C) The third test. PICT, pulmonary infection computed tomography.

Visual analysis

To detect abnormal PICT image sequences, the model should effectively reconstruct common pulmonary infection areas while struggling to reconstruct abnormal infection areas, such that the anomaly scores of common PICT image sequences will be relatively low, while those of abnormal sequences will be higher. To intuitively demonstrate the effectiveness of the proposed LRAE, we conducted a qualitative visual analysis of its performance using the test set. Specifically, for common sequences in the test set, we visualized the reconstruction results produced by the LRAE. For abnormal sequences, we further generated heatmaps by computing the differences between the original images and their corresponding reconstructions, highlighting the anomalous regions. Selected visualization examples are shown in Figure 11. By comparing the CT images of common pulmonary infections with their reconstructions using the LRAE in this figure, it was observed that the LRAE was capable of accurately reconstructing the detailed features of common infection areas, resulting in relatively small reconstruction errors in these areas. Conversely, in the reconstruction error heatmaps of abnormal PICT images, the abnormal infection areas were predominantly displayed in red and yellow, while other areas appeared mostly in blue or green. This indicates that the abnormal infection areas exhibited significant reconstruction errors, reflecting the inability of the LRAE to effectively reconstruct these areas. These visualization results demonstrate that the proposed LRAE can effectively distinguish between common and abnormal pulmonary infection areas, thereby enabling the accurate detection of abnormal PICT image sequences.

Figure 11 Qualitative visualization results. The first two rows show CT images of common pulmonary infections and their corresponding LRAE reconstruction results; the last two rows show CT images of abnormal pulmonary infections and the reconstruction error heatmaps between them, and the corresponding reconstruction results by the LRAE. In the heatmap, the reconstruction errors correspond to colors from low to high in the order of blue, green, yellow, and red. D¹ and D² denote the use of operations D¹ and D², respectively, in the DOPS mechanism during inference. In the original image, the area outlined in red indicates the region of infection. COVID-19, coronavirus disease 2019; CT, computed tomography; DOPS, diagonally opposite patches swap; LRAE, local reconstruction autoencoder.

Discussion

Overview of important findings

This study introduced a novel UAD method, the LRAE, for the early identification of APIDs using CT image sequences. This approach leveraged common PICT image sequences as normal data to train the UAD method, enabling it to detect both known and previously unseen APIDs. The proposed DOPS mechanism and LFF module were central to the improved anomaly detection performance: DOPS allowed the network to focus on local lesion areas, enhancing anomaly identification, while LFF strengthened the model’s reconstruction ability on normal samples, increasing the contrast in anomaly scores. The experimental results on the newly constructed PICT dataset validated the effectiveness of the LRAE, which outperformed several state-of-the-art UAD methods in identifying abnormal infection patterns.

Comparison with previous work

Previous studies on pulmonary infection diagnosis have largely focused on supervised learning approaches, which require extensive labeled datasets and are limited to detecting known diseases. Conversely, our UAD-based method does not rely on abnormal annotations, making it better suited for the early detection of rare or emerging APIDs. Additionally, many existing UAD methods require that noise be added to the samples during training, which may disrupt or obscure subtle features of lesion areas in PICT images. Conversely, the proposed LRAE transforms the image reconstruction task into a local region restoration task. This not only effectively avoids interference from noise in the lesion areas but also enhances attention to local detail features in the images, thereby facilitating better differentiation between common and abnormal pulmonary infections.

Practical implications

The proposed method holds significant potential for real-world clinical applications, especially in scenarios where newly emerging or rare pulmonary infections may not yet be documented in labeled datasets. By using commonly available CT sequences of known infections as training data, this framework offers a scalable and efficient diagnostic tool that can operate under limited annotation conditions. In the context of global health emergencies such as COVID-19 or future unknown outbreaks, early anomaly detection methods like LRAE could play a crucial role in facilitating the rapid screening and triaging of patients. Additionally, the proposed LRAE can be integrated into the clinical workflow as a Picture Archiving and Communication System (PACS)-compatible auxiliary tool. After a patient undergoes a routine chest CT scan, the image sequence can be automatically input into our model. The model reconstructs the CT sequence and two types of results are produced:

A quantitative anomaly score at the sequence level, computed by measuring the difference between the original and reconstructed image sequences. If this score exceeds a predefined threshold, it serves as an early warning signal of possible APIDs.
Qualitative anomaly heatmaps at the pixel level, which highlight localized regions with significant reconstruction errors. These interpretable outputs can assist radiologists to identify areas of concern.

By offering both sequence-level alerts and pixel-level visualizations, the system can improve the early detection of APIDs and serve as a valuable decision-support tool in clinical practice. Further, the design of the PICT dataset lays the foundation for continued research and benchmarking in anomaly detection for infectious lung diseases.

Analysis of limitations

This study had several limitations. First, the PICT dataset used for training and validation was relatively small and was collected from a limited number of institutions, which may introduce potential selection bias and limit the generalizability of the model. Second, the method has not yet been externally validated across multiple centers, which is essential to confirm its robustness in broader clinical contexts. Third, although our approach was designed to detect both known and unknown APIDs, further evaluation is needed on truly unknown APID categories to confirm the method’s adaptability. Finally, variability in CT acquisition parameters, such as scanner type, slice thickness, and reconstruction algorithms, may affect model performance and should be accounted for in future studies.

Conclusions

UAD has the potential to identify unknown or rare anomalies. Based on this framework, this study proposed an early identification method for APIDs. Specifically, the proposed method uses CT image sequences of common pulmonary infections as normal samples to train the UAD network, which then detects abnormal PICT image sequences during inference. This method not only detects known APIDs but also identifies previously unknown APIDs. To facilitate training and evaluation, we constructed the PICT dataset, which comprises sequences of various common infections and two categories of known abnormal infections (COVID-19 and melioidosis pneumonia). Moreover, we designed a novel UAD network named the LRAE. Specifically, we introduced the DOPS mechanism, which decomposes the global reconstruction task of an image into a local restoration task. This approach enables the network to focus on lesion areas at a finer granularity, thereby enhancing the separability between abnormal and common infection areas. Additionally, we introduced the LFF module, which fuses multi-scale local features to enhance the reconstruction capability of the AE for common infection sequences, thus amplifying the anomaly score gap between normal and abnormal samples. Comparative experiments with several state-of-the-art UAD methods using the PICT dataset demonstrated the effectiveness of the LRAE in detecting abnormal infection sequences.

In the future, our research will focus on several key aspects. First, we will expand the PICT dataset through collaborations with additional hospitals across different regions to increase the sample size and improve coverage of APID categories. Second, we intend to conduct systematic external validation using multi-center datasets acquired using different CT scanners and protocols to assess the robustness of the proposed method in diverse real-world clinical environments. Third, we will incorporate domain adaptation techniques and robust training strategies to mitigate the effect of variability in CT acquisition parameters. Fourth, we intend to evaluate the model’s generalizability on emerging or less common APID categories by including newly identified infections as they become available in clinical practice. In addition, we will also explore model-level enhancements. Specifically, ensemble-based strategies that combine multiple reconstruction-based models to capture diverse abnormal patterns may further improve the robustness and generalizability of the LRAE. Further, approaches such as self-supervised pre-training and the incorporation of prior clinical knowledge may help enhance model detection performance.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the CLEAR reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1285/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1285/dss

Funding: This work was supported by Key Field R&D Plan Project of Guangdong Province (No. 2020B0101130019), Natural Science Foundation of Chongqing (No. CSTB2024NSCQ-MSX0314), Shenzhen Longgang District Innovation and Technology Special Fund (Nos. LGWJ2023-120 and LGKCYLWS2024-27), Key Project of Education Department of Hainan Province (No. Hnky2023ZD-9), and Shenzhen Health Economics Society 2025 Project, China (No. 202504).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1285/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committees of The First Affiliated Hospital of Hainan Medical University (No. 2024-KYL-088) and The Third People’s Hospital of Longgang District Shenzhen (No. AF/SC-08/01.2), and individual consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Mathew D, Giles JR, Baxter AE, Oldridge DA, Greenplate AR, Wu JE, et al. Deep immune profiling of COVID-19 patients reveals distinct immunotypes with therapeutic implications. Science 2020;369:eabc8511. [Crossref] [PubMed]
Estiri H, Strasser ZH, Klann JG, Naseri P, Wagholikar KB, Murphy SN. Predicting COVID-19 mortality with electronic medical records. NPJ Digit Med 2021;4:15. [Crossref] [PubMed]
Dang Y, Ma W, Luo X, Wang H. CAD-Unet: A capsule network-enhanced Unet architecture for accurate segmentation of COVID-19 lung infections from CT images. Med Image Anal 2025;103:103583. [Crossref] [PubMed]
Yang D, Ren G, Ni R, Huang YH, Lam NFD, Sun H, Wan SBN, Wong MFE, Chan KK, Tsang HCH, Xu L, Wu TC, Kong FS, Wáng YXJ, Qin J, Chan LWC, Ying M, Cai J. Deep learning attention-guided radiomics for COVID-19 chest radiograph classification. Quant Imaging Med Surg 2023;13:572-84. [Crossref] [PubMed]
Chen Y, He D, Wu Y, Li X, Yang K, Zhan Y, Chen J, Zhou X. A new computed tomography score-based staging for melioidosis pneumonia to predict progression. Quant Imaging Med Surg 2024;14:3863-74. [Crossref] [PubMed]
Guo J, Lu S, Jia L, Zhang W, Li H. Encoder-Decoder Contrast for Unsupervised Anomaly Detection in Medical Images. IEEE Trans Med Imaging 2024;43:1102-12. [Crossref] [PubMed]
Ren Q, Zhou B, Tian L, Guo W. Detection of COVID-19 With CT Images Using Hybrid Complex Shearlet Scattering Networks. IEEE J Biomed Health Inform 2022;26:194-205. [Crossref] [PubMed]
Chu Y, Wang J, Xiong Y, Gao Y, Liu X, Luo G, Gao X, Zhao M, Huang C, Qiu Z, Meng X. Point-annotation supervision for robust 3D pulmonary infection segmentation by CT-based cascading deep learning. Comput Biol Med 2025;187:109760. [Crossref] [PubMed]
Lu S, Zhang W, Zhao H, Liu H, Wang N, Li H. Anomaly Detection for Medical Images Using Heterogeneous Auto-Encoder. IEEE Trans Image Process 2024;33:2770-82. [Crossref] [PubMed]
Liu M, Jiao Y, Chen H. Skip-st: Anomaly detection for medical images using student-teacher network with skip connections. In: 2023 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE; 2023:1-5.
Dong A, Liu J, Lv G, Cheng J. GLMR-Net: global-to-local mutually reinforcing network for pneumonia segmentation and classification. Pattern Recognit 2025;162:111371.
Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC. Estimating the support of a high-dimensional distribution. Neural Comput 2001;13:1443-71. [Crossref] [PubMed]
Tax DM, Duin RP. Support vector data description. Mach Learn 2004;54:45-66.
Ruff L, Vandermeulen R, Goernitz N, Deecke L, Siddiqui SA, Binder A, Müller E, Kloft M. Deep one-class classification. In: International Conference on Machine Learning. PMLR; 2018:4393-402.
Vincent P, Larochelle H, Bengio Y, Manzagol PA. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning. 2008:1096-103.
Salehi M, Arya A, Pajoum B, Otoofi M, Shaeiri A, Rohban MH, Rabiee HR. ARAE: Adversarially robust training of autoencoders improves novelty detection. Neural Netw 2021;144:726-36. [Crossref] [PubMed]
Sabokrou M, Khalooei M, Fathy M, Adeli E. Adversarially learned one-class classifier for novelty detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:3379-88.
Perera P, Nallapati R, Xiang B. OCGAN: one-class novelty detection using GANs with constrained latent representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019:2898-906.
Schlegl T, Seeböck P, Waldstein SM, Langs G, Schmidt-Erfurth U. f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. Med Image Anal 2019;54:30-44. [Crossref] [PubMed]
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Proceedings of the International Conference on Neural Information Processing Systems. 2014;2:2672-80.
Sun L, He M, Wang N, Wang H. Improving autoencoder by mutual information maximization and shuffle attention for novelty detection. Appl Intell 2023;53:17747-61.
Zhang QL, Yang YB. SA-Net: shuffle attention for deep convolutional neural networks. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 2021:2235-9.
Makhzani A, Shlens J, Jaitly N, Goodfellow I, Frey B. Adversarial autoencoders. arXiv:1511.05644 [Preprint]. 2015. Available online: https://arxiv.org/abs/1511.05644
Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer International Publishing; 2015:234-41.
Akçay S, Atapour-Abarghouei A, Breckon TP. Skip-ganomaly: Skip connected and adversarially trained encoder-decoder anomaly detection. In: 2019 International Joint Conference on Neural Networks (IJCNN). IEEE; 2019:1-8.
Salehi M, Eftekhar A, Sadjadi N, Rohban MH, Rabiee HR. Puzzle-ae: Novelty detection in images through solving puzzles. arXiv:2008.12959 [Preprint]. 2020. Available online: https://arxiv.org/abs/2008.12959
Liang Y, Zhang J, Zhao S, Wu R, Liu Y, Pan S. Omni-Frequency Channel-Selection Representations for Unsupervised Anomaly Detection. IEEE Trans Image Process 2023;32:4327-40. [Crossref] [PubMed]
Park JY, Hong JR, Kim MH, Kim TJ. Skip-GANomaly++: skip connections and residual blocks for anomaly detection (student abstract). Proceedings of the AAAI Conference on Artificial Intelligence. 2024;38:23615-7.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-8.
Kingma DP, Welling M. Auto-encoding variational bayes. arXiv:1312.6114 [Preprint]. 2013. Available online: https://arxiv.org/abs/1312.6114
Zimmerer D, Isensee F, Petersen J, Kohl S, Maier-Hein K. Unsupervised anomaly localization using variational auto-encoders. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer International Publishing; 2019:289-97.
Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 [Preprint]. 2015. Available online: https://arxiv.org/abs/1503.02531
Bergmann P, Fauser M, Sattlegger D, Steger C. Uninformed students: student-teacher anomaly detection with discriminative latent embeddings. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020:4183-92.
Salehi M, Sadjadi N, Baselizadeh S, Rohban MH, Rabiee HR. Multiresolution knowledge distillation for anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021:14902-12.
Deng H, Li X. Anomaly detection via reverse distillation from one-class embedding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022:9727-36.
Tien TD, Nguyen AT, Tran NH, Huy TD, Duong STM, Nguyen CDT, Truong SQH. Revisiting reverse distillation for anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023:24511-20.
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L. ImageNet large scale visual recognition challenge. Int J Comput Vis 2015;115:211-52.
Audibert J, Michiardi P, Guyard F, Marti S, Zuluaga MA. USAD: unsupervised anomaly detection on multivariate time series. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020:3395-404.
Pietron M, Zurek D, Faber K, Corizzo R. AD-NEv: A Scalable Multilevel Neuroevolution Framework for Multivariate Anomaly Detection. IEEE Trans Neural Netw Learn Syst 2025;36:8939-53. [Crossref] [PubMed]
Pietroń M, Żurek D, Faber K, Wójcik A, Corizzo R. AD-NEv++: the multi-architecture neuroevolution-based multivariate anomaly detection framework. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion. 2024:607-10.
Garg A, Zhang W, Samaran J, Savitha R, Foo CS. An Evaluation of Anomaly Detection and Diagnosis in Multivariate Time Series. IEEE Trans Neural Netw Learn Syst 2022;33:2508-17. [Crossref] [PubMed]
Noroozi M, Favaro P. Unsupervised learning of visual representations by solving jigsaw puzzles. In: Proceedings of the European Conference on Computer Vision. 2016:69-84.
Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O. 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. 2016:424-32.
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: an imperative style, high-performance deep learning library. In: Proceedings of the International Conference on Neural Information Processing Systems. 2019:8024-35.
Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv:1412.6980 [Preprint]. 2014. Available online: https://arxiv.org/abs/1412.6980

(English Language Editor: L. Huleatt)

Cite this article as: Liu R, Zhu Y, Lyu Z, Gao Y, Zhan Y, Zhan Y. Early identification of abnormal pulmonary infectious diseases using unsupervised anomaly detection. Quant Imaging Med Surg 2025;15(12):12561-12581. doi: 10.21037/qims-2025-1285

Early identification of abnormal pulmonary infectious diseases using unsupervised anomaly detection

Introduction

Table 1

UAD

Puzzle-AE and motivation

Methods

The overall architecture of the LRAE

The DOPS mechanism

LFF module

Loss function

Anomaly score

Dataset

Implementation

Evaluation metrics

Results

Comparison with other methods

Table 2

Table 3

Ablation study

Table 4

Table 5

Analysis of histogram

Visual analysis

Discussion

Overview of important findings

Comparison with previous work

Practical implications

Analysis of limitations

Conclusions

Acknowledgments

Footnote

References

Article Options

Download Citation

Share