Deep learning-based low count whole-body positron emission tomography denoising incorporating computed tomography priors
Original Article

Zhengyu Peng1#, Fanwei Zhang2#, Han Jiang1,3, Guichao Liu2, Jingzhang Sun1,4, Yu Du1,5, Zhonglin Lu1,5, Ying Wang2*, Greta S. P. Mok1,5,6*

1Biomedical Imaging Laboratory (BIG), Department of Electrical and Computer Engineering, Faculty of Science and Technology, University of Macau, Macau, China; 2Department of Nuclear Medicine, The Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China; 3Positron Emission Tomography-Computed Tomography (PET-CT) Center, Fujian Medical University Union Hospital, Fuzhou, China; 4Biomedical Imaging Group, School of Cyberspace Security, Hainan University, Haikou, China; 5Centre for Cognitive and Brain Sciences, Institute of Collaborative Innovation, University of Macau, Macau, China; 6Ministry of Education Frontiers Science Center for Precision Oncology, University of Macau, Macau, China

Contributions: (I) Conception and design: GSP Mok; (II) Administrative support: GSP Mok, Y Du; (III) Provision of study materials or patients: F Zhang, Y Wang; (IV) Collection and assembly of data: Z Peng, F Zhang; (V) Data analysis and interpretation: Z Peng; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

*These authors contributed equally to this work.

Correspondence to: Ying Wang, MD. Department of Nuclear Medicine, The Fifth Affiliated Hospital of Sun Yat-sen University, 52 Meihua E. Rd., Zhuhai 519099, China. Email: wangy9@mail.sysu.edu.cn; Greta S. P. Mok, PhD. Biomedical Imaging Laboratory (BIG), Department of Electrical and Computer Engineering, Faculty of Science and Technology, University of Macau, Research Building N21-2016, Avenida da Universidade, Taipa, Macau 999078, China; Center for Cognitive and Brain Sciences, Institute of Collaborative Innovation, University of Macau, Macau, China; Ministry of Education Frontiers Science Center for Precision Oncology, University of Macau, Macau, China. Email: gretamok@um.edu.mo.

Background: Deep-learning-based denoising improves image quality and quantification accuracy for low count (LC) positron emission tomography (PET). Conventional deep-learning-based denoising methods only require single LC PET image input. This study aims to propose a deep-learning-based LC PET denoising method incorporating computed tomography (CT) priors to further reduce the dose level.

Methods: Fifty patients who underwent their routine whole-body 2-deoxy-2-[18F]fluoro-D-glucose (18F-FDG) PET/CT scans in March 2022 were retrospectively and non-consecutively recruited. For full count (FC) PET, patients were injected with 3.7 MBq/kg FDG and scanned for 5 bed positions at 2 min/bed. LC PET at 1/10 (LC-10) and 1/20 (LC-20) of the FC count level was obtained by randomly down-sampling the FC list mode data and then paired with FC PET for training a U-Net (U-Net-1) and a conditional generative adversarial network (cGAN) (cGAN-1). Networks additionally incorporating CT images as input (U-Net-2 and cGAN-2) were also implemented. Quantitative analysis of physical and clinical indices was performed and statistically assessed with the Wilcoxon signed-rank test with Bonferroni correction.

Results: Mean square error and structural similarity index were best for cGAN-2, followed by U-Net-2, cGAN-1 and U-Net-1. The errors of mean standardized uptake value (SUV) and maximum SUV were lowest for cGAN-2, followed by cGAN-1, U-Net-2 and U-Net-1. For cGAN-2, image quality and lesion detectability scores were 3.71±0.94 and 4.25±0.83 for LC-10 and 3.57±0.79 and 3.61±1.21 for LC-20, versus 3.49±0.92 and 4.42±0.08 for FC. Notably, some small lesions were "masked out" on cGAN/U-Net-1 but could be retrieved on cGAN/U-Net-2 denoised PET for LC-20.

Conclusions: Deep-learning-based LC PET denoising incorporating CT priors is more effective than conventional deep-learning-based denoising with single LC PET input, especially at lower dose levels.

Keywords: Deep learning (DL); positron emission tomography/computed tomography (PET/CT); denoising; conditional generative adversarial network; U-Net


Submitted Mar 17, 2024. Accepted for publication Oct 12, 2024. Published online Nov 21, 2024.

doi: 10.21037/qims-24-489


Introduction

Positron emission tomography/computed tomography (PET/CT) is a highly sensitive and non-invasive imaging technology widely used for diagnosis (1,2), staging (3), and treatment planning in oncology (4,5). 2-deoxy-2-[18F]fluoro-D-glucose (18F-FDG) is the most commonly used PET tracer for various applications (6). Typically, a high radiotracer dosage or a long acquisition time is required to obtain high-quality PET images in clinical practice (7); the PET radiotracer alone constitutes ~40% of the total effective dose of a PET/CT scan (8). However, high-dosage 18F-FDG increases ionizing radiation exposure and the subsequent secondary cancer risk, particularly for the growing population of younger patients (8-10) and patients who need repeated follow-up scans (11). Although CT contributes more effective dose than PET in a PET/CT scan, pursuing low count (LC) PET remains necessary under the ALARA (As Low As Reasonably Achievable) principle (12), especially as the CT radiation dose has been decreasing [~13.14±5.14 mSv (8) vs. ~25.95 mSv (9)] owing to advances in statistical iterative reconstruction (13) and artificial intelligence (AI)-based denoising (14,15). On the other hand, a long acquisition time poses more burden on patients and makes motion artifacts more likely (16). Thus, low dose or fast PET, collectively called LC PET, is desirable and can be achieved by reducing either the injection dose or the acquisition time (17). However, these approaches may compromise image quality (IQ) and quantitative accuracy relative to full count (FC) PET due to the higher noise. Nonlocal means (NLM), block-matching and 3-dimensional filtering (BM3D), and block-matching and 4-dimensional filtering (BM4D) algorithms are commonly used for PET denoising, but with limited performance (18,19).

Deep learning (DL)-based denoising is promising for LC PET. Zhou et al. (20) proposed a modified cycle-consistent generative adversarial network (cycleGAN) for 1/100 LC PET denoising, obtaining denoised PET images without significant image degradation. Their evaluations were based on purely physical indices and more clinically relevant assessments are needed to demonstrate the effectiveness of their proposed method. Sanaat et al. (21) used a modified cycleGAN model to denoise 1/8 LC PET images and maintained the diagnostic accuracy. The same group also proposed a modified U-Net model (22) which generated denoised 1/20 LC sinograms, resulting in improved PET IQ as compared to denoising in image domain (23). Additionally, Liu et al. (24) proposed a personalized DL denoising strategy which trained the model using LC PET at different dose levels as input, showing advantages as compared to training based on one dose level.

Previous studies have demonstrated the utility of multi-modality data for DL-based LC PET image denoising (25-35). Xu et al. (26) incorporated multi-contrast information from simultaneous magnetic resonance imaging (MRI) for ultra-LC PET denoising, reducing the dose level to 1/200 of FC. Deng et al. (27) proposed a discrete wavelet-transform convolutional neural network (CNN) that used both LC PET and MRI images as input for denoising LC PET in PET/MRI and reported the possibility of reducing the dose level to 1/100 of FC. Xiang et al. (28) also proposed a deep auto-context CNN that used both PET and T1 MRI images as input for denoising PET; this network achieved competitive IQ and decreased computational time compared to a data-driven multilevel canonical correlation analysis (MCCA) scheme. Wang et al. (29) proposed a CNN with a weighted-loss function module, using 6.25% ultra-LC PET and MRI images as input to generate denoised PET images, and claimed that anatomical details could be preserved as compared to using only LC PET images as input. Zhao et al. (30) proposed a 3-dimensional (3D) encoder-decoder-based model to achieve self-supervised joint PET and CT denoising for 1/2, 1/4 and 1/8 LC 18F-FDG head PET/CT, showing better performance than single-modality denoising based on physical indices. Song et al. (31) proposed an unsupervised 3D PET image denoising method with an MRI-guided deep decoder; their results showed improved IQ while retaining spatial resolution and quantitative accuracy. Huang et al. (32) proposed a novel 3D network with a spatial brain transform model, with low-dose whole-brain PET and MRI images as input, to obtain high-quality PET images; their results showed improved quantitative results and clinical assessments. Hosch et al. (33) implemented a modified 2-dimensional (2D) pix2pixHD network to denoise 1/30 LC whole-body 18F-FDG PET images with multi-channel PET and CT input; their results showed improved lesion detectability and quantification as compared to LC PET input. Ladefoged et al. (34) used a 3D U-Net with multi-channel PET and CT inputs to denoise 1/10 and 1/100 LC cardiac 18F-FDG PET and showed improved IQ and reduced left ventricular ejection fraction error as compared to LC PET. Xie et al. (35) fed multi-channel and multi-branch PET inputs into a 3D CNN incorporated into image reconstruction for 1/10 LC chest 18F-FDG PET; both configurations produced a better lesion contrast-versus-noise trade-off curve, based on physical indices, as compared to reconstruction without CT input.

Most LC PET denoising studies with CT-prior input compared their methods only with the LC PET images themselves, without evaluating performance against standard DL methods trained with LC PET input alone. In this study, we evaluated the effectiveness of a 3D U-Net and a conditional generative adversarial network (cGAN) with 2-channel PET and CT input for whole-body 18F-FDG PET denoising. A more comprehensive evaluation based on physical as well as clinical indices, i.e., lesion detectability and IQ scoring, was performed. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-489/rc).


Methods

This study was performed in line with the principles of the Declaration of Helsinki (as revised in 2013). Approval was granted by the Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, Guangdong, China (approval No. 2023-K146-1). The requirement of written informed consent was waived because of the retrospective and observational nature of the study.

Clinical dataset

Fifty anonymous patients aged 18 years or above who underwent a routine whole-body 18F-FDG scan on a clinical PET/CT system (uMI 780, United Imaging, Shanghai, China) in March 2022 at the Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, Guangdong, China were retrospectively and non-consecutively recruited under the local ethics approval. Figure 1 shows our study flowchart. For each patient, five bed positions (1 for head, 4 for body) were acquired with a 2-minute scan per bed position. Scanning was performed 1–1.5 hours after administering a 3.7 MBq/kg injection dose for the FC PET scan. Data from the base of skull to mid-thigh were used for further analysis. A CT scan (120 kVp; 179 mAs; pitch 1:1.2875; matrix size 512 × 512 × variable axial coverage; voxel size 1.1719×1.1719×1.5 mm3) was performed before the PET scan for attenuation correction and anatomical reference. The effective dose of CT was ~15 mSv. LC PET sinograms were generated by down-sampling the FC list mode data to 1/10 (LC-10) and 1/20 (LC-20) of the FC count level. As compared to low-dose PET, the down-sampling used in this study is actually more similar to fast PET with a relatively shorter acquisition time; thus, fewer motion artifacts would be expected. Both FC and LC sinograms were reconstructed with random, attenuation and model-based scatter corrections using the 3D ordered-subset expectation-maximization (OS-EM) algorithm (2 iterations, 20 subsets). The reconstruction matrix size was 150 × 150 × variable axial coverage with a voxel size of 4×4×3 mm3. All FC PET images were filtered with a Gaussian filter with a full width at half maximum of 3 mm and converted to standardized uptake value (SUV) maps.
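The count down-sampling step can be sketched as keeping each recorded coincidence event with probability 1/10 or 1/20. The array layout below is hypothetical; real uMI 780 list-mode data use a vendor-specific binary format.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical list-mode stream: one row per coincidence event
# (e.g., detector pair IDs, time stamp, energy); layout is illustrative.
events = rng.integers(0, 1000, size=(100_000, 4))

def downsample_listmode(events: np.ndarray, fraction: float,
                        rng: np.random.Generator) -> np.ndarray:
    """Randomly keep each event with probability `fraction`,
    emulating a low count (LC) acquisition."""
    keep = rng.random(len(events)) < fraction
    return events[keep]

lc10_events = downsample_listmode(events, 1 / 10, rng)  # LC-10
lc20_events = downsample_listmode(events, 1 / 20, rng)  # LC-20
```

Each retained event keeps its original attributes, so the LC data remain statistically consistent with a shorter acquisition at the same injected activity.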

Figure 1 The flowchart of this study. SUV, standardized uptake value; CT, computed tomography, LC, low count; PET, positron emission tomography; cGAN, conditional generative adversarial network; MSE, mean square error; SSIM, structural similarity index; SUVmean, mean standardized uptake value; SUVmax, maximum standardized uptake value.

Data preprocessing

CT images were resampled to the same voxel and matrix size as the PET images (matrix size of 150 × 150 × variable axial coverage with a voxel size of 4×4×3 mm3). CT Hounsfield unit (HU) values were shifted so that the minimum was zero, and the CT images were then normalized to the mean value of the corresponding LC PET. PET and CT images were divided into 128×128×128 patches. The number of patches was dependent on the axial length of the reconstructed images, which varied for each patient. Adjacent patches overlapped by more than 73%.
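One plausible reading of this preprocessing is sketched below: the CT is shifted so its minimum HU is zero and scaled so its mean matches the LC PET mean, and overlapping cubic patches are cut along the axial axis. The stride value is an assumption chosen to give >73% overlap, as the text does not state it.

```python
import numpy as np

def normalize_ct(ct: np.ndarray, lc_pet: np.ndarray) -> np.ndarray:
    """Shift the minimum HU to zero, then scale the CT so its mean
    matches that of the corresponding LC PET (one plausible reading
    of the normalization described in the text)."""
    ct = ct - ct.min()
    return ct * (lc_pet.mean() / ct.mean())

def extract_patches(vol: np.ndarray, size: int = 128,
                    stride: int = 32) -> list:
    """Cut overlapping cubic patches along the axial (z) axis.
    A stride of 32 on 128-voxel patches gives 75% axial overlap;
    transaxial handling (crop/pad of the 150x150 matrix) is omitted."""
    nx, ny, nz = vol.shape
    starts = list(range(0, max(nz - size, 0) + 1, stride))
    if nz > size and starts[-1] != nz - size:
        starts.append(nz - size)  # make sure the volume end is covered
    return [vol[:size, :size, s:s + size] for s in starts]
```

The denoised patches can later be regrouped (e.g., by averaging the overlapping regions) to form the whole-body image, as described in the following section.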

DL networks

We implemented 4 DL-based denoising methods: (I) 3D U-Net with single LC PET input (U-Net-1); (II) 3D U-Net with both LC PET and CT input (U-Net-2); (III) 3D cGAN with single LC PET input (cGAN-1); and (IV) 3D cGAN with both LC PET and CT input (cGAN-2). Both U-Net-1 and U-Net-2 consisted of an encoder, bottleneck layers and a decoder (Figure 2A). Each layer in the encoder contained a 3×3×3 convolution (Conv), batch normalization (BN), a rectified linear unit (ReLU) activation function, and a down-sampling layer with 2×2×2 max pooling. To prevent overfitting, a dropout layer with a 20% dropout rate was added after the ReLU layer in the bottleneck layers. The decoder consisted of a sequence of up-sampling layers with a stride of 2, 3×3×3 Conv, and ReLU activation functions to restore the input image size. The mean absolute error (MAE) loss function was used in all networks. Figure 2B shows the architecture of cGAN-1 and cGAN-2, each of which consisted of a generator and a discriminator. The generator was based on the U-Net structure, and a cross-entropy-based discriminator was implemented. Skip connections (concatenate layers) were applied between corresponding layers of the DL networks, except at the first layer of the encoder and decoder for U-Net-2 and cGAN-2, owing to the different structural information presented in PET and CT images.
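A minimal Keras sketch of the 2-channel encoder-decoder idea follows (the study used TensorFlow 2.2.0). The depth, feature counts, and single down/up level here are simplifications for brevity, not the authors' released code; the optimized layer and feature numbers are reported in the Results.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_unet(in_channels: int = 2, feats: int = 16) -> Model:
    """Tiny 3D U-Net-style network: LC PET (+CT) patches in,
    denoised PET patch out. One down/up level for brevity."""
    inp = layers.Input(shape=(None, None, None, in_channels))
    # encoder: Conv-BN-ReLU, then 2x2x2 max pooling
    e = layers.Conv3D(feats, 3, padding="same")(inp)
    e = layers.BatchNormalization()(e)
    e = layers.ReLU()(e)
    p = layers.MaxPool3D(2)(e)
    # bottleneck with 20% dropout to limit overfitting
    b = layers.Conv3D(feats * 2, 3, padding="same", activation="relu")(p)
    b = layers.Dropout(0.2)(b)
    # decoder: upsample, skip-concatenate, convolve back to 1 channel
    u = layers.UpSampling3D(2)(b)
    u = layers.Concatenate()([u, e])
    u = layers.Conv3D(feats, 3, padding="same", activation="relu")(u)
    out = layers.Conv3D(1, 3, padding="same")(u)
    model = Model(inp, out)
    # MAE loss and Adam optimizer, as in the paper
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mae")
    return model
```

For the 2-channel variants (U-Net-2, cGAN-2), `in_channels=2` stacks the CT patch alongside the LC PET patch; the single-input variants use `in_channels=1`.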

Figure 2 Diagrams of different denoising networks. (A) U-Net-1 and U-Net-2; (B) cGAN-1 and cGAN-2. LC, low count; FC, full count; PET, positron emission tomography; CT, computed tomography; cGAN, conditional generative adversarial network; Conv, convolution; BN, batch normalization; ReLU, rectified linear unit; LMAE, mean absolute error loss; LD, discriminator loss; LADV, adversarial loss.

All networks were implemented using Python 3.7 and TensorFlow GPU version 2.2.0 on an NVIDIA RTX A6000 GPU with 48 GB of memory. The Adam optimizer (36) was applied with an initial learning rate of 0.0001 and auto-adaptive decay, running up to 400 epochs. The training times for 400 epochs were 6 hours for U-Net-1, 8 hours for cGAN-1, 13 hours for U-Net-2 and 17 hours for cGAN-2. The testing time was similar for all 4 models, taking 3 minutes to process 128 patches.

For U-Net-2 and cGAN-2, the 3D CT patches were concatenated to corresponding 3D LC PET patches as input, while only 3D LC PET patches were used for U-Net-1 and cGAN-1. We selected 35 (428 patches), 5 (60 patches), and 10 (128 patches) patient data for training, validation and testing, respectively. A 5-fold cross validation was further applied to test all 50 clinical patient data. Hyper-parameters of those networks including number of layers and feature numbers (37) were optimized. The denoised patches were later regrouped to form a denoised whole-body PET image.
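The 35/5/10 patient split with 5-fold cross validation can be sketched as follows; the patient IDs, random seed, and exact fold assignment are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
patients = np.arange(50)              # 50 anonymized patient IDs
rng.shuffle(patients)
folds = np.array_split(patients, 5)   # 5 folds of 10 patients each

for k in range(5):
    test_ids = folds[k]               # 10 held-out test patients
    remaining = np.concatenate([folds[i] for i in range(5) if i != k])
    train_ids, val_ids = remaining[:35], remaining[35:]  # 35 train / 5 val
    # train the network on train_ids, tune on val_ids, test on test_ids
```

Rotating the held-out fold over all 5 iterations tests every one of the 50 patients exactly once, matching the cross-validation described above.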

Data post processing and quantitative analysis

We selected 18 lesions in 16 patients with a clear border and a diameter >1.0 cm based on FC PET and CT images. Sample coronal images for all 18 lesions are depicted in Figure 3. Lesion segmentation was performed using a threshold of 41% of the maximum SUV (SUVmax) (38), and the mean SUV (SUVmean) was then calculated for the segmented lesions. SUVmean and SUVmax errors (Eqs. [1,2]) were calculated using the FC PET images as reference, and Bland-Altman plots were generated for the SUVmean and SUVmax errors. Mean square error (MSE) (39) (Eq. [3]) and structural similarity index (SSIM) (39) (Eq. [4]) were also computed between the whole-body LC/DL-denoised PET images and the FC PET images.

\[
\mathrm{SUV_{mean}\ error}=\frac{\left|\mathrm{SUV_{mean\text{-}LC/DL}}-\mathrm{SUV_{mean\text{-}FC}}\right|}{\mathrm{SUV_{mean\text{-}FC}}}\times 100\% \tag{1}
\]

\[
\mathrm{SUV_{max}\ error}=\frac{\left|\mathrm{SUV_{max\text{-}LC/DL}}-\mathrm{SUV_{max\text{-}FC}}\right|}{\mathrm{SUV_{max\text{-}FC}}}\times 100\% \tag{2}
\]

\[
\mathrm{MSE}(f,g)=\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(f_{ij}-g_{ij}\right)^{2} \tag{3}
\]

\[
\mathrm{SSIM}(x,y)=\left[l(x,y)\right]^{\alpha}\times\left[c(x,y)\right]^{\beta}\times\left[s(x,y)\right]^{\gamma} \tag{4}
\]
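These indices, together with the 41% SUVmax lesion threshold, can be computed directly from the SUV maps; a numpy sketch follows. The global single-scale SSIM with α=β=γ=1 and the customary stabilizing constants is an assumption here, since the exact SSIM variant follows reference (39).

```python
import numpy as np

def suv_error(suv_test: float, suv_ref: float) -> float:
    """Relative SUV error (%) against the FC reference, Eqs. [1,2]."""
    return abs(suv_test - suv_ref) / suv_ref * 100.0

def mse(f: np.ndarray, g: np.ndarray) -> float:
    """Mean square error, Eq. [3]."""
    return float(np.mean((f - g) ** 2))

def ssim(x: np.ndarray, y: np.ndarray, L: float = 1.0) -> float:
    """Global single-scale SSIM, Eq. [4] with alpha = beta = gamma = 1
    (an assumed simplification of the cited formulation)."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float((2 * mx * my + C1) * (2 * cov + C2)
                 / ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2)))

def segment_lesion(suv_map: np.ndarray) -> np.ndarray:
    """Binary lesion mask at 41% of SUVmax, as used for segmentation."""
    return suv_map >= 0.41 * suv_map.max()
```

SUVmean for a lesion is then the mean of the SUV map over `segment_lesion(suv_map)`, computed per lesion region.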

Figure 3 Sample images of the 18 lesions for further analysis. Lesions were indicated by red arrows. *,#, lesions belong to the same patient.

Clinical assessments

In this study, IQ and lesion detectability were assessed by two nuclear medicine physicians, each with ten years of clinical experience. We anonymized and randomly arranged 50 FC PET images and 400 DL-based denoised images (4×50 for LC-10 and 4×50 for LC-20) for grading. We used a 5-point scoring system (22) to assess IQ: 5 indicates excellent; 4 indicates good; 3 indicates adequate; 2 indicates poor; 1 indicates uninterpretable. We also implemented a modified 6-point scoring scheme (40) for lesion detectability based on the 18 selected lesions: 5 indicates excellent, no/minimal heterogeneities; 4 indicates very good, subtle, tiny heterogeneities; 3 indicates good, small heterogeneities visible; 2 indicates poor, some significant heterogeneities of varying size and magnitude; 1 indicates lesion preserved but with numerous significant heterogeneities; 0 indicates lesion disappeared. To aid the physicians in locating specific lesions, FC PET and CT images were provided for reference.

Statistical analysis

In this study, the agreement between the two physicians' scoring was assessed using the intraclass correlation coefficient (ICC). The Wilcoxon signed-rank test (41) (GraphPad Software, San Diego, California, USA) with Bonferroni correction for multiple comparisons was applied to the MSE, SSIM, SUVmean and SUVmax errors, and the IQ and lesion detectability scores. A P value <0.05 was considered statistically significant.
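A minimal sketch of the paired test with a Bonferroni adjustment is shown below; the per-patient scores are synthetic stand-ins, and scipy's `wilcoxon` stands in for the GraphPad implementation.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(seed=7)

# Synthetic paired per-patient SSIM values for two methods
ssim_unet2 = rng.uniform(0.85, 0.95, size=50)
ssim_unet1 = ssim_unet2 - rng.uniform(0.0, 0.03, size=50)

def bonferroni(pvals):
    """Bonferroni-adjusted P values for multiple paired comparisons."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]

# Wilcoxon signed-rank test on the paired differences
p = wilcoxon(ssim_unet2, ssim_unet1).pvalue
p_adj = bonferroni([p, p, p])        # e.g., 3 pairwise comparisons
significant = [q < 0.05 for q in p_adj]
```

The Bonferroni step multiplies each raw P value by the number of comparisons (capped at 1.0), so the family-wise error rate stays at the nominal 0.05 level.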


Results

Datasets information

The basic demographic characteristics of the patients and the selected lesions are shown in Tables 1,2.

Table 1

Patient demographic information

Characteristics With lesions Without lesions Total
No. of patients 26 24 50
Male/female 15/11 15/9 30/20
Age (years) 48.23±9.21 [24–57] 42.38±10.46 [20–57] 45.42±10.26 [20–57]
Weight (kg) 58.82±9.52 [39.6–79.1] 64.23±12.85 [45.3–100] 61.4±11.58 [39.6–100]
BMI (kg/m2) 21.43±2.87 [16.70–26.71] 22.61±3.89 [16.70–33.03] 22.09±3.39 [16.70–33.03]
Injection activity (MBq) 217.63±35.21 [146.52–292.67] 237.65±47.53 [167.61–370] 227.11±42.8 [146.52–370]

Data are presented as number or mean ± standard deviation [range]. BMI, body mass index.

Table 2

Lesion characteristics

Characteristics Value
No. of selected patients with lesions (selected/total) 16/26
No. of selected lesions (selected/total) 18/86
SUVmax 15.42±9.14 [3.57–41.20]
SUVmean 9.54±5.30 [2.37–23.36]
Lesion diameter (cm) 2.23±1.51 [1.02–7.00]
Locations (No. of lesions)
   Duodenum 1
   Esophagus 1
   Adrenal gland 1
   Axillary 1
   Lung 6
   Colon 2
   Kidney 1
   Liver 1
   Ileum 1
   Stomach 1
   Thyroid 1
   Pleura 1
No. of patients with 2 lesions 2

Data are presented as mean ± standard deviation [range] if not otherwise specified. SUVmean, mean standardized uptake value; SUVmax, maximum standardized uptake value.

Network optimization

The number of layers and feature maps for U-Net-1 and U-Net-2 were determined to be 5 layers with 24 feature maps and 2 layers with 16 feature maps, respectively. For cGAN-1 and cGAN-2, 3 layers and 40 feature maps showed the best performance. The network optimization results are shown in Figure S1.

Quantitative analysis

Figure 4A shows sample coronal images of FC, LC-10, LC-20, and DL-based denoised images by U-Net-1, cGAN-1, U-Net-2 and cGAN-2 for a female patient. Compared to the LC PET images, the DL-based denoised images provide an appearance more similar to the FC PET images. Visual inspection reveals that the images derived by cGAN-2 are better than the others at both dose levels, with sharper bone structures. Figure 4B presents FC, LC, and DL-based denoised images of a sample patient with a small lesion in the right lung. For LC-20, the lesion is "masked out" by U-Net-1, cGAN-1 and U-Net-2, while cGAN-2 successfully retrieves it. For LC-10, the small lesion is visible on all DL-based denoised images, with cGAN-2 providing better contrast than the others.

Figure 4 Sample coronal whole-body PET images of two patients. (A) A 49-year-old female patient with BMI of 16.89 kg/m2. Red arrows indicate the bony structure observed better in U-Net-2 and cGAN-2 as compared to U-Net-1 and cGAN-1, respectively. (B) A 49-year-old male patient with BMI of 22.37 kg/m2. For a lower dose level (LC-20, bottom row), a small lesion (red box) was masked out by U-Net-1, cGAN-1 and U-Net-2 but not cGAN-2. It can be retrieved for all denoised images for LC-10 (upper row). LC, low count; FC, full count; PET, positron emission tomography; cGAN, conditional generative adversarial network; BMI, body mass index.

Figure 5A,5B summarize the average MSE and SSIM over 50 patients for LC and DL-based denoised images. cGAN-2 shows significantly superior performance compared to the other DL networks on MSE and SSIM for both LC-10 and LC-20 (all P<0.05), followed by U-Net-2, cGAN-1 and U-Net-1, while all are better than LC. cGAN-based methods are consistently and significantly better than U-Net-based methods (all P<0.05). Figure 5C,5D show the average SUVmean and SUVmax errors calculated on LC and DL-based denoised images at both dose levels. cGAN-2 yields results closest to FC for both SUVmean and SUVmax. U-Net-2 also shows lower SUVmean and SUVmax errors than U-Net-1.

Figure 5 Quantitative results of different denoising methods. (A) MSE; (B) SSIM for 50 patients; (C) SUVmean error; (D) SUVmax error of 18 selected lesions. *, P<0.05; **, P<0.01; ***, P<0.001. MSE, mean square error; SSIM, structural similarity index; SUVmean, mean standardized uptake value; SUVmax, maximum standardized uptake value; LC, low count, cGAN, conditional generative adversarial network.

Figure 6 illustrates the Bland-Altman results of SUVmean and SUVmax errors for each patient, again showing that cGAN-2 has the lowest bias among all methods. U-Net-2 also shows lower bias than U-Net-1, and cGAN shows lower bias than U-Net in general. For both MSE and SSIM, on both LC-10 and LC-20, U-Net-2 and cGAN-2 show significant differences compared to U-Net-1 and cGAN-1, respectively. For SUVmean, cGAN-2 is significantly different from cGAN-1 on LC-20.

Figure 6 Bland-Altman plots of different denoising methods. (A) Mean standardized uptake value (SUVmean) errors of DL-based methods for two dose levels; (B) maximum standardized uptake value (SUVmax) errors of DL-based methods for two dose levels. Full count images were used as reference. Red dotted line: mean difference; black dotted line: 95% confidence internal of the difference. LC, low count, cGAN, conditional generative adversarial network; DL, deep learning.

Clinical assessment

The average clinical assessment results for IQ [ICC: 0.73, 95% confidence interval (CI): 0.69–0.77] and lesion detectability (ICC: 0.92, 95% CI: 0.87–0.97) are shown in Figure 7. In the IQ assessment, the average scores across the entire test dataset for LC-20 are 2.97±1.01, 2.93±1.20, 3.43±0.89, 3.57±0.79 and 3.49±0.92 for U-Net-1, cGAN-1, U-Net-2, cGAN-2 and FC, respectively. cGAN-2 obtains a higher score than the other DL-based methods and even FC, while the other DL-based methods show lower IQ scores than FC. U-Net-2 also achieves a higher IQ score than U-Net-1. There is a significant difference between U-Net-2 vs. U-Net-1 (P<0.01) and cGAN-2 vs. cGAN-1 (P<0.001) for LC-20. For LC-10, the average IQ scores are 2.98±0.93, 3.54±1.10, 3.58±0.99 and 3.71±0.94 for U-Net-1, cGAN-1, U-Net-2 and cGAN-2, respectively. All DL-based methods except U-Net-1 receive higher scores than FC for LC-10. Notably, cGAN-2 performs better than U-Net-2 and cGAN-1, and U-Net-2 demonstrates significantly better performance than U-Net-1 (P<0.001). IQ scores for DL-processed LC-10 are generally higher than those for LC-20.

Figure 7 Clinical assessment results for different deep learning-based denoised and the original FC images. (A) Average image quality results from 50 (×2 physicians’ scoring) patients; (B) average lesion detectability results from 18 (×2 physicians’ scoring) lesions. Mean scores are presented on top of each bar. *, P<0.05; **, P<0.01; ***, P<0.001. LC, low count; FC, full count; cGAN, conditional generative adversarial network.

The average lesion detectability scores for LC-20 are 3.14±1.70, 3.22±1.70, 3.56±1.46 and 3.61±1.21 for U-Net-1, cGAN-1, U-Net-2, and cGAN-2, respectively. For LC-10, the scores are 4.03±1.07, 4.11±0.84, 4.17±0.96 and 4.25±0.83 for U-Net-1, cGAN-1, U-Net-2, and cGAN-2, respectively. The average score for FC is 4.42±0.08, superior to all DL methods for both LC-10 and LC-20. Among all DL methods, cGAN-2 achieves the highest scores for both LC-10 and LC-20. Meanwhile, all 18 lesions can be detected by all DL networks for LC-10. For LC-20, cGAN-2 shows better performance than cGAN-1 (depicting 18 and 16 out of 18 lesions respectively), and U-Net-2 performs better compared to U-Net-1 (depicting 16 and 15 out of 18 lesions respectively).


Discussion

Previous studies have demonstrated that DL-based denoising methods outperform conventional filter-based methods, including NLM, BM3D and BM4D (18,21,25). Therefore, in this study, we focused mainly on evaluating the performance of various DL networks with and without incorporating CT priors as input. Our findings indicated that cGAN-2 and U-Net-2, i.e., with CT priors, achieved superior IQ and yielded more favorable quantitative results compared to cGAN-1 and U-Net-1, i.e., without CT priors. In particular, cGAN-2 showed better performance on bony structures and lesion preservation than cGAN-1 at both dose levels (Figure 4, red arrow and red box, respectively), while U-Net-2 preserved more lesion structure than U-Net-1 for LC-10 but not for LC-20 (Figure 7). Notably, a small lesion that was not clearly visible on the input LC-20 image was recovered in the cGAN-2-processed images (Figure 4B), indicating that the supplementary anatomical information provided by CT could aid denoising. cGAN-2 showed the best performance on various indices at both dose levels, and U-Net-2 also outperformed U-Net-1 on all indices. These results show that incorporating CT priors into the DL methods improved IQ along with quantitative and clinical assessment results, probably owing to the additional anatomical information provided. For LC-10, most DL-denoised methods performed better than FC PET, except U-Net-1; for LC-20, in contrast, FC PET exhibited IQ scores superior to all DL-denoised methods except cGAN-2. DL can be regarded as a "smart filter", and processed images are generally smoother than FC (23). However, it is worth noting that FC PET outperformed all DL denoising methods, even those with CT priors, in terms of lesion detectability, as DL may smooth out some image details such as small lesions. These results may reflect the limitations of current DL models.
On the other hand, after examining every loss curve, we found that none of the DL models exhibited overfitting. Because the clinical assessment scores of the LC PET images were significantly lower than those of the FC and DL-denoised images, the LC images were not included in this analysis.

In our previous study, we demonstrated that a conventional cGAN was superior to a conventional U-Net for LC single photon emission computed tomography (SPECT) denoising, probably because a cGAN comprises two different networks trained adversarially, allowing a large variety of functions to be incorporated into the model (42). A similar finding was observed in this study for the DL methods incorporating CT priors on all indices. In addition, a previous study has suggested that in multi-modality imaging systems such as PET/CT and PET/MRI, CT is superior for soft-tissue resolution in lung and mediastinal nodal diseases, while MRI is superior for head and neck, pelvis, and colorectal cancers (43). Our results showed that CT priors can be used to enhance IQ and quantitative accuracy. Our study is concordant with the existing literature (28,30) in that incorporating anatomical priors could potentially enhance the performance of DL-based LC PET denoising, though a comparison between MRI and CT priors is beyond the scope of this study. Misregistration between PET and the anatomical images may degrade the performance of this method (44), and further evaluation is warranted. In our study, all PET images were verified to align well with the corresponding CT images.

A relatively small patient cohort was used for training the DL models in this study, which could lead to potential overfitting and low generalizability of the trained models. Dropout layers were applied in all the DL-based methods to address this problem. We also checked the loss curves of all models and ensured that there was no overfitting concern. Evaluation of our proposed method with a greater number of patient data is warranted (45). The LC PET input should be applicable to both reduced injection activity and reduced acquisition time, addressing the clinically relevant problems of lowering the effective dose to patients and clinical personnel and reducing substantial motion artifacts. In the future, reduced-dose CT images could also be used as the prior input, further lowering the patient effective dose.


Conclusions

We evaluated the performance of conventional DL-based denoising and a newly proposed DL-based denoising method incorporating CT priors for whole-body 18F-FDG LC PET. The use of CT priors (U-Net-2 and cGAN-2) showed superior IQ, quantitative accuracy and lesion recovery as compared to the conventional methods (U-Net-1 and cGAN-1).


Acknowledgments

Funding: This study was supported by research grants from the Science and Technology Development Fund (FDCT) of Macau (Nos. 0016/2023/R1B1 and 0178/2024/AMJ), the Science and Technology Talents Innovation Project of Hainan Province (No. KJRC2023D30), and the Key-Area Research and Development Program of Guangdong Province (No. 2018B030335001).


Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-489/rc

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-489/coif). G.S.P.M. serves as an unpaid editorial board member of Quantitative Imaging in Medicine and Surgery. The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was performed in line with the principles of the Declaration of Helsinki (as revised in 2013). Approval was granted by the Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, Guangdong, China (approval No. 2023-K146-1). The requirement of written informed consent was waived because of the retrospective and observational nature of the study.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Townsend DW, Carney JP, Yap JT, Hall NC. PET/CT today and tomorrow. J Nucl Med 2004;45:4S-14S. [PubMed]
  2. Bar-Shalom R, Yefremov N, Guralnik L, Gaitini D, Frenkel A, Kuten A, Altman H, Keidar Z, Israel O. Clinical performance of PET/CT in evaluation of cancer: additional value for diagnostic imaging and patient management. J Nucl Med 2003;44:1200-9. [PubMed]
  3. Poeppel TD, Krause BJ, Heusner TA, Boy C, Bockisch A, Antoch G. PET/CT for the staging and follow-up of patients with malignancies. Eur J Radiol 2009;70:382-92. [Crossref] [PubMed]
  4. Fletcher JW, Djulbegovic B, Soares HP, Siegel BA, Lowe VJ, Lyman GH, Coleman RE, Wahl R, Paschold JC, Avril N, Einhorn LH, Suh WW, Samson D, Delbeke D, Gorman M, Shields AF. Recommendations on the use of 18F-FDG PET in oncology. J Nucl Med 2008;49:480-508. [Crossref] [PubMed]
  5. Rohren EM, Turkington TG, Coleman RE. Clinical applications of PET in oncology. Radiology 2004;231:305-32. [Crossref] [PubMed]
  6. Ben-Haim S, Ell P. 18F-FDG PET and PET/CT in the evaluation of cancer treatment response. J Nucl Med 2009;50:88-99. [Crossref] [PubMed]
  7. Zhang YQ, Hu PC, Wu RZ, Gu YS, Chen SG, Yu HJ, Wang XQ, Song J, Shi HC. The image quality, lesion detectability, and acquisition time of (18)F-FDG total-body PET/CT in oncological patients. Eur J Nucl Med Mol Imaging 2020;47:2507-15. [Crossref] [PubMed]
  8. Li Y, Jiang L, Wang H, Cai H, Xiang Y, Li L. Effective radiation dose of 18F-FDG PET/CT: how much does diagnostic CT contribute? Radiat Prot Dosimetry 2019;187:183-90. [Crossref] [PubMed]
  9. Huang B, Law MW, Khong PL. Whole-body PET/CT scanning: estimation of radiation dose and cancer risk. Radiology 2009;251:166-74. [Crossref] [PubMed]
  10. Kapoor V, McCook BM, Torok FS. An introduction to PET-CT imaging. Radiographics 2004;24:523-43. [Crossref] [PubMed]
  11. National Comprehensive Cancer Network. B-cell Follicular Lymphoma (Version 2.2024). [accessed September 09, 2024]. Available online: https://www.nccn.org/patients/guidelines/content/PDF/nhl-follicular-patient.pdf
  12. Miller DL, Schauer D. The ALARA principle in medical imaging. Philosophy 1983;44:595-600.
  13. Geyer LL, Schoepf UJ, Meinel FG, Nance JW Jr, Bastarrika G, Leipsic JA, Paul NS, Rengo M, Laghi A, De Cecco CN. State of the Art: Iterative CT Reconstruction Techniques. Radiology 2015;276:339-57. [Crossref] [PubMed]
  14. Gholizadeh-Ansari M, Alirezaie J, Babyn P. Deep Learning for Low-Dose CT Denoising Using Perceptual Loss and Edge Detection Layer. J Digit Imaging 2020;33:504-15. [Crossref] [PubMed]
  15. Zhang Y, Hao D, Lin Y, Sun W, Zhang J, Meng J, Ma F, Guo Y, Lu H, Li G, Liu J. Structure-preserving low-dose computed tomography image denoising using a deep residual adaptive global context attention network. Quant Imaging Med Surg 2023;13:6528-45. [Crossref] [PubMed]
  16. Sureshbabu W, Mawlawi O. PET/CT imaging artifacts. J Nucl Med Technol 2005;33:156-61; quiz 163-4. [PubMed]
  17. Xing Y, Qiao W, Wang T, Wang Y, Li C, Lv Y, Xi C, Liao S, Qian Z, Zhao J. Deep learning-assisted PET imaging achieves fast scan/low-dose examination. EJNMMI Phys 2022;9:7. [Crossref] [PubMed]
  18. Maggioni M, Katkovnik V, Egiazarian K, Foi A. Nonlocal transform-domain filter for volumetric data denoising and reconstruction. IEEE Trans Image Process 2013;22:119-33. [Crossref] [PubMed]
  19. Dutta J, Leahy RM, Li Q. Non-local means denoising of dynamic PET images. PLoS One 2013;8:e81390. [Crossref] [PubMed]
  20. Zhou L, Schaefferkoetter JD, Tham IWK, Huang G, Yan J. Supervised learning with cyclegan for low-dose FDG PET image denoising. Med Image Anal 2020;65:101770. [Crossref] [PubMed]
  21. Sanaat A, Shiri I, Arabi H, Mainta I, Nkoulou R, Zaidi H. Deep learning-assisted ultra-fast/low-dose whole-body PET/CT imaging. Eur J Nucl Med Mol Imaging 2021;48:2405-15. [Crossref] [PubMed]
  22. Sanaat A, Arabi H, Mainta I, Garibotto V, Zaidi H. Projection Space Implementation of Deep Learning-Guided Low-Dose Brain PET Imaging Improves Performance over Implementation in Image Space. J Nucl Med 2020;61:1388-96. [Crossref] [PubMed]
  23. Sun J, Jiang H, Du Y, Li CY, Wu TH, Liu YH, Yang BH, Mok GSP. Deep learning-based denoising in projection-domain and reconstruction-domain for low-dose myocardial perfusion SPECT. J Nucl Cardiol 2023;30:970-85. [Crossref] [PubMed]
  24. Liu Q, Liu H, Mirian N, Ren S, Viswanath V, Karp J, Surti S, Liu C. A personalized deep learning denoising strategy for low-count PET images. Phys Med Biol 2022; [Crossref] [PubMed]
  25. Sun H, Jiang Y, Yuan J, Wang H, Liang D, Fan W, Hu Z, Zhang N. High-quality PET image synthesis from ultra-low-dose PET/MRI using bi-task deep learning. Quant Imaging Med Surg 2022;12:5326-42. [Crossref] [PubMed]
  26. Xu J, Gong E, Ouyang J, Pauly J, Zaharchuk G, editors. Ultra-low-dose 18F-FDG brain PET/MR denoising using deep learning and multi-contrast information. Proc SPIE 2020:113131P. doi: 10.1117/12.2548350.
  27. Deng F, Li X, Yang F, Sun H, Yuan J, He Q, Xu W, Yang Y, Liang D, Liu X, Mok GSP, Zheng H, Hu Z. Low-Dose 68 Ga-PSMA Prostate PET/MRI Imaging Using Deep Learning Based on MRI Priors. Front Oncol 2021;11:818329. [Crossref] [PubMed]
  28. Xiang L, Qiao Y, Nie D, An L, Wang Q, Shen D. Deep Auto-context Convolutional Neural Networks for Standard-Dose PET Image Estimation from Low-Dose PET/MRI. Neurocomputing (Amst) 2017;267:406-16. [Crossref] [PubMed]
  29. Wang YJ, Baratto L, Hawk KE, Theruvath AJ, Pribnow A, Thakor AS, Gatidis S, Lu R, Gummidipundi SE, Garcia-Diaz J, Rubin D, Daldrup-Link HE. Artificial intelligence enables whole-body positron emission tomography scans with minimal radiation exposure. Eur J Nucl Med Mol Imaging 2021;48:2771-81. [Crossref] [PubMed]
  30. Zhao F, Li D, Luo R, Liu M, Jiang X, Hu J. Self-supervised deep learning for joint 3D low-dose PET/CT image denoising. Comput Biol Med 2023;165:107391. [Crossref] [PubMed]
  31. Song TA, Yang F, Dutta J. Noise2Void: unsupervised denoising of PET images. Phys Med Biol 2021;66: [Crossref] [PubMed]
  32. Huang Z, Li W, Wu Y, Yang L, Dong Y, Yang Y, Zheng H, Liang D, Wang M, Hu Z. Accurate Whole-Brain Image Enhancement for Low-Dose Integrated PET/MR Imaging Through Spatial Brain Transformation. IEEE J Biomed Health Inform 2024;28:5280-9. [Crossref] [PubMed]
  33. Hosch R, Weber M, Sraieb M, Flaschel N, Haubold J, Kim MS, Umutlu L, Kleesiek J, Herrmann K, Nensa F, Rischpler C, Koitka S, Seifert R, Kersting D. Artificial intelligence guided enhancement of digital PET: scans as fast as CT? Eur J Nucl Med Mol Imaging 2022;49:4503-15. [Crossref] [PubMed]
  34. Ladefoged CN, Hasbak P, Hornnes C, Højgaard L, Andersen FL. Low-dose PET image noise reduction using deep learning: application to cardiac viability FDG imaging in patients with ischemic heart disease. Phys Med Biol 2021;66:054003. [Crossref] [PubMed]
  35. Xie Z, Li T, Zhang X, Qi W, Asma E, Qi J. Anatomically aided PET image reconstruction using deep neural networks. Med Phys 2021;48:5244-58. [Crossref] [PubMed]
  36. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 [Preprint]. 2014. Available online: https://doi.org/10.48550/arXiv.1412.6980
  37. Sun J, Zhang Q, Du Y, Zhang D, Pretorius PH, King MA, Mok GSP. Dual gating myocardial perfusion SPECT denoising using a conditional generative adversarial network. Med Phys 2022;49:5093-106. [Crossref] [PubMed]
  38. Cheebsumon P, Yaqub M, van Velden FH, Hoekstra OS, Lammertsma AA, Boellaard R. Impact of [18F]FDG PET imaging parameters on automatic tumour delineation: need for improved tumour delineation methodology. Eur J Nucl Med Mol Imaging 2011;38:2136-44. [Crossref] [PubMed]
  39. Sara U, Akter M, Uddin MS. Image quality assessment through FSIM, SSIM, MSE and PSNR—a comparative study. Journal of Computer and Communications 2019;7:8-18. [Crossref]
  40. Liu G, Chen S, Hu Y, Cao S, Yang X, Zhou Y, Shi H. Respiratory-gated PET imaging with reduced acquisition time for suspect malignancies: the first experience in application of total-body PET/CT. Eur Radiol 2023;33:3366-76. [Crossref] [PubMed]
  41. Ly J, Minarik D, Jögi J, Wollmer P, Trägårdh E. Post-reconstruction enhancement of [18F]FDG PET images with a convolutional neural network. EJNMMI Res 2021;11:48. [Crossref] [PubMed]
  42. Sun J, Du Y, Li C, Wu TH, Yang B, Mok GSP. Pix2Pix generative adversarial network for low dose myocardial perfusion SPECT denoising. Quant Imaging Med Surg 2022;12:3539-55. [Crossref] [PubMed]
  43. Al-Nabhani KZ, Syed R, Michopoulou S, Alkalbani J, Afaq A, Panagiotidis E, O'Meara C, Groves A, Ell P, Bomanji J. Qualitative and quantitative comparison of PET/CT and PET/MR imaging in clinical practice. J Nucl Med 2014;55:88-94. [Crossref] [PubMed]
  44. Du Y, Shang J, Sun J, Wang L, Liu YH, Xu H, Mok GSP. Deep-learning-based estimation of attenuation map improves attenuation correction performance over direct attenuation estimation for myocardial perfusion SPECT. J Nucl Cardiol 2023;30:1022-37. [Crossref] [PubMed]
  45. Ultra-low Dose PET Imaging Challenge. [accessed September 09, 2024]. Available online: https://ultra-low-dose-pet.grand-challenge.org/Dataset/
Cite this article as: Peng Z, Zhang F, Jiang H, Liu G, Sun J, Du Y, Lu Z, Wang Y, Mok GSP. Deep learning-based low count whole-body positron emission tomography denoising incorporating computed tomography priors. Quant Imaging Med Surg 2024;14(12):8140-8154. doi: 10.21037/qims-24-489