Dynamic positron emission tomography image reconstruction using spatiotemporal kernel method with deep image prior
Introduction
Positron emission tomography (PET) is widely used in oncology, cardiology, neurology, and pharmaceutical research due to its non-invasive nature and high sensitivity (1-3). In dynamic PET imaging, sequential time frames, ranging from seconds to minutes, are acquired immediately following tracer injection, facilitating quantitative and visual analysis of metabolic and functional processes at the molecular level via kinetic modeling (4). However, PET reconstruction is inherently ill-posed, and low-count statistics in short frames worsen this problem (5). Consequently, dynamic PET images typically exhibit higher noise levels than static PET images acquired over the same total duration. This degradation adversely affects the accuracy of parametric images derived from kinetic modeling, thus hindering reliable quantitative analysis. Therefore, improving dynamic PET reconstruction quality is crucial for reliable quantitative analysis in clinical and research settings.
PET reconstruction methods can be broadly classified into two main categories: analytical and iterative (6). Analytical methods, such as filtered back-projection (FBP) (7,8), generally ignore noise and acquisition errors, leading to poor image quality. In contrast, iterative methods construct a statistical model of the measurement process and cyclically update the reconstructed image to best fit the measured data, achieving much better quality. The maximum likelihood expectation maximization (MLEM) algorithm (9) is the most common iterative method. However, MLEM has the weaknesses of slow convergence and noise amplification at later iterations, which negatively impact image quality. To mitigate these issues, several acceleration techniques have been proposed to speed up MLEM’s convergence, including ordered-subsets expectation maximization (OSEM) (10) and the row-action maximum likelihood algorithm (RAMLA) (11). Additionally, various regularization strategies have been explored to further enhance image quality (12,13). These methods impose spatial smoothing and edge-preservation constraints within local neighborhoods (14,15) or across adjacent image regions (16,17). Furthermore, anatomical images have been incorporated as prior information within the maximum a posteriori (MAP) reconstruction framework, leading to substantial improvements in PET image quality (18-21). However, such anatomy-guided methods require precise registration of PET and anatomical scans to avoid artifacts caused by misalignment.
Kernel methods inspired by machine learning have also been applied to PET reconstruction. Wang and Qi introduced the kernelized expectation maximization (KEM) algorithm (5), which incorporates prior information into the projection model, improving image quality without explicit regularization. In dynamic PET reconstruction, this method constructs the kernel matrix using composite frame images reconstructed from rebinned dynamic PET data. To further enhance dynamic PET reconstruction, Wang later proposed the spatiotemporal kernel expectation maximization (STKEM) method (22), which extends the kernel method by incorporating a temporal kernel. This addition improves the signal-to-noise ratio (SNR) of time-activity curves in dynamic PET imaging. However, relying on a single global temporal kernel may inadequately capture localized temporal variations, especially in regions with atypical tracer kinetics, leading to over-smoothing of high-contrast features. Additionally, similar to standard MLEM, kernel-based reconstructions are susceptible to image quality degradation due to noise amplification at later iterations.
Recently, deep learning has shown substantial potential in PET reconstruction (23-25). Broadly, deep learning-based PET reconstruction methods fall into three categories: (I) end-to-end mapping methods (26,27), in which neural networks directly transform raw sinogram data into reconstructed images; (II) regularization-based methods (28-30), which incorporate learned priors as penalty terms within an iterative reconstruction framework; and (III) unrolled neural network methods (31,32), which unroll an iterative algorithm into a neural network architecture to implicitly model prior or penalty terms and eliminate the need for manually designed regularizers. However, these methods typically require extensive supervised training on large datasets, which limits their applicability to real-world PET data. In contrast, the deep image prior (DIP) framework (33) can effectively solve inverse problems without requiring external training data, making it particularly attractive for PET reconstruction. For example, Gong et al. proposed the DIP-based reconstruction method (DIPRecon), which combines DIP with the MLEM algorithm by using a patient’s magnetic resonance imaging (MRI) as the network input, significantly improving PET image quality (32). Similarly, Li et al. proposed the DIP-based kernelized expectation maximization (NeuralKEM) method, which uses DIP to estimate the kernel coefficient images during reconstruction and employs three high-count images reconstructed from rebinned dynamic PET data as the neural network input, demonstrating superior reconstruction performance over traditional kernel methods (34). Despite these advances, each method has limitations: some rely on external anatomical images as priors, whereas others fail to fully exploit the intrinsic spatiotemporal information in dynamic PET data.
In this study, we propose a novel dynamic PET reconstruction method that integrates the DIP framework with a spatiotemporal kernel. We formulated the reconstruction task as a constrained optimization problem, solved using the alternating direction method of multipliers (ADMM) algorithm. In each iteration, a DIP-based convolutional neural network (CNN) denoises the kernel coefficient images across all frames, and a temporal kernel enforces consistency across frames, improving overall reconstruction quality. To ensure a fair comparison with existing methods, we followed the established practice (34) of using composite images, reconstructed from three high-count frames rebinned from the dynamic data, both as input to the DIP network and for constructing the spatial kernel. Our method relies solely on the intrinsic information within the measured dynamic PET data, eliminating the need for external training data or anatomical priors. In simulations and preclinical studies, the proposed method yielded higher-quality images with better noise suppression and structural detail preservation than MLEM with Gaussian post-filtering (MLEM-G), KEM, STKEM, DIPRecon, and NeuralKEM. It maintained high image quality even after numerous iterations, avoiding the degradation observed with the other methods. The preliminary results of this work were presented at the 2024 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC) (35).
Methods
Dynamic PET image reconstruction model
Dynamic PET data are modeled as independent Poisson random variables. The corresponding log-likelihood function is given by (13):

$$L(\boldsymbol{y}\mid\boldsymbol{x})=\sum_{m=1}^{M}\sum_{i=1}^{N_i}\left(y_{im}\log\bar{y}_{im}-\bar{y}_{im}-\log y_{im}!\right) \quad [1]$$
where $N_i$ is the total number of lines of response and $M$ denotes the number of frames in the dynamic acquisition. The relationship between the expected projection data $\bar{\boldsymbol{y}}_m$ and the image $\boldsymbol{x}_m$ to be reconstructed is defined by an affine transformation (36):

$$\bar{\boldsymbol{y}}_m=\boldsymbol{P}\boldsymbol{x}_m+\boldsymbol{r}_m \quad [2]$$
where $\boldsymbol{P}\in\mathbb{R}^{N_i\times N_j}$ is the system matrix, with each element $p_{ij}$ denoting the probability that a coincidence event originating from a positron annihilation event at pixel $j$ is detected by line of response (LOR) $i$ in frame $m$. $N_j$ represents the total number of pixels and $\boldsymbol{r}_m$ represents the expectation of random and scattered events.
The maximum likelihood estimate of the image is obtained by maximizing the log-likelihood function:

$$\hat{\boldsymbol{x}}=\arg\max_{\boldsymbol{x}\geq 0}\; L(\boldsymbol{y}\mid\boldsymbol{x}) \quad [3]$$
This maximization can be performed iteratively using the expectation maximization (EM) algorithm (9):

$$\boldsymbol{x}_m^{n+1}=\frac{\boldsymbol{x}_m^{n}}{\boldsymbol{P}^{T}\boldsymbol{1}_{N_i}}\,\boldsymbol{P}^{T}\frac{\boldsymbol{y}_m}{\boldsymbol{P}\boldsymbol{x}_m^{n}+\boldsymbol{r}_m} \quad [4]$$
where $n$ denotes the iteration number, the superscript $T$ indicates a matrix transpose, and $\boldsymbol{1}_{N_i}$ is a vector of length $N_i$, with all elements equal to 1. Multiplication and division of vectors are performed element-wise. This equation is the standard iterative update rule of the MLEM algorithm.
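The update rule above can be sketched in a few lines of NumPy; `P`, `y`, and `r` below are toy dense stand-ins for the system matrix, measured data, and mean background, not the scanner model used in this work.

```python
import numpy as np

def mlem(x, y, P, r, n_iter=10):
    """Plain MLEM for one frame: x <- x / (P^T 1) * P^T (y / (P x + r))."""
    sens = P.T @ np.ones(P.shape[0])          # sensitivity image, P^T 1
    for _ in range(n_iter):
        ybar = P @ x + r                      # expected projection data
        x = x / sens * (P.T @ (y / ybar))     # element-wise multiplicative update
    return x
```

Because the update is multiplicative, non-negativity of the image is preserved automatically at every iteration.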
Spatiotemporal kernel method
The spatiotemporal kernel method extends the conventional kernel-based PET reconstruction method by incorporating temporal correlations through an additional kernel component (22). In this method, $\boldsymbol{f}_{jm}$ is defined as a low-dimensional feature vector encoding the spatiotemporal prior information for pixel $j$ in frame $m$. This feature vector consists of a spatial component $\boldsymbol{f}_{j}^{s}$, which captures spatial correlations between pixels, and a temporal component $\boldsymbol{f}_{m}^{t}$, which represents temporal correlations across frames: $\boldsymbol{f}_{jm}=[\boldsymbol{f}_{j}^{s};\boldsymbol{f}_{m}^{t}]$. Consequently, pixel $j$ in the PET image of frame $m$ can be expressed as a linear combination of kernel functions:

$$x_{jm}=\sum_{j'=1}^{N_j}\sum_{m'=1}^{M}\kappa\!\left(\boldsymbol{f}_{jm},\boldsymbol{f}_{j'm'}\right)\alpha_{j'm'} \quad [5]$$
where α represents the kernel coefficient image, $\kappa(\boldsymbol{f}_{jm},\boldsymbol{f}_{j'm'})=\kappa_{s}(\boldsymbol{f}_{j}^{s},\boldsymbol{f}_{j'}^{s})\,\kappa_{t}(\boldsymbol{f}_{m}^{t},\boldsymbol{f}_{m'}^{t})$ is the spatiotemporal kernel function, $\kappa_{s}$ is the spatial kernel function derived from the prior images, and $\kappa_{t}$ is the temporal kernel function obtained from the temporal prior. Each element of the kernel function reflects the similarity between the feature vectors of the corresponding pixels.
The overall spatiotemporal kernel matrix $\boldsymbol{K}$ can be decomposed into the spatial kernel matrix $\boldsymbol{K}_{s}$ and the temporal kernel matrix $\boldsymbol{K}_{t}$:

$$\boldsymbol{K}=\boldsymbol{K}_{t}\otimes\boldsymbol{K}_{s} \quad [6]$$
where $\otimes$ denotes the Kronecker product.
Gaussian kernels are commonly used to measure both spatial and temporal similarity. The spatial kernel is typically constructed using a k-nearest neighbor (kNN) search based on the prior image. Subsequently, the spatial kernel is calculated as:

$$\kappa_{s}\!\left(\boldsymbol{f}_{j}^{s},\boldsymbol{f}_{j'}^{s}\right)=\exp\!\left(-\frac{\left\|\boldsymbol{f}_{j}^{s}-\boldsymbol{f}_{j'}^{s}\right\|^{2}}{2\sigma_{s}^{2}}\right),\quad j'\in\mathrm{kNN}(j) \quad [7]$$

where $\sigma_{s}$ is a constant parameter controlling the weight of the spatial kernel.
Similarly, the temporal kernel is defined as:

$$\kappa_{t}\!\left(\boldsymbol{f}_{m}^{t},\boldsymbol{f}_{m'}^{t}\right)=\exp\!\left(-\frac{\left\|\boldsymbol{f}_{m}^{t}-\boldsymbol{f}_{m'}^{t}\right\|^{2}}{2\sigma_{t}^{2}}\right),\quad |m-m'|\leq\lfloor w/2\rfloor \quad [8]$$

where $\sigma_{t}$ is a constant parameter controlling the temporal kernel weight, and $w$ is the time window width.
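The two kernel matrices can be sketched in NumPy as follows; the parameter names (`k`, `sigma_s`, `w`, `sigma_t`) and the row normalization are illustrative assumptions rather than the exact implementation used in this work.

```python
import numpy as np

def spatial_kernel(feats, k=4, sigma_s=1.0):
    """Sparse kNN Gaussian spatial kernel from prior-image features.
    feats: (N_pixels, n_features), e.g. intensities of the composite frames."""
    n = feats.shape[0]
    Ks = np.zeros((n, n))
    for j in range(n):
        d2 = np.sum((feats - feats[j]) ** 2, axis=1)
        nbrs = np.argsort(d2)[:k]                 # k nearest neighbors (incl. self)
        Ks[j, nbrs] = np.exp(-d2[nbrs] / (2 * sigma_s ** 2))
    return Ks / Ks.sum(axis=1, keepdims=True)     # row-normalize

def temporal_kernel(M, w=3, sigma_t=1.0):
    """Banded Gaussian temporal kernel over M frames with window width w."""
    m = np.arange(M)
    d = np.abs(m[:, None] - m[None, :])
    Kt = np.where(d <= w // 2,
                  np.exp(-d.astype(float) ** 2 / (2 * sigma_t ** 2)), 0.0)
    return Kt / Kt.sum(axis=1, keepdims=True)
```

The full spatiotemporal kernel matrix then follows from the Kronecker decomposition, e.g. `np.kron(Kt, Ks)` for small problems; in practice both factors are kept sparse and never formed as one dense matrix.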
In this kernel framework, the prior information is incorporated into the forward projection model via the kernel function:

$$\bar{\boldsymbol{y}}=\boldsymbol{P}\boldsymbol{K}\boldsymbol{\alpha}+\boldsymbol{r} \quad [9]$$
The kernel coefficient image α is obtained by using the KEM algorithm:

$$\boldsymbol{\alpha}^{n+1}=\frac{\boldsymbol{\alpha}^{n}}{\boldsymbol{K}^{T}\boldsymbol{P}^{T}\boldsymbol{1}}\,\boldsymbol{K}^{T}\boldsymbol{P}^{T}\frac{\boldsymbol{y}}{\boldsymbol{P}\boldsymbol{K}\boldsymbol{\alpha}^{n}+\boldsymbol{r}} \quad [10]$$
After obtaining the kernel coefficient estimate $\hat{\boldsymbol{\alpha}}$, the final reconstructed image is computed as:

$$\hat{\boldsymbol{x}}=\boldsymbol{K}\hat{\boldsymbol{\alpha}} \quad [11]$$
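A minimal sketch of the KEM iteration, assuming dense NumPy matrices for `P` and `K` (the actual implementation uses sparse kernels and a system-matrix projector); with `K` equal to the identity it reduces to plain MLEM.

```python
import numpy as np

def kem(alpha, y, P, K, r, n_iter=10):
    """KEM: MLEM applied to the composite system PK; returns x = K alpha."""
    PK = P @ K
    sens = PK.T @ np.ones(PK.shape[0])        # (PK)^T 1
    for _ in range(n_iter):
        ybar = PK @ alpha + r                 # forward model with kernel prior
        alpha = alpha / sens * (PK.T @ (y / ybar))
    return K @ alpha                          # final image, x = K alpha
```

Note that the image estimate is only formed at the end; all iterations act on the kernel coefficients, which is where the prior-driven smoothing enters.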
The proposed method
As an unsupervised method, the DIP framework (33) offers significant advantages in PET reconstruction, particularly in scenarios where large, annotated datasets are unavailable. In the proposed method, the unknown kernel coefficient image α is formulated as:

$$\boldsymbol{\alpha}=f\!\left(\theta\mid\boldsymbol{z}\right) \quad [12]$$
where $f$ represents the neural network, θ denotes the network parameters, and $\boldsymbol{z}$ is a prior image that serves as the input to the network. Once the kernel coefficient image α is obtained, the reconstructed PET image is computed using Eq. [11].
By combining the Poisson log-likelihood with the kernel representation and the DIP framework, we formulate the image reconstruction problem as a constrained optimization problem:

$$\max_{\boldsymbol{\alpha},\theta}\; L\!\left(\boldsymbol{y}\mid\boldsymbol{K}\boldsymbol{\alpha}\right)\quad\mathrm{s.t.}\quad\boldsymbol{\alpha}=f\!\left(\theta\mid\boldsymbol{z}\right) \quad [13]$$
To solve this constrained problem efficiently, we reformulate it as an augmented Lagrangian function:

$$L_{\rho}\!\left(\boldsymbol{\alpha},\theta,\boldsymbol{\mu}\right)=L\!\left(\boldsymbol{y}\mid\boldsymbol{K}\boldsymbol{\alpha}\right)-\frac{\rho}{2}\left\|\boldsymbol{\alpha}-f\!\left(\theta\mid\boldsymbol{z}\right)+\boldsymbol{\mu}\right\|^{2}+\frac{\rho}{2}\left\|\boldsymbol{\mu}\right\|^{2} \quad [14]$$
where µ is the (scaled) Lagrange multiplier and ρ is a hyperparameter. Eq. [14] is solved using the ADMM method, which decomposes the optimization problem into three iterative subproblems: (I) updating the kernel coefficient image α; (II) updating the DIP network parameters θ; and (III) updating the Lagrange multiplier variable µ to enforce the equality constraint:

$$\boldsymbol{\alpha}^{n+1}=\arg\max_{\boldsymbol{\alpha}}\; L\!\left(\boldsymbol{y}\mid\boldsymbol{K}\boldsymbol{\alpha}\right)-\frac{\rho}{2}\left\|\boldsymbol{\alpha}-f\!\left(\theta^{n}\mid\boldsymbol{z}\right)+\boldsymbol{\mu}^{n}\right\|^{2} \quad [15]$$

$$\theta^{n+1}=\arg\min_{\theta}\;\left\|f\!\left(\theta\mid\boldsymbol{z}\right)-\left(\boldsymbol{\alpha}^{n+1}+\boldsymbol{\mu}^{n}\right)\right\|^{2} \quad [16]$$

$$\boldsymbol{\mu}^{n+1}=\boldsymbol{\mu}^{n}+\boldsymbol{\alpha}^{n+1}-f\!\left(\theta^{n+1}\mid\boldsymbol{z}\right) \quad [17]$$
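The three-step alternation can be sketched generically as below; `recon_step` and `fit_network` are placeholder callables standing in for subproblems (I) and (II), not the actual penalized-EM and DIP updates.

```python
import numpy as np

def admm(recon_step, fit_network, alpha0, n_outer=50):
    """Generic scaled-form ADMM loop mirroring subproblems (I)-(III)."""
    alpha = alpha0.copy()
    z = alpha0.copy()                  # stands in for the network output f(theta | z_in)
    mu = np.zeros_like(alpha0)         # scaled Lagrange multiplier
    for _ in range(n_outer):
        alpha = recon_step(z - mu)     # (I) data-fit step, penalized toward z - mu
        z = fit_network(alpha + mu)    # (II) fit the network output to alpha + mu
        mu = mu + alpha - z            # (III) multiplier update
    return alpha
```

As a toy check, with `recon_step` solving a quadratic data term and `fit_network` acting as a non-negativity projector, the loop converges to the projected least-squares solution, illustrating how the multiplier drags the two variable blocks into agreement.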
The first subproblem (Eq. [15]) can be interpreted as a PET reconstruction problem with a quadratic penalty term. By defining $\hat{\boldsymbol{\alpha}}^{n}=f(\theta^{n}\mid\boldsymbol{z})-\boldsymbol{\mu}^{n}$, the solution can be derived using the optimization transfer technique (16), in which a surrogate function $Q(\boldsymbol{\alpha}\mid\boldsymbol{\alpha}^{n})$ is constructed for the objective function $\Phi(\boldsymbol{\alpha})$ of Eq. [15]. The surrogate function must satisfy the following conditions that ensure convergence:

$$\Phi(\boldsymbol{\alpha})-\Phi(\boldsymbol{\alpha}^{n})\geq Q(\boldsymbol{\alpha}\mid\boldsymbol{\alpha}^{n})-Q(\boldsymbol{\alpha}^{n}\mid\boldsymbol{\alpha}^{n}),\qquad\nabla Q(\boldsymbol{\alpha}^{n}\mid\boldsymbol{\alpha}^{n})=\nabla\Phi(\boldsymbol{\alpha}^{n}) \quad [18]$$
where $\nabla$ denotes the gradient with respect to α.
The surrogate function is defined as:

$$Q(\boldsymbol{\alpha}\mid\boldsymbol{\alpha}^{n})=\sum_{j}s_{j}\left(\hat{\alpha}_{j,\mathrm{EM}}^{n+1}\log\alpha_{j}-\alpha_{j}\right)-\frac{\rho}{2}\left\|\boldsymbol{\alpha}-\hat{\boldsymbol{\alpha}}^{n}\right\|^{2} \quad [19]$$
where $s_{j}=\left[\boldsymbol{K}^{T}\boldsymbol{P}^{T}\boldsymbol{1}\right]_{j}$ is the sensitivity term, and $\hat{\alpha}_{j,\mathrm{EM}}^{n+1}$ is given by:

$$\hat{\alpha}_{j,\mathrm{EM}}^{n+1}=\frac{\alpha_{j}^{n}}{s_{j}}\left[\boldsymbol{K}^{T}\boldsymbol{P}^{T}\frac{\boldsymbol{y}}{\boldsymbol{P}\boldsymbol{K}\boldsymbol{\alpha}^{n}+\boldsymbol{r}}\right]_{j} \quad [20]$$
After deriving the surrogate function, pixel-wise optimization of the kernel coefficient image α can be performed. For pixel $j$, the surrogate objective function corresponding to Eq. [15] is expressed as:

$$Q_{j}(\alpha_{j}\mid\alpha_{j}^{n})=s_{j}\left(\hat{\alpha}_{j,\mathrm{EM}}^{n+1}\log\alpha_{j}-\alpha_{j}\right)-\frac{\rho}{2}\left(\alpha_{j}-\hat{\alpha}_{j}^{n}\right)^{2} \quad [21]$$
By setting the first derivative of the above function to zero, the iterative update formula for each pixel is obtained as:

$$\alpha_{j}^{n+1}=\frac{1}{2}\left[\hat{\alpha}_{j}^{n}-\frac{s_{j}}{\rho}+\sqrt{\left(\hat{\alpha}_{j}^{n}-\frac{s_{j}}{\rho}\right)^{2}+\frac{4s_{j}\hat{\alpha}_{j,\mathrm{EM}}^{n+1}}{\rho}}\right] \quad [22]$$
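Zeroing the derivative yields the positive root of a per-pixel quadratic, which can be sketched as below; `s` (the sensitivity term), `alpha_em` (the kernelized EM estimate), and `alpha_hat` (the penalty target from the network and multiplier) are illustrative names.

```python
import numpy as np

def pixel_update(alpha_hat, alpha_em, s, rho):
    """Positive root of the per-pixel quadratic from the surrogate objective:
    rho*a^2 + (s - rho*alpha_hat)*a - s*alpha_em = 0."""
    t = alpha_hat - s / rho
    return 0.5 * (t + np.sqrt(t ** 2 + 4.0 * s * alpha_em / rho))
```

A useful sanity check on this closed form: as ρ → 0 the update tends to the plain EM estimate (the penalty vanishes), and as ρ → ∞ it tends to the penalty target itself.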
Notably, Eq. [16] constitutes a nonlinear least-squares problem. For the network training, we employ the Adam optimizer, treating $\boldsymbol{\alpha}^{n+1}+\boldsymbol{\mu}^{n}$ as the label and using the mean squared error (MSE) as the loss function.
The proposed method differs substantially from NeuralKEM by Li et al. (34). NeuralKEM incorporates the DIP framework into the conventional KEM algorithm through an optimization transfer strategy, performing independent frame-by-frame reconstruction of dynamic PET data. In contrast, our method integrates the DIP into the original STKEM algorithm within an ADMM optimization framework, allowing for the simultaneous reconstruction of all dynamic frames. This design not only reduces noise and preserves structural details in individual frames but also improves temporal continuity across the reconstructed image sequence.
Network structure
A residual U-Net architecture is employed in the DIP framework, with both 2-dimensional (2D) and 3-dimensional (3D) versions depending on the dimension of the PET image. Specifically, a 2D network is used for simulation studies, whereas a 3D network is used for real data. To process dynamic PET data, the number of output channels is set equal to the number of time frames, as illustrated in Figure 1. The network consists of: (I) 2D (3D) convolutional layers with kernel sizes of 3×3 (3×3×3); (II) 2D (3D) batch normalization (BN) layers; (III) leaky rectified linear unit (LReLU) activation functions; (IV) 2D (3D) convolutional layers with kernel sizes of 3×3 (3×3×3) and strides of 2×2 (2×2×2) for down-sampling; (V) bilinear (trilinear) interpolation layers with kernel sizes of 2×2 (2×2×2) for up-sampling; and (VI) identity mapping layers that concatenate encoder and decoder features. To enforce non-negativity in the reconstructed kernel coefficient images, a rectified linear unit (ReLU) activation function is applied to the network output. Additionally, dropout layers with a probability of 0.3 are added after each convolutional layer to mitigate the risk of overfitting.
Evaluation metrics
We evaluated reconstruction performance using multiple quantitative metrics. In particular, we calculated the ensemble mean squared bias, variance, and MSE:

$$\mathrm{Bias}^{2}=\frac{1}{N_{j}}\left\|\bar{\boldsymbol{x}}-\boldsymbol{x}^{\mathrm{true}}\right\|^{2} \quad [23]$$

$$\mathrm{Variance}=\frac{1}{N_{j}N_{s}}\sum_{i=1}^{N_{s}}\left\|\hat{\boldsymbol{x}}^{i}-\bar{\boldsymbol{x}}\right\|^{2} \quad [24]$$

$$\mathrm{MSE}=\frac{1}{N_{j}N_{s}}\sum_{i=1}^{N_{s}}\left\|\hat{\boldsymbol{x}}^{i}-\boldsymbol{x}^{\mathrm{true}}\right\|^{2}=\mathrm{Bias}^{2}+\mathrm{Variance} \quad [25]$$
where $\bar{\boldsymbol{x}}=\frac{1}{N_{s}}\sum_{i=1}^{N_{s}}\hat{\boldsymbol{x}}^{i}$ denotes the ensemble mean of the image estimates $\hat{\boldsymbol{x}}^{i}$, and $N_s$ is the total number of noise realizations ($N_s$=10 in this study).
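These ensemble metrics can be sketched as below, assuming flattened images and normalization by the number of pixels; the bias-variance decomposition MSE = Bias² + Variance holds exactly under these definitions.

```python
import numpy as np

def ensemble_metrics(estimates, truth):
    """estimates: (Ns, Npix) stack over noise realizations; truth: (Npix,).
    Returns (bias^2, variance, mse) averaged over pixels and realizations."""
    mean_img = estimates.mean(axis=0)                 # ensemble mean image
    bias2 = np.mean((mean_img - truth) ** 2)
    var = np.mean((estimates - mean_img) ** 2)
    mse = np.mean((estimates - truth) ** 2)           # equals bias2 + var
    return bias2, var, mse
```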
Additionally, the SNR between the ground truth and the reconstructed images is defined as:

$$\mathrm{SNR}=10\log_{10}\frac{\left\|\boldsymbol{x}^{\mathrm{true}}\right\|^{2}}{\left\|\hat{\boldsymbol{x}}-\boldsymbol{x}^{\mathrm{true}}\right\|^{2}} \quad [26]$$
where $\boldsymbol{x}^{\mathrm{true}}$ and $\hat{\boldsymbol{x}}$ denote the ground truth and reconstructed images, respectively.
Furthermore, we used the structural similarity index (SSIM) to quantitatively assess perceptual image quality, which is defined as:

$$\mathrm{SSIM}=\frac{\left(2\mu_{x}\mu_{\hat{x}}+c_{1}\right)\left(2\sigma_{x\hat{x}}+c_{2}\right)}{\left(\mu_{x}^{2}+\mu_{\hat{x}}^{2}+c_{1}\right)\left(\sigma_{x}^{2}+\sigma_{\hat{x}}^{2}+c_{2}\right)} \quad [27]$$
where $\mu_{x}$ and $\mu_{\hat{x}}$ denote the mean values of the true image and the reconstructed image, respectively, $\sigma_{x}$ and $\sigma_{\hat{x}}$ denote the standard deviations (SDs) of the true and reconstructed images, and $\sigma_{x\hat{x}}$ is the covariance between the reconstructed and true images. The constants $c_1$ and $c_2$ are defined as $c_{1}=(0.01L)^{2}$ and $c_{2}=(0.03L)^{2}$, where $L$ is the dynamic range of voxel intensity.
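Global (single-window) forms of these two metrics can be sketched as below; production SSIM implementations typically use local sliding windows, so this is a simplified illustration.

```python
import numpy as np

def snr_db(x_true, x_hat):
    """SNR in dB between ground truth and reconstruction."""
    return 10.0 * np.log10(np.sum(x_true ** 2) / np.sum((x_hat - x_true) ** 2))

def ssim_global(x_true, x_hat, L=None):
    """Single-window SSIM with c1 = (0.01 L)^2, c2 = (0.03 L)^2."""
    if L is None:
        L = x_true.max() - x_true.min()               # dynamic range
    mu_t, mu_h = x_true.mean(), x_hat.mean()
    var_t, var_h = x_true.var(), x_hat.var()
    cov = np.mean((x_true - mu_t) * (x_hat - mu_h))   # covariance term
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    return ((2 * mu_t * mu_h + c1) * (2 * cov + c2)) / \
           ((mu_t ** 2 + mu_h ** 2 + c1) * (var_t + var_h + c2))
```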
For the simulation studies with a known tumor region, we calculated the contrast recovery coefficient (CRC) for the region of interest (ROI) as follows:

$$\mathrm{CRC}=\frac{1}{N_{s}}\sum_{i=1}^{N_{s}}\frac{\bar{c}_{t,i}/\bar{c}_{b,i}-1}{c_{t}^{\mathrm{true}}/c_{b}^{\mathrm{true}}-1} \quad [28]$$
where $\bar{c}_{t,i}$ and $\bar{c}_{b,i}$ denote the mean activities in the ROI and background region for the $i$th realization, respectively; $c_{t}^{\mathrm{true}}$ and $c_{b}^{\mathrm{true}}$ represent their corresponding true mean values. The white matter region was selected as the background for calculating the noise SD, with the reported SD representing the average noise level across all realizations.
For the preclinical data, where the ground truth was unavailable, we defined a contrast recovery (CR) metric to compare the myocardial region against a low-activity background region:

$$\mathrm{CR}=\frac{\bar{c}_{\mathrm{myo}}}{\bar{c}_{\mathrm{bkg}}} \quad [29]$$
where $\bar{c}_{\mathrm{myo}}$ and $\bar{c}_{\mathrm{bkg}}$ denote the mean activities in the myocardial and selected background regions, respectively.
Implementation details
Simulation setup
The scanner used to generate the simulated data is the GE DST whole-body PET scanner (GE HealthCare, Chicago, IL, USA). This scanner is composed of 280 detector modules arranged in four rings, with each ring containing 70 modules. Each detector module consists of a 6×6 array of crystals, with each crystal measuring 6.3×6.3×30 mm³. The system has a diameter of 886 mm and a transverse field of view (FOV) of 700 mm. The spatial resolution at the system’s center is 6.09 mm. In the simulation study, a 2D brain phantom was generated based on an anatomical model from the BrainWeb database (37), as shown in Figure 2A. The phantom image had a resolution of 217×217 pixels and included a simulated tumor with a 5-pixel radius embedded in the white matter region. The dynamic scanning protocol comprised 24 frames: 4×20, 4×40, 4×60, 4×180, and 8×300 s. Regional time-activity curves, as illustrated in Figure 2B, were assigned to the corresponding brain regions. We used the Michigan Image Reconstruction Toolbox (https://web.eecs.umich.edu/~fessler/irt/irt/) to generate the system matrix with an area-based ray-tracing projector. This projector accounts for the finite detector bin width and voxel footprint, thereby providing implicit modeling of the system’s geometric point-spread function. Forward projection was subsequently performed on the noise-free dynamic images to generate noise-free sinograms of size 249×210. Poisson noise was then introduced to simulate a total of 3×10⁷ detected events for a 60-minute acquisition, with 20% of the events modeled as uniform background noise to simulate realistic conditions. Scatter events were not considered in this study. All reconstruction methods included attenuation and random corrections. A total of 10 independent noisy realizations were generated for statistical analysis.
The proposed algorithm was compared with conventional methods, including MLEM-G, KEM, STKEM, DIPRecon, and NeuralKEM. The prior images were constructed by rebinning the original 24 frames into three high-count frames using temporal averaging. Each composite sinogram was reconstructed using MLEM for 100 iterations, followed by Gaussian smoothing and intensity normalization. The spatial kernel was constructed using a kNN approach with 48 neighbors, and the temporal kernel used a Gaussian window with a width of three frames. To avoid dimension mismatches caused by network architecture, images were cropped to 208×208 pixels. Following the paper (34), DIPRecon was configured with four iterations for image updating and 50 subiterations for network training, with the hyperparameter ρ set to 5e−2, whereas NeuralKEM employed 200 subiterations for network training; the network architectures for both methods were consistent with those described in (34). In the proposed method, the number of DIP subiterations was empirically set to 200, and the ADMM hyperparameter ρ was empirically set to 1e−2 to balance convergence and image quality. For all three deep learning-based methods, the input to the neural network was the same composite image prior. All reconstruction algorithms were implemented on a 64-core central processing unit (CPU) (AMD EPYC 7763) with 252 GB of RAM. The neural network was implemented in PyTorch 1.12.0 (https://pytorch.org/) and trained on an NVIDIA (Santa Clara, CA, USA) A40 graphics processing unit (GPU) with 48 GB of memory. The network was trained using the Adam optimizer with a learning rate of 1e−3.
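The composite-frame rebinning used to build the prior images can be sketched as below; grouping the 24 frames into three contiguous blocks and weighting by frame duration are illustrative assumptions, not the exact scheme of the referenced implementation.

```python
import numpy as np

def rebin_composites(frames, durations, n_comp=3):
    """Duration-weighted temporal averaging of dynamic frames into composites.
    frames: (M, Npix) array; durations: (M,) frame lengths in seconds."""
    M = frames.shape[0]
    edges = np.linspace(0, M, n_comp + 1).astype(int)      # contiguous groups
    comps = []
    for a, b in zip(edges[:-1], edges[1:]):
        w = durations[a:b] / durations[a:b].sum()          # weights sum to 1
        comps.append((w[:, None] * frames[a:b]).sum(axis=0))
    return np.stack(comps)
```

With the protocol used here (4×20, 4×40, 4×60, 4×180, and 8×300 s), each composite pools many more counts than any single early frame, which is what makes the composites usable as low-noise priors.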
Preclinical setup
To further validate the simulation results, we evaluated the proposed algorithm on a preclinical dynamic PET dataset of a rat from an open-access Zenodo repository (38). The data were acquired with a Siemens Inveon PET scanner (Siemens, Erlangen, Germany) following intravenous injection via the tail vein of 21.4 MBq (578.37 µCi) of ¹⁸F-fluorodeoxyglucose (¹⁸F-FDG), with a 60-minute dynamic list-mode acquisition measuring whole-body glucose metabolism. After downsampling the data to one-fifth of the original, 143,802,086 coincidence events were recorded and rebinned into 24 frames using the same time protocol as in the simulations (4×20, 4×40, 4×60, 4×180, and 8×300 s). No attenuation, scatter, or random corrections were applied.
The data were reconstructed using in-house list-mode PET reconstruction software developed at Shenzhen Bay Laboratory. The reconstructed image had dimensions of 128×128×160 with a voxel size of 0.78×0.78×0.80 mm³. As in the simulation study, the prior images were generated by rebinning the 24 frames into three high-count composite frames. Each composite frame was reconstructed using MLEM with 30 iterations, followed by Gaussian filtering and intensity normalization. The spatial kernel was computed using a kNN approach with nine neighbors, and the temporal kernel used a time window width of three frames. DIPRecon was configured with four iterations for image updating and 50 subiterations for network training, with the hyperparameter ρ set to 4e5, while NeuralKEM employed 50 subiterations for network training. For the proposed method, 50 DIP subiterations were used, and the ADMM hyperparameter ρ was set to 4e5. All other settings were identical to those in the simulation study, and each reconstruction method was executed for 20 main iterations.
Results
Simulation results
Figure 3 compares the ground truth images and reconstructed results for frames 2, 12, and 24 at the 100th iteration using MLEM-G, KEM, STKEM, DIPRecon, NeuralKEM, and the proposed method, along with the SNR and SSIM metrics. MLEM-G reconstructions exhibited substantial noise that obscured anatomical structures and degraded spatial detail. Although KEM and STKEM provided improved noise suppression compared to MLEM-G, residual noise persisted in the white and gray matter regions, especially in early frames. For the DIPRecon method, the reconstruction results were poor for frames 2 and 12 under the current reconstruction parameters, whereas frame 24 showed better performance but experienced significant blurring of fine details. NeuralKEM produced relatively good results across different time frames, but considerable noise remained in the reconstructed images. In contrast, the proposed method consistently yielded the highest visual image quality, effectively preserving anatomical structures and suppressing noise across all time frames. Quantitatively, the proposed method achieved the highest SNR and SSIM values among all methods, confirming its superior reconstruction performance.
Figure 4 provides a comprehensive quantitative comparison of SNR, SSIM, and log-transformed MSE across all 24 frames at the 100th iteration for each reconstruction method. The proposed method outperforms the comparison methods by achieving higher SNR and SSIM and lower MSE values across all frames, demonstrating superior reconstruction performance. Furthermore, as shown in Figure 4C, NeuralKEM exhibits instability in reconstruction results under different noise realizations.
Figure 5 shows the variation of SNR and SSIM as a function of the iteration number from 0 to 100 for frames 2, 12, and 24 for each reconstruction method. MLEM-G displayed the typical pattern of an initial SNR improvement followed by significant degradation at later iterations due to noise amplification. KEM and STKEM were more stable than MLEM-G but still exhibited a gradual performance decline as the iterations progress. Under the current reconstruction parameters, the DIPRecon method exhibited poor performance in frames 2 and 12, but achieved better results in frame 24, with both SNR and SSIM showing a steady increase with more iterations. NeuralKEM demonstrated performance comparable to the proposed method in frame 2, although its SSIM values were less stable during early iterations, whereas in frames 12 and 24, it maintained a relatively stable trend across iterations, but both SNR and SSIM values consistently remained below those achieved by the proposed method. In contrast, the proposed method maintained stable or even increasing SNR and SSIM values as the iteration count increased, effectively mitigating the degradation observed with the conventional methods.
Figure 6 depicts the trade-off between ensemble mean squared bias and ensemble mean variance as a function of the iteration number from 10 to 100 for frames 2, 12, and 24. MLEM-G reconstructions show a rapid increase in both bias and variance, indicating instability and noise accumulation at later iterations. KEM and STKEM partially mitigate this trade-off but still show higher bias and variance than the proposed method. DIPRecon exhibits relatively poor bias and variance across all three frames, although in frame 24, both bias and variance are comparatively stable. NeuralKEM performs well in frames 2 and 12, but displays instability in frame 24, indicating less consistent reconstruction results under different noise realizations. The proposed method achieves a more favorable balance between bias and variance, maintaining low variance while avoiding excessive bias, thereby improving reconstruction stability.
Figure 7 plots the trade-off between CRC and background SD for the tumor ROI as a function of iteration number from 10 to 100 for frames 2, 12, and 24. MLEM-G achieved higher CRC with increasing iterations but at the cost of substantially elevated background noise. KEM and STKEM obtained a more balanced trade-off, attaining moderate CRC values with better-controlled background noise levels. DIPRecon exhibited poor CRC-SD performance across all three frames. NeuralKEM showed relatively good performance in frames 12 and 24, although it was less stable in frame 2. The proposed method achieved comparable CRC values to the comparison methods while attaining lower background noise in frames 2 and 12; in frame 24, it yielded the lowest background noise but at the expense of a lower CRC value. These results highlight the effectiveness of the proposed method in preserving contrast while suppressing noise.
Overall, the simulation results demonstrate that the proposed method significantly improves the quality of reconstructed dynamic PET images. It effectively preserves structural details and edges while mitigating the degradation typically associated with excessive iterations.
Preclinical rat data results
Figure 8 displays representative reconstructed slices from frames 16 and 24 at the 20th iteration for MLEM-G, KEM, STKEM, DIPRecon, NeuralKEM, and the proposed method. The MLEM-G reconstructions contained significant noise that obscured anatomical details. KEM and STKEM improved the visibility of myocardial structures and reduced noise relative to MLEM-G, but some residual noise remained. DIPRecon tended to oversmooth the images, removing noise but also blurring structural information. NeuralKEM produced images with relatively high levels of noise, and the myocardial region became blurred. The proposed method produced superior visual quality, with markedly reduced noise and well-preserved edges and structural details, confirming its enhanced performance in this preclinical scenario.
Figure 9 highlights the selected myocardial ROI and the corresponding background region, and it plots CR versus background SD and the mean activity versus ROI SD trade-off for the myocardial ROI in frames 16 and 24 from iteration 3 to 30. For MLEM-G, CR and mean ROI activity increased with successive iterations but at the cost of a rapid rise in background and ROI noise. KEM and STKEM achieved better trade-offs between CR and background noise as well as mean activity and ROI noise, indicating that the incorporation of spatiotemporal regularization can improve the accuracy of the reconstructed images. DIPRecon and NeuralKEM achieved relatively high CR and mean ROI activity values but exhibited significant instability during the iteration process. The proposed method outperformed all the other methods, achieving comparable CR and mean ROI activity with substantially reduced noise, further demonstrating its ability to recover contrast while preserving image fidelity under realistic preclinical conditions.
Performance on data at different count levels
To evaluate the robustness of the proposed algorithm at different count levels and to assess its low-dose applicability, we performed additional simulations with reduced total event counts. Figure 10A plots the SNR across all 24 frames for each method, whereas Figure 10B and Figure 10C show the evolution of SNR with iteration number for frames 2 and 24, respectively. Across all conditions, the proposed method yielded higher SNR compared to STKEM, DIPRecon, and NeuralKEM. Notably, even in low-count scenarios, where the conventional methods exhibited increased noise and instability, the proposed method maintained superior image quality, demonstrating its resilience and suitability in low-dose PET imaging.
Effect of method parameters
We also evaluated the impact of critical hyperparameters on the performance of the proposed algorithm. Guided by pilot experiments, hyperparameters were tuned using one-factor-at-a-time sweeps from empirically chosen baselines, selecting the setting that maximized SNR on the reconstructed images. Although this efficient method does not explore all parameter interactions, it consistently identified robust settings that produced high-performing configurations across all compared methods. Figure 11A presents the SNR for frames 2, 12, and 24 as a function of the number of DIP subiterations. Performance gains plateaued beyond a certain number of subiterations; consequently, 200 DIP subiterations were chosen for subsequent experiments as an optimal balance between reconstruction quality and computational cost. Figure 11B shows the SNR as a function of the number of main ADMM iterations, indicating that the reconstruction stabilizes and neither overfits nor degrades as the number of iterations increases. Figure 11C illustrates the effect of the ADMM penalty parameter ρ on reconstructed image quality. A relatively small value of ρ allows the proposed method to achieve superior image quality after sufficient iterations, whereas larger values lead to excessive smoothing of the images. Based on these findings, we empirically selected ρ=1e−2 to yield optimal image quality in the simulation study. Overall, although the method showed some sensitivity to the choice of ρ, it remained robust within a reasonable range of values. Figure 11D shows the effect of dropout in the neural network on reconstructed image quality. Although a higher dropout probability (P=0.5) resulted in a higher SNR, it also caused instability during the iteration process; therefore, P=0.3 was selected in this study. We also evaluated the impact of key hyperparameters on the performance of the reference methods. 
As shown in Figure 11E, the ADMM penalty parameter ρ in DIPRecon had a significant effect on reconstruction quality, and DIPRecon was more sensitive to hyperparameter selection compared to the proposed algorithm. We ultimately chose ρ=5e−2 to achieve high-quality images. Figure 11F demonstrates the impact of the number of DIP subiterations on NeuralKEM reconstruction quality, for which 200 subiterations were selected to obtain optimal results.
Effect of composite images
In this study, we evaluated the impact of using different numbers of composite images as the prior. As shown in Figure 12A and Figure 12B, both single and 3-composite images resulted in progressive SNR improvement with increasing iterations, and the differences between the two approaches were generally small, especially in later frames. However, Figure 12C demonstrates that employing three composite images as the prior yielded time-activity curves in the tumor region that were closer to the ground truth and achieved higher SNR compared to using a single composite image. The use of only one composite image may compromise temporal information in dynamic PET imaging. Based on these findings, we selected three composite images as prior information to better preserve temporal fidelity and enhance reconstruction quality.
Reconstruction time and memory usage
Table 1 summarizes the reconstruction time and memory usage for all methods. For frame-by-frame reconstruction algorithms (MLEM-G, KEM, DIPRecon, and NeuralKEM), we report the per-iteration time as well as CPU and GPU memory usage for frame 16. For algorithms that reconstruct all time frames simultaneously (STKEM and the proposed method), the reported values correspond to the time required to update all frames in one iteration and the associated CPU and GPU memory usage. Total reconstruction time refers to the overall duration required to complete the reconstruction of all time frames over 100 iterations in the simulation studies (2D) and 30 iterations for the real data (3D).
Table 1
| Methods | Per-iteration, 2D (min) | Per-iteration, 3D (min) | Total, 2D (min) | Total, 3D (min) | CPU memory, 2D (GB) | CPU memory, 3D (GB) | GPU memory, 2D (GB) | GPU memory, 3D (GB) |
|---|---|---|---|---|---|---|---|---|
| MLEM-G | 0.0007 | 0.1 | 1.44 | 45.57 | 0.36 | 2.78 | 0 | 0 |
| KEM | 0.0007 | 0.1 | 1.74 | 128.70 | 0.36 | 3.33 | 0 | 0 |
| STKEM | 0.016 | 2.53 | 1.77 | 138.97 | 0.54 | 7.43 | 0 | 0 |
| DIPRecon | 0.093 | 0.9 | 237.8 | 604.23 | 0.37 | 2.98 | 1.52 | 6.28 |
| NeuralKEM | 0.14 | 0.43 | 330.86 | 393.62 | 0.37 | 3.15 | 1.53 | 6.29 |
| Proposed | 0.16 | 3.53 | 17.58 | 167.73 | 0.57 | 7.43 | 1.65 | 16.25 |
For the frame-by-frame reconstruction algorithms MLEM-G, KEM, DIPRecon and NeuralKEM, we report the iteration time for the 16th frame and the memory usage of CPU and GPU. For the simultaneous reconstruction of all time frames by STKEM and our proposed algorithm, we report the time required to update all time frames once and the memory usage of CPU and GPU. Regarding the total reconstruction time, we report the total duration consumed to complete the reconstruction of all time frames. 2D, 2-dimensional; 3D, 3-dimensional; CPU, central processing unit; DIP, deep image prior; DIPRecon, DIP-based reconstruction; GB, gigabyte; GPU, graphics processing unit; KEM, kernelized expectation maximization; MLEM-G, MLEM with Gaussian post-filtering; NeuralKEM, DIP-based kernelized expectation maximization; STKEM, spatiotemporal kernel expectation maximization.
Overall, MLEM-G and KEM showed the highest computational efficiency, with per-iteration times as low as 0.0007 min in 2D and 0.1 min in 3D, and total reconstruction times for MLEM-G of 1.44 min (2D) and 45.57 min (3D). Neural network-based methods (DIPRecon, NeuralKEM, and the proposed method) were more computationally intensive; for example, the proposed method required 0.16 min per iteration in 2D and 3.53 min in 3D, with total reconstruction times of 17.58 and 167.73 min, respectively. Regarding memory usage, the proposed method required up to 7.43 GB of CPU memory and 16.25 GB of GPU memory in 3D, higher than the traditional methods but comparable to the other neural network-based approaches. These results indicate that the proposed method achieves a favorable balance between reconstruction quality and computational cost.
Ablation study
To disentangle the contribution of each component, we evaluated six configurations that incrementally add the spatial kernel, temporal kernel, and DIP to an MLEM baseline. For fairness, the DIP used in the ablations adopts the same network architecture and training protocol as in our full method. We report relative improvements with respect to MLEM on frame 12; Table 2 provides a component-method matrix with check marks. Compared with MLEM, introducing the spatial kernel improved SNR by 8.49 dB and SSIM by 0.07. Adding the temporal kernel on top of the spatial kernel yielded an improvement of 8.94 dB in SNR and 0.08 in SSIM. Using DIP alone resulted in an improvement of 8.76 dB in SNR and 0.08 in SSIM, which is comparable to STKEM, indicating strong regularization from the unsupervised prior. Combining DIP with the spatial kernel produces improvements of 9.68 dB in SNR and 0.12 in SSIM, evidencing complementary effects between the learned prior and spatial context. The full configuration achieved the largest gains: 9.93 dB in SNR and 0.13 in SSIM, confirming that all three components contribute synergistically.
Table 2
| MLEM | Spatial kernel | Temporal kernel | DIP | SNR (dB) | SSIM |
|---|---|---|---|---|---|
| √ |  |  |  | 3.13 | 0.66 |
| √ | √ |  |  | 11.62 | 0.73 |
| √ | √ | √ |  | 12.07 | 0.74 |
| √ |  |  | √ | 11.89 | 0.74 |
| √ | √ |  | √ | 12.81 | 0.78 |
| √ | √ | √ | √ | 13.06 | 0.79 |
Check marks (√) indicate inclusion of the MLEM algorithm, spatial kernel, temporal kernel, and DIP components in each method. SNR (dB) and SSIM are reported for the reconstructed frame 12 under each configuration. MLEM, maximum likelihood expectation maximization; DIP, deep image prior; SNR, signal-to-noise ratios; SSIM, structural similarity index.
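The relative improvements quoted in the ablation text follow directly from the absolute Table 2 values. The short check below recomputes each delta against the MLEM baseline; the configuration labels are descriptive shorthands for the check-marked rows, not names used in the paper.

```python
# Recompute the ablation deltas (SNR in dB, SSIM) relative to the MLEM
# baseline from the absolute values reported in Table 2.
configs = {
    "MLEM":                          (3.13, 0.66),
    "MLEM + spatial":                (11.62, 0.73),
    "MLEM + spatial + temporal":     (12.07, 0.74),
    "MLEM + DIP":                    (11.89, 0.74),
    "MLEM + spatial + DIP":          (12.81, 0.78),
    "Full (spatial + temporal + DIP)": (13.06, 0.79),
}
base_snr, base_ssim = configs["MLEM"]
deltas = {name: (round(snr - base_snr, 2), round(ssim - base_ssim, 2))
          for name, (snr, ssim) in configs.items() if name != "MLEM"}
for name, (dsnr, dssim) in deltas.items():
    print(f"{name}: +{dsnr} dB SNR, +{dssim} SSIM")
# e.g. "MLEM + spatial: +8.49 dB SNR, +0.07 SSIM"
#      "Full (spatial + temporal + DIP): +9.93 dB SNR, +0.13 SSIM"
```

The printed deltas match the improvements stated in the text (8.49/0.07, 8.94/0.08, 8.76/0.08, 9.68/0.12, and 9.93/0.13), confirming internal consistency between the table and the narrative.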
Discussion
In this study, we proposed a dynamic PET reconstruction algorithm integrating the DIP framework with a spatiotemporal kernel, effectively overcoming common limitations observed in traditional methods. The proposed algorithm achieved significant improvements in image quality, showing substantial noise reduction and enhanced preservation of anatomical structures and fine details, in both simulated and preclinical experiments. These improvements were confirmed quantitatively by higher SNR and SSIM and lower MSE compared to conventional methods.
The proposed algorithm has several important advantages over existing kernel-based reconstruction techniques. Unlike KEM and STKEM, which use global kernels that may inadequately represent localized spatiotemporal variations, our method leverages the DIP network to adaptively learn complex spatial and temporal correlations directly from the measured data. This adaptive modeling helps to avoid over-smoothing and preserves high-contrast features. Additionally, the unsupervised nature of the DIP framework eliminates the need for external training data or anatomical priors, thus enhancing the generalizability and practicality of our method for clinical and preclinical PET imaging scenarios. Moreover, the ADMM-based iterative optimization decomposes the reconstruction problem into modular subproblems, facilitating the integration of different reconstruction methods or neural network architectures with minimal adjustments.
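The modularity of the ADMM decomposition can be illustrated with a schematic loop on a toy 1D problem. This is a simplified sketch, not the paper's implementation: `fidelity_update` uses plain MLEM steps with a crude quadratic pull toward the consensus variable, and `denoise` is a moving-average smoother standing in for the DIP network; either module could be swapped for a kernelized EM step or a trained network with minimal changes, which is the point being made above.

```python
import numpy as np

rng = np.random.default_rng(0)

def fidelity_update(x, z, u, y, A, rho, n_sub=3):
    """Subproblem 1 (data fidelity): MLEM steps plus a crude pull toward z - u."""
    sens = A.T @ np.ones(len(y))                              # sensitivity image
    for _ in range(n_sub):
        x = x / sens * (A.T @ (y / np.maximum(A @ x, 1e-9)))  # MLEM update
        x = (x + rho * (z - u)) / (1.0 + rho)                 # penalty pull
        x = np.maximum(x, 1e-9)                               # nonnegativity
    return x

def denoise(v):
    """Subproblem 2 (prior): stand-in for the DIP subiterations."""
    return np.convolve(v, [0.25, 0.5, 0.25], mode="same")

n_pix, n_bins = 32, 48
A = rng.uniform(0.0, 1.0, (n_bins, n_pix))     # toy system matrix
x_true = np.zeros(n_pix); x_true[10:20] = 4.0  # toy activity distribution
y = rng.poisson(A @ x_true).astype(float)      # noisy sinogram

x = np.ones(n_pix); z = x.copy(); u = np.zeros(n_pix)
rho = 1e-2                                     # ADMM penalty parameter
for _ in range(30):                            # main ADMM iterations
    x = fidelity_update(x, z, u, y, A, rho)    # data-fidelity module
    z = denoise(x + u)                         # prior/denoising module
    u = u + x - z                              # dual variable update
```

Because the data-fidelity and prior modules only communicate through `z` and the scaled dual variable `u`, each can be developed and replaced independently, which is what enables plugging different reconstruction methods or network architectures into the same outer loop.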
The performance analysis across iterations demonstrated another advantage of our method: robustness against noise amplification at later iterations. Unlike MLEM-G, KEM, STKEM, and DIPRecon, which exhibited deteriorating performance as the number of iterations increased, our proposed approach maintained stable or even improved image quality with successive iterations. This stability arises from the DIP-based regularization mechanism, which continuously denoises the kernel coefficient images during the iterative reconstruction process, effectively balancing data fidelity and image smoothness. Consequently, our approach achieves a superior bias and variance trade-off, ensuring both stability and accuracy in reconstructed PET images.
In addition to direct comparisons, we conducted a comprehensive ablation study to isolate the individual contributions of each component. Using frame 12 as a representative example, we observed that the inclusion of the spatial kernel, temporal kernel, and DIP denoiser each resulted in substantial improvements over the MLEM baseline. Notably, both the spatial kernel and DIP exhibited strong performance when applied individually, and their combination led to further enhancement of the results. The integration of all three components achieved the highest reconstruction quality. These findings provide strong evidence for the complementary roles of learned priors and spatiotemporal constraints in enhancing dynamic PET reconstruction.
Furthermore, our analysis of data at different count levels confirmed the robustness for low-dose PET imaging scenarios: even with significantly reduced event counts, our method consistently outperformed conventional reconstruction methods by maintaining higher SNR and lower background noise. This characteristic is particularly valuable in clinical practice, where reducing radiation dose without sacrificing diagnostic quality remains an important challenge.
Despite the demonstrated advantages, our study has several limitations warranting further investigation. First, the validation was limited to a simulated 2D phantom and preclinical (small-animal) data, mainly due to the current lack of access to human or large-animal datasets and the ongoing process of obtaining ethical approvals. Future research will aim to validate the method on clinical-scale data, such as human or large-animal studies, across a range of imaging conditions, including variations in tracer kinetics, to further improve the generalizability and robustness of the proposed approach. Second, reconstruction time remains a critical limitation, especially for experimental data; a significant computational burden arises from the forward and backward projections and the network training performed at each iteration. Future work will explore acceleration strategies such as network pretraining and more efficient architectures to reduce reconstruction time. Finally, the ADMM penalty parameter ρ, which controls the regularization strength, requires manual tuning, and its optimal value may differ for other scanners, tracers, or noise conditions. Future studies will pursue adaptive or automated tuning strategies to reduce this sensitivity and enhance robustness across diverse imaging conditions.
In conclusion, our novel dynamic PET reconstruction algorithm, integrating a DIP-based spatiotemporal kernel framework, demonstrated superior performance in image quality, robustness against noise amplification, and suitability for low-dose imaging compared to existing reconstruction methods. This study highlights the potential of DIP-based approaches to significantly advance the field of PET reconstruction, paving the way for enhanced diagnostic accuracy and reduced radiation exposure in clinical and research settings.
Conclusions
This study introduced a novel dynamic PET image reconstruction algorithm that integrates the DIP framework with a spatiotemporal kernel. The algorithm exploits the intrinsic spatial and temporal information embedded within dynamic PET data, eliminating the need for external anatomical priors or extensive training data. Compared with conventional reconstruction methods (MLEM-G, KEM, and STKEM) and deep learning-based methods (DIPRecon and NeuralKEM), the proposed method demonstrated substantial improvements in image quality, effective noise reduction, and superior preservation of structural details. It effectively mitigated image degradation at later iterations and maintained robust performance even under low-count conditions, indicating its potential for reducing radiation dose or shortening scan durations. Furthermore, the modular and flexible architecture of the algorithm facilitates seamless integration into existing PET image reconstruction workflows, enhancing its applicability for clinical diagnostics and PET imaging research.
Acknowledgments
None.
Footnote
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1018/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Tomasi G, Turkheimer F, Aboagye E. Importance of quantification for the analysis of PET data in oncology: review of current methods and trends for the future. Mol Imaging Biol 2012;14:131-46.
- Klein R, Beanlands RS, deKemp RA. Quantification of myocardial blood flow and flow reserve: Technical aspects. J Nucl Cardiol 2010;17:555-70.
- Varnäs K, Varrone A, Farde L. Modeling of PET data in CNS drug discovery and development. J Pharmacokinet Pharmacodyn 2013;40:267-79.
- Bentourkia M, Zaidi H. Tracer Kinetic Modeling in PET. PET Clin 2007;2:267-77.
- Wang G, Qi J. PET image reconstruction using kernel method. IEEE Trans Med Imaging 2015;34:61-71.
- Reader AJ, Zaidi H. Advances in PET Image Reconstruction. PET Clin 2007;2:173-90.
- Shepp LA, Logan BF. The Fourier reconstruction of a head section. IEEE Trans Nucl Sci 1974;21:21-43.
- Kinahan PE, Rogers JG. Analytic 3D image reconstruction using all detected events. IEEE Trans Nucl Sci 1989;36:964-8.
- Shepp LA, Vardi Y. Maximum likelihood reconstruction for emission tomography. IEEE Trans Med Imaging 1982;1:113-22.
- Hudson HM, Larkin RS. Accelerated image reconstruction using ordered subsets of projection data. IEEE Trans Med Imaging 1994;13:601-9.
- Browne J, de Pierro AB. A row-action alternative to the EM algorithm for maximizing likelihood in emission tomography. IEEE Trans Med Imaging 1996;15:687-99.
- Fessler JA. Penalized weighted least-squares image reconstruction for positron emission tomography. IEEE Trans Med Imaging 1994;13:290-300.
- Qi J, Leahy RM. Iterative reconstruction techniques in emission computed tomography. Phys Med Biol 2006;51:R541-78.
- Silverman BW, Jones MC, Wilson JD, Nychka DW. A smoothed EM approach to indirect estimation problems, with particular reference to stereology and emission tomography. J R Stat Soc Ser B Stat Methodol 1990;52:271-324.
- Mumcuoğlu EU, Leahy RM, Cherry SR. Bayesian reconstruction of PET images: methodology and performance analysis. Phys Med Biol 1996;41:1777-807.
- Wang G, Qi J. Penalized likelihood PET image reconstruction using patch-based edge-preserving regularization. IEEE Trans Med Imaging 2012;31:2194-204.
- Hou Q, Huang J, Bian Z, Chen W, Ma J. PET reconstruction via nonlocal means induced prior. J Xray Sci Technol 2015;23:331-48.
- Bowsher JE, Yuan H, Hedlund LW, Turkington TG, Akabani G, Badea A, Kurylo WC, Wheeler CT, Cofer GP, Dewhirst GW, Johnson GA. Utilizing MRI information to estimate F18-FDG distributions in rat flank tumors. In: IEEE Nuclear Science Symposium Conference Record; 2004:2488-92.
- Lu L, Ma J, Feng Q, Chen W, Rahmim A. Anatomy-guided brain PET imaging incorporating a joint prior model. Phys Med Biol 2015;60:2145-66.
- Hutchcroft W, Wang G, Chen KT, Catana C, Qi J. Anatomically-aided PET reconstruction using the kernel method. Phys Med Biol 2016;61:6668-83.
- Ehrhardt MJ, Markiewicz P, Liljeroth M, Barnes A, Kolehmainen V, Duncan JS, Pizarro L, Atkinson D, Hutton BF, Ourselin S, Thielemans K, Arridge SR. PET Reconstruction With an Anatomical MRI Prior Using Parallel Level Sets. IEEE Trans Med Imaging 2016;35:2189-99.
- Wang G. High Temporal-Resolution Dynamic PET Image Reconstruction Using a New Spatiotemporal Kernel Method. IEEE Trans Med Imaging 2019;38:664-74.
- Wang G, Ye JC, Mueller K, Fessler JA. Image Reconstruction is a New Frontier of Machine Learning. IEEE Trans Med Imaging 2018;37:1289-96.
- Gong K, Berg E, Cherry SR, Qi J. Machine Learning in PET: from Photon Detection to Quantitative Image Reconstruction. Proc IEEE Inst Electr Electron Eng 2020;108:51-68.
- Reader AJ, Corda G, Mehranian A, da Costa-Luis CO, Ellis S, Schnabel JA. Deep learning for PET image reconstruction. IEEE Trans Radiat Plasma Med Sci 2020;5:1-25.
- Zhu B, Liu JZ, Cauley SF, Rosen BR, Rosen MS. Image reconstruction by domain-transform manifold learning. Nature 2018;555:487-92.
- Häggström I, Schmidtlein CR, Campanella G, Fuchs TJ. DeepPET: A deep encoder-decoder network for directly solving the PET image reconstruction inverse problem. Med Image Anal 2019;54:253-62.
- Kim K, Wu D, Gong K, Dutta J, Kim JH, Son YD, Kim HK, El Fakhri G, Li Q. Penalized PET Reconstruction Using Deep Learning Prior and Local Linear Fitting. IEEE Trans Med Imaging 2018;37:1478-87.
- Wang X, Zhou L, Wang Y, Jiang H, Ye H. Improved low-dose positron emission tomography image reconstruction using deep learned prior. Phys Med Biol 2021;
- Gong K, Guan J, Kim K, Zhang X, Yang J, Seo Y, El Fakhri G, Qi J, Li Q. Iterative PET Image Reconstruction Using Convolutional Neural Network Representation. IEEE Trans Med Imaging 2019;38:675-85.
- Gregor K, LeCun Y. Learning fast approximations of sparse coding. In: Proceedings of the 27th International Conference on Machine Learning (ICML); 2010:399-406.
- Gong K, Catana C, Qi J, Li Q. PET Image Reconstruction Using Deep Image Prior. IEEE Trans Med Imaging 2019;38:1655-65.
- Ulyanov D, Vedaldi A, Lempitsky V. Deep image prior. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2018:9446-54.
- Li S, Gong K, Badawi RD, Kim EJ, Qi J, Wang G. Neural KEM: A Kernel Method With Deep Coefficient Prior for PET Image Reconstruction. IEEE Trans Med Imaging 2023;42:785-96.
- Zhao Z, Zou H, Liu H, Liang D, Xie S, Peng Q. Dynamic PET image reconstruction using spatiotemporal kernel method with deep image prior. In: IEEE Nuclear Science Symposium (NSS), Medical Imaging Conference (MIC) and Room-Temperature Semiconductor Detector Conference (RTSD); 2024:1-1.
- Qi J, Leahy RM, Cherry SR, Chatziioannou A, Farquhar TH. High-resolution 3D Bayesian image reconstruction using the microPET small-animal scanner. Phys Med Biol 1998;43:1001-13.
- Collins DL, Zijdenbos AP, Kollokian V, Sled JG, Kabani NJ, Holmes CJ, Evans AC. Design and construction of a realistic digital brain phantom. IEEE Trans Med Imaging 1998;17:463-8.
- Wettenhovi VV, Jokivarsi K. Preclinical PET data [dataset]. Kuopio: University of Eastern Finland; 2019. [cited 2024 Dec 4]. Available online: https://erepo.uef.fi/items/8c656145-d7ee-4ecd-8c4c-c3dfe0b028b7