Sparse-view spectral CT reconstruction via a coupled subspace representation and score-based generative model

Jie Guo; Yizhong Wang; Shaoyu Wang; Zhizhong Zheng; Lei Li; Ailong Cai; Bin Yan

doi:10.21037/qims-24-2226

Original Article


Henan Key Laboratory of Imaging and Intelligent Processing, PLA Information Engineering University, Zhengzhou, China

Contributions: (I) Conception and design: J Guo; (II) Administrative support: B Yan, L Li; (III) Provision of study materials or patients: A Cai, Z Zheng; (IV) Collection and assembly of data: Y Wang, S Wang; (V) Data analysis and interpretation: J Guo, A Cai; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Ailong Cai, PhD; Bin Yan, PhD. Henan Key Laboratory of Imaging and Intelligent Processing, PLA Information Engineering University, No. 62 Science Avenue, Zhengzhou 450000, China. Email: cai.ailong@163.com; ybspace@hotmail.com.

Background: Spectral computed tomography (CT) demonstrates significant potential for clinical application by providing rich structural and compositional information about scanned objects. However, sparse-view scanning introduces streak artifacts during image reconstruction, severely degrading image quality. Conventional regularization-based methods exhibit inherent limitations in preserving fine details and edge structures. To address this challenge, this study aimed to enhance reconstruction quality by developing a novel framework that synergistically integrates subspace decomposition with deep generative priors, effectively leveraging both low-rank properties and data-driven representations inherent to spectral CT images.

Methods: We proposed an unsupervised reconstruction framework for sparse-view imaging that synergistically integrates subspace representation with a score-based generative model (SGM) and exploits intrinsic information in the measurement signals. This framework leverages the low-rank prior of the subspace representation to guide the SGM in generating images that closely match the ground truth. Specifically, high-dimensional spectral CT images are first decomposed into orthogonal subspace basis components and corresponding eigen-images, effectively reducing dimensionality while preserving spectral correlations. Subsequently, we employed a data-driven SGM to learn the statistical distribution of the images. This deep prior compensates for the limitations of low-rank regularization in capturing the complex probability distribution of the images. Afterward, we integrated an efficient alternating optimization algorithm that updates the subspace coefficients and the image in turn, enforcing consistency between physical measurements and learned priors. This integration results in a synergistic effect between the model-driven low-rank prior and data-driven distribution learning, significantly enhancing image accuracy and the model’s generalization across diverse datasets.

Results: In the simulation experiment, compared with the best-performing comparison algorithm (Wavelet-SGM), the proposed algorithm increased the peak signal-to-noise ratio (PSNR) by at least 3 dB and the structural similarity index measure (SSIM) by 2.54%. In the real data experiment, the results of the proposed method were the closest to the ground truth, with the minimum error. Both qualitative and quantitative analyses demonstrated the promising and competitive performance of the proposed method in preserving details and reducing streaking artifacts.

Conclusions: Our framework established a new paradigm for spectral CT reconstruction through the synthesis of the model-driven low-rank prior with a data-driven deep prior, which yielded mutual enhancement and complementarity, collectively improving the overall quality of the reconstructed images. This dual mechanism enables comprehensive utilization of measurement signals while preventing hallucinated structures—a critical advancement for clinical applications where artifact-induced misdiagnosis carries significant risks. Our experimental results clearly demonstrate that the proposed method significantly outperforms baseline methods. On the whole, our work introduces a robust and practical sparse-view spectral CT reconstruction technique that exhibits exceptional detail preservation capabilities.

Keywords: Spectral computed tomography (spectral CT); sparse-view reconstruction; low-rank subspace representation; score-based generative model (SGM)

Submitted Oct 14, 2024. Accepted for publication Mar 12, 2025. Published online May 28, 2025.


Introduction

Spectral computed tomography (spectral CT), with enhanced tissue contrast and dose efficiency, holds significant potential for clinical and industrial applications (1-3). However, adherence to the “As Low as Reasonably Achievable” (ALARA) principle (4) in the realm of medical CT necessitates reduced radiation doses, which introduces challenges such as streaking artifacts, noise amplification, and diminished signal-to-noise ratio (SNR). These issues may obscure critical anatomical details and compromise diagnostic accuracy, underscoring the need for advanced reconstruction algorithms to improve the accuracy of reconstructed images, thereby facilitating improved diagnostic and therapeutic decision-making.

When the energy distribution of photons across different channels is overlooked, spectral CT can be considered an extension of monochromatic CT. Existing reconstruction algorithms that have been employed include filtered back projection (FBP), algebraic reconstruction techniques (ART) (5,6), tight frame iterative reconstruction (7), total-variation (TV) regularization (8,9), dictionary learning (10,11), energy-fusion sensing reconstruction (12), and statistical iterative reconstruction (13). However, these strategies fail to fully exploit the non-local self-similarity and global correlation across the spectrum, resulting in suboptimal image quality (14).

Given the high-dimension property of spectral CT images, an emerging trend involves modeling spectral CT images as high-order tensors, thereby establishing the reconstruction model based on low-rank prior (15,16) and subspace representation (17). The low-rank property of spectral CT images stems from the high correlation between different channels and non-local self-similarity within channels. The traditional low-rank regularization priors are based on the model assumption of the image’s macroscopic structure. By leveraging the redundancy and global correlation, the images are projected onto a low-dimensional subspace, simplifying the calculation. This regularization can effectively suppress noise and preserve the main structure of the image. However, it may have limitations when dealing with complex local details such as textures and edges. Subspace representation can achieve dimensionality reduction and decrease computational load. Model-based reconstruction algorithms mainly rely on manually designed priors grounded in subjective assumptions about the image. The difficulty lies in articulating these assumptions, often requiring numerous iterative refinements.

Deep learning (DL)-based methods have emerged as powerful tools for CT reconstruction, learning deep prior knowledge through an end-to-end framework (18). However, supervised DL-based image reconstructions require paired ground truth datasets as labels. Self-supervised DL approaches have achieved reconstruction results comparable to supervised learning (19-21). Wu et al. (22) explored the combination of DL-based methods and iterative reconstruction techniques. Iterative reconstruction schemes based on the learned experts’ assessment-based reconstruction network (23), and extended deep neural network schemes (24), have achieved commendable results in sparse-view CT image reconstruction. In the context of spectral CT imaging, Chen et al. (25) introduced a DL optimization method incorporating the model-based prior knowledge. Despite demonstrating progress, such DL methods often lack a rigorous theoretical foundation in neural network architecture design, potentially undermining their generative capacity and generalizability. From the perspective of statistical inference, it is noteworthy that current DL-based reconstruction methods primarily produce point estimates of images without capturing their underlying probability distribution. The recently proposed score-based generative models (SGM) show promising capabilities in accurately representing data distributions and generating new samples. SGM are a class of DL models used for generating data with complex distributions and have attracted remarkable attention due to their ability to estimate the probability distribution of the image, performing notably well in CT image reconstruction (26,27). The sampling process of SGM, similar to the image reconstruction process, progressively removes noise to recover the underlying image. SGM are built on a data-driven framework, learning the complex probability distributions and subtle features of images from a large amount of training data. This deep prior enables the model to capture richer high-dimensional distributions and local detail features. As a result, the generated images not only conform to global properties such as low rank, but also exhibit more realistic and reasonable details, thus compensating for the deficiencies of traditional regularization in depicting detailed information.

Images reconstructed without and with prior guidance are displayed in Figure 1. Although SGM can capture an accurate data distribution, their probabilistic generative nature may lead to the generation of unwanted structures, often referred to as “false structures” in inverse imaging, when the observation data are severely under-sampled (28,29). Panel A3 in Figure 1 shows an image reconstructed using the SGM (30). Obvious false structures are present in the reconstructed images [as indicated by the red arrow in the regions of interest (ROIs)]. This problem poses major challenges for clinical diagnosis and has the potential to lead to misdiagnosis or missed diagnoses, directly impacting patient care and treatment outcomes. For instance, in pulmonary medical imaging, false structures could lead to misdiagnosis of malignant nodules or lung cancer, thereby inducing unnecessary patient anxiety and resulting in unwanted treatment. The SGM, previously effective in conventional CT, overlooks the global correlation across the spectrum and the non-local self-similarity of spectral CT images.

Figure 1 Reconstructed images from 50 projections. (A1-A4) Display channel 2 images with a display window of [0.001, 0.0026] cm⁻¹; while (B1-B4) present channel 5 images using the identical display parameters. From left to right, the columns respectively represent: ground truth reference, FBP reconstruction, wavelet-SGM method without low-rank prior guidance, and our proposed algorithm incorporating low-rank prior guidance. Red and yellow bounding boxes highlight ROIs, with corresponding zoomed views shown in the upper-right insets. Red arrows indicate areas demonstrating significant structural differences between reconstruction methods. FBP, filtered back projection; ROIs, regions of interest; SGM, score-based generative model.

To address these limitations, our work integrated subspace representation as the regularization with SGM to capture the low-rank property of spectral CT images mentioned above. The low-rank property of high-dimensional spectral CT images is leveraged, allowing the image to be projected onto a low-dimensional subspace via the product of an orthogonal basis and representation coefficients. A model-driven low-rank prior was designed, encapsulating the non-local self-similarity and global correlation of spectral CT images. Concurrently, the SGM encodes the probability distribution of the spectral CT images, extracting the crucial features and complex textural structures embedded within them. The experimental results, shown in Figure 1 (A4), indicate that the fidelity of the reconstructed image is significantly improved by the proposed method. This robust, manually designed prior knowledge, grounded in the properties of spectral CT images, not only effectively constrains the search scope of the solution space but also helps the SGM generate images with more faithful structures and finer details, avoiding the generation of false structures.

The proposed framework achieves deep integration of model-driven tensor low-rank priors and data-driven generative priors within an iterative optimization scheme, comprehensively distilling image information from measurement signals. Our principal innovations include the following:

We propose an unsupervised framework for spectral CT image reconstruction, synergistically combining model- and data-driven strategies via SGM, eliminating dependency on labeled training data while outperforming supervised methods. The SGM can describe the data generation process, which means that the generated images represent not merely a point estimation, but an embodiment of the probability distribution for the image. A key focus is the formulation of the robust reconstruction models and optimization to improve both interpretability and accuracy of the reconstruction.

The integration of a deep generative model and subspace representation offers dual advantages. On the one hand, it suppresses noise while preserving and enhancing image details, thereby providing a more comprehensive description of spectral CT images. On the other hand, subspace representation is theoretically rigorous but limited by its simplified assumptions, whereas deep generative models, despite lacking theoretical guarantees, demonstrate robust empirical performance. Their combination achieves a complementary balance between theoretical assurance and practical effectiveness. The high-dimensional spectral CT images are dimensionally reduced using subspace decomposition, thereby boosting stability during the training process. The incorporation of data fidelity improves the precision of model sampling, avoiding the generation of unwanted structures. The SGM, trained on the American Association of Physics in Medicine (AAPM) dataset, performed well when processing preclinical mouse datasets, supporting the robust generalization of the proposed image reconstruction framework. Notably, it showed the ability to adapt to different probability distributions. Comprehensive experiments emphasize the superiority of our method over representative model-driven and state-of-the-art DL-based reconstruction algorithms, highlighting its potential to serve as a promising strategy in spectral CT imaging.

The structure of this paper is as follows: the second section introduces the method, primarily focusing on subspace identification, the SGM, and the solution of the model. The third section is the experiment and analysis of the experimental results. The paper concludes with a brief discussion. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2226/rc).

Methods

Spectral CT imaging model

The spectral CT imaging can be approximated by a linear model, as shown in the following formula:

$y_{s} = A x_{s} + ε_{s}, s = 1, \dots, S$ [1]

where $x_{s} \in ℝ^{N_{h} N_{w}}$ and $y_{s} \in ℝ^{U_{y} N_{v}}$ represent the vectorized image and projection data of the s-th energy channel, respectively. $N_{h}$ and $N_{w}$ are the height and width of the image. $U_{y}$ and $N_{v}$ are the numbers of detector elements and projection views, respectively. The projection system matrix $A$ is the discrete expression of the line integral of the attenuation coefficient of the scanned object. $ε_{s}$ denotes the observation noise. The tensor $X \in ℝ^{N_{h} \times N_{w} \times S}$ is used to represent the spectral CT image composed of $S$ energy channels, and the tensor $Y \in ℝ^{U_{y} \times N_{v} \times S}$ represents the projection data. In the spectral CT image reconstruction model, a regularization term $Φ (X)$ is introduced to mitigate the impact of noise and streaking artifacts on the reconstruction process. The corresponding model is presented as follows:

$\underset{X}{\min} \frac{1}{2} \sum_{s = 1}^{S} {‖ A x_{s} - y_{s} ‖}_{2}^{2} + μ Φ (X)$ [2]

where the first term represents the data fidelity, denoting the discrepancy between the target image and the measurement, the second is the regularization term that is reflected in the optimization process of the subspace representation and the SGM, and $μ$ denotes the regularization parameter. The data fidelity term in the above equation primarily ensures a high degree of consistency between the reconstructed image and the observed data. By minimizing this term, the error between the projection of the reconstructed image and the observed projection can be constrained within a certain range, thereby enabling the reconstructed image to accurately reflect the true structure of the object.
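The linear model of Eq. [1] and the data-fidelity term of Eq. [2] can be sketched numerically as follows. This is a minimal illustration only: the toy sizes, the random non-negative matrix standing in for the system matrix $A$, and the helper names `data_fidelity` and `data_fidelity_grad` are assumptions, not part of the original study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: image pixels, measurements, energy channels.
n_pix, n_meas, S = 16, 10, 4

# Random non-negative matrix standing in for the projection system matrix A.
A = rng.random((n_meas, n_pix))

# Columns of X_true are the vectorized channel images x_s; Y follows Eq. [1].
X_true = rng.random((n_pix, S))
noise = 0.01 * rng.standard_normal((n_meas, S))
Y = A @ X_true + noise


def data_fidelity(A, X, Y):
    """0.5 * sum_s ||A x_s - y_s||_2^2, the first term of Eq. [2]."""
    residual = A @ X - Y
    return 0.5 * np.sum(residual ** 2)


def data_fidelity_grad(A, X, Y):
    """Gradient of the data-fidelity term with respect to X: A^T (A X - Y)."""
    return A.T @ (A @ X - Y)


f_true = data_fidelity(A, X_true, Y)               # only the noise energy remains
f_zero = data_fidelity(A, np.zeros_like(X_true), Y)
print(f"fidelity at ground truth: {f_true:.4e}, at zero image: {f_zero:.4e}")
```

At the ground-truth image the term reduces to half the noise energy, which is why minimizing it alone cannot separate signal from noise and a regularizer $Φ (X)$ is needed.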

Subspace representation

Images derived from spectral CT, acquired via a single scan, exhibit different attenuation coefficients across different energy channels. However, from a physical viewpoint, these images reveal similar structures across different energy channels, indicating global correlation and non-local self-similarity. This observation implies that the multi-dimensional tensor derived from spectral CT images possesses the low-rank property. This insight motivates mapping high-dimensional spectral CT images onto a low-dimensional manifold (31). By harnessing this inherent self-similarity among different energy channels, specific energy channels with a high SNR can be pinpointed to represent other energy channels linearly (32). This can be accomplished via algorithms such as hyperspectral signal identification by minimum error (HySime) (33) or singular value decomposition (SVD) (34) to minimize the objective function and ascertain the optimal projection subspace. It is assumed that the image $X \in ℝ^{S \times N_{p}}$ can be approximated by $X = E Z$ , where the columns of $E = [e_{1}, e_{2}, ..., e_{k}] \in ℝ^{S \times k}$ form a basis of a $k$ -dimensional ( $k \leq S$ ) subspace, which can well preserve the spectral information of the image. The variable $Z \in ℝ^{k \times N_{p}}$ denotes the representation coefficients of the image $X$ under the basis $E$ . The row vectors of $Z$ , termed eigen-images, embody the shared feature information across all channels in the pixel space, thereby preserving a greater amount of spatial domain information of the image. Without loss of generality, it is assumed that the basis $E$ is column-orthogonal, namely, $E^{T} E = I_{k}$ , where $I_{k}$ is the identity matrix of order $k$ . Therefore, the spectral CT image can be dimensionally reduced and compressed into the eigen-subspace for processing through the transformation $Z = E^{T} X$ , which overcomes the disadvantage of having to implement iterative optimization in both the spatial and spectral domains, significantly alleviating the computational burden and enhancing efficiency. According to the study by Zhuang et al. (35), the extracted eigen-images $Z$ can represent the spectral CT image $X$ effectively, allowing for reconstruction of the spectral CT image through the reassembly of the denoised eigen-images. The basis $E$ can be solved by the following formula:

$E = \underset{E}{\arg\min} {‖ X - E Z ‖}_{F}^{2}, \quad \text{s.t.} \ E^{T} E = I_{k}$ [3]

where ${‖ \cdot ‖}_{F}$ denotes the Frobenius norm. The optimization problem can be solved by the HySime technique (33). Given the estimated basis $E$ , optimizing the eigen-images $Z$ via the SGM becomes achievable.
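The subspace decomposition described above can be illustrated with a truncated SVD, one of the two options the text mentions. The sketch below is an assumption-laden toy: the synthetic rank-$k$ data, the sizes, and the helper name `estimate_basis` are illustrative and do not reproduce the HySime procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
S, N_p, k = 8, 500, 3   # channels, pixels, subspace dimension (toy values)

# Synthetic rank-k spectral image X (S x N_p) with a small noise component.
E_true, _ = np.linalg.qr(rng.standard_normal((S, k)))
Z_true = rng.standard_normal((k, N_p))
X = E_true @ Z_true + 0.01 * rng.standard_normal((S, N_p))


def estimate_basis(X, k):
    """Return the k leading left singular vectors of X as the basis E."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]


E = estimate_basis(X, k)          # column-orthogonal: E^T E = I_k
Z = E.T @ X                       # eigen-images, shape (k, N_p)
X_hat = E @ Z                     # reassembled image
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error with k={k}: {rel_err:.4f}")
```

For a fixed $k$, the leading left singular vectors minimize ${‖ X - E Z ‖}_{F}^{2}$ under the orthogonality constraint (the Eckart-Young result), which is why SVD is a natural way to solve Eq. [3].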

SGM

There are two types of SGM: denoising score matching (DSM) (36,37) and stochastic differential equations (SDE) (38). The SGM captures intricate details and textures in medical images, thus facilitating more accurate diagnostics. We use the trained SGM to generate high-quality spectral CT images by sampling from the reverse denoising process. The forward process of SDE can be represented as follows:

$d x = f (x, t) d t + g (t) d w$ [4]

where $f (\cdot, t), g (t)$ are the drift coefficient and the diffusion coefficient, respectively, and $w$ represents standard Brownian motion. Song et al. (26) use SDE to process medical images, where the drift coefficient and diffusion coefficient are represented as follows:

$f (\cdot, t) = 0, g (t) = \sqrt{d [σ^{2} (t)] / d t}$ [5]

where $σ (t) > 0$ is a monotonically increasing function with $t$ . When starting from noisy $x_{T} \sim p_{T} (x)$ and gradually removing the noise, the clean image is obtained. This sampling process is similar to image reconstruction, which can be achieved through the reverse-time SDE process. The specific reverse-time SDE expression is as follows (39):

$d x = [f (x, t) - g^{2} (t) \nabla_{x} \log p_{t} (x)] d t + g (t) d \tilde{w}$ [6]

where $\tilde{w}$ represents standard Brownian motion running backward in time. Substituting Eq. [5] into Eq. [6] yields the following:

$d x = - \frac{d [σ^{2} (t)]}{d t} \nabla_{x} \log p_{t} (x) d t + \sqrt{\frac{d [σ^{2} (t)]}{d t}} d \tilde{w}$ [7]

In order to represent the score function $\nabla_{x} \log p_{t} (x)$ , we apply the DSM strategy to substitute $\nabla_{x} \log p_{t} (x (t) | x (0))$ for $\nabla_{x} \log p_{t} (x)$ . The optimization objective is as follows:

$θ^{*} = \underset{θ}{\arg\min} \mathbb{E}_{t} \{ η (t) \mathbb{E}_{x (t) \sim p (x (t) | x (0)), x (0) \sim p_{data}} [{‖ s_{θ} (x (t), t) - \nabla_{x (t)} \log p_{t} (x (t) | x (0)) ‖}_{2}^{2}] \}$ [8]

where $η (t)$ is a positive weighting function and $s_{θ} (x, t)$ , termed the score network, is a neural network trained to approximate the gradient of the log-density, $\nabla_{x} \log p_{t} (x)$ . Once training is finished, the estimate $s_{θ^{*}} (x, t)$ of the score function is obtained. We sample from the SDE using the Euler-Maruyama method, which solves Eq. [7] with $s_{θ^{*}} (x (t), t) \approx \nabla_{x} \log p_{t} (x)$ (36).
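To make the sampling procedure concrete, the following sketch applies the Euler-Maruyama discretization of Eq. [7] to a one-dimensional toy problem. Everything specific here is an assumption for illustration: the geometric noise schedule, the values `SIGMA_MIN` and `SIGMA_MAX`, and the Gaussian data distribution, whose exact score stands in for a trained score network so that the result can be checked in closed form.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0          # assumed noise-scale range
LOG_RATIO = np.log(SIGMA_MAX / SIGMA_MIN)


def sigma(t):
    """Monotonically increasing noise scale sigma(t) for t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def g(t):
    """Diffusion coefficient of Eq. [5]: sqrt(d[sigma^2(t)]/dt)."""
    return sigma(t) * np.sqrt(2.0 * LOG_RATIO)


def dsm_target(x_t, x_0, t):
    """Score of the Gaussian perturbation kernel p(x(t) | x(0)) used in Eq. [8]."""
    return -(x_t - x_0) / sigma(t) ** 2


# Toy data distribution N(MU, S0^2). Its perturbed marginal is
# N(MU, S0^2 + sigma(t)^2), so the exact score is known and replaces
# the trained network s_theta*.
MU, S0 = 2.0, 0.5


def score(x, t):
    return -(x - MU) / (S0 ** 2 + sigma(t) ** 2)


def reverse_sample(n_samples, n_steps, rng):
    """Euler-Maruyama integration of the reverse-time SDE from t=1 to t=0."""
    dt = 1.0 / n_steps
    x = SIGMA_MAX * rng.standard_normal(n_samples)   # x_T ~ N(0, sigma_max^2)
    for i in range(n_steps, 0, -1):
        t = i * dt
        noise = rng.standard_normal(n_samples)
        x = x + g(t) ** 2 * score(x, t) * dt + g(t) * np.sqrt(dt) * noise
    return x


samples = reverse_sample(5000, 1000, np.random.default_rng(0))
print(f"sample mean {samples.mean():.3f}, sample std {samples.std():.3f}")
```

The update inside the loop has the same form as the sampling step used later in the reconstruction algorithm; in practice the analytic `score` would be replaced by the trained network.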

Motivation

The SGM, by revealing the statistical probability structure of images, is capable of generating high-quality images. However, it may occasionally generate false structures. To rectify this, we incorporate appropriate prior knowledge to guide the generation of the SGM, pushing the output images closer to the ground truth. The proposed method combines subspace representation with the SGM in the sparse optimization reconstruction process of spectral CT, leveraging the strengths of each while ensuring mutual enrichment. A toy illustration of the motivation is shown in Figure 2A. Let $R (x)$ denote the regularization prior of the subspace representation, and $p (x)$ represent the probability distribution of the image. Suppose we define a convex set on the real axis as $Ω : = [x_{1}, x_{2}]$ containing the optimal solution $x_{*} \in Ω$ . With insufficient measurements, it is difficult to estimate the solution accurately, so it is necessary to regularize with the priors $p (x)$ or $R (x)$ . Within $Ω$ , the point $x_{3}$ that maximizes the probability distribution $p (x)$ may deviate from $x_{*}$ , implying that the optimal solution may not be $x_{3}$ but lies in its vicinity. Likewise, the point $x_{4}$ that minimizes the regularization prior $R (x)$ is not the perfect solution either. However, by iteratively balancing $p (x)$ and $R (x)$ , the final estimate $x_{n}$ may converge to $x_{*}$ .

Figure 2 The illustration of the motivation and the probabilistic graphical model. (A) Illustration of the motivation. The feasible region for the solution is $Ω$ , denoted as $[x_{1}, x_{2}]$ . The guidance provided by $R (x)$ assists the maximum value point $x_{3}$ of $p (x)$ towards the optimal solution $x^{*}$ of the spectral CT reconstruction model gradually. (B) Probabilistic graphical model for image reconstruction. CT, computed tomography.

Probabilistic graphical reconstruction model

The goal of spectral CT imaging is to maximize the posterior distribution $p (x | y)$ given the measurement $y$ . From the perspective of statistical modeling, image reconstruction is essentially a problem of estimating variables $x$ (which could represent the pixel values of the image) given some observed data $y$ (which could be the corrupted or partial observation of the image). This is typically formulated as a conditional probability problem, which can be represented as follows:

$x^{*} = \arg\max_{x} p (x | y)$ [9]

In this work, as illustrated in Figure 2B, a subspace representation is introduced, constructing a probabilistic graphical model concerning $y$ in observation space, $x$ in pixel space, and $z$ in eigen-subspace. It is an undirected probabilistic graphical model (40). The approximate formulation of the joint distribution for these related variables can be stated as follows:

$\begin{array}{l} p (x, y, z) = \frac{1}{Γ} Ψ_{1} (x, y) Ψ_{2} (x, z) \propto \frac{1}{Γ} p (x, y) p (x, z) \\ = \frac{1}{Γ} p (y | x) p (x) p (x | z) p (z) \end{array}$ [10]

where $Γ$ is a normalization constant, called the partition function, $Ψ (\cdot, \cdot)$ denotes the edge potential function, $p (y | x)$ is the likelihood, that is, the probability of observing the data given the variables, $p (x | z)$ is the conditional distribution, and $p (x)$ is the prior. We formulate a maximum a posteriori (MAP) probability model to infer the image $x$ and eigen-image $z$ from the measurements $y$ , as follows:

$\arg\max_{x, z} p (x, z | y) = \arg\max_{x, z} p (x, y, z) / p (y)$ [11]

Considering Eq. [10], we obtain the following formula:

$p (x, z | y) = p (y | x) p (x | z) p (x) p (z) / p (y)$ [12]

where $p (y)$ is the evidence, the probability of observing the data. The probabilistic framework allows us to incorporate prior knowledge about the image (e.g., sparsity, smoothness) into the reconstruction, which can significantly improve the quality of the image, especially when $y$ is noisy or incomplete.

Optimization algorithm

We then solve the probabilistic graphical model with an optimization algorithm. The schematic illustration of the proposed method is depicted in Figure 3. We solve the following optimization problem to obtain the estimates of the image $x^{*}$ and eigen-image $z^{*}$ :

$\begin{array}{l} (x^{*}, z^{*}) = \underset{x, z}{\arg\max} \log p (x, z | y) \\ = \underset{x, z}{\arg\max} \log p (y | x) + \log p (x | z) + \log p (z) + \log p (x) \end{array}$ [13]

Figure 3 Visual representation of the proposed method. The low-rank spectral CT images are projected onto a low-dimensional subspace and subjected to preliminary denoising before being reassembled. Subsequently, the reconstructed spectral CT images are fed into the SGM. Finally, the model-driven and data-driven spectral CT image priors are embedded in an iterative optimization algorithm to mutually reinforce each other. CT, computed tomography; SDE, stochastic differential equations; SGM, score-based generative model.

Regarding the solution of $z$ , we then have:

$z^{*} = \underset{z \sim \mathrm{prior}}{\arg\max} \log p (x | z) + \log p (z)$ [14]

Given the following prior relationship between $X, E, Z$ as

$X = E Z, E^{T} E = I_{k}$ [15]

The optimization problem of Eq. [14] can be reformulated as follows:

$z^{*} = \underset{z}{\arg\min} {‖ x - E z ‖}_{F}^{2} + β Κ (z)$ [16]

where the regularization term $Κ (z)$ is used to encode the prior $p (z)$ . In this work, for an estimated basis $E$ solved by Eq. [3], a one-step iteration is applied to find the approximation of $z^{*}$ as

$z = \mathrm{Den} (E^{T} x, β)$ [17]

where $\mathrm{Den} (\cdot, \cdot)$ can be any denoising algorithm including model-based or pre-trained DL-based ones, and $β > 0$ is used to control the denoising strength. Here, the block-matching and 3D filtering (BM3D) technique (41) is chosen as a plug-and-play module and incorporated into the reconstruction process.
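The one-step update of Eq. [17] can be sketched as project, denoise, and reassemble. The example below is illustrative only: a 3×3 mean filter is swapped in for BM3D, the blending rule that turns $β$ into a denoising strength is a hypothetical choice, and the synthetic smooth eigen-images and sizes are assumptions.

```python
import numpy as np


def box_blur(img):
    """3x3 mean filter with edge padding (a simple stand-in for BM3D)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    acc = np.zeros_like(img, dtype=float)
    for di in range(3):
        for dj in range(3):
            acc += padded[di:di + h, dj:dj + w]
    return acc / 9.0


def den(eigen_images, beta):
    """Hypothetical Den(., beta): blend each eigen-image with its blurred copy."""
    blurred = np.stack([box_blur(z) for z in eigen_images])
    return (1.0 - beta) * eigen_images + beta * blurred


rng = np.random.default_rng(0)
S, H, W, k = 6, 32, 32, 2
yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

# Smooth synthetic eigen-images and a random orthonormal spectral basis.
Z_true = np.stack([np.sin(2 * np.pi * xx / W), np.cos(2 * np.pi * yy / H)])
E_true, _ = np.linalg.qr(rng.standard_normal((S, k)))
X_clean = np.tensordot(E_true, Z_true, axes=1)            # shape (S, H, W)
X_noisy = X_clean + 0.1 * rng.standard_normal(X_clean.shape)

# Basis from the SVD of the unfolded image, then Z = E^T X.
U, _, _ = np.linalg.svd(X_noisy.reshape(S, -1), full_matrices=False)
E = U[:, :k]
Z = np.tensordot(E.T, X_noisy, axes=1)                    # shape (k, H, W)
Z_den = den(Z, beta=0.8)                                  # Eq. [17]

X_proj = np.tensordot(E, Z, axes=1)                       # projection only
X_den = np.tensordot(E, Z_den, axes=1)                    # projection + denoising
err_noisy = np.linalg.norm(X_noisy - X_clean)
err_proj = np.linalg.norm(X_proj - X_clean)
err_den = np.linalg.norm(X_den - X_clean)
print(f"noisy {err_noisy:.2f} | projected {err_proj:.2f} | denoised {err_den:.2f}")
```

Because only $k$ eigen-images are denoised instead of $S$ channel images, the cost of the denoising step scales with the subspace dimension rather than with the number of energy channels.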

For the solution of $x$ , we need to solve the following problem:

$x^{*} = \underset{x}{\arg\max} \log p (x) + \log p (x | z) + \log p (y | x)$ [18]

Under the assumption that the noise $e = y - A x$ follows a zero-mean Gaussian distribution, we have:

$\log p (y | x) \propto - \sum_{s = 1}^{S} {‖ A x_{s} - y_{s} ‖}_{2}^{2}$ [19]

Considering the above constraints, we have:

$\log p (x) + \log p (x | z) + \log p (y | x) \propto \log p (x) - λ {‖ x - E z ‖}_{F}^{2} - ρ \sum_{s = 1}^{S} {‖ A x_{s} - y_{s} ‖}_{2}^{2}$ [20]

Noting that $\nabla_{x} \log p_{t} (x) \approx s_{θ} (x, t)$ , we compute the gradient of Eq. [20] with respect to $x$ and denote it as follows:

$h (x, t) = s_{θ} (x, t) - λ \nabla_{x} {‖ x - E z ‖}_{F}^{2} - ρ \nabla_{x} \sum_{s = 1}^{S} {‖ A x_{s} - y_{s} ‖}_{2}^{2}$ [21]

The first two terms in the above equation serve as regularization, incorporating the deep prior and the low-rank prior, both of which are essential for image denoising. Specifically, the low-rank prior is characterized by the low-rank subspace representation, and the BM3D algorithm is employed for the preliminary denoising of the eigen-images. Subsequently, the SGM is utilized to deeply explore the structural distribution and subtle features of the data, thereby enhancing and refining the image details. This process gradually improves the quality of the generated images, making them more faithful to the ground truth.
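As a worked expansion of Eq. [21], offered as a sketch rather than a derivation stated in the original, the two quadratic gradients can be written out explicitly. Here $E z$ is treated as fixed during the update of $x$, and $(E z)_{s}$ and $[\cdot]_{s}$ are notation introduced only for this illustration to denote channel $s$ of the corresponding quantity.

```latex
\nabla_{x} \| x - E z \|_{F}^{2} = 2 (x - E z), \qquad
\nabla_{x_{s}} \sum_{s' = 1}^{S} \| A x_{s'} - y_{s'} \|_{2}^{2} = 2 A^{T} (A x_{s} - y_{s}),

h_{s} (x, t) = [ s_{\theta} (x, t) ]_{s}
             - 2 \lambda \, ( x_{s} - (E z)_{s} )
             - 2 \rho \, A^{T} (A x_{s} - y_{s}).
```

The data-fidelity gradient decouples across channels because each $x_{s}$ appears in only one summand, which is what allows the last two terms to be handled channel by channel.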

Regarding the solution of Eq. [21], it involves two sub-steps. Firstly, we sample from the SGM for the updated $x^{(n + 1 / 2)}$ , and then employ a separable quadratic surrogate (SQS) method to solve for the second and third terms in Eq. [21]. Specifically, for the first step, sampling is achieved as follows:

$x_{t}^{(n + 1 / 2)} = x_{t}^{(n)} + g^{2} (t) s_{θ^{*}} (x_{t}^{(n)}, t) Δ t + g (t) \sqrt{| Δ t |} ζ$ [22]

where $ζ \sim N (0, I)$ and $n$ denotes the iteration index of the outer loop. We handle the second and third terms in Eq. [21] for each energy channel. For the convenience of subsequent description, we recast the problem defined by the second and third terms of Eq. [21] as the minimization of the following function:

$g_{1} (x) = \sum_{s = 1}^{S} \left( {‖ A x_{s} - y_{s} ‖}_{2}^{2} + γ {‖ x_{s} - E z_{s} ‖}_{F}^{2} \right)$ [23]

The parameters $λ$ , $ρ$ are combined into a single parameter $γ$ . The second sub-step of Eq. [23] can be updated by SQS,

$x_{s}^{(n + 1)} = x_{s}^{(n + 1 / 2)} - c_{s}^{(n + 1 / 2)} / (A^{T} A \mathbf{1} + γ)$ [24]

where $c_{s}^{(n + 1 / 2)} = A^{T} (A x_{t, s}^{(n + 1 / 2)} - y_{s}) + γ (x_{t, s}^{(n + 1 / 2)} - E^{(n)} z_{s}^{(n)})$ . Here $\mathbf{1}$ is the all-ones vector, and the division is performed pixel-wise.
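The SQS update of Eq. [24] can be sketched for a single energy channel as below. The random non-negative matrix standing in for $A$, the toy sizes, the vector `ez` standing in for $E z_{s}$, and the helper names are assumptions for illustration; the objective is written with factors of one half so that its gradient equals $c_{s}$ exactly.

```python
import numpy as np


def sqs_step(x, A, y, ez, gamma):
    """One separable-quadratic-surrogate update in the form of Eq. [24]."""
    c = A.T @ (A @ x - y) + gamma * (x - ez)           # gradient term c_s
    denom = A.T @ (A @ np.ones_like(x)) + gamma        # A^T A 1 + gamma
    return x - c / denom                               # pixel-wise division


def objective(x, A, y, ez, gamma):
    """0.5*||A x - y||^2 + 0.5*gamma*||x - ez||^2, whose gradient is c_s."""
    return 0.5 * np.sum((A @ x - y) ** 2) + 0.5 * gamma * np.sum((x - ez) ** 2)


rng = np.random.default_rng(0)
n_meas, n_pix, gamma = 30, 20, 0.5
A = rng.random((n_meas, n_pix))                        # non-negative toy system matrix
x_true = rng.random(n_pix)
y = A @ x_true
ez = x_true + 0.05 * rng.standard_normal(n_pix)        # stands in for E z_s

x = np.zeros(n_pix)
history = [objective(x, A, y, ez, gamma)]
for _ in range(50):
    x = sqs_step(x, A, y, ez, gamma)
    history.append(objective(x, A, y, ez, gamma))

print(f"objective: start {history[0]:.3f}, after 50 steps {history[-1]:.3f}")
```

For a non-negative $A$, the diagonal matrix built from $A^{T} A \mathbf{1}$ majorizes $A^{T} A$, so each step cannot increase the objective; this is the usual reason the pixel-wise denominator is chosen this way.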

In this study, the simulated data provided by the Mayo clinic for the AAPM Low-Dose CT Grand Challenge (42) and the preclinical data provided by the study of Niu et al. (20) were used to verify the effectiveness of the algorithm proposed in this paper. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study has been granted exemption from the requirement for ethical approval and patient consent by the Ethics Committee of PLA Information Engineering University for not involving human trials, clinical diagnosis, treatment information, sensitive personal information, commercial interests, or causing harm to the human body as well as the retrospective nature of this study.

The optimization process of the algorithm is summarized in Algorithm 1.

Algorithm 1 The proposed reconstruction method

Stage 1: Training of DSM by AAPM Challenge Data.

Train the parameters $θ$ of $s_{θ} (x, t)$ with optimization objective Eq. [8], using AAPM challenge dataset.

Return: trained score-net $s_{θ^{*}} (x, t)$ with optimized parameters $θ^{*}$ .

Stage 2: Image reconstruction of spectral CT.

Require: $s_{θ^{*}} (\cdot, \cdot)$ , $y, T, N, Δ t \leftarrow 1 / T$ , $ζ ~ N (0, I)$

For $n = 1 : N$

1: Update $E^{(n)}$ by SVD with Eq. [3]

2: Update $Z^{(n)}$ with Eq. [16]

3: Reassemble $X^{(n)}$ with $X^{(n)} = E^{(n)} Z^{(n)}$

4: For $t = T - 1$ to 0 do

5: $x_{t - Δ t} = x_{t} + g^{2} (t) s_{θ^{*}} (x_{t}, t) Δ t + g (t) \sqrt{Δ t} ζ$

6: Update $X^{(n)}$ with Eq. [24]

End

Return: reconstruction image $X$ .
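Step 5 of Algorithm 1 can be sketched in isolation as follows. Because the diffusion coefficient $g (t)$ and the trained score network are not reproduced in this section, the sketch assumes a variance-exploding noise schedule with hypothetical bounds `SIGMA_MIN` and `SIGMA_MAX`, and replaces the trained network $s_{θ^{*}}$ with the analytic score of a Gaussian centered on a known target `mu`. It only illustrates the form of the update and is not the authors' implementation.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0   # assumed noise range, not from the paper

def sigma(t):
    """Assumed geometric noise schedule on t in (0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def g(t):
    """Diffusion coefficient matching the assumed schedule."""
    return sigma(t) * np.sqrt(2.0 * np.log(SIGMA_MAX / SIGMA_MIN))

def toy_score(x, t, mu):
    """Exact score of N(mu, sigma(t)^2 I); stands in for the trained network."""
    return -(x - mu) / sigma(t) ** 2

def reverse_step(x, t, dt, score, rng):
    """x_{t - dt} = x_t + g(t)^2 s(x_t, t) dt + g(t) sqrt(|dt|) zeta."""
    zeta = rng.standard_normal(x.shape)
    return x + g(t) ** 2 * score(x, t) * dt + g(t) * np.sqrt(abs(dt)) * zeta

rng = np.random.default_rng(0)
mu = np.linspace(0.0, 1.0, 64)      # hypothetical target signal
T = 1000
dt = 1.0 / T
x = SIGMA_MAX * rng.standard_normal(mu.shape)   # start from the noise prior
for i in range(T - 1, -1, -1):
    t = (i + 1) * dt
    x = reverse_step(x, t, dt, lambda x_, t_: toy_score(x_, t_, mu), rng)
```

With the analytic score, the iterate contracts toward `mu` as the noise level decreases, ending within a few multiples of `SIGMA_MIN` of the target. In the proposed method the score instead comes from the network trained in Stage 1, and each sampling step is combined with the data-consistency update of Eq. [24].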

Results

Experimental design

To assess the proposed algorithm, extensive experiments were conducted. This section illuminates key aspects of the experimental procedure, encompassing data preparation with both simulated and preclinical experimental datasets. Furthermore, the comparative algorithms employed and their associated assessment metrics are included.

The dataset provided by the Mayo Clinic for the AAPM Low-Dose CT Grand Challenge (42), which includes data from 10 patients, was used in this study, with 9 patients for training and 1 patient for testing. An 8-channel simulated spectral CT dataset comprising 5,410 images of 512×512 pixels was constructed from the AAPM dataset, where the X-ray spectrum at 120 kVp was divided into eight energy channels: [52-64), [64-72), [72-80), [80-88), [88-96), [96-104), [104-112) and [112-120] keV. The size of the reconstructed images was set to 512×512. The datasets for sparse-view CT reconstruction were derived from the full-view projection data by extracting 30, 60, and 90 views. The distances from the source to the rotation center and from the source to the detector were set to 1,000 and 1,500 mm, respectively. The detector consisted of 1,024 elements, each measuring 0.388 mm. Poisson noise was added to the projection data as follows:

$I_{p} = \mathrm{Poisson} (I_{0} \exp (- y_{s}))$ [25]

where $I_{0}$ denotes the incident flux of the X-ray, and is set to 5×10⁶ in our experiments.
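The noise model of Eq. [25] can be sketched as follows. The conversion of noisy photon counts back to line integrals by a negative logarithm, the clipping of zero counts, and the toy sinogram are assumptions of this example; the paper specifies only the Poisson model and the value of $I_{0}$ .

```python
import numpy as np

def add_poisson_noise(y, i0=5e6, rng=None):
    """Simulate noisy line integrals from clean ones via Eq. [25]."""
    rng = np.random.default_rng() if rng is None else rng
    counts = rng.poisson(i0 * np.exp(-y))   # I_p = Poisson(I_0 exp(-y_s))
    counts = np.maximum(counts, 1)          # assumed guard against log(0)
    return -np.log(counts / i0)             # assumed log-transform back

rng = np.random.default_rng(0)
# Toy sinogram (hypothetical values): 90 views, 1,024 detector elements.
y_clean = rng.uniform(0.0, 4.0, size=(90, 1024))
y_noisy = add_poisson_noise(y_clean, rng=rng)
```

At an incident flux of 5×10⁶ photons the perturbation of each line integral is small, since the relative standard deviation of a Poisson count is the inverse square root of its mean.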

The proposed algorithm was also validated using preclinical data from the study of Niu et al. (20). The dataset comprises scans of a mouse at a standard dose, with an exposure time per view of 300 ms, yielding 667 slices across 5 energy channels. The SGM neural network, trained using the AAPM challenge dataset, was employed for testing on the preclinical data. Projections were sampled over a 360° range at 60, 90, and 120 views. Further experimental details can be found in the study of Niu et al. (20).

The comparison algorithms in this paper include FBP, the fast iterative shrinkage-thresholding algorithm (FISTA) (43), the FBP convolutional neural network (FBPConvNet) (44), Song-2022 (26), and Wavelet-SGM (30). FBP is a classical analytical method for CT image reconstruction. FISTA is an algorithm based on TV regularization. FBPConvNet is a supervised DL CT imaging algorithm. Song-2022 incorporates the SGM into CT image reconstruction, whereas Wavelet-SGM combines wavelets with the SGM for the same purpose. The source code for these comparison methods was obtained from the corresponding authors, and their suggestions and optimizations were incorporated to ensure a fair comparison. The experiments were conducted on the public PyTorch platform (https://pytorch.org/), using four NVIDIA RTX A6000 GPUs (NVIDIA, Santa Clara, CA, USA), an Intel(R) Xeon(R) Silver 4216 CPU @ 2.10 GHz, and 128 GB RAM. The network was trained using the Adam optimizer (45) with a learning rate of $τ = 1 \times 10^{- 3}$ for 100 epochs with a batch size of 4. For the sparse-view reconstruction task, the parameter settings used during DSM training are consistent with those described in the study by Song and Ermon (37). The total training time was approximately 300 hours.

The performance of the proposed algorithm and benchmark algorithms was evaluated using two standard metrics: peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). These metrics are indicative of image quality, with higher PSNR and SSIM values reflecting superior image quality.
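For reference, PSNR and the RMSE reported in the parameter study can be computed as in the sketch below. The paper does not state which peak value is used for PSNR, so taking the maximum of the reference image is an assumption here; SSIM is omitted because it is usually taken from an existing library implementation, such as `structural_similarity` in scikit-image.

```python
import numpy as np

def rmse(ref, img):
    """Root mean square error between a reference and a reconstruction."""
    return float(np.sqrt(np.mean((ref - img) ** 2)))

def psnr(ref, img, peak=None):
    """PSNR in dB; the peak defaults to the reference maximum (an assumption)."""
    peak = float(ref.max()) if peak is None else float(peak)
    return float(20.0 * np.log10(peak / rmse(ref, img)))

# Toy check: a unit-peak image with a constant offset of 0.1.
ref = np.zeros((8, 8))
ref[0, 0] = 1.0
img = ref + 0.1
value = psnr(ref, img)
```

In this toy case the RMSE is 0.1 and the peak is 1, so the PSNR is about 20 dB.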

The proposed algorithm involves two primary regularization parameters: $γ$ and $β$ . Here, we adjusted the parameters based on the experimental outcomes. The parameter $β$ was set to 51 in the simulation experiments and 35 for the preclinical data. The parameter $γ$ was set to 350 in the simulation experiments and 300 in the preclinical mouse experiment. To determine the parameter $γ$ , we drew on existing data processing experience, the evaluation of image quality by imaging professionals, and insights from the relevant literature (30). For the parameter $β$ , we first established an approximate tuning range through simulation experiments. We then tuned it within a narrower interval, and the results are shown in Table 1. Overall, the evaluation metrics exhibited minimal fluctuation. However, a detailed examination of the three metrics revealed optimal values at $β = 51$ . Therefore, we set the parameter $β$ to 51.

Table 1

Evaluation metrics corresponding to different values of the parameter $β$

Metric	Value of $β$
Metric	45	47	49	50	51	52	55
PSNR	28.7865	28.7896	28.7936	28.7981	28.7992	28.7959	28.7948
SSIM	0.8195	0.8210	0.8223	0.8249	0.8257	0.8241	0.8235
RMSE	5.4577e−03	5.4558e−03	5.4542e−03	5.4493e−03	5.4487e−03	5.4518e−03	5.4525e−03

PSNR, peak signal-to-noise ratio; RMSE, root mean square error; SSIM, structural similarity index measure.

Simulated data experiments

The numerical simulation experiments address sparse-view spectral CT image reconstruction at 90, 60, and 30 sampling views. Figures 4-6 display the reconstruction results for 90, 60, and 30 views, showing two of the eight channels (channels 3 and 6). Figure 4 presents the results of the different algorithms at 90 projection views. The FBP technique yields significant noise, blurring the distinction between image structures and streaking artifacts. Similarly, the FISTA approach gives rise to a multitude of misleading artifacts and dark noise elements that degrade image fidelity, as indicated by the structure highlighted by the red arrow in Figure 4. The FBPConvNet algorithm improves image quality compared with both FBP and FISTA. However, close inspection of the ROIs reveals a prominent deviation between the reconstructed image and the reference, especially in structural details. The remaining three methods, all based on the SGM, significantly improve reconstruction quality while reducing streaking artifacts. Although the Song-2022 algorithm and the Wavelet-SGM method balance structure preservation against noise suppression, they do not completely recover structural details. In contrast, the proposed algorithm achieves the best reconstruction quality, characterized by detailed textures, minimal noise, and good recovery of sharp edges, as shown by the ROIs in Figure 4. Similarly, in Figure 5, which displays the results of the different methods at 60 projection views, the proposed method consistently outperforms the others. The magnified ROIs show that our approach captures finer details and richer textures. Compared with the supervised FBPConvNet technique, our method not only refines textural details but also reduces noise interference. The SGM-based methods, Song-2022 and Wavelet-SGM, exhibit some misleading structural details in their ROIs, likely stemming from noise present in the training datasets, which impedes restoration quality. Our method, on the other hand, reconstructs these details more precisely. For the ultra-sparse case with only 30 projections, Figure 6 illustrates the visual results of the different methods. Here, the reconstruction quality of the other algorithms deteriorates significantly, whereas the proposed method still retains the details and sharp edges of the image. Additionally, the SGM-based methods (Song-2022 and Wavelet-SGM), which lack prior-knowledge guidance, fail to generate images of the same fidelity as the proposed algorithm, as can be seen in the accompanying error images in Figure 6.

Figure 4 Reconstructed images of different algorithms from 90 projections. (A1-A7) Display channel 3 images with a display window of [0.001, 0.0025] cm⁻¹; whereas (B1-B7) present channel 6 images with a display window of [0.001, 0.0018] cm⁻¹. From left to right, the columns respectively represent: ground truth reference, FBP reconstruction, FISTA reconstruction, FBPConvNet reconstruction, Song-2022 algorithm, wavelet-SGM method reconstruction, and our proposed algorithm. Red bounding boxes highlight ROIs, with corresponding zoomed views shown in the upper insets. Red arrows indicate areas demonstrating significant structural differences between reconstruction methods. FBP, filtered back projection; FBPConvNet, FBP convolutional neural network; FISTA, fast iterative shrinkage-thresholding algorithm; ROIs, regions of interest; SGM, score-based generative model.

Figure 5 Reconstructed images of different algorithms from 60 projections. (A1-A7) Display channel 3 images with a display window of [0.001, 0.003] cm⁻¹; while (B1-B7) present channel 6 images using the identical display parameters. From left to right, the columns respectively represent: ground truth reference, FBP reconstruction, FISTA reconstruction, FBPConvNet reconstruction, Song-2022 algorithm, wavelet-SGM method reconstruction, and our proposed algorithm. Red and yellow bounding boxes highlight ROIs, with corresponding zoomed views shown in the right insets. Red arrows indicate areas demonstrating significant structural differences between reconstruction methods. The yellow dashed line indicates the pixel positions of the sectional view. FBP, filtered back projection; FBPConvNet, FBP convolutional neural network; FISTA, fast iterative shrinkage-thresholding algorithm; ROIs, regions of interest; SGM, score-based generative model.

Figure 6 Reconstructed images and error images of different algorithms from 30 projections. (A1-A7) display channel 3 images with a display window of [0.001, 0.0025] cm⁻¹; whereas (C1-C7) present channel 6 images with a display window of [0.001, 0.0022] cm⁻¹. The other two rows (B1-B7) and (D1-D7) represent the corresponding error images. From left to right, the columns respectively represent: ground truth reference, FBP reconstruction, FISTA reconstruction, FBPConvNet reconstruction, Song-2022 algorithm, wavelet-SGM method reconstruction, and our proposed algorithm. Red and yellow bounding boxes highlight ROIs, with corresponding zoomed views shown in the upper-right insets. Red arrows indicate areas demonstrating significant structural differences between reconstruction methods. (B1) and (D1) indicate the color bar with the range of [0, 1]. FBP, filtered back projection; FBPConvNet, FBP convolutional neural network; FISTA, fast iterative shrinkage-thresholding algorithm; ROIs, regions of interest; SGM, score-based generative model.

Figure 7 shows profile curves taken along the yellow dashed line in Figure 5. This comparison reveals pronounced oscillations in the profile curves of the FBP and FISTA algorithms, with significant deviations from the reference values. Although the FBPConvNet algorithm shows a notable improvement, it still falls short in restoring specific regions, as indicated by zone A marked with a red arrow in Figure 7. Both the Song-2022 algorithm and the Wavelet-SGM algorithm exhibit minor deviations from the reference, closely following the subtle details of the underlying structure. The method proposed in this paper produces the most accurate profile. Specifically, the region indicated by the red arrow in Figure 7 confirms that our approach lies closer to the reference values than the competing methods.

Figure 7 Pixel values along the yellow dashed line in Figure 5 for the different reconstructed images of channel 3 from 60 sparse views. The magnified areas indicated by arrows A and B are displayed in the first row. In the legend, from top to bottom, lines of different colors and shapes represent the ground truth reference, FBP reconstruction, FISTA reconstruction, FBPConvNet reconstruction, Song-2022 algorithm, wavelet-SGM method reconstruction, and our proposed algorithm. FBP, filtered back projection; FBPConvNet, FBP convolutional neural network; FISTA, fast iterative shrinkage-thresholding algorithm; SGM, score-based generative model.

Material decomposition in the image domain is a post-processing technique. The efficiency of material decomposition significantly relies on the quality of the reconstructed images. Superior quality in the reconstructed images facilitates a smoother attainment of post-processed material decomposition. Hence, the precision of material decomposition becomes a benchmark to judge the reconstruction competency of various methods. We adopt a popular image-domain-based technique for material decomposition (46). Figure 8 displays the decomposed material images of two substances: bone and soft tissue. Compared to alternatives, the distinctions between bone and soft tissue as obtained using our proposed technique appear more detailed and perceptible, with sharp image edges, which can be observed by the magnified ROIs in Figure 8. The insights acquired from the decomposition outcomes underscore the effectiveness of our suggested method.
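To make the idea of image-domain material decomposition concrete, the sketch below solves a pixel-wise linear least-squares problem that expresses the channel images as a combination of two basis materials. This is one common formulation and is not necessarily the method of reference (46); the per-channel attenuation values in `basis` are hypothetical numbers chosen only so that the example runs.

```python
import numpy as np

def decompose(images, basis):
    """Pixel-wise least-squares material decomposition.

    images : (S, H, W) reconstructed attenuation images, one per channel
    basis  : (S, M) attenuation of each of M basis materials per channel
    returns: (M, H, W) material maps
    """
    S, H, W = images.shape
    coeffs, *_ = np.linalg.lstsq(basis, images.reshape(S, -1), rcond=None)
    return coeffs.reshape(basis.shape[1], H, W)

# Hypothetical attenuation of bone and soft tissue in 8 channels (not from the paper).
basis = np.stack([np.linspace(0.60, 0.30, 8),
                  np.linspace(0.25, 0.18, 8)], axis=1)

rng = np.random.default_rng(0)
fractions = rng.random((2, 4, 4))                      # toy ground-truth maps
images = np.tensordot(basis, fractions, axes=(1, 0))   # (8, 4, 4) channel images
recovered = decompose(images, basis)
```

With noise-free channel images the material maps are recovered exactly; with noisy or artifact-laden reconstructions the errors propagate into the maps, which is why decomposition quality serves as an indirect measure of reconstruction quality.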

Figure 8 Two material decomposition result images (soft and bone tissue) from 90 projections. (A1-A7) Display channel 6 spectral CT images with a display window of [0.001, 0.003] cm⁻¹; whereas (B1-B7) present soft tissue images with a display window of [0.2, 0.99] and (C1-C7) present bone images with a display window of [0.001, 0.9]. From left to right, the columns respectively represent: ground truth reference, FBP reconstruction, FISTA reconstruction, FBPConvNet reconstruction, Song-2022 algorithm, wavelet-SGM method reconstruction, and our proposed algorithm. Red and yellow bounding boxes highlight ROIs, with corresponding zoomed views shown in the upper-right insets. FBP, filtered back projection; FBPConvNet, FBP convolutional neural network; FISTA, fast iterative shrinkage-thresholding algorithm; ROIs, regions of interest; SGM, score-based generative model.

In this study, we additionally employed quantitative evaluation metrics, namely SSIM and PSNR, to assess the performance of the different algorithms. We calculated these metrics for 100 distinct slices from the AAPM test dataset. Tables 2-4 present the mean PSNR and SSIM of the images reconstructed by the different algorithms. Our method attains the highest SSIM value among the compared algorithms, indicating its capability to restore the majority of internal image structures. Our method also records the highest PSNR, signifying its proximity to the ground truth and agreeing with the earlier analysis of the reconstruction results. These quantitative findings align with the visual assessments presented in Figures 4-6. As indicated in Tables 2-4, increasing the number of projection views from 30 to 90 corresponds to a marked rise in the evaluation metrics, thereby enhancing the stability of the model. Table 5 lists the average evaluation metrics, PSNR and SSIM, for the 8-channel images on the 452nd slice of the AAPM data. The results suggest that the proposed approach surpasses the compared algorithms on all assessment metrics. Particularly under the ultra-sparse condition of 30 projection views, our method provides the highest PSNR and SSIM results.

Table 2

Statistical reconstruction PSNR/SSIM (10⁻³) for different algorithms on the AAPM challenge data from 90 projections, in terms of mean and standard deviation values over the test dataset (100 slices)

Metric	Method	FBP	FISTA	FBPConvNet	Song-2022	Wavelet-SGM	Ours
PSNR	Channel-3	16.67±2.0	31.04±2.1	31.96±2.5	38.74±4.0	41.67±2.0	42.54±2.3
PSNR	Channel-6	16.68±2.1	31.09±2.1	32.01±2.4	38.80±4.1	41.71±2.1	42.55±2.4
SSIM	Channel-3	0.238±3.4	0.896±2.2	0.950±2.1	0.981±3.4	0.986±3.2	0.990±3.4
SSIM	Channel-6	0.245±3.3	0.901±2.6	0.952±2.3	0.989±3.3	0.987±3.3	0.991±3.3

Data are presented as mean ± standard deviation. AAPM, American Association of Physics in Medicine; FBP, filtered back projection; FBPConvNet, FBP convolutional neural network; FISTA, fast iterative shrinkage-thresholding algorithm; PSNR, peak signal-to-noise ratio; SGM, score-based generative model; SSIM, structural similarity index measure.

Table 3

Statistical reconstruction PSNR/SSIM (10⁻³) for different algorithms on the AAPM challenge data from 60 projections, in terms of mean and standard deviation values over the test dataset (100 slices)

Metric	Method	FBP	FISTA	FBPConvNet	Song-2022	Wavelet-SGM	Ours
PSNR	Channel-3	15.76±1.9	30.42±2.1	29.97±3.5	36.11±3.8	40.16±1.4	41.82±2.6
PSNR	Channel-6	15.78±1.8	29.21±2.0	30.39±3.4	36.08±3.9	40.17±1.5	41.53±2.5
SSIM	Channel-3	0.209±3.7	0.867±1.2	0.881±2.1	0.973±3.3	0.979±2.4	0.987±2.8
SSIM	Channel-6	0.210±3.6	0.870±1.3	0.883±2.2	0.972±3.1	0.980±2.5	0.990±2.7

Data are presented as mean ± standard deviation. AAPM, American Association of Physics in Medicine; FBP, filtered back projection; FBPConvNet, FBP convolutional neural network; FISTA, fast iterative shrinkage-thresholding algorithm; PSNR, peak signal-to-noise ratio; SGM, score-based generative model; SSIM, structural similarity index measure.

Table 4

Statistical reconstruction PSNR/SSIM (10⁻³) for different algorithms on the AAPM challenge data from 30 projections, in terms of mean and standard deviation values over the test dataset (100 slices)

Metric	Method	FBP	FISTA	FBPConvNet	Song-2022	Wavelet-SGM	Ours
PSNR	Channel-3	15.51±1.8	29.23±2.1	29.97±3.5	34.17±3.5	35.98±1.6	39.31±2.6
PSNR	Channel-6	15.49±1.7	29.21±2.0	29.89±3.4	34.47±3.6	35.95±1.5	39.34±2.5
SSIM	Channel-3	0.189±3.6	0.867±1.2	0.942±2.3	0.944±3.2	0.946±1.4	0.970±2.3
SSIM	Channel-6	0.191±3.5	0.870±1.3	0.941±2.4	0.942±3.2	0.944±1.5	0.972±2.2

Data are presented as mean ± standard deviation. AAPM, American Association of Physics in Medicine; FBP, filtered back projection; FBPConvNet, FBP convolutional neural network; FISTA, fast iterative shrinkage-thresholding algorithm; PSNR, peak signal-to-noise ratio; SGM, score-based generative model; SSIM, structural similarity index measure.

Table 5

Average metrics PSNR/SSIM for different algorithms on the 452nd slice of AAPM data from 8-channel images

Views	Metric	FBP	FISTA	FBPConvNet	Song-2022	Wavelet-SGM	Ours
30	PSNR	15.5024	28.7709	29.8608	34.3172	35.9650	39.3712
30	SSIM	0.1829	0.8506	0.9421	0.9355	0.9423	0.9709
60	PSNR	15.7130	29.7338	29.9376	36.1373	39.6597	40.9901
60	SSIM	0.2017	0.8630	0.8784	0.9710	0.9807	0.9882
90	PSNR	16.5436	30.6582	30.9926	38.5990	41.6827	42.5097
90	SSIM	0.2352	0.8954	0.9411	0.9825	0.9867	0.9909

AAPM, American Association of Physics in Medicine; FBP, filtered back projection; FBPConvNet, FBP convolutional neural network; FISTA, fast iterative shrinkage-thresholding algorithm; PSNR, peak signal-to-noise ratio; SGM, score-based generative model; SSIM, structural similarity index measure.

Ablation study

Within the spectral CT image reconstruction framework proposed in this paper, a low-rank regularization is implemented via a subspace representation based on the global correlation and nonlocal self-similarity of images across different energy channels. Meanwhile, the SGM is integrated into the iterative reconstruction process, effectively leveraging the deep prior of images. To assess the efficiency of the subspace representation and SGM in image reconstruction, we conducted ablation studies employing the AAPM dataset, yielding both qualitative and quantitative evaluation results, as displayed in Figure 9 and Table 6, respectively. To validate the effectiveness of the proposed method, we chose 50 projection views for further experiments in the ablation study.

Figure 9 The reconstruction results from the ablation experiment at 50 projections, from top to bottom: (A1-A4) the reconstruction images of channel 6, (B1-B4) its corresponding error images, and (C1-C4) the amplified ROIs images. The images from left to right are as follows: the reference, our proposed algorithm, without subspace representation, and without SGM. Red bounding boxes highlight ROIs, with corresponding zoomed views shown in the bottom insets. Red arrows indicate areas demonstrating significant structural differences between reconstruction methods. ROIs, regions of interest; SGM, score-based generative model.

Table 6

Statistical reconstruction PSNR/SSIM (10⁻³) for ablation study on the AAPM challenge data from 50 projections, in terms of mean and standard deviation values over the test dataset (50 slices)

Metric	Ours	Without subspace representation	Without SGM
PSNR	39.9414±2.9	38.7654±2.7	28.2372±1.4
SSIM	0.9900±2.8	0.9682±2.8	0.8838±3.7

Data are presented as mean ± standard deviation. AAPM, American Association of Physics in Medicine; PSNR, peak signal-to-noise ratio; SGM, score-based generative model; SSIM, structural similarity index measure.

Two regularization priors, namely the low-rank prior characterized by the subspace representation and the deep prior represented by the SGM, are linked in a sequential manner, significantly improving reconstruction performance while preserving their distinctive advantages. The ablation study results are presented in Figure 9. The first to third rows correspond to the reconstructed image, the error image, and the magnified ROI, respectively. Column c shows the reconstruction result that includes only the SGM (without subspace representation), whereas column d displays the result using only the subspace representation (without SGM). Column d contains substantial noise and streak artifacts, particularly obvious in the error image, indicating that optimization relying solely on the low-rank prior from the subspace representation is insufficient to achieve satisfactory results. In contrast, column c exhibits improved image quality, and the streak artifacts have been effectively removed, but false small structures have appeared, which are clearly visible in the ROIs indicated by the red arrows. Column b presents the reconstruction result that combines the low-rank prior of the subspace representation and the deep prior of the SGM, achieving the best performance. It not only removes the streak artifacts but also suppresses the noise well. While data consistency is maintained, the generation of false structures is largely avoided, and the reconstructed images are more faithful to the ground truth. More crucially, the regularization prior of the subspace representation effectively guides the SGM, ensuring that the generated images resemble the ground truth and thereby eliminating the unwanted structures. Our proposed method leverages the synergistic benefits of these two priors, leading to improved image reconstruction outcomes.

The quantitative results of different regularization methods are displayed in Table 6. These metrics suggest that relying solely on regularization based on subspace representations results in a significant decrease in image reconstruction metrics. In contrast, the adoption of a regularization technique using standalone SGM has markedly improved reconstruction quality. Although this method may introduce some false structures, its remarkable improvement in overall image quality cannot be overlooked.

As can be seen from the images shown in column d of Figure 9, using only the BM3D algorithm in the subspace representation for eigen-images denoising (as shown in Figure 10) cannot achieve satisfactory results as shown in Table 7. Especially under the condition of ultra-sparse views sampling, the effect of this traditional regularization denoising algorithm is rather poor. As the number of sampling views increases, the performance of the traditional denoising algorithm improves substantially, as shown in Figure 11. With only 30 or 60 sampling views, the restoration effect is quite limited, with severe noise and artifacts in the images, making it difficult to identify the image structure. When the number of sampling views increases to 90 or 120, the reconstruction quality improves significantly, with clearer structural features despite the presence of some residual artifacts.
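The eigen-images referred to here can be illustrated with a truncated SVD of the channel-by-pixel image matrix. Since Eq. [3] is not reproduced in this section, the exact form of the decomposition is an assumption: the sketch takes the leading left singular vectors as the orthogonal basis $E$ and the projections onto them as the eigen-images $Z$ . The function name `subspace_decompose` and the toy data are hypothetical.

```python
import numpy as np

def subspace_decompose(X, k):
    """Split X (S channels x P pixels) into a basis E (S x k) and eigen-images Z (k x P)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    E = U[:, :k]      # orthonormal spectral basis
    Z = E.T @ X       # eigen-images: projections onto the basis
    return E, Z

# Toy spectral data (hypothetical): 8 channels generated from 2 components.
rng = np.random.default_rng(0)
spectra = rng.random((8, 2))
maps = rng.random((2, 32 * 32))
X = spectra @ maps                 # rank-2 matrix of flattened channel images

E, Z = subspace_decompose(X, k=2)
X_hat = E @ Z                      # reassembled images, X = E Z
```

Because the toy matrix has rank 2, two eigen-images reproduce all eight channels exactly. A denoiser such as BM3D would then be applied to the rows of `Z` instead of to every channel image, which is what makes the subspace representation economical.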

Figure 10 Simulated data phantom. (A) The CT image; (B,C) the first and second eigen-images, respectively. CT, computed tomography.

Table 7

BM3D denoising evaluation metrics at different sparse views

Metric	Views
Metric	30	60	90	120
PSNR	19.6356	29.9735	30.0437	30.7059
SSIM	0.5919	0.8867	0.9244	0.9621
RMSE	8.0660e−03	4.8728e−03	1.8338e−03	8.8156e−04

BM3D, block-matching and 3D filtering; PSNR, peak signal-to-noise ratio; RMSE, root mean square error; SSIM, structural similarity index measure.

Figure 11 The CT images by reconstructing the eigen-images after BM3D denoising at different sparse views. (A) The reference and (B-E) the reconstruction images at 30, 60, 90, and 120 projections, respectively. BM3D, block-matching and 3D filtering; CT, computed tomography.

Preclinical mouse experiments

Figures 12 and 13 illustrate the reconstruction results for preclinical mouse data acquired at 60 and 90 projection views, respectively. The FBP algorithm lacks noise suppression capabilities, leading to visible noise within the reconstructed images and poor restoration of edge structures. The results of FISTA indicate a certain level of noise suppression; however, they still exhibit blocky artifacts and partial loss of structural details. In contrast, the FBPConvNet method and the three SGM-based methods demonstrate superior recovery of image details, indicating that DL-based reconstructions offer a marked improvement in image quality. Nevertheless, in the error images of Figure 13, the Song-2022 and Wavelet-SGM methods still contain some noisy artifacts. In contrast, our proposed method preserves sharp structural edges and image contours while capturing fine details, as clearly depicted in the extracted ROIs in Figure 12 and the error images in Figure 13.

Figure 12 The reconstructed images for real data from 60 projections. (A1-A7) Display channel 1 images with a display window of [0.008, 0.05] cm⁻¹, whereas (B1-B7) present channel 3 images using the identical display parameters. From left to right, the columns respectively represent: ground truth reference, FBP reconstruction, FISTA reconstruction, FBPConvNet reconstruction, Song-2022 algorithm, wavelet-SGM method reconstruction, and our proposed algorithm. Red and yellow bounding boxes highlight ROIs, with corresponding zoomed views shown in the left and right insets, respectively. Red and yellow arrows indicate areas demonstrating significant structural differences between reconstruction methods. FBP, filtered back projection; FBPConvNet, FBP convolutional neural network; FISTA, fast iterative shrinkage-thresholding algorithm; ROIs, regions of interest; SGM, score-based generative model.

Figure 13 Reconstructed images and error images of different methods from 90 projections. (A1-A7) Display channel 1 images with a display window of [0.009, 0.055] cm⁻¹, whereas (C1-C7) present channel 3 images with the identical display parameters. The other two rows (B1-B7) and (D1-D7) represent the corresponding error images. From left to right, the columns respectively represent: ground truth reference, FBP reconstruction, FISTA reconstruction, FBPConvNet reconstruction, Song-2022 algorithm, wavelet-SGM method reconstruction, and our proposed method. (B1) and (D1) indicate the color bar with the range of [0, 1]. FBP, filtered back projection; FBPConvNet, FBP convolutional neural network; FISTA, fast iterative shrinkage-thresholding algorithm; SGM, score-based generative model.

Table 8 outlines the quantitative results derived from preclinical data. Notably, our approach surpasses other comparative methods, exhibiting the highest quantitative metrics. A noteworthy observation is the starkly inferior performance of FBP on the preclinical dataset. Meanwhile, the TV-based FISTA method exhibits relatively stable results with a marginal enhancement in quantitative metrics. However, DL-based algorithms, particularly SGM-based methods, impressively outperform model-based approaches, delivering superior quantitative reconstruction indices across 60, 90, and 120 projections. Additionally, when compared with Song-2022 and Wavelet-SGM, the proposed algorithm achieves outstanding results, demonstrating superior performance. For instance, the PSNR in the first channel reconstruction metrics reaches impressive values of 35.34, 36.88, and 37.62 dB in the 60, 90, and 120 views, respectively. Throughout, our method consistently offered elevated accuracy, enhanced detail preservation, and sharper reconstruction image edges.

Table 8

Reconstruction PSNR/SSIM of preclinical mouse data by different algorithms from 60, 90, 120 views

Views	Method	FBP	FISTA	FBPConvNet	Song-2022	Wavelet-SGM	Ours
60	Channel-1	27.39/0.738	31.22/0.858	32.76/0.862	33.86/0.949	34.18/0.981	35.34/0.995
60	Channel-3	26.90/0.741	31.21/0.848	31.86/0.862	34.10/0.948	33.99/0.979	35.46/0.994
90	Channel-1	29.79/0.801	33.16/0.909	35.06/0.979	35.62/0.979	36.35/0.985	36.88/0.997
90	Channel-3	29.02/0.800	33.06/0.899	34.18/0.966	36.10/0.968	36.20/0.983	36.79/0.996
120	Channel-1	31.81/0.837	35.76/0.927	35.89/0.969	36.23/0.972	36.88/0.989	37.62/0.998
120	Channel-3	30.98/0.833	34.99/0.914	35.78/0.970	36.51/0.981	36.79/0.986	37.79/0.996

FBP, filtered back projection; FBPConvNet, FBP convolutional neural network; FISTA, fast iterative shrinkage-thresholding algorithm; PSNR, peak signal-to-noise ratio; SGM, score-based generative model; SSIM, structural similarity index measure.

Discussion

Data-driven DL methods surpass traditional models by effectively capturing intrinsic data characteristics, achieving superior performance in inverse imaging problems (47). SGMs, an emerging development in deep generative models, are designed to generate new samples that adhere to learned data distributions. This capability requires training neural networks on large amounts of data to ensure that the generated samples align with the learned probability distribution. To address persistent streaking artifacts in sparse-view reconstruction of spectral CT images, we have proposed a reconstruction framework that integrates subspace representation with an SGM. We validated the effectiveness of this approach through experiments using both simulated sparse-view spectral CT data and a preclinical dataset.

The sequential integration of subspace representation and SGM creates a mutually reinforcing architecture, with each component leveraging its distinct advantages; this combination improves the quality of image reconstruction. The proposed method offers a clear, comprehensive demonstration for analyzing and processing spectral CT measurements from sparse views, significantly enriching theoretical research on spectral CT measurement signals. Our methodology enables thorough extraction of crucial reconstruction information from the spectral CT observations, optimizing the utilization efficiency of the measurement signals, and achieving maximum diagnostic value.

Comparative evaluations through visual assessment and quantitative metrics demonstrate that our integrated strategy outperforms standalone SGM implementations, including the Song-2022 and Wavelet-SGM approaches. Nevertheless, the present study has certain limitations that merit discussion. First, owing to the slow sampling of the SGM algorithm, our method requires extensive computational time (28,29). Second, the essential denoising of the subspace eigen-images introduces moderate increases in spatial resolution requirements and computational overhead (48). As a next step, we plan to design an accelerated sampling algorithm, based on the generation mechanism of SGM, to improve the efficiency of the sampling process.

The subspace-derived low-rank regularization prior provides essential guidance for SGM optimization, thereby anchoring reconstructed images more closely to the ground truth. This scheme avoids generating artifactual structures while maintaining anatomical contours, edges, and textural details. This is particularly crucial in medical screening of the lungs and thyroid, where false structures could lead to misdiagnosis or missed diagnosis and thus pose significant risks to patients. The integration of the model-driven low-rank prior with the data-driven DL prior yields mutual enhancement and complementarity, collectively improving the overall quality of the reconstructed images.
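As a rough illustration of the subspace representation discussed above, the following NumPy sketch factors a synthetic multi-channel image into an orthonormal spectral basis and a small number of eigen-images using a truncated singular value decomposition. The function names, the toy data, and the use of a plain SVD are assumptions made purely for illustration; this is not the implementation used in this study.

```python
import numpy as np

def subspace_decompose(images, k):
    """Split a (channels, height, width) stack into an orthonormal
    spectral basis E (channels, k) and k eigen-images Z (k, height, width).
    Hypothetical helper: a truncated SVD stands in for the basis estimation."""
    c, h, w = images.shape
    X = images.reshape(c, h * w)            # each row is one energy channel
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    E = U[:, :k]                            # leading k left singular vectors
    Z = E.T @ X                             # coefficients in the subspace
    return E, Z.reshape(k, h, w)

def subspace_reconstruct(E, Z):
    """Map eigen-images back to the full set of energy channels."""
    k, h, w = Z.shape
    return (E @ Z.reshape(k, h * w)).reshape(E.shape[0], h, w)

# Toy data (assumption): 8 channels that are mixtures of 2 spatial maps,
# plus a small amount of Gaussian noise.
rng = np.random.default_rng(0)
maps = rng.random((2, 32, 32))
mixing = rng.random((8, 2))
clean = np.tensordot(mixing, maps, axes=1)  # shape (8, 32, 32)
noisy = clean + 0.01 * rng.standard_normal(clean.shape)

E, Z = subspace_decompose(noisy, k=2)
approx = subspace_reconstruct(E, Z)
rel_err = np.linalg.norm(approx - clean) / np.linalg.norm(clean)
print(f"basis {E.shape}, eigen-images {Z.shape}, relative error {rel_err:.4f}")
```

In a pipeline of the kind discussed here, only the k eigen-images would need to be regularized or sampled rather than every energy channel, which is where the reduction in dimensionality comes from.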

Conclusions

We have proposed an unsupervised approach for spectral CT reconstruction that combines model-driven and data-driven strategies via SGM. The innovation lies in the dimensionality reduction of high-dimensional spectral CT images through low-dimensional subspace mapping, achieved via orthogonal basis decomposition and SGM-optimized coefficient sampling. This allows us to capture the global correlation and non-local self-similarity of images together with their probability distribution. The robustness and effectiveness of the proposed framework were verified using the AAPM challenge dataset and preclinical mouse datasets. The results indicated that the method excels in reducing artifacts while preserving structural details and texture. In our experiments, the proposed method outperformed the baseline methods. Collectively, this work presents a practical sparse-view spectral CT reconstruction technique with strong detail preservation, providing a reference point for future clinical translation.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-2226/rc

Funding: This work was supported by the National Natural Science Foundation of China (grant Nos. 62271504, 62101596, and 62201616), Natural Science Foundation of Henan (grant No. 252300420395), Technology Innovation Leading Talent Project of Zhongyuan (grant No. 244200510015), and China Postdoctoral Science Foundation (grant No. 2023T160792).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2226/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work, including ensuring that any questions related to the accuracy or integrity of any part of the work have been appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study has been granted exemption from ethical approval and patient consent by the Ethics Committee of PLA Information Engineering University for not involving human trials, clinical diagnosis, treatment information, sensitive personal information, commercial interests, or causing harm to the human body, as well as the retrospective nature of this study.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Nakamura Y, Higaki T, Kondo S, Kawashita I, Takahashi I, Awai K. An introduction to photon-counting detector CT (PCD CT) for radiologists. Jpn J Radiol 2023;41:266-82.
2. Hsieh SS, Leng S, Rajendran K, Tao S, McCollough CH. Photon Counting CT: Clinical Applications and Future Developments. IEEE Trans Radiat Plasma Med Sci 2021;5:441-52.
3. Sigovan M, Si-Mohamed S, Bar-Ness D, Mitchell J, Langlois JB, Coulon P, Roessl E, Blevis I, Rokni M, Rioufol G, Douek P, Boussel L. Feasibility of improving vascular imaging in the presence of metallic stents using spectral photon counting CT and K-edge imaging. Sci Rep 2019;9:19850.
4. Berrington de González A, Mahesh M, Kim KP, Bhargavan M, Lewis R, Mettler F, Land C. Projected cancer risks from computed tomographic scans performed in the United States in 2007. Arch Intern Med 2009;169:2071-7.
5. Gordon R. A tutorial on ART (algebraic reconstruction techniques). IEEE Transactions on Nuclear Science 1974;21:78-93.
6. Andersen AH, Kak AC. Simultaneous algebraic reconstruction technique (SART): a superior implementation of the art algorithm. Ultrason Imaging 1984;6:81-94.
7. Zhan R, Dong B. CT image reconstruction by spatial-radon domain data-driven tight frame regularization. SIAM Journal on Imaging Sciences 2016;9:1063-83.
8. Wu W, Hu D, An K, Wang S, Luo F. A high-quality photon-counting CT technique based on weight adaptive total-variation and image-spectral tensor factorization for small animals imaging. IEEE Transactions on Instrumentation and Measurement 2020;70:1-14.
9. Niu S, Gao Y, Bian Z, Huang J, Chen W, Yu G, Liang Z, Ma J. Sparse-view x-ray CT reconstruction via total generalized variation regularization. Phys Med Biol 2014;59:2997-3017.
10. Wang S, Wu W, Cai A, Xu Y, Vardhanabhuti V, Liu F, Yu H. Image-spectral decomposition extended-learning assisted by sparsity for multi-energy computed tomography reconstruction. Quant Imaging Med Surg 2023;13:610-30.
11. Zhang Y, Mou X, Wang G, Yu H. Tensor-Based Dictionary Learning for Spectral CT Reconstruction. IEEE Trans Med Imaging 2017;36:142-54.
12. Wang S, Yu H, Xi Y, Gong C, Wu W, Liu F. Spectral-image decomposition with energy-fusion sensing for spectral CT reconstruction. IEEE Transactions on Instrumentation and Measurement 2021;70:1-11.
13. Mechlem K, Ehn S, Sellerer T, Braig E, Munzel D, Pfeiffer F, Noel PB. Joint Statistical Iterative Material Image Reconstruction for Spectral Computed Tomography Using a Semi-Empirical Forward Model. IEEE Trans Med Imaging 2018;37:68-80.
14. Willemink MJ, Noël PB. The evolution of image reconstruction for CT-from filtered back projection to artificial intelligence. Eur Radiol 2019;29:2185-95.
15. Wang S, Cai A, Wu W, Zhang T, Liu F, Yu H. IMD-MTFC: Image-domain Material Decomposition via Material-image Tensor Factorization and Clustering for Spectral CT. IEEE Transactions on Radiation and Plasma Medical Sciences 2023;7:382-93.
16. Wu W, Liu F, Zhang Y, Wang Q, Yu H. Non-Local Low-Rank Cube-Based Tensor Factorization for Spectral CT Reconstruction. IEEE Trans Med Imaging 2019;38:1079-93.
17. Yu X, Cai A, Li L, Jiao Z, Yan B. Low-dose spectral reconstruction with global, local, and nonlocal priors based on subspace decomposition. Quant Imaging Med Surg 2023;13:889-911.
18. Zhang Z, Liang X, Dong X, Xie Y, Cao G. A Sparse-View CT Reconstruction Method Based on Combination of DenseNet and Deconvolution. IEEE Trans Med Imaging 2018;37:1407-17.
19. Kim S, Kim B, Lee J, Baek J. Sparsier2Sparse: Self-supervised convolutional neural network-based streak artifacts reduction in sparse-view CT images. Med Phys 2023;50:7731-47.
20. Niu C, Li M, Fan F, Wu W, Guo X, Lyu Q, Wang G. Noise Suppression With Similarity-Based Self-Supervised Deep Learning. IEEE Trans Med Imaging 2023;42:1590-602.
21. Zhang C, Chang S, Bai T, Chen X. S2MS: Self-supervised learning driven multi-spectral CT image enhancement. In: 7th International Conference on Image Formation in X-Ray Computed Tomography. SPIE; 2022;12304:473-9.
22. Wu W, Hu D, Niu C, Broeke LV, Butler APH, Cao P, Atlas J, Chernoglazov A, Vardhanabhuti V, Wang G. Deep learning based spectral CT imaging. Neural Netw 2021;144:342-58.
23. Chen H, Zhang Y, Chen Y, Zhang J, Zhang W, Sun H, Lv Y, Liao P, Zhou J, Wang G. LEARN: Learned Experts' Assessment-Based Reconstruction Network for Sparse-Data CT. IEEE Trans Med Imaging 2018;37:1333-47.
24. Wang Y, Cai A, Liang N, Yu X, Zhong X, Li L, Yan B. One half-scan dual-energy CT imaging using the Dual-domain Dual-way Estimated Network (DoDa-Net) model. Quant Imaging Med Surg 2022;12:653-74.
25. Chen X, Xia W, Yang Z, Chen H, Liu Y, Zhou J, Wang Z, Chen Y, Wen B, Zhang Y. SOUL-Net: A Sparse and Low-Rank Unrolling Network for Spectral CT Image Reconstruction. IEEE Trans Neural Netw Learn Syst 2024;35:18620-34.
26. Song Y, Shen L, Xing L, Ermon S. Solving inverse problems in medical imaging with score-based generative models. In International Conference on Learning Representations; 2022.
27. Guan B, Yang C, Zhang L, Niu S, Zhang M, Wang Y, Wu W, Liu Q. Generative modeling in sinogram domain for sparse-view CT reconstruction. IEEE Transactions on Radiation and Plasma Medical Sciences 2023;8:195-207.
28. Kazerouni A, Aghdam EK, Heidari M, Azad R, Fayyaz M, Hacihaliloglu I, Merhof D. Diffusion models in medical imaging: A comprehensive survey. Med Image Anal 2023;88:102846.
29. Croitoru FA, Hondru V, Ionescu RT, Shah M. Diffusion Models in Vision: A Survey. IEEE Trans Pattern Anal Mach Intell 2023;45:10850-69.
30. Wu W, Wang Y, Liu Q, Wang G, Zhang J. Wavelet-Improved Score-Based Generative Model for Medical Imaging. IEEE Trans Med Imaging 2024;43:966-79.
31. Bruske J, Sommer G. Intrinsic dimensionality estimation with optimally topology preserving maps. IEEE Transactions on Pattern Analysis and Machine Intelligence 1998;20:572-5.
32. Chang CI, Wang S. Constrained band selection for hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing 2006;44:1575-85.
33. Bioucas-Dias JM, Nascimento JM. Hyperspectral subspace identification. IEEE Transactions on Geoscience and Remote Sensing 2008;46:2435-45.
34. Golub GH, Van Loan CF. Matrix Computations, 3rd ed., ser. Mathematical Sciences. Baltimore, MD: Johns Hopkins University Press; 1996.
35. Zhuang L, Fu X, Ng KP, Bioucas-Dias JM. Hyperspectral image denoising based on global and nonlocal low-rank factorizations. IEEE Transactions on Geoscience and Remote Sensing 2021;59:10438-54.
36. Hyvärinen A, Dayan P. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research 2005;6:695-709.
37. Song Y, Ermon S. Generative modeling by estimating gradients of the data distribution. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
38. Song Y, Sohl-Dickstein J, Kingma DP, Kumar A, Ermon S, Poole B. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations; 2021.
39. Anderson BD. Reverse-time diffusion equation models. Stochastic Processes and their Applications 1982;12:313-26.
40. Bishop CM, Nasrabadi NM. Pattern recognition and machine learning. New York: Springer; 2006;4:738.
41. Dabov K, Foi A, Katkovnik V, Egiazarian K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans Image Process 2007;16:2080-95.
42. Low Dose CT Grand Challenge. Accessed: Apr. 6, 2017. [Online]. Available online: http://www.aapm.org/GrandChallenge/LowDoseCT/
43. Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2009;2:183-202.
44. Jin KH, McCann MT, Froustey E, Unser M. Deep Convolutional Neural Network for Inverse Problems in Imaging. IEEE Trans Image Process 2017;26:4509-22.
45. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
46. Niu T, Dong X, Petrongolo M, Zhu L. Iterative image-domain decomposition for dual-energy CT. Med Phys 2014;41:041901.
47. Wang G. A perspective on deep imaging. IEEE Access 2016;4:8914-24.
48. Ren J, Wang Y, Cai A, Wang S, Liang N, Li L, Yan B. MISD-IR: material-image subspace decomposition-based iterative reconstruction with spectrum estimation for dual-energy computed tomography. Quant Imaging Med Surg 2024;14:4155-76.

Cite this article as: Guo J, Wang Y, Wang S, Zheng Z, Li L, Cai A, Yan B. Sparse-view spectral CT reconstruction via a coupled subspace representation and score-based generative model. Quant Imaging Med Surg 2025;15(6):5474-5495. doi: 10.21037/qims-24-2226
