Enhancing photon-counting computed tomography reconstruction via subspace dictionary learning and spatial sparsity regularization
Introduction
Photon-counting spectral computed tomography (CT), compared to traditional CT technology, utilizes the differences in the attenuation coefficients of materials at various X-ray energy levels to provide more detailed material information. This not only aids in the precise differentiation of materials but also facilitates its application in multiple fields, such as tissue characterization (1), material discrimination (2,3), and quantitative analysis of tissue components (4). However, due to the limited number of photons and counting rate restrictions in each energy bin of the photon-counting detector (PCD), the acquired spectral projections often exhibit significant noise levels (5), which compromise the quality of multi-energy images and the accuracy of material decomposition. In recent years, the reconstruction of high-quality images has become a topic of significant interest in the field of multi-energy CT (MECT).
To enhance image quality, researchers have developed various MECT reconstruction algorithms in the past few decades. These algorithms are primarily divided into two categories: model-based methods and deep learning (DL)-based methods.
Model-based algorithms typically reflect the physical processes of imaging through corresponding mathematical models. A common approach is to utilize traditional CT reconstruction algorithms to enhance the images of each spectral channel in MECT. However, directly applying methods such as filtered back-projection (FBP) and algebraic reconstruction technique (ART) (6) to spectral CT often results in significant noise and artifacts. To address these challenges, advanced techniques have been developed that utilize sparse optimization regularized models and algorithms, including total variation (TV) regularization (7), dictionary learning (8), and wavelet transform (9). Despite these advancements, a key limitation of these methods is their tendency to neglect the interrelation between CT images across different energy channels—a critical aspect of MECT data.
Recent advancements have advocated for using a tensor model to represent MECT images, leveraging interchannel correlations. For instance, Li et al. employed the Prior Rank, Intensity, and Sparsity Model (PRISM) based on tensors to address spectral CT reconstruction (10). Semerci et al. proposed a regularization method known as the Generalized Tensor Nuclear Norm for image reconstruction using tensor singular value decomposition (SVD) (11,12). Meanwhile, Rigie and La Rivière explored the use of Constrained Total Nuclear Variation as a regularizer for reconstructing spectral CT images (13). Zhang et al. presented a tensor-based dictionary learning technique for spectral CT reconstruction (14). Although these low-rank (LR) and tensor-based strategies effectively harness global correlations in the spectral domain, they often overlook crucial spatial characteristics of MECT data, such as sparsity (15) and nonlocal self-similarity (NSS) (16,17). The advent of the block-matching (BM) strategy (18) prompted researchers to explore tensor nonlocal features, resulting in the development of more sophisticated tensor decomposition and LR techniques for MECT reconstruction models. Xu et al. integrated the LR constraint of similar image blocks with dictionary sparse representation, achieving enhanced reconstruction outcomes compared to conventional dictionary learning methods (19). Xia et al. directly represented similar image blocks in MECT as a third-order tensor, subsequently decomposing it into tensor LR and sparse components using principal component analysis; this approach offers a more direct depiction of the non-local similarity features within MECT images (20). Chen et al. further proposed an MECT reconstruction method named the fourth-order nonlocal tensor decomposition model for spectral CT image reconstruction (FONT-SIR) (21). Yu et al. introduced the framelet tensor nuclear norm [framelet tensor sparsity with block-matching method (FTNN)] to leverage the sparsity inherent in nonlocal similarities of MECT images, enhancing image quality (22).
To explore more prior information from the images, Zhang et al. integrated the CANDECOMP/PARAFAC (CP) decomposition with intrinsic tensor sparsity regularization to take advantage of nonlocal similarity. Moreover, they expressed spatial sparsity through TV regularization [tensor nonlocal similarity and local TV sparsity method (ITS_TV)] for PCD-based MECT reconstruction (23), achieving quite satisfactory results. In addition, Wu et al. introduced a non-local LR cube-based tensor factorization (NLCTF) method (24) and later proposed an enhanced version (25) for MECT reconstruction. Wang et al. further integrated CP tensor decomposition, tensor dictionary learning, and weighted TV regularization for spectral CT reconstruction (26). Yu et al. efficiently integrated multiple priors of low-dose spectral CT images using subspace decomposition, employing BM3D as a plug-and-play denoiser on each eigenimage and integrating the L0 norm of the gradient image within each MECT image channel, offering a new approach to improving image quality while controlling computational complexity (27). Recent advancements have highlighted the promising potential of integrating global spectral LR and NSS priors in MECT image reconstruction. However, nonlocal-based approaches rely heavily on BM operations to identify nonlocal similar patches, making them sensitive to noise and computationally expensive. Balancing reconstruction quality and computational efficiency is therefore paramount when implementing such algorithms.
DL has significantly advanced various image processing tasks, including spectral CT. General reconstruction methods based on DL, such as the deep convolutional neural network (FBPConvNet) (28) and automated transform by manifold approximation (AUTOMAP) (29), have found application in this field. Furthermore, Wu et al. introduced a U-Net-based network (ULTRA) for sparse spectral CT reconstruction (30), whereas Chen et al. developed SOUL-Net, a sparse and LR unrolling network for spectral CT image reconstruction (31). Wang et al. proposed the HITI-Net model, an interpretable hybrid-domain integrative transformer iterative network, for synchronously optimizing spectral CT reconstruction and material decomposition (32). Additionally, unsupervised learning methods have been suggested for spectral CT reconstruction tasks, as discussed in (33). These methods can achieve high performance, but they often come with a large number of parameters, making them highly dependent on the characteristics of the training data sets. Moreover, DL faces challenges such as data overfitting, limited computational resources, poor model interpretability, and issues related to data privacy and security.
Considering the aforementioned concerns and drawing inspiration from the application of the tensor subspace decomposition technique (27,34-36), we present a new MECT model in this paper, designed by simultaneously incorporating subspace-assisted global, nonlocal sparsity, and local TV regularizations (SGNL_TV). As depicted in the flowchart in Figure 1, we initially employ subspace decomposition techniques to transform MECT images into low-dimensional eigenimages, thereby representing the entire MECT image within an LR subspace. Subsequently, non-local structural sparsity is applied to the eigenimage tensor group through dictionary learning. Then, TV regularization captures spatial sparsity by enforcing sparsity on the gradient domain of individual channel images. This novel reconstruction model leverages the inherent local sparsity, global spectral, and spatial NSS properties of MECT data. The contributions of this paper are threefold. Firstly, we employ the manifold structure distance as a similarity metric for patch comparison, reducing the noise sensitivity of BM operations. Secondly, we develop an adaptive LR dictionary of low complexity that captures the intrinsic structural correlations of non-local eigenimage tensors. Thirdly, we integrate a generalized iterated shrinkage algorithm into an alternating minimization framework to optimize the proposed MECT reconstruction problem. Extensive experiments demonstrate that the proposed method surpasses state-of-the-art techniques in both objective and subjective quality assessments.
Methods
Preliminaries
Imaging model for MECT reconstruction
MECT provides multiple sets of projections by segmenting the X-ray spectrum into distinct energy channels and applying suitable post-processing steps. When considering noise in the projections, the forward model for fan-beam geometry can be represented as a linear system:
where the index s runs over the different energy bins; the sinogram ps of the s-th energy bin has T entries, with T1 detector elements and T2 views (T = T1·T2). The desired image reconstructed from the s-th energy bin sinogram has I pixels, with height I1 and width I2 (I = I1·I2). A linear system operator maps the image to the projection, whereas the remaining term denotes the inconsistency in the projection data. To recover the target image, the most efficient approach is to fully exploit the prior knowledge of the desired multi-energy images to regulate the solution space. This strategy aims to solve the following minimization problem:
where the unknown is a third-order tensor composed of the MECT images, and the regularization term represents some prior constraint on them, which could be based on sparse representation (e.g., TV regularization), on prior knowledge of the image (e.g., tissue types and anatomical structures), or on DL regularization (e.g., using convolutional neural networks (CNN) to learn the complex mapping relationships of the data). The parameter ɛ denotes the tolerance for inconsistencies and noise in the observed data, playing a crucial role in defining a feasible region. Consequently, crafting the regularization term is a pivotal aspect of MECT reconstruction.
LR representation for MECT based on subspace
Multi-channel image data can be viewed as a third-order tensor, encompassing two spatial dimensions and one spectral dimension. A tensor and a matrix can be interconverted through unfolding and folding operators (37): a tensor can be expanded into a matrix along its n-th dimension, an operation named mode-n matricization; conversely, the tensor can be recovered from the unfolded matrix by the fold operator. Moreover, the mode-n multiplication of a tensor and a matrix is defined accordingly.
Given the significant correlations among different channels of MECT images, the subspace low-rank representation of the desired MECT tensor can be written as:
where B is a semi-orthogonal basis matrix (i.e., B^T B = I, where I denotes the identity matrix) that captures the shared subspace of the diverse MECT image channels, whereas the coefficient image (i.e., the eigenimage tensor) carries the spatial information. Various techniques can be utilized to deduce a subspace matrix from the MECT images, including hyperspectral signal identification by minimum error (Hysime) (38) or SVD (39). As B is semi-orthogonal, the eigenimage tensor can be derived by projecting the MECT images onto the subspace B. Consequently, the regularization term in Eq. [2] can be formulated as:
where the regularization function now concerns the eigenimage tensor. By applying subspace decomposition techniques, the regularized constraint on the multispectral image itself in Eq. [2] is transformed into a regularization constraint on the eigenimage tensor. By regularizing the eigenimage tensor, we can restrict the solution space of the spectral CT image, thereby improving the image reconstruction outcomes.
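As a concrete illustration, the subspace representation above can be sketched in a few lines of Python. This is a minimal sketch, not the paper's implementation: the function name is hypothetical, and the basis is obtained via SVD of the mode-3 (spectral) unfolding rather than Hysime.

```python
import numpy as np

def subspace_decompose(X, M):
    """Sketch: factor an H x W x S multi-energy stack as X ~ Z x_3 B,
    with B (S x M) semi-orthogonal and Z the eigenimage tensor."""
    H, W, S = X.shape
    X3 = X.reshape(H * W, S).T           # mode-3 unfolding: S x (H*W)
    U, _, _ = np.linalg.svd(X3, full_matrices=False)
    B = U[:, :M]                         # shared spectral subspace basis
    Z3 = B.T @ X3                        # project onto the subspace
    Z = Z3.T.reshape(H, W, M)
    return B, Z

# usage: a stack of spectral rank <= M is reconstructed exactly
H, W, S, M = 8, 8, 6, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((H, W, M)) @ rng.standard_normal((M, S))
B, Z = subspace_decompose(X, M)
X_rec = (Z.reshape(H * W, M) @ B.T).reshape(H, W, S)
assert np.allclose(X_rec, X, atol=1e-8)
assert np.allclose(B.T @ B, np.eye(M), atol=1e-8)   # semi-orthogonality
```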
Structural sparse coding model with BM operation
The BM operation plays a crucial role in non-local-based image denoising algorithms (40,41); it is a straightforward yet powerful method for locating patches similar to a specified exemplar patch. To elaborate, a given image is segmented into n overlapping patches, where b is the side length of the small image blocks. For each exemplar patch, the Euclidean distance serves as the similarity metric for comparing candidate patches within a local window W×W, such that:
where each candidate patch is compared against the exemplar patch. Subsequently, we select the m most similar patches, ranking them in ascending order of Euclidean distance (i.e., the m smallest values among all d values) to the exemplar patch. These patches form the columns of each data matrix, termed a non-local group, whose m-th column is the m-th most similar vectorized patch; the group is sparsely represented by solving the subsequent minimization problem (42-44):
where the symbol ‖·‖F signifies the Frobenius norm, which measures the difference between the data Xi and its reconstruction through the dictionary Di with sparse coefficients Gi. The second term of the objective function is a regularization term, the ℓ1 norm, which characterizes sparsity. The parameter τ controls the trade-off between data fitting and sparsity. The goal is to find a collection of group sparse codes such that the entire image can be reconstructed through the dictionaries Di and the sparse coefficients Gi.
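The BM search can be sketched as follows; the patch size, window size, and function name are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def block_match(img, ref_yx, b=4, W=16, m=5):
    """Sketch of the BM step: find the m patches most similar (in
    Euclidean distance) to the b x b exemplar at ref_yx, searching a
    W x W window. Returns top-left coordinates sorted by distance."""
    y0, x0 = ref_yx
    ref = img[y0:y0 + b, x0:x0 + b].ravel()
    H, Wid = img.shape
    ys = range(max(0, y0 - W // 2), min(H - b, y0 + W // 2) + 1)
    xs = range(max(0, x0 - W // 2), min(Wid - b, x0 + W // 2) + 1)
    cands = [(np.linalg.norm(img[y:y + b, x:x + b].ravel() - ref), (y, x))
             for y in ys for x in xs]
    cands.sort(key=lambda t: t[0])       # ascending distance
    return [yx for _, yx in cands[:m]]

rng = np.random.default_rng(1)
img = rng.standard_normal((32, 32))
matches = block_match(img, (10, 10))
assert matches[0] == (10, 10)            # the exemplar matches itself first
assert len(matches) == 5
```

In a full implementation the matched patches would be vectorized into the columns of the data matrix Xi before sparse coding.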
Graph-based transform
Considering a graph with a set of nodes and a set of edges, each entry of the adjacency matrix is a non-negative value that characterizes the weight of the edge connecting nodes i and j. A graph signal is a set of discrete samples on the nodes of the graph. Each edge is assigned a weight reflecting the similarity between the adjacent nodes it connects, where a selected parameter σ regulates the noise sensitivity of the similarity measure (45). The degree matrix D is a diagonal matrix whose diagonal elements are the corresponding row sums of the adjacency matrix W. The graph Laplacian matrix L can then be computed as L = D − W, and the graph-based transform P can be derived from the eigenvectors of L. For a more comprehensive understanding of the graph-based transform, readers are encouraged to consult (46).
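The construction above can be sketched as follows. This is a minimal illustration assuming scalar node values and a fully connected graph with Gaussian edge weights; the function name is hypothetical.

```python
import numpy as np

def graph_transform_basis(nodes, sigma=1.0):
    """Sketch: build a Gaussian-weighted adjacency W, the combinatorial
    Laplacian L = D - W, and the graph transform P from L's eigenvectors."""
    diff = nodes[:, None] - nodes[None, :]
    Wadj = np.exp(-(diff ** 2) / (2 * sigma ** 2))
    np.fill_diagonal(Wadj, 0.0)          # no self-loops
    D = np.diag(Wadj.sum(axis=1))        # degree matrix
    L = D - Wadj
    eigvals, P = np.linalg.eigh(L)       # columns of P: transform basis
    return P, eigvals

P, eigvals = graph_transform_basis(np.array([0.0, 0.1, 0.2, 5.0]))
assert abs(eigvals[0]) < 1e-10           # smallest Laplacian eigenvalue is 0
assert np.allclose(P.T @ P, np.eye(4), atol=1e-8)   # orthonormal basis
```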
The proposed model
The proposed SGNL_TV model for MECT image reconstruction
MECT images exhibit characteristics such as local sparsity, spatial NSS, and spectral correlation. The local sparsity is inherited from traditional CT images, the spectral correlation stems from the similarities among multi-energy bin images, and the NSS is attributed to the abundance of similar image patches. By leveraging these intrinsic properties, our proposed reconstruction model, SGNL_TV, is designed to effectively incorporate these priors.
where the discrete directional gradient operators correspond to the horizontal and vertical directions, respectively. A nonnegative parameter ρ is used to balance the local sparsity term and the global non-local regularization term based on subspace decomposition. The Frobenius norm term represents the difference between the subspace decomposition and the MECT images. The designed graph-based non-convex LR prior related to the eigenimages will be detailed below.
After obtaining the degraded coefficient image by decomposing the noisy MECT image using the global spectral low-rank prior (47), we begin by extracting overlapping full-channel patches from it. Subsequently, for each exemplar full-channel patch, the BM operation is employed to search for similar non-local full-channel patches within a local spatial window W×W, forming a non-local full-channel group. Specifically, the i-th group gathers the m most similar non-local full-channel patches of its exemplar, each of which can be unfolded along the spectral direction.
Similar to the traditional structural sparse coding model Eq. [6] discussed in section “Structural sparse coding model with BM operation”, we can obtain each non-local full-channel group by solving the following minimization problem:
where the coefficient core tensor and the dimensional dictionaries of the i-th tensor group are denoted respectively. For each degraded full-channel group, once the coefficient core tensor and the dictionaries are obtained by solving Eq. [8], the latent clean coefficient image group can be reconstructed from them. After acquiring all coefficient image groups, the denoised coefficient image is obtained by placing the groups back in their original locations and averaging the overlapped pixels.
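The final aggregation step, placing groups back and averaging overlaps, can be sketched as below; patch size and function name are illustrative assumptions.

```python
import numpy as np

def aggregate_patches(patches, coords, b, shape):
    """Sketch of the aggregation step: put denoised b x b patches back
    at their original coordinates and average overlapping pixels."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for patch, (y, x) in zip(patches, coords):
        acc[y:y + b, x:x + b] += patch
        cnt[y:y + b, x:x + b] += 1.0
    cnt[cnt == 0] = 1.0                  # leave untouched pixels at zero
    return acc / cnt

# usage: overlapping constant patches average back to the same constant
b = 3
patches = [np.full((b, b), 2.0), np.full((b, b), 2.0)]
out = aggregate_patches(patches, [(0, 0), (1, 1)], b, (6, 6))
assert out[1, 1] == 2.0                  # overlap region: (2 + 2) / 2
assert out[5, 5] == 0.0                  # uncovered pixel stays zero
```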
However, the procedures for capturing spatial information of the original image typically encounter certain limitations: (I) due to the sensitivity of the BM operation to noise, the denoised MECT images are prone to unwanted visual artifacts; (II) the dictionary is learned from each mode sequentially, resulting in significant time consumption. These limitations will be addressed in the subsequent sections.
Graph-based BM (GBM)
In this section, we introduce an innovative method known as GBM. This technique employs a graph-domain distance metric to assess the similarity between patches, unlike the conventional BM operation, which relies on Euclidean distance (Eq. [5]).
Specifically, we start by deriving an average image from the degraded coefficient image using the spectral average operator. Subsequently, the average image is partitioned into n overlapped patches. For each exemplar patch within a local window W×W, we generate mc candidate patches. To find patches that have similar structures to the exemplar patch, we employ the graph transform basis to project both the exemplar patch and all candidate patches into the graph domain (for a more detailed explanation of learning the graph transform basis, please refer to section “Graph-based transform”). Then, let
from the entire set of d distances, the m shortest ones are selected. This process enables us to find the spatial pixel coordinates of the m (m < mc) most similar non-local patches for each exemplar patch, from which we obtain a non-local full-channel group. This method, known as the proposed GBM approach, effectively reduces noise sensitivity when identifying non-local similar patches.
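The graph-domain similarity measure can be sketched as follows; here a random orthonormal matrix stands in for the Laplacian-derived basis P of section “Graph-based transform”, and the function name is hypothetical.

```python
import numpy as np

def graph_domain_distance(P, ref_patch, cand_patch):
    """Sketch of the GBM similarity: project vectorized patches onto the
    graph transform basis P and compare them in the transform domain."""
    a = P.T @ ref_patch.ravel()
    c = P.T @ cand_patch.ravel()
    return np.linalg.norm(a - c)

# usage: identical patches have zero graph-domain distance
b = 3
P = np.linalg.qr(np.random.default_rng(2).standard_normal((b * b, b * b)))[0]
ref = np.ones((b, b))
assert graph_domain_distance(P, ref, ref) < 1e-12
assert graph_domain_distance(P, ref, 2 * ref) > 0
```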
Adaptive LR dictionary learning
In this subsection, we introduce an adaptive LR dictionary for the proposed non-locality-based sparsity regularizer. Initially, we expand the non-local full-channel group into a matrix. Subsequently, we apply SVD to this matrix, denoted as:
where the middle factor is a diagonal matrix containing the singular values, and the columns of the left and right orthogonal factors are the corresponding singular vectors. We define the adaptive low-rank dictionary for each non-local group as:
where each atom of the dictionary is constructed from the corresponding pair of left and right singular vectors.
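The adaptive dictionary construction can be sketched as below. This is a minimal sketch under the assumption that each atom is the rank-one outer product of paired singular vectors; the function name is hypothetical.

```python
import numpy as np

def adaptive_lr_dictionary(Xi):
    """Sketch: build an adaptive low-rank dictionary for a non-local
    group from its SVD, Xi = U S V^T; each atom d_k = u_k v_k^T is a
    rank-one matrix, so Xi = sum_k s_k d_k."""
    U, s, Vt = np.linalg.svd(Xi, full_matrices=False)
    atoms = [np.outer(U[:, k], Vt[k, :]) for k in range(len(s))]
    return atoms, s

rng = np.random.default_rng(3)
Xi = rng.standard_normal((16, 8))        # one unfolded non-local group
atoms, s = adaptive_lr_dictionary(Xi)
Xi_rec = sum(sk * dk for sk, dk in zip(s, atoms))
assert np.allclose(Xi_rec, Xi, atol=1e-8)   # the dictionary spans the group
```

Because the atoms come from the group's own SVD, no iterative dictionary training is required, which is the source of the low complexity claimed above.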
In optimization problems, non-convex ℓp-norm minimization (for 0≤ p <1) often yields better results than convex ℓ1-norm minimization. This is because it more effectively promotes sparsity, leading to more accurate solutions in certain cases. Consequently, the structural sparsity term proposed in Eq. [7] can be formulated as a concise and effective structural sparsity coding problem:
The symbol denotes the Hadamard product of two matrices, and the ℓp-norm (0< p ≤1) is used. Here, we present an effective method for sparse modeling that addresses a weighted ℓp minimization problem, in which a weight matrix for each nonlocal group enhances the representation capability of the group’s sparse coefficients. We will detail the method for determining the weight matrix in Eq. [29].
Optimization for the proposed model
Alternating direction method of multipliers solving scheme
Since the model proposed in Eq. [7] involves multiple variables, we adopt an alternating minimization strategy to simplify the optimization process. By introducing an auxiliary variable, the model can be rewritten as shown below:
Employing the augmented Lagrange function, we incorporate the equality constraints into the objective function. A convex set is defined for the inequality constraints, and the indicator function δ on the convex set Ω as:
Then, the corresponding augmented Lagrangian function is:
where the Lagrange multipliers are in tensor form, and β is the penalty coefficient. To address the problem outlined in Eq. [15], we decompose it into four sequential subproblems. The solution to each subproblem is contingent upon the outcome of the preceding one, forming a causal chain. The first subproblem serves as the starting point, requiring the initial image and the projection data. The next two subproblems are then obtained through subspace decomposition of its solution, and the final subproblem incorporates the solutions of the preceding three. In the following section, we outline the solution methods for each subproblem.
(I) For the -subproblem
Considering an intermediate point, the subproblem can be expressed as follows:
where the gradient operator denotes either the horizontal or the vertical direction, per the definition above. To handle the indicator function, this problem is solved in two steps. Following the methodology outlined in (48), we initially address the problem presented in Eq. [16] without the indicator function. Considering that the tensor collects the MECT images across channels, the solution to this subproblem is as follows:
where the two terms represent the s-th channels of the vectorized images of the corresponding tensors. The initial phase involves TV denoising, where the input image is the intermediate point. By applying the TV minimization technique detailed in (49), we derive the intermediate variable. Subsequently, we assess whether it lies within the convex set; if so, it is accepted directly; otherwise, we project it onto the convex set. To accomplish this projection, we employ the simultaneous ART (SART). Finally, we obtain the updated tensor by stacking all channels:
(II) For the -subproblem
Given the degenerated image, the subproblem reduces to:
Given the rank-M SVD of the unfolded image, Eq. [20] has a closed-form solution (50):
(III) For the -subproblem
By applying the GBM strategy (see section “Graph-based BM (GBM)”) to the eigenimage tensor obtained in the B subproblem, we can construct nonlocal full-channel groups. Subsequently, these nonlocal full-channel groups are expanded into matrix forms under the nonlocal mode. Utilizing the specially tailored adaptive LR dictionary (see section “Adaptive LR dictionary learning”) and integrating the non-locality-based structured sparsity regularization method from Eq. [12], the subproblem is transformed into solving a structural sparse coding problem, which is represented as follows:
To effectively obtain the solution of Eq. [22], we initially introduce the following proposition.
Proposition 1. Let the matrix form of the i-th (1≤ i ≤ n) non-local full-channel group be given, together with its approximation, the adaptive LR dictionary constructed for it, and the sparse coefficients corresponding to the group and its approximation; then the following holds:
We have omitted the detailed proof of the proposition here; interested readers can consult reference (51) for a comprehensive explanation. Consequently, according to Proposition 1, the regularization model Eq. [22] can be reformulated as:
To efficiently solve Eq. [24], we employ the generalized soft-thresholding (GST) algorithm (51). Specifically, with the given parameter values, we can derive a closed-form solution for Eq. [24] as follows:
For further insight into the GST algorithm, please refer to (51). Upon acquiring the group sparse coefficients by solving Eq. [25], we can reconstruct each non-local group through the dictionary. Subsequently, the latent clean non-local full-channel group can be attained by folding the result back into tensor form. Upon acquiring all non-local full-channel groups, the denoised coefficient image (i.e., the solution of this subproblem) can be reconstructed by repositioning the groups to their original locations and averaging the overlapped pixels.
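A scalar version of the GST update, applied elementwise to the coefficients in practice, can be sketched as below; the fixed-point iteration count J is an illustrative assumption.

```python
import numpy as np

def gst(y, lam, p, J=10):
    """Sketch of generalized soft-thresholding (GST) for the scalar
    problem min_x 0.5*(x - y)^2 + lam*|x|^p with 0 < p < 1."""
    # threshold below which the minimizer is exactly zero
    tau = (2.0 * lam * (1.0 - p)) ** (1.0 / (2.0 - p)) \
          + lam * p * (2.0 * lam * (1.0 - p)) ** ((p - 1.0) / (2.0 - p))
    if abs(y) <= tau:
        return 0.0
    x = abs(y)
    for _ in range(J):                   # fixed-point refinement
        x = abs(y) - lam * p * x ** (p - 1.0)
    return np.sign(y) * x

# usage: small inputs are thresholded to zero, large ones are shrunk
assert gst(0.1, 0.2, 0.5) == 0.0
x = gst(3.0, 0.5, 0.95)
assert 2.4 < x < 3.0                     # shrunk toward zero but nonzero
```

As p approaches 1, the update approaches ordinary soft thresholding, while smaller p shrinks large coefficients less aggressively, which is why the non-convex penalty preserves strong structures better.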
(IV) For the -subproblem
After obtaining the preceding solutions, the subproblem is transformed into solving the following constrained quadratic minimization problem:
which has a closed-form solution, namely:
Finally, the multipliers are updated as follows:
To enhance the efficiency of our algorithm, we introduced the auxiliary variable to solve problem Eq. [15]. The auxiliary variable strengthens our approach in two ways: first, it simplifies the resolution of the subproblems, making each more manageable and easier to solve; second, it improves the convergence speed and stability of the algorithm, ensuring that it reaches the optimal solution quickly and reliably. So far, we have obtained efficient solutions for each minimization subproblem and have defined all the necessary parameter settings. Building upon the derivations presented earlier, a comprehensive description of the proposed SGNL_TV algorithm for MECT reconstruction is summarized in Table 1.
Table 1
Input: parameters σ, ρ, β, λ, ɛ, projection data ps, s =1, …, S |
1. Initializing: , , and the dimension of the subspace M |
2. While not converged and the maximum iteration number is not reached do |
3. Updating by TV minimization based on Eq. [17] |
4. Updating by SART algorithm based on Eq. [18] |
5. Formulating by stacking all channels of |
6. Updating the subspace basis and the reduced-dimensional eigenimage tensor based on Eq. [21] |
7. For each full-channel patch in do |
1) Construct nonlocal full channel group using Eq. [9] |
2) Flatten the nonlocal full channel group into matrix |
3) Obtain the corresponding low-rank dictionary from using Eq. [11] |
4) Update τ by computing Eq. [30] |
5) Update by computing Eq. [29] |
6) Estimate by computing Eq. [25] |
7) Get the estimation |
8) Attain the nonlocal full channel group by |
8. End for |
9. Aggregate all to form the latent eigenimage tensor |
10. Updating by solving equation [27] via ADMM |
11. Updating Lagrange multipliers via equation Eq. [28] |
12. |
13. End while |
14. Return the recovered tensor |
SGNL_TV, subspace-assisted global, nonlocal sparsity and local TV regularization.
Implementation of the proposed algorithm
In this section, we introduce several advantageous strategies, such as weight adjustment and regularization parameter tuning, to enhance the effectiveness of the proposed SGNL_TV algorithm and improve its overall performance.
(I) Weight setting
Considering that significant textures and edges are often represented by large coefficient values, it is essential to preserve these details to capture the spatial information of the original image effectively. To achieve this, it is recommended to apply less shrinkage to larger values while shrinking smaller values more (52). As a result, the weight is set as follows:
where ɛ' is a small positive constant employed to prevent division by zero.
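This inverse-magnitude weighting rule can be sketched in one line; the function name is illustrative.

```python
import numpy as np

def weights(coeff_mags, eps=1e-8):
    """Sketch of the weight rule: shrink large (edge/texture) coefficients
    less by weighting inversely to their magnitude, w_k = 1/(|c_k| + eps)."""
    return 1.0 / (np.abs(coeff_mags) + eps)

# usage: larger coefficients receive smaller weights, hence less shrinkage
w = weights(np.array([4.0, 1.0, 0.1]))
assert w[0] < w[1] < w[2]
```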
(II) Regularization parameter
Similar to many regularization-based algorithms, the proposed method may require fine-tuning of parameters to achieve optimal reconstruction results. This study offers a comprehensive guide for selecting parameters tailored to each specific subproblem, ensuring the production of high-quality reconstructions.
For the solution to the first subproblem, we find that setting the iteration number for TV denoising within the range of 5 to 10, as mentioned in line 3 of Algorithm 1 (Table 1), is optimal based on our accumulated experience. The TV algorithm’s step size dictates its denoising capability and is generally set between 0.01 and 0.05 depending on the noise level. Increasing the number of iterations or the TV step size yields a smoother image but may compromise the preservation of fine details. A larger penalty parameter β intensifies the TV denoising effect; a straightforward way to ascertain the optimal value is to test values from 1 upward and evaluate the reconstruction outcomes. Theoretically, the operation described in line 4 should be iterated until convergence. However, in our implementation, we execute this operation only once per iteration to reduce computation time.
To solve the eigenimage subproblem effectively, it is crucial to pre-estimate the pixel variance σ, as it dictates the strength of the global non-local prior regularization. For the majority of MECT reconstructions, values between 0.001 and 0.015 are appropriate. Given the specified value of σ, the related hyperparameters that require adjustment encompass the image block size b, the number of non-local similar patches m, the search window size W, the power value p, and the iterative regularization parameter ρ.
To enhance the stability of the proposed algorithm, we dynamically calculate the regularization parameter τ of Eq. [22] in each iteration to strike a balance between the fidelity term and the regularization term. Drawing inspiration from (53), τ is set as follows:
σ is the predetermined noise standard deviation, φi represents the estimated standard deviation of the corresponding group, and ɛ'' is a small positive constant.
Results
In this section, we validate the effectiveness of the proposed algorithm through numerical experiments. For clarity, we refer to our algorithm as SGNL_TV throughout this article. The experiments utilize simulated walnut data and real multi-level mouse data, as depicted in Figure 2. We selected five representative spectral CT reconstruction algorithms for comparison: (I) the FBP technique, a traditional analytical reconstruction approach; (II) the TV-based method (54), included to showcase its effectiveness in noise reduction across individual channels of the CT image; (III) the latest subspace-based MECT reconstruction method proposed in (27), which, for convenience in the following discussion, we denote as the subspace decomposition combining block-matching method (SBM_L0) based on its technological aspects; (IV) the FTNN method (22), which introduces the FTNN to construct the regularization, a tensor LR method that integrates global and non-local priors through image BM; and (V) the ITS_TV method (23), an MECT reconstruction model that integrates intrinsic tensor sparsity and TV regularization simultaneously. Optimal parameter values are selected for each method to ensure the fairness of the comparison. To enhance readers’ understanding of the reference algorithms, we provide a concise overview of their models and algorithm design details in Appendix 1. Image quality is assessed using the root mean square error (RMSE) along with two commonly used image quality metrics: peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). Additionally, to further corroborate the algorithm’s performance, post-reconstruction material decomposition (55) is conducted.
Optimal parameter selection is essential for achieving the highest quality reconstruction results with the proposed method. However, finding the theoretically optimal values is a complex problem; therefore, we selected them empirically. Guided by the visual experiment results and quality metrics, the iteration number T for TV denoising is set to 5, the TV step size γ to 0.03, and the penalty parameter β to 1. For the global nonlocal regularization, the parameters are the pixel variance σ =1/255, the patch size b =25, the number of non-local similar patches m =140, the search window size W =40, the power value p =0.95, and the iterative regularization parameter ρ =1 in all experiments. The dimension M of the subspace is another important parameter, generally related directly to the number of channels in the MECT image. We select M =5 for the simulated walnut data with six channels, and M =3 for the four-channel real mouse data. Additionally, the respective authors have provided the implementation source code for the comparative methods [SBM_L0 (27), FTNN (22), ITS_TV (23)]; therefore, to ensure the fairness of the comparison, their parameters have been empirically optimized under the various data conditions. All initial images are set to zero for all methods in the experiments, and the maximum iteration number for all iterative algorithms is set to 100. All reconstruction methods reach their best results with the optimized parameters listed in Table 2 (the symbols are consistent with the corresponding references).
Table 2
Methods | Numerical simulation | Real data
---|---|---
TV | λ=0.4 | λ=0.1
SBM_L0 | λ=1.8×10⁻⁴, β=6, ρ=1.1, subspace dimension M=3 | λ=1.8×10⁻⁶, β=10, ρ=1.1, subspace dimension M=3
FTNN | β=100 | β=400
ITS_TV | γ=0.05, Titer=5, σ=0.03 | γ=0.05, Titer=5, σ=0.005
SGNL_TV | γ=0.03, Titer=5, σ=1/255, p=0.95, subspace dimension M=5, β=ρ=1 | γ=0.03, Titer=5, σ=1/255, p=0.95, subspace dimension M=3, β=ρ=1
TV, total variation; SBM_L0, subspace decomposition combined with block-matching method; FTNN, framelet tensor sparsity with block-matching method; ITS_TV, tensor nonlocal similarity and local TV sparsity method; SGNL_TV, subspace-assisted global nonlocal sparsity and local TV regularization.
Numerical simulation
As shown in Figure 2A, a digital phantom comprising 512×512 image pixels is initially constructed based on the walnut data in (56). This phantom includes three materials: bone, tissue, and iodine, with the iodine contrast agent concentration set at 15 mg/mL. The mass attenuation coefficients of the basic materials are sourced from the National Institute of Standards and Technology database (https://physics.nist.gov/PhyRefData/xRayMassCoef/tab4.html). Figure 2B displays the normalized X-ray spectrum, generated using SpekCalc software (https://spekcalc.weebly.com/) with a 1 keV energy sampling interval and a tube voltage of 50 kVp. For MECT reconstruction, the X-ray spectrum was segmented into six energy bins: [20, 25), [25, 30), [30, 35), [35, 40), [40, 45), and [45, 50) keV. The scanning distances from the source to the object and detector are set to 1,000 and 1,500 mm, respectively. The scanning angle spans 360° with a 0.5° increment. Projections for each view are collected using a linear detector with 1,024 bins, each bin measuring 0.388 mm in size. Poisson noise is added to the projections to simulate image noise, as detailed below:
Ii = Poisson{I0·exp(−pi)}, p̂i = −ln(Ii / I0) [31]

where I0 refers to the quantity of incoming X-ray photons, i signifies the index of the detector unit, and pi and Ii represent the noise-free logarithmic projection data and the count of detected noisy photons at the i-th detector unit, respectively. For this study, we set I0=1×10⁴. The reconstructed image (Figure 2C) obtained using the SART method under noise-free projection conditions serves as the reference image for subsequent MECT analysis. We selected the region of interest (ROI) marked by the yellow square on this reconstructed image for detailed comparison between the different methods, and also present profiles along the red line within it.
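The noise injection described above amounts to drawing Poisson counts from the expected transmitted intensity and returning to the log domain. A minimal sketch, in which the synthetic line integrals are illustrative rather than taken from the walnut phantom:

```python
import numpy as np

def add_poisson_noise(p, I0=1e4, rng=None):
    """Corrupt noise-free log-projections p with Poisson counting noise."""
    rng = rng or np.random.default_rng(0)
    counts = rng.poisson(I0 * np.exp(-p))     # detected photons I_i
    counts = np.maximum(counts, 1)            # guard against log(0)
    return -np.log(counts / I0)               # noisy log-projection

p_clean = np.linspace(0.0, 3.0, 1024)         # synthetic line integrals
p_noisy = add_poisson_noise(p_clean)
```

Note that the relative noise level grows with attenuation, since fewer photons survive along strongly attenuating paths.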
The reconstruction results of the digital walnut phantom and their corresponding difference images from different methods are shown in Figure 3, where columns (A) to (F) represent the outcomes of FBP, TV, SBM_L0, FTNN, ITS_TV, and the proposed SGNL_TV method. Rows 1 to 3 show the reconstructions of three representative energy bins (1st, 3rd, and 6th), and rows 4 to 6 show the corresponding difference images with respect to the reference image reconstructed by the SART algorithm from noise-free projections. For a fair comparison of image quality, many parameter combinations were tested for the proposed and competing methods, and the results best in terms of RMSE were selected. In addition, the ROI denoted by the yellow square in Figure 2C is magnified in Figure 4 for detailed comparison across methods. As shown in Figures 3,4, the image quality of FBP is significantly compromised by reconstruction noise, which obscures the internal structures. Although the TV method effectively mitigates this noise and produces clearer CT images, it still falls short in retaining intricate details and subtle features. Compared with FBP and TV, the SBM_L0, FTNN, and ITS_TV methods not only enhance structural representation but also markedly reduce noise; however, as indicated by the red arrow in Figure 4, they often miss the finer details of the object. In contrast, the proposed SGNL_TV technique stands out by effectively mitigating noise artifacts while adeptly preserving fine structural details.
Table 3 shows the quantitative assessments of the reconstructed results for the different methods; the proposed SGNL_TV outperforms the compared methods across nearly all quantitative quality metrics. The only exception is that the SSIM value for the first energy bin is slightly lower than that of the ITS_TV method, which may be due to subtle artifacts in the smooth region of the first energy channel produced by the proposed algorithm. Notably, the proposed algorithm achieves the highest PSNR and SSIM and the lowest RMSE among all competing methods for the full energy bin. The average PSNR gains of the proposed SGNL_TV over the FBP, TV, SBM_L0, FTNN, and ITS_TV methods are 18.19, 15.83, 8.42, 4.75, and 1.50 dB, respectively. Meanwhile, the proposed SGNL_TV reduces RMSE relative to FBP by 87.74%, TV by 86.88%, SBM_L0 by 67.01%, FTNN by 46.42%, and ITS_TV by 13.51%.
Table 3
Methods | Bin 1 | Bin 2 | Bin 3 | Bin 4 | Bin 5 | Bin 6 | Full bin
---|---|---|---|---|---|---|---
RMSE | |||||||
FBP | 0.0446 | 0.0265 | 0.0197 | 0.0170 | 0.0172 | 0.0210 | 0.0261 |
TV | 0.0470 | 0.0260 | 0.0180 | 0.0131 | 0.0104 | 0.0092 | 0.0244 |
SBM_L0 | 0.0185 | 0.0102 | 0.0072 | 0.0057 | 0.0049 | 0.0046 | 0.0097 |
FTNN | 0.0098 | 0.0056 | 0.0040 | 0.0036 | 0.0037 | 0.0043 | 0.0056 |
ITS_TV | 0.0061 | 0.0039 | 0.0027 | 0.0025 | 0.0027 | 0.0030 | 0.0037 |
SGNL_TV | 0.0055 | 0.0033 | 0.0023 | 0.0022 | 0.0022 | 0.0023 | 0.0032 |
PSNR | |||||||
FBP | 27.0111 | 31.5190 | 34.0886 | 35.3773 | 35.2486 | 33.5263 | 32.7951 |
TV | 26.5471 | 31.6727 | 34.8586 | 37.6126 | 39.5853 | 40.6494 | 35.1542 |
SBM_L0 | 34.6768 | 39.8457 | 42.8929 | 44.9165 | 46.2419 | 46.8074 | 42.5635 |
FTNN | 40.1238 | 45.0110 | 47.7954 | 48.7214 | 48.5354 | 47.2136 | 46.2334 |
ITS_TV | 44.2243 | 48.0288 | 51.2590 | 51.7981 | 51.3023 | 50.2775 | 49.4816 |
SGNL_TV | 45.1226 | 49.5569 | 52.4091 | 52.9932 | 53.1104 | 52.7272 | 50.9866 |
SSIM | |||||||
FBP | 0.3649 | 0.5547 | 0.6674 | 0.7382 | 0.7642 | 0.7640 | 0.6422 |
TV | 0.9378 | 0.9498 | 0.9618 | 0.9660 | 0.9664 | 0.9672 | 0.9582 |
SBM_L0 | 0.9716 | 0.9860 | 0.9894 | 0.9914 | 0.9917 | 0.9846 | 0.9858 |
FTNN | 0.9714 | 0.9898 | 0.9946 | 0.9945 | 0.9943 | 0.9927 | 0.9896 |
ITS_TV | 0.9882 | 0.9898 | 0.9955 | 0.9955 | 0.9955 | 0.9939 | 0.9931 |
SGNL_TV | 0.9855 | 0.9927 | 0.9961 | 0.9963 | 0.9965 | 0.9958 | 0.9939 |
RMSE, root mean squared error; PSNR, peak signal-to-noise ratio; SSIM, structural similarity; FBP, filtered-back projection; TV, total variation; SBM_L0, subspace decomposition combined with block-matching method; FTNN, framelet tensor sparsity with block-matching method; ITS_TV, tensor nonlocal similarity and local TV sparsity method; SGNL_TV, subspace-assisted global nonlocal sparsity and local TV regularization.
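The average PSNR gains quoted in the text follow directly from the full-bin column of Table 3:

```python
# Full-bin PSNR values (dB) taken from Table 3
psnr_full = {"FBP": 32.7951, "TV": 35.1542, "SBM_L0": 42.5635,
             "FTNN": 46.2334, "ITS_TV": 49.4816, "SGNL_TV": 50.9866}

# PSNR gain of SGNL_TV over each competing method
gains = {m: psnr_full["SGNL_TV"] - v
         for m, v in psnr_full.items() if m != "SGNL_TV"}
# gains round to 18.19, 15.83, 8.42, 4.75, and 1.50 dB, as reported
```
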
Line profiles between the 180th and 290th pixels along the red dashed line in Figure 2C are depicted in Figure 5. It is evident that the line profiles of FBP and TV exhibit significant fluctuations, deviating notably from the true values. Conversely, SBM_L0, FTNN and ITS_TV demonstrate improved results, albeit with some areas remaining incompletely restored. Notably, SGNL_TV line profiles closely align with the true values, displaying minimal deviation from the ground truth. Moreover, SGNL_TV outperforms other algorithms, especially in regions characterized by intricate structures (indicated by the black arrows).
The decomposition results of different reconstruction methods are shown in Figure 6. As can be seen, the decompositions of the FBP, TV, and SBM_L0 methods still have obvious noise, indicating a limited capacity for maintaining clarity in the internal structure. The FTNN and ITS_TV methods obtain improved decomposition results, but there are still some fine tissue structures lost. Conversely, the SGNL_TV method we propose delivers superior decomposition outcomes compared to all other algorithms, further confirming its effectiveness in MECT reconstruction.
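Post-reconstruction (image-domain) material decomposition reduces to a per-pixel linear inverse problem once the multi-channel images are available. A minimal least-squares sketch; the basis-material matrix A and all sizes are illustrative, not the calibrated coefficients used in the article:

```python
import numpy as np

rng = np.random.default_rng(2)
C, K, N = 6, 3, 1000           # energy channels, basis materials, pixels

A = rng.random((C, K))          # channel-wise attenuation of each basis material
m_true = rng.random((K, N))     # ground-truth material maps (flattened)
x = A @ m_true                  # noise-free multi-channel attenuation per pixel

# one least-squares solve recovers the material maps for all pixels at once
m_est, *_ = np.linalg.lstsq(A, x, rcond=None)
```

With more channels than materials (C > K) the system is overdetermined, which is why channel-wise noise propagates strongly into the decomposition and cleaner reconstructions yield cleaner material maps.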
Computational cost is also a crucial factor in algorithm design. In this work, all algorithms are implemented in Matlab (2022a; MathWorks, Natick, MA, USA) on a PC with an Intel Core i9-9900 CPU (3.10 GHz) and 32 GB of RAM. Table 4 presents the time and physical memory consumed by a single iteration of each iterative comparison algorithm. The TV algorithm is fast because it involves no tensor block extraction. The SBM_L0 method exploits subspace decomposition and therefore requires the least time of all. The FTNN, ITS_TV, and proposed SGNL_TV methods all involve tensor block extraction, resulting in higher time consumption. Although these three algorithms achieve visually similar reconstruction quality and occupy memory at the same level, the proposed SGNL_TV holds a significant advantage in time consumption over the other two, offering greater practical value.
Table 4
Methods | TV | SBM_L0 | FTNN | ITS_TV | SGNL_TV |
---|---|---|---|---|---|
Time costs (s) | 25.25 | 17.30 | 410.8 | 125.9 | 32.57 |
Physical memory (GB) | 8.4 | 9.7 | 17.5 | 17.9 | 17.1 |
TV, total variation; SBM_L0, subspace decomposition combined with block-matching method; FTNN, framelet tensor sparsity with block-matching method; ITS_TV, tensor nonlocal similarity and local TV sparsity method; SGNL_TV, subspace-assisted global nonlocal sparsity and local TV regularization.
In this subsection, we conduct experiments on projection data at various noise levels to further illustrate the practicality of our algorithm. The noise model is the same as Eq. [31], but the initial photon number I0 is set to 5×10³; the parameters of the proposed and comparative algorithms remain unchanged. The maximum number of iterations for each iterative algorithm is set to 100, and the best result of each algorithm is selected according to the RMSE metric.
The visual comparison of the digital walnut phantom and the corresponding difference images at the high noise level I0=5×10³ is shown in Figure 7. To display the reconstruction outcomes of all competing methods more clearly, sub-regions of each denoised image are presented in Figure 8. Table 5 provides quantitative metrics for the reconstruction results of the different algorithms, with RMSE, PSNR, and SSIM averaged over the six channels. Owing to the increased projection noise, the image quality and evaluation metrics of all algorithms degrade to some extent relative to the low-noise (I0=1×10⁴) results. According to the difference images in Figure 7 and the areas indicated by the yellow arrows in Figure 8, the proposed SGNL_TV algorithm still maintains its advantage in noise suppression and detail preservation.
Table 5
Methods | RMSE | PSNR | SSIM
---|---|---|---
FBP | 0.0381 | 30.21 | 0.5100 |
TV | 0.0244 | 35.14 | 0.9564 |
SBM_L0 | 0.0118 | 41.12 | 0.9775 |
FTNN | 0.0072 | 44.66 | 0.9733 |
ITS_TV | 0.0043 | 49.04 | 0.9879 |
SGNL_TV | 0.0042 | 49.04 | 0.9928 |
RMSE, root mean squared error; PSNR, peak signal-to-noise ratio; SSIM, structural similarity; FBP, filtered back projection; TV, total variation; SBM_L0, subspace decomposition combined with block-matching method; FTNN, framelet tensor sparsity with block-matching method; ITS_TV, tensor nonlocal similarity and local TV sparsity method; SGNL_TV, subspace-assisted global nonlocal sparsity and local TV regularization.
Real data experiments
The proposed method was further tested using real mouse data acquired on a spectral CT system developed by the Henan Key Laboratory of Imaging and Intelligent Processing, Information Engineering University, as illustrated in Figure 2D. Experiments were performed under a project license (No. IHEPLLSC202006) granted by the Ethics Committee of Information Engineering University, in compliance with Information Engineering University guidelines for the care and use of animals. The tube parameters are 60 kVp and 72 mAs, with energy thresholds set at 12, 26, 34, and 42 keV; MECT reconstruction is conducted using four energy bins. The source-to-object and source-to-detector distances were 200.8 and 362.8 mm, respectively. The detector, with an element size of 0.4×0.4 mm, was configured as a 512×15 bin array. We captured 1,080 projections spanning 360°, from which the central slice was extracted for two-dimensional (2D) spectral CT image reconstruction. Each channel's image is 512×512 pixels. For clear assessment of the mouse body, the reconstructed images from all methods are displayed horizontally at 235×300 pixels. To assess and compare the noise reduction and detail preservation of the various methods, an ROI in the mouse trunk is magnified. Additionally, to substantiate the proposed method's efficacy, decomposition into three materials is also performed on the real mouse data.
Figure 9 illustrates the reconstruction outcomes of real mouse data obtained through different methods. The columns, from left to right, represent the results of FBP, TV, SBM_L0, FTNN, ITS_TV, and the proposed SGNL_TV method. Rows 1 to 4 correspond to the four spectral channels. To provide a more in-depth view of the structures, the ROI, selected from the yellow square in Figure 9, a1, is magnified in Figure 10. As shown in Figures 9,10, it is evident that the FBP method exhibits significant noise, likely stemming from scanning artifacts. This noise greatly hampers the clarity of details within the ROI, rendering them nearly indiscernible. Although the TV method can reduce noise, it struggles to reconstruct the intricate details within the ROI. On the other hand, the SBM_L0, FTNN, and ITS_TV methods excel in preserving image details and enhancing reconstruction quality compared to the former two methods. Nonetheless, some edge details appear blurred, impacting the material decomposition process. In contrast, the proposed SGNL_TV method proves more adept at preserving details and suppressing noise when compared to all the aforementioned methods.
The material decompositions of reconstructed mouse images using different methods are illustrated in Figure 11. The noise present in the reconstructed image from the FBP method is notably amplified. Although the TV-based method effectively diminishes noise, numerous artifacts are visible in the three decomposed materials, hindering a clear distinction of tissue details. Although the SBM_L0 method has achieved visually acceptable reconstruction results in Figure 9, the material decomposition results are not very satisfactory. The FTNN and ITS_TV achieve relatively satisfactory decomposition results. However, artifacts persist at the edges of the decomposed bones, and the tissue decomposition lacks clarity, with some blurriness along the edges. In comparison to the aforementioned methods, the proposed approach showcases superior image quality in the material decomposition process. It excels in reducing noise and preserving the internal tissue structure of the mouse, outperforming the other methods in this regard.
Discussion
This section provides a detailed account of the key factors that influenced the implementation of the proposed algorithm, such as the dimensions of subspace, GBM, non-convex ℓp minimization, the influence of the pixel variances σ, and the convergence analysis.
Effectiveness of the dimensions of subspace
The impact of the subspace dimension M on MECT reconstruction is examined using the simulated walnut data with six energy channels. The analysis includes a detailed evaluation of RMSE, SSIM, PSNR, and computational cost across different subspace dimensions, as depicted in Figure 12. The quantitative metrics are markedly inferior when the subspace dimension is set to 1, so these results are omitted. To further illustrate the influence of the subspace dimension on reconstruction quality, Figure 13 presents the reconstructed images and corresponding difference images of a representative channel (the third channel) for different subspace dimensions. As depicted in Figures 12,13, image quality improves as the subspace dimension increases. At M=5, both PSNR and SSIM peak, although RMSE lags slightly behind M=6. Taking both time consumption and the quantitative results into account, a subspace dimension of 5 is the best choice in the simulation experiment, yielding superior performance across all metrics compared with the algorithms above. In practice, however, once the dimension is at least 3, the reconstruction and the corresponding decomposed material images already achieve visually pleasing results.
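The role of the subspace dimension can be seen in a small sketch: the C-channel stack is unfolded along the spectral mode and projected onto the top-M singular vectors, yielding M eigenimages. The sizes and the synthetic, spectrally correlated data below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
H, W, C, M = 64, 64, 6, 5          # image size, channels, subspace dimension

# synthetic stack: 3 spectral signatures mixed per pixel (spectral rank 3)
mix = rng.random((C, 3)) @ rng.random((3, H * W))
X = mix.T.reshape(H, W, C)

Xm = X.reshape(-1, C)              # unfold along the spectral mode: (H*W) x C
_, _, Vt = np.linalg.svd(Xm, full_matrices=False)
E = Vt[:M].T                       # C x M orthonormal spectral basis
Z = Xm @ E                         # eigenimages, (H*W) x M
X_rec = (Z @ E.T).reshape(H, W, C) # projection back to the image stack
```

Because the spectral rank of real MECT stacks is low, choosing M at or above that rank loses essentially nothing, which is consistent with the observation that M ≥ 3 already gives visually pleasing results.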
Effectiveness of the proposed GBM and ℓp LR regularization
To validate the performance of the GBM coupled with ℓp LR regularization (57), we developed four variations of the proposed approach: (I) without GBM and p=1; (II) without GBM and p=0.95; (III) with GBM and p=1; and (IV) with GBM and p=0.95 (proposed). The PSNR, SSIM, and RMSE comparisons of the different versions are given in Table 6. They indicate that performing BM in the graph domain together with ℓp (0<p≤1) LR regularization preserves image structures and enables the high performance of the whole framework, suggesting promising potential for MECT image reconstruction.
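The non-convex ℓp step can be illustrated with the generalized soft-thresholding rule of (58), which solves the scalar problem min_x ½(x−y)² + λ|x|^p and is applied elementwise (in practice, to singular values). This is a sketch of that scalar rule, not the article's full solver:

```python
import numpy as np

def gst(y, lam, p, iters=10):
    """Generalized soft-thresholding for min_x 0.5*(x-y)^2 + lam*|x|^p."""
    if p == 1.0:
        return float(np.sign(y) * max(abs(y) - lam, 0.0))  # ordinary soft threshold
    # threshold below which the minimizer is exactly zero
    tau = (2.0 * lam * (1.0 - p)) ** (1.0 / (2.0 - p)) \
          + lam * p * (2.0 * lam * (1.0 - p)) ** ((p - 1.0) / (2.0 - p))
    if abs(y) <= tau:
        return 0.0
    x = abs(y)                                             # fixed-point iteration
    for _ in range(iters):
        x = abs(y) - lam * p * x ** (p - 1.0)
    return float(np.sign(y) * x)
```

For p slightly below 1 (such as p=0.95 used here), large magnitudes are shrunk less than under the ℓ1 soft threshold, which helps preserve strong structural components.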
Table 6
Methods | RMSE | PSNR | SSIM |
---|---|---|---|
p =1, BM | 0.0033 | 50.6838 | 0.9931 |
p =1, GBM | 0.0033 | 50.7421 | 0.9934 |
p =0.95, BM | 0.0033 | 50.9179 | 0.9937 |
p =0.95, GBM (proposed) | 0.0032 | 50.9866 | 0.9939 |
RMSE, root mean squared error; PSNR, peak signal-to-noise ratio; SSIM, structural similarity; BM, block-matching; GBM, graph-based block-matching.
Effectiveness of the pixel variances σ
In this subsection, we use the reconstructed image from the first channel of the walnut dataset to assess how the parameter σ affects the reconstruction. The parameter σ is set to 0.1/255, 0.5/255, 1/255, 2/255, 3/255, and 5/255. Figure 14 and Figure 15 respectively display the RMSE convergence curves and the reconstructed walnut images of the proposed method at different σ values. Clearly, a large σ over-smooths the image and loses fine detail. When σ=1/255, both the reconstructed image and the RMSE curve reach their optimum.
Convergence analysis
Providing a theoretical convergence proof for the proposed SGNL_TV reconstruction algorithm is challenging, primarily due to the GBM operation and the non-convex ℓp minimization (58). Consequently, we rely on empirical evidence to illustrate the algorithm's convergence behavior. Figure 12 displays the evolution of RMSE with iteration number for different subspace dimensions on the walnut data. Notably, all RMSE curves decrease monotonically across iterations, indicating favorable convergence properties.
Conclusions
In this paper, we introduce a spectral CT reconstruction algorithm that leverages tensor-based global non-local similarity and spatial sparsity to overcome the limitations of PCDs. This approach significantly enhances the quality of spectral CT reconstructions over traditional methods. It efficiently utilizes subspace decomposition to project multi-channel spectral CT data into a lower-dimensional eigenimage space. Furthermore, it captures the characteristics of the eigenimage tensor through an NSS prior, eliminating the need for block extraction across all channels in each iteration and significantly improving computational efficiency.
Although the algorithm excels in MECT image reconstruction, it may encounter challenges when applied in real-world scenarios. Firstly, the algorithm presented in this paper primarily focuses on research for noise reduction in photon-counting spectral CT. For other low-dose spectral CT reconstruction issues, such as sparse-angle spectral CT reconstruction, further validation and research are needed to assess the performance of the proposed algorithm. Secondly, the integration of two regularization techniques enhances the method’s effectiveness. Subspace-assisted global nonlocal sparsity regularization is crucial for capturing interchannel image similarity, playing a vital role in preserving detail in MECT reconstruction. TV regularization complements this by improving noise suppression in single-channel images, particularly for those with piecewise smoothness. In situations where there is access to large datasets and powerful computational capabilities, DL-based methods may be able to provide competitive results. In future work, we could replace the TV operation with a plug-and-play network (59,60) that leverages DL priors into our framework to reduce noise.
In summary, the algorithm presented in this paper effectively reduces the significant computational costs associated with spectral CT reconstruction, while enhancing the performance of photon-counting spectral CT image reconstruction and material decomposition quality.
Acknowledgments
Funding: This study was supported by
Footnote
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1248/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. Experiments were performed under a project license (No. IHEPLLSC202006) granted by the Ethics Committee of Information Engineering University, in compliance with Information Engineering University guidelines for the care and use of animals.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Wu Y, Ye Z, Chen J, Deng L, Song B. Photon Counting CT: Technical Principles, Clinical Applications, and Future Prospects. Acad Radiol 2023;30:2362-82. [Crossref] [PubMed]
- Wu W, Chen P, Wang S, Vardhanabhuti V, Liu F, Yu H. Image-domain Material Decomposition for Spectral CT using a Generalized Dictionary Learning. IEEE Trans Radiat Plasma Med Sci 2021;5:537-47. [Crossref] [PubMed]
- Shi Z, Kong F, Cheng M, Cao H, Ouyang S, Cao Q. Multi-energy CT material decomposition using graph model improved CNN. Med Biol Eng Comput 2024;62:1213-28. [Crossref] [PubMed]
- Jumanazarov D, Alimova A, Abdikarimov A, Koo J, Poulsen HF, Olsen UL, Iovea M. Material classification using basis material decomposition from spectral X-ray CT. Nucl Instrum Methods Phys Res A 2023;1056:168637.
- Flohr T, Petersilka M, Henning A, Ulzheimer S, Ferda J, Schmidt B. Photon-counting CT review. Phys Med 2020;79:126-36. [Crossref] [PubMed]
- Zhang L, Vandenberghe S, Staelens S, Lemahieu I. A penalized algebraic reconstruction technique (pART) for PET image reconstruction. In: Proceedings of the 2007 IEEE Nuclear Science Symposium Conference Record; Honolulu, HI, USA. IEEE; 2007:3859-64.
- Liu J, Ding H, Molloi S, Zhang X, Gao H. TICMR: Total Image Constrained Material Reconstruction via Nonlocal Total Variation Regularization for Spectral CT. IEEE Trans Med Imaging 2016;35:2578-86. [Crossref] [PubMed]
- Zhao B, Ding H, Lu Y, Wang G, Zhao J, Molloi S. Dual-dictionary learning-based iterative image reconstruction for spectral computed tomography application. Phys Med Biol 2012;57:8217-29. [Crossref] [PubMed]
- Zhao B, Gao H, Ding H, Molloi S. Tight-frame based iterative image reconstruction for spectral breast CT. Med Phys 2013;40:031905. [Crossref] [PubMed]
- Li L, Chen Z. IEEE Trans Med Imaging 2015;34:716-28. [Crossref] [PubMed]
- Kilmer ME, Martin CD. Factorization strategies for third order tensors. Linear Algebra Appl 2011;435:641-58.
- Semerci O, Hao N, Kilmer ME, Miller EL. Tensor Based Formulation and Nuclear Norm Regularization for Multienergy Computed Tomography. IEEE Trans Image Process 2014;23:1678-93. [Crossref] [PubMed]
- Rigie DS, La Rivière PJ. Joint Reconstruction of Multichannel, Spectral CT Data via Constrained Total Nuclear Variation Minimization. Phys Med Biol 2015;60:1741-62. [Crossref] [PubMed]
- Zhang Y, Mou X, Wang G, Yu H. Tensor-Based Dictionary Learning for Spectral CT Reconstruction. IEEE Trans Med Imaging 2017;36:142-54. [Crossref] [PubMed]
- Xie Q, Zhao Q, Meng D, Xu Z. Kronecker-Basis-Representation Based Tensor Sparsity and Its Applications to Tensor Recovery. IEEE Trans Pattern Anal Mach Intell 2018;40:1888-902. [Crossref] [PubMed]
- Ge Q, Jing XY, Wu F, Wei ZH, Xiao L, Shao WZ, Yue D, Li HB. Structure-Based Low-Rank Model With Graph Nuclear Norm Regularization for Noise Removal. IEEE Trans Image Process 2017;26:3098-112. [Crossref] [PubMed]
- Zhuang L, Fu X, Ng MK, Bioucas-Dias JM. Hyperspectral Image Denoising Based on Global and Nonlocal Low-Rank Factorizations. IEEE Trans Geosci Remote Sens 2021;59:10438-54.
- Dabov K, Foi A, Katkovnik V, Egiazarian K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans Image Process 2007;16:2080-95. [Crossref] [PubMed]
- Xu Q, Liu H, Yu H, Wang G, Xing L. Dictionary learning based reconstruction with low-rank constraint for low dose spectral CT. Med Phys 2016;43:3701.
- Xia W, Wu W, Niu S, Liu F, Zhou J, Yu H, Zhang Y. Spectral CT Reconstruction—ASSIST: Aided by Self-Similarity in Image-Spectral Tensors. IEEE Trans Comput Imaging 2019;5:420-36.
- Chen X, Xia W, Liu Y, Chen H, Zhou J, Zhang Y. Fourth- Order Nonlocal Tensor Decomposition Model For Spectral Computed Tomography. 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI); 13-16 April 2021; Nice, France. IEEE; 2021:1841-5.
- Yu X, Cai A, Wang L, Zheng Z, Wang Y, Wang Z, Li L, Yan B. Framelet tensor sparsity with block matching for spectral CT reconstruction. Med Phys 2022;49:2486-501. [Crossref] [PubMed]
- Zhang W, Liang N, Wang Z, Cai A, Wang L, Tang C, Zheng Z, Li L, Yan B, Hu G. Multi-energy CT reconstruction using tensor nonlocal similarity and spatial sparsity regularization. Quant Imaging Med Surg 2020;10:1940-60. [Crossref] [PubMed]
- Wu W, Liu F, Zhang Y, Wang Q, Yu H. Non-Local Low-Rank Cube-Based Tensor Factorization for Spectral CT Reconstruction. IEEE Trans Med Imaging 2019;38:1079-93. [Crossref] [PubMed]
- Wu W, Hu D, An K, Wang S, Luo F. A High-Quality Photon-Counting CT Technique Based on Weight Adaptive Total-Variation and Image-Spectral Tensor Factorization for Small Animals Imaging. IEEE Trans Instrum Meas 2020;70:1-14.
- Wang S, Wu W, Cai A, Xu Y, Vardhanabhuti V, Liu F, Yu H. Image-spectral decomposition extended-learning assisted by sparsity for multi-energy computed tomography reconstruction. Quant Imaging Med Surg 2023;13:610-30. [Crossref] [PubMed]
- Yu X, Cai A, Li L, Jiao Z, Yan B. Low-dose spectral reconstruction with global, local, and nonlocal priors based on subspace decomposition. Quant Imaging Med Surg 2023;13:889-911. [Crossref] [PubMed]
- Jin KH, McCann MT, Froustey E, Unser M. Deep Convolutional Neural Network for Inverse Problems in Imaging. IEEE Trans Image Process 2017;26:4509-22. [Crossref] [PubMed]
- Zhu B, Liu JZ, Cauley SF, Rosen BR, Rosen MS. Image reconstruction by domain-transform manifold learning. Nature 2018;555:487-92. [Crossref] [PubMed]
- Wu W, Hu D, Niu C, Broeke LV, Butler APH, Cao P, Atlas J, Chernoglazov A, Vardhanabhuti V, Wang G. Deep learning based spectral CT imaging. Neural Netw 2021;144:342-58. [Crossref] [PubMed]
- Chen X, Xia W, Yang Z, Chen H, Liu Y, Zhou J, Wang Z, Chen Y, Wen B, Zhang Y. SOUL-Net: A Sparse and Low-Rank Unrolling Network for Spectral CT Image Reconstruction. IEEE Trans Neural Netw Learn Syst 2023; Epub ahead of print. [Crossref]
- Wang Y, Ren J, Cai A, Wang S, Liang N, Li L, Yan B. Hybrid-Domain Integrative Transformer Iterative Network for Spectral CT Imaging. IEEE Transactions on Instrumentation and Measurement 2024;73:4504513.
- Guo X, Li Y, Chang D, He P, Feng P, Yu H, Wu W. Spectral2Spectral: Image-spectral Similarity Assisted Spectral CT Deep Reconstruction without Reference. IEEE Trans Comput Imaging 2023;9:1031-42.
- Zhuang L, Bioucas-Dias JM. Fast Hyperspectral Image Denoising and Inpainting Based on Low-Rank and Sparse Representations. IEEE J Sel Topics Appl Earth Obs Remote Sens 2018;11:730-42.
- Sun L, He C, Zheng Y, Tang S. SLRL4D: Joint Restoration of Subspace Low-Rank Learning and Non-Local 4-D Transform Filtering for Hyperspectral Image. Remote Sens 2020;12:2979.
- Zha Z, Wen B, Yuan X, Zhang J, Zhou J, Lu Y, Zhu C. Nonlocal Structured Sparsity Regularization Modeling for Hyperspectral Image Denoising. IEEE Trans Geosci Remote Sens 2023;61:1-16.
- Kolda TG, Bader BW. Tensor Decompositions and Applications. SIAM Rev 2009;51:455-500.
- Lin J, Huang TZ, Zhao XL, Jiang TX, Zhuang L. A tensor subspace representation-based method for hyperspectral image denoising. IEEE Trans Geosci Remote Sens 2020;58:7739-57.
- Geng X, Ji L, Zhao Y, Wang F. A Small Target Detection Method for the Hyperspectral Image Based on Higher Order Singular Value Decomposition (HOSVD). IEEE Geosci Remote Sens Lett 2013;10:1305-8.
- Mahmood SZ, Afzal H, Mufti MR, Akhtar N, Habib A, Hussain S. A Novel Method of Image Denoising: New Variant of Block Matching and 3D. J Med Imaging Health Inform 2020;10:2490-500.
- Zhu W, Ding W, Xu J, Shi Y, Yin B. Hash-Based Block Matching for Screen Content Coding. IEEE Trans Multimedia 2015;17:935-44.
- Dong W, Shi G, Ma Y, Li X. Image Restoration via Simultaneous Sparse Coding: Where Structured Sparsity Meets Gaussian Scale Mixture. Int J Comput Vis 2015;114:217-32.
- Liu J, Yang W, Zhang X, Guo Z. Retrieval Compensated Group Structured Sparsity for Image Super-Resolution. IEEE Trans Multimedia 2016;19:302-16.
- Zha Z, Yuan X, Zhou J, Zhu C, Wen B. Image Restoration via Simultaneous Nonlocal Self-Similarity Priors. IEEE Trans Image Process 2020;29:8561-76. [Crossref] [PubMed]
- Hu W, Cheung G, Ortega A, Au OC. Multiresolution graph Fourier transform for compression of piecewise smooth images. IEEE Trans Image Process 2015;24:419-33. [Crossref] [PubMed]
- Cheung G, Magli E, Tanaka Y, Ng MK. Graph Spectral Image Processing. Proceedings of the IEEE 2018;106:907-30.
- Bioucas-Dias JM, Plaza A. Hyperspectral unmixing: geometrical, statistical, and sparse regression-based approaches. In: Image and signal processing for remote sensing XVI. Vol. 7830. SPIE; 2010:79-93.
- Cai A, Li L, Zheng Z, Wang L, Yan B. Block-matching sparsity regularization-based image reconstruction for low dose computed tomography. Med Phys 2018;45:2439-52. [Crossref] [PubMed]
- Yu H, Wang G. Compressed sensing based interior tomography. Phys Med Biol 2009;54:2791-805. [Crossref] [PubMed]
- He W, Yao Q, Li C, Yokoya N, Zhao Q, Zhang H, Zhang L. Non-local Meets Global: An Integrated Paradigm for Hyperspectral Image Restoration. IEEE Trans Pattern Anal Mach Intell 2020;44:2089-107. [Crossref] [PubMed]
- Zuo W, Meng D, Zhang L, Feng X, Zhang D. A generalized iterated shrinkage algorithm for non-convex sparse coding. 2013 IEEE International Conference on Computer Vision; 01-08 December 2013; Sydney, NSW, Australia. IEEE; 2013:217-24.
- Candes EJ, Wakin MB, Boyd SP. Enhancing sparsity by reweighted ℓ1 minimization. J Fourier Anal Appl 2008;14:877-905.
- Chang SG, Yu B, Vetterli M. Adaptive wavelet thresholding for image denoising and compression. IEEE Trans Image Process 2000;9:1532-46. [Crossref] [PubMed]
- Li C, Yin W, Jiang H, Zhang Y. An efficient augmented Lagrangian method with applications to total variation minimization. Comput Optim Appl 2013;56:507-30.
- Mendonca PRS, Lamb P, Sahani DV. A flexible method for multimaterial decomposition of dual-energy CT images. IEEE Trans Med Imaging 2014;33:99-116. [Crossref] [PubMed]
- Jørgensen JS, Sidky EY. How little data is enough? Phase-diagram analysis of sparsity-regularized X-ray computed tomography. Philos Trans A Math Phys Eng Sci 2015;373:20140387. [Crossref] [PubMed]
- Mu J, Xiong R, Fan X, Liu D, Wu F, Gao W. Graph-Based Non-Convex Low-Rank Regularization for Image Compression Artifact Reduction. IEEE Trans Image Process 2020;29:5374-85. [Crossref] [PubMed]
- Sleem OM, Ashour ME, Aybat NS, Lagoa CM. Lp quasi-norm minimization: algorithm and applications. Eurasip J. Adv. Signal Process 2024;24:22.
- Li S, Jiang X, Tivnan M, Gang GJ, Shen Y, Stayman JW. CT reconstruction using diffusion posterior sampling conditioned on a nonlinear measurement model. J Med Imaging (Bellingham) 2024;11:043504. [Crossref] [PubMed]
- Kazerouni A, Aghdam EK, Heidari M, Azad R, Fayyaz M, Hacihaliloglu I, Merhof D. Diffusion models in medical imaging: A comprehensive survey. Med Image Anal 2023;88:102846. [Crossref] [PubMed]