Development and evaluation of a deep learning model for multi-frequency Gibbs artifact elimination
Original Article


Lisong Dai1#, Dan Wang1#, Xin Mao2, Zhenzhuang Miao3, Lei Lu3, Yuting Ling4, Hanbo Tan2, Zhaohui Li5, Hongyu Guo3, Xiaoyun Liang4, Qin Xu3, Yuehua Li1

1Institute of Diagnostic and Interventional Radiology, Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China; 2Department of Radiology, Peking University Third Hospital, Beijing, China; 3MRI R&D, Neusoft Medical Systems Co., Ltd., Shanghai, China; 4Institute of Research and Clinical Innovation, Neusoft Medical Systems Co., Ltd., Shanghai, China; 5Department of Radiology, Wuhan Hankou Hospital, Wuhan, China

Contributions: (I) Conception and design: L Dai, D Wang, Z Miao, L Lu; (II) Administrative support: Y Li, Q Xu; (III) Provision of study materials or patients: L Dai, D Wang, L Lu, Y Ling, Z Miao; (IV) Collection and assembly of data: L Dai, D Wang, X Mao, L Lu; (V) Data analysis and interpretation: L Dai, D Wang, L Lu, Y Ling, Z Miao; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Yuehua Li, MD, PhD. Institute of Diagnostic and Interventional Radiology, Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, No. 600, Yishan Road, Xuhui District, Shanghai 200233, China. Email: liyuehua0529@163.com.

Background: Gibbs artifacts frequently occur as a result of truncation in the frequency domain (k-space). They can degrade image quality and may be misinterpreted as syrinx, thereby complicating diagnosis. This study aimed to develop and evaluate a robust deep learning (DL) model that eliminates multi-frequency Gibbs artifacts.

Methods: We retrospectively collected 290,940 magnetic resonance imaging (MRI) images from 4,936 scans, encompassing 5 anatomical regions and 67 MRI sequences, to develop a DL model for Gibbs artifact removal. This model was trained using artificially generated Gibbs artifacts with various truncation ratios as input data, and its artifact-removal performance was evaluated across different anatomical regions, MRI sequences, and levels of Gibbs artifact severity. For external validation, we prospectively collected data from 20 healthy adults and 10 syrinx patients, comparing radiologists’ diagnostic performance, measured by the area under the receiver operating characteristic curve (AUC), on images before and after artifact removal to assess the model’s impact on syrinx diagnosis.

Results: The images processed by our model demonstrated significantly higher image quality scores than the original images and those processed by conventional filtering algorithms (all P<0.05). Moreover, the model enabled more confident identification of syrinx compared with the original images [AUC: 0.95, 95% confidence interval (CI): 0.92–0.99 vs. 0.90, 95% CI: 0.86–0.95; P=0.04].

Conclusions: The model demonstrates excellent performance and robustness in eliminating Gibbs artifacts and may hold the potential for improving syrinx identification.

Keywords: Gibbs artifact; artifact removal; deep learning (DL); convolutional neural network (CNN)


Submitted Jul 02, 2024. Accepted for publication Dec 28, 2024. Published online Jan 22, 2025.

doi: 10.21037/qims-24-1344


Introduction

The Gibbs artifact, also known as the truncation or ringing artifact, is a common signal-processing artifact in magnetic resonance imaging (MRI). It appears as bright or dark striations (“ringing”) that run parallel to high-contrast tissue interfaces, such as the spinal cord-cerebrospinal fluid or brain-skull interface (1,2). An image of finite spatial extent has a representation of infinite extent in the frequency domain, also known as k-space. In practice, however, only a limited portion of the k-space signal can be collected, resulting in signal truncation; this truncation gives rise to the Gibbs artifact.
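The truncation mechanism can be reproduced in a few lines of NumPy (an illustrative 1-D sketch, not the authors' code): zeroing the high frequencies of a sharp-edged profile and transforming back produces the characteristic overshoot and ringing at the edge.

```python
import numpy as np

n = 256
profile = np.zeros(n)
profile[96:160] = 1.0                      # high-contrast "tissue" edge

k = np.fft.fftshift(np.fft.fft(profile))   # centred k-space
keep = 64                                  # retain only the central 25% of frequencies
trunc = np.zeros_like(k)
lo, hi = n // 2 - keep // 2, n // 2 + keep // 2
trunc[lo:hi] = k[lo:hi]                    # zero out the high frequencies

recon = np.real(np.fft.ifft(np.fft.ifftshift(trunc)))

# Gibbs ringing: the reconstruction overshoots/undershoots near the edge,
# while the flat interior stays close to the original values.
overshoot = recon.max() - profile.max()
print(f"peak overshoot: {overshoot:.3f}")  # close to the classic ~9% Gibbs overshoot
```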

Gibbs artifacts may lead to misinterpretations and inaccurate signal measurements in clinical practice, resulting in flawed conclusions and complicating diagnosis (3). For example, the dark rim artifacts in contrast-enhanced dynamic myocardial imaging are caused by ringing and may impede the detection of mild subendocardial perfusion defects (4,5). In diffusion-weighted images, the oscillating intensity of these artifacts can affect the quantification of diffusion-related parameters (6,7). In spinal MRI, Gibbs artifacts can degrade image quality and be misidentified as syrinx, producing a false-positive diagnosis (8-10) and compromising subsequent treatment planning.

Gibbs ringing artifacts can be reduced by several approaches. One is full-sampling MRI, which increases the acquisition resolution at the expense of a significant increase in scanning time. Another is to multiply the acquired k-space signal by a low-pass filter such as a Tukey or Hanning window, which, however, may reduce spatial resolution and lead to image blurring. Alternatively, more advanced extrapolation approaches have been developed that preserve fine image details (11,12). However, such approaches are computationally intensive and less robust, relying solely on spatial-domain information due to the complexity of k-space signals. Deep learning (DL) has been intensively explored and employed in medical imaging for image processing and analysis, including Gibbs-ringing artifact removal (13,14). Zhao et al. (15) introduced a convolutional neural network (CNN) derived from enhanced deep residual networks to suppress Gibbs-ringing artifacts in MRI images. Zhang et al. (16) proposed a CNN model to estimate the Gibbs artifact map and subtract it from the original image. Although Zhang et al.’s CNN model outperformed other approaches, its training data were generated by simply zero-padding the k-space, namely, setting a certain percentage of high-frequency components to zero, which does not compensate for the loss of frequency content per se.
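The low-pass filtering approach can be sketched in the same 1-D setting (illustrative NumPy code, not the authors' implementation; the Tukey window is implemented inline and the taper parameter is an assumption): tapering the retained k-space band reduces the ringing of a hard truncation, at the cost of a smoother, blurred edge.

```python
import numpy as np

def tukey(n, alpha=0.5):
    """Cosine-tapered (Tukey) window: flat centre, cosine roll-off at the edges."""
    x = np.linspace(0, 1, n)
    w = np.ones(n)
    left = x < alpha / 2
    w[left] = 0.5 * (1 + np.cos(np.pi * (2 * x[left] / alpha - 1)))
    right = x >= 1 - alpha / 2
    w[right] = 0.5 * (1 + np.cos(np.pi * (2 * x[right] / alpha - 2 / alpha + 1)))
    return w

n, keep = 256, 64
profile = np.zeros(n)
profile[96:160] = 1.0
k = np.fft.fftshift(np.fft.fft(profile))
lo, hi = n // 2 - keep // 2, n // 2 + keep // 2

hard = np.zeros_like(k)
hard[lo:hi] = k[lo:hi]                      # hard truncation -> strong ringing
soft = np.zeros_like(k)
soft[lo:hi] = k[lo:hi] * tukey(keep, 0.6)   # tapered truncation -> less ringing

ring_hard = np.real(np.fft.ifft(np.fft.ifftshift(hard))).max() - 1.0
ring_soft = np.real(np.fft.ifft(np.fft.ifftshift(soft))).max() - 1.0
print(f"overshoot hard: {ring_hard:.3f}, tapered: {ring_soft:.3f}")
```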

In routine clinical MRI scanning, the application of different truncation ratios to k-space frequencies produces varying data characteristics in the frequency domain, leading to different severities and manifestations of Gibbs artifacts in the image domain upon application of the inverse fast Fourier transform (FFT). Additionally, inherent manufacturer variability inevitably contributes to the variability of Gibbs artifacts in routine clinical MRI images. Thus, directly using routine clinical MRI images for network training would make it very challenging for the model to recognize and learn Gibbs artifact features in a reliable manner.

To address such an issue, we propose a DL model trained on artificially generated Gibbs artifacts with varying truncation ratios for efficient elimination of multi-frequency Gibbs artifacts. We present this article in accordance with the TRIPOD + AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1344/rc).


Methods

The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Ethics Board of Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine (IRB No. 2022-KY-130[K]). As the anonymized retrospective data collection did not impact patient privacy, the requirement for informed consent was waived for the retrospective portion of the study. All prospectively enrolled participants provided informed consent.

We first developed a Gibbs artifact generator (GAG) to artificially create multi-frequency Gibbs artifacts, dividing the artifacts into subtypes based on the truncation ratio of the k-space that caused them. Each subtype exhibited highly specific characteristics in both the image and frequency domains, facilitating artifact identification and learning by the model. Subsequently, a DL model was constructed to learn the features of the Gibbs artifacts at each truncation frequency, facilitating artifact identification and removal. Finally, we assessed the performance and robustness of the model, as well as its effectiveness in enhancing radiologists’ ability to identify spinal syrinx.

Data collection

The data collection for this study involved a retrospective study and a prospective study. The retrospective study provided data for model training, validation, and internal testing, whereas the prospective study focused on diagnostic assessment.

The retrospective datasets were obtained in June 2023 from an MRI scanner (NeuMR Universal, Neusoft Medical Systems, Shenyang, China), resulting in a total of 290,940 Digital Imaging and Communications in Medicine (DICOM) images from 4,936 scans. These scans encompassed 5 anatomical regions—head, spine, abdomen, pelvis, and joints—and included 67 unique MRI sequences, detailed in Table S1. The prospective diagnostic study consisted of two groups: a syrinx patient group and a healthy control group. The syrinx patient group included individuals clinically diagnosed with syrinx, whereas the control group consisted of individuals without neurological or spinal disorders, excluding individuals with spinal surgery history. This dataset comprised MRI scans from 30 volunteers—20 healthy adults and 10 syrinx patients—collected between November 2022 and May 2023 using 3 MRI scanners from different manufacturers (Siemens Magnetom Prisma, Siemens Healthineers, Germany; Philips Ingenia Elition X, Philips Healthcare, Chicago, IL, USA; and uMR 780, United Imaging, Shanghai, China) at Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine. Specific scanning parameters are outlined in the supplemental materials (Appendix 1).

GAG

In the present study, the GAG, a k-space signal truncation algorithm that artificially creates truncation effects at specific frequencies in k-space, was used to obtain images containing Gibbs artifacts at different frequencies with visible artifact features (Figure 1). A Gibbs artifact enhancement algorithm was then used to enhance the artifacts in the image, further accentuating their features and allowing the network to learn multi-frequency Gibbs artifact features more effectively.

Figure 1 Examples of Gibbs artifacts produced by truncating k-space at different frequencies.

In this study, the initial image was generated in the complex domain from signals acquired through the radiofrequency coil. The complex signals were spatially encoded into k-space using magnetic field gradients and subsequently converted to a complex image using FFT. The principle of GAG is shown in Figure 2. In the first stage, the complex k-space matrix was split into two k-space matrices of the same size (Figure 2A,2B). The Figure 2A matrix was obtained by retaining 70% of the signal from the original k-space, with the remaining 30% zero-filled; the Figure 2B matrix was obtained by retaining the remaining 30% of the signal, with 70% zero-filled. That is to say, the Figure 2A matrix is a 70%-truncated signal from the original k-space, and the Figure 2B matrix is the complementary 30%-truncated signal. Figure 2A,2B contain Gibbs artifacts of equal intensity but opposite direction, so combining them directly would cancel the truncation effect. Therefore, the signals in Figure 2B were multiplied by −1, with the result displayed in Figure 2C. Notably, this sign inversion also reversed the direction of the subject of interest in the image along with that of the Gibbs artifacts. To compensate for this change, Figure 2B was filtered with a Tukey window and multiplied by 2, yielding Figure 2D. Lastly, Figure 2E was obtained by adding Figure 2A,2C,2D together and was then transformed back to the image domain, containing a visible Gibbs artifact at a specific frequency. It is important to note that the final image after applying GAG was recorded as a magnitude image, which is routinely used for clinical diagnosis.
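Under our reading of the description above, the GAG pipeline can be sketched in 1-D as follows (an illustrative NumPy reconstruction of the steps in Figure 2, not the authors' code; the truncation ratio and Tukey taper parameter are assumptions):

```python
import numpy as np

def tukey(n, alpha=0.5):
    """Cosine-tapered (Tukey) window."""
    x = np.linspace(0, 1, n)
    w = np.ones(n)
    left = x < alpha / 2
    w[left] = 0.5 * (1 + np.cos(np.pi * (2 * x[left] / alpha - 1)))
    right = x >= 1 - alpha / 2
    w[right] = 0.5 * (1 + np.cos(np.pi * (2 * x[right] / alpha - 2 / alpha + 1)))
    return w

def gibbs_artifact_generator(image, ratio=0.7, alpha=0.5):
    """Schematic 1-D GAG: split k-space, flip the high-frequency part,
    add back a Tukey-filtered, doubled copy, and return the magnitude image."""
    n = image.shape[-1]
    k = np.fft.fftshift(np.fft.fft(image))           # full k-space
    centre = np.zeros(n, bool)
    half = int(n * ratio) // 2
    centre[n // 2 - half:n // 2 + half] = True

    a = np.where(centre, k, 0)                       # low-frequency part (cf. Fig. 2A)
    b = np.where(centre, 0, k)                       # complementary part (cf. Fig. 2B)
    c = -b                                           # sign-flipped copy (cf. Fig. 2C)
    d = 2 * b * tukey(n, alpha)                      # Tukey-filtered, doubled (cf. Fig. 2D)

    e = a + c + d                                    # recombined k-space (cf. Fig. 2E)
    return np.abs(np.fft.ifft(np.fft.ifftshift(e)))  # magnitude image
```

Applied to an artifact-free profile, the output differs from the input by a visible ringing pattern while preserving the overall contrast.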

Figure 2 Illustration of the Gibbs artifact generator. FFT, fast Fourier transform; IFFT, inverse fast Fourier transform.

Model architecture

In this study, we developed a DL network (GibbsCut) consisting of three identical blocks (U-Net-GRU blocks), each comprising a U-Net (17) and a GRU (18) module. As shown in Figure 3A, in the “Data preprocessing” stage, fully sampled MRI raw data were obtained using various MRI sequences from different anatomical regions. Multi-channel complex signals were combined using the SENSE method, a parallel imaging technique that utilizes coil sensitivity information. The combined complex k-space data were then transformed into the image domain using FFT, and only the magnitude component was retained, serving as the label data. For the training data, simulated Gibbs artifacts were introduced using our GAG, and the resulting artifact-affected complex images were likewise converted to magnitude images. This consistent preprocessing provided robust inputs for training the model. In the “Data training” stage, the input image passes through the three U-Net-GRU blocks. At the output layer of each U-Net-GRU block, the output is connected to the input image, allowing the network to directly learn residual features. This design mitigates network performance degradation, prevents vanishing or exploding gradients, and improves overall network performance.

Figure 3 Model architecture diagrams. (A) A three-layer residual network based on U-Net and GRU. (B) The U-Net contains two downsampling and two upsampling blocks. (C) GRU. MR, magnetic resonance; FFT, fast Fourier transform; GRU, gate recurrent unit.

The U-Net module (Figure 3B) consists of an input layer followed by two downsampling modules. Each downsampling module comprises a 3×3 convolutional layer, a rectified linear unit (ReLU) activation layer, and a 2×2 max pooling layer. In Figure 3B, 3×3×32 denotes the size of the convolutional kernel (3×3) and the number of channels (32). After each downsampling module, the spatial size of the image is halved, whereas the number of channels doubles. Next, there are two upsampling modules, each consisting of an upsampling convolutional layer, feature concatenation, and a 3×3 convolutional layer. Through downsampling, the receptive field of the network gradually expands, compressing the image and increasing the area perceived per unit area; this allows the network to perceive more low-frequency information in the image. The image is then restored through upsampling. During upsampling, skip connections integrate information from the corresponding downsampling stages.
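One such downsampling step can be sketched in plain NumPy (illustrative code with assumed shapes and random weights, not the trained model), showing the halving of spatial size and doubling of channel count:

```python
import numpy as np

def conv3x3_relu(x, weights):
    """3x3 'same' convolution with zero padding followed by ReLU.
    x: (C_in, H, W); weights: (C_out, C_in, 3, 3)."""
    c_in, h, w = x.shape
    c_out = weights.shape[0]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, w))
    for i in range(h):
        for j in range(w):
            patch = xp[:, i:i + 3, j:j + 3]
            out[:, i, j] = np.tensordot(weights, patch, axes=3)
    return np.maximum(out, 0)                      # ReLU

def maxpool2x2(x):
    """2x2 max pooling: halves the spatial size."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64, 64))              # 32 channels, 64x64 feature map
w = rng.standard_normal((64, 32, 3, 3)) * 0.01     # kernel doubles channels to 64
y = maxpool2x2(conv3x3_relu(x, w))
print(y.shape)  # (64, 32, 32): channels doubled, spatial size halved
```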

The gate recurrent unit (GRU) (Figure 3C) is a type of recurrent neural network (RNN). Its performance is similar to that of long short-term memory (LSTM), but it is computationally more efficient. It was primarily developed to address long-term dependency and vanishing-gradient problems in backpropagation. The input-output structure of the GRU is similar to that of a regular RNN, with a current input (xt) and a previous hidden state (ht−1), and it mainly consists of a reset gate, an update gate, and a candidate activation vector. The GRU produces the current hidden node output (yt) and passes it on as the next hidden state (ht), determining whether to retain the Gibbs artifact features learned in the U-Net module within the network.
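The standard GRU update referenced here can be written compactly as follows (an illustrative NumPy cell with random weights, not the trained module):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One standard GRU step: x_t is the current input, h_prev the previous
    hidden state; W* act on the input, U* on the hidden state."""
    z = sigmoid(Wz @ x_t + Uz @ h_prev)              # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev)              # reset gate
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev))  # candidate activation
    return (1 - z) * h_prev + z * h_tilde            # new hidden state h_t

rng = np.random.default_rng(1)
d_in, d_h = 8, 16
mats = [rng.standard_normal(s) * 0.1
        for s in [(d_h, d_in), (d_h, d_h)] * 3]      # Wz, Uz, Wr, Ur, Wh, Uh
h = np.zeros(d_h)
for _ in range(5):                                   # unroll a few time steps
    h = gru_cell(rng.standard_normal(d_in), h, *mats)
print(h.shape)  # (16,)
```

Because the new state is a convex combination of the previous state and a tanh-bounded candidate, the hidden state stays bounded, which is one reason the GRU trains stably over long sequences.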

In summary, in the upper layers of the U-Net network, local Gibbs artifact details are obtained (as the image resolution is high and many Gibbs artifact details can be perceived). In the lower layers of the network, low-frequency information from the image is obtained (as the receptive field is large, making it easier to learn global Gibbs artifact features). The skip connections preserve the learned information from different levels. This enables the entire U-Net network to effectively learn all the Gibbs artifact features in the image, ultimately achieving Gibbs artifact suppression.

The training process began with data preparation: a candidate data pool was created containing pairs of images with and without Gibbs ringing artifacts, obtained from clinically collected data after processing by the GAG. Pairs of images were then randomly selected from this pool, with images containing Gibbs ringing artifacts as input and those without as labels, and the network parameters were trained on these pairs. The Adam optimizer (beta1 =0.9, beta2 =0.999, initial learning rate 0.001) with an L1-norm loss function was used for optimization. Because the training task was a regression task, convergence was monitored by comparing the loss curves of the training set and the tuning set; when the two curves became similar and stable, the deep neural network was considered to be approaching convergence and yielding an optimal model.
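The optimization setup described, an L1 loss minimized with Adam at those hyperparameters, can be sketched as follows (illustrative NumPy code on a toy scalar problem, not the actual training pipeline):

```python
import numpy as np

def l1_loss(pred, label):
    """L1-norm (mean absolute error) loss, as used for training."""
    return np.mean(np.abs(pred - label))

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with beta1 = 0.9, beta2 = 0.999, lr = 0.001."""
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage: fit a scalar parameter to minimise L1 distance to a target.
w, m, v = np.array([0.0]), 0.0, 0.0
target = np.array([0.5])
for t in range(1, 1001):
    grad = np.sign(w - target)              # subgradient of |w - target|
    w, m, v = adam_step(w, grad, m, v, t)
print(float(w[0]))  # approaches 0.5
```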

Experiments

The efficacy, robustness, generalizability, and clinical applicability of GibbsCut were assessed by conducting a phantom study, a retrospective volunteer study, and a prospective diagnostic study. The severity of Gibbs artifacts was compared between the original image without applying any Gibbs artifact suppression algorithm, the image processed by the Tukey algorithm for k-space filtering (19), the image processed by the Gibbs elimination algorithm (GEA) for k-space filtering (20), and the image processed by the GibbsCut algorithm.

The phantom study

The phantom study aimed to assess the performance of the proposed model in removing artifacts on phantom scanning images. A standard System Phantom Model 130 (High Precision Devices, Boulder, CO, USA) underwent scanning using various scan sequences (see Appendix 1).

The retrospective volunteer study

To assess the robustness of the proposed model, we conducted a series of cross-validation experiments examining the impact of imaging sequence, anatomical region, and Gibbs artifact severity on the model’s artifact-removal performance. First, to assess robustness across imaging sequences and anatomic regions, two models were trained for each anatomical region and imaging sequence: the training and test sets of Model 1 and Model 2 shared the same sample size but differed in image composition. Using the abdomen as an example, Model 1 was trained on MRI images randomly selected from the training sets of the four other anatomical regions (head, spine, pelvis, and joints), excluding any abdominal images; the abdominal images were then used as the test set to evaluate performance. In contrast, Model 2 was trained only on the abdominal training set and tested on the abdominal images, allowing the model to specifically learn features of that region (detailed in Table S2). We then constructed an index of Gibbs artifact intensity (GAI) to quantify the intensity of Gibbs artifacts (detailed in Figure S1). Finally, we calculated the correlation between the GAI and the mean square error (MSE) of the images (detailed in Table S2).

Additionally, to compare the total duration of full-sampling scans with the combined duration of partial-sampling scans plus GibbsCut processing, we performed full-sampling and partial-sampling scans across five distinct anatomical regions (the head, cervical spine, abdomen, pelvis, and joints) using MRI scanners from four different manufacturers. The scan duration of each procedure and the computational time for GibbsCut processing of the partial-sampling data were recorded.

The prospective diagnostic study

Four radiologists with over 5 years of experience were randomly assigned to two groups: the original image group, which reviewed original spinal MRI images, and the GibbsCut image group, which reviewed spinal MRI images generated by GibbsCut. All radiologists independently reviewed the MRI images they received and made a diagnosis of the patient’s condition, namely, with or without syrinx.

Statistical analysis

Gibbs artifacts can be detected by radiologists as multiple parallel lines adjacent to high-contrast tissue boundaries, and their severity can be assessed from the number, spacing, and amplitude of these lines. The severity of Gibbs artifacts in the 500 randomly sampled MRI images was examined independently by two radiologists (both with 3 years of experience) in a double-blind manner, and image quality was scored on a 5-point Likert scale as follows: 5, negligible presence of artifacts; 4, minor presence of artifacts; 3, moderate presence of artifacts; 2, significant presence of artifacts; 1, severe presence of artifacts (21). Disagreements were resolved by a senior radiologist (with 18 years of experience). Analysis of variance (ANOVA) and post hoc analysis (Tukey test) were used to compare image quality scores. The area under the receiver operating characteristic curve (AUC) and the DeLong test were used to compare diagnostic performance between the original image and GibbsCut image groups. Cohen’s kappa was used to calculate inter-observer and intra-observer agreement; kappa values were classified as slight to fair (0–0.4), moderate (0.41–0.6), substantial (0.61–0.8), and excellent (0.81–1.0) agreement. All statistical analyses were performed using R (version 4.3.0) and RStudio (version 2023.03.0).
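As an illustration of the agreement metric used, Cohen's kappa for two raters' 5-point Likert scores can be computed as follows (toy ratings for demonstration, not the study data):

```python
import numpy as np

def cohens_kappa(r1, r2, categories=5):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    po = np.mean(r1 == r2)                         # observed agreement
    pe = sum(np.mean(r1 == c) * np.mean(r2 == c)   # expected chance agreement
             for c in range(1, categories + 1))
    return (po - pe) / (1 - pe)

# Toy 5-point Likert scores from two hypothetical raters.
rater1 = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
rater2 = [5, 4, 3, 3, 5, 2, 4, 5, 3, 5]
print(round(cohens_kappa(rater1, rater2), 3))  # 0.726 (substantial agreement)
```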


Results

Phantom study

Figure 4 demonstrates that GibbsCut was more effective than Tukey or GEA in suppressing Gibbs artifacts in the quantitative phantom. Although some Gibbs artifacts still appeared at the edges of certain structures when the traditional k-space filtering algorithm was applied, GibbsCut suppressed Gibbs artifacts more thoroughly in the same locations.

Figure 4 Phantom scanning images before artifact removal (original image) and image processed by Tukey algorithm, GEA, and GibbsCut. GEA, Gibbs elimination algorithm.

Retrospective volunteer study

Image quality assessment

The ANOVA (Table S3) and subsequent post hoc analysis (Table 1) showed that the GibbsCut algorithm produced higher image quality scores than both conventional k-space filtering algorithms (Tukey and GEA) and the original images. This improvement was consistent across nearly all anatomical regions in both the internal and external testing sets (Figure 5A,5B).

Table 1

Post hoc analysis of the image quality scores of images processed using different methods for eliminating Gibbs artifacts

Anatomic region Image group Internal testing set External testing set
Mean difference Adjusted P value Mean difference Adjusted P value
Head GEA-Origin 0.700 <0.001 0.533 0.007
GEA-Tukey 0.033 0.995 −0.300 0.251
Tukey-Origin 0.667 <0.001 0.833 <0.001
GibbsCut-Origin 1.267 <0.001 1.300 <0.001
GibbsCut-Tukey 0.600 <0.001 0.467 0.023
GibbsCut-GEA 0.567 0.001 0.767 <0.001
Abdomen GEA-Origin 0.800 <0.001 0.567 0.057
GEA-Tukey −0.133 0.886 0.033 0.999
Tukey-Origin 0.933 <0.001 0.533 0.082
GibbsCut-Origin 0.800 <0.001 1.433 <0.001
GibbsCut-Tukey 0.533 0.022 0.900 0.001
GibbsCut-GEA 0.667 0.002 0.867 0.001
Spine GEA-Origin 0.933 <0.001 0.567 0.004
GEA-Tukey 0.200 0.606 0.167 0.740
Tukey-Origin 0.733 <0.001 0.400 0.075
GibbsCut-Origin 1.267 <0.001 1.233 <0.001
GibbsCut-Tukey 0.533 <0.001 0.833 <0.001
GibbsCut-GEA 0.333 0.173 0.667 <0.001

GEA, Gibbs elimination algorithm.

Figure 5 Evaluation of the performance and robustness of GibbsCut in removing Gibbs artifacts. (A,B) Comparison of the performance of GibbsCut artifact removal with that of traditional algorithms on the (A) internal and (B) external test sets, respectively. (C,D) Comparison of artifact removal performance of GibbsCut on (C) different MRI sequences and (D) different anatomical regions, respectively. (E) The Pearson correlation coefficients between Gibbs artifact intensity of original image and mean square error between GibbsCut processed image and reference image, and (F) the Pearson correlation coefficients between Gibbs artifact intensity of original image and image quality score of GibbsCut processed image, respectively. GEA, Gibbs elimination algorithm; SE, spin echo; TSE, turbo spin echo; FLAIR, fluid-attenuated inversion recovery; MRI, magnetic resonance imaging.

Model robustness assessment

The robustness of GibbsCut was assessed through cross-validation experiments on various imaging sequences and anatomical regions. The ANOVA (Tables S4,S5) and subsequent post hoc analysis (Table 2) found that the image quality scores of images processed by Model 1 and Model 2 were higher than those of the original images, with no significant differences between Model 1 and Model 2 for any imaging sequence or anatomical region (all P>0.05) (Figure 5C,5D).

Table 2

Post-hoc analysis of image quality score of images of different sequences and anatomic region processed by GibbsCut

Variables Image group Mean difference Adjusted P value
Sequence
   T1 SE axial Model 1-Origin 1.533 <0.001
Model 2-Origin 1.433 <0.001
Model 2-Model 1 −0.1 0.805
   T2 TSE axial Model 1-Origin 1.267 <0.001
Model 2-Origin 1.4 <0.001
Model 2-Model 1 0.133 0.599
   T2 TSE FLAIR axial Model 1-Origin 1.533 <0.001
Model 2-Origin 1.4 <0.001
Model 2-Model 1 −0.133 0.659
   T2 TSE sagittal Model 1-Origin 1.767 <0.001
Model 2-Origin 1.433 <0.001
Model 2-Model 1 −0.333 0.125
   T2 TSE coronal Model 1-Origin 1.167 <0.001
Model 2-Origin 1.167 <0.001
Model 2-Model 1 0 >0.999
Anatomic region
   Head Model 1-Origin 1.367 <0.001
Model 2-Origin 1.233 <0.001
Model 2-Model 1 −0.133 0.760
   Spine Model 1-Origin 1.467 <0.001
Model 2-Origin 1.3 <0.001
Model 2-Model 1 −0.167 0.381
   Abdomen Model 1-Origin 1.7 <0.001
Model 2-Origin 2.1 <0.001
Model 2-Model 1 0.4 0.080
   Pelvis Model 1-Origin 1.567 <0.001
Model 2-Origin 1.533 <0.001
Model 2-Model 1 −0.033 0.980
   Joints Model 1-Origin 1.267 <0.001
Model 2-Origin 1.233 <0.001
Model 2-Model 1 −0.033 0.979

SE, spin echo; TSE, turbo spin echo; FLAIR, fluid-attenuated inversion recovery.

Furthermore, there was no statistically significant correlation between the GAI of the original images and either the MSE between the output and label images or the image quality score of the output image (Figure 5E,5F). The Pearson correlation coefficients between GAI and MSE were 0.14, 0.15, 0.02, and 0.09 for the abdomen, the head, the spine, and all anatomical regions, respectively (Figure 5E). Similarly, the Pearson correlation coefficients between GAI and image quality score were 0.14, 0.03, 0.04, and 0.04 for the abdomen, the head, the spine, and all anatomical regions, respectively (Figure 5F).

Figure 6 shows images of the head, abdomen, cervical spine, and thoracic spine before and after the removal of artifacts through the utilization of GibbsCut and conventional k-space filtering algorithms.

Figure 6 Examples before and after artifacts elimination by using GibbsCut.

Figure 7 shows examples of GibbsCut’s performance on MRI images with different severity of Gibbs artifacts.

Figure 7 Comparison of images with different GAIs before and after artifact removal through GibbsCut. Original image: the image before artifact removal. Label image: the reference image. Output image: the image after artifact removal by using GibbsCut. GAI, Gibbs artifact intensity, indicating the severity of Gibbs artifact; MSE, mean squared error, indicating the difference between the output image and the reference image.

Comparison of time consumption

Table 3 presents a comparison of the two approaches for mitigating Gibbs artifacts in scans performed on four MRI scanners from different manufacturers across five anatomical regions, including the abdomen, head, knee, pelvis, and cervical spine. The results indicated that the approach employing partial-sampling scans with GibbsCut artifact removal required considerably less time than full-sampling scanning, reducing the average scanning time by 26.45–44.90%.

Table 3

Comparison of scanning time (minute: second) between full-sampling and partial-sampling strategy

Anatomic region Device Full-sampling scan Partial-sampling scan GibbsCut algorithm Time difference Difference ratio Mean difference ratio
Head Device A 05:12 02:18 00:03 02:51 54.81% 44.90%
Device B 06:08 02:59 00:05 03:04 50.00%
Device C 05:10 02:48 00:03 02:19 44.84%
Device D 05:24 03:43 00:04 01:37 29.94%
Cervical spine Device A 04:16 02:04 00:02 02:10 50.78% 43.90%
Device B 04:32 02:30 00:02 02:00 44.12%
Device C 04:10 02:24 00:03 01:43 41.20%
Device D 06:30 03:54 00:02 02:34 39.49%
Abdomen Device A 04:02 02:25 00:08 01:29 36.78% 26.45%
Device B 03:58 02:58 00:05 00:55 23.11%
Device C 04:14 03:43 00:06 00:25 9.84%
Device D 02:13 01:18 00:04 00:48 36.09%
Pelvis Device A 05:03 02:06 00:05 02:52 56.77% 43.27%
Device B 04:51 03:26 00:04 01:21 27.84%
Device C 05:06 02:24 00:04 02:38 51.63%
Device D 08:11 05:05 00:05 03:01 36.86%
Joints Device A 06:17 02:42 00:05 03:30 55.70% 39.83%
Device B 05:18 03:19 00:03 01:56 36.48%
Device C 06:55 04:20 00:04 02:31 36.39%
Device D 09:00 06:09 00:05 02:46 30.74%

Time difference = full-sampling scan − (partial-sampling scan + GibbsCut algorithm); difference ratio = time difference/full-sampling scan.
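As a worked check of the footnote formulas, the head/Device A row of Table 3 can be recomputed as follows:

```python
def to_sec(mmss):
    """Convert a 'minute:second' string to seconds."""
    m, s = mmss.split(":")
    return int(m) * 60 + int(s)

# Head / Device A row: full 05:12, partial 02:18, GibbsCut processing 00:03.
full, partial, proc = to_sec("05:12"), to_sec("02:18"), to_sec("00:03")
diff = full - (partial + proc)
print(f"{diff // 60:02d}:{diff % 60:02d}")   # 02:51, matching the table
print(f"{100 * diff / full:.2f}%")           # 54.81%, matching the table
```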

Prospective diagnostic study

Table 4 demonstrates that the diagnostic performance of the GibbsCut image group in identifying syrinx was generally higher than that of the original image group. Specifically, the GibbsCut group achieved an accuracy of 0.95 [95% confidence interval (CI): 0.91–0.98], sensitivity of 0.96 (95% CI: 0.85–0.99), and specificity of 0.95 (95% CI: 0.89–0.98). In comparison, the original image group showed an accuracy of 0.87 (95% CI: 0.81–0.92), sensitivity of 0.98 (95% CI: 0.88–1.00), and specificity of 0.83 (95% CI: 0.74–0.90). Additionally, the AUC for the GibbsCut group was significantly higher at 0.95 (95% CI: 0.92–0.99) than the AUC for the original MRI images, which was 0.90 (95% CI: 0.86–0.95) (DeLong test, P=0.04).

Table 4

Diagnostic performance comparison for syrinx identification between original and GibbsCut image groups

Metrics Original image group Tukey image group GEA image group GibbsCut image group
Sensitivity 0.98 (0.88–1.00) 0.96 (0.84–0.97) 0.96 (0.85–0.96) 0.96 (0.85–0.99)
Specificity 0.83 (0.74–0.90) 0.88 (0.82–0.92) 0.89 (0.83–0.91) 0.95 (0.89–0.98)
PPV 0.71 (0.58–0.82) 0.83 (0.75–0.88) 0.86 (0.79–0.89) 0.90 (0.77–0.97)
NPV 0.99 (0.94–1.00) 0.97 (0.94–0.98) 0.96 (0.90–0.97) 0.98 (0.93–1.00)
Accuracy 0.87 (0.81–0.92) 0.89 (0.84–0.93) 0.91 (0.84–0.96) 0.95 (0.91–0.98)
AUC 0.90 (0.86–0.95) 0.91 (0.88–0.95) 0.92 (0.87–0.97) 0.95 (0.92–0.99)

Data are presented as number (95% CI). GEA, Gibbs elimination algorithm; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the receiver operating characteristic curve; CI, confidence interval.

Both groups had comparable excellent intra-observer agreement (kappa =0.91 for GibbsCut image group; kappa =0.88 for original image group). The inter-observer agreement was superior in the GibbsCut MRI images group (kappa =0.88) compared to the original MRI images group (kappa =0.75).


Discussion

This study proposed an algorithm for generating high-quality multi-frequency Gibbs artifact features for model training. A DL-based model named GibbsCut was then developed and evaluated for eliminating Gibbs artifacts from MRI images. The results demonstrated that GibbsCut outperformed traditional k-space filtering algorithms in suppressing Gibbs artifacts and produced images with higher quality scores than both the filtered and the original images. GibbsCut also demonstrated excellent robustness, performing consistently across anatomical regions and imaging sequences, and it significantly reduced scanning time across various anatomical regions. Furthermore, GibbsCut may enhance diagnostic performance and reduce the incidence of misdiagnosis in spinal MRI.

A partial-sampling scan may not affect the overall contrast of the image, but it can introduce severe Gibbs artifacts along the phase-encoding direction. Filtering methods in k-space, such as the Tukey window or GEA, reduce spatial resolution and blur the image. Advanced extrapolation methods often have high computational complexity and low robustness. In most DL-based approaches (15,16,22-24), the label images of the training data were generated by adjusting sequence parameters and applying filtering methods, and the input images were obtained by zero-padding the k-space of the label images. This can adversely affect the neural network’s ability to learn Gibbs artifact features, leading to poor generalization and specificity. Moreover, such an approach can only eliminate Gibbs artifacts at a specific truncation frequency, which does not align with clinical reality: medical institutions often modify their protocols based on specific requirements, resulting in substantial heterogeneity in the truncation ratios of Gibbs artifacts in real-world scenarios. To evaluate the effectiveness of our proposed Gibbs artifact reduction model, we compared it with several well-established algorithms, including the Tukey window and GEA methods. The results, detailed in Figures 4-6, demonstrated that our model outperformed these traditional methods in terms of artifact reduction and image quality preservation. Other artificial intelligence (AI)-based approaches have also shown promising results; however, owing to their experimental nature, they are not yet widely adopted in clinical practice, and their reproducibility remains limited. Challenges such as inconsistent implementation details and a lack of standardization hinder direct comparison.

In the present study, images processed with GibbsCut achieved superior image quality scores across various MRI sequence types and anatomic regions in both the internal and external datasets. Additionally, the Pearson correlation coefficients between the MSE and the GAI, as well as between the image quality score and the GAI, were remarkably low, suggesting that the severity of Gibbs artifacts, as quantified by the GAI, did not significantly affect the performance of GibbsCut in removing artifacts. In other words, GibbsCut demonstrated strong robustness in removing Gibbs artifacts regardless of severity level, which may be attributed to the model’s training strategy and data composition. Specifically, a large number of clinical images were collected as label data for network training, and the Gibbs artifact generation algorithm was used to create images with visible Gibbs artifacts as the corresponding inputs. Notably, this strategy offered several advantages. Firstly, the truncation operation between the label data and input data generated Gibbs artifacts with different frequency truncation ratios, allowing the neural network to learn the features of Gibbs artifacts. Secondly, the Gibbs artifact enhancement algorithm further boosted these features, making it easier for the network to identify and eliminate Gibbs artifacts accurately. Thirdly, our training sets were remarkably diverse, encompassing 67 mainstream MRI sequences spanning a wide range of data types, including complex, magnitude, and phase images, in both 2D and 3D acquisitions. Lastly, categorizing the Gibbs artifacts by the truncation ratio of k-space improved the network’s specificity, enabling the identification and elimination of Gibbs artifacts with varying truncation ratios and expanding the range of clinical applications.
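The training-pair construction described above — an artifact-free label image paired with an input truncated at a randomly sampled ratio, so the network sees multi-frequency artifacts — can be sketched as follows. The helper name and the ratio range are illustrative assumptions, not the actual GibbsCut pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def make_training_pair(label_img, ratio_range=(0.4, 0.9)):
    """Build one (input, label) training pair: the label is an artifact-free
    image; the input is the same image after k-space truncation at a randomly
    sampled ratio, yielding Gibbs artifacts of varying truncation frequency."""
    ratio = rng.uniform(*ratio_range)
    k = np.fft.fftshift(np.fft.fft2(label_img))
    ny = label_img.shape[0]
    keep = int(ny * ratio)
    c = ny // 2
    trunc = np.zeros_like(k)
    trunc[c - keep // 2:c + keep // 2, :] = k[c - keep // 2:c + keep // 2, :]
    corrupted = np.abs(np.fft.ifft2(np.fft.ifftshift(trunc)))
    return corrupted, label_img, ratio

# toy artifact-free "label": a bright block on a dark background
label = np.zeros((64, 64))
label[20:44, 20:44] = 1.0
inp, lab, ratio = make_training_pair(label)
```

Sampling the ratio per example, rather than fixing one truncation frequency, is what exposes the network to the heterogeneous truncation ratios seen across clinical protocols.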

Moreover, the proposed GibbsCut approach efficiently suppressed Gibbs artifacts in partial-sampling scan data while reducing average scanning time by 26.45–44.90% compared to full-sampling scans. From a practical perspective, this approach has the potential to decrease the necessary scan time, potentially improving patient comfort and cooperation, particularly for individuals with limited ability to maintain immobility for extended periods. Furthermore, the technique streamlines the scanning process, reducing the workload and the risk of errors for radiological technicians and potentially increasing medical efficiency.
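Under the definitions quoted earlier (time difference and difference ratio), the reported scan-time savings reduce to simple arithmetic. The times below are hypothetical, chosen only to illustrate a saving within the reported 26.45–44.90% range:

```python
# time difference = full - (partial + GibbsCut);
# difference ratio = (partial + GibbsCut) / full. Times are hypothetical.
full_scan_s = 180.0              # hypothetical full-sampling scan time (s)
partial_plus_gibbscut_s = 120.0  # hypothetical partial scan + GibbsCut time (s)

time_difference_s = full_scan_s - partial_plus_gibbscut_s
difference_ratio = partial_plus_gibbscut_s / full_scan_s
time_saving_pct = (1.0 - difference_ratio) * 100.0  # percentage of time saved
```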

Our prospective diagnostic study demonstrated that GibbsCut MRI images were more capable of detecting spinal cord syrinx than the original MRI images (AUC: 0.95 vs. 0.90). Radiologists reading the GibbsCut images showed significantly higher specificity (0.95 vs. 0.83) and positive predictive value (0.90 vs. 0.71) than those reading the original images. Spinal cord syrinx is characterized by pathologic dilation of the central canal, appearing as elongated signal voids parallel to the spinal cord on MRI images; these bear some resemblance to Gibbs artifacts and may mislead diagnosis. After GibbsCut processing, the reduction of Gibbs artifacts in the images decreased the radiologists’ false-positive rate. This is crucial for the diagnosis of primary spinal cord pathologies, as any failure to correctly identify spinal cord syrinx may compromise further diagnosis and treatment. Therefore, the use of GibbsCut could facilitate medical decision-making and alleviate unnecessary healthcare burden. In this study, we specifically aimed to address the challenges posed by Gibbs artifacts, which arise during the image reconstruction phase as a result of k-space truncation. It is important to note that systematic errors, such as magnetic field inhomogeneities or gradient imperfections, occur earlier in the imaging process, during signal excitation and acquisition. Given that these two types of artifacts arise at different stages of the imaging workflow, the coincidence between Gibbs artifacts and systematic errors is inherently low. However, both types of artifacts, if not adequately managed, can degrade image quality and potentially impact diagnostic accuracy (25).

The present study has a few limitations. Firstly, the performance of the GibbsCut technique in eliminating Gibbs artifacts was influenced by the resolution of the input image. Our network can process input images of varying resolutions, including those with Gibbs artifacts, during training; however, to balance practical considerations in clinical data acquisition against network convergence stability, the resolution range of the input images was confined to 180–480 pixels (without interpolation). The characteristics of the convolution kernels in deep CNNs result in a reduced ability to identify and eliminate Gibbs artifacts in high-resolution input images that exceed the training resolution range. This is particularly true when the input resolution exceeds 600 pixels, which may render Gibbs artifact elimination ineffective. Secondly, our study was specifically designed to assess the clinical utility of the GibbsCut algorithm in the context of spinal syrinx identification, as this is the most prevalent and prototypical scenario in which Gibbs artifacts significantly affect radiologists’ interpretation of MRI images. It is worth noting that Gibbs artifacts also influence radiologists’ ability to identify other lesions and can affect various post-processing and quantitative analyses of MRI images; however, our study did not conduct separate evaluations of these scenarios. Thirdly, in the diagnostic experiment, the diagnostic improvement that GibbsCut brought to radiologists, although statistically significant, was still relatively limited. In the future, we will test this Gibbs artifact removal technology in more diagnostic scenarios.


Conclusions

This study introduced the concept of multi-frequency Gibbs artifacts and presented a novel algorithm for generating Gibbs artifacts based on this concept. The generated images characterized by pronounced multi-frequency Gibbs artifacts were used as the training set for the DL model, enabling the model to effectively identify and eliminate Gibbs artifacts in an efficient manner. The results demonstrated the efficacy of the proposed model in successfully removing Gibbs artifacts with high robustness. Furthermore, a prospective diagnostic study revealed that the proposed model could significantly boost the accuracy in detecting syrinx, further showcasing the clinical significance of the proposed model.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD + AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-1344/rc

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1344/coif). Z.M., L.L., Yuting Ling, H.G., X.L., and Q.X. are employees of Neusoft Medical Systems Co., Ltd. Z.M. has two patents pending (CN116309917A, CN114140340A), and L.L. has one patent pending (CN116309917A). The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Ethics Board of Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine (IRB No. 2022-KY-130[K]). Anonymous retrospective data collection did not impact patient privacy and was thus exempted from the requirement for informed consent. Informed consent was provided by all prospective individual participants.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Ades-Aron B, Veraart J, Kochunov P, McGuire S, Sherman P, Kellner E, Novikov DS, Fieremans E. Evaluation of the accuracy and precision of the diffusion parameter EStImation with Gibbs and NoisE removal pipeline. Neuroimage 2018;183:532-43. [Crossref] [PubMed]
  2. Stadler A, Schima W, Ba-Ssalamah A, Kettenbach J, Eisenhuber E. Artifacts in body MR imaging: their appearance and how to eliminate them. Eur Radiol 2007;17:1242-55. [Crossref] [PubMed]
  3. Lerch JP, van der Kouwe AJ, Raznahan A, Paus T, Johansen-Berg H, Miller KL, Smith SM, Fischl B, Sotiropoulos SN. Studying neuroanatomy using MRI. Nat Neurosci 2017;20:314-26. [Crossref] [PubMed]
  4. Ferreira P, Gatehouse P, Kellman P, Bucciarelli-Ducci C, Firmin D. Variability of myocardial perfusion dark rim Gibbs artifacts due to sub-pixel shifts. J Cardiovasc Magn Reson 2009;11:17. [Crossref] [PubMed]
  5. Di Bella EV, Parker DL, Sinusas AJ. On the dark rim artifact in dynamic contrast-enhanced MRI myocardial perfusion studies. Magn Reson Med 2005;54:1295-9. [Crossref] [PubMed]
  6. Perrone D, Aelterman J, Pižurica A, Jeurissen B, Philips W, Leemans A. The effect of Gibbs ringing artifacts on measures derived from diffusion MRI. Neuroimage 2015;120:441-55. [Crossref] [PubMed]
  7. Veraart J, Fieremans E, Jelescu IO, Knoll F, Novikov DS. Gibbs ringing in diffusion MRI. Magn Reson Med 2016;76:301-14. [Crossref] [PubMed]
  8. Bronskill MJ, McVeigh ER, Kucharczyk W, Henkelman RM. Syrinx-like artifacts on MR images of the spinal cord. Radiology 1988;166:485-8. [Crossref] [PubMed]
  9. Levy LM, Di Chiro G, Brooks RA, Dwyer AJ, Wener L, Frank J. Spinal cord artifacts from truncation errors during MR imaging. Radiology 1988;166:479-83. [Crossref] [PubMed]
  10. Phillips C, Bagley B, McDonald MA, Schuster NM. Gibbs or Truncation Artifact on MRI Mimicking Degenerative Cervical Myelopathy. Pain Med 2022;23:857-9. [Crossref] [PubMed]
  11. Archibald R, Gelb A. A method to reduce the Gibbs ringing artifact in MRI scans while keeping tissue boundary integrity. IEEE Trans Med Imaging 2002;21:305-19. [Crossref] [PubMed]
  12. Block KT, Uecker M, Frahm J. Suppression of MRI truncation artifacts using total variation constrained data extrapolation. Int J Biomed Imaging 2008;2008:184123. [Crossref] [PubMed]
  13. Aggarwal K, Manso Jimeno M, Ravi KS, Gonzalez G, Geethanath S. Developing and deploying deep learning models in brain magnetic resonance imaging: A review. NMR Biomed 2023;36:e5014. [Crossref] [PubMed]
  14. Karimi D, Warfield SK. Diffusion MRI with machine learning. Imaging Neuroscience 2024;2:1-55.
  15. Zhao XL, Zhang HL, Zhou YL, Bian W, Zhang T, Zou XM. Gibbs-ringing artifact suppression with knowledge transfer from natural images to MR images. Multimed Tools Appl 2020;79:33711-33.
  16. Zhang Q, Ruan G, Yang W, Liu Y, Zhao K, Feng Q, Chen W, Wu EX, Feng Y. MRI Gibbs-ringing artifact reduction by means of machine learning using convolutional neural networks. Magn Reson Med 2019;82:2133-45. [Crossref] [PubMed]
  17. Ronneberger O, Fischer P, Brox T, editors. U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015; Cham: Springer International Publishing; 2015.
  18. Cho K, Van Merriënboer B, Gülçehre Ç, Bahdanau D, Bougares F, Schwenk H, Bengio Y, editors. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. Conference on Empirical Methods in Natural Language Processing; 2014: 1724-34.
  19. Parker DL, Gullberg GT, Frederick PR. Gibbs artifact removal in magnetic resonance imaging. Med Phys 1987;14:640-5. [Crossref] [PubMed]
  20. Constable RT, Henkelman RM. Data extrapolation for truncation artifact removal. Magn Reson Med 1991;17:108-18. [Crossref] [PubMed]
  21. Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med 2005;37:360-3.
  22. Penkin MA, Krylov AS, Khvostikov AV. Hybrid Method for Gibbs-Ringing Artifact Suppression in Magnetic Resonance Images. Programming and Computer Software 2021;47:207-14.
  23. Wang YD, Song Y, Xie HB, Li WJ, Hu BW, Yang G. Reduction of Gibbs Artifacts in Magnetic Resonance Imaging Based on Convolutional Neural Network. 2017 10th International Congress on Image and Signal Processing, Biomedical Engineering and Informatics (CISP-BMEI); 2017.
  24. Muckley MJ, Ades-Aron B, Papaioannou A, Lemberskiy G, Solomon E, Lui YW, Sodickson DK, Fieremans E, Novikov DS, Knoll F. Training a neural network for Gibbs and noise removal in diffusion MRI. Magn Reson Med 2021;85:413-28. [Crossref] [PubMed]
  25. Mazur-Rosmus W, Krzyżak AT. The effect of elimination of gibbs ringing, noise and systematic errors on the DTI metrics and tractography in a rat brain. Sci Rep 2024;14:15010. [Crossref] [PubMed]
Cite this article as: Dai L, Wang D, Mao X, Miao Z, Lu L, Ling Y, Tan H, Li Z, Guo H, Liang X, Xu Q, Li Y. Development and evaluation of a deep learning model for multi-frequency Gibbs artifact elimination. Quant Imaging Med Surg 2025;15(2):1160-1174. doi: 10.21037/qims-24-1344