Influence of image preprocessing on reproducibility and longitudinal repeatability analysis of radiomics features in magnetic resonance image-guided accelerator imaging
Introduction
Radiomics is a promising field that goes beyond traditional visual interpretation by extracting quantitative information from medical images using predefined image features to reveal patterns not visible to the human eye (1). With the increasing focus on transforming subjective qualitative assessments into objective quantitative analyses, radiomics has become increasingly utilized in diverse clinical tasks, including diagnosis, tumor grading and staging, as well as predicting prognosis, survival, treatment response, and complications (2-5).
Despite its growing potential in clinical applications, the integration of radiomics into routine clinical workflows remains limited (6,7). A major factor limiting its clinical adoption is the lack of repeatability and reproducibility of radiomics features across different manufacturers, scanners, and acquisition protocols (8-10). Such variations introduce non-biological differences that may obscure true biological signals, thus undermining the reliability of radiomics-based models. In the typical radiomics pipeline, image preprocessing is an intermediate step between segmentation and feature extraction, serving as a key strategy to enhance feature robustness. Numerous studies have demonstrated that the robustness of radiomics features is highly dependent on the choice of image preprocessing strategies, emphasizing the crucial role of preprocessing in radiomics workflows (11,12). Generally, radiomics preprocessing comprises three main steps: voxel resampling, intensity normalization, and intensity discretization. However, the optimal combination of these steps remains unclear, particularly in magnetic resonance imaging (MRI)-based radiomics, which presents unique challenges due to its inherent variability.
In recent years, MRI has attracted increased attention in radiomics research compared to computed tomography (CT) and positron emission tomography-computed tomography (PET-CT) (13). This growing interest is largely due to MRI’s superior soft tissue contrast and its flexibility in supporting a wide range of imaging protocols for both qualitative evaluation and quantitative analysis (14,15). However, MRI is inherently sensitive to multiple technical factors, such as magnetic field strength, gradient system performance, and acquisition parameters (16). Moreover, pixel intensity values in conventional MRI lack standardized physical meaning, and there is no universally accepted framework for their quantitative interpretation. These characteristics significantly reduce image comparability and highlight the need for carefully designed preprocessing strategies to ensure robust MRI-based radiomics features.
Several studies involving phantoms (11,17) or human subjects (14,17-20) have demonstrated that the repeatability and reproducibility of MRI-based radiomics features are sensitive to the specific settings used in each preprocessing step. These findings emphasize the necessity of optimizing preprocessing parameters to ensure reliable radiomics analysis. For example, Wichtmann et al. investigated the effects of resampling, discretization, and rescaling on radiomics features in 3.0T MRI using fruit phantoms, evaluating feature values and test-retest repeatability across T1-weighted (T1w), T2-weighted (T2w), and fluid-attenuated inversion recovery (FLAIR) sequences (11). Marfisi et al. assessed the effects of resampling and discretization on myocardial radiomics features derived from T1 and T2 mapping in patients with hypertrophic cardiomyopathy (14). Li et al. analyzed how intensity normalization reduces scanner-related variability in brain MRI radiomics, using both phantom and clinical datasets (17). Despite these advances, current literature has two key limitations. First, most studies do not systematically and comprehensively evaluate preprocessing strategies, often focusing on isolated aspects while keeping others fixed. Second, assessments are typically limited to cross-sectional analyses focusing on feature values or test-retest repeatability, without addressing longitudinal or inter-platform variability.
Longitudinal repeatability of MRI-based radiomics features is critical, as longitudinal assessments are essential for many clinical applications (21-23). Radiotherapy serves as a typical example, where patients generally undergo multiple treatment sessions over several weeks. In such scenarios, stable and reliable imaging features are essential for monitoring treatment response and guiding clinical decisions throughout therapy. This need has become even more crucial with the advent of magnetic resonance-guided radiotherapy (MRgRT) systems (24,25), which integrate MRI with linear accelerators to provide high-quality imaging during treatment. In addition to tracking tumor morphological and functional changes, MRgRT enables monitoring of radiomics feature alterations, which may facilitate adaptive radiotherapy (26), potentially leading to more precise treatments and improved patient outcomes. Therefore, it is essential to ensure that radiomics features remain stable over time in longitudinal evaluations. Ensuring reproducibility across different scanners and platforms is equally critical, particularly for multicenter studies. Nevertheless, there remains a lack of systematic investigations into how various preprocessing strategies affect both the longitudinal repeatability and cross-platform reproducibility of radiomics features.
In this phantom-based study conducted on magnetic resonance-guided linear accelerator (MR-Linac) systems, the effects of different image preprocessing strategies on radiomics feature stability were systematically evaluated, focusing on three key aspects: test-retest repeatability, longitudinal repeatability, and inter-platform reproducibility. All image preprocessing steps were performed using open-source software tools.
Methods
As this was a phantom-based study, no ethical approval or written informed consent was required. To improve clarity and facilitate understanding of the methodological framework, the overall study workflow is illustrated in Figure 1.
Data acquisition
The study was based on previously collected data (27). Three brain MRI sequences commonly used on MR-Linac systems, namely T1w, T2w, and FLAIR, were acquired at two independent institutions using two clinical 1.5T Unity MR-Linac systems (System A and System B; Elekta AB, Stockholm, Sweden) (28). Both systems were the same vendor-manufactured model with identical nominal hardware configuration and field strength, differing only in installation site and routine clinical calibration. Imaging protocols were harmonized across sites to minimize protocol-related variability; minor site-specific calibration and environmental differences may nevertheless persist and reflect real-world multicenter conditions. Detailed acquisition parameters and imaging characteristics are provided in Table 1. The same American College of Radiology (ACR) phantom and identical MRI protocols were used at both institutions to ensure consistency, as shown in Figure 1A. All scans were performed using both the built-in posterior coil and a fixed anterior coil; the anterior coil was positioned as close as possible to the ACR phantom to optimize image quality and to simulate the clinical positioning of brain tumor patients treated on MR-Linac systems.
Table 1
| Parameter | T1w | T2w | FLAIR |
|---|---|---|---|
| Acquisition type | 3D-FFE | 3D-FSE | 3D-IR |
| TE/TR (ms) | 3.6/8 | 182/2,100 | 390/4,800 |
| Acquisition orientation | Axial | Axial | Axial |
| FOV (AP × RL × FH, mm3) | 280×202×200 | 280×203×200 | 280×203×200 |
| Acquisition voxel size (mm3) | 1.1×1.1×2.2 | 1.21×1.21×2.4 | 1.3×1.3×2.6 |
| Reconstruction voxel size (mm3) | 0.7×0.7×1.1 | 0.583×0.583×1.2 | 0.58×0.58×1.3 |
| Actual slice gap (mm) | −1.1 | −1.2 | −1.3 |
| Echo train length | 155 | 75 | 180 |
| Flip angle (degree) | 8 | 90 | 40 |
| Oversample factor | 1.7 | 1.6 | 1.8 |
| The number of signal averages | 1 | 1 | 2 |
| Turbo direction | Z | Y | Y |
| Pixel bandwidth (Hz/pixel) | 246.6 | 690.8 | 741.9 |
| Profile order | Linear | Linear | Linear |
| Relative SNR | 1.00 | 1.00 | 1.00 |
| Scan time | 4:58 | 4:56 | 5:31 |
3D, three-dimensional; AP, anterior-posterior; FFE, fast field echo; FH, foot-head; FLAIR, fluid-attenuated inversion recovery; FOV, field of view; FSE, fast spin echo; IR, inversion recovery; MRI, magnetic resonance imaging; RL, right-left; SNR, signal-to-noise ratio; T1w, T1-weighted; T2w, T2-weighted; TE, echo time; TR, repetition time.
System A was used for test-retest and longitudinal repeatability analyses, whereas System B was used for inter-platform reproducibility assessment. For test-retest on System A, the phantom was scanned once and then repositioned and rescanned within the same session; this pair of scans constituted the test-retest dataset. For inter-platform comparison, a single scan was acquired on System B and compared with the corresponding scan from System A; no repositioning was performed on System B. For longitudinal analysis, one scan was acquired per day on System A over 30 consecutive days to simulate a treatment course, yielding 30 longitudinal scans. To minimize acquisition-related variability in the radiomics analysis, the position of the ACR phantom and the height of the anterior coil were kept as consistent as possible across all scans.
Image segmentation
All analyses were conducted on reconstructed images, as the original MRI data are k-space measurements not directly interpretable as images. Therefore, image processing was performed based on the reconstructed voxel size rather than the originally acquired voxel size.
Five regions of interest (ROIs) were manually delineated on the T1w images of the scanned phantom using the Monaco v5.40 (Elekta AB, Stockholm, Sweden) treatment planning system (TPS), as shown in Figure 1B. Delineation was performed by a single observer on images acquired with System A on the first day of scanning. ROIs 1–3 were delineated starting at a point 50.6 mm from the head end of the phantom and extending 90.2 mm toward the foot, while ROIs 4 and 5 were delineated within the triangular regions shown in Figure 1B, comprising three slices. These contours, together with the corresponding images, served as the baseline for subsequent processing. The intensity distributions for the five ROIs in the representative T1w sequence are shown in Table S1. Baseline images and ROIs were exported from Monaco (used for initial delineation because of its MR-Linac integration) and imported into RayStation v12A (RaySearch Laboratories, Stockholm, Sweden) for rigid registration and ROI propagation, leveraging its more flexible registration tools. Rigid registration was performed between the baseline images and each target image, after which the original ROI masks were propagated accordingly. No re-segmentation or manual adjustment of the ROIs was performed for any scan. The target images for propagation included (I) other sequences acquired on the same day as the baseline; (II) sequences acquired on different days from System A; and (III) corresponding sequences obtained from System B.
Image preprocessing
Before radiomics feature extraction, all images exported from the RayStation TPS underwent standardized preprocessing using Python scripts driven by YAML configuration files within the PyRadiomics v3.1.0 framework (29). To ensure spatial consistency, voxel sizes were resampled to isotropic resolutions of 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 3, or 5 mm using the sitkBSpline interpolator. Intensity discretization was performed using two methods: (I) a fixed bin number (BN) of 8, 16, 32, 64, 128, 256, or 512; and (II) a fixed bin width (BW) of 1, 5, 10, 20, 25, 50, 75, or 100. Intensity normalization was either performed with Z-score normalization or omitted, as shown in Figure 1C. A schematic of the full preprocessing workflow is provided in Appendix 1. Due to computational instability, feature extraction failed for specific preprocessing combinations involving a BW of 1, no Z-score normalization, and voxel resampling to 0.5, 0.75, 1, 1.25, 1.5, or 1.75 mm; these failures were likely caused by computational overload during feature calculation under those conditions. No additional preprocessing steps, such as re-segmentation, were applied beyond resampling, discretization, and normalization. Although the rescale slope varied across phantom images, no correction was applied before BN/BW discretization. This is considered acceptable for homogeneous phantom data, but slope/intercept correction is recommended for patient imaging.
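As an illustration, one combination from the preprocessing grid above could be expressed as a PyRadiomics parameter file roughly as follows. This is a minimal sketch using setting names from the PyRadiomics documentation, not the actual configuration files used in the study:

```yaml
# Illustrative PyRadiomics parameter file (one combination from the grid):
# 1 mm isotropic resampling, no Z-score normalization, fixed bin number of 64.
imageType:
  Original: {}
setting:
  interpolator: sitkBSpline         # interpolator used in this study
  resampledPixelSpacing: [1, 1, 1]  # one of 0.5-5 mm isotropic, or omitted
  normalize: false                  # true enables Z-score normalization
  binCount: 64                      # fixed BN; use binWidth instead for fixed BW
```

Iterating over the full grid then amounts to generating one such file (or settings dictionary) per combination of resampling, normalization, and discretization values.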
Radiomics features
Radiomics feature extraction was performed in Python using the PyRadiomics v3.1.0 library, following the guidelines of the Image Biomarker Standardisation Initiative (IBSI) (30), on a workstation equipped with an 11th Gen Intel Core i7-11800H CPU @ 2.30 GHz and 16 GB of RAM. Features were extracted separately for each imaging sequence. The extracted features included 18 first-order features, 23 gray-level co-occurrence matrix (GLCM) features, 16 gray-level run length matrix (GLRLM) features, 16 gray-level size zone matrix (GLSZM) features, 14 gray-level dependence matrix (GLDM) features, and 5 neighboring gray-tone difference matrix (NGTDM) features. Shape features were not included in the analysis, as the ROI masks were identical across all datasets. A complete list of extracted features is provided in Table S2.
Statistical analysis
This study assessed the effects of preprocessing on radiomics feature estimation in terms of test-retest repeatability, longitudinal repeatability, and inter-platform reproducibility. The effects of preprocessing on radiomics feature stability were systematically analyzed across combinations of three key parameters: (I) voxel resampling size; (II) intensity discretization method and value, either fixed BN or fixed BW; and (III) use or omission of Z-score normalization. The intraclass correlation coefficient (ICC) and coefficient of variation (CV) were employed to quantify repeatability and reproducibility. A two-way mixed-effects model was applied for repeatability assessments, reflecting repeated measurements under identical conditions. A two-way random-effects model was used for reproducibility analysis, treating scanners as a random sample from a larger population so that findings can be generalized to other scanners with similar characteristics. Both models were based on single-rater measurements with absolute agreement (31). ICC values were interpreted as follows: excellent (ICC >0.9), good (0.75< ICC ≤0.9), moderate (0.5< ICC ≤0.75), and poor (ICC ≤0.5) (31). The CV, defined as the ratio of the standard deviation to the mean, was included because the ICC alone may not provide a comprehensive evaluation, given its sensitivity to data variance (32). CV values were categorized as excellent (CV <10%), good (10%≤ CV <20%), moderate (20%≤ CV <30%), and poor (CV ≥30%) (33). Radiomics features were considered highly repeatable or reproducible if they met both ICC >0.9 and CV <10%. Features with ICC ≤0.5 or CV ≥30% were considered poorly repeatable or reproducible. Features falling between these thresholds were classified as having moderate to good repeatability or reproducibility. The ICC was calculated in Python (version 3.7.16) using the Pingouin library (version 0.5.3).
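The quantities above can be sketched in plain Python. The following is an illustrative implementation of the single-rater, absolute-agreement ICC (the ICC(A,1) form of McGraw and Wong, computed from the same two-way ANOVA mean squares that Pingouin uses), the CV, and the study's combined stability criterion; it is not the study's actual analysis code:

```python
from statistics import mean

def icc_a1(data):
    """Single-rater, absolute-agreement ICC (two-way model, ICC(A,1)).
    data: n subjects x k repeated measurements (e.g. test and retest scans)."""
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]
    # Mean squares from the two-way ANOVA decomposition
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)   # rows (subjects)
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)   # columns (measurements)
    sse = sum((data[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    mse = sse / ((n - 1) * (k - 1))                                 # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def cv_percent(values):
    """Coefficient of variation (%): sample SD divided by the mean."""
    m = mean(values)
    sd = (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return abs(sd / m) * 100

def stability_class(icc, cv):
    """Combined criterion used in this study."""
    if icc > 0.9 and cv < 10:
        return "highly stable"
    if icc <= 0.5 or cv >= 30:
        return "poor"
    return "moderate to good"
```

With perfectly repeated measurements, `icc_a1` returns 1.0 and `cv_percent` returns 0, placing a feature in the "highly stable" class.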
Results
General observations
Table 2 summarizes the optimal preprocessing configurations for each MRI sequence and evaluation task, defined as the combinations yielding the highest proportion of stable radiomics features. These configurations can therefore be considered recommended settings for assessing repeatability or reproducibility under the respective experimental conditions. Overall, the test-retest task yielded a substantially higher proportion of stable features compared to the longitudinal and inter-platform tasks. Among the three sequences, FLAIR images exhibited the highest stability in test-retest analysis, with up to 90.22% of features classified as stable. Notably, for the T1w and FLAIR sequences, inter-platform reproducibility was higher than longitudinal repeatability under their respective optimal preprocessing settings. In contrast, for T2w images, longitudinal repeatability reached 63.04%, exceeding the inter-platform reproducibility of 50.00%.
Table 2
| Task & sequence | Stable features | Resampling | Normalization | Discretization |
|---|---|---|---|---|
| Test-retest T1w | 85.87% repeatable | 0.75 mm isotropic | None | BN =512 |
| Test-retest T2w | 89.13% repeatable | 0.5 mm isotropic | None/Z-score | BN =512 |
| Test-retest FLAIR | 90.22% repeatable | 1 mm isotropic | None | BW =5 |
| Longitudinal T1w | 42.39% repeatable | No spatial resampling | None | BW =5 |
| Longitudinal T2w | 63.04% repeatable | No spatial resampling | None | BN =512 |
| Longitudinal FLAIR | 45.65% repeatable | 0.5 mm isotropic | Z-score | BW =100 |
| Inter-platform T1w | 66.30% reproducible | 1.5 mm isotropic | None/Z-score | BN =512 |
| Inter-platform T2w | 50.00% reproducible | 1.5 mm or no resampling | None | BW =5/10/25 |
| Inter-platform FLAIR | 76.09% reproducible | 0.5 mm isotropic | None | BN =256 |
BN, bin number; BW, bin width; FLAIR, fluid-attenuated inversion recovery; T1w, T1-weighted; T2w, T2-weighted.
The best preprocessing strategies showed the following trends: for resampling, either no spatial resampling or voxel sizes no larger than 1.5 mm produced better results. For discretization, both the BW and BN approaches yielded optimal results depending on the task and image sequence. For BN-based discretization, values greater than or equal to 256, especially 512, were generally preferred. For BW-based discretization, values less than 25 were typically optimal, with 5 showing the best performance in most cases, except for the longitudinal FLAIR task. Regarding normalization, the absence of Z-score normalization generally led to better performance. However, in the longitudinal FLAIR task, the best results were obtained with Z-score normalization combined with a BW of 100.
Influence of image preprocessing on test-retest repeatability
Table 3 and Figure S1 summarize the effects of various preprocessing strategies on the test-retest repeatability of radiomics features across different MRI sequences. For T1w images, the combination of Z-score normalization and BN discretization yielded the highest repeatability for both first-order and textural features. In T2w images, Z-score normalization with BN discretization performed best for first-order features, while BW discretization without normalization was more favorable for textural features. For FLAIR sequences, Z-score normalization combined with either BN or BW discretization produced high repeatability for first-order features, whereas BN discretization (regardless of normalization) was optimal for textural features.
Table 3
| Preprocessing strategy | Feature type | T1w ICC | T1w CV (%) | T2w ICC | T2w CV (%) | FLAIR ICC | FLAIR CV (%) |
|---|---|---|---|---|---|---|---|
| Z-score + BN | First-order | 0.99 (0.97, 1.00) | 4.13 (1.39, 9.43) | 0.99 (0.98, 1.00) | 4.07 (2.35, 7.56) | 0.97 (0.95, 0.99) | 5.25 (3.67, 10.28) |
| Textural | 0.99 (0.96, 1.00) | 4.76 (1.97, 9.73) | 0.98 (0.93, 0.99) | 5.79 (2.27, 10.81) | 0.96 (0.88, 0.99) | 6.47 (2.35, 12.14) | |
| Z-Score + BW | First-order | 0.98 (0.97, 1.00) | 4.21 (1.47, 9.54) | 0.99 (0.98, 1.00) | 4.10 (2.36, 7.62) | 0.98 (0.95, 0.99) | 5.25 (3.92, 10.28) |
| Textural | 0.97 (0.88, 1.00) | 7.14 (2.43, 16.61) | 0.97 (0.89, 0.99) | 7.39 (3.02, 15.84) | 0.97 (0.92, 0.99) | 8.21 (3.11, 14.41) | |
| Non-Z-score + BN | First-order | 0.99 (0.97, 1.00) | 4.90 (2.00, 9.07) | 0.99 (0.98, 1.00) | 4.28 (2.20, 7.09) | 0.98 (0.96, 0.99) | 5.56 (3.96, 9.86) |
| Textural | 0.99 (0.96, 1.00) | 4.74 (1.96, 9.72) | 0.98 (0.93, 0.99) | 5.79 (2.26, 10.77) | 0.96 (0.88, 0.99) | 6.46 (2.34, 12.14) | |
| Non-Z-score + BW | First-order | 0.99 (0.97, 1.00) | 5.01 (2.17, 9.07) | 0.99 (0.98, 1.00) | 4.29 (2.20, 7.09) | 0.98 (0.96, 0.99) | 5.56 (3.94, 9.94) |
| Textural | 0.99 (0.95, 1.00) | 5.41 (1.91, 11.32) | 0.98 (0.95, 1.00) | 4.96 (1.79, 10.94) | 0.98 (0.95, 0.99) | 7.33 (2.32, 12.44) |
Values are presented as median (Q1, Q3), where Q1 and Q3 represent the first and third quartiles, respectively. BN, bin number; BW, bin width; CV, coefficient of variation; FLAIR, fluid-attenuated inversion recovery; ICC, intraclass correlation coefficient; MRI, magnetic resonance imaging; T1w, T1-weighted; T2w, T2-weighted.
Figure 2 illustrates the overall trends associated with different preprocessing strategies. Regarding resampling, the proportion of stable features decreased markedly when the voxel size exceeded 2 mm, with the decline being more pronounced for BW than for BN discretization. Within the range of 0.5–2 mm, BW showed greater fluctuation in feature stability compared to BN. As for normalization, the use or absence of Z-score normalization had minimal impact when combined with BN discretization. Without Z-score normalization, BN and BW yielded similar proportions of stable features at voxel sizes less than or equal to 2 mm. However, when normalization was applied, BN generally outperformed BW across all voxel sizes, particularly for T1w and T2w sequences.
Overall, increasing BN in BN discretization improved feature stability, while decreasing BW in BW discretization tended to increase the proportion of stable features.
Influence of image preprocessing on longitudinal repeatability
The impacts of preprocessing strategies on the longitudinal repeatability of first-order and textural radiomics features are summarized in Table 4 and Figure S2. For T1w images, BN discretization without Z-score normalization achieved the best performance for first-order features, with a median ICC of 0.94 (0.88, 0.97) and a CV of 15.50% (10.18%, 22.01%). For textural features, BN discretization, with or without normalization, yielded similarly high repeatability. T2w and FLAIR sequences exhibited comparable patterns: Z-score normalization with BN discretization resulted in the best performance for first-order features, while BN discretization, regardless of normalization, was optimal for textural features.
Table 4
| Preprocessing strategy | Feature type | T1w ICC | T1w CV (%) | T2w ICC | T2w CV (%) | FLAIR ICC | FLAIR CV (%) |
|---|---|---|---|---|---|---|---|
| Z-score + BN | First-order | 0.93 (0.88, 0.97) | 16.89 (12.00, 26.77) | 0.95 (0.90, 0.97) | 10.58 (6.46, 16.95) | 0.97 (0.94, 0.98) | 10.48 (8.48, 17.32) |
| Textural | 0.94 (0.86, 0.98) | 14.89 (7.22, 24.60) | 0.93 (0.85, 0.97) | 10.81 (4.40, 18.22) | 0.93 (0.85, 0.97) | 15.15 (6.54, 28.22) | |
| Z-Score + BW | First-order | 0.93 (0.86, 0.96) | 17.17 (12.63, 26.78) | 0.95 (0.91, 0.97) | 10.66 (6.49, 17.03) | 0.97 (0.94, 0.98) | 10.54 (8.63, 18.15) |
| Textural | 0.88 (0.74, 0.96) | 25.40 (12.59, 38.80) | 0.91 (0.78, 0.97) | 14.58 (6.16, 25.52) | 0.96 (0.86, 0.99) | 17.17 (9.28, 33.21) | |
| Non-Z-score + BN | First-order | 0.94 (0.88, 0.97) | 15.50 (10.18, 22.01) | 0.94 (0.90, 0.97) | 11.04 (7.31, 16.96) | 0.97 (0.94, 0.98) | 12.26 (7.45, 17.85) |
| Textural | 0.94 (0.86, 0.98) | 14.94 (7.22, 24.61) | 0.93 (0.85, 0.97) | 10.84 (4.39, 18.30) | 0.93 (0.85, 0.97) | 15.20 (6.53, 28.21) | |
| Non-Z-score + BW | First-order | 0.94 (0.88, 0.97) | 15.50 (10.47, 22.10) | 0.95 (0.92, 0.98) | 11.03 (7.04, 17.29) | 0.97 (0.94, 0.98) | 13.15 (7.52, 18.69) |
| Textural | 0.92 (0.79, 0.98) | 19.22 (8.39, 31.35) | 0.93 (0.84, 0.98) | 11.79 (4.28, 23.88) | 0.96 (0.88, 0.99) | 15.18 (7.11, 33.84) |
Values are presented as median (Q1, Q3), where Q1 and Q3 represent the first and third quartiles, respectively. BN, bin number; BW, bin width; CV, coefficient of variation; FLAIR, fluid-attenuated inversion recovery; ICC, intraclass correlation coefficient; MRI, magnetic resonance imaging; T1w, T1-weighted; T2w, T2-weighted.
Figure 3 illustrates the influence of preprocessing settings on longitudinal repeatability. Feature stability showed varying sensitivity to voxel size across the sequences. For T1w images, the proportion of stable features fluctuated slightly between 0.5 and 1.75 mm, increased slightly between 1.75 and 2 mm, and then dropped sharply beyond 2 mm, with the poorest results at 3 mm. In T2w images, the trend was inconsistent, indicating greater sensitivity to resampling. For FLAIR images, feature stability was generally maintained or showed a mild decrease as voxel size increased, with minor fluctuations.
Regarding intensity normalization, the observed trends were largely consistent with those in the test-retest analysis. BN discretization remained relatively unaffected by normalization status across all sequences, while BW discretization introduced more variability, especially in T1w and T2w images. Overall, the performance patterns of BN and BW in longitudinal analysis were comparable to those in the test-retest setting.
Influence of image preprocessing on inter-platform reproducibility
Table 5 and Figure S3 summarize the effects of different preprocessing combinations on inter-platform reproducibility of radiomics features. For T1w images, Z-score normalization combined with BN discretization yielded the highest proportion of stable first-order features. For textural features, BN discretization performed comparably well with or without normalization. In T2w images, BW discretization without Z-score normalization yielded better stability for both first-order and textural features. For FLAIR images, BN discretization without normalization produced greater stability for first-order features, while textural features showed similar reproducibility under both BN with normalization and BN alone.
Table 5
| Preprocessing strategy | Feature type | T1w ICC | T1w CV (%) | T2w ICC | T2w CV (%) | FLAIR ICC | FLAIR CV (%) |
|---|---|---|---|---|---|---|---|
| Z-Score + BN | First-order | 0.94 (0.79, 0.97) | 12.48 (5.14, 21.02) | 0.92 (0.76, 0.97) | 14.07 (4.55, 21.26) | 0.96 (0.88, 0.98) | 9.31 (4.40, 18.62) |
| Textural | 0.95 (0.83, 0.98) | 11.50 (4.59, 22.03) | 0.92 (0.79, 0.97) | 11.90 (3.97, 22.22) | 0.95 (0.84, 0.99) | 8.24 (2.60, 16.27) | |
| Z-Score + BW | First-order | 0.93 (0.78, 0.97) | 12.98 (6.10, 21.25) | 0.92 (0.76, 0.97) | 14.51 (5.15, 21.79) | 0.96 (0.88, 0.98) | 9.60 (4.60, 18.62) |
| Textural | 0.89 (0.66, 0.97) | 15.52 (6.48, 27.94) | 0.89 (0.64, 0.97) | 15.59 (5.53, 28.57) | 0.96 (0.83, 0.99) | 9.51 (3.54, 17.30) | |
| Non-Z-score + BN | First-order | 0.94 (0.79, 0.98) | 13.90 (7.54, 20.28) | 0.93 (0.78, 0.97) | 13.35 (7.25, 19.71) | 0.97 (0.89, 0.98) | 8.18 (5.31, 14.16) |
| Textural | 0.95 (0.83, 0.98) | 11.47 (4.55, 22.00) | 0.92 (0.79, 0.97) | 11.88 (3.96, 22.22) | 0.95 (0.83, 0.99) | 8.26 (2.56, 16.28) | |
| Non-Z-score + BW | First-order | 0.93 (0.79, 0.98) | 13.90 (7.88, 20.28) | 0.93 (0.80, 0.97) | 13.05 (7.11, 19.89) | 0.97 (0.88, 0.98) | 8.99 (5.44, 14.33) |
| Textural | 0.92 (0.71, 0.98) | 12.98 (3.81, 28.61) | 0.93 (0.72, 0.98) | 11.48 (3.17, 25.84) | 0.97 (0.86, 0.99) | 8.56 (2.66, 16.35) |
Values are presented as median (Q1, Q3), where Q1 and Q3 represent the first and third quartiles, respectively. BN, bin number; BW, bin width; CV, coefficient of variation; FLAIR, fluid-attenuated inversion recovery; ICC, intraclass correlation coefficient; MRI, magnetic resonance imaging; T1w, T1-weighted; T2w, T2-weighted.
Figure 4 illustrates the trends in feature reproducibility under different preprocessing settings. In terms of resampling, the effect of voxel size varied by sequence. In T1w images, BW discretization led to relatively consistent feature stability with minor fluctuations, whereas BN discretization showed greater variability beyond 1.25 mm, with optimal results at 1.5 mm. For T2w and FLAIR images, the proportion of stable features declined gradually as voxel size increased. Regarding normalization, Z-score normalization had a minimal effect on BN discretization across all sequences. For BW discretization, normalization had little impact on FLAIR images but reduced reproducibility in T1w and T2w images. Overall, the patterns observed for BN and BW discretization were consistent with those found in the test-retest analysis: increasing the number of bins improved feature stability, while increasing BW generally reduced it.
Discussion
The repeatability and reproducibility of radiomics features are crucial for their reliable use as imaging biomarkers. This study systematically evaluated the influence of key preprocessing parameters on feature stability using the open-source PyRadiomics platform and three commonly used MRI sequences (T1w, T2w, FLAIR) acquired from two 1.5T MR-Linac systems with a standardized ACR phantom. The preprocessing pipeline incorporated voxel resampling, intensity normalization, and intensity discretization using both BN and BW approaches. Feature repeatability was assessed through test-retest and longitudinal scans, while reproducibility was evaluated via inter-platform comparisons. Our results demonstrate that preprocessing, particularly resampling and discretization, has a significant impact on the robustness of radiomics features. For example, voxel sizes greater than 2 mm may greatly reduce feature stability. The optimal preprocessing configurations varied across image sequences and evaluation tasks, highlighting the need for task-specific optimization in radiomics workflows.
Validating the repeatability of radiomics features derived from qualitative MRI sequences is essential for their clinical use in MR-guided adaptive radiotherapy. In this setting, daily T1w, T2w, or FLAIR images are routinely acquired as the primary imaging basis for patient setup correction, monitoring ROI changes during treatment, and timely adaptation of radiotherapy plans (34). These sequences are preferred because they exhibit relatively low geometric distortion and are well-suited for repeated daily acquisition, whereas diffusion-weighted imaging (DWI) is more susceptible to distortion and therefore less commonly used for this purpose (35). Consequently, observed feature variations should reflect true biological dynamics rather than technical fluctuations. Although phantom experiments in the present study do not replicate tissue contrast or biological heterogeneity, they allow isolation of non-biological sources of variability and the establishment of acceptable stability thresholds. These benchmarks help identify features that are technically reliable and suitable for clinical deployment, forming a necessary foundation for robust radiomics biomarkers in adaptive radiotherapy.
Resampling is recognized as a critical preprocessing step for improving the stability of radiomics features (36). Previous studies have recommended the use of isotropic voxels or uniform pixel spacing to enhance the robustness of texture features (30). Based on this, the present study evaluated multiple commonly used voxel sizes to identify appropriate values across different MRI sequences and analytical tasks. Our findings revealed no consistent trend in feature stability with increasing or decreasing voxel size. However, the most stable results were generally observed at voxel sizes below 2 mm, which is consistent with previous studies (37), though it contrasts slightly with the threshold of 3 mm reported by Wichtmann et al. (11). This discrepancy may be attributed to differences in interpolation algorithms. While our study employed the sitkBSpline interpolation algorithm, other investigations have used nearest neighbor or linear interpolation methods. Prior evidence suggests that the choice of interpolation strategy can significantly affect radiomics feature values (37). These results suggest that future research and clinical protocols should consider the impact of interpolation methods when designing preprocessing pipelines for radiomics analysis.
Gray-level discretization is a crucial step in extracting texture features from MRI, and both relative discretization (fixed bin number, BN) and absolute discretization (fixed bin width, BW) can significantly affect feature values. However, the optimal choice between BN and BW remains a topic of ongoing debate. In this study, we systematically compared the effects of different BN and BW settings across three sequences and three evaluation tasks. The results indicate that the optimal discretization strategy is highly dependent on both the image sequence and the analytical objective. Although some previous studies, such as the IBSI (30) and Dewi et al. (20), supported the use of BN, others, including Koçak et al. (38), recommended BW. Our findings suggest that both approaches can be effective under specific conditions, with neither showing consistent superiority across all scenarios. Generally, higher BN values tended to improve feature stability for certain sequences and tasks, whereas smaller BW values were more advantageous in others. Accordingly, the choice of discretization parameters needs to be adapted to the characteristics of the imaging data and the specific application to maximize feature robustness.
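The contrast between the two schemes can be illustrated with a minimal numpy sketch following the IBSI-style definitions: BN always yields a fixed number of gray levels regardless of the ROI intensity range, whereas BW yields a level count that grows with the range. The function names and the example parameter values (4 bins, bin width of 25) are illustrative only.

```python
import numpy as np

def discretize_bn(roi, bin_number=32):
    """Relative (fixed bin number) discretization: the ROI intensity
    range is split into exactly `bin_number` equal-width bins."""
    lo, hi = roi.min(), roi.max()
    width = (hi - lo) / bin_number
    bins = np.floor((roi - lo) / width).astype(int) + 1
    bins[roi == hi] = bin_number  # top edge belongs to the last bin
    return bins

def discretize_bw(roi, bin_width=25.0):
    """Absolute (fixed bin width) discretization: bin edges are fixed
    multiples of `bin_width`, so the number of resulting gray levels
    depends on the ROI intensity range."""
    return (np.floor(roi / bin_width)
            - np.floor(roi.min() / bin_width) + 1).astype(int)

roi = np.array([0.0, 10.0, 50.0, 99.0, 100.0])
bn_levels = discretize_bn(roi, bin_number=4)   # always 4 levels
bw_levels = discretize_bw(roi, bin_width=25.0) # level count follows range
```

With BN, a wider intensity range stretches the bins; with BW, it adds bins. This is why the two schemes interact so differently with sequence-dependent intensity distributions.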
In this work, we employed Z-score normalization, a straightforward and commonly used method that is natively supported by the Pyradiomics platform. Our findings indicated that Z-score normalization had minimal impact on the stability of features derived using BN-based discretization, particularly for textural features, which is consistent with the observations reported by Carré et al. (39). In contrast, under BW-based discretization, Z-score normalization improved the stability of first-order features in certain settings but generally decreased the stability of textural features across most sequences and tasks. This observation is in line with the findings of Hoebel et al. (40), suggesting that the effects of normalization are highly dependent on both the discretization strategy and the feature type.
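A minimal sketch of this normalization is shown below, mirroring Pyradiomics' behavior when `normalize=True` (subtract the image mean, divide by the standard deviation, optionally multiply by `normalizeScale`); the scale value of 100 and the simulated gain/offset example are illustrative assumptions.

```python
import numpy as np

def zscore_normalize(image, scale=100.0):
    """Z-score intensity normalization: center on the mean, scale by
    the standard deviation, then multiply by a scale factor (analogous
    to Pyradiomics' normalizeScale; 100 is an illustrative value)."""
    image = np.asarray(image, dtype=float)
    return (image - image.mean()) / image.std() * scale

# Two simulated acquisitions of the same object under different
# scanner gain and offset map onto a common intensity scale:
base = np.random.default_rng(1).normal(200.0, 30.0, size=(32, 32))
scan_a = base
scan_b = 1.7 * base + 50.0   # different gain and offset
norm_a, norm_b = zscore_normalize(scan_a), zscore_normalize(scan_b)
```

Because z-scoring removes any global affine intensity transform, it primarily benefits features sensitive to absolute intensity (first-order under BW), while BN-based texture features, which already rescale to the ROI range, are largely unaffected.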
Previous studies have primarily focused on phantom-based assessments of preprocessing effects or investigated test-retest repeatability on 3.0 T MRI systems; Wichtmann et al. (11), for example, considered only BN discretization and test-retest repeatability. Our study extends these investigations in several important ways. First, we assessed both BN and BW methods, providing a broader evaluation of preprocessing strategies. Second, we examined not only test-retest repeatability but also longitudinal repeatability and inter-platform reproducibility, which are particularly relevant for radiotherapy and multi-center studies. Importantly, this work was conducted on MR-Linacs, whose unique hardware configuration differs from conventional MRI and may influence radiomics feature stability and preprocessing sensitivity. By including a standardized phantom design across all experiments, our study offers a more comprehensive analysis of feature reproducibility. Among these metrics, inter-platform reproducibility is especially critical for multi-center studies, as it reflects the stability of radiomics features across different hardware systems and institutions and thereby underpins generalizability for large-scale applications. Although the two systems were vendor-identical, the variability in intraclass correlation coefficient (ICC) values across preprocessing strategies suggests that feature stability is not solely determined by hardware similarity but is substantially influenced by methodological choices in image processing. Longitudinal repeatability, which evaluates feature consistency over time, is particularly relevant in radiotherapy, where patients undergo multiple imaging sessions over several weeks. Reliable radiomics features are essential for monitoring treatment response, supporting adaptive radiotherapy, and guiding follow-up evaluations.
When tracked over time, radiomics features, either alone or integrated with clinical parameters, have the potential to improve decision-making and optimize personalized treatment planning (41,42).
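The agreement analyses discussed above rest on the ICC. As an illustration, the sketch below implements the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1) of Shrout and Fleiss, a common choice for test-retest and inter-platform designs; whether this exact variant matches the study's computation is an assumption.

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1): two-way random-effects, absolute agreement, single
    measurement. `data` is an (n subjects x k raters/sessions) array.
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)   # per-subject means
    col_means = data.mean(axis=0)   # per-session means
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)  # between subjects
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)  # between sessions
    resid = data - row_means[:, None] - col_means[None, :] + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))        # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# A constant offset between sessions lowers absolute agreement:
offset = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
icc = icc_2_1(offset)
```

Because ICC(2,1) measures absolute agreement, a systematic intensity shift between platforms or sessions lowers the coefficient even when the feature rank order is preserved, which makes it a stringent stability criterion.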
Our results show that the stability of radiomics features varied across T1w, T2w, and FLAIR images depending on the preprocessing configuration. These sequence-dependent differences likely reflect inherent variations in contrast mechanisms, intensity distributions, and noise characteristics, which influence how features respond to discretization, resampling, and intensity normalization. Similar observations (11,13,43) have been reported in both phantom- and patient-based studies, indicating that no single preprocessing strategy is optimal for all sequences. By highlighting these differences, our study emphasizes the importance of tailoring preprocessing strategies to each MRI sequence to improve the reproducibility and robustness of radiomics analyses in multi-sequence MRI workflows.
In this study, radiomics features were extracted from predefined ROIs rather than the entire phantom. Although whole-image analysis is feasible, it may incorporate background, edge artifacts, and material transitions that distort gray-level statistics. Clinically, radiomics aims to characterize localized pathology rather than global image properties. Therefore, consistent with recommendations from the Image Biomarker Standardization Initiative, an ROI-based strategy was adopted to better reflect clinical practice and avoid dilution of region-specific signals.
The lower proportion of stable features in the longitudinal dataset compared with test-retest likely reflects cumulative day-to-day variations, including minor scanner drift, signal fluctuations, and phantom repositioning. This highlights that longitudinal repeatability provides a more stringent assessment of radiomics robustness over clinically relevant time scales.
This study is limited by the use of Z-score normalization as the sole intensity normalization method. Although both Z-score and Nyul normalization are commonly applied in phantom studies, our analysis was conducted using the Pyradiomics platform, which does not directly support Nyul normalization. To maintain consistency and ensure reproducibility, we used the built-in normalization method. In addition to Nyul normalization (44), several other intensity normalization techniques have been proposed for brain MRI, such as WhiteStripe (45), fuzzy c-means-based, and Gaussian mixture model-based methods (46). However, as our study utilized a standardized ACR phantom without biological heterogeneity, these biologically driven methods were not evaluated. Future research should incorporate volunteer or clinical patient datasets to assess the impact of different standardization strategies under real-world conditions. Nevertheless, the present phantom-based study establishes a methodological benchmark for characterizing standardization-induced variability, thereby providing a solid foundation for the selection and validation of robust radiomics features in subsequent clinical applications.
Another limitation of this study is that it included only qualitative sequences (T1w, T2w, and FLAIR), without considering quantitative imaging modalities such as DWI or perfusion imaging, which are widely used in neuro-oncology. Unlike qualitative sequences that provide relative signal intensity, quantitative sequences offer absolute numerical values, enabling more consistent comparisons across subjects, scanners, and institutions. The impact of preprocessing steps such as intensity discretization and normalization may differ between qualitative and quantitative sequences. Therefore, future studies involving quantitative MRI modalities are needed to determine whether the observed trends hold and to further optimize preprocessing strategies for broader clinical applications.
This study focused on isolating the effects of voxel resampling, intensity normalization, and gray-level discretization on radiomics feature stability; therefore, bias field correction and explicit denoising were not incorporated to avoid introducing additional confounding factors. The phantom setting provides relatively uniform and stable signal characteristics, minimizing intrinsic intensity inhomogeneity. However, in clinical MRI, particularly at 1.5 T, where signal-to-noise ratio (SNR) may be lower, noise and bias field effects could influence feature estimates, especially texture metrics. Future studies will evaluate these effects in patient data.
A further limitation of this study is that we only evaluated different MRI sequences (T1w, T2w, FLAIR) and preprocessing strategies, without assessing the effects of specific sequence parameters such as repetition time (TR), echo time (TE), or flip angle (10,47). While prior studies indicate that these parameters can affect feature stability, future work extending phantom experiments to volunteers or patients should examine their impact, especially for longitudinal radiomics.
Conclusions
In this study, we systematically investigated the influence of image preprocessing strategies on radiomics features derived from 1.5 T MR-Linac systems, focusing on test-retest repeatability, longitudinal repeatability, and inter-platform reproducibility. These findings emphasize that the selection of preprocessing parameters is essential for maintaining the repeatability and reproducibility of radiomics features. Notably, the optimal preprocessing parameters varied depending on the image sequence and the specific evaluation task. These results suggest that task- and sequence-specific optimization of preprocessing strategies is necessary before clinical implementation. To support the robust application of radiomics in clinical practice, further comprehensive studies are needed, particularly those involving clinical datasets, to better understand and validate the repeatability and reproducibility of radiomics features under real-world conditions.
Acknowledgments
None.
Footnote
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-aw-2151/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images Are More than Pictures, They Are Data. Radiology 2016;278:563-77. [Crossref] [PubMed]
- Fournier L, Costaridou L, Bidaut L, Michoux N, Lecouvet FE, de Geus-Oei LF, et al. Incorporating radiomics into clinical trials: expert consensus endorsed by the European Society of Radiology on considerations for data-driven compared to biologically driven quantitative biomarkers. Eur Radiol 2021;31:6001-12. [Crossref] [PubMed]
- Zhang YP, Zhang XY, Cheng YT, Li B, Teng XZ, Zhang J, Lam S, Zhou T, Ma ZR, Sheng JB, Tam VCW, Lee SWY, Ge H, Cai J. Artificial intelligence-driven radiomics study in cancer: the role of feature engineering and modeling. Mil Med Res 2023;10:22. [Crossref] [PubMed]
- Truong NCD, Bangalore Yogananda CG, Wagner BC, Holcomb JM, Reddy D, Saadat N, Hatanpaa KJ, Patel TR, Fei B, Lee MD, Jain R, Bruce RJ, Pinho MC, Madhuranthakam AJ, Maldjian JA. Two-Stage Training Framework Using Multicontrast MRI Radiomics for IDH Mutation Status Prediction in Glioma. Radiol Artif Intell 2024;6:e230218. [Crossref] [PubMed]
- Osapoetra LO, Dasgupta A, DiCenzo D, Fatima K, Quiaoit K, Saifuddin M, Karam I, Poon I, Husain Z, Tran WT, Sannachi L, Czarnota GJ. Quantitative US Delta Radiomics to Predict Radiation Response in Individuals with Head and Neck Squamous Cell Carcinoma. Radiol Imaging Cancer 2024;6:e230029. [Crossref] [PubMed]
- Horvat N, Papanikolaou N, Koh DM. Radiomics Beyond the Hype: A Critical Evaluation Toward Oncologic Clinical Use. Radiol Artif Intell 2024;6:e230437. [Crossref] [PubMed]
- Huang EP, O’Connor JPB, McShane LM, Giger ML, Lambin P, Kinahan PE, Siegel EL, Shankar LK. Criteria for the translation of radiomics into clinically useful tests. Nat Rev Clin Oncol 2023;20:69-82. [Crossref] [PubMed]
- Rai R, Holloway LC, Brink C, Field M, Christiansen RL, Sun Y, Barton MB, Liney GP. Multicenter evaluation of MRI-based radiomic features: A phantom study. Med Phys 2020;47:3054-63. [Crossref] [PubMed]
- Hertel A, Tharmaseelan H, Rotkopf LT, Nörenberg D, Riffel P, Nikolaou K, Weiss J, Bamberg F, Schoenberg SO, Froelich MF, Ayx I. Phantom-based radiomics feature test-retest stability analysis on photon-counting detector CT. Eur Radiol 2023;33:4905-14. [Crossref] [PubMed]
- Bologna M, Tenconi C, Corino VDA, Annunziata G, Orlandi E, Calareso G, Pignoli E, Valdagni R, Mainardi LT, Rancati T. Repeatability and reproducibility of MRI-radiomic features: A phantom experiment on a 1.5 T scanner. Med Phys 2023;50:750-62. [Crossref] [PubMed]
- Wichtmann BD, Harder FN, Weiss K, Schönberg SO, Attenberger UI, Alkadhi H, Pinto Dos Santos D, Baeßler B. Influence of Image Processing on Radiomic Features From Magnetic Resonance Imaging. Invest Radiol 2023;58:199-208. [Crossref] [PubMed]
- Shafiq-Ul-Hassan M, Zhang GG, Latifi K, Ullah G, Hunt DC, Balagurunathan Y, Abdalah MA, Schabath MB, Goldgof DG, Mackin D, Court LE, Gillies RJ, Moros EG. Intrinsic dependencies of CT radiomic features on voxel size and number of gray levels. Med Phys 2017;44:1050-62. [Crossref] [PubMed]
- Trojani V, Bassi MC, Verzellesi L, Bertolini M. Impact of Preprocessing Parameters in Medical Imaging-Based Radiomic Studies: A Systematic Review. Cancers (Basel) 2024.
- Marfisi D, Tessa C, Marzi C, Del Meglio J, Linsalata S, Borgheresi R, Lilli A, Lazzarini R, Salvatori L, Vignali C, Barucci A, Mascalchi M, Casolo G, Diciotti S, Traino AC, Giannelli M. Image resampling and discretization effect on the estimate of myocardial radiomic features from T1 and T2 mapping in hypertrophic cardiomyopathy. Sci Rep 2022;12:10186. [Crossref] [PubMed]
- Granziera C, Wuerfel J, Barkhof F, Calabrese M, De Stefano N, Enzinger C, Evangelou N, Filippi M, Geurts JJG, Reich DS, Rocca MA, Ropele S, Rovira À, Sati P, Toosy AT, Vrenken H, Gandini Wheeler-Kingshott CAM, Kappos L; MAGNIMS Study Group. Quantitative magnetic resonance imaging towards clinical application in multiple sclerosis. Brain 2021;144:1296-311. [Crossref] [PubMed]
- Mayerhoefer ME, Szomolanyi P, Jirak D, Materka A, Trattnig S. Effects of MRI acquisition parameter variations and protocol heterogeneity on the results of texture analysis and pattern discrimination: an application-oriented study. Med Phys 2009;36:1236-43. [Crossref] [PubMed]
- Li Y, Ammari S, Balleyguier C, Lassau N, Chouzenoux E. Impact of Preprocessing and Harmonization Methods on the Removal of Scanner Effects in Brain MRI Radiomic Features. Cancers (Basel) 2021.
- Khodabakhshi Z, Gabrys H, Wallimann P, Guckenberger M, Andratschke N, Tanadini-Lang S. Magnetic resonance imaging radiomic features stability in brain metastases: Impact of image preprocessing, image-, and feature-level harmonization. Phys Imaging Radiat Oncol 2024;30:100585. [Crossref] [PubMed]
- Veiga-Canuto D, Fernández-Patón M, Cerdà Alberich L, Jiménez Pastor A, Gomis Maya A, Carot Sierra JM, Sangüesa Nebot C, Martínez de Las Heras B, Pötschger U, Taschner-Mandl S, Neri E, Cañete A, Ladenstein R, Hero B, Alberich-Bayarri Á, Martí-Bonmatí L. Reproducibility Analysis of Radiomic Features on T2-weighted MR Images after Processing and Segmentation Alterations in Neuroblastoma Tumors. Radiol Artif Intell 2024;6:e230208. [Crossref] [PubMed]
- Dewi DEO, Sunoqrot MRS, Nketiah GA, Sandsmark E, Giskeødegård GF, Langørgen S, Bertilsson H, Elschot M, Bathen TF. The impact of pre-processing and disease characteristics on reproducibility of T2-weighted MRI radiomics features. MAGMA 2023;36:945-56. [Crossref] [PubMed]
- Damulina A, Pirpamer L, Soellradl M, Sackl M, Tinauer C, Hofer E, Enzinger C, Gesierich B, Duering M, Ropele S, Schmidt R, Langkammer C. Cross-sectional and Longitudinal Assessment of Brain Iron Level in Alzheimer Disease Using 3-T MRI. Radiology 2020;296:619-26. [Crossref] [PubMed]
- Jalalifar SA, Soliman H, Sahgal A, Sadeghi-Naini A. Automatic Assessment of Stereotactic Radiation Therapy Outcome in Brain Metastasis Using Longitudinal Segmentation on Serial MRI. IEEE J Biomed Health Inform 2023;27:2681-92. [Crossref] [PubMed]
- Xie L, Das SR, Li Y, Wisse LEM, McGrew E, Lyu X, et al. A multi-cohort study of longitudinal and cross-sectional Alzheimer’s disease biomarkers in cognitively unimpaired older adults. Alzheimers Dement 2025;21:e14492. [Crossref] [PubMed]
- Lagendijk JJ, Raaymakers BW, van Vulpen M. The magnetic resonance imaging-linac system. Semin Radiat Oncol 2014;24:207-9. [Crossref] [PubMed]
- Tijssen RHN, Philippens MEP, Paulson ES, Glitzner M, Chugh B, Wetscherek A, Dubec M, Wang J, van der Heide UA. MRI commissioning of 1.5T MR-linac systems - a multi-institutional study. Radiother Oncol 2019;132:114-20. [Crossref] [PubMed]
- Winkel D, Bol GH, Kroon PS, van Asselen B, Hackett SS, Werensteijn-Honingh AM, Intven MPW, Eppinga WSC, Tijssen RHN, Kerkmeijer LGW, de Boer HCJ, Mook S, Meijer GJ, Hes J, Willemsen-Bosman M, de Groot-van Breugel EN, Jürgenliemk-Schulz IM, Raaymakers BW. Adaptive radiotherapy: The Elekta Unity MR-linac concept. Clin Transl Radiat Oncol 2019;18:54-9. [Crossref] [PubMed]
- Yu H, Tang B, Fu Y, Wei W, He Y, Dai G, Xiao Q. Quantifying the reproducibility and longitudinal repeatability of radiomics features in magnetic resonance Image-Guide accelerator Imaging: A phantom study. Eur J Radiol 2024;181:111735. [Crossref] [PubMed]
- Woodings SJ, de Vries JHW, Kok JMG, Hackett SL, van Asselen B, Bluemink JJ, van Zijp HM, van Soest TL, Roberts DA, Lagendijk JJW, Raaymakers BW, Wolthaus JWH. Acceptance procedure for the linear accelerator component of the 1.5 T MRI-linac. J Appl Clin Med Phys 2021;22:45-59. [Crossref] [PubMed]
- van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, Beets-Tan RGH, Fillion-Robin JC, Pieper S, Aerts HJWL. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res 2017;77:e104-7. [Crossref] [PubMed]
- Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V, Apte A, et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology 2020;295:328-38. [Crossref] [PubMed]
- Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med 2016;15:155-63. [Crossref] [PubMed]
- van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B. Radiomics in medical imaging-”how-to” guide and critical reflection. Insights Imaging 2020;11:91. [Crossref] [PubMed]
- Berenguer R, Pastor-Juan MDR, Canales-Vázquez J, Castro-García M, Villas MV, Mansilla Legorburo F, Sabater S. Radiomics of CT Features May Be Nonreproducible and Redundant: Influence of CT Acquisition Parameters. Radiology 2018;288:407-15. [Crossref] [PubMed]
- Christiansen RL, Johansen J, Zukauskaite R, Hansen CR, Bertelsen AS, Hansen O, Mahmood F, Brink C, Bernchou U. Accuracy of automatic structure propagation for daily magnetic resonance image-guided head and neck radiotherapy. Acta Oncol 2021;60:589-97. [Crossref] [PubMed]
- Hasler SW, Kallehauge JF, Hansen RH, Samsøe E, Arp DT, Nissen HD, Edmund JM, Bernchou U, Mahmood F. Geometric distortions in clinical MRI sequences for radiotherapy: insights gained from a multicenter investigation. Acta Oncol 2023;62:1551-60. [Crossref] [PubMed]
- Park SH, Lim H, Bae BK, Hahm MH, Chong GO, Jeong SY, Kim JC. Robustness of magnetic resonance radiomic features to pixel size resampling and interpolation in patients with cervical cancer. Cancer Imaging 2021;21:19. [Crossref] [PubMed]
- Bleker J, Roest C, Yakar D, Huisman H, Kwee TC. The Effect of Image Resampling on the Performance of Radiomics-Based Artificial Intelligence in Multicenter Prostate MRI. J Magn Reson Imaging 2024;59:1800-6. [Crossref] [PubMed]
- Koçak B, Yüzkan S, Mutlu S, Karagülle M, Kala A, Kadıoğlu M, Solak S, Sunman Ş, Temiz ZH, Ganiyusufoğlu AK. Influence of image preprocessing on the segmentation-based reproducibility of radiomic features: in vivo experiments on discretization and resampling parameters. Diagn Interv Radiol 2024;30:152-62. [Crossref] [PubMed]
- Carré A, Klausner G, Edjlali M, Lerousseau M, Briend-Diop J, Sun R, Ammari S, Reuzé S, Alvarez Andres E, Estienne T, Niyoteka S, Battistella E, Vakalopoulou M, Dhermain F, Paragios N, Deutsch E, Oppenheim C, Pallud J, Robert C. Standardization of brain MR images across machines and protocols: bridging the gap for MRI-based radiomics. Sci Rep 2020;10:12340. [Crossref] [PubMed]
- Hoebel KV, Patel JB, Beers AL, Chang K, Singh P, Brown JM, Pinho MC, Batchelor TT, Gerstner ER, Rosen BR, Kalpathy-Cramer J. Radiomics Repeatability Pitfalls in a Scan-Rescan MRI Study of Glioblastoma. Radiol Artif Intell 2021;3:e190199. [Crossref] [PubMed]
- Yu G, Zhang Z, Eresen A, Hou Q, Garcia EE, Yu Z, Abi-Jaoudeh N, Yaghmai V, Zhang Z. MRI radiomics to monitor therapeutic outcome of sorafenib plus IHA transcatheter NK cell combination therapy in hepatocellular carcinoma. J Transl Med 2024;22:76. [Crossref] [PubMed]
- Wang Y, Zhang L, Jiang Y, Cheng X, He W, Yu H, Li X, Yang J, Yao G, Lu Z, Zhang Y, Yan S, Zhao F. Multiparametric magnetic resonance imaging (MRI)-based radiomics model explained by the Shapley Additive exPlanations (SHAP) method for predicting complete response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer: a multicenter retrospective study. Quant Imaging Med Surg 2024;14:4617-34.
- Moradmand H, Aghamiri SMR, Ghaderi R. Impact of image preprocessing methods on reproducibility of radiomic features in multimodal magnetic resonance imaging in glioblastoma. J Appl Clin Med Phys 2020;21:179-90. [Crossref] [PubMed]
- Nyúl LG, Udupa JK, Zhang X. New variants of a method of MRI scale standardization. IEEE Trans Med Imaging 2000;19:143-50. [Crossref] [PubMed]
- Shinohara RT, Sweeney EM, Goldsmith J, Shiee N, Mateen FJ, Calabresi PA, Jarso S, Pham DL, Reich DS, Crainiceanu CM; Australian Imaging Biomarkers Lifestyle Flagship Study of Ageing; Alzheimer’s Disease Neuroimaging Initiative. Statistical normalization techniques for magnetic resonance imaging. Neuroimage Clin 2014;6:9-19.
- Reinhold JC, Dewey BE, Carass A, Prince JL. Evaluating the Impact of Intensity Normalization on MR Image Synthesis. Proc SPIE Int Soc Opt Eng 2019;10949:109493H. [Crossref] [PubMed]
- Bologna M, Corino V, Mainardi L. Technical Note: Virtual phantom analyses for preprocessing evaluation and detection of a robust feature set for MRI-radiomics of the brain. Med Phys 2019;46:5116-23. [Crossref] [PubMed]