MRI radiomics for diagnosing small BI-RADS 4 breast lesions: an interpretable model

Chaokang Han; Jiayue Chen; Minping Hong; Shuqi Chen; Yujie Ying; Jiahuan Liu; Fan Yang; Hua Qian; Xuewei Ding; Ruixin Zhang; Jinghan Wu; Louting Hu; Chengchen Xu; Xuejing Liu; Wangwei Lin; Changyu Zhou; Maosheng Xu; Zhen Fang

doi:10.21037/qims-24-1893

Original Article

Chaokang Han^1,2#, Jiayue Chen^1,2#, Minping Hong^3#, Shuqi Chen⁴, Yujie Ying^1,2, Jiahuan Liu⁵, Fan Yang^1,2, Hua Qian^1,2, Xuewei Ding^1,2, Ruixin Zhang^1,2, Jinghan Wu^1,2, Louting Hu^1,2, Chengchen Xu^2,6, Xuejing Liu⁷, Wangwei Lin^1,2, Changyu Zhou^1,2 , Maosheng Xu^1,2 , Zhen Fang^1,2

¹Department of Radiology, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China; ²The First School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, China; ³Department of Radiology, Jiaxing TCM Hospital Affiliated to Zhejiang Chinese Medical University, Jiaxing, China; ⁴The Second School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, China; ⁵Department of Radiology, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China; ⁶Department of Pathology, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China; ⁷Department of Radiology, The Affiliated Hangzhou First People’s Hospital, Zhejiang University School of Medicine, Hangzhou, China

Contributions: (I) Conception and design: Z Fang, M Hong; (II) Administrative support: C Zhou, M Xu; (III) Provision of study materials or patients: Z Fang, W Lin; (IV) Collection and assembly of data: Z Fang, J Chen, C Han; (V) Data analysis and interpretation: C Han, S Chen; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

^#These authors contributed equally to this work as co-first authors.

Correspondence to: Zhen Fang, MD; Maosheng Xu, MD; Changyu Zhou, MD. Department of Radiology, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), 54 Youdian Road, Hangzhou 310060, China; The First School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, China. Email: fz@zcmu.edu.cn; xums166@zcmu.edu.cn; tophorizon@zcmu.edu.cn.

Background: The early detection of breast cancer is crucial. Magnetic resonance imaging (MRI) offers significant advantages in the diagnosis of lesions. We aimed to develop and validate an interpretable MRI-based radiomics model to identify small Breast Imaging Reporting and Data System (BI-RADS) category 4 lesions to help radiologists with decision making.

Methods: In total, 561 patients (with 580 small BI-RADS category 4 lesions) from two centers (The First Affiliated Hospital of Zhejiang Chinese Medical University and The Affiliated Hangzhou First People’s Hospital, Zhejiang University School of Medicine) were consecutively enrolled in this study, and the radiomics features of the intratumoral and peritumoral (3 mm) regions were extracted. After a series of feature selections, extreme gradient boosting (XGBoost) was used to construct the radiomics model, and the radiomics score (radscore) was calculated. Univariate and multivariate logistic regression analyses were performed to determine the pathological malignant-related clinico-radiological factors. Finally, a model was constructed that combined the radscore and clinico-radiological factors using logistic algorithms. Subsequently, our artificial intelligence (AI)-assisted strategy was validated in an external group (n=163), and its clinical utility was evaluated by measuring improvements in BI-RADS classification accuracy with AI support.

Results: The combined model demonstrated a robust predictive capability, with area under the curve (AUC) values of 0.897 [95% confidence interval (CI): 0.862–0.931], 0.871 (95% CI: 0.803–0.934), and 0.869 (95% CI: 0.807–0.920) in the training, internal validation, and external validation groups, respectively. Additionally, the contribution of each feature to the radiomics and combined models was illustrated using the SHapley Additive exPlanations (SHAP) algorithm, a method for interpreting machine-learning models. Further, the AI-assisted strategy significantly improved the AUC values of both radiologists in the two reading modes (4b+ and 4c).

Conclusions: An interpretable combined model based on MRI was developed to distinguish between benign and malignant small BI-RADS 4 lesions and to assist radiologists in making more accurate diagnostic decisions.

Keywords: Breast cancer; radiomics; SHapley Additive exPlanations (SHAP); machine learning; Breast Imaging Reporting and Data System 4 (BI-RADS 4)

Submitted Sep 07, 2024. Accepted for publication Mar 17, 2025. Published online May 23, 2025.

Introduction

The widespread use of modern imaging modalities has led to an increase in the detection of small breast lesions. However, current clinical practices often neglect the management and diagnosis of small breast lesions, and as a result, most breast cancer patients are diagnosed at advanced stages (1). This is concerning because the early detection of primary tumor 1 stage small breast cancer (a maximum diameter ≤20 mm) is critical for achieving the highest likelihood of curability (2,3). Moreover, the 5-year relative survival rate for primary tumor 1, node 0, metastasis 0 stage breast cancer is between 97% and 100% (4,5). Therefore, the early diagnosis of malignant small lesions is crucial for improving clinical outcomes.

The probability of malignancy in Breast Imaging Reporting and Data System (BI-RADS) category 4 lesions ranges from 2% to 95% (6). The accurate diagnosis of benign and malignant breast lesions, particularly those classified as BI-RADS category 4, remains a significant challenge in clinical practice. Magnetic resonance imaging (MRI), especially dynamic contrast-enhanced MRI (DCE-MRI), which simultaneously assesses lesion morphology and enhancement kinetics, has demonstrated high sensitivity in detecting breast cancer (7,8). However, its diagnostic accuracy for small lesions is limited by their subtle morphological characteristics and atypical hemodynamic features (9,10).

Diffusion-weighted imaging (DWI) and the apparent diffusion coefficient (ADC), which can measure the mobility of water molecules in tissues, have been shown to increase the diagnostic accuracy of DCE-MRI for malignant lesions (11-13). However, due to the limited volume of small breast lesions, DWI and the ADC may not provide accurate assessments (14). These limitations further diminish the diagnostic accuracy of MRI for small lesions classified as BI-RADS category 4. Therefore, developing a tool that can effectively differentiate between benign and malignant small breast lesions is essential to enhance the diagnosis of BI-RADS category 4 lesions.

Radiomics, a high-throughput method for extracting features from medical image data, can capture information that cannot be perceived by the naked eye (15). Numerous studies have shown the utility of radiomics in diagnosing breast diseases (16-18). However, previous radiomics research has primarily focused on broad breast cancer populations or BI-RADS category 4 lesions in general, without specifically addressing small BI-RADS category 4 lesions. In clinical practice, these small lesions pose distinct diagnostic challenges because their subtle morphological characteristics and atypical hemodynamic features are often difficult to characterize using conventional imaging techniques (10,19).

This study aimed to develop and validate an artificial intelligence (AI) model that incorporates intra- and peritumoral DCE radiomics features and clinico-radiological factors to predict the malignancy of small BI-RADS category 4 lesions. Further, we compared the classification accuracy of radiologists before and after AI assistance to investigate whether the diagnostic performance of radiologists could be improved with the support of the AI model. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1893/rc).

Methods

Patient enrollment and data collection

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of The First Affiliated Hospital of Zhejiang Chinese Medical University (No. 2024-KLS-158-01), and the requirement of individual consent for this retrospective analysis was waived. Both participating institutions were informed of and approved the study.

The data of 417 and 163 BI-RADS category 4 small breast lesions from The First Affiliated Hospital of Zhejiang Chinese Medical University (Center I) and The Affiliated Hangzhou First People’s Hospital, Zhejiang University School of Medicine (Center II), respectively, collected from September 2021 to September 2023, were retrospectively analyzed. Patients were excluded from the study if they met any of the following exclusion criteria: (I) had incomplete clinical data; (II) had MRI images showing that the maximum diameter of the breast mass was >2 cm; (III) had incomplete pathological data; (IV) had incomplete DCE-MRI phase II sequence images or image quality that did not meet the analysis requirements; and (V) had previously undergone breast puncture, surgery, radiotherapy, chemotherapy, or hormone therapy.

Benign and malignant lesions were classified using the pathological diagnosis as the reference standard. Lesions exhibiting invasive components or ductal carcinoma in situ were categorized as malignant, while those lacking such features were categorized as benign. The lesions from Center I were stratified and divided into the training and internal validation groups at a ratio of 7:3 to ensure that the proportion of benign and malignant BI-RADS category 4 lesions remained consistent between the two groups. The lesions from Center II were selected as the external validation group.
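The stratified 7:3 split described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function name `stratified_split` and the random seed are hypothetical, and the class counts (238 benign and 179 malignant lesions at Center I) are obtained by summing the training and internal validation columns of Table 1.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, valid_idx) with class proportions preserved.

    Hypothetical helper: each class is shuffled and split separately,
    so both groups keep roughly the same benign/malignant ratio.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_fraction)
        train_idx.extend(indices[:n_train])
        valid_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(valid_idx)


# Center I class counts derived from Table 1: 0 = benign, 1 = malignant.
labels = [0] * 238 + [1] * 179
train_idx, valid_idx = stratified_split(labels)
train_malignant = sum(labels[i] for i in train_idx)
valid_malignant = sum(labels[i] for i in valid_idx)
```

Splitting within each class, instead of splitting the pooled lesions, is what keeps the malignancy rate nearly identical in the two groups.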

MRI acquisition

The radiomics workflow is shown in Figure 1. Breast MRI was acquired with a 1.5-T system (Avanto, Siemens Healthcare; scanner 1) or 3.0-T system (Verio, Siemens Healthcare, Erlangen, Germany; scanner 2) at Center I, and with a 3.0-T system (Verio; scanner 3) at Center II. Further, a breast-specific coil with 16 channels was used in all patients. The sequence parameters for fat-suppressed DCE T1-weighted imaging are detailed in Table S1. Gadopentetate dimeglumine (Beijing Beilu Pharmaceutical Co., Ltd., Beijing, China) was used as the contrast agent and was administered intravenously at a dosage of 0.1 mmol/kg followed by a 15-mL saline flush at a rate of 2.0 mL/sec. Five post-contrast series were then acquired.

Figure 1 Flowchart of radiomics analysis for small BI-RADS category 4 lesions. Center I: The First Affiliated Hospital of Zhejiang Chinese Medical University; Center II: The Affiliated Hangzhou First People’s Hospital, Zhejiang University School of Medicine. AI, artificial intelligence; AUC, area under the curve; BI-RADS, Breast Imaging Reporting and Data System; CI, confidence interval; DCA, decision curve analysis; LASSO, least absolute shrinkage and selection operator; radscore, radiomics score; ROC, receiver operating characteristic; SHAP, SHapley Additive exPlanations; TIC, time-intensity curve; XGBoost, extreme gradient boosting.

Image pre-processing and tumor segmentation

The enrolled breast DCE-MRI images were exported from the Picture Archiving and Communication System in digital imaging and communications in medicine format. Given the distinct advantages of phase II DCE-MRI, including its superior ability to delineate lesion boundaries and high contrast resolution, phase II was selected for our analysis (20,21). Initially, the N4 bias field correction technique was applied to correct intensity inhomogeneity in phase II of the DCE-MRI images. Subsequently, all images were resampled to 1×1×1 mm³ (x, y, z) using a linear interpolation algorithm to standardize the voxel spacing. Further, each MRI image was normalized to ensure a standard normal distribution of image intensities.

The volume of interest (VOI) for the whole tumor was manually segmented slice-by-slice along the tumor margins on the DCE images by Radiologist A, who had 8 years of experience in breast diagnosis, using ITK-SNAP software (version 3.80; http://www.itksnap.org/). Based on the manually delineated intratumoral VOIs, the peritumoral VOI was obtained by subtracting the original VOI from the dilated VOI using Python (version 3.7). A 3-mm region surrounding the intratumoral VOI was defined as the peritumoral VOI, which was subsequently reviewed and adjusted to exclude areas involving the skin, chest wall, and air (22-25).
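The construction of the peritumoral VOI (dilate the intratumoral VOI, then subtract the original) can be sketched as follows. This is an illustrative sketch under stated assumptions: the mask is represented as a set of voxel coordinates on a 1×1×1 mm³ grid, so a 3-mm margin corresponds to a radius of 3 voxels; the function names and the toy tumor are hypothetical; and the manual exclusion of skin, chest wall, and air is not modeled. In practice, an array-based routine such as `scipy.ndimage.binary_dilation` would typically be used instead.

```python
from itertools import product


def ball_offsets(radius_vox):
    """Integer offsets (dz, dy, dx) lying within a Euclidean ball."""
    r = int(radius_vox)
    return [
        (dz, dy, dx)
        for dz, dy, dx in product(range(-r, r + 1), repeat=3)
        if dz * dz + dy * dy + dx * dx <= radius_vox * radius_vox
    ]


def peritumoral_ring(tumor_voxels, radius_vox=3):
    """Dilate the tumor voxel set, then subtract the tumor itself."""
    offsets = ball_offsets(radius_vox)
    dilated = {
        (z + dz, y + dy, x + dx)
        for (z, y, x) in tumor_voxels
        for (dz, dy, dx) in offsets
    }
    return dilated - set(tumor_voxels)


# Hypothetical toy tumor: a 5x5x5 voxel cube on a 1 mm isotropic grid.
tumor = set(product(range(10, 15), repeat=3))
ring = peritumoral_ring(tumor, radius_vox=3)
```

The ring contains every voxel within 3 mm of the tumor surface and none of the tumor voxels, which mirrors the subtraction step described in the text.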

Radiomics feature extraction and normalization

Radiomics features, including shape, first-order, texture, wavelet, exponential, and square transform features, were then extracted from the original tumoral and peritumoral VOIs using the Pyradiomics package (version 3.0.1; https://www.radiomics.io/pyradiomics.html). Inter- and intra-observer consistency and repeatability were assessed for all extracted radiomics features using intra- and inter-class correlation coefficients (ICCs). One month after the initial segmentation, Radiologist A and Radiologist B (the latter with 15 years of experience in breast diagnosis) repeated the same segmentation process for 40 randomly chosen lesions. Features with an ICC >0.75 were retained to ensure the reliability of the extracted features (26). Moreover, Z-score normalization was performed on the training group, and the parameters calculated from the training group (i.e., the mean and standard deviation) were applied unchanged to standardize the internal and external validation groups. To visualize the differences in the radiomics features derived from the 1.5- and 3.0-T MRI scans, a scatter plot analysis was performed using a principal component analysis.
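The normalization step, in which the mean and standard deviation are estimated on the training group and then reused for the validation groups, can be sketched as follows. This is a minimal illustration with hypothetical function names and toy values; the sample standard deviation is used here as an assumption, since the source does not state which estimator was applied.

```python
from statistics import mean, stdev


def fit_zscore(train_columns):
    """Estimate per-feature mean and SD from the training group only."""
    return {name: (mean(vals), stdev(vals)) for name, vals in train_columns.items()}


def apply_zscore(columns, params):
    """Standardize any group using the training-group parameters."""
    standardized = {}
    for name, vals in columns.items():
        mu, sd = params[name]
        standardized[name] = [(v - mu) / sd for v in vals]
    return standardized


# Hypothetical toy feature values (one feature, a few lesions per group).
train = {"feature_a": [1.0, 2.0, 3.0, 4.0, 5.0]}
valid = {"feature_a": [3.0, 6.0]}

params = fit_zscore(train)
train_z = apply_zscore(train, params)
valid_z = apply_zscore(valid, params)
```

Reusing the training parameters means the validation features are not guaranteed to have zero mean or unit variance, but it prevents information from the validation groups from leaking into model development.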

Construction of the radiomics model

A three-step feature selection procedure was applied separately to the intratumoral and peritumoral radiomics features in the training group. First, the Mann-Whitney U test was used to select the features that differed significantly (P<0.05) between the pathologically confirmed benign and malignant groups. Second, feature pairs with |r|>0.6 were identified, and from each pair, the feature with the higher mean absolute correlation was eliminated (24). Third, least absolute shrinkage and selection operator (LASSO) logistic regression with 10-fold cross-validation was used to select the predictive features most closely associated with malignancy in the training group. Finally, LASSO logistic regression was applied again to the pooled intratumoral and peritumoral features selected in the previous steps. The resulting combined feature set was entered into the extreme gradient boosting (XGBoost) analysis, and the predicted probability served as the radiomics score (radscore). The hyperparameters of the XGBoost model were tuned using the grid search method, and the optimal parameters were determined based on the AUC in both the training and internal validation groups.
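The second selection step (correlation-based pruning) can be sketched as follows. This is one plausible reading of the description, not necessarily the exact procedure used: for the most strongly correlated pair above the threshold, the feature with the higher mean absolute correlation with the remaining features is dropped, and the process repeats until no pair exceeds |r|=0.6. The function names, feature names, and toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_correlated(features, threshold=0.6):
    """Return the feature names kept after pairwise correlation pruning."""
    names = list(features)
    abs_r = {
        (a, b): abs(pearson(features[a], features[b]))
        for a in names for b in names if a != b
    }
    kept = list(names)
    while True:
        # Pairs among the remaining features that exceed the threshold.
        pairs = [
            (a, b)
            for i, a in enumerate(kept)
            for b in kept[i + 1:]
            if abs_r[(a, b)] > threshold
        ]
        if not pairs:
            return kept
        # Mean absolute correlation of each remaining feature with the others.
        mean_abs = {
            a: sum(abs_r[(a, b)] for b in kept if b != a) / (len(kept) - 1)
            for a in kept
        }
        a, b = max(pairs, key=lambda pair: abs_r[pair])
        kept.remove(a if mean_abs[a] >= mean_abs[b] else b)


# Hypothetical toy features: f1 and f2 are nearly collinear, f3 is not.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f2": [1.1, 2.0, 2.9, 4.2, 5.1, 5.9],
    "f3": [2.0, -1.0, 3.0, -2.0, 1.0, 0.0],
}
kept = drop_correlated(features)
```

In this toy example, only one member of the nearly collinear pair survives, while the weakly correlated feature is retained.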

Construction of the clinico-radiological model and the combined model

Univariate and multivariate logistic regression analyses were performed on the candidate clinico-radiological factors [including tumor type, age, position, size, and time-intensity curve (TIC)] to identify the factors associated with pathological malignancy (P<0.05). Subsequently, the significant clinico-radiological predictors were used to construct a clinical model. Finally, the combined-model nomogram was built using the clinico-radiological factors and the radscore.
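A logistic-regression nomogram of this kind maps a weighted sum of predictors to a probability of malignancy. The sketch below illustrates only the form of such a model. All coefficients are HYPOTHETICAL placeholders and are not the fitted values of the study, and the dummy coding of the TIC (with the inflow type as the reference level) is likewise an assumption.

```python
from math import exp

# HYPOTHETICAL coefficients, chosen only to make the sketch runnable.
# They are NOT the fitted values of the combined model.
COEFFICIENTS = {
    "intercept": -4.0,
    "radscore": 5.0,
    "age_per_year": 0.03,
    "tic_outflow": 1.2,   # relative to the assumed inflow reference level
    "tic_platform": 0.6,  # relative to the assumed inflow reference level
}


def combined_probability(radscore, age, tic):
    """Logistic model: probability = 1 / (1 + exp(-linear predictor))."""
    tic_terms = {
        "inflow": 0.0,
        "outflow": COEFFICIENTS["tic_outflow"],
        "platform": COEFFICIENTS["tic_platform"],
    }
    if tic not in tic_terms:
        raise ValueError(f"unknown TIC type: {tic!r}")
    linear = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["radscore"] * radscore
        + COEFFICIENTS["age_per_year"] * age
        + tic_terms[tic]
    )
    return 1.0 / (1.0 + exp(-linear))


# Hypothetical inputs, unrelated to any patient in the study.
low = combined_probability(radscore=0.2, age=40, tic="inflow")
high = combined_probability(radscore=0.8, age=40, tic="inflow")
```

A nomogram is a graphical rendering of the same linear predictor: each predictor contributes points in proportion to its coefficient, and the total is mapped to a probability through the logistic function.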

Interpretability analysis of radiomics and combined models

SHapley Additive exPlanations (SHAP) offers a robust approach for assessing feature importance and addresses the interpretability challenges associated with machine-learning models. By calculating the contribution of each variable to the model’s predictions, SHAP provides a measure of additive feature attributions (27,28). Consequently, radiomics and combined models can be interpreted globally and locally using SHAP, thus helping radiologists to use them better.
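The additive attribution idea behind SHAP can be illustrated with an exact Shapley value computation on a toy model. This sketch is not the SHAP library and not the study's model: it enumerates all feature coalitions (feasible only for a handful of features), replaces absent features with a fixed baseline value as a simplifying assumption, and uses a hypothetical three-feature function.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating coalitions.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical model with one interaction term (features 0 and 2)."""
    return 0.5 * z[0] + 0.2 * z[1] + 0.1 * z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that makes per-lesion explanations possible; the interaction term is shared equally between the two features involved.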

Statistical analysis

The Mann-Whitney U test, Chi-squared test, or Fisher’s exact test was used to assess the difference in clinico-radiological factors between different groups. The model’s prediction performance was assessed using the area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1-score. The SHAP technique was used to calculate the importance of each individual feature. The calibration of the combined model was evaluated by calibration curves using the “rms” package. A decision curve analysis (DCA) was applied to assess the clinical usefulness of the combined model by quantifying the net benefit at different threshold probabilities using the “rmda” package. A two-sided P value <0.05 indicated statistical significance. All the statistical analyses were performed using Python software (version 3.7) and R software (version 4.2.3).
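The performance metrics and the net benefit used in the DCA can be sketched as follows. This is a minimal illustration using standard definitions and hypothetical toy predictions; it does not reproduce the R packages named above. The AUC is computed in its rank-based form (the probability that a malignant lesion receives a higher score than a benign one, with ties counted as one half), and the net benefit at threshold probability pt is TP/n − FP/n × pt/(1 − pt).

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when predicting malignant if prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_malignant = p >= threshold
        if predicted_malignant and y == 1:
            tp += 1
        elif predicted_malignant and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Threshold-based metrics; assumes no denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def auc(y_true, y_prob):
    """Rank-based AUC over all malignant/benign pairs (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, pt):
    """Decision-curve net benefit at threshold probability pt (0 < pt < 1)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, pt)
    n = len(y_true)
    return tp / n - fp / n * pt / (1 - pt)


# Hypothetical toy data: 0 = benign, 1 = malignant.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.1, 0.3, 0.4, 0.6, 0.35, 0.7, 0.8, 0.9]

metrics = classification_metrics(y_true, y_prob, threshold=0.5)
toy_auc = auc(y_true, y_prob)
toy_net_benefit = net_benefit(y_true, y_prob, pt=0.5)
```

A decision curve plots the net benefit over a range of threshold probabilities, so that a model can be compared with the strategies of treating all lesions or no lesions as malignant.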

Results

Clinico-radiological characteristics of the patients

A schematic of the analytic approach used in this study is presented in Figure 1. We used a retrospective cohort from two medical centers. In total, 580 lesions met the specified study criteria (Figure 2). The lesions from patients at Center I were randomized at a 7:3 ratio into the following two groups: the training group (n=292) and the internal validation group (n=125). The lesions from patients at Center II served as the external validation group (n=163). Table 1 shows the clinico-radiological characteristics across the three distinct groups (i.e., the training group, the internal validation group, and the external validation group). The TIC and age exhibited statistically significant differences between the benign and malignant groups in both the training and external validation groups. In the internal validation group, the tumor type, the TIC, and age also demonstrated statistically significant differences.

Figure 2 Flowchart of the inclusion and exclusion criteria. Center I: The First Affiliated Hospital of Zhejiang Chinese Medical University; Center II: The Affiliated Hangzhou First People’s Hospital, Zhejiang University School of Medicine. BI-RADS, Breast Imaging Reporting and Data System; DCE, dynamic contrast-enhanced; MRI, magnetic resonance imaging.

Table 1

Clinico-radiological characteristics across the three groups

Items	Training group (n=292)			Internal validation group (n=125)			External validation group (n=163)
Items	Benign (n=167)	Malignant (n=125)	P value	Benign (n=71)	Malignant (n=54)	P value	Benign (n=82)	Malignant (n=81)	P value
Tumor			0.986			0.043			0.344
Mass	143 (85.6)	108 (86.4)		57 (80.3)	51 (94.4)		63 (76.8)	68 (84.0)
No mass	24 (14.4)	17 (13.6)		14 (19.7)	3 (5.6)		19 (23.2)	13 (16.0)
Position			0.326			0.164			0.814
Left	84 (50.3)	71 (56.8)		39 (54.9)	22 (40.7)		40 (48.8)	42 (51.9)
Right	83 (49.7)	54 (43.2)		32 (45.1)	32 (59.3)		42 (51.2)	39 (48.1)
TIC			<0.001			0.010			<0.001
Inflow type	76 (45.5)	15 (12.0)		28 (39.4)	8 (14.8)		27 (32.9)	5 (6.2)
Outflow type	47 (28.1)	82 (65.6)		26 (36.6)	29 (53.7)		11 (13.4)	26 (32.1)
Platform type	44 (26.3)	28 (22.4)		17 (23.9)	17 (31.5)		44 (53.7)	50 (61.7)
Age (years)	44.0 (35.5–52.0)	50.0 (42.0–59.0)	<0.001	43.0 (36.0–52.5)	53.5 (46.0–57.0)	<0.001	39.0 (33.0–45.0)	51.0 (43.0–60.0)	<0.001
Size (mm)	12.72±4.10	13.52±4.18	0.105	11.89±4.10	13.22±3.49	0.057	11.59±4.05	12.81±4.20	0.059

Data are presented as n (%), median (IQR), or mean ± SD. IQR, interquartile range; SD, standard deviation; TIC, time-intensity curve.

Construction and performance of radiomics model

A total of 944 radiomics features were extracted from the manually segmented intratumoral regions of interest (ROIs) and automatically segmented peritumoral ROIs. In total, 845 intratumoral features and 618 peritumoral features were identified as highly reproducible for further analysis based on the calculated ICC coefficients. After a coarse-to-fine feature screening process (Figure S1), 15 radiomics features were selected, including seven intratumoral features and eight peritumoral features. The contributions of these 15 features to the radiomics model (XGBoost) were visually represented using a SHAP bee-swarm plot (Figure 3A).

Figure 3 Model visualization. (A) Shapley summary diagram of the XGBoost model. A higher Shapley value (red) suggests a greater likelihood of malignant lesions; conversely, a lower Shapley value (blue) suggests a greater likelihood of benign lesions. (B) Development of the radiomics nomogram. The nomogram was constructed by combining the radscore, TIC, and age. Radscore, radiomics score; SHAP, SHapley Additive exPlanations; TIC, time-intensity curve; XGBoost, extreme gradient boosting.

Table S2 presents the comprehensive performance metrics of the radiomics model, which had AUC values of 0.850 [95% confidence interval (CI): 0.804–0.890], 0.807 (95% CI: 0.718–0.888), and 0.778 (95% CI: 0.699–0.844) in the training, internal validation, and external validation groups, respectively.

Construction and performance of the clinical model and the combined model

The correlated characteristics associated with pathological invasiveness were selected, and age and the TIC were identified as significant predictors of malignancy (Table S3) and included in the clinical model. Subsequently, a combined model (Figure 3B) was constructed by combining age, the TIC, and the radscore.

Table S2 presents the AUC, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1-score of each model, while Figure 4A-4C displays the receiver operating characteristic (ROC) curves of the three models in the training, internal validation, and external validation groups, respectively. Table S4 presents comparisons of the predictive performance of the combined model and the other models. The combined model demonstrated superior performance, with AUCs of 0.897 (95% CI: 0.862–0.931) in the training group, 0.871 (95% CI: 0.803–0.934) in the internal validation group, and 0.869 (95% CI: 0.807–0.920) in the external validation group. The calibration curve of the combined model showed good agreement between the predicted and observed values in the three groups (Figure 4D-4F). The DCA showed that the nomogram (Figure 3B) provided a greater net benefit in distinguishing benign from malignant breast lesions than the other models in the three groups (Figure 4G-4I). In addition, the subgroup analysis of tumor type, location, size, and TIC patterns (Figure 5A) examined the performance of the combined model across different patient categories. As Table S5 shows, the model had a higher AUC for lesions >10 mm (AUC: 0.907) than for lesions ≤10 mm (AUC: 0.819). This is likely due to the diagnostic challenges associated with smaller lesions, which often lack distinct features. While the model provides greater confidence for larger lesions, its solid performance for smaller lesions still offers valuable diagnostic support for challenging cases.

Figure 4 Performance of the prediction models. The ROC curve and DCA results of the radiomics model, clinical model, and combined model were compared for the differential diagnosis of breast lesions in the training group (A,G), the internal validation group (B,H), and the external validation group (C,I). The calibration curves of the combined model are shown for the training group (D), the internal validation group (E), and the external validation group (F). AUC, area under the curve; CI, confidence interval; DCA, decision curve analysis; ROC, receiver operating characteristic.

Figure 5 BI-RADS diagnostic performance of radiologists before and after AI assistance, and subgroup analysis. (A) The forest plot illustrates the ORs of the combined model for differentiating between breast lesions across key subgroups: lesion type (mass vs. non-mass), lesion location (left vs. right), lesion size (≤10 vs. >10 mm), and TIC type (inflow type, outflow type, and platform type). (B,D) Diagnostic performance of the two radiologists under BI-RADS 4b+ mode in the external validation group. (C,E) Diagnostic performance of the two radiologists under BI-RADS 4c mode in the external validation group. (F,G) Application analysis for two BI-RADS category 4 lesions. The red circles denote the manually delineated lesion contours using ITK-SNAP software for radiomics feature extraction. AI, artificial intelligence; AUC, area under the curve; BI-RADS, Breast Imaging Reporting and Data System; CI, confidence interval; OR, odds ratio; TIC, time-intensity curve.

A subgroup analysis (Figure S2) showed that the 3.0-T group had a slightly higher AUC (AUC: 0.874; 95% CI: 0.797–0.938) than the 1.5-T group (AUC: 0.857; 95% CI: 0.649–1.000), possibly because of the improved signal-to-noise ratio and spatial resolution at 3.0 T. However, the difference was not statistically significant (DeLong’s test, P=0.517), suggesting that model performance was consistent across MRI field strengths (1.5 and 3.0 T).

AI-assisted analysis

Table 2 illustrates the AI-assisted decision adjustments made by two radiologists in the external validation group. Malignant lesions were more often upgraded than downgraded (upgraded in 22% of cases on average), and benign lesions were more often downgraded than upgraded (downgraded in 8% of cases on average). Table 3 and Figure 5B,5C demonstrate that the AI strategy resulted in a statistically significant improvement (all P values <0.05) in AUC performance for radiologists A and B across both the 4b+ and 4c modes. The DCA results (Figure 5D,5E) indicated that AI-assisted decisions enhanced the net benefits for radiologists in both modes.

Table 2

AI-assisted changes made by junior and senior radiologists

Items	Adjustment	Benign lesions (n=82)	Malignant lesions (n=81)
Junior	No change	69	59
	Downgrade	8	3
	Upgrade	5	19
Senior	No change	75	61
	Downgrade	5	2
	Upgrade	2	18

Data are presented as number. Number of lesions that the radiologists altered from BI-RADS 4a to 4b (+) and 4b to 4c, or from BI-RADS 4c to 4b (−) and 4b to 4a using computer assistance. AI, artificial intelligence; BI-RADS, Breast Imaging Reporting and Data System.

Table 3

The diagnostic performance of the radiologists alone, and of the radiologists with AI assistance

Items	BI-RADS 4b+ mode		BI-RADS 4c mode
Items	AUC (95% CI)	P value^†	AUC (95% CI)	P value^†
Junior	0.736 (0.668–0.804)	–	0.655 (0.597–0.712)	–
Junior + AI	0.798 (0.737–0.859)	0.037*	0.747 (0.690–0.804)	<0.001*
Senior	0.804 (0.743–0.865)	–	0.698 (0.642–0.753)	–
Senior + AI	0.865 (0.813–0.917)	0.006*	0.784 (0.729–0.839)	0.001*

^†, P values for AUC are based on the DeLong’s test. *, P<0.05. AI, artificial intelligence; AUC, area under the curve; BI-RADS, Breast Imaging Reporting and Data System; CI, confidence interval.

Case analyses

We calculated both overall and individual Shapley values for the interpretation and clinical application of the combined model. The SHAP bee-swarm plot (Figure S3) revealed that the radscore, age, and outflow patterns were factors positively affecting the evaluation of the malignant lesions, of which, the radscore was the most influential factor. Figure 5F,5G presents two typical examples of correctly predicted malignant (Figure 5F) and benign (Figure 5G) lesions.

The first case was a 52-year-old female patient presenting with a left breast lesion, which was confirmed as invasive ductal carcinoma by postoperative pathology. The lesion had a maximum axial diameter of 11 mm, and the TIC exhibited a platform pattern. A retrospective analysis of the case showed that the radscore was 0.668, and the combined model predicted a malignant lesion (Figure 5F).

The second case was a 56-year-old female patient presenting with a right breast lesion, which was confirmed as hyperplasia of mammary glands by postoperative pathology. The lesion had a maximum axial diameter of 6 mm, and the TIC exhibited an outflow pattern. A retrospective analysis of this case revealed a radscore of 0.139, suggesting a benign lesion (Figure 5G).

Discussion

The accurate differentiation between benign and malignant small suspicious breast lesions remains a challenging task for radiologists but is crucial for optimizing patient management. This study developed a model to distinguish between benign and malignant breast lesions classified as BI-RADS category 4 and measuring ≤2 cm based on early phase DCE-MRI. A nomogram that incorporates the radscore alongside clinico-radiological factors demonstrated strong performance in discriminating between malignant and benign small lesions. We used SHAP to interpret the prediction process of the model, from the overall results down to the individual level. Ultimately, our AI-assisted analysis in the external validation group indicated that interpretable AI models can enhance radiologists’ accuracy in classifying these diagnostically challenging lesions.

The SHAP analysis revealed that the radscore, which consists of 15 radiomics features, played a significant role in the predictive model. Notably, four of the five most important features were peritumoral features, indicating their potential as strong indicators of early breast cancer. This observation aligns with that of Jiang et al. (29), who also highlighted the importance of peritumoral features in assessing lymphovascular invasion in invasive breast cancer. The radiomics features of the peritumoral region may be linked to factors such as the blood and lymphatic plexus, immune infiltration, and stromal response (30-34), all of which could affect tumor development. Our finding shows the value of the peritumoral region in breast lesion prediction. It also suggests that peritumoral features might be used to predict the progression of benign lesions to malignant lesions, and might serve as a supplementary aspect of pathological reports. Further research is needed to explore these implications.

The combined model, which integrates clinico-radiological factors and the radscore, demonstrated superior performance (DeLong’s test, all P<0.05) compared to the other models in both the training and internal validation groups. The combined model did not significantly outperform the clinical model in the external validation group (DeLong’s test P=0.051); however, this could be because the benign cases in this group were younger, resulting in more distinct clinical profiles compared to malignant cases, which improved classification performance. These findings underscore the complementary relationship between clinico-radiological factors and radiomics, and the importance of developing integrated models for lesion assessment.

MRI is known for its high sensitivity in screening breast lesions; however, its ability to visually differentiate between the morphological and kinetic characteristics of malignant tumors decreases as lesion volume decreases. Schlossbauer et al. (10) found that the score differences between benign and malignant lesions were smaller in lesions <1 cm, which led to difficulties in diagnosis. Our AI model showed significant potential in addressing these diagnostic challenges, achieving an AUC of 0.819 for lesions ≤1 cm and an even higher AUC of 0.907 for lesions measuring >1 to 2 cm. Thus, our model could provide clinicians with increased confidence, especially when diagnosing challenging smaller lesions.

In clinical practice, the opaque nature of machine-learning models is often seen as a hindrance to the widespread adoption of AI (35). SHAP, a practical tool for interpreting machine-learning models, allows for the visualization of each feature’s contribution to predictions, thereby increasing the transparency of these models. This could help to increase radiologists’ confidence in using AI-based predictive models in their practice. In our AI-assisted analysis, we observed a notable increase in the ability of radiologists to detect malignant lesions when using AI assistance in BI-RADS 4b+ and 4c modes. The DCA results further showed that this improvement led to an increase in the net benefit. While the interpretability of the model is still in its preliminary stages, our findings suggest that AI-assisted strategies hold promise in enhancing clinical decision making.
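In decision curve analysis, net benefit at a threshold probability pt is conventionally defined as TP/N − (FP/N) × pt/(1 − pt), where TP and FP are the true and false positives among N cases. The sketch below illustrates that formula and the "treat all" reference strategy; the labels, predicted probabilities, and threshold are hypothetical toy values, not study data.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on cases whose predicted probability >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat (for example, biopsy) every case."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical toy data: 1 = malignant, 0 = benign.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1, 0.4, 0.7]

nb_model = net_benefit(labels, probs, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
print(f"model: {nb_model:.3f}, treat all: {nb_all:.3f}")
```

A decision curve repeats this calculation over a range of thresholds; a strategy is preferred where its curve lies above both the "treat all" and "treat none" (net benefit of zero) lines.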

Study limitations

First, the retrospective nature of this analysis might have introduced inherent biases, and the exclusive inclusion of surgical cases might have resulted in selection bias. Second, the optimal peritumoral distance might vary depending on tumor type, size, and biological behavior. Previous studies have adopted a 3-mm peritumoral ROI as the preferred region; however, this might not be generalizable to all tumor subtypes. Third, while our model showed promising results, its reliance on DCE sequences alone might have led to key features being overlooked. In future research, we will incorporate multi-sequence data to improve diagnostic performance. Fourth, some clinico-radiological features in this study were semi-quantitative, and the results might be affected by the evaluator’s subjectivity.

Conclusions

We developed and validated an interpretable combined model that integrates DCE radiomics features with clinico-radiological factors using machine-learning techniques for the diagnosis of small BI-RADS category 4 lesions. The SHAP method serves as a bridge for personalized prediction, offering valuable insights that may help radiologists improve the diagnostic accuracy for small BI-RADS category 4 lesions. In the future, we intend to further improve the model’s performance by integrating additional imaging modalities and leveraging larger, multi-institutional datasets to enhance its generalizability and clinical utility.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-1893/rc

Funding: The study was supported by grants from the Medical Science and Technology Project of Zhejiang Province (Nos. 2024KY1200, 2023KY338, 2024KY454, 2023KY873, and 2024KY131), the Zhejiang Basic Public Welfare Research Project (No. LTGY24H180007), the Natural Science Foundation of Zhejiang Province (No. LQN25H180008), and the Zhejiang Traditional Chinese Medicine Administration (Nos. 2024ZL475, 2024ZL1058, and 2024ZL448).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1893/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Ethics Committee of The First Affiliated Hospital of Zhejiang Chinese Medical University (No. 2024-KLS-158-01), and the requirement of individual consent for this retrospective analysis was waived. Both participating institutions were informed of and approved the study.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Ashrafizadeh M, Zarrabi A, Bigham A, Taheriazam A, Saghari Y, Mirzaei S, Hashemi M, Hushmandi K, Karimi-Maleh H, Nazarzadeh Zare E, Sharifi E, Ertas YN, Rabiee N, Sethi G, Shen M. (Nano)platforms in breast cancer therapy: Drug/gene delivery, advanced nanocarriers and immunotherapy. Med Res Rev 2023;43:2115-76.
2. Colonna SV, Higgins AK, Alvarez J, Saville BR, Lawrence J, Abramson VG. Analysis of Risk of Recurrence by Subtype in ≤ 1-cm Breast Tumors. Clin Breast Cancer 2016;16:223-31.
3. Rojas MP, Telaro E, Russo A, Moschetti I, Coe L, Fossati R, Palli D, del Roselli TM, Liberati A. Follow-up strategies for women treated for early breast cancer. Cochrane Database Syst Rev 2005;CD001768.
4. Shen K, Yao L, Zhu J, Gu X, Wang J, Qian W, Zheng Z, Fu D, Wu S. Impact of adjuvant chemotherapy on T1N0M0 breast cancer patients: a propensity score matching study based on SEER database and external cohort. BMC Cancer 2022;22:863.
5. Houvenaeghel G, Goncalves A, Classe JM, Garbay JR, Giard S, Charytensky H, et al. Characteristics and clinical outcome of T1 breast cancer: a multicenter retrospective cohort study. Ann Oncol 2014;25:623-8.
6. Gradishar WJ, Moran MS, Abraham J, Aft R, Agnese D, Allison KH, et al. NCCN Guidelines® Insights: Breast Cancer, Version 4.2021. J Natl Compr Canc Netw 2021;19:484-93.
7. Jacobs MA, Barker PB, Bluemke DA, Maranto C, Arnold C, Herskovits EH, Bhujwalla Z. Benign and malignant breast lesions: diagnosis with multiparametric MR imaging. Radiology 2003;229:225-32.
8. Heywang-Köbrunner SH, Bick U, Bradley WG Jr, Boné B, Casselman J, Coulthard A, Fischer U, Müller-Schimpfle M, Oellinger H, Patt R, Teubner J, Friedrich M, Newstead G, Holland R, Schauer A, Sickles EA, Tabar L, Waisman J, Wernecke KD. International investigation of breast MRI: results of a multicentre study (11 sites) concerning diagnostic parameters for contrast-enhanced MRI based on 519 histopathologically correlated lesions. Eur Radiol 2001;11:531-46.
9. Kuhl CK, Mielcareck P, Klaschik S, Leutner C, Wardelmann E, Gieseke J, Schild HH. Dynamic breast MR imaging: are signal intensity time course data useful for differential diagnosis of enhancing lesions? Radiology 1999;211:101-10.
10. Schlossbauer T, Leinsinger G, Wismuller A, Lange O, Scherr M, Meyer-Baese A, Reiser M. Classification of small contrast enhancing breast lesions in dynamic magnetic resonance imaging using a combination of morphological criteria and dynamic analysis based on unsupervised vector-quantization. Invest Radiol 2008;43:56-64.
11. Partridge SC, Nissan N, Rahbar H, Kitsch AE, Sigmund EE. Diffusion-weighted breast MRI: Clinical applications and emerging techniques. J Magn Reson Imaging 2017;45:337-55.
12. Li X, Wang H, Gao J, Jiang L, Chen M. Quantitative apparent diffusion coefficient metrics for MRI-only suspicious breast lesions: any added clinical value? Quant Imaging Med Surg 2023;13:7092-104.
13. An Y, Mao G, Zheng S, Bu Y, Fang Z, Lin J, Zhou C. External validation of multiparametric magnetic resonance imaging-based decision rules for characterizing breast lesions and comparison to Kaiser score and breast imaging reporting and data system (BI-RADS) category. Quant Imaging Med Surg 2025;15:648-61.
14. Baltzer P, Mann RM, Iima M, Sigmund EE, Clauser P, Gilbert FJ, Martincich L, Partridge SC, Patterson A, Pinker K, Thibault F, Camps-Herrero J, Le Bihan D. EUSOBI international Breast Diffusion-Weighted Imaging working group. Diffusion-weighted imaging of the breast-a consensus and mission statement from the EUSOBI International Breast Diffusion-Weighted Imaging working group. Eur Radiol 2020;30:1436-50.
15. Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images Are More than Pictures, They Are Data. Radiology 2016;278:563-77.
16. Luo WQ, Huang QX, Huang XW, Hu HT, Zeng FQ, Wang W. Predicting Breast Cancer in Breast Imaging Reporting and Data System (BI-RADS) Ultrasound Category 4 or 5 Lesions: A Nomogram Combining Radiomics and BI-RADS. Sci Rep 2019;9:11921.
17. Zhou J, Zhang Y, Chang KT, Lee KE, Wang O, Li J, Lin Y, Pan Z, Chang P, Chow D, Wang M, Su MY. Diagnosis of Benign and Malignant Breast Lesions on DCE-MRI by Using Radiomics and Deep Learning With Consideration of Peritumor Tissue. J Magn Reson Imaging 2020;51:798-809.
18. Xu Z, Wang Y, Chen M, Zhang Q. Multi-region radiomics for artificially intelligent diagnosis of breast cancer using multimodal ultrasound. Comput Biol Med 2022;149:105920.
19. Meissnitzer M, Dershaw DD, Feigin K, Bernard-Davila B, Barra F, Morris EA. MRI appearance of invasive subcentimetre breast carcinoma: benign characteristics are common. Br J Radiol 2017;90:20170102.
20. Song D, Yang F, Zhang Y, Guo Y, Qu Y, Zhang X, Zhu Y, Cui S. Dynamic contrast-enhanced MRI radiomics nomogram for predicting axillary lymph node metastasis in breast cancer. Cancer Imaging 2022;22:17.
21. Militello C, Rundo L, Dimarco M, Orlando A, Woitek R, D'Angelo I, Russo G, Bartolotta TV. 3D DCE-MRI Radiomic Analysis for Malignant Lesion Prediction in Breast Cancer Patients. Acad Radiol 2022;29:830-40.
22. Zhang Z, Wan X, Lei X, Wu Y, Zhang J, Ai Y, Yu B, Liu X, Jin J, Xie C, Jin X. Intra- and peri-tumoral MRI radiomics features for preoperative lymph node metastasis prediction in early-stage cervical cancer. Insights Imaging 2023;14:65.
23. Wang S, Sun Y, Li R, Mao N, Li Q, Jiang T, Chen Q, Duan S, Xie H, Gu Y. Diagnostic performance of perilesional radiomics analysis of contrast-enhanced mammography for the differentiation of benign and malignant breast lesions. Eur Radiol 2022;32:639-49.
24. Braman N, Prasanna P, Whitney J, Singh S, Beig N, Etesami M, Bates DDB, Gallagher K, Bloch BN, Vulchi M, Turk P, Bera K, Abraham J, Sikov WM, Somlo G, Harris LN, Gilmore H, Plecha D, Varadan V, Madabhushi A. Association of Peritumoral Radiomics With Tumor Biology and Pathologic Response to Preoperative Targeted Therapy for HER2 (ERBB2)-Positive Breast Cancer. JAMA Netw Open 2019;2:e192561.
25. Zeng Q, Deng Y, Nan J, Zou Z, Yu T, Liu L. Delta dual‑region DCE-MRI radiomics from breast masses predicts axillary lymph node response after neoadjuvant therapy for breast cancer. BMC Cancer 2025;25:264.
26. Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med 2016;15:155-63.
27. Wang K, Tian J, Zheng C, Yang H, Ren J, Liu Y, Han Q, Zhang Y. Interpretable prediction of 3-year all-cause mortality in patients with heart failure caused by coronary heart disease based on machine learning and SHAP. Comput Biol Med 2021;137:104813.
28. Ma J, Bo Z, Zhao Z, Yang J, Yang Y, Li H, Yang Y, Wang J, Su Q, Wang J, Chen K, Yu Z, Wang Y, Chen G. Machine Learning to Predict the Response to Lenvatinib Combined with Transarterial Chemoembolization for Unresectable Hepatocellular Carcinoma. Cancers (Basel) 2023.
29. Jiang W, Meng R, Cheng Y, Wang H, Han T, Qu N, Yu T, Hou Y, Xu S. Intra- and Peritumoral Based Radiomics for Assessment of Lymphovascular Invasion in Invasive Breast Cancer. J Magn Reson Imaging 2024;59:613-25.
30. Christiansen A, Detmar M. Lymphangiogenesis and cancer. Genes Cancer 2011;2:1146-58.
31. Grivennikov SI, Greten FR, Karin M. Immunity, inflammation, and cancer. Cell 2010;140:883-99.
32. Pagès F, Galon J, Dieu-Nosjean MC, Tartour E, Sautès-Fridman C, Fridman WH. Immune infiltration in human tumors: a prognostic factor that should not be ignored. Oncogene 2010;29:1093-102.
33. Ocaña A, Diez-Gónzález L, Adrover E, Fernández-Aramburo A, Pandiella A, Amir E. Tumor-infiltrating lymphocytes in breast cancer: ready for prime time? J Clin Oncol 2015;33:1298-9.
34. Chan TS, Shaked Y, Tsai KK. Targeting the Interplay Between Cancer Fibroblasts, Mesenchymal Stem Cells, and Cancer Stem Cells in Desmoplastic Cancers. Front Oncol 2019;9:688.
35. Castelvecchi D. Can we open the black box of AI? Nature 2016;538:20-3.

Cite this article as: Han C, Chen J, Hong M, Chen S, Ying Y, Liu J, Yang F, Qian H, Ding X, Zhang R, Wu J, Hu L, Xu C, Liu X, Lin W, Zhou C, Xu M, Fang Z. MRI radiomics for diagnosing small BI-RADS 4 breast lesions: an interpretable model. Quant Imaging Med Surg 2025;15(6):5060-5072. doi: 10.21037/qims-24-1893
