Novel study on the prediction of BI-RADS 4A positive lesions in mammography using deep learning technology and clinical factors

Rushan Ouyang; Tingting Liao; Yuting Yang; Xiaohui Lin; Xuhui Zhou; Jie Ma

doi:10.21037/qims-24-1075

Original Article

Rushan Ouyang¹, Tingting Liao², Yuting Yang², Xiaohui Lin², Xuhui Zhou¹*, Jie Ma²*

¹Department of Radiology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen, China; ²Department of Radiology, Shenzhen People’s Hospital, Shenzhen, China

Contributions: (I) Conception and design: J Ma, R Ouyang; (II) Administrative support: J Ma, X Zhou; (III) Provision of study materials or patients: R Ouyang, T Liao, Y Yang, X Lin; (IV) Collection and assembly of data: R Ouyang, T Liao, X Lin; (V) Data analysis and interpretation: R Ouyang, X Lin; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

*These authors contributed equally to this work.

Correspondence to: Jie Ma, MD. Department of Radiology, Shenzhen People’s Hospital, 1017 Dongmen North Road, Luohu District, Shenzhen 518020, China. Email: cjr.majie@vip.163.com; Xuhui Zhou, MD. Department of Radiology, The Eighth Affiliated Hospital, Sun Yat-sen University, 3025 Shennan Middle Road, Futian Street, Futian District, Shenzhen 518033, China. Email: zhouxuh@mail.sysu.edu.cn.

Background: The classification of Breast Imaging Reporting and Data System (BI-RADS) category 4A lesions in mammography is complicated by subjective interpretations and unclear criteria, which can lead to potential misclassifications and unnecessary biopsies. Thus, more accurate assessment methods need to be developed. This study aimed to improve the classification prediction of BI-RADS 4A positive lesions in mammography by combining deep learning (DL) technology with relevant clinical factors.

Methods: A retrospective analysis of 590 patients diagnosed with BI-RADS 4A at Shenzhen People’s Hospital and Shenzhen Luohu People’s Hospital was conducted, and a multi-faceted approach was employed to construct a robust predictive model. The patients were divided into training, validation, and external validation sets. The classification results from a DL system applied to mammography were recorded, and data on relevant clinical factors were collected. Univariate and multivariate logistic regression analyses were performed to identify the independent predictive factors. A predictive model and nomogram integrating these factors were developed. Assessment metrics, such as the areas under the curve (AUCs), calibration curves, and a decision curve analysis (DCA), were employed to evaluate the diagnostic performance, calibration, and clinical net benefit of the model. External validation was conducted to assess the generalization ability of the model.

Results: Four independent predictive factors (i.e., age, nipple discharge, ultrasound BI-RADS assessment, and DL system classification results) were identified and included in the predictive model. The model showed commendable diagnostic performance with AUC values of 0.85, 0.82, and 0.84 for the training, validation, and external validation sets, respectively. There were no statistically significant differences in the AUCs of the predictive model between the training set, and the internal and external validation sets (P=0.543 and 0.842, respectively). The calibration curves showed excellent calibration in the training, validation, and external validation sets, indicating a minimal deviation between the predicted and actual positive risk probabilities (P=0.906, 0.890, and 0.769, respectively). The DCA results illustrated the clinical net benefit of the model for risk thresholds greater than 0.15 and less than 0.70 in both the internal validation and external validation sets.

Conclusions: Our predictive model, which incorporated age, nipple discharge, ultrasound BI-RADS assessment, and DL system classification results, emerged as a powerful tool for accurately predicting BI-RADS 4A positive lesions. Its application holds significant promise in helping radiologists enhance diagnostic precision and reduce unnecessary biopsies in BI-RADS 4A positive lesion cases.

Keywords: Breast cancer; mammography; artificial intelligence; deep learning (DL); ultrasound

Submitted May 29, 2024. Accepted for publication Oct 08, 2024. Published online Nov 27, 2024.

Introduction

In mammography, Breast Imaging Reporting and Data System (BI-RADS) category 4A lesions primarily comprise anomalies that necessitate biopsy but have a comparatively low probability of malignancy (1,2). The American College of Radiology (ACR) 5th edition BI-RADS guidelines (3) specify distinct morphological calcification features for BI-RADS 4B and 4C lesions, but do not specify morphological calcification features for BI-RADS 4A lesions. Similarly, for mass characteristics, only specific, well-defined masses, such as partially circumscribed masses considered fibroadenomas on ultrasound, palpable isolated complex cysts, and suspicious abscesses, meet the criteria for BI-RADS 4A classification (4,5). Surprisingly, in a study (6) involving 125,447 patients undergoing mammography, approximately 55.6% of the patients diagnosed with BI-RADS 4 lesions were further subcategorized as 4A. Due to the diversity of atypical benign and malignant lesion features, the subjective interpretation of BI-RADS 4A lesion features by different radiologists, which often relies on experiential knowledge and lacks explicit criteria, introduces the potential for the misclassification of BI-RADS 4A lesions.

Under the ACR 5th edition BI-RADS guidelines (1), biopsy is the preferred management approach for BI-RADS 4A lesions. However, image-guided biopsy raises challenges due to its high cost, the stringent qualifications required for medical equipment and radiologists, and limitations related to finite healthcare resources. Consequently, some patients with BI-RADS 4A lesions may elect to forego biopsy despite a risk of malignancy of up to 10%, which can lead to delayed interventions in some cases of breast cancer (7). In the distinctive healthcare landscape of China, some radiologists may categorize uncertain breast lesions as BI-RADS 4A to mitigate potential medical risks; however, this contributes to an increased false-positive rate of BI-RADS 4A lesions (8,9). Given that biopsy is an invasive procedure with complications such as bleeding and infection, and that excessive biopsies contribute to elevated healthcare costs, the assessment of BI-RADS 4A lesions remains a formidable challenge for radiologists in clinical practice (10,11). Thus, a more accurate and objective approach for evaluating BI-RADS 4A lesions needs to be established.

Computer-aided diagnostic systems based on deep learning (DL) have shown promise in overcoming the inherent subjectivity of assessing BI-RADS 4A lesions by assisting radiologists to more accurately evaluate such lesions (12). However, most existing DL systems are constructed based solely on mammographic images, and integration with clinical data is lacking, resulting in disparities between these systems and real-world scenarios (13). This study sought to address this gap by constructing a predictive model based on the results of a DL system and clinical factors. This study aimed to predict lesions categorized by radiologists as BI-RADS 4A to increase the precision of diagnosing such lesions. The primary objectives of this study were to reduce the need for unnecessary biopsy procedures, increase the identification rate of malignant lesions, and establish a robust imaging basis for early intervention in breast cancer. This study introduced a novel methodology to enhance the evaluation of BI-RADS 4A lesions on mammography. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1075/rc).

Methods

Clinical data

Retrospective data were collected from a total of 590 patients who underwent mammography for breast disease: 467 patients at Shenzhen People’s Hospital (January 2018 to July 2022) and 123 patients at Shenzhen Luohu People’s Hospital (January 2019 to June 2021). To be eligible for inclusion in this study, the patients had to meet the following inclusion criteria: (I) have a classification of BI-RADS 4A as assessed in a mammography diagnostic report (the reports were evaluated by two radiologists: one with less than five years of experience and no specialized breast imaging training, and the other with over 20 years of experience and specialized training in breast imaging); (II) have complete breast ultrasound reports and clinical data available; (III) have complete bilateral breast craniocaudal (CC) and mediolateral oblique (MLO) images available; and (IV) have complete biopsy or surgical pathology results available.

Patients were excluded from the study if they met any of the following exclusion criteria: (I) had images that did not meet the quality requirements; and/or (II) had a history of breast augmentation, surgery, or trauma on either side of the breast. In total, 590 female patients aged 21–85 years (mean age: 44±10 years) were included in the study. The training set comprised 324 patients from Shenzhen People’s Hospital (January 2018 to July 2021), the validation set comprised 143 patients from Shenzhen People’s Hospital (August 2021 to July 2022), and the external validation set comprised 123 patients from Shenzhen Luohu People’s Hospital. This study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). This study was approved by the Medical Ethics Committee of Shenzhen People’s Hospital (approval No. LL-KY-2021624). The other hospital was informed of and agreed with the study. The requirement of individual consent for this retrospective analysis was waived.

Image acquisition

Mammography images were acquired using German Siemens Mammomat Inspiration, Italian Giotto Image MD, and American GE Senographe Pristina full-field digital mammography machines. Standard CC and MLO images of both breasts were obtained using an automated compression system and the automatic exposure control mode. The mammography acquisition methods adhered to the standards of the Chinese Medical Association’s 2022 Mammography Database Construction and Quality Control Expert Consensus (14).

Image analysis and clinical data collection

The glandular tissue type in the images of the BI-RADS 4A patients (classified as type a, b, c, or d) was recorded, with types a and b categorized as non-dense breasts, and types c and d categorized as dense breasts. Relevant data, including patient age, glandular tissue type, menopausal status, age at menarche, family history of breast cancer, oral contraceptive use, lactation history, nipple discharge, skin abnormalities, palpable masses, breast pain/tenderness, ultrasound BI-RADS assessment, and pathological results were collected. Additionally, the results of the DL system’s classification of the mammography images of the 467 patients were recorded.

Construction of the DL system

This study used the Mammomat artificial intelligence, version 3, a DL system jointly developed by Ping An Technology (Shenzhen) Co., Ltd. (China) and Shenzhen People’s Hospital. The system includes a fused DL model for lesion detection, segmentation, and classification (Figure 1). A model based on an improved U-Net architecture was used to segment the calcifications. This model effectively isolates calcifications from the background and prepares them for subsequent classifications. The fusion model integrates modules for mass detection, calcification segmentation, benign/malignant lesion classification, and BI-RADS categorization. Specifically, the mass-detection model (8) employs two distinct high-resolution deep detection and segmentation networks designed separately for ipsilateral and contralateral images. These networks work in tandem to detect lesions with the fused outputs providing the final detection results for masses and other abnormalities. The mass-detection network accounts for breast tissue symmetry and the geometric relationship relative to the nipple, enhancing the precision of abnormal lesion detection. The mass-detection model achieved an area under the curve (AUC) value of 0.950. Additionally, a Dense-Net convolutional neural network was layered onto the detection model to classify the masses, and achieved an AUC of 0.914 for benign/malignant classification.

Figure 1 Mammogram artificial intelligence interface for deep learning system. After adjusting the window width and level of the original DICOM image to obtain a reasonable display region, traditional image processing algorithms were used to remove artifacts and noise. Small 512×512-pixel image blocks were extracted from the entire image’s effective area. The Ipsilateral Dual-view Network and Bilateral Dual-view Network were used for mass detection and segmentation (blue area). The Dense-Net was adopted for mass classification (red rectangle). The final output shows a mass malignancy degree of 0.62, evaluated as BI-RADS 4A. The classic U-Net segmentation model was applied for calcification segmentation (red area). The Res-Net-34 model was used for calcification classification (red circle). The final output shows a calcification malignancy degree of 0.91, evaluated as BI-RADS 4B. BI-RADS, Breast Imaging Reporting and Data System.

For the calcification segmentation, a modified U-Net architecture was employed, incorporating group normalization modules after the convolutional layers to stabilize the training process. The U-Net segmentation model accurately delineated the contours of the calcifications in the images, and a 34-layer residual neural network (Res-Net-34) was used to classify each suspicious calcification identified by the U-Net output. During the development of this model, radiologists annotated each lesion area using specialized tools, marking the benign/malignant status and internal lesion characteristics. The cases were randomly divided into training, validation, and test sets at a ratio of 8:1:1 (8,988 images for training, 1,120 for validation, and 1,120 for testing). The classification thresholds for the mass and calcification models were set at 0.60 and 0.80, respectively. The final trained model demonstrated high performance in both mass and calcification classification with AUC values of 0.90 and 0.95, respectively. It should be noted that this model is currently a research tool and has not been developed into a commercial product.
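As a rough illustration of how the two classification thresholds might turn lesion-level scores into a case-level result, the following minimal Python sketch applies the 0.60 and 0.80 cut-offs stated above. The function names, the use of "greater than or equal to", and the rule that a case is positive if any lesion is positive are assumptions made for illustration only; they are not taken from the DL system itself.

```python
# Hypothetical sketch. The two thresholds are from the text; the ">=" comparison
# and the "any lesion positive" aggregation rule are assumptions.
THRESHOLDS = {"mass": 0.60, "calcification": 0.80}


def lesion_is_positive(lesion_type, score):
    """Return True if a lesion's malignancy score reaches its type-specific threshold."""
    return score >= THRESHOLDS[lesion_type]


def case_is_positive(lesions):
    """Return True if at least one (lesion_type, score) pair is classified positive."""
    return any(lesion_is_positive(kind, score) for kind, score in lesions)


# Scores of the kind shown in the Figure 1 caption (mass 0.62, calcification 0.91).
example_case = [("mass", 0.62), ("calcification", 0.91)]
print(case_is_positive(example_case))
```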

Grouping

According to the European guidelines (15), lesions with uncertain malignant potential should be categorized as B3 lesions. This category includes atypical ductal hyperplasia, flat epithelial atypia lesions, lobular neoplasia (which include lobular carcinomas in situ and atypical lobular hyperplasia), papillary lesions, radial scars, and other miscellaneous entities such as fibroepithelial lesions, mucocele-like lesions, and apocrine adenoses. The risk of these B3 lesions being upgraded to malignancy on excision varies, as does the risk of subsequent in situ or invasive malignancies. Such lesions typically require puncture biopsy for further evaluation. Thus, in this study, intraductal papillary lesions, atypical ductal hyperplasia, intraductal papilloma, and breast cancers were categorized as pathologically positive, while other breast lesions were classified as pathologically negative. Based on the pathological results, the patients were divided into pathologically-positive and pathologically-negative groups.

Statistical analysis

The statistical analysis was performed using R language 4.2.1 (http://www.Rproject.org), SPSS 26.0, and MedCalc 15.6.1. Univariate and multivariate analyses were conducted on the training set. The Kolmogorov-Smirnov test was used to assess whether the measurement data were normally distributed. Data that conformed to a normal distribution were compared between groups using the independent samples t test, and data that did not conform to a normal distribution were compared using the Wilcoxon rank-sum test; both are expressed as the mean ± standard deviation. The chi-squared (χ²) test was used for the categorical data. Logistic regression was employed for the multivariate analysis to identify the independent predictive factors. The combined predictive model and individual predictive factors were evaluated using receiver operating characteristic (ROC) curves. DeLong’s test was used to compare the AUCs of the model and individual factors. Visualization of the logistic regression model, calibration curves, and a decision curve analysis (DCA) were performed using R language and relevant software packages.

Results

General information

A total of 590 patients with BI-RADS 4A classification were included in this study, of whom 324 were included in the training set, 143 in the internal validation set, and 123 in the external validation set. Among the 590 patients, 160 had a positive pathology and 430 had a negative pathology. There were 553 cases of dense breast tissue, and 37 cases of non-dense breast tissue. The lesions included 328 masses, 213 calcifications, 36 asymmetries, and 13 distortions, totaling 377 non-calcified lesions. Based on the biopsy and surgical pathology results, the positive rate for BI-RADS 4A patients was 27.12% (160/590) with rates of 24.38% (79/324), 24.48% (35/143), and 37.40% (46/123) in the training, internal validation, and external validation sets, respectively (Table 1). There were no statistically significant differences in the clinical, pathological, and imaging characteristics between the training, internal, and external validation sets.

Table 1

General patient information

Category	Classification	Training set (n=324)	Internal validation set (n=143)	External validation set (n=123)
Pathologically negative (n=430)	Fibroadenoma	101 (31.17%)	29 (20.28%)	35 (28.46%)
	Breast adenosis	101 (31.17%)	70 (48.95%)	30 (24.39%)
	Benign phyllodes tumor	16 (4.94%)	2 (1.40%)	0 (0.00%)
	Inflammatory changes	13 (4.01%)	3 (2.10%)	9 (7.32%)
	Ductal dilatation	2 (0.62%)	1 (0.70%)	0 (0.00%)
	Tubular adenoma	2 (0.62%)	1 (0.70%)	0 (0.00%)
	Fibrocystic breast disease	1 (0.31%)	2 (1.40%)	0 (0.00%)
	Hamartoma	1 (0.31%)	1 (0.70%)	0 (0.00%)
	Breast cyst	2 (0.62%)	0 (0.00%)	0 (0.00%)
	Adenomyoepithelioma	2 (0.62%)	0 (0.00%)	1 (0.81%)
	Adipose tumor	1 (0.31%)	0 (0.00%)	0 (0.00%)
	Ductal hyperplasia	1 (0.31%)	0 (0.00%)	0 (0.00%)
	Normal tissue	1 (0.31%)	0 (0.00%)	2 (1.63%)
Pathologically positive (n=160)	Invasive ductal carcinoma	25 (7.72%)	10 (6.99%)	24 (19.51%)
	Papillary lesion	19 (5.86%)	9 (6.29%)	8 (6.50%)
	Ductal carcinoma in situ	20 (6.17%)	7 (4.90%)	11 (8.94%)
	Atypical ductal hyperplasia	7 (2.16%)	3 (2.10%)	0 (0.00%)
	Borderline phyllodes tumor	4 (1.23%)	1 (0.70%)	0 (0.00%)
	Lobular carcinoma in situ	2 (0.62%)	2 (1.40%)	0 (0.00%)
	Invasive special type carcinoma	2 (0.62%)	2 (1.40%)	3 (2.44%)
	Malignant phyllodes tumor	1 (0.31%)	0 (0.00%)	0 (0.00%)
Mammography features (n=590)	Mass	190 (58.64%)	68 (47.55%)	70 (56.91%)
	Calcification	117 (36.11%)	66 (46.15%)	30 (24.39%)
	Architectural distortion	3 (0.93%)	6 (4.20%)	4 (3.25%)
	Asymmetry	14 (4.32%)	3 (2.10%)	19 (15.45%)
Mammography manufacturers (n=590)	German Siemens Mammomat Inspiration	178 (54.94%)	80 (55.94%)	48 (39.02%)
	Italian Giotto Image MD	97 (29.94%)	42 (29.37%)	75 (60.98%)
	American GE Senographe Pristina	49 (15.12%)	21 (14.69%)	0 (0.00%)

Univariate analysis of the pathologically positive and pathologically negative groups in the training set

In total, 13 variables, including age, glandular tissue type, menopausal status, age at menarche, family history of breast cancer, oral contraceptive use, lactation history, nipple discharge, skin abnormalities, palpable masses, breast pain/tenderness, ultrasound BI-RADS assessment, and DL system classification results, were included in the univariate analysis (the measured age data did not conform to a normal distribution and are expressed as Z values; the other factors are count data and are expressed as χ² values). The analysis revealed statistically significant differences between the pathologically positive and pathologically negative groups in terms of age, family history of breast cancer, nipple discharge, ultrasound BI-RADS assessment, and DL system classification results (P<0.05) (Table 2).

Table 2

Univariate analysis between pathologically positive and negative groups in the training set

Group	Pathologically positive (n=80)	Pathologically negative (n=244)	Z/χ² value	P value
Age (years)	48.08±10.53	43.77±9.72	3.02	0.003
Glandular type			0.00	>0.999
Non-dense	3	9
Dense	77	235
Menopausal status			0.74	0.391
Pre-menopausal	53	174
Post-menopausal	27	70
Age at first menstruation			1.37	0.242
≤12 years	14	58
>12 years	66	186
Family history of breast cancer			12.67	<0.001
No	59	219
Yes	21	25
Oral contraceptives			0.06	0.813
No	68	210
Yes	12	34
Breastfeeding history			0.01	0.944
No	42	127
Yes	38	117
Nipple discharge			35.04	<0.001
No	53	226
Yes	27	18
Skin abnormalities			0.00	>0.999
No	79	241
Yes	1	3
Palpable mass on examination			1.76	0.185
No	23	90
Yes	57	154
Breast pain/tenderness			0.08	0.783
No	73	225
Yes	7	19
Ultrasound BI-RADS assessment			30.68	<0.001
BI-RADS 2–3	20	148
BI-RADS 4–5	60	96
Deep learning system classification results			30.90	<0.001
Positive	64	108
Negative	16	136

Data are presented as mean ± standard deviation or number. BI-RADS, Breast Imaging Reporting and Data System.
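The χ² values for the categorical factors in Table 2 can be checked directly from the 2×2 counts. The sketch below is a minimal pure-Python version of the Pearson chi-squared statistic without continuity correction; for the three rows shown, the uncorrected statistic agrees with the reported values, although whether the original analysis used exactly this variant is an assumption. The function name is illustrative.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Counts from Table 2. Each call passes: positive/negative counts for the first
# listed level, then positive/negative counts for the second listed level.
nipple_discharge = pearson_chi2_2x2(53, 226, 27, 18)
family_history = pearson_chi2_2x2(59, 219, 21, 25)
ultrasound = pearson_chi2_2x2(20, 148, 60, 96)

for name, value in [("nipple discharge", nipple_discharge),
                    ("family history", family_history),
                    ("ultrasound BI-RADS", ultrasound)]:
    print(f"{name}: chi2 = {value:.2f}")
```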

Multivariate logistic regression analysis and establishment of a joint prediction model for the pathologically positive and pathologically negative groups in the training set

The variables that were determined to have statistically significant differences in the univariate analysis were included in the multivariate logistic regression analysis. In this study, age was included as a continuous variable without subgrouping, as the age of breast cancer onset varies internationally, and a specific critical value was not initially identified. The following four variables were included in the predictive model as independent predictive factors: age [odds ratio (OR): 1.06, 95% confidence interval (CI): 1.02–1.09, P=0.001], nipple discharge (OR: 7.57, 95% CI: 3.44–16.68, P<0.001), ultrasound BI-RADS assessment (OR: 5.77, 95% CI: 2.99–11.13, P<0.001), and DL system classification results (OR: 6.58, 95% CI: 3.25–13.33, P<0.001) (Table 3). The maximum Youden index for age was 0.22, which corresponded to an age of 42 years. The final predictive model for positive BI-RADS 4A lesions was logit (P) = −6.222 + 0.055 × age + 2.024 × nipple discharge + 1.752 × ultrasound + 1.884 × DL system classification results. Figure 2 shows a graph of the visualized model.
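To make the use of the fitted equation concrete, the following Python sketch converts the logit into a predicted probability. The coefficients are those reported above; the 0/1 coding (1 for nipple discharge present, ultrasound BI-RADS 4–5, and a positive DL result) is an assumption inferred from the groupings in Table 2, and the function and variable names are illustrative.

```python
import math

# Coefficients as reported in the text and in Table 3.
INTERCEPT = -6.222
BETA_AGE = 0.055
BETA_NIPPLE_DISCHARGE = 2.024
BETA_ULTRASOUND = 1.752
BETA_DL = 1.884


def predicted_probability(age, nipple_discharge, ultrasound_4_or_5, dl_positive):
    """Predicted probability of a pathologically positive BI-RADS 4A lesion.

    Binary inputs are assumed to be coded 1 (True) or 0 (False).
    """
    logit = (INTERCEPT
             + BETA_AGE * age
             + BETA_NIPPLE_DISCHARGE * int(nipple_discharge)
             + BETA_ULTRASOUND * int(ultrasound_4_or_5)
             + BETA_DL * int(dl_positive))
    return 1.0 / (1.0 + math.exp(-logit))


# Inputs resembling the two illustrated cases (Figures 6 and 7).
case_fig6 = predicted_probability(58, False, True, False)
case_fig7 = predicted_probability(46, True, False, True)
print(f"{case_fig6:.3f} {case_fig7:.3f}")
```

With these inputs the sketch gives probabilities close to the 21.51% and 55.11% read from the nomogram in Figures 6 and 7; the small differences are presumably due to rounding of the published coefficients.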

Table 3

Multi-factorial analysis between pathologically positive and negative groups in the training set

Factor	β	Wald χ²	OR (95% CI)	P value
Age	0.055	10.88	1.06 (1.02–1.09)	0.001
Nipple discharge	2.024	25.21	7.57 (3.44–16.68)	<0.001
Ultrasound BI-RADS assessment	1.752	27.23	5.77 (2.99–11.13)	<0.001
Deep learning system classification results	1.884	27.41	6.58 (3.25–13.33)	<0.001
Constant	–6.222	43.92	0.00	<0.001

OR, odds ratio; CI, confidence interval; BI-RADS, Breast Imaging Reporting and Data System.
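The odds ratios in Table 3 follow from the regression coefficients as OR = exp(β). A short Python check, using only the β values reported in the table (the dictionary keys are illustrative labels):

```python
import math

# Regression coefficients (beta) from Table 3.
betas = {
    "age": 0.055,
    "nipple_discharge": 2.024,
    "ultrasound_birads": 1.752,
    "dl_classification": 1.884,
}

# The odds ratio for each factor is the exponential of its coefficient.
odds_ratios = {name: round(math.exp(beta), 2) for name, beta in betas.items()}
print(odds_ratios)
```

For age, the OR applies per additional year, because age was entered as a continuous variable.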

Figure 2 Nomogram of the constructed predictive model for positive BI-RADS 4A lesions. BI-RADS, Breast Imaging Reporting and Data System.

Evaluation and validation of predictive model performance

The ROC curve analysis of the training set showed that the model for predicting breast lesions requiring biopsy had an AUC of 0.85 (95% CI: 0.80–0.88). At a cut-off value (T) of 0.22, the Youden index was maximized (0.56) with a sensitivity of 0.79, specificity of 0.78, positive predictive value of 0.53, and negative predictive value of 0.92. The model was applied to the internal and external validation sets, and the corresponding ROC curves were plotted (Figure 3). In the internal validation set, the AUC of the predictive model was 0.82 (95% CI: 0.74–0.88) with a cut-off value (T) of 1.64, maximum Youden index of 0.55, sensitivity of 0.65, specificity of 0.90, positive predictive value of 0.67, and negative predictive value of 0.89. In the external validation set, the AUC of the predictive model was 0.84 (95% CI: 0.76–0.90) with a cut-off value (T) of 1.33, maximum Youden index of 0.55, sensitivity of 0.72, specificity of 0.83, positive predictive value of 0.72, and negative predictive value of 0.83. There were no statistically significant differences between the training set and the internal and external validation sets in terms of the AUCs of the predictive model (P=0.54 and 0.84, respectively) (Table 4). In the training set and the internal and external validation sets, the AUC of the predictive model was higher than the AUCs of age, nipple discharge, ultrasound BI-RADS assessment, and DL system classification results alone, and the pairwise comparisons revealed statistically significant differences (DeLong test, all P<0.05) (Tables 5-7).
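The relationship between the reported metrics can be illustrated with a short Python sketch. It computes the Youden index (sensitivity + specificity − 1) and derives the predictive values from sensitivity, specificity, and prevalence using the standard Bayes relations. The function names are illustrative, and the inputs are the rounded external validation figures (46 positives among 123 patients), so the results are approximate.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# External validation set: sensitivity 0.72, specificity 0.83, 46 of 123 positive.
j_external = youden_index(0.72, 0.83)
ppv_external, npv_external = predictive_values(0.72, 0.83, 46 / 123)
print(f"J={j_external:.2f} PPV={ppv_external:.2f} NPV={npv_external:.2f}")
```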

Figure 3 Receiver operating characteristic curves of the predictive model for the training, internal, and external validation sets.

Table 4

Diagnostic performance of the training and internal/external validation sets for predicting BI-RADS 4A positive lesions

Set	SEN (95% CI)	SPE (95% CI)	PPV (95% CI)	NPV (95% CI)	AUC (95% CI)	Z value	P value
Training set	0.79 (0.68–0.87)	0.78 (0.72–0.83)	0.53 (0.47–0.60)	0.92 (0.88–0.95)	0.85 (0.80–0.88)
Internal validation set	0.65 (0.47–0.80)	0.90 (0.83–0.95)	0.67 (0.52–0.79)	0.89 (0.84–0.93)	0.82 (0.74–0.88)	0.608	0.543
External validation set	0.72 (0.57–0.84)	0.83 (0.73–0.91)	0.72 (0.60–0.87)	0.83 (0.75–0.88)	0.84 (0.76–0.90)	0.199	0.842

BI-RADS, Breast Imaging Reporting and Data System; SEN, sensitivity; CI, confidence interval; SPE, specificity; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve.

Table 5

Diagnostic efficacy of each independent predictor and predictive model using the training set for BI-RADS category 4A positive lesions

Factor	SEN (95% CI)	SPE (95% CI)	PPV (95% CI)	NPV (95% CI)	AUC (95% CI)	Z value	P value
Age	0.75 (0.64–0.84)	0.47 (0.41–0.54)	0.32 (0.28–0.36)	0.85 (0.79–0.90)	0.61 (0.54–0.68)	6.10	<0.001
Nipple discharge	0.34 (0.24–0.45)	0.93 (0.89–0.96)	0.60 (0.47–0.72)	0.81 (0.78–0.83)	0.63 (0.58–0.69)	7.56	<0.001
Ultrasound BI-RADS assessment	0.75 (0.64–0.84)	0.61 (0.54–0.67)	0.39 (0.34–0.43)	0.88 (0.83–0.92)	0.68 (0.62–0.74)	6.10	<0.001
Deep learning system classification results	0.80 (0.70–0.88)	0.56 (0.49–0.62)	0.37 (0.33–0.42)	0.90 (0.84–0.93)	0.68 (0.63–0.73)	6.30	<0.001
Predictive model	0.79 (0.68–0.87)	0.78 (0.72–0.83)	0.53 (0.47–0.60)	0.92 (0.88–0.95)	0.85 (0.80–0.90)

BI-RADS, Breast Imaging Reporting and Data System; SEN, sensitivity; CI, confidence interval; SPE, specificity; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve.

Table 6

Diagnostic efficacy of each independent predictor and predictive model using the internal validation set for BI-RADS category 4A positive lesions

Factor	SEN (95% CI)	SPE (95% CI)	PPV (95% CI)	NPV (95% CI)	AUC (95% CI)	Z value	P value
Age	0.88 (0.73–0.97)	0.39 (0.29–0.48)	0.31 (0.27–0.35)	0.89 (0.78–0.94)	0.61 (0.52–0.72)	3.75	<0.001
Nipple discharge	0.29 (0.15–0.48)	0.98 (0.94–1.00)	0.83 (0.54–0.96)	0.82 (0.78–0.85)	0.64 (0.56–0.72)	4.03	<0.001
Ultrasound BI-RADS assessment	0.71 (0.53–0.85)	0.63 (0.54–0.72)	0.38 (0.30–0.46)	0.87 (0.80–0.92)	0.67 (0.58–0.76)	3.37	<0.001
Deep learning system classification results	0.85 (0.69–0.95)	0.48 (0.38–0.58)	0.34 (0.29–0.39)	0.91 (0.82–0.96)	0.67 (0.59–0.74)	3.80	0.001
Predictive model	0.65 (0.47–0.80)	0.90 (0.83–0.95)	0.67 (0.52–0.78)	0.89 (0.84–0.93)	0.82 (0.73–0.90)

BI-RADS, Breast Imaging Reporting and Data System; SEN, sensitivity; CI, confidence interval; SPE, specificity; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve.

Table 7

Diagnostic efficacy of each independent predictor and predictive model of the external validation set for BI-RADS category 4A positive lesions

Factor	SEN (95% CI)	SPE (95% CI)	PPV (95% CI)	NPV (95% CI)	AUC (95% CI)	Z value	P value
Age	0.80 (0.66–0.91)	0.51 (0.39–0.63)	0.49 (0.43–0.56)	0.81 (0.70–0.89)	0.68 (0.59–0.77)	2.79	0.005
Nipple discharge	0.35 (0.21–0.50)	0.91 (0.82–0.96)	0.70 (0.50–0.84)	0.70 (0.65–0.75)	0.63 (0.54–0.71)	5.62	<0.001
Ultrasound BI-RADS assessment	0.85 (0.71–0.94)	0.65 (0.53–0.76)	0.59 (0.51–0.67)	0.88 (0.78–0.94)	0.75 (0.66–0.82)	2.35	0.019
Deep learning system classification results	0.65 (0.50–0.79)	0.66 (0.55–0.77)	0.54 (0.44–0.63)	0.76 (0.68–0.83)	0.66 (0.57–0.74)	5.34	<0.001
Predictive model	0.72 (0.57–0.84)	0.83 (0.73–0.91)	0.72 (0.60–0.81)	0.83 (0.75–0.88)	0.84 (0.76–0.90)

BI-RADS, Breast Imaging Reporting and Data System; SEN, sensitivity; CI, confidence interval; SPE, specificity; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve.

The calibration curve analysis (Figure 4) showed well-fitted bias-corrected curves of the predictive model for the training and validation sets, with the x-axis representing the predicted risk probability and the y-axis representing the actual risk probability. The Hosmer-Lemeshow test showed that the predictive model had a good fit in the training set (P=0.906), internal validation set (P=0.890), and external validation set (P=0.769), and that the predicted probability of malignancy risk did not deviate significantly from the actual probability of malignancy risk. The DCA results (Figure 5) demonstrated that the predictive model curves for the training set and the internal and external validation sets were superior to the two extreme lines. In the training, internal validation, and external validation sets, net benefits were obtained when the risk thresholds were greater than 0.05 and 0.15, and less than 0.70, respectively, facilitating clinical decision making. Figures 6 and 7 illustrate how the nomogram integrates the features of actual cases into a single prediction.
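For readers unfamiliar with DCA, the net benefit at a given risk threshold is conventionally defined as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability; the "treat all" extreme line uses the prevalence in place of the model's true- and false-positive rates, and the "treat none" line is zero. The Python sketch below implements these standard definitions. The counts are hypothetical and are not taken from this study, and the function names are illustrative.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit of a model at a given risk threshold (standard DCA definition)."""
    weight = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * weight


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of biopsying every patient at a given risk threshold."""
    weight = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * weight


# Hypothetical example: 100 patients, 40 truly positive; at a threshold of 0.20
# the model flags 30 true positives and 10 false positives.
model_nb = net_benefit(true_pos=30, false_pos=10, n=100, threshold=0.20)
treat_all_nb = net_benefit_treat_all(prevalence=0.40, threshold=0.20)
treat_none_nb = 0.0
print(f"model={model_nb:.3f} treat_all={treat_all_nb:.3f} treat_none={treat_none_nb:.3f}")
```

A model is considered useful at a given threshold when its net benefit exceeds both extreme strategies, as in this hypothetical example.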

Figure 4 Calibration curves of the predictive model for the (A) training, (B) internal, and (C) external validation sets.

Figure 5 Decision curve analysis of the predictive model for the training and internal and external validation sets.

Figure 6 A 58-year-old female patient with no nipple discharge. The deep learning system identified a mass (A, green rectangle) and assessed it as likely benign (ultrasound assessment BI-RADS 4B). Every risk factor corresponds to “Points” (red arrows), and “Total Points” corresponds to the “Predicted Probability of BI-RADS 4A Pathologically-Positive”, resulting in a predictive model probability of 21.51% (B, red arrows). Pathology confirmed fibroadenoma. BI-RADS, Breast Imaging Reporting and Data System.

Figure 7 A 46-year-old female patient with left nipple discharge. The deep learning system identified calcifications (A, red circles) and assessed them as likely malignant (ultrasound assessment BI-RADS 1), while two additional calcifications were assessed as likely benign (A, green circles). The final result of the deep learning system was largely based on the lesion with a higher possibility of malignancy. Each risk factor corresponds to “Points” (red arrows), and “Total Points” corresponds to the “Predicted Probability of BI-RADS 4A Pathologically-Positive”, resulting in a predictive model probability of 55.11% (B, red arrows). Pathology confirmed ductal carcinoma in situ. BI-RADS, Breast Imaging Reporting and Data System.

Discussion

This study introduced novel approaches that redefine pathological categorizations and leverage DL technology, which may contribute to the management of BI-RADS 4A lesions. Our findings, based on the ACR 5th edition BI-RADS guidelines, provide a more nuanced view of the probability of malignant lesions. A distinctive feature of this study lies in its comprehensive re-evaluation of pathological positives, incorporating intraductal papillomas, atypical ductal hyperplasia, and borderline lobular tumors. By expanding the scope of malignancy considerations, this study revealed a previously underestimated positive rate of 27.12% in the BI-RADS 4A category. This recalibration of pathological classifications represents a significant departure from conventional practices, and further refines the understanding of lesion characteristics.

Given that pathological diagnoses confirm that most BI-RADS 4A lesions are benign, the need for biopsies for this category of lesions requires re-evaluation (16). The study’s novel model identified malignancies and excelled at risk stratification, downstaging a substantial percentage of patients in the internal (88.07%) and external (90.91%) validation sets. This achievement underscores the model’s robust performance and capacity to optimize patient management by minimizing unnecessary biopsies and associated healthcare costs. Integrating DL technology into decision making represents a paradigm shift, offering a more precise and efficient approach to BI-RADS 4A lesion assessment.

A vital element of the model’s innovation lies in its integration of the following four independent factors: age, nipple discharge, ultrasound BI-RADS assessment, and DL system classification results (17). The prominence of age as a crucial predictor aligns with broader trends in breast cancer risk assessment and highlights the model’s ability to capture clinically relevant information. Chhatwal et al. (18) confirmed that the risk of breast cancer is positively correlated with age, but that a family history of breast cancer and long-term contraceptive use are not significant predictive factors for breast cancer. Raza et al. (19) also showed that age is a crucial clinical factor in predicting breast cancer, suggesting that even in cases with radiologically benign features, older patients should undergo a biopsy. Similarly, this study found that neither a family history of breast cancer nor oral contraceptive use was an independent predictive factor of breast cancer, whereas age was an independent predictor. The age of onset of breast cancer in China ranges from 40 to 50 years, and is approximately 10 years earlier than that in European and American women (20). In this study, the threshold age was 42 years, which is in line with the age of onset of breast cancer in China, and positive patients aged ≥42 years accounted for 76.88% of all positive patients. The deliberate exclusion of less impactful factors, such as a family history of breast cancer and contraceptive use, further refined the model’s focus on high-impact variables, enhancing its clinical utility.

Nipple discharge, when considered as an independent factor of breast cancer, was specific but exhibited lower sensitivity. Caution should be exercised in treating BI-RADS 4A patients presenting with nipple discharge, particularly if intraductal lesions are considered. Physiological nipple discharge does not require diagnostic imaging, while pathological nipple discharge typically requires mammography and ultrasound examinations for comprehensive evaluation (21). Pathological nipple discharge is associated with intraductal papillomas, ductal dilation, breast cancer, and infections. Among them, intraductal papillomas are the most common cause of pathological nipple discharge. Up to 57% of pathological nipple discharge cases are linked to intraductal papillomas, and 5–12% of breast cancers initially present with nipple discharge (22). The incorporation of this feature in our model enhances its ability to effectively stratify patients.

Ultrasound and mammography each have advantages in the diagnosis of breast lesions. In clinical practice, imaging physicians typically integrate patients’ clinical complaints, physical examinations, relevant medical history, ultrasound BI-RADS assessment, and mammography imaging features to comprehensively evaluate breast lesions. Breast magnetic resonance imaging (MRI) is more sensitive to breast lesions than mammography, but has a longer examination time and higher cost, and is not recommended for all BI-RADS 4A patients (16,23). Yang et al. (23) retrospectively analyzed the general clinical data, ultrasound imaging features, and mammography imaging features (including BI-RADS assessments and suspicious signs) of 418 patients assessed as BI-RADS 4A using ultrasound, and used logistic regression to construct a model to predict the malignancy of BI-RADS 4A lesions. The consistency index (C-index) values of the model’s training set and validation set were 0.81 and 0.77, respectively. Thus, Yang et al.’s model could help physicians assess BI-RADS 4A lesions more accurately and reduce unnecessary diagnosis and treatment (23). In the present study, ultrasound assessment, mammography, and DL classification all contributed to the comprehensive evaluation of breast lesions. This multi-faceted approach, supported by the model’s high AUC values (0.85, 0.82, and 0.84 in the training, internal, and external validation sets, respectively), facilitates objective and accurate BI-RADS 4A assessments. The model’s biopsy or short-term follow-up recommendations align with the study’s overarching goal of personalized and stratified patient management. Despite these advancements, the study had a number of limitations, including its retrospective design and potential selection bias in the BI-RADS 4A classification. To enhance the diagnostic efficiency of this model, prospective validation and the incorporation of breast MRI and contrast-enhanced mammography imaging features should be considered in the future (24,25).
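The C-index and AUC values compared above measure the same thing for a binary outcome: the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name and toy data are assumptions introduced here, not part of either study's analysis.

```python
from typing import List


def auc_by_pairs(probs: List[float], outcomes: List[int]) -> float:
    """AUC (C-index for a binary outcome) by counting concordant pairs.

    Each positive/negative pair scores 1 if the positive case has the higher
    predicted risk, 0.5 if the two risks are tied, and 0 otherwise.
    """
    positives = [p for p, y in zip(probs, outcomes) if y == 1]
    negatives = [p for p, y in zip(probs, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    score = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                score += 1.0
            elif pos == neg:
                score += 0.5
    return score / (len(positives) * len(negatives))


# Toy data (made up for illustration only, not from the study).
toy_auc = auc_by_pairs([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of positive and negative cases.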

This study pioneered a comprehensive BI-RADS 4A lesion management approach that leverages DL and clinical data integration. This innovative predictive model provides a framework for guiding personalized patient care, potentially minimizing unnecessary interventions, and addressing the limitations inherent in the existing biopsy practices for this category of lesions.

Conclusions

This study introduced a novel model for managing BI-RADS 4A lesions to address the challenges that arise in breast cancer diagnosis. The model, which integrates age, nipple discharge, ultrasound assessment, and DL results, could assist radiologists in objectively assessing BI-RADS category 4A lesions. This approach could enhance diagnostic accuracy and reduce the number of unnecessary biopsies. The results of this study are promising; however, future studies should undertake prospective validation and seek to integrate breast MRI features into the model. This research represents a step toward personalized and efficient BI-RADS 4A lesion management, advancing diagnostic accuracy and patient care.

Acknowledgments

Funding: This study was supported by the Shenzhen Science and Technology Research Fund (No. GJHZ20220913142613025).

Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-1075/rc

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1075/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki (revised in 2013). This study was approved by the Medical Ethics Committee of Shenzhen People’s Hospital (approval No. LL-KY-2021624). The other hospital was informed of and agreed with the study. The requirement of individual consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Niu S, Huang J, Li J, Liu X, Wang D, Zhang R, Wang Y, Shen H, Qi M, Xiao Y, Guan M, Liu H, Li D, Liu F, Wang X, Xiong Y, Gao S, Wang X, Zhu J. Application of ultrasound artificial intelligence in the differential diagnosis between benign and malignant breast lesions of BI-RADS 4A. BMC Cancer 2020;20:959.
2. Manohar S, Dantuma M. Current and future trends in photoacoustic breast imaging. Photoacoustics 2019;16:100134.
3. D'Orsi C, Morris E, Mendelson E. ACR BI-RADS Atlas, Breast Imaging Reporting and Data System. Reston, VA: American College of Radiology; 2013.
4. Forrai G, Kovács E, Ambrózay É, Barta M, Borbély K, Lengyel Z, Ormándi K, Péntek Z, Tünde T, Sebő É. Use of Diagnostic Imaging Modalities in Modern Screening, Diagnostics and Management of Breast Tumours 1st Central-Eastern European Professional Consensus Statement on Breast Cancer. Pathol Oncol Res 2022;28:1610382.
5. Liu G, Zhang MK, He Y, Liu Y, Li XR, Wang ZL. BI-RADS 4 breast lesions: could multi-mode ultrasound be helpful for their diagnosis? Gland Surg 2019;8:258-70.
6. Elezaby M, Li G, Bhargavan-Chatfield M, Burnside ES, DeMartini WB. ACR BI-RADS Assessment Category 4 Subdivisions in Diagnostic Mammography: Utilization and Outcomes in the National Mammography Database. Radiology 2018;287:416-22.
7. Guo H, Sun Y. The Clinical Pathological Application of Breast Fine Needle Aspiration Biopsy. Chinese Journal of Pathology 2004;33:277-9.
8. Hai L, Feng Y, Zhao J, Tang Q, Wang X, Cao X, Xiao C. An Improved Nomogram to Reduce False-Positive Biopsy Rates of Breast Imaging Reporting and Data System Ultrasonography Category 4A Lesions. Cancer Control 2022;29:10732748221122703.
9. Shen Y, Shamout FE, Oliver JR, Witowski J, Kannan K, Park J, et al. Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams. Nat Commun 2021;12:5645.
10. Cai Y, Zhu C, Chen Q, Zhao F, Guo S. Application of a second opinion ultrasound in Breast Imaging Reporting and Data System 4A cases: can immediate biopsy be avoided? J Int Med Res 2021;49:3000605211024452.
11. Shen L, Jiang T, Tang P, Ge H, You C, Peng W. Comprehensive quantitative malignant risk prediction of pure grouped amorphous calcifications: clinico-mammographic nomogram. Quant Imaging Med Surg 2022;12:2672-83.
12. Zhao Z, Hou S, Li S, Sheng D, Liu Q, Chang C, Chen J, Li J. Application of Deep Learning to Reduce the Rate of Malignancy Among BI-RADS 4A Breast Lesions Based on Ultrasonography. Ultrasound Med Biol 2022;48:2267-75.
13. Wong DJ, Gandomkar Z, Wu WJ, Zhang G, Gao W, He X, Wang Y, Reed W. Artificial intelligence and convolution neural networks assessing mammographic images: a narrative literature review. J Med Radiat Sci 2020;67:134-42.
14. Breast group of Chinese Society of Radiology Chinese Medical Association. Expert consensus on the construction and quality control of mammography datasets. Chinese Journal of Radiology 2022;56:959-66.
15. Rubio IT, Wyld L, Marotti L, Athanasiou A, Regitnig P, Catanuto G, Schoones JW, Zambon M, Camps J, Santini D, Dietz J, Sardanelli F, Varga Z, Smidt M, Sharma N, Shaaban AM, Gilbert F. European guidelines for the diagnosis, treatment and follow-up of breast lesions with uncertain malignant potential (B3 lesions) developed jointly by EUSOMA, EUSOBI, ESP (BWG) and ESSO. Eur J Surg Oncol 2024;50:107292.
16. Xie Y, Zhu Y, Chai W, Zong S, Xu S, Zhan W, Zhang X. Downgrade BI-RADS 4A Patients Using Nomogram Based on Breast Magnetic Resonance Imaging, Ultrasound, and Mammography. Front Oncol 2022;12:807402.
17. Sigrist RMS, Liau J, Kaffas AE, Chammas MC, Willmann JK. Ultrasound Elastography: Review of Techniques and Clinical Applications. Theranostics 2017;7:1303-29.
18. Chhatwal J, Alagoz O, Lindstrom MJ, Kahn CE Jr, Shaffer KA, Burnside ES. A logistic regression model based on the national mammography database format to aid breast cancer diagnosis. AJR Am J Roentgenol 2009;192:1117-27.
19. Raza S, Goldkamp AL, Chikarmane SA, Birdwell RL. US of breast masses categorized as BI-RADS 3, 4, and 5: pictorial review of factors influencing clinical management. Radiographics 2010;30:1199-213.
20. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49.
21. Gupta D, Mendelson EB, Karst I. Nipple Discharge: Current Clinical and Imaging Evaluation. AJR Am J Roentgenol 2021;216:330-9.
22. Patel BK, Falcon S, Drukteinis J. Management of nipple discharge and the associated imaging findings. Am J Med 2015;128:353-60.
23. Yang Y, Hu Y, Shen S, Jiang X, Gu R, Wang H, Liu F, Mei J, Liang J, Jia H, Liu Q, Gong C. A new nomogram for predicting the malignant diagnosis of Breast Imaging Reporting and Data System (BI-RADS) ultrasonography category 4A lesions in women with dense breast tissue in the diagnostic setting. Quant Imaging Med Surg 2021;11:3005-17.
24. Liu Y, Wang S, Qu J, Tang R, Wang C, Xiao F, Pang P, Sun Z, Xu M, Li J. High-temporal resolution DCE-MRI improves assessment of intra- and peri-breast lesions categorized as BI-RADS 4. BMC Med Imaging 2023;23:58.
25. Wang S, Wang Z, Li R, You C, Mao N, Jiang T, Wang Z, Xie H, Gu Y. Association between quantitative and qualitative image features of contrast-enhanced mammography and molecular subtypes of breast cancer. Quant Imaging Med Surg 2022;12:1270-80.

Cite this article as: Ouyang R, Liao T, Yang Y, Lin X, Zhou X, Ma J. Novel study on the prediction of BI-RADS 4A positive lesions in mammography using deep learning technology and clinical factors. Quant Imaging Med Surg 2024;14(12):8864-8877. doi: 10.21037/qims-24-1075
