Random forest with preoperative core biopsy categories: a novel method for refining ultrasonic Breast Imaging Reporting and Data System evaluation

Junhui Shen; Jieyi Huang; Xiaolu Ye; Lina Wu; Jiexin Wang; Fengjuan Chen; Xiaoya Zhou; Kebing Liu; Chunwang Huang; Ting Liang

doi:10.21037/qims-24-2070

Original Article

Junhui Shen¹#, Jieyi Huang²#, Xiaolu Ye², Lina Wu², Jiexin Wang³, Fengjuan Chen², Xiaoya Zhou², Kebing Liu², Chunwang Huang⁴, Ting Liang²

¹Department of Rehabilitation Medicine, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China; ²Department of Ultrasound, The First Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangdong Clinical Research Academy of Chinese Medicine, Guangzhou University of Chinese Medicine, Guangzhou, China; ³Department of Ultrasound, Affiliated Hospital of Guangdong Medical University, Zhanjiang, China; ⁴Department of Ultrasound, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China

Contributions: (I) Conception and design: T Liang, J Shen; (II) Administrative support: C Huang; (III) Provision of study materials or patients: T Liang, C Huang; (IV) Collection and assembly of data: J Huang, X Ye, L Wu, F Chen, X Zhou; (V) Data analysis and interpretation: J Shen, J Huang, J Wang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Ting Liang, MD. Department of Ultrasound, The First Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangdong Clinical Research Academy of Chinese Medicine, Guangzhou University of Chinese Medicine, No. 16 Jichang Road, Guangzhou 510405, China. Email: lt831102@foxmail.com; Chunwang Huang, MD. Department of Ultrasound, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, No. 106 Zhongshan Er Road, Guangzhou 510080, China. Email: huangchunwang@126.com.

Background: Many benign breast lesions are classified as Breast Imaging Reporting and Data System (BI-RADS) category 4, resulting in unneeded biopsies. We thus aimed to build a model based on a core needle biopsy category (CBC) to improve upon BI-RADS classification by analyzing clinical and ultrasonic features.

Methods: A retrospective study was conducted in which female patients with solid breast tumors who underwent ultrasound-guided core needle biopsy (CNB) were enrolled. Participants were randomly allocated to either a training or validation cohort at a 7:3 ratio. We developed CBC prediction models using five machine learning algorithms: support vector machine, random forest (RF), multilayer perceptron (MLP), logistic regression (LR), and k-nearest neighbors (KNNs). The optimal model was selected based on the highest area under the curve (AUC) value and subsequently applied to adjust the BI-RADS categories. The category of BI-RADS was downgraded by one if the CBC prediction was B1 or B2 or upgraded by one if the CBC prediction was B3 or B5. The number and rate of missed or accurate up- or downgrading were calculated.

Results: A total of 1,082 female patients were included comprising 1,185 lesions. The optimal model was RF [AUC =0.943, 95% confidence interval (CI): 0.930–0.956]. In 42 BI-RADS category 3 lesions, 4 (9.5%) cases were upgraded, 3 of which were correct, while 38 (90.5%) cases were downgraded, 37 of which were correct. In 167 BI-RADS category 4A lesions, 149 (89.2%) cases were downgraded, 145 of which were correct, while 18 cases (10.8%) were upgraded, 13 of which were correct.

Conclusions: The predictive model of CBC built by RF can aid in adjusting BI-RADS category 3 and 4A and thus help prevent unnecessary biopsy.

Keywords: Ultrasound; core biopsy categories; breast solid tumor; random forest (RF); Breast Imaging Reporting and Data System (BI-RADS)

Submitted Sep 25, 2024. Accepted for publication Apr 07, 2025. Published online May 27, 2025.

Introduction

Breast lesions are highly prevalent worldwide and are histologically categorized into benign, malignant, and borderline types. These lesions can be managed through various approaches, including imaging follow-up, biopsy, or surgical intervention (1,2).

Imaging modalities can serve as a preliminary method for assessing the histological characteristics of breast lesions, including initial determination of their benign or malignant status. For benign lesions, continued follow-up with persistent monitoring of their progression is recommended through the use of techniques such as magnetic resonance imaging (MRI), mammography, and ultrasonography. However, MRI is frequently associated with a high false-positive rate in identifying malignant tumors and involves substantial cost (3); meanwhile, mammography is limited in its ability to detect tumors within dense breast tissue (4). On the other hand, ultrasound is distinguished by being a nonradioactive, cost-effective, and readily accessible diagnostic tool. Given that the majority of women in Asian countries have dense breast tissue, ultrasound screening has emerged as the diagnostic modality of choice in these regions (5).

The ultrasound lexicon of the American College of Radiology Breast Imaging Reporting and Data System (ACR BI-RADS) is extensively applied to estimate the likelihood of malignancy. However, based on our clinical experience, lesions classified as BI-RADS category 3 and 4A continue to represent a critical but challenging diagnostic group. Chae et al. demonstrated that lesions classified as category 3 have a low malignancy rate (6), while in a study by Barr et al., a mere 0.1% of lesions exhibited suspicious malignant changes during a 6-month follow-up period (7). In clinical practice, many benign lesions are classified as category 4A due to the diagnostic uncertainty experienced by physicians, leading to a high number of unnecessary biopsies. If these lesions could be accurately classified as BI-RADS category 3, patients could potentially forego a biopsy. Attempts have been made to enhance the diagnostic accuracy of BI-RADS. For example, Weng et al. used contrast-enhanced ultrasound (8), and Zhao et al. employed strain elastography (9). Nonetheless, there remains a scarcity of appropriate diagnostic techniques to complement BI-RADS, particularly for lesions categorized as 3 and 4A.

Percutaneous imaging-guided core needle biopsy (CNB) can provide a definitive pathological classification for breast tumors, which may differ from that of the BI-RADS category. CNB categories range from B1 to B5 (10), offering valuable insights into the characteristics of breast lesions. These categories enable clinicians to make precise decisions related to clinical management. Leveraging these CNB categories, we developed a novel approach to enhance the accuracy of the BI-RADS classification system.

Machine learning has demonstrated considerable potential in the management of malignant tumors, particularly in the imaging assessment of breast cancer (11-14). By automatically analyzing and extracting patterns from existing data, machine learning can use these patterns to predict outcomes for unknown data. Its ability to perform classification tasks with high precision has been widely recognized across various medical fields, including radiology, critical care medicine, and cardiology (13,15-18). For instance, Panourgias et al. employed an MRI-based inductive decision tree to classify B3 lesions within BI-RADS 4 and 5 categories, achieving high accuracy (88.7%) and an excellent area under the receiver operating characteristic (ROC) curve (AUC =0.992) in the training set. However, the model’s performance appeared to decline in the test set (AUC =0.5), likely due to the limited sample size (19). In another study, Bahl et al. developed a mammogram-based random forest (RF) model to predict the risk of pathologic upgrade of high-risk breast lesions (B3 lesions) to cancer. Their model demonstrated the ability to reduce the number of unnecessary surgeries by nearly one-third (20). These studies highlight the effectiveness of machine learning in analyzing B3 lesions from different perspectives and attest to its robust performance (19,20). However, despite these advancements, the application of machine learning to the classification of core needle biopsy category (CBC) based on imaging remains underexplored. To our knowledge, few ultrasound studies have used machine learning to investigate the relationship between CBC and BI-RADS or to refine BI-RADS classification via CBC. This presents a promising opportunity for future research to leverage machine learning in improving the precision of breast lesion classification and clinical decision-making.

Our study aimed to leverage machine learning to predict CBC by assessing ultrasound and clinical characteristics, with the ultimate goal of refining the BI-RADS classification, particularly for category 3 and 4A lesions. This approach may be able to reduce the number of unnecessary biopsies by improving the accuracy of lesion characterization. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2070/rc).

Methods

Participants

This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. Ethical approval for this study was granted by the institutional review board of Guangdong Provincial People’s Hospital (No. KY2023-1069-01), who waived the requirement for informed consent due to the retrospective nature of the analysis. Histological characteristics of breast nodules were extracted from pathology reports. We included 1,082 consecutive female patients aged 12–96 years (mean age 42.22±13.37 years) who attended Guangdong Provincial People’s Hospital between March 1 and December 31, 2019. A total of 1,185 nodules (815 benign and 370 malignant) satisfied the inclusion criteria, with all ultrasound images archived in the medical system.

The inclusion criteria for patients were as follows: (I) nodules clinically suspicious for breast cancer and recommended for biopsy; (II) nodules classified as B1, B2, B3, or B5 based on CNB results; and (III) nodules categorized as BI-RADS 3 or higher. Meanwhile, the exclusion criteria were (I) lesions identified as metastatic tumors and (II) patients who had undergone systemic hormone therapy or adjuvant chemotherapy.

The workflow of the study is illustrated in Figure 1.

Figure 1 Workflow of the study procedure. AUC, area under the curve; BI-RADS, Breast Imaging Reporting and Data System; CBC, core needle biopsy category; RPF, thickness ratio of breast parenchyma to mammary fat; RPT, thickness ratio of breast parenchyma to tissue before pectoralis fascia; TBP, anteroposterior thickness of breast parenchyma.

Clinical features and ultrasonic image acquisition

The clinical features examined in this study included height, weight, body mass index (BMI), and age. We used a 14-MHz linear transducer (Toshiba Aplio 500, Tokyo, Japan) to capture ultrasonic images. Images of the nodules were acquired in a standard manner and contained at least two orthogonal planes (radial and antiradial or transverse and longitudinal). According to the ACR BI-RADS fifth edition classification criteria and a previous study (21), all images were analyzed retrospectively by two breast radiologists (Reader 1 with 10 years of experience and Reader 2 with 5 years of experience). The radiologists maintained strict records of 13 ultrasonic features: shape, orientation, margin, echogenic pattern, posterior features, calcifications, vascularity distribution, vascularity grade, tumor size, BI-RADS category, anteroposterior thickness of the breast parenchyma (TBP), anteroposterior thickness ratio of breast parenchyma to mammary fat (RPF), and anteroposterior thickness ratio of breast parenchyma to tissue before pectoralis fascia (RPT). RPF and RPT were adjusted forms of TBP, obtained by dividing TBP by the thickness of mammary fat and by the thickness of tissue before the pectoralis fascia, respectively (Figure 2). Detailed feature descriptions are provided in Appendix 1 (22). Both Reader 1 and Reader 2 were blinded to histological results but had access to patient age. We assessed the inter- and intraobserver agreement for all 13 sonographic features. In cases of discrepancy between readers, final determinations were reached through consensus discussion.

Figure 2 Calculation method of RPT and RPF. RPT = a/c. RPF = a/b. (a) The red line is the thickness of the breast parenchyma. (b) The blue line is the thickness of mammary fat. (c) The yellow line is the thickness of tissue before the pectoralis fascia. RPT, thickness ratio of breast parenchyma to tissue before pectoralis fascia; RPF, thickness ratio of breast parenchyma to mammary fat.
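As a brief illustration of the two ratios defined in Figure 2, the following Python sketch computes RPT = a/c and RPF = a/b from three thickness measurements. The function name, argument names, and the example values are hypothetical and are not taken from the study data.

```python
def thickness_ratios(parenchyma_mm, fat_mm, prepectoral_mm):
    """Return (RPT, RPF) from the three thicknesses labelled a, b, c in Figure 2.

    parenchyma_mm  -- a: anteroposterior thickness of breast parenchyma (TBP)
    fat_mm         -- b: thickness of mammary fat
    prepectoral_mm -- c: thickness of tissue before the pectoralis fascia
    """
    if fat_mm <= 0 or prepectoral_mm <= 0:
        raise ValueError("denominator thicknesses must be positive")
    rpt = parenchyma_mm / prepectoral_mm  # RPT = a / c
    rpf = parenchyma_mm / fat_mm          # RPF = a / b
    return rpt, rpf


# Hypothetical measurements (mm), chosen only to make the arithmetic easy to follow.
rpt, rpf = thickness_ratios(parenchyma_mm=8.0, fat_mm=4.0, prepectoral_mm=16.0)
print(rpt, rpf)
```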

Core biopsy reporting categories

To facilitate the analysis of the clinical and ultrasound characteristics of the lesions, all lesions were categorized into four groups based on histological examination (10). (I) The B1 group included normal tissue. (II) The B2 group included fibroadenoma, fibrocystic changes, sclerosing adenosis, duct ectasia, and other nonparenchymal lesions such as abscesses and fat necrosis. (III) The B3 group included lesions with uncertain malignant potential. These lesions may exhibit benign histology on core biopsy but are known to display heterogeneity or carry an increased risk of associated malignancy. This category encompassed atypical intraductal epithelial proliferation, flat epithelial atypia, lobular neoplasia, phyllodes tumors, papillary lesions, radial scars, mucocele-like lesions, and other rare lesions. Due to their uncertain malignant potential, B3 lesions are recommended for expanded excision during surgeries. (IV) Finally, the B5 group included malignant nodules.

It should be noted that the B4 classification includes suspicious nodules. Typically, this classification is applied when errors occur in pathological section preparation, such as crushed deformation or poorly fixed core tissue samples, leading to suspicion of cancerous tissue within the sample. In such cases, remaking the pathological sections is necessary to confirm the tumor characteristics. Consequently, B4 lesions were excluded from this study.

Statistical analysis

Statistical analyses were conducted with SPSS version 22.0 (IBM Corp., Armonk, NY, USA). A two-sided significance threshold of P<0.05 was applied. Continuous variables were compared using the least significant difference (LSD) test, whereas categorical variables were assessed with the Bonferroni correction. The best subset method was used to select the optimal predictive features for model development.

Machine learning in characteristics analysis

SPSS Modeler 18.0 software (IBM Corp.) was used to implement the machine learning workflow, which directly predicted the probability of CBC for each nodule. The procedure consisted of the following steps. First, the optimal features were selected using the best subset method. Second, during implementation, the dataset was randomly split into training and validation cohorts using the partition node at a ratio of 7:3 (training: validation). Subsequently, the balance node was applied to address class imbalance issues. Finally, given the variety of available machine learning algorithms, we employed several widely used models to perform the classification task. These models included RF, support vector machine (SVM), k-nearest neighbor (KNN), multilayer perceptron (MLP), and logistic regression (LR).
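The partition and balance steps described above can be sketched in plain Python. This is only an illustration under stated assumptions: the study used the partition and balance nodes of SPSS Modeler, whose internal behavior is not reproduced here; the sketch substitutes a seeded shuffle for the 7:3 split and simple random oversampling of the training cohort for class balancing. The function names and the seed value are hypothetical.

```python
import random
from collections import Counter, defaultdict


def split_7_3(records, seed=0):
    """Shuffle with a fixed seed and split into training (70%) and validation (30%)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding issues
    return shuffled[:cut], shuffled[cut:]


def oversample(records, label_of, seed=0):
    """Duplicate randomly chosen minority-class records until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)
    target = max(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced.extend(items)
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced


# Toy cohort that carries only the CBC label, using the class counts reported in this study.
counts = {"B1": 44, "B2": 714, "B3": 57, "B5": 370}
cohort = [{"cbc": label} for label, n in counts.items() for _ in range(n)]

train, valid = split_7_3(cohort, seed=42)
train_balanced = oversample(train, label_of=lambda r: r["cbc"], seed=42)
print(len(train), len(valid), Counter(r["cbc"] for r in train_balanced))
```

In this sketch only the training cohort is balanced, so the validation cohort keeps the original class distribution.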

Performance of the machine learning models

The diagnostic performance of each algorithm was evaluated using ROC curve analysis, with the AUC calculated for comparison. The algorithm demonstrating the highest AUC was selected. We then applied this optimal algorithm to predict the CBC of each nodule.
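To make the selection criterion concrete, the sketch below computes a binary AUC as the proportion of positive-negative pairs that a model ranks correctly (the Mann-Whitney formulation) and then picks the model with the highest value. It is an assumption-laden illustration: the text does not state how the AUC was derived for the four-class CBC outcome, so the example treats a single one-vs-rest comparison, and the model names and scores are invented.

```python
def auc(scores, labels):
    """Binary AUC: fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_model(model_scores, labels):
    """Return the name of the model whose scores give the highest AUC."""
    return max(model_scores, key=lambda name: auc(model_scores[name], labels))


# Hypothetical validation labels (1 = target class, 0 = all other classes) and model scores.
labels = [1, 1, 0, 0, 1, 0]
model_scores = {
    "model_a": [0.9, 0.8, 0.3, 0.4, 0.7, 0.2],
    "model_b": [0.6, 0.4, 0.5, 0.3, 0.7, 0.2],
}
print({name: auc(s, labels) for name, s in model_scores.items()})
print(best_model(model_scores, labels))
```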

Up- and downgrading in BI-RADS

If the CBC prediction was B1 or B2, the BI-RADS category was downgraded by one level. Conversely, if the CBC prediction was B3 or B5, the BI-RADS category was upgraded by one level. Lesions classified as BI-RADS 4 or 5 were recommended for biopsy. Specifically, for BI-RADS 4B or 4C lesions, regardless of whether they were upgraded or downgraded, biopsy remained necessary. Therefore, the focus of this study was on BI-RADS-US category 3 and 4A lesions, and we calculated the number and rate of accurate or missed upgrades and downgrades.
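The adjustment rule above can be written as a short function. The sketch assumes an ordered scale of 2, 3, 4A, 4B, 4C, and 5, so that one level up from category 3 is 4A and one level down from 4A is 3; the clamping at both ends of the scale and the function names are assumptions added for illustration.

```python
BIRADS_ORDER = ["2", "3", "4A", "4B", "4C", "5"]  # assumed ordering of categories


def adjust_birads(birads, predicted_cbc):
    """Downgrade by one level for predicted B1/B2, upgrade by one level for B3/B5."""
    index = BIRADS_ORDER.index(birads)
    if predicted_cbc in ("B1", "B2"):
        step = -1
    elif predicted_cbc in ("B3", "B5"):
        step = 1
    else:
        raise ValueError(f"unexpected CBC prediction: {predicted_cbc}")
    # Clamp so that the result stays on the scale (an assumption, not stated in the text).
    new_index = min(max(index + step, 0), len(BIRADS_ORDER) - 1)
    return BIRADS_ORDER[new_index]


def biopsy_recommended(birads):
    """Categories 4 and 5 are recommended for biopsy."""
    return birads.startswith("4") or birads == "5"


for birads, cbc in [("3", "B2"), ("3", "B5"), ("4A", "B1"), ("4A", "B3")]:
    adjusted = adjust_birads(birads, cbc)
    print(birads, cbc, "->", adjusted, "biopsy:", biopsy_recommended(adjusted))
```

Under this rule a category 4A lesion with a predicted B1 or B2 moves to category 3 and leaves the biopsy group, whereas a 4B or 4C lesion stays within categories 4 and 5 whichever way it moves.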

Results

Clinical and ultrasonic characteristics

A total of 1,185 lesions were examined in this study, and the distribution of BI-RADS classifications was as follows: 42 were category 3 (3.5%), 167 were category 4A (14%), 399 were category 4B (34%), 296 were category 4C (25%), and 281 were category 5 (24%). Meanwhile, the distribution of CBC was as follows: 44 were category B1 (4%), 714 were category B2 (60%), 57 were category B3 (5%), and 370 were category B5 (31%). The baseline ultrasonic and clinical features are summarized in Table 1. Significant differences (P<0.05) were observed in 15 features, including age, height, weight, BMI, echo pattern, shape, margin, orientation, posterior features, calcification, vascularity distribution, vascularity grade, BI-RADS category, tumor size, and TBP. However, no significant differences were found for RPT or RPF (P>0.05).

Table 1

Baseline clinical and ultrasonic characteristics

Feature	BI-RADS					P
Feature	3 (n=42)	4A (n=167)	4B (n=399)	4C (n=296)	5 (n=281)	P
Age (years)	36.95±13.0	37.66±11.54	38.41±11.87	43.47±13.54	49.81±12.7	<0.001
Height (cm)	158.76±4.95	158.68±4.79	159.18±4.76	158.14±5.12	157.9±5.04	0.009
Weight (kg)	53.86±7.39	53.68±7.27	54.33±7.78	55.94±8.17	57.46±8.36	<0.001
BMI (kg/m²)	21.39±3.03	21.35±2.97	21.45±2.96	22.36±3.19	23.04±8.36	<0.001
Echo pattern						<0.001
Hyperechoic	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)
Complex cystic and solid	3 (7.1)	1 (0.6)	3 (0.8)	1 (0.3)	0 (0)
Hypoechoic	34 (81.0)	109 (65.3)	163 (40.9)	107 (36.1)	117 (41.6)
Isoechoic	5 (11.9)	8 (4.8)	3 (0.8)	1 (0.3)	11 (3.9)
Heterogeneous	0 (0)	49 (29.3)	230 (57.6)	187 (63.2)	153 (54.4)
Shape						<0.001
Oval	27 (64.3)	120 (71.9)	69 (17.3)	8 (2.7)	5 (1.8)
Round	1 (2.4)	1 (0.6)	8 (2.0)	1 (0.3)	3 (1.1)
Irregular	14 (33.3)	46 (27.5)	322 (80.7)	287 (97.0)	273 (97.2)
Margin						<0.001
Circumscribed	29 (69.0)	128 (76.6)	85 (21.3)	13 (4.4)	6 (2.1)
Indistinct	4 (9.5)	19 (11.4)	103 (25.8)	44 (14.9)	23 (8.2)
Angular	9 (21.4)	16 (9.6)	203 (50.9)	208 (70.3)	125 (44.5)
Microlobulated	0 (0)	4 (2.4)	8 (2.0)	31 (10.5)	127 (45.2)
Orientation						<0.001
Parallel	38 (90.5)	165 (98.8)	364 (91.2)	241 (81.4)	191 (68.0)
Not parallel	4 (9.5)	2 (1.2)	35 (8.8)	55 (18.6)	90 (32.0)
Posterior feature						<0.001
No posterior feature	25 (59.5)	38 (22.8)	33 (8.3)	19 (6.4)	14 (5.0)
Enhancement sound	8 (19.0)	31 (18.6)	41 (10.3)	20 (6.8)	8 (2.8)
Shadowing	6 (14.3)	14 (8.4)	102 (25.6)	71 (24.0)	54 (19.2)
Combined pattern	3 (7.1)	84 (50.3)	223 (55.9)	186 (62.8)	205 (73.0)
Calcification						<0.001
In a mass	2 (4.8)	4 (2.4)	22 (5.5)	61 (20.6)	112 (39.9)
Outside of a mass	0 (0)	1 (0.6)	0 (0)	0 (0)	2 (0.7)
Intraductal calcification	0 (0)	0 (0)	0 (0)	2 (0.7)	0 (0)
None	40 (95.2)	162 (97.0)	377 (94.5)	233 (78.7)	167 (59.4)
Vascularity distribution						<0.001
Absent	22 (52.4)	78 (46.7)	156 (39.1)	74 (25.1)	20 (7.1)
Vessels in rim	4 (9.5)	33 (19.8)	64 (16.0)	48 (16.2)	35 (12.5)
Internal	16 (38.1)	56 (33.5)	179 (44.9)	174 (58.7)	226 (80.4)
Vascularity grade						<0.001
Grade I	22 (52.4)	78 (46.7)	155 (38.8)	74 (25.0)	21 (7.5)
Grade II	10 (23.8)	58 (34.7)	149 (37.3)	109 (36.8)	88 (31.3)
Grade III	3 (7.1)	26 (15.6)	61 (15.3)	77 (26.0)	116 (41.3)
Grade IV	7 (16.7)	5 (3.0)	34 (8.5)	36 (12.2)	56 (19.9)
Tumor size (mm)	14.08±7.16	14.59±5.87	17.29±8.71	20.91±10.84	25.90±12.0	<0.001
TBP (mm)	8.10±2.97	7.71±2.73	8.05±3.32	8.48±3.74	9.20±3.83	<0.001
RPT	0.52±0.12	0.51±0.12	0.52±0.14	0.51±0.15	0.49±0.23	0.453
RPF	1.73±1.01	1.79±1.55	1.93±1.70	1.84±1.86	1.60±1.82	0.207
CBC						<0.001
B1	2 (4.8)	15 (9.0)	15 (3.8)	11 (3.7)	1 (0.4)
B2	36 (85.7)	135 (80.8)	335 (84.0)	166 (56.1)	42 (14.9)
B3	3 (7.1)	10 (6.0)	12 (3.0)	23 (7.8)	9 (3.2)
B5	1 (2.4)	7 (4.2)	37 (9.3)	96 (32.4)	229 (81.5)

Data are presented as mean ± standard deviation or number (%). BI-RADS, Breast Imaging Reporting and Data System; BMI, body mass index; CBC, core needle biopsy category; RPF, thickness ratio of breast parenchyma to mammary fat; RPT, thickness ratio of breast parenchyma to tissue before pectoralis fascia; TBP, anteroposterior thickness of breast parenchyma.

Selection of features, construction, and performance of models

Following the best subset analysis, 10 features were selected for model construction, including age, BMI, shape, weight, orientation, margin, tumor size, BI-RADS category, vascularity distribution, and vascularity grade. No statistically significant differences were observed in the clinical characteristics or ultrasound features between the training and validation cohorts (Table S1).

These features were used to build models through five machine learning algorithms. The AUC values for each model are presented in Table 2. According to the AUC results, RF was the optimal algorithm, achieving the highest AUC of 0.943 [95% confidence interval (CI): 0.930–0.956]. The AUC values for the remaining algorithms were as follows: MLP, 0.916 (95% CI: 0.898–0.934); LR, 0.472 (95% CI: 0.435–0.509); KNN, 0.828 (95% CI: 0.802–0.854); and SVM, 0.909 (95% CI: 0.891–0.928).

Table 2

The AUCs of the five machine learning algorithms

Algorithms	AUC (95% CI)
MLP	0.916 (0.898–0.934)
RF	0.943 (0.930–0.956)
LR	0.472 (0.435–0.509)
KNN	0.828 (0.802–0.854)
SVM	0.909 (0.891–0.928)

AUC, area under the curve; CI, confidence interval; KNN, k-nearest neighbor; LR, logistic regression; MLP, multilayer perceptron; RF, random forest; SVM, support vector machine.

The feature importance ranking derived from RF model is reported in Figure 3. Based on the length of the bars in the bar chart in Figure 3, the most important feature was age, followed in descending order by tumor size, BMI, weight, BI-RADS, margin, vascularity distribution, vascularity grade, shape, and orientation.

Figure 3 Feature importance. Based on the length of the bars in the bar chart, the most important feature is age, followed in descending order by tumor size, BMI, weight, BI-RADS, margin, vascularity distribution, vascularity grade, shape, and orientation. BI-RADS, Breast Imaging Reporting and Data System; BMI, body mass index.

Probability of disease in BI-RADS category 3 and 4A lesions

The CBC prediction was calculated for each lesion, and these predictions were used to adjust (upgrade or downgrade) the BI-RADS categories. The number and rate of accurate and missed upgrades or downgrades are summarized in Table 3.

Table 3

BI-RADS upgrading and downgrading

Upgrade or downgrade	BI-RADS category 3 (n=42)	BI-RADS category 4A (n=167)
Upgrade one category	4	18
Downgrade one category	38	149
Missed upgrade	1 (2.4)	5 (3.0)
Accurate upgrade	3 (7.1)	13 (7.8)
Missed downgrade	1 (2.4)	4 (2.4)
Accurate downgrade	37 (88.1)	145 (86.8)

The data are presented as number or number (%). BI-RADS, Breast Imaging Reporting and Data System.

Among the 42 BI-RADS category 3 lesions, 4 (9.5%) were upgraded, with 3 being accurate, while 38 cases (90.5%) were downgraded, with 37 being accurate. The accurate upgrade rate (7.1%) was higher than the missed upgrade rate (2.4%), and the accurate downgrade rate (88.1%) was significantly higher than the missed downgrade rate (2.4%) (Figure 4).

Figure 4 Two cases of misdowngrading. (A,B) Invasive breast cancer in a 48-year-old patient. Ultrasound features: BI-RADS category 3, hypoechoic, oval shape, parallel orientation, circumscribed margin, combined posterior features, no calcification, internal vascularity, and grade Ⅲ vascularity. After regrading, the BI-RADS category 3 was misdowngraded to BI-RADS category 2. (C,D) Invasive breast cancer in a 64-year-old patient. Ultrasound features: BI-RADS category 4A, hypoechoic, irregular shape, parallel orientation, microlobulated margin, combined posterior features, no calcification, vessels in rim, and grade Ⅱ vascularity. After regrading, the BI-RADS category 4A was misdowngraded to BI-RADS category 3. The green boxes in (B) and (D) represent the color flow sampling frames in color Doppler ultrasound, which define the regions for acquiring blood flow signals. BI-RADS, Breast Imaging Reporting and Data System.

For the 167 BI-RADS category 4A lesions, 149 (89.2%) cases were downgraded, with 145 being accurate, and 18 (10.8%) cases were upgraded, with 13 being accurate. The accurate upgrade rate (7.8%) was higher than the missed upgrade rate (3.0%), and the accurate downgrade rate (86.8%) was significantly higher than the missed downgrade rate (2.4%) (Figure 4).
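The percentages reported for Table 3 follow directly from the counts, as the short check below shows. The dictionary keys and function names are illustrative; the counts are those given in Table 3, and the combined misgrading rate is the sum of missed upgrades and missed downgrades divided by the number of lesions in the category.

```python
table3 = {
    "3":  {"n": 42,  "up": 4,  "up_ok": 3,  "down": 38,  "down_ok": 37},
    "4A": {"n": 167, "up": 18, "up_ok": 13, "down": 149, "down_ok": 145},
}


def pct(k, n):
    """Percentage of k out of n, rounded to one decimal place."""
    return round(100 * k / n, 1)


def summarize(row):
    """Recompute the rates for one BI-RADS category from its counts."""
    missed = (row["up"] - row["up_ok"]) + (row["down"] - row["down_ok"])
    return {
        "upgraded_pct": pct(row["up"], row["n"]),
        "downgraded_pct": pct(row["down"], row["n"]),
        "accurate_up_pct": pct(row["up_ok"], row["n"]),
        "accurate_down_pct": pct(row["down_ok"], row["n"]),
        "misgraded_pct": pct(missed, row["n"]),
    }


for category, row in table3.items():
    print(category, summarize(row))
```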

Discussion

Our study identified RF as the optimal machine learning algorithm for predicting CBC. We used the RF model to enhance the diagnostic performance of BI-RADS categories 3 and 4A, thereby reducing unnecessary biopsies for patients with category 4A lesions, as demonstrated in Table 3. First, for BI-RADS categories 3 and 4A, the misgrading rates were low (4.8% and 5.4%, respectively), with most of the cases being accurately graded (95.2% and 94.6%, respectively). The accurate upgrade and downgrade rates were higher than the missed rates. Meucci et al. analyzed the distribution of MRI features in CBC lesions but only included 61 cases (23). Meanwhile, Giuliani et al. analyzed multiple clinical and sonographic characteristics in 102 B3 lesions; however, their study did not comprehensively evaluate the association between B3 lesions and individual BI-RADS lexicon features (24). Our previous study also identified RF as the optimal machine learning algorithm for predicting CBC; however, we did not integrate RF into the practical diagnostic workflow to refine BI-RADS categorization (25). In the present study, we confirmed RF as the optimal algorithm using a larger sample size (1,185 nodules), rendering our findings more robust and practical as compared to those reported in previous studies (23-25). Furthermore, among the 167 BI-RADS category 4A cases, 149 (89.2%) were downgraded, with 145 being correctly downgraded. These results suggest that our approach can avoid a significant number of unnecessary biopsies.

Although Wang et al. and Wei et al. used computer-aided diagnosis (CAD) systems to improve the performance of BI-RADS (26,27), their studies had several limitations. Their AUC values were lower than ours (0.91 and 0.906 vs. 0.943, respectively), and they only classified breast masses into malignant and benign categories, failing to address the issue of unnecessary biopsies for some nonmalignant masses (e.g., atypical lesions). Additionally, Wang et al.’s study had a small sample size, comprising only 54 malignant and 162 benign lesions (26). Meanwhile, Wei et al. used BI-RADS categories 4A and 4B as the cutoff for their CAD software (27); however, BI-RADS categorization is highly subjective and should not be used as the sole basis for setting diagnostic thresholds. In contrast, our study included a larger sample size (1,185 nodules) and refined the BI-RADS classification based on the objective standard of CBC, which is grounded in breast histological types. Consequently, our study can be considered more objective and to have greater clinical applicability.

Both the benign-or-malignant classification system and the CBC system are based on histological classification. However, the CBC system provides more precise guidance for the clinical management of breast lesions following biopsy, making it superior to the benign-or-malignant system. In practice, the clinical goal of both CBC and BI-RADS is to serve as the foundation for breast mass management. However, BI-RADS classification is subject to variability due to its reliance on human experience. Therefore, we believe that CBC is better suited for refining and improving the BI-RADS system.

Machine learning is highly advantageous for constructing predictive models and plays a pivotal role in radiological research (13,28). Among the various types of machine learning algorithms, the five employed in our study are widely used. Although MLP, KNN, and SVM performed worse than RF, their performance was satisfactory, with AUC values exceeding 0.8. In contrast, LR had poor performance (AUC =0.472). LR, as a linear classifier, has several limitations: (I) it is unsuitable for solving nonlinear problems; (II) it is sensitive to multicollinearity in data; (III) it struggles to handle imbalanced datasets; and (IV) its accuracy is limited due to its simplistic structure, making it difficult to capture the true distribution of the data. Consequently, LR failed to achieve the classification objectives in our study. Both our previous and current studies identified RF as the optimal method for predicting CBC (25). In this study, we successfully integrated RF into the practical diagnostic workflow, achieving excellent performance. RF operates through ensemble learning, aggregating predictions from multiple decision trees and determining the final output category via majority voting among individual tree outputs. This ensemble approach grants RF a significant advantage in classification tasks.
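The majority-voting step described above can be illustrated in a few lines. This is a minimal sketch of the aggregation only, not of tree construction, and the per-tree predictions are invented; RF implementations may instead average class probabilities across trees, and the behavior of SPSS Modeler in this respect is not asserted here.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees.

    Ties are resolved in favor of the class that appears first in the input.
    """
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical CBC predictions from five decision trees for a single nodule.
votes = ["B2", "B5", "B2", "B3", "B2"]
print(majority_vote(votes))
```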

Our study demonstrated strong reproducibility for several reasons. First, the sample size was substantially large, enhancing the reliability of our findings. Second, the results were obtained using SPSS Modeler software, which ensures robustness due to its fixed seed number for randomization.

Three cases (7.1%) of BI-RADS category 3 lesions were accurately upgraded, representing a prevention of misdiagnoses that would require biopsy. However, 4 cases (2.4%) of BI-RADS category 4A lesions were incorrectly downgraded. This discrepancy may be attributed to the highly ambiguous features in these cases, such as younger age, regular shape, absence of calcification, and other confounding factors. Consequently, the RF model requires further refinement so that its performance in reclassifying BI-RADS category 4A lesions can be improved.

Our study involved several limitations that should be addressed. First, it was conducted at a single center, and the sample size for BI-RADS category 3 was relatively small (n=42). The generalizability of our RF model requires validation in larger, multicenter cohorts. Second, because of the inherent limitations of retrospective data, other clinical risk factors (e.g., serological markers, family history, and menopausal status) were not included. Future prospective studies with comprehensive datasets are recommended to address this deficiency. Third, the assessment of histological features relied on subjective analysis, potentially introducing bias. To mitigate this, the incorporation of objective parameters, such as ultrasound radiomics, should be explored to enhance the model’s accuracy. Finally, although RF demonstrated the potential to improve upon the BI-RADS classification, further research is needed to integrate it into electronic ultrasound systems for practical clinical application.

Conclusions

Based on a comprehensive array of clinical and ultrasonographic characteristics, machine learning algorithms were employed to assess CBC in solid breast lesions. Among the evaluated models, RF demonstrated superior performance in predicting CBC, achieving the highest AUC. Subsequently, this predictive model was effectively used to refine the BI-RADS category classification for 3 and 4A lesions. Our findings indicate that the model-assisted approach can significantly enhance grading accuracy, reduce the number of unnecessary biopsy procedures, and minimize the misdiagnosis of malignant tumors.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-2070/rc

Funding: This work was supported by the National Center for Inheritance and Innovation of Traditional Chinese Medicine Research Special Project (No. 2022QN18).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2070/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. Ethical approval for this study was granted by the institutional review board of Guangdong Provincial People’s Hospital (No. KY2023-1069-01), which waived the requirement for informed consent because of the retrospective nature of the analysis.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

Cite this article as: Shen J, Huang J, Ye X, Wu L, Wang J, Chen F, Zhou X, Liu K, Huang C, Liang T. Random forest with preoperative core biopsy categories: a novel method for refining ultrasonic Breast Imaging Reporting and Data System evaluation. Quant Imaging Med Surg 2025;15(6):5362-5372. doi: 10.21037/qims-24-2070
