Mammogram-based AI risk assessment in patients with dense breasts undergoing supplemental molecular breast imaging
Original Article

Samuel B. Ogunlade1, Lian Wang2, Santo Maimone2, Kristin A. Robinson2, Kaitlin M. Moran3, Amie Leon2, Andrey P. Morozov2, Chidi T. Nwachukwu4, Haley P. Letter2

1Division of Interventional Radiology, Department of Radiology, Mayo Clinic, Jacksonville, FL, USA; 2Department of Radiology, Mayo Clinic, Jacksonville, FL, USA; 3Department of Internal Medicine, Mayo Clinic, Jacksonville, FL, USA; 4Department of Radiology, Mayo Clinic, Rochester, MN, USA

Contributions: (I) Conception and design: HP Letter, SB Ogunlade, KA Robinson, S Maimone; (II) Administrative support: S Maimone, HP Letter, KM Moran; (III) Provision of study materials or patients: HP Letter, KM Moran, A Leon, AP Morozov, L Wang, CT Nwachukwu; (IV) Collection and assembly of data: All authors; (V) Data analysis and interpretation: SB Ogunlade, HP Letter; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Haley P. Letter, MD. Department of Radiology, Mayo Clinic, 4500 San Pablo Rd S, Jacksonville, FL 32224, USA. Email: Letter.haley@mayo.edu.

Background: Image-based artificial intelligence (AI) risk models can estimate short-term breast cancer risk directly from mammograms and may outperform traditional questionnaire-based tools. However, risk stratification remains particularly challenging in women with dense breasts who do not otherwise meet high-risk criteria. At our institutions, molecular breast imaging (MBI) is used as supplemental screening for this population. This study evaluated the performance and clinical utility of a mammography-based AI risk model (iCAD ProFound AI® Risk) in predicting short-term breast cancer risk among women with dense breasts undergoing MBI.

Methods: This retrospective IRB-approved study included 416 non-actionable (BI-RADS category 1 or 2) screening digital breast tomosynthesis mammograms (BI-RADS C–D density) obtained from 2018 to 2023, all followed by MBI within one year. The cohort comprised 70 cancer cases (16.8%) and 346 (83.2%) non-cancer controls. Mammograms were retrospectively processed using the ProFound AI® Risk model to generate 1-year risk and density scores. Tyrer-Cuzick and Gail model scores were computed for comparison. Group differences were assessed using t-tests and effect sizes, and model discrimination was evaluated with ROC analysis using area under the curve (AUC), sensitivity, specificity, and 95% confidence intervals (CIs).

Results: Across the full cohort, mean AI risk scores were higher in cancer cases than controls (0.41±0.35 vs. 0.37±0.21), although this difference was not statistically significant (P=0.239; Cohen’s d=0.23). Subgroup analyses demonstrated progressively stronger discriminatory performance with increasing breast density. The greatest separation was observed in women with extremely dense breasts (category D), where the AI model achieved an AUC of 0.75 (95% CI: 0.61–0.89; P=0.049), with 69.3% sensitivity and 61.1% specificity at a threshold of 0.14. The AI model’s effect size was also the largest of all density categories in this group (d=0.41). In contrast, traditional models showed limited and non-significant discrimination across all density categories, with AUC values ranging from 0.54 to 0.63. When stratified by cancer subtype, the AI model produced significantly higher risk scores in invasive lobular carcinoma (ILC) compared with controls (0.69±0.46 vs. 0.41±0.32; P=0.048; d=0.56). Although differences in ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC) were not significant, risk scores trended higher for cancer cases. A similar pattern of increasing AI-estimated risk was observed with higher tumor grade, with the strongest separation seen in grade 2 cancers (P=0.089).

Conclusions: Although overall differences between cancer and non-cancer groups were not statistically significant, the mammography-based AI risk model demonstrated meaningful and statistically significant discrimination in women with extremely dense breasts, outperforming both Tyrer-Cuzick and Gail models. The AI model also showed better separation in ILC and in higher-grade tumors. These findings support the role of image-based AI tools in refining risk assessment in women for whom mammography is least effective and in guiding more targeted use of supplemental MBI screening.

Keywords: Artificial intelligence (AI); breast cancer risk; dense breasts; molecular breast imaging (MBI); supplemental screening


Submitted Jul 29, 2025. Accepted for publication Dec 17, 2025. Published online Jan 23, 2026.

doi: 10.21037/qims-2025-1650


Introduction

Breast cancer remains a leading cause of morbidity and mortality among women worldwide, with an estimated 2.3 million new cases and 685,000 deaths globally in 2020 alone (1). Early detection is central to reducing mortality, and mammography has long served as the cornerstone of population-level breast cancer screening. However, the sensitivity of mammography is substantially reduced in women with dense breast tissue, because dense tissue can obscure lesions; breast density is also an independent risk factor for breast cancer (2-4). Dense breast tissue, classified by the Breast Imaging Reporting and Data System (BI-RADS) as heterogeneously or extremely dense, is present in approximately 43.3% of women aged 40 to 74 years undergoing screening mammography, corresponding to an estimated 27.6 million women in the United States; its prevalence is inversely associated with age and body mass index (BMI) (5).

To address the limitations of mammography in women with dense breasts, supplemental imaging modalities such as breast ultrasound, magnetic resonance imaging (MRI), and molecular breast imaging (MBI) have been utilized. While these modalities can improve cancer detection rates, they also present challenges, including higher false-positive rates, increased healthcare costs, and patient anxiety (6,7). As a result, refining selection criteria for supplemental imaging remains a clinical priority. Triaging women with dense breast tissue without additional high-risk features, such as family history or genetic mutations, continues to pose a significant clinical challenge.

MBI is available at our institution as a supplemental screening option for women with dense breasts who do not otherwise meet high-risk criteria. In this context, artificial intelligence (AI)-based risk models could help refine patient selection for MBI, identifying those who may benefit most from supplemental screening based on intrinsic image-derived risk signatures rather than questionnaire data alone. Current risk stratification models, such as the Gail and Tyrer-Cuzick (TC) models, incorporate demographic and clinical risk factors, and TC version 8 also incorporates breast density. However, these tools often underperform in diverse clinical populations and may not fully capture imaging-based risk indicators (8,9). Image-based AI risk models offer a different approach, analyzing mammograms directly to assess short-term breast cancer risk. These models have demonstrated superior performance compared with traditional questionnaire-based models, particularly in predicting interval and near-term cancers in high-risk patients (10,11).

In this context, the iCAD ProFound AI® Risk model, a deep convolutional neural network (CNN) trained on over 13,000 screening mammograms from multi-site international cohorts, provides a 1-year absolute breast cancer risk estimate directly from mammographic data. Importantly, none of the patient data from the three participating centers in this study was included in the AI model’s training or validation datasets; thus, the model’s performance here represents independent validation.

This study aims to determine whether the AI model can enhance risk stratification and inform more targeted supplemental screening in clinical practice. If effective, such tools could minimize the overuse of imaging, reduce patient burden, and improve early detection, thereby supporting a precision medicine approach to breast cancer screening. We present this article in accordance with the STARD reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1650/rc).


Methods

Study design and setting

This retrospective observational study was approved by the Institutional Review Board of Mayo Clinic (No. 23-007303) and was conducted in accordance with the Declaration of Helsinki and its subsequent amendments across three affiliated academic breast imaging centers of Mayo Clinic between January 2018 and December 2023. The study aimed to evaluate the performance of a mammogram-based AI risk model in predicting short-term breast cancer risk in intermediate-risk women with dense breasts undergoing supplemental screening with MBI. The requirement for informed consent was waived due to the retrospective nature of the data collection and analysis. The study was conducted in compliance with the Health Insurance Portability and Accountability Act.

Study population

A cohort of 416 women was retrospectively reviewed. All patients underwent screening digital breast tomosynthesis (DBT) mammography between 2018 and 2023. Eligible participants had a screening mammogram with a benign or negative result and underwent a supplemental screening MBI study performed either on the same day as or within one year of the index mammogram. All included women had heterogeneously dense or extremely dense breast tissue, as classified by the interpreting radiologist using the American College of Radiology Breast Imaging Reporting and Data System (ACR BI-RADS) qualitative density categories (12,13). Women in the cancer group were diagnosed with breast cancer within 2 years of the benign or negative mammogram. Additionally, participants were classified as intermediate risk based on a TC lifetime risk score <20%.

Exclusion criteria included a personal history of breast cancer at the time of the index screening or classification as high risk by TC score >20%. This ensured that the study population remained focused on evaluating AI model performance specifically within a non-high-risk, dense breast cohort.

From the cohort of 416 women, 70 (16.8%) were diagnosed with histologically confirmed breast cancer within 2 years of the index mammogram. The remaining 346 women (83.2%) served as cancer-free controls, having no diagnosis of breast cancer within at least 2 years of follow-up.
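The eligibility and outcome rules above can be expressed as a small filtering and labeling step. The sketch below is illustrative only: the `ScreeningRecord` fields, the helper names `is_eligible` and `outcome_label`, and the use of 365 and 730 days for "one year" and "2 years" are assumptions, not the study's actual data pipeline. It treats a TC lifetime risk of 20% or more as excluded so that it matches the <20% inclusion criterion.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical record layout for illustration; not the study's data schema.
@dataclass
class ScreeningRecord:
    patient_id: str
    mammogram_date: date                  # index DBT screening exam
    birads_assessment: int                # BI-RADS assessment category of the index exam
    radiologist_density: str              # radiologist-assigned density letter "A".."D"
    mbi_date: Optional[date]              # supplemental MBI exam, if any
    tc_lifetime_risk: float               # Tyrer-Cuzick lifetime risk, in percent
    prior_breast_cancer: bool
    cancer_diagnosis_date: Optional[date]
    last_followup_date: date


MBI_WINDOW_DAYS = 365       # assumption: "within one year" taken as 365 days
OUTCOME_WINDOW_DAYS = 730   # assumption: "within 2 years" taken as 730 days


def is_eligible(r: ScreeningRecord) -> bool:
    """Inclusion/exclusion rules paraphrased from the Study population text."""
    if r.prior_breast_cancer:
        return False
    if r.tc_lifetime_risk >= 20.0:          # inclusion required TC lifetime risk < 20%
        return False
    if r.birads_assessment not in (1, 2):   # benign or negative index mammogram
        return False
    if r.radiologist_density not in ("C", "D"):
        return False
    if r.mbi_date is None:
        return False
    gap = (r.mbi_date - r.mammogram_date).days
    return 0 <= gap <= MBI_WINDOW_DAYS      # same day or within one year


def outcome_label(r: ScreeningRecord) -> Optional[int]:
    """Return 1 for a case, 0 for a control, None if follow-up is too short to classify."""
    if r.cancer_diagnosis_date is not None:
        days = (r.cancer_diagnosis_date - r.mammogram_date).days
        if 0 <= days <= OUTCOME_WINDOW_DAYS:
            return 1
    followed_days = (r.last_followup_date - r.mammogram_date).days
    return 0 if followed_days >= OUTCOME_WINDOW_DAYS else None
```

Returning `None` for women with less than 2 years of cancer-free follow-up keeps them out of the control group rather than silently counting them as negatives.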

Imaging protocols

DBT

All patients underwent bilateral screening DBT using full-field digital mammography (FFDM) systems (Hologic Inc., Marlborough, MA, USA). Images were interpreted by board-certified breast radiologists using the ACR BI-RADS lexicon. Only those mammograms initially classified as BI-RADS 1 or 2 were included.

MBI

MBI was performed using dual-head gamma camera systems following intravenous administration of 300 MBq (approximately 8 mCi) of Tc-99m sestamibi. Standard two-view imaging [craniocaudal (CC) and mediolateral oblique (MLO)] of each breast was acquired with mild compression. All MBI examinations were interpreted independently by experienced breast imaging radiologists, with AI risk scores unavailable at the time of interpretation.

At our institution, MBI is offered specifically as a supplemental screening modality for women with dense breasts and no additional high-risk features. This institutional policy motivated the present investigation to determine whether image-based AI risk estimation could optimize the selection of women who would most benefit from MBI, thereby improving cost-effectiveness and reducing unnecessary exposure.

AI risk model

The mammograms from all patients in this study were retrospectively processed using a commercially available, image-based breast cancer risk assessment algorithm—the iCAD ProFound AI® Risk model (iCAD Inc., Nashua, NH, USA). This model, developed in collaboration with researchers at the Karolinska Institute (Sweden), is an individualized, image-driven AI system capable of generating short-term breast-cancer risk estimates directly from FFDM and DBT.

The ProFound AI® Risk model is built upon a multi-stage deep CNN framework. In its first stage, convolutional layers automatically extract and encode high-dimensional image features from screening mammograms, including tissue density, parenchymal texture, structural asymmetry, micro-calcifications, and architectural distortions. These low-level image features are progressively abstracted through multiple convolutional blocks, pooling, and activation layers. The resulting feature maps are then aggregated across both the CC and MLO views of each breast, allowing the model to form a comprehensive spatial context representation.
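As a generic illustration of the convolution, activation, pooling, and two-view aggregation steps described above, the toy sketch below uses plain Python lists. It is not the ProFound AI® Risk architecture, which is proprietary: the kernels, the 2×2 max pooling, the global averaging, and the averaging of CC and MLO features are assumptions chosen for readability, whereas a real model learns its kernels and stacks many such blocks.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation (no padding), the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap: Matrix) -> Matrix:
    """Element-wise activation: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap: Matrix) -> Matrix:
    """2x2 max pooling with stride 2; an odd trailing row/column is dropped.
    Assumes the feature map is at least 2x2."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


def global_average(fmap: Matrix) -> float:
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)


def view_features(image: Matrix, kernels: List[Matrix]) -> List[float]:
    """One conv -> ReLU -> pool block per kernel, summarized to one number per feature map."""
    return [global_average(max_pool2x2(relu(conv2d(image, k)))) for k in kernels]


def breast_representation(cc: Matrix, mlo: Matrix, kernels: List[Matrix]) -> List[float]:
    """Aggregate the CC and MLO views of one breast by averaging per-kernel features."""
    f_cc, f_mlo = view_features(cc, kernels), view_features(mlo, kernels)
    return [(a + b) / 2 for a, b in zip(f_cc, f_mlo)]
```

A uniform patch produces a strong response from a summing kernel and none from an edge-detecting kernel, which is the sense in which different learned kernels encode different texture or structure features.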

A subsequent feature-fusion layer combines the learned bilateral representations and feeds them into a set of fully connected (dense) layers, which integrate image-derived features with demographic variables such as age, race, and geographic region. The network’s final activation layer produces a continuous 1-year absolute risk probability, which is mapped to four interpretive categories (low, general, moderate, and high). This output is accompanied by an AI-predicted breast density category: least dense (a), dense (b), more dense (c), and most dense (d).
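The fusion, fully connected, and output stages can be sketched in the same spirit. In this hypothetical example, left- and right-breast feature vectors are concatenated with numerically encoded demographic variables, passed through one ReLU hidden layer and a sigmoid output, and the resulting probability is mapped to a category. The function names, the single hidden layer, the concatenation-based fusion, and the `CATEGORY_CUTOFFS` values are all assumptions; the article does not report the vendor's layer sizes or category boundaries.

```python
import math
from typing import Dict, List, Sequence


def dense(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: y_j = sum_i W[j][i] * x_i + b_j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a logit to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical cut-points on the 1-year risk probability (0.0015 = 0.15%).
CATEGORY_CUTOFFS = [(0.0015, "low"), (0.003, "general"), (0.006, "moderate")]


def risk_category(p: float) -> str:
    for cutoff, label in CATEGORY_CUTOFFS:
        if p < cutoff:
            return label
    return "high"


def predict_one_year_risk(left: Sequence[float], right: Sequence[float],
                          demographics: Sequence[float], params: Dict) -> float:
    fused = list(left) + list(right) + list(demographics)   # feature fusion by concatenation
    hidden = [max(0.0, h) for h in dense(fused, params["W1"], params["b1"])]  # ReLU layer
    logit = dense(hidden, params["W2"], params["b2"])[0]    # single output unit
    return sigmoid(logit)
```

The article reports AI scores as percentages, so a probability of 0.0124 from such a function would correspond to the 1.24% shown in Figure 1.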

The ProFound AI® Risk model was trained and validated on large, multi-institutional datasets that are entirely independent of the patient population used in the present study. The FFDM network was trained on 974 biopsy-proven cancer cases and 9,376 non-cancer controls, while the DBT network used 563 cancers and 3,609 controls, collected from multiple vendors and imaging centers to ensure robustness across acquisition protocols (14). The mammograms analyzed in this study were not included in any stage of model training or validation, ensuring a true external validation design.

During training, supervised learning was performed using binary cross-entropy loss, with ground-truth cancer outcomes as labels. The CNN weights were optimized via back-propagation with stochastic gradient descent and momentum, and early stopping was used to prevent overfitting. Model evaluation on held-out test sets yielded an area under the curve (AUC) of 0.73 for FFDM and 0.80 for DBT, outperforming traditional risk-assessment tools such as TC and Gail models.
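The training recipe named above (binary cross-entropy loss, stochastic gradient descent with momentum, and early stopping on held-out data) can be illustrated on a much smaller model. This sketch substitutes a single-layer logistic model for the CNN so that the gradient stays explicit; the learning rate, momentum, patience, seed, and function names are assumptions, not the vendor's training settings.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]   # (feature vector, 0/1 cancer label)


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one example, clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mean_bce(w: Sequence[float], b: float, data: List[Example]) -> float:
    return sum(bce(predict(w, b, x), y) for x, y in data) / len(data)


def train(train_set: List[Example], val_set: List[Example], lr: float = 0.1,
          momentum: float = 0.9, max_epochs: int = 200, patience: int = 10,
          seed: int = 0) -> Tuple[List[float], float, float]:
    rng = random.Random(seed)
    n = len(train_set[0][0])
    w, b = [0.0] * n, 0.0
    vw, vb = [0.0] * n, 0.0                     # momentum (velocity) buffers
    best_loss, best_w, best_b = math.inf, list(w), b
    stale = 0
    order = list(range(len(train_set)))
    for _ in range(max_epochs):
        rng.shuffle(order)                      # stochastic: one example per update
        for i in order:
            x, y = train_set[i]
            grad_z = predict(w, b, x) - y       # dBCE/dlogit for a sigmoid output
            vw = [momentum * v - lr * grad_z * xi for v, xi in zip(vw, x)]
            vb = momentum * vb - lr * grad_z
            w = [wi + v for wi, v in zip(w, vw)]
            b += vb
        val_loss = mean_bce(w, b, val_set)
        if val_loss < best_loss:                # keep the best checkpoint
            best_loss, best_w, best_b = val_loss, list(w), b
            stale = 0
        else:
            stale += 1
            if stale >= patience:               # early stopping
                break
    return best_w, best_b, best_loss
```

Returning the best validation checkpoint rather than the last weights is what makes early stopping act as a guard against overfitting.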

In this study, only index (baseline) screening mammograms interpreted as benign or negative (BI-RADS 1–2) were processed by the AI model. These exams were obtained prior to the subsequent cancer diagnosis, ensuring that the algorithm evaluated pre-diagnostic images rather than mammograms displaying visible tumor signs. The AI-generated risk scores, therefore, represent predicted probabilities derived solely from the initial, ostensibly normal screening studies.

Thus, ProFound AI® Risk functions as an end-to-end deep-learning pipeline that extracts latent imaging biomarkers from screening mammograms and integrates them with minimal demographic input to generate an objective 1-year breast cancer risk score. By excluding our institutional data from its training set and applying the algorithm only to baseline benign/negative images, the current study provides an independent, retrospective validation of this model’s performance in women with dense breasts.

Traditional risk models for comparison

Tyrer-Cuzick (version 8) and Gail (version 2) models were included to enable direct comparison with the AI-based risk estimates. Risk calculations were performed during routine clinical assessment at each patient’s initial clinic visit using validated online calculators: the IBIS Breast Cancer Risk Evaluation Tool (Tyrer-Cuzick) version 8 and the NCI Breast Cancer Risk Assessment Tool (Gail). These scores were based on patient-provided information obtained at intake, including age, age at menarche, parity, age at first childbirth, family history of breast cancer, prior biopsies, and hormonal therapy use; the TC model additionally incorporated the BI-RADS breast-density category. For this study, the previously computed TC and Gail scores were extracted directly from the electronic medical record (EMR) without recalculation and verified by two independent reviewers for accuracy and completeness. The TC model provided lifetime and 10-year risk estimates, whereas the Gail model generated 5-year and lifetime risk scores.

Statistical analysis

Descriptive statistics, including means and standard deviations, were calculated for demographic and imaging characteristics in the cancer and non-cancer groups. Independent t-tests were used to compare continuous variables, including AI risk scores, TC scores, and Gail model scores, between the two groups. A two-tailed P value less than 0.05 was considered statistically significant. Cohen’s d effect sizes were also computed to quantify the magnitude of mean-score differences, complementing the P values.
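For readers reproducing the group comparisons outside SPSS, the sketch below computes the independent-samples t statistic, Cohen’s d with a pooled standard deviation, and an approximate 95% CI for the mean difference from two samples. The article does not state whether the pooled (Student) or Welch form of the t-test was used, so the pooled form is an assumption, as is the 1.96 normal critical value for the CI. The two-sided P value would then come from a t distribution with `df` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_sd(a: Sequence[float], b: Sequence[float]) -> float:
    """Pooled standard deviation of two independent samples (sample variances)."""
    n1, n2 = len(a), len(b)
    return math.sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized mean difference (a minus b) using the pooled SD."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Equal-variance independent-samples t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    se = pooled_sd(a, b) * math.sqrt(1 / n1 + 1 / n2)
    return (mean(a) - mean(b)) / se, n1 + n2 - 2


def mean_difference_ci(a: Sequence[float], b: Sequence[float],
                       crit: float = 1.96) -> Tuple[float, float, float]:
    """Mean difference with an approximate 95% CI (normal critical value by default)."""
    n1, n2 = len(a), len(b)
    md = mean(a) - mean(b)
    se = pooled_sd(a, b) * math.sqrt(1 / n1 + 1 / n2)
    return md, md - crit * se, md + crit * se
```

Cohen’s d and the t statistic share the same numerator; d divides by the pooled SD alone, so it does not grow with sample size the way t does, which is why the two are reported together.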

Subgroup analyses were conducted according to breast density categories as determined by the AI model (categories A, B, C, and D) to assess whether model performance differed by density. Within the cancer group, further subgroup analyses were performed by histologic subtype, including ductal carcinoma in situ (DCIS), invasive ductal carcinoma (IDC), and invasive lobular carcinoma (ILC), and by tumor grade, to assess whether risk model scores varied according to tumor histology and differentiation. Receiver operating characteristic (ROC) curve analysis was performed to evaluate the diagnostic performance of the AI model, and sensitivity and specificity were determined at a score threshold for each density category. All statistical analyses were conducted using IBM SPSS Statistics for Windows, Version 28.0 (IBM Corp., Armonk, NY, USA).
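The ROC quantities reported in Table 3 can be computed along the following lines. This is a minimal sketch: AUC uses the Mann-Whitney formulation (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half), scores at or above the threshold are treated as positive, and the threshold is chosen by maximizing Youden’s J (sensitivity + specificity − 1). The article does not state how its thresholds were selected, so Youden’s J, the function names, and the per-group helper are assumptions.

```python
from typing import Dict, Optional, Sequence, Tuple


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(case score > control score), ties counted as 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(scores: Sequence[float],
                     labels: Sequence[int]) -> Tuple[float, float, float]:
    """Observed score value maximizing Youden's J; returns (threshold, sens, spec)."""
    best: Optional[Tuple[float, float, float, float]] = None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        j = sens + spec - 1
        if best is None or j > best[0]:    # first (lowest) threshold wins ties
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


def auc_by_group(scores: Sequence[float], labels: Sequence[int],
                 groups: Sequence[str]) -> Dict[str, float]:
    """Per-subgroup AUC (e.g., AI-assigned density letters); groups lacking
    either cases or controls are skipped."""
    out: Dict[str, float] = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < len(ys):
            out[g] = auc_mann_whitney([scores[i] for i in idx], ys)
    return out
```

Skipping groups without both outcomes matters here because the AI assigned only one woman to category A, for whom no AUC can be defined.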


Results

Patient characteristics

A total of 416 patients were included in the study, comprising 70 (16.8%) cancer cases and 346 (83.2%) non-cancer controls. The overall mean age was 61.12±10.41 years; the mean age was 61.01±10.81 years in the cancer group and 61.14±10.34 years in the non-cancer group. Based on AI-derived breast density, 1 patient (0.2%) was classified as category A, 124 (29.8%) as category B, 210 (50.5%) as category C, and 81 (19.5%) as category D. Among the cancer cases, histologic subtypes included 10 patients (14.3%) with DCIS, 46 (65.7%) with IDC, and 14 (20.0%) with ILC. Tumor grade, assessed using the Nottingham grading system, showed that 19 patients (27.1%) had grade 1 tumors, 32 (45.7%) had grade 2, and 19 (27.1%) had grade 3. Figure 1 presents a representative case from the cancer group and Figure 2 a representative non-cancer case; these examples illustrate how AI-derived risk scores vary with image appearance and parenchymal density.

Figure 1 Sample case from cancer cohort: 71-year-old woman whose bilateral screening mammogram craniocaudal images (left) were read as negative (BI-RADS 1). Retrospective analysis by the AI risk tool demonstrated a high 1-year risk score of 1.24% (middle). MBI images (right) show a focal area of radiotracer uptake in the inner central left breast (red circle). Ultrasound-guided core needle biopsy (not shown) demonstrated a grade 2 IDC. BI-RADS, Breast Imaging Reporting and Data System; IDC, invasive ductal carcinoma; MBI, molecular breast imaging.
Figure 2 Sample case from non-cancer cohort: 68-year-old woman whose bilateral screening mammogram craniocaudal images (left) were read as negative (BI-RADS 1). Retrospective analysis by the AI risk tool demonstrated a low 1-year risk score of 0.08% (middle). MBI images (right) showed no abnormal radiotracer uptake. No malignancy was identified on follow-up imaging or clinical examination during the 12-month surveillance period. BI-RADS, Breast Imaging Reporting and Data System; MBI, molecular breast imaging.

Risk score across models

Mean risk scores were higher in the cancer group than in the non-cancer group for the AI model and both TC models, whereas mean Gail lifetime and 5-year scores were slightly lower in the cancer group. However, neither the conventional risk models nor the AI model demonstrated statistically significant differences between the groups. Among all models, the AI model had the lowest P value (0.239) and the largest effect size (d=0.23). While not statistically significant, the AI model showed a trend toward greater separation between the groups (Table 1).

Table 1

Overall risk score comparison across models

Model | Cancer (mean ± SD) | Non-cancer (mean ± SD) | Mean difference (95% CI) | Cohen’s d | P value
AI (%) | 0.41±0.35 | 0.37±0.21 | 0.04 (−0.03 to 0.10) | 0.23 | 0.239
TC lifetime (%) | 20.1±7.5 | 18.2±9.4 | 1.9 (−1.3 to 5.1) | 0.12 | 0.307
TC 10-year (%) | 7.4±4.6 | 5.7±3.8 | 1.7 (−0.9 to 4.3) | 0.10 | 0.396
Gail lifetime (%) | 14.2±7.1 | 14.7±6.4 | −0.5 (−2.7 to 1.7) | 0.05 | 0.819
Gail 5-year (%) | 2.9±2.7 | 3.2±1.6 | −0.3 (−0.9 to 0.3) | 0.07 | 0.702

P<0.05 denotes significance. AI, artificial intelligence; CI, confidence interval; SD, standard deviation; TC, Tyrer-Cuzick.

Comparative assessments of risk model discrimination across patient and tumor characteristics

Across breast density categories B to D, the AI model demonstrated a consistent trend of higher mean risk scores in cancer patients than in non-cancer controls, with separation increasing with breast density. The difference was minimal in category B (d=0.17), modest in category C (d=0.35), and most pronounced in category D, where the AI model approached statistical significance (P=0.080; d=0.41) (Table 2). The TC models also showed separation at higher densities, with effect sizes in categories C and D comparable to those of the AI model (TC lifetime d=0.40 and 0.44), whereas the Gail models showed little separation (d≤0.12); none of the traditional-model comparisons reached statistical significance. These findings suggest that the AI model may offer improved discriminatory power, particularly in women with extremely dense breasts.

Table 2

Risk model discrimination across patient and tumor characteristics

Subgroup | Group | AI model | TC lifetime model | TC 10-year model | Gail lifetime model | Gail 5-year model
Breast density
B | Cancer (n=15) | 0.49±0.32 | 18.5±7.2 | 7.1±4.1 | 14.9±6.5 | 3.1±1.8
B | Non-cancer (n=109) | 0.43±0.37 | 16.9±8.5 | 6.8±3.9 | 14.7±6.0 | 3.0±1.4
B | MD (95% CI) | 0.06 (–0.12 to 0.24) | 1.6 (–2.5 to 5.7) | 0.3 (–1.5 to 2.1) | 0.2 (–1.9 to 2.3) | 0.1 (–0.5 to 0.7)
B | P value | 0.883 | 0.898 | 0.946 | 0.961 | 0.972
B | Cohen’s d | 0.17 | 0.20 | 0.09 | 0.04 | 0.07
C | Cancer (n=46) | 0.52±0.35 | 21.0±7.6 | 7.4±4.6 | 15.1±7.4 | 3.2±2.1
C | Non-cancer (n=164) | 0.41±0.22 | 15.2±9.1 | 5.6±3.7 | 14.3±6.1 | 3.1±1.5
C | MD (95% CI) | 0.11 (–0.05 to 0.27) | 5.8 (2.0 to 9.6) | 1.8 (0.1 to 3.5) | 0.8 (–1.2 to 2.8) | 0.1 (–0.4 to 0.6)
C | P value | 0.358 | 0.611 | 0.727 | 0.775 | 0.901
C | Cohen’s d | 0.35 | 0.40 | 0.32 | 0.12 | 0.06
D | Cancer (n=14) | 0.49±0.21 | 20.9±8.1 | 7.3±4.7 | 14.1±7.6 | 3.0±1.7
D | Non-cancer (n=67) | 0.33±0.24 | 14.4±9.3 | 5.4±4.1 | 13.9±6.0 | 2.8±2.4
D | MD (95% CI) | 0.16 (0.02 to 0.30) | 6.5 (1.9 to 11.1) | 1.9 (0.2 to 3.6) | 0.2 (–2.1 to 2.5) | 0.2 (–0.9 to 1.3)
D | P value | 0.080 | 0.161 | 0.194 | 0.870 | 0.889
D | Cohen’s d | 0.41 | 0.44 | 0.38 | 0.05 | 0.09
Cancer type
DCIS | Cancer (n=10) | 0.45±0.37 | 18.9±7.1 | 6.9±4.1 | 14.7±6.5 | 3.1±1.5
DCIS | Non-cancer (n=346) | 0.41±0.33 | 17.4±8.2 | 6.5±3.4 | 14.5±6.0 | 3.0±2.2
DCIS | MD (95% CI) | 0.04 (–0.14 to 0.22) | 1.5 (–2.8 to 5.8) | 0.4 (–1.3 to 2.1) | 0.2 (–1.8 to 2.2) | 0.1 (–0.6 to 0.8)
DCIS | P value | 0.643 | 0.912 | 0.894 | 0.924 | 0.958
DCIS | Cohen’s d | 0.10 | 0.18 | 0.12 | 0.05 | 0.04
IDC | Cancer (n=46) | 0.49±0.36 | 20.6±7.9 | 7.0±4.5 | 14.9±7.3 | 3.0±1.6
IDC | Non-cancer (n=346) | 0.31±0.12 | 17.2±9.1 | 5.7±4.0 | 14.6±6.2 | 2.9±2.6
IDC | MD (95% CI) | 0.18 (0.06 to 0.30) | 3.4 (0.1 to 6.7) | 1.3 (–0.4 to 3.0) | 0.3 (–1.9 to 2.5) | 0.1 (–0.8 to 1.0)
IDC | P value | 0.765 | 0.804 | 0.869 | 0.917 | 0.901
IDC | Cohen’s d | 0.47 | 0.37 | 0.29 | 0.06 | 0.05
ILC | Cancer (n=14) | 0.69±0.46 | 21.4±6.9 | 7.1±4.6 | 14.4±7.2 | 3.1±1.7
ILC | Non-cancer (n=346) | 0.41±0.32 | 17.1±9.4 | 5.1±3.9 | 14.3±6.1 | 2.7±2.5
ILC | MD (95% CI) | 0.28 (0.06 to 0.50) | 4.3 (0.7 to 7.9) | 2.0 (0.2 to 3.8) | 0.1 (–2.2 to 2.4) | 0.4 (–0.8 to 1.6)
ILC | P value | 0.048 | 0.124 | 0.159 | 0.947 | 0.782
ILC | Cohen’s d | 0.56 | 0.44 | 0.41 | 0.03 | 0.14
Tumor grade
Grade 1 | Cancer (n=19) | 0.39±0.28 | 19.2±7.1 | 6.8±4.2 | 14.5±6.1 | 3.1±2.3
Grade 1 | Non-cancer (n=346) | 0.37±0.31 | 17.3±8.7 | 5.9±3.5 | 14.2±6.7 | 3.0±1.6
Grade 1 | MD (95% CI) | 0.02 (–0.14 to 0.18) | 1.9 (–1.8 to 5.6) | 0.9 (–0.8 to 2.6) | 0.3 (–1.8 to 2.4) | 0.1 (–0.7 to 0.9)
Grade 1 | P value | 0.857 | 0.921 | 0.899 | 0.933 | 0.904
Grade 1 | Cohen’s d | 0.07 | 0.23 | 0.19 | 0.06 | 0.05
Grade 2 | Cancer (n=32) | 0.49±0.41 | 21.1±7.9 | 7.3±4.5 | 15.2±7.4 | 3.3±2.4
Grade 2 | Non-cancer (n=346) | 0.37±0.40 | 15.8±9.0 | 5.4±3.9 | 14.3±6.2 | 3.1±1.7
Grade 2 | MD (95% CI) | 0.12 (–0.04 to 0.28) | 5.3 (1.7 to 8.9) | 1.9 (0.3 to 3.5) | 0.9 (–1.1 to 2.9) | 0.2 (–0.5 to 0.9)
Grade 2 | P value | 0.089 | 0.174 | 0.193 | 0.798 | 0.747
Grade 2 | Cohen’s d | 0.33 | 0.45 | 0.38 | 0.11 | 0.09
Grade 3 | Cancer (n=19) | 0.50±0.44 | 20.3±8.1 | 7.4±4.6 | 14.2±6.3 | 3.1±1.6
Grade 3 | Non-cancer (n=346) | 0.37±0.40 | 15.9±9.4 | 5.2±3.8 | 14.0±7.3 | 2.9±2.5
Grade 3 | MD (95% CI) | 0.13 (–0.07 to 0.33) | 4.4 (–0.3 to 9.1) | 2.2 (0.1 to 4.3) | 0.2 (–2.2 to 2.6) | 0.2 (–0.8 to 1.2)
Grade 3 | P value | 0.976 | 0.981 | 0.953 | 0.918 | 0.882
Grade 3 | Cohen’s d | 0.29 | 0.41 | 0.36 | 0.06 | 0.08

Values are presented as mean ± SD. AI, artificial intelligence; CI, confidence interval; DCIS, ductal carcinoma in situ; IDC, invasive ductal carcinoma; ILC, invasive lobular carcinoma; MD, mean difference (cancer – non-cancer); SD, standard deviation; TC, Tyrer-Cuzick.

When stratified by cancer histology, the AI model consistently showed higher mean risk scores in patients with DCIS, IDC, and ILC than in controls. Although the differences were not significant in the DCIS and IDC subgroups, a significant separation was observed in the ILC group (P=0.048; d=0.56), indicating the AI model’s potential advantage in identifying risk in this subtype. The TC and Gail models also showed a trend toward higher scores in cancer cases across subtypes, but the effect sizes were generally smaller, and none reached statistical significance (Table 2).

Subgroup analysis by tumor grade revealed a similar trend (Table 2). The AI model showed progressively higher mean scores with increasing tumor grade in cancer patients (0.39, 0.49, and 0.50 for grades 1, 2, and 3), while maintaining relatively stable scores in non-cancer controls. Although the comparisons did not reach statistical significance, the greatest separation occurred in grade 2 tumors (P=0.089; d=0.33), suggesting that the model may capture imaging features associated with biologically relevant tumor aggressiveness. Traditional models demonstrated parallel trends, with slightly higher scores in higher-grade tumors; the TC models showed effect sizes comparable to or larger than those of the AI model (e.g., TC lifetime d=0.45 in grade 2 tumors), but none of the traditional-model comparisons significantly distinguished cancer from control groups.

Collectively, these subgroup findings suggest a performance advantage of the image-based AI model over traditional risk assessment tools, particularly in extremely dense breast tissue and in ILC, while acknowledging that traditional models, especially TC, also showed some separation.

Diagnostic accuracy of the AI and conventional risk models by breast density

The diagnostic performance of the AI model and conventional risk models across breast-density categories is summarized in Tables 3,4. As shown in Table 3, the AI model exhibited progressively higher discriminatory performance with increasing breast density, reaching its highest AUC in category D [AUC =0.75 (95% CI: 0.61–0.89); P=0.049] with a sensitivity of 69.3% and specificity of 61.1% at a threshold of 0.14. In contrast, the TC and Gail models (Table 4) demonstrated relatively flat AUC profiles across density categories (range ≈0.54–0.63), with at most a small density-dependent gain. This comparative pattern, consistent with the trends observed in Table 2, indicates that the AI model’s discrimination improved in extremely dense breasts, whereas that of the conventional risk models remained largely unchanged.

Table 3

Diagnostic accuracy of the AI model by breast density

Density category | AUC (95% CI) | P value | Sensitivity (%) | Specificity (%) | Threshold
B | 0.56 (0.41–0.70) | 0.892 | 54.2 | 50.0 | 0.20
C | 0.61 (0.48–0.74) | 0.255 | 62.7 | 57.9 | 0.22
D | 0.75 (0.61–0.89) | 0.049 | 69.3 | 61.1 | 0.14

AI, artificial intelligence; AUC, area under the curve; CI, confidence interval.

Table 4

Comparative AUCs of AI and conventional risk models by breast density

AUC (95% CI) by model
Density category | AI model | TC lifetime model | TC 10-year model | Gail lifetime model | Gail 5-year model
B | 0.56 (0.41–0.70) | 0.58 (0.45–0.70) | 0.57 (0.44–0.69) | 0.55 (0.43–0.68) | 0.54 (0.42–0.67)
C | 0.61 (0.48–0.74) | 0.59 (0.46–0.71) | 0.58 (0.46–0.70) | 0.57 (0.45–0.69) | 0.56 (0.44–0.68)
D | 0.75 (0.61–0.89) | 0.62 (0.48–0.76) | 0.61 (0.47–0.74) | 0.59 (0.45–0.72) | 0.58 (0.44–0.71)

AI, artificial intelligence; AUC, area under the curve; CI, confidence interval; TC, Tyrer-Cuzick.


Discussion

Breast cancer screening strategies have evolved significantly, yet determining the optimal approach for women with mammographically dense breasts who do not meet traditional high-risk criteria remains a critical gap in clinical practice. While current guidelines provide clear recommendations for high-risk women of any density (2,15,16), there is limited guidance for intermediate-risk women with dense breast tissue. This group represents a substantial proportion of the screening population but occupies a clinical grey zone, where decisions regarding supplemental imaging are often left to individual clinician discretion or patient preference, leading to variability in care and potential underdiagnosis.

Unlike traditional models that rely on static clinical features, AI algorithms can identify complex imaging patterns that may signal an underlying risk of malignancy, even before detectable lesions emerge. This makes AI a compelling tool for personalized screening strategies, particularly for women with dense breasts. However, despite their promise, image-based AI models require robust clinical validation, particularly in women who are not identified as high risk by conventional criteria.

In our study, AI-assigned breast density offered an objective, reproducible alternative to radiologists’ subjective assessments, which may improve consistency in risk stratification and reduce screening variability. Notably, the AI-assigned density distribution in our cohort differed from the radiologist-assigned BI-RADS categories: although the interpreting radiologists classified all included women as heterogeneously or extremely dense, the AI model assigned 125 (30.0%) to category A or B, reflecting the well-documented divergence between visual and automated evaluations. This objectivity matters because visual BI-RADS density categorization is known to suffer from moderate-to-substantial inter-reader variability (17,18). By contrast, automated AI-based methods have demonstrated greater reproducibility, in some cases exceeding the consistency of human readers, while maintaining comparable cancer risk discrimination (10). Prior work has shown that radiologist classification is influenced by image contrast, reader experience, and subjective interpretation of “masking”, whereas automated methods quantify fibroglandular tissue proportion directly from image data, thereby reducing human bias (19). Recent evidence from Da Rocha et al. further supports this advantage: their open-source convolutional neural network showed substantially less variability than human readers (20). These findings support the adoption of AI-based density measurement to ensure more accurate and equitable supplemental screening decisions.

The AI model also consistently showed higher mean risk scores in cancer patients than in controls across AI-assigned density categories B to D, with the greatest separation observed in category D (extremely dense breasts). Although statistical significance was not achieved in all comparisons, the progressive increase in discriminatory performance with higher density supports the utility of AI models in populations where mammography performs suboptimally (3,4). In this context, the AI model’s enhanced discrimination in women with extremely dense breasts may reflect its capacity to extract subtle imaging features that are not apparent to human readers.

The model’s superior performance extended beyond density stratification. Subgroup analyses revealed a significant difference in risk scores among patients with ILC, a subtype characterized by a diffuse growth pattern and lower detectability on conventional mammography (21-23). This finding warrants particular attention, as ILC often demonstrates subtle parenchymal distortions and architectural asymmetries that may escape visual detection but could be captured by image-based AI models through higher-order textural or contextual features. The AI model’s improved performance in this subgroup suggests that deep-learning-based algorithms may identify predictive cues associated with ILC risk that are not readily apparent to human observers. Compared with traditional models, the AI model also showed better discrimination and a progressive increase in risk scores with advancing tumor grade, suggesting it may capture imaging phenotypes associated with biologically aggressive tumors.

However, the absence of statistical significance across several other subgroups and the overall cohort likely reflects multifactorial influences. The relatively small number of cancer cases, particularly within certain subtypes, limited statistical power to detect subtle differences. In addition, the inherent biological heterogeneity of breast cancers, variations in image acquisition parameters, and the diverse demographic and clinical characteristics of the study population may have introduced variability, diluting the statistical contrast between groups. Moreover, breast cancer risk prediction is intrinsically complex, as it depends on overlapping imaging, genetic, and hormonal factors that may not all be captured by imaging-based AI models alone. Despite these constraints, the consistent directionality of findings across subgroups strengthens the biological plausibility of the observed trends and underscores the need for larger, prospective, and multi-institutional validation to confirm these early signals of performance.
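To illustrate why a small number of cancer cases limits power, the sketch below uses a normal approximation to the power of a two-sided, two-sample comparison of a standardized difference d. The effect size d = 0.5 is a hypothetical moderate effect; 346 matches the number of controls and 70 the total number of cancers in this cohort, while subtype subgroups contain fewer cases. The function name is illustrative.

```python
import math
from statistics import NormalDist


def approx_power(effect_size_d, n_cases, n_controls, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test for a standardized difference."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Noncentrality: d scaled by the effective sample size of an unbalanced design
    ncp = abs(effect_size_d) * math.sqrt(n_cases * n_controls / (n_cases + n_controls))
    # Probability that the test statistic falls beyond either critical value
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Hypothetical moderate effect (d = 0.5) against 346 controls
for n_cases in (10, 20, 70):
    print(n_cases, round(approx_power(0.5, n_cases, 346), 2))
```

Because the controls greatly outnumber the cases, power is governed mainly by the number of cancers: under these assumptions, a subgroup of 10 cases has roughly one-third power, whereas all 70 cases together would be well powered for the same effect.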

Critically, the AI model’s discriminatory performance, as measured by ROC analysis, improved substantially with increasing breast density, reaching an AUC of 0.75 in category D, a statistically significant result (P=0.049). Sensitivity (69.3%) and specificity (61.1%) in this group were also notable, suggesting potential clinical utility. These findings are consistent with prior studies: Yala et al. reported an AUC of 0.68 for image-only models (10), and Eriksson et al. observed AUCs ranging from 0.65 to 0.74 in high-density cohorts (24).
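The quantities reported above (AUC, sensitivity, specificity, and a 95% CI) can be illustrated with a short sketch. It computes the AUC as the Mann-Whitney probability that a randomly chosen case outscores a randomly chosen control, and sensitivity and specificity at a fixed threshold. The percentile bootstrap for the CI is an assumption (this section does not state how the study's CIs were derived), and the scores and threshold are hypothetical, not ProFound AI® Risk outputs.

```python
import random


def auc_mann_whitney(scores, labels):
    """AUC as the probability that a random case outscores a random control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    # Each case-control pair scores 1 if the case ranks higher, 0.5 for a tie
    wins = sum((p > c) + 0.5 * (p == c) for p in pos for c in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when a score >= threshold is called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def bootstrap_auc_ci(scores, labels, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the AUC, resampling exams with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)
        ys = [labels[i] for i in idx]
        if 1 not in ys or 0 not in ys:
            continue  # a resample without both classes has no defined AUC
        aucs.append(auc_mann_whitney([scores[i] for i in idx], ys))
    aucs.sort()
    lower = aucs[int((1 - level) / 2 * len(aucs))]
    upper = aucs[int((1 + level) / 2 * len(aucs)) - 1]
    return lower, upper


# Hypothetical 1-year risk scores (1 = cancer, 0 = control); not study data
scores = [0.031, 0.012, 0.044, 0.008, 0.019, 0.027, 0.006, 0.015, 0.014, 0.010]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
print(auc_mann_whitney(scores, labels))
print(sens_spec(scores, labels, threshold=0.02))
print(bootstrap_auc_ci(scores, labels))
```

Unlike the AUC, sensitivity and specificity depend on the chosen operating threshold, which is why a model can show a moderate AUC alongside a specific sensitivity/specificity pair such as the one reported for category D.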

Taken together, the observed subgroup differences across breast density, cancer subtype, and tumor grade suggest that the AI model may be identifying imaging biomarkers associated with both underlying cancer risk and tumor detectability, offering an opportunity for earlier diagnosis in women who would otherwise be missed by traditional approaches. These capabilities are particularly relevant in the intermediate-risk population, where decisions to pursue supplemental imaging are often subjective and inconsistent. By integrating image-based risk assessment with objective density assessment, AI models may support more equitable, precise, and individualized screening strategies.

At our institutions, MBI is offered as a supplemental screening modality for women with dense breast tissue and no other risk factors. A previous study demonstrated that MBI improves cancer detection rates and specificity in this subgroup (25). Our retrospective study focuses on predicting which women with dense breasts who are not deemed high-risk by traditional clinical models are likely to develop breast cancer in the short term. We assess how accurately an AI model stratifies future cancer risk, compare its predictive performance with that of existing tools, and consider whether it could more effectively guide the use of MBI for supplemental screening. If effective, such an approach could minimize the overuse of imaging, reduce patient burden, and improve early detection.
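The goal of guiding MBI use can be framed as a ranking question: if supplemental imaging were offered only to the highest-scoring fraction of women, what share of the subsequent cancers would fall within that group? The sketch below computes this capture rate for a hypothetical referral fraction. The helper name, the referral fractions, and the data are illustrative assumptions and do not correspond to an analysis reported in this study.

```python
def cancers_captured(scores, labels, referral_fraction):
    """Share of all cancers found among the top-scoring exams if only that fraction were referred.

    Ties at the cutoff are broken by input order, because Python's sort is stable.
    """
    n_refer = round(referral_fraction * len(scores))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_cancers = sum(labels)
    if total_cancers == 0:
        return 0.0
    return sum(y for _, y in ranked[:n_refer]) / total_cancers


# Hypothetical AI risk scores and outcomes (1 = cancer) for ten women
scores = [0.041, 0.009, 0.027, 0.013, 0.035, 0.006, 0.018, 0.011, 0.022, 0.008]
labels = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
for fraction in (0.2, 0.4, 0.6):
    print(fraction, cancers_captured(scores, labels, fraction))
```

Plotting the capture rate against the referral fraction gives a simple view of the trade-off between imaging volume and cancers reached, which is the trade-off at stake when deciding who should receive MBI.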

It is important to distinguish between the fundamental principles underlying traditional and AI-based risk models, as they operate on different temporal and biological scales. Conventional models such as the Gail and TC algorithms estimate long-term susceptibility to breast cancer by aggregating epidemiologic and hormonal factors, variables that evolve over years and reflect inherent biological predisposition. In contrast, image-based AI models are designed to capture short-term, imaging-derived indicators of tumor development, detecting subtle textural, architectural, or microvascular changes that may precede radiologically visible lesions. This distinction between biological susceptibility (long-term risk) and incipient detectability (short-term risk) may explain why traditional models often fail to identify women who develop cancer within a year of screening, whereas AI models, trained directly on imaging data, demonstrate higher short-term discriminatory performance even though statistical significance was not reached across all risk strata. This study contributes to a growing body of literature on integrating AI into clinical workflows, particularly in scenarios where conventional tools fall short. The findings may help address the gap in personalized screening for women with dense breast tissue and no other risk factors, supporting more efficient allocation of supplemental imaging resources. Moreover, given that several states have enacted breast density notification laws without accompanying clinical guidance, AI could serve as a decision-support tool to inform both clinicians and patients about personalized risk and screening options (26).
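To make the difference in time scales between long-term and short-term risk concrete, a risk R_T accumulated over T years can be put on an approximate one-year footing under a deliberately simple assumption: a constant annual hazard with no competing mortality. This is an illustration only; it is not how the Gail or TC models, or ProFound AI® Risk, compute their estimates, and it was not part of this study's analysis.

```latex
r_{1} = 1 - \left(1 - R_{T}\right)^{1/T},
\qquad \text{e.g. } R_{10} = 0.05 \;\Rightarrow\; r_{1} = 1 - 0.95^{0.1} \approx 0.0051
```

Under these assumptions, a hypothetical 5% ten-year risk corresponds to an average one-year risk of only about 0.5%, which shows why long-horizon estimates and one-year scores are not directly interchangeable.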

This study has several important limitations. First, its retrospective design inherently introduces potential biases, including selection and information bias. Second, the mammograms were obtained from three different sites within the same institution, which may have introduced some variability in imaging protocols and quality. While this is a limitation, it also reflects real-world variability and may enhance the applicability of our findings across broader, more diverse populations and clinical settings. Third, not all patients had complete data for the TC and Gail models, which may have affected the robustness of comparative analyses. Additionally, the cancer-negative group was defined based on benign or negative mammography reports, which raises the possibility of missed or undetected cancers, particularly in women with extremely dense breasts. To offset this limitation, all non-cancer cases underwent at least two years of mammographic follow-up to confirm true-negative status. The AI model used in this study was trained solely on mammographic images; incorporating additional clinical, sociodemographic, and genetic risk factors could potentially improve its predictive performance and is an ongoing area of AI development and research. Lastly, the number of confirmed cancer cases within the study period was relatively small, limiting statistical power in subgroup analyses. Despite these limitations, to our knowledge, few, if any, previous studies have evaluated breast cancer risk scores specifically in non–high-risk patients. This novel focus adds meaningful insight to the growing effort to personalize breast cancer screening strategies beyond traditionally high-risk populations.
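Incomplete TC and Gail data are often handled with a complete-case analysis, in which head-to-head model comparisons are restricted to patients who have all three scores. This section does not state how the study handled missing inputs, so the sketch below is an assumption; the `Exam` record and its field names are hypothetical, and it simply shows how such a restriction shrinks, and can rebalance, the comparison set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exam:
    """Hypothetical per-exam record; field names are illustrative, not from the study database."""
    patient_id: str
    is_cancer: bool
    ai_risk: float
    tyrer_cuzick: Optional[float]  # None when questionnaire data were incomplete
    gail: Optional[float]          # None when questionnaire data were incomplete


def complete_cases(exams: List[Exam]) -> List[Exam]:
    """Keep exams with all three scores so every model is compared on the same patients."""
    return [e for e in exams if e.tyrer_cuzick is not None and e.gail is not None]


def count_groups(exams: List[Exam]) -> dict:
    """Number of cancer cases and controls in a set of exams."""
    cases = sum(e.is_cancer for e in exams)
    return {"cases": cases, "controls": len(exams) - cases}


exams = [
    Exam("p1", True, 0.031, 0.18, 0.021),
    Exam("p2", False, 0.012, None, 0.015),
    Exam("p3", False, 0.009, 0.11, None),
    Exam("p4", True, 0.027, 0.14, 0.019),
    Exam("p5", False, 0.014, 0.09, 0.012),
]
print(count_groups(exams), count_groups(complete_cases(exams)))
```

If missingness is related to risk (for example, if questionnaires are less often completed in some groups), a complete-case subset can differ systematically from the full cohort, which is one reason incomplete TC and Gail data limit the robustness of the comparison.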

Future studies should therefore prioritize several specific directions. First, larger, prospective, and multi-institutional cohorts are needed to validate these findings, particularly within distinct breast-density subgroups to confirm the model’s density-dependent behavior. Second, longitudinal studies that track cancer incidence and outcomes over time could determine whether AI-based risk stratification translates into earlier detection, reduced interval cancers, or improved survival. Third, integrating AI-generated risk scores with comprehensive clinical and genomic data could yield hybrid models that better reflect the multifactorial nature of breast cancer risk. Fourth, future research should explore how AI-guided screening strategies might optimize imaging intervals or tailor the choice of supplemental modalities (such as MBI, MRI, or ultrasound) in dense-breast populations. Fifth, future work should also include systematic cross-model evaluations, testing this AI model alongside other established AI models to determine relative performance across platforms and imaging vendors. Because ProFound AI is among the first commercially available models of its kind, cross-platform, vendor-neutral validation will become increasingly important as additional AI tools emerge, ensuring the reproducibility, robustness, and generalizability of AI-driven risk assessment across diverse clinical environments. Finally, building on our observation with ILC, future studies should specifically explore AI systems optimized for lobular histology, with dedicated training datasets and multi-institutional validation cohorts. Such efforts could help determine whether AI-based risk stratification can improve early detection or screening triage in women with ILC, an area in which conventional imaging remains limited.
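Cross-model evaluations on the same exams are paired comparisons, because both models score identical patients. DeLong's test is a common analytic choice for comparing correlated AUCs. The sketch below substitutes a simpler paired bootstrap, which resamples exams jointly so that each resample keeps the pairing between the two models' scores. The model scores are hypothetical, and the function names are illustrative.

```python
import random


def auc(scores, labels):
    """Mann-Whitney AUC; ties between a case and a control count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    return sum((p > c) + 0.5 * (p == c) for p in pos for c in neg) / (len(pos) * len(neg))


def paired_bootstrap_auc_diff(scores_a, scores_b, labels, n_boot=2000, level=0.95, seed=0):
    """Point estimate and percentile CI for AUC(model A) - AUC(model B) on the same exams."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)  # resample exams, keeping both models' scores paired
        y = [labels[i] for i in idx]
        if all(y) or not any(y):
            continue  # skip resamples that lack either cases or controls
        diffs.append(auc([scores_a[i] for i in idx], y) - auc([scores_b[i] for i in idx], y))
    diffs.sort()
    lower = diffs[int((1 - level) / 2 * len(diffs))]
    upper = diffs[int((1 + level) / 2 * len(diffs)) - 1]
    return auc(scores_a, labels) - auc(scores_b, labels), (lower, upper)


# Hypothetical scores from two AI models on the same eight exams (1 = cancer)
labels = [1, 0, 1, 0, 0, 1, 0, 0]
model_a = [0.030, 0.012, 0.025, 0.015, 0.009, 0.021, 0.018, 0.007]
model_b = [0.022, 0.019, 0.017, 0.011, 0.014, 0.024, 0.020, 0.010]
print(paired_bootstrap_auc_diff(model_a, model_b, labels))
```

A confidence interval for the difference that excludes zero would indicate that one model discriminates better on that population. Repeating the comparison across vendors and sites is the kind of cross-platform validation described above.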


Conclusions

This study demonstrates that an image-based AI risk model can provide enhanced discriminatory performance for breast cancer risk assessment in specific subgroups, particularly among women with extremely dense breast tissue (category D) and those with ILC. Although statistically significant differences were not observed across all comparisons, the consistent trend toward higher AI-derived risk scores among cancer patients in the dense-breast and lobular subgroups suggests that AI-based imaging analysis may capture subtle, biologically relevant features that traditional questionnaire-based models do not incorporate, particularly among non-high-risk patients. These findings highlight the complementary role of AI-derived risk tools in breast cancer screening workflows, where they may help overcome the limitations of conventional models, reduce subjectivity in breast-density classification, and better identify women who could benefit from supplemental imaging or tailored surveillance strategies.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the STARD reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1650/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1650/dss

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1650/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Review Board of Mayo Clinic (No. 23-007303). The requirement for informed consent was waived due to the retrospective nature of the analysis.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49.
  2. Mann RM, Athanasiou A, Baltzer PAT, Camps-Herrero J, Clauser P, Fallenberg EM, Forrai G, Fuchsjäger MH, Helbich TH, Kilburn-Toppin F, Lesaru M, Panizza P, Pediconi F, Pijnappel RM, Pinker K, Sardanelli F, Sella T, Thomassin-Naggara I, Zackrisson S, Gilbert FJ, Kuhl CK; European Society of Breast Imaging (EUSOBI). Breast cancer screening in women with extremely dense breasts recommendations of the European Society of Breast Imaging (EUSOBI). Eur Radiol 2022;32:4036-45.
  3. Boyd NF, Guo H, Martin LJ, Sun L, Stone J, Fishell E, Jong RA, Hislop G, Chiarelli A, Minkin S, Yaffe MJ. Mammographic density and the risk and detection of breast cancer. N Engl J Med 2007;356:227-36.
  4. Mandelson MT, Oestreicher N, Porter PL, White D, Finder CA, Taplin SH, White E. Breast density as a predictor of mammographic detection: comparison of interval- and screen-detected cancers. J Natl Cancer Inst 2000;92:1081-7.
  5. Sprague BL, Gangnon RE, Burt V, Trentham-Dietz A, Hampton JM, Wellman RD, Kerlikowske K, Miglioretti DL. Prevalence of mammographically dense breasts in the United States. J Natl Cancer Inst 2014;106:dju255.
  6. Berg WA, Zhang Z, Lehrer D, Jong RA, Pisano ED, Barr RG, Böhm-Vélez M, Mahoney MC, Evans WP 3rd, Larsen LH, Morton MJ, Mendelson EB, Farria DM, Cormack JB, Marques HS, Adams A, Yeh NM, Gabrielli G; ACRIN 6666 Investigators. Detection of breast cancer with addition of annual screening ultrasound or a single screening MRI to mammography in women with elevated breast cancer risk. JAMA 2012;307:1394-404.
  7. Hooley RJ, Greenberg KL, Stackhouse RM, Geisel JL, Butler RS, Philpotts LE. Screening US in patients with mammographically dense breasts: initial experience with Connecticut Public Act 09-41. Radiology 2012;265:59-69.
  8. Petracci E, Decarli A, Schairer C, Pfeiffer RM, Pee D, Masala G, Palli D, Gail MH. Risk factor modification and projections of absolute breast cancer risk. J Natl Cancer Inst 2011;103:1037-48.
  9. Maas P, Barrdahl M, Joshi AD, Auer PL, Gaudet MM, Milne RL, et al. Breast Cancer Risk From Modifiable and Nonmodifiable Risk Factors Among White Women in the United States. JAMA Oncol 2016;2:1295-302. Erratum in: JAMA Oncol 2016;2:1374.
  10. Yala A, Lehman C, Schuster T, Portnoi T, Barzilay R. A Deep Learning Mammography-based Model for Improved Breast Cancer Risk Prediction. Radiology 2019;292:60-6.
  11. McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, et al. International evaluation of an AI system for breast cancer screening. Nature 2020;577:89-94. Erratum in: Nature 2020;586:E19.
  12. D’Orsi CJ, Sickles EA, Mendelson EB, Morris EA. ACR BI-RADS Atlas, Breast Imaging Reporting and Data System. Reston, VA: American College of Radiology; 2013. Accessed July 18, 2025. Available online: https://www.acr.org/Clinical-Resources/Clinical-Tools-and-Reference/Reporting-and-Data-Systems/BI-RADS
  13. Sickles EA, D’Orsi CJ, Bassett LW. ACR BI-RADS Mammography. In: ACR BI-RADS Atlas, Breast Imaging Reporting and Data System. Reston, VA: American College of Radiology; 2013. Accessed July 19, 2025. Available online: https://edge.sitecorecloud.io/americancoldf5f-acrorgf92a-productioncb02-3650/media/ACR/Files/RADS/BI-RADS/Mammography-Reporting.pdf
  14. Pike J, Hoffmeister J. Estimating the likelihood of near-term breast cancer using ProFound AI® Risk: Theory and clinical performance [White Paper]. iCAD Inc.; 2023. Accessed July 18, 2025. Available online: https://www.icadmed.com/wp-content/uploads/2023/11/DMM279-Estimating-the-Likelihood-of-Near-Term-Breast-Cancer-White-Paper-Rev1.pdf
  15. US Preventive Services Task Force. Nicholson WK, Silverstein M, Wong JB, Barry MJ, Chelmow D, Coker TR, Davis EM, Jaén CR, Krousel-Wood M, Lee S, Li L, Mangione CM, Rao G, Ruiz JM, Stevermer JJ, Tsevat J, Underwood SM, Wiehe S. Screening for Breast Cancer: US Preventive Services Task Force Recommendation Statement. JAMA 2024;331:1918-30. Erratum in: JAMA 2024;332:1396-7.
  16. Weinstein SP, Slanetz PJ, Lewin AA, Battaglia T, Chagpar AB, Dayaratna S, Dibble EH, Goel MS, Hayward JH, Kubicky CD, Le-Petross HT, Newell MS, Sanford MF, Scheel JR, Vincoff NS, Yao K, Moy L. ACR Appropriateness Criteria® Supplemental Breast Cancer Screening Based on Breast Density. J Am Coll Radiol 2021;18:S456-73.
  17. Sartor H, Lång K, Rosso A, Borgquist S, Zackrisson S, Timberg P. Measuring mammographic density: comparing a fully automated volumetric assessment versus European radiologists' qualitative classification. Eur Radiol 2016;26:4354-60.
  18. Romanov S, Howell S, Harkness E, Gareth Evans D, Astley S, Fergie M. Comparing percent breast density assessments of an AI-based method with expert reader estimates: inter-observer variability. J Med Imaging (Bellingham) 2025;12:S22011.
  19. Destounis S, Arieno A, Morgan R, Roberts C, Chan A. Qualitative Versus Quantitative Mammographic Breast Density Assessment: Applications for the US and Abroad. Diagnostics (Basel) 2017;7:30.
  20. da Rocha NC, Barbosa AMP, Schnr YO, Peres LDB, de Andrade LGM, de Magalhaes Rosa GJ, Pessoa EC, Corrente JE, de Arruda Silveira LV. Enhancing Breast Density Assessment in Mammograms Through Artificial Intelligence. J Imaging Inform Med 2025; Epub ahead of print.
  21. Hilleren DJ, Andersson IT, Lindholm K, Linnell FS. Invasive lobular carcinoma: mammographic findings in a 10-year experience. Radiology 1991;178:149-54.
  22. Krecke KN, Gisvold JJ. Invasive lobular carcinoma of the breast: mammographic findings and extent of disease at diagnosis in 184 patients. AJR Am J Roentgenol 1993;161:957-60.
  23. Le Gal M, Ollivier L, Asselain B, Meunier M, Laurent M, Vielh P, Neuenschwander S. Mammographic features of 455 invasive lobular carcinomas. Radiology 1992;185:705-8.
  24. Eriksson M, Czene K, Vachon C, Conant EF, Hall P. Long-Term Performance of an Image-Based Short-Term Risk Model for Breast Cancer. J Clin Oncol 2023;41:2536-45.
  25. Rhodes DJ, Hruska CB, Conners AL, Tortorelli CL, Maxwell RW, Jones KN, Toledano AY, O'Connor MK. Journal club: molecular breast imaging at reduced radiation dose for supplemental screening in mammographically dense breasts. AJR Am J Roentgenol 2015;204:241-51.
  26. Melnikow J, Fenton JJ, Whitlock EP, Miglioretti DL, Weyrich MS, Thompson JH, Shah K. Supplemental Screening for Breast Cancer in Women With Dense Breasts: A Systematic Review for the U.S. Preventive Services Task Force. Ann Intern Med 2016;164:268-78.
Cite this article as: Ogunlade SB, Wang L, Maimone S, Robinson KA, Moran KM, Leon A, Morozov AP, Nwachukwu CT, Letter HP. Mammogram-based AI risk assessment in patients with dense breasts undergoing supplemental molecular breast imaging. Quant Imaging Med Surg 2026;16(2):123. doi: 10.21037/qims-2025-1650
