Deep learning-assisted aortic stenosis detection and grading based on multiview versus single-view echocardiography
Original Article


Feifei Yang1# ORCID logo, Yongming Zhang2#, Yufei Gao2, Baoquan Wang2, Xixiang Lin3, Xu Chen4, Qiushuang Wang5, Meiqing Zhang5, Xin Li6, Bohan Liu7, Peifang Zhang2, Kunlun He8*, Liwei Zhang1*

1Department of Cardiology, The Sixth Medical Center of Chinese PLA General Hospital, Beijing, China; 2BioMind Technology, Zhongguancun Medical Engineering Center, Beijing, China; 3Department of Cardiology, General Hospital of Southern Theatre Command of PLA, Guangzhou, China; 4Department of Cardiology, The Second Medical Center of Chinese PLA General Hospital, Beijing, China; 5Department of Health Medicine, The Fourth Medical Center of Chinese PLA General Hospital, Beijing, China; 6Department of Ultrasound Diagnosis, The Sixth Medical Center of Chinese PLA General Hospital, Beijing, China; 7Department of Cardiac Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, China; 8Medical Big Data Research Center, Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Key Laboratory of Ministry of Industry and Information Technology of Biomedical Engineering and Translational Medicine, Chinese PLA General Hospital, Beijing, China

Contributions: (I) Conception and design: F Yang, L Zhang; (II) Administrative support: K He; (III) Provision of study materials or patients: X Lin, X Chen, Q Wang, M Zhang, X Li, B Liu; (IV) Collection and assembly of data: F Yang, Y Zhang, X Lin, X Chen; (V) Data analysis and interpretation: F Yang, Y Zhang, Y Gao, B Wang, P Zhang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

*These authors contributed equally to this work.

Correspondence to: Liwei Zhang, MD. Department of Cardiology, The Sixth Medical Center of Chinese PLA General Hospital, No. 6 Fucheng Road, Haidian District, Beijing 100037, China. Email: liweizh304@sina.com; Kunlun He, MD, PhD. Medical Big Data Research Center, Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Key Laboratory of Ministry of Industry and Information Technology of Biomedical Engineering and Translational Medicine, Chinese PLA General Hospital, No. 28 Fuxing Road, Haidian District, Beijing 100853, China. Email: kunlunhe@plagh.org.

Background: Advances in deep learning (DL) have shown promise in automating echocardiogram interpretation, thereby enhancing accuracy and efficiency in clinical practice. However, a fully automated pipeline for aortic stenosis (AS) analysis remains largely unexplored. This study aimed to develop a DL framework to streamline clinical AS assessment.

Methods: A total of 499 AS studies (1,996 echocardiographic views) were selected from 17,436 cases of patients with valvular heart diseases (VHDs) obtained from three hospitals to form training (n=302), validation (n=76), and internal testing (n=121) datasets, while a prospectively collected set of 3,278 consecutive echocardiograms served as a real-world test dataset. The DL framework automatically classified echocardiographic views, detected the presence of AS, and employed two algorithms to assess severity: multiview and single-view.

Results: The DL model achieved high performance in AS detection in the prospective test dataset, with an area under the curve (AUC) of 0.942. The correlation between DL-graded metrics and manual measurements was excellent for aortic valve (AV) peak velocity (r=0.94; P<0.001), mean pressure gradient (r=0.91; P<0.001), left ventricular outflow tract diameter (LVOTd) (r=0.81; P<0.001), AV velocity-time integral (VTI) (r=0.94; P<0.001), LVOT VTI (r=0.88; P<0.001), and AV area (r=0.87; P<0.001). Based on these metrics, the AUC for severe AS was the highest at 0.976 [95% confidence interval (CI): 0.953–1.000], significantly superior to those for moderate AS (AUC =0.907) and mild AS (AUC =0.874). The two-dimensional parasternal long-axis view method yielded comparable AUCs for all AS severities (AUC: 0.869–0.920).

Conclusions: The proposed DL algorithm has the potential to automate and enhance the efficiency of clinical workflows for AS screening and grading in echocardiography.

Keywords: Deep learning (DL); aortic stenosis (AS); echocardiography; artificial intelligence (AI)


Submitted Feb 19, 2025. Accepted for publication Aug 21, 2025. Published online Oct 17, 2025.

doi: 10.21037/qims-2025-415


Introduction

Aortic stenosis (AS) is the most prevalent valvular heart disease (VHD) worldwide, with a mounting disease burden among the aging population. The condition affects approximately 2% of individuals aged 65 years and older and 4% of those aged 85 years and older (1,2). Notably, the prevalence of moderate-to-severe AS in individuals over 75 years old is up to 100 times higher than that in those aged 18–44 years (3). If left untreated or not treated in a timely manner, symptomatic moderate and severe AS can be fatal, with 5-year mortality rates of 56% and 67%, respectively (4). Thus, for patients with AS, early screening and severity assessment are critical for risk stratification and prompt intervention.

Echocardiographic assessments, including Doppler methods, are the cornerstone in diagnosing and categorizing the degree of AS. Current guidelines endorse the use of various metrics, such as aortic valve (AV) peak velocity (Vmax), AV mean pressure gradient (MPG), and AV area (AVA) for quantitative assessment (5). These metrics require specialized expertise and considerable processing time. Furthermore, the outcomes of echocardiographic assessments are heavily influenced by the skill level of the practitioner, potentially leading to misdiagnoses and significant interobserver variability. Therefore, there is an urgent need for an automated system capable of grading AS severity in clinical practice.

The use of deep learning (DL), a branch of artificial intelligence (AI) that includes convolutional neural networks (CNNs), has significantly accelerated the development of echocardiographic study interpretation (6-8). Recent advancements consist of identification of views, segmentation of cardiac regions such as the left and right ventricles and atria, assessment of systolic and diastolic function, and the classification of diseases (9-11). The applications of DL are capable of increasing the accuracy, consistency, and efficiency of echocardiographic interpretation. However, the integration of DL to streamline the diagnostic process for AS in typical clinical scenarios has not been extensively examined.

In our study, we developed a pipeline capable of diagnosing AS and grading its severity by integrating multiple models to achieve real-time measurements of multiple two-dimensional (2D) and Doppler metrics, including AV Vmax, AV MPG, AV velocity-time integral (VTI), left ventricular outflow tract (LVOT) diameter (LVOTd), LVOT VTI, and continuity equation-derived AVA, as per the relevant guidelines (5). Additionally, we developed a simplified method for classifying AS from a single 2D parasternal long-axis (PLAX) video view, without requiring Doppler data input. The feasibility and accuracy of our models for diagnosing and grading AS severity were evaluated in a prospective test dataset with a comparison to the results provided by two expert echocardiographers. Finally, the performance of the multiview and single-view algorithms was compared in a real-world context. We present this article in accordance with the CLEAR reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-415/rc).


Methods

Study population

The DL algorithm development in this study included a retrospectively collected dataset and validation through a prospectively collected, consecutive series of echocardiographic studies from real-world practice. The retrospective dataset, detailed in Figure 1, consisted of 575 AS studies selected from a total of 17,436 cases of patients with VHDs obtained from three hospitals in China (The First Medical Center, The Fourth Medical Center, and The Sixth Medical Center of Chinese PLA General Hospital) between January 1, 2015, and September 1, 2020. Of these, 499 AS studies (1,996 echocardiographic views) had all four echocardiographic and Doppler views required for model development, including (I) the 2D parasternal long-axis view (PLAX-2D), (II) the 2D AV-level parasternal short-axis view (PSAX-AV-2D), (III) the apical five-chamber view with continuous wave (CW) Doppler across the AV (A5C-AV-CW), and (IV) the apical five-chamber view with pulsed wave (PW) Doppler in the LVOT (A5C-LVOT-PW). These 499 AS studies were divided into training (n=302), validation (n=76), and internal testing (n=121) datasets to facilitate model development, hyperparameter optimization, and internal algorithm testing, respectively.

Figure 1 Summary of the echocardiograms used in this study. (A) Development dataset (reasons for excluding tests included those with missing view data and poor data quality) and the distribution of AS studies used for the training, validation, and test sets. (B) Prospective test dataset over a consecutive 3-month period. AS, aortic stenosis; VHDs, valvular heart diseases.

The prospective test dataset was derived from 3,278 consecutive echocardiographic studies, of which 178 were AS cases, obtained between May 1, 2023, and July 31, 2023, from The Fourth Medical Center of Chinese PLA General Hospital. All echocardiograms conducted at the hospital within the predetermined 3-month interval were included in this dataset, with no exclusions based on any criteria.

This study was registered with the Chinese clinical trial registry (ChiCTR2000030278), conducted in accordance with the Declaration of Helsinki and its subsequent amendments, and approved by the Research Ethics Committee of Chinese PLA General Hospital. Informed consent was obtained from all patients.

Echocardiography and experienced reader evaluation

Each echocardiographic study included standard 2D and color Doppler video clips, alongside still images of CW and PW Doppler flow tracings. A variety of echocardiography devices from different manufacturers were used to obtain these images, including the M9CV with an SP5-1s transducer (Mindray, Shenzhen, Guangdong, China), an Acuson SC2000 with a 4V1c transducer (Siemens Healthineers, Erlangen, Germany), the iE Elite and EPIQ 7C with S5-1 and X5-1 transducers (Philips Healthcare, Amsterdam, the Netherlands), and the Vivid E95 (GE HealthCare, Chicago, IL, USA).

With reference to the electronic medical records, two experienced physician echocardiographers, each with over two decades of expertise, determined the ground truth of AS by examining the echocardiograms. Following the 2021 European Society of Cardiology (ESC)/European Association for Cardio-Thoracic Surgery (EACTS) guideline recommendations (5), these two echocardiographers evaluated the metrics for indicating the severity of AS in the test dataset.

The quantification of AS included AV Vmax (m/sec), MPG (mmHg), AV VTI (cm) obtained via CW Doppler, LVOT VTI (cm) obtained via PW Doppler, and LVOTd (cm). These metrics were used to calculate AVA via the continuity equation as follows: AVA = [cross-sectional area (CSA)LVOT × VTILVOT]/VTIAV. The continuity equation requires the measurement of AS jet velocity through CW Doppler, measurement of LVOTd for the calculation of the CSA, and measurement of LVOT velocity via PW Doppler.
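The continuity-equation calculation above can be sketched in Python. This is an illustrative helper only, not part of the study's code; the function and argument names are our own, and a circular LVOT cross-section is assumed, as is standard for this equation:

```python
import math

def aortic_valve_area(lvot_diameter_cm, vti_lvot_cm, vti_av_cm):
    """Continuity-equation AVA in cm^2: AVA = (CSA_LVOT x VTI_LVOT) / VTI_AV,
    where CSA_LVOT = pi * (d/2)^2 assumes a circular LVOT cross-section."""
    csa_lvot = math.pi * (lvot_diameter_cm / 2.0) ** 2
    return csa_lvot * vti_lvot_cm / vti_av_cm
```

For example, an LVOTd of 2.0 cm, an LVOT VTI of 22 cm, and an AV VTI of 75 cm yield an AVA of roughly 0.92 cm², which falls in the severe range under the guideline threshold of AVA <1.0 cm².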

Network architectures and training protocol

The framework of the DL algorithm consists of the following three steps: (I) view classification, (II) disease detection, and (III) severity assessment. The detailed description of the model architecture is summarized in Figure 2.

Figure 2 Detailed description of the model architecture. (A) Model architecture for view classification. (B) Model architecture for AS detection. The model takes two views as input and detects the presence of AS. (C) Model architecture for AS severity assessment. If AS is found, the severity of AS is assessed by two algorithms: one quantifies key metrics related to AS severity based on multiple views according to established guidelines and the other simplified method classifies AS based on a single 2D PLAX view without requiring Doppler input. 2D, two-dimensional; 3D, three-dimensional; A5C, apical five-chamber; AS, aortic stenosis; AV, aortic valve; AVA, aortic valve area; CNN, convolutional neural network; CW, continuous wave; LVOT, left ventricular outflow tract; MPG, mean pressure gradient; PLAX, parasternal long-axis; PSAX, parasternal short axis; PW, pulsed wave; Vmax, peak velocity; VTI, velocity-time integral.

View classification

We annotated 33,404 images to develop a two-step classification method for selecting the views required for this analysis. View classification was performed with the XceptionNet (12) neural network across 29 classes in order to filter out the four required views. First, we classified the views into 1 of 29 broad categories, as detailed in Figure 3A, without subclassifying the PW and CW views. Subsequently, we cropped the sector-shaped window areas of the CW and PW views and used 3,988 such images to train a second model that performed subclassification according to the position of the spectral Doppler cursor (Figure 3B).

Figure 3 Normalized confusion matrix of the view classification algorithm. (A) Normalized confusion matrix of step 1 of the view classification algorithm. The views marked in red are required for this analysis. (B) Normalized confusion matrix of step 2 of the view classification algorithm for CW and PW Doppler images. The views marked in red are required for this analysis. A4C, apical four-chamber; A5C, apical five-chamber; AV, aortic valve; CW, continuous wave; MV, mitral valve; PSAX, parasternal short axis; PV, pulmonary valve; PW, pulsed wave; TV, tricuspid valve.

Disease detection

The disease detection model is a binary classifier that separates positive from normal studies, with the aim of screening out positive cases; positive data in our study included mild, moderate, and severe AS. We extracted 64 frames from the echocardiographic views and preprocessed them by resizing the images to a resolution of 384×384 pixels. Subsequently, the two views (PLAX-2D and PSAX-AV-2D) were fed to two branches of the neural network. A 3D CNN, S3D (13), was used to extract features from the view in each branch. An absent view was substituted with an array of zeros. The features from the two views were concatenated and passed through a classifier to generate the final prediction for the presence of stenosis. For optimization, we employed the adaptive moment estimation (ADAM) optimizer with an initial learning rate of 2×10−5 and a minibatch size of 4.
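The frame-sampling and zero-substitution preprocessing described above can be sketched as follows. This is a minimal NumPy illustration using nearest-neighbour resizing; the resizing method actually used in the pipeline is not specified in the text, and all names here are our own:

```python
import numpy as np

TARGET_FRAMES, TARGET_SIZE = 64, 384  # values reported in the text

def preprocess_view(clip):
    """Uniformly sample TARGET_FRAMES frames from a (T, H, W) grayscale clip
    and resize each frame to TARGET_SIZE x TARGET_SIZE by nearest-neighbour
    index selection; a missing view becomes an all-zero array."""
    if clip is None:  # absent view -> zero substitution, as described above
        return np.zeros((TARGET_FRAMES, TARGET_SIZE, TARGET_SIZE), dtype=np.float32)
    t, h, w = clip.shape
    ti = np.linspace(0, t - 1, TARGET_FRAMES).round().astype(int)
    hi = np.linspace(0, h - 1, TARGET_SIZE).round().astype(int)
    wi = np.linspace(0, w - 1, TARGET_SIZE).round().astype(int)
    # np.ix_ builds an open mesh so the three index vectors select a
    # (TARGET_FRAMES, TARGET_SIZE, TARGET_SIZE) subarray in one step
    return clip[np.ix_(ti, hi, wi)].astype(np.float32)
```

Each of the two branch inputs (PLAX-2D and PSAX-AV-2D) would pass through this step before feature extraction, so a study missing one view still produces a tensor of the expected shape.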

Severity assessment

Multiview AS classification based on metrics

For the quantification of AS, we initially trained a segmentation model, UNet (14), on A5C-AV-CW and A5C-LVOT-PW images to delineate the contour of the waveforms representing positive blood flow through the valve during the ejection period. The optimization process used the ADAM optimizer, with an initial learning rate of 1×10−3 and a minibatch size of 16. The maximum blood flow velocity (Vmax) was obtained directly from the peak of the contour. According to the simplified Bernoulli equation, the instantaneous pressure difference (∆P = 4v²) was derived from the corresponding flow velocity, and the mean pressure difference was obtained by averaging the instantaneous pressure differences across all sample points on the contour. AV VTI was determined from the area under the contour of the continuous waveforms, while LVOT VTI was derived from the area under the contour of the pulsed waveforms.
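Given a traced velocity envelope, the three Doppler metrics above follow directly, as the sketch below shows. It assumes the standard simplified Bernoulli relation ∆P = 4v² and trapezoidal integration for the VTI; the function name and sampling convention are illustrative, not the study's code:

```python
import numpy as np

def doppler_metrics(velocity_m_s, dt_s):
    """From the traced velocity envelope of one ejection (m/s, sampled every
    dt_s seconds), return Vmax (m/s), mean pressure gradient (mmHg) via the
    simplified Bernoulli equation dP = 4*v^2, and VTI (cm)."""
    v = np.asarray(velocity_m_s, dtype=float)
    vmax = v.max()                                  # peak of the contour
    mpg = np.mean(4.0 * v ** 2)                     # average of instantaneous gradients
    vti_m = np.sum((v[1:] + v[:-1]) * 0.5) * dt_s   # trapezoidal integral of v over time
    return vmax, mpg, vti_m * 100.0                 # metres -> centimetres
```

Applying this to the CW envelope gives AV Vmax, AV MPG, and AV VTI, and applying it to the PW envelope gives the LVOT VTI used in the continuity equation.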

We annotated 513 images to develop an LVOT key point detection model using the PLAX-2D view. This model was designed to measure the diameter length of the LVOT. Experienced physician echocardiographers labeled the LVOT in the PLAX-2D view at the base of the AV cusps or 1–5 mm below the aortic annulus during midsystole using an inner-edge-to-inner-edge methodology.

The AVA was calculated by the AI system using the same LVOT-based continuity equation as used by the experienced physician echocardiographers, as described above.

Single-view AS classification based on PLAX-2D video

According to the labels provided by the experienced echocardiographers, of the 499 AS studies, 201 had mild AS, 184 had moderate AS, and 114 had severe AS. We developed a three-class classification model based on PLAX-2D video using the residual net (ResNet) 3D DL network framework (15). Initially, we extracted 32 frames from the echocardiographic view and preprocessed them into a 32×256×256 input by resizing each frame to a resolution of 256×256 pixels. Subsequently, these images were fed as input to the neural network for training. The loss function adopted weighted cross-entropy to address the imbalance in the number of samples across the three categories. The features extracted from the PLAX-2D view were passed through a classifier to generate the final prediction for the severity of AS. For optimization, we used the ADAM optimizer with an initial learning rate of 1×10−4, a momentum of 0.9, and a weight decay of 5×10−4. The learning rate was reduced by a factor of 10 every 20 epochs, for a total of 100 training epochs.
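The two training details above, class weighting for the cross-entropy loss and the stepwise learning-rate decay, can be sketched as follows. The exact weighting scheme is not specified in the text, so inverse-frequency weighting is an assumption on our part; the class counts are the 201/184/114 split reported above:

```python
def class_weights(counts):
    """Inverse-frequency weights for weighted cross-entropy, normalised so
    that sum_i(count_i * weight_i) equals the total sample count."""
    total = sum(counts)
    return [total / (len(counts) * c) for c in counts]

def step_lr(initial_lr, epoch, drop_every=20, factor=0.1):
    """Learning rate reduced by `factor` every `drop_every` epochs,
    matching the schedule described in the text."""
    return initial_lr * factor ** (epoch // drop_every)
```

With this scheme the rarest class (severe AS, n=114) receives the largest weight, and a run of 100 epochs starting at 1×10−4 ends at a learning rate of 1×10−8.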

Data analyses and statistical considerations

The clinical characteristics of patients included in this study are presented as the mean ± standard deviation, median and interquartile range, or count and percentage, as appropriate. For the detection and classification of AS, the DL classification model was evaluated against the ground truth using sensitivity, specificity, accuracy, and the area under the receiver operating characteristic (ROC) curve (AUC). The 95% confidence interval (CI) was obtained via bootstrapping. Agreement between human and DL algorithm measurements was assessed with the Pearson product-moment correlation coefficient, and Bland-Altman analysis was employed to compare the quantitative AS grading metrics between the experienced echocardiographers and the DL algorithm. The analyses were performed with Python 3.6 (Python Software Foundation, Wilmington, DE, USA) using the NumPy, Pandas, and scikit-learn libraries. A two-tailed P value of less than 0.05 was considered statistically significant.
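The percentile-bootstrap CI used throughout the results can be illustrated as follows. This is a self-contained NumPy sketch, not the study's code (which drew on scikit-learn); the rank-based AUC below ignores score ties, and the resampling count and seed are arbitrary choices:

```python
import numpy as np

def auc(y_true, scores):
    """Rank-based AUC, equivalent to the normalised Mann-Whitney U
    statistic (assumes no ties among the scores)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC: resample cases with
    replacement and take the alpha/2 and 1-alpha/2 percentiles."""
    rng = np.random.default_rng(seed)
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if y_true[idx].min() == y_true[idx].max():
            continue  # resample drew only one class; AUC undefined
        stats.append(auc(y_true[idx], scores[idx]))
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```

The same resampling scheme applies to the sensitivity, specificity, and accuracy CIs, substituting the corresponding statistic for `auc`.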


Results

Baseline characteristics

The clinical and echocardiographic characteristics of the studies included in the AS prospective test dataset are presented in Table 1. In the cohort of 161 patients, 74 (45.96%) had mild AS, 66 (40.99%) had moderate AS, and 21 (13.05%) had severe AS. The mean age of the patients was 74±11 years, and 45.96% were male. The mean left ventricular ejection fraction (EF) was 53.22%±6.17%.

Table 1

Baseline demographic and echocardiographic characteristics

Items All AS cases (n=161) Mild AS (n=74) Moderate AS (n=66) Severe AS (n=21)
Age (years) 74±11 72±10 75±12 76±11
Male 74 (45.96) 32 (43.24) 30 (45.45) 12 (57.14)
Comorbidities
   Hypertension 71 (44.10) 31 (41.89) 30 (45.45) 10 (47.62)
   Hyperlipidemia 91 (56.52) 41 (55.41) 37 (56.06) 13 (61.90)
   Diabetes 48 (29.81) 20 (27.03) 21 (31.82) 7 (33.33)
   Coronary heart disease 54 (33.54) 23 (31.08) 23 (34.85) 8 (38.10)
Echocardiographic
   Vmax (m/s) 3.24±0.68 2.73±0.14 3.36±0.25 4.62±0.60
   AV MPG (mmHg) 20.86±10.45 14.36±2.37 21.62±4.22 41.33±14.04
   AVA (cm2) 1.09±0.61 2.11±0.54 1.17±0.12 0.72±0.35
   LV EF (%) 53.22±6.17 56.13±5.14 54.25±7.42 50.64±6.12

Data are presented as number (%) or mean ± SD. AS, aortic stenosis; AV, aortic valve; MPG, mean pressure gradient; AVA, aortic valve area; EF, ejection fraction; LV, left ventricular; SD, standard deviation; Vmax, peak velocity.

Performance of the DL architecture in view classification

As summarized in Figure 3A,3B, the DL architecture identified PLAX-2D, PSAX-AV-2D, A5C-LVOT-PW, and A5C-AV-CW with a high degree of sensitivity, ranging from 0.90 to 0.98.

Performance of the DL model in identifying AS in the prospective test dataset

The results of the model’s application to the real-world prospective test dataset are summarized in Figure 4. Relative to ground truth, the DL architecture achieved high performance for detecting AS, with an AUC of 0.942 (95% CI: 0.934–0.950), a sensitivity of 0.904 (95% CI: 0.894–0.915), a specificity of 0.923 (95% CI: 0.914–0.932), and an accuracy of 0.922 (95% CI: 0.913–0.931).

Figure 4 The ROC curve of the deep learning model for identifying aortic stenosis in the prospective test dataset. AS, aortic stenosis; AUC, area under the curve; CI, confidence interval; ROC, receiver operating characteristic.

Correlation analysis of DL and experienced echocardiographers

The results of correlation analysis between DL-derived AS metrics and the values measured by experienced echocardiographers are shown in Figure 5A-5F. The correlation between the metrics for grading AS severity obtained from DL and manual measurements was excellent. This included Vmax (r=0.94; P<0.001), MPG (r=0.91; P<0.001), LVOTd (r=0.81; P<0.001), AV VTI (r=0.94; P<0.001), and LVOT VTI (r=0.88; P<0.001). Additionally, the AVA calculated by the continuity equation also demonstrated an excellent correlation (r=0.87; P<0.001).

Figure 5 The correlation analysis between DL-derived aortic stenosis metrics and echocardiographer-measured values (A-F). (A) AV peak velocity (Vmax), (B) AV MPG, (C) LVOTd, (D) AV VTI, (E) LVOT VTI, and (F) AVA. AV, aortic valve; AVA, AV area; DL, deep learning; LVOT, left ventricular outflow tract; LVOTd, LVOT diameter; MPG, mean pressure gradient; VTI, velocity-time integral.

Comparison of AS measurements between human readers and the DL model

The mean measurements of each metric for grading AS severity, as assessed by the human readers, were comparable to those obtained by the DL architecture across all degrees of AS severity. There was no statistically significant difference in AS measurements between the human readers and the DL model (all P values >0.05). These results are presented in Table 2.

Table 2

Comparison of AS measurements between human readers and DL

Items Mild AS Moderate AS Severe AS
Human
   Vmax (m/s) 2.73±0.14 3.36±0.25 4.62±0.60
   AV MPG (mmHg) 14.36±2.37 21.62±4.22 41.33±14.04
   LVOT (cm) 1.91±0.25 1.97±0.26 1.99±0.23
   AV VTI (cm) 56.82±9.12 74.13±11.39 105.86±27.59
   LVOT VTI (cm) 25.72±9.90 22.27±7.16 23.57±9.19
   AVA (cm2) 2.11±0.54 1.17±0.12 0.72±0.35
DL
   Vmax (m/s) 2.72±0.13 3.31±0.27 4.43±0.64
   AV MPG (mmHg) 13.21±2.70 21.06±5.80 44.00±17.58
   LVOT (cm) 1.93±0.28 2.04±0.25 2.00±0.29
   AV VTI (cm) 50.27±9.28 65.13±11.56 96.72±29.03
   LVOT VTI (cm) 30.89±10.31 26.93±8.89 26.73±12.10
   AVA (cm2) 1.90±0.89 1.39±0.59 0.96±0.59

Data are presented as mean ± SD. P>0.05 for all comparisons between human and DL measurements (no statistically significant differences). AS, aortic stenosis; AV, aortic valve; AVA, aortic valve area; DL, deep learning; LVOT, left ventricular outflow tract; MPG, mean pressure gradient; SD, standard deviation; Vmax, peak velocity; VTI, velocity-time integral.

Bland-Altman plot analysis of the consistency between DL and experienced echocardiographers

Figure 6A-6F presents Bland-Altman plots illustrating the consistency between the DL algorithms and expert echocardiographers in terms of AS severity metrics. Across all evaluated metrics, the mean difference between the DL and human measurements demonstrated a high level of agreement and minimal bias.

Figure 6 Bland-Altman plots for the agreement between DL and human echocardiographic measurements (A-F). (A) AV peak velocity (Vmax), (B) AV MPG, (C) LVOTd, (D) AV VTI, (E) LVOT VTI, and (F) AVA. AV, aortic valve; AVA, AV area; DL, deep learning; LVOT, left ventricular outflow tract; LVOTd, LVOT diameter; MPG, mean pressure gradient; SD, standard deviation; VTI, velocity-time integral.

Multiview AS classification based on metrics

As summarized above, the correlation between the AS severity metrics determined by the DL model and those determined by echocardiographers was excellent. The ROC curves for grading AS based on these metrics are shown in Figure 7 and Table 3. The AUC and accuracy for severe AS were the highest, at 0.976 (95% CI: 0.953–1.000) and 0.994 (95% CI: 0.982–1.000), respectively. These values were significantly better than those for moderate AS, for which the AUC and accuracy were 0.907 (95% CI: 0.862–0.952) and 0.932 (95% CI: 0.893–0.971), respectively; meanwhile, for mild AS, the AUC and accuracy were 0.874 (95% CI: 0.823–0.926) and 0.907 (95% CI: 0.862–0.952), respectively.

Figure 7 The ROC curves of multiview aortic stenosis classification based on metrics. AS, aortic stenosis; AUC, area under the curve; CI, confidence interval; ROC, receiver operating characteristic.

Table 3

Multiview AS classification based on metrics

Disease (number) AUC (95% CI) Sensitivity (95% CI) Specificity (95% CI) Accuracy (95% CI)
Mild AS (n=74) 0.874 (0.823–0.926) 0.878 (0.828–0.929) 0.931 (0.892–0.970) 0.907 (0.862–0.952)
Moderate AS (n=66) 0.907 (0.862–0.952) 0.894 (0.846–0.942) 0.958 (0.927–0.989) 0.932 (0.893–0.971)
Severe AS (n=21) 0.976 (0.953–1.000) 0.952 (0.919–0.985) 1.000 (1.000–1.000) 0.994 (0.982–1.000)

AS, aortic stenosis; AUC, area under the curve; CI, confidence interval.

Single-view AS classification based on PLAX-2D video

To simplify the severity assessment, we designed an alternative method for AS grading based solely on the PLAX-2D view. The ROC curves for AS grading are shown in Figure 8 and Table 4. The AUCs for mild AS, moderate AS, and severe AS were similar, ranging from 0.869 to 0.920.

Figure 8 The ROC curves for single-view aortic stenosis classification based on two-dimensional PLAX-2D video. AUC, area under the curve; CI, confidence interval; DL, deep learning; PLAX-2D, parasternal long-axis-two-dimensional; ROC, receiver operating characteristic.

Table 4

Single-view AS classification based on PLAX-2D video

Disease (number) AUC (95% CI) Sensitivity (95% CI) Specificity (95% CI) Accuracy (95% CI)
Mild AS (n=74) 0.883 (0.834–0.933) 0.865 (0.812–0.918) 0.908 (0.863–0.953) 0.888 (0.840–0.937)
Moderate AS (n=66) 0.869 (0.817–0.921) 0.848 (0.793–0.904) 0.863 (0.810–0.916) 0.857 (0.803–0.911)
Severe AS (n=21) 0.920 (0.878–0.962) 0.857 (0.803–0.911) 0.986 (0.967–1.000) 0.969 (0.942–0.996)

AS, aortic stenosis; AUC, area under the curve; CI, confidence interval; PLAX-2D, two-dimensional parasternal long-axis.


Discussion

In this study, we developed and validated an AI-assisted system for the detection and grading of AS by using DL interpretation of multiple or single echocardiographic views. Our main findings indicate that the DL architecture can achieve high performance in detecting AS, with an AUC of 0.942 and an accuracy of 0.922. Moreover, there was an excellent correlation between DL-derived AS metrics and the values measured by experienced echocardiographers. For these metrics, which are used in the established guidelines for grading lesion severity, the AUCs for severe, moderate, and mild AS were 0.976, 0.907, and 0.874, respectively. Additionally, we developed a single-view model using DL interpretation of a PLAX-2D view, and the AUCs for severe, moderate, and mild AS were 0.920, 0.869, and 0.883, respectively.

DL techniques for analyzing echocardiographic studies have undergone considerable progress. For example, Zhang et al. introduced an automated algorithm for the identification of echocardiographic views, segmentation of images, measurements of cardiac structures and functions, and diagnosis of specific conditions (9). Ouyang et al. developed algorithms that provide automated beat-to-beat measurement of LV EF and evaluation of LV volume variations throughout the cardiac cycle (16). Huang et al. developed deep neural networks for automating the detection of abnormalities in regional myocardial wall movement based on echocardiographic videos (17). However, few studies have included DL algorithms for grading AS severity based on echocardiographic videos. Gaillard et al. reported using traditional image-processing techniques for automatic analysis of CW signals for the assessment of AS (18). Sengupta et al. reported that a machine-learning framework can identify the high-risk AS phenotype based on echocardiographic measurements (19), and Strange et al. developed an AI decision-support algorithm that uses routine echocardiographic measurements to identify severe AS phenotypes associated with high mortality (20). In our previous study, we developed a DL framework for the automatic screening of echocardiographic videos for mitral stenosis (MS), mitral regurgitation (MR), AS, and aortic regurgitation (AR). The disease classification accuracy was high, with AUCs of 0.99 for MS, 0.88 for MR, 0.97 for AS, and 0.90 for AR in the prospective test dataset (21). We also quantified key metrics of disease severity, such as peak velocity and MPG, for grading the severity of AS. However, we did not quantify the AVA, which is the most critical metric used for grading AS severity.

In this study, we further attempted to quantify the AVA derived through the continuity equation using DL algorithms. We found that the DL-derived AS metrics closely matched those measured by human experts. The correlation of metrics for grading AS severity between DL and manual measurements was excellent, including for Vmax (r=0.94; P<0.001), MPG (r=0.91; P<0.001), LVOTd (r=0.81, P<0.001), AV VTI (r=0.94, P<0.001), LVOT VTI (r=0.88, P<0.001), and AVA (r=0.87, P<0.001). These findings are consistent with those reported by Krishna et al., who found that the Us2.ai algorithm closely matched expert human measurements for AV Vmax (r=0.97; P<0.001), MPG (r=0.94; P<0.001), and AVA (r=0.88; P<0.001) across normal AVs and all grades of AS severity (22). However, their study did not verify the accuracy of grading AS severity based on these metrics. In our study, the AUC and accuracy were, respectively, 0.976 (95% CI: 0.953–1.000) and 0.994 (95% CI: 0.982–1.000) for severe AS, 0.907 (95% CI: 0.862–0.952) and 0.932 (95% CI: 0.893–0.971) for moderate AS, and 0.874 (95% CI: 0.823–0.926) and 0.907 (95% CI: 0.862–0.952) for mild AS.

The relevant guidelines recommend grading AS severity according to multiple parameter measurements, which is highly time-consuming. As the application of point-of-care ultrasonography grows, there is an increasing need for operator-friendly diagnostic tools. These tools, based on individual 2D echocardiographic views, can enhance AS screening efficiency, especially for those with limited experience, through streamlined protocols. Holste et al. developed an automated algorithm to screen for the presence of severe AS based on a single-view 2D transthoracic echocardiography (TTE) video; the algorithm demonstrated excellent performance, with an AUC of 0.92–0.98, a sensitivity of 0.85, and a specificity of 0.96 (23). In our study, we also developed a “black box” model using DL interpretation of a PLAX-2D view to grade AS severity. The AUC and accuracy were, respectively, 0.920 (95% CI: 0.878–0.962) and 0.969 (95% CI: 0.942–0.996) for severe AS, 0.869 (95% CI: 0.817–0.921) and 0.857 (95% CI: 0.803–0.911) for moderate AS, and 0.883 (95% CI: 0.834–0.933) and 0.888 (95% CI: 0.840–0.937) for mild AS. Our study demonstrated that the performance of AS classification based on metrics was superior to that based on the PLAX-2D view in real medical practice.

Limitations

Several limitations to this study should be acknowledged. First, the retrospective dataset was sourced from three hospitals and may not fully represent the general population. Although our test set was derived from a prospective, real-world dataset of consecutively collected echocardiographic studies, the number of AS cases was relatively small due to the low prevalence rate of this disease in the prospective cohort. Second, the AVA was calculated using the continuity equation, rather than being directly measured with a 2D method. This choice is due to the significant interobserver variability associated with the 2D method among operators, a challenge that is particularly pronounced for AI. Third, we excluded low-quality images because when the AI encounters poor-quality images in clinical practice, there may be a risk of interpretation errors, necessitating further manual verification.


Conclusions

Our study developed and evaluated a novel DL framework, demonstrating its effectiveness for detecting and quantifying the severity of AS. The DL algorithm closely matched expert human measurements of all relevant metrics in the assessment of AS severity. In clinical practice, the multiview model with Doppler is crucial for accurate diagnosis in complex cases and formal evaluations, while the single-view model, despite its limitations, is valuable for large-scale screening and rapid emergency assessment. Future research will focus on optimizing and integrating these models to better meet clinical needs.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the CLEAR reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-415/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-415/dss

Funding: This work was supported by the National Natural Science Foundation of China (No. 82202265 to F.Y.).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-415/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Research Ethics Committee of Chinese PLA General Hospital, and informed consent was obtained from all patients.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Nkomo VT, Gardin JM, Skelton TN, Gottdiener JS, Scott CG, Enriquez-Sarano M. Burden of valvular heart diseases: a population-based study. Lancet 2006;368:1005-11. [Crossref] [PubMed]
  2. Eveborn GW, Schirmer H, Heggelund G, Lunde P, Rasmussen K. The evolving epidemiology of valvular aortic stenosis. the Tromsø study. Heart 2013;99:396-400. [Crossref] [PubMed]
  3. Thaden JJ, Nkomo VT, Enriquez-Sarano M. The global burden of aortic stenosis. Prog Cardiovasc Dis 2014;56:565-71. [Crossref] [PubMed]
  4. Zakkar M, Bryan AJ, Angelini GD. Aortic stenosis: diagnosis and management. BMJ 2016;355:i5425. [Crossref] [PubMed]
  5. Vahanian A, Beyersdorf F, Praz F, Milojevic M, Baldus S, Bauersachs J, et al. 2021 ESC/EACTS Guidelines for the management of valvular heart disease. Eur Heart J 2022;43:561-632. Erratum in: Eur Heart J 2022;43:2022. [Crossref] [PubMed]
  6. Sermesant M, Delingette H, Cochet H, Jaïs P, Ayache N. Applications of artificial intelligence in cardiovascular imaging. Nat Rev Cardiol 2021;18:600-9. [Crossref] [PubMed]
  7. Litjens G, Ciompi F, Wolterink JM, de Vos BD, Leiner T, Teuwen J, Išgum I. State-of-the-Art Deep Learning in Cardiovascular Image Analysis. JACC Cardiovasc Imaging 2019;12:1549-65. [Crossref] [PubMed]
  8. Ghorbani A, Ouyang D, Abid A, He B, Chen JH, Harrington RA, Liang DH, Ashley EA, Zou JY. Deep learning interpretation of echocardiograms. NPJ Digit Med 2020;3:10. [Crossref] [PubMed]
  9. Zhang J, Gajjala S, Agrawal P, Tison GH, Hallock LA, Beussink-Nelson L, Lassen MH, Fan E, Aras MA, Jordan C, Fleischmann KE, Melisko M, Qasim A, Shah SJ, Bajcsy R, Deo RC. Fully Automated Echocardiogram Interpretation in Clinical Practice. Circulation 2018;138:1623-35. [Crossref] [PubMed]
  10. Chen X, Yang F, Zhang P, Lin X, Wang W, Pu H, Chen X, Chen Y, Yu L, Deng Y, Liu B, Bai Y, Burkhoff D, He K. Artificial Intelligence-Assisted Left Ventricular Diastolic Function Assessment and Grading: Multiview Versus Single View. J Am Soc Echocardiogr 2023;36:1064-78. [Crossref] [PubMed]
  11. Wang J, Xie W, Cheng M, Wu Q, Wang F, Li P, Fan B, Zhang X, Wang B, Liu X. Assessment of Transcatheter or Surgical Closure of Atrial Septal Defect using Interpretable Deep Keypoint Stadiometry. Research (Wash D C) 2022;2022:9790653. [Crossref] [PubMed]
  12. Chollet F, editor. Xception: Deep Learning with Depthwise Separable Convolutions. Conference on Computer Vision and Pattern Recognition (CVPR); 2017:1251-8.
  13. Xie S, Sun C, Huang J, Tu Z, Murphy K, editors. Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification. European Conference on Computer Vision (ECCV); 2018:305-21.
  14. Ronneberger O, Fischer P, Brox T, editors. U-Net: Convolutional Networks for Biomedical Image Segmentation. International Conference on Medical image computing and computer-assisted intervention. Cham: Springer; 2015.
  15. Li X, Ding L, Li W, Fang C, editors. FPGA accelerates deep residual learning for image recognition. 2017 IEEE 2nd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC); Chengdu, China. 2017:837-40.
  16. Ouyang D, He B, Ghorbani A, Yuan N, Ebinger J, Langlotz CP, Heidenreich PA, Harrington RA, Liang DH, Ashley EA, Zou JY. Video-based AI for beat-to-beat assessment of cardiac function. Nature 2020;580:252-6. [Crossref] [PubMed]
  17. Huang MS, Wang CS, Chiang JH, Liu PY, Tsai WC. Automated Recognition of Regional Wall Motion Abnormalities Through Deep Neural Network Interpretation of Transthoracic Echocardiography. Circulation 2020;142:1510-20. [Crossref] [PubMed]
  18. Gaillard E, Kadem L, Clavel MA, Pibarot P, Durand LG. Optimization of Doppler echocardiographic velocity measurements using an automatic contour detection method. Ultrasound Med Biol 2010;36:1513-24. [Crossref] [PubMed]
  19. Sengupta PP, Shrestha S, Kagiyama N, Hamirani Y, Kulkarni H, Yanamala N, Bing R, Chin CWL, Pawade TA, Messika-Zeitoun D, Tastet L, Shen M, Newby DE, Clavel MA, Pibarot P, Dweck MR; Artificial Intelligence for Aortic Stenosis at Risk International Consortium. A Machine-Learning Framework to Identify Distinct Phenotypes of Aortic Stenosis Severity. JACC Cardiovasc Imaging 2021;14:1707-20. [Crossref] [PubMed]
  20. Strange G, Stewart S, Watts A, Playford D. Enhanced detection of severe aortic stenosis via artificial intelligence: a clinical cohort study. Open Heart 2023;10:e002265. [Crossref] [PubMed]
  21. Yang F, Chen X, Lin X, Chen X, Wang W, Liu B, et al. Automated Analysis of Doppler Echocardiographic Videos as a Screening Tool for Valvular Heart Diseases. JACC Cardiovasc Imaging 2022;15:551-63. [Crossref] [PubMed]
  22. Krishna H, Desai K, Slostad B, Bhayani S, Arnold JH, Ouwerkerk W, Hummel Y, Lam CSP, Ezekowitz J, Frost M, Jiang Z, Equilbec C, Twing A, Pellikka PA, Frazin L, Kansal M. Fully Automated Artificial Intelligence Assessment of Aortic Stenosis by Echocardiography. J Am Soc Echocardiogr 2023;36:769-77. [Crossref] [PubMed]
  23. Holste G, Oikonomou EK, Mortazavi BJ, Coppi A, Faridi KF, Miller EJ, Forrest JK, McNamara RL, Ohno-Machado L, Yuan N, Gupta A, Ouyang D, Krumholz HM, Wang Z, Khera R. Severe aortic stenosis detection by deep learning applied to echocardiography. Eur Heart J 2023;44:4592-604. [Crossref] [PubMed]
Cite this article as: Yang F, Zhang Y, Gao Y, Wang B, Lin X, Chen X, Wang Q, Zhang M, Li X, Liu B, Zhang P, He K, Zhang L. Deep learning-assisted aortic stenosis detection and grading based on multiview versus single-view echocardiography. Quant Imaging Med Surg 2025;15(11):11192-11204. doi: 10.21037/qims-2025-415
