Formal validation of a deep learning-based automated interpretation system for cardiac structure and function in adult echocardiography

Guijuan Peng; Rongbo Ling; Xiaohua Liu; Qian Liu; Xiaofang Zhong; Yuanyuan Sheng; Yingqi Zheng; Shuyu Luo; Yumei Yang; Xiaoxuan Lin; Keming Tang; Jialan Zheng; Lixin Chen; Dong Ni; Jinfeng Xu; Yingying Liu; Wufeng Xue

doi:10.21037/qims-24-1852

Original Article


Guijuan Peng¹#, Rongbo Ling²,³,⁴#, Xiaohua Liu¹#, Qian Liu¹, Xiaofang Zhong¹, Yuanyuan Sheng¹, Yingqi Zheng¹, Shuyu Luo¹, Yumei Yang¹, Xiaoxuan Lin¹, Keming Tang²,³,⁴, Jialan Zheng²,³,⁴, Lixin Chen¹, Dong Ni²,³,⁴, Jinfeng Xu¹, Yingying Liu¹, Wufeng Xue²,³,⁴

¹Department of Ultrasound, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University, The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, China; ²The National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; ³Medical Ultrasound Image Computing (MUSIC) Laboratory, Shenzhen University, Shenzhen, China; ⁴Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China

Contributions: (I) Conception and design: Y Liu, G Peng, X Liu; (II) Administrative support: J Xu, W Xue, D Ni; (III) Provision of study materials or patients: G Peng, X Liu, Q Liu, X Zhong, Y Sheng, S Luo; (IV) Collection and assembly of data: Y Zheng, Y Yang, X Lin, J Zheng; (V) Data analysis and interpretation: R Ling, K Tang, L Chen; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Jinfeng Xu, PhD; Yingying Liu, PhD. Department of Ultrasound, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University, The First Affiliated Hospital, Southern University of Science and Technology), 1017 Dongmen North Road, Luohu District, Shenzhen 518020, China. Email: jinfengxu@ext.jnu.edu.cn; yingyingliu@ext.jnu.edu.cn; Wufeng Xue, PhD. The National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; Medical Ultrasound Image Computing (MUSIC) Laboratory, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, A5-502, Lihu Campus Shenzhen University, Xueyuan Road, Nanshan, Shenzhen 518037, China. Email: xuewf@szu.edu.cn.

Background: Accurate measurement of cardiac structure and function is the basis for diagnosing cardiac diseases, but it is time-consuming and experience-dependent. This study aimed to develop and validate a deep learning (DL)-based system for the automated interpretation of cardiac structure and function.

Methods: The training dataset consisted of 416 video loops and 892 Doppler images from 141 patients who underwent clinical echocardiography between 2020 and 2021. Two experts labeled these images using the Pair platform, and the resulting annotations were used to train two DL algorithms, Auto-Echo and Auto-Doppler, to measure echocardiographic parameters. Subsequently, eight sonographers with different levels of echocardiographic experience labeled an internal validation dataset of 178 new video loops and 391 Doppler images from 60 new patients. One highly trained expert annotated an external validation dataset of 90 two-dimensional (2D) videos and 120 Doppler images. The standard deviation ratio (SD ratio), Bland-Altman analysis, intraclass correlation coefficient (ICC), mean absolute deviation (MAD), absolute relative deviation, and correlation analysis were used to assess the agreement between DL and human experts.

Results: For structural parameters, including four-chamber dimensions, the SD ratios ranged from 0.70 to 1.02, and the ICCs showed that automated measurements were equivalent or superior to human expert measurements. The correlation coefficients were greater than 0.85 for 83.3% of the parameters, the MADs ranged from −1.5 to 1.9 mm, and the absolute relative deviations ranged from 2.5% to 9.7%. However, large absolute deviations were observed for parameters measured in the RV-A4C view and for RV parameters, consistent with the pattern seen among human readers. For Doppler parameters, including four transvalvular velocity measurements, the correlation coefficients ranged from 0.81 to 0.99, the absolute relative deviation of all pulsed-wave Doppler parameters was within 10%, and 100% (9/9) of tissue Doppler parameters were within 5%. However, the velocity-time integral (VTI) of transvalvular velocity showed large absolute relative deviations between the automated and manual measurements. Compared with human experts, Auto-Echo reduced analysis time by 95.4% and Auto-Doppler by 82.5%. In the external validation cohort, the mean absolute relative deviation was within 10% for almost all structural and Doppler parameters.

Conclusions: Our DL interpretation system measured cardiac structure and function with high accuracy, improved examination efficiency, and produced measurements interchangeable with human experts’ assessments. It also showed human-like measurement patterns: the same trends in measurement differences were observed between DL and readers with different levels of experience.

Keywords: Deep learning (DL); echocardiography; cardiac chamber; quantification; pulsed-wave Doppler

Submitted Sep 02, 2024. Accepted for publication Feb 21, 2025. Published online Mar 28, 2025.


Introduction

The quantification of cardiac chamber size and function is pivotal for the diagnosis and prognosis of cardiac diseases, especially valvular heart diseases (1) and cardiomyopathy (2,3), and for transcatheter aortic valve implantation/replacement (TAVI/TAVR) (4). Because it can noninvasively assess cardiac structure, function, and hemodynamics, echocardiography is the first-line imaging modality for diagnosing cardiac disease. However, the rising burden of cardiovascular disease has led to a massive surge in the number of examinations performed, with millions of echocardiographic clips and images to be processed (5). This strains the limited workforce of qualified physicians who must provide accurate measurements in a timely manner. Moreover, the quantitative evaluation of echocardiography requires extensive expertise and training, which hinders its widespread use in primary care and rural settings.

Quantitative evaluation of cardiac structure and function requires experts to measure end-systolic or end-diastolic dimensions and to trace the contours of pulsed-wave Doppler waveforms. This practice is tedious and time-consuming for routine use (6). Accurate and reliable comprehensive measurements depend on operator experience, which can result in significant inter- and intra-observer variability (7,8). To reduce this variability, it is recommended that anatomical and spectral parameters be measured over more than one cardiac cycle in patients with normal sinus rhythm and over at least five cardiac cycles in atrial fibrillation (4,9), which compounds the above-mentioned challenges. Furthermore, the easy accessibility of ultrasound has expanded its application to primary medical institutions and emergency settings, where physicians who may not be sufficiently educated and trained are required to perform and interpret the examinations, which can result in misdiagnoses. Thus, obtaining accurate and timely measurements is becoming a major challenge.

Recent advances in automated image interpretation span the whole pipeline of echocardiographic imaging and diagnosis, including image acquisition (10), view classification (11,12), structure segmentation (13-15), functional assessment and quantification (14-18), disease diagnosis (13,15,19-21), and prediction (22). However, these attempts either quantify only a limited set of cardiac structural parameters and systolic function or require manual identification of keyframes first (15-17). Therefore, a fully automated workflow that comprehensively assesses cardiac structure and function remains a crucial unmet need.

Accordingly, we developed a fully automated solution to assess cardiac structure and function based on multiple standard echocardiographic videos and Doppler images. We used automatic annotators based on convolutional neural networks (CNNs) to extract clinically relevant frames (the end-systolic and end-diastolic keyframes) and to quantify cardiac structural dimensions (such as the four chamber sizes and valvular annulus diameters), function-related parameters (such as the mitral inflow E wave, the tissue Doppler e’ wave of the mitral annulus, and the tissue Doppler s wave of the tricuspid annulus), and hemodynamic information [such as the peak transvalvular velocity and the velocity-time integral (VTI) of transvalvular velocity]. This study aimed to develop such an algorithm for quantifying cardiac structural dimensions and cardiac function and to validate the performance of its measurement predictions (Figure 1: central illustration). We present this article in accordance with the PRIME reporting checklist (23) (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1852/rc).

Figure 1 Central illustration. Formal validation of a deep learning-based automated interpretation of cardiac structure and function parameters in adult echocardiography. Model-derived echocardiographic measurements were interchangeable with human experts’ assessments, and the model showed human-like measurement patterns. AI, artificial intelligence; Ao-a, the diameter of the aortic annulus; Ao-s, the diameter of aortic sinus; LA-ap, the anteroposterior dimension of left atrium; LA-l, the long-axis dimension of the left atrium; LA-t, the transverse dimension of the left atrium; LV-ap, the anteroposterior dimension of left ventricle; LVID base-t, the basal transverse dimension of the left ventricle; LVID middle-t, the mid-transverse dimension of the left ventricle; LVID-l, the long-axis dimension of the left ventricle; MV-ap, anteroposterior dimension of the mitral annulus; MV-t, transverse dimension of the mitral annulus; RA-l, the long-axis dimension of the right atrium; RA-t, the transverse dimension of the right atrium; RV base-t, the basal transverse dimension of the right ventricle; RV middle-t, the mid-transverse dimension of the right ventricle; RV-l, the long-axis dimension of the right ventricle; RV-t, the transverse dimension of the right ventricle; RVOT, right ventricular outflow tract; GCN, graph convolutional network; SD, standard deviation.

Methods

Study participants and image acquisition

The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). Ethical approval was granted by the Ethics Board of Shenzhen People’s Hospital (No. LL-KY-2023103-01) for using the deidentified echocardiographic and patient demographic data, and the requirement for individual consent for this retrospective analysis was waived.

The dataset consisted of 201 cases selected from 302 cases with standard echocardiographic videos and a synchronous electrocardiogram (ECG) acquired at Shenzhen People’s Hospital between November 2020 and September 2021. Patients aged <18 years (15 patients) and those with arrhythmia (32 patients), no ECG (27 patients), incomplete videos (10 patients), or poor image quality (two or more myocardial segments not clearly visualized; 17 patients) were excluded. Echocardiography was performed using a Vivid E95 color Doppler ultrasound system (GE Healthcare, Chicago, IL, USA) with an M5Sc cardiac probe (frequency 1.5–4.6 MHz) connected to the synchronous ECG. The frame rate of two-dimensional (2D) videos was set between 51 and 70 frames/second, and four consecutive cardiac cycles were collected. Finally, 594 standard 2D echocardiographic videos and 1,283 Doppler images were included. The artificial intelligence (AI) model was trained and validated on this dataset.

An external validation dataset was established comprising 30 consecutive echocardiographic studies obtained at Shenzhen People’s Hospital between 1 January 2022 and 31 January 2022. Echocardiography was performed using a Philips EPIQ 7C system (Philips Ultrasound, Bothell, WA, USA) with an S5-1 cardiac probe (frequency 1–5 MHz) connected to the synchronous ECG.

The following 2D video loops were included: (I) parasternal long-axis view (PLAX); (II) apical four-chamber view (A4C); and (III) focused right ventricular apical four-chamber view (RV-A4C). The following Doppler images were included: (I) aortic valve-pulse wave (AVPW); (II) pulmonary valve-pulse wave (PVPW); (III) mitral valve-pulse wave (MVPW); (IV) tricuspid valve-pulse wave (TVPW); (V) the septal side of the mitral annulus by pulsed-wave Doppler tissue imaging (MVS-DTI); (VI) the lateral side of the mitral annulus by pulsed-wave Doppler tissue imaging (MVL-DTI); and (VII) the tricuspid annulus by pulsed-wave Doppler tissue imaging (TVL-DTI). A sample echocardiographic image is shown in Figure S1. Doppler tissue images (DTIs) were not included in the external validation dataset.

Data annotation

A total of 141 patients were randomly selected from the 201 cases, and their 416 2D videos and 892 Doppler images formed the training dataset. The 178 2D videos and 391 Doppler images of the remaining 60 patients formed the internal validation dataset. The external validation dataset comprised 90 2D videos and 120 Doppler images. The validation datasets were not used in neural network training. The baseline characteristics of the training and validation datasets are shown in Table S1.

The original Digital Imaging and Communications in Medicine (DICOM) images were converted into AVI and PNG formats for annotation, and the public data processing and annotation platform Pair (https://aipair.com.cn/en/) was used to label the measurement-related structures and landmarks manually. All parameters were measured strictly in accordance with relevant guidelines and specifications (4,9,21,24-26), as shown in Tables S2,S3. Structural parameters were annotated in both the end-diastole (ED) and end-systole (ES) frames of one complete cardiac cycle, and the ED and ES keyframes themselves were also annotated. For Doppler modalities, velocity traces and view-specific annotations were completed. The training dataset was annotated by two highly trained experts with more than 15 years of experience. The internal validation dataset was annotated by each of eight readers with different levels of echocardiographic experience: two experts with more than 15 years of experience, three moderately experienced readers with less than eight years, and three less experienced readers with less than two years. The external validation dataset was annotated by one highly trained expert.

The deep learning (DL) workflow

The DL workflow was a supervised learning framework comprising Auto-Echo and Auto-Doppler. Auto-Echo was trained to automatically measure structure-related parameters in every frame of the whole sequence and then to identify the clinically relevant frames in the sequence. Auto-Doppler was trained to automatically measure cardiac function and hemodynamics. The system network is shown in Figure 2.

Figure 2 System network. The CNNs were trained on the training dataset of 416 2D videos and 892 Doppler images. Finally, we assessed its performance on a new internal validation dataset of 178 2D videos and 391 Doppler images labeled by 8 doctors with varying experience, and an external validation dataset of 90 2D videos and 120 Doppler images labeled by one highly trained expert. 2D, two-dimensional; CNNs, convolutional neural networks; ECG, electrocardiogram; PW-DTI, pulsed-wave Doppler tissue imaging.

The Auto-Echo

Landmark localization and keyframe identification are pivotal for the automated measurement of cardiac structure. Because echocardiograms often have low resolution and poor image quality, CNNs that localize landmarks by predicting heatmaps may produce ambiguous points and inaccurate localization. We therefore proposed to leverage the topological representation of landmarks by combining the CNN with a graph convolutional network (GCN), as shown in Figure 3. A five-layer CNN encoder extracted multiscale image features, which were used to reduce the prediction errors recurrently from scale to scale. A shared GCN module was designed to learn a topology-aware graph representation from the previous scale’s graph representation and the current scale’s CNN features. A shared fully connected layer then regressed the coordinate offsets at each scale sequentially, which decreased the landmark localization error.
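The coarse-to-fine refinement described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the nearest-neighbour feature sampling, the landmark graph (`edges`), the single GCN layer per scale, and the random weights are all hypothetical. Only the overall structure follows the text: a shared GCN over a landmark graph, fed by the previous graph state and the current scale's CNN features, followed by a shared linear layer that regresses coordinate offsets, starting from the mean training landmark positions.

```python
import numpy as np


def normalized_adjacency(edges, n_nodes):
    """Symmetrically normalized adjacency D^-1/2 (A + I) D^-1/2 of a standard GCN layer."""
    a = np.eye(n_nodes)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sample_features(feature_map, coords):
    """Nearest-neighbour lookup of CNN features at normalized (x, y) coordinates in [0, 1]."""
    h, w, _ = feature_map.shape
    xs = np.clip(np.rint(coords[:, 0] * (w - 1)).astype(int), 0, w - 1)
    ys = np.clip(np.rint(coords[:, 1] * (h - 1)).astype(int), 0, h - 1)
    return feature_map[ys, xs]


def refine_landmarks(feature_maps, init_coords, edges, gcn_weight, fc_weight):
    """Coarse-to-fine landmark refinement with weights shared across scales.

    feature_maps: list of (H_s, W_s, C) arrays ordered coarse to fine (hypothetical encoder output).
    init_coords:  (N, 2) starting landmarks, e.g. the mean training label.
    gcn_weight:   (hidden + C, hidden) shared GCN weight; fc_weight: (hidden, 2) shared regressor.
    """
    n_nodes = init_coords.shape[0]
    a_hat = normalized_adjacency(edges, n_nodes)
    coords = init_coords.astype(float).copy()
    hidden = np.zeros((n_nodes, gcn_weight.shape[1]))  # graph state carried between scales
    for fmap in feature_maps:
        node_input = np.concatenate([hidden, sample_features(fmap, coords)], axis=1)
        hidden = np.maximum(a_hat @ node_input @ gcn_weight, 0.0)  # ReLU(A_hat X W)
        coords = coords + hidden @ fc_weight  # regress (dx, dy) offsets at this scale
    return coords


# Hypothetical toy setup: random features and weights stand in for a trained encoder/GCN.
rng = np.random.default_rng(0)
channels, hidden_dim, n_landmarks = 8, 16, 4
maps = [rng.standard_normal((s, s, channels)) for s in (8, 16, 32)]
start = np.full((n_landmarks, 2), 0.5)
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
refined = refine_landmarks(
    maps, start, ring,
    0.1 * rng.standard_normal((hidden_dim + channels, hidden_dim)),
    0.01 * rng.standard_normal((hidden_dim, 2)),
)
print(refined.shape)  # (4, 2)
```

Sharing the GCN and regressor weights across scales lets one module correct whatever residual error remains after each coarser scale, which is the sense in which the error is reduced "recurrently".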

Figure 3 Overview of the landmark localization framework for Auto-Echo. The CNN encoder (blue) was used to extract multiscale image features. The topology graph representation of the landmarks was learned by the GCN module. The fully connected layer (green) regresses the landmark coordinate offsets. The initial landmark coordinates (x_0, y_0) are the mean of the training labels. CNN, convolutional neural network; GCN, graph convolutional network.

We developed a program to identify the keyframes, namely the ED and ES frames. To improve accuracy, we proposed exploiting motion curves derived from the predicted landmarks together with the ECG signal. The workflow of keyframe identification is shown in Figure 4. First, the full echocardiographic sequence was passed through the landmark localization network to obtain measurement curves. Then, these measurement curves were fused into a single cardiac motion curve according to the cardiac motion, and the positions of the extrema on this curve were identified as the keyframes (maxima, ED; minima, ES). Meanwhile, we extracted the ECG trace from the echocardiographic images and detected the R-wave peaks to assist end-diastole detection.
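A minimal NumPy sketch of the extremum-based keyframe rule, under stated assumptions: the fusion step (z-scoring and averaging the measurement curves), the sliding-window extremum search, and the rule that snaps each ED estimate to an ECG R peak within a few frames are hypothetical stand-ins. The text states only that the curves are fused, that extrema mark ED (maxima) and ES (minima), and that the R-wave peak assists ED detection. R-peak positions are assumed to be already converted to frame indices.

```python
import numpy as np


def fuse_motion_curve(measurement_curves):
    """Hypothetical fusion: z-score each per-frame measurement curve, then average them."""
    curves = np.asarray(measurement_curves, dtype=float)
    mean = curves.mean(axis=1, keepdims=True)
    std = curves.std(axis=1, keepdims=True) + 1e-8  # guard against flat curves
    return ((curves - mean) / std).mean(axis=0)


def find_extrema(curve, window, kind="max"):
    """Frames that are the max (or min) within +/- `window` frames, kept at least `window` apart."""
    picked = []
    for i in range(len(curve)):
        segment = curve[max(0, i - window): i + window + 1]
        target = segment.max() if kind == "max" else segment.min()
        if curve[i] == target and (not picked or i - picked[-1] > window):
            picked.append(i)
    return picked


def snap_ed_to_r_peaks(ed_frames, r_peak_frames, tolerance):
    """Hypothetical ECG assist: move each ED frame to the nearest R peak if within `tolerance` frames."""
    snapped = []
    for frame in ed_frames:
        nearest = min(r_peak_frames, key=lambda r: abs(r - frame), default=None)
        use_r = nearest is not None and abs(nearest - frame) <= tolerance
        snapped.append(nearest if use_r else frame)
    return snapped


# Synthetic example: two cycles of 56 frames; dimension curves peak at ED and dip at ES.
t = np.arange(113)
lv_length = 80 + 6 * np.cos(2 * np.pi * t / 56)
lv_width = 45 + 3 * np.cos(2 * np.pi * t / 56)
motion = fuse_motion_curve([lv_length, lv_width])
ed = find_extrema(motion, window=10, kind="max")  # [0, 56, 112]
es = find_extrema(motion, window=10, kind="min")  # [28, 84]
print(ed, es, snap_ed_to_r_peaks(ed, [1, 57, 111], tolerance=3))
```

Z-scoring before averaging keeps a large dimension (e.g. LV length) from dominating a small one, so all curves contribute to the timing of the extrema; the window should be shorter than half a cardiac cycle so that one ED and one ES are found per beat.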

Figure 4 Overview of the keyframe detection workflow. The measurement curves were obtained by the landmark localization network. The ECG signal was extracted from the ultrasound image using image-processing methods. The red points on the motion curve indicate the detected keyframes. ECG, electrocardiogram.

With landmark localization and keyframe detection, a fully automatic quantification system of the cardiac structure from the whole echocardiographic sequence can be realized without any manual assistance.

The Auto-Doppler

For measurements on the pulsed-wave (PW) Doppler images, the classical UNET was first used to segment the contours of the transvalvular spectrum, and the relevant parameters were then calculated automatically from these contours. In the segmentation phase, the spectrum of each cardiac cycle (the region of interest, ROI) was first cropped according to the periodic information of the ECG displayed below the spectral image. From the segmentation results, information such as the vertex position and the area of the waveform can be obtained, from which parameters such as the peak transvalvular velocity (V_max), mean transvalvular velocity (V_mean), and VTI can be calculated automatically.
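The step from a segmented waveform to V_max, VTI, and V_mean can be illustrated with a short NumPy sketch. It assumes a hypothetical binary mask for one cardiac cycle (rows = velocity bins, columns = time), a known baseline row, and known pixel calibrations (cm/s per row, seconds per column) that in practice would be read from the image scale; cropping between consecutive R peaks is a simplified stand-in for the ECG-based ROI cropping. VTI is taken as the rectangle-rule integral of the velocity envelope over time, and V_mean as VTI divided by the flow duration.

```python
import numpy as np


def crop_cycles(spectrum, r_peak_columns):
    """Split a spectral image into per-cycle ROIs between consecutive ECG R peaks (columns)."""
    return [spectrum[:, a:b] for a, b in zip(r_peak_columns[:-1], r_peak_columns[1:])]


def pw_doppler_parameters(mask, baseline_row, cm_s_per_row, s_per_column):
    """Vmax (cm/s), VTI (cm), and Vmean (cm/s) from a segmented waveform mask of one cycle."""
    rows = np.arange(mask.shape[0])[:, None]
    distance = np.where(mask, np.abs(rows - baseline_row), 0)  # rows from baseline
    envelope = distance.max(axis=0) * cm_s_per_row  # outer-edge velocity per time column
    flow_columns = envelope > 0
    vti = envelope.sum() * s_per_column  # integral of velocity over time
    duration = flow_columns.sum() * s_per_column  # e.g. ejection time for AVPW
    return {
        "Vmax": float(envelope.max()),
        "VTI": float(vti),
        "Vmean": float(vti / duration) if duration > 0 else 0.0,
        "duration_s": float(duration),
    }


# Synthetic triangular jet below the baseline (row 0 = baseline in this toy ROI);
# the calibration values are hypothetical.
mask = np.zeros((50, 10), dtype=bool)
for column, height in zip(range(2, 7), [10, 20, 30, 20, 10]):
    mask[1:height + 1, column] = True
params = pw_doppler_parameters(mask, baseline_row=0, cm_s_per_row=2.0, s_per_column=0.01)
print(params)  # approximately: Vmax 60 cm/s, VTI 1.8 cm, Vmean 36 cm/s over 0.05 s
print(len(crop_cycles(np.zeros((50, 300)), [10, 110, 210])))  # 2 complete cycles
```

Because VTI integrates the envelope over every column, small contour errors accumulate across the whole ejection or filling period, whereas V_max depends on a single column; this is one plausible reason why VTI shows larger relative deviations than peak velocities.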

For measurements on the DTIs, the ROIs were first obtained by the same cropping procedure. The waveform was then identified by thresholding-based artifact suppression, edge extraction, and area filling, and the peak velocity was measured automatically from these waveforms. The overall model framework is shown in Figure 5.
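A minimal NumPy sketch of the DTI steps named above (thresholding, edge extraction, area filling, peak measurement), assuming a grayscale ROI for one cardiac cycle, a known baseline row, and a known velocity scale. The threshold value and the "outermost bright pixel per column" edge rule are illustrative choices, not the published parameters. Waveforms above and below the baseline are handled separately through the hypothetical `side` argument.

```python
import numpy as np


def dti_peak_velocity(roi, baseline_row, threshold, cm_s_per_row, side="above"):
    """Peak tissue velocity on one side of the baseline of a grayscale DTI ROI.

    Thresholding suppresses dim speckle/artifacts; the outermost bright pixel in each
    column is taken as the waveform edge; the filled area spans baseline to edge.
    """
    bright = np.asarray(roi) >= threshold
    if side == "above":
        region = bright[:baseline_row][::-1]  # region row k lies k + 1 rows above baseline
    else:
        region = bright[baseline_row + 1:]  # region row k lies k + 1 rows below baseline
    n_rows = region.shape[0]
    has_signal = region.any(axis=0)
    # Edge extraction: distance (in rows) of the outermost bright pixel in each column.
    edge = np.where(has_signal, n_rows - np.argmax(region[::-1], axis=0), 0)
    # Area filling: every pixel between the baseline and the edge belongs to the waveform.
    filled = np.arange(1, n_rows + 1)[:, None] <= edge[None, :]
    return float(edge.max() * cm_s_per_row), edge, filled


roi = np.zeros((40, 6))
roi[10:20, 2] = 200  # bright waveform 10 rows above the baseline at row 20
roi[15:20, 3] = 200  # bright waveform 5 rows above the baseline
roi[2, 0] = 30  # dim speckle, removed by the threshold
peak, edge, filled = dti_peak_velocity(roi, baseline_row=20, threshold=100, cm_s_per_row=1.5)
print(peak, edge.tolist())  # 15.0 [0, 0, 10, 5, 0, 0]
```

A single global threshold only removes artifacts dimmer than the waveform; bright isolated artifacts would still need additional filtering (for example, keeping only regions connected to the baseline), which this sketch omits.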

Figure 5 Model design for Auto-Doppler. For the PW part, taking the AVPW as an example, after the ROI was cropped from the spectral image, the UNET segmentation model was used to segment the waveform contour within the ROI, from which the function measurements were automatically calculated. For the DTI part, taking the MVL as an example, the waveform within the ROI was filled using traditional image processing methods and used to automatically measure the relevant metrics. AVPW, aortic valve-pulse wave; DTI, Doppler tissue image; ECG, electrocardiogram; LVDFT, left ventricular diastolic filling time; LVET, left ventricular ejection time; MVL, mitral valve lateral; MVS, mitral valve septal; MVPW, mitral valve-pulse wave; PW, pulsed-wave Doppler image; PVPW, pulmonary valve-pulse wave; ROI, region of interest; RVDFT, right ventricular diastolic filling time; RVET, right ventricular ejection time; TVL, tricuspid valve lateral; TVPW, tricuspid valve-pulse wave; VTI, velocity-time integral.

Statistical analysis

The primary outcome was the comparison of variability between DL predictions and human measurements, quantified by the individual standard deviation ratio (SD ratio): SD ratio = SD of the differences between DL algorithms and human experts / SD of the differences between human experts. The expected SD ratio is 1.0 if the differences between the DL algorithms and the human experts have the same variability as the differences between human experts, and less than 1.0 if the DL algorithms vary less from the human experts than the experts vary from each other. The same calculation and interpretation apply to the variability of differently experienced doctors relative to the experts. The mean intraclass correlation coefficient (ICC) was used to explore the agreement between automated and human expert measurements. For verification of automated measurements, the average of the two experts’ measurements was regarded as the expert consensus and used as the reference value. For each measurement, the SD of the differences between the predictions and the expert consensus was calculated. Bland-Altman analysis was used to evaluate agreement, and the mean error, the limits of agreement (LOAs, mean error ±1.96 SD), and the 50th, 75th, and 90th percentiles of the mean absolute deviation (MAD) and absolute relative deviation were calculated.
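The agreement statistics above can be sketched in NumPy as follows. This is an illustrative sketch, not the study's analysis code: the text does not specify how the SD-ratio numerator combines the two experts or which ICC form was used, so pooling the two DL-expert difference variances and using the two-way random-effects, absolute-agreement, single-measure ICC(2,1) (Shrout-Fleiss notation) are assumptions here.

```python
import numpy as np


def sd_ratio(dl, expert_a, expert_b):
    """SD of DL-vs-expert differences divided by SD of expert-vs-expert differences.

    Assumption: the numerator pools the two DL-expert difference variances.
    """
    dl, a, b = (np.asarray(v, dtype=float) for v in (dl, expert_a, expert_b))
    num = np.sqrt((np.var(dl - a, ddof=1) + np.var(dl - b, ddof=1)) / 2)
    return float(num / np.std(a - b, ddof=1))


def bland_altman(pred, ref):
    """Mean error (bias) and 95% limits of agreement (bias +/- 1.96 SD)."""
    diff = np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)
    bias, sd = diff.mean(), diff.std(ddof=1)
    return float(bias), (float(bias - 1.96 * sd), float(bias + 1.96 * sd))


def deviation_percentiles(pred, ref, q=(50, 75, 90)):
    """Percentiles of the absolute deviation and the absolute relative deviation (%)."""
    pred, ref = np.asarray(pred, dtype=float), np.asarray(ref, dtype=float)
    abs_dev = np.abs(pred - ref)
    rel_dev = 100.0 * abs_dev / np.abs(ref)
    return np.percentile(abs_dev, q), np.percentile(rel_dev, q)


def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-measure ICC(2,1); rows = cases."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr, msc = ss_rows / (n - 1), ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


# Toy numbers (hypothetical, in mm): DL sits exactly at the two-expert average.
expert_a = [40, 42, 44, 46]
expert_b = [41, 41, 45, 45]
dl = [40.5, 41.5, 44.5, 45.5]
print(sd_ratio(dl, expert_a, expert_b))  # about 0.5: DL varies half as much as the experts do
print(bland_altman([10, 12, 14], [11, 12, 13]))  # bias about 0, LoA about -1.96 to 1.96
print(deviation_percentiles([11, 9, 10], [10, 10, 10]))  # about 1 mm and 10% at each percentile
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))  # about 0.667: a constant 1-unit bias lowers agreement
```

An SD ratio below 1.0 then reads directly as "the DL output disagrees with each expert less than the experts disagree with each other", which is why 1.0 serves as the success criterion.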

One-way analysis of variance or the Kruskal-Wallis test was used to compare DL predictions with readers of different experience levels. Bonferroni correction was applied to subsequent pairwise comparisons, and the paired t-test or Wilcoxon signed-rank test was used for subgroup analyses. All statistical analyses were performed using open-source Python statistical packages (SciPy 1.5.4 and statsmodels 0.12.1; https://pypi.org/) and SPSS 26.0 (IBM Corp., Armonk, NY, USA).
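The Bonferroni step can be illustrated with a small pure-Python sketch. The group labels are hypothetical (DL plus the three reader-experience levels), and the raw p-values would come from the post hoc pairwise tests; with four groups there are six pairs, so each raw p-value would be multiplied by six (equivalently, compared against α/6). The example applies the same rule to an arbitrary list of three p-values.

```python
from itertools import combinations


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


groups = ["DL", "high", "moderate", "low"]  # hypothetical group labels
pairs = list(combinations(groups, 2))
print(len(pairs))  # 6 pairwise comparisons
print(bonferroni([0.004, 0.02, 0.3]))  # approximately [0.012, 0.06, 0.9]
```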

Results

Performance of Auto-Echo

Performance of keyframe identification

As shown in Figure 6, Auto-Echo identified keyframes with high accuracy in the internal validation dataset. More than 87% of the predicted keyframes were within three frames of the expert annotation, and more than 97.5% were within five frames. ED frames were predicted more accurately than ES frames. The average heart rate in the test set was 70 beats/minute, and the average frame rate of 2D echocardiography was 65 frames/second. The average number of frames per cardiac cycle was 56; thus, a difference of 5 frames (8.9% of one cardiac cycle) corresponds to about 76 milliseconds.
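The frame-to-time conversion in the preceding paragraph is simple arithmetic; a short Python check using only the averages quoted in the text makes the steps explicit. Computed directly from the frame rate, 5 frames last about 77 ms; the quoted figure of about 76 ms corresponds to 5/56 of an 857-ms cycle, so both values should be read as approximations.

```python
def frame_error_in_time(heart_rate_bpm, frame_rate_fps, frame_error):
    """Express a keyframe error given in frames as a fraction of the cycle and in milliseconds."""
    cycle_ms = 60_000.0 / heart_rate_bpm  # duration of one cardiac cycle
    frames_per_cycle = cycle_ms / 1000.0 * frame_rate_fps
    fraction_of_cycle = frame_error / frames_per_cycle
    error_ms = frame_error * 1000.0 / frame_rate_fps  # duration of `frame_error` frames
    return cycle_ms, frames_per_cycle, fraction_of_cycle, error_ms


cycle_ms, per_cycle, fraction, error_ms = frame_error_in_time(70, 65, 5)
print(f"{cycle_ms:.0f} ms/cycle, {per_cycle:.1f} frames/cycle, {fraction:.1%}, {error_ms:.1f} ms")
# 857 ms/cycle, 55.7 frames/cycle, 9.0%, 76.9 ms
print(f"{5 / 56 * cycle_ms:.1f} ms")  # 76.5 ms when the cycle is rounded to 56 frames
```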

Figure 6 The result of Auto-Echo in keyframes identification. Different colors indicate the number of frames that differ from expert annotations. A4C, apical four-chamber view; ED, end-diastole; ES, end-systole; PLAX, parasternal long-axis view; RV-A4C, right ventricular apical four-chamber view.

Cardiac structure assessments of Auto-Echo in the internal validation dataset

Tables S4,S5 and Figures 7-9 show the measurement variability of DL and human measurements. The SD ratio ranged from 0.70 for the basal transverse dimension of the right ventricle (RV base-t) to 1.02 for the transverse dimension of the right atrium (RA-t), with 94.4% (17/18) below the success criterion of 1.0. As shown in Table S5, the ICCs between automated and human measurements were higher than those between human experts for all measurements. The SD of the differences between Auto-Echo measurements and the expert consensus ranged from 1.2 mm for Ao-a and Ao-s to 4.6 mm for RV-l, and the correlation ranged from r=0.81 for TV-t to r=0.97 for LA-ap and LVID-l, with 83.3% (15/18) of parameters exceeding 0.85. The absolute relative deviations ranged from 2.5% for Ao-s to 9.7% for RV middle-t, and the 90th percentile of the absolute relative error was within 15% for 66.7% (12/18) of parameters.

Figure 7 The SD ratio of deep learning and human measurements in cardiac structure and function parameters. (A) The SD ratio of Auto-Echo, readers with different levels of experience, and human experts. (B) The SD ratio of Auto-Doppler, readers with different levels of experience, and human experts. DL is shown in blue, moderately experienced doctors in orange, and less experienced doctors in gray. Ao-a, the diameter of the aortic annulus; DL, deep learning; LA-ap, the anteroposterior dimension of left atrium; LA-l, the long-axis dimension of the left atrium; LA-t, the transverse dimension of the left atrium; LVID base-t, the basal transverse dimension of the left ventricle; LVID middle-t, the mid-transverse dimension of the left ventricle; LVID-ap, the anteroposterior dimension of left ventricle; LVID-l, the long-axis dimension of the left ventricle; RA-t, the transverse dimension of the right atrium; RV base-t, the basal transverse dimension of the right ventricle; RV middle-t, the mid-transverse dimension of the right ventricle; RV-l, the long-axis dimension of the right ventricle; SD ratio, standard deviation ratio.

Figure 8 The correlation analysis of cardiac structural diameter between automated (Auto-Echo) and expert consensus. Ao-s, the diameter of aortic sinus; LA-ap, the anteroposterior dimension of left atrium; LA-l, the long-axis dimension of the left atrium; LA-t, the transverse dimension of the left atrium; LVID-ap, the anteroposterior dimension of left ventricle; LVID base-t, the basal transverse dimension of the left ventricle; LVID middle-t, the mid-transverse dimension of the left ventricle; LVID-l, the long-axis dimension of the left ventricle; RA-t, the transverse dimension of the right atrium; RV base-t, the basal transverse dimension of the right ventricle; RV middle-t, the mid-transverse dimension of the right ventricle; RV-l, the long-axis dimension of the right ventricle.

Figure 9 The Bland-Altman analysis of Auto-Echo and expert consensus. Ao-a, the diameter of the aortic annulus; LA-ap, the anteroposterior dimension of left atrium; LA-l, the long-axis dimension of the left atrium; LA-t, the transverse dimension of the left atrium; LVID base-t, the basal transverse dimension of the left ventricle; LVID middle-t, the mid-transverse dimension of the left ventricle; LVID-ap, the anteroposterior dimension of left ventricle; LVID-l, the long-axis dimension of the left ventricle; RA-t, the transverse dimension of the right atrium; RV base-t, the basal transverse dimension of the right ventricle; RV middle-t, the mid-transverse dimension of the right ventricle; RV-l, the long-axis dimension of the right ventricle.

Performance of Auto-Echo versus doctors with different experiences in the internal validation dataset

Compared with five readers with different levels of experience, Auto-Echo had lower SD ratios than the differently experienced doctors for all parameters (Table S4, Figure 7A). The absolute relative deviations of the automated measurements were lower than or equivalent to those of moderately experienced doctors (Table S6, Figure 10).

Figure 10 The absolute relative deviation between Auto-Echo, doctors with different levels of experience, and the expert consensus for cardiac structure diameter measurements. Absolute relative deviations from the expert consensus are shown for Auto-Echo (blue), highly experienced doctors (red), moderately experienced doctors (orange), and less experienced doctors (gray). Ao-a, the diameter of the aortic annulus; Ao-s, the diameter of aortic sinus; LA-ap, the anteroposterior dimension of left atrium; LA-l, the long-axis dimension of the left atrium; LA-t, the transverse dimension of the left atrium; LVID base-t, the basal transverse dimension of the left ventricle; LVID middle-t, the mid-transverse dimension of the left ventricle; LVID-ap, the anteroposterior dimension of left ventricle; LVID-l, the long-axis dimension of the left ventricle; RA-t, the transverse dimension of the right atrium; RV base-t, the basal transverse dimension of the right ventricle; RV middle-t, the mid-transverse dimension of the right ventricle; RV-l, the long-axis dimension of the right ventricle.

In addition, for all structural parameters, similar trends in measurement differences were observed among DL, human experts, and differently experienced doctors (Table S6, Figure 10). For example, large absolute deviations were observed for parameters measured in the RV-A4C view and for RV parameters, consistent with the pattern seen among human readers.

Subgroup analysis of Auto-Echo performance in clinically relevant frames and different image qualities in the internal validation dataset

To explore the influence of cardiac phase on Auto-Echo performance, we compared the absolute relative deviations between ED and ES frames. Auto-Echo performed stably across ED and ES frames for most parameters. However, the absolute relative deviations of RVOT-pro, LV-ap, LVID-l, RV base-t, and RV middle-t were significantly smaller in the ED frame than in the ES frame, whereas the opposite was true for LA-ap and LA-t, as shown in Table S6 and Figure S2.

Regarding image quality (Table S6), little influence was observed for most parameters, although the deviations of LVID-l and RV middle-t increased when image quality was poor.

Cardiac structure assessments of Auto-Echo in the external validation dataset

For the external validation dataset, as shown in Table S7, the SD of the differences between automated and expert measurements ranged from 1.4 mm for Ao-a to 4.4 mm for RV middle-t, and the correlation ranged from r=0.53 for TV-t to r=0.96 for LA-ap and LA-l. The absolute relative deviations ranged from 3.6% for Ao-s to 12.5% for RV base-t. Overall, without additional adjustments, Auto-Echo also yielded accurate measurements in the external validation cohort. The trend of measurement differences was similar to that observed in the internal validation cohort; for example, measurement errors were larger for right ventricular parameters than for other parameters.

Cardiac function assessments of the Auto-Doppler in the internal validation dataset

The SD ratio ranged from 0.61 for MVL_e’ to 1.90 for MVS_a’, with 63.3% (19/30) below the success criterion of 1.0 (Table S8, Figure 7B). For most measurements, the ICCs improved when comparisons between automated and expert measurements were added to the comparisons between human experts. The correlation between automated measurements and the expert consensus ranged from r=0.81 for RVET to r=0.99 for MVL-a’, as shown in Table S9. The absolute relative error ranged from 1.6% for TVL-S to 8.6% for MV-VTI and was within 10% for all 30 prespecified parameters (Table S9).

Performance of Auto-Doppler versus that of doctors with different experiences in the internal validation dataset

The absolute relative deviations in Table S10 showed that Auto-Doppler performed as well as or better than moderately experienced doctors, indicating that the transvalvular velocity contours traced by Auto-Doppler were more closely aligned with the expert tracings than those of the differently experienced doctors (Table S10, Figure 11). However, parameters with large measurement variability among human readers, such as the VTI of transvalvular velocity, also showed large deviations in the DL estimates.

Figure 11 The absolute relative deviation between Auto-Doppler, doctors with different levels of experience, and the expert consensus for cardiac function measurements. Absolute relative deviations from the expert consensus are shown for Auto-Doppler (blue), highly experienced doctors (red), moderately experienced doctors (orange), and less experienced doctors (gray). AV, aortic valve; LVDFT, left ventricular diastolic filling time; LVET, left ventricular ejection time; MV, mitral valve; PV, pulmonary valve; RVDFT, right ventricular diastolic filling time; RVET, right ventricular ejection time; TV, tricuspid valve; VTI, velocity-time integral.

Comparison of measurement time in the internal validation dataset

The measurement time of Auto-Echo was significantly shorter than the labeling time of the experts (19.1 vs. 418.5 s, P<0.001). The Auto-Doppler measurement time for the transvalvular velocities of the four valves and the DTI of the valve annuli was also significantly shorter than the experts’ labeling time (43.1 vs. 246.1 s, P<0.001). Compared with the human experts, the DL interpretations reduced measurement time by 95.4% and 82.5%, respectively.
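The reported time savings follow directly from the mean measurement times. The short sketch below (the helper name `time_saving_percent` is ours, for illustration only) reproduces the 95.4% and 82.5% figures; summing the two automated means gives about 62 s, which is consistent with the roughly 1-minute analysis time for a full study.

```python
def time_saving_percent(auto_seconds: float, manual_seconds: float) -> float:
    """Percentage of measurement time saved relative to manual labeling."""
    return (1 - auto_seconds / manual_seconds) * 100


# Mean times reported in the text, in seconds.
echo_saving = time_saving_percent(19.1, 418.5)      # Auto-Echo vs. experts
doppler_saving = time_saving_percent(43.1, 246.1)   # Auto-Doppler vs. experts
total_auto_seconds = 19.1 + 43.1                    # structure + Doppler study

print(f"Auto-Echo saving:    {echo_saving:.1f}%")          # 95.4%
print(f"Auto-Doppler saving: {doppler_saving:.1f}%")       # 82.5%
print(f"Automated total:     {total_auto_seconds:.1f} s")  # 62.2 s
```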

Cardiac function assessments of the Auto-Doppler in the external validation dataset

For the external validation dataset, as shown in Table S11, the ICCs for all measurements were greater than 0.75, and the correlation between the automated measurements and the expert consensus ranged from r=0.73 for the A-wave of the tricuspid inflow (TVA) to r=0.99 for the velocity-time integral of the aortic valve (AV-VTI). The mean absolute relative error was less than 10% for all measurements, ranging from 2.2% for the peak transvalvular velocity of the aortic valve (AV-Vmax) to 9.3% for the left ventricular ejection time (LVET). As in the internal validation, the VTI of transvalvular velocity showed large absolute relative deviations between the automated and manual measurements.

Discussion

We developed a DL model for the automated measurement of cardiac structure and function from multiple types of echocardiographic images and evaluated its measurement performance in the internal and external validation datasets. The most important finding was that, for the same echocardiographic images, the variability between automated and human expert measurements was smaller than or similar to the variability of moderately experienced doctors. For most parameters, the MAD between automated and human expert measurements was smaller than the differences among readers with different levels of experience. The DL workflow showed human-like measurement patterns: parameters with large measurement variability among human readers also showed large automated deviations. A full study of cardiac structure diameters and Doppler measurements was analyzed in about 1 minute. These results demonstrate the potential of DL algorithms to automate the assessment of multiple echocardiographic parameters over multiple cardiac cycles, yielding measurements interchangeable with human expert assessment and substantially improving the time efficiency of a tedious clinical workflow.

AI-based fully automatic cardiac structure measurement with expert-level performance

Manual measurement of cardiac structure and function commonly relies on selecting videos, beats, and frames and placing measurement points, a process that is tedious and time-consuming and leads to large variability. Previous studies showed that DL algorithms could automatically measure cardiac diameters (14,27,28) and cardiac volumes (15,29). However, most studies focused on a limited number of parameters and relied on manually identified keyframes. The algorithm in this study is a fully automated workflow that combines automated keyframe identification with the measurement of many cardiac structure parameters, and it showed small disagreement with human experts.

In the present study, the SD ratios were below or near 1.0 for all cardiac structure parameters, indicating that the disagreement between DL and human expert measurements was lower than or similar to the disagreement among the human experts. The ICCs for all comparisons improved when comparisons between the automated system and each human expert were added to the comparisons among human experts. Compared with the measurements of doctors with different levels of experience, the DL-based measurements had significantly lower SD ratios and lower or similar absolute relative deviations for all parameters. Therefore, the differences among doctors with different levels of experience were often larger than the differences between DL and human expert measurements.
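The ICC comparison described above (an expert-only panel versus a panel with the automated system added as an extra rater) can be illustrated with a two-way random-effects, absolute-agreement, single-rater ICC, often written ICC(2,1) after Shrout and Fleiss. The ICC form actually used in this study is specified in the Methods and may differ; the function below is a hedged sketch with a hypothetical name, and the classic Shrout-Fleiss example (six targets rated by four judges) is used only to check the arithmetic.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has shape (n_cases, k_raters); each column is one rater.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    row_means = x.mean(axis=1)   # per-case means
    col_means = x.mean(axis=0)   # per-rater means
    ss_rows = k * np.sum((row_means - grand_mean) ** 2)
    ss_cols = n * np.sum((col_means - grand_mean) ** 2)
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return float((ms_rows - ms_error) /
                 (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n))


# Shrout & Fleiss (1979) example: 6 targets x 4 judges.
ratings = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(ratings), 2))  # 0.29
```

To mirror the comparison in the text, one would compute `icc_2_1` on the expert columns alone and again after appending the automated measurements as an additional column (for example with `np.column_stack`).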

The DL algorithms produced measurements interchangeable with human expert assessment, with accuracy superior or comparable to that reported in previous studies. For example, Howard et al. developed algorithms for automated estimation of LVID on ED and ES frames and reported an SD of 3.5 mm for LVID (28). In the DL study of Duffy et al. using PLAX videos, the average error of left ventricular diameter measurement was 2.4–3.8 mm (27). Zhang et al. developed models for the automatic measurement of left ventricular (LV) structural parameters (including LV mass, LV diastolic volume, and LV volume), with relative errors ranging from 15% to 17% (15). Tromp et al. compared a DL interpretation of 23 echocardiographic parameters with sonographer measurements; the correlations for the left ventricular internal diameter in diastole (LVIDd), the left ventricular internal diameter in systole (LVIDs), and the right ventricular internal diameter in diastole (RVIDd) were 0.91, 0.93, and 0.64, respectively, with LOAs of 7.29, 7.8, and 11.54 mm (29). In the present study, the average error was within 1.9 mm and the median relative error was within 9.7% for all diameter measurements.
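The agreement statistics quoted in this comparison (average absolute error in mm, median relative error, and Bland-Altman limits of agreement) can be computed as sketched below. This is an illustrative sketch with a hypothetical function name; it assumes 95% limits defined as bias ± 1.96 SD of the paired differences, and the cited studies may report LOA differently (for example, as a half-width or a full width).

```python
import numpy as np


def agreement_summary(auto_mm, reference_mm):
    """MAE (mm), median relative error (%), bias (mm), and 95% LOA (mm)."""
    auto = np.asarray(auto_mm, dtype=float)
    ref = np.asarray(reference_mm, dtype=float)
    diff = auto - ref
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample SD of the paired differences
    return {
        "mae_mm": float(np.mean(np.abs(diff))),
        "median_rel_err_pct": float(np.median(np.abs(diff) / np.abs(ref)) * 100),
        "bias_mm": float(bias),
        "loa_mm": (float(bias - 1.96 * sd), float(bias + 1.96 * sd)),
    }


# Toy LV diameters in mm (illustrative only, not study data).
print(agreement_summary([46.0, 50.0, 38.0], [45.0, 50.0, 39.0]))
```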

AI-based fully automatic cardiac function measurement with expert-level or moderately experienced doctor-level performance

Doppler echocardiography is the only non-invasive method that can provide hemodynamic information (9). Cardiac Doppler measurements require considerable training and time, which may not be readily available outside cardiology departments or in resource-limited settings. Previous attempts to automate Doppler echocardiography measurements with DL focused on mitral valve velocity for evaluating diastolic function. The algorithm in this study is the first to automatically analyze multiple types of Doppler echocardiographic images by tracing spectral Doppler waveforms, providing transvalvular blood flow velocity metrics for all four valves together with annular tissue velocity metrics. The current study found that automated cardiac Doppler measurement was technically feasible, significantly shortened study time, and was comparable to previous studies, which reported correlation coefficients of 0.71 to 0.97 for mitral E and 0.97 for AV-Vmax (14,30). Importantly, this DL interpretation could function as part of a learning healthcare system by providing accurate and rapid spectral assessment over multiple cardiac cycles, as recommended by guidelines.

DTI is a technique for detecting and evaluating myocardial motion. DTI measurements of the mitral and tricuspid annuli are important parameters for assessing left ventricular diastolic function (9,25) and right ventricular function (4,9), respectively. The algorithm in this study automatically analyzed echocardiographic DTI images and provided velocity metrics for the mitral and tricuspid annuli. The results indicate that its measurement accuracy is comparable to that of experienced doctors, with very low error. This is comparable or superior to previous studies, which reported correlation coefficients of 0.67 to 0.96 for DTI measurements of the mitral annulus (14,29). In addition, the use of automated measurements reduced interobserver variability between experts and beginners, suggesting that the system could also serve as an educational tool.

The human-like pattern of AI-based fully automatic prediction deviations

The DL workflow showed human-like measurement patterns, as the same trends of difference were observed among DL, human experts, and readers with different levels of experience. For example, the performance of the algorithm differed across the three measurement views and four cardiac chambers, with smaller absolute deviations in the PLAX view and larger deviations for the RV than for the other chambers, consistent with human readers. This finding agrees with that of Howard et al., who predicted the key points of LV diameter measurement and found that the variation of automated LV transverse measurements (parallel to the acoustic beam) was smaller than that of longitudinal measurements (perpendicular to the acoustic beam) (28). This may be explained by the fact that ultrasound resolution is highest along the beam direction, and the measurement landmarks in the PLAX view are localized along the beam direction. The RV has a more complex shape, a smaller cavity, and richer muscle trabeculae than the LV, which makes it difficult to identify the RV apex and endocardium accurately. This is consistent with guidelines noting that right ventricular dimension measurements can vary widely owing to the lack of fixed reference points for optimizing RV-related measurements (21). Substantial variability in VTI measurement was observed for both DL interpretation and doctors with different levels of experience across all valve velocity measurements. A possible explanation is that VTI measurement requires familiarity with the spectral characteristics of blood flow through each valve in order to precisely define the start and end of flow and the changing trend of flow velocity. Variation at any of these steps may lead to large inter- and intra-observer variability.
For parameters with poor repeatability and strong dependence on operator experience, the results of the current study are encouraging, as the model showed smaller variability than human experts.
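The sensitivity of VTI to the chosen start and end of flow follows from its definition as the area under the traced velocity envelope, VTI = ∫ v(t) dt over the flow period. The sketch below integrates a synthetic half-sine ejection envelope with the trapezoidal rule and shows that ending the trace 20 ms early lowers the VTI. The waveform, sampling rate, and function name are illustrative assumptions, not the Auto-Doppler implementation.

```python
import numpy as np


def velocity_time_integral(velocity_cm_s, time_s, start_idx, end_idx):
    """VTI (cm): trapezoidal area under |v(t)| from start_idx to end_idx inclusive."""
    v = np.abs(np.asarray(velocity_cm_s, dtype=float)[start_idx:end_idx + 1])
    t = np.asarray(time_s, dtype=float)[start_idx:end_idx + 1]
    return float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t)))


# Synthetic ejection envelope: 100 cm/s peak, 300 ms ejection, 1 ms sampling.
ejection_time = 0.3
t = np.linspace(0.0, ejection_time, 301)
v = 100.0 * np.sin(np.pi * t / ejection_time)

full = velocity_time_integral(v, t, 0, 300)       # analytic value: 100 * 2 * 0.3 / pi
early_end = velocity_time_integral(v, t, 0, 280)  # trace stopped 20 ms early
print(f"full trace: {full:.2f} cm, early end: {early_end:.2f} cm")
```

The trapezoidal rule is used here because it needs only the sampled envelope; any reader-to-reader difference in the chosen endpoints, or in the traced shape between them, propagates directly into the integral.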

The model explicitly outputs the key points on the video and the traced blood flow velocity envelope, so physicians can directly verify the accuracy of the detection and confirm the suitability of the automated measurements. Therefore, our DL model increases the interpretability of its cardiac structure and function predictions.

The effects of phase and image quality on the performance of cardiac structure estimation

Previous attempts to automate echocardiographic measurements required manual identification of keyframes, whereas the presented DL-based interpretation workflow identifies keyframes automatically and performed credibly in both the ED and ES phases for most cardiac structure parameters. However, the measurement deviation of the ventricular parameters was smaller in ED than in ES, whereas the opposite was observed for the left atrium. A possible reason is that the ventricles reach their largest volume in ED and the atria in ES; at these phases the endocardium is clearest and least disturbed by the thickened ventricular wall and the adjacent mitral or tricuspid leaflets. This is consistent with the study by Howard et al., who found that automated measurement of left ventricular diameter was more accurate in ED than in ES (28).
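The phase argument above rests on each chamber reaching its largest size at a specific phase: the ventricles at ED and the atria at ventricular ES. As a simple illustration (a common heuristic, not the DL keyframe detector used in this study), ED and ES frames within one cycle can be picked from a per-frame LV cavity-area curve as its maximum and minimum. The area values and the function name are hypothetical.

```python
import numpy as np


def ed_es_frames(lv_area_per_frame):
    """Heuristic keyframes for one cardiac cycle: ED = max LV area, ES = min LV area."""
    area = np.asarray(lv_area_per_frame, dtype=float)
    return int(np.argmax(area)), int(np.argmin(area))


# Hypothetical LV cavity areas (cm^2) over one cycle, one value per frame.
lv_area = [30.1, 32.4, 33.0, 29.5, 24.8, 21.2, 20.6, 23.9, 27.7]
ed, es = ed_es_frames(lv_area)
print(f"ED frame: {ed}, ES frame: {es}")  # ED frame: 2, ES frame: 6
```

Such an extremum rule assumes a single clean cycle and a reliable area curve; with multiple cycles or noisy segmentations it may pick the wrong frames, which is one motivation for learned keyframe identification.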

Variation in image quality is a major challenge in echocardiography. In poor-quality images, the algorithm achieved measurement accuracy comparable to that in good-quality images for 88.9% (16/18) of cardiac structure parameters. The accuracy of LVID-l decreased slightly in poor-quality images, with an absolute relative deviation of 4.3%, which remains comparable to the expert consensus. However, for RV middle-t, the absolute relative deviation increased to 13.1% in poor-quality images. Therefore, physicians should carefully confirm the suitability of automated RV middle-t measurements in poor-quality images.

Conclusions

The presented DL-based automated interpretation workflow demonstrated the potential of automated assessment of many echocardiographic parameters in the clinical workflow, and most measurements were interchangeable with human expert assessment. It showed human-like measurement patterns, with the same trends of difference observed for DL and human readers. These results indicate that it can increase examination efficiency, improve agreement between experts and beginners, and increase access to expert-level measurement and interpretation in primary medical institutions and emergency settings.

Limitations of the study

This was a retrospective, single-center study, which may be subject to selection bias and may not represent the broader population. The relatively small sample size could lead to overfitting of the models. Further validation in multi-center, prospective trials is needed to enhance the robustness and generalizability of the algorithms.

This study proposes a preliminary model for automated multiparameter measurement in echocardiography. To evaluate measurement accuracy, only cases in sinus rhythm and young people with normal LVEF were included. Future studies should include patients with various arrhythmias and other diseases to further verify the applicability of the model.

In this study, the DL interpretation performed tracking measurements within a cardiac cycle and across multiple cycles of 2D echocardiography videos. However, this paper evaluated measurement accuracy only at the keyframes and did not evaluate the tracking results in the intermediate frames.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the PRIME reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-1852/rc

Funding: This research was funded by the National Natural Science Foundation of China (No. 82102041, No. 62471313); a Project of Innovation of the Science and Technology Commission of Shenzhen City (No. JCYJ20210324113804013); a Project of International Cooperative Research of the Science and Technology Commission of Shenzhen City (Nos. GJHZ20210705142205017 and GJHZ2021070514220601); Shenzhen People’s Hospital Clinician Scientist Training Plan (No. SYWGSCGZH202302); and Natural Science Foundation of Guangdong Province (No. 2024A1515030143).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1852/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Ethics Board of Shenzhen People’s Hospital (No. LL-KY-2023103-01) and the requirement for individual consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Lancellotti P, Pibarot P, Chambers J, Edvardsen T, Delgado V, Dulgheru R, et al. Recommendations for the imaging assessment of prosthetic heart valves: a report from the European Association of Cardiovascular Imaging endorsed by the Chinese Society of Echocardiography, the Inter-American Society of Echocardiography, and the Brazilian Department of Cardiovascular Imaging. Eur Heart J Cardiovasc Imaging 2016;17:589-90.
2. Ommen SR, Mital S, Burke MA, Day SM, Deswal A, Elliott P, Evanovich LL, Hung J, Joglar JA, Kantor P, Kimmelstiel C, Kittleson M, Link MS, Maron MS, Martinez MW, Miyake CY, Schaff HV, Semsarian C, Sorajja P. 2020 AHA/ACC Guideline for the Diagnosis and Treatment of Patients With Hypertrophic Cardiomyopathy: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation 2020;142:e558-631.
3. Elliott PM, Anastasakis A, Borger MA, Borggrefe M, Cecchi F, et al. 2014 ESC Guidelines on diagnosis and management of hypertrophic cardiomyopathy: the Task Force for the Diagnosis and Management of Hypertrophic Cardiomyopathy of the European Society of Cardiology (ESC). Eur Heart J 2014;35:2733-79.
4. Lang RM, Badano LP, Mor-Avi V, Afilalo J, Armstrong A, Ernande L, Flachskampf FA, Foster E, Goldstein SA, Kuznetsova T, Lancellotti P, Muraru D, Picard MH, Rietzschel ER, Rudski L, Spencer KT, Tsang W, Voigt JU. Recommendations for cardiac chamber quantification by echocardiography in adults: an update from the American Society of Echocardiography and the European Association of Cardiovascular Imaging. J Am Soc Echocardiogr 2015;28:1-39.e14.
5. Schuuring MJ, Išgum I, Cosyns B, Chamuleau SAJ, Bouma BJ. Routine Echocardiography and Artificial Intelligence Solutions. Front Cardiovasc Med 2021;8:648877.
6. Hatazawa K, Tanaka H, Nonaka A, Takada H, Soga F, Hatani Y, Matsuzoe H, Shimoura H, Ooka J, Sano H, Mochizuki Y, Matsumoto K, Hirata KI. Baseline Global Longitudinal Strain as a Predictor of Left Ventricular Dysfunction and Hospitalization for Heart Failure of Patients With Malignant Lymphoma After Anthracycline Therapy. Circ J 2018;82:2566-74.
7. Spahillari A, McCormick I, Yang JX, Quinn GR, Manning WJ. On-call transthoracic echocardiographic interpretation by first year cardiology fellows: comparison with attending cardiologists. BMC Med Educ 2019;19:213.
8. Schneider M, Ran H, Aschauer S, Binder C, Mascherbauer J, Lang I, Hengstenberg C, Goliasch G, Binder T. Visual assessment of right ventricular function by echocardiography: how good are we? Int J Cardiovasc Imaging 2019;35:2001-8.
9. Quiñones MA, Otto CM, Stoddard M, Waggoner A, Zoghbi WA; Doppler Quantification Task Force of the Nomenclature and Standards Committee of the American Society of Echocardiography. Recommendations for quantification of Doppler echocardiography: a report from the Doppler Quantification Task Force of the Nomenclature and Standards Committee of the American Society of Echocardiography. J Am Soc Echocardiogr 2002;15:167-84.
10. Schneider M, Bartko P, Geller W, Dannenberg V, König A, Binder C, Goliasch G, Hengstenberg C, Binder T. A machine learning algorithm supports ultrasound-naïve novices in the acquisition of diagnostic echocardiography loops and provides accurate estimation of LVEF. Int J Cardiovasc Imaging 2021;37:577-86.
11. Khamis H, Zurakhov G, Azar V, Raz A, Friedman Z, Adam D. Automatic apical view classification of echocardiograms using a discriminative learning dictionary. Med Image Anal 2017;36:15-21.
12. Madani A, Arnaout R, Mofrad M, Arnaout R. Fast and accurate view classification of echocardiograms using deep learning. NPJ Digit Med 2018;1:6.
13. Yang F, Chen X, Lin X, Chen X, Wang W, Liu B, et al. Automated Analysis of Doppler Echocardiographic Videos as a Screening Tool for Valvular Heart Diseases. JACC Cardiovasc Imaging 2022;15:551-63.
14. Tromp J, Seekings PJ, Hung CL, Iversen MB, Frost MJ, Ouwerkerk W, Jiang Z, Eisenhaber F, Goh RSM, Zhao H, Huang W, Ling LH, Sim D, Cozzone P, Richards AM, Lee HK, Solomon SD, Lam CSP, Ezekowitz JA. Automated interpretation of systolic and diastolic function on the echocardiogram: a multicohort study. Lancet Digit Health 2022;4:e46-54.
15. Zhang J, Gajjala S, Agrawal P, Tison GH, Hallock LA, Beussink-Nelson L, Lassen MH, Fan E, Aras MA, Jordan C, Fleischmann KE, Melisko M, Qasim A, Shah SJ, Bajcsy R, Deo RC. Fully Automated Echocardiogram Interpretation in Clinical Practice. Circulation 2018;138:1623-35.
16. Asch FM, Poilvert N, Abraham T, Jankowski M, Cleve J, Adams M, Romano N, Hong H, Mor-Avi V, Martin RP, Lang RM. Automated Echocardiographic Quantification of Left Ventricular Ejection Fraction Without Volume Measurements Using a Machine Learning Algorithm Mimicking a Human Expert. Circ Cardiovasc Imaging 2019;12:e009303.
17. Knackstedt C, Bekkers SC, Schummers G, Schreckenberg M, Muraru D, Badano LP, Franke A, Bavishi C, Omar AM, Sengupta PP. Fully Automated Versus Standard Tracking of Left Ventricular Ejection Fraction and Longitudinal Strain: The FAST-EFs Multicenter Study. J Am Coll Cardiol 2015;66:1456-66.
18. Salem Omar AM, Shameer K, Narula S, Abdel Rahman MA, Rifaie O, Narula J, Dudley JT, Sengupta PP. Artificial Intelligence-Based Assessment of Left Ventricular Filling Pressures From 2-Dimensional Cardiac Ultrasound Images. JACC Cardiovasc Imaging 2018;11:509-10.
19. Narula S, Shameer K, Salem Omar AM, Dudley JT, Sengupta PP. Machine-Learning Algorithms to Automate Morphological and Functional Assessments in 2D Echocardiography. J Am Coll Cardiol 2016;68:2287-95.
20. Sengupta PP, Huang YM, Bansal M, Ashrafi A, Fisher M, Shameer K, Gall W, Dudley JT. Cognitive Machine-Learning Algorithm for Cardiac Imaging: A Pilot Study for Differentiating Constrictive Pericarditis From Restrictive Cardiomyopathy. Circ Cardiovasc Imaging 2016;9:e004330.
21. Rudski LG, Lai WW, Afilalo J, Hua L, Handschumacher MD, Chandrasekaran K, Solomon SD, Louie EK, Schiller NB. Guidelines for the echocardiographic assessment of the right heart in adults: a report from the American Society of Echocardiography endorsed by the European Association of Echocardiography, a registered branch of the European Society of Cardiology, and the Canadian Society of Echocardiography. J Am Soc Echocardiogr 2010;23:685-713; quiz 786-8.
22. Shad R, Quach N, Fong R, Kasinpila P, Bowles C, Castro M, Guha A, Suarez EE, Jovinge S, Lee S, Boeve T, Amsallem M, Tang X, Haddad F, Shudo Y, Woo YJ, Teuteberg J, Cunningham JP, Langlotz CP, Hiesinger W. Predicting post-operative right ventricular failure using video-based deep learning. Nat Commun 2021;12:5192.
23. Sengupta PP, Shrestha S, Berthon B, Messas E, Donal E, Tison GH, et al. Proposed Requirements for Cardiovascular Imaging-Related Machine Learning Evaluation (PRIME): A Checklist: Reviewed by the American College of Cardiology Healthcare Innovation Council. JACC Cardiovasc Imaging 2020;13:2017-35.
24. Aurigemma GP, Gottdiener JS, Arnold AM, Chinali M, Hill JC, Kitzman D. Left atrial volume and geometry in healthy aging: the Cardiovascular Health Study. Circ Cardiovasc Imaging 2009;2:282-9.
25. Nagueh SF, Smiseth OA, Appleton CP, Byrd BF 3rd, Dokainish H, Edvardsen T, Flachskampf FA, Gillebert TC, Klein AL, Lancellotti P, Marino P, Oh JK, Alexandru Popescu B, Waggoner AD. Recommendations for the Evaluation of Left Ventricular Diastolic Function by Echocardiography: An Update from the American Society of Echocardiography and the European Association of Cardiovascular Imaging. Eur Heart J Cardiovasc Imaging 2016;17:1321-60.
26. Galderisi M, Cosyns B, Edvardsen T, Cardim N, Delgado V, Di Salvo G, et al. Standardization of adult transthoracic echocardiography reporting in agreement with recent chamber quantification, diastolic function, and heart valve disease recommendations: an expert consensus document of the European Association of Cardiovascular Imaging. Eur Heart J Cardiovasc Imaging 2017;18:1301-10.
27. Duffy G, Cheng PP, Yuan N, He B, Kwan AC, Shun-Shin MJ, Alexander KM, Ebinger J, Lungren MP, Rader F, Liang DH, Schnittger I, Ashley EA, Zou JY, Patel J, Witteles R, Cheng S, Ouyang D. High-Throughput Precision Phenotyping of Left Ventricular Hypertrophy With Cardiovascular Deep Learning. JAMA Cardiol 2022;7:386-95.
28. Howard JP, Stowell CC, Cole GD, Ananthan K, Demetrescu CD, Pearce K, Rajani R, Sehmi J, Vimalesvaran K, Kanaganayagam GS, McPhail E, Ghosh AK, Chambers JB, Singh AP, Zolgharni M, Rana B, Francis DP, Shun-Shin MJ. Automated Left Ventricular Dimension Assessment Using Artificial Intelligence Developed and Validated by a UK-Wide Collaborative. Circ Cardiovasc Imaging 2021;14:e011951.
29. Tromp J, Bauer D, Claggett BL, Frost M, Iversen MB, Prasad N, Petrie MC, Larson MG, Ezekowitz JA, Solomon SD. A formal validation of a deep learning-based automated workflow for the interpretation of the echocardiogram. Nat Commun 2022;13:6776.
30. Gosling AF, Thalappillil R, Ortoleva J, Datta P, Cobey FC. Automated Spectral Doppler Profile Tracing. J Cardiothorac Vasc Anesth 2020;34:72-6.

Cite this article as: Peng G, Ling R, Liu X, Liu Q, Zhong X, Sheng Y, Zheng Y, Luo S, Yang Y, Lin X, Tang K, Zheng J, Chen L, Ni D, Xu J, Liu Y, Xue W. Formal validation of a deep learning-based automated interpretation system for cardiac structure and function in adult echocardiography. Quant Imaging Med Surg 2025;15(4):3093-3110. doi: 10.21037/qims-24-1852
