Original Article

Automated and accurate view classification of fetal echocardiography using convolutional neural networks

Jiawei Shi1,2,3#, Ying Bai1,2,3#, Quanfei Hou1,2,3#, Shukun He1,2,3, Liu Hong1,2,3, Li Cui1,2,3, Yi Zhang1,2,3, Tianshu Liu1,2,3, Wenhui Deng1,2,3, Juanjuan Liu1,2,3, Jing Ma1,2,3, Sushan Xiao1,2,3, Zhen Wang1,2,3, Yali Yang1,2,3, Li Zhang1,2,3, Haiyan Cao1,2,3, Mingxing Xie1,2,3, Jing Wang1,2,3

1Department of Ultrasound Medicine, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; 2Clinical Research Center for Medical Imaging in Hubei Province, Wuhan, China; 3Hubei Province Key Laboratory of Molecular Imaging, Wuhan, China

Contributions: (I) Conception and design: J Shi, H Cao, M Xie, J Wang; (II) Administrative support: H Cao, M Xie, J Wang; (III) Provision of study materials or patients: H Cao, M Xie, J Wang, L Hong, L Cui, Y Zhang; (IV) Collection and assembly of data: J Shi, Y Bai, Q Hou, S He, L Hong, L Cui, Y Zhang, T Liu, W Deng, J Liu, J Ma, S Xiao, Z Wang, Y Yang, L Zhang; (V) Data analysis and interpretation: J Shi, Y Bai, Q Hou, S He, H Cao; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

Correspondence to: Jing Wang, MD, PhD; Mingxing Xie, MD, PhD; Haiyan Cao, MD, PhD. Department of Ultrasound Medicine, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, 1277 Jiefang Avenue, Wuhan 430022, China; Clinical Research Center for Medical Imaging in Hubei Province, Wuhan, China; Hubei Province Key Laboratory of Molecular Imaging, Wuhan, China. Email: jingwang2004@hust.edu.cn; xiemx@hust.edu.cn; haiyancao@hust.edu.cn.

Background: Accurate view classification is imperative for precise prenatal diagnosis of congenital heart disease (CHD), but it remains challenging owing to suboptimal imaging quality caused by noise and artifacts and to a heavy reliance on extensive operator or diagnostician expertise. We aimed to analyze several convolutional neural network (CNN)-based frameworks for the automated classification of eight standard fetal echocardiographic views.

Methods: The study utilized fetal echocardiographic data collected from 953 second- and third-trimester fetuses between July 2018 and October 2019. Among them, 10,032 images were randomly selected for training and validation, while 2,071 images were used to assess the framework’s accuracy. Various advanced CNNs were trained to identify the recommended set of eight fetal echocardiographic views. Model performance was compared with the ground truth, followed by an in-depth analysis of results obtained from the optimal model. Gradient-weighted class activation mapping (Grad-CAM) was employed to visualize model decision-making.

Results: The ConvNeXt-based model achieved the highest performance with an overall test accuracy of 97.78% and an F1-score of 96.08% in classifying the eight preselected views. The accuracy of most views exceeded 95%. Although image quality impacted the model’s classification performance (P<0.001), this effect remained within acceptable limits for clinical applicability. Data visualization further elucidated that the model identified similarities across different views and utilized clinically relevant features for classification.

Conclusions: CNNs have emerged as a promising tool for facilitating accurate and efficient view classification of fetal echocardiograms. This study paves the way for the future utilization of artificial intelligence (AI) to automate fetal heart screening and potentially support the diagnosis of CHD.

Keywords: Artificial intelligence (AI); convolutional neural network (CNN); echocardiography; fetal heart; view classification


Submitted Mar 05, 2025. Accepted for publication Oct 10, 2025. Published online Nov 13, 2025.

doi: 10.21037/qims-2025-556


Introduction

Congenital heart disease (CHD) is the most prevalent type of birth defect and is associated with high perinatal and infant mortality (1,2). Precise prenatal diagnosis plays a crucial role in improving neonatal outcomes, as it enables in-utero monitoring, effective intrauterine management, and timely referral to specialized facilities capable of providing appropriate perinatal and neonatal care (3-5). Fetal echocardiography is widely regarded as the primary tool for screening and diagnosing CHD due to its noninvasive nature, accuracy, and lack of radiation exposure (6). However, despite these advantages, population-based studies consistently report low detection rates, often as low as 30%, as well as suboptimal specificity ranging from 40% to 50% (1). These challenges primarily stem from insufficient expertise in interpretation, as well as suboptimal image acquisition due to factors such as poor acoustic windows, small fetal heart structures, rapid heart rate, and uncontrollable fetal movement (7,8).

Artificial intelligence (AI) techniques, particularly those based on deep learning (DL) (9) techniques with convolutional neural networks (CNNs), have demonstrated remarkable capabilities in “learning” specific image features or patterns for segmentation of structures, assessment of biometric measurements, and detection of disease in fetal imaging (10-14). View classification serves as a fundamental prerequisite for computer-assisted CHD diagnostic systems, which can enhance workflow efficiency and adaptability of quantitative tools. In recent years, CNNs have been successfully applied to recognize standard fetal ultrasound views, yielding superior performance. For instance, Chen et al. (15) proposed a DL method for classifying fetal abdomen view, four-chamber view (4CV), and face view, achieving an average accuracy of 87% on a test set of 13,247 images. Burgos-Artizzu et al. (16) evaluated various CNNs for automatic classification of common fetal ultrasound planes, demonstrating comparable performance to human experts. Krishna et al. (17) developed a DL-based automated system for detecting common fetal ultrasound planes using deep feature fusion, achieving an accuracy of 95.1% in classifying the abdomen, brain, femur, thorax, cervix, and other planes.

Although prior research on the classification of fetal echocardiography exists, including studies on automated detection and localization of fetal standard scan planes, such as the 4CV, three-vessel view (3VV), left ventricular outflow tract (LVOT), and right ventricular outflow tract (RVOT) views, challenges remain (12,18-21). For instance, although SonoNet proposed by Baumgartner et al. (18) achieved an accuracy of 90.09% in retrospective frame retrieval, its precision for heart views was less than 82%. Similarly, Wu et al. (19) classified views based on anatomical structures but faced difficulties due to the need for extensive manual annotation of the structures. Arnaout et al. (12) developed an end-to-end CNN view classifier for the 3VV, 4CV, LVOT, and three-vessel-and-trachea (3VT) view, achieving an F1 score of 0.93; however, it struggled to distinguish between the 3VV and 3VT view.

To address these challenges, this study aims to collect a large real-world dataset of common two-dimensional fetal echocardiographic views and to train a CNN model capable of accurately identifying these views. To the best of our knowledge, this is the first endeavor to cover the most commonly used views for screening for fetal CHD, providing a closer approximation to routine clinical conditions. We present this article in accordance with the CLEAR reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-556/rc).


Methods

Subjects

This study aimed to develop an AI model using supervised learning to analyze a previously collected and de-identified fetal echocardiographic image dataset from a single center. The dataset was collected at the Department of Ultrasound Medicine, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China. A total of 12,103 images from 953 pregnant women were included, obtained during routine clinical practice between July 2018 and October 2019. Pregnant women undergoing second- and third-trimester pregnancy screening were included, while cases of multiple pregnancies, severe CHDs, or aneuploidies were excluded. These criteria were chosen to balance generalizability against the risk of degrading model performance by including rare conditions with distinct anatomical features. Additionally, images containing annotations and ultrasound measurement indicators were excluded. Following the latest guidelines of the American Society of Echocardiography and the International Society of Ultrasound in Obstetrics and Gynecology (ISUOG) (6,22), we retrieved images of the eight fetal cardiac views required for advanced fetal cardiac assessment from our institutional echocardiography database. These views included the 3VT view, 3VV, 4CV, LVOT view, RVOT view, long axis of the aortic arch (AArch) view, ductal arch (DArch) view, and short axis of the great vessels (SAGV) view (Figure S1). Because fetal movement means echocardiography videos often contain non-standard scan planes, images were manually selected from the videos. The Voluson E8/E10 ultrasound machine (GE Healthcare, Chicago, IL, USA) equipped with either an RM6C (4–8 MHz) or a C2-9-D (2–9 MHz) probe was used for image acquisition. Three experienced sonographers participated in the data annotation process. To minimize human bias, all images were independently annotated and verified by two attending physicians (≥5 years’ experience), with final validation by a chief physician (≥15 years’ experience). All data were de-identified to protect patient information.

This study was approved by the Ethics Committee of Tongji Medical College, Huazhong University of Science and Technology (No. 2021-1000-03). Written informed consent was obtained from all enrolled subjects. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Image preprocessing

To ensure that our algorithm learns from the images themselves rather than the manual annotations placed by the sonographers, we took steps to remove machine and patient information from the images. The acquired images had a size of 1,136×852 pixels. We applied padding and cropping to obtain a region of interest measuring 512×512 pixels, which captured most of the field of view while excluding the vendor logo and individual information. Subsequently, we divided all images of each view into three sets: a training set comprising 60% of the images, a validation set comprising 20%, and a test set comprising the remaining 20%. This split was performed at the case level to ensure that no video frames from the test videos were used for training.
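For illustration, the following is a minimal sketch of such a case-level split; the record layout and the case_id field are hypothetical, and only the 60/20/20 ratios come from the procedure described above.

```python
# Hypothetical sketch of a case-level 60/20/20 split: all images from a
# given case end up in exactly one subset, so no frames from test cases
# leak into training. The record structure is an assumption.
import random
from collections import defaultdict

def split_by_case(records, seed=42):
    """records: list of dicts like {"case_id": str, "path": str, "view": str}."""
    by_case = defaultdict(list)
    for rec in records:
        by_case[rec["case_id"]].append(rec)

    case_ids = sorted(by_case)
    random.Random(seed).shuffle(case_ids)

    n = len(case_ids)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    subsets = {
        "train": case_ids[:n_train],
        "val": case_ids[n_train:n_train + n_val],
        "test": case_ids[n_train + n_val:],
    }
    # Expand the per-case assignment back to individual images.
    return {name: [r for cid in ids for r in by_case[cid]]
            for name, ids in subsets.items()}
```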

To mitigate the risk of overfitting, we employed several data augmentation techniques in the training set, including horizontal flipping, random rotation, color jitter, and random augmentation. Furthermore, we resized the images to the required dimensions of 224×224 pixels. Lastly, we normalized each image by subtracting the mean intensity value and dividing it by the standard deviation (SD) of the image pixels.
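A minimal torchvision sketch of this augmentation and normalization pipeline is given below; the rotation range and jitter strengths are illustrative assumptions, as the text specifies the operations but not their hyperparameters.

```python
# Sketch of the training-set augmentations described above. Rotation and
# jitter magnitudes are assumed values, not those used in the study.
import torch
from torchvision import transforms

def per_image_standardize(x: torch.Tensor) -> torch.Tensor:
    """Subtract the image's own mean and divide by its pixel SD."""
    return (x - x.mean()) / (x.std() + 1e-8)

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),                 # assumed range
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # assumed strength
    transforms.RandAugment(),            # "random augmentation" in the text
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Lambda(per_image_standardize),
])
```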

Image quality is an important factor in automated analysis. Based on image-quality criteria, including the presence of key anatomical structures, gain, zoom, artifacts, and motion blur, the 4CV images of the test set were divided into three groups: bad, medium, and good. The detailed evaluation criteria are provided in Table S1.

Model architecture

This study benchmarked existing classification techniques on a new dataset. We evaluated a wide range of state-of-the-art CNNs for classification, covering diverse architectures, depths, total parameter counts, and processing methods [e.g., ResNet (23), DenseNet (24)]. While the original network architectures were preserved, all networks were initialized with ImageNet Large Scale Visual Recognition Challenge pre-trained weights provided by OpenMMLab and then fully retrained on our training data. Classification decisions were derived from softmax probabilities, with each image assigned to the class of maximum probability.
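As an illustration of this setup, the sketch below loads an ImageNet-pretrained ConvNeXt-base from torchvision (substituted here for the OpenMMLab checkpoints actually used) and replaces its classification head for the eight views; the same pattern applies to the other backbones.

```python
# Hedged transfer-learning sketch: torchvision weights stand in for the
# OpenMMLab checkpoints used in the study.
import torch
import torch.nn as nn
from torchvision.models import convnext_base, ConvNeXt_Base_Weights

NUM_VIEWS = 8  # the eight fetal echocardiographic views

model = convnext_base(weights=ConvNeXt_Base_Weights.IMAGENET1K_V1)
# In torchvision's ConvNeXt, the final linear layer sits at classifier[2].
model.classifier[2] = nn.Linear(model.classifier[2].in_features, NUM_VIEWS)

# At inference, each image is assigned to the class with the maximum
# softmax probability.
logits = model(torch.randn(1, 3, 224, 224))   # dummy 224x224 input
pred_view = torch.softmax(logits, dim=1).argmax(dim=1)
```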

Training details and evaluation metrics

We employed several optimization techniques to train the network. Firstly, we utilized label smoothing cross-entropy loss as the loss function and employed the stochastic gradient descent (SGD) optimizer, operating with a batch size of 64. The initial learning rate was set to 0.001, and a weight decay of 0.05 was applied. To further enhance training, we incorporated the CosineAnnealing learning rate scheduler with a warm-up start method. The model was trained for a total of 100 epochs, and the weights of the model with the highest accuracy were saved.
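The recipe above can be sketched as follows; the label-smoothing factor, momentum, and warm-up length are illustrative assumptions, and train_loader, val_loader, and evaluate() are hypothetical stand-ins for the data pipeline and a validation-accuracy helper.

```python
# Training-loop sketch continuing the model definition above. Assumed:
# smoothing factor 0.1, momentum 0.9, 5 warm-up epochs; `train_loader`
# (batch size 64), `val_loader`, and `evaluate` are hypothetical.
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

EPOCHS, WARMUP_EPOCHS = 100, 5

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=0.05)
scheduler = SequentialLR(
    optimizer,
    schedulers=[LinearLR(optimizer, start_factor=0.01,
                         total_iters=WARMUP_EPOCHS),      # warm-up start
                CosineAnnealingLR(optimizer, T_max=EPOCHS - WARMUP_EPOCHS)],
    milestones=[WARMUP_EPOCHS],
)

best_acc = 0.0
for epoch in range(EPOCHS):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
    acc = evaluate(model, val_loader)        # validation accuracy
    if acc > best_acc:                       # keep the best weights
        best_acc = acc
        torch.save(model.state_dict(), "best_model.pth")
```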

The experimental setup consisted of the following computer configuration: an Intel Xeon Silver 4216 CPU @ 2.10 GHz and an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory. The computer ran Ubuntu 22.04, and the programming language employed was Python 3.8.

For classification tasks, confusion matrices were calculated and visualized as heatmaps to illustrate the performance of the multiview classifiers and their errors. Accuracy, recall, precision, and the F1 score (the harmonic mean of precision and recall) are commonly used indicators for evaluating algorithm performance and are defined in Eqs. [1-4], respectively.

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{1} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]

\[ \text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4} \]
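These quantities, together with the confusion matrix, can be computed with scikit-learn; a brief sketch with placeholder labels follows, where macro averaging across the eight views is an assumption on our part.

```python
# Metric computation per Eqs. [1-4] using scikit-learn; y_true/y_pred are
# placeholder integer view labels for illustration.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.4f}")
# Macro averaging weights all views equally (an assumption here).
print(f"Precision: {precision_score(y_true, y_pred, average='macro'):.4f}")
print(f"Recall:    {recall_score(y_true, y_pred, average='macro'):.4f}")
print(f"F1 score:  {f1_score(y_true, y_pred, average='macro'):.4f}")
print(confusion_matrix(y_true, y_pred))  # rows: true view; cols: predicted
```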

The area under the curve (AUC) was selected as an additional model evaluation metric. Defined as the area enclosed by the receiver operating characteristic (ROC) curve and the coordinate axis, the AUC reflects classifier performance, with higher values indicating superior performance. Furthermore, to verify that the view classifier relies on clinically relevant features, we applied gradient-weighted class activation mapping (Grad-CAM) to test images, highlighting the pixels or regions critical to the network’s decision-making.
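Since the implementation is not published with the paper, the snippet below is a from-scratch sketch of the standard Grad-CAM formulation: gradients of the class score are global-average-pooled into channel weights for the last convolutional feature map.

```python
# Minimal Grad-CAM sketch using forward/backward hooks; `target_layer`
# would typically be the last convolutional stage of the backbone
# (e.g., model.features[-1] for torchvision's ConvNeXt).
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    """Return a (H, W) heatmap in [0, 1] for a (1, 3, H, W) input."""
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, inp, out: feats.update(a=out))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gin, gout: grads.update(g=gout[0]))
    try:
        logits = model(image)
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, class_idx].backward()
        # Channel weights: global average pooling of the gradients.
        weights = grads["g"].mean(dim=(2, 3), keepdim=True)
        cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                            align_corners=False)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    finally:
        h1.remove()
        h2.remove()
    return cam[0, 0].detach()
```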

Statistical analysis

All statistical analyses were conducted using Python version 3.9 or R version 4.3.0 (R Foundation for Statistical Computing). Continuous variables are expressed as mean ± SD when normally distributed, or median (interquartile range) [M (IQR)] otherwise. Categorical variables are presented as count (percentage) unless otherwise specified. To assess the performance of the model, various metrics, including accuracy, precision, recall, and F1 score, were employed and presented as percentages. For evaluating the model’s performance on 4CV view images with different qualities, the Kruskal-Wallis test was employed. Additionally, the Mann-Whitney U test was utilized to analyze the model’s prediction probabilities for the test set. Bootstrap resampling with 1,000 iterations was performed to estimate the 95% confidence intervals for prediction errors. All P values reported in this study were two-sided, and a significance level of P<0.05 was considered statistically significant.
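A minimal scipy/numpy sketch of these tests and the bootstrap procedure is shown below; the probability and error arrays are synthetic placeholders standing in for the model outputs.

```python
# Sketch of the statistical comparisons: Kruskal-Wallis across quality
# grades, Mann-Whitney U for correct vs. incorrect prediction
# probabilities, and a 1,000-iteration bootstrap 95% CI.
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

rng = np.random.default_rng(0)
p_bad, p_medium, p_good = (rng.uniform(0.85, 1.0, 50) for _ in range(3))
p_correct = rng.uniform(0.8, 1.0, 200)
p_wrong = rng.uniform(0.5, 0.95, 20)

print(kruskal(p_bad, p_medium, p_good))                  # quality grades
print(mannwhitneyu(p_correct, p_wrong, alternative="two-sided"))

# Bootstrap CI for the misclassification rate (placeholder 0/1 errors).
errors = rng.binomial(1, 0.022, 2071).astype(float)
boot = [rng.choice(errors, size=errors.size, replace=True).mean()
        for _ in range(1000)]
ci_lo, ci_hi = np.percentile(boot, [2.5, 97.5])
print(f"95% CI for error rate: [{ci_lo:.4f}, {ci_hi:.4f}]")
```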


Results

Fetal echocardiography dataset

Table 1 provides a concise overview of the characteristics of the final dataset. We recruited 953 participants, some of whom underwent longitudinal follow-up, for a total of 1,023 distinct echocardiography studies. The dataset comprises 12,103 images covering eight anatomical planes essential for CHD diagnosis. As the images were retrospectively collected from a historical database of real examinations, the recommended views were inconsistently present across fetal surveys and were more commonly observed in abnormal studies. The 4CV and LVOT views are the most common planes employed in clinical practice; to balance the number of images across classes, however, only a subset of pregnancies was sampled for these views. The remaining classes exhibited high variability and divergent probabilities of occurrence (Table 2).

Table 1

Patient characteristics

Demographics Statistics (N=953)
Age (years) 29±4
Body surface area (m2) 1.63±0.14
Gestational age (weeks) 26±2
Main echocardiographic diagnosis
   Ventricular septal defect 85 (8.92)
   Ostium primum atrial septal defect 2 (0.21)
   Coarctation of the aorta or interrupted AArch 26 (2.72)
   Pulmonary stenosis 32 (3.36)
   Aortic stenosis 18 (1.89)
   Persistent left superior vena cava 45 (4.72)
   Right AArch 28 (2.94)
   Aberrant right subclavian artery 10 (1.05)
   Ebstein’s anomaly 3 (0.31)
   Pericardial effusion 13 (1.36)
   Normal 691 (72.51)

Data are presented as mean ± SD or n (%). AArch, aortic arch; SD, standard deviation.

Table 2

Fetal echocardiography dataset statistics

Echocardiographic view Total Train Val Test
4CV 2,546/265 1,501/159 531/53 514/53
3VV 1,207/199 763/120 218/39 226/40
3VT view 1,204/230 812/137 204/47 188/46
LVOT view 2,651/303 1,771/182 478/61 402/60
SAGV 1,230/108 832/66 219/21 179/21
AArch view 2,225/334 1,399/200 426/67 400/67
DArch view 464/49 320/29 92/10 52/10
RVOT view 576/157 362/95 104/31 110/31

Data are presented as number of images/number of patients. 3VT, three-vessel trachea; 3VV, three-vessel view; 4CV, four-chamber view; AArch, aortic arch; DArch, ductal arch; LVOT, left ventricular outflow tract; RVOT, right ventricular outflow tract; SAGV, short axis of the great vessels.

Classification results

We trained a series of CNN-based models to recognize eight fetal echocardiographic views using a training and validation dataset of 10,032 images. The primary outcomes of each method on the test set are outlined in Table 3. With ResNet-50 as the baseline, Se-ResNet50 (25), Se-ResNeXt50 (26), DenseNet-161, and ConvNeXt-base (27) showed improved performance, each achieving an overall accuracy of over 95%. Regarding network depth, deeper networks demonstrated similar performance on the test set. The best-performing model was ConvNeXt-base, which achieved an average accuracy of 97.78% and a precision of 95.31%. Overall, the variation in performance was minimal, with a difference of only 4.49% between the lowest and highest top-1 accuracy.

Table 3

Results of the wide variety of classification CNN tested for fetal echocardiography common planes recognition

Model Params (M) Top 1 Acc (%) Top 3 Acc (%) Precision (%) Recall (%) F1-score (%) Inference time (FPS)
ResNet-34 21.34 93.29 99.47 89.34 91.57 90.24 74
ResNet-50 23.55 93.96 99.52 90.12 91.46 90.60 51
ResNet-101 42.55 94.45 99.71 90.92 93.35 91.93 32
Se-ResNet50 26.09 95.32 99.90 91.67 93.65 92.51 40
ResNeXt-50 23.00 93.82 98.99 91.31 90.54 90.54 52
ResNeXt-101 42.15 94.21 99.42 91.41 91.84 91.29 34
Se-ResNeXt50 25.53 95.51 99.81 92.41 93.62 92.92 41
DenseNet-161 26.49 95.41 99.76 92.35 93.02 92.48 23
DenseNet-169 12.5 94.64 99.76 91.76 92.01 91.70 22
ConvNeXt-base 87.57 97.78 100 95.31 97.12 96.08 38

Quantitative comparisons of view classification accuracy, model size, and inference time among different CNN models were conducted on the test set. Higher values of top-1 Acc, top-3 Acc, precision, recall, F1-score, and FPS indicate superior model performance. Acc, accuracy; CNN, convolutional neural network; FPS, frames per second.

The view-specific classification performance of ConvNeXt-base is visualized in the confusion matrix (Figure 1A). The 4CV, LVOT, AArch, and DArch views achieved >98.0% accuracy, whereas the SAGV exhibited the lowest accuracy (94.41%). Variations in zoom, depth, focus, gain, sector width, and image quality were observed among the images of each view. Clustering of the raw pixels and top-layer features using t-distributed stochastic neighbor embedding (t-SNE) revealed distinct separation of the different classes after training; however, distinguishing between the 3VT and RVOT views, as well as the AArch and DArch views, remained challenging (Figure 1B).
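The t-SNE projection in Figure 1B can be reproduced along the following lines, assuming features is an (N, D) array of top-layer embeddings and labels the corresponding view indices; the arrays here are random placeholders.

```python
# t-SNE sketch for visualizing top-layer features; real embeddings would
# come from the trained network's penultimate layer.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.random.rand(400, 1024)   # placeholder embeddings
labels = np.random.randint(0, 8, 400)  # placeholder labels for 8 views

emb = TSNE(n_components=2, perplexity=30,
           random_state=0).fit_transform(features)
sc = plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=8)
plt.legend(*sc.legend_elements(), title="View", loc="best")
plt.title("t-SNE of top-layer features")
plt.show()
```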

Figure 1 The ConvNeXt-base model successfully discriminates fetal echocardiographic views. (A) The confusion matrix illustrates correct and incorrect view classifications in the test dataset. Diagonal values represent accurately classified samples, while off-diagonal entries represent misclassifications. (B) The t-SNE algorithm is employed to visualize high-dimensional data in lower dimensions. Here, it depicts the successful grouping of test images corresponding to the eight fetal heart views, with less distinct clustering of the 3VT and RVOT views. 3VT, three-vessel trachea; 3VV, three-vessel view; 4CV, four-chamber view; AArch, aortic arch; DArch, ductal arch; LVOT, left ventricular outflow tract; RVOT, right ventricular outflow tract; SAGV, short axis of the great vessels; t-SNE, t-distributed stochastic neighbor embedding.

The F1 score for view classification was 0.96 (AUC range, 0.997–1.00; Figure 2A). For the 2.22% of misclassified test images, the model’s second-highest probability prediction corresponded to the correct view in 80.4% of cases (Figure 2B). The probability of correct predictions significantly exceeded that of incorrect predictions (median probability: 0.93 vs. 0.80; Mann-Whitney U test, P<0.001; Figure 2C). Furthermore, image quality significantly affected model performance in the 4CV analysis (Kruskal-Wallis test, P<0.001). However, the median prediction probabilities across image quality grades showed minimal clinical variation: grade bad (median: 0.933; IQR, 0.932–0.934), grade medium (median: 0.934; IQR, 0.933–0.935), and grade good (median: 0.934; IQR, 0.934–0.935) (Figure 2D). The Grad-CAM experiments showed that the network focused on clinically relevant image features and anatomical structures similar to those identified by cardiac sonographers (Figure 3).

Figure 2 View classification performance of the ConvNeXt-base model. (A) Receiver operating characteristic curves exhibited high consistency across view categories, with AUCs ranging from 0.994 to 1.00 (mean 0.996). (B) Classification accuracy by view category: top-prediction (white boxes) vs. top-two-prediction (blue boxes) performance. (C) Prediction probability distributions stratified by correctness, demonstrating significantly higher probabilities for correct vs. incorrect classifications (Mann-Whitney U test, P<5×10⁻²²). (D) Box plots showing prediction probabilities for the 4CV in this test set, by image quality. 3VT, three-vessel trachea; 3VV, three-vessel view; 4CV, four-chamber view; AArch, aortic arch; AUC, area under the curve; DArch, ductal arch; LVOT, left ventricular outflow tract; RVOT, right ventricular outflow tract; SAGV, short axis of the great vessels.
Figure 3 Visualization of decision-making by ConvNeXt-base model. (A,C) Representative test images are displayed for each anatomical view. (B,D) Grad-CAM for the example images. 3VT, three-vessel trachea; 3VV, three-vessel view; 4CV, four-chamber view; AArch, aortic arch; DArch, ductal arch; Grad-CAM, gradient-weighted class activation mapping; LVOT, left ventricular outflow tract; RVOT, right ventricular outflow tract; SAGV, short axis of the great vessels.

Discussion

Principal findings

In this study, we collected a large dataset of 12,103 labeled fetal echocardiograms from a real clinical scenario and evaluated current advanced CNNs for the fine-grained classification of fetal echocardiograms. Our study demonstrates the applicability of current CNN-based automated learning models for fetal echocardiography recognition and indicates that the ConvNeXt-base model performed best on accuracy metrics, achieving an accuracy of 97.78% on a test set of 2,071 images and surpassing previously reported results (12,18). Analysis of the confusion matrix, ROC curves, and misclassified images indicated that model errors aligned with inherent clinical diagnostic uncertainties. Notably, views with a higher number of training samples, such as the 4CV, LVOT, and AArch views, exhibited higher accuracy. Our t-SNE dimensionality reduction visualization indicated that the network slightly confused the 3VT and RVOT views; however, the accuracy of classifying these two views was >95%. We observed a considerably higher mean probability for accurate predictions than for erroneous predictions, indicating that the model discriminated confidently across all views. After analyzing the prediction probabilities of 4CV images of different image qualities, we found that even bad images yielded high prediction probabilities. This suggests that a DL method trained on a large real-world dataset may reduce the uncertainty introduced by variable image quality.

Fetal echocardiographic view classification

Fetal echocardiography is vital to the prenatal screening and diagnosis of CHD, and the need for accurate, automated analysis of fetal echocardiography has never been stronger (28). View classification is a key step in interpreting cardiovascular imaging. Previous works have applied DL to ultrasound view classification with success in adult and pediatric echocardiography (29-32). Zhang et al. (33) developed a CNN model that identified 23 standard views with 84% overall accuracy. Madani et al. (30) constructed a CNN model for identifying 15 standard views, achieving 91.7% image-level accuracy and surpassing board-certified echocardiographers, who achieved 79.4% accuracy. Østvik et al. (31) successfully classified seven different views using CNNs, achieving accuracies of 98.3% and 98.9% on single frames and sequences, respectively. Similar studies have been reported in pediatric echocardiography. Wu et al. (32) developed a CNN-based classification model for diagnosing congenital heart defects in children, achieving a precision of 86.5% in identifying 23 echocardiographic views through knowledge distillation. Gearhart et al. (29) trained a CNN model for the automated identification of 27 standard pediatric echocardiographic views, including anatomic sweeps, color Doppler, and Doppler tracings, achieving an accuracy of 90.3%.

Compared with adult echocardiograms, fetal echocardiographic views present unique challenges due to smaller and more variable heart structures, lower image quality, and higher sampling frequency. These factors add to the complexity of recognizing fetal echocardiographic views. Although studies on view classification of fetal echocardiographic images exist, they have primarily focused on detecting congenital heart defects (34), explored a limited range of gestational ages (35), or classified only a subset of standard views (three or four views) (15,18,20).

Clinical implications

In our study, we collected a comprehensive dataset of eight two-dimensional standard sectional images of fetal echocardiography, following the guidelines set forth by the ISUOG (6). This approach makes our study more generalizable and applicable across the wide range of gestational ages and varying image qualities commonly encountered during routine prenatal examinations, as the trained models likely benefit from this broader representation of the clinical domain. This work lays the foundation for automated quantitative analysis and diagnostic support to promote efficient, accurate, and scalable analysis of fetal echocardiograms. Furthermore, a potential clinical application of our model is its ability to provide real-time quality control feedback during image acquisition.

DL models often operate as “black boxes”, and their interpretability has been a challenge (36). To address this, we used the Grad-CAM technique to create heatmaps that highlight the regions important to the model’s decision-making. Our results showed that the model’s predictions aligned with human judgment, which increases confidence in using the model in routine clinical practice. Together with the t-SNE dimensionality reduction visualization, Grad-CAM improves the interpretability of the model and enhances its clinical applicability.

Research implications

Future research should explore the model’s performance with data from multiple centers and vendors and should evaluate the method on a larger set of normal and complex cases. Moving forward, we intend to extend our method to classify fetal echocardiographic videos and other imaging modalities, which could enhance the efficiency and user-friendliness of the model.

Strengths and limitations

This study has several strengths. First, it achieved an accuracy of 97.78% using an advanced model, surpassing previously reported results. Second, we collected eight common fetal echocardiographic views without restriction by gestational age, achieving truly fine-grained classification of fetal echocardiographic images. Finally, a variety of visualization and evaluation techniques were applied, and the model exhibited good interpretability.

It is important to note some limitations of our study. Patients with complex CHDs, including single ventricle, were excluded from the study due to their rarity and the need for a larger sample size. Additionally, our analysis was conducted using images from a single center, and we did not include other imaging modalities, such as color Doppler. To facilitate external validation and improve the model’s generalizability, we are initiating a multicenter collaboration to collect data from diverse hospitals and ultrasound devices.


Conclusions

The objective of this study was to assess the feasibility of using CNNs for the automated classification of fetal echocardiography. A comprehensive analysis of various CNN models was conducted, leading to advanced results in the classification of fetal two-dimensional echocardiography views. Our findings provide evidence of the effectiveness and practical value of CNN models in identifying specific views. Moving forward, the integration of DL techniques holds promise for enhancing the accuracy, efficiency, and accessibility of fetal echocardiography, encompassing tasks such as image segmentation, biometric measurements, and disease detection.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the CLEAR reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-556/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-556/dss

Funding: This work was supported by the National Natural Science Foundation of China (Nos. 82302230, 82171961, and 82202194).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-556/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study protocol was approved by the Ethics Committee of Tongji Medical College, Huazhong University of Science and Technology (No. 2021-1000-03). Written informed consent was obtained from all enrolled subjects.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Donofrio MT, Moon-Grady AJ, Hornberger LK, Copel JA, Sklansky MS, Abuhamad A, Cuneo BF, Huhta JC, Jonas RA, Krishnan A, Lacey S, Lee W, Michelfelder EC Sr, Rempel GR, Silverman NH, Spray TL, Strasburger JF, Tworetzky W, Rychik J; American Heart Association Adults With Congenital Heart Disease Joint Committee of the Council on Cardiovascular Disease in the Young and Council on Clinical Cardiology, Council on Cardiovascular Surgery and Anesthesia, and Council on Cardiovascular and Stroke Nursing. Diagnosis and treatment of fetal cardiac disease: a scientific statement from the American Heart Association. Circulation 2014;129:2183-242. [Crossref] [PubMed]
  2. Zhao QM, Liu F, Wu L, Ma XJ, Niu C, Huang GY. Prevalence of Congenital Heart Disease at Live Birth in China. J Pediatr 2019;204:53-8. [Crossref] [PubMed]
  3. Quartermain MD, Hill KD, Goldberg DJ, Jacobs JP, Jacobs ML, Pasquali SK, Verghese GR, Wallace AS, Ungerleider RM. Prenatal Diagnosis Influences Preoperative Status in Neonates with Congenital Heart Disease: An Analysis of the Society of Thoracic Surgeons Congenital Heart Surgery Database. Pediatr Cardiol 2019;40:489-96. [Crossref] [PubMed]
  4. Peyvandi S, De Santiago V, Chakkarapani E, Chau V, Campbell A, Poskitt KJ, Xu D, Barkovich AJ, Miller S, McQuillen P. Association of Prenatal Diagnosis of Critical Congenital Heart Disease With Postnatal Brain Development and the Risk of Brain Injury. JAMA Pediatr 2016;170:e154450. [Crossref] [PubMed]
  5. Sizarov A, Boudjemline Y. Valve Interventions in Utero: Understanding the Timing, Indications, and Approaches. Can J Cardiol 2017;33:1150-8. [Crossref] [PubMed]
  6. Moon-Grady AJ, Donofrio MT, Gelehrter S, Hornberger L, Kreeger J, Lee W, Michelfelder E, Morris SA, Peyvandi S, Pinto NM, Pruetz J, Sethi N, Simpson J, Srivastava S, Tian Z. Guidelines and Recommendations for Performance of the Fetal Echocardiogram: An Update from the American Society of Echocardiography. J Am Soc Echocardiogr 2023;36:679-723. [Crossref] [PubMed]
  7. Sun HY, Proudfoot JA, McCandless RT. Prenatal detection of critical cardiac outflow tract anomalies remains suboptimal despite revised obstetrical imaging guidelines. Congenit Heart Dis 2018;13:748-56. [Crossref] [PubMed]
  8. Sklansky M, DeVore GR. Fetal Cardiac Screening: What Are We (and Our Guidelines) Doing Wrong? J Ultrasound Med 2016;35:679-81. [Crossref] [PubMed]
  9. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436-44. [Crossref] [PubMed]
  10. de Siqueira VS, Borges MM, Furtado RG, Dourado CN, da Costa RM. Artificial intelligence applied to support medical decisions for the automatic analysis of echocardiogram images: A systematic review. Artif Intell Med 2021;120:102165. [Crossref] [PubMed]
  11. Chen Z, Liu Z, Du M, Wang Z. Artificial Intelligence in Obstetric Ultrasound: An Update and Future Applications. Front Med (Lausanne) 2021;8:733468. [Crossref] [PubMed]
  12. Arnaout R, Curran L, Zhao Y, Levine JC, Chinn E, Moon-Grady AJ. An ensemble of neural networks provides expert-level prenatal detection of complex congenital heart disease. Nat Med 2021;27:882-91. [Crossref] [PubMed]
  13. Zamzmi G, Hsu LY, Li W, Sachdev V, Antani S. Harnessing Machine Intelligence in Automatic Echocardiogram Analysis: Current Status, Limitations, and Future Directions. IEEE Rev Biomed Eng 2021;14:181-203. [Crossref] [PubMed]
  14. Drukker L, Noble JA, Papageorghiou AT. Introduction to artificial intelligence in ultrasound imaging in obstetrics and gynecology. Ultrasound Obstet Gynecol 2020;56:498-505. [Crossref] [PubMed]
  15. Chen H, Wu L, Dou Q, Qin J, Li S, Cheng JZ, Ni D, Heng PA. Ultrasound Standard Plane Detection Using a Composite Neural Network Framework. IEEE Trans Cybern 2017;47:1576-86. [Crossref] [PubMed]
  16. Burgos-Artizzu XP, Coronado-Gutiérrez D, Valenzuela-Alcaraz B, Bonet-Carne E, Eixarch E, Crispi F, Gratacós E. Evaluation of deep convolutional neural networks for automatic classification of common maternal fetal ultrasound planes. Sci Rep 2020;10:10200. [Crossref] [PubMed]
  17. Krishna TB, Kokil P. Automated detection of common maternal fetal ultrasound planes using deep feature fusion. In: 2022 IEEE 19th India Council International Conference (INDICON). IEEE; 2022:1-5.
  18. Baumgartner CF, Kamnitsas K, Matthew J, Fletcher TP, Smith S, Koch LM, Kainz B, Rueckert D. SonoNet: Real-Time Detection and Localisation of Fetal Standard Scan Planes in Freehand Ultrasound. IEEE Trans Med Imaging 2017;36:2204-15. [Crossref] [PubMed]
  19. Wu H, Wu B, Lai F, Liu P, Lyu G, He S, Dai J. Application of Artificial Intelligence in Anatomical Structure Recognition of Standard Section of Fetal Heart. Comput Math Methods Med 2023;2023:5650378. [Crossref] [PubMed]
  20. Sundaresan V, Bridge CP, Ioannou C, Noble JA. Automated characterization of the fetal heart in ultrasound images using fully convolutional neural networks. In: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). IEEE; 2017:671-4.
  21. Meng Q, Rueckert D, Kainz B. Unsupervised cross-domain image classification by distance metric guided feature alignment. In: International Workshop on Advances in Simplifying Medical Ultrasound. Cham: Springer International Publishing; 2020:146-57.
  22. Carvalho JS, Axt-Fliedner R, Chaoui R, Copel JA, Cuneo BF, Goff D, Gordin Kopylov L, Hecher K, Lee W, Moon-Grady AJ, Mousa HA, Munoz H, Paladini D, Prefumo F, Quarello E, Rychik J, Tutschek B, Wiechec M, Yagel S. ISUOG Practice Guidelines (updated): fetal cardiac screening. Ultrasound Obstet Gynecol 2023;61:788-803. [Crossref] [PubMed]
  23. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-8.
  24. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017:4700-8.
  25. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:7132-41.
  26. Xie S, Girshick R, Dollár P, Tu Z, He K. Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017:1492-500.
  27. Liu Z, Mao H, Wu CY, Feichtenhofer C, Darrell T, Xie S. A convnet for the 2020s. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022:11976-86.
  28. Morris SA, Lopez KN. Deep learning for detecting congenital heart disease in the fetus. Nat Med 2021;27:764-5. [Crossref] [PubMed]
  29. Gearhart A, Goto S, Deo RC, Powell AJ. An Automated View Classification Model for Pediatric Echocardiography Using Artificial Intelligence. J Am Soc Echocardiogr 2022;35:1238-46. [Crossref] [PubMed]
  30. Madani A, Arnaout R, Mofrad M, Arnaout R. Fast and accurate view classification of echocardiograms using deep learning. NPJ Digit Med 2018;1:6. [Crossref] [PubMed]
  31. Østvik A, Smistad E, Aase SA, Haugen BO, Lovstakken L. Real-Time Standard View Classification in Transthoracic Echocardiography Using Convolutional Neural Networks. Ultrasound Med Biol 2019;45:374-84. [Crossref] [PubMed]
  32. Wu L, Dong B, Liu X, Hong W, Chen L, Gao K, Sheng Q, Yu Y, Zhao L, Zhang Y. Standard Echocardiographic View Recognition in Diagnosis of Congenital Heart Defects in Children Using Deep Learning Based on Knowledge Distillation. Front Pediatr 2021;9:770182. [Crossref] [PubMed]
  33. Zhang J, Gajjala S, Agrawal P, Tison GH, Hallock LA, Beussink-Nelson L, Lassen MH, Fan E, Aras MA, Jordan C, Fleischmann KE, Melisko M, Qasim A, Shah SJ, Bajcsy R, Deo RC. Fully Automated Echocardiogram Interpretation in Clinical Practice. Circulation 2018;138:1623-35. [Crossref] [PubMed]
  34. Gong Y, Zhang Y, Zhu H, Lv J, Cheng Q, Zhang H, He Y, Wang S. Fetal Congenital Heart Disease Echocardiogram Screening Based on DGACNN: Adversarial One-Class Classification Combined with Video Transfer Learning. IEEE Trans Med Imaging 2020;39:1206-22. [Crossref] [PubMed]
  35. Stoean C, Stoean R, Hotoleanu M, Iliescu D, Patru C, Nagy R. An assessment of the usefulness of image pre-processing for the classification of first trimester fetal heart ultrasound using convolutional neural networks. In: 2021 25th International Conference on System Theory, Control and Computing (ICSTCC). IEEE; 2021:242-8.
  36. Pasdeloup D, Olaisen SH, Østvik A, Sabo S, Pettersen HN, Holte E, Grenne B, Stølen SB, Smistad E, Aase SA, Dalen H, Løvstakken L. Real-Time Echocardiography Guidance for Optimized Apical Standard Views. Ultrasound Med Biol 2023;49:333-46. [Crossref] [PubMed]
Cite this article as: Shi J, Bai Y, Hou Q, He S, Hong L, Cui L, Zhang Y, Liu T, Deng W, Liu J, Ma J, Xiao S, Wang Z, Yang Y, Zhang L, Cao H, Xie M, Wang J. Automated and accurate view classification of fetal echocardiography using convolutional neural networks. Quant Imaging Med Surg 2025;15(12):12582-12592. doi: 10.21037/qims-2025-556
