Automatic deep learning method for analysis and prediction of neonatal hyperbilirubinemia in magnetic resonance imaging

Li Xu; Yitong He; Lijuan Yang; Haidong Meng; Mingmin Zhang

doi:10.21037/qims-24-1050

Original Article


Li Xu¹,²#, Yitong He²#, Lijuan Yang³, Haidong Meng⁴, Mingmin Zhang¹,⁵

¹Department of Computer Science and Technology, Baotou Medical College, Inner Mongolia University of Science and Technology, Baotou, China; ²School of Information Science and Technology, Northwest University, Xi’an, China; ³Department of Radiology, Xi’an Fourth Hospital, Xi’an, China; ⁴Department of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, China; ⁵Department of Radiology, Baotou Cancer Hospital, Baotou, China

Contributions: (I) Conception and design: L Xu, Y He; (II) Administrative support: H Meng; (III) Provision of study materials or patients: L Yang; (IV) Collection and assembly of data: Y He; (V) Data analysis and interpretation: L Xu, Y He, M Zhang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Haidong Meng, PhD. Department of Information Engineering, Inner Mongolia University of Science and Technology, No. 7 Alding Street, Baotou 014010, China. Email: 102009177@btmc.edu.cn; Mingmin Zhang, MD. Department of Radiology, Baotou Cancer Hospital, No. 18 Tuanjie Street, Qingshan District, Baotou 014010, China; Department of Computer Science and Technology, Baotou Medical College, Inner Mongolia University of Science and Technology, Baotou, China. Email: 187340356@qq.com.

Background: This study conducted a comparative analysis among newborns with varying levels of hyperbilirubinemia, explored the relationships between magnetic resonance imaging (MRI) image features and serum bilirubin levels in hyperbilirubinemia, and proposed an automatic classification system based on deep learning (DL) for prediction of neonatal hyperbilirubinemia (NHB).

Methods: This retrospective study enrolled 606 consecutive neonates whose serum bilirubin was measured at the Xi’an Fourth Hospital, including 273 patients with NHB and 333 normal controls. After data preprocessing, MRI images were fed into the Inception-v3 network, a graph convolutional network (GCN), and a 3-dimensional (3D) patch-based GCN incorporating a graph attention mechanism (our GCN) for NHB analysis and classification. Multi-threshold grouping was conducted based on serum bilirubin levels. Performance was evaluated using the area under the curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPE).

Results: As bilirubin levels increased, the overall performance metrics of the DL system for detecting the T1-weighted imaging signal in the pallidum region showed a significant upward trend. Our GCN achieved satisfactory results in predicting and classifying NHB from MRI image features. When the bilirubin value exceeded 400 µmol/L, it achieved an AUC of 0.86 and an ACC of 0.81, significantly higher than other advanced models (ACC: 0.720–0.783) using the same input format.

Conclusions: The DL system has the potential to automatically analyze and predict NHB on MRI.

Keywords: Neonatal hyperbilirubinemia (NHB); deep learning (DL); magnetic resonance imaging (MRI); multi-threshold analysis; classification

Submitted May 31, 2024. Accepted for publication Sep 10, 2024. Published online Nov 08, 2024.


Introduction

Neonatal hyperbilirubinemia (NHB) is a common disease caused by elevated bilirubin levels in newborns, typically occurring within the first month after birth (1). Prolonged accumulation of excess bilirubin leads to its deposition in the brain across the blood-brain barrier, particularly in the middle and posterior portions of the globus pallidus (2,3). This deposition can cause neurological damage with severe long-term consequences such as auditory processing disorders, visual abnormalities, abnormal positioning, and intellectual disability (4,5). However, early-stage neonatal bilirubin encephalopathy is reversible, and prompt treatment improves the prognosis by reducing the occurrence of brain injury. Therefore, early diagnosis of NHB and timely, effective intervention are of significant clinical importance.

Magnetic resonance imaging (MRI), a radiation-free and non-invasive imaging technique, is extensively used in diagnosing neonatal cerebral diseases owing to its superior soft-tissue discrimination, and it plays an important role in the diagnosis of NHB. NHB often affects the globus pallidus bilaterally, presenting as symmetrical signal increases on T1-weighted imaging (T1WI) (6). Conversion of this T1WI high signal into a symmetrical T2-weighted high signal in the bilateral globus pallidus and subthalamic nucleus indicates a severe condition and poor prognosis (7). Therefore, dynamic monitoring of MRI changes in NHB patients is essential for evaluating disease progression. Ren et al. suggested that T1WI signal intensity values exceeding 1,155±63 in the bilateral globus pallidus indicate T1WI signal hyperintensity, which is considered a relatively objective metric (8,9). However, a unified standard for this judgment is currently lacking.

Serum bilirubin testing is currently the primary clinical method for screening and diagnosing NHB and serves as the gold standard for clinical diagnosis (10). The correlation between serum bilirubin levels and the signal intensity of the globus pallidus on T1WI is still under investigation. Wu et al. (11) observed T1WI high signals in the globus pallidus when the serum bilirubin value exceeded 513.0 µmol/L. Notably, some infants with hyperbilirubinemia do not meet the serum bilirubin criteria for a clinical diagnosis, yet MRI reveals symmetrical T1WI high signal changes in the globus pallidus, a condition known as subclinical bilirubin brain injury (12,13). However, how MRI image features change as serum bilirubin levels continually fluctuate and rise has not been systematically validated. Moreover, the value of MRI in predicting NHB as serum bilirubin levels change dynamically remains unexplored. Furthermore, MRI interpretation relies solely on clinicians’ experience and visual assessment and lacks a standardized quantitative index. For clinicians and radiologists, manually interpreting hundreds of MRI images of NHB is a highly burdensome workload. This is particularly problematic for extremely small lesions in the globus pallidus, which are frequently overlooked, leading to missed diagnoses and misdiagnoses.

Deep learning (DL) technology has been widely applied in various domains of medical imaging in recent years (14-16). DL-based medical image analysis can reveal feature information imperceptible to the human eye, thereby substantially enhancing diagnostic accuracy (ACC) and effectiveness (17-20). In 2021, Chen et al. (21) combined multiple medical imaging techniques, including multi-level MRI features, to evaluate NHB, and used the DenseNet model to fuse the multi-modal image data. Their experiments showed that this multi-modal fusion approach significantly improved the classification ACC for hyperbilirubinemia, offering valuable insights for the diagnosis and treatment of this condition. Kumar et al. (22) implemented a hybrid approach integrating machine learning (ML) and DL algorithms to predict NHB, using a dataset comprising 300 neonatal biomarkers, clinical information, and disease status. The models employed included Naïve Bayes, support vector machine (SVM), random forest, and convolutional neural network (CNN), and the DL algorithm showed excellent ACC and stability in predicting NHB. Wu et al. (23) used diffusion-weighted imaging (DWI), T1WI, and T2-weighted imaging (T2WI) data as experimental samples and applied a CNN to diagnose and predict NHB, obtaining more accurate diagnostic outcomes than traditional clinical methods. Kalbande et al. (24) introduced an automatic DL method for the early diagnosis of NHB, in which a deep convolutional neural network (DCNN) automatically extracted features from images of jaundice-affected eyes. Their experiments demonstrated the efficacy of this method in diagnosing NHB and in quantitatively evaluating jaundice severity.

Compared with clinical diagnostic methods such as visual observation and serum bilirubin testing, the aforementioned methods, which combine traditional imaging with ML, can improve the prediction ACC for NHB to some extent and have partially automated its classification. However, most relevant studies have analyzed the relationship between serum bilirubin levels and MRI signals using a single threshold, without exploring how dynamic, continuous changes in serum bilirubin levels affect the T1WI signal. The differences in MRI signals caused by varying bilirubin levels have not been sufficiently analyzed, and few studies have evaluated the specific manifestations and trends of NHB in brain imaging. The mapping between continuously fluctuating bilirubin levels and the specific brain sub-regions affected by NHB has not been investigated. In addition, DL-based visualization techniques for analyzing lesion signals in brain regions are scarce. Because of the limited number of NHB patients, most relevant studies have struggled to establish DL models with reliable performance. The use of DL to analyze and predict NHB is still at an early stage of exploration, and there is room to improve the DL models. Therefore, this study aimed to analyze the correlation and regularity of T1WI signal changes in specific locations and regions of brain MRI in response to continuous dynamic changes in bilirubin levels. Furthermore, we used DL methods to develop 2-dimensional (2D) and 3-dimensional (3D) models for analyzing and predicting NHB based on demographic, clinical, and T1-weighted MRI data, and evaluated their capacity to predict NHB.
We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1050/rc).

Methods

Patient selection

This retrospective study was conducted in accordance with the Declaration of Helsinki (as revised in 2013) and was approved by the Research Ethics Committee of the Baotou Medical College (No. 2022-17). Due to the retrospective nature of the study, the collection of patients’ MRI images and relevant clinical information would not adversely affect the patients’ rights or welfare, and thus the need for individual consent was waived by the Ethics Committee of the Baotou Medical College.

We retrospectively recruited patients with NHB who had undergone head MRI and serum bilirubin testing in the Imaging Department of the Xi’an Fourth Hospital between April 2020 and November 2021. The inclusion criteria were as follows: (I) complete image and clinical data; (II) full-term birth (gestational age ≥37 weeks), as determined from clinical information, with premature infants excluded; and (III) availability of axial (transverse) T1WI data, which were chosen as the experimental target. The exclusion criteria were as follows: (I) poor image quality; (II) age over 28 days or prior treatment at another hospital; (III) other intracranial diseases, such as tumors or trauma; and (IV) completion of the MRI scan more than 7 days after NHB was detected. The actual age of the newborns during hospitalization was 1–18 days. The flowchart of patient inclusion is shown in Figure 1.

Figure 1 Flowchart of patient inclusion and exclusion. MRI, magnetic resonance imaging; SBD, serum bilirubin detection; T1WI, T1-weighted imaging.

Data extraction

Demographic and clinical data were retrospectively obtained from electronic medical records, including gender, age, birth weight, and gestational age. Following the American Academy of Pediatrics Guidelines for Management of Hyperbilirubinemia (25), all patients underwent serum bilirubin testing; blood samples were collected, processed, and analyzed uniformly at the same clinical laboratory. The patients were then divided into NHB and non-NHB groups, and radiologists checked the NHB classification against the standard when collating the clinical data. Radiologic variables were collected from imaging notes. The MRI images were reviewed and analyzed by 2 radiologists with more than 8 years of experience in head imaging, and ambiguous results were resolved by consensus through discussion. This evaluation was performed in conjunction with all available imaging and clinical data.

MRI protocol

All images were acquired on 1.5 T MRI scanners (Optima 360; GE HealthCare, Wauwatosa, WI, USA). All patients underwent MRI before treatment. The T1WI sequence scan parameters are listed in Table 1.

Table 1

The relevant scan parameters of the T1WI sequence of MRI

Parameters	Value
Repetition time (ms)	5,000
Echo time (ms)	125
Field of view (cm)	24
Acquisition matrix	320×320
Slice thickness (mm)	5
Inter-slice interval (cm)	1.6
Bandwidth (Hz/pixel)	41.67
Excitation	2

T1WI, T1-weighted imaging; MRI, magnetic resonance imaging.

Workflow of DL-based analysis

Figure 2 shows the framework for DL-based analysis. The proposed workflow included annotation of brain lesions, data registration, multi-scale region division, training data augmentation, multi-threshold group division, and analysis and classification by different networks: the 2D Inception-v3, the 3D graph convolutional network (GCN), and the 3D patch-based graph attention convolutional network (our GCN), which incorporates a graph attention mechanism.

Figure 2 Workflow of the DL-based analysis, which consists of the annotation of brain lesions by the multi-scale region division, the augmentation of training data by multi-scale translation, the multi-threshold group division, and the analysis and classification by different networks, including the 2D Inception-v3, 3D GCN, and our GCN which introduced graph attention mechanism systems. MR, magnetic resonance; DL, deep learning; Inception-v3, Inception-v3 convolutional neural network; GCN, graph convolutional network; our GCN, graph convolutional network introduced the graph attention mechanism; 2D, 2-dimensional; 3D, 3-dimensional.

We used multiple serum bilirubin thresholds as grouping variables to systematically analyze how MRI appearance changes with increasing bilirubin levels, and then evaluated the specific manifestations and trends of various bilirubin levels in brain MRI images of NHB. Moreover, the DL method and the gradient-weighted class activation mapping (Grad-CAM) technique were used to precisely localize and visualize the key areas of bilirubin deposition. To further explore representative sub-regions within the lesion areas and extract deeper feature information, we developed a 3D model for the analysis and prediction of NHB using a patch-based graph attention mechanism.

Pre-processing

To achieve position matching, shape comparison, and deformation analysis of the MRI sequences, we performed image registration. Rigid registration in 3D Slicer (https://www.slicer.org) was used to align the target samples to the template. The registered data remained in 3D voxel format. Each case comprised either 15 or 18 slices, and the 5 middle slices of each sequence were chosen as representative MRI samples for both the NHB and non-NHB groups. The new data retained the labels of the original unregistered data, ensuring consistent labeling across the 2 groups.
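The middle-slice selection can be sketched as follows. Registration itself was done in 3D Slicer and is not reproduced here; this minimal sketch assumes the registered volume is a NumPy array ordered (depth, height, width), and the rule for placing the 5-slice window when the depth is even (e.g. 18 slices) is an assumption, since the text does not specify it.

```python
import numpy as np

def select_middle_slices(volume: np.ndarray, n_slices: int = 5) -> np.ndarray:
    """Return n_slices consecutive slices around the middle of a (depth, height, width) volume."""
    depth = volume.shape[0]
    if n_slices > depth:
        raise ValueError(f"volume has {depth} slices, fewer than {n_slices}")
    # For odd depths the window is exactly centred; for even depths it sits
    # half a slice toward the first slice (a tie-breaking choice assumed here).
    start = (depth - n_slices) // 2
    return volume[start:start + n_slices]

# Hypothetical registered volumes: each slice is filled with its own index.
vol15 = np.arange(15, dtype=float)[:, None, None] * np.ones((15, 8, 8))
vol18 = np.arange(18, dtype=float)[:, None, None] * np.ones((18, 8, 8))
print(select_middle_slices(vol15)[:, 0, 0])  # slices 5 to 9
print(select_middle_slices(vol18)[:, 0, 0])  # slices 6 to 10
```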

Because the registered images contained large invalid regions, regions of interest (ROIs) had to be extracted from the slices to obtain the effective regions. We used a multi-scale effective region division method in which a square box centered on the center of the original image covered the central area of the MRI slice, including the NHB brain lesion area, and defined the ROI. Square ROIs of 96×96, 128×128, 160×160, 192×192, and 256×256 were partitioned, and performance indices were then calculated to determine the optimal ROI size.
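A centred multi-scale crop of this kind might look like the sketch below. It assumes each registered slice is a 2D NumPy array of 320×320 pixels (the acquisition matrix in Table 1); the actual size of the registered slices is not stated, so that shape is illustrative only.

```python
import numpy as np

ROI_SIZES = (96, 128, 160, 192, 256)  # candidate square ROI sizes listed in the text

def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Crop a size x size square whose centre coincides with the image centre."""
    h, w = image.shape[-2:]
    if size > h or size > w:
        raise ValueError(f"ROI {size} exceeds image size {h}x{w}")
    top, left = (h - size) // 2, (w - size) // 2
    return image[..., top:top + size, left:left + size]

# Assumed 320 x 320 slice; one ROI per candidate size.
slice_2d = np.random.default_rng(0).random((320, 320))
rois = {size: center_crop(slice_2d, size) for size in ROI_SIZES}
for size, roi in rois.items():
    print(size, roi.shape)
```

Each ROI size then defines a separate dataset, and the size with the best test metrics is kept for the remaining experiments.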

Given the limited sample size and the large amount of training data that DL models require, multi-scale translation was used to augment the training data. To generate augmented samples, the translation distance in both the horizontal and vertical directions was chosen from the integers in the closed interval [−2, +2]. This yields 25 combinations; excluding the case in which both distances are zero, which corresponds to the original image region, leaves 24 augmented samples per image. We also applied min-max normalization to eliminate dimensional differences in the data.
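The translation scheme can be made concrete with a short sketch. It assumes the shifts are measured in pixels and applied to the position of a centred ROI window, and that min-max normalization is applied per ROI; the text states neither detail explicitly.

```python
import numpy as np

def translated_rois(image: np.ndarray, size: int, max_shift: int = 2) -> list:
    """Return (2*max_shift + 1)**2 - 1 shifted copies of the centred size x size ROI.

    Shifts (dy, dx) range over the integers in [-max_shift, max_shift]; (0, 0)
    is skipped because it reproduces the original ROI. With max_shift=2 this
    gives 24 augmented crops.
    """
    h, w = image.shape
    top0, left0 = (h - size) // 2, (w - size) // 2
    if (top0 - max_shift < 0 or left0 - max_shift < 0
            or top0 + size + max_shift > h or left0 + size + max_shift > w):
        raise ValueError("image too small for the requested ROI and shift")
    crops = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            if dy == 0 and dx == 0:
                continue
            top, left = top0 + dy, left0 + dx
            crops.append(image[top:top + size, left:left + size])
    return crops

def min_max_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Rescale intensities to [0, 1]; eps guards against constant images."""
    x = x.astype(np.float64)
    return (x - x.min()) / (x.max() - x.min() + eps)

image = np.random.default_rng(0).integers(0, 1000, size=(200, 200))
augmented = [min_max_normalize(r) for r in translated_rois(image, size=128)]
print(len(augmented))  # 24
```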

Multi-threshold groups division

Our study aimed to analyze and evaluate the correlation and regularity between dynamic, continuous changes in bilirubin levels and changes in the image features of the corresponding brain regions on MRI. A further objective was to achieve predictive classification of NHB from the T1WI sequence using DL. To this end, we restructured the dataset by bilirubin value into 5 groups with continuous intervals, starting at 200 µmol/L and increasing in steps of 50 µmol/L. This allowed a more detailed evaluation of the data and a clearer understanding of the results. We then analyzed the changes in MRI image features across these consecutive bilirubin thresholds using the 2D Inception-v3 network and the 3D GCN, confirmed their correlation, and determined the critical bilirubin level that achieved optimal predictive ability. The dataset comprised 273 NHB and 333 non-NHB patients, and the 606 samples were randomly split into a training set (422 samples) and a testing set (184 samples) at a ratio of 7:3. Table 2 shows the multi-threshold group distribution of the dataset.
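The grouping rule can be sketched as a simple binning function. The group edges follow the text (200 µmol/L upward in 50 µmol/L steps); treating 400 µmol/L itself as belonging to the top group (Table 2 lists it as ">400") and assigning non-integer values such as 249.5 to the lower interval are assumptions of this sketch.

```python
import bisect

# Lower edges (µmol/L) of the hyperbilirubinemia intervals: 200, 250, ..., 400.
GROUP_EDGES = (200, 250, 300, 350, 400)
GROUP_LABELS = ("control (<200)", "200-249", "250-299", "300-349", "350-399", ">=400")

def bilirubin_group(value_umol_l: float) -> str:
    """Map a serum bilirubin value to its multi-threshold group (400 assumed to open the top group)."""
    # bisect_right counts how many edges are <= value, which is the group index.
    return GROUP_LABELS[bisect.bisect_right(GROUP_EDGES, value_umol_l)]

for v in (150, 200, 275.5, 349.9, 400, 512):
    print(v, bilirubin_group(v))
```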

Table 2

The multi-threshold group distribution

Groups	Bilirubin value (μmol/L)	Sample size, training dataset	Sample size, testing dataset
Group with hyperbilirubinemia (n=273)	200–249	12	5
	250–299	16	11
	300–349	67	13
	350–399	86	20
	>400	28	15
Control group with normal individuals (n=333)	<200	213	120

Analysis and prediction with Inception-v3, GCN, and our GCN system. Inception-v3, Inception-v3 convolutional neural network; GCN, graph convolutional network; our GCN, 3D patch-based graph attention convolutional network; 3D, 3-dimensional.

Model selection and construction

DL technology is commonly used in artificial intelligence (AI)-driven computer-aided diagnosis systems and has shown promising ACC in predicting disease risk (26-29). According to their input, DL-based diagnosis models in medical image analysis can be divided into 3 main types (30): (I) 2D CNNs using a single slice as input; (II) 2D CNNs using multiple slices as input, generally images from different perspectives or scales; and (III) 3D CNNs using the entire 3D volume as input. Because different bilirubin deposition signs may appear in different slices, it is unrealistic to expect a single slice of the MRI sequence to contain all of them. We therefore first selected the 5 consecutive slices around the middle of each sequence and used a 2D Inception-v3 network to analyze the correlation and regularity between continuous dynamic changes in serum bilirubin levels and the corresponding MRI image features, and to predict and classify NHB. However, a major limitation of 2D CNNs that take multiple images as input is that they treat the MRI sequence as independent slices, ignoring useful continuous information between adjacent slices. Because 3D patches can exploit inter-slice context, we built a 3D model for the MRI image features: our proposed GCN was constructed from 3D voxels and patches. To further explore deep feature information in localized regions of MRI signs and improve predictive performance, we introduced the graph attention mechanism into the GCN. The specific model structure and parameters are given in the Supplementary material (Appendix 1). For comparison, we also included 4 other advanced models: decision tree (31), SVM (32), CNN (14), and residual neural network (ResNet) (33). Both 2D Inception-v3 (34) and 3D GCN (35) achieved state-of-the-art performance in the analysis and prediction of NHB.

Training strategy

Instead of starting from randomly initialized parameters, we applied a pre-training strategy at the beginning of model training, which can accelerate subsequent iterations. The initial parameters were obtained by pre-training on a large external dataset, ImageNet (36), the most commonly used pre-training dataset, which helps train robust and general-purpose image classification models. Initializing the model with these parameters can help the network converge toward a better solution. In this work, the DL models were fine-tuned via transfer learning from ImageNet pre-trained weights to analyze the MRI images in our dataset: 80% of the pre-trained layers in the convolutional base were frozen, and the remaining 20% were left unfrozen and re-trained on the selected MRI dataset.
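The 80/20 freezing rule amounts to splitting an ordered list of layers. The sketch below is framework-agnostic and uses hypothetical layer names; rounding to the nearest whole layer is an assumption. In PyTorch, the frozen part would typically be disabled by setting `requires_grad = False` on every parameter of those modules, so that only the trainable layers are updated.

```python
def split_for_finetuning(layers, frozen_fraction=0.8):
    """Split an ordered sequence of layers into (frozen, trainable) lists.

    The first `frozen_fraction` of layers (closest to the input) are frozen and
    the rest stay trainable. Rounding to the nearest whole layer is an assumption.
    """
    if not 0.0 <= frozen_fraction <= 1.0:
        raise ValueError("frozen_fraction must be in [0, 1]")
    n_frozen = round(frozen_fraction * len(layers))
    return list(layers[:n_frozen]), list(layers[n_frozen:])

# Hypothetical layer names for a 10-block convolutional base.
layers = [f"block{i}" for i in range(10)]
frozen, trainable = split_for_finetuning(layers)
print(frozen)     # block0 ... block7
print(trainable)  # ['block8', 'block9']
```

Freezing the early layers keeps the generic low-level filters learned on ImageNet, while the later, more task-specific layers adapt to MRI contrast.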

Analysis of MRI inter-layer associations

GCN is a DL model that handles graph-structured data effectively. By exploring the connections between global and local aspects of 3D data, it can perform tasks such as node classification and graph classification. In this study, GCN was used to investigate the hidden inter-layer correlation information within the MRI data and to extract more discriminative local features from the ROI. The model processed brain MRI data as a graph to analyze and evaluate the correlation and regularity of NHB image features as serum bilirubin levels change dynamically and continuously. Furthermore, a graph attention mechanism was introduced to select highly representative sub-regions from the entire ROI, improving prediction ACC for NHB classification.
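The paper defers its GCN layer definitions to the Supplementary material, so the sketch below substitutes the widely used Kipf and Welling propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), as an illustration of how patch nodes exchange information. The graph itself (5 patch nodes linked to their neighbours along the slice axis, 128-dimensional features) is hypothetical.

```python
import numpy as np

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2 with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: ReLU(A_norm @ H @ W)."""
    return np.maximum(normalized_adjacency(adj) @ features @ weight, 0.0)

# Hypothetical graph: 5 patch nodes chained along the slice axis.
rng = np.random.default_rng(0)
n_nodes, in_dim, out_dim = 5, 128, 16
adj = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
h = rng.standard_normal((n_nodes, in_dim))
w = rng.standard_normal((in_dim, out_dim)) * 0.1
out = gcn_layer(adj, h, w)
print(out.shape)  # (5, 16)
```

Each layer mixes a node's features with those of its neighbours, which is how inter-slice context enters the representation.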

Calculation of cosine similarity

The experimental dataset consisted of registered 3D voxel volumes, which were subdivided into small patches of equal size. Graph-structured feature vectors were obtained by preprocessing the extracted 3D patches. To ensure that the selected patches represent their categories effectively and differ clearly from the other category, they should simultaneously satisfy 2 principles: large inter-category differences and small intra-category differences. Candidate patches were therefore selected by assessing the differences between voxels of different categories and the similarities within the same category, measured with cosine similarity (37). Specifically, after preprocessing, each patch was converted into a 1×128-dimensional feature vector. Let S₁ and S₂ denote the 1-dimensional vectors obtained by flattening 2 patches; the cosine similarity between them is given by Eq. [1].

$\mathrm{Similarity}(S_{1}, S_{2}) = \frac{S_{1} \cdot S_{2}}{\lVert S_{1} \rVert \times \lVert S_{2} \rVert} = \frac{\sum_{i=1}^{128} x_{i} y_{i}}{\sqrt{\sum_{i=1}^{128} x_{i}^{2}} \times \sqrt{\sum_{i=1}^{128} y_{i}^{2}}}$ [1]

Here, x_i and y_i denote the i-th elements of the feature vectors S₁ and S₂, respectively. A larger cosine similarity indicates greater similarity between the 2 patches.
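Eq. [1] and the selection principle can be sketched together. The patch size, the conversion of each patch into a 128-dimensional vector, and the exact selection criterion are not specified in the text, so `extract_patches` uses non-overlapping patches and `selection_score` is a hypothetical score (mean intra-class similarity minus mean inter-class similarity) that rewards patch locations that are consistent within a class and distinct between classes.

```python
import numpy as np

def extract_patches(volume, patch_shape):
    """Split a (depth, height, width) volume into non-overlapping equal-size 3D patches."""
    d, h, w = volume.shape
    pd, ph, pw = patch_shape
    return [
        volume[z:z + pd, y:y + ph, x:x + pw]
        for z in range(0, d - pd + 1, pd)
        for y in range(0, h - ph + 1, ph)
        for x in range(0, w - pw + 1, pw)
    ]

def cosine_similarity(s1, s2):
    """Eq. [1]: dot product of the flattened vectors divided by the product of their norms."""
    a = np.ravel(s1).astype(np.float64)
    b = np.ravel(s2).astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def selection_score(feats_nhb, feats_ctrl):
    """Hypothetical score for one patch location: mean intra-class minus mean inter-class similarity.

    feats_*: (n_subjects, d) feature vectors of the same patch location, n >= 2 per class.
    """
    def unit(m):
        m = np.asarray(m, dtype=np.float64)
        return m / np.linalg.norm(m, axis=1, keepdims=True)

    def mean_off_diagonal(sim):
        n = sim.shape[0]
        return (sim.sum() - np.trace(sim)) / (n * (n - 1))

    a, b = unit(feats_nhb), unit(feats_ctrl)
    intra = 0.5 * (mean_off_diagonal(a @ a.T) + mean_off_diagonal(b @ b.T))
    inter = float((a @ b.T).mean())
    return intra - inter

patches = extract_patches(np.zeros((5, 96, 96)), (5, 32, 32))
print(len(patches))                                  # 9
print(cosine_similarity([1, 2, 3], [2, 4, 6]))       # parallel vectors, approximately 1.0
```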

Graph attention mechanism

Attention is arguably one of the most productive mechanisms in DL today, and it is a useful way to achieve robust performance when a network has many features (38). Building on the overall manifestations and patterns of continuous NHB changes in imaging described above, and to further strengthen imaging characterization, extract more informative key sub-regions from the ROI, and investigate local deep feature information, we developed a graph attention CNN by introducing the graph attention mechanism into GCN training. During training, each node was assigned an initial weight, denoted W₁, W₂, …, and W_n, and the output feature vectors of the nodes were denoted X₁, X₂, …, and X_n. The global feature vector X was obtained by multiplying each node's feature vector by its weight and averaging over the nodes, as shown in Eq. [2].

$X = \frac{1}{n} \sum_{i = 1}^{n} X_{i} \times W_{i}$ [2]

The attention mechanism was introduced into the GCN to identify patches with high representational power, and the top K most representative patches were selected as new samples for predicting NHB. Figure 3 shows the flowchart of the patch-based graph attention diagnosis model. This model further verifies how continuous changes in bilirubin levels affect brain MRI signals.

Figure 3 Flowchart of hyperbilirubinemia prediction by introducing the graph attention mechanism.
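Eq. [2] and the top-K patch selection can be sketched as follows. How the node weights W₁, …, W_n are produced is not described beyond their initialization, so this sketch assumes they are obtained by a softmax over unnormalised attention scores (for example, from a learned projection); the scores used below are random stand-ins.

```python
import numpy as np

def attention_pool(node_feats, scores, k):
    """Weight node features, pool them with Eq. [2], and return the top-k node indices.

    node_feats: (n, d) output feature vectors X_1..X_n of the GCN nodes (patches).
    scores:     (n,) unnormalised attention scores; turning them into weights
                W_1..W_n with a softmax is an assumption of this sketch.
    """
    x = np.asarray(node_feats, dtype=np.float64)
    s = np.asarray(scores, dtype=np.float64)
    w = np.exp(s - s.max())          # numerically stable softmax
    w /= w.sum()
    n = x.shape[0]
    global_feat = (w[:, None] * x).sum(axis=0) / n   # X = (1/n) * sum_i X_i * W_i
    top_k = np.argsort(-w, kind="stable")[:k]        # most heavily weighted patches
    return global_feat, w, top_k

rng = np.random.default_rng(0)
feats = rng.standard_normal((10, 128))   # 10 hypothetical patch nodes
scores = rng.standard_normal(10)          # stand-in for learned attention scores
g, w, top = attention_pool(feats, scores, k=3)
print(g.shape, top)
```

The indices in `top_k` identify the patches that would be kept as new samples for the downstream NHB classifier.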

Performance evaluation and visualization

Evaluation metrics

Brain MRI images differ between newborns without and with hyperbilirubinemia, as well as among patients with different levels of hyperbilirubinemia. To explore whether the MRI signal in brain tissue varies significantly with dynamic changes in bilirubin levels and to identify the sites of bilirubin deposition in the brain, we systematically divided bilirubin values into continuous intervals, compared newborns without and with hyperbilirubinemia as well as patients with different levels of hyperbilirubinemia, and investigated alterations in T1WI appearance with varying bilirubin levels. This allowed a detailed study of the sites of bilirubin deposition in the brain tissue of neonates with hyperbilirubinemia and localization of the regions whose signal varies substantially with dynamic changes in bilirubin levels.

The Inception-v3 network was trained with the Adam optimizer (learning rate 2×10⁻⁵) for 200 epochs, with a regularization parameter of 0.012 and a batch size of 64. The GCN was trained with the Adam optimizer (learning rate 1×10⁻⁵) for 200 epochs, with a regularization parameter of 0.025 and a batch size of 32. Model performance on the testing set was evaluated using several metrics, including the area under the curve (AUC), ACC, sensitivity (SEN), and specificity (SPE).
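The four reported metrics can be computed from labels and predicted scores as in this sketch. The 0.5 decision threshold is an assumption, and the AUC is computed through its rank interpretation (the probability that a random NHB case scores higher than a random non-NHB case, with ties counting one half) rather than by tracing the ROC curve.

```python
import numpy as np

def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=np.float64)
    diff = y_score[y_true][:, None] - y_score[~y_true][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def binary_metrics(y_true, y_score, threshold=0.5):
    """ACC, SEN (recall on NHB), SPE (recall on non-NHB) at a fixed threshold, plus AUC."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(y_score, dtype=np.float64) >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return {
        "ACC": (tp + tn) / len(y_true),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "AUC": roc_auc(y_true, y_score),
    }

print(binary_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
# {'ACC': 0.75, 'SEN': 0.5, 'SPE': 1.0, 'AUC': 0.75}
```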

Model visualization

We used the Grad-CAM technique to analyze and evaluate the specific appearance and continuous change of NHB by highlighting the most important regions of the input image. Grad-CAM was also used to visualize how the models classify NHB MRI images into normal and abnormal classes. The feature maps from the model's final convolutional layer were used to create an activation heatmap, which was superimposed on the MRI ROI to highlight the visual patterns driving the class prediction. This helps interpret the predictions of the ‘black box’ DL models. Figure 4 illustrates the process used to evaluate and visualize the imaging appearance of hyperbilirubinemia.

Figure 4 Flowchart of imaging evaluation by using Grad-CAM in hyperbilirubinemia. Grad-CAM, gradient-weighted class activation mapping; CNN, convolutional neural network; FC, fully connected layer; ReLU, rectified linear unit; GAP, global average pooling.
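The superimposition step can be sketched as resizing the coarse heatmap to the ROI and blending the two. This sketch assumes nearest-neighbour resizing and a single-channel linear blend with an illustrative weight `alpha`; published Grad-CAM figures usually use bilinear resizing and a colour map instead.

```python
import numpy as np

def upsample_nearest(cam, out_h, out_w):
    """Nearest-neighbour resize of a coarse (h, w) CAM to (out_h, out_w)."""
    h, w = cam.shape
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return cam[rows[:, None], cols[None, :]]

def overlay_heatmap(roi, cam, alpha=0.4, eps=1e-8):
    """Blend a [0, 1]-scaled heatmap onto a [0, 1]-scaled grayscale ROI."""
    heat = upsample_nearest(np.asarray(cam, dtype=np.float64), *roi.shape)
    heat = (heat - heat.min()) / (heat.max() - heat.min() + eps)
    base = (roi - roi.min()) / (roi.max() - roi.min() + eps)
    return (1.0 - alpha) * base + alpha * heat

roi = np.random.default_rng(0).random((128, 128))   # hypothetical 128 x 128 ROI
cam = np.random.default_rng(1).random((4, 4))        # hypothetical 4 x 4 Grad-CAM map
blended = overlay_heatmap(roi, cam)
print(blended.shape)
```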

Grad-CAM is implemented as follows. First, the input image undergoes feature extraction through the convolutional layers, and forward propagation yields both the feature layer A (typically the output of the last convolutional layer) and the network's pre-activation output (the value before the softmax layer). To determine which ROIs the network attends to, let $y^{c}$ denote the network's final prediction for the input image; backpropagating $y^{c}$ gives its gradient with respect to feature layer A. From these gradients, a weight is computed for each channel of A, and the weighted sum of the channels is passed through an activation function to generate the Grad-CAM map, as given by Eq. [3].

$L_{\mathrm{Grad\text{-}CAM}} = \mathrm{ReLU}\left(\sum_{k} a_{k}^{c} A^{k}\right)$ [3]

Here, k denotes the k-th channel of feature layer A, A^k denotes the data in this channel, c denotes the predicted category, and $a_{k}^{c}$ denotes the weight of A^k for category c, computed as in Eq. [4].

$a_{k}^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A_{ij}^{k}}$ [4]

Here, $y^{c}$ denotes the network's predicted value for the input, $A_{ij}^{k}$ denotes the data of feature layer $A$ at position $(i, j)$ in channel $k$, and $Z$ denotes the product of the width and height of the feature layer.
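Eqs. [3] and [4] reduce to a few array operations once A and its gradient are available. This sketch takes both as NumPy arrays of shape (K, H, W); in a PyTorch model they would typically be captured with a forward hook on the last convolutional layer (`register_forward_hook`) and `torch.autograd.grad`, which is not shown here.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM map from one image's feature layer A and the gradient dy^c/dA.

    activations, gradients: arrays of shape (K, H, W) holding A^k and
    dy^c/dA^k_ij for every channel k.
    """
    A = np.asarray(activations, dtype=np.float64)
    G = np.asarray(gradients, dtype=np.float64)
    Z = A.shape[1] * A.shape[2]                 # width x height of the feature layer
    a_c = G.sum(axis=(1, 2)) / Z                 # Eq. [4]: one weight per channel
    cam = np.tensordot(a_c, A, axes=(0, 0))      # sum_k a_k^c A^k -> (H, W)
    return np.maximum(cam, 0.0), a_c             # Eq. [3]: ReLU keeps positive evidence

rng = np.random.default_rng(0)
cam, weights = grad_cam(rng.random((8, 7, 7)), rng.standard_normal((8, 7, 7)))
print(cam.shape, weights.shape)  # (7, 7) (8,)
```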

Implementation

All models were trained with identical random initialization and the same training/validation split, and were built on the PyTorch framework (v1.3.1) in Python (v3.6) (https://www.python.org). All experiments were run in PyCharm (https://www.jetbrains.com/zh-cn/pycharm/). The experimental server ran Ubuntu (v16.04; Canonical, London, UK) with an 8-core Intel processor (i7-9700K) and 32 GB RAM. To speed up training, a graphics processing unit (GPU; RTX 2080 Ti, 11 GB memory) was used for acceleration under CUDA (v10.1).

Statistical analysis

Statistical analysis was performed with SPSS 26.0 (IBM Corp., Armonk, NY, USA). Clinical characteristics were evaluated for association with NHB. Fisher’s exact test was used to compare the gender distribution between the NHB group and the non-NHB group. Student’s t-test was used to compare age, birth weight, and gestational age between the 2 groups; these variables are presented as the mean ± standard deviation (SD). A P value of <0.05 was considered statistically significant.
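As an illustration of the two-sided Fisher’s exact test named above, the pure-Python sketch below conditions on the table margins, treats the top-left cell as hypergeometric, and sums the probabilities of all tables that are no more likely than the observed one (the same two-sided convention as SciPy’s `scipy.stats.fisher_exact`). It is a didactic sketch rather than the SPSS procedure the authors used, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the P value is the total probability of every table that is
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given the margins
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(p, 1.0)

# Classic "lady tasting tea" table: 3 of 4 cups of each kind identified correctly
print(fisher_exact_two_sided(3, 1, 1, 3))  # 34/70, about 0.4857
```

Student’s t-test for the continuous variables is available in standard packages (for example, `scipy.stats.ttest_ind`) and is not reimplemented here.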

To comprehensively evaluate the proposed models for analyzing and predicting NHB from MRI image features, we computed the ACC, SEN, and SPE, as well as the AUC for each class. We performed 5-fold cross-validation, with all results averaged over the test sets. Patient-level metrics were obtained by majority voting over the predictions for all slices of each patient.
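Patient-level evaluation by majority voting can be sketched as follows. Each slice prediction is a hypothetical `(patient_id, label)` pair with 1 for NHB and 0 for non-NHB; the slice labels of each patient are pooled by majority vote, and ACC, SEN, and SPE are computed from the resulting patient-level confusion counts. The text does not state how ties were broken, so the tie rule below (ties count as NHB) is an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def patient_majority_vote(slice_preds: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Aggregate per-slice labels (1 = NHB, 0 = non-NHB) into one label per patient."""
    votes = defaultdict(Counter)
    for patient_id, label in slice_preds:
        votes[patient_id][label] += 1
    # Ties are broken toward the positive class (an assumption; the text does not say)
    return {pid: int(c[1] >= c[0]) for pid, c in votes.items()}

def acc_sen_spe(y_true: Dict[str, int], y_pred: Dict[str, int]) -> Tuple[float, float, float]:
    """Patient-level accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp = sum(1 for p in y_true if y_true[p] == 1 and y_pred[p] == 1)
    tn = sum(1 for p in y_true if y_true[p] == 0 and y_pred[p] == 0)
    fp = sum(1 for p in y_true if y_true[p] == 0 and y_pred[p] == 1)
    fn = sum(1 for p in y_true if y_true[p] == 1 and y_pred[p] == 0)
    acc = (tp + tn) / len(y_true)
    sen = tp / (tp + fn) if tp + fn else float("nan")
    spe = tn / (tn + fp) if tn + fp else float("nan")
    return acc, sen, spe

# Hypothetical slice-level predictions for 4 patients
slice_preds = [("A", 1), ("A", 1), ("A", 0),
               ("B", 0), ("B", 0), ("B", 1),
               ("C", 1), ("C", 0), ("C", 0),
               ("D", 0), ("D", 0), ("D", 0)]
truth = {"A": 1, "B": 0, "C": 1, "D": 0}
pred = patient_majority_vote(slice_preds)
print(pred)                      # {'A': 1, 'B': 0, 'C': 0, 'D': 0}
print(acc_sen_spe(truth, pred))  # (0.75, 0.5, 1.0)
```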

Results

Clinical characteristics

A total of 273 patients were included in the NHB group, and 333 patients were included in the non-NHB group. The differences in baseline clinical characteristics between the 2 groups are shown in Table 3. There were no significant differences observed in terms of gender, mean age, birth weight, and gestational age distribution (P>0.05).

Table 3

Clinical characteristics of 606 enrolled patients

Characteristics	Non-NHB group (n=333)	NHB group (n=273)	P value
Gender (male/female)	162/171	139/134	0.460
Age (days)	12.83±5.05	10.12±4.13	0.320
Birth weight (kg)	3.56±0.62	3.41±0.57	0.162
Gestational age (weeks)	38.38	38.47	0.792

Continuous data with a normal distribution are presented as mean ± SD. NHB, neonatal hyperbilirubinemia; non-NHB, not neonatal hyperbilirubinemia; SD, standard deviation.

Results of multi-scale ROIs

In the experiment, ROIs of 96×96, 128×128, 160×160, 192×192, and 256×256 were defined. The 5 groups of samples extracted at these sizes were each divided into a training set and a test set at a 7:3 ratio, and 5 independent experiments were conducted. The performance metrics on the test sets were compared across groups to determine the ROI size that yielded the highest metric values, and this ROI size was used in subsequent experiments.
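The text does not specify how the multi-scale ROIs were positioned, so the sketch below assumes square crops of each size sharing one centre point (for example, a point placed over the globus pallidus), with zero padding when a crop extends past the image border. The helper names `crop_roi` and `multiscale_rois`, the slice size, and the centre coordinates are hypothetical.

```python
import numpy as np

ROI_SIZES = (96, 128, 160, 192, 256)  # square ROI sizes compared in Table 4

def crop_roi(image: np.ndarray, center: tuple, size: int) -> np.ndarray:
    """Crop a size x size square centred on `center` (row, col), zero-padding at borders."""
    half = size // 2
    padded = np.pad(image, half, mode="constant")  # border padding keeps every crop full-size
    r, c = center[0] + half, center[1] + half       # shift the centre into padded coordinates
    return padded[r - half:r - half + size, c - half:c - half + size]

def multiscale_rois(image: np.ndarray, center: tuple) -> dict:
    """Return one crop per ROI size, all sharing the same centre."""
    return {s: crop_roi(image, center, s) for s in ROI_SIZES}

# Hypothetical 256x256 T1WI slice with an ROI centre near the image middle
slice_ = np.random.default_rng(0).random((256, 256))
rois = multiscale_rois(slice_, center=(120, 130))
print({s: roi.shape for s, roi in rois.items()})
```

Keeping the centre fixed while only the size varies means the comparison in Table 4 isolates the effect of how much surrounding tissue each ROI includes.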

The evaluation metrics included the receiver operating characteristic (ROC) curve, AUC, ACC, SEN, and SPE, among others. ACC, SEN, SPE, and AUC range from 0 to 1, with higher values indicating better performance. The results for the 5 ROI sizes on the test data are presented in Table 4: the 128×128 ROI achieved the highest AUC, ACC, and SPE and was therefore selected as the optimal ROI size.
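The AUC reported in Tables 4 to 7 equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counting one half. The quadratic-time sketch below computes it directly from that definition; the labels and scores are hypothetical, and a library routine such as scikit-learn’s `roc_auc_score` would normally be used instead.

```python
from typing import Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Equivalent to the area under the empirical ROC curve; ties count as 1/2.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Hypothetical patient-level NHB probabilities
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
print(roc_auc(y, p))  # 8/9, about 0.889
```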

Table 4

The results of multi-scale regions of interest based on Inception-v3

Region size	AUC	ACC	SEN	SPE
96×96	0.72	0.70	0.62	0.70
128×128	0.74*	0.74*	0.64	0.87*
160×160	0.69	0.71	0.71*	0.67
192×192	0.65	0.72	0.57	0.75
256×256	0.64	0.72	0.63	0.85

*, the best results of the metrics. Inception-v3, Inception-v3 convolutional neural network; AUC, area under the curve; ACC, accuracy; SEN, sensitivity; SPE, specificity.

Analysis and prediction results for DL models based on multi-threshold groups

Starting from a baseline bilirubin level of 200 µmol/L, bilirubin levels were divided into consecutive 50-µmol/L intervals. The 2D Inception-v3 network was first used to evaluate the imaging trends and image feature changes in the T1WI sequence across these bilirubin intervals. The results are presented in Table 5. As the bilirubin level increased, the T1WI signal became stronger, and the AUC, ACC, and SEN increased steadily, whereas the SPE remained roughly stable (0.73–0.75). Notably, the AUC rose markedly above 400 µmol/L (from 0.62 in the 350–399 µmol/L interval to 0.72), indicating that the model performed best at this threshold.
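The interval assignment used in Tables 5 to 7 can be expressed as a small function. It assumes that each interval is half-open (for example, 200 ≤ value < 250 for the 200–249 group) and that a value of exactly 400 µmol/L belongs to the top group; the tables label that group ">400", so this boundary handling is an assumption. Values below the 200 µmol/L baseline are left unassigned, and the constant and function names are hypothetical.

```python
from typing import Optional

BASELINE = 200  # µmol/L, lower bound of the first interval
STEP = 50       # µmol/L, interval width
TOP = 400       # µmol/L, values at or above this go to the open-ended top group (assumption)

def bilirubin_interval(value: float) -> Optional[str]:
    """Map a serum bilirubin value (µmol/L) to the interval labels used in Tables 5-7."""
    if value < BASELINE:
        return None  # below the 200 µmol/L baseline: not assigned to any interval
    if value >= TOP:
        return ">400"
    lower = BASELINE + STEP * int((value - BASELINE) // STEP)
    return f"{lower}–{lower + STEP - 1}"

for v in (180, 200, 249.5, 275, 399, 400, 512):
    print(v, bilirubin_interval(v))
```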

Table 5

The performance of the system utilizing the 2D Inception-v3 network for NHB assessment

Bilirubin value (μmol/L)	AUC	ACC	SEN	SPE
200–249	0.53	0.65	0.48	0.74
250–299	0.58	0.68	0.53	0.73
300–349	0.59	0.69	0.56	0.75*
350–399	0.62	0.70	0.60	0.74
>400	0.72*	0.76*	0.66*	0.75*
Mean	0.61	0.70	0.57	0.74

*, the best results of the metrics. 2D, 2-dimensional; Inception-v3, Inception-v3 convolutional neural network; NHB, neonatal hyperbilirubinemia; AUC, area under the curve; ACC, accuracy; SEN, sensitivity; SPE, specificity.

Next, we applied the 3D GCN network to analyze and predict NHB using the same data partitioning: the samples, constructed as 3D patch graph-structured data, were split into training and test sets at a 7:3 ratio. To verify the correlation and regularity between the multi-threshold, consecutive bilirubin groups and the corresponding MRI image features, and to confirm that the model performs best when bilirubin values exceed 400 µmol/L, the GCN used 3D patches with continuous bilirubin values as samples. As before, the test set started at 200 µmol/L and was divided into 50-µmol/L intervals. By evaluating the relationship between bilirubin values and model performance, we identified the bilirubin level at which the model’s predictive capability was highest. Table 6 shows the GCN’s prediction performance across these intervals.
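The graph construction and GCN architecture are not detailed in this section, so the following NumPy sketch only illustrates the general idea under explicit assumptions: each sample is represented by 25 flattened 3D patches (the patch count given in the Discussion), each node is linked to its 4 most cosine-similar patches, and one graph convolution of the widely used normalized form $\mathrm{ReLU}(\hat{D}^{-1/2}\hat{M}\hat{D}^{-1/2}HW)$ is applied, where $\hat{M}$ is the adjacency matrix with self-loops and $\hat{D}$ its degree matrix, followed by mean pooling into one graph-level vector. The patch size, neighbour count, random weights, and helper names are hypothetical and do not reproduce the authors’ model.

```python
import numpy as np

def cosine_knn_adjacency(x: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 adjacency linking each node to its k most cosine-similar nodes."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)           # exclude self-matches from the neighbour search
    nbrs = np.argsort(-sim, axis=1)[:, :k]   # indices of the k most similar nodes
    adj = np.zeros_like(sim)
    rows = np.repeat(np.arange(len(x)), k)
    adj[rows, nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)            # make the graph undirected

def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(D^-1/2 (M + I) D^-1/2 H W)."""
    m_hat = adj + np.eye(len(adj))           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(m_hat.sum(axis=1))
    norm = m_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ h @ w, 0.0)

rng = np.random.default_rng(0)
patches = rng.random((25, 8 * 8 * 8))        # 25 flattened 8x8x8 patches (hypothetical size)
adj = cosine_knn_adjacency(patches, k=4)
hidden = gcn_layer(adj, patches, rng.normal(size=(patches.shape[1], 16)))
graph_embedding = hidden.mean(axis=0)        # mean pooling: one vector per 3D sample
print(adj.shape, hidden.shape, graph_embedding.shape)
```

Because each node aggregates features from patches taken across neighbouring slices, this kind of layer can capture the inter-layer relationships that a 2D slice classifier ignores.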

Table 6

The performance of the system utilizing the 3D GCN network for NHB assessment

Bilirubin value (μmol/L)	AUC	ACC	SEN	SPE
200–249	0.72	0.70	0.63	0.78
250–299	0.71	0.73	0.65	0.74
300–349	0.76	0.73	0.72	0.78
350–399	0.78	0.75	0.75	0.81
>400	0.81*	0.77*	0.79*	0.83*
Mean	0.76	0.74	0.71	0.79

*, the best results of the metrics. 3D, 3-dimensional; GCN, graph convolutional network; NHB, neonatal hyperbilirubinemia; AUC, area under the curve; ACC, accuracy; SEN, sensitivity; SPE, specificity.

Compared with the Inception-v3 model, the GCN model achieved a maximum AUC of 0.81 and a maximum ACC of 0.77, versus 0.72 and 0.76, respectively, for Inception-v3. Both models outperformed visual inspection of MRI by experienced radiologists alone, which yielded an AUC of 0.62, ACC of 0.64, SEN of 0.61, and SPE of 0.62. Therefore, the 3D GCN model performed better in analyzing and predicting NHB. These results also show that the performance metrics generally increased with the bilirubin value, most prominently for samples with bilirubin values exceeding 400 µmol/L, where both models achieved their best discriminative performance.

Performance analysis of graph attention mechanism

To further investigate the MRI image features of NHB and improve the prediction ACC of the DL models, we introduced a graph attention mechanism. We selected a subset of representative patches from the original image, whose feature vectors served to represent the original sample. The attention mechanism of the GCN then determined the weight of each node. By calculating the cosine similarity between the feature vector of each patch type and the global feature vector, we identified the 5 patch types most similar to the global feature vector, and combining these patches produced a new feature vector for the sample. In a supervised setting, the class labels and these feature vectors were fed into a logistic regression model for classification. The data were again split into training and test sets at a 7:3 ratio. The results are presented in Table 7. Details of patch selection are given in the Supplementary material (Appendix 1).
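The cosine-similarity selection step can be sketched as follows. The sketch assumes each candidate patch type already has a feature vector and, as a stand-in for the global feature vector whose exact definition is not given here, uses the mean of the patch features; the top 5 patch types are then concatenated into the new sample vector. Helper names and dimensions are hypothetical. The resulting vectors could then be passed, with their class labels, to a logistic regression classifier such as scikit-learn’s `LogisticRegression`.

```python
import numpy as np

def top_k_patch_types(patch_feats: np.ndarray, global_feat: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k patch types whose features are most cosine-similar to the global feature."""
    sims = patch_feats @ global_feat / (
        np.linalg.norm(patch_feats, axis=1) * np.linalg.norm(global_feat))
    return np.argsort(-sims)[:k]  # descending similarity

def sample_vector(patch_feats: np.ndarray, global_feat: np.ndarray, k: int = 5) -> np.ndarray:
    """Concatenate the selected patch features into one vector for a downstream classifier."""
    idx = top_k_patch_types(patch_feats, global_feat, k)
    return np.concatenate([patch_feats[i] for i in idx])

rng = np.random.default_rng(1)
feats = rng.normal(size=(25, 16))  # 25 candidate patch types, 16-dim features (hypothetical)
g = feats.mean(axis=0)             # global feature vector, assumed here to be the mean
print(top_k_patch_types(feats, g), sample_vector(feats, g).shape)  # 5 indices, then (80,)
```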

Table 7

The performance of the system utilizing the GCN network with the introduced graph attention mechanism for NHB assessment

Bilirubin value (μmol/L)	AUC	ACC	SEN	SPE
200–249	0.73	0.76	0.72	0.72
250–299	0.78	0.73	0.73	0.76
300–349	0.82	0.79	0.79	0.77
350–399	0.74	0.75	0.70	0.74
>400	0.86*	0.81*	0.83*	0.86*
Mean	0.79	0.77	0.75	0.77

*, the best results of the metrics. GCN, graph convolutional network; NHB, neonatal hyperbilirubinemia; AUC, area under the curve; ACC, accuracy; SEN, sensitivity; SPE, specificity.

These results suggest that the graph attention mechanism facilitated the extraction of highly representative patches and improved the model’s ACC in predicting NHB (mean ACC, 0.77 vs. 0.74 without the attention mechanism; Tables 6 and 7). The radiological relevance of the selected top 5 patch types can be further examined by marking them on the original image, as shown in Figure 5.

Figure 5 Visualization results of the top 5 types of patches. The red boxes represent the top 5 representative patches.

These 5 patch types were concentrated mainly in the globus pallidus region of the image, consistent with the Grad-CAM visualization results. This suggests that the graph attention mechanism in our GCN enables more precise identification of bilirubin deposition areas in brain MRI.

Visualization and interpretability of DL models

Grad-CAM was employed to investigate the specific manifestations of NHB on brain MRI and to determine the extent of changes in the corresponding brain regions as the bilirubin level steadily increased. Grad-CAM identifies the regions the model focuses on and provides a visual explanation of its predictions. The heatmap intensity at each location reflects how strongly the model relies on that location when making its decision; for instance, warmer regions (red or near red) contribute most to the class prediction. Figure 6 shows 3 brain MRI input images with their corresponding Grad-CAM images. The highlighted regions indicate that the DL models recognized these regions as lesion tissue, suggesting that the models can distinguish normal from lesion tissue.
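Displaying a Grad-CAM map over an MRI slice, as in Figure 6, typically involves normalizing the map to [0, 1], resizing it to the image size, and, if a box is wanted, thresholding the warm region. The sketch below does this with nearest-neighbour upsampling and a 0.5 threshold; both choices, the map and image sizes, and the helper names are assumptions for illustration, not the authors’ visualization pipeline.

```python
import numpy as np

def heatmap_for_overlay(cam: np.ndarray, image_shape: tuple) -> np.ndarray:
    """Min-max normalise a Grad-CAM map to [0, 1] and upsample it to the image size.

    Uses nearest-neighbour upsampling via np.kron, so the image size must be an
    integer multiple of the map size (a simplification; bilinear resizing is more common).
    """
    span = cam.max() - cam.min()
    norm = (cam - cam.min()) / span if span > 0 else np.zeros_like(cam, dtype=float)
    fy, fx = image_shape[0] // cam.shape[0], image_shape[1] // cam.shape[1]
    return np.kron(norm, np.ones((fy, fx)))

def warm_region_box(heatmap: np.ndarray, threshold: float = 0.5):
    """Bounding box (row0, col0, row1, col1) of pixels at or above `threshold`, or None."""
    rows, cols = np.nonzero(heatmap >= threshold)
    if rows.size == 0:
        return None
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())

cam = np.zeros((8, 8))
cam[3:5, 4:6] = 1.0                       # hypothetical hot spot in the Grad-CAM map
heat = heatmap_for_overlay(cam, (128, 128))
print(heat.shape, warm_region_box(heat))  # (128, 128) (48, 64, 79, 95)
```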

Figure 6 Original images and corresponding Grad-CAM images based on Inception-v3. Grad-CAM, gradient-weighted class activation mapping; Inception-v3, Inception-v3 convolutional neural network.

As bilirubin levels rose, specific regions in the MRI images changed. These changes reflected bilirubin deposition in the brain associated with NHB and related health conditions. By analyzing these changes in MRI images, we could identify and track key features indicating the severity of hyperbilirubinemia. A sample of 5 slices was selected from different bilirubin intervals, and the changed areas within these slices are shown in Figure 7.

Figure 7 Schematic diagram of lesion area in brain MRI with various bilirubin intervals. The red boxes represent bilirubin deposition lesion area. MRI, magnetic resonance imaging.

By examining 5 slices with different bilirubin levels together with relevant clinical information, we summarized the characteristic features and changes of NHB on MRI. Specifically, as the bilirubin level increased, the T1WI signal in the globus pallidus (the red box in the image) showed overall enhancement, whereas the signal intensity in the rest of the image remained essentially unchanged. The images were evaluated by 2 radiologists from the cooperating hospital, each with over 8 years of experience in MR image interpretation, who reached a consistent conclusion. The boxed area coincided with the region highlighted by the model in the Grad-CAM, indicating that the model can identify the specific imaging manifestations of NHB and localize the ROI. These findings support the feasibility and efficacy of our DL-based approach in assisting the clinical diagnosis of NHB.

Comparison with other advanced models

On the T1-weighted MRI dataset, we compared the performance of several advanced models for analyzing and predicting NHB. Our GCN achieved the best results in identifying and predicting this disease. Figure 8 shows the prediction ACC of each model for NHB, and Figure 9 shows the corresponding ACC curves.

Figure 8 Comparison in prediction accuracy among various models for NHB. ACC, accuracy; SVM, support vector machine; CNN, convolutional neural network; Inception-v3, Inception-v3 convolutional neural network; GCN, graph convolutional network; ResNet, residual neural network; our GCN, 3D patch-based graph attention convolutional network; 3D, 3-dimensional; NHB, neonatal hyperbilirubinemia.

Figure 9 Accuracy curves of various models for prediction of NHB. SVM, support vector machine; CNN, convolutional neural network; Inception-v3, Inception-v3 convolutional neural network; ResNet, residual neural network; GCN, graph convolutional network; our GCN, 3D patch-based graph attention convolutional network; 3D, 3-dimensional; NHB, neonatal hyperbilirubinemia.

Discussion

Severe hyperbilirubinemia poses a significant health risk to infants: without timely clinical intervention, the risk of severe and irreversible brain damage increases substantially. MRI is of great value in differentiating NHB, and MRI-based identification of NHB has gradually gained attention in clinical practice (21-24). MRI studies of NHB show that, as serum bilirubin levels rise dynamically and continuously, increased signal intensity of the globus pallidus on the T1WI sequence serves as a crucial indicator of neonatal bilirubin encephalopathy. Exploring MRI image features in relation to dynamic changes in bilirubin levels and detecting NHB early could aid in developing an appropriate treatment plan, which may lead to a higher cure rate. DL technology can help radiologists analyze and predict NHB effectively. In our study, DL methods were used to analyze and evaluate brain MRI images of neonates with hyperbilirubinemia. We analyzed the correlation and regularity between continuous, dynamic changes in multi-threshold bilirubin levels and the corresponding MRI image features in diagnosing NHB. We found that the trends in MRI image features of NHB were consistent with the corresponding continuous, dynamic changes in serum bilirubin levels, and that there was a positive correlation between the 2 when the serum bilirubin value exceeded 400 µmol/L. We demonstrated that DL models can effectively and efficiently analyze the dynamic changes and trends of elevated bilirubin levels on MRI. Moreover, these models achieved good performance in classifying brain tissue into normal and hyperbilirubinemia classes.
Our results indicate that MRI image features could serve as an important auxiliary method for evaluating changes in bilirubin deposition in NHB and could provide clinicians with valuable information for diagnostic and therapeutic decision-making. DL models may also improve clinicians’ diagnostic efficiency and provide a new diagnostic strategy for NHB.

Numerous studies have consistently demonstrated that MRI is highly reliable for monitoring bilirubin deposition and evaluating the severity of brain tissue lesions (6,8,13). Clinicians’ assessment of brain tissue lesions relies heavily on MRI. However, this assessment is empirical, lacks quantitative metrics, and is somewhat subjective, and no consensus has yet been reached in the field, which may affect clinicians’ treatment decisions. Previous studies have suggested that changes in brain MRI image features are associated with serum bilirubin levels, but all of them were based on a single bilirubin threshold and did not investigate the impact of continuous fluctuations in serum total bilirubin levels on MRI signals (9-12). In clinical practice, although abnormal signals can be detected on brain MRI of infants with hyperbilirubinemia, these findings are rarely analyzed in depth from an imaging standpoint, and the regional signal differences arising from fluctuations in bilirubin levels have not been sufficiently analyzed or demonstrated. In our study, we monitored dynamic, continuous changes in bilirubin levels together with the corresponding brain MRI image features and applied multi-threshold DL analysis to these changes. Our findings confirmed the correlation between dynamic changes in bilirubin levels and alterations in brain MRI image features. We extracted ROIs at multiple scales and used each as input to the Inception-v3 network; with the model’s prediction metrics as evaluation criteria, the optimal ROI size was 128×128. We then defined multiple test intervals by incrementing the bilirubin level in steps of 50 µmol/L.
Through a comparative analysis of the model’s prediction performance across these test intervals, we visually demonstrated the correlation and regularity of brain MRI image features with serum bilirubin levels in NHB and confirmed the development trend. Performance was assessed with the AUC, ACC, SEN, and SPE, with higher values indicating greater consistency between bilirubin levels and the corresponding MRI image features. Table 5 shows that as the bilirubin level increased, the T1WI signal became stronger and the AUC, ACC, and SEN increased steadily. This finding is consistent with previous studies and confirms that dynamic changes in bilirubin levels are consistent with changes in brain MRI image features (6,8,9,11,12). Notably, the AUC rose markedly beyond 400 µmol/L, where the model performed best, with an AUC of 0.72 and an ACC of 0.76.

Our work also placed emphasis on model prediction and classification. With the rapid development of DL technology, DL models have been used to improve the ACC of medical imaging detection and reduce false positives (14-17). DL algorithms can overcome disadvantages of traditional ML algorithms, such as manual segmentation measurement errors and physician subjectivity (32). In this study, our multi-scale approach automatically extracted ROIs that covered the area surrounding the globus pallidus, providing more comprehensive imaging features. The Inception-v3-based prediction model achieved an AUC of 0.72, ACC of 0.76, SEN of 0.66, and SPE of 0.75 (Table 5), demonstrating its effectiveness in diagnosing NHB. However, its ACC was lower than that reported in a previous study (23). We attribute this to 2 factors. First, compared with traditional manually delineated ROIs, including image information around the globus pallidus may introduce interference for the model. Second, the 2D model does not account for inter-layer correlation between MRI slices: it ignores the implicit connections among features of different slices, whereas in a 3D patch-based model these features can influence each other. To address this, we used a GCN model based on 3D patches. Meanwhile, to further explore the relationship between global and local information in the MRI sequence, we used the Grad-CAM technique to identify the ROIs. Following the principles of discriminating between categories and capturing similarities within the same category, small patches were extracted from the 3D voxel data of the ROI by calculating cosine similarity. A total of 25 representative patches were selected as input for the GCN model.
We examined the impact of different patch sizes and numbers on the classification results. This model captured the inter-layer correlation within the MRI sequence and extracted deep feature information. As a result, the GCN model achieved an AUC of 0.81, ACC of 0.77, SEN of 0.79, and SPE of 0.83 (Table 6).

Moreover, we introduced the graph attention mechanism into the GCN to further improve the model’s predictive precision and classification ACC. The 5 most representative patches were identified from the 25 patches and then used to predict and classify NHB. This increased the AUC to 0.86, ACC to 0.81, SEN to 0.83, and SPE to 0.86 (Table 7), a clear improvement in prediction and classification performance. These metrics were higher than those reported in previous studies, and the method eliminates the tedious manual delineation of ROIs and avoids the interference of human subjectivity. Finally, comparison with other advanced models, such as decision tree, SVM, CNN, and Inception-v3 (Figures 8 and 9), showed that our method outperforms most of these models and offers clinicians a non-invasive diagnostic and treatment strategy.

Furthermore, the connection between key sub-regions affected by NHB and continuous changes in bilirubin levels, as well as DL-based visualization of signals in lesioned brain regions, had not previously been explored. We therefore used Grad-CAM to localize the sub-ROIs within the MR slices (Figures 6 and 7). Combined with the radiologists’ evaluation of the images, this revealed enhanced signal in the basal ganglia and globus pallidus on brain MRI as the bilirubin value continued to rise. These findings provide evidence concerning the specific radiological manifestations and patterns of change associated with NHB.

Overall, the end-to-end DL-based NHB diagnostic models in our study performed well in analysis and prediction. These findings indicate that our method outperforms radiologists’ visual assessment and traditional ML methods in analyzing and classifying NHB. In addition, computer-aided decision support systems have the potential to reduce interobserver variability and to address problems related to inexperienced users and biased expert judgments in NHB diagnosis.

This study had several limitations. First, the dataset was sourced from a single cooperating hospital and was relatively small. DL models rely on large datasets, and incorporating multi-center, multi-style image data could substantially enhance their performance and robustness. Second, only T1WI brain MRI sequences were used, and other MRI sequences were excluded; future studies should include multiple MRI sequences to make fuller use of the available information. Third, from a medical imaging standpoint, although this study identified the specific radiological manifestations and observed the changing trends of NHB, it did not address histopathology and presented only the results of image-based analysis. In future research, these conclusions can be elaborated upon from a multi-omics perspective.

Conclusions

Our study confirmed that dynamic changes in bilirubin levels are consistent with changes in brain MR image features. We used multi-threshold DL models to analyze the trend of continuous, dynamic changes in bilirubin levels and to classify NHB cases into normal and lesion classes from MRI image features, achieving satisfactory performance. We introduced a graph attention mechanism that captured deep MRI features and inter-layer associations not exploited by traditional ML techniques, thereby improving the model’s prediction performance and classification accuracy. Our study shows that DL-based systems can automatically detect and classify NHB on MRI, and that our GCN achieves high SEN and ACC when the bilirubin value exceeds 400 µmol/L. These results illustrate the potential of this technique in clinical settings. The findings may provide a reliable reference for clinical decision-making and a scientific basis for the early diagnosis of NHB.

Acknowledgments

Funding: This work was supported by the Inner Mongolia Natural Science Foundation of China (grant No. 2023LHMS06001), the Inner Mongolia Scientific of Higher Education Institution Foundation of China (grant No. NJZY23092), the Inner Mongolia Health Science and Technology Foundation of China (grant No. 202201399), and the Baotou Health Science and Technology Foundation of China (grant No. wsjkkj2022119).

Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-1050/rc

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1050/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This retrospective study was conducted in accordance with the Declaration of Helsinki (as revised in 2013) and was approved by the Research Ethics Committee of the Baotou Medical College (No. 2022-17). Due to the retrospective nature of the study, the collection of patients’ MR images and relevant clinical information did not adversely affect the patients’ rights or welfare, and the need for individual consent was therefore waived by the Ethics Committee of the Baotou Medical College.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Olusanya BO, Kaplan M, Hansen TWR. Neonatal hyperbilirubinaemia: a global perspective. Lancet Child Adolesc Health 2018;2:610-20. [Crossref] [PubMed]
Bhutani VK, Wong RJ, Stevenson DK. Hyperbilirubinemia in Preterm Neonates. Clin Perinatol 2016;43:215-32. [Crossref] [PubMed]
Guedalia J, Farkash R, Wasserteil N, Kasirer Y, Rottenstreich M, Unger R, Grisaru Granovsky S. Primary risk stratification for neonatal jaundice among term neonates using machine learning algorithm. Early Hum Dev 2022;165:105538. [Crossref] [PubMed]
Liu K, He H, Hua Z. Clinical analysis of acute bilirubin encephalopathy in 227 neonates, 2012. J Clin Pediatr 2012;30:840-4.
Shapiro SM, Riordan SM. Review of bilirubin neurotoxicity II: preventing and treating acute bilirubin encephalopathy and kernicterus spectrum disorders. Pediatr Res 2020;87:332-7. [Crossref] [PubMed]
Yokochi K. Magnetic resonance imaging in children with kernicterus. Acta Paediatr 1995;84:937-9. [Crossref] [PubMed]
Barkovich AJ. MR of the normal neonatal brain: assessment of deep structures. AJNR Am J Neuroradiol 1998;19:1397-403.
Ren Q, Kang X, Zheng J. Application value of MRI in neonatal bilirubin encephalopathy, 2010. J Shandong Univ Med Sci 2010;48:91-3.
Yan R, Han D, Wang S. Diagnostic value of MRI and MRS In neonatal hyperbilirubinemia, 2014. Clin Radiol 2014;33:1743-7.
Maisels MJ, Bhutani VK, Bogen D, Newman TB, Stark AR, Watchko JF. Hyperbilirubinemia in the newborn infant > or =35 weeks’ gestation: an update with clarifications. Pediatrics 2009;124:1193-8.
Wu M, Wang X. Research progress of magnetic resonance and magnetic resonance spectroscopy in bilirubin encephalopathy, 2007. Chin Med Imaging Technol 2007;23:796-9.
Fang X, Yan W, Zheng X, Lan B, Zhang K. Value of MRI in the diagnosis of subclinical bilirubin brain injury, 2015. J Guangzhou Med Univ 2015;43:66-8.
Okumura A, Hayakawa F, Maruyama K, Kubota T, Kato K, Watanabe K. Single photon emission computed tomography and serial MRI in preterm infants with kernicterus. Brain Dev 2006;28:348-52. [Crossref] [PubMed]
Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak JAWM, van Ginneken B, Sánchez CI. A survey on deep learning in medical image analysis. Med Image Anal 2017;42:60-88. [Crossref] [PubMed]
Mazurowski MA, Buda M, Saha A, Bashir MR. Deep learning in radiology: An overview of the concepts and a survey of the state of the art with focus on MRI. J Magn Reson Imaging 2019;49:939-54. [Crossref] [PubMed]
Raj RJS, Shobana SJ, Pustokhina IV, Pustokhin DA, Gupta D, Shankar K. Optimal feature selection-based medical image classification using deep learning model in internet of medical things, 2020. IEEE Access 2020;8:58006-17.
Zhou J, Zhang Y, Chang KT, Lee KE, Wang O, Li J, Lin Y, Pan Z, Chang P, Chow D, Wang M, Su MY. Diagnosis of Benign and Malignant Breast Lesions on DCE-MRI by Using Radiomics and Deep Learning With Consideration of Peritumor Tissue. J Magn Reson Imaging 2020;51:798-809. [Crossref] [PubMed]
Jain R, Jain N, Aggarwal A, Hemanth DJ. Convolutional neural network based Alzheimer’s disease classification from magnetic resonance brain images, 2019. Cogn Syst Res 2019;57:147-59.
Zhang H, Cheng Y, Chen Z, Cong X, Kang H, Zhang R, Guo X, Liu M. Clot burden of acute pulmonary thromboembolism: comparison of two deep learning algorithms, Qanadli score, and Mastora score. Quant Imaging Med Surg 2022;12:66-79. [Crossref] [PubMed]
Qu J, Shen C, Qin J, Wang Z, Liu Z, Guo J, Zhang H, Gao P, Bei T, Wang Y, Liu H, Kamel IR, Tian J, Li H. The MR radiomic signature can predict preoperative lymph node metastasis in patients with esophageal cancer. Eur Radiol 2019;29:906-14. [Crossref] [PubMed]
Chen X, Wang Z, Zhan Y, Wang P. Multi-modal Fusion with Dense Connection for Acute Bilirubin Encephalopathy Classification. Image and Graphics: 11th International Conference, ICIG 2021, Haikou, China, Proceedings, Part II 2021:716-28.
Kumar Y, Patel N P, Koul A, Gupta A. Early prediction of neonatal jaundice using artificial intelligence techniques. 2022 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM), Gautam Buddha Nagar, India, 2022:222-6.
Wu M, Shen X, Lai C, You Y, Zhao Z, Wu D. Detecting acute bilirubin encephalopathy in neonates based on multimodal MRI with deep learning. Pediatr Res 2022;91:1168-75. [Crossref] [PubMed]
Kalbande D, Majumdar A, Dorik P, Prajapati P, Deshpande S. Deep Learning Approach for Early Diagnosis of Jaundice. In: Gupta D, Khanna A, Hassanien AE, Anand S, Jaiswal A. editors. International Conference on Innovative Computing and Communications. Lecture Notes in Networks and Systems, Springer, Singapore 2022;492:387-95.
Kemper AR, Newman TB, Slaughter JL, Maisels MJ, Watchko JF, Downs SM, Grout RW, Bundy DG, Stark AR, Bogen DL, Holmes AV, Feldman-Winter LB, Bhutani VK, Brown SR, Maradiaga Panayotti GM, Okechukwu K, Rappo PD, Russell TL. Clinical Practice Guideline Revision: Management of Hyperbilirubinemia in the Newborn Infant 35 or More Weeks of Gestation. Pediatrics 2022;150:e2022058859.
Liu Z, Ji B, Zhang Y, Cui G, Liu L, Man S, Ding L, Yang X, Mao H, Wang L. Machine Learning Assisted MRI Characterization for Diagnosis of Neonatal Acute Bilirubin Encephalopathy. Front Neurol 2019;10:1018.
Althnian A, Almanea N, Aloboud N. Neonatal Jaundice Diagnosis Using a Smartphone Camera Based on Eye, Skin, and Fused Features with Transfer Learning. Sensors (Basel) 2021;21:7038.
Deng H, Zhou Y, Wang L, Zhang C. Ensemble learning for the early prediction of neonatal jaundice with genetic features. BMC Med Inform Decis Mak 2021;21:338.
Zeng S, Wang Z, Zhang P, Yin Z, Huang X, Tang X, Shi L, Guo K, Liu T, Wang M, Qiu H. Machine learning approach identifies meconium metabolites as potential biomarkers of neonatal hyperbilirubinemia. Comput Struct Biotechnol J 2022;20:1778-84.
Zhan K, Wang Y, Zhuo Y, Zhan Y, Yan Q, Shan F, Zhou L, Chen X, Liu L. An uncertainty-aware self-training framework with consistency regularization for the multilabel classification of common computed tomography signs in lung nodules. Quant Imaging Med Surg 2023;13:5536-54.
Chen MM, Chen MC. Modeling road accident severity with comparisons of logistic regression, decision tree and random forest. Information 2020;11:270.
Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM 2017;60:84-90.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016:770-8.
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015:1-9.
Bruna J, Zaremba W, Szlam A, LeCun Y. Spectral Networks and Deep Locally Connected Networks on Graphs. Proc. ICLR 2014:1312-26.
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 2009:248-55.
Nguyen HV, Bai L. Cosine similarity metric learning for face verification. In: Kimmel R, Klette R, Sugimoto A. editors. Computer Vision – ACCV 2010. ACCV 2010. Lecture Notes in Computer Science, Springer, Berlin, Heidelberg 2011;6493:709-20.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 2017:30-47.

Cite this article as: Xu L, He Y, Yang L, Meng H, Zhang M. Automatic deep learning method for analysis and prediction of neonatal hyperbilirubinemia in magnetic resonance imaging. Quant Imaging Med Surg 2024;14(12):8502-8519. doi: 10.21037/qims-24-1050