# A brain structure learning-guided multi-view graph representation learning for brain network analysis

## Introduction

Research on mental disorders has become a crucial direction in neuroscience and clinical medicine over the past few decades. Traditionally, clinical analysis primarily focused on patients’ apparent symptoms, such as changes in behaviour, emotion, and cognitive function (1). Data collection methods mainly include interviews, behavioural observations, neuropsychological tests, questionnaires, and physiological measurements (2). However, these approaches often rely heavily on subjective assessments (3), making them susceptible to the influence of the clinician’s experience and observation perspective, which can result in inadequate reliability and consistency of results. Additionally, these methods often provide only limited static information, making it challenging to fully comprehend the dynamic changes and development trends of these diseases.

With the ongoing development of research methods, clinical analysis has gradually transitioned from focusing on apparent symptoms to exploring deeper neurobiological foundations. Among these methods, brain network analysis stands out as a powerful tool, offering avenues for comprehending mental disorders (4). The resting-state brain network represents the brain activity recorded when a subject is not engaged in specific cognitive tasks. It is typically measured using neuroimaging techniques such as resting-state functional magnetic resonance imaging (rs-fMRI) (5). Unlike brain activity during specific tasks, the resting-state brain network reflects intrinsic functional connections and interactions without explicit external stimuli. Research on the resting-state brain network facilitates comprehension of the coordination among diverse brain regions and reveals the neural mechanisms involved in the development of mental diseases.

Traditional statistical analysis methods have played a crucial role in resting-state brain network research, especially in understanding differences in brain network structure and disease states. These methods typically include statistical hypothesis tests, such as *t*-tests, analysis of variance (ANOVA), correlation analysis, regression analysis and non-parametric tests, which aim to discern distinctions between patient groups and healthy controls (6,7). Their advantage lies in their intuitive and interpretable nature, which facilitates the identification of significant alterations in specific brain regions or connections. However, traditional statistical analysis methods are limited in their capacity to model individual differences, thereby hindering the in-depth exploration of brain networks. Furthermore, these methods often overlook the dynamic and intricate nature of brain networks, making them less suitable for handling high-dimensional and nonlinear relationships.

In contrast, machine learning methods offer the capability to process high-dimensional data and mine potential patterns, including nonlinear relationships, dynamic changes, and multimodal information (8-10). Concurrently, these methods can also model individual variability. For instance, in Alzheimer’s disease diagnosis, Shahparian *et al.* (11) proposed a method based on latent low-rank features and a support vector machine (SVM). This method enables efficient diagnosis of healthy categories, mild stages of disease, or Alzheimer’s disease stages by calculating time series of anatomical regions, extracting features using latent low-rank representations, and applying SVM classifiers. Similarly, Lama and Kwon (12) used Node2vec graph embedding to convert graph features into feature vectors, and applied a combination of regularized extreme learning machines (RELM) and linear support vector machine (LSVM) to achieve effective Alzheimer’s disease detection.

In recent years, deep learning methods, a rapidly developing branch of machine learning, have aroused widespread interest in neuroscience. These methods, characterized by multi-layered neural network structures, simulate the human brain’s information processing mechanisms, offering new perspectives and approaches to brain network research. Their successes in image, speech and natural language processing (NLP) (13,14) have inspired interest in applying deep learning to neuroscience. By applying deep learning models to brain imaging data, researchers can more accurately predict neural activity, cognitive tasks, and even the onset of mental disorders. For instance, Huang *et al.* (15) proposed a functional brain network analysis method based on a static-dynamic convolutional neural network (CNN). Haweel *et al.* (16) presented an early autism diagnosis method based on discrete wavelet transform (DWT) and CNN.

However, deep learning methods often overlook the local structure of nodes, which is crucial for understanding the functions and interactions within brain networks. Graph convolutional networks (GCNs) are a type of deep learning model that processes graph-structured data (17). Compared with traditional deep learning methods, GCNs can naturally handle graph-structured data, adapt to irregularities, capture node structures, consider dynamics, and handle heterogeneous nodes and edges. For instance, in a schizophrenia recognition study, Yin *et al.* (18) found that, compared with CNN, GCN could better distinguish schizophrenia patients from healthy controls. Likewise, Liu *et al.* (19) demonstrated that hierarchical GCNs constructed using multi-scale atlases were more effective in diagnosing brain disorders using graph-structured data. However, in constructing brain functional networks, current approaches often convert continuous functional connectivity strength into binary relationships (connected or disconnected) (20), simplifying the network structure and reducing computational and storage demands. Nevertheless, this approach may overlook crucial information, particularly in cases with rich connectivity strength gradients. Additionally, threshold selection is subjective, and different researchers or studies may choose different thresholds, leading to inconsistent results.

To address these problems, the paper proposes a brain structure learning-guided multi-view graph representation learning method to enhance the flexibility of current brain network analysis and improve the diagnosis accuracy (ACC) of mental disorders. Specifically, to address the inflexibility of simple thresholds in brain network binarization, multiple thresholds are used to generate brain networks at various sparsity levels. This approach enhances modeling capability for brain structural diversity by integrating information from different network sparsities. Considering noise edges and data inconsistency in brain networks, the study introduces graph pooling to optimize the brain network representation, providing more reliable input for GCNs. Subsequently, the study designs a multi-view GCN to capture the complexity and variability of brain structure. Finally, an attention mechanism is used to enhance or weaken the contributions of different views, improving the integration of information and the model’s performance in multi-view learning. It is noteworthy that this study uses the Smith atlas (21) instead of the Automated Anatomical Labeling (AAL) atlas. The Smith atlas is derived from rs-fMRI data via independent component analysis (ICA). Compared with the traditional AAL atlas, the Smith atlas can offer a superior characterization of resting-state brain networks. Meanwhile, considering the impact of the number of supernodes on graph pooling, an in-depth analysis is conducted in the results section. The main contributions of this study can be summarized as follows:

- We build a multi-view brain network for each subject using multiple thresholds, which can consider the diversity of brain structures and improve the modeling capability.
- We introduce a graph structure learning algorithm that adopts a supervised learning scheme and can adaptively build a clean coarsened-graph network. Compared with the original brain network, using the coarsened-graph network facilitates brain network representation learning and disease diagnosis.
- We propose an attention-based adaptive multi-view fusion method, which dynamically adjusts the contributions of different views through an attention mechanism. This enhances the model’s utilization of information from each view, further improving the classification performance of mental disorders.

Experiments conducted on the Autism Brain Imaging Data Exchange (ABIDE) (22) and the Mexican Cocaine Use Disorders (SUDMEX CONN) (23) datasets demonstrate the proposed framework’s applicability and superior performance in various scenarios.

The rest of the paper is organized as follows: in Section “Related work”, we present related work. Section “Methods” describes the detailed mathematical formulation and framework of the proposed method. In Section “Results”, the experimental design, evaluation metrics, and datasets are introduced, and the effectiveness and robustness of the proposed method are demonstrated through quantitative and qualitative analyses. Section “Discussion” discusses the proposed method and future work. Finally, Section “Conclusions” concludes our work.

### Related work

*Graph neural network (GNN)*

As an emerging deep learning model, GCN was first proposed by Kipf and Welling in 2016 (24). GCN draws inspiration from traditional CNNs and graph theory, aiming to address the limitations of CNNs in handling graph-structured data. Traditional deep learning models are typically designed for regular data structures, such as images and sequences, and perform poorly when applied to irregular graph-structured data (25). GCNs were proposed to adaptively handle irregular graph structures so that the neural network can better capture the topological relationships and characteristics between nodes. This capability is crucial in many practical applications, such as social networks (26), where nodes represent users and edges define relationships between users. Through GCNs, the associations between users in social networks can be more effectively analyzed to infer potential social circles and user interests. As research on GCNs progressed, several improved models, including inductive representation learning on large graphs (GraphSAGE) (27) and Graph Attention Network (GAT) (28), were introduced, enriching the GCN family. These models enhance the modeling capabilities for graph-structured data by introducing attention mechanisms, aggregation strategies and more, making GCNs more flexible and practical across diverse fields.

Due to their superior performance in handling graph-structured data and capturing complex relationships, GCNs are becoming increasingly popular in mental disorder research (29,30). The human brain functional network is a large and complex graph, where brain regions can be represented as nodes and their connections as edges. Traditional analytical methods often fail to capture the intricate interactions and information transfer within the brain, focusing instead on specific regions and overlooking the highly interconnected nature of the entire brain network. GCNs, through multi-layer graph convolution operations, can update each node’s representation and integrate local and global information. This allows for a more comprehensive understanding of brain region relationships, which is crucial for revealing the global features of mental disorders and understanding how the brain works as a whole. Furthermore, in the study of mental disorders, the advantage of GNNs lies not only in their ability to fuse global information, but also in their powerful capability for individualized modeling. Individual variations have significant implications for the etiology and treatment of mental disorders, and GCNs can analyze these variations more accurately by modeling the brain functional network of each person (31).

*Multi-view brain network analysis*

In neuroscience and medical imaging, brain network analysis is gaining increasing attention. Traditional methods often fail to capture the full diversity of the brain’s complex networks (32). To overcome these limitations, researchers are increasingly turning to multi-view approaches to gain a more comprehensive and accurate understanding of the nervous system’s complex structure and function. Current multi-view brain network analysis methods primarily include two categories: multi-modal network research based on different imaging methods (31) and multi-view network research based on single imaging to construct different brain networks (32,33). Multimodal network research aims to integrate brain image data from various imaging techniques, such as structural magnetic resonance imaging (sMRI), functional magnetic resonance imaging (fMRI), electroencephalography (EEG) and positron emission tomography (PET), to obtain a more comprehensive representation of brain networks. For instance, Zhou *et al.* (34) utilized multimodal data from sMRI, fMRI, and PET to propose a sparse interpretable GCN for identifying and classifying Alzheimer’s disease. By integrating multimodal data, this method can more accurately identify brain network characteristics related to Alzheimer’s disease, offering a new perspective for diagnosis and understanding of the disease.

The second type, multi-view network research using single imaging to construct different brain networks, focuses on using different brain atlases or constructing brain networks with varying sparsities to capture the brain network’s complex characteristics and improve diagnostic ACC (33,35). This approach better captures the full picture of the data by integrating information from multiple views, enhancing the model’s generalization ability and robustness, and enabling more reliable diagnosis and prediction with new data. For instance, Zhang *et al.* (33) constructed different brain networks based on multiple brain atlases and used multi-task learning algorithms for joint feature selection, thereby improving the diagnosis of mild cognitive impairment (MCI). This study aims to enhance the performance of the second type of multi-view network diagnosis using single imaging to construct different brain networks.

*Attention mechanism*

Inspired by neuroscience, the attention mechanism was initially applied in NLP to simulate the human visual and cognitive systems’ selective attention mechanism (36). In deep learning, the attention mechanism began with the Neural Turing Machine in 2014 (37) and has since been developed in numerous subsequent works. In 2015, Bahdanau *et al.* (38) introduced the attention mechanism to neural machine translation, enabling the model to assign different weights to information at different input sequence positions, thereby improving translation effectiveness. Since then, the attention mechanism has been extensively used in computer vision and speech processing. In particular, the emergence of the self-attention mechanism and the multi-head attention mechanism (39) has further expanded their applications. The self-attention mechanism allows the model to allocate weights based on the internal data relationship when processing sequences, while the multi-head attention mechanism enhances the model’s ability to handle multi-level information by operating multiple attention heads in parallel. Due to its excellent characteristics, the attention mechanism has also been widely used in GCNs. At the node level, attention mechanisms calculate the correlation and weights between nodes, allowing the model to focus on neighbors relevant to the current node for better adaptation to local information. At the edge level, attention mechanisms enhance the understanding of graph topology by focusing on the weights of different edges, especially when dealing with heterogeneous graphs or multiple relationships within a graph. Additionally, the graph-level attention mechanism enables the model to comprehend global information by considering the overall structure and characteristics of the entire graph.

## Methods

The framework of the proposed method is shown in *Figure 1*. We first construct a multi-view resting-state brain network. Then, we apply graph pooling for graph structure learning to eliminate noise edges as well as data inconsistencies, and design a multi-view GCN to extract rich information. Furthermore, we apply an attention-based adaptive module for the view fusion process. Since views are correlated, it is reasonable to assume that different views share a common representation. Therefore, the view attention mechanism is shared among the views. Finally, we introduce the learning objectives.

**Figure 1** The framework of our proposed method. rs-fMRI, resting-state functional magnetic resonance imaging; FC, functional connectivity; KNN, K-nearest neighbor; avg, average.

### Multi-view graph construction

For the rs-fMRI time-series $T={\left\{{t}_{1}^{\left(n\right)},{t}_{2}^{\left(n\right)},\cdots ,{t}_{I}^{\left(n\right)}\right\}}_{n=1}^{N}$, ${t}_{i}^{\left(n\right)}\in {R}^{L}$ represents the blood oxygen level-dependent (BOLD) signal of the *i*-th region of interest (ROI) for the *n*-th subject, *N* denotes the total number of subjects, and *I* represents the number of brain regions/ROIs. This study employs the atlas derived from rs-fMRI using ICA by Smith *et al.* (21) to extract the ROIs, resulting in *I*=70. The initial step involves calculating the functional connectivity of all ROIs using Pearson correlation (PC). Let ${B}^{\left(n\right)}=\left({b}_{ij}^{\left(n\right)}\right)\in {R}^{I\times I}$ be the functional connectivity matrix for the *n*-th subject, then the element ${b}_{ij}^{\left(n\right)}\in \left[-1,1\right]$ in the matrix ${B}^{\left(n\right)}$ represents the PC coefficient between the $i\text{-th}$ ROI and the $j\text{-th}$ ROI, defined as follows:

$${b}_{ij}^{\left(n\right)}=\frac{{\left({t}_{i}^{\left(n\right)}-{\overline{t}}_{i}^{\left(n\right)}\right)}^{T}\left({t}_{j}^{\left(n\right)}-{\overline{t}}_{j}^{\left(n\right)}\right)}{\sqrt{{\left({t}_{i}^{\left(n\right)}-{\overline{t}}_{i}^{\left(n\right)}\right)}^{T}\left({t}_{i}^{\left(n\right)}-{\overline{t}}_{i}^{\left(n\right)}\right)}\sqrt{{\left({t}_{j}^{\left(n\right)}-{\overline{t}}_{j}^{\left(n\right)}\right)}^{T}\left({t}_{j}^{\left(n\right)}-{\overline{t}}_{j}^{\left(n\right)}\right)}}$$

where ${\overline{t}}_{i}^{\left(n\right)}$ and ${\overline{t}}_{j}^{\left(n\right)}$ represent the mean vector corresponding to ${t}_{i}^{\left(n\right)}\in {R}^{L}$ and ${t}_{j}^{\left(n\right)}\in {R}^{L}$, respectively. Considering brain regions/ROIs as nodes $V=\left\{{v}_{1},\cdots ,{v}_{I}\right\}$, and functional connectivity ${b}_{ij}^{\left(n\right)}$ between the paired nodes ${v}_{i}$ and ${v}_{j}$ as edge ${e}_{ij}$, we can take each brain functional connectivity matrix as a graph $G=\left\{V,E\right\}$.
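The functional connectivity computation above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name `functional_connectivity`, the random toy signal, and the series length of 120 time points are assumptions made for the example.

```python
import numpy as np

def functional_connectivity(ts: np.ndarray) -> np.ndarray:
    """Pearson correlation between every pair of ROI time series.

    ts has shape (I, L): one BOLD series of length L per ROI.
    Returns the I x I matrix B with entries in [-1, 1].
    """
    # Subtract each ROI's temporal mean (the t-bar terms in the formula).
    centered = ts - ts.mean(axis=1, keepdims=True)
    # Numerator: inner products of centered series; denominator: product of norms.
    norms = np.linalg.norm(centered, axis=1)
    return (centered @ centered.T) / np.outer(norms, norms)

rng = np.random.default_rng(0)
ts = rng.standard_normal((70, 120))  # I = 70 ROIs; L = 120 is an assumed length
B = functional_connectivity(ts)
```

The result agrees with `np.corrcoef(ts)`, which could be used directly; the explicit version is shown only to mirror the equation term by term.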

Because PC yields a dense graph, the graph contains a lot of noisy/redundant information. In this study, we use K-nearest neighbor (KNN) to sparsify the constructed graph (40). Specifically, we rank the edges of each node by functional connectivity strength and retain only the top *V* most important edges. Therefore, the topology of graph *G* can be described by an adjacency matrix $A=\left({a}_{ij}\right)\in {R}^{I\times I}$, where ${a}_{ij}=1$ if there is a connection between the $i\text{-th}$ ROI and the $j\text{-th}$ ROI, otherwise ${a}_{ij}=0$. However, different *V* values determine different levels of graph topology. To avoid the information loss and bias caused by a single threshold, this study constructs a multi-view brain network with different connection densities ($V=\left\{7,\text{\hspace{0.17em}}14,\text{\hspace{0.17em}}21\right\}$). In general, smaller *V* values retain fewer connections, resulting in sparser graphs, while larger *V* values yield denser graphs.
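The multi-threshold sparsification can be illustrated as follows. The text does not specify how connectivity strength is ranked or whether the result is symmetrized, so this sketch assumes ranking by absolute correlation, exclusion of self-loops, and symmetrization by union; `knn_adjacency` is a hypothetical helper name.

```python
import numpy as np

def knn_adjacency(B: np.ndarray, k: int) -> np.ndarray:
    """Binary adjacency keeping, for each node, its k strongest connections."""
    strength = np.abs(B).astype(float)       # assumption: rank by |correlation|
    np.fill_diagonal(strength, -np.inf)      # assumption: never keep self-loops
    top = np.argsort(-strength, axis=1)[:, :k]   # indices of the k strongest edges per row
    A = np.zeros(B.shape, dtype=int)
    rows = np.repeat(np.arange(B.shape[0]), k)
    A[rows, top.ravel()] = 1
    return np.maximum(A, A.T)                # assumption: symmetrize by union

rng = np.random.default_rng(0)
B = np.corrcoef(rng.standard_normal((70, 120)))   # toy connectivity matrix
views = {k: knn_adjacency(B, k) for k in (7, 14, 21)}
```

Under these assumptions the three views are nested: every edge kept at 7 neighbors is also present at 14 and 21, so the views differ only in how many weaker connections they admit.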

### Graph structure learning

The constructed brain network still suffers from noise edges and data inconsistencies (41). Graph pooling, a technique in GCN graph representation learning, is used to coarsen original graphs with high-level noisy edges. In this study, we utilize graph pooling (42) as an initial step to acquire a clean coarsened-graph structure that provides reliable input for subsequent GCN analysis, as shown in *Figure 2*. Let *A* be the adjacency matrix, and *X* be the feature matrix. To merge the nodes and edges in the graph while retaining its structure and characteristics, we adopt a soft cluster assignment matrix *S* and an embedding matrix *E* to generate a new coarsened adjacency matrix *A'*, and the feature matrix *X'* of the nodes/clusters in the coarsened graph. Mathematically, this process is expressed as:

$$X\prime ={S}^{T}E\in {R}^{C\times F}$$

$$A\prime ={S}^{T}AS\in {R}^{C\times C}$$

where *C* represents the number of supernodes (or clusters), and *F* is the dimension of the coarsened graph feature matrix. The matrices *S* and *E* are generated through two independent GNN modules. The embedding matrix *E* is generated by the GNN module $GN{N}_{l,embed}$ with *A* and *X* as inputs. Simultaneously, the soft cluster assignment matrix *S* is computed via the SoftMax operation applied to the outputs of the GNN module $GN{N}_{l,pool}$ utilizing *A* and *X* as inputs. This is expressed as follows:

$$E=GN{N}_{l,embed}\left(A,X\right)$$

$$S=softmax\left(GN{N}_{l,pool}\left(A,X\right)\right)$$
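A minimal sketch of this coarsening step is given below. The text does not describe the internals of the two GNN modules, so each is assumed here to be a single graph convolution with randomly initialized weights; the names `diff_pool`, `W_embed`, and `W_pool` and all dimensions except *I*=70 are illustrative.

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def diff_pool(A, X, W_embed, W_pool):
    """Coarsen (A, X) into C supernodes; returns (X', A', S)."""
    A_hat = normalize_adj(A)
    E = np.maximum(A_hat @ X @ W_embed, 0.0)    # embedding matrix E, shape (I, F)
    S = softmax(A_hat @ X @ W_pool, axis=1)     # soft assignment S, shape (I, C)
    return S.T @ E, S.T @ A @ S, S              # X' is (C, F), A' is (C, C)

rng = np.random.default_rng(0)
I, F_in, F, C = 70, 70, 32, 35                  # C = I / 2 supernodes
A = (rng.random((I, I)) < 0.1).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0.0)
X = rng.standard_normal((I, F_in))
W_embed = 0.1 * rng.standard_normal((F_in, F))
W_pool = 0.1 * rng.standard_normal((F_in, C))
X_c, A_c, S = diff_pool(A, X, W_embed, W_pool)
```

Because each row of `S` sums to one, the total edge weight of the coarsened graph equals that of the original graph; the pooling redistributes connections among supernodes rather than discarding them outright.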

### Multi-view GCN

After graph structure learning, we can obtain three coarsened graphs $\left\{{G}_{1}\prime ,{G}_{2}\prime ,{G}_{3}\prime \right\}$, where ${A}_{k}{}^{\prime}$ and ${X}_{k}{}^{\prime}$ represent the adjacency matrix and feature matrix of the $k\text{-th}$ coarsened graph, respectively. We adopt GCN to aggregate the graph structure ${A}_{k}{}^{\prime}$ and node features ${X}_{k}{}^{\prime}$ to obtain the latent representation ${Z}_{k}$. GCN is a graph-based deep learning model, which is a first-order approximation of graph convolutions in the spectral domain. The objective of GCN is to learn node representations so that adjacent nodes in the graph have similar representations. Multi-layer GCN learns a layer-by-layer transformation by stacking multiple spectral graph convolutional layers. Each spectral graph convolutional layer takes the node representations of the previous layer as input and propagates information along the structure of the graph. This layer-by-layer transformation enables the model to gradually focus on more abstract and global graph structure features, thereby improving the modeling ability of complex relationships.

The propagation of GCN can be expressed as:

$${Z}^{\left(l+1\right)}=\sigma \left({\tilde{D}}^{-\frac{1}{2}}\tilde{A}{\tilde{D}}^{-\frac{1}{2}}{Z}^{\left(l\right)}{W}^{\left(l\right)}\right)$$

where ${W}^{\left(l\right)}$ is the weight matrix, $\sigma (\cdot )$ represents the activation function, $\tilde{A}=A+{I}_{N}$ is the adjacency matrix with self-connection added, $\tilde{D}$ is a node degree matrix with diagonal entries ${\tilde{d}}_{ii}=\sum _{j}{\tilde{a}}_{ij}$, ${Z}^{\left(l\right)}$ is the node feature matrix, and ${Z}^{\left(0\right)}=X$.

In the study, we stack two graph convolutional layers as employed in (43). Hence, the formula can be represented as:

$${Z}_{k}=GCN\left({A}_{k}\prime ,{X}_{k}\prime \right)={\sigma}_{1}\left(\overline{A}{\sigma}_{0}\left(\overline{A}{X}_{k}\prime {W}^{\left(0\right)}\right){W}^{\left(1\right)}\right)$$

where $\overline{A}={\tilde{D}}^{-\frac{1}{2}}\tilde{A}{\tilde{D}}^{-\frac{1}{2}}$, ${\sigma}_{0}$ and ${\sigma}_{1}$ are ReLU activation functions.
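The two-layer propagation can be written out directly in NumPy. The sketch below assumes randomly initialized weights and illustrative layer widths (32, 64, 16); `gcn_two_layer` is a hypothetical name, and a real model would learn `W0` and `W1` by gradient descent.

```python
import numpy as np

def gcn_two_layer(A, X, W0, W1):
    """Two stacked graph convolutions with ReLU after each layer."""
    # A_bar = D~^{-1/2} (A + I) D~^{-1/2}, the normalized adjacency with self-loops.
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_bar = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    H = np.maximum(A_bar @ X @ W0, 0.0)       # first layer, sigma_0 = ReLU
    return np.maximum(A_bar @ H @ W1, 0.0)    # second layer, sigma_1 = ReLU

rng = np.random.default_rng(0)
C, F, hidden, out = 35, 32, 64, 16            # illustrative sizes
A_c = rng.random((C, C))
A_c = (A_c + A_c.T) / 2                       # toy weighted, symmetric coarsened graph
X_c = rng.standard_normal((C, F))
W0 = 0.1 * rng.standard_normal((F, hidden))
W1 = 0.1 * rng.standard_normal((hidden, out))
Z = gcn_two_layer(A_c, X_c, W0, W1)
```

Applying the same function to each of the three coarsened graphs yields the view-specific representations that the fusion step consumes.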

### Attention-based adaptive view fusion

Fusing view-specific representations is a crucial step for achieving multi-view collaboration. Traditional methods usually combine the representations by simply concatenating or adding them, which may not capture complex relationships between views. To address this, we introduce an attention mechanism to assign different weights to different views, making them focused on information that plays a key role in collaboration. For each representation ${Z}_{k}$, we associate a query matrix ${Q}_{k}\in {R}^{C\times P}$ and a key matrix ${K}_{k}\in {R}^{C\times P}$ with it as follows:

$${Q}_{k}={Z}_{k}\cdot {W}_{Q}$$

$${K}_{k}={Z}_{k}\cdot {W}_{K}$$

Because the observations from multiple views are varied but highly related, we share ${W}_{Q}$ and ${W}_{K}$ across all views. To obtain the important information across different views, we compute the average of all query matrices ${Q}_{k}$ and concatenate the key matrices ${K}_{k}$:

$$Q=avg\left[{Q}_{1},{Q}_{2},{Q}_{3}\right]$$

$$K=\left\{{K}_{1},{K}_{2},{K}_{3}\right\}$$

Then, the propagated information among all views is as follows:

$${\left\{{\widehat{Z}}_{k}\right\}}_{k=1}^{3}={\left\{\frac{softmax\left(Q\cdot {K}_{k}^{T}\right)\cdot {Z}_{k}}{\sqrt{P}}\right\}}_{k=1}^{3}$$

To fuse the propagated information, we construct a fusion layer that learns adaptive weights:

$$\widehat{Z}={\displaystyle \sum _{k=1}^{3}{w}_{k}{Z}_{k}}$$

$${w}_{k}=\sigma \left({\widehat{Z}}_{k}{W}_{f}+{b}_{f}\right)$$

where ${w}_{k}$ is the weight of $k\text{-th}$ view, learned by a single-layer multilayer perceptron (MLP) with the $k\text{-th}$ embeddings as input. *Figure 3* illustrates our attention-based adaptive view fusion on three views.
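The fusion equations can be traced step by step in the sketch below. Several details are not fixed by the text and are assumed here: the softmax is taken over the key dimension, $\sigma$ in the weight equation is a sigmoid, and ${W}_{f}$ has a single output column, so each view receives one gate per supernode. The function name `fuse_views` and all dimensions are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_views(Z_list, W_Q, W_K, W_f, b_f):
    """Attention-weighted sum of view representations, each of shape (C, D)."""
    P = W_Q.shape[1]
    # Shared projection, then the average query over all views: shape (C, P).
    Q = np.mean([Z @ W_Q for Z in Z_list], axis=0)
    fused = np.zeros_like(Z_list[0])
    for Z in Z_list:
        K = Z @ W_K                                          # key of this view, (C, P)
        Z_hat = softmax(Q @ K.T, axis=1) @ Z / np.sqrt(P)    # propagated information
        w = sigmoid(Z_hat @ W_f + b_f)                       # gate, shape (C, 1)
        fused += w * Z                                       # weighted view contribution
    return fused

rng = np.random.default_rng(0)
C, D, P = 35, 16, 8
Z_list = [rng.standard_normal((C, D)) for _ in range(3)]
W_Q = 0.1 * rng.standard_normal((D, P))
W_K = 0.1 * rng.standard_normal((D, P))
W_f = 0.1 * rng.standard_normal((D, 1))
fused = fuse_views(Z_list, W_Q, W_K, W_f, b_f=0.0)
```

Since each gate lies strictly between 0 and 1, the fused representation can only attenuate a view, never amplify it beyond its original magnitude.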

### Learning objective

The objectives of the model mainly include graph structure learning and graph classification.

*Graph structure learning*

The aim of graph structure learning is to learn the graph structure to remove the noisy connections in the brain network. To achieve this goal, two losses, auxiliary link prediction loss and entropy loss, are employed:

$${L}_{L{P}_{k}}={\Vert {A}_{k}-{S}_{k}{S}_{k}^{T}\Vert}_{F},\quad {L}_{{E}_{k}}=\frac{1}{C}\sum _{c=1}^{C}H\left({S}_{{k}_{c}}\right)$$

$${L}_{GSL}=\sum _{k=1}^{3}\left({L}_{L{P}_{k}}+{L}_{{E}_{k}}\right)$$

where ${\Vert \cdot \Vert}_{F}$ denotes the Frobenius norm, $H$ denotes the entropy function, and ${S}_{{k}_{c}}$ is the $c\text{-th}$ row of ${S}_{k}$.
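Both auxiliary losses are easy to check on a toy assignment matrix. The sketch below reads the link prediction loss as the Frobenius norm of the difference between the adjacency matrix and $S{S}^{T}$, and averages the row entropies of $S$ with `mean`; that normalization, the small `eps` guard inside the logarithm, and the function names are assumptions.

```python
import numpy as np

def link_prediction_loss(A, S):
    """Frobenius distance between A and the adjacency implied by S S^T."""
    return float(np.linalg.norm(A - S @ S.T, ord="fro"))

def entropy_loss(S, eps=1e-12):
    """Mean entropy of the rows of S; low when assignments are near one-hot."""
    row_entropy = -np.sum(S * np.log(S + eps), axis=1)
    return float(np.mean(row_entropy))

# Hard assignment of 6 nodes to 4 clusters, and the graph it implies.
S_hard = np.eye(4)[[0, 0, 1, 1, 2, 3]]
A_block = S_hard @ S_hard.T
# Maximally uncertain assignment: every node spread evenly over 4 clusters.
S_uniform = np.full((6, 4), 0.25)

lp_hard = link_prediction_loss(A_block, S_hard)
ent_hard = entropy_loss(S_hard)
ent_uniform = entropy_loss(S_uniform)
```

A one-hot assignment that matches the graph drives both losses to zero, whereas the uniform assignment has entropy $\log 4$ per row, which illustrates why minimizing the entropy term pushes each node toward a single supernode.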

*Graph classification*

For graph classification, we use cross-entropy as the loss, thus the loss can be formulated as:

$${L}_{GC}=-\sum _{i=1}^{N}\left[{y}_{i}\text{log}{p}_{i}+\left(1-{y}_{i}\right)\text{log}\left(1-{p}_{i}\right)\right]$$

where ${y}_{i}$ is the ground-truth label of the $i\text{-th}$ subject, and ${p}_{i}$ is the predicted SoftMax probability for the $i\text{-th}$ subject. In this way, the final learning objective is:

$$L=\alpha {L}_{GSL}+\left(1-\alpha \right){L}_{GC}$$

where *α* is a hyper-parameter that balances the importance of the two losses.
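As a small numerical illustration, the classification loss and the weighted objective can be computed as follows. The sketch uses cross-entropy in its usual negative log-likelihood form, summed over subjects; the clipping constant, the function names, and the toy values are assumptions, and the value of *α* used in the experiments is not stated here.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Summed binary cross-entropy between labels y and probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def total_loss(l_gsl, l_gc, alpha):
    """Convex combination of structure-learning and classification losses."""
    return alpha * l_gsl + (1.0 - alpha) * l_gc

l_gc = binary_cross_entropy([1, 0], [0.5, 0.5])   # two maximally uncertain predictions
l_total = total_loss(l_gsl=2.0, l_gc=4.0, alpha=0.25)
```

With *α* close to 0 the objective is dominated by classification, and with *α* close to 1 by graph structure learning, so *α* controls how strongly the pooled graph is regularized.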

## Results

### Datasets

We evaluate our proposed framework on two mental disorder datasets: the ABIDE dataset (22) and the SUDMEX CONN dataset (23). The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

*ABIDE*

ABIDE contains data collected from multiple sites, each with differing data collection protocols and participant demographics. This heterogeneity can impact the performance of the algorithm and thus affect the accurate verification of its effectiveness (44). To mitigate these external factors, we chose the site with the largest number of samples (NYU site) for our study. This is intended to enhance the internal consistency of the dataset and minimize inter-site differences, thereby allowing a more accurate evaluation of the performance of our algorithm. The Configurable Pipeline for the Analysis of Connectomes (CPAC) (45) is used for image preprocessing. We obtained 172 high-quality rs-fMRI images, including 74 patients with autism spectrum disorder (ASD) and 98 normal controls.

*SUDMEX CONN*

SUDMEX CONN is a cocaine use disorder dataset collected at the National Psychiatric Institute in Mexico City by Dr. Eduardo A. Garza-Villarreal and Jorge J. Gonzalez Olvera (23). The dataset contains 72 patients with cocaine use disorder and 63 normal controls. We used fMRIPrep (46) to process rs-fMRI. fMRIPrep consists of standard preprocessing steps, including head motion correction, spatial transformation, removal of non-neural signal components (such as respiratory, cardiac and motion-related signals), and intensity normalization of MRI signals, ensuring that researchers obtain consistent results. By examining the results of fMRIPrep preprocessing, we found a large number of abnormal images in the rs-fMRI of one patient with cocaine use disorder, perhaps due to the patient’s abnormal head movement during data collection. We excluded this abnormal rs-fMRI scan, leaving 71 patients with cocaine use disorder and 63 normal controls.

### Experimental setup

All experiments in this section are conducted on a Linux server running the Ubuntu operating system. The server has two 12th Gen Intel(R) Core(TM) i9-12900K processors, each with a clock frequency of 3.20 GHz, and a total memory capacity of 64 GB RAM. Additionally, the server is configured with two NVIDIA GeForce RTX 3090 GPUs, each with 24 GB of memory. For model training, we selected the Adam optimizer with a learning rate of 0.001 to ensure rapid convergence. To fully consider the diversity of data, 10-fold cross-validation is adopted to evaluate the model. The number of supernodes is a key parameter for specifying node clustering in graph pooling, which largely determines the scale of graph coarsening. Supernodes help simplify the graph structure and mitigate noisy edges and data inconsistencies by reducing the number of nodes in the original graph. The number of supernodes is empirically set to half the number of input nodes. The impact of the number of supernodes on the model will be discussed later. To evaluate the performance of the method, we use ACC as the main measurement (gold standard) and combine it with three other widely used indicators for analysis. These indicators include: sensitivity (SEN), specificity (SPE), and area under the receiver operating characteristic curve (AUC). By combining these indicators, we can more comprehensively evaluate all aspects of the method’s performance, and ensure its reliability and effectiveness in different application scenarios.
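The four indicators can be computed from predicted probabilities as in the following minimal sketch. The function name `classification_metrics`, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration, not values from the experiments; AUC is computed with the pairwise-ranking (Mann-Whitney) formulation.

```python
import numpy as np

def classification_metrics(y_true, y_score, threshold=0.5):
    """ACC, SEN, SPE and AUC for binary labels (1 = patient, 0 = control)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    # AUC: fraction of (patient, control) pairs ranked correctly, ties count half.
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))
    return {
        "ACC": (tp + tn) / len(y_true),
        "SEN": tp / (tp + fn),   # true positive rate
        "SPE": tn / (tn + fp),   # true negative rate
        "AUC": float(auc),
    }

y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
metrics = classification_metrics(y_true, y_score)
```

On this toy input, accuracy, sensitivity and specificity all equal 2/3 and the AUC is 7/9. Unlike the other three indicators, AUC does not depend on the decision threshold, which is why the two can move differently across configurations.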

### The impact of the brain atlas

Unlike most existing works that use the AAL atlas provided by the Montreal Neurological Institute (MNI) (47), this study uses the Smith atlas extracted from rs-fMRI images by Smith *et al.* (21) using ICA to construct the brain network. To verify the advantages of the Smith atlas, we compare the performance of our method on the two atlases. The results are shown in *Figure 4*. The Smith atlas achieves better performance on both datasets. Specifically, except for SPE on ABIDE, the Smith atlas outperforms the AAL atlas on all metrics. For example, the Smith atlas achieves an average ACC/AUC of 74.48%/72.34% on ABIDE and 76.15%/68.60% on SUDMEX CONN, an increase over the AAL atlas of 1.73%/8.34% on ABIDE and 4.56%/5.14% on SUDMEX CONN. This suggests that a brain atlas extracted from rs-fMRI is more conducive to graph representation learning.

**Figure 4** The effect of brain atlas on graph classification performance. ABIDE, Autism Brain Imaging Data Exchange; SUDMEX CONN, Mexican Cocaine Use Disorders; AUC, area under the receiver operating characteristic curve; SPE, specificity; SEN, sensitivity; ACC, accuracy; AAL, Automated Anatomical Labeling.

### The impact of the supernode number

In graph structure learning, the number of supernodes for graph pooling is an important hyperparameter. To show the impact of different numbers of supernodes on performance, we employ different percentages of supernodes, ranging from 10% to 90%, to test our proposed method. The ACC and AUC are shown in *Figure 5*. It can be found that the number of supernodes has little impact on ACC, but the AUC value increases with the number of supernodes until it reaches half of the input nodes and then shows a downward trend. This suggests that a moderate number of supernodes helps to better capture key graph structural features. However, when the number of supernodes is too large, noise nodes and edges will inevitably be introduced, thus reducing the performance of the model. Therefore, there are trade-offs to consider when choosing the number of supernodes. Too few supernodes may fail to capture important features of the graph structure, while too many supernodes may introduce unnecessary noise and affect the generalization ability of the model.

**Figure 5** The impact of different numbers of supernodes on graph classification performance. ABIDE, Autism Brain Imaging Data Exchange; SUDMEX CONN, Mexican Cocaine Use Disorders; ACC, accuracy; AUC, area under the receiver operating characteristic curve.
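The sweep over supernode percentages described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `supernode_count`, the graph size `NUM_NODES = 70`, and the choice to round up are all assumptions made for the example.

```python
def supernode_count(num_nodes: int, percent: int) -> int:
    """Number of supernodes kept by pooling, given a percentage of input nodes.

    Rounding up (ceiling) is an assumption; the paper does not state the rule.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, (num_nodes * percent + 99) // 100)


# Hypothetical number of brain regions; the real value depends on the atlas.
NUM_NODES = 70

# Percentages from 10% to 90% in steps of 10, as in the experiment above.
sweep = {p: supernode_count(NUM_NODES, p) for p in range(10, 100, 10)}

for percent, count in sweep.items():
    print(f"{percent:>3d}% of {NUM_NODES} nodes -> {count} supernodes")
```

Each value in `sweep` would then be passed to the pooling layer as the target number of supernodes, and the model retrained and evaluated once per setting.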

### Comparison with the prior works on brain network analysis

To evaluate the effectiveness of the proposed method, we compare it with current state-of-the-art approaches, including SVM (48), long short-term memory (LSTM) (49), GCN (50), dynamic graph convolutional neural network (EDGE-CONV) (51), hierarchical graph representation learning with differentiable pooling (DiffPool-GCN) (52) and GraphSAGE (53), on the ABIDE and SUDMEX CONN datasets. To ensure the comparability of results, we adopted the same gold standard and constructed the brain network based on the Smith atlas. The comparison results in terms of ACC, SEN, SPE and AUC are shown in *Table 1*. Our method achieves better or comparable performance on both tasks compared with previous brain network classification approaches. The ACC, SEN, SPE and AUC of our method are 74.48%, 57.68%, 88.00% and 72.34%, respectively, on ABIDE and 76.15%, 86.07%, 64.76% and 68.60%, respectively, on SUDMEX CONN. Specifically, our method achieves substantial improvements over the traditional SVM, demonstrating the effectiveness of deep learning methods. Compared with the basic LSTM, GCN and EDGE-CONV on the ABIDE dataset, our method achieves comparable performance in SEN and SPE, but improves ACC/AUC by about 6.90%/11.67%, 6.47%/13.22% and 3.47%/6.72%, respectively. On the SUDMEX CONN dataset, SPE is comparable, but ACC/SEN/AUC increase by approximately 4.45%/8.75%/3.88%, 7.41%/2.50%/8.60% and 6.75%/4.46%/8.87%, respectively. These results demonstrate the advantages of multi-view brain networks. DiffPool-GCN and GraphSAGE are two graph models that obtain new node/graph representations through graph pooling or clustering. Compared with these two models, our method also shows better performance on most indicators for both tasks. The main reason is that they do not account for the noise connections in the initially constructed brain network, which makes it harder to learn a good representation.
Tables S1,S2 show the results of the existing methods on the AAL atlas. Our method again achieves clear improvements on most indicators, which is consistent with the results above and further supports its effectiveness. In addition, to make the model training process more transparent, we show the training loss of one run in Figure S1. As the number of epochs increases, the loss gradually decreases, showing that the model keeps learning and gradually improves its predictive capability. The figure also shows the ACC of the model, which improves markedly during training. After a certain number of epochs, the training loss stabilizes, with some fluctuations.

**Table 1**

Method | ABIDE ACC (%) | ABIDE SEN (%) | ABIDE SPE (%) | ABIDE AUC (%) | SUDMEX CONN ACC (%) | SUDMEX CONN SEN (%) | SUDMEX CONN SPE (%) | SUDMEX CONN AUC (%) |
---|---|---|---|---|---|---|---|---|
SVM | 63.82 | 56.61 | 69.44 | 66.40 | 65.49 | 71.61 | 58.57 | 64.24 |
LSTM | 67.58 | 37.32 | 90.67 | 60.67 | 71.70 | 77.32 | 65.71 | 64.72 |
GCN | 68.01 | 36.96 | 92.00 | 59.12 | 68.74 | 83.57 | 51.90 | 60.00 |
EDGE-CONV | 71.01 | 59.82 | 79.33 | 65.62 | 69.40 | 81.61 | 54.52 | 59.73 |
DiffPool-GCN | 71.01 | 61.25 | 78.67 | 62.57 | 73.96 | 87.14 | 58.10 | 66.39 |
GraphSAGE | 67.48 | 37.32 | 90.89 | 71.26 | 68.02 | 87.14 | 45.71 | 62.42 |
Our method | 74.48 | 57.68 | 88.00 | 72.34 | 76.15 | 86.07 | 64.76 | 68.60 |

ABIDE, Autism Brain Imaging Data Exchange; ACC, accuracy; SEN, sensitivity; SPE, specificity; AUC, area under the receiver operating characteristic curve; SUDMEX CONN, Mexican Cocaine Use Disorders; SVM, support vector machine; LSTM, long short-term memory; GCN, graph convolutional network; EDGE-CONV, dynamic graph convolutional neural network; DiffPool-GCN, hierarchical graph representation learning with differentiable pooling; GraphSAGE, inductive representation learning on large graphs.
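For readers less familiar with the four metrics reported in Table 1, the sketch below shows how ACC, SEN, SPE and AUC are conventionally computed for a binary classifier. It is an illustration on made-up toy labels, not the evaluation code used in the study; it assumes that label 1 denotes the patient class and that predictions are obtained by thresholding scores at 0.5.

```python
def acc_sen_spe(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    spe = tn / (tn + fp) if (tn + fp) else 0.0
    return acc, sen, spe


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = patient, 0 = healthy control.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc, sen, spe = acc_sen_spe(y_true, y_pred)
area = auc(y_true, scores)
print(f"ACC={acc:.2%} SEN={sen:.2%} SPE={spe:.2%} AUC={area:.2%}")
```

Because SEN and SPE depend on the decision threshold while AUC does not, a model can have high SPE and low SEN (as several baselines in Table 1 do on ABIDE) and still differ substantially in AUC.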

### Ablation study

In this section, we conduct an ablation study to demonstrate the effectiveness of our framework design. Specifically, we compare the proposed method with the basic GCN and with variants that combine one or more of its intermediate components. *Table 2* lists the experimental results and resource consumption [params, floating-point operations (FLOPs)] of the different methods. The ACC/AUC of GSL-GCN is better than that of GCN by 2.39%/6.15% and 3.79%/6.21% on the ABIDE and SUDMEX CONN datasets, respectively. This supports our hypothesis that the original complex graph structure hinders the graph embedding learning of GCN, and that graph structure learning can construct a common and clean brain network that improves the classification performance of the model. GCN-mv achieves an average ACC/AUC of 71.01%/69.36% on ABIDE and 70.93%/67.54% on SUDMEX CONN, an increase over GCN of 4.77% and 11.28% in ACC and AUC on ABIDE, and of 4.50% and 7.83% on SUDMEX CONN. This observation indicates the effectiveness of the multi-view brain network embedding learning scheme. With the attention-based adaptive view fusion, GCN-mv-vf outperforms GCN-mv by 2.32%/2.42% and 3.80%/1.94% in ACC/AUC on ABIDE and SUDMEX CONN, respectively. The improvement comes from the attention-based adaptive view fusion module, which can effectively capture the inherent correlations between different views. Furthermore, by integrating all modules (i.e., multi-view brain network, graph structure learning and attention-based adaptive view fusion) into GCN, the full model yields better results on most metrics for both tasks. In terms of resource consumption, both the number of params and the FLOPs of the model grow considerably as modules are integrated. Compared with the basic GCN, the number of params and the FLOPs of the final integrated GSL-GCN-mv-vf model increase by nearly 7 times.
Nonetheless, the final number of params and FLOPs are still lightweight and within acceptable limits for current deep learning methods. In addition, to further demonstrate the effectiveness of the final integrated GSL-GCN-mv-vf model, we used t-distributed Stochastic Neighbor Embedding (t-SNE) to visualize the features learned in the final hidden layer, as shown in *Figure 6*. In the raw data, samples of different classes are mixed. After feature learning, samples of different classes form clear clusters, indicating that our method effectively learns the inherent patterns of the data.

**Table 2**

Dataset | Method | GCN | GSL | Mv. | Vf. | ACC (%) | SEN (%) | SPE (%) | AUC (%) | Params | FLOPs |
---|---|---|---|---|---|---|---|---|---|---|---|
ABIDE | GCN | √ | – | – | – | 66.24 | 33.21 | 90.67 | 58.08 | 12.99K | 9.09K |
ABIDE | GSL-GCN | √ | √ | – | – | 68.63 | 55.71 | 78.00 | 64.23 | 36.18K | 41.29K |
ABIDE | GSL-GCN-mv | √ | √ | √ | – | 72.75 | 61.61 | 81.56 | 63.01 | 69.34K | 50.25K |
ABIDE | GCN-mv | √ | – | √ | – | 71.01 | 44.64 | 92.00 | 69.36 | 38.72K | 27.01K |
ABIDE | GCN-mv-vf | √ | – | √ | √ | 73.33 | 64.82 | 80.44 | 71.78 | 64.65K | 45.35K |
ABIDE | GSL-GCN-mv-vf | √ | √ | √ | √ | 74.48 | 57.68 | 88.00 | 72.34 | 88.54K | 58.19K |
SUDMEX CONN | GCN | √ | – | – | – | 66.43 | 91.43 | 37.14 | 59.71 | 12.99K | 9.09K |
SUDMEX CONN | GSL-GCN | √ | √ | – | – | 70.22 | 66.25 | 75.00 | 65.92 | 36.18K | 41.29K |
SUDMEX CONN | GSL-GCN-mv | √ | √ | √ | – | 73.90 | 85.71 | 60.00 | 66.05 | 69.34K | 50.25K |
SUDMEX CONN | GCN-mv | √ | – | √ | – | 70.93 | 90.18 | 49.05 | 67.54 | 38.72K | 27.01K |
SUDMEX CONN | GCN-mv-vf | √ | – | √ | √ | 74.73 | 84.64 | 62.86 | 69.48 | 64.65K | 45.35K |
SUDMEX CONN | GSL-GCN-mv-vf | √ | √ | √ | √ | 76.15 | 86.07 | 64.76 | 68.60 | 88.54K | 58.19K |

GCN, normal graph convolutional network; GSL, graph structure learning; Mv., the use of multi-view graphs; Vf., the use of attention-based adaptive view fusion; ACC, accuracy; SEN, sensitivity; SPE, specificity; AUC, area under the receiver operating characteristic curve; Params, parameters; FLOPs, floating-point operations; ABIDE, Autism Brain Imaging Data Exchange; GSL-GCN, graph structure learning-based graph convolutional network; GSL-GCN-mv, graph structure learning-based multi-view graph convolutional network; GCN-mv, multi-view graph convolutional network; GCN-mv-vf, multi-view graph convolutional network with attention-based adaptive view fusion; GSL-GCN-mv-vf, graph structure learning-based multi-view graph convolutional network with attention-based adaptive view fusion; SUDMEX CONN, Mexican Cocaine Use Disorders.
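The gains quoted in the ablation text are differences in percentage points between rows of Table 2, and the resource comparison is a ratio of the last row to the first. The short sketch below reproduces this arithmetic from values copied out of Table 2; the dictionary layout and the helper name `gain` are illustrative choices, not part of the original work.

```python
# (ACC %, AUC %) per variant, copied from Table 2.
TABLE2 = {
    "ABIDE": {
        "GCN": (66.24, 58.08),
        "GSL-GCN": (68.63, 64.23),
        "GCN-mv": (71.01, 69.36),
        "GCN-mv-vf": (73.33, 71.78),
    },
    "SUDMEX CONN": {
        "GCN": (66.43, 59.71),
        "GSL-GCN": (70.22, 65.92),
        "GCN-mv": (70.93, 67.54),
        "GCN-mv-vf": (74.73, 69.48),
    },
}

# (params in K, FLOPs in K), identical for both datasets in Table 2.
COST = {"GCN": (12.99, 9.09), "GSL-GCN-mv-vf": (88.54, 58.19)}


def gain(dataset: str, variant: str, baseline: str):
    """ACC and AUC gain of `variant` over `baseline`, in percentage points."""
    acc_v, auc_v = TABLE2[dataset][variant]
    acc_b, auc_b = TABLE2[dataset][baseline]
    return round(acc_v - acc_b, 2), round(auc_v - auc_b, 2)


for dataset in TABLE2:
    print(dataset)
    print("  GSL-GCN   vs GCN   :", gain(dataset, "GSL-GCN", "GCN"))
    print("  GCN-mv    vs GCN   :", gain(dataset, "GCN-mv", "GCN"))
    print("  GCN-mv-vf vs GCN-mv:", gain(dataset, "GCN-mv-vf", "GCN-mv"))

params_ratio = COST["GSL-GCN-mv-vf"][0] / COST["GCN"][0]
flops_ratio = COST["GSL-GCN-mv-vf"][1] / COST["GCN"][1]
print(f"params x{params_ratio:.2f}, FLOPs x{flops_ratio:.2f}")
```

The ratios come out at roughly 6.8 for params and 6.4 for FLOPs, which is what the text summarizes as "nearly 7 times".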

**Figure 6** t-SNE visualization of features learned in the last hidden layer before and after training. (A) t-SNE visualization on the ABIDE dataset. (B) t-SNE visualization on the SUDMEX CONN dataset. t-SNE, t-distributed Stochastic Neighbor Embedding; ABIDE, Autism Brain Imaging Data Exchange; SUDMEX CONN, Mexican Cocaine Use Disorders.

## Discussion

### Summary

This paper proposes a brain structure learning-guided multi-view graph representation learning method that aims to make brain network analysis more flexible and to improve the accuracy of mental disorder diagnosis. The main reason why this method outperforms existing methods is that it comprehensively considers multi-view information and captures the complexity and variability of brain networks through graph structure learning and attention mechanisms. Specifically, the multi-view graph representation learning method improves the ability to model the diversity of brain structures by integrating brain network information at different sparsity levels. Graph structure learning further optimizes the representation of brain networks, using graph pooling to remove noise edges and data inconsistencies, thereby providing more reliable input to the subsequent GCNs. As a result, the method reflects the actual structure of brain networks more accurately and improves diagnostic accuracy. Compared with the traditional AAL atlas, the Smith atlas, which is derived from rs-fMRI data, better characterizes the resting-state brain network and provides a more reliable input for graph representation learning. Experimental results show that the model using the Smith atlas performs better than the model using the AAL atlas on the ABIDE and SUDMEX CONN datasets, with clear improvements in ACC and AUC. This suggests that the Smith atlas extracted by ICA captures the resting-state characteristics of the brain network more effectively and provides more representative data for mental disorder diagnosis. In addition, a reasonable choice of the number of supernodes also optimizes the model performance to a certain extent and avoids introducing noise nodes and edges.
Experiments show that a moderate number of supernodes better captures key graph structural features and thereby improves the AUC of the model, while too many or too few supernodes may degrade performance. By choosing the number of supernodes appropriately, the method can reduce the computational complexity of the model while maintaining a high classification ACC, which further improves the practicality and robustness of the model.
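The idea of building several views of one brain network at different sparsity levels can be illustrated with proportional thresholding, where each view keeps only a given fraction of the strongest connections. The sketch below is a hedged example under that assumption; the function names, the toy connectivity matrix and the fractions are hypothetical, and the construction used in the study may differ.

```python
def sparsify(conn, keep_fraction):
    """Keep the strongest `keep_fraction` of edges (by absolute weight), zero the rest."""
    n = len(conn)
    # Rank upper-triangle edges by absolute connection strength, strongest first.
    edges = sorted(
        ((abs(conn[i][j]), i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    k = max(1, int(keep_fraction * len(edges)))
    adj = [[0.0] * n for _ in range(n)]
    for _, i, j in edges[:k]:
        adj[i][j] = adj[j][i] = conn[i][j]  # keep the graph symmetric
    return adj


def build_views(conn, fractions):
    """One adjacency matrix per sparsity level."""
    return [sparsify(conn, f) for f in fractions]


def edge_count(adj):
    n = len(adj)
    return sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j] != 0.0)


# Toy symmetric functional-connectivity matrix for four regions (made-up values).
conn = [
    [1.0, 0.8, -0.1, 0.3],
    [0.8, 1.0, 0.5, -0.6],
    [-0.1, 0.5, 1.0, 0.2],
    [0.3, -0.6, 0.2, 1.0],
]

views = build_views(conn, fractions=(0.2, 0.5, 1.0))
print([edge_count(v) for v in views])
```

Sparser views retain only the most reliable connections, while denser views keep weaker ones that may carry either signal or noise, which is why combining views can be more informative than committing to a single threshold.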

### Future work

Although the proposed method achieves promising results, several directions remain worth exploring. First, although this study successfully used the Smith atlas to extract information from rs-fMRI, other atlases are available (54-56). Comparative studies of these atlases can provide a comprehensive understanding of their impact on model performance, thereby providing more options and implications for brain network analysis. Second, this study proposes a supervised learning scheme to adaptively construct coarsened graph networks. However, how to use more complex graph structure learning models, or combine other advanced deep learning techniques, to improve the modeling capability and robustness of coarsened graph networks needs further research. With the continuous development of deep learning, more model structures (57,58) can be introduced into the construction of coarsened graph networks to obtain better modeling results. Third, this study proposes an adaptive multi-view fusion method based on an attention mechanism, which dynamically adjusts the contributions of different views so that the model uses each view more efficiently. However, the design and application of attention mechanisms still have limitations. Future research can explore how to better capture the correlations between different views and how to balance them effectively, thereby further improving the performance and applicability of multi-view fusion methods.
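To make the notion of dynamically adjusting the contribution of each view concrete, the sketch below shows one common form of attention-based fusion: per-view scores are normalized with a softmax and used as weights in a sum of view embeddings. This is a generic illustration, not the fusion module of the paper; in a trained model the scores would be produced by learned parameters, whereas here they are hypothetical inputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_embeddings, view_scores):
    """Weighted sum of per-view embeddings with softmax attention weights."""
    weights = softmax(view_scores)
    dim = len(view_embeddings[0])
    fused = [
        sum(w * emb[d] for w, emb in zip(weights, view_embeddings))
        for d in range(dim)
    ]
    return fused, weights


# Three hypothetical view embeddings of dimension 2.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Equal scores give equal weights, so fusion reduces to a plain average.
fused_equal, w_equal = fuse_views(embeddings, [0.0, 0.0, 0.0])

# A higher score for the first view shifts the fused embedding towards it.
fused_biased, w_biased = fuse_views(embeddings, [2.0, 0.0, 0.0])
print(w_equal, fused_equal)
print(w_biased, fused_biased)
```

Balancing views, as discussed above, amounts to controlling how peaked these weights become: very peaked weights effectively discard some views, while flat weights ignore differences in view quality.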

## Conclusions

Although GCN has made significant progress in brain network analysis, providing opportunities for in-depth exploration and understanding of mental disorders, the way brain networks are constructed and the presence of noise edges and data inconsistencies pose challenges for GCN graph embedding learning. This paper proposes a brain structure learning-guided multi-view graph representation learning method for brain network analysis. The core idea is to integrate graph structure learning and multi-view graph embedding learning to improve the classification performance on brain network data. Multi-view graph embedding learning acquires information from different views and integrates it into a unified representation, thereby deepening our understanding of brain networks. Graph structure learning captures important structural information in the brain network and provides a more accurate feature representation for classification tasks. Extensive experiments on the public ABIDE and SUDMEX CONN datasets show that our method outperforms traditional methods and state-of-the-art techniques in brain network classification tasks. This indicates that the proposed method has potential applications in brain network analysis. In the future, we will continue to improve the method and explore its application in other neuroscience tasks.

## Acknowledgments

During the preparation of this work, the authors used ChatGPT in order to improve language and readability. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

*Funding:* The work was supported by

## Footnote

*Conflicts of Interest:* All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-578/coif). The authors have no conflicts of interest to declare.

*Ethical Statement:* The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

*Open Access Statement:* This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

## References

- Chavez-Baldini U, Nieman DH, Keestra A, Lok A, Mocking RJT, de Koning P, Krzhizhanovskaya VV, Bockting CLH, van Rooijen G, Smit DJA, Sutterland AL, Verweij KJH, van Wingen G, Wigman JTW, Vulink NC, Denys D. The relationship between cognitive functioning and psychopathology in patients with psychiatric disorders: a transdiagnostic network analysis. Psychol Med 2023;53:476-85. [PubMed]
- Halvorsen MB, Kildahl AN, Kaiser S, Axelsdottir B, Aman MG, Helverschou SB. Applicability and Psychometric Properties of General Mental Health Assessment Tools in Autistic People: A Systematic Review. J Autism Dev Disord 2024; Epub ahead of print. [Crossref] [PubMed]
- Bufano P, Laurino M, Said S, Tognetti A, Menicucci D. Digital Phenotyping for Monitoring Mental Disorders: Systematic Review. J Med Internet Res 2023;25:e46778. [Crossref] [PubMed]
- Fritze S, Brandt GA, Volkmer S, Daub J, Krayem M, Kukovic J, Schwarz E, Braun U, Northoff G, Wolf RC, Kubera KM, Meyer-Lindenberg A, Hirjak D. Deciphering the interplay between psychopathological symptoms, sensorimotor, cognitive and global functioning: a transdiagnostic network analysis. Eur Arch Psychiatry Clin Neurosci 2024; Epub ahead of print. [Crossref] [PubMed]
- Cattarinussi G, Bellani M, Maggioni E, Sambataro F, Brambilla P, Delvecchio G. Resting-state functional connectivity and spontaneous brain activity in early-onset bipolar disorder: A review of functional Magnetic Resonance Imaging studies. J Affect Disord 2022;311:463-71. [Crossref] [PubMed]
- Kajimura S, Margulies D, Smallwood J. Frequency-specific brain network architecture in resting-state fMRI. Sci Rep 2023;13:2964. [Crossref] [PubMed]
- Zhao L, Bo Q, Zhang Z, Li F, Zhou Y, Wang C. Disrupted default mode network connectivity in bipolar disorder: a resting-state fMRI study. BMC Psychiatry 2024;24:428. [Crossref] [PubMed]
- Ceccarelli F, Mahmoud M. Multimodal temporal machine learning for Bipolar Disorder and Depression Recognition. Pattern Anal Applic 2022;25:493-504. [Crossref]
- Hajian G, Morin E, Etemad A. Multimodal estimation of endpoint force during quasi-dynamic and dynamic muscle contractions using deep learning. IEEE Transactions on Instrumentation and Measurement 2022;71:1-11. [Crossref]
- Stahlschmidt SR, Ulfenborg B, Synnergren J. Multimodal deep learning for biomedical data fusion: a review. Brief Bioinform 2022;23:bbab569. [Crossref] [PubMed]
- Shahparian N, Yazdi M, Khosravi MR. Alzheimer disease diagnosis from fMRI images based on latent low rank features and support vector machine (SVM). Current Signal Transduction Therapy 2021;16:171-7. [Crossref]
- Lama RK, Kwon GR. Diagnosis of Alzheimer's Disease Using Brain Network. Front Neurosci 2021;15:605115. [Crossref] [PubMed]
- Dong S, Wang P, Abbas K. A survey on deep learning and its applications. Computer Science Review 2021;40:100379. [Crossref]
- Hosseini MP, Lu S, Kamaraj K, Slowikowski A, Venkatesh HC. Deep Learning Architectures. In: Pedrycz W, Chen SM, editors. Deep Learning: Concepts and Architectures. Studies in Computational Intelligence, Springer, 2020;866:1-24.
- Huang J, Wang M, Ju H, Shi Z, Ding W, Zhang D. SD-CNN: A static-dynamic convolutional neural network for functional brain networks. Med Image Anal 2023;83:102679. [Crossref] [PubMed]
- Haweel R, Shalaby A, Mahmoud A, Seada N, Ghoniemy S, Ghazal M, Casanova MF, Barnes GN, El-Baz A. A robust DWT-CNN-based CAD system for early diagnosis of autism using task-based fMRI. Med Phys 2021;48:2315-26. [Crossref] [PubMed]
- Zhang Q, Chang J, Meng G, Xu S, Xiang S, Pan C. Learning graph structure via graph convolutional networks. Pattern Recognition 2019;95:308-18. [Crossref]
- Yin G, Chang Y, Zhao Y, Liu C, Yin M, Fu Y, Shi D, Wang L, Jin L, Huang J, Li D, Niu Y, Wang B, Tan S. Automatic recognition of schizophrenia from brain-network features using graph convolutional neural network. Asian J Psychiatr 2023;87:103687. [Crossref] [PubMed]
- Liu M, Zhang H, Shi F, Shen D. Hierarchical Graph Convolutional Network Built by Multiscale Atlases for Brain Disorder Diagnosis Using Functional Connectivity. IEEE Trans Neural Netw Learn Syst 2023; Epub ahead of print. [Crossref] [PubMed]
- Nicolini C, Bifone A. Modular structure of brain functional networks: breaking the resolution limit by Surprise. Sci Rep 2016;6:19250. [Crossref] [PubMed]
- Smith SM, Fox PT, Miller KL, Glahn DC, Fox PM, Mackay CE, Filippini N, Watkins KE, Toro R, Laird AR, Beckmann CF. Correspondence of the brain's functional architecture during activation and rest. Proc Natl Acad Sci U S A 2009;106:13040-5. [Crossref] [PubMed]
- Nielsen JA, Zielinski BA, Fletcher PT, Alexander AL, Lange N, Bigler ED, Lainhart JE, Anderson JS. Multisite functional connectivity MRI classification of autism: ABIDE results. Front Hum Neurosci 2013;7:599. [Crossref] [PubMed]
- Angeles-Valdez D, Rasgado-Toledo J, Issa-Garcia V, Balducci T, Villicaña V, Valencia A, Gonzalez-Olvera JJ, Reyes-Zamorano E, Garza-Villarreal EA. The Mexican magnetic resonance imaging dataset of patients with cocaine use disorder: SUDMEX CONN. Sci Data 2022;9:133. [Crossref] [PubMed]
- Kipf TN, Welling M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv: 1609.02907, 2016.
- Zhao W, Xu C, Cui Z, Zhang T, Jiang J, Zhang Z, Yang J. When Work Matters: Transforming Classical Network Structures to Graph CNN. arXiv: 1807.02653, 2018.
- Yu J, Yin H, Li J, Gao M, Huang Z, Cui L. Enhancing social recommendation with adversarial graph convolutional networks. IEEE Transactions on Knowledge and Data Engineering 2020;34:3727-39. [Crossref]
- Hamilton WL, Ying R, Leskovec J. Inductive representation learning on large graphs. Proceedings of the 31st International Conference on Neural Information Processing Systems; Long Beach, California, USA: Curran Associates Inc., 2017:1025-35.
- Velickovic P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. arXiv: 1710.10903, 2017.
- Pan J, Lin H, Dong Y, Wang Y, Ji Y. MAMF-GCN: Multi-scale adaptive multi-channel fusion deep graph convolutional network for predicting mental disorder. Comput Biol Med 2022;148:105823. [Crossref] [PubMed]
- Racherla AS, Sahu R, Bhattacharjee V. Chapter Three - A graph convolutional network based framework for mental stress prediction. In: Jain S, Pandey K, Jain P, Seng P, editors. Artificial Intelligence, Machine Learning, and Mental Health in Pandemics. Academic Press, 2022:73-92.
- Liu L, Wang YP, Wang Y, Zhang P, Xiong S. An enhanced multi-modal brain graph network for classifying neuropsychiatric disorders. Med Image Anal 2022;81:102550. [Crossref] [PubMed]
- Luo G, Li C, Cui H, Sun L, He L, Yang C. Multi-View Brain Network Analysis with Cross-View Missing Network Generation. 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 2022:108-15.
- Zhang Y, Zhang H, Adeli E, Chen X, Liu M, Shen D. Multiview Feature Learning With Multiatlas-Based Functional Connectivity Networks for MCI Diagnosis. IEEE Trans Cybern 2022;52:6822-33. [Crossref] [PubMed]
- Zhou H, Zhang Y, Chen BY, Shen L, He L. Sparse Interpretation of Graph Convolutional Networks for Multi-Modal Diagnosis of Alzheimer's Disease. Med Image Comput Comput Assist Interv 2022;13438:469-78.
- Zhang M, Long D, Chen Z, Fang C, Li Y, Huang P, Chen F, Sun H. Multi-view graph network learning framework for identification of major depressive disorder. Comput Biol Med 2023;166:107478. [Crossref] [PubMed]
- Guo MH, Xu TX, Liu JJ, Liu ZN, Jiang PT, Mu TJ, Zhang SH, Martin RR, Cheng MM, Hu SM. Attention mechanisms in computer vision: A survey. Comp. Visual Media 2022;8:331-68. [Crossref]
- Graves A, Wayne G, Danihelka I. Neural Turing Machines. arXiv: 1410.5401, 2014.
- Bahdanau D, Cho K, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv: 1409.0473, 2014.
- Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. Part of Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017.
- Noman F, Ting CM, Kang H, Phan RC, Ombao H. Graph Autoencoders for Embedding Learning in Brain Networks and Major Depressive Disorder Identification. IEEE J Biomed Health Inform 2024;28:1644-55. [Crossref] [PubMed]
- Sporns O. Contributions and challenges for network models in cognitive neuroscience. Nat Neurosci 2014;17:652-60. [Crossref] [PubMed]
- Ying Z, You J, Morris C, Ren X, Hamilton W, Leskovec J. Hierarchical graph representation learning with differentiable pooling. Part of Advances in Neural Information Processing Systems 31 (NeurIPS 2018), 2018.
- Chu Y, Ren H, Qiao L, Liu M. Resting-State Functional MRI Adaptation with Attention Graph Convolution Network for Brain Disorder Identification. Brain Sci 2022;12:1413. [Crossref] [PubMed]
- Wang M, Huang J, Liu M, Zhang D. Modeling dynamic characteristics of brain functional connectivity networks using resting-state functional MRI. Med Image Anal 2021;71:102063. [Crossref] [PubMed]
- Craddock C, Benhajali Y, Chu C, Chouinard F, Evans A, Jakab A, Khundrakpam BS, Lewis JD, Li Q, Milham M. The neuro bureau preprocessing initiative: open sharing of preprocessed neuroimaging data and derivatives. Front Neuroinform 2013; [Crossref]
- Esteban O, Markiewicz CJ, Blair RW, Moodie CA, Isik AI, Erramuzpe A, Kent JD, Goncalves M, DuPre E, Snyder M, Oya H, Ghosh SS, Wright J, Durnez J, Poldrack RA, Gorgolewski KJ. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat Methods 2019;16:111-6. [Crossref] [PubMed]
- Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, Mazoyer B, Joliot M. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 2002;15:273-89. [Crossref] [PubMed]
- Rodrigues ID, de Carvalho EA, Santana CP, Bastos GS. Machine Learning and rs-fMRI to Identify Potential Brain Regions Associated with Autism Severity. Algorithms 2022;15:195. [Crossref]
- Guo X, Tinaz S, Dvornek NC. Characterization of Early Stage Parkinson’s Disease From Resting-State fMRI Data Using a Long Short-Term Memory Network. Front. Neuroimaging 2022;1:952084. [Crossref] [PubMed]
- Han S, Sun Z, Zhao K, Duan F, Caiafa CF, Zhang Y, Solé-Casals J. Early prediction of dementia using fMRI data with a graph convolutional network approach. J Neural Eng 2024; [Crossref] [PubMed]
- Li A. BrainMixup: Data Augmentation for GNN-based Functional Brain Network Analysis. 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 2022:4988-992.
- Azevedo T, Passamonti L, Lio P, Toschi N. A deep spatiotemporal graph learning architecture for brain connectivity analysis. Annu Int Conf IEEE Eng Med Biol Soc 2020;2020:1120-3. [Crossref] [PubMed]
- Venkatapathy S, Votinov M, Wagels L, Kim S, Lee M, Habel U, Ra IH, Jo HG. Ensemble graph neural network model for classification of major depressive disorder using whole-brain functional connectivity. Front Psychiatry 2023;14:1125339. [Crossref] [PubMed]
- Schaefer A, Kong R, Gordon EM, Laumann TO, Zuo XN, Holmes AJ, Eickhoff SB, Yeo BTT. Local-Global Parcellation of the Human Cerebral Cortex from Intrinsic Functional Connectivity MRI. Cereb Cortex 2018;28:3095-114. [Crossref] [PubMed]
- Pauli WM, Nili AN, Tyszka JM. A high-resolution probabilistic in vivo atlas of human subcortical brain nuclei. Sci Data 2018;5:180063. [Crossref] [PubMed]
- Seitzman BA, Gratton C, Marek S, Raut RV, Dosenbach NUF, Schlaggar BL, Petersen SE, Greene DJ. A set of functionally-defined brain regions with improved representation of the subcortex and cerebellum. Neuroimage 2020;206:116290. [Crossref] [PubMed]
- Fei Z, Guo J, Gong H, Ye L, Attahi E, Huang B. A GNN Architecture With Local and Global-Attention Feature for Image Classification. IEEE Access 2023;11:110221-33.
- Ko SM, Cho S, Jeong DW, Han S, Lee M, Lee H. Grouping matrix based graph pooling with adaptive number of clusters. Proceedings of the AAAI Conference on Artificial Intelligence 2023;37:8334-42. [Crossref]

**Cite this article as:** Wang T, Ding Z, Yang X, Chen Y, Lu C, Sun Y. A brain structure learning-guided multi-view graph representation learning for brain network analysis. Quant Imaging Med Surg 2024;14(9):6294-6310. doi: 10.21037/qims-24-578