Pulmonary vessel segmentation in computed tomography images: a cascaded approach combining U-Net and parameter-adaptive fully connected conditional random fields
Original Article


Zhaofeng Xue1#, Ying Sun2#, Guiyuan Tong1, Zhaojie Wang1, Xinzhuo Zhao1

1Department of Electrical Engineering, Shenyang University of Technology, Shenyang, China; 2Department of Radiation Medicine, General Hospital of Northern Theater Command, Shenyang, China

Contributions: (I) Conception and design: X Zhao, Z Xue; (II) Administrative support: X Zhao; (III) Provision of study materials or patients: Y Sun; (IV) Collection and assembly of data: Y Sun, X Zhao; (V) Data analysis and interpretation: Z Xue, G Tong, Z Wang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

Correspondence to: Xinzhuo Zhao, PhD. Department of Electrical Engineering, Shenyang University of Technology, No. 111 Shenliao West Road, Shenyang Economic and Technological Development Zone, Shenyang 110870, China. Email: zhaoxinzhuo@sut.edu.cn.

Background: A precise pulmonary vessel segmentation algorithm serves as a powerful auxiliary tool for physicians, enabling them to diagnose various pulmonary diseases with greater accuracy and efficiency. This technology also customizes rational treatment plans tailored to individual patients, alleviating their burden and effectively reducing unnecessary medical resource waste. This study proposes a cascaded algorithm to improve the accuracy of pulmonary vessel segmentation in computed tomography (CT) images.

Methods: This study presents a cascaded model integrating convolutional networks for biomedical image segmentation (U-Net) and parameter-adaptive fully connected conditional random fields (PA-FCCRFs) to efficiently extract pulmonary vessels in CT images. In the initial phase, U-Net is employed to preliminarily segment pulmonary vessels in the lung region. However, convolutional neural networks (CNNs) with local receptive fields struggle to model long-distance pixel dependencies effectively, often leading to mis-segmentation of lung tissues. To address this issue, we incorporate fully connected conditional random fields (FCCRFs) into the framework for refined segmentation. With their fully connected structure, FCCRFs can model dependencies between each pixel and all other pixels. Moreover, Bayesian optimization is employed to automatically tune the internal parameters for optimal performance.

Results: Our method demonstrates significant improvements in pulmonary vessel segmentation outcomes, with the Precision increasing from 73.14±10.67 to 90.24±4.63, F1 improving from 82.67±6.86 to 91.85±3.41, and Hausdorff distance decreasing from 35.12±6.04 to 30.86±2.71. To validate the cascaded PA-FCCRFs strategy, we preliminarily segment pulmonary vessels using AH-Net and V-Net, followed by optimization using PA-FCCRFs. Experimental results showcase substantial enhancements in the accuracy of CNN-based vascular segmentation after PA-FCCRFs optimization.

Conclusions: These findings validate that the cascaded PA-FCCRFs approach effectively segments pulmonary vessels, supporting the diagnosis of pulmonary diseases and promising applications in clinical settings.

Keywords: Pulmonary vessel segmentation; convolutional networks for biomedical image segmentation (U-Net); fully connected conditional random fields (FCCRFs); Bayesian optimization


Submitted Sep 22, 2024. Accepted for publication Mar 24, 2025. Published online Jun 03, 2025.

doi: 10.21037/qims-24-2008


Introduction

Medical image segmentation is a crucial component of medical diagnostic assistance technology. Accurate segmentation of pulmonary vessels holds substantial value in diagnosing pulmonary diseases. For instance, the vascular generation status in pulmonary tumors—during their early, progressive, and metastatic stages—can serve as a critical clinical imaging diagnostic indicator, aiding physicians in assessing the malignancy of tumors. The separation of pulmonary vessels from other regions of interest allows for the precise measurement of the heterogeneity of interstitial lung diseases, as well as pulmonary perfusion (1). Accurate segmentation results of the pulmonary vasculature can assist physicians in swiftly identifying the location and extent of thrombosis, thereby enabling the formulation of more effective treatment plans for patients with pulmonary embolism. By quantitatively measuring and analyzing the morphology and distribution of the pulmonary vessels based on the segmentation results, clinical practitioners can more precisely assess the severity of pulmonary hypertension.

Convolutional networks for biomedical image segmentation (U-Net) (2) has gained widespread application in various medical image segmentation tasks. Simultaneously, various modified convolutional neural networks (CNNs) tailored for specific segmentation tasks have been derived by adjusting the structure of U-Net. These variants have demonstrated superior performance across a spectrum of specific medical image segmentation tasks. For example, Fu et al. (3) extended the U-Net architecture to create the M-Net framework by incorporating multi-scale input layers and lateral output layers, facilitating the segmentation of optic discs and cups in fundus images. Chen et al. (4) developed the Bridged U-Net for prostate segmentation by bridging two U-Net models. Cui et al. (5) introduced U-Net++, employing orthogonal fusion strategies for two-and-a-half-dimensional networks and multi-plane networks to segment lung vessels. However, because the convolutional layers of CNNs use fixed-size kernels, they struggle to integrate features from long-distance pixels when extracting image features, and precisely modeling the relationships between long-distance pixels remains challenging (6,7).

In response to this issue, some researchers have proposed transformer architectures to help models better capture the dependencies between long-distance pixels in medical images. For instance, Chen et al. replaced the encoder of U-Net with a transformer model based on self-attention mechanisms, developing the TransUNet model to achieve more precise segmentation. Naqvi et al. (8) introduced a novel transformer-based denoising model for medical imaging. This model incorporates a new type of deep and wide residual block, used to learn the underlying noise patterns in medical images. Unlike traditional residual blocks, this residual block employs dilated convolution operations to capture the correlations between long-distance pixels within the image. Additionally, the study applied a multi-head self-attention mechanism to guide image reconstruction, enabling the model to attend to multiple parts of the input sequence simultaneously. This approach more effectively exploits the long-distance pixel dependencies captured by the dilated convolutions, helping the model identify and remove noise while preserving crucial details and structural information within the image.

Additionally, fully connected conditional random fields (FCCRFs) are a highly effective method for addressing the limited receptive field of convolution operations and have been proposed to refine medical image segmentation results. For example, Fu et al. (3) achieved precise segmentation of retinal vessels by cascading a fully convolutional neural network (FCN) with FCCRFs. Li et al. (9) achieved lung field segmentation by integrating U-Net with FCCRFs, obtaining high Dice coefficients and Jaccard indices on the Japanese Society of Radiological Technology (JSRT) dataset. FCCRFs have also been widely applied in other medical image segmentation tasks: Kadoury et al. (10) utilized FCCRFs to segment brain gliomas in magnetic resonance imaging (MRI) images, and Orlando et al. (11) used FCCRFs for retinal vessel segmentation. FCCRFs are a graph-structured model that maps the input image onto a graph in which each pixel becomes a node, and each node is connected to every other node by an edge representing the correlation between the two. This fully connected structure effectively mitigates the locality inherent in the convolutional operations of CNNs.

However, the internal structure of FCCRFs presents numerous hyperparameters, and the configuration of these hyperparameters plays a crucial role in determining the final optimization outcome. Therefore, the challenge lies in how to appropriately set the parameters within FCCRFs to harness its optimal optimization performance. In this study, the parameter-adaptive fully connected conditional random fields (PA-FCCRFs) were proposed through the integration of the Bayesian optimization algorithm. This approach enables the automatic tuning of the internal parameters of FCCRFs using Bayesian optimization techniques.

In this study, we initially employed the U-Net for the preliminary segmentation of vessels in pulmonary computed tomography (CT) images. The localized nature of convolutional operations within U-Net erroneously classified some tissues as blood vessels, resulting in coarse segmentation outcomes. Therefore, we propose PA-FCCRFs model to optimize the predictions generated by U-Net. This model incorporates Bayesian optimization into FCCRFs, enabling automatic adjustments of the internal parameters to achieve the optimal optimization performance for the FCCRFs.

To validate the superior optimization effect of the PA-FCCRFs on U-Net’s vascular segmentation results, we conducted a comprehensive ablation experiment on a small-scale dataset comprising 10 groups of pulmonary CT scans. The ablation results revealed a marked improvement in the accuracy of U-Net’s vascular segmentation following optimization with the PA-FCCRFs. Specifically, the Precision increased from 73.14±10.67 to 90.24±4.63, the F1 rose from 82.67±6.86 to 91.85±3.41, and the Hausdorff distance decreased from 35.12±6.04 to 30.86±2.71. Additionally, AH-Net and V-Net were trained for blood vessel segmentation in lung CT images, and PA-FCCRFs were applied to optimize their segmentation results to verify that PA-FCCRFs deliver stable optimization performance. The results indicate that the cascaded PA-FCCRFs model achieves more precise vascular segmentation than CNNs alone, supporting the diagnosis of pulmonary diseases and promising applications in clinical settings.

The contributions of this paper can be summarized as follows:

  • We proposed a cascaded model integrating U-Net and PA-FCCRFs to efficiently extract pulmonary vessels in CT images, inspired by previous research (11-13).
  • We incorporated Bayesian optimization techniques into FCCRFs to automatically tune internal parameters for optimal performance.
  • We trained U-Net, V-Net and AH-Net to validate the optimization effects of PA-FCCRFs on CNN’s predictions.
  • We present this article in accordance with the TRIPOD + AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2008/rc).

Methods

Dataset

The dataset utilized in this experiment was introduced at the 2020 International Symposium on Image Computing and Digital Medicine (ISICDM), an annual conference organized by the International Society of Digital Medicine that features challenges addressing significant theoretical, algorithmic, and application-oriented problems in these fields. The dataset was developed for the ISICDM challenge and labeled by the organizers; student annotators were trained by a medical imaging expert to perform image annotation, ensuring accuracy based on professional anatomical knowledge. The dataset comprises 10 groups of CT scans, with each group consisting of 400 to 500 Digital Imaging and Communications in Medicine (DICOM) files of original lung CT scan slices. These slices have a section thickness of less than 2 mm and a resolution of 512×512, along with a JPG file for image annotation. We applied morphological operations to the 10 cases as preprocessing, extracting the pulmonary region from the DICOM files for model training. Subsequently, U-Net was trained using a leave-one-out cross-validation approach.
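For illustration, the leave-one-out split over the 10 scan groups can be sketched as follows; the group identifiers are hypothetical placeholders.

```python
# A minimal sketch of leave-one-out cross-validation over the 10 CT scan
# groups. The group names are hypothetical placeholders for the real cases.
def leave_one_out_splits(groups):
    """Yield (train_groups, test_group) pairs, holding out one group per fold."""
    for i, test_group in enumerate(groups):
        train_groups = groups[:i] + groups[i + 1:]
        yield train_groups, test_group

folds = list(leave_one_out_splits([f"case_{k:02d}" for k in range(1, 11)]))
# 10 folds; each fold trains on 9 groups and tests on the held-out one.
```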

The principles of U-Net and PA-FCCRFs cascade algorithm

The core of U-Net lies in its distinctive encoder-decoder architecture, featuring skip connections between the encoder and decoder for the transmission of feature information. The operational principles of U-Net for image feature extraction and its structural details are extensively discussed in the original article (2).

In this study, FCCRFs are represented by the conditional probability distribution P(y|I), and the specific expression of FCCRFs is as follows:

$P(y|I)=\frac{1}{Z(I)}e^{-E(y|I)}$

where E(y|I) represents the energy function, I corresponds to the vascular segmentation results of U-Net, y denotes the optimization result of FCCRFs, and Z(I) signifies the normalization constant. For a given input sequence I, the output sequence y corresponding to the maximum conditional probability P(y|I) represents the optimized results of the FCCRFs. Therefore, the optimization process of FCCRFs is delineated as the minimization of the energy function (11), as expressed in the following specific formula:

$y=\arg\min_{y} E(y|I)$

The specific expression of the energy function E(y|I) is as follows:

$E(y|I)=\sum_{i}P_{\text{unary}}(I_i)+\sum_{(i,j)\in C}P_{\text{pair}}(I_i,I_j,f_i,f_j)$

Within this context, Ii denotes the assigned category of the i-th pixel in the predictive results of U-Net, and fi represents the binary feature extracted from the original predicted image corresponding to that pixel. Punary(Ii) stands for the unary potential function, signifying the probability of the i-th pixel in the predictive results of U-Net being associated with its category Ii. Ppair(Ii,Ij,fi,fj) functions as the pairwise potential function, representing the similarity calculated between the i-th and j-th pixels based on their binary features. In this experiment, we employed the fast inference algorithm proposed by Krähenbühl et al. (14) for calculating the pairwise potential function values between all pixel pairs. This method treats the pairwise potential function as a linear combination of several Gaussian kernel functions, specified as follows:

$P_{\text{pair}}(I_i,I_j,f_i,f_j)=u(I_i,I_j)\sum_{m=1}^{M}w_{\text{pair}}^{(m)}k^{(m)}(f_i^{(m)},f_j^{(m)})$

In this context, u represents the compatibility function, the specific expression of the compatibility function is as follow:

$u(I_i,I_j)=\begin{cases}1, & I_i\neq I_j\\0, & I_i=I_j\end{cases}$

wpair represents the weight of the Gaussian kernel function k, which is employed to calculate the similarity between two pixels based on their binary features, fi and fj. The specific form of the Gaussian kernel function is as follows:

$k(f_i,f_j)=e^{-\frac{|f_i-f_j|^2}{2\theta^2}}$

Within this context, θ serves as the bandwidth parameter, regulating how differences in binary features between two pixels affect the computation of their similarity. From this formula, it is evident that the larger the binary feature disparity between pixels of different classes, the smaller the corresponding pairwise potential function value between them. Conversely, when the binary features of pixels from different classes are highly similar, there is a substantial probability that one of the pixels has been misclassified, resulting in a larger value for their pairwise potential function.
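As a minimal illustration of the two formulas above, the following sketch implements the Potts compatibility function u and the Gaussian kernel k; the feature vectors and bandwidth are toy values.

```python
import numpy as np

# Illustrative sketch of the Potts compatibility function u and the Gaussian
# kernel k from the formulas above; feature vectors here are toy values.
def compatibility(label_i, label_j):
    """u(I_i, I_j): 1 when the labels differ, 0 when they agree."""
    return 1 if label_i != label_j else 0

def gaussian_kernel(f_i, f_j, theta):
    """k(f_i, f_j) = exp(-|f_i - f_j|^2 / (2 * theta^2))."""
    diff = np.asarray(f_i, dtype=float) - np.asarray(f_j, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * theta ** 2)))

# Identical features give maximal similarity (1.0); a large disparity under
# a small bandwidth theta drives the similarity toward 0.
same = gaussian_kernel([10.0, 20.0], [10.0, 20.0], theta=15.0)
far = gaussian_kernel([10.0, 20.0], [60.0, 90.0], theta=15.0)
```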

The pairwise potential function employed in this experiment is composed of a linear combination of Gaussian bilateral and Gaussian spatial kernel functions. The specific expression of the pairwise potential function in this study is as follows:

$P_{\text{pair}}(I_i,I_j,f_i,f_j)=u(I_i,I_j)\sum_{m=1}^{2}w_{\text{pair}}^{(m)}k^{(m)}(f_i^{(m)},f_j^{(m)})=u(I_i,I_j)\left[w_{\text{pair}}^{(1)}e^{-\frac{|L_i-L_j|^2}{2\theta_\alpha^2}-\frac{|R_i-R_j|^2}{2\theta_\beta^2}}+w_{\text{pair}}^{(2)}e^{-\frac{|L_i-L_j|^2}{2\theta_\gamma^2}}\right]$

Within this context, L corresponds to the positional features of a pixel, and R represents its grayscale features. wpair(1) and wpair(2) are the weight coefficients of the Gaussian bilateral kernel function and the Gaussian spatial kernel function, respectively; they determine how strongly each kernel’s output influences the similarity computed between two pixels. θα and θγ are the positional-feature bandwidth parameters of the Gaussian bilateral kernel and the Gaussian spatial kernel, respectively, regulating how positional disparities between two pixels affect the computed similarity. θβ is the grayscale-feature bandwidth parameter of the Gaussian bilateral kernel, governing how grayscale disparities between two pixels affect the computed similarity. During optimization of the segmentation results obtained from U-Net, FCCRFs adjust each pixel’s category based on the disparities in positional and grayscale features between that pixel and all other pixels, ultimately minimizing the energy function. This effectively addresses the limitations of convolutional operations in CNNs. Overall, the energy function of the FCCRFs is expressed as follows:

$E(y|I)=\sum_{i}P_{\text{unary}}(I_i)+\sum_{(i,j)\in C}u(I_i,I_j)\left[w_{\text{pair}}^{(1)}e^{-\frac{|L_i-L_j|^2}{2\theta_\alpha^2}-\frac{|R_i-R_j|^2}{2\theta_\beta^2}}+w_{\text{pair}}^{(2)}e^{-\frac{|L_i-L_j|^2}{2\theta_\gamma^2}}\right]$
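The pairwise term of this energy function can be sketched as follows. The FCCRFs inference itself is not reproduced here; this is a toy evaluation of the pairwise potential for a single pixel pair, with illustrative default parameter values drawn from within the adjustment ranges used in this study.

```python
import numpy as np

# Sketch of the pairwise potential above: a Gaussian bilateral kernel over
# position L and grayscale R, plus a Gaussian spatial kernel over position
# only. Default parameter values are illustrative, not authoritative.
def pairwise_potential(lab_i, lab_j, L_i, L_j, R_i, R_j,
                       w1=37.0, theta_alpha=11.0, theta_beta=17.0,
                       w2=17.0, theta_gamma=53.0):
    if lab_i == lab_j:            # compatibility u is 0 for matching labels
        return 0.0
    d_pos = float(np.sum((np.asarray(L_i, float) - np.asarray(L_j, float)) ** 2))
    d_gray = (float(R_i) - float(R_j)) ** 2
    bilateral = w1 * np.exp(-d_pos / (2 * theta_alpha ** 2)
                            - d_gray / (2 * theta_beta ** 2))
    spatial = w2 * np.exp(-d_pos / (2 * theta_gamma ** 2))
    return float(bilateral + spatial)

# Nearby pixels with similar grayscale but differing labels incur a large
# penalty, pushing the inference to relabel one of them.
```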

Additionally, to ensure that FCCRFs adjust pixel categories rationally on the basis of positional and grayscale feature disparities, an initial set of adjustment ranges for the internal parameters of FCCRFs was established, with the specific range of each parameter defined as follows: wpair(1)=[1:50], θα=[5:70], θβ=[10:70], wpair(2)=[1:50], θγ=[5:70]. Subsequently, the Bayesian optimization algorithm was introduced into FCCRFs to search for the optimal internal parameters. Specifically, the Dice coefficient of the FCCRFs optimization results under different parameter settings is employed as the objective function for Bayesian optimization, with the five internal parameters of FCCRFs (wpair(1), θα, θβ, wpair(2), and θγ) serving as the parameters to be optimized. Simultaneously, the Tree-structured Parzen Estimator (TPE) algorithm is utilized as the surrogate function to establish a probability distribution model from historical information, and the expected improvement (EI) algorithm is employed as the acquisition function to search for the optimal parameters of FCCRFs. The specific iterative principles of Bayesian optimization, including the establishment of the probability distribution model using the TPE algorithm and the search for optimal parameters based on that model using the EI algorithm, are detailed in (15). The framework of the U-Net and PA-FCCRFs cascade algorithm is illustrated in Algorithm 1.

Algorithm 1 U-Net and PA-FCCRFs cascade algorithm framework

Input: ParameterSpace R, UNet_output I, Original image for test T,
Total number of iterations Num, Surrogate function S,
Acquisition function EI, Historical information H
Output: Optimal optimization results of vascular segmentation y
1 H ← ∅
2 RandomSelect parameters x in R
3 For t ← 1 to Num
4   OptimizedResult ← FCCRFs(x, I, T)
5   O(x) ← EvaluateDice(OptimizedResult)
6   H ← H ∪ {(x, O(x))}
7   Mt ← S(H)
8   x ← argmax_x EI(Mt)
9 Identify the optimal parameters x* from H
10 y ← FCCRFs(x*, I, T)
11 Return y
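Algorithm 1 can be sketched as runnable code under strong simplifications: the FCCRFs inference and Dice evaluation are replaced by a synthetic objective, and uniform random sampling stands in for the TPE surrogate and EI acquisition used in the actual implementation.

```python
import random

# A runnable sketch of the Algorithm 1 loop. FCCRFs(x, I, T) and
# EvaluateDice are stubbed with a toy objective, and random sampling
# replaces the TPE surrogate / EI acquisition of the real optimizer.
PARAM_SPACE = {"w1": (1, 50), "theta_a": (5, 70), "theta_b": (10, 70),
               "w2": (1, 50), "theta_g": (5, 70)}

def sample_params(space, rng):
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}

def toy_dice(params):
    """Stub for EvaluateDice(FCCRFs(x, I, T)); peaks at an arbitrary target."""
    target = {"w1": 37, "theta_a": 11, "theta_b": 17, "w2": 17, "theta_g": 53}
    return 1.0 / (1.0 + sum((params[k] - v) ** 2 for k, v in target.items()))

def optimize(space, objective, num_iters=200, seed=0):
    rng = random.Random(seed)
    history = []                           # H <- empty set
    for _ in range(num_iters):             # for t <- 1 to Num
        x = sample_params(space, rng)      # acquisition step (stubbed)
        history.append((x, objective(x)))  # H <- H ∪ {(x, O(x))}
    best_x, best_score = max(history, key=lambda h: h[1])
    return best_x, best_score              # x*; the paper then runs FCCRFs(x*, I, T)

best_x, best_score = optimize(PARAM_SPACE, toy_dice)
```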

Training configurations

In the present study, the maximum number of training epochs for all models was set to 70, with a batch size of 3. The weight initialization strategy utilized for all models was the Kaiming initialization. The loss function applied during training was binary cross-entropy (BCE). The optimizer employed was stochastic gradient descent (SGD), with an initial learning rate of 0.01 and a momentum of 0.9. To prevent overfitting during training, weight decay was set to 0.0001. The learning rate was adjusted using a polynomial decay strategy throughout the training process, facilitating rapid learning in the early stages and refined adjustments in the later stages.
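The polynomial decay schedule can be sketched as follows; the decay power used here (0.9) is an assumption, as the paper does not report it.

```python
# Sketch of the polynomial learning-rate decay described above.
# The decay power of 0.9 is an assumed value, not taken from the paper.
def poly_lr(base_lr, epoch, max_epochs, power=0.9):
    """Polynomial decay: lr = base_lr * (1 - epoch / max_epochs) ** power."""
    return base_lr * (1.0 - epoch / max_epochs) ** power

schedule = [poly_lr(0.01, e, max_epochs=70) for e in range(70)]
# Starts at the initial rate of 0.01 and decays smoothly toward 0.
```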


Results

Ablation studies were conducted to compare different architectures and components of the PA-FCCRFs. Specifically, we compared the performance of U-Net alone, U-Net combined with naive adaptive-fully connected conditional random fields (NA-FCCRFs), and U-Net combined with PA-FCCRFs. To further ensure the generalizability and stability of our model, in addition to U-Net, we also employed other CNN models, such as V-Net and AH-Net. The results clearly demonstrate that the cascaded PA-FCCRFs model significantly outperforms the other configurations, validating the effectiveness of our proposed approach.

Specifically, in the segmentation results produced by U-Net, the majority of the pulmonary vessels within the lung area are successfully segmented; however, a considerable number of misclassified pixels are also present. This leads to a Recall of 96.71±3.19 for the segmentation results, while the Precision is only 73.14±10.67. Following optimization with PA-FCCRFs, the Precision is markedly increased from 73.14±10.67 to 90.24±4.63, with only a slight decrease in Recall from 96.71±3.19 to 94.46±2.26, thereby achieving precise segmentation of the pulmonary vessels. Concurrently, the results of the three ablation experiments uniformly demonstrate that the segmentation accuracy of the CNN + PA-FCCRFs (which will be used to denote the CNN and PA-FCCRFs cascaded model in the subsequent sections of this article) is significantly higher than that of CNN alone. An analysis of the segmentation results from different models in the three ablation experiments is provided in Table 1.
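For reference, a minimal sketch of how Precision, Recall, and F1 are computed from binary segmentation masks; the toy masks below show how extra false-positive pixels keep Recall high while lowering Precision, mirroring the raw U-Net behavior described above.

```python
import numpy as np

# Sketch of Precision, Recall, and F1 computed from binary segmentation
# masks (1 = vessel). The masks below are toy examples.
def segmentation_metrics(pred, gt):
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    tp = np.sum(pred & gt)       # correctly labeled vessel pixels
    fp = np.sum(pred & ~gt)      # background labeled as vessel
    fn = np.sum(~pred & gt)      # vessel labeled as background
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# An over-segmenting prediction (extra FP pixels) keeps Recall at 1.0
# while Precision drops.
gt = [[0, 1, 1, 0], [0, 1, 1, 0]]
pred = [[1, 1, 1, 1], [0, 1, 1, 0]]
p, r, f = segmentation_metrics(pred, gt)
```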

Table 1

Evaluation metrics and t-test results for segmentation outcomes across models

Model Precision Recall F1 Hausdorff distance
U-Net 73.14±10.67 96.71±3.19 82.67±6.86 35.12±6.04
U-Net + NA-FCCRFs 73.01±10.20 95.87±3.19 82.37±4.56 35.49±5.91
U-Net + PA-FCCRFs 90.24±4.63 94.46±2.26 91.85±3.41 30.86±2.71
V-Net 57.77±8.31 85.11±5.85 69.03±6.28 49.90±2.23
V-Net + NA-FCCRFs 71.46±8.20* 84.50±6.08 74.64±2.38 48.28±1.33
V-Net + PA-FCCRFs 81.9±10.60 82.28±6.79 81.25±2.36 38.83±3.87
AH-Net 70.68±9.26 83.69±5.22 75.92±4.07 37.85±4.87
AH-Net + NA-FCCRFs 80.06±6.90* 82.56±3.21 81.09±2.96* 33.74±5.86
AH-Net + PA-FCCRFs 89.8±3.81 81.95±3.49 85.52±3.83 27.50±3.38

Data are presented as mean ± standard deviation. *, the P values, calculated using the segmentation results of the CNNs and cascaded NA-FCCRFs models, were all less than 0.05, indicating statistically significant differences. †, the P values, calculated using the segmentation results of the CNNs and cascaded PA-FCCRFs models, were all less than 0.05, indicating statistically significant differences. AH-Net, 3D anisotropic hybrid network; CNN, convolutional neural network; NA-FCCRFs, naive adaptive-fully connected conditional random fields; PA-FCCRFs, parameter adaptive-fully connected conditional random fields; U-Net, convolutional networks for biomedical image segmentation; V-Net, fully convolutional neural networks for volumetric medical image segmentation.

In this study, the method of controlled variables was initially employed to adjust each internal parameter of FCCRFs sequentially within the predefined parameter adjustment ranges. Within these ranges, an optimal set of parameters was identified as follows: wpair(1)=35, θα=20, θβ=15, wpair(2)=15, θγ=50. FCCRFs equipped with this set of parameters are referred to as naive adaptive-fully connected conditional random fields (NA-FCCRFs). Using the method of controlled variables, parameters were optimized within a certain range with a step size of 5 or 10. However, due to the larger step size inherent in the parameter tuning method of controlled variables, this approach is limited in its ability to seek the global optimal solution. To address this issue, Bayesian optimization algorithm was employed in the current study to automate the parameter tuning within the established adjustment ranges. The set of parameters resulting from this search was as follows: wpair(1)=37, θα=11, θβ=17, wpair(2)=17, θγ=53. The results of the three ablation experiments consistently indicate that the segmentation accuracy of the CNN + PA-FCCRFs is significantly higher than that of the CNN + NA-FCCRFs. The analysis of the segmentation results of the two models in the three ablation experiments and the comparative graphs of the segmentation results are presented in Table 1 and Figure 1, respectively.

Figure 1 Comparative analysis of segmentation results between CNN + PA-FCCRFs and CNN + NA-FCCRFs. U-Net + PA-FCCRFs, U-Net + NA-FCCRFs stand for U-Net and PA-FCCRFs cascaded model, U-Net and NA-FCCRFs cascaded model, respectively. (A), (B), and (C) respectively represent the comparative analyses of segmentation results between CNN + PA-FCCRFs and CNN + NA-FCCRFs across the three ablation experiments. Ave, average; AH-Net, 3d anisotropic hybrid network; CNN, convolutional neural network; HD, Hausdorff distance; NA-FCCRFs, naive adaptive-fully connected conditional random fields; PA-FCCRFs, parameter adaptive-fully connected conditional random fields; U-Net, convolutional networks for biomedical image segmentation; V-Net, fully convolutional neural networks for volumetric medical image segmentation.

Additionally, we conducted a search for the optimal set of internal parameters for FCCRFs, following the adjustment range used in the research by Chen et al. (16). Utilizing the Bayesian optimization algorithm, an optimal parameter set was identified: wpair(1)=9, θα=55, θβ=6, wpair(2)=3, and θγ=3. Typically, parameters found within this adjustment range would be expected to exhibit superior optimization performance. The results indicate that after optimization with this set of parameters, the number of false positive (FP) pixels in the U-Net segmentation results was significantly reduced, with the Precision increasing from 73.14±10.67 to a notable 95.74±0.99. However, this optimization also led to severe over-optimization, which decreased the Recall from 96.71±3.19 to 59.98±14.33 and the F1 score from 82.67±6.86 to 71.96±11.01. Consequently, the parameters adjusted within our initially set range demonstrate better optimization performance for medical image segmentation outcomes.

Addressing the dependency between long-distance pixels in images, the transformer architecture has also proven to be a highly effective method. Consequently, this study compared U-Net + PA-FCCRFs with the transformer-based segmentation model TransUNet, which is the state-of-the-art (SOTA) for the Synapse segmentation task. TransUNet was trained using a leave-one-out cross-validation strategy. The cross-validation results reveal that U-Net + PA-FCCRFs exhibits similar Precision to TransUNet; however, U-Net + PA-FCCRFs achieves higher Recall and F1 scores. The segmentation results of TransUNet contain fewer FP pixels, whereas U-Net + PA-FCCRFs demonstrates a higher segmentation coverage rate. A comparative analysis of the segmentation results of the two models is illustrated in Figure 2.

Figure 2 Comparative analysis of segmentation results between U-Net + PA-FCCRFs and TransUNet. Ave, average; HD, Hausdorff distance; PA-FCCRFs, parameter adaptive-fully connected conditional random fields; U-Net + PA-FCCRFs, U-Net and PA-FCCRFs cascaded model; U-Net, convolutional networks for biomedical image segmentation.

Typically, the application of an effective data preprocessing algorithm can significantly enhance the segmentation performance of a model. Therefore, a medical image deblurring method proposed by Sharif et al. (17), recognized as the SOTA model in multiple medical imaging deblurring tasks, was employed to preprocess the data used in this study. Following this preprocessing, both the U-Net and TransUNet models were retrained with the leave-one-out cross-validation technique. The results revealed a slight reduction in Precision for the segmentation outcomes of both models. The Recall of the U-Net segmentation results decreased from 96.71±3.19 to 85.82±1.36, and the Recall of the TransUNet segmentation results declined from 85.30±6.05 to 83.18±3.19. These findings suggest that the deblurring process may have removed some lung area edges and vascular structures, reducing the number of false positive (FP) and true positive (TP) pixels in the models’ segmentation results to a certain degree. This also accounts for the significant decrease in Recall for the U-Net segmentation results, while its Precision saw only a modest decline. The specific values of the four evaluation metrics for the segmentation results are presented in Table 2.

Table 2

Comparison of four evaluation metrics for segmentation results of models trained before and after data preprocessing

Model Precision Recall F1 Hausdorff distance
U-Net 73.14±10.67 96.71±3.19 82.67±6.86 35.12±6.04
U-Net$ 73.09±5.68 85.82±1.36 78.95±2.38 37.50±3.28
TransUNet 89.51±1.85 85.30±6.05 86.67±1.92 32.31±1.31
TransUNet$ 88.33±2.78 83.18±3.19 85.64±2.55 32.73±1.87

Data are presented as mean ± standard deviation. $, models trained on data that has been preprocessed using a medical image deblurring algorithm. U-Net, convolutional networks for biomedical image segmentation.

The comparative illustration of segmentation results between CNN, cascaded NA-FCCRFs model, and cascaded PA-FCCRFs model is presented in Figure 3. In the segmentation results produced by the CNN, numerous lung area edges were misclassified as pulmonary vessels, as indicated by the red circles in Figure 3B. In the optimization results of the NA-FCCRFs, a significant number of misclassified pixels remained uncorrected, as shown within the red circles in Figure 3C. Conversely, in the optimization results of the PA-FCCRFs, the vast majority of misclassified pixels were successfully optimized, leading to a notable improvement in segmentation accuracy. However, there were also instances of over-optimization, as highlighted by the red circles in Figure 3D.

Figure 3 Partially extracted images for vascular segmentation results from CNN, cascaded NA-FCCRFs and cascaded PA-FCCRFs. (A) The original lung area image; (B) the segmentation results of CNN; (C) the segmentation results of CNN and NA-FCCRFs cascaded model; (D) the segmentation results of CNN and PA-FCCRFs cascaded model; (E) the manually annotated images. In the segmentation results of the CNN, numerous lung area edges were misclassified as pulmonary vessels, as indicated by the regions within the red circles in (B). In the segmentation results of CNN and NA-FCCRFs cascaded model, many misclassified pixels remained unoptimized, as shown by the areas within the red circles in (C). Conversely, in the segmentation results of the CNN and PA-FCCRFs cascaded model, the vast majority of misclassified pixels were successfully optimized, resulting in a significant improvement in segmentation accuracy. However, there were also instances of over-optimization, as evidenced by the regions within the red circles in (D). CNN, convolutional neural network; NA-FCCRFs, naive adaptive-fully connected conditional random fields; PA-FCCRFs, parameter adaptive-fully connected conditional random fields.

The three-dimensional reconstruction and comparison of vascular segmentation results between CNN and cascaded PA-FCCRFs model is presented in Figure 4. In the vascular 3D reconstruction results from CNN, it is evident that certain regions outside the lung parenchyma are misclassified as pulmonary vessels, as highlighted in the black-elliptical area. Conversely, in the vascular 3D reconstruction results from the cascaded PA-FCCRFs model, a significant improvement in the precision of vessel segmentation is observed, with only minimal misclassified points remaining.

Figure 4 Three-dimensional reconstruction of vascular segmentation results from different models. (A) The 3D reconstruction of vascular segmentation results achieved by CNN; (B) showcases the 3D reconstruction of vascular segmentation results achieved by CNN and PA-FCCRFs cascaded model; (C) the 3D reconstruction of manually annotated images. In the vascular 3D reconstruction results from CNN, it is evident that certain regions outside the lung parenchyma are misclassified as pulmonary vessels, as highlighted in the black-elliptical area. Conversely, in the vascular 3D reconstruction results from the cascaded PA-FCCRFs model, a significant improvement in the precision of vessel segmentation is observed, with only minimal misclassified points remaining. 3D, three dimensional; CNN, convolutional neural network; PA-FCCRFs, parameter adaptive-fully connected conditional random fields.

The three-dimensional vascular skeleton reconstruction and comparison of vascular segmentation results between the CNN and the cascaded PA-FCCRFs model are presented in Figure 5. In the skeleton extraction results from the CNN, a substantial number of misclassified points and breakpoints are observed. In contrast, the skeleton extraction results from the cascaded PA-FCCRFs model exhibit a noticeable reduction in the quantity of misclassified points and breakpoints.

Figure 5 Partially extracted images for 3D vascular skeleton extraction from different models’ vascular segmentation results. (A) The 3D vascular skeleton extracted from the vascular segmentation results obtained through CNN; (B) the 3D vascular skeleton extracted from the vascular segmentation results obtained through CNN and PA-FCCRFs cascaded model; (C) the 3D vascular skeleton extracted from manually annotated images. In the skeleton extraction results from CNN, a substantial number of misclassified points and breakpoints are observed. In contrast, the skeleton extraction results from the cascaded PA-FCCRFs model exhibit a noticeable reduction in the quantity of misclassified points and breakpoints. 3D, three dimensional; CNN, convolutional neural network; PA-FCCRFs, parameter adaptive-fully connected conditional random fields.

Discussion

In this work, we introduced the PA-FCCRFs model by incorporating Bayesian optimization into FCCRFs. Precise segmentation of pulmonary vessels in pulmonary CT scans is achieved by cascading U-Net with PA-FCCRFs. To validate the segmentation performance of U-Net + PA-FCCRFs, we conducted ablation experiments, which showed that the U-Net + PA-FCCRFs model clearly outperformed both the U-Net and the U-Net + NA-FCCRFs models. Additionally, the cascaded model was compared with TransUNet, a state-of-the-art (SOTA) method for the Synapse segmentation task. The results revealed that U-Net + PA-FCCRFs exhibited higher segmentation accuracy than TransUNet. Furthermore, to confirm the stable optimization effects of PA-FCCRFs, we separately trained V-Net and AH-Net models and performed two additional ablation experiments. Both ablation experiments consistently demonstrated that the CNN + PA-FCCRFs approach achieved the best segmentation performance. These findings validate that the cascaded PA-FCCRFs approach effectively segments pulmonary vessels, supporting the diagnosis of pulmonary diseases and showing promise for application in clinical settings.

Bayesian optimization algorithms were employed to automatically tune the internal parameters of FCCRFs within the range proposed by Chen et al. (16). However, the results indicated that the optimal parameters identified within this adjustment range did not yield satisfactory optimization performance. In contrast, the parameters discovered in the adjustment range utilized in this study demonstrated superior optimization performance. The discrepancy is speculated to arise from several factors. Firstly, the study by Chen et al. (16) focused on natural image segmentation tasks, whereas this study was concerned with medical image segmentation, which is characterized by more complex backgrounds and uneven sample distribution. Consequently, it may be necessary to search for optimal parameters within different ranges when optimizing the segmentation results of different types of images.
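The tuning loop described above can be illustrated as a bounded black-box search over the FCCRF kernel parameters. The sketch below is a minimal stand-in, not the implementation used in this study: plain random search replaces the TPE-based Bayesian optimizer (15) (which additionally builds a probabilistic model of the objective to propose trials), the parameter ranges are hypothetical, and the objective is a synthetic surrogate rather than the real validation Dice score.

```python
import random

# Hypothetical search space for the FCCRF kernel parameters; these ranges
# are illustrative only, not the ranges tuned in this study or in Chen et al.
SPACE = {
    "theta_alpha": (1.0, 20.0),   # spatial std-dev of the appearance kernel
    "theta_beta": (1.0, 15.0),    # intensity std-dev of the appearance kernel
    "theta_gamma": (1.0, 5.0),    # spatial std-dev of the smoothness kernel
    "w_appearance": (1.0, 10.0),  # weight of the appearance kernel
}

def dice_after_crf(params):
    """Stand-in for the real black-box objective: run FCCRF refinement on
    the CNN output with `params` and score the result against the manual
    annotation. A smooth synthetic surrogate (peaked at the range
    midpoints) is used here so the sketch runs without image data."""
    return 1.0 - sum(
        (v - (lo + hi) / 2) ** 2 / (hi - lo) ** 2
        for v, (lo, hi) in zip(params.values(), SPACE.values())
    )

def search(n_trials=200, seed=0):
    """Sample parameter settings from SPACE, evaluate the black-box
    objective for each, and keep the best-scoring setting."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in SPACE.items()}
        score = dice_after_crf(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = search()
```

The key point the sketch conveys is that the bounds in `SPACE` constrain every candidate the optimizer can ever evaluate, so choosing a range appropriate to the image domain, as discussed above, directly determines the quality of the best parameters found.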

Additionally, in the CNN-based vascular segmentation results, the majority of misclassified pixels were located at the edges of the lung area, exhibiting positional differences from the lung vessel pixels. Therefore, taking into account the positional feature differences between these pixels might enable FCCRFs to more accurately calculate the correlation between misclassified pixels and lung vessel pixels. This, in turn, could facilitate precise adjustment of the misclassified pixels’ categories based on the correlation calculation results. Hence, it is believed that the setting of θα should allow the positional feature differences between pixels to have a significant impact on the correlation calculation outcomes.

Consequently, compared to the maximum value of the θα parameter adjustment range set in the study by Chen et al., (16) a relatively smaller maximum value for the θα parameter adjustment range was established in this study. This ensures that the Bayesian optimization algorithm can search for appropriate parameters within this range, allowing FCCRFs to more reasonably consider the positional feature differences between pixels when calculating their correlations.
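The role of θα can be read directly off the standard FCCRF pairwise kernel of Krähenbühl and Koltun (14), reproduced here for clarity, with p denoting pixel positions and I pixel intensities:

```latex
k(\mathbf{f}_i, \mathbf{f}_j)
  = w^{(1)} \exp\!\left(
      -\frac{\lVert p_i - p_j \rVert^2}{2\theta_\alpha^2}
      -\frac{\lVert I_i - I_j \rVert^2}{2\theta_\beta^2}
    \right)
  + w^{(2)} \exp\!\left(
      -\frac{\lVert p_i - p_j \rVert^2}{2\theta_\gamma^2}
    \right)
```

Because θα divides the squared positional distance in the appearance kernel, a smaller θα makes that term decay faster with spatial separation, so positional differences between misclassified lung-edge pixels and true vessel pixels weigh more heavily in the computed correlation.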

In the field of image segmentation, in addition to employing traditional image segmentation models such as U-Net, the use of the You Only Look Once (YOLO) algorithm for image segmentation tasks has been explored by some researchers. For instance, Bolya et al. (18) improved upon the YOLO framework to develop an efficient real-time instance segmentation approach. They utilized a Feature Pyramid Network to merge multi-scale features and introduced the concept of prototype masks to detect the mask coefficients for each object, ultimately generating the final segmentation masks. Zhao et al. (19) achieved real-time, high-quality segmentation by integrating a unified segmentation head into YOLO and employing context feature fusion techniques. However, YOLO is primarily designed for real-time object detection tasks, which results in a shallower network architecture compared to traditional segmentation models. This inherent characteristic negatively impacts segmentation accuracy. Moreover, the backgrounds of medical images are often complex, and medical image segmentation tasks typically require the identification and segmentation of smaller structures, such as tumors and blood vessels. The grid design of YOLO hinders its ability to effectively focus on pixel-level information in medical image segmentation tasks, leading to the loss of critical information and causing missed detections.

Because the data utilized in this study consisted of 10 sets of lung CT scan images, the segmentation performance of the various models was assessed using 10-fold cross-validation. Throughout the ten rounds of cross-validation, the cascaded PA-FCCRFs model demonstrated superior and consistent segmentation capabilities. However, given the limited amount of data used in this study, further validation is required to ascertain the model’s generalization performance on a broader range of unseen clinical lung CT scan images. Additionally, the 10 sets of lung CT scan images used in this study all represented healthy lung tissue, so the stability and superiority of the model’s segmentation performance on CT images of patients with various lung diseases require further validation before clinical application. Consequently, in future work, we intend to collect additional public datasets and collaborate with hospitals to acquire a more diverse range of clinical lung CT scan images, thereby further validating the model and enhancing its generalizability and stability.


Conclusions

This study presents a cascaded model combining U-Net and PA-FCCRFs to achieve accurate pulmonary vessel segmentation in CT images. The proposed method effectively addresses the limitations of CNNs in modeling long-range pixel dependencies, significantly improving segmentation performance, as evidenced by the notable increases in Precision (73.14±10.67 to 90.24±4.63) and F1-score (82.67±6.86 to 91.85±3.41), as well as the reduction in Hausdorff distance (35.12±6.04 to 30.86±2.71). Furthermore, the generalization capability of PA-FCCRFs is validated through its successful optimization of segmentation results from other CNN-based models (AH-Net, V-Net, and TransUNet). These findings demonstrate that the cascaded PA-FCCRFs strategy enhances pulmonary vessel segmentation accuracy, offering a reliable tool for clinical diagnosis and treatment planning of pulmonary diseases.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD + AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-2008/rc

Funding: This study was funded by National Natural Science Funds for Young Scholar (No. 62101357), Liaoning Province Science and Technology Joint Fund (No. 2023-BSBA-256), and 2023 Liaoning Province Artificial Intelligence Innovation Development Plan Project (No. 2023JH26/10200013).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2008/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Li Y, Dai Y, Yu N, Duan X, Zhang W, Guo Y, Wang J. Morphological analysis of blood vessels near lung tumors using 3-D quantitative CT. J Xray Sci Technol 2019;27:149-60. [Crossref] [PubMed]
  2. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Proceedings of the Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015. 2015 Oct 5-9. Munich, Germany: Springer; 2015: 234-241.
  3. Fu H, Cheng J, Xu Y, Wong DWK, Liu J, Cao X. Joint Optic Disc and Cup Segmentation Based on Multi-Label Deep Network and Polar Transformation. IEEE Trans Med Imaging 2018;37:1597-605. [Crossref] [PubMed]
  4. Chen W, Zhang Y, He J, Qiao Y, Chen Y, Shi H, et al. Prostate Segmentation using 2D Bridged U-net. In: Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN); 2019 Jul 14-19. Budapest, Hungary: 2019;1-7. doi: 10.1109/IJCNN.2019.8851908.
  5. Cui H, Liu X, Huang N. Pulmonary Vessel Segmentation Based on Orthogonal Fused U-Net++ of Chest CT Images. In: Medical Image Computing and Computer Assisted Intervention -- MICCAI 2019. 2019 Oct 13-17. Shenzhen, China: Springer; 2019:293-300.
  6. Hu H, Zhang Z, Xie Z, Lin S. Local Relation Networks for Image Recognition. 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 2019 Oct 27-Nov 2. Seoul, Korea (South): IEEE; 2019:3463-72.
  7. Ramachandran P, Parmar N, Vaswani A, Bello I, Levskaya A, Shlens J. Stand-Alone Self-Attention in Vision Models. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019 Dec 8-14. Vancouver, Canada: Curran Associates Inc; 2019: 68-80.
  8. Naqvi RA, Haider A, Kim HS, Jeong D, Lee SW. Transformative Noise Reduction: Leveraging a Transformer-Based Deep Network for Medical Image Denoising. Mathematics 2024;12:2313.
  9. Li Y, Wang B, Shi W, Miao Y, Yang H, Jiang Z. Lung Segmentation via Deep Learning Network and Fully-Connected Conditional Random Fields. In: International Conference on Bio-Inspired Computing: Theories and Applications. 2020 Oct 23-25. Qingdao, China: Springer; 2021:396-405.
  10. Kadoury S, Abi-Jaoudeh N, Valdes PA. Higher-Order CRF Tumor Segmentation with Discriminant Manifolds. In: Proceedings of the Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Berlin, Heidelberg: Springer Berlin Heidelberg; 2013:719-26.
  11. Orlando JI, Prokofyeva E, Blaschko MB. A Discriminatively Trained Fully Connected Conditional Random Field Model for Blood Vessel Segmentation in Fundus Images. IEEE Trans Biomed Eng 2017;64:16-27. [Crossref] [PubMed]
  12. Fu H, Xu Y, Wong DWK, Liu J. Retinal vessel segmentation via deep learning and fully-connected conditional random fields. In: Proceedings of the 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI). 2016 Apr 13-16. Prague, Czech Republic: IEEE; 2016:698-701.
  13. Zheng S, Jayasumana S, Romera-Paredes B, Vineet V, Su Z, Du D, et al. Conditional random fields as recurrent neural networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2015 Dec 7-13. Santiago, Chile: IEEE; 2015:1529-37.
  14. Krähenbühl P, Koltun V. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. 2011 Dec 12-15. Granada, Spain: Curran Associates Inc; 2011: 109-117.
  15. Bergstra J, Bardenet R, Bengio Y, Kégl B. Algorithms for hyper-parameter optimization. In: Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS 2011). 2011 Dec 12-15. Granada, Spain: NIPS; 2011:1-9.
  16. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans Pattern Anal Mach Intell 2018;40:834-48. [Crossref] [PubMed]
  17. Sharif SMA, Naqvi RA, Mehmood Z, Hussain J, Ali A, Lee SW. MedDeblur: Medical Image Deblurring with Residual Dense Spatial-Asymmetric Attention. Mathematics 2023;11:115.
  18. Bolya D, Zhou C, Xiao F, Lee YJ. YOLACT: Real-Time Instance Segmentation. In: Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 2019 Oct 27-Nov 2. Seoul, Korea (South): IEEE; 2019:9156-65.
  19. Zhao H, Qi X, Shen X, Shi J, Jia J. ICNet for Real-Time Semantic Segmentation on High-Resolution Images. In: Computer Vision – ECCV. 2018 Sep 8-14. Munich, Germany: Springer; 2018:405-20.
Cite this article as: Xue Z, Sun Y, Tong G, Wang Z, Zhao X. Pulmonary vessel segmentation in computed tomography images: a cascaded approach combining U-Net and parameter-adaptive fully connected conditional random fields. Quant Imaging Med Surg 2025;15(6):4896-4909. doi: 10.21037/qims-24-2008
