Original Article
PAC-P2T: pyramid atrous convolution with pyramid pooling Transformer for polyp segmentation
Abstract
Background: Colonoscopy is the gold standard for examining the gastrointestinal tract. Localizing polyps in colonoscopy images is crucial for screening and plays a significant role in subsequent treatment, e.g., polyp resection. Although several convolutional neural network (CNN)-based and Transformer-based methods have made substantial progress in polyp extraction, they still struggle with challenges such as morphological variability and blurred edges, which limits their clinical application. This study aims to propose a more effective polyp segmentation method for clinical use by combining the advantages of pyramid pooling and atrous convolution.
Methods: We propose PAC-P2T, which combines pyramid atrous convolution (PAC) with the pyramid pooling Transformer (P2T) for polyp segmentation. Given the effectiveness of P2T in contextual feature extraction, we first adopt P2T as the encoder to extract powerful features. Leveraging the ability of atrous convolution to extract features at the same scale but with varying receptive fields, we introduce a multi-layer PAC feature extraction module (MPAF) combined with a channel attention mechanism, thereby enhancing information flow in the decoder branch. In addition, to progressively expand the receptive fields while preserving image details, we integrate the single-level atrous convolution feature fusion module (SLAF) into each encoder side branch, promoting hierarchical feature propagation to subsequent lower-level branches.
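To illustrate the property of atrous convolution that the Methods rely on (same output scale, varying receptive fields), the following is a minimal one-dimensional sketch in plain Python. It is not the MPAF or SLAF module: the function names, the dilation rates (1, 2, 4), and the fusion by simple averaging are all hypothetical choices made for illustration, whereas the paper fuses features with a channel attention mechanism.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def atrous_conv1d(signal, weights, dilation):
    """Zero-padded, 'same'-length 1D atrous convolution (cross-correlation).

    Assumes an odd-length kernel so that it can be centred on each sample.
    """
    half = (len(weights) - 1) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i + (j - half) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < n:                 # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def pyramid_atrous_1d(signal, weights, dilations=(1, 2, 4)):
    """Apply the same kernel at several dilation rates and average the branches.

    Averaging is a stand-in for learned fusion; the rates are illustrative only.
    """
    branches = [atrous_conv1d(signal, weights, d) for d in dilations]
    return [sum(values) / len(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    for d in (1, 2, 4):
        print(d, effective_kernel_size(3, d), atrous_conv1d(impulse, [1, 1, 1], d))
```

Every branch returns a sequence of the same length as its input, while a 3-tap kernel spans 3, 5, and 9 samples at dilation rates 1, 2, and 4, respectively. This is the sense in which a pyramid of dilation rates widens context without reducing resolution.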
Results: Experiments on five public colorectal polyp segmentation datasets demonstrate that PAC-P2T outperforms several state-of-the-art polyp extraction networks, which verifies the effectiveness of combining pyramid pooling with PAC for polyp segmentation. In particular, compared with PraNet, PAC-P2T improved the mean Dice coefficient/mean intersection over union (mDice/mIoU) by 18.7%/17.7% on ETIS and by 10%/9.6% on CVC-ColonDB.
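For readers unfamiliar with the two reported metrics, the sketch below computes the Dice coefficient and intersection over union for binary masks and averages them over a set of images. It uses the standard set-based definitions; the function names and the convention of scoring two empty masks as 1.0 are assumptions, and the evaluation protocol actually used for the reported numbers (for example, any thresholding of probability maps) may differ.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty; treating this as a perfect match is an assumption.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


def mean_dice_and_iou(pairs):
    """Average per-image Dice and IoU over (pred, target) pairs."""
    scores = [dice_and_iou(pred, target) for pred, target in pairs]
    mean_dice = sum(d for d, _ in scores) / len(scores)
    mean_iou = sum(i for _, i in scores) / len(scores)
    return mean_dice, mean_iou


if __name__ == "__main__":
    pairs = [
        ([1, 1, 0, 0], [1, 0, 1, 0]),  # partial overlap
        ([1, 1], [1, 1]),              # perfect overlap
    ]
    print(mean_dice_and_iou(pairs))
```

For a single image the two metrics are related by Dice = 2·IoU/(1 + IoU), so Dice is never lower than IoU; this relation does not carry over exactly to the per-dataset means.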
Conclusions: By incorporating atrous convolution as a key feature extraction unit and integrating PAC into the pyramid pooling structure, the proposed PAC-P2T enhances the robustness of polyp region extraction, providing strong support for computer-aided polyp segmentation in clinical applications.

