In the field of computer vision, network architecture is critical to task performance. The Vision Graph Neural Network (ViG) has shown remarkable results on a variety of vision tasks owing to its unique characteristics. However, the lack of multi-scale information in ViG limits its expressive capability. To address this challenge, we propose the Graph Pyramid Pooling Transformer (GPPT), which enhances model performance by introducing multi-scale feature learning. The core advantage of GPPT is its ability to effectively capture and fuse feature information at different scales. Specifically, it first generates multi-level pooled graphs using a graph pyramid pooling structure. Next, it encodes the features at each scale with a weight-shared Graph Convolutional Network (GCN). Then, it strengthens information exchange across scales through a cross-scale feature fusion mechanism. Finally, it captures long-range node dependencies with a transformer module. Experimental results demonstrate that GPPT achieves strong performance across various visual scenes, including image classification and object detection, highlighting its generality and effectiveness.
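The abstract outlines a four-step pipeline (pyramid pooling, weight-shared GCN, cross-scale fusion, transformer). The PyTorch snippet below is a minimal sketch of how such a block could be wired together; all names (`GPPTBlock`, `SharedGCN`, `knn_adjacency`) and design details (k-NN graph construction, average pooling for coarsening, summation for cross-scale fusion) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a GPPT-style block. Shapes, pooling, and fusion choices are
# assumptions for illustration only; they are not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


def knn_adjacency(x, k=8):
    """Build a row-normalized k-NN adjacency from node features x: (B, N, C)."""
    dist = torch.cdist(x, x)                       # (B, N, N) pairwise distances
    idx = dist.topk(k + 1, largest=False).indices  # self + k nearest neighbors
    adj = torch.zeros_like(dist)
    adj.scatter_(-1, idx, 1.0)
    return adj / adj.sum(-1, keepdim=True)         # simple mean aggregation


class SharedGCN(nn.Module):
    """One GCN layer whose weights are reused at every pyramid scale."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, adj):
        return F.gelu(self.proj(adj @ x))          # aggregate neighbors, then transform


class GPPTBlock(nn.Module):
    """Pyramid pooling -> weight-shared GCN -> cross-scale fusion -> transformer."""
    def __init__(self, dim, scales=(1, 2, 4), heads=4):
        super().__init__()
        self.scales = scales
        self.gcn = SharedGCN(dim)
        self.attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )

    def forward(self, x):                          # x: (B, N, C) graph node features
        B, N, C = x.shape
        fused = 0
        for s in self.scales:
            # 1) pyramid pooling: average-pool the N nodes into N // s coarse nodes
            pooled = F.adaptive_avg_pool1d(x.transpose(1, 2), N // s).transpose(1, 2)
            # 2) weight-shared GCN encodes features on the coarsened graph
            out = self.gcn(pooled, knn_adjacency(pooled))
            # 3) cross-scale fusion: upsample back to N nodes and accumulate
            out = F.interpolate(out.transpose(1, 2), size=N, mode="linear",
                                align_corners=False).transpose(1, 2)
            fused = fused + out
        # 4) transformer captures long-range dependencies among the fused nodes
        return self.attn(fused + x)


if __name__ == "__main__":
    feats = torch.randn(2, 196, 64)                # e.g. 14x14 patch nodes, 64-dim
    print(GPPTBlock(64)(feats).shape)              # torch.Size([2, 196, 64])
```

One natural design choice shown here is reusing a single `SharedGCN` instance across scales, which keeps the parameter count independent of the number of pyramid levels; the actual coarsening and fusion operators in GPPT may differ.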
[Neucom] GPPT: Graph Pyramid Pooling Transformer for Visual Scene
2025-06-06 19:56:22
Research