| Issue | Wuhan Univ. J. Nat. Sci., Volume 31, Number 1, February 2026 |
|---|---|
| Page(s) | 1 - 9 |
| DOI | https://doi.org/10.1051/wujns/2026311001 |
| Published online | 06 March 2026 |
Deep Learning and Intelligent Perception
CLC number: TP391.41
MS-RWKV-UNet: Multi-Head Scan Receptance Weighted Key Value UNet for Medical Image Segmentation
基于多头扫描策略加权键值(RWKV)网络的医学图像分割方法
1 School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
(杭州电子科技大学 计算机学院,浙江 杭州 310018)
2 School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou 510014, Guangdong, China
(广州大学 计算机科学与网络工程学院,广东 广州 510006)
† Corresponding author.
Received: 6 September 2025
Abstract
The Transformer has achieved great success in medical image segmentation, but its quadratic computational complexity limits its application to dense medical image prediction. Recently, the receptance weighted key value (RWKV) architecture has attracted widespread attention for its linear computational complexity and its support for parallel computation during training. Although RWKV models handle long-range modeling tasks well at linear cost, most current RWKV-based approaches employ static scanning patterns, which may inadvertently introduce biased prior knowledge into the model's predictions. To address this challenge, we propose a multi-head scan strategy combined with padding methods to effectively simulate spatial continuity in 2D images. Within the Feature Aggregation Attention (FAA) module, asymmetric convolutions aggregate 1D sequence features along a single dimension, thereby expanding the effective receptive field while preserving structural sparsity. Additionally, panoramic token shift (P-Shift) models local dependencies by shifting tokens drawn from a wide receptive field. Extensive experiments on the ISIC17/18 and ACDC datasets demonstrate that our method achieves superior performance in dense medical image prediction tasks.
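As a rough illustration of why the scan pattern matters, the sketch below flattens a small 2D grid into several 1D sequences, one per "head", using different traversal orders. It is only a minimal sketch under stated assumptions: the four orders (`row`, `row_rev`, `col`, `col_rev`) and all function names are hypothetical choices for illustration, the abstract does not specify which scan patterns or padding scheme MS-RWKV-UNet actually uses, and the padding step is omitted here.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]
Order = List[Tuple[int, int]]


def scan_orders(h: int, w: int) -> Dict[str, Order]:
    """Return four illustrative scan orders over an h x w grid.

    These orders are assumptions for illustration, not the paper's patterns.
    """
    row_major = [(r, c) for r in range(h) for c in range(w)]
    col_major = [(r, c) for c in range(w) for r in range(h)]
    return {
        "row": row_major,            # left-to-right, top-to-bottom
        "row_rev": row_major[::-1],  # the same path walked backwards
        "col": col_major,            # top-to-bottom, left-to-right
        "col_rev": col_major[::-1],  # the same path walked backwards
    }


def flatten(grid: Grid, order: Order) -> List[int]:
    """Read grid cells in the given order to form a 1D token sequence."""
    return [grid[r][c] for r, c in order]


def unflatten(seq: List[int], order: Order, h: int, w: int) -> Grid:
    """Scatter a 1D sequence back to its 2D positions (inverse of flatten)."""
    out = [[0] * w for _ in range(h)]
    for value, (r, c) in zip(seq, order):
        out[r][c] = value
    return out


def multi_head_scan(grid: Grid) -> Dict[str, List[int]]:
    """Produce one 1D sequence per hypothetical scan head."""
    h, w = len(grid), len(grid[0])
    return {name: flatten(grid, order) for name, order in scan_orders(h, w).items()}


if __name__ == "__main__":
    grid = [[1, 2, 3],
            [4, 5, 6]]
    for name, seq in multi_head_scan(grid).items():
        print(name, seq)
```

In the row-major sequence, vertically adjacent cells such as 1 and 4 end up three positions apart, while the column-major sequence places them next to each other. A model restricted to one fixed order therefore sees only one notion of adjacency, which is one way to read the abstract's concern about static scanning patterns.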
摘要
尽管Transformer架构在医学图像分割领域取得了显著成果,但其自注意力机制固有的二次计算复杂度限制了在密集预测任务中的应用。近年来,RWKV架构因其线性计算复杂度及训练时的高并行能力受到广泛关注。尽管RWKV模型能够以线性计算复杂度有效处理远程建模任务,但当前基于RWKV的方法多依赖静态扫描模式,容易引入有偏的先验知识,影响模型泛化性能。为应对这一挑战,我们提出结合填充方法的多头扫描策略,以更好地模拟二维图像中的空间连续性。在特征聚合注意力(FAA)模块中,通过设计异构卷积沿单一维度融合一维序列特征,在保持结构稀疏性的同时扩展有效感受野。此外,P-Shift通过宽感受野内的token移动增强局部依赖建模。在ISIC和ACDC数据集上的大量实验表明,所提出方法在多项密集预测任务中均优于现有基线模型,展现出更高的分割精度和鲁棒性。
Key words: multi-head scan receptance weighted key value (RWKV) / asymmetric convolution / panoramic token shift (P-Shift) / medical image segmentation
关键字 : 多头扫描RWKV / 异构卷积 / P-Shift / 医学图像分割
Cite this article: JIANG Dong, JI Zhongping, FANG Meie. MS-RWKV-UNet: Multi-Head Scan Receptance Weighted Key Value UNet for Medical Image Segmentation[J]. Wuhan Univ J of Nat Sci, 2026, 31(1): 1-9.
Biography: JIANG Dong, male, Master's candidate, research direction: computer vision and medical image segmentation, etc.
Foundation item: Supported by the Zhejiang Provincial Natural Science Foundation of China (LY22F020025) and the National Natural Science Foundation of China (62072126)
© Wuhan University 2026
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
