Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?

Dutta, Pallabi; Bose, Soham; Roy, Swalpa Kumar; Mitra, Sushmita (2024). Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation? arXiv preprint arXiv:2406.16993.

Full text not available from this repository.

Official URL: https://doi.org/10.48550/arXiv.2406.16993

Abstract

The development of efficient segmentation strategies for medical images has evolved from an initial dependence on Convolutional Neural Networks (CNNs) to the current investigation of hybrid models that combine CNNs with Vision Transformers. There is an increasing focus on creating architectures that are both high-performing and computationally efficient, deployable on remote systems with limited resources. Although transformers can capture global dependencies in the input space, they incur high computational and storage costs. This paper investigates the integration of CNNs with Vision Extended Long Short-Term Memory (Vision-xLSTM) by introducing the novel U-VixLSTM. The Vision-xLSTM blocks capture temporal and global relationships within the patches extracted from the CNN feature maps. The convolutional feature reconstruction path upsamples the output volume from the Vision-xLSTM blocks to produce the segmentation output. Our primary objective is to show that Vision-xLSTM forms an appropriate backbone for medical image segmentation, offering excellent performance with reduced computational costs. U-VixLSTM exhibits superior performance compared to state-of-the-art networks on the publicly available Synapse, ISIC, and ACDC datasets.
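The data flow the abstract describes (convolutional downsampling, xLSTM-style sequence modeling over the flattened patches, then a convolutional reconstruction path with a UNet-style skip connection) can be sketched in miniature. This is a toy 1-D illustration only, not the authors' implementation: the function names, the additive skip, and the simplified exponential-gated recurrence standing in for the xLSTM cell are all assumptions made for clarity.

```python
import math

def avg_pool_2x(feat):
    """Toy 1-D 'encoder' stage: halve resolution by averaging pairs
    (a stand-in for the CNN feature-extraction path)."""
    return [(feat[i] + feat[i + 1]) / 2.0 for i in range(0, len(feat) - 1, 2)]

def xlstm_like_scan(seq):
    """Simplified exponential-gated recurrence over the patch sequence.
    The real xLSTM uses matrix-memory (mLSTM) and scalar (sLSTM) cells;
    this clipped exponential input gate only mimics the flavour."""
    h, c = 0.0, 0.0
    out = []
    for x in seq:
        i = math.exp(min(x, 5.0))        # exponential input gate (clipped)
        f = 1.0 / (1.0 + math.exp(-x))   # sigmoid forget gate
        c = f * c + i * math.tanh(x)     # cell-state update
        h = math.tanh(c)                 # hidden state fed to the decoder
        out.append(h)
    return out

def upsample_2x(feat):
    """Toy decoder stage: nearest-neighbour upsampling, standing in for
    the convolutional feature-reconstruction path."""
    return [v for v in feat for _ in range(2)]

def u_vixlstm_sketch(signal):
    """End-to-end toy pipeline mirroring the described architecture."""
    enc = avg_pool_2x(signal)      # CNN-style downsampling path
    mid = xlstm_like_scan(enc)     # Vision-xLSTM bottleneck over patches
    dec = upsample_2x(mid)         # convolutional reconstruction path
    skip = signal[: len(dec)]      # UNet-style skip connection (additive)
    return [d + s for d, s in zip(dec, skip)]

seg = u_vixlstm_sketch([0.1 * k for k in range(8)])
```

The point of the sketch is the claimed efficiency trade-off: the sequence model runs once over the coarse bottleneck features (length 4 here) rather than over every input element, while the skip connection restores fine detail lost by pooling.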

Item Type: Article
Source: Copyright of this article belongs to arXiv.
ID Code: 136744
Deposited On: 10 Sep 2025 05:34
Last Modified: 10 Sep 2025 05:34
