Request for a Vision Transformer Model for Digital Image Segmentation #35477

hanshengzhu0001 · 2025-01-02T01:43:29Z

Model description

I would like to request the addition of a Vision Transformer (ViT) model fine-tuned specifically for digital image segmentation. The model should use a transformer-based architecture to capture spatial relationships within images, improving performance in applications such as medical image analysis, satellite image segmentation, and autonomous driving.

The Vision Transformer architecture has proven to be highly effective for various vision tasks by using self-attention mechanisms to capture long-range dependencies in images. This model would be particularly valuable for tasks requiring pixel-level classification, where traditional convolutional neural networks (CNNs) often struggle to capture global features effectively.
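To make the point about global context and pixel-level output concrete, here is a minimal sketch in plain Python, not tied to any library. The helper names `patch_grid` and `upsample_nearest` are hypothetical, and the 16-pixel patch size and nearest-neighbour upsampling are assumptions chosen for illustration; a real segmentation head would typically use a learned decoder instead.

```python
def patch_grid(height, width, patch_size=16):
    """Return (rows, cols) of the patch grid a ViT-style encoder would see."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    return height // patch_size, width // patch_size


def upsample_nearest(patch_labels, patch_size):
    """Expand a per-patch label map to a per-pixel label map (nearest neighbour)."""
    pixels = []
    for row in patch_labels:
        # Repeat each patch label horizontally, then repeat the row vertically.
        expanded = [label for label in row for _ in range(patch_size)]
        pixels.extend(list(expanded) for _ in range(patch_size))
    return pixels


rows, cols = patch_grid(224, 224, patch_size=16)
num_tokens = rows * cols              # one token per patch
attention_entries = num_tokens ** 2   # every token attends to every other token

# Toy 2x2 patch-level prediction expanded to a 4x4 pixel mask.
mask = upsample_nearest([[0, 1], [1, 0]], patch_size=2)
```

Because every patch token attends to every other token, the attention matrix grows quadratically with the number of patches, and the encoder's output sits at patch resolution, so some form of upsampling is needed to reach per-pixel labels.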

The ViT model should include the following key features:

- Pretrained on a large, diverse image segmentation dataset.
- Fine-tuned for pixel-level classification tasks such as medical image segmentation or semantic segmentation of everyday objects.
- Support for standard ViT variants such as ViT-B (Base), ViT-L (Large), and ViT-H (Huge) depending on the task's computational budget.
- Open-source weights and implementation.
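As a rough illustration of choosing a variant by computational budget, the sketch below lists the layer count, hidden size, and head count of the standard ViT-B, ViT-L, and ViT-H encoders and estimates their size. The `12 * layers * hidden**2` formula is an approximation that counts only attention and MLP weights (assuming an MLP width of four times the hidden size), and `approx_encoder_params` and `pick_variant` are hypothetical helpers, not part of any existing library.

```python
# Standard ViT encoder configurations (layers, hidden size, attention heads).
VIT_VARIANTS = {
    "ViT-B": {"layers": 12, "hidden": 768, "heads": 12},
    "ViT-L": {"layers": 24, "hidden": 1024, "heads": 16},
    "ViT-H": {"layers": 32, "hidden": 1280, "heads": 16},
}


def approx_encoder_params(name):
    """Rough parameter count: 4*D^2 for attention plus 8*D^2 for the MLP, per layer."""
    cfg = VIT_VARIANTS[name]
    return 12 * cfg["layers"] * cfg["hidden"] ** 2


def pick_variant(max_params):
    """Return the largest variant whose estimated size fits the budget, or None."""
    fitting = [n for n in VIT_VARIANTS if approx_encoder_params(n) <= max_params]
    return max(fitting, key=approx_encoder_params) if fitting else None


for name in VIT_VARIANTS:
    print(name, f"~{approx_encoder_params(name) / 1e6:.0f}M encoder parameters")
```

The estimate ignores embeddings, layer norms, biases, and any segmentation head, so it is only useful for comparing variants against each other, not as an exact model size.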

Open source status

The model implementation is available
The model weights are available

Provide useful links for the implementation

@amyeroberts, @qubvel

Uvi-12 · 2025-01-03T11:02:56Z

@amyeroberts @qubvel I’m interested in working on this task. Could you please confirm whether this model is needed and would be a valuable addition?

hanshengzhu0001 added the New model label Jan 2, 2025
