syscv-community/sam-hq-vit-base

Model Information

syscv-community/sam-hq-vit-base is a high-quality, efficient image segmentation model that builds upon the original Segment Anything Model (SAM). It delivers enhanced mask accuracy with minimal increase in computational demands, making it especially effective for scenarios requiring detailed segmentation, even when provided with vague or minimal prompts.

  • Model Developer: SysCV Community
  • Model Release Date: June 2023 (SAM-HQ)
  • Supported Task: Image Segmentation via point prompts (see the usage sketch below)
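
The snippet below is a minimal sketch of point-prompt inference, assuming the Hugging Face transformers integration of SAM-HQ (SamHQModel and SamHQProcessor); the image URL and the point coordinates are illustrative placeholders, not part of this model card.

```python
import requests
import torch
from PIL import Image
from transformers import SamHQModel, SamHQProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SamHQModel.from_pretrained("syscv-community/sam-hq-vit-base").to(device)
processor = SamHQProcessor.from_pretrained("syscv-community/sam-hq-vit-base")

# Illustrative input image and a single point prompt (x, y in pixel coordinates)
img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]]  # one point on the target object

inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Rescale the predicted masks back to the original image resolution
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
print(masks[0].shape, outputs.iou_scores)
```

The post-processing step matters here: the decoder predicts masks at the model's internal resolution, and post_process_masks resizes them to the source image before use.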

Model Architecture

syscv-community/sam-hq-vit-base enhances the original SAM framework by modifying its decoder to include a High-Quality (HQ) output token. This addition allows the model to produce more detailed masks directly during inference, especially around object edges and fine structures. It maintains the same ViT-B (Vision Transformer - Base) backbone used in SAM, preserving the strengths of the original architecture.

While SAM relies on lower-resolution masks followed by upscaling, HQ-SAM generates high-resolution outputs natively, eliminating the need for additional refinement steps. These architectural improvements are achieved with minimal increase in computational cost, ensuring the model remains fast and responsive in real-time use cases.
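
To make the HQ-token idea concrete, here is an illustrative PyTorch sketch, not the reference implementation: a single learnable output token is appended to SAM's output tokens before the mask decoder, and its post-decoder embedding is matched per-pixel against a high-resolution feature map fused from early ViT features and decoder features. All module names, dimensions, and fusion details here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class HQTokenHead(nn.Module):
    """Illustrative sketch of the HQ output-token mechanism; module and
    dimension choices are assumptions, not the reference implementation."""

    def __init__(self, dim: int = 256):
        super().__init__()
        # Learnable HQ token, appended to SAM's existing output tokens
        self.hq_token = nn.Parameter(torch.zeros(1, 1, dim))
        # Maps the HQ token's decoder embedding into mask-feature space
        self.token_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim // 8)
        )
        # Fuses fine-detail early ViT features with semantic decoder features
        self.fuse = nn.Conv2d(2 * (dim // 8), dim // 8, kernel_size=1)

    def append_hq_token(self, output_tokens: torch.Tensor) -> torch.Tensor:
        # Concatenate the HQ token before the tokens enter SAM's mask decoder
        b = output_tokens.shape[0]
        return torch.cat([output_tokens, self.hq_token.expand(b, -1, -1)], dim=1)

    def forward(self, decoder_tokens, early_feats, decoder_feats):
        # decoder_tokens: (B, N, dim) tokens after the mask decoder; the HQ
        # token's embedding is assumed to sit at the last index.
        # early_feats / decoder_feats: (B, dim//8, H, W) upscaled feature maps.
        hq_embed = self.token_mlp(decoder_tokens[:, -1])               # (B, dim//8)
        fused = self.fuse(torch.cat([early_feats, decoder_feats], 1))  # (B, dim//8, H, W)
        # Per-pixel dot product between token embedding and fused features
        return torch.einsum("bc,bchw->bhw", hq_embed, fused)           # mask logits
```

Because the mask is read out as a dot product against a natively high-resolution fused feature map, fine boundary detail survives without a separate upscaling or refinement pass.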

Key Architecture Details

  • Model Type: Image Segmentation Model (Modified Transformer-based architecture)
  • Parameters: 362.1M
    • ~358M from the frozen ViT-B image encoder (inherited from SAM)
    • ~4.1M trainable parameters in the HQ mask decoder
  • Base Architecture: Vision Transformer (ViT-B) for image encoding
  • Enhancements: Integration of a High-Quality (HQ) output token into the mask decoder for improved mask fidelity.
  • Input:
    • RGB Image
    • Prompt (supported in AI Refinery: points)
  • Output: High-quality segmentation masks
  • Training:
    • Inherits SAM’s pretraining on the SA-1B dataset (1B masks)
    • Fine-tuned on HQSeg-44K, a composite dataset of 44K fine-grained mask annotations, to improve edge detail and structure accuracy (see the sketch after this list)
  • Capabilities:
    • Generates highly accurate segmentation masks from various prompts.
    • Handles ambiguous prompts with improved precision.
    • Optimized for a balance between speed and quality.
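
As a rough illustration of the frozen-encoder/trainable-decoder split described above, the sketch below freezes the image encoder and counts trainable parameters. It assumes the transformers SAM-style attribute name vision_encoder; actual attribute names may differ by version.

```python
from transformers import SamHQModel

model = SamHQModel.from_pretrained("syscv-community/sam-hq-vit-base")

# Freeze the ViT-B image encoder, mirroring the recipe above in which only
# the lightweight HQ components of the mask decoder are trained.
# `vision_encoder` follows transformers' SAM naming and is an assumption here.
for param in model.vision_encoder.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable / 1e6:.1f}M of {total / 1e6:.1f}M total")
```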

Benchmark Scores

SAM-HQ (ViT-Base) demonstrates a measurable improvement in mask quality over the original SAM (ViT-Base) across various segmentation benchmarks, achieving higher precision with minimal computational overhead.

Category       Benchmark Dataset   Metric                   SAM-HQ (ViT-Base)
Mask Quality   COCO                Average Precision (AP)   ~46.7
Mask Quality   COCO                Boundary AP              31.3
