Qwen/Qwen3-VL-32B-Instruct

Model Information

Qwen/Qwen3-VL-32B-Instruct is a state-of-the-art vision-language model developed by Alibaba Cloud's Qwen Team, combining strong language capabilities with advanced visual understanding. With 33 billion parameters and a native 256K-token context length (expandable to 1M), it supports multimodal tasks including visual question answering, document analysis, OCR, and video understanding; a minimal usage sketch follows the list below.

  • Model Developer: Alibaba Cloud (Qwen Team)
  • Model Release Date: May 2025
  • Supported Languages: 32 languages for OCR and text understanding, including English, Chinese, French, Spanish, German, Japanese, Korean, Portuguese, and other major languages
  • Applicable License: Apache License 2.0
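
The snippet below is a minimal inference sketch using Hugging Face transformers. It assumes a recent transformers release in which the generic AutoModelForImageTextToText and AutoProcessor classes resolve to the Qwen3-VL implementations; the image URL is a hypothetical placeholder.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-32B-Instruct"

# Load the model and its multimodal processor. The generic Auto classes
# are assumed to dispatch to the Qwen3-VL implementations.
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One image plus a question, in the standard multimodal chat format.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/invoice.png"},  # hypothetical image
            {"type": "text", "text": "What is the total amount on this invoice?"},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)

# Strip the prompt tokens before decoding the model's answer.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```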

Model Architecture

Qwen/Qwen3-VL-32B-Instruct is a decoder-only transformer model with an integrated vision encoder, designed for advanced multimodal understanding and generation.

Key Architecture Details:

  • Model Type: Vision-Language Model (VLM) with decoder-only transformer
  • Parameters: 33B
  • Context Length: Native 256K tokens, expandable to 1M tokens (see the sketch after this list)
  • Training: Pretrained on multimodal datasets (images, videos, documents), followed by instruction fine-tuning and safety alignment

Multimodal Capabilities:

  • Visual understanding (image captioning, VQA, scene analysis)
  • Document intelligence (OCR in 32 languages, chart interpretation)
  • Video understanding (videos up to 2 hours, with temporal reasoning; see the sketch after this list)
  • Visual code generation (diagrams, HTML/CSS/JS)
  • STEM & math reasoning
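
Video queries reuse the chat-template pattern from the image sketch above; only the message content changes. The "video" content type, the local file path, and the processor's frame-sampling behavior are assumptions about recent transformers releases, not confirmed API for this model.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-32B-Instruct"
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# A video plus a temporal-reasoning question. The "video" content type
# and the local path are assumptions about the processor's chat template.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "path": "lecture.mp4"},  # hypothetical local file
            {"type": "text", "text": "Summarize the main steps shown in the first ten minutes."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```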

Benchmark Scores

Qwen3-VL-32B-Instruct demonstrates leading performance across comprehensive vision-language evaluations, excelling in both perception and reasoning tasks.

Benchmark    Qwen3-VL-2B-Instruct  Qwen3-VL-4B-Instruct  Qwen3-VL-8B-Instruct  Qwen3-VL-32B-Instruct  Qwen2.5-VL-72B  GPT-5-Mini (Minimal)  Claude-4-Sonnet (w/o Thinking)
RealWorldQA  63.9                  70.9                  71.5                  79.0                   75.7*           73.3                  68.1
MMStar       58.3                  69.8                  70.9                  77.7                   70.8*           61.3                  67.4
SimpleVQA    40.7                  48.0                  50.2                  56.9                   58.2            50.3                  52.8

* Score taken from the model's technical report.
