nvidia/llama-3-2-nv-embedqa-1b-v2

Model Information

The nvidia/llama-3-2-nv-embedqa-1b-v2 model is optimized for multilingual and cross-lingual text question-answering retrieval. It supports long documents of up to 8192 tokens and offers dynamic embedding sizes (Matryoshka Embeddings), which can reduce the data storage footprint by up to 35x. A usage sketch follows the details below.

  • Model Developer: NVIDIA
  • Model Release Date: April 12, 2025
  • Supported Languages:
    • Primary: English (US)
    • Additional Support: Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish.
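As a rough illustration of how a retrieval embedding model like this is typically queried, the sketch below calls an OpenAI-compatible /v1/embeddings endpoint of the kind NVIDIA NIM microservices expose. The endpoint URL, the input_type values, and the truncate setting are assumptions based on common NIM conventions, not taken from this page.

```python
# Hypothetical sketch: embedding a query and a passage via an
# OpenAI-compatible /v1/embeddings endpoint (NIM-style). The URL,
# "input_type", and "truncate" fields are assumptions, not from this page.
import os
import requests

ENDPOINT = "https://integrate.api.nvidia.com/v1/embeddings"  # assumed endpoint
API_KEY = os.environ["NVIDIA_API_KEY"]                       # assumed auth scheme

def embed(texts, input_type):
    """Request embeddings; input_type is 'query' or 'passage' for bi-encoders."""
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
            "input": texts,
            "input_type": input_type,  # asymmetric retrieval: query vs. passage
            "truncate": "END",         # clip inputs beyond the 8192-token limit
        },
        timeout=60,
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]

query_vec = embed(["What are Matryoshka embeddings?"], "query")[0]
doc_vecs = embed(["Matryoshka embeddings nest smaller vectors in larger ones."], "passage")
```

Relevance is then scored with a dot product or cosine similarity between the query vector and each passage vector.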

Model Architecture

  • Base Model: Llama 3.2 1B, fine-tuned as a retriever
  • Architecture Type: Transformer encoder
  • Layers: 16
  • Embedding Dimension: Configurable; 384, 512, 768, or 1024, up to a maximum of 2048 (see the truncation sketch after this list)
  • Retrieval Design: Bi-encoder architecture trained with contrastive learning (a loss sketch also follows)
  • Training Approach: Semi-supervised pre-training on 12M samples from public datasets and fine-tuning on 1M samples.
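Matryoshka-style training concentrates the most useful signal in the leading dimensions, so a smaller embedding is obtained by truncating the full vector and re-normalizing it. A minimal numpy sketch of that truncate-and-renormalize step, assuming embeddings are L2-normalized for cosine similarity:

```python
# Minimal sketch: deriving a smaller Matryoshka embedding by truncation.
# Assumes the model emits a 2048-d vector and retrieval uses cosine
# similarity, so the truncated vector is re-normalized to unit length.
import numpy as np

def shrink(embedding: np.ndarray, dim: int = 384) -> np.ndarray:
    """Keep the leading `dim` components and re-normalize."""
    v = np.asarray(embedding, dtype=np.float32)[:dim]
    return v / np.linalg.norm(v)

full = np.random.randn(2048).astype(np.float32)   # stand-in for a model output
small = shrink(full, 384)                         # 384-d index-ready vector
print(small.shape, float(np.linalg.norm(small)))  # (384,) 1.0
```

Storing the 384-d vector instead of the 2048-d one is what drives the storage savings cited above, at a modest cost in recall (see the benchmark table below).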
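The bi-encoder's contrastive objective pulls each query toward its positive passage and pushes it away from in-batch negatives. The InfoNCE-style sketch below illustrates the general technique only; it is not NVIDIA's exact training recipe.

```python
# Illustrative InfoNCE-style contrastive loss with in-batch negatives.
# Generic technique only; not NVIDIA's actual training configuration.
import numpy as np

def info_nce_loss(q: np.ndarray, p: np.ndarray, temperature: float = 0.05) -> float:
    """q, p: (batch, dim) L2-normalized query/passage embeddings; row i of p
    is the positive for row i of q, and the other rows act as negatives."""
    logits = (q @ p.T) / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs.diagonal().mean())     # cross-entropy on the diagonal

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 384)); q /= np.linalg.norm(q, axis=1, keepdims=True)
p = q + 0.1 * rng.standard_normal((8, 384)); p /= np.linalg.norm(p, axis=1, keepdims=True)
print(info_nce_loss(q, p))
```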

Benchmark Scores

The model has been evaluated on multiple academic benchmarks:

| Benchmark | Model | Embedding Dimension | Metric | Score |
|-----------|-------|---------------------|--------|-------|
| BeIR Benchmark (NQ, HotpotQA, FiQA, TechQA) | llama-3.2-nv-embedqa-1b-v2 | 2048 | Average Recall@5 | 68.60% |
| BeIR Benchmark (NQ, HotpotQA, FiQA, TechQA) | llama-3.2-nv-embedqa-1b-v2 | 384 | Average Recall@5 | 64.48% |
| Multilingual Capabilities (MIRACL Benchmark) | llama-3.2-nv-embedqa-1b-v2 | 2048 | Average Recall@5 | 60.75% |
| Cross-Lingual Capabilities (MLQA Benchmark) | llama-3.2-nv-embedqa-1b-v2 | 2048 | Average Recall@5 | 79.86% |
| Long Document Support (MLDR Benchmark) | llama-3.2-nv-embedqa-1b-v2 | 2048 | Average Recall@5 | 59.55% |

Note: The model demonstrates superior performance in multilingual, cross-lingual, and long-document retrieval tasks compared to other open and commercial retriever models.
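All scores above use Recall@5, commonly computed as the fraction of a query's relevant passages that appear among its top five retrieved results, averaged over all queries. A minimal sketch of that computation (function and variable names are illustrative, not from an evaluation toolkit):

```python
# Illustrative Recall@k computation; names are hypothetical.
from typing import Iterable, Sequence

def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Iterable[str], k: int = 5) -> float:
    """Fraction of a query's relevant passages found in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)

def average_recall_at_k(runs, k: int = 5) -> float:
    """Mean Recall@k over (ranked_ids, relevant_ids) pairs, one per query."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores)

runs = [(["d3", "d7", "d1", "d9", "d2"], {"d1"}),   # hit: d1 in top 5
        (["d5", "d4", "d8", "d6", "d0"], {"d2"})]   # miss: d2 absent
print(average_recall_at_k(runs))  # 0.5
```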
