nvidia/llama-3-2-nv-embedqa-1b-v2¶
Model Information¶
The nvidia/llama-3-2-nv-embedqa-1b-v2 model is optimized for multilingual and cross-lingual text question-answering retrieval. It supports long documents of up to 8192 tokens and offers dynamic embedding sizes (Matryoshka Embeddings), which can reduce the data storage footprint by up to 35x.
- Model Developer: NVIDIA
- Model Release Date: April 12, 2025
- Supported Languages:
- Primary: English (US)
- Additional Support: Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish.
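As a quick orientation, the sketch below shows one way to request embeddings from a deployed endpoint through an OpenAI-compatible /v1/embeddings API. The base URL, API-key handling, and the input_type/truncate parameters are assumptions based on typical NVIDIA retrieval NIM deployments, not details taken from this model card.

```python
# Minimal sketch: embedding a query and a passage against a deployed endpoint.
# The base_url, api_key, and extra parameters below are illustrative assumptions
# for a typical OpenAI-compatible deployment, not values from this model card.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local endpoint
    api_key="not-used",                   # placeholder; your deployment may require a real key
)

# Asymmetric retrieval models usually distinguish query vs. passage embeddings,
# so an "input_type" hint is passed per request (assumed parameter name).
query_emb = client.embeddings.create(
    model="llama-3.2-nv-embedqa-1b-v2",   # model name as listed in the benchmark table below
    input=["How long can input documents be?"],
    extra_body={"input_type": "query", "truncate": "END"},
)

passage_emb = client.embeddings.create(
    model="llama-3.2-nv-embedqa-1b-v2",
    input=["The model supports documents up to 8192 tokens."],
    extra_body={"input_type": "passage", "truncate": "END"},
)

print(len(query_emb.data[0].embedding))  # e.g. 2048 at the full embedding dimension
```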
Model Architecture¶
- Base Model: Fine-tuned Llama 3.2 1B retriever
- Architecture Type: Transformer encoder
- Layers: 16
- Embedding Dimension: Configurable (2048 maximum; 384, 512, 768, and 1024 also supported; see the truncation sketch after this list)
- Retrieval Approach: Bi-encoder architecture trained with contrastive learning
- Training Approach: Semi-supervised pre-training on 12M samples from public datasets and fine-tuning on 1M samples.
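Because the embeddings follow a Matryoshka scheme, a smaller dimension can in principle be obtained by keeping the leading components of the full 2048-dimensional vector and re-normalizing. The sketch below illustrates that idea client-side; whether a given deployment instead accepts a dimension parameter at request time is an assumption to verify against your endpoint's API.

```python
# Sketch of Matryoshka-style truncation: keep the first k components of the
# full embedding and L2-normalize so cosine similarity still behaves sensibly.
# The helper name and the client-side approach are illustrative assumptions.
import numpy as np

def truncate_embedding(vec, dim):
    """Truncate a full-size embedding (e.g. 2048-d) to `dim` and L2-normalize."""
    truncated = np.asarray(vec, dtype=np.float32)[:dim]
    norm = np.linalg.norm(truncated)
    return truncated / norm if norm > 0 else truncated

full = np.random.rand(2048)            # stand-in for a real 2048-d embedding
small = truncate_embedding(full, 384)  # one of the supported smaller sizes
print(small.shape)                     # (384,)
```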
Benchmark Scores¶
The model has been evaluated on multiple academic benchmarks:
| Benchmark | Model | Embedding Dimension | Metric | Score |
|---|---|---|---|---|
| BeIR Benchmark (NQ, HotpotQA, FiQA, TechQA) | llama-3.2-nv-embedqa-1b-v2 | 2048 | Average Recall@5 | 68.60% |
| BeIR Benchmark (NQ, HotpotQA, FiQA, TechQA) | llama-3.2-nv-embedqa-1b-v2 | 384 | Average Recall@5 | 64.48% |
| Multilingual Capabilities (MIRACL Benchmark) | llama-3.2-nv-embedqa-1b-v2 | 2048 | Average Recall@5 | 60.75% |
| Cross-Lingual Capabilities (MLQA Benchmark) | llama-3.2-nv-embedqa-1b-v2 | 2048 | Average Recall@5 | 79.86% |
| Long Document Support (MLDR Benchmark) | llama-3.2-nv-embedqa-1b-v2 | 2048 | Average Recall@5 | 59.55% |
Note: The model demonstrates superior performance in multilingual, cross-lingual, and long-document retrieval tasks compared to other open and commercial retriever models.
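For reference, Recall@5 is commonly defined as the fraction of a query's relevant documents that appear among the top 5 retrieved results, averaged over all queries; the exact averaging protocol behind the table above is assumed to follow each benchmark's standard definition. A minimal sketch:

```python
# Minimal sketch of Recall@k as commonly defined for retrieval benchmarks:
# the fraction of a query's relevant documents that appear in its top-k
# results, averaged over all queries. The document IDs below are made up.
def recall_at_k(ranked_ids, relevant_ids, k=5):
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)

queries = [
    (["d3", "d7", "d1", "d9", "d2"], {"d1", "d4"}),  # 1 of 2 relevant docs in top 5
    (["d5", "d8", "d6", "d0", "d4"], {"d5"}),        # 1 of 1 relevant doc in top 5
]
scores = [recall_at_k(ranked, rel, k=5) for ranked, rel in queries]
print(sum(scores) / len(scores))  # 0.75
```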