nvidia/llama-3-2-nv-embedqa-1b-v2¶
Model Information¶
The nvidia/llama-3-2-nv-embedqa-1b-v2 model is optimized for multilingual and cross-lingual text question-answering retrieval. It supports long documents of up to 8192 tokens and offers dynamic embedding sizes (Matryoshka Embeddings), which can reduce the data storage footprint by up to 35x.
- Model Developer: NVIDIA
- Model Release Date: April 12, 2025
- Supported Languages:
- Primary: English (US)
- Additional Support: Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish.
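As a quick orientation, the sketch below shows one way to request embeddings from a deployed endpoint through an OpenAI-compatible /v1/embeddings API. The base URL, API-key handling, and the input_type/truncate parameters are assumptions based on typical NVIDIA retrieval NIM deployments, not details taken from this model card.

```python
# Minimal sketch: embedding a query and a passage against a deployed endpoint.
# The base_url, api_key, and extra parameters below are illustrative assumptions
# for a typical OpenAI-compatible deployment, not values from this model card.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local endpoint
    api_key="not-used",                   # placeholder; your deployment may require a real key
)

# Asymmetric retrieval models usually distinguish query vs. passage embeddings,
# so an "input_type" hint is passed per request (assumed parameter name).
query_emb = client.embeddings.create(
    model="llama-3.2-nv-embedqa-1b-v2",   # model name as listed in the benchmark table below
    input=["How long can input documents be?"],
    extra_body={"input_type": "query", "truncate": "END"},
)

passage_emb = client.embeddings.create(
    model="llama-3.2-nv-embedqa-1b-v2",
    input=["The model supports documents up to 8192 tokens."],
    extra_body={"input_type": "passage", "truncate": "END"},
)

print(len(query_emb.data[0].embedding))  # e.g. 2048 at the full embedding dimension
```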
Model Architecture¶
- Base Model: Fine-tuned Llama 3.2 1B retriever
- Architecture Type: Transformer encoder
- Layers: 16
- Embedding Dimension: Configurable (2048 maximum; 384, 512, 768, and 1024 also supported; see the truncation sketch after this list)
- Retrieval Approach: Bi-encoder architecture trained with contrastive learning
- Training Approach: Semi-supervised pre-training on 12M samples from public datasets and fine-tuning on 1M samples.
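Because the embeddings follow a Matryoshka scheme, a smaller dimension can in principle be obtained by keeping the leading components of the full 2048-dimensional vector and re-normalizing. The sketch below illustrates that idea client-side; whether a given deployment instead accepts a dimension parameter at request time is an assumption to verify against your endpoint's API.

```python
# Sketch of Matryoshka-style truncation: keep the first k components of the
# full embedding and L2-normalize so cosine similarity still behaves sensibly.
# The helper name and the client-side approach are illustrative assumptions.
import numpy as np

def truncate_embedding(vec, dim):
    """Truncate a full-size embedding (e.g. 2048-d) to `dim` and L2-normalize."""
    truncated = np.asarray(vec, dtype=np.float32)[:dim]
    norm = np.linalg.norm(truncated)
    return truncated / norm if norm > 0 else truncated

full = np.random.rand(2048)            # stand-in for a real 2048-d embedding
small = truncate_embedding(full, 384)  # one of the supported smaller sizes
print(small.shape)                     # (384,)
```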
Benchmark Scores¶
The model has been evaluated on multiple academic benchmarks:
| Benchmark | Model | Embedding Dimension | Metric | Score |
|---|---|---|---|---|
| BeIR Benchmark (NQ, HotpotQA, FiQA, TechQA) | llama-3.2-nv-embedqa-1b-v2 | 2048 | Average Recall@5 | 68.60% |
| BeIR Benchmark (NQ, HotpotQA, FiQA, TechQA) | llama-3.2-nv-embedqa-1b-v2 | 384 | Average Recall@5 | 64.48% |
| Multilingual Capabilities (MIRACL Benchmark) | llama-3.2-nv-embedqa-1b-v2 | 2048 | Average Recall@5 | 60.75% |
| Cross-Lingual Capabilities (MLQA Benchmark) | llama-3.2-nv-embedqa-1b-v2 | 2048 | Average Recall@5 | 79.86% |
| Long Document Support (MLDR Benchmark) | llama-3.2-nv-embedqa-1b-v2 | 2048 | Average Recall@5 | 59.55% |
Note: The model demonstrates superior performance in multilingual, cross-lingual, and long-document retrieval tasks compared to other open and commercial retriever models.
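For reference, Recall@5 is commonly defined as the fraction of a query's relevant documents that appear among the top 5 retrieved results, averaged over all queries; the exact averaging protocol behind the table above is assumed to follow each benchmark's standard definition. A minimal sketch:

```python
# Minimal sketch of Recall@k as commonly defined for retrieval benchmarks:
# the fraction of a query's relevant documents that appear in its top-k
# results, averaged over all queries. The document IDs below are made up.
def recall_at_k(ranked_ids, relevant_ids, k=5):
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)

queries = [
    (["d3", "d7", "d1", "d9", "d2"], {"d1", "d4"}),  # 1 of 2 relevant docs in top 5
    (["d5", "d8", "d6", "d0", "d4"], {"d5"}),        # 1 of 1 relevant doc in top 5
]
scores = [recall_at_k(ranked, rel, k=5) for ranked, rel in queries]
print(sum(scores) / len(scores))  # 0.75
```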