nvidia/llama-3.2-nv-rerankqa-1b-v2¶
Model Information¶
nvidia/llama-3.2-nv-rerankqa-1b-v2
is a reranking model optimized for retrieval-augmented generation (RAG) workflows. Built on top of the LLaMA 3.2 architecture and fine-tuned by NVIDIA, it is designed to evaluate the relevance of candidate documents to a given query using a cross-encoder approach. The model supports input sequences up to 8192 tokens and is particularly effective in multilingual and cross-lingual question-answering retrieval contexts.
- Model Developer: NVIDIA
- Model Release Date: April 19, 2025
- Supported Languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish
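In a RAG pipeline, the reranker is typically exposed as a NeMo Retriever / NIM reranking microservice that scores candidate passages against a query. The snippet below is a minimal sketch of such a call; the endpoint URL, payload fields, and response shape are assumptions made for illustration and should be checked against the documentation of the actual deployment.

```python
# Minimal sketch of calling a reranking endpoint that serves this model.
# The URL, payload fields, and response shape are assumptions for illustration;
# verify them against the docs of the actual NIM / NeMo Retriever deployment.
import requests

RERANK_URL = "http://localhost:8000/v1/ranking"  # hypothetical local endpoint

payload = {
    "model": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
    "query": {"text": "Which languages does the reranker support?"},
    "passages": [
        {"text": "The model supports 26 languages, including English and Japanese."},
        {"text": "Rerankers assign a relevance score to each candidate passage."},
    ],
}

response = requests.post(RERANK_URL, json=payload, timeout=30)
response.raise_for_status()

# Assumed response shape: a list of rankings sorted by descending relevance.
for item in response.json().get("rankings", []):
    print(item["index"], item["logit"])
```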
Model Architecture¶
The nvidia/llama-3.2-nv-rerankqa-1b-v2
model is built on the Llama 3.2 1B backbone, featuring:
- 16 transformer layers
- Embedding (hidden) size of 2048
It is fine-tuned using supervised contrastive learning on a mixture of multilingual datasets. As a cross-encoder, it encodes each query-passage pair jointly and outputs a single relevance score rather than standalone embeddings, which allows it to capture fine-grained interactions between the query and the candidate text. The instruction-tuning approach allows the model to adapt to specific reranking tasks through natural language prompts.
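For intuition, the sketch below shows cross-encoder scoring at the level of query-passage pairs. Loading this checkpoint directly with Hugging Face transformers (via AutoModelForSequenceClassification) is an assumption made for illustration; the supported serving path is NVIDIA's NeMo Retriever / NIM reranking microservice.

```python
# Illustrative cross-encoder scoring loop. Loading this checkpoint with plain
# Hugging Face transformers is an assumption; the supported serving path is
# NVIDIA's NeMo Retriever / NIM reranking microservice.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "nvidia/llama-3.2-nv-rerankqa-1b-v2"  # assumed HF-compatible checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

query = "How do I rotate an API key?"
passages = [
    "API keys can be rotated from the account settings page.",
    "The service accepts input sequences of up to 8192 tokens.",
]

# Cross-encoder: each (query, passage) pair is encoded jointly and mapped to a
# single relevance logit, so scores are only comparable within the same query.
inputs = tokenizer(
    [query] * len(passages),
    passages,
    padding=True,
    truncation=True,
    max_length=8192,
    return_tensors="pt",
)
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)

# Sort candidates by descending relevance score.
for passage, score in sorted(zip(passages, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:+.3f}  {passage}")
```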
Benchmark Scores¶
nvidia/llama-3.2-nv-rerankqa-1b-v2
is evaluated on reranking performance for English and multilingual QA retrieval tasks. Placed after a first-stage retriever, it consistently improves retrieval quality in RAG systems, as shown in the table below.
| Task | Metric | Baseline | Reranker |
|---|---|---|---|
| English QA | Recall@5 | 78.2% | 87.5% |
| English QA | NDCG@5 | 72.6% | 84.0% |
| Multilingual | Recall@5 | 65.1% | 77.8% |
| Multilingual | NDCG@5 | 60.4% | 75.3% |
| Open-domain | MRR@10 | 49.3% | 63.7% |
Metrics are based on NeMo Retriever evaluations and RAG pipeline benchmarks.
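For reference, the sketch below implements the standard definitions of the metrics used in the table (Recall@k, binary-relevance NDCG@k, and MRR@k). It is illustrative only and is not NVIDIA's evaluation harness.

```python
# Standard-definition helpers for the metrics in the table above; these are
# generic implementations, not NVIDIA's evaluation code.
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k ranking."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def mrr_at_k(ranked_ids, relevant_ids, k):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Example: documents d2 and d5 are relevant, and the reranker returned d5 first.
print(recall_at_k(["d5", "d1", "d2"], {"d2", "d5"}, k=3))  # 1.0
print(mrr_at_k(["d5", "d1", "d2"], {"d2", "d5"}, k=10))    # 1.0
print(ndcg_at_k(["d5", "d1", "d2"], {"d2", "d5"}, k=5))    # ~0.92
```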