
nvidia/llama-3.2-nv-rerankqa-1b-v2

Model Information

nvidia/llama-3.2-nv-rerankqa-1b-v2 is a reranking model optimized for retrieval-augmented generation (RAG) workflows. Built on top of the LLaMA 3.2 architecture and fine-tuned by NVIDIA, it is designed to evaluate the relevance of candidate documents to a given query using a cross-encoder approach. The model supports input sequences up to 8192 tokens and is particularly effective in multilingual and cross-lingual question-answering retrieval contexts.

  • Model Developer: NVIDIA
  • Model Release Date: April 19, 2025
  • Supported Languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish
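In a RAG pipeline, the reranker receives one query and a list of candidate passages and returns a relevance score per passage. A minimal sketch of how such a request could be assembled for the NVIDIA NIM reranking endpoint; the URL and the exact payload schema here are assumptions based on the NIM API conventions, so verify them against the official documentation before use:

```python
import json

# Hypothetical endpoint (assumption -- check the NVIDIA NIM docs for the exact URL).
INVOKE_URL = "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking"

def build_rerank_payload(query: str, passages: list[str]) -> dict:
    """Build the request body: one query scored against each candidate passage.
    Field names follow the assumed NIM reranking schema."""
    return {
        "model": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
        "query": {"text": query},
        "passages": [{"text": p} for p in passages],
    }

payload = build_rerank_payload(
    "What is the memory bandwidth of the H100?",
    ["The H100 SXM offers 3.35 TB/s of memory bandwidth.",
     "The A100 GPU was released in 2020."],
)
print(json.dumps(payload, indent=2))

# To send (requires an API key):
# requests.post(INVOKE_URL,
#               headers={"Authorization": f"Bearer {API_KEY}"},
#               json=payload)
```

The response would contain one relevance score per passage, which the pipeline uses to reorder the first-stage retrieval results before they reach the generator.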

Model Architecture

The nvidia/llama-3.2-nv-rerankqa-1b-v2 model is built on the LLaMA 3.2 1B architecture, featuring:

  • 16 transformer layers
  • Embedding size of 2048

It is fine-tuned using supervised contrastive learning on a mixture of multilingual datasets. As a cross-encoder, it consumes a query–passage pair jointly and outputs a single relevance score, rather than producing standalone text embeddings. The instruction-tuning approach allows the model to adapt to specific reranking tasks through natural language prompts.
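The exact training objective is not spelled out here; a minimal sketch of a supervised contrastive (InfoNCE-style) loss over reranker scores, assuming one labeled positive and several negative passages per query, with temperature a hypothetical hyperparameter:

```python
import numpy as np

def contrastive_loss(scores: np.ndarray, positive_idx: int = 0,
                     temperature: float = 0.05) -> float:
    """InfoNCE-style loss: softmax over the relevance scores for one query,
    pushing the positive passage's score above the negatives'."""
    logits = scores / temperature
    logits -= logits.max()                           # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[positive_idx])

# Toy example: the first score belongs to the relevant passage, so the
# loss is near zero; swapping positive and negative makes it large.
scores = np.array([4.1, 1.2, -0.3, 0.8])
loss = contrastive_loss(scores)
```

Minimizing this loss trains the cross-encoder to assign relevant passages higher scores than distractors for the same query, which is exactly the ordering a reranker must produce at inference time.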


Benchmark Scores

nvidia/llama-3.2-nv-rerankqa-1b-v2 is evaluated on reranking performance for English and multilingual QA tasks. Across the benchmarks below, placing the reranker after a first-stage retriever improves every reported metric by 9 to 15 points over the retriever-only baseline.

Task          Metric     Baseline   Reranker
English QA    Recall@5   78.2%      87.5%
English QA    NDCG@5     72.6%      84.0%
Multilingual  Recall@5   65.1%      77.8%
Multilingual  NDCG@5     60.4%      75.3%
Open-domain   MRR@10     49.3%      63.7%

Metrics based on NeMo Retriever evaluations and RAG pipeline benchmarks.
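The absolute and relative gains implied by the table above can be tabulated directly from the reported figures:

```python
# Benchmark figures from the table above (baseline vs. with reranker), as fractions.
results = {
    ("English QA", "Recall@5"):   (0.782, 0.875),
    ("English QA", "NDCG@5"):     (0.726, 0.840),
    ("Multilingual", "Recall@5"): (0.651, 0.778),
    ("Multilingual", "NDCG@5"):   (0.604, 0.753),
    ("Open-domain", "MRR@10"):    (0.493, 0.637),
}

for (task, metric), (base, rerank) in results.items():
    abs_gain = (rerank - base) * 100         # gain in percentage points
    rel_gain = (rerank - base) / base * 100  # relative improvement
    print(f"{task:12s} {metric:9s} +{abs_gain:.1f} pts ({rel_gain:.1f}% rel.)")
```

The largest relative improvement comes on the weakest baseline (open-domain MRR@10), which matches the usual pattern: reranking helps most where first-stage retrieval leaves the most headroom.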
