
nvidia/llama-3.2-nv-rerankqa-1b-v2

Model Information

nvidia/llama-3.2-nv-rerankqa-1b-v2 is a reranking model optimized for retrieval-augmented generation (RAG) workflows. Built on top of the LLaMA 3.2 architecture and fine-tuned by NVIDIA, it is designed to evaluate the relevance of candidate documents to a given query using a cross-encoder approach. The model supports input sequences up to 8192 tokens and is particularly effective in multilingual and cross-lingual question-answering retrieval contexts.

  • Model Developer: NVIDIA
  • Model Release Date: April 19, 2025
  • Supported Languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish
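In a RAG pipeline, the reranker receives one query and a list of candidate passages and returns a relevance score per passage. A minimal sketch of how such a request could be assembled for the NVIDIA NIM reranking endpoint; the URL and the exact payload schema here are assumptions based on the NIM API conventions, so verify them against the official documentation before use:

```python
import json

# Hypothetical endpoint (assumption -- check the NVIDIA NIM docs for the exact URL).
INVOKE_URL = "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking"

def build_rerank_payload(query: str, passages: list[str]) -> dict:
    """Build the request body: one query scored against each candidate passage.
    Field names follow the assumed NIM reranking schema."""
    return {
        "model": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
        "query": {"text": query},
        "passages": [{"text": p} for p in passages],
    }

payload = build_rerank_payload(
    "What is the memory bandwidth of the H100?",
    ["The H100 SXM offers 3.35 TB/s of memory bandwidth.",
     "The A100 GPU was released in 2020."],
)
print(json.dumps(payload, indent=2))

# To send (requires an API key):
# requests.post(INVOKE_URL,
#               headers={"Authorization": f"Bearer {API_KEY}"},
#               json=payload)
```

The response would contain one relevance score per passage, which the pipeline uses to reorder the first-stage retrieval results before they reach the generator.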

Model Architecture

The nvidia/llama-3.2-nv-rerankqa-1b-v2 model is built on the LLaMA 3.2 1B architecture, featuring:

  • 16 transformer layers
  • Embedding size of 2048

It is fine-tuned using supervised contrastive learning on a mixture of multilingual datasets. As a cross-encoder, it consumes a query–passage pair jointly and outputs a single relevance score, rather than producing standalone text embeddings. The instruction-tuning approach allows the model to adapt to specific reranking tasks through natural language prompts.
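The exact training objective is not spelled out here; a minimal sketch of a supervised contrastive (InfoNCE-style) loss over reranker scores, assuming one labeled positive and several negative passages per query, with temperature a hypothetical hyperparameter:

```python
import numpy as np

def contrastive_loss(scores: np.ndarray, positive_idx: int = 0,
                     temperature: float = 0.05) -> float:
    """InfoNCE-style loss: softmax over the relevance scores for one query,
    pushing the positive passage's score above the negatives'."""
    logits = scores / temperature
    logits -= logits.max()                           # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[positive_idx])

# Toy example: the first score belongs to the relevant passage, so the
# loss is near zero; swapping positive and negative makes it large.
scores = np.array([4.1, 1.2, -0.3, 0.8])
loss = contrastive_loss(scores)
```

Minimizing this loss trains the cross-encoder to assign relevant passages higher scores than distractors for the same query, which is exactly the ordering a reranker must produce at inference time.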


Benchmark Scores

nvidia/llama-3.2-nv-rerankqa-1b-v2 is evaluated on reranking performance for English and multilingual QA tasks. Across the benchmarks below, placing the reranker after a first-stage retriever improves every reported metric by 9 to 15 points over the retriever-only baseline.

Task          Metric     Baseline   Reranker
English QA    Recall@5   78.2%      87.5%
English QA    NDCG@5     72.6%      84.0%
Multilingual  Recall@5   65.1%      77.8%
Multilingual  NDCG@5     60.4%      75.3%
Open-domain   MRR@10     49.3%      63.7%

Metrics based on NeMo Retriever evaluations and RAG pipeline benchmarks.
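The absolute and relative gains implied by the table above can be tabulated directly from the reported figures:

```python
# Benchmark figures from the table above (baseline vs. with reranker), as fractions.
results = {
    ("English QA", "Recall@5"):   (0.782, 0.875),
    ("English QA", "NDCG@5"):     (0.726, 0.840),
    ("Multilingual", "Recall@5"): (0.651, 0.778),
    ("Multilingual", "NDCG@5"):   (0.604, 0.753),
    ("Open-domain", "MRR@10"):    (0.493, 0.637),
}

for (task, metric), (base, rerank) in results.items():
    abs_gain = (rerank - base) * 100         # gain in percentage points
    rel_gain = (rerank - base) / base * 100  # relative improvement
    print(f"{task:12s} {metric:9s} +{abs_gain:.1f} pts ({rel_gain:.1f}% rel.)")
```

The largest relative improvement comes on the weakest baseline (open-domain MRR@10), which matches the usual pattern: reranking helps most where first-stage retrieval leaves the most headroom.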
