intfloat/e5-mistral-7b-instruct¶
Model Information¶
intfloat/e5-mistral-7b-instruct is a 7.3B-parameter instruction-tuned embedding model built on the Mistral-7B-v0.1 architecture. It produces high-quality text embeddings for tasks such as passage ranking, retrieval, and semantic similarity, primarily in English. The model accepts input sequences of up to 4096 tokens and can be steered with natural language instructions, which makes it adaptable to a wide range of tasks.
- Model Developer: intfloat
- Model Release Date: January 2024
- Supported Languages: although fine-tuned on a mixture of multilingual datasets, the model is optimized primarily for English. For robust multilingual support, consider the multilingual-e5-large model instead.
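The snippet below is a minimal sketch of this workflow, following the usage pattern documented on the model's Hugging Face page: queries are prefixed with an `Instruct: ...\nQuery: ...` instruction, the embedding is read from the final (EOS) token, and similarity is the dot product of L2-normalized vectors. The example query and passage strings are placeholders.

```python
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoModel, AutoTokenizer


def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    """Take the hidden state of each sequence's final (EOS) token as its embedding."""
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_size = last_hidden_states.shape[0]
    return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


# Queries carry a task instruction as a natural-language prefix; passages do not.
task = "Given a web search query, retrieve relevant passages that answer the query"
queries = [f"Instruct: {task}\nQuery: how much protein should a female eat"]
passages = ["As a general guideline, adult women need about 46 grams of protein per day ..."]

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-mistral-7b-instruct")
# A 7B model: expect to need a GPU; torch_dtype=torch.float16 halves memory use.
model = AutoModel.from_pretrained("intfloat/e5-mistral-7b-instruct", torch_dtype=torch.float16)

# Tokenize to at most 4095 tokens, reserving one slot for the EOS token
# that last-token pooling depends on, then pad to a uniform batch.
batch = tokenizer(queries + passages, max_length=4095, truncation=True,
                  padding=False, return_attention_mask=False)
batch["input_ids"] = [ids + [tokenizer.eos_token_id] for ids in batch["input_ids"]]
batch = tokenizer.pad(batch, padding=True, return_attention_mask=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

embeddings = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit vectors -> dot product = cosine
scores = embeddings[:1] @ embeddings[1:].T        # query-vs-passage similarity
print(scores)
```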
Model Architecture¶
The intfloat/e5-mistral-7b-instruct model uses the Mistral-7B-v0.1 architecture, featuring:
- 32 transformer layers
- Embedding size of 4096
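Both figures can be verified against the published configuration (a quick sketch; requires access to the Hugging Face Hub):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("intfloat/e5-mistral-7b-instruct")
print(config.num_hidden_layers)  # 32 transformer layers
print(config.hidden_size)        # 4096-dimensional hidden/embedding size
```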
It is fine-tuned using supervised contrastive learning on a mixture of multilingual datasets, enabling it to produce dense and semantically rich text embeddings. The instruction-tuning approach allows the model to adapt to specific tasks through natural language prompts.
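Because the task is expressed as a natural language prefix rather than baked into the weights, the same checkpoint can be re-targeted by editing a single string. A minimal sketch using sentence-transformers (the prompt follows the model card's `Instruct: ...\nQuery: ` convention; the task wording and texts are illustrative):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-mistral-7b-instruct")
model.max_seq_length = 4096  # the model's maximum input length

# Swapping this instruction re-purposes the model, e.g. for duplicate
# detection or semantic similarity, without any retraining.
task = "Given a web search query, retrieve relevant passages that answer the query"

query_embeddings = model.encode(
    ["how much protein should a female eat"],
    prompt=f"Instruct: {task}\nQuery: ",  # prepended to each input text
    normalize_embeddings=True,
)
passage_embeddings = model.encode(
    ["As a general guideline, adult women need about 46 grams of protein per day ..."],
    normalize_embeddings=True,  # passages are embedded without an instruction
)
print(query_embeddings @ passage_embeddings.T)  # cosine similarity matrix
```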
Benchmark Scores¶
A comparison of e5-mistral-7b-instruct with other E5 models on key benchmarks:
| Model | BEIR | MTEB | Notes |
|---|---|---|---|
| e5-base | 51.5 | 56.7 | English baseline. |
| e5-large | 54.2 | 58.7 | Larger model, better accuracy. |
| e5-mistral | 56.9 | 60.3 | Instruction-tuned; ranked 3rd on the multilingual leaderboard. |

BEIR = retrieval performance averaged across 18 datasets. MTEB = average score across classification, retrieval, and clustering tasks.