intfloat/e5-mistral-7b-instruct

Model Information

intfloat/e5-mistral-7b-instruct is a 7.3B-parameter instruction-tuned embedding model built on the Mistral-7B-v0.1 architecture. It is designed to generate high-quality text embeddings, particularly for English-language tasks such as passage ranking, retrieval, and semantic similarity. The model supports input sequences of up to 4096 tokens, and it can be steered toward a specific task by prepending a natural language instruction to each query (a template sketch follows the list below).

  • Model Developer: intfloat
  • Model Release Date: January 2024
  • Supported Languages: While the model has been fine-tuned on a mixture of multilingual datasets, it is primarily optimized for English-language tasks. For applications requiring robust multilingual support, consider using the multilingual-e5-large model.
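
In practice, only the query side carries the instruction; passages are embedded as-is. A minimal sketch of the query template documented on the model card, with an illustrative task description:

```python
def get_detailed_instruct(task_description: str, query: str) -> str:
    # Queries carry a one-sentence task description; passages do not.
    return f"Instruct: {task_description}\nQuery: {query}"

task = "Given a web search query, retrieve relevant passages that answer the query"
print(get_detailed_instruct(task, "how much protein should a female eat"))
# Instruct: Given a web search query, retrieve relevant passages that answer the query
# Query: how much protein should a female eat
```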

Model Architecture

The intfloat/e5-mistral-7b-instruct model utilizes the Mistral-7B-v0.1 architecture, featuring:

  • 32 transformer layers
  • Embedding size of 4096
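
Both figures can be read off the published configuration; a quick check, assuming the transformers library is installed:

```python
from transformers import AutoConfig

# Download only the config, not the 7B weights.
cfg = AutoConfig.from_pretrained("intfloat/e5-mistral-7b-instruct")
print(cfg.num_hidden_layers)  # 32 transformer layers
print(cfg.hidden_size)        # 4096-dimensional hidden states / embeddings
```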

It is fine-tuned using supervised contrastive learning on a mixture of multilingual datasets, enabling it to produce dense and semantically rich text embeddings. The instruction-tuning approach allows the model to adapt to specific tasks through natural language prompts.
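
Below is a minimal retrieval sketch with the transformers library, following the usage pattern shown on the model card: each embedding is the hidden state of the final EOS token (last-token pooling), then L2-normalized so cosine similarity reduces to a dot product. The query and passage texts are illustrative.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "intfloat/e5-mistral-7b-instruct"

def last_token_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Return the hidden state of each sequence's last non-padding token."""
    left_padded = (mask[:, -1].sum() == mask.shape[0])
    if left_padded:
        return hidden[:, -1]
    lengths = mask.sum(dim=1) - 1
    return hidden[torch.arange(hidden.shape[0], device=hidden.device), lengths]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID, torch_dtype=torch.float16)  # fp16 to halve memory
model.eval()

texts = [
    # Query: instruction + query text (see the template above).
    "Instruct: Given a web search query, retrieve relevant passages "
    "that answer the query\nQuery: how much protein should a female eat",
    # Passage: embedded as-is, without an instruction.
    "General guidelines suggest roughly 46 grams of protein per day for "
    "adult women, though individual needs vary with activity level.",
]

# Tokenize to one token under the 4096 limit, then append EOS so the final
# token of every sequence is the EOS token whose hidden state gets pooled.
batch = tokenizer(texts, max_length=4095, truncation=True,
                  padding=False, return_attention_mask=False)
batch["input_ids"] = [ids + [tokenizer.eos_token_id] for ids in batch["input_ids"]]
batch = tokenizer.pad(batch, padding=True, return_attention_mask=True,
                      return_tensors="pt")

with torch.no_grad():
    out = model(**batch)

embeddings = last_token_pool(out.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit-length 4096-dim vectors
score = embeddings[0] @ embeddings[1]             # cosine similarity
print(f"query-passage similarity: {score.item():.4f}")
```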


Benchmark Scores

A comparison of e5-mistral-7b-instruct with other E5 models on key benchmarks:

Model        BEIR   MTEB   Notes
e5-base      51.5   56.7   English baseline.
e5-large     54.2   58.7   Larger model, better accuracy.
e5-mistral   56.9   60.3   Instruction-tuned; ranked 3rd on multilingual benchmarks.

BEIR = average retrieval score across 18 datasets. MTEB = average score across classification, retrieval, and clustering tasks.

