intfloat/e5-mistral-7b-instruct¶
Model Information¶
intfloat/e5-mistral-7b-instruct is a 7.3B-parameter instruction-tuned embedding model built on the Mistral-7B-v0.1 architecture. It produces high-quality text embeddings for tasks such as passage ranking, retrieval, and semantic similarity, primarily in English. The model accepts input sequences of up to 4096 tokens and can be steered with natural language instructions, which makes it adaptable to a wide range of tasks.
- Model Developer: intfloat
- Model Release Date: January 2024
- Supported Languages: although fine-tuned on a mixture of multilingual datasets, the model is optimized primarily for English. For robust multilingual support, consider the multilingual-e5-large model instead.
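The snippet below is a minimal sketch of this workflow, following the usage pattern documented on the model's Hugging Face page: queries are prefixed with an `Instruct: ...\nQuery: ...` instruction, the embedding is read from the final (EOS) token, and similarity is the dot product of L2-normalized vectors. The example query and passage strings are placeholders.

```python
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoModel, AutoTokenizer


def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    """Take the hidden state of each sequence's final (EOS) token as its embedding."""
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_size = last_hidden_states.shape[0]
    return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


# Queries carry a task instruction as a natural-language prefix; passages do not.
task = "Given a web search query, retrieve relevant passages that answer the query"
queries = [f"Instruct: {task}\nQuery: how much protein should a female eat"]
passages = ["As a general guideline, adult women need about 46 grams of protein per day ..."]

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-mistral-7b-instruct")
# A 7B model: expect to need a GPU; torch_dtype=torch.float16 halves memory use.
model = AutoModel.from_pretrained("intfloat/e5-mistral-7b-instruct", torch_dtype=torch.float16)

# Tokenize to at most 4095 tokens, reserving one slot for the EOS token
# that last-token pooling depends on, then pad to a uniform batch.
batch = tokenizer(queries + passages, max_length=4095, truncation=True,
                  padding=False, return_attention_mask=False)
batch["input_ids"] = [ids + [tokenizer.eos_token_id] for ids in batch["input_ids"]]
batch = tokenizer.pad(batch, padding=True, return_attention_mask=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

embeddings = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit vectors -> dot product = cosine
scores = embeddings[:1] @ embeddings[1:].T        # query-vs-passage similarity
print(scores)
```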
Model Architecture¶
The intfloat/e5-mistral-7b-instruct model uses the Mistral-7B-v0.1 architecture, featuring:
- 32 transformer layers
- Embedding size of 4096
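Both figures can be verified against the published configuration (a quick sketch; requires access to the Hugging Face Hub):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("intfloat/e5-mistral-7b-instruct")
print(config.num_hidden_layers)  # 32 transformer layers
print(config.hidden_size)        # 4096-dimensional hidden/embedding size
```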
It is fine-tuned using supervised contrastive learning on a mixture of multilingual datasets, enabling it to produce dense and semantically rich text embeddings. The instruction-tuning approach allows the model to adapt to specific tasks through natural language prompts.
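Because the task is expressed as a natural language prefix rather than baked into the weights, the same checkpoint can be re-targeted by editing a single string. A minimal sketch using sentence-transformers (the prompt follows the model card's `Instruct: ...\nQuery: ` convention; the task wording and texts are illustrative):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-mistral-7b-instruct")
model.max_seq_length = 4096  # the model's maximum input length

# Swapping this instruction re-purposes the model, e.g. for duplicate
# detection or semantic similarity, without any retraining.
task = "Given a web search query, retrieve relevant passages that answer the query"

query_embeddings = model.encode(
    ["how much protein should a female eat"],
    prompt=f"Instruct: {task}\nQuery: ",  # prepended to each input text
    normalize_embeddings=True,
)
passage_embeddings = model.encode(
    ["As a general guideline, adult women need about 46 grams of protein per day ..."],
    normalize_embeddings=True,  # passages are embedded without an instruction
)
print(query_embeddings @ passage_embeddings.T)  # cosine similarity matrix
```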
Benchmark Scores¶
A comparison of e5-mistral-7b-instruct with other E5 models on key benchmarks:
| Model | BEIR | MTEB | Notes |
|---|---|---|---|
| e5-base | 51.5 | 56.7 | English baseline. |
| e5-large | 54.2 | 58.7 | Larger model, better accuracy. |
| e5-mistral | 56.9 | 60.3 | Instruction-tuned; ranked 3rd on the multilingual leaderboard. |

BEIR = retrieval performance averaged across 18 datasets. MTEB = average score across classification, retrieval, and clustering tasks.