intfloat/multilingual-e5-large

Model Information

intfloat/multilingual-e5-large is a multilingual text embedding model designed for tasks such as semantic search, information retrieval, and text similarity. Built upon the XLM-RoBERTa architecture, it has been continually trained on a mixture of multilingual datasets, enabling it to support a wide range of languages. The model produces 1024-dimensional embeddings and is optimized for high performance across various benchmarks.

  • Model Developer: Intfloat
  • Model Release Date: Mid-2023
  • Supported Languages: The model supports the 100 languages inherited from XLM-RoBERTa. Performance varies by language: high-resource languages such as English typically perform best, while low-resource languages may see degraded embedding quality.
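
To make the expected input format concrete, here is a minimal usage sketch assuming the sentence-transformers package is installed (the model can also be used through transformers directly; see the pooling sketch further below). E5 models expect every input text to carry a "query: " or "passage: " prefix.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large")

# E5 models expect a "query: " or "passage: " prefix on every input text.
queries = ["query: how much protein should a female eat"]
passages = [
    "passage: As a general guideline, the CDC's average requirement of protein "
    "for women ages 19 to 70 is 46 grams per day.",
    "passage: The Eiffel Tower is located in Paris, France.",
]

q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)

# With L2-normalized embeddings, the dot product equals cosine similarity.
scores = q_emb @ p_emb.T
print(scores)  # the protein passage should score higher than the unrelated one
```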

Model Architecture

  • Base Model: XLM-RoBERTa-large
  • Number of Layers: 24
  • Embedding Size: 1024
  • Training Objective: Contrastive learning on multilingual datasets to produce high-quality text embeddings.
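
The 1024-dimensional embeddings can also be produced from the raw transformers checkpoint; the sketch below uses average pooling over the last hidden state followed by L2 normalization, the pooling scheme documented for the E5 family. It assumes torch and transformers are installed.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Average pooling over token embeddings, ignoring padding positions.
def average_pool(last_hidden_state, attention_mask):
    masked = last_hidden_state.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return masked.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-large")
model = AutoModel.from_pretrained("intfloat/multilingual-e5-large")

texts = [
    "query: what is the capital of France",
    "passage: Paris is the capital of France.",
]
batch = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

embeddings = average_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # each vector has 1024 dimensions
print(embeddings.shape)                           # torch.Size([2, 1024])
```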

Benchmark Scores

Mr. TyDi Benchmark (Mean Reciprocal Rank @10)

Model                    Avg MRR@10   ar     bn     en     fi     id     ja     ko     ru     sw     te     th
BM25                     33.3         36.7   41.3   15.1   28.8   38.2   21.7   28.1   32.9   39.6   42.4   41.7
mDPR                     16.7         26.0   25.8   16.2   11.3   14.6   18.1   21.9   18.5    7.3   10.6   13.5
BM25 + mDPR              41.7         49.1   53.5   28.4   36.5   45.5   35.5   36.2   42.7   40.5   42.0   49.2
multilingual-e5-small    64.4         71.5   66.3   54.5   57.7   63.2   55.4   54.3   60.8   65.4   89.1   70.1
multilingual-e5-base     65.9         72.3   65.0   58.5   60.8   64.9   56.6   55.8   62.7   69.0   86.6   72.7
multilingual-e5-large    70.5         77.5   73.2   60.8   66.8   68.5   62.5   61.6   65.8   72.7   90.2   76.2

Note: Scores are based on the Mr. TyDi benchmark, which evaluates multilingual information retrieval performance.
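
For context, MRR@10 takes, for each query, the reciprocal rank of the first relevant document within the top 10 retrieved results (0 if none appears) and averages this over all queries; the table reports the value multiplied by 100. A minimal sketch with hypothetical ranked lists:

```python
def mrr_at_10(ranked_lists, relevant_sets):
    """Mean Reciprocal Rank @10: for each query, take 1/rank of the first
    relevant document among the top 10 results (0 if none), then average."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked[:10], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Hypothetical example: first query hits at rank 2, second at rank 1.
print(mrr_at_10([["d3", "d7"], ["d1", "d9"]], [{"d7"}, {"d1"}]))  # (0.5 + 1.0) / 2 = 0.75
```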

