# XLM-RoBERTa

## Introduction

XLM-R (XLM-RoBERTa), introduced in *Unsupervised Cross-lingual Representation Learning at Scale*, is a scaled cross-lingual sentence encoder. It is trained on 2.5TB of data in 100 languages, filtered from CommonCrawl. XLM-R achieves state-of-the-art results on multiple cross-lingual benchmarks.
## Pre-trained models

Model | Description | #params | vocab size | Download
---|---|---|---|---
`xlmr.base.v0` | XLM-R using the BERT-base architecture | 250M | 250k | xlm.base.v0.tar.gz
`xlmr.large.v0` | XLM-R using the BERT-large architecture | 560M | 250k | xlm.large.v0.tar.gz
(Note: the above models are still under training; we will update the weights once they are fully trained. The results below are based on the above checkpoints.)
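A minimal usage sketch is shown below: it loads one of the extracted checkpoints with fairseq's RoBERTa hub interface and extracts sentence features. The local path is a placeholder, and it assumes the downloaded archive unpacks to a directory containing `model.pt` and the shared sentencepiece vocabulary.

```python
from fairseq.models.roberta import XLMRModel

# Load an extracted checkpoint (the path is a placeholder for wherever the
# downloaded archive from the table above was unpacked).
xlmr = XLMRModel.from_pretrained(
    '/path/to/xlmr.large.v0',
    checkpoint_file='model.pt',
    bpe='sentencepiece',
)
xlmr.eval()  # disable dropout so feature extraction is deterministic

# The same model and shared 250k sentencepiece vocabulary cover all 100 languages.
en_tokens = xlmr.encode('Hello world!')
zh_tokens = xlmr.encode('你好，世界')

# Final-layer representations, shape (1, num_tokens, hidden_dim).
en_features = xlmr.extract_features(en_tokens)
zh_features = xlmr.extract_features(zh_tokens)
print(en_features.shape, zh_features.shape)
```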
## Results

**XNLI (Conneau et al., 2018):**

Model | average | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
`roberta.large.mnli` (TRANSLATE-TEST) | 77.8 | 91.3 | 82.9 | 84.3 | 81.2 | 81.7 | 83.1 | 78.3 | 76.8 | 76.6 | 74.2 | 74.1 | 77.5 | 70.9 | 66.7 | 66.8
`xlmr.large.v0` (TRANSLATE-TRAIN-ALL) | 82.4 | 88.7 | 85.2 | 85.6 | 84.6 | 83.6 | 85.5 | 82.4 | 81.6 | 80.9 | 83.4 | 80.9 | 83.3 | 79.8 | 75.9 | 74.3
**MLQA (Lewis et al., 2019)** (F1 / EM):

Model | average | en | es | de | ar | hi | vi | zh
---|---|---|---|---|---|---|---|---
BERT-large | | 80.2 / 67.4 | | | | | |
mBERT | 57.7 / 41.6 | 77.7 / 65.2 | 64.3 / 46.6 | 57.9 / 44.3 | 45.7 / 29.8 | 43.8 / 29.7 | 57.1 / 38.6 | 57.5 / 37.3
`xlmr.large.v0` | 70.0 / 52.2 | 80.1 / 67.7 | 73.2 / 55.1 | 68.3 / 53.7 | 62.8 / 43.7 | 68.3 / 51.0 | 70.5 / 50.1 | 67.1 / 44.4
## Citation

```bibtex
@article{
    title = {Unsupervised Cross-lingual Representation Learning at Scale},
    author = {Alexis Conneau and Kartikay Khandelwal and Naman Goyal and
              Vishrav Chaudhary and Guillaume Wenzek and Francisco Guzm\'an and
              Edouard Grave and Myle Ott and Luke Zettlemoyer and Veselin Stoyanov},
    journal = {},
    year = {2019},
}
```