XLM-RoBERTa

Introduction

XLM-R (XLM-RoBERTa, "Unsupervised Cross-lingual Representation Learning at Scale") is a scaled cross-lingual sentence encoder. It is trained on 2.5TB of filtered CommonCrawl data covering 100 languages, and achieves state-of-the-art results on multiple cross-lingual benchmarks.
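
As a quick illustration of what "cross-lingual sentence encoder" means in practice, the sketch below loads XLM-R through torch.hub and embeds the same sentence in two languages with the single shared SentencePiece vocabulary. This is a minimal sketch: the `xlmr.large` hub entry-point name is an assumption and may not correspond exactly to the v0 checkpoints listed below.

```python
# Minimal sketch, assuming fairseq exposes an 'xlmr.large' torch.hub entry
# point (the exact name may differ for the v0 checkpoints below) and that
# the `sentencepiece` package is installed.
import torch

xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')
xlmr.eval()  # disable dropout so features are deterministic

# One shared 250k-piece SentencePiece vocabulary handles every language,
# with no language-specific tokenization or preprocessing.
en_tokens = xlmr.encode('Hello world!')
fr_tokens = xlmr.encode('Bonjour le monde !')

# extract_features returns the final-layer representations: (1, seq_len, hidden_dim)
en_features = xlmr.extract_features(en_tokens)
fr_features = xlmr.extract_features(fr_tokens)
print(en_features.shape, fr_features.shape)
```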

Pre-trained models

Model | Description | #params | vocab size | Download
---|---|---|---|---
xlmr.base.v0 | XLM-R using the BERT-base architecture | 250M | 250k | xlm.base.v0.tar.gz
xlmr.large.v0 | XLM-R using the BERT-large architecture | 560M | 250k | xlm.large.v0.tar.gz

(Note: the above models are still under training; we will update the weights once training completes. The results below are based on the above checkpoints.)
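
Once a checkpoint archive has been downloaded and extracted, it can be loaded with fairseq's RoBERTa tooling. The sketch below is an illustration rather than part of the release: the local path and the assumption that the extracted archive contains model.pt together with the dictionary and SentencePiece model are hypothetical.

```python
# Minimal sketch, assuming the archive has been extracted to
# /path/to/xlmr.large.v0 and contains model.pt, dict.txt and the
# SentencePiece model that fairseq's XLMRModel expects.
from fairseq.models.roberta import XLMRModel

xlmr = XLMRModel.from_pretrained(
    '/path/to/xlmr.large.v0',   # hypothetical local path
    checkpoint_file='model.pt',
)
xlmr.eval()  # disable dropout

tokens = xlmr.encode('XLM-R covers one hundred languages.')
print(tokens)                              # 1-D tensor of SentencePiece ids
features = xlmr.extract_features(tokens)   # (1, seq_len, hidden_dim)
print(features.shape)
```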

Results

XNLI (Conneau et al., 2018), test accuracy:

Model | average | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
roberta.large.mnli (TRANSLATE-TEST) | 77.8 | 91.3 | 82.9 | 84.3 | 81.2 | 81.7 | 83.1 | 78.3 | 76.8 | 76.6 | 74.2 | 74.1 | 77.5 | 70.9 | 66.7 | 66.8
xlmr.large.v0 (TRANSLATE-TRAIN-ALL) | 82.4 | 88.7 | 85.2 | 85.6 | 84.6 | 83.6 | 85.5 | 82.4 | 81.6 | 80.9 | 83.4 | 80.9 | 83.3 | 79.8 | 75.9 | 74.3
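
The two rows use different evaluation protocols from the paper: TRANSLATE-TEST machine-translates the non-English XNLI test sets into English and scores them with an English-only MNLI model, while TRANSLATE-TRAIN-ALL fine-tunes the multilingual model on English MNLI concatenated with its translations into all XNLI languages. The sketch below illustrates only the scoring step of TRANSLATE-TEST, using the publicly released roberta.large.mnli checkpoint; the English pair stands in for an already-translated XNLI example.

```python
# Minimal sketch of TRANSLATE-TEST scoring with the public roberta.large.mnli
# checkpoint; translation of the XNLI example into English is assumed to have
# happened beforehand.
import torch

roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.mnli')
roberta.eval()

tokens = roberta.encode(
    'The cat sat on the mat.',       # translated premise
    'There is a cat on the mat.',    # translated hypothesis
)
label = roberta.predict('mnli', tokens).argmax().item()
# fairseq's MNLI head maps 0 -> contradiction, 1 -> neutral, 2 -> entailment
print(label)
```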

MLQA (Lewis et al., 2019), F1 / EM:

Model | average | en | es | de | ar | hi | vi | zh
---|---|---|---|---|---|---|---|---
BERT-large | | 80.2 / 67.4 | | | | | |
mBERT | 57.7 / 41.6 | 77.7 / 65.2 | 64.3 / 46.6 | 57.9 / 44.3 | 45.7 / 29.8 | 43.8 / 29.7 | 57.1 / 38.6 | 57.5 / 37.3
xlmr.large.v0 | 70.0 / 52.2 | 80.1 / 67.7 | 73.2 / 55.1 | 68.3 / 53.7 | 62.8 / 43.7 | 68.3 / 51.0 | 70.5 / 50.1 | 67.1 / 44.4

Citation

@article{conneau2019unsupervised,
    title = {Unsupervised Cross-lingual Representation Learning at Scale},
    author = {Alexis Conneau and Kartikay Khandelwal
        and Naman Goyal and Vishrav Chaudhary and Guillaume Wenzek
        and Francisco Guzm\'an and Edouard Grave and Myle Ott
        and Luke Zettlemoyer and Veselin Stoyanov
    },
    journal = {arXiv preprint arXiv:1911.02116},
    year = {2019},
}