# XLM-RoBERTa

## Introduction

XLM-R (XLM-RoBERTa), introduced in *Unsupervised Cross-lingual Representation Learning at Scale*, is a scaled cross-lingual sentence encoder. It is trained on 2.5TB of data in 100 languages, filtered from CommonCrawl. XLM-R achieves state-of-the-art results on multiple cross-lingual benchmarks.
## Pre-trained models

Model | Description | #params | vocab size | Download
---|---|---|---|---
`xlmr.base.v0` | XLM-R using the BERT-base architecture | 250M | 250k | xlm.base.v0.tar.gz
`xlmr.large.v0` | XLM-R using the BERT-large architecture | 560M | 250k | xlm.large.v0.tar.gz
(Note: the above models are still under training; we will update the weights once they are fully trained. The results below are based on the above checkpoints.)
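A minimal usage sketch is shown below: it loads one of the extracted checkpoints with fairseq's RoBERTa hub interface and extracts sentence features. The local path is a placeholder, and it assumes the downloaded archive unpacks to a directory containing `model.pt` and the shared sentencepiece vocabulary.

```python
from fairseq.models.roberta import XLMRModel

# Load an extracted checkpoint (the path is a placeholder for wherever the
# downloaded archive from the table above was unpacked).
xlmr = XLMRModel.from_pretrained(
    '/path/to/xlmr.large.v0',
    checkpoint_file='model.pt',
    bpe='sentencepiece',
)
xlmr.eval()  # disable dropout so feature extraction is deterministic

# The same model and shared 250k sentencepiece vocabulary cover all 100 languages.
en_tokens = xlmr.encode('Hello world!')
zh_tokens = xlmr.encode('你好，世界')

# Final-layer representations, shape (1, num_tokens, hidden_dim).
en_features = xlmr.extract_features(en_tokens)
zh_features = xlmr.extract_features(zh_tokens)
print(en_features.shape, zh_features.shape)
```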
## Results

**XNLI (Conneau et al., 2018):**

Model | average | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
`roberta.large.mnli` (TRANSLATE-TEST) | 77.8 | 91.3 | 82.9 | 84.3 | 81.2 | 81.7 | 83.1 | 78.3 | 76.8 | 76.6 | 74.2 | 74.1 | 77.5 | 70.9 | 66.7 | 66.8
`xlmr.large.v0` (TRANSLATE-TRAIN-ALL) | 82.4 | 88.7 | 85.2 | 85.6 | 84.6 | 83.6 | 85.5 | 82.4 | 81.6 | 80.9 | 83.4 | 80.9 | 83.3 | 79.8 | 75.9 | 74.3
**MLQA (Lewis et al., 2019)** (F1 / EM):

Model | average | en | es | de | ar | hi | vi | zh
---|---|---|---|---|---|---|---|---
BERT-large | | 80.2 / 67.4 | | | | | |
mBERT | 57.7 / 41.6 | 77.7 / 65.2 | 64.3 / 46.6 | 57.9 / 44.3 | 45.7 / 29.8 | 43.8 / 29.7 | 57.1 / 38.6 | 57.5 / 37.3
`xlmr.large.v0` | 70.0 / 52.2 | 80.1 / 67.7 | 73.2 / 55.1 | 68.3 / 53.7 | 62.8 / 43.7 | 68.3 / 51.0 | 70.5 / 50.1 | 67.1 / 44.4
## Citation

```bibtex
@article{
    title = {Unsupervised Cross-lingual Representation Learning at Scale},
    author = {Alexis Conneau and Kartikay Khandelwal and Naman Goyal and
              Vishrav Chaudhary and Guillaume Wenzek and Francisco Guzm\'an and
              Edouard Grave and Myle Ott and Luke Zettlemoyer and Veselin Stoyanov},
    journal = {},
    year = {2019},
}
```