BERT is Not an Interlingua and the Bias of Tokenization


Academic year: 2020


Figure

Figure 1: Agglomerative clustering of languages based on the PWCCA similarity between their representations, generated from layer 6 of a pretrained multilingual uncased BERT.
Figure 2: How CCA is used to compare the representations of different languages at different layers in BERT.
Figure 3 demonstrates that for all language combinations tested, the summary representation (associated with the [CLS] token) for semantically similar inputs translated into multiple languages is most similar at the shallower layers of BERT, close to the in
Figure 6: PWCCA generated similarity matrix between languages.