Chapter 5 The Role of the Source Language
5.3 Transferring Sentiment from Arabic and Chinese
5.4.2 English as a Target Language with Arabic as a Source
We compared the output of the Arabic-to-English transfer model (under D3 tokenization) with a supervised English model trained on the same downsampled and sentiment distributed training dataset. This supervised model results in an accuracy of 62.3, macro-averaged F-measure of 61.4, positive F-measure of 58.1, negative F-measure of 59.1, and neutral F-measure of 66.9. In contrast, the transfer model results in an accuracy of 43.2, macro-averaged F-measure of 42.8, positive F-measure of 43.3, negative F-measure of 38.2, and neutral F-measure of 46.8.
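As a quick consistency check on these figures, the sketch below recomputes the macro-averaged F-measure from the three per-class F-measures. It assumes that macro-averaging here means the unweighted mean over the positive, negative, and neutral classes, which agrees with the reported values; the variable and function names are illustrative only.

```python
# Per-class F-measures as reported in the text (positive, negative, neutral).
supervised_f = {"positive": 58.1, "negative": 59.1, "neutral": 66.9}
transfer_f = {"positive": 43.3, "negative": 38.2, "neutral": 46.8}


def macro_f(per_class_f):
    """Unweighted mean of per-class F-measures (assumed definition of macro-F)."""
    return sum(per_class_f.values()) / len(per_class_f)


supervised_macro = round(macro_f(supervised_f), 1)
transfer_macro = round(macro_f(transfer_f), 1)

# Gap between the supervised upper bound and the Arabic-to-English transfer model.
macro_gap = round(supervised_macro - transfer_macro, 1)

print(f"supervised macro-F: {supervised_macro}")
print(f"transfer macro-F:   {transfer_macro}")
print(f"gap:                {macro_gap}")
```

Under this assumption the recomputed macro-averages match the reported 61.4 and 42.8, and the gap between the two models is 18.6 points of macro-averaged F-measure.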
We sampled and analyzed 60 errors from these two models. We found categories of errors similar to those observed when transferring from European and Indo-European languages to English, namely those shown in Table 5.8. However, when transferring from Arabic, we observed more errors of the type ‘sentiment indicator missed’ than with the European and Indo-European model, where more errors were due to misspellings, misleading sentiment words, and the need for inference. Additionally, because of the small size of the training data and the negative bias in the distribution of sentiment, we observed many errors where the model predicted ‘negative’ sentiment due to majority baseline influence, even though the tweet contained no negative sentiment indicators.
We again divided the error samples into four groups:
• In the first group (51.7% of cases), the supervised model makes a correct prediction but the cross-lingual model makes an error. The majority of errors in this group come from a negative majority baseline influence or a missed key sentiment indicator.
– do you know what you wanna do when you’re done w/ school yet? (neutral, transfer predicts negative)
– rt i have a crush on fall weather, hot drinks, and cozy sweaters (positive, transfer predicts negative)
• In the second group (23.3% of cases), the supervised model and the cross-lingual model make the same error. These errors mostly required inference or came from a wrong gold annotation.
– please mention me i really want to reach my goal x37 (gold neutral[wrong or unclear], transfer and supervised predict positive)
– for my birthday i got a humidifier and a de-humidifier ... i put them in the same room and let them fight it out (gold positive[requires inference], transfer and supervised predict negative)
• In the third group (15% of cases), the supervised and cross-lingual models make different kinds of errors. These again were due to a variety of causes, such as wrong gold, misleading sentiment words, missing a key sentiment indicator, or requiring inference.
– rt 1 more day until this is back im screaming (gold positive[requires inference], transfer predicts neutral, supervised predicts negative)
• In the last group (10% of cases), the gold annotation and the supervised model agree, but the cross-lingual model actually makes a better prediction.
– marriott hotels servers up a “ fresh ” approach - healthy vending machine debuts (gold neutral[wrong], transfer predicts positive, supervised predicts neutral)
The distribution of groups and output of the cross-lingual model relative to the supervised model is more or less consistent with that observed when transferring from European and Indo-European languages.
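The four-way grouping used in this error analysis can be sketched as a small function over the gold label and the two model predictions. This is an illustration only, not the procedure used to produce the analysis: the function name, the integer group codes, and the `transfer_judged_better` flag (standing in for the manual judgment that the gold label is wrong) are hypothetical. Supervised predictions for the first-group examples are inferred from the group definition, and the implied counts assume the reported percentages are shares of the 60 sampled errors.

```python
from collections import Counter


def error_group(gold, supervised, transfer, transfer_judged_better=False):
    """Map one sampled item to group 1-4, or None if the transfer model is correct.

    transfer_judged_better is a hypothetical manual flag: a human decided that
    the transfer prediction is actually better than the gold annotation.
    """
    if transfer == gold:
        return None  # not a transfer error, so outside the four groups
    if supervised == gold:
        # Supervised agrees with gold; the transfer model disagrees.
        return 4 if transfer_judged_better else 1
    # Both models disagree with gold: same error (2) or different errors (3).
    return 2 if supervised == transfer else 3


# (gold, supervised, transfer, manual flag) for the example tweets listed above.
examples = [
    ("neutral", "neutral", "negative", False),   # done w/ school
    ("positive", "positive", "negative", False), # fall weather
    ("neutral", "positive", "positive", False),  # reach my goal
    ("positive", "negative", "negative", False), # humidifier
    ("positive", "negative", "neutral", False),  # im screaming
    ("neutral", "neutral", "positive", True),    # marriott hotels
]
group_counts = Counter(error_group(*example) for example in examples)

# Counts implied by the reported percentages, assuming 60 sampled errors.
reported_shares = {1: 51.7, 2: 23.3, 3: 15.0, 4: 10.0}
implied_counts = {g: round(60 * s / 100) for g, s in reported_shares.items()}

print(dict(group_counts))
print(implied_counts)
```

Only the first three groups follow mechanically from the labels; the fourth depends on a human re-reading the tweet, which is why it is modelled here as an explicit flag rather than derived from the predictions.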
5.5 Conclusion
This chapter studied the influence of the source language and its characteristics when transferring sentiment cross-lingually. In contrast to most previous work, which assumes that the source language is English, we evaluated the performance of cross-lingual sentiment models when trained on European and Indo-European languages, as well as Arabic and Chinese. Moreover, to facilitate the transfer of sentiment from Arabic, we introduced new techniques such as pivoting with machine translation to create an Arabic-Tigrinya corpus, and applying preprocessing schemes to reduce the sparsity of bilingual features that arises from morphological complexity. Our findings, summarized below, point to the important role played by the source language when transferring sentiment cross-lingually, and to the need for future work on increasing the resources available to moderately resourced languages such as Slovak, Arabic, or Chinese, to facilitate transfer to target languages in similar language families.
• Language families: Languages from similar language families transfer sentiment well from each other. This was especially the case for the Germanic and Slavic languages, and evident in the performance of English compared to Arabic and Chinese when transferring to most Indo-European languages, even when using similarly sized resources. The success of language family transfer for sentiment analysis is consistent with past results on other cross-lingual tasks, such as direct transfer of part-of-speech tagging (Kim et al., 2017).
• Resource sizes and distribution: Languages with large parallel resources or balanced sentiment datasets make effective source languages, as demonstrated by the success of languages like English (large parallel corpus) and Swedish (balanced dataset) when transferring to other European languages.
• Morphological richness: Languages with similar morphological complexity and vocabulary sizes transfer sentiment well from each other. This is demonstrated by the success of sentiment transfer amongst languages like German, Bulgarian, and Swedish, or Arabic, Slovak, Croatian, and Tigrinya, which are similar in vocabulary size. Moreover, applying high-resource morphological tokenization schemes enables Arabic to transfer sentiment better on average, which is consistent with past results on machine translation.
Our error analysis with English as a target language revealed that Twitter-specific out-of-vocabulary words, which are unlikely to occur in a translation corpus or Wikipedia comparable corpus, are a source of error in the model; future work for improving the performance of untargeted cross-lingual sentiment models could thus focus on the collection and learning of bilingual embeddings from Twitter and social media corpora. In the next chapter, we turn to targeted sentiment analysis, where we focus on identifying sentiment towards targets in short documents.