Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. / Felbo, Bjarke; Mislove, Alan; Søgaard, Anders; Rahwan, Iyad; Lehmann, Sune.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2017. p. 1615–1625.
RIS
TY - GEN
T1 - Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
AU - Felbo, Bjarke
AU - Mislove, Alan
AU - Søgaard, Anders
AU - Rahwan, Iyad
AU - Lehmann, Sune
PY - 2017
Y1 - 2017
N2 - NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within emotion, sentiment and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches.
AB - NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within emotion, sentiment and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches.
M3 - Article in proceedings
SP - 1615
EP - 1625
BT - Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
PB - Association for Computational Linguistics
T2 - 2017 Conference on Empirical Methods in Natural Language Processing
Y2 - 9 September 2017 through 11 September 2017
ER -