The Copenhagen Team Participation in the Check-Worthiness Task of the Competition of Automatic Identification and Verification of Claims in Political Debates of the CLEF-2018 CheckThat! Lab
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Documents
- paper_81
Final published version, 404 KB, PDF document
We predict which claims in a political debate should be prioritized for fact-checking. A particular challenge is, given a debate, to produce a ranked list of its sentences based on their worthiness for fact-checking. We develop a Recurrent Neural Network (RNN) model that learns a sentence embedding, which is then used to predict the check-worthiness of a sentence. Our sentence embedding encodes both semantic and syntactic dependencies using pretrained word2vec word embeddings as well as part-of-speech tagging and syntactic dependency parsing. This results in a multi-representation of each word, which we use as input to an RNN with GRU memory units; the output for each word is aggregated using attention, followed by a fully connected layer, from which the output is predicted using a sigmoid function. Our techniques perform well overall: we achieved the second-best performing run (MAP: 0.1152) in the competition, as well as the highest overall performance (MAP: 0.1810) for our contrastive run, a 32% improvement over the second-highest MAP score in the English language category. In our primary run we combined our sentence embedding with state-of-the-art check-worthiness features, whereas in the contrastive run we used our sentence embedding alone.
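The architecture described above (word multi-representations fed to a GRU, attention-weighted aggregation, then a fully connected layer with a sigmoid output) can be sketched as follows. This is a minimal illustrative forward pass in numpy, not the authors' implementation: all dimensions, weight initializations, and the class name `GRUAttentionScorer` are hypothetical, and the input is assumed to be the per-word concatenation of word2vec, POS, and dependency features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUAttentionScorer:
    """Illustrative sketch: GRU over word vectors, attention pooling, sigmoid score."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        # GRU gate weights: update (z), reset (r), candidate state (h~)
        self.Wz = rng.normal(0, s, (hidden_dim, input_dim))
        self.Uz = rng.normal(0, s, (hidden_dim, hidden_dim))
        self.Wr = rng.normal(0, s, (hidden_dim, input_dim))
        self.Ur = rng.normal(0, s, (hidden_dim, hidden_dim))
        self.Wh = rng.normal(0, s, (hidden_dim, input_dim))
        self.Uh = rng.normal(0, s, (hidden_dim, hidden_dim))
        # attention scoring vector and output layer weights
        self.v = rng.normal(0, s, hidden_dim)
        self.wo = rng.normal(0, s, hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, x):
        """x: (seq_len, input_dim) word multi-representations -> score in (0, 1)."""
        h = np.zeros(self.hidden_dim)
        states = []
        for t in range(x.shape[0]):
            z = sigmoid(self.Wz @ x[t] + self.Uz @ h)          # update gate
            r = sigmoid(self.Wr @ x[t] + self.Ur @ h)          # reset gate
            h_tilde = np.tanh(self.Wh @ x[t] + self.Uh @ (r * h))
            h = (1 - z) * h + z * h_tilde                      # new hidden state
            states.append(h)
        H = np.stack(states)                  # (seq_len, hidden_dim)
        energies = H @ self.v                 # attention energies per word
        alpha = np.exp(energies - energies.max())
        alpha /= alpha.sum()                  # softmax attention weights
        context = alpha @ H                   # attention-weighted sentence vector
        return float(sigmoid(self.wo @ context))  # check-worthiness score
```

Scoring each sentence of a debate with such a model and sorting by the resulting score yields the ranked list the task asks for.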
| Original language | English |
|---|---|
| Title of host publication | CLEF 2018 Working Notes |
| Editors | Linda Cappellato, Nicola Ferro, Jian-Yun Nie, Laure Soulier |
| Number of pages | 8 |
| Publisher | CEUR-WS.org |
| Publication date | 2018 |
| Edition | 10 |
| Article number | 81 |
| Publication status | Published - 2018 |
| Event | 19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018 - Avignon, France. Duration: 10 Sep 2018 → 14 Sep 2018 |
Conference

| Conference | 19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018 |
|---|---|
| Country | France |
| City | Avignon |
| Period | 10/09/2018 → 14/09/2018 |
| Series | CEUR Workshop Proceedings |
|---|---|
| Volume | 2125 |
| ISSN | 1613-0073 |
Research areas

- CNN, Fact checking, Political debates, RNN
ID: 202539747