The Copenhagen Team Participation in the Check-Worthiness Task of the Competition of Automatic Identification and Verification of Claims in Political Debates of the CLEF-2018 CheckThat! Lab

Publication: Contribution to book/anthology/report; Conference contribution in proceedings; Research; peer-reviewed



We predict which claim in a political debate should be prioritized
for fact-checking. A particular challenge is, given a debate, how to
produce a ranked list of its sentences based on their worthiness for
fact-checking. We develop a Recurrent Neural Network (RNN) model that
learns a sentence embedding, which is then used to predict the check-worthiness
of a sentence. Our sentence embedding encodes both semantic
and syntactic dependencies using pretrained word2vec word embeddings
as well as part-of-speech tagging and syntactic dependency parsing. This
results in a multi-representation of each word, which we use as input to an
RNN with GRU memory units; the output from each word is aggregated
using attention, followed by a fully connected layer, from which the output
is predicted using a sigmoid function. Our approach performed well: we
obtained the second-best run overall in the competition (MAP: 0.1152),
and our contrastive run achieved the highest overall performance
(MAP: 0.1810), a 32% improvement over the second-highest MAP score
in the English language category. In our primary run we combined our
sentence embedding with state-of-the-art check-worthy features, whereas
in the contrastive run we considered our sentence embedding alone.
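
The last stages of the pipeline described in the abstract (attention over the per-word outputs, a final layer, a sigmoid, and ranking by score) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The per-word GRU outputs are assumed to be given as plain lists of floats, the attention scores are assumed to be dot products with a learned vector, and the fully connected layer is collapsed into a single linear map. All function and parameter names (attention_pool, check_worthiness, rank_sentences, attn_vector, out_weights, out_bias) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def attention_pool(hidden_states, attn_vector):
    """Aggregate per-word states into one sentence vector.

    Assumption: each word is scored by a dot product with attn_vector;
    the abstract does not specify the form of the attention function.
    """
    weights = softmax([dot(h, attn_vector) for h in hidden_states])
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[i] for w, h in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return pooled, weights


def check_worthiness(hidden_states, attn_vector, out_weights, out_bias):
    """Score one sentence: attention pooling, linear map, sigmoid.

    Assumption: the fully connected layer and the output unit are
    merged into one linear map to keep the sketch short.
    """
    pooled, _ = attention_pool(hidden_states, attn_vector)
    return sigmoid(dot(pooled, out_weights) + out_bias)


def rank_sentences(scored_sentences):
    """Sort (sentence, score) pairs from most to least check-worthy."""
    return sorted(scored_sentences, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, a debate would be ranked by calling check_worthiness on the per-word states of each sentence and passing the resulting (sentence, score) pairs to rank_sentences.
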
Title: CLEF 2018 Working Notes
Editors: Linda Cappellato, Nicola Ferro, Jian-Yun Nie, Laure Soulier
Number of pages: 8
Status: Published - 2018
Event: 19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018 - Avignon, France
Duration: 10 Sep 2018 to 14 Sep 2018


Conference: 19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018
Name: CEUR Workshop Proceedings


ID: 202539747