Multimodal Detection and Classification of Head Movements in Face-to-Face Conversations: Exploring Models, Features and Their Interaction
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Multimodal Detection and Classification of Head Movements in Face-to-Face Conversations: Exploring Models, Features and Their Interaction. / Agirrezabal, Manex; Paggio, Patrizia; Navarretta, Costanza; Jongejan, Bart.
Gesture and Speech in Interaction (GESPIN 2023). Nijmegen: Max Planck Institute for Psycholinguistics, 2023.
RIS
TY - GEN
T1 - Multimodal Detection and Classification of Head Movements in Face-to-Face Conversations
T2 - Exploring Models, Features and Their Interaction
AU - Agirrezabal, Manex
AU - Paggio, Patrizia
AU - Navarretta, Costanza
AU - Jongejan, Bart
PY - 2023
Y1 - 2023
N2  - In this work we perform multimodal detection and classification of head movements from face-to-face video conversation data. We have experimented with different models and feature sets, and provide some insight into the effect of independent features, but also into how their interaction can enhance a head movement classifier. The features used include nose, neck and mid-hip position coordinates and their derivatives, together with acoustic features, namely the intensity and pitch of the speaker in focus. Results show that when input features are sufficiently processed by interacting with each other, a linear classifier can reach a similar performance to a more complex non-linear neural model with several hidden layers. Our best models achieve state-of-the-art performance in the detection task, measured by macro-averaged F1 score.
AB  - In this work we perform multimodal detection and classification of head movements from face-to-face video conversation data. We have experimented with different models and feature sets, and provide some insight into the effect of independent features, but also into how their interaction can enhance a head movement classifier. The features used include nose, neck and mid-hip position coordinates and their derivatives, together with acoustic features, namely the intensity and pitch of the speaker in focus. Results show that when input features are sufficiently processed by interacting with each other, a linear classifier can reach a similar performance to a more complex non-linear neural model with several hidden layers. Our best models achieve state-of-the-art performance in the detection task, measured by macro-averaged F1 score.
M3 - Article in proceedings
BT - Gesture and Speech in Interaction (GESPIN 2023)
PB - Max Planck Institute for Psycholinguistics
CY - Nijmegen
ER -
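The abstract's central claim is that interaction features can lift a linear classifier to the level of a non-linear neural model. A minimal sketch of that idea, using illustrative XOR data rather than the paper's head-movement features, is the classic case where raw coordinates are not linearly separable but adding a pairwise product term makes them so:

```python
# Sketch: feature interactions can make a problem solvable by a linear
# classifier. Data (XOR) and the perceptron are illustrative, not the
# paper's actual features or model.

def train_perceptron(X, y, epochs=200, lr=0.1):
    """Train a linear perceptron; it converges only if the data is separable."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
            err = yi - pred
            w = [wj + lr * err * xj for wj, xj in zip(w, xi)]
            b += lr * err
    return w, b

def predict(w, b, xi):
    return 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0

points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR
# Raw (x1, x2) is not linearly separable; appending the interaction
# term x1*x2 makes it separable for a purely linear model.
X = [[x1, x2, x1 * x2] for (x1, x2), _ in points]
y = [label for _, label in points]
w, b = train_perceptron(X, y)
print(all(predict(w, b, xi) == yi for xi, yi in zip(X, y)))  # True
```

Without the third column the same perceptron cannot reach zero error, which is the sense in which "sufficiently processed" interaction features substitute for hidden layers.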
ID: 374969032
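Detection performance in the abstract is reported as macro-averaged F1, i.e. the unweighted mean of per-class F1 scores, which does not let a majority class (e.g. frames with no movement) dominate the metric. A self-contained sketch with illustrative frame-level labels (1 = head movement, 0 = none; not the paper's data):

```python
# Sketch of the macro-averaged F1 metric used in the abstract.

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes observed in either list."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

# Hypothetical per-frame annotations: 1 = head movement, 0 = none.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```

Here class 0 scores F1 = 0.8 and class 1 scores F1 ≈ 0.667, so the macro average is ≈ 0.733 even though overall accuracy is 0.75; the rarer movement class weighs in equally.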