Paper title
NU HLT at CMCL 2022 Shared Task: Multilingual and Crosslingual Prediction of Human Reading Behavior in Universal Language Space
Authors
Abstract
In this paper, we present a unified model for both multilingual and crosslingual prediction of word reading times across languages. The key to the model's success lies in the preprocessing step, in which all words are transformed into a universal language representation via the International Phonetic Alphabet (IPA). To the best of our knowledge, this is the first study to favorably exploit this phonological property of language for the two tasks. Various feature types were extracted for model training, covering basic frequencies, n-grams, information-theoretic measures, and psycholinguistically motivated predictors. A finetuned Random Forest model obtained the best performance on both tasks, with MAE scores of 3.8031 and 3.9065 for mean first fixation duration (FFDAvg) and mean total reading time (TRTAvg), respectively.
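As a rough illustration of the pipeline the abstract outlines (IPA preprocessing, simple feature extraction, MAE scoring), a minimal Python sketch follows. It is not the authors' implementation: the `TOY_IPA` table, the function names, and the two features shown are hypothetical stand-ins, since the abstract does not name the transliteration tool or list the exact features. In practice the resulting features would presumably be passed to a regressor such as scikit-learn's `RandomForestRegressor`.

```python
from collections import Counter

# Hypothetical toy grapheme-to-IPA table, for illustration only.
# A real system would use a per-language transliteration tool,
# which the abstract does not name.
TOY_IPA = {"sh": "ʃ", "th": "θ", "ch": "tʃ", "c": "k", "y": "j"}


def to_ipa(word):
    """Greedy transliteration: try a two-letter match, then one letter."""
    word = word.lower()
    out, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in TOY_IPA:
            out.append(TOY_IPA[pair])
            i += 2
        else:
            # Letters missing from the toy table are passed through as-is.
            out.append(TOY_IPA.get(word[i], word[i]))
            i += 1
    return "".join(out)


def extract_features(words):
    """Return per-word features computed on the IPA form (assumed features)."""
    ipa_words = [to_ipa(w) for w in words]
    freq = Counter(ipa_words)
    return [
        {"ipa": w, "length": len(w), "frequency": freq[w]}
        for w in ipa_words
    ]


def mean_absolute_error(y_true, y_pred):
    """MAE: the average absolute gap between gold and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because every word is mapped into the same symbol inventory before features are computed, the same feature extractor can, in principle, be applied to any language, which is what lets a single model serve both the multilingual and crosslingual settings.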