Data-driven Identification of Idioms in Song Lyrics

Cited by: 0
Authors
Amin, Miriam [1]
Fankhauser, Peter [2]
Kupietz, Marc [2]
Schneider, Roman [2]
Affiliations
[1] Univ Leipzig, Leipzig, Germany
[2] Leibniz Inst German Language, Mannheim, Germany
Keywords
LOVE;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The automatic recognition of idioms poses a challenging problem for NLP applications. Whereas native speakers can intuitively handle multiword expressions whose compositional meanings are hard to trace back to individual word semantics, there is still ample scope for improvement regarding computational approaches. We assume that idiomatic constructions can be characterized by gradual intensities of semantic non-compositionality, formal fixedness, and unusual usage context, and introduce a number of measures for these characteristics, comprising count-based and predictive collocation measures together with measures of context (un)similarity. We evaluate our approach on a manually labelled gold standard, derived from a corpus of German pop lyrics. To this end, we apply a Random Forest classifier to analyze the individual contribution of features for automatically detecting idioms, and study the trade-off between recall and precision. Finally, we evaluate the classifier on an independent dataset of idioms extracted from a list of Wikipedia idioms, achieving state-of-the-art accuracy.
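As an illustration of what a count-based collocation measure can look like, the following minimal sketch computes pointwise mutual information (PMI) for adjacent word pairs. The abstract does not name the specific measures used, so the choice of PMI, the function name `pmi_scores`, and the toy token sequence are all assumptions made for illustration, not the authors' method or data.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return PMI (log base 2) for every adjacent word pair in a token list.

    Hypothetical helper for illustration; probabilities are plain
    relative frequencies with no smoothing.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi          # observed probability of the pair
        p_w1 = unigrams[w1] / n_uni    # probability of the first word
        p_w2 = unigrams[w2] / n_uni    # probability of the second word
        # PMI compares the observed pair probability with the one
        # expected if the two words occurred independently.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy token sequence, invented for this sketch (not from the paper's corpus).
tokens = (
    "ich bin im siebten himmel und du bist im siebten himmel "
    "und ich bin hier"
).split()

scores = pmi_scores(tokens)

# List pairs from strongest to weakest association.
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

In a setup like the one the abstract describes, scores of this kind would be one feature among several passed to a classifier. Unsmoothed PMI is known to favor rare pairs, so a real pipeline would likely apply a frequency threshold or a different association measure.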
Pages: 13-22
Page count: 10