A Language-Independent Hybrid Approach for Multi-Word Expression Extraction

Authors
Liang, Yinghong [1 ]
Tan, Hongye [2 ]
Li, Hui [1 ]
Wang, Zhigang [1 ]
Gui, Wenming [1 ]
Affiliations
[1] Jingling Inst Technol, Dept Software Engn, Nanjing, Jiangsu, Peoples R China
[2] Shanxi Univ, Dept Comp & Informat Technol, Taiyuan, Shanxi, Peoples R China
Source
2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2017
Funding
National Natural Science Foundation of China;
Keywords
Multi-Word Expression; Bi-LSTM; Language-Independent;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Failing to identify multi-word expressions (MWEs) may cause serious problems for many Natural Language Processing (NLP) tasks. Previous approaches depend heavily on language-specific knowledge and pre-existing NLP tools. However, many languages (including Chinese) have fewer such resources and tools than English. An approach that automatically learns effective features from a corpus, without relying on language-specific resources, is needed. In this paper, we develop a hybrid approach that combines bidirectional long short-term memory (Bi-LSTM), word correlation degree calculation, and weakly supervised K-means clustering to capture both the sequence information and the correlation degree of phrases from specific contexts, and use them to train a multi-word expression detector for multiple languages without any manually encoded features. Experimental results show that this hybrid approach extracts Chinese and English multi-word expressions better than the baseline algorithm, which verifies that the hybrid approach is effective.
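The abstract does not say how the word correlation degree is calculated. As a rough illustration only, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), a common association measure that needs no language-specific tools. The choice of PMI, the function name `bigram_pmi`, and the toy corpus are assumptions made here and are not taken from the paper.

```python
import math
from collections import Counter


def bigram_pmi(sentences):
    """Return {(w1, w2): PMI in bits} for adjacent word pairs.

    `sentences` is a list of token lists. PMI is an assumed stand-in
    for the paper's unspecified "word correlation degree".
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs only

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Hypothetical toy corpus, whitespace-tokenized.
corpus = [
    "he lives in new york".split(),
    "new york is big".split(),
    "he is in the big city".split(),
    "the city is new".split(),
]
scores = bigram_pmi(corpus)
for pair in [("new", "york"), ("he", "is")]:
    print(pair, round(scores[pair], 2))
```

On this toy corpus the recurring pair "new york" scores higher than the incidental pair "he is". PMI is known to overrate pairs of rare words, so scores like these would more plausibly serve as one input among several (for example, alongside sequence features) than as a detector on their own.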
Pages: 3273 - 3279
Page count: 7