A Language-Independent Hybrid Approach for Multi-Word Expression Extraction

Cited by: 0

Authors:

Liang, Yinghong [1]

Tan, Hongye [2]

Li, Hui [1]

Wang, Zhigang [1]

Gui, Wenming [1]

Affiliations:

[1] Jingling Inst Technol, Dept Software Engn, Nanjing, Jiangsu, Peoples R China

[2] Shanxi Univ, Dept Comp & Informat Technol, Taiyuan, Shanxi, Peoples R China

Source:

2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2017

Funding:

National Natural Science Foundation of China;

Keywords:

Multi-Word Expression; Bi-LSTM; Language-Independent;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Failing to identify multi-word expressions (MWEs) may cause serious problems for many Natural Language Processing (NLP) tasks. Previous approaches depend heavily on language-specific knowledge and pre-existing NLP tools. However, many languages (including Chinese) have fewer such resources and tools than English. An approach that automatically learns effective features from a corpus, without relying on language-specific resources, is needed. In this paper, we develop a hybrid approach that combines bidirectional long short-term memory (Bi-LSTM), word correlation degree calculation, and weakly supervised K-means clustering to capture both the sequence information and the correlation degree of phrases from specific contexts, and use them to train a multi-word expression detector for multiple languages without any manually encoded features. Experimental results show that this hybrid approach extracts Chinese and English multi-word expressions better than the baseline algorithm, which verifies that the hybrid approach is effective.

Pages: 3273-3279

Page count: 7
