An Automated Term Definition Extraction System Using the Web Corpus in the Chinese Language

Cited by: 0
Authors
Leu, Fang-Yie [1]
Ko, Chih-Chieh [1]
Affiliation
[1] Tunghai Univ, Dept Comp Sci, Taichung 407, Taiwan
Funding
Russian Foundation for Basic Research;
Keywords
definitions; web corpus; information extraction; Chinese language; text mining;
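DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract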
This paper proposes a system, named DefExplorer, which analyzes the type of a given Chinese term, extracts term definitions from the Web, and selects answers from noisy Web pages. DefExplorer filters out invalid data with a semantic approach. Two types of candidate sets, common and domain-specific, are employed to cluster similar candidates into groups. Different approaches are also deployed to evaluate candidates' importance, which is the key factor in selecting the best answers from the retrieved candidates. Experimental results show that DefExplorer can effectively extract term definitions from the Web, especially definitions of out-of-vocabulary terms.
Pages: 505-525
Page count: 21
Related papers
50 in total
  • [21] Automated Extraction of Semantic Legal Metadata Using Natural Language Processing
    Sleimi, Amin
    Sannier, Nicolas
    Sabetzadeh, Mehrdad
    Briand, Lionel C.
    Dann, John
    2018 IEEE 26TH INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE (RE 2018), 2018, : 124 - 135
  • [22] Using Google to Search Language Patterns in Web-Corpus: EFL Writing Pedagogy
    Kvashnina, Olga S.
    Sumtsova, Olga V.
    INTERNATIONAL JOURNAL OF EMERGING TECHNOLOGIES IN LEARNING, 2018, 13 (03): : 173 - 179
  • [23] Automated Grading System using Natural Language Processing
    Rokade, Amit
    Patil, Bhushan
    Rajani, Sana
    Revandkar, Surabhi
    Shedge, Rajashree
    PROCEEDINGS OF THE 2018 SECOND INTERNATIONAL CONFERENCE ON INVENTIVE COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES (ICICCT), 2018, : 1123 - 1127
  • [24] Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing
    Ali, Stephen R.
    Strafford, Huw
    Dobbs, Thomas D.
    Fonferko-Shadrach, Beata
    Lacey, Arron S.
    Pickrell, William Owen
    Hutchings, Hayley A.
    Whitaker, Iain S.
    FRONTIERS IN SURGERY, 2022, 9
  • [25] Automatic Chinese unknown word extraction using small-corpus-based method
    Chang, TH
    Lee, CH
    2003 INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, PROCEEDINGS, 2003, : 459 - 464
  • [26] Chinese typed collocation extraction using corpus-based syntactic collocation patterns
    Li, Wanyin
    Lu, Qin
    Liu, James
    PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE'07), 2007, : 248 - +
  • [27] Language independent web news extraction system based on text detection framework
    Wu, Yu-Chieh
    INFORMATION SCIENCES, 2016, 342 : 132 - 149
  • [28] Construction of Chinese-English Parallel Corpus of Chinese Laws and Regulations and Term Extraction Assisted by CAD Virtual Reality Technology
    Zhao Q.
    Wang J.
    Computer-Aided Design and Applications, 2022, 20 : 46 - 55
  • [29] Automated Adaptive Mobile Learning System using the Semantic WEB
    Hamada, Samir
    Alshalabi, Ibrahim Alkore
    Elleithy, Khaled
    Badara, Ioana A.
    2016 IEEE LONG ISLAND SYSTEMS, APPLICATIONS AND TECHNOLOGY CONFERENCE (LISAT), 2016,
  • [30] Tibetan-Chinese Cross Language Named Entity Extraction Based on Comparable Corpus and Naturally Annotated Resources
    Sun, Yuan
    Guo, Wenbin
    Zhao, Xiaobing
    2014 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING (CIDM), 2014, : 288 - 295