Web-based Chinese term extraction in the field of study

Times cited: 1
Authors
Guo, Rui [1]
Qiu, Jing [1]
Zhang, Guanghua [1]
Affiliation
[1] Hebei University of Science and Technology, School of Information Science and Engineering, Shijiazhuang, People's Republic of China
Keywords
natural language processing; text extraction; Chinese word segmentation; TF-IDF; DR + DC
DOI
10.1109/SKG.2015.45
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
In today's era of big data, the Web contains a huge amount of important information. Extracting domain-specific terms from the Web is an indispensable part of Web-oriented natural language processing, and it also plays an important role in domain ontology research. Because Chinese text has no explicit boundaries between words, extracting domain terms from Chinese Web text is currently difficult. This article proposes a method for extracting Chinese domain terms more accurately. First, stop words are removed, and Chinese word segmentation and lexical analysis are applied to extract nouns and noun phrases as candidate domain terms. Next, the method examines each candidate term's distribution in the subject domain, its distribution over the individual pages of the subject domain, and its distribution in other background domains. Combining the subject domain with the background domains, both the TF-IDF and DR + DC (domain relevance + domain consensus) algorithms are used to score candidates and extract the terms of the subject domain. The extraction is implemented with two platform tools, the Chinese word segmentation system of the Chinese Academy of Sciences (ICTCLAS) and the Language Technology Platform Cloud of Harbin Institute of Technology (LTP) [15], so that domain terminology is extracted more accurately.
Pages: 133-139
Number of pages: 7
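The abstract names the pipeline stages but not their formulas. The following Python sketch illustrates them under stated assumptions; it is not the authors' implementation. It takes already segmented, POS-tagged text as input instead of calling ICTCLAS or LTP, keeps runs of nouns as candidate terms, and uses the textbook TF-IDF definition together with one common formulation of DR + DC: domain relevance as a term's in-domain probability divided by its maximum over all domains, and domain consensus as the normalized entropy of the term's distribution over the subject-domain pages. The stop-word list, the convention that noun tags start with "n", the equal 0.5/0.5 weights, and all helper names are hypothetical. The abstract does not say how TF-IDF and DR + DC are combined, so the two scores are kept separate here.

```python
import math
from collections import Counter

# Hypothetical, tiny stop-word list; real systems use a much larger one.
STOP_WORDS = {"的", "了", "和", "是", "在"}


def candidate_terms(tagged_doc):
    """Turn one segmented, POS-tagged document into candidate terms.

    tagged_doc is a list of (word, pos) pairs, as a segmenter such as ICTCLAS
    or LTP might produce. Treating tags that start with "n" as nouns and
    merging consecutive nouns into one noun phrase are simplifying assumptions.
    """
    candidates, run = [], []
    for word, pos in tagged_doc:
        if pos.startswith("n") and word not in STOP_WORDS:
            run.append(word)  # extend the current noun run
        else:
            if run:
                candidates.append("".join(run))  # Chinese words join without spaces
            run = []
    if run:
        candidates.append("".join(run))
    return candidates


def tf_idf(term, doc, docs):
    """Classic TF-IDF of a term in one document relative to a document collection."""
    tf = doc.count(term) / len(doc) if doc else 0.0
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0


def term_prob(term, domain):
    """Relative frequency of a term among all candidates in a domain's documents."""
    counts = Counter(t for doc in domain for t in doc)
    total = sum(counts.values())
    return counts[term] / total if total else 0.0


def domain_relevance(term, target, domains):
    """DR: P(term | target) divided by its maximum over all domains (target included)."""
    best = max(term_prob(term, d) for d in domains)
    return term_prob(term, target) / best if best else 0.0


def domain_consensus(term, target):
    """DC: entropy of the term's spread over the target domain's pages,
    normalized to [0, 1] by log(number of pages); the normalization is an assumption."""
    freqs = [doc.count(term) for doc in target]
    total = sum(freqs)
    if total == 0 or len(target) < 2:
        return 0.0
    entropy = sum((f / total) * math.log(total / f) for f in freqs if f)
    return entropy / math.log(len(target))


def dr_dc(term, target, domains, alpha=0.5, beta=0.5):
    """Weighted DR + DC score; the equal default weights are an assumption."""
    return alpha * domain_relevance(term, target, domains) + beta * domain_consensus(term, target)


def rank_by_dr_dc(target, domains):
    """Rank every candidate term of the target domain by descending DR + DC score."""
    terms = {t for doc in target for t in doc}
    scored = ((t, dr_dc(t, target, domains)) for t in terms)
    return sorted(scored, key=lambda pair: -pair[1])


if __name__ == "__main__":
    target = [["分词", "术语"], ["分词", "语料"]]      # hypothetical subject-domain pages
    background = [["股票", "术语"], ["术语", "市场"]]  # hypothetical background domain
    print(rank_by_dr_dc(target, [target, background]))
    # → [('分词', 1.0), ('语料', 0.5), ('术语', 0.25)]
```

In this toy data, `分词` appears on every subject-domain page and never in the background domain, so it gets the maximum DR and DC. `术语` is pushed down by DR because it is relatively more frequent in the background domain, which is the kind of general-purpose word that contrasting against background domains is meant to filter out.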