Web-based Chinese term extraction in the field of study

Cited by: 1
Authors
Guo, Rui [1]
Qiu, Jing [1]
Zhang, Guanghua [1]
Affiliations
[1] Hebei Univ Sci & Technol, Sch Informat Sci & Engn, Shijiazhuang, Peoples R China
Keywords
natural language processing; text extraction; Chinese word segmentation; TF-IDF; DR + DC
DOI
10.1109/SKG.2015.45
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In the era of big data, the Web holds a vast amount of important information. Extracting domain-specific terms from the Web is an indispensable part of Web natural language processing, and it also plays an important role in domain ontology research. Chinese text has no explicit delimiters between words, so extracting terms from Chinese Web text remains difficult. This article proposes a method for extracting terms from Chinese text more accurately. First, stop-word removal, Chinese word segmentation, and lexical analysis are applied to extract nouns and noun phrases as candidate domain terms. Then each candidate term is evaluated according to its distribution in the subject domain, its distribution across the individual pages of the subject domain, and its distribution in other background domains. Combining the subject and background domains, the method applies both TF-IDF and the DR + DC algorithm to extract terms in the subject domain. The extraction is implemented with two platform tools, the Chinese word segmentation system of the Chinese Academy of Sciences (ICTCLAS) and the Language Technology Platform Cloud of Harbin Institute of Technology (LTP) [15], so that more accurate domain terminology is obtained.
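The three distributions named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: that DR and DC stand for domain relevance and domain consensus in their commonly used formulation, that TF-IDF uses the plain log(N/df) weighting, and that documents are already segmented into token lists (the paper uses ICTCLAS and LTP for that step, which are not called here). All function names and the toy corpus are hypothetical, and how the paper combines the scores is not stated, so they are computed separately.

```python
import math

def tf_idf(term, doc, corpus):
    """Plain TF-IDF of `term` in `doc`; every document is a list of tokens."""
    tf = doc.count(term) / len(doc) if doc else 0.0
    df = sum(1 for d in corpus if term in d)   # documents containing the term
    return tf * math.log(len(corpus) / df) if df else 0.0

def term_probability(term, docs):
    """Relative frequency of `term` over all tokens of a domain."""
    total = sum(len(d) for d in docs)
    return sum(d.count(term) for d in docs) / total if total else 0.0

def domain_relevance(term, subject_docs, background_domains):
    """Assumed DR: P(term | subject) divided by the largest P(term | domain)."""
    p_subject = term_probability(term, subject_docs)
    p_max = max([p_subject] + [term_probability(term, b) for b in background_domains])
    return p_subject / p_max if p_max else 0.0

def domain_consensus(term, subject_docs):
    """Assumed DC: entropy of the term's frequency spread over subject pages."""
    freqs = [d.count(term) for d in subject_docs]
    total = sum(freqs)
    if total == 0:
        return 0.0
    return -sum((f / total) * math.log(f / total) for f in freqs if f > 0)

# Toy, pre-segmented pages (hypothetical data, not from the paper).
subject = [["本体", "学习", "本体"], ["本体", "术语"]]   # subject-domain pages
background = [[["足球", "比赛", "术语"]]]                # one background domain

for candidate in ("本体", "术语"):
    print(candidate,
          round(domain_relevance(candidate, subject, background), 3),
          round(domain_consensus(candidate, subject), 3))
```

Under these assumptions, a term that is frequent in the subject domain, rare in the background domains, and spread evenly across subject pages scores high on both DR and DC, which matches the selection criteria the abstract describes.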
Pages: 133-139
Page count: 7