A semi-structured information semantic annotation method for Web pages

Cited by: 0
Authors
Lu Zhang
Tiantian Wang
Yiran Liu
Qingling Duan
Affiliation
[1] China Agricultural University, College of Information and Electrical Engineering
Source
Neural Computing & Applications, 2020, 32 (11)
Keywords
Semantic annotation; Semi-structured information; Back-propagation neural network; Domain description model
DOI
Not available
Abstract
Web pages contain a large amount of semi-structured information. Annotating this information comprehensively and accurately with uniform semantics increases its value and supports information integration across Web sites. Based on the characteristics of semi-structured information on Web pages, a semantic annotation method that combines header recognition and data item classification is proposed. First, a description model is constructed for the domain to be annotated. Second, header recognition is used to annotate the data items on extracted pages. For data items that header recognition fails to annotate, feature vectors are constructed from the feature sets in the domain description model, and the items are annotated according to the classification results of a back-propagation neural network. The proposed method is tested on 19,657 data items in the agricultural product price domain and 8,089 data items in the recruitment information domain. The annotation precision is 97.39% and 95.67%, respectively, and the annotation recall is 95.41% and 95.67%, respectively. These results show that the proposed method can annotate semi-structured information on Web pages accurately and completely.
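The two-stage flow described in the abstract (header recognition first, classification as a fallback) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `HEADER_SYNONYMS` table stands in for the domain description model, `feature_vector` uses three toy features, and `stand_in_classifier` is a hard-coded rule that occupies the place where a trained back-propagation neural network would sit.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical domain description model: semantic label -> known header synonyms.
HEADER_SYNONYMS: Dict[str, List[str]] = {
    "product_name": ["product", "name", "variety"],
    "price": ["price", "average price"],
    "market": ["market", "location"],
}


def label_from_header(header: Optional[str]) -> Optional[str]:
    """Stage 1: return the label whose synonym list contains the header, if any."""
    if not header:
        return None
    key = header.strip().lower()
    for label, synonyms in HEADER_SYNONYMS.items():
        if key in synonyms:
            return label
    return None


def feature_vector(item: str) -> List[float]:
    """Toy features (assumed): digit ratio, decimal-point flag, scaled length."""
    text = item.strip()
    digits = sum(ch.isdigit() for ch in text)
    return [
        digits / len(text) if text else 0.0,
        1.0 if "." in text else 0.0,
        min(len(text), 20) / 20.0,
    ]


def stand_in_classifier(features: Sequence[float]) -> str:
    """Placeholder rule, NOT a neural network; a trained model would go here."""
    return "price" if features[0] > 0.5 else "product_name"


def annotate(
    columns: List[Tuple[Optional[str], List[str]]],
    classify: Callable[[Sequence[float]], str],
) -> List[Tuple[str, str, str]]:
    """Annotate each data item; record which stage produced the label."""
    results = []
    for header, items in columns:
        label = label_from_header(header)
        for item in items:
            if label is not None:
                results.append((item, label, "header"))
            else:
                # Stage 2: fall back to classifying the item's feature vector.
                results.append((item, classify(feature_vector(item)), "classifier"))
    return results
```

Calling `annotate([("Price", ["3.50"]), (None, ["4.20", "cabbage"])], stand_in_classifier)` labels the first column from its header and sends the items of the header-less column through the fallback classifier. Passing the classifier in as a callable keeps the pipeline independent of how the model is trained.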
Pages: 6491-6501
Page count: 10
Related papers
  • [1] A semi-structured information semantic annotation method for Web pages
    Zhang, Lu
    Wang, Tiantian
    Liu, Yiran
    Duan, Qingling
    [J]. NEURAL COMPUTING & APPLICATIONS, 2020, 32 (11): 6491 - 6501
  • [2] Reverse method for labeling the information from semi-structured web pages
    Akbar, Z.
    Handoko, L. T.
    [J]. PROCEEDINGS OF THE 2009 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING SYSTEMS, 2009: 551 - 555
  • [3] Semantic annotation of semi-structured documents
    Ranganathan, Girish R.
    Biletskiy, Yevgen
    Kaltchenko, Alexey
    [J]. 2008 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1-4, 2008: 877 - +
  • [4] Bootstrapping Information Extraction from Semi-structured Web Pages
    Carlson, Andrew
    Schafer, Charles
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PART I, PROCEEDINGS, 2008, 5211 : 195 - +
  • [5] A strategy for extracting information from semi-structured web pages
    Shaker, Mahmoud
    Ibrahim, Hamidah
    Mustapha, Aida
    Abdullah, Lili Nurliyana
    [J]. INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2010, 6 (04) : 304 - 318
  • [6] Information extraction from Web pages using semi-structured data alignment
    Kuboyama, Tetsuji
    Miyahara, Tetsuhiro
    Hirokawa, Sachio
    Itou, Eisuke
    [J]. WMSCI 2005: 9th World Multi-Conference on Systemics, Cybernetics and Informatics, Vol 1, 2005: 42 - 47
  • [7] Automatic information extraction from semi-structured Web pages by pattern discovery
    Chang, CH
    Hsu, CN
    Lui, SC
    [J]. DECISION SUPPORT SYSTEMS, 2003, 35 (01) : 129 - 147
  • [8] Data extraction from semi-structured web pages by clustering
    Vuong, Le Phong Bao
    Gao, Xiaoying
    Zhang, Mengjie
    [J]. 2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006: 374 - +
  • [9] Extracting lists of data records from semi-structured web pages
    Alvarez, Manuel
    Pan, Alberto
    Raposo, Juan
    Bellas, Fernando
    Cacheda, Fidel
    [J]. DATA & KNOWLEDGE ENGINEERING, 2008, 64 (02) : 491 - 509
  • [10] A Hybrid Method for Semantic Annotation of Chinese Web Pages
    Jing, Tao
    Zuo, Wanli
    He, Fengling
    [J]. 2008 INTERNATIONAL SEMINAR ON FUTURE INFORMATION TECHNOLOGY AND MANAGEMENT ENGINEERING, PROCEEDINGS, 2008: 252 - 256