Research on method of learning web information extraction rule based on XPATH

Times cited: 0
Authors
Hu, Yan [1 ]
Xuan, Yanyan [1 ]
Affiliation
[1] Wuhan Univ Technol, Dept Comp Sci & Technol, Wuhan 430070, Peoples R China
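Keywords
DOM; XPATH; XSLT; web information extraction
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract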
Building on a study of HTML document structure, this paper identifies theme blocks by cleaning web pages, then designs and implements an XPATH-based method for extracting theme information from the web. It examines the key element of the method, the XPATH expression that encodes the information extraction (IE) path, and constructs an algorithm that generates XPATH expressions automatically. IE rules can thus be learned and generated automatically to implement web IE.
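As an illustration of what automatically generating an XPATH extraction rule from a sample node can look like, here is a minimal sketch. It is not the authors' algorithm, which the abstract does not describe in detail. It assumes well-formed, XHTML-like markup that Python's standard `xml.etree.ElementTree` can parse, and the names `build_xpath` and `PAGE` are hypothetical, introduced only for this example.

```python
import xml.etree.ElementTree as ET


def build_xpath(root, target):
    """Return a positional path from root to target, e.g. ./body[1]/div[2]/p[2].

    Hypothetical helper: walks from the sample node up to the root and records,
    at each level, the tag name and its 1-based position among same-tag siblings.
    """
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        index = next(i for i, c in enumerate(same_tag) if c is node) + 1
        steps.append(f"{node.tag}[{index}]")
        node = parent
    return "./" + "/".join(reversed(steps)) if steps else "."


# A tiny, made-up page: one navigation block and one theme (content) block.
PAGE = (
    "<html><body>"
    "<div class='nav'><a>Home</a></div>"
    "<div class='content'><h1>Title</h1>"
    "<p>First paragraph.</p><p>Second paragraph.</p></div>"
    "</body></html>"
)

root = ET.fromstring(PAGE)
# Pretend a user marked the second paragraph of the content block as the sample.
sample = root.find(".//div[@class='content']/p[2]")
rule = build_xpath(root, sample)
extracted = root.find(rule).text
```

Here `rule` is a path expression that, applied to the same document, selects the marked node again. Real web pages are rarely well-formed XML, so a practical system would first clean the page or use a tolerant HTML parser, and would likely generalize the positional steps so the rule survives small layout changes.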
Pages: 897-899
Page count: 3