An Analysis of Characters and Structures of Web Pages Based on Regular Expressions

Cited by: 0
Authors
Xu, Lei [1]
Affiliation
[1] Hubei Univ, Fac Phys & Elect Sci, Wuhan, Peoples R China
Keywords
information extraction; HTML; regular expressions
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper introduces a method for analyzing the characters and structures of web pages using regular expressions. From the encoding declaration to the HTML elements, the characters in a web page are counted one by one. The effectiveness of the tool is demonstrated in experiments on more than one hundred real-world web pages. The work is intended as preparation for large-scale web information extraction.
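The kind of analysis the abstract describes can be illustrated with a minimal sketch. This is not the author's tool: the record does not give the actual expressions, so the two patterns below (`CHARSET_RE`, `TAG_RE`) and the function name `analyze_page` are assumptions made for illustration. The sketch reads the declared encoding, counts opening tags by element name, and counts the characters left after tags are stripped.

```python
import re
from collections import Counter

# Hypothetical patterns, chosen for illustration only.
# Matches a charset declaration inside a <meta> tag.
CHARSET_RE = re.compile(r'<meta[^>]*charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)
# Matches opening tags only; closing tags such as </p> are not counted.
TAG_RE = re.compile(r'<\s*([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>')
# Matches any tag, used to strip markup before counting text characters.
ANY_TAG_RE = re.compile(r'<[^>]+>')


def analyze_page(html: str) -> dict:
    """Return the declared charset, character counts, and per-element tag counts."""
    charset_match = CHARSET_RE.search(html)
    tag_counts = Counter(name.lower() for name in TAG_RE.findall(html))
    text_only = ANY_TAG_RE.sub('', html)
    return {
        "charset": charset_match.group(1).lower() if charset_match else None,
        "total_chars": len(html),
        "text_chars": len(text_only),
        "tag_counts": tag_counts,
    }


SAMPLE = (
    '<html><head><meta charset="UTF-8"><title>Hi</title></head>'
    '<body><p>One</p><p>Two</p></body></html>'
)
result = analyze_page(SAMPLE)
```

Regular expressions of this kind are adequate for counting but do not handle every case a full HTML parser would, for example a `>` character inside an attribute value or markup inside comments and scripts.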
Pages: 4