Automatic Extraction of Web Page Text Information Based on Network Topology Coincidence Degree

Cited by: 2
Authors
Shu, Zhinian [1]
Li, Xiaorong [1]
Affiliations
[1] Chaohu Univ, Coll Informat Engn, Chaohu 238000, Peoples R China
Keywords
INTERNET
DOI
10.1155/2022/9220661
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
An automatic method for extracting web page text information based on network topology coincidence degree is proposed. Search engines, web crawlers, and hypertext tags are used to classify web text information, after which dimensionality reduction is carried out. The similarity between different features of the web page text information is then calculated and sorted, and similar text information is extracted according to its correlation, based on segment estimation. The experimental results show that the proposed method reduces the complexity of the associated information in the data set and increases both the amount of data collected and the success rate of information collection.
Pages: 10
Related papers
50 in total
  • [1] A Method of Automatic Web Information Extraction Based on Page Clustering
    Yang, Tianqi
    Qiu, Taofen
    2011 9TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA 2011), 2011, : 390 - 393
  • [2] A novel web page text information extraction method
    Wang, Chongjun
    Wei, Peng
    PROCEEDINGS OF 2019 IEEE 3RD INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2019), 2019, : 2213 - 2218
  • [3] Automatic Summarization and Keyword Extraction from Web Page or Text File
    You, Xiangdong
    2019 IEEE 2ND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION ENGINEERING TECHNOLOGY (CCET), 2019, : 154 - 158
  • [4] Automatic Web Information Extraction Based on Rules
    Hu, Fanghuai
    Ruan, Tong
    Shao, Zhiqing
    Ding, Jun
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2011, 2011, 6997 : 265 - 272
  • [5] Text-Based Web Page Classification with Use of Visual Information
    Bartik, Vladimir
    2010 INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM 2010), 2010, : 416 - 420
  • [6] Web text information extraction based on wrapper model
    Wang, Jingpu
    Lin, Yaping
    Zhou, Shunxian
    2005 International Symposium on Computer Science and Technology, Proceedings, 2005, : 607 - 612
  • [7] A Method of Web Information Automatic Extraction Based on XML
    Gu, Junhua
    Song, Jie
    Zhang, Na
    Liu, Yanliu
    INFORMATION TECHNOLOGY FOR MANUFACTURING SYSTEMS, PTS 1 AND 2, 2010, : 178 - 183
  • [8] Web Page Information Extraction Service Based on Graph Convolutional Neural Network and Multimodal Data Fusion
    Zhang, Mingzhu
    Yang, Zhongguo
    Ali, Sikandar
    Ding, Weilong
    2021 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, ICWS 2021, 2021, : 681 - 687
  • [9] Automatic extraction and verification of page transitions in a Web application
    Kubo, Atsuto
    Washizaki, Hironori
    Fukazawa, Yoshiaki
    14TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE, PROCEEDINGS, 2007, : 350 - +
  • [10] Study of Web Page Information Topic Extraction Technology Based on Vision
    Li, Qingshui
    Wu, Kai
    PROCEEDINGS OF 2010 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, VOL 9 (ICCSIT 2010), 2010, : 781 - 784