Automatic Extraction of Web Page Text Information Based on Network Topology Coincidence Degree

Cited by: 2
Authors
Shu, Zhinian [1]
Li, Xiaorong [1]
Affiliations
[1] Chaohu Univ, Coll Informat Engn, Chaohu 238000, Peoples R China
Keywords
INTERNET
DOI
10.1155/2022/9220661
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper proposes an automatic extraction method for web page text information based on network topology coincidence degree. Search engines, web crawlers, and hypertext tags are used to classify web page text information, after which dimensionality reduction is carried out. The similarity between different features of the web page text information is then calculated and sorted, and similar text information is extracted according to correlation based on segment estimation. The experimental results show that the proposed method reduces the complexity of the associated information of the data set and improves both the amount of data collected and the success rate of information collection.
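The pipeline named in the abstract (tag-based grouping, dimensionality reduction, similarity calculation, sorting, extraction) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the features, the reduction step, or the similarity measure, and the topology coincidence degree and segment estimation are not implemented here. The sketch assumes term-frequency vectors, a vocabulary cut-off as a stand-in for dimensionality reduction, and cosine similarity. All names (BlockTextParser, term_vectors, extract_similar, max_terms, top_k) are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class BlockTextParser(HTMLParser):
    """Collect the text inside selected hypertext tags as separate blocks."""

    def __init__(self, tags=("p", "h1", "h2", "li")):
        super().__init__()
        self.tags = set(tags)
        self.depth = 0      # nesting depth inside the selected tags
        self.blocks = []    # one string per outermost selected tag

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth += 1
            if self.depth == 1:
                self.blocks.append("")

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.blocks[-1] += data


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_vectors(texts, max_terms=50):
    """Term-frequency vectors limited to the max_terms most frequent terms.

    The vocabulary cut-off is an assumed stand-in for the dimensionality
    reduction step, which the abstract does not describe.
    """
    counts = [Counter(tokenize(t)) for t in texts]
    total = Counter()
    for c in counts:
        total.update(c)
    vocab = {term for term, _ in total.most_common(max_terms)}
    return [{t: n for t, n in c.items() if t in vocab} for c in counts]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(n * b.get(t, 0) for t, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_similar(html, query, top_k=2):
    """Return the top_k text blocks most similar to the query, best first."""
    parser = BlockTextParser()
    parser.feed(html)
    blocks = [b.strip() for b in parser.blocks if b.strip()]
    vectors = term_vectors(blocks + [query])
    query_vec = vectors.pop()
    ranked = sorted(
        zip(blocks, vectors),
        key=lambda pair: cosine(pair[1], query_vec),
        reverse=True,
    )
    return [block for block, _ in ranked[:top_k]]
```

Calling extract_similar(page_html, "web page text extraction") returns the blocks ranked by similarity to the query. Any other similarity measure or reduction technique could be substituted without changing the overall structure.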
Pages: 10