Automatic Extraction of Web Page Text Information Based on Network Topology Coincidence Degree

Cited by: 2
Authors
Shu, Zhinian [1]
Li, Xiaorong [1]
Affiliations
[1] Chaohu Univ, Coll Informat Engn, Chaohu 238000, Peoples R China
Keywords
INTERNET
DOI
10.1155/2022/9220661
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
To address the above problems, an automatic method for extracting web text information based on network topology coincidence degree is proposed. Search engines, web crawlers, and hypertext tags are used to classify web text information, and dimensionality reduction is then carried out. After this processing, the similarity between different features of the web page text is calculated, the similarities are sorted, and similar text information is extracted according to correlation based on segment estimation. The experimental results show that the designed method reduces the complexity of the associated information in the data set and improves both the amount of data collected and the success rate of information collection.
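The similarity-and-sorting step named in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it swaps in plain bag-of-words cosine similarity, omits the dimensionality reduction and segment-estimation stages, and every name in it (`term_vector`, `cosine_similarity`, `rank_blocks`, the `threshold` value, the sample strings) is hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words term counts for one text block."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty block is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_blocks(query, blocks, threshold=0.1):
    """Return (score, block) pairs at or above threshold, most similar first.

    The threshold is an assumed cut-off for illustration only.
    """
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(b)), b) for b in blocks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Made-up text blocks standing in for segments of a crawled page.
blocks = [
    "Web crawler collects page text from the network",
    "Copyright notice and navigation links",
    "Text extraction from web page content",
]
ranked = rank_blocks("web page text extraction", blocks)
for score, block in ranked:
    print(f"{score:.3f}  {block}")
```

Blocks that share no terms with the query score zero and fall below the cut-off, so only the topically related blocks remain in the ranked output.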
Pages: 10