Automatic Extraction of Web Page Text Information Based on Network Topology Coincidence Degree

Cited by: 2
Authors
Shu, Zhinian [1]
Li, Xiaorong [1]
Affiliations
[1] Chaohu Univ, Coll Informat Engn, Chaohu 238000, Peoples R China
Keywords
INTERNET;
DOI
10.1155/2022/9220661
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
To effectively solve the above problems, an automatic method for extracting web page text information based on network topology coincidence degree is proposed. Search engines, web crawlers, and hypertext tags are used to classify web text information, after which dimensionality reduction is carried out. The similarity between different features of the web page text information is then calculated and sorted, and similar text information is extracted according to the correlation based on segment estimation. The experimental results show that the proposed method reduces the complexity of the associated information in the data set and increases both the amount of data collected and the success rate of information collection.
Pages: 10
Related papers
50 in total
  • [41] Efficient Web Page Main Text Extraction towards Online News Analysis
    Zhou, Baoyao
    Xiong, Yuhong
    Liu, Wei
    ICEBE 2009: IEEE INTERNATIONAL CONFERENCE ON E-BUSINESS ENGINEERING, PROCEEDINGS, 2009: 37-41
  • [42] A Construction Scheme of Web Page Comment Information Extraction System Based on Frequent Subtree Mining
    Zhang, Xiaowen
    Chen, Bingfeng
    GREEN ENERGY AND SUSTAINABLE DEVELOPMENT I, 2017, 1864
  • [43] Information Retrieval from Unstructured Web Text Document Based on Automatic Learning of the Threshold
    Fkih, Fethi
    Omri, Mohamed Nazih
    INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2012, 2(04): 12-30
  • [44] Automatic Data Records Extraction from List Page in Deep Web Sources
    Chen Hong-ping
    Fang Wei
    Yang Zhou
    Zhuo Lin
    Cui Zhi-Ming
    2009 ASIA-PACIFIC CONFERENCE ON INFORMATION PROCESSING (APCIP 2009), VOL 1, PROCEEDINGS, 2009: 370-373
  • [45] Automatic RESTful Web Service Identification and Information Extraction
    Czyszczon, Adam
    Zgrzywa, Aleksander
    COMPUTER NETWORKS, CN 2014, 2014, 431: 318-327
  • [46] A class of neural-network-based transducers for web information extraction
    Sleiman, Hassan A.
    Corchuelo, Rafael
    Neurocomputing, 2014, 135: 61-68
  • [49] Automatic Text Generation via Text Extraction Based on Submodular
    Ai, Lisi
    Li, Na
    Zheng, Jianbing
    Gao, Ming
    WEB AND BIG DATA, 2017, 10612: 237-246
  • [50] Learning knowledge bases for information extraction from multiple text based web sites
    Gao, XY
    Zhang, MJ
    IEEE/WIC INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY, PROCEEDINGS, 2003: 119-125