Automatic Extraction of Web Page Text Information Based on Network Topology Coincidence Degree

Cited by: 2

Authors:

Shu, Zhinian ^{[1]}

Li, Xiaorong ^{[1]}

Affiliations:

[1] Chaohu Univ, Coll Informat Engn, Chaohu 238000, Peoples R China

Source:

WIRELESS COMMUNICATIONS & MOBILE COMPUTING | 2022 / Vol. 2022

Keywords:

INTERNET;

DOI:

10.1155/2022/9220661

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

To effectively solve the above problems, an automatic extraction method for web page text information based on network topology coincidence degree is proposed. Search engines, web crawlers, and hypertext tags are used to classify web page text information, after which dimensionality reduction is carried out. The similarity between different features of the processed text information is then calculated and sorted, and similar text information is extracted according to a correlation based on segment estimation. The experimental results show that the designed method reduces the complexity of the associated information in the data set and improves both the amount of data collected and the success rate of information collection.
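
The abstract does not say which similarity measure or feature representation the method uses, so the following is only an illustrative sketch of the "calculate similarity, sort, extract" step. It assumes bag-of-words term counts as features and cosine similarity as the measure; the function names, the `threshold` parameter, and the sample blocks are all hypothetical and are not taken from the paper. The classification, dimensionality reduction, and segment-estimation steps are not modeled.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words counts (an assumed stand-in for the paper's features)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_blocks(blocks, reference, threshold=0.2):
    """Score each text block against a reference text, sort by score
    (highest first), and keep blocks at or above the hypothetical threshold."""
    ref_vec = term_vector(reference)
    scored = [(cosine_similarity(term_vector(b), ref_vec), b) for b in blocks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, block) for score, block in scored if score >= threshold]


# Hypothetical page blocks: navigation, main text, footer.
blocks = [
    "Home About Contact Login",
    "Web text extraction ranks page blocks by similarity",
    "Copyright 2022 all rights reserved",
]
result = rank_blocks(blocks, "extraction of web page text")
for score, block in result:
    print(f"{score:.3f}  {block}")
```

In this sketch the navigation and footer blocks share no terms with the reference text, so only the main-text block survives the threshold. A real system would need a better-motivated feature set and cutoff.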


Pages: 10
