Automatic Extraction of Web Page Text Information Based on Network Topology Coincidence Degree

Cited by: 2
Authors
Shu, Zhinian [1]
Li, Xiaorong [1]
Affiliation
[1] Chaohu Univ, Coll Informat Engn, Chaohu 238000, Peoples R China
Keywords
INTERNET;
DOI
10.1155/2022/9220661
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
An automatic method for extracting web page text information based on network topology coincidence degree is proposed. Search engines, web crawlers, and hypertext tags are used to classify web text information, after which dimensionality reduction is carried out. The similarity between different features of the web page text information is then calculated and sorted, and similar text information is extracted according to a correlation based on segment estimation. The experimental results show that the method reduces the complexity of the data set's associated information and increases both the amount of data collected and the success rate of information collection.
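The similarity-and-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not define the network topology coincidence degree or the segment-estimation correlation, so plain cosine similarity over word counts is swapped in as a stand-in, and the dimensionality-reduction step is omitted. The names `BlockTextParser`, `tokens`, `cosine`, and `rank_blocks` are hypothetical, and treating each `<p>` element as one text block is an assumption made only for illustration.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class BlockTextParser(HTMLParser):
    """Hypothetical helper: collect the text of each <p> element as one block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Collapse runs of whitespace and keep non-empty blocks only.
            text = " ".join("".join(self._buf).split())
            if text:
                self.blocks.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def tokens(text):
    """Lowercase alphanumeric word tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity of two word-count vectors (stand-in similarity measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_blocks(html, query):
    """Score every <p> block against the query and sort by descending similarity."""
    parser = BlockTextParser()
    parser.feed(html)
    parser.close()
    q = Counter(tokens(query))
    scored = [(cosine(Counter(tokens(b)), q), b) for b in parser.blocks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_blocks(page_html, "text extraction")` returns `(score, block)` pairs with the most similar block first; a caller could then keep only the blocks above a chosen threshold.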
Pages: 10