Web Page Classification Based on Graph Neural Network

Cited by: 1
Authors
Guo, Tao [1]
Cui, Baojiang [1]
Affiliations
[1] Beijing University of Posts and Telecommunications, Beijing, People's Republic of China
DOI
10.1007/978-3-030-79728-7_19
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A web page is a semi-structured document that carries a large amount of attribute information in addition to its text. Traditional web page classification techniques are mostly based on text classification methods and ignore this additional attribute information. We propose WEB-GNN, an approach to web page classification. This work makes two major contributions. First, we propose a web page graph representation method called W2G, which reconstructs text nodes into a graph based on the visual association relationships between texts and the DOM-tree hierarchy, and thereby integrates web page content and structure efficiently. Second, we propose a web page classification method based on a graph convolutional neural network. It takes the web page graph representation as input, integrates text features and structure features through graph convolution layers, and generates a high-level web page feature representation. Experimental results on the Web-black dataset suggest that the proposed method significantly outperforms text-only methods.
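The abstract does not give the details of W2G or of the network, so the following is only a minimal sketch of the general idea, under stated assumptions: nodes are DOM elements and text nodes, edges come from the DOM parent-child hierarchy only (the visual association edges are omitted because the abstract does not define them), and one graph convolution step uses the common symmetrically normalized propagation rule with self-loops. The names `DomGraphBuilder` and `gcn_layer`, the two-dimensional toy features, and the identity weight matrix are all hypothetical and are not taken from the paper.

```python
import math
from html.parser import HTMLParser


class DomGraphBuilder(HTMLParser):
    """Hypothetical helper: turns well-formed HTML into a node list and edge list."""

    VOID = {"br", "img", "hr", "input", "meta", "link"}  # tags with no closing tag

    def __init__(self):
        super().__init__()
        self.nodes = []   # (kind, label), kind is "element" or "text"
        self.edges = []   # undirected parent-child pairs (parent_idx, child_idx)
        self._open = []   # indices of currently open elements

    def _add_node(self, kind, label):
        idx = len(self.nodes)
        self.nodes.append((kind, label))
        if self._open:  # link the new node to its DOM parent
            self.edges.append((self._open[-1], idx))
        return idx

    def handle_starttag(self, tag, attrs):
        idx = self._add_node("element", tag)
        if tag not in self.VOID:
            self._open.append(idx)

    def handle_endtag(self, tag):
        # Assumes well-formed input: each end tag closes the innermost open element.
        if tag not in self.VOID and self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self._add_node("text", text)


def gcn_layer(edges, features, weight):
    """One step of ReLU(D^-1/2 (A + I) D^-1/2 H W) using plain Python lists."""
    n, in_dim, out_dim = len(features), len(features[0]), len(weight[0])
    # Adjacency matrix with self-loops.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]
    out = []
    for i in range(n):
        # Normalized aggregation of neighbor (and own) features.
        agg = [0.0] * in_dim
        for j in range(n):
            if adj[i][j]:
                coeff = 1.0 / math.sqrt(deg[i] * deg[j])
                for k in range(in_dim):
                    agg[k] += coeff * features[j][k]
        # Linear transform followed by ReLU.
        out.append([
            max(0.0, sum(agg[k] * weight[k][m] for k in range(in_dim)))
            for m in range(out_dim)
        ])
    return out


html = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
builder = DomGraphBuilder()
builder.feed(html)
builder.close()

# Toy features (an assumption): one-hot indicator of element vs. text node.
features = [[1.0, 0.0] if kind == "element" else [0.0, 1.0]
            for kind, _ in builder.nodes]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for a learned weight matrix
hidden = gcn_layer(builder.edges, features, identity)
```

After this step each text node's representation mixes its own features with those of its DOM parent, which is the sense in which a graph convolution layer combines content and structure. A real model would use learned text embeddings as node features, trained weights, several layers, and a pooling step that produces a page-level prediction.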
Pages: 188-198
Number of pages: 11