Research on the Automatic Extraction Method of Web Data Objects Based on Deep Learning

Cited by: 14
Authors
Peng, Hao [1]
Li, Qiao [1]
Affiliations
[1] Hunan Int Econ Univ, Sch Informat Sci & Engn, High Tech Ind Dev Zone, Changsha 410205, Hunan, Peoples R China
Source
Keywords
Automatic extraction; deep learning; neural network; Web data;
DOI
10.32604/iasc.2020.013939
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper presents a neural network model for Web page information extraction based on deep learning and implements the model algorithm using the TensorFlow system. We carry out a detailed experimental analysis of information extraction from Web pages on the same website, report statistics on the extraction accuracy, and optimize some of the model's parameters according to the experimental results. Once satisfactory experimental results are achieved, we propose an algorithm for migrating the model to the same kind of pages on other websites for information extraction and analyze the experimental results. Although the overall effect of the experiment is not as good as that of page information extraction in different websites, it is far more effective than using the model directly on new websites. A new method is proposed to improve the portability of information extraction systems based on machine learning technology. At the same time, the deep nonlinear learning of the deep learning model can capture deeper features, describe abstract language more essentially, and better express and understand sentences at the syntactic and semantic levels.
Pages: 609 - 616
Number of pages: 8
Related papers
50 in total
  • [1] The Research of automatic extraction dynamic web data
    Qu Jubao
    [J]. 2009 INTERNATIONAL FORUM ON INFORMATION TECHNOLOGY AND APPLICATIONS, VOL 2, PROCEEDINGS, 2009, : 143 - 146
  • [2] Semantic Deep Web: Automatic Attribute Extraction from the Deep Web Data Sources
    An, Yoo Jung
    Geller, James
    Wu, Yi-Ta
    Chun, Soon Ae
    [J]. APPLIED COMPUTING 2007, VOL 1 AND 2, 2007, : 1667 - 1672
  • [3] An Automatic Semantic Extraction Method for Web Data Interchange
    Yao, Yuangang
    Liu, Hui
    Yi, Jin
    Chen, Haiqiang
    Zhao, Xianghui
    Ma, Xiaoyu
    [J]. 2014 6TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (CSIT), 2014, : 148 - 152
  • [4] Research on method of learning web information extraction rule based on XPATH
    Hu, Yan
    Xuan, Yanyan
    [J]. DCABES 2007 PROCEEDINGS, VOLS I AND II, 2007, : 897 - 899
  • [5] Research on Adaptive Wrapper in Deep Web Data Extraction
    Liu, Donglan
    Ma, Lei
    Liu, Xin
    [J]. INTERNET OF VEHICLES - SAFE AND INTELLIGENT MOBILITY, IOV 2015, 2015, 9502 : 409 - 423
  • [6] Automatic Vegetation Extraction Method based on Feature Separation Mechanism with Deep Learning
    Zhou, Xinxin
    Wu, Yanlan
    Li, Mengya
    Zheng, Zhiteng
    [J]. Journal of Geo-Information Science, 2021, 23 (09) : 1675 - 1689
  • [7] Research on Natural Language Extraction Method Based on Deep Learning Technology
    Zhuang, Wei
    [J]. 2021 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE BIG DATA AND INTELLIGENT SYSTEMS (HPBD&IS), 2021, : 88 - 93
  • [8] A Method of Web Information Automatic Extraction Based on XML
    Gu, Junhua
    Song, Jie
    Zhang, Na
    Liu, Yanliu
    [J]. INFORMATION TECHNOLOGY FOR MANUFACTURING SYSTEMS, PTS 1 AND 2, 2010, : 178 - 183
  • [9] Prerequisites between learning objects: Automatic extraction based on a machine learning approach
    Gasparetti, Fabio
    De Medio, Carlo
    Limongelli, Carla
    Sciarrone, Filippo
    Temperini, Marco
    [J]. TELEMATICS AND INFORMATICS, 2018, 35 (03) : 595 - 610
  • [10] Research on automatic pilot repetition generation method based on deep reinforcement learning
    Pan, Weijun
    Jiang, Peiyuan
    Li, Yukun
    Wang, Zhuang
    Huang, Junxiang
    [J]. FRONTIERS IN NEUROROBOTICS, 2023, 17