Research on the Automatic Extraction Method of Web Data Objects Based on Deep Learning

Cited by: 14
Authors
Peng, Hao [1 ]
Li, Qiao [1 ]
Affiliations
[1] Hunan Int Econ Univ, Sch Informat Sci & Engn, High Tech Ind Dev Zone, Changsha 410205, Hunan, Peoples R China
Source
Keywords
Automatic extraction; deep learning; neural network; Web data;
DOI
10.32604/iasc.2020.013939
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper presents a neural network model for Web page information extraction based on deep learning technology, and implements the model algorithm using the TensorFlow system. We then complete a detailed experimental analysis of the information extraction effect on Web pages from the same website, report statistics on the accuracy of the page information extraction, and optimize some parameters in the model according to the experimental results. On the premise of achieving ideal experimental results, an algorithm for migrating the model to the same pages of other websites for information extraction is proposed, and the experimental results are analyzed. Although the overall effect of the experiment is not as good as that of the page information extraction in different websites, it is far more effective than using the model directly on new websites. A new method is proposed to improve the portability of information extraction systems based on machine learning technology. At the same time, the deep nonlinear learning method of the deep learning model can prove deeper features, can give a more essential description of the abstract language, and can better express and understand sentences at the syntactic and semantic levels.
Pages: 609-616
Page count: 8
Related papers
50 in total
  • [31] Automatic identification method of overpasses based on deep learning
    Ma Jingzhen
    Wen Bowei
    Zhang Fubing
    [J]. 2020 INTERNATIONAL CONFERENCE ON IMAGE, VIDEO PROCESSING AND ARTIFICIAL INTELLIGENCE, 2020, 11584
  • [32] Automatic Counting Method for Centipedes Based on Deep Learning
    Yao, Jin
    Chen, Weitao
    Wang, Tao
    Yang, Fu
    Sun, Xiaoyan
    Yao, Chong
    Jia, Liangquan
    [J]. IEEE ACCESS, 2024, 12 : 84726 - 84737
  • [33] An effective method supporting data extraction and schema recognition on deep web
    Liu, Wei
    Shen, Derong
    Nie, Tiezheng
    [J]. PROGRESS IN WWW RESEARCH AND DEVELOPMENT, PROCEEDINGS, 2008, 4976 : 419 - 431
  • [34] Research on the extraction method of book number region based on bayesian optimization and deep learning
    Zhang, Qianqian
    Sun, Jianglei
    Zhao, Jing
    Xia, Zilin
    Zhang, Kai
    [J]. International Journal of Circuits, Systems and Signal Processing, 2021, 15 : 1150 - 1158
  • [35] Domain Ontology Creation Based on Automatic Text Extraction for Learning Objects Characterization
    Ortiz, Adela
    Azevedo, Isabel
    Seica, Rui
    Carrapatoso, Eurico
    Carvalho, Carlos Vaz
    [J]. PROCEEDINGS OF THE 8TH EUROPEAN CONFERENCE ON E-LEARNING, 2009, : 440 - 448
  • [36] Automatic Data Extraction from Lists in Web Pages Based on XML
    Xin, Zhou
    Hao, Wang
    [J]. ADVANCED TECHNOLOGY IN TEACHING - PROCEEDINGS OF THE 2009 3RD INTERNATIONAL CONFERENCE ON TEACHING AND COMPUTATIONAL SCIENCE (WTCS 2009), VOL 2: EDUCATION, PSYCHOLOGY AND COMPUTER SCIENCE, 2012, 117 : 915 - 921
  • [37] Research on Automatic Extraction Technology of Power Transmission Tower Based on SAR Image and Deep Learning Technology
    Ou, WenHao
    Yang, Zhi
    Zhao, BinBin
    Fei, XiangZe
    Ma, Xiao
    Yang, Gang
    [J]. 2019 6TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE 2019), 2019, : 825 - 828
  • [38] Review of Deep Web Data Extraction
    Li, Shenglin
    Chen, Chen
    Luo, Kaiwen
    Song, Bo
    [J]. 2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 1068 - 1070
  • [39] SmartColor: Automatic Web Color Scheme Generation Based on Deep Learning
    Feng, Zhitao
    Hou, Mingliang
    Liu, Huiyang
    Liu, Mujie
    Kaur, Achhardeep
    Febrinanto, Falih Gozi
    Zhao, Wenhong
    [J]. 2021 12TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2021, : 285 - 290
  • [40] On the automatic extraction of data from the hidden web
    Liddle, SW
    Yau, SH
    Embley, DW
    [J]. CONCEPTUAL MODELING FOR NEW INFORMATION SYSTEMS TECHNOLOGIES, 2002, 2465 : 212 - 226