Web information extraction using generalized hidden Markov model

Cited by: 0
Authors
Zhong, Ping [1]
Chen, Jinlin [2]
Cook, Terry [1]
Affiliations
[1] CUNY, Grad Ctr, Dept Comp Sci, New York, NY 10021 USA
[2] CUNY, Grad Ctr, Queens Coll, Dept Comp Sci, New York, NY 10021 USA
Keywords
hidden Markov model; information extraction; layout analysis; web
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Hidden Markov Model (HMM) is an important approach to information extraction (IE). When applied to Web IE, however, HMM-based approaches suffer from several problems because they do not take Web-specific features into account. In this paper we present a Generalized Hidden Markov Model (GHMM) that extends traditional HMMs by exploiting Web-specific information for Web IE. Our approach uses the Web content block, rather than the term, as the basic extraction unit. Instead of the traditional sequential state transition order, we determine the state transition order of the GHMM from the layout structure of the corresponding Web page. Furthermore, we use multiple emission features instead of a single emission feature. In this way the GHMM can better accommodate Web IE. Experiments show promising results compared with traditional HMM-based Web IE.
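
The abstract names three ideas: content blocks as the extraction unit, a state order derived from page layout, and multiple emission features per block. The sketch below illustrates how these could fit together in an ordinary Viterbi decoder. It is not the authors' implementation. Everything in it is an assumption made for illustration: the `Block` fields, the two states, the two binary features, all probabilities, the top-to-bottom sort used as a stand-in for layout analysis, and the conditional independence of features given the state.

```python
import math
from dataclasses import dataclass


@dataclass
class Block:
    """A Web content block (hypothetical fields, not from the paper)."""
    text: str
    top: int        # vertical position on the rendered page
    left: int       # horizontal position on the rendered page
    font_size: int


# Hypothetical states and hand-picked probabilities, for illustration only.
STATES = ["title", "price"]
START = {"title": 0.8, "price": 0.2}
TRANS = {
    "title": {"title": 0.3, "price": 0.7},
    "price": {"title": 0.4, "price": 0.6},
}
# P(feature is True | state) for each of several emission features.
EMIT = {
    "title": {"has_digit": 0.1, "large_font": 0.9},
    "price": {"has_digit": 0.9, "large_font": 0.2},
}


def emission_features(block):
    """Compute multiple binary emission features for one block."""
    return {
        "has_digit": any(ch.isdigit() for ch in block.text),
        "large_font": block.font_size >= 16,
    }


def log_emission(state, feats):
    """Combine features, assuming they are independent given the state."""
    total = 0.0
    for name, value in feats.items():
        p = EMIT[state][name]
        total += math.log(p if value else 1.0 - p)
    return total


def layout_order(blocks):
    """Stand-in for layout analysis: read blocks top to bottom, left to right."""
    return sorted(blocks, key=lambda b: (b.top, b.left))


def viterbi(blocks):
    """Return (block text, most likely state) pairs in layout order."""
    if not blocks:
        return []
    ordered = layout_order(blocks)
    feats = [emission_features(b) for b in ordered]
    score = {s: math.log(START[s]) + log_emission(s, feats[0]) for s in STATES}
    back = []
    for f in feats[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for s at this step.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + log_emission(s, f)
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return list(zip([b.text for b in ordered], path))
```

For example, `viterbi([Block("$19.99", 100, 0, 12), Block("Acme Widget", 10, 0, 20)])` decodes the blocks in layout order rather than input order and labels "Acme Widget" as `title` and "$19.99" as `price`. In a real system the probabilities would be estimated from labeled pages, and the ordering step would come from an actual layout analysis of the page.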
Pages: 142 / +
Page count: 2