Sampling, information extraction and summarisation of Hidden Web databases

Cited by: 14
Authors
Hedley, Yih-Ling [1]
Younas, Muhammad
James, Anne
Sanderson, Mark
Affiliations
[1] Coventry Univ, Sch Math & Informat Sci, Coventry CV1 5FB, W Midlands, England
[2] Oxford Brookes Univ, Dept Comp, Oxford OX33 1HP, England
[3] Univ Sheffield, Dept Informat Studies, Sheffield S1 4DP, S Yorkshire, England
Keywords
Hidden Web databases; information extraction; document sampling
DOI
10.1016/j.datak.2006.01.009
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Hidden Web databases maintain a collection of specialised documents, which are dynamically generated using page templates. This paper presents the Two-Phase Sampling (2PS) technique that detects and extracts query-related information from documents contained in databases. 2PS is based on a two-phase framework for the sampling, information extraction and summarisation of Hidden Web documents. In the first phase, 2PS samples and stores documents for further analysis. In the second phase, it detects Web page templates from sampled documents and extracts relevant information from which a content summary is then generated. Experimental results demonstrate that 2PS effectively eliminates irrelevant information from sampled documents and generates terms and frequencies with improved accuracy. (c) 2006 Published by Elsevier B.V.
Pages: 213-230
Number of pages: 18
Related papers
50 in total
  • [31] Incremental Information Extraction Using Relational Databases
    Tari, Luis
    Phan Huy Tu
    Hakenberg, Joerg
    Chen, Yi
    Tran Cao Son
    Gonzalez, Graciela
    Baral, Chitta
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (01) : 86 - 99
  • [32] Formal Languages in Information Extraction and Graph Databases
    Martens, Wim
    [J]. BEYOND THE HORIZON OF COMPUTABILITY, CIE 2020, 2020, 12098 : 306 - 309
  • [33] Querying text databases for efficient information extraction
    Agichtein, E
    Gravano, L
    [J]. 19TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2003, : 113 - 124
  • [34] MobiFace: A Mobile Application for Faceted Search over Hidden Web Databases
    Nazi, Azade
    Asudeh, Abolfazl
    Zhang, Nan
    Jaoua, Ali
    Das, Gautam
    [J]. 2017 INTERNATIONAL CONFERENCE ON COMPUTER AND APPLICATIONS (ICCA), 2017, : 13 - 17
  • [35] The Hidden Web: Finding Quality Information on the Net
    MacDonald, Ross
    [J]. ELECTRONIC LIBRARY, 2008, 26 (05) : 762 - 763
  • [36] Bilingual Data Extraction and Auto Summarisation
    Singh, Shashi Pal
    Darbari, Hemant
    Kumar, Ajai
    Mehta, Swati
    Jain, Nidhi
    Kaur, Prabh Simran
    [J]. 2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, AND OPTIMIZATION TECHNIQUES (ICEEOT), 2016, : 4350 - 4354
  • [37] Sentence Extraction for Legal Text Summarisation
    Hachey, Ben
    Grover, Claire
    [J]. 19TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-05), 2005, : 1686 - 1687
  • [38] Building databases with information extracted from web documents
    Gutiérrez, A
    Motz, R
    Viera, D
    [J]. XX INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY - PROCEEDINGS, 2000, : 41 - 49
  • [39] Dynamic Query Processing for Hidden Web Data Extraction
    Ahuja, Babita
    Anuradha
    Juneja, Dimple
    [J]. 2015 2ND INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2015, : 1352 - 1356
  • [40] Spatial Information Extraction using Hidden Correlations
    Lo, Chun-Chih
    Hsu, Kuo-Hsuan
    Horng, Mong-Fong
    Kuo, Yau-Hwang
    [J]. 2018 IEEE 29TH ANNUAL INTERNATIONAL SYMPOSIUM ON PERSONAL, INDOOR AND MOBILE RADIO COMMUNICATIONS (PIMRC), 2018,