Sampling, information extraction and summarisation of Hidden Web databases

Cited by: 14

Authors:

Hedley, Yih-Ling [1]

Younas, Muhammad

James, Anne

Sanderson, Mark

Affiliations:

[1] Coventry Univ, Sch Math & Informat Sci, Coventry CV1 5FB, W Midlands, England

[2] Oxford Brookes Univ, Dept Comp, Oxford OX33 1HP, England

[3] Univ Sheffield, Dept Informat Studies, Sheffield S1 4DP, S Yorkshire, England

Source:

DATA & KNOWLEDGE ENGINEERING | 2006 / Volume 59 / Issue 02

Keywords:

Hidden Web databases; information extraction; document sampling;

DOI:

10.1016/j.datak.2006.01.009

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Hidden Web databases maintain a collection of specialised documents, which are dynamically generated using page templates. This paper presents the Two-Phase Sampling (2PS) technique that detects and extracts query-related information from documents contained in databases. 2PS is based on a two-phase framework for the sampling, information extraction and summarisation of Hidden Web documents. In the first phase, 2PS samples and stores documents for further analysis. In the second phase, it detects Web page templates from sampled documents and extracts relevant information from which a content summary is then generated. Experimental results demonstrate that 2PS effectively eliminates irrelevant information from sampled documents and generates terms and frequencies with improved accuracy. (c) 2006 Published by Elsevier B.V.
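The second phase described in the abstract (detect template content in sampled documents, discard it, then build a term-frequency summary) can be illustrated with a deliberately simplified sketch. This is not the 2PS algorithm from the paper: it assumes the first phase has already produced the sampled documents as plain-text strings, and it treats any line shared by most of the sample as template content. The function names `detect_template_lines` and `summarise`, the `min_share` threshold, and the sample data are all hypothetical.

```python
import re
from collections import Counter


def detect_template_lines(docs, min_share=0.8):
    """Return lines that occur in at least min_share of the sampled documents.

    Assumption: template-generated content shows up as identical lines
    across documents. The paper's own detection method may differ.
    """
    line_doc_count = Counter()
    for doc in docs:
        # Count each distinct non-empty line once per document.
        line_doc_count.update({ln.strip() for ln in doc.splitlines() if ln.strip()})
    threshold = min_share * len(docs)
    return {ln for ln, n in line_doc_count.items() if n >= threshold}


def summarise(docs, min_share=0.8):
    """Build a term-frequency summary from the non-template lines only."""
    template = detect_template_lines(docs, min_share)
    term_freq = Counter()
    for doc in docs:
        for ln in doc.splitlines():
            ln = ln.strip()
            if ln and ln not in template:
                term_freq.update(re.findall(r"[a-z0-9]+", ln.lower()))
    return term_freq


# Hypothetical sample: three pages sharing a navigation bar and a footer.
sampled = [
    "Home | Search | Help\nAspirin relieves pain\nCopyright Example Site",
    "Home | Search | Help\nIbuprofen reduces inflammation and pain\nCopyright Example Site",
    "Home | Search | Help\nParacetamol lowers fever\nCopyright Example Site",
]
summary = summarise(sampled)
```

In this sample the navigation bar and footer appear in every document, so their words are left out of `summary`, while content terms such as "pain" are counted across documents.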

Pages: 213-230

Number of pages: 18

50 items in total

[31] Incremental Information Extraction Using Relational Databases
Tari, Luis
Phan Huy Tu
Hakenberg, Joerg
Chen, Yi
Tran Cao Son
Gonzalez, Graciela
Baral, Chitta
[J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (01) : 86 - 99
[32] Formal Languages in Information Extraction and Graph Databases
Martens, Wim
[J]. BEYOND THE HORIZON OF COMPUTABILITY, CIE 2020, 2020, 12098 : 306 - 309
[33] Querying text databases for efficient information extraction
Agichtein, E
Gravano, L
[J]. 19TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2003, : 113 - 124
[34] MobiFace: A Mobile Application for Faceted Search over Hidden Web Databases
Nazi, Azade
Asudeh, Abolfazl
Zhang, Nan
Jaoua, Ali
Das, Gautam
[J]. 2017 INTERNATIONAL CONFERENCE ON COMPUTER AND APPLICATIONS (ICCA), 2017, : 13 - 17
[35] The Hidden Web: Finding Quality Information on the Net
MacDonald, Ross
[J]. ELECTRONIC LIBRARY, 2008, 26 (05) : 762 - 763
[36] Bilingual Data Extraction and Auto Summarisation
Singh, Shashi Pal
Darbari, Hemant
Kumar, Ajai
Mehta, Swati
Jain, Nidhi
Kaur, Prabh Simran
[J]. 2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, AND OPTIMIZATION TECHNIQUES (ICEEOT), 2016, : 4350 - 4354
[37] Sentence Extraction for Legal Text Summarisation
Hachey, Ben
Grover, Claire
[J]. 19TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-05), 2005, : 1686 - 1687
[38] Building databases with information extracted from web documents
Gutiérrez, A
Motz, R
Viera, D
[J]. XX INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY - PROCEEDINGS, 2000, : 41 - 49
[39] Dynamic Query Processing for Hidden Web Data Extraction
Ahuja, Babita
Anuradha
Juneja, Dimple
[J]. 2015 2ND INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2015, : 1352 - 1356
[40] Spatial Information Extraction using Hidden Correlations
Lo, Chun-Chih
Hsu, Kuo-Hsuan
Horng, Mong-Fong
Kuo, Yau-Hwang
[J]. 2018 IEEE 29TH ANNUAL INTERNATIONAL SYMPOSIUM ON PERSONAL, INDOOR AND MOBILE RADIO COMMUNICATIONS (PIMRC), 2018,