DTM - Extracting Data Records from Search Engine Results Page using Tree Matching Algorithm

Cited by: 0

Authors:

Hong, Jer Lang [1]

Siew, Eugene [1]

Egerton, Simon [1]

Affiliation:

[1] Monash Univ, Selangor Darul Ehsan 46150, Malaysia

Source:

2009 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION | 2009

Keywords:

Information Extraction; Wrapper Generation; Search Engine;

DOI:

10.1109/SoCPaR.2009.40

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we develop a non-visual automatic wrapper for extracting data records from search engine results pages. The novel techniques in our wrapper are (1) filtering rules to detect and filter out irrelevant data records, (2) a tree matching algorithm that uses frequency measures to increase the speed of data extraction, and (3) an algorithm that calculates the number and size of the components of data records to detect the correct data region. Results show that our wrapper is as robust as, and in many cases outperforms, state-of-the-art wrappers such as ViNT and DEPTA. This wrapper could have significant speed advantages when processing large volumes of website data, which could be helpful in meta search engine development.
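To make the idea of tree matching on record subtrees concrete, the following is a minimal sketch of the classic Simple Tree Matching algorithm, which is commonly used to compare DOM subtrees in wrapper systems. It is not the frequency-based variant described in the abstract, whose details are not given here. The `Node` class, the function names, and the normalised `similarity` score are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Return the number of node pairs in a maximum order-preserving matching."""
    if a.tag != b.tag:
        return 0  # roots with different tags cannot be matched at all
    m, n = len(a.children), len(b.children)
    # dp[i][j] = best matching between the first i children of a
    # and the first j children of b (dynamic programming over child lists)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j],
                dp[i][j - 1],
                dp[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return dp[m][n] + 1  # +1 counts the matched roots


def tree_size(node: Node) -> int:
    """Count the nodes in a tree."""
    return 1 + sum(tree_size(c) for c in node.children)


def similarity(a: Node, b: Node) -> float:
    """Normalise the matching score to [0, 1] (an assumed, illustrative measure)."""
    return 2 * simple_tree_matching(a, b) / (tree_size(a) + tree_size(b))


# Two hypothetical search-result records; the second lacks the <span> snippet.
record1 = Node("div", [Node("a"), Node("span"), Node("p")])
record2 = Node("div", [Node("a"), Node("p")])
score = simple_tree_matching(record1, record2)
```

In this example the roots match and the `a` and `p` children align, so `score` counts three matched nodes. A wrapper could treat sibling subtrees whose similarity exceeds a chosen threshold as candidate data records belonging to the same data region.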


Pages: 149-154

Page count: 6
