Automatic Web Information Extraction and Alignment using CTVS Technique

Cited by: 0

Authors:

Pandarge, Sangmesh S. [1]

Chakkarwar, V. A. [1]

Affiliations:

[1] Govt Coll Engn, Dept Comp Sci Engn, Aurangabad, Maharashtra, India

Source:

2017 INTERNATIONAL CONFERENCE OF ELECTRONICS, COMMUNICATION AND AEROSPACE TECHNOLOGY (ICECA), VOL 2 | 2017

Keywords:

Web page; Query result records (QRRs); Tag tree format; Data region; Record segmentation; Web data extraction and Data alignment;

DOI:

Not available

Chinese Library Classification number:

V [Aviation, Aerospace];

Discipline classification code:

08 ; 0825 ;

Abstract:

When a user submits a query through a web browser, the web database returns its results in a page called the query result page. These results are delivered as HTML pages whose content may be structured, semi-structured, or unstructured. The main objective of this paper is to extract web data automatically and align it in tabular form. The extracted data is useful mainly for knowledge discovery and for applications such as comparison shopping. A web page often contains a large amount of data in regularly structured objects called data records. This paper presents CTVS, a novel and improved technique for web information extraction and alignment that exploits both tag similarity and value similarity within a web page. The proposed approach extracts information from query result pages automatically by constructing a tag tree and by identifying and segmenting the query result records (QRRs) in the page. The extracted data can then be aligned using pairwise or holistic alignment. The segmented QRRs are arranged in a database table so that data values of the same attribute fall in the same column. The technique is suitable for both contiguous and non-contiguous data regions, which matters because a result page may mix irrelevant data with the expected results. Experimental results show good accuracy with low running time, indicating that the technique is effective at extracting web data and aligning structured data records.
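
The pipeline outlined in the abstract (tag tree construction, record identification and segmentation, then alignment into a table) can be illustrated with a rough Python sketch. This is not the authors' CTVS implementation. It is a simplified stand-in that assumes records are sibling subtrees with identical tag sequences, uses tag structure only (no value similarity), and aligns values by tag path. All names (`TagTreeBuilder`, `find_records`, `align`, `extract_table`, `SAMPLE`) are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], []


class TagTreeBuilder(HTMLParser):
    """Builds a simple tag tree; assumes reasonably well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def tag_signature(node):
    """Preorder tuple of tags in a subtree (assumed proxy for tag similarity)."""
    sig = [node.tag]
    for child in node.children:
        sig.extend(tag_signature(child))
    return tuple(sig)


def find_records(root):
    """Pick the largest group of same-signature siblings, contiguous or not."""
    best, stack = [], [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        counts = Counter(tag_signature(c) for c in node.children)
        if counts:
            sig, n = counts.most_common(1)[0]
            if n >= 2 and n > len(best):
                best = [c for c in node.children if tag_signature(c) == sig]
    return best


def leaf_paths(node, prefix=()):
    """Yield (tag path, text) pairs for every node that carries text."""
    path = prefix + (node.tag,)
    out = [(path, " ".join(node.text))] if node.text else []
    for child in node.children:
        out.extend(leaf_paths(child, path))
    return out


def align(records):
    """Place values with the same tag path in the same table column."""
    columns, rows = [], []
    for rec in records:
        row, seen = {}, Counter()
        for path, value in leaf_paths(rec):
            seen[path] += 1
            key = ("/".join(path), seen[path])
            row[key] = value
            if key not in columns:
                columns.append(key)
        rows.append(row)
    return [[row.get(col) for col in columns] for row in rows]


def extract_table(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return align(find_records(builder.root))


# Made-up query result page: three records interrupted by an irrelevant item.
SAMPLE = """
<html><body>
<div>Sponsored links</div>
<ul>
  <li><b>Book A</b><span>$10</span></li>
  <li><i>Advertisement</i></li>
  <li><b>Book B</b><span></span></li>
  <li><b>Book C</b><span>$30</span></li>
</ul>
</body></html>
"""

for table_row in extract_table(SAMPLE):
    print(table_row)
```

In this sketch the advertisement item is skipped because its tag sequence differs from the three record items, which mimics a non-contiguous data region. The record with an empty price keeps its place in the table with a `None` cell instead of shifting other values into the wrong column.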

Pages: 94-99

Page count: 6

