Automatic Web Information Extraction and Alignment using CTVS Technique

Cited by: 0

Authors:

Pandarge, Sangmesh S. [1]

Chakkarwar, V. A. [1]

Affiliations:

[1] Govt Coll Engn, Dept Comp Sci Engn, Aurangabad, Maharashtra, India

Source:

2017 INTERNATIONAL CONFERENCE OF ELECTRONICS, COMMUNICATION AND AEROSPACE TECHNOLOGY (ICECA), VOL 2 | 2017

Keywords:

Web page; Query result records (QRRs); Tag tree format; Data region; Record segmentation; Web data extraction and Data alignment;

DOI:

Not available

Chinese Library Classification:

V [Aeronautics, Astronautics];

Discipline classification code:

08 ; 0825 ;

Abstract:

When a user submits a query through a web browser, a web database returns the results in a page called a query result page. These results are delivered as HTML pages whose content may be structured, semi-structured, or unstructured. The main objective of this paper is to extract web data automatically and align it in tabular form. The extracted data is useful mainly for knowledge discovery and for applications such as comparison shopping. A web page can contain a large amount of data organized as regularly structured objects, called data records. This paper presents CTVS, a novel and improved technique for web information extraction and alignment that exploits both tag and value similarity in a web page. The proposed approach automatically extracts information from query result pages by identifying query result records (QRRs), constructing a tag tree, and segmenting the QRRs within the page. The extracted data can be aligned using pairwise or holistic alignment. The segmented QRRs are arranged so that data values of the same attribute fall in the same column of a database table. The technique handles both contiguous and non-contiguous data regions, because a result page may contain irrelevant data interleaved with the expected results. Experimental results show that the approach achieves good accuracy in less time and is highly effective at extracting web data and aligning structured data records.
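
The pipeline summarized in the abstract (tag tree construction, record identification, segmentation, alignment) can be illustrated with a minimal Python sketch. This is not the authors' CTVS implementation: it covers only the tag-structure half of the idea, leaves out value similarity, and assumes well-formed HTML with explicitly closed tags. All names (`Node`, `TagTreeBuilder`, `signature`, `extract_records`) and the sample page are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], []


class TagTreeBuilder(HTMLParser):
    """Builds a minimal tag tree (assumes every tag is explicitly closed)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def signature(node):
    """Tag structure of a subtree, e.g. 'li(b,i)'."""
    if not node.children:
        return node.tag
    return node.tag + "(" + ",".join(signature(c) for c in node.children) + ")"


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def leaf_values(node):
    values = list(node.text)
    for child in node.children:
        values.extend(leaf_values(child))
    return values


def extract_records(html):
    """Pick the parent whose children repeat one signature most often and
    return those children as rows. Siblings with a different signature
    (e.g. an advertisement between records) are skipped, so the records
    need not be contiguous."""
    builder = TagTreeBuilder()
    builder.feed(html)
    best = []
    for node in walk(builder.root):
        counts = Counter(signature(c) for c in node.children)
        if not counts:
            continue
        sig, n = counts.most_common(1)[0]
        if n >= 2 and n > len(best):
            best = [c for c in node.children if signature(c) == sig]
    # Rows share one tag signature, so values align by position.
    return [leaf_values(record) for record in best]


# Hypothetical query result page with one irrelevant item between records.
PAGE = """
<ul>
  <li><b>Book A</b><i>$10</i></li>
  <li><b>Book B</b><i>$12</i></li>
  <li><p>Sponsored link</p></li>
  <li><b>Book C</b><i>$9</i></li>
</ul>
"""

table = extract_records(PAGE)
```

Because every selected record has the same tag signature, positional alignment is enough here. Records with missing or optional attributes would need the value-similarity step that the abstract describes and this sketch omits.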

Pages: 94-99

Page count: 6
