Automatic Web Information Extraction and Alignment using CTVS Technique

Cited by: 0
Authors
Pandarge, Sangmesh S. [1 ]
Chakkarwar, V. A. [1 ]
Affiliations
[1] Govt Coll Engn, Dept Comp Sci Engn, Aurangabad, Maharashtra, India
Keywords
Web page; Query result records (QRRs); Tag tree format; Data region; Record segmentation; Web data extraction; Data alignment
DOI
Not available
Chinese Library Classification (CLC) number
V [Aeronautics, Astronautics];
Discipline classification code
08 ; 0825 ;
Abstract
When a user submits a query through a web browser, the web database returns its results as a query result page. These results are delivered in HTML pages as structured, semi-structured, or unstructured data. The main objective of this paper is to extract such web data automatically and align it in tabular form. The extracted data is useful mainly for knowledge discovery and for applications such as comparison shopping. A web page often contains a large amount of data organized as regularly structured objects, called data records. This paper presents CTVS, a novel and improved technique for web information extraction and alignment that exploits both tag similarity and value similarity within a web page. The proposed approach extracts information from query result pages automatically by constructing a tag tree, identifying the query result records (QRRs), and segmenting them within the page. The extracted data can then be aligned using pairwise or holistic alignment techniques, so that data values belonging to the same attribute are arranged in the same column of a database table. The technique handles both contiguous and non-contiguous data regions, which matters because a result page typically contains irrelevant data mixed in with the expected results. Experimental results show good accuracy at low running time, and the technique is highly effective at extracting web data and aligning structured data records.
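The pipeline outlined in the abstract (tag tree construction, record identification, alignment into a table) can be illustrated with a minimal sketch. This is not the authors' CTVS implementation: it uses only tag-structure similarity and omits value similarity entirely. The names `Node`, `TagTreeBuilder`, `find_records`, `align`, `extract_table`, and `SAMPLE_HTML`, as well as the 0.8 similarity threshold, are assumptions made for illustration. Sibling subtrees are clustered regardless of adjacency, which is one simple way to tolerate non-contiguous data regions.

```python
from collections import Counter
from difflib import SequenceMatcher
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no close

class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], []

class TagTreeBuilder(HTMLParser):
    """Builds a simple tag tree; assumes reasonably well-formed HTML."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())

def tag_sequence(node):
    """Preorder list of tag names in a subtree."""
    seq = [node.tag]
    for child in node.children:
        seq.extend(tag_sequence(child))
    return seq

def find_records(root, threshold=0.8):
    """Return the largest group of structurally similar sibling subtrees."""
    best, stack = [], [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        clusters = []
        for child in node.children:
            seq = tag_sequence(child)
            if len(seq) < 2:  # a bare leaf is a field, not a record
                continue
            for cluster in clusters:
                ref = tag_sequence(cluster[0])
                if SequenceMatcher(None, ref, seq).ratio() >= threshold:
                    cluster.append(child)
                    break
            else:
                clusters.append([child])
        for cluster in clusters:
            if len(cluster) > len(best):
                best = cluster
    return best if len(best) >= 2 else []

def leaf_values(node, path=()):
    """(tag path, text) pairs for every text fragment in a subtree."""
    path = path + (node.tag,)
    pairs = [(path, t) for t in node.text]
    for child in node.children:
        pairs.extend(leaf_values(child, path))
    return pairs

def align(records):
    """Put values with the same tag path in the same column; None if absent."""
    columns, rows = [], []
    for rec in records:
        seen, row = Counter(), {}
        for path, text in leaf_values(rec):
            key = (path, seen[path])  # nth occurrence of this path
            seen[path] += 1
            if key not in columns:
                columns.append(key)
            row[key] = text
        rows.append(row)
    return [[row.get(col) for col in columns] for row in rows]

def extract_table(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    return align(find_records(builder.root))

SAMPLE_HTML = """<html><body><div><a>Home</a></div><ul>
<li><b>Book A</b><span>10.00</span><i>In stock</i></li>
<li><a>Sponsored link</a></li>
<li><b>Book B</b><span>12.50</span></li>
<li><b>Book C</b><span>8.75</span><i>In stock</i></li>
</ul></body></html>"""
```

Calling `extract_table(SAMPLE_HTML)` on this made-up page yields three rows, one per book, with the sponsored entry excluded and `None` in the availability column for the record that lacks it. Columns are ordered by first appearance, so an attribute missing from the first record would be appended at the end. A fuller treatment would also compare data values across records, as the abstract indicates CTVS does.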
Pages: 94-99
Page count: 6