Automatic Web Information Extraction and Alignment using CTVS Technique

Cited by: 0
Authors
Pandarge, Sangmesh S. [1]
Chakkarwar, V. A. [1]
Affiliation
[1] Govt Coll Engn, Dept Comp Sci Engn, Aurangabad, Maharashtra, India
Keywords
Web page; Query result records (QRRs); Tag tree format; Data region; Record segmentation; Web data extraction and Data alignment;
DOI
Not available
Chinese Library Classification number
V [Aviation, Aerospace];
Discipline classification code
08 ; 0825 ;
Abstract
When a user submits a query through a web browser, a web database generates the results and returns them as a query result page. These results are embedded in HTML pages and may be structured, semi-structured, or unstructured. The main objective of this paper is to extract web data automatically and align it in tabular form. The extracted data is useful mainly for knowledge discovery and for applications such as comparison shopping. A web page often contains a large amount of data organized into regularly structured objects called data records. This paper presents CTVS, a novel and improved technique for web information extraction and alignment that exploits both tag similarity and value similarity within a web page. The proposed approach extracts information from query result pages automatically by identifying the query result records (QRRs), constructing a tag tree, and segmenting the QRRs within the page. The extracted data can be aligned using pairwise or holistic alignment. The segmented QRRs are then arranged in a database table so that data values belonging to the same attribute are placed together. Because a result page may contain irrelevant data mixed in with the expected results, the proposed technique handles both contiguous and non-contiguous data regions. The experimental results show good accuracy in less time, and the technique is highly effective at extracting web data and aligning structured data records.
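The pipeline outlined in the abstract (tag tree construction, record identification, segmentation, tabular alignment) can be illustrated with a much-simplified sketch. This is not the CTVS algorithm: the abstract does not give its similarity measures, so the sketch assumes that records are sibling subtrees with identical tag structure and that values align by position. The names `Node`, `TreeBuilder`, `find_records`, `extract_table`, and the sample `PAGE` are hypothetical, and the input is assumed to be well-formed HTML with every tag closed.

```python
from collections import defaultdict
from html.parser import HTMLParser


class Node:
    """One element of the tag tree (hypothetical helper, not from the paper)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []

    def signature(self):
        """Tag structure of the subtree, ignoring text values."""
        inner = ",".join(child.signature() for child in self.children)
        return f"{self.tag}({inner})"

    def values(self):
        """Text values of the subtree: own text first, then descendants."""
        out = [" ".join(self.text)] if self.text else []
        for child in self.children:
            out.extend(child.values())
        return out


class TreeBuilder(HTMLParser):
    """Builds a tag tree; assumes every start tag has a matching end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def find_records(root):
    """Return the largest group of sibling subtrees sharing one signature.

    Grouping by signature rather than adjacency lets the group skip over
    unrelated siblings, so a non-contiguous data region is still found.
    """
    best = []
    stack = [root]
    while stack:
        node = stack.pop()
        groups = defaultdict(list)
        for child in node.children:
            if child.children:  # ignore bare leaves
                groups[child.signature()].append(child)
        for group in groups.values():
            if len(group) >= 2 and len(group) > len(best):
                best = group
        stack.extend(node.children)
    return best


def extract_table(html):
    """Segment the records and align their values by position."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return [record.values() for record in find_records(builder.root)]


# Made-up result page: three records interrupted by an irrelevant item.
PAGE = """
<html><body>
<div><p>3 results found</p></div>
<ul>
  <li><span>Book A</span><span>$10</span></li>
  <li><b>Sponsored link</b></li>
  <li><span>Book B</span><span>$12</span></li>
  <li><span>Book C</span><span>$9</span></li>
</ul>
</body></html>
"""

if __name__ == "__main__":
    for row in extract_table(PAGE):
        print(row)
```

Positional alignment only works when all records share the same tag structure. Handling records with missing or optional attributes is where pairwise or holistic alignment based on tag and value similarity, as the abstract describes, would be needed.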
Pages: 94-99
Number of pages: 6