Extracting Templates from Web pages

Authors
Manjula, R. [1]
Chilambuchelvan, A. [2]
Affiliations
[1] RMK Engn Coll Chennai, Dept CSE, Madras, Tamil Nadu, India
[2] RMK Engn Coll, Dept CSE, Madras, Tamil Nadu, India
Keywords
Document Object Model; Minimum Description Length; Template Extraction; VIPS
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In today's world, the World Wide Web is the most popular information provider. A website is a collection of web pages, and web pages usually carry information for users. Websites are typically designed with common templates that are populated with content. A template gives pages a consistent structure, so users can access the content easily even when the template is not explicitly announced. Current template extraction techniques degrade the performance of web applications such as search engines because of irrelevant terms in templates. Hence, we present a new method for automatically detecting and extracting templates from web pages by identifying the relevant information.
Pages: 788-791
Number of pages: 4