Improving the web text content by extracting significant pages into a Web Site

Cited by: 0
Authors
Ríos, SA [1]
Velásquez, JD [1]
Vera, ES [1]
Yasuda, H [1]
Aoki, T [1]
Affiliations
[1] Univ Tokyo, Res Ctr Adv Sci & Technol, Meguro Ku, Tokyo, Japan
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Web systems play a very important role in today's business world. Every day, organizations fight to keep their present clients and to gain new ones. To accomplish this goal, it is very important to make precise changes to the web site content. However, developing these improvements is a complex and specialized task because of the nature of the web data itself. We propose a novel approach to improving the web site content using text mining. We use a Self-Organizing Feature Map (SOFM) to find the most relevant text content, and then we propose a reverse clustering analysis to extract the most significant pages of the whole web site. The effectiveness of this method was tested experimentally on a real web site.
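The abstract names the two steps (an SOFM over page text, then a reverse analysis from clusters back to pages) but gives no details. The sketch below is only one possible reading, under stated assumptions: pages are represented as TF-IDF vectors, the map is a small rectangular grid, and the "reverse" step is taken to mean picking, for each winning neuron, the page closest to its weight vector. The function names (`tfidf_vectors`, `train_sofm`, `significant_pages`), the grid size, and the learning schedule are all hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def tfidf_vectors(pages):
    """Turn each page's text into a unit-length TF-IDF vector (assumed representation)."""
    docs = [Counter(text.lower().split()) for text in pages]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    idf = {w: math.log(n / sum(1 for d in docs if w in d)) + 1.0 for w in vocab}
    vectors = []
    for d in docs:
        total = sum(d.values()) or 1  # guard against empty pages
        vec = [d[w] / total * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors, vocab


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_sofm(vectors, rows=2, cols=2, epochs=50, seed=0):
    """Train a small self-organizing map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = {g: [rng.random() for _ in range(dim)] for g in grid}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs            # decays from 1 toward 0
        lr = 0.5 * frac                        # learning rate (assumed schedule)
        radius = max(rows, cols) / 2.0 * frac + 0.5
        for v in rng.sample(vectors, len(vectors)):
            # Best matching unit: the neuron whose weights are closest to v.
            bmu = min(grid, key=lambda g: sq_dist(weights[g], v))
            for g in grid:
                d2 = (g[0] - bmu[0]) ** 2 + (g[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius * radius))  # Gaussian neighborhood
                w = weights[g]
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights


def significant_pages(vectors, weights):
    """Assumed 'reverse' step: for each winning neuron, keep the nearest page index."""
    winners = {min(weights, key=lambda g: sq_dist(weights[g], v)) for v in vectors}
    picks = {
        min(range(len(vectors)), key=lambda i: sq_dist(weights[g], vectors[i]))
        for g in winners
    }
    return sorted(picks)


# Toy input, invented for illustration only.
pages = [
    "loan credit bank account",
    "credit card bank loan",
    "sports football match",
    "football league match goal",
]
vectors, vocab = tfidf_vectors(pages)
weights = train_sofm(vectors, rows=2, cols=2, epochs=30, seed=1)
print(significant_pages(vectors, weights))
```

The returned indices are the pages this sketch would treat as representative of each occupied cluster. How the authors actually define and select "significant" pages may differ.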
Pages: 32-36
Page count: 5