Improving the web text content by extracting significant pages into a Web Site

Times cited: 0

Authors:

Ríos, SA [1]

Velásquez, JD [1]

Vera, ES [1]

Yasuda, H [1]

Aoki, T [1]

Affiliation:

[1] Univ Tokyo, Res Ctr Adv Sci & Technol, Meguro Ku, Tokyo, Japan

Source:

5th International Conference on Intelligent Systems Design and Applications, Proceedings | 2005

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Web systems have come to play a very important role in today's business world. Every day, organizations fight to keep their present clients and to gain new ones. To accomplish this goal, it is very important to make precise changes to the web site content. However, developing these improvements is a complex and specialized task because of the nature of the web data itself. We propose a novel approach that uses text mining to make changes that improve the web site content. We use a Self-Organizing Feature Map (SOFM) to find the most relevant text content, and then we propose a reverse clustering analysis to extract the most significant pages of the whole web site. The effectiveness of this method was tested experimentally on a real web site.
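The abstract names the two stages of the method but does not describe their details. The following is only a minimal sketch under our own assumptions, not the authors' implementation: each page is assumed to be represented as a term-weight vector (for example TF-IDF), the SOFM is a small rectangular grid with a Gaussian neighbourhood and a linearly decaying learning rate, and "reverse clustering" is assumed to mean mapping each neuron that wins at least one page back to the page lying nearest its weight vector. The names `train_sofm`, `significant_pages`, the parameters, and the example page names are hypothetical.

```python
import math
import random

# Hypothetical sketch: train an SOFM on page term vectors, then apply "reverse
# clustering", interpreted here (our assumption) as mapping each occupied neuron
# back to the page nearest its weight vector.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def best_matching_unit(grid, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(grid, key=lambda pos: sq_dist(grid[pos], x))


def train_sofm(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOFM with a Gaussian neighbourhood and linear decay."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Initialise each neuron with a copy of a randomly chosen page vector.
    grid = {(r, c): list(rng.choice(vectors)) for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            x = vectors[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)  # learning rate shrinks linearly
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks too
            bmu = best_matching_unit(grid, x)
            for pos, w in grid.items():
                # Neurons near the winner on the grid are pulled toward x more strongly.
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return grid


def significant_pages(grid, vectors, page_ids):
    """For every neuron that wins at least one page, return the page nearest to it."""
    members = {}
    for pid, x in zip(page_ids, vectors):
        members.setdefault(best_matching_unit(grid, x), []).append((pid, x))
    chosen = []
    for pos, group in members.items():
        pid, _ = min(group, key=lambda m: sq_dist(grid[pos], m[1]))
        chosen.append(pid)
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical pages with made-up weights for three terms.
    pages = {
        "products.html": [0.9, 0.1, 0.0],
        "catalog.html": [0.8, 0.2, 0.0],
        "contact.html": [0.0, 0.1, 0.9],
        "about.html": [0.1, 0.0, 0.8],
    }
    ids = list(pages)
    vecs = [pages[p] for p in ids]
    grid = train_sofm(vecs, rows=1, cols=2)
    print(significant_pages(grid, vecs, ids))
```

In this sketch the grid size is the main knob: each occupied neuron contributes at most one page, so a larger map can select more pages.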

Pages: 32–36

Page count: 5
