Extraction of Web Content Based on Content Type

Times cited: 0

Authors:

Verma, Manish Kumar [1]
Kumar, Sarowar [1]
Abhishek, Kumar [1]
Singh, M. P. [1]

Affiliation:

[1] National Institute of Technology (NIT) Patna, Department of Computer Science and Engineering (CSE), Patna, Bihar, India

Source:

PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ICT FOR SUSTAINABLE DEVELOPMENT, ICT4SD 2015, VOL 1 | 2016, vol. 408

Keywords:

PHP; Data mining; Absolute and relative URLs; CSS; JavaScript; DOM document; XAMPP

DOI:

10.1007/978-981-10-0129-1_12

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Today, the World Wide Web has become an integral part of our lives. We have entered a digital era in which almost everything we need is available online: for nearly every task or piece of information we can think of, there is a website for it. With so many websites running on the Internet, the amount of unnecessary scripts, images, ads, videos, and links has grown dramatically. This irrelevant content makes sites heavy and consumes considerable resources to load properly. If such content is removed from a site, or at least prevented from loading, browsing speed improves considerably, and a more precise and concise page is loaded that is easier to view and more accurate. This paper proposes a method for loading a website's contents, such as links, images, and videos, according to the user's requirements and demand. Runtime tests were performed on different types of websites, including educational sites, blogs, personal websites, and e-commerce sites. The results of these tests, included in this paper, indicate that concise, on-demand loading of heavy web content makes web browsing easier and more efficient.
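The abstract describes loading a page's links, images, videos, and similar items according to what the user asks for. The keywords mention PHP and the DOM document, but this record does not describe the implementation, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It uses the standard library's `html.parser.HTMLParser` to collect only the requested content types and `urllib.parse.urljoin` to turn relative URLs into absolute ones (one of the listed keywords). The `CONTENT_TYPES` mapping and the names `ContentTypeExtractor` and `extract_by_type` are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical mapping (an assumption, not from the paper): each user-facing
# content type maps to the (tag, attribute) pairs that carry its URL.
CONTENT_TYPES = {
    "links": {("a", "href")},
    "images": {("img", "src")},
    "videos": {("video", "src"), ("source", "src"), ("iframe", "src")},
    "scripts": {("script", "src")},
}


class ContentTypeExtractor(HTMLParser):
    """Collects absolute URLs for the content types the user requested."""

    def __init__(self, base_url, wanted):
        super().__init__()
        self.base_url = base_url
        # Keep only the (tag, attribute) pairs of the requested content types.
        self.targets = {
            pair: ctype for ctype in wanted for pair in CONTENT_TYPES[ctype]
        }
        self.found = {ctype: [] for ctype in wanted}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes lowercased tag and attribute names.
        for name, value in attrs:
            ctype = self.targets.get((tag, name))
            if ctype is not None and value:
                # Resolve relative URLs (e.g. "img/logo.png") against the page URL.
                self.found[ctype].append(urljoin(self.base_url, value))


def extract_by_type(html, base_url, wanted):
    """Return {content_type: [absolute URLs]} for the requested types only."""
    parser = ContentTypeExtractor(base_url, wanted)
    parser.feed(html)
    parser.close()
    return parser.found


if __name__ == "__main__":
    page = """
    <html><body>
      <a href="/about.html">About</a>
      <img src="img/logo.png" alt="logo">
      <video src="https://cdn.example.com/intro.mp4"></video>
      <script src="ads.js"></script>
    </body></html>
    """
    # The user asks only for links and images; video and script URLs are skipped.
    print(extract_by_type(page, "https://www.example.com/blog/", ["links", "images"]))
    # → {'links': ['https://www.example.com/about.html'], 'images': ['https://www.example.com/blog/img/logo.png']}
```

Treating every `<source>` and `<iframe>` as video is a simplification: `<source>` also appears inside `<audio>` and `<picture>`, and an `<iframe>` may embed arbitrary pages. A real filter would also need to rewrite or strip the unwanted elements from the served HTML, which this sketch does not attempt.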


Pages: 105-113

Number of pages: 9
