Extraction of Web Content Based on Content Type

Cited by: 0
Authors
Verma, Manish Kumar [1]
Kumar, Sarowar [1]
Abhishek, Kumar [1]
Singh, M. P. [1]
Affiliations
[1] NIT, Dept CSE, Patna, Bihar, India
Keywords
PHP; Data mining; Absolute and relative URLs; CSS; JavaScript; DOM document; XAMPP
DOI
10.1007/978-981-10-0129-1_12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Today, the World Wide Web has become an integral part of our lives. We have entered a digital era in which everything we need is available online. For every task or piece of information we can think of, there exists a website. With so many websites running on the internet, the number of useless scripts, images, ads, videos, and links has increased exponentially. This irrelevant content makes sites heavy, so they take a lot of resources to load properly. If such content is removed from a site, or at least restricted from loading, surfing speed will improve considerably, and a more precise and concise site will be loaded that is easier to view and more accurate. This paper proposes a method to load the contents of a website, such as links, images, and videos, according to user requirements and demand. Runtime tests have been performed on different types of websites, such as educational sites, blogs, personal websites, and e-commerce sites. Results from these tests are included in this paper and emphasize that concise, on-demand loading of heavy web content makes web surfing easier and more efficient.
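The abstract gives no implementation details, and the keywords suggest the authors worked in PHP with a DOM document. As an illustration only, the following Python sketch shows one way content could be selected by type and relative URLs resolved to absolute ones. The names `CONTENT_TYPES`, `TypeExtractor`, and `extract_by_type`, and the tag-to-type mapping, are assumptions made for this example and are not taken from the paper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical mapping from a content type to the (tag, attribute) pairs
# that carry it. This is a simplification chosen for the sketch.
CONTENT_TYPES = {
    "links": {("a", "href")},
    "images": {("img", "src")},
    "videos": {("video", "src")},
    "scripts": {("script", "src")},
}


class TypeExtractor(HTMLParser):
    """Collects URLs only for the content types the user asked for."""

    def __init__(self, base_url, wanted):
        super().__init__()
        self.base_url = base_url
        self.wanted = list(wanted)
        self.found = {name: [] for name in self.wanted}

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        for name in self.wanted:
            for wanted_tag, wanted_attr in CONTENT_TYPES[name]:
                value = attr_map.get(wanted_attr)
                if tag == wanted_tag and value:
                    # Resolve relative URLs against the page URL.
                    self.found[name].append(urljoin(self.base_url, value))


def extract_by_type(html, base_url, wanted):
    """Return {content type: [absolute URLs]} for the requested types only."""
    parser = TypeExtractor(base_url, wanted)
    parser.feed(html)
    parser.close()
    return parser.found


if __name__ == "__main__":
    page = '<a href="/about">About</a><img src="pic.png"><script src="ads.js"></script>'
    print(extract_by_type(page, "http://example.com/dir/page.html", ["links", "images"]))
```

In this sketch, content types that were not requested (the script above, for instance) are simply never collected, which mirrors the idea of loading only what the user asks for.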
Pages: 105-113
Number of pages: 9
Related papers
50 in total
  • [1] A Type of Web Content Extraction Algorithm Based on Adaptive Threshold
    Zheng, Guang
    Hui, Xianghui
    Xu, Xin
    Xi, Lei
    [J]. PROCEEDINGS OF THE 2016 4TH INTERNATIONAL CONFERENCE ON SENSORS, MECHATRONICS AND AUTOMATION (ICSMA 2016), 2016, 136 : 244 - 250
  • [2] Web content extraction based on multiple strategies
    Gao, Yan
    Gu, Shiwen
    Tan, Liqiu
    [J]. Xinan Jiaotong Daxue Xuebao/Journal of Southwest Jiaotong University, 2007, 42 (04) : 473 - 477
  • [3] Entropy based Informative Content Density Approach for Efficient Web Content Extraction
    Annam, Manjusha
    Sajeev, G. P.
    [J]. 2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 118 - 124
  • [4] DOM Tree Based Approach for Web Content Extraction
    Mehta, Bhavdeep
    Narvekar, Meera
    [J]. 2015 International Conference on Communication, Information & Computing Technology (ICCICT), 2015,
  • [5] Intelligent Web Robot for Content Extraction
    Wenxing HONG
    Jie LI
    Weiwei WANG
    Yang WENG
    [J]. Instrumentation, 2019, 6 (03) : 52 - 58
  • [6] Web Content Extraction Based on Subject Detection and Node Density
    Petprasit, Warid
    Jaiyen, Saichon
    [J]. 2015 7TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SMART TECHNOLOGY (KST), 2015, : 121 - 125
  • [7] Web Information Extraction for content augmentation
    Janevski, A
    Dimitrova, N
    [J]. IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS, 2002, : A389 - A392
  • [8] Basic Semantic Units Based Web Page Content Extraction
    Wang, Jingqi
    Chen, Qingcai
    Wang, Xiaolong
    Guo, Hongzhi
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), VOLS 1-6, 2008, : 1488 - 1493
  • [9] Content-based Title Extraction from Web Page
    Gali, Najlah
    Franti, Pasi
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 2 (WEBIST), 2016, : 204 - 210
  • [10] Web Page Classification based on Context to the Content Extraction of Articles
    Patel, Ankit Dilip
    Pandya, Vimal N.
    [J]. 2017 2ND INTERNATIONAL CONFERENCE FOR CONVERGENCE IN TECHNOLOGY (I2CT), 2017, : 539 - 541