HierarchicalRank: Webpage Rank Improvement Using HTML TagLevel Similarity

Authors
Sharma, Dilip [1]
Ganeshiya, Deepak [1]
Affiliation
[1] GLA Univ Mathura, Dept Comp Engn & Applicat, Mathura, Uttar Pradesh, India
Keywords
Web mining; web graph; hyperlink analysis; connectivity; PageRank; HTML tags
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In past research, two types of ranking algorithms have been introduced: query-dependent algorithms, which work online, and query-independent algorithms, which work offline. The PageRank algorithm works offline, independent of the query, while the Hyperlink-Induced Topic Search (HITS) algorithm works online and depends on the query. One problem with these algorithms is that rank is divided according to the number of inlinks and outlinks and other parameters used in hyperlink analysis, which may or may not depend on webpage content, and this leads to topic drift. Previous research focused on solving this problem using the popularity of the outlinked webpages. This paper proposes a novel algorithm for measuring popularity, based on the similarity between the query and hierarchical text extracted from the source and target webpages using an importance parameter for Hyper Text Markup Language (HTML) tags. The results of the proposed method are compared with those of the PageRank algorithm and of Topic Distillation with Query Dependent Link Connections and Page Characteristics.
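The abstract does not give the paper's formula, so the following is only a minimal sketch of the general idea of scoring a page against a query with per-tag importance weights. The weights in TAG_WEIGHTS, the helper names, and the term-overlap similarity are all assumptions made for illustration and are not taken from the paper.

```python
from html.parser import HTMLParser
import re

# Hypothetical tag importance weights; the abstract does not state the paper's values.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "a": 2.0, "p": 1.0}


class TagTextExtractor(HTMLParser):
    """Collects (tag, text) pairs, attributing text to the innermost weighted tag."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open weighted tags
        self.segments = []   # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.segments.append((self.stack[-1], text))


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_weighted_similarity(query, html):
    """Weighted share of query terms found at each tag level, scaled to [0, 1]."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    parser = TagTextExtractor()
    parser.feed(html)
    parser.close()
    # Group the page's terms by the tag they appeared under.
    terms_by_tag = {}
    for tag, text in parser.segments:
        terms_by_tag.setdefault(tag, set()).update(tokenize(text))
    score = 0.0
    for tag, terms in terms_by_tag.items():
        overlap = len(query_terms & terms) / len(query_terms)
        score += TAG_WEIGHTS[tag] * overlap
    # Normalize by the total weight so a match at every level scores 1.0.
    return score / sum(TAG_WEIGHTS.values())


if __name__ == "__main__":
    page = "<h1>Web mining</h1><p>Hyperlink analysis of the web graph</p>"
    print(tag_weighted_similarity("web mining", page))
```

Under these assumed weights, a query term that appears in a heading contributes more to the score than the same term in body text. A link-analysis method could use such a score to weight the rank passed from a source page to a target page, though how the paper combines the two is not specified in the abstract.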
Pages: 485-492
Page count: 8