HierarchicalRank: Webpage Rank Improvement Using HTML TagLevel Similarity

Cited by: 0

Authors:

Sharma, Dilip [1]

Ganeshiya, Deepak [1]

Affiliation:

[1] GLA Univ Mathura, Dept Comp Engn & Applicat, Mathura, Uttar Pradesh, India

Source:

INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY | 2018 / Vol. 15 / No. 3

Keywords:

Web mining; web graph; hyperlink analysis; connectivity; PageRank; HTML tags;

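DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: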

Previous research has introduced two types of algorithms, query-dependent and query-independent, which work either online or offline. The PageRank algorithm works offline and independently of the query, while the Hyperlink-Induced Topic Search (HITS) algorithm works online and depends on the query. One problem with these algorithms is that rank is divided according to the number of inlinks and outlinks and other hyperlink-analysis parameters, which may or may not depend on webpage content, and the algorithms suffer from topic drift. Previous research focused on solving this problem by using the popularity of the outlinked webpages. This paper proposes a novel popularity measure based on the similarity between the query and hierarchical text extracted from the source and target webpages using an importance parameter for HyperText Markup Language (HTML) tags. The results of the proposed method are compared with those of the PageRank algorithm and of Topic Distillation with Query Dependent Link Connections and Page Characteristics.
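To make the general idea concrete, here is a minimal sketch of a query-sensitive, PageRank-style ranking in which page text is weighted by the HTML tag that encloses it. This is not the authors' HierarchicalRank algorithm, and none of it comes from the paper. The tag weights in `TAG_WEIGHTS`, the use of cosine similarity, the rule that a page splits its rank among outlinks in proportion to each target's similarity to the query, and the helper names (`tag_weighted_vector`, `similarity_weighted_rank`) are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical tag-importance weights: an assumption for illustration,
# not the values used in the paper.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "h3": 1.5, "a": 1.2, "p": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_weighted_vector(segments):
    """Build a term vector where each occurrence is scaled by its enclosing tag's weight.

    `segments` is a list of (tag, text) pairs, e.g. [("title", "Web mining")].
    """
    vec = Counter()
    for tag, text in segments:
        weight = TAG_WEIGHTS.get(tag, 1.0)  # unknown tags default to weight 1.0
        for term in tokenize(text):
            vec[term] += weight
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_weighted_rank(pages, links, query, d=0.85, iters=50, eps=1e-3):
    """PageRank variant (hypothetical): each page splits its rank among its outlinks
    in proportion to each target's tag-weighted similarity to the query, plus a small
    eps so that no edge gets zero weight."""
    q = Counter(tokenize(query))
    sim = {p: cosine(q, tag_weighted_vector(segs)) for p, segs in pages.items()}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}  # teleportation share
        for src in pages:
            outs = links.get(src, [])
            if not outs:
                # Dangling page: spread its rank uniformly over all pages.
                for p in pages:
                    new[p] += d * rank[src] / n
                continue
            weights = [sim[t] + eps for t in outs]
            total = sum(weights)
            for t, w in zip(outs, weights):
                new[t] += d * rank[src] * w / total
        rank = new
    return rank


if __name__ == "__main__":
    # Toy web graph with tagged text segments (made-up data).
    pages = {
        "A": [("title", "Web mining"), ("p", "link analysis")],
        "B": [("title", "PageRank"), ("h1", "web mining with pagerank")],
        "C": [("title", "Cooking"), ("p", "pasta recipes")],
    }
    links = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
    ranks = similarity_weighted_rank(pages, links, "web mining pagerank")
    for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

In this toy graph, page A links to both B and C, but because B's tag-weighted text matches the query and C's does not, A passes almost all of its rank to B. Plain PageRank would split it evenly. Setting every entry of `TAG_WEIGHTS` to the same value reduces the sketch to ordinary term-frequency similarity, which is one simple way to see how much the tag weighting contributes.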


Pages: 485-492

Page count: 8
