Fine-grained Web Content Classification via Entity-level Analytics: The Case of Semantic Fingerprinting

Cited by: 3
Authors
Govind [1]
Alec, Celine [1]
Spaniol, Marc [1]
Affiliations
[1] Univ Caen Normandie, Dept Comp Sci, Campus Cote Nacre, F-14032 Caen, France
Source
JOURNAL OF WEB ENGINEERING | 2018 / Vol. 17 / Issue 6-7
Keywords
Fine-grained Web Content Classification; Entity-level Web Analytics; Advanced Web Engineering; Web Semantics; Semantic Fingerprinting; WORDNET;
DOI
10.13052/jwe1540-9589.17673
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
After nearly three decades of Web content creation, the amount of heterogeneous data of diverse provenance has become seemingly overwhelming, and its organization is a "continuous battle" against time. In parallel, business, sociological, political, and media analysts require structured access to these contents in order to conduct their studies. To this end, concise and, at the same time, efficient engineering methods are required to classify Web contents accordingly. However, the task is not as simple as classifying something as A or B; rather, it is to assign the most suitable (sub-)category to each Web content based on a fine-grained classification scheme. In practice, the underlying type hierarchies are commonly excerpts of large-scale ontologies containing several hundreds or even thousands of (sub-)types decomposed into a few top-level types. With such a fine-grained type hierarchy, the engineering task of Web content classification becomes extremely challenging. Our main objective in this work is to investigate whether entity-level analytics can be utilized to characterize a Web content and align it onto a fine-grained hierarchy. We hypothesize that "You know a document by the named entities it contains". To this end, we present a novel concept, called "Semantic Fingerprinting", that allows Web content classification solely based on the information derived from the named entities contained in a Web document. It encodes the semantic nature of a Web content into a concise vector, namely the semantic fingerprint. Thus, we expect that semantic fingerprints, when utilized in combination with machine learning, will enable a fine-grained classification of Web contents. In order to empirically validate the effectiveness of semantic fingerprinting, we perform a case study on the classification of Wikipedia documents.
Furthermore, we thoroughly examine the results by analyzing the performance of Semantic Fingerprinting with respect to the characteristics of the data set used in the experiments. In addition, we investigate performance aspects of the engineered approach by comparing its run-time with that of the competing baselines. We observe that the semantic fingerprinting approach outperforms the state-of-the-art baselines as it raises Web contents to the entity-level and captures their core essence. Moreover, our approach achieves superior run-time performance on the test data in comparison to its competitors.
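The abstract describes a semantic fingerprint only as a concise vector derived from the named entities in a document; it does not give the actual encoding. As a rough illustration of that idea, the sketch below assumes that each named entity has already been mapped to a type, and builds a normalized count vector over a small type hierarchy. The hierarchy `TYPE_PARENT`, the function names, and the choice to propagate counts to ancestor types are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy type hierarchy: child type -> parent type
# (None marks a top-level type). Not the hierarchy used in the paper.
TYPE_PARENT = {
    "person": None,
    "organization": None,
    "location": None,
    "politician": "person",
    "scientist": "person",
    "company": "organization",
    "city": "location",
}

# A fixed ordering of types gives every fingerprint the same dimensions.
TYPE_INDEX = sorted(TYPE_PARENT)


def ancestors_and_self(entity_type):
    """Yield the type itself, then its ancestors up to the top level."""
    while entity_type is not None:
        yield entity_type
        entity_type = TYPE_PARENT[entity_type]


def semantic_fingerprint(entity_types):
    """Turn the types of a document's named entities into a normalized vector.

    Assumption: each entity contributes one count to its own type and to
    every ancestor type, and the vector is normalized to sum to 1.
    """
    counts = Counter()
    for entity_type in entity_types:
        counts.update(ancestors_and_self(entity_type))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TYPE_INDEX)
    return [counts[t] / total for t in TYPE_INDEX]


# Example: a document mentioning one politician and one city.
fingerprint = semantic_fingerprint(["politician", "city"])
```

Vectors of this kind could then be passed to any standard classifier, which is the role the abstract assigns to machine learning; the choice of classifier is left open here.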
Pages: 449 - 482
Page count: 34
Related papers
50 in total
  • [1] Semantic Fingerprinting: A Novel Method for Entity-Level Content Classification
    Govind
    Alec, Celine
    Spaniol, Marc
    [J]. WEB ENGINEERING, ICWE 2018, 2018, 10845 : 277 - 285
  • [2] There is a fine Line between Personalization and Surveillance: Semantic User Interest Tracing via Entity-level Analytics
    Kumar, Amit
    Spaniol, Marc
    [J]. PROCEEDINGS OF THE 14TH ACM WEB SCIENCE CONFERENCE, WEBSCI 2022, 2022, : 22 - 33
  • [3] Fine-grained entity type classification with adaptive context
    Jin Liu
    Lina Wang
    Mingji Zhou
    Jin Wang
    Sungyoung Lee
    [J]. Soft Computing, 2018, 22 : 4307 - 4318
  • [4] Fine-grained entity type classification with adaptive context
    Liu, Jin
    Wang, Lina
    Zhou, Mingji
    Wang, Jin
    Lee, Sungyoung
    [J]. SOFT COMPUTING, 2018, 22 (13) : 4307 - 4318
  • [5] Neural Architectures for Fine-grained Entity Type Classification
    Shimaoka, Sonse
    Stenetorp, Pontus
    Inui, Kentaro
    Riedel, Sebastian
    [J]. 15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017, : 1271 - 1280
  • [6] Diversified Semantic Attention Model for Fine-Grained Entity Typing
    Hu, Yanfeng
    Qiao, Xue
    Xing, Luo
    Peng, Chen
    [J]. IEEE ACCESS, 2021, 9 (09) : 2251 - 2265
  • [7] Corpus-Level Fine-Grained Entity Typing
    Yaghoobzadeh, Yadollah
    Adel, Heike
    Schuetze, Hinrich
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2018, 61 : 835 - 862
  • [8] Visual Analytics for Fine-grained Text Classification Models and Datasets
    Battogtokh, M.
    Xing, Y.
    Davidescu, C.
    Abdul-Rahman, A.
    Luck, M.
    Borgo, R.
    [J]. COMPUTER GRAPHICS FORUM, 2024, 43 (03)
  • [9] Fine-Grained Named Entity Classification with Wikipedia Article Vectors
    Suzuki, Masatoshi
    Matsuda, Koji
    Sekine, Satoshi
    Okazaki, Naoaki
    Inui, Kentaro
    [J]. 2016 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2016), 2016, : 483 - 486
  • [10] Triple Classification Using Regions and Fine-Grained Entity Typing
    Dong, Tiansi
    Wang, Zhigang
    Li, Juanzi
    Bauckhage, Christian
    Cremers, Armin B.
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 77 - 85