Named Entities as Key Features for Detecting Semantically Similar News Articles

Cited by: 0
Authors
Novo, Anne Stockem [1]
Gedikli, Fatih [1]
Affiliation
[1] Ruhr West Univ Appl Sci, Inst Comp Sci, Duisburger Str 100, D-45479 Mulheim, Germany
Keywords
Near-duplicate detection; news articles; explainability; BERT; SHAP;
DOI
10.1142/S1793351X23300030
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work addresses the detection of semantically similar news articles for search engines and recommender systems, an important step towards processing and understanding natural language. Such systems typically filter out near-duplicate articles, which are often mere paraphrases of an earlier article and therefore irrelevant to users; articles with a high level of overlapping content are of little interest to readers and should be avoided. Here, we focus on named entities, such as people, organizations and places, and their role as a key feature for identifying near-duplicate articles. Because our dataset from the energy business contains a significant number of paraphrased articles, standard techniques, e.g. those based on the Jaccard coefficient, already perform quite well. A fine-tuned BERT model evaluated on named entities achieves the best results, with more than 97% accuracy and the highest true positive rates. The importance of individual words for the model's decisions is evaluated by computing their Shapley values. The resulting explanations are in good overall agreement with intuitive human interpretation.
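As a rough illustration of the Jaccard-based baseline mentioned in the abstract, the following minimal sketch compares two articles by the overlap of their named-entity sets. It is not the authors' implementation: the function names, the example entity sets, and the 0.5 threshold are all assumptions made for illustration, and the entities are assumed to have been extracted beforehand by some NER step.

```python
def jaccard(entities_a, entities_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two entity collections."""
    a, b = set(entities_a), set(entities_b)
    if not a and not b:
        # Convention chosen here (an assumption): two empty sets score 0.
        return 0.0
    return len(a & b) / len(a | b)


def is_near_duplicate(entities_a, entities_b, threshold=0.5):
    """Flag a pair as near-duplicate; the threshold is a hypothetical value."""
    return jaccard(entities_a, entities_b) >= threshold


# Made-up named-entity sets (people, organizations, places).
article_1 = {"Acme Energy", "Berlin", "Jane Doe"}
article_2 = {"Acme Energy", "Berlin", "Jane Doe", "Example Grid AG"}
article_3 = {"Other Utility", "Oslo"}

print(jaccard(article_1, article_2))            # 3 shared / 4 total = 0.75
print(is_near_duplicate(article_1, article_2))  # True
print(is_near_duplicate(article_1, article_3))  # False
```

In practice the threshold would have to be tuned on labelled article pairs, and entity strings would likely need normalization (casing, aliases) before comparison.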
Pages: 633-649
Page count: 17
Related papers
50 in total
  • [41] Not all fake news is semantically similar: Contextual semantic representation learning for multimodal fake news detection
    Peng, Liwen
    Jian, Songlei
    Kan, Zhigang
    Qiao, Linbo
    Li, Dongsheng
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (01)
  • [42] The Power of Temporal Features for Classifying News Articles
    Lange, Lukas
    Alonso, Omar
    Stroetgen, Jannik
    COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019, : 1159 - 1160
  • [43] Does Gender Matter in the News? Detecting and Examining Gender Bias in News Articles
    Dacon, Jamell
    Liu, Haochen
    WEB CONFERENCE 2021: COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2021), 2021, : 385 - 392
  • [44] Visualization of Similar News Articles with Network Analysis and Text Mining
    Imai, Takayuki
    Nakamura, Keita
    Ohmameuda, Toshiaki
    2015 IEEE 4TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS (GCCE), 2015, : 151 - 152
  • [45] Multilingual news clustering: Feature translation vs. identification of cognate named entities
    Montalvo, S.
    Martinez, R.
    Casillas, A.
    Fresno, V.
    PATTERN RECOGNITION LETTERS, 2007, 28 (16) : 2305 - 2311
  • [46] Proposal for Named Entities Recognition and Classification (NERC) and the Automatic Generation of Rules on Mexican News
    Ramos Flores, Orlando
    Pinto, David
    COMPUTACION Y SISTEMAS, 2020, 24 (02) : 533 - 538
  • [47] SELEcTor: Discovering Similar Entities on LinkEd DaTa by Ranking their Features
    Ruback, Livia
    Casanova, Marco Antonio
    Renso, Chiara
    Lucchese, Claudio
    2017 11TH IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2017, : 117 - 124
  • [48] Detecting Political Bias in News Articles Using Headline Attention
    Reddy, Rama Rohit
    Duggenpudi, Suma Reddy
    Mamidi, Radhika
    BLACKBOXNLP WORKSHOP ON ANALYZING AND INTERPRETING NEURAL NETWORKS FOR NLP AT ACL 2019, 2019, : 77 - 84
  • [49] Hybrid Neural Network Models for Detecting Fake News Articles
    Ashwaq Khalil
    Moath Jarrah
    Monther Aldwairi
    Human-Centric Intelligent Systems, 2024, 4 (1) : 136 - 146
  • [50] Detecting Factual and Non-Factual Content in News Articles
    Sahu, Ishan
    Majumdar, Debapriyo
    PROCEEDINGS OF THE FOURTH ACM IKDD CONFERENCES ON DATA SCIENCES (CODS '17), 2017