Documents, Topics, and Authors: Text Mining of Online News

Authors
Sertkan, Mete [1]
Neidhardt, Julia [1]
Werthner, Hannes [1]
Affiliations
[1] TU Wien, Res Unit ECommerce, Vienna, Austria
Keywords
Recommender Systems; Online News; Text Mining; Topic Modelling; Co-occurrence Networks
DOI
10.1109/CBI.2019.00053
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
The goal of recommender systems is, in essence, to help people discover items they might like, i.e., items that fit their preferences, personality, and needs. Depending on the domain, those items can be books, movies, music, hotels, and much more. Typically, recommendations are based on past user interactions (e.g., movies a user saw or hotels a user booked). This work-in-progress paper focuses on news recommender systems. Because of the nature of news (e.g., a constant stream of new items and short item lifetimes), recommendations based on past interactions are especially hard to make. Hence, news recommender systems rely heavily on the actual content of news. While previous work mainly considers one aspect of the content of news articles, in this work we jointly analyse and discuss a given corpus of news articles on three different levels (i.e., document level, topic level, and author level). The overall aim is to provide the basis for a comprehensive news recommender system that reaches beyond accuracy and also considers diversity and serendipity. We demonstrate that relevant information can be extracted from a given corpus and that differences in author, time, and topic can be shown. Furthermore, the author-level analysis shows that documents can be clustered based on the writing style of authors. Finally, our findings show that author-level analysis has the potential to recommend the most diverse items compared to the other approaches.
Pages: 405-413
Number of pages: 9