Documents, Topics, and Authors: Text Mining of Online News

Cited by: 0
Authors
Sertkan, Mete [1]
Neidhardt, Julia [1]
Werthner, Hannes [1]
Affiliations
[1] TU Wien, Res Unit ECommerce, Vienna, Austria
Keywords
Recommender Systems; Online News; Text Mining; Topic Modelling; Co-occurrence Networks
DOI
10.1109/CBI.2019.00053
Chinese Library Classification (CLC) number
F [Economics]
Discipline classification code
02
Abstract
The goal of recommender systems is, in essence, to help people discover items they might like, i.e., items that fit their preferences, personality, and needs. Depending on the domain, those items can be books, movies, music, hotels, and much more. Typically, recommendations are based on past user interactions (e.g., movies a user saw or hotels a user booked). This work-in-progress paper focuses on news recommender systems. Because of the nature of news (e.g., a constant stream of new items and short item lifetimes), recommendations based on past interactions are especially hard to make. Hence, news recommender systems rely heavily on the actual content of news. While previous work mainly considers one aspect of the content of news articles, in this work we jointly analyse and discuss a given corpus of news articles on three different levels (i.e., document level, topic level, and author level). The overall aim is to provide the basis for a comprehensive news recommender system that reaches beyond accuracy and also considers diversity and serendipity. We demonstrate that relevant information can be extracted from a given corpus and that differences in author, time, and topic can be shown. Furthermore, the author-level analysis shows that documents can be clustered based on the writing style of authors. Finally, our findings show that author-level analysis has the potential to recommend the most diverse items compared to the other approaches.
Pages: 405-413
Number of pages: 9