Leveraging User-Generated Content for News Search

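Author
McCreadie, Richard M. C. [1]
Affiliation
[1] Univ Glasgow, Dept Comp Sci, Glasgow G12 8QQ, Lanark, Scotland
Keywords
News; Blogs; Social Media
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract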
Over the last few years, both the availability and accessibility of current news stories on the Web have dramatically improved [3]. In particular, users can now access news from a variety of sources hosted on the Web, from newswire presences such as the New York Times to integrated news search within Web search engines. Of central interest, however, is the emerging impact that user-generated content (UGC) is having on this online news landscape. Indeed, the emergence of Web 2.0 has turned a static news consumer base into a dynamic news machine, where news stories are summarised and commented upon. In short, value is being added to each news story in the form of additional content. Importantly, while there has been movement in commercial circles to exploit this extra value to enrich online news [5], there has been little research from the academic community on how this can be achieved. The main purpose of this thesis is therefore to research practical techniques for the integration of UGC to improve the news search component of the most ubiquitous of Web tools, i.e., the Web search engine. We identify the following three key aspects of news search which might be improved through the application of UGC.
Intuitively, the first task that the news vertical search component of a Web search engine needs to accomplish when confronted with a user query is to decide whether the query is in fact news-related, and hence requires news content to be included. However, queries themselves are sparse in nature, often comprising only one or two tokens. This presents issues when performing query classification, as there are few features with which to distinguish news-related queries. We argue that UGC can help alleviate this ambiguity. Indeed, we hypothesise that there is a strong link between the volume of UGC being posted that mentions a query and the likelihood of that query being news-related within a specific timeframe.
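As an illustration of the volume hypothesis above, the following is a minimal sketch and not the method developed in the thesis. It assumes a hypothetical list of timestamped posts and compares how often a query is mentioned in the most recent hour against its hourly rate over the preceding day. The function name `ugc_burst_score`, the window sizes, and the smoothing constant are all assumptions made for this example.

```python
import re
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)

def ugc_burst_score(query, posts, now, window=HOUR,
                    history=timedelta(hours=24), smoothing=1.0):
    """Ratio of the recent hourly mention rate to the background hourly rate.

    posts is an iterable of (timestamp, text) pairs (a hypothetical UGC feed).
    A post mentions the query when it contains every query token.
    Scores well above 1 suggest a burst of discussion around the query.
    """
    query_terms = set(re.findall(r"\w+", query.lower()))
    recent = background = 0
    for timestamp, text in posts:
        age = now - timestamp
        if age < timedelta(0) or age > history:
            continue  # ignore posts from the future or older than the history
        if query_terms <= set(re.findall(r"\w+", text.lower())):
            if age <= window:
                recent += 1
            else:
                background += 1
    recent_rate = recent / (window / HOUR)
    background_rate = background / ((history - window) / HOUR)
    # Additive smoothing keeps the score at 1.0 when there is no evidence.
    return (recent_rate + smoothing) / (background_rate + smoothing)


now = datetime(2010, 1, 1, 12, 0)
posts = [(now - timedelta(minutes=m), "Earthquake hits Chile") for m in (5, 10, 15, 20, 25)]
posts.append((now - timedelta(hours=5), "my cat video"))
burst = ugc_burst_score("chile earthquake", posts, now)
quiet = ugc_burst_score("cat video", posts, now)
```

In this toy feed the earthquake query scores far above 1 while the cat query scores just below 1. A classifier could use such a score as one feature alongside the query tokens themselves; any decision threshold would need to be tuned on labelled queries.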
Secondly, we consider the task of real-time event detection. It is imperative for search engines to maintain knowledge of the events of the moment, so that the results displayed are up to date. Traditionally, systems have detected new events through the clustering of newswire articles [1]. However, in the current fast-paced news search environment, where users begin querying for events within a couple of minutes of their occurrence [4], relying on slow newswire reporting is unacceptable. On the other hand, UGC sources such as Twitter provide a natural alternative, as the high post rate and popularity of news topics make such a site an ideal medium from which to monitor emerging events. Indeed, many paid journalists maintain personal blogs and other social media accounts for the reporting of fast-breaking news stories [2].
Lastly, we examine the presentation of results to the user. Presenting news articles to satisfy news searches is the generally accepted approach. However, with the ever-increasing pace of news reporting worldwide, there is now no guarantee that a trusted news source will have yet published on the story. In these cases, one must look elsewhere for content to satisfy the user. We hypothesise that UGC is ideal for presentation in these cases, as the delay between an event occurring and commentary appearing in UGC sources like Twitter or the Blogosphere is mere minutes. Moreover, some information needs cannot be easily solved using newswire articles alone. For example, the correct result for the query 'current news' would be a list of news stories ranked by their importance for the day in question. This is a difficult ranking problem, as 'importance' is greatly dependent upon the perspective of the user. In this case, one solution might be to leverage 'public opinion' as represented in UGC, for example by taking 'the pulse of the Blogosphere'. Indeed, we examined such an approach during TREC 2009.
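To make the idea of ranking by 'public opinion' concrete, here is a minimal sketch under stated assumptions. It is not the approach examined at TREC 2009. It simply orders headlines by the number of blog posts that share at least half of the headline's tokens. The names `rank_stories_by_ugc` and `min_overlap` are hypothetical and chosen for this example.

```python
import re

def tokenise(text):
    """Lower-case a string and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def rank_stories_by_ugc(headlines, posts, min_overlap=0.5):
    """Order headlines by how many posts appear to discuss them.

    A post supports a headline when it contains at least min_overlap
    of the headline's distinct tokens (a crude proxy for 'mentions').
    Ties keep the input order because Python's sort is stable.
    """
    post_tokens = [tokenise(post) for post in posts]
    scored = []
    for headline in headlines:
        head = tokenise(headline)
        support = sum(
            1 for tokens in post_tokens
            if head and len(head & tokens) / len(head) >= min_overlap
        )
        scored.append((support, headline))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [headline for _, headline in scored]


headlines = ["Local bakery wins award", "Volcano erupts in Iceland"]
posts = [
    "Flights cancelled as Iceland volcano erupts",
    "Iceland volcano ash cloud erupts again",
    "great bakery award today",
]
ranked = rank_stories_by_ugc(headlines, posts)
```

Here the volcano story is supported by two posts and the bakery story by one, so the volcano story is ranked first. Raw post counts favour whatever bloggers happen to write about, so a practical system would likely normalise by source and time.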
In conclusion, we have identified multiple areas of the news-search process which cannot be satisfied by traditional newswire articles alone. We hypothesise that user-generated content can be leveraged to improve news search through the rich and timely information that it provides.
Pages: 919-919
Number of pages: 1