Extracting insights from social media with large-scale matrix approximations

Cited by: 2
Authors
Sindhwani, V. [1]
Ghoting, A. [1]
Ting, E. [2]
Lawrence, R. [1]
Affiliations
[1] IBM Corp, Div Res, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] IBM Software Grp, Silicon Valley Lab, San Jose, CA 95141 USA
Keywords
FACTORIZATION; ALGORITHM;
DOI
10.1147/JRD.2011.2163281
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Social media platforms such as blogs, Twitter® accounts, and online discussion sites are large-scale forums where every individual can potentially voice an influential public opinion. According to recent surveys, a massive number of Internet users are turning to such forums to collect recommendations and reviews for products and services, and to shape their individual choices and stances by the commentary of the online community as a whole. The unsupervised extraction of insight from unstructured user-generated web content requires new methodologies that are likely to be rooted in natural language processing and machine-learning techniques. Furthermore, the unprecedented scale of data begging to be analyzed necessitates the implementation of these methodologies on modern distributed computing platforms. In this paper, we describe a flexible new family of low-rank matrix approximation algorithms for modeling topics in a given corpus of documents (e.g., blog posts and tweets). We benchmark distributed optimization algorithms for running these models in a Hadoop™-enabled cluster environment. We describe online learning strategies for tracking the evolution of ongoing topics and rapidly detecting the emergence of new themes in a streaming setting.
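The abstract does not spell out the algorithms themselves. As a rough illustration of what low-rank matrix approximation for topic modeling can look like, the sketch below factors a small document-term matrix X into non-negative factors W (document-topic weights) and H (topic-term weights) using the standard Lee-Seung multiplicative updates for non-negative matrix factorization. The choice of this particular update rule, the function name `nmf_topics`, the toy vocabulary, and the counts are all assumptions made for illustration; they are not taken from the paper and are not claimed to be its method.

```python
import numpy as np


def nmf_topics(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (docs x terms) as W @ H with rank k.

    Hypothetical helper for illustration: uses Lee-Seung multiplicative
    updates for the squared Frobenius error. Not the paper's algorithm.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_docs, k)) + eps
    H = rng.random((k, n_terms)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # remain non-negative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy corpus (made-up counts): two documents about phones, two about films.
vocab = ["phone", "battery", "screen", "movie", "actor", "plot"]
X = np.array(
    [
        [3, 2, 1, 0, 0, 0],
        [2, 3, 2, 0, 0, 0],
        [0, 0, 0, 3, 2, 1],
        [0, 0, 0, 2, 3, 2],
    ],
    dtype=float,
)

W, H = nmf_topics(X, k=2)

# Relative reconstruction error of the rank-2 approximation.
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# The three highest-weighted vocabulary terms for each topic row of H.
top_terms = [[vocab[j] for j in np.argsort(-H[t])[:3]] for t in range(H.shape[0])]

print(f"relative error: {rel_error:.3f}")
for t, terms in enumerate(top_terms):
    print(f"topic {t}: {terms}")
```

Each row of H can be read as a topic (a weighting over vocabulary terms), and each row of W says how strongly a document expresses each topic. A distributed or streaming variant, as the abstract describes, would need to partition or incrementally update these factors; that is beyond this single-machine sketch.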
Pages: 13