Extracting insights from social media with large-scale matrix approximations

Cited by: 2
Authors
Sindhwani, V. [1]
Ghoting, A. [1]
Ting, E. [2]
Lawrence, R. [1]
Affiliations
[1] IBM Corp, Div Res, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] IBM Software Grp, Silicon Valley Lab, San Jose, CA 95141 USA
Keywords
FACTORIZATION; ALGORITHM;
DOI
10.1147/JRD.2011.2163281
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Social media platforms such as blogs, Twitter® accounts, and online discussion sites are large-scale forums where every individual can potentially voice an influential public opinion. According to recent surveys, a massive number of Internet users are turning to such forums to collect recommendations and reviews for products and services, and to shape their individual choices and stances by the commentary of the online community as a whole. The unsupervised extraction of insight from unstructured user-generated web content requires new methodologies that are likely to be rooted in natural language processing and machine-learning techniques. Furthermore, the unprecedented scale of data begging to be analyzed necessitates the implementation of these methodologies on modern distributed computing platforms. In this paper, we describe a flexible new family of low-rank matrix approximation algorithms for modeling topics in a given corpus of documents (e.g., blog posts and tweets). We benchmark distributed optimization algorithms for running these models in a Hadoop™-enabled cluster environment. We describe online learning strategies for tracking the evolution of ongoing topics and rapidly detecting the emergence of new themes in a streaming setting.
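The abstract does not specify the algorithms themselves, so the following is only an illustrative sketch of the general idea of low-rank matrix approximation for topic modeling. It assumes a standard non-negative matrix factorization (NMF) with multiplicative updates, which is a swapped-in textbook technique and not necessarily the family of models the paper describes. The toy document-term matrix `X`, the function names, and all parameters are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    """Return the transpose of a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def frobenius_error(x, w, h):
    """Squared Frobenius norm of the residual X - W H."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar))

def nmf(x, rank, iters=200, seed=0, eps=1e-9):
    """Approximate X (docs x terms) by W (docs x rank) times H (rank x terms).

    Uses multiplicative updates, which keep W and H non-negative.
    This is an assumed, generic NMF, not the paper's algorithm.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(x), len(x[0])
    # Strictly positive random initialization.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[hv * nv / (dv + eps) for hv, nv, dv in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[wv * nv / (dv + eps) for wv, nv, dv in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h

# Hypothetical toy corpus: rows are documents, columns are term counts.
# Documents 0-1 use only terms 0-1; documents 2-3 use only terms 2-3.
X = [
    [2.0, 1.0, 0.0, 0.0],
    [4.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 2.0, 6.0],
]

if __name__ == "__main__":
    W, H = nmf(X, rank=2)
    print("reconstruction error:", frobenius_error(X, W, H))
    for k, topic in enumerate(H):
        print("topic", k, "term weights:", [round(v, 3) for v in topic])
```

In this sketch each row of `H` plays the role of a topic (a weighting over terms) and each row of `W` gives a document's loading on those topics. A distributed or streaming variant, as the abstract mentions, would need a different update scheme than this in-memory loop.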
Pages: 13
Related papers
50 in total
  • [31] Measuring Brand Favorability Using Large-Scale Social Media Data
    Zhang, Kunpeng
    Moe, Wendy
    INFORMATION SYSTEMS RESEARCH, 2021, 32 (04) : 1128 - 1139
  • [32] Guest Editorial: Large-Scale Multimedia Content Analysis on Social Media
    Li, Haojie
    Zha, Zheng-Jun
    Huet, Benoit
    Tian, Qi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (03) : 1365 - 1369
  • [33] Effective Continuum Approximations for Permeability in Brown-Coal and Other Large-Scale Fractured Media
    Hu, Roger
    Walsh, Stuart D. C.
    GEOSCIENCES, 2021, 11 (12)
  • [34] Identifying the effects of social media on health behavior: Data from a large-scale online experiment
    Zhang, Jingwen
    Brackbill, Devon
    Yang, Sijia
    Centola, Damon
    DATA IN BRIEF, 2015, 5 : 453 - 457
  • [35] Problematic Social Media Use: Results from a Large-Scale Nationally Representative Adolescent Sample
    Banyai, Fanni
    Zsila, Agnes
    Kiraly, Orsolya
    Maraz, Aniko
    Elekes, Zsuzsanna
    Griffiths, Mark D.
    Andreassen, Cecilie Schou
    Demetrovics, Zsolt
    PLOS ONE, 2017, 12 (01):
  • [36] Understanding Urban Park-Based Social Interaction in Shanghai During the COVID-19 Pandemic: Insights from Large-Scale Social Media Analysis
    Wang, Haotian
    Su, Tianyu
    Zhao, Wanting
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2025, 14 (02)
  • [37] Extracting Emergent Semantics from Large-Scale User-Generated Content
    Kompatsiaris, Ioannis
    Diplaris, Sotiris
    Papadopoulos, Symeon
    ICT INNOVATIONS 2011, 2011, 150 : 27 - 37
  • [38] Insights into Large-Scale Synthesis of Benfotiamine
    Hanganu, Anamaria
    Maximov, Maxim
    Maximov, Oana-Cristina
    Popescu, Codruta C.
    Sandu, Nicoleta
    Florea, Mihaela
    Mirea, Anca G.
    Garbea, Cristian
    Matache, Mihaela
    Funeriu, Daniel P.
    ORGANIC PROCESS RESEARCH & DEVELOPMENT, 2024, 28 (11) : 4069 - 4078
  • [39] Efficacy of extracting indices from large-scale acoustic recordings to monitor biodiversity
    Buxton, Rachel T.
    McKenna, Megan F.
    Clapp, Mary
    Meyer, Erik
    Stabenau, Erik
    Angeloni, Lisa M.
    Crooks, Kevin
    Wittemyer, George
    CONSERVATION BIOLOGY, 2018, 32 (05) : 1174 - 1184
  • [40] APPROXIMATIONS FOR LARGE-SCALE CLOSED QUEUING-NETWORKS
    CHOW, WM
    PERFORMANCE EVALUATION, 1983, 3 (01) : 1 - 12