Separating the signal from the noise - Financial machine learning for Twitter

Cited by: 19
Authors
Schnaubelt, Matthias [1]
Fischer, Thomas G. [1]
Krauss, Christopher [1]
Affiliations
[1] Univ Erlangen Nurnberg, Dept Stat & Econometr, Lange Gasse 20, Nurnberg 90403, Germany
Keywords
Finance; Statistical arbitrage; Machine learning; Natural language processing; DEEP NEURAL-NETWORKS; STATISTICAL ARBITRAGE; INFORMATION-CONTENT; MICROBLOGGING DATA; MARKET PREDICTION; SENTIMENT; NEWS; CLASSIFICATION; TALK;
DOI
10.1016/j.jedc.2020.103895
CLC classification number
F [Economics];
Subject classification code
02 ;
Abstract
Most statistical arbitrage strategies in the academic literature rely solely on price time series. By contrast, alternative data sources are of growing importance for professional investors. We contribute to bridging this gap by assessing the price-predictive value of millions of tweets on intraday returns of the S&P 500 constituents from 2014 and 2015. For this purpose, we design a machine learning system addressing specific challenges inherent to this task. First, building on the literature of financial dictionaries, we engineer domain-specific features along three categories, i.e., directional indicators, relevance indicators and meta features. Next, we leverage a random forest to extract the relationship between these features and subsequent stock returns in a low signal-to-noise setting. For performance evaluation, we run a rigorous event-based backtesting study across all tweets and stocks. We find annualized returns of 6.4 percent and a Sharpe ratio of 2.2 after transaction costs. Finally, we illuminate the machine learning black box and unveil sources of profitability: First, results are both driven and limited by the temporal clustering of tweets, i.e., the majority of profits stem from tweets clustered closely together in time, corresponding to high-event situations. Second, the importance of included features follows an economic rationale, e.g., tweets with positive sentiment tend to yield positive returns and vice versa. Third, we find that stocks of medium market capitalization and from the consumer and technology sectors contribute most to our results, which we interpret as a trade-off between tweet coverage and tweet relevance. (C) 2020 Elsevier B.V. All rights reserved.
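The three feature categories named in the abstract (directional indicators, relevance indicators and meta features) can be illustrated with a minimal sketch. This is not the authors' implementation: the word lists, the cashtag pattern, the `TweetFeatures` fields and the `extract_features` function are all hypothetical stand-ins chosen for illustration, and the paper's actual dictionaries and feature definitions are not given in this record.

```python
import re
from dataclasses import dataclass

# Hypothetical toy word lists (assumption); the paper builds on published
# financial dictionaries that are not reproduced here.
POSITIVE = {"beat", "gain", "upgrade", "strong", "profit"}
NEGATIVE = {"miss", "loss", "downgrade", "weak", "lawsuit"}


@dataclass
class TweetFeatures:
    direction: float  # directional indicator in [-1, 1]
    n_cashtags: int   # relevance indicator: count of $TICKER mentions
    n_tokens: int     # meta feature: tweet length in word tokens


def extract_features(text: str) -> TweetFeatures:
    """Map one tweet to a small, illustrative feature vector."""
    # Cashtags such as $AAPL: a dollar sign followed by 1-5 letters (assumed format).
    cashtags = re.findall(r"\$[A-Za-z]{1,5}\b", text)
    # Lowercased alphabetic tokens; punctuation and digits are dropped.
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    matched = pos + neg
    # Net share of positive words among dictionary hits; 0.0 if no hits.
    direction = (pos - neg) / matched if matched else 0.0
    return TweetFeatures(direction, len(cashtags), len(tokens))
```

In a pipeline like the one the abstract describes, vectors of this kind would be the inputs to a classifier such as a random forest, with subsequent stock returns as the target; that modelling step is omitted here.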
Pages: 22
Related papers
50 in total
  • [21] Separating Signal from Noise using Patch Recurrence Across Scales
    Zontak, Maria
    Mosseri, Inbar
    Irani, Michal
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 1195 - 1202
  • [22] METHOD FOR SEPARATING THE NOISE FROM THE SIGNAL BASED ON THE FOURIER-TRANSFORM
    CHELEBIEV, EV
    BORISSOVA, YH
    DOKLADI NA BOLGARSKATA AKADEMIYA NA NAUKITE, 1990, 43 (05): : 5 - 7
  • [23] Coronavirus Disease 2019 Pandemic and Infodemic: Separating the Signal From the Noise
    Wan, Kelvin H.
    Radke, Nishant Vijay
    Wong, Raymond L. M.
    Jonas, Jost B.
    ASIA-PACIFIC JOURNAL OF OPHTHALMOLOGY, 2023, 12 (06): : 507 - 508
  • [24] Separating signal from noise: the challenge of identifying useful biomarkers in sepsis
    Russell J McCulloh
    John A Spertus
    Critical Care, 18
  • [25] Behavior and physiology of mechanoreception: separating signal and noise
    Montgomery, John C.
    Windsor, Shane
    Bassett, Daniel
    INTEGRATIVE ZOOLOGY, 2009, 4 (01): : 3 - 12
  • [26] Machine learning for low signal-to-noise ratio detection
    Lacy, Fred
    Ruiz-Reyes, Angel
    Brescia, Anthony
    PATTERN RECOGNITION LETTERS, 2024, 179 : 115 - 122
  • [27] An EMD and PCA hybrid approach for separating noise from signal, and signal in climate change detection
    Lee, Taesam
    Ouarda, T. B. M. J.
    INTERNATIONAL JOURNAL OF CLIMATOLOGY, 2012, 32 (04) : 624 - 634
  • [28] Separating the signal from the noise in metagenomic cell-free DNA sequencing
    Philip Burnham
    Nardhy Gomez-Lopez
    Michael Heyang
    Alexandre Pellan Cheng
    Joan Sesing Lenz
    Darshana M. Dadhania
    John Richard Lee
    Manikkam Suthanthiran
    Roberto Romero
    Iwijn De Vlaminck
    Microbiome, 8
  • [29] Separating the signal from the noise in metagenomic cell-free DNA sequencing
    Burnham, Philip
    Gomez-Lopez, Nardhy
    Heyang, Michael
    Cheng, Alexandre Pellan
    Lenz, Joan Sesing
    Dadhania, Darshana M.
    Lee, John Richard
    Suthanthiran, Manikkam
    Romero, Roberto
    De Vlaminck, Iwijn
    MICROBIOME, 2020, 8 (01)
  • [30] STATISTICAL PROCESS CONTROL: SEPARATING SIGNAL FROM NOISE IN EMERGENCY DEPARTMENT OPERATIONS
    Pimentel, Laura
    Barrueto, Fermin, Jr.
    JOURNAL OF EMERGENCY MEDICINE, 2015, 48 (05): : 628 - 638