Separating the signal from the noise - Financial machine learning for Twitter

Cited by: 19
Authors
Schnaubelt, Matthias [1]
Fischer, Thomas G. [1]
Krauss, Christopher [1]
Affiliations
[1] University of Erlangen-Nürnberg, Department of Statistics and Econometrics, Lange Gasse 20, 90403 Nürnberg, Germany
Keywords
Finance; Statistical arbitrage; Machine learning; Natural language processing; Deep neural networks; Information content; Microblogging data; Market prediction; Sentiment; News; Classification; Talk
DOI
10.1016/j.jedc.2020.103895
Chinese Library Classification
F [Economics]
Discipline classification code
02
Abstract
Most statistical arbitrage strategies in the academic literature rely solely on price time series. By contrast, alternative data sources are of growing importance for professional investors. We contribute to bridging this gap by assessing the price-predictive value of millions of tweets on intraday returns of the S&P 500 constituents in 2014 and 2015. For this purpose, we design a machine learning system addressing specific challenges inherent to this task. First, building on the literature of financial dictionaries, we engineer domain-specific features along three categories, i.e., directional indicators, relevance indicators and meta features. Next, we leverage a random forest to extract the relationship between these features and subsequent stock returns in a low signal-to-noise setting. For performance evaluation, we run a rigorous event-based backtesting study across all tweets and stocks. We find annualized returns of 6.4 percent and a Sharpe ratio of 2.2 after transaction costs. Finally, we illuminate the machine learning black box and unveil sources of profitability: First, results are both driven and limited by the temporal clustering of tweets, i.e., the majority of profits stem from tweets clustered closely together in time, corresponding to high-event situations. Second, the importance of included features follows an economic rationale, e.g., tweets with positive sentiment tend to yield positive returns and vice versa. Third, we find that stocks of medium market capitalization and from the consumer and technology sectors contribute most to our results, which we interpret as a trade-off between tweet coverage and tweet relevance. © 2020 Elsevier B.V. All rights reserved.
Pages: 22
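
Illustration
The abstract names three feature categories (directional indicators, relevance indicators and meta features) without giving their definitions. The following is a minimal sketch of how one feature of each kind might be computed for a single tweet. It is not the authors' implementation: the word lists, the cashtag pattern, the function name `tweet_features` and the choice of features are all hypothetical, and the sketch stops at feature extraction rather than fitting the random forest the abstract describes.

```python
import re

# Hypothetical mini word lists for illustration only; the paper builds on
# published financial dictionaries that are not reproduced here.
POSITIVE = {"beat", "upgrade", "growth", "profit", "bullish"}
NEGATIVE = {"miss", "downgrade", "loss", "lawsuit", "bearish"}

# Assumed cashtag convention: a dollar sign followed by 1-5 letters, e.g. $AAPL.
CASHTAG = re.compile(r"\$[A-Za-z]{1,5}\b")
TOKEN = re.compile(r"[a-z]+")


def tweet_features(text: str) -> dict:
    """Return one illustrative feature per category for a single tweet."""
    tokens = TOKEN.findall(text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {
        # Directional indicator: net dictionary sentiment in [-1, 1].
        "direction": (pos - neg) / max(pos + neg, 1),
        # Relevance indicator: number of ticker mentions in the tweet.
        "n_cashtags": len(CASHTAG.findall(text)),
        # Meta feature: tweet length in alphabetic tokens.
        "n_tokens": len(tokens),
    }


if __name__ == "__main__":
    print(tweet_features("$AAPL beat estimates, analysts upgrade"))
```

In a pipeline of the kind the abstract outlines, dictionaries like these would be one input among many, and the resulting feature vectors would be paired with subsequent stock returns to train the model.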