Separating the signal from the noise - Financial machine learning for Twitter

Cited by: 19
Authors
Schnaubelt, Matthias [1]
Fischer, Thomas G. [1]
Krauss, Christopher [1]
Affiliations
[1] Univ Erlangen Nurnberg, Dept Stat & Econometr, Lange Gasse 20, Nurnberg 90403, Germany
Source
Keywords
Finance; Statistical arbitrage; Machine learning; Natural language processing; Deep neural networks; Information content; Microblogging data; Market prediction; Sentiment; News; Classification; Talk
DOI
10.1016/j.jedc.2020.103895
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
Most statistical arbitrage strategies in the academic literature rely solely on price time series. By contrast, alternative data sources are of growing importance for professional investors. We contribute to bridging this gap by assessing the price-predictive value of millions of tweets on intraday returns of the S&P 500 constituents from 2014 and 2015. For this purpose, we design a machine learning system that addresses the specific challenges inherent to this task. First, building on the literature of financial dictionaries, we engineer domain-specific features in three categories: directional indicators, relevance indicators, and meta features. Next, we use a random forest to extract the relationship between these features and subsequent stock returns in a low signal-to-noise setting. For performance evaluation, we run a rigorous event-based backtesting study across all tweets and stocks. We find annualized returns of 6.4 percent and a Sharpe ratio of 2.2 after transaction costs. Finally, we illuminate the machine learning black box and unveil sources of profitability. First, results are both driven and limited by the temporal clustering of tweets, i.e., the majority of profits stem from tweets clustered closely together in time, corresponding to high-event situations. Second, the importance of the included features follows an economic rationale, e.g., tweets with positive sentiment tend to yield positive returns and vice versa. Third, we find that stocks of medium market capitalization and from the consumer and technology sectors contribute most to our results, which we interpret as a trade-off between tweet coverage and tweet relevance. © 2020 Elsevier B.V. All rights reserved.
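The abstract reports an annualized return and a Sharpe ratio after transaction costs but does not state how they are computed. As a minimal sketch of one common convention, and not the authors' method, the Python below annualizes daily strategy returns arithmetically and scales the Sharpe ratio by the square root of the number of trading days. The 252-day factor, the zero risk-free rate, the flat per-period cost, and all function names are assumptions made for illustration.

```python
import math
import statistics

# Assumed annualization factor; the abstract does not state the one used.
TRADING_DAYS = 252


def net_returns(gross_returns, cost_per_period):
    """Subtract a hypothetical flat transaction cost from each period's return."""
    return [r - cost_per_period for r in gross_returns]


def annualized_return(daily_returns):
    """Arithmetic annualization: mean daily return times trading days per year."""
    return statistics.mean(daily_returns) * TRADING_DAYS


def sharpe_ratio(daily_returns, daily_risk_free=0.0):
    """Annualized Sharpe ratio using the sample standard deviation."""
    excess = [r - daily_risk_free for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(TRADING_DAYS)


if __name__ == "__main__":
    # Made-up daily returns, purely to exercise the functions.
    gross = [0.010, -0.010, 0.020, 0.000]
    net = net_returns(gross, cost_per_period=0.001)
    print(f"annualized return (net): {annualized_return(net):.4f}")
    print(f"Sharpe ratio (net):      {sharpe_ratio(net):.4f}")
```

A flat cost shifts the mean return without changing its dispersion, so in this sketch the Sharpe ratio after costs is always lower than before costs.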
Pages: 22