Event detection in finance using hierarchical clustering algorithms on news and tweets

Cited by: 12
Authors
Carta, Salvatore [1]
Consoli, Sergio [2]
Piras, Luca [1]
Podda, Alessandro Sebastian [1]
Recupero, Diego Reforgiato [1]
Affiliations
[1] Univ Cagliari, Dept Math & Comp Sci, Cagliari, Italy
[2] European Commiss, Joint Res Ctr DG JRC, Ispra, Varese, Italy
Keywords
Natural language processing; Event detection; News analysis; Social media; Finance; Hierarchical clustering; Stocktwits; Text mining; Big data; TWITTER; FRAMEWORK; INFORMATION; EXTRACTION; BURSTY; MODEL
DOI
10.7717/peerj-cs.438
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the current age of overwhelming information and massive production of textual data on the Web, Event Detection has become an increasingly important task in various application domains. Several research branches have been developed to tackle the problem from different perspectives, including Natural Language Processing and Big Data analysis, with the goal of providing valuable resources to support decision-making in a wide variety of fields. In this paper, we propose a real-time, domain-specific, clustering-based event-detection approach that integrates textual information coming, on one hand, from traditional newswires and, on the other hand, from microblogging platforms. The goal of the implemented pipeline is twofold: (i) providing insights to the user about the relevant events that are reported in the press on a daily basis; (ii) alerting the user about potentially important and impactful events, referred to as hot events, for some specific tasks or domains of interest. The algorithm identifies clusters of related news stories published by globally renowned press sources, which guarantee authoritative, noise-free information about current affairs; subsequently, the content extracted from microblogs is associated with the clusters in order to assess the relevance of the event in public opinion. To identify the events of a day d, we create the lexicon by looking at news articles and stock data of previous days up to d-1. Although the approach can be extended to a variety of domains (e.g. politics, economy, sports), we hereby present a specific implementation in the financial sector. We validated our solution through a qualitative and quantitative evaluation, performed on the Dow Jones' Data, News and Analytics dataset, on a stream of messages extracted from the microblogging platform Stocktwits, and on the Standard & Poor's 500 index time-series.
The experiments demonstrate the effectiveness of our proposal in extracting meaningful information from real-world events and in spotting hot events in the financial sphere. An added value of the evaluation is given by the visual inspection of a selected number of significant real-world events, from the Brexit Referendum up to the outbreak of the Covid-19 pandemic in early 2020.
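The two-stage idea in the abstract (cluster the news first, then attach microblog messages to the clusters to gauge public attention) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words vectors, cosine similarity, average linkage, the `min_sim` threshold, the function names, and the toy headlines are all assumptions chosen for illustration, and the lexicon-building step from stock data is omitted.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words count vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerative(vectors, min_sim):
    """Average-linkage agglomerative clustering over item indices.

    Repeatedly merges the most similar pair of clusters and stops once
    the best average similarity drops below min_sim.
    """
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < min_sim:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def assign_tweets(tweets, vectors, clusters, min_sim):
    """Count, per news cluster, the tweets whose best centroid match
    reaches min_sim; tweets below the threshold are left unassigned."""
    centroids = [sum((vectors[i] for i in c), Counter()) for c in clusters]
    counts = [0] * len(clusters)
    for tweet in tweets:
        tv = bow(tweet)
        sims = [cosine(tv, c) for c in centroids]
        k = max(range(len(sims)), key=sims.__getitem__)
        if sims[k] >= min_sim:
            counts[k] += 1
    return counts


# Toy data, invented for illustration only.
news = [
    "brexit referendum vote shakes markets",
    "markets fall after brexit referendum vote",
    "oil prices rise on supply cut",
    "supply cut lifts oil prices",
]
tweets = [
    "brexit vote markets crash",
    "brexit referendum panic",
    "oil prices up",
    "good morning everyone",
]

vectors = [bow(n) for n in news]
clusters = agglomerative(vectors, min_sim=0.3)
counts = assign_tweets(tweets, vectors, clusters, min_sim=0.3)
print(clusters, counts)
```

In this sketch, a cluster with an unusually high tweet count would be a candidate "hot event"; how that threshold is set is left open here.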
Pages: 39
Related papers
50 in total
  • [21] Modeling dynamics of cryptocurrency XRP using tweets, news, and GMDH-based algorithms
    Mogilev, Pavel
    Kusegenov, Dinislam
    Alexandrov, Mikhail
    Cardiff, John
    Koshulko, Olexiy
    2022 IEEE 17TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCES AND INFORMATION TECHNOLOGIES (CSIT), 2022, : 559 - 563
  • [22] Abnormal event detection from surveillance video by dynamic hierarchical clustering
    Jiang, Fan
    Wu, Ying
    Katsaggelos, Aggelos K.
    2007 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-7, 2007, : 2397 - 2400
  • [23] Twitter Event Photo Detection Using both Geotagged Tweets and Non-geotagged Photo Tweets
    Takamu, Kaneko
    Hang, Nga Do
    Yanai, Keiji
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2015, PT II, 2015, 9315 : 128 - 138
  • [24] Graph-Based Clustering Approach for Economic and Financial Event Detection Using News Analytics Data
    Sidorov, Sergei P.
    Faizliev, Alexey R.
    Levshunov, Michael
    Chekmareva, Alfia
    Gudkov, Alexander
    Korobov, Eugene
    SOCIAL INFORMATICS (SOCINFO 2018), PT II, 2018, 11186 : 271 - 280
  • [25] Fuzzy time series prediction using hierarchical clustering algorithms
    Bang, Young-Keun
    Lee, Chul-Heui
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (04) : 4312 - 4325
  • [26] Analysis of spatial point patterns using hierarchical clustering algorithms
    Pereira, SMC
    BULLETIN OF THE AUSTRALIAN MATHEMATICAL SOCIETY, 2005, 71 (01) : 175 - 175
  • [27] Outlier detection using an ensemble of clustering algorithms
    Ray, Biswarup
    Ghosh, Soulib
    Ahmed, Shameem
    Sarkar, Ram
    Nasipuri, Mita
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (02) : 2681 - 2709
  • [28] Outlier detection using an ensemble of clustering algorithms
    Biswarup Ray
    Soulib Ghosh
    Shameem Ahmed
    Ram Sarkar
    Mita Nasipuri
    Multimedia Tools and Applications, 2022, 81 : 2681 - 2709
  • [29] Sub-Event Detection from Tweets
    Katragadda, Satya
    Benton, Ryan
    Raghavan, Vijay
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 2128 - 2135
  • [30] Hierarchical news topic detection using improved LSH
    Lu, Mei-Lian, 1600, Beijing University of Posts and Telecommunications (37):