A Region-Adaptive Method for Real-Time Bursty Event Detection in Public Service Hotline

Authors
Mai C.-C. [1, 2]
Chen Y.-T. [1, 2]
Qiu X.-M. [1, 2]
Liu J. [1, 2]
Zhao B. [1, 2]
Yuan C.-F. [1, 2]
Huang Y.-H. [1, 2]
Affiliations
[1] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing
[2] Department of Computer Science and Technology, Nanjing University, Nanjing
Keywords
Burstiness analysis; Data mining; Event detection; Public services hotlines; Region adaptation
DOI
10.11897/SP.J.1016.2020.02259
Abstract
With the spread of information technology, civic public service platforms have accumulated large volumes of public livelihood complaint data that need to be analyzed. Traditional event detection methods do not take the regional patterns of events into consideration, and the GPS geographic information they rely on is not easy to obtain. Many studies therefore seek efficient and accurate methods for recognizing the regional patterns of events, yet existing event detection methods remain inefficient at capturing potential events in civic public service data. In this paper, we propose a real-time region-adaptive method for bursty event detection, called RAEDetection. First, recognizing bursty words in a data stream is the basis for discovering bursty events, but the traditional Kleinberg model can only find bursty words in static data. We therefore propose an improved incremental Kleinberg model that identifies bursty words in a real-time data stream. Second, given the bursty words, we propose an algorithm based on hierarchical semantic analysis for recognizing candidate bursty events. Using bursty words as clues, this algorithm finds topic-level bursty events from the semantic information in topics and then divides them into finer-grained candidate bursty events using the semantic information in the complaint records. Finally, to filter out noise records in the candidate events, we construct an event region tree to recognize the regional patterns of events. The tree has a three-level structure corresponding to addresses at the city, district, and street levels. Following the maximum entropy principle, we assume that the address distribution of a given event is a discrete uniform distribution. We use the KL distance to measure how far the observed address distribution is from the assumed one, and we choose the number of addresses that minimizes the KL distance to indicate the geographical regions of the event, thereby recognizing regional patterns adaptively. Experimental results on two real-world civic public service datasets and one social media dataset from Twitter show that our method outperforms state-of-the-art methods in both detection accuracy and computing performance, with good data and system scalability. In real application scenarios, compared with TrioVecEvent, GeoBurst, and TopicSketch, the pseudo F1 values of RAEDetection are higher by 54.85%, 221.13%, and 84.26% on average, respectively. To explore how the sliding window size and the semantic similarity threshold affect our method, we carried out further experiments and found that RAEDetection performs best when the sliding window is set to 40 minutes and the semantic similarity threshold is set to 0.5 and 0.6 on the Nanjing and Suzhou datasets, respectively, which offers useful guidance for applying the algorithm in practice. Finally, the proposed method has been successfully adopted and validated by the civic public service platform of Jiangsu Province. © 2020, Science Press. All rights reserved.
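The region-selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the direction of the KL distance or how the event region tree is traversed, so the code below assumes a single flat level of addresses, takes the assumed distribution to be uniform over the k most frequent addresses, and scores each k with KL(uniform_k || observed). The function name `select_regions` and the sample records are hypothetical.

```python
import math
from collections import Counter


def select_regions(addresses):
    """Pick the top-k addresses whose uniform distribution is closest
    (in KL distance) to the observed address distribution.

    Assumptions (not from the paper): one flat address level, and the
    score for each k is KL(uniform over top-k || observed).
    """
    counts = Counter(addresses)
    total = sum(counts.values())
    # Rank addresses from most to least frequent (ties broken by name).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    probs = [count / total for _, count in ranked]

    best_k, best_kl = 0, math.inf
    for k in range(1, len(probs) + 1):
        q = 1.0 / k  # assumed uniform mass on each of the top-k addresses
        kl = sum(q * math.log(q / p) for p in probs[:k])
        if kl < best_kl:
            best_k, best_kl = k, kl
    return [addr for addr, _ in ranked[:best_k]], best_kl


# Made-up complaint records: two districts dominate, three look like noise.
records = (["Gulou"] * 40 + ["Xuanwu"] * 38 + ["Qinhuai"] * 2
           + ["Jianye"] + ["Qixia"])
regions, score = select_regions(records)
print(regions)
```

Under these assumptions, rarely mentioned addresses raise the KL distance sharply once they are included, so the minimum falls at the point where the dominant addresses end. In this sketch that is what separates an event's regions from noise records.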
Pages: 2259-2275
Number of pages: 16