Stream-based live entity resolution approach with adaptive duplicate count strategy

Cited by: 6
Authors
Ma, Kun [1]
Yang, Bo [1]
Affiliations
[1] Univ Jinan, Shandong Prov Key Lab Network Based Intelligent C, Jinan 250022, Shandong, Peoples R China
Funding
Austrian Science Fund;
Keywords
big data; cloud computing; entity resolution; MapReduce; NoSQL; sorted neighbourhood; stream processing; RECORD; CACHE;
DOI
10.1504/IJWGS.2017.10006055
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Recently, researchers have become increasingly concerned with the large-scale news and tweet data generated by social media. Some cloud service providers utilise this data to gauge public sentiment for their tenants. The challenge is how to clean such big data in the cloud before further analysis. To address this issue, we propose a new live entity resolution approach to find duplicates in news and tweet data. We investigate possible solutions for live entity resolution in the cloud: making the sliding window size adaptive using a multistep distance and a window-size-dependent duplicate count strategy with an alterable window step, and finding duplicates by overlapping boundary objects in adjacent blocks. Finally, our experimental evaluation on large news datasets shows the high effectiveness and efficiency of the proposed approaches.
Pages: 351-373
Page count: 23
Related papers
  • [1] Stream-based live public opinion monitoring approach with adaptive probabilistic topic model
    Ma, Kun
    Yu, Ziqiang
    Ji, Ke
    Yang, Bo
    SOFT COMPUTING, 2019, 23 (16) : 7451 - 7470
  • [2] Optimization of stream-based live data migration strategy in the cloud
    Ma, Kun
    Yang, Bo
    Yu, Ziqiang
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (12):
  • [3] Stream-based live public opinion monitoring approach with adaptive probabilistic topic model
    Kun Ma
    Ziqiang Yu
    Ke Ji
    Bo Yang
    Soft Computing, 2019, 23 : 7451 - 7470
  • [4] Stream-based live data replication approach of in-memory cache
    Ma, Kun
    Yang, Bo
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (11):
  • [5] Adaptive Optimizations for Stream-based Workflows
    Liang, Liang
    Filgueira, Rosa
    Yan, Yan
    PROCEEDINGS OF 15TH WORKSHOP ON WORKFLOWS IN SUPPORT OF LARGE-SCALE SCIENCE (WORKS), 2020, : 33 - 40
  • [6] Adaptive Stream-based Entropy Coding
    Yamagiwa, Shinichi
    Hayakawa, Eisaku
    Marumo, Koichi
    2020 DATA COMPRESSION CONFERENCE (DCC 2020), 2020, : 403 - 403
  • [7] Universal Adaptive Stream-Based Entropy Coding
    Yamagiwa, Shinichi
    Kato, Taiki
    IEEE ACCESS, 2024, 12 : 98768 - 98786
  • [8] LPSMon: A Stream-Based Live Public Sentiment Monitoring System
    Ma, Kun
    Tang, Zijie
    Zhong, Jialin
    Yang, Bo
    Web-Age Information Management, Pt II, 2016, 9659 : 534 - 536
  • [9] Timely Semantics: A Study of a Stream-Based Ranking System for Entity Relationships
    Fischer, Lorenz
    Blanco, Roi
    Mika, Peter
    Bernstein, Abraham
    SEMANTIC WEB - ISWC 2015, PT II, 2015, 9367 : 429 - 445
  • [10] Adaptive Graphical Approach to Entity Resolution
    Chen, Zhaoqi
    Kalashnikov, Dmitri V.
    Mehrotra, Sharad
    PROCEEDINGS OF THE 7TH ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES: BUILDING & SUSTAINING THE DIGITAL ENVIRONMENT, 2007, : 204 - 213