Content-based Analytics: Moving Beyond Data Size

Cited by: 0
Authors
Tsoumakos, Dimitrios [1]
Giannakopoulos, Ioannis [2]
Affiliations
[1] Ionian Univ, Dept Informat, Corfu, Greece
[2] NTUA, Sch Elect & Comp Engn, Comp Syst Lab, Athens, Greece
Funding
European Union Horizon 2020
DOI
10.1109/BigDataService49289.2020.00013
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Efforts on Big Data technologies have been highly directed towards the amount of data a task can access or crunch. Yet, for content-driven decision making, it is not (only) about the size, but about the "right" data: The number of available datasets (a different type of volume) can reach astronomical sizes, making a thorough evaluation of each input prohibitively expensive. The problem is exacerbated as data sources regularly exhibit varying levels of uncertainty and velocity/churn. To date, there exists no efficient method to quantify the impact of numerous available datasets over different analytics tasks and workflows. This visionary work puts the spotlight on data content rather than size. It proposes a novel modeling, planning and processing research bundle that assesses data quality in terms of analytics performance. The main expected outcome is to provide efficient, continuous and intelligent management and execution of content-driven data analytics. Intelligent dataset selection can achieve massive gains on both accuracy and time required to reach a desired level of performance. This work introduces the notion of utilizing dataset similarity to infer operator behavior and, consequently, be able to build scalable, operator-agnostic performance models for Big Data tasks over different domains. We present an overview of the promising results from our initial work with numerical and graph data and respective operators. We then describe a reference architecture with specific areas of research that need to be tackled in order to provide a data-centric analytics ecosystem.
Pages: 33-40
Page count: 8