A Content-Based Approach for Modeling Analytics Operators

Cited by: 1
Authors
Giannakopoulos, Ioannis [1]
Tsoumakos, Dimitrios [2]
Koziris, Nectarios [1]
Affiliations
[1] Natl Tech Univ Athens, Comp Syst Lab, Sch ECE, Athens, Greece
[2] Ionian Univ, Dept Informat, Corfu, Greece
DOI
10.1145/3269206.3271731
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The plethora of publicly available data sources has given birth to a wealth of new needs and opportunities. The ever-increasing amount of data has shifted analysts' attention from optimizing operators for specific business cases to the datasets themselves: selecting the ones most suitable for a specific operator, i.e., those that make the operator produce a specific output. Yet predicting the output of a given operator executed on different input datasets is not an easy task: it entails executing the operator on all of them, which requires excessive computational power and time. To tackle this challenge, we propose a novel dataset profiling methodology that infers an operator's outcome by examining the similarity of the available input datasets in specific attributes. Our methodology quantifies dataset similarities and projects them into a low-dimensional space. The operator is then executed on only a subset of the available datasets, and its output for the rest is approximated by Neural Networks trained with this space as input. Our experimental evaluation thoroughly examines the performance of our scheme using both synthetic and real-world datasets, indicating that the suggested approach is capable of predicting an operator's output with high accuracy. Moreover, it massively accelerates operator profiling in comparison to approaches that require exhaustive operator execution, rendering our work ideal for cases where a multitude of operators need to be executed on a set of given datasets.
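
The pipeline outlined in the abstract (quantify dataset similarity, project into a low-dimensional space, execute the operator on a subset, approximate the rest) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not name the similarity measure or the projection technique, so the `profile` function (mean and standard deviation of one attribute), the use of classical multidimensional scaling, and the `operator` function are all hypothetical. An inverse-distance-weighted nearest-neighbour estimate stands in for the neural network purely to keep the example dependency-free.

```python
import math
import random
import statistics


def profile(dataset):
    # Assumed (hypothetical) content profile: mean and spread of one attribute.
    return (statistics.fmean(dataset), statistics.pstdev(dataset))


def distance_matrix(datasets):
    # Pairwise dissimilarity between datasets, measured on their profiles.
    profiles = [profile(d) for d in datasets]
    return [[math.dist(p, q) for q in profiles] for p in profiles]


def classical_mds(dist, dims=2, iters=200, seed=0):
    # Assumed projection: classical MDS, placing datasets in a low-dimensional
    # space whose Euclidean distances approximate `dist`.
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total = sum(row_mean) / n
    # Double-centred Gram matrix B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration finds the dominant eigenpair of the current B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        start = math.sqrt(sum(x * x for x in v))
        v = [x / start for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        if lam <= 1e-9:
            break  # no further informative dimension
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


def estimate_output(coords, known, target, k=2):
    # Stand-in for the trained model: inverse-distance-weighted k nearest
    # neighbours among the datasets the operator was actually executed on.
    def gap(i):
        return math.dist(coords[i], coords[target])
    nearest = sorted(known, key=gap)[:k]
    weights = [1.0 / (gap(i) + 1e-9) for i in nearest]
    return sum(w * known[i] for w, i in zip(weights, nearest)) / sum(weights)


def operator(dataset):
    # Hypothetical analytics operator: mean of the squared values.
    return statistics.fmean([x * x for x in dataset])


# Twelve small synthetic datasets with varying location and spread.
grid = (-1.0, -0.5, 0.0, 0.5, 1.0)
datasets = [[i + (1 + i % 3) * u for u in grid] for i in range(12)]

dist = distance_matrix(datasets)
coords = classical_mds(dist)

# Execute the operator on half of the datasets only, then estimate one of the rest.
known = {i: operator(datasets[i]) for i in range(0, 12, 2)}
target = 5
estimate = estimate_output(coords, known, target)
actual = operator(datasets[target])
print(f"estimated {estimate:.2f}, actual {actual:.2f}")
```

In this toy setup the operator runs on six of the twelve datasets and the seventh value is only an approximation. Replacing `estimate_output` with a regression model trained on `coords` would bring the sketch closer to the approach the abstract describes.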
Pages: 227-236
Number of pages: 10