A Content-Based Approach for Modeling Analytics Operators

Cited by: 1
Authors
Giannakopoulos, Ioannis [1]
Tsoumakos, Dimitrios [2]
Koziris, Nectarios [1]
Affiliations
[1] Natl Tech Univ Athens, Comp Syst Lab, Sch ECE, Athens, Greece
[2] Ionian Univ, Dept Informat, Corfu, Greece
DOI
10.1145/3269206.3271731
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The plethora of publicly available data sources has created a wealth of new needs and opportunities. The ever-increasing amount of data has shifted analysts' attention from optimizing operators for specific business cases to the datasets themselves: selecting those most suitable for a specific operator, i.e., those that make the operator produce a specific output. Yet predicting the output of a given operator across different input datasets is not an easy task: it entails executing the operator on all of them, which requires excessive computational power and time. To tackle this challenge, we propose a novel dataset profiling methodology that infers an operator's outcome by examining the similarity of the available input datasets with respect to specific attributes. Our methodology quantifies dataset similarities and projects them into a low-dimensional space. The operator is then executed on only a subset of the available datasets, and its output for the remaining ones is approximated using Neural Networks trained with this space as input. Our experimental evaluation thoroughly examines the performance of our scheme using both synthetic and real-world datasets, indicating that the suggested approach is capable of predicting an operator's output with high accuracy. Moreover, it massively accelerates operator profiling compared to approaches that require exhaustive operator execution, rendering our work ideal for cases where a multitude of operators need to be executed on a set of given datasets.
Pages: 227-236
Number of pages: 10
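
The pipeline outlined in the abstract (quantify dataset similarity, project to a low-dimensional space, run the operator on a subset, approximate the rest) can be illustrated with a minimal sketch. This is not the authors' implementation, and every name in it is hypothetical. It makes three simplifying assumptions: dataset similarity is the Euclidean distance between (mean, standard deviation) profiles of a single attribute; the low-dimensional projection is a plain landmark-distance embedding; and an inverse-distance-weighted nearest-neighbor regressor stands in for the Neural Networks mentioned in the abstract, so the example runs on the Python standard library alone.

```python
import statistics

def profile(dataset):
    """Summarize one attribute of a dataset as (mean, population std). Assumed profile."""
    return (statistics.mean(dataset), statistics.pstdev(dataset))

def distance(a, b):
    """Euclidean distance between two dataset profiles (assumed similarity measure)."""
    pa, pb = profile(a), profile(b)
    return ((pa[0] - pb[0]) ** 2 + (pa[1] - pb[1]) ** 2) ** 0.5

def embed(datasets, landmarks):
    """Map each dataset to a low-dimensional point: its distances to the landmarks."""
    return [[distance(d, lm) for lm in landmarks] for d in datasets]

def predict(point, train_points, train_outputs, k=2):
    """Inverse-distance-weighted k-NN regression (stand-in for a neural network)."""
    ranked = sorted(
        (sum((p - q) ** 2 for p, q in zip(point, tp)) ** 0.5, out)
        for tp, out in zip(train_points, train_outputs)
    )[:k]
    if ranked[0][0] == 0:
        return ranked[0][1]  # exact match with a profiled dataset
    weights = [1.0 / d for d, _ in ranked]
    return sum(w * out for w, (_, out) in zip(weights, ranked)) / sum(weights)

def operator(dataset):
    """Toy operator whose output we want to avoid computing for every dataset."""
    return max(dataset) - min(dataset)

# Eight synthetic datasets: dataset s holds the values 0, s, 2s, ..., 9s.
datasets = [[float(i * s) for i in range(10)] for s in range(1, 9)]

# Project all datasets, using the first and last datasets as landmarks.
landmarks = [datasets[0], datasets[-1]]
points = embed(datasets, landmarks)

# Execute the operator only on a subset of the datasets.
train_idx = [0, 2, 4, 6, 7]
train_points = [points[i] for i in train_idx]
train_outputs = [operator(datasets[i]) for i in train_idx]

# Approximate the operator's output for the remaining datasets.
predicted = {
    i: predict(points[i], train_points, train_outputs)
    for i in range(len(datasets))
    if i not in train_idx
}

for i, value in sorted(predicted.items()):
    print(i, round(value, 3), operator(datasets[i]))
```

In this toy setting the operator's output varies linearly with the dataset profile, so interpolating between the two nearest profiled datasets recovers it closely; real operators would generally need a richer similarity measure and a more expressive model, as the abstract indicates.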