A similarity study of I/O traces via string kernels

Cited by: 0
Authors
Raul Torres
Julian M. Kunkel
Manuel F. Dolz
Thomas Ludwig
Affiliations
[1] Universität Hamburg, Department of Informatics
[2] University of Reading, Department of Computer Science
[3] Universidad Carlos III de Madrid, Department of Computer Science
Source
Journal of Supercomputing, 2019, 75 (12)
Keywords
Kernel functions; Kast2 spectrum kernel; I/O access pattern comparison; String kernels
DOI
Not available
Abstract
Understanding I/O for data-intensive applications is the foundation for optimizing these applications. Classifying applications according to the I/O access patterns they exhibit eases the analysis. An access pattern can be seen as a fingerprint of an application. In this paper, we address the classification of I/O traces. First, we convert them into a weighted string representation. Because string objects can easily be compared using kernel methods, we explore their use for fingerprinting I/O patterns. To improve accuracy, we propose a novel string kernel function called the kast2 spectrum kernel. The similarity matrices obtained by applying this kernel to a set of examples from a real application were analyzed using kernel principal component analysis and hierarchical clustering. The evaluation showed that two of the four I/O access pattern groups were identified completely, while the other two groups formed a single cluster owing to the intrinsic similarity of their members. The proposed strategy is promising for other similarity problems involving tree-like structured data.
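The abstract does not define the kast2 spectrum kernel, so the sketch below is not the authors' method. Under stated assumptions, it illustrates only the general idea the abstract builds on: the standard k-spectrum string kernel, which scores two strings by the contiguous length-k substrings they share, normalized into a pairwise similarity matrix of the kind that kernel PCA or hierarchical clustering would then consume. Assumptions: each trace is a hypothetical list of operation tokens such as "open" and "read"; the weights of the paper's weighted string representation are ignored; k = 2 is an arbitrary choice; all helper names are illustrative.

```python
from collections import Counter
from math import sqrt


def spectrum(tokens, k):
    """Count every contiguous k-gram (length-k token substring) of a trace."""
    return Counter(tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Standard k-spectrum kernel: dot product of the two k-gram count vectors."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    # Counter returns 0 for k-grams that b does not contain.
    return sum(count * sb[gram] for gram, count in sa.items())


def normalized_kernel(a, b, k=2):
    """Cosine-normalized kernel in [0, 1]; 0.0 if either trace has no k-grams."""
    kaa, kbb = spectrum_kernel(a, a, k), spectrum_kernel(b, b, k)
    if kaa == 0 or kbb == 0:
        return 0.0
    return spectrum_kernel(a, b, k) / sqrt(kaa * kbb)


def similarity_matrix(traces, k=2):
    """Symmetric matrix of pairwise normalized kernel values."""
    return [[normalized_kernel(x, y, k) for y in traces] for x in traces]


def kernel_distance(similarity):
    """Feature-space distance for a normalized kernel: sqrt(2 - 2 * K(x, y))."""
    return sqrt(max(0.0, 2.0 - 2.0 * similarity))


# Hypothetical toy traces; real traces would come from an I/O tracing tool.
traces = [
    ["open", "read", "read", "close"],
    ["open", "read", "close"],
    ["open", "write", "write", "close"],
]

for row in similarity_matrix(traces):
    print([round(value, 3) for value in row])
# [1.0, 0.816, 0.0]
# [0.816, 1.0, 0.0]
# [0.0, 0.0, 1.0]
```

The two read-dominated traces share the 2-grams (open, read) and (read, close), so they score about 0.816, while the write-dominated trace shares none with either and scores 0.0. Because the normalized kernel equals 1.0 on the diagonal, kernel_distance turns each entry into the distance between the two traces in the kernel's feature space, which a standard hierarchical clustering routine can take as input.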
Pages: 7814–7826
Number of pages: 12
Related papers (50 in total)
  • [1] A similarity study of I/O traces via string kernels
    Torres, Raul
    Kunkel, Julian M.
    Dolz, Manuel F.
    Ludwig, Thomas
    JOURNAL OF SUPERCOMPUTING, 2019, 75 (12): 7814 - 7826
  • [2] Accelerating Legacy String Kernels via Bounded Automata Learning
    Angstadt, Kevin
    Jeannin, Jean-Baptiste
    Weimer, Westley
    TWENTY-FIFTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXV), 2020: 235 - 249
  • [3] A Study of Self-similarity in Parallel I/O Workloads
    Zou, Qiang
    Zhu, Yifeng
    Feng, Dan
    2010 IEEE 26TH SYMPOSIUM ON MASS STORAGE SYSTEMS AND TECHNOLOGIES (MSST), 2010,
  • [5] String Similarity in CBR Platforms: A Preliminary Study
    Mazzucchelli, Alice
    Sartori, Fabio
    METADATA AND SEMANTICS RESEARCH, MTSR 2014, 2014, 478 : 22 - 29
  • [6] String Kernels for Polarity Classification: A Study Across Different Languages
    Gimenez-Perez, Rosa M.
    Franco-Salvador, Marc
    Rosso, Paolo
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2018), 2018, 10859 : 489 - 493
  • [7] Automatic Generation of I/O Kernels for HPC Applications
    Behzad, Babak
    Hoang-Vu Dang
    Hariri, Farah
    Zhang, Weizhe
    Snir, Marc
    2014 9TH PARALLEL DATA STORAGE WORKSHOP (PDSW), 2014: 31 - 36
  • [8] A Principled Approach for Selecting Block I/O Traces
    Desai, Omkar
    Shin, Seungmin
    Lee, Eunji
    Kim, Bryan S.
    PROCEEDINGS OF THE 2022 14TH ACM WORKSHOP ON HOT TOPICS IN STORAGE AND FILE SYSTEMS, HOTSTORAGE 2022, 2022: 52 - 58
  • [9] I/O Deduplication: Utilizing Content Similarity to Improve I/O Performance
    Koller, Ricardo
    Rangaswami, Raju
    ACM TRANSACTIONS ON STORAGE, 2010, 6 (03)
  • [10] Proposal and study of statistical features for string similarity computation and classification
    Rodrigues, E. O.
    Casanova, D.
    Teixeira, M.
    Pegorini, V.
    Favarim, F.
    Clua, E.
    Conci, A.
    Liatsis, Panos
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2020, 12 (03) : 277 - 307