Understanding Data Access Patterns for dCache System

Times cited: 0
Authors
Bellavita, Julian [1]
Sim, Caitlin [1]
Wu, Kesheng [2]
Sim, Alex [2]
Yoo, Shinjae [3]
Ito, Hiro [3]
Garonne, Vincent [3]
Lancon, Eric
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Lawrence Berkeley Natl Lab, Berkeley, CA USA
[3] Brookhaven Natl Lab, Upton, NY USA
DOI
10.1051/epjconf/202429501053
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
The storage management system dCache acts as a disk cache for high-energy physics (HEP) data from the US ATLAS community. Because its disk capacity is considerably smaller than the total volume of ATLAS data, a heuristic is needed to decide which data should be kept on disk. An effective heuristic is to keep the data files that are expected to be heavily accessed in the near future. Through a careful study of access statistics, we find that a small number of popular datasets are accessed much more frequently than the others, even though the set of popular datasets changes over time. If we could predict the near-term popularity of datasets, we could pin the most popular ones in the disk cache to prevent their accidental removal and guarantee their availability. To predict dataset popularity, we present several methods for forecasting the number of times a dataset will be accessed on the next day. Test results show that these methods can reliably predict the next-day access counts of popular datasets. This finding is confirmed with dCache logs from two separate time ranges.
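The abstract does not name the specific forecasting methods, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a hypothetical, already-parsed log of (day index, dataset name) access records rather than the real dCache log format. In place of the paper's forecasters, it uses two simple baselines: a persistence forecast (tomorrow's count equals today's) and a short moving average. It then selects the `k` datasets with the highest predicted next-day access counts as pinning candidates. All function names (`daily_counts`, `persistence_forecast`, `moving_average_forecast`, `datasets_to_pin`) and the parameters `k` and `window` are illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical input format: one (day_index, dataset_name) pair per access,
# assumed to be pre-parsed from a dCache access log (real log schema not shown).
AccessRecord = Tuple[int, str]
DailyHistory = Dict[int, int]  # day_index -> number of accesses on that day


def daily_counts(records: Iterable[AccessRecord]) -> Dict[str, DailyHistory]:
    """Aggregate raw access records into per-dataset daily access counts."""
    counts: Dict[str, DailyHistory] = defaultdict(lambda: defaultdict(int))
    for day, dataset in records:
        counts[dataset][day] += 1
    return counts


def persistence_forecast(history: DailyHistory, today: int) -> float:
    """Baseline: predict that tomorrow's count equals today's count."""
    return float(history.get(today, 0))


def moving_average_forecast(history: DailyHistory, today: int, window: int = 3) -> float:
    """Baseline: mean count over the last `window` days; days with no accesses count as 0."""
    days = range(today - window + 1, today + 1)
    return sum(history.get(d, 0) for d in days) / window


def datasets_to_pin(
    records: Iterable[AccessRecord],
    today: int,
    k: int,
    forecast: Callable[[DailyHistory, int], float] = persistence_forecast,
) -> List[str]:
    """Return the k datasets with the highest forecast next-day access count."""
    counts = daily_counts(records)
    predicted = {ds: forecast(hist, today) for ds, hist in counts.items()}
    # Sort by predicted count (descending); break ties by name for determinism.
    ranked = sorted(predicted, key=lambda ds: (-predicted[ds], ds))
    return ranked[:k]


if __name__ == "__main__":
    log = [(0, "dsA"), (0, "dsA"), (0, "dsB"),
           (1, "dsA"), (1, "dsB"), (1, "dsB"), (1, "dsB"), (1, "dsC")]
    print(datasets_to_pin(log, today=1, k=2))  # ['dsB', 'dsA']
```

Ranking by a forecast rather than by cumulative historical totals is what lets the pinned set follow popularity as it shifts over time. In a real deployment, `k` would presumably be bounded by the disk space reserved for pinned data, which this sketch does not model.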
Pages: 9
Related papers (50 in total)
  • [1] Understanding Data Characteristics and Access Patterns in a Cloud Storage System
    Liu, Songbin
    Huang, Xiaomeng
    Fu, Haohuan
    Yang, Guangwen
    PROCEEDINGS OF THE 2013 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2013), 2013: 327-334
  • [2] dCache, a distributed storage data caching system
    Ernst, M
    Fuhrmann, P
    Gasthuber, M
    Mkrtchyan, T
    Waldman, C
    PROCEEDINGS OF CHEP 2001, 2001: 241-244
  • [3] dCache, storage system for the future
    Fuhrmann, Patrick
    Guelzow, Volker
    EURO-PAR 2006 PARALLEL PROCESSING, 2006, 4128: 1106-1113
  • [4] VO specific data browser for dCache
    Gavrilenko, M.
    Gorbunov, I.
    Korenkov, V.
    Oleynik, D.
    Petrosyan, A.
    Shmatov, S.
    NUCLEAR ELECTRONICS & COMPUTING (NEC'2011), 2011: 145-147
  • [5] Understanding Data Access Patterns Using Object-Differentiated Memory Profiling
    Pena, Antonio J.
    Balaji, Pavan
    2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING, 2015: 1143-1146
  • [6] Unlocking data: federated identity with LSDMA and dCache
    Millar, A. P.
    Behrmann, G.
    Bernardt, C.
    Fuhrmann, P.
    Hardt, M.
    Hayrapetyan, A.
    Litvintsev, D.
    Mkrtchyan, T.
    Rossi, A.
    Schwank, K.
    21ST INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP2015), PARTS 1-9, 2015, 664
  • [7] dCache, Sync-and-Share for Big Data
    Millar, A. P.
    Fuhrmann, P.
    Mkrtchyan, T.
    Behrmann, G.
    Bernardt, C.
    Buchholz, Q.
    Guelzow, V.
    Litvintsev, D.
    Schwank, K.
    Rossi, A.
    van der Reest, P.
    21ST INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP2015), PARTS 1-9, 2015, 664
  • [8] Analyzing data distribution on disk pools for dCache
    Halstenberg, S.
    Jung, C.
    Ressmann, D.
    17TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP09), 2010, 219
  • [9] Transparent handling of small files with dCache to optimize tape access
    Schwank, Karsten
    Kruecker, Dirk
    Fuhrmann, Patrick
    Mkrtchyan, Tigran
    Millar, Paul
    Bernardt, Christian
    Litventsev, Dmitry
    Rossi, Albert
    Behrmann, Gerd
    21ST INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP2015), PARTS 1-9, 2015, 664
  • [10] Understanding the Bibliometric Patterns of Publications in IEEE Access
    Raman, Raghu
    Singh, Prashasti
    Singh, Vivek Kumar
    Vinuesa, Ricardo
    Nedungadi, Prema
    IEEE ACCESS, 2022, 10: 35561-35577