Improving cache locking performance of modern embedded systems via the addition of a miss table at the L2 cache level

Cited by: 11
Authors
Asaduzzaman, Abu [2]
Sibai, Fadi N. [1]
Rani, Manira [2]
Affiliations
[1] UAE Univ, CIT, Al Ain, U Arab Emirates
[2] Florida Atlantic Univ, Dept Comp Sci & Engn, Boca Raton, FL 33431 USA
Keywords
Cache locking; Miss table; Multi-core architecture; Performance/power ratio; Timing predictability;
DOI
10.1016/j.sysarc.2010.02.002
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
To deliver robustness and high quality of service, modern computing architectures running real-time applications should provide both high system performance and high timing predictability. Cache memory improves performance by bridging the speed gap between main memory and the CPU. However, the cache introduces timing unpredictability, which creates serious challenges for real-time applications. Herein, we introduce a miss table (MT) based cache locking scheme at the level-2 (L2) cache to further improve timing predictability and the system performance/power ratio. The MT holds the addresses of the blocks, for the application being processed, that cause the most cache misses if not locked. Information in the MT is used to select efficiently both the blocks to be locked and the victim blocks to be replaced. This MT-based approach improves timing predictability by locking the important blocks with the highest number of misses inside the cache for the entire execution time. In addition, the technique decreases the average delay per task and the total power consumption by reducing cache misses and avoiding unnecessary data transfers. The MT-based solution is effective for both uniprocessors and multicores. We evaluate the proposed MT-based cache locking scheme by simulating an 8-core processor with two levels of cache using MPEG4 decoding, H.264/AVC decoding, FFT, and MI workloads. Experimental results show that, in addition to improving predictability, using the MT and locking 25% of the L2 reduces the mean delay per task by 21% and the total power consumption by 18% for MPEG4 (and H.264/AVC). The MT yields delay and power reductions of about 5% on these video applications, and possibly more on applications with worse cache behavior. For the FFT and MI (and other) applications whose code fits inside the level-1 instruction (I1) cache, the addition of the MT increases the mean delay per task by only 3% and the total power consumption by 2%.
(C) 2010 Elsevier B.V. All rights reserved.
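The abstract describes the MT only at a high level, so the following is a minimal Python sketch of the idea rather than the authors' implementation. It assumes the MT can be modeled as a per-block miss counter built from a miss trace. The function names (`build_miss_table`, `select_locked_blocks`, `choose_victim`), the sample trace, and the victim rule (evict the unlocked block with the fewest recorded misses) are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_miss_table(miss_trace):
    """Count misses per block address (hypothetical stand-in for the MT)."""
    return Counter(miss_trace)


def select_locked_blocks(miss_table, l2_blocks, lock_fraction):
    """Pick the blocks with the most recorded misses, up to the lock budget."""
    budget = int(l2_blocks * lock_fraction)
    # Sort by descending miss count, then by address for a deterministic result.
    ranked = sorted(miss_table.items(), key=lambda kv: (-kv[1], kv[0]))
    return [addr for addr, _ in ranked[:budget]]


def choose_victim(resident, locked, miss_table):
    """Among unlocked resident blocks, pick the one with the fewest misses.

    This victim rule is an assumption; the abstract only states that MT
    information guides victim selection.
    """
    candidates = [b for b in resident if b not in locked]
    if not candidates:
        return None  # every resident block is locked
    return min(candidates, key=lambda b: (miss_table.get(b, 0), b))


# Illustrative miss trace of block addresses (made-up values).
miss_trace = [0x10, 0x20, 0x10, 0x30, 0x10, 0x20, 0x40]
mt = build_miss_table(miss_trace)

# Lock 25% of a toy 8-block L2, mirroring the 25% figure in the abstract.
locked = select_locked_blocks(mt, l2_blocks=8, lock_fraction=0.25)
victim = choose_victim([0x10, 0x20, 0x30, 0x40], set(locked), mt)
print([hex(b) for b in locked], hex(victim))
```

With this toy trace, the two most frequently missed blocks are locked and the replacement victim is drawn only from the remaining unlocked blocks, which is the behavior the abstract attributes to the MT.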
Pages: 151-162
Number of pages: 12