Improving cache locking performance of modern embedded systems via the addition of a miss table at the L2 cache level

Times cited: 11

Authors:

Asaduzzaman, Abu [2]

Sibai, Fadi N. [1]

Rani, Manira [2]

Affiliations:

[1] UAE Univ, CIT, Al Ain, U Arab Emirates

[2] Florida Atlantic Univ, Dept Comp Sci & Engn, Boca Raton, FL 33431 USA

Source:

JOURNAL OF SYSTEMS ARCHITECTURE | 2010 / Vol. 56 / Issues 4-6

Keywords:

Cache locking; Miss table; Multi-core architecture; Performance/power ratio; Timing predictability

DOI:

10.1016/j.sysarc.2010.02.002

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

To ensure robustness and high quality of service, modern computing architectures running real-time applications should provide both high system performance and high timing predictability. Cache memory improves performance by bridging the speed gap between the main memory and the CPU. However, the cache also introduces timing unpredictability, which creates serious challenges for real-time applications. Herein, we introduce a miss table (MT) based cache locking scheme at the level-2 (L2) cache to further improve timing predictability and the system performance/power ratio. The MT holds the addresses of the blocks of the application being processed that would cause the most cache misses if they were not locked. The information in the MT is used to efficiently select both the blocks to be locked and the victim blocks to be replaced. This MT-based approach improves timing predictability by locking the important blocks, those with the highest number of misses, inside the cache for the entire execution time. In addition, the technique decreases the average delay per task and the total power consumption by reducing cache misses and avoiding unnecessary data transfers. The MT-based solution is effective for both uniprocessors and multicores. We evaluate the proposed MT-based cache locking scheme by simulating an 8-core processor with two levels of caches running MPEG4 decoding, H.264/AVC decoding, FFT, and MI workloads. Experimental results show that, in addition to improving predictability, using the MT and locking 25% of the L2 cache reduces the mean delay per task by 21% and the total power consumption by 18% for MPEG4 (and H.264/AVC). Adding the MT yields reductions of about 5% in delay and power on these video applications, and possibly more on applications with worse cache behavior. For FFT, MI, and other applications whose code fits inside the level-1 instruction (I1) cache, adding the MT increases the mean delay per task by only 3% and the total power consumption by 2%.
(C) 2010 Elsevier B.V. All rights reserved.
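The abstract describes the MT only at a high level. The Python sketch below is a toy model, not the authors' implementation. It assumes the MT is filled by an offline profiling pass that counts misses per block address, locks the blocks with the most misses up to a fixed fraction of each set's ways (25% here, mirroring the abstract's 25% of L2), and replaces the least recently used unlocked block on a miss. The names `LockableLRUCache`, `build_miss_table`, `select_blocks_to_lock`, and `run`, the cache geometry, and the synthetic trace are all hypothetical. The paper also uses MT information to choose victim blocks, which this sketch does not model.

```python
from collections import Counter, OrderedDict


class LockableLRUCache:
    """Hypothetical toy set-associative cache: LRU that never evicts locked blocks.

    Sizes are counted in blocks, not bytes.
    """

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One OrderedDict per set; keys are block addresses ordered LRU -> MRU.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.locked = set()
        self.misses = 0

    def set_of(self, block):
        return block % self.num_sets

    def lock(self, block):
        """Preload a block and pin it for the whole run (no miss is charged)."""
        self.sets[self.set_of(block)][block] = None
        self.locked.add(block)

    def access(self, block):
        """Return True on a hit, False on a miss."""
        s = self.sets[self.set_of(block)]
        if block in s:
            s.move_to_end(block)  # mark as most recently used
            return True
        self.misses += 1
        if len(s) >= self.ways:
            # Victim = least recently used block that is not locked.
            # At least one unlocked block exists because lock_ways < ways.
            victim = next(b for b in s if b not in self.locked)
            del s[victim]
        s[block] = None
        return False


def build_miss_table(trace, num_sets, ways):
    """Profiling pass without locking: count misses per block address."""
    cache = LockableLRUCache(num_sets, ways)
    table = Counter()
    for block in trace:
        if not cache.access(block):
            table[block] += 1
    return table


def select_blocks_to_lock(miss_table, num_sets, ways, lock_fraction):
    """Pick blocks with the most misses, at most lock_ways blocks per set."""
    lock_ways = int(ways * lock_fraction)  # 25% of a 4-way set -> 1 way
    assert 0 <= lock_ways < ways, "keep at least one unlocked way per set"
    per_set = Counter()
    chosen = []
    for block, _count in miss_table.most_common():
        s = block % num_sets
        if per_set[s] < lock_ways:
            chosen.append(block)
            per_set[s] += 1
    return chosen


def run(trace, num_sets, ways, locked_blocks=()):
    """Replay a trace, optionally with some blocks locked, and return misses."""
    cache = LockableLRUCache(num_sets, ways)
    for block in locked_blocks:
        cache.lock(block)
    for block in trace:
        cache.access(block)
    return cache.misses


# Synthetic trace: hot block 0 interleaved with never-reused blocks in set 0.
trace = []
for i in range(10):
    trace.append(0)
    trace.extend(2 * (4 * i + j + 1) for j in range(4))

NUM_SETS, WAYS = 2, 4
mt = build_miss_table(trace, NUM_SETS, WAYS)
to_lock = select_blocks_to_lock(mt, NUM_SETS, WAYS, lock_fraction=0.25)
print(mt[0], to_lock)                       # 10 [0]
print(run(trace, NUM_SETS, WAYS))           # 50
print(run(trace, NUM_SETS, WAYS, to_lock))  # 40
```

On this synthetic trace, plain LRU evicts the hot block before each reuse, so every access misses; locking the block the MT ranks highest cuts misses from 50 to 40. These numbers only illustrate the mechanism and say nothing about the paper's workloads.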


Pages: 151-162

Number of pages: 12
