Parallelization Strategies and Performance Analysis of Media Mining Applications on Multi-Core Processors

Cited by: 4
Authors
Li, Wenlong [1]
Tong, Xiaofeng [1]
Wang, Tao [1]
Zhang, Yimin [1]
Chen, Yen-Kuang [2]
Affiliations
[1] Intel Corp, Microprocessor Technol Lab, Beijing, Peoples R China
[2] Intel Corp, Microprocessor Technol Lab, Corp Technol Grp, Santa Clara, CA USA
Keywords
Media mining; Parallelization; Performance analysis; Multi-core processor; Video
DOI
10.1007/s11265-008-0320-5
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper studies how to parallelize emerging media mining workloads on existing small-scale multi-core processors and future large-scale platforms. Media mining is an emerging technology that extracts meaningful knowledge from large amounts of multimedia data to help end users search, browse, and manage that data. Many media mining applications are complex and require a huge amount of computing power. The advent of multi-core architectures offers an opportunity to accelerate media mining, but using multi-core processors efficiently requires executing many threads effectively at the same time. In this paper, we show how to exploit multi-core processors to speed up computation-intensive media mining applications. We first parallelize two media mining applications by extracting coarse-grained parallelism and evaluate their parallel speedups on a small-scale multi-core system. Our experiments show that coarse-grained parallelization achieves good, but not perfect, scaling. Examining the memory requirements, we find that these coarse-grained parallelized workloads exhibit high memory demand: their working set sizes increase almost linearly with the degree of parallelism, and their instantaneous memory bandwidth usage prevents perfect scalability on the 8-core machine. To avoid the memory bandwidth bottleneck, we turn to fine-grained parallelism and evaluate the parallel performance on the 8-core machine and on a simulated 64-core processor. Experimental data show that fine-grained parallelization has much lower memory requirements than the coarse-grained approach but exhibits significant read-write data sharing. As a result, expensive inter-thread communication limits the parallel speedup on the 8-core machine, while excellent speedup is observed on the large-scale processor, where a shared cache provides fast core-to-core communication.
Our study suggests that (1) extracting coarse-grained parallelism scales well on small-scale platforms but poorly on large-scale systems; (2) exploiting fine-grained parallelism is well suited to realizing the power of large-scale platforms; and (3) future many-core chips can provide a shared cache and sufficient on-chip interconnect bandwidth to enable efficient inter-core communication for applications with significant amounts of shared data. In short, this work demonstrates that proper parallelization techniques are critical to performance on multi-core processors, and that performance analysis is an important factor in parallelization. The parallelization principles, practice, and performance analysis methodology presented in this paper are also useful to anyone seeking to exploit thread-level parallelism in their applications.
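Illustrative sketch (not from the paper): the abstract's contrast between coarse-grained and fine-grained parallelism can be pictured with a toy workload. The Python below is a minimal sketch under stated assumptions. The "frames", the `frame_score` kernel, and every function name are hypothetical stand-ins and do not reproduce the authors' applications or code. It shows only how the work is divided. It does not measure speedup, since CPython threads do not run CPU-bound Python code in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def make_frames(count, height, width):
    """Build hypothetical 'frames' as nested lists of small integers."""
    return [
        [[(f + r + c) % 7 for c in range(width)] for r in range(height)]
        for f in range(count)
    ]


def row_score(row):
    """Toy kernel applied to one row: sum of squared pixel values."""
    return sum(p * p for p in row)


def frame_score(frame):
    """Toy kernel applied to one whole frame."""
    return sum(row_score(row) for row in frame)


def coarse_grained(frames, workers):
    """Coarse-grained: each task is a whole frame.

    Up to `workers` frames are in flight at once, so the data being
    touched grows with the number of workers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(frame_score, frames))


def fine_grained(frames, workers):
    """Fine-grained: frames are handled one at a time, rows split across threads.

    Only one frame is in flight, but all workers read it and their
    partial results must be combined, which means shared data.
    """
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for frame in frames:
            results.append(sum(pool.map(row_score, frame)))
    return results


if __name__ == "__main__":
    frames = make_frames(count=4, height=8, width=8)
    # Both decompositions compute the same per-frame scores.
    print(coarse_grained(frames, workers=4) == fine_grained(frames, workers=4))
```

In this sketch the coarse-grained version keeps one frame per worker live, which mirrors the working-set growth described in the abstract, while the fine-grained version keeps a single frame live and pays for it with a combining step across workers.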
Pages: 213-228
Page count: 16
Source
[J]. Journal of Signal Processing Systems, 2009, 57: 213-228