Affinity-Based Network Interfaces for Efficient Communication on Multicore Architectures

Cited by: 0
Authors
Andrés Ortiz [1]
Julio Ortega [2]
Antonio F. Díaz [2]
Alberto Prieto [2]
Affiliations
[1] Department of Communications Engineering, University of Málaga
[2] Department of Computer Architecture and Technology/CITIC, University of Granada
Keywords
interrupt affinity; processor affinity; network interface; offloading; SIMICS
DOI
Not available
Chinese Library Classification (CLC) number
TP393.0 [General issues]
Discipline classification code
081201; 1201
Abstract
Network interface performance must improve to meet the demands of applications with high communication requirements (for example, some multimedia, real-time, and high-performance computing applications) and to exploit network links that provide bandwidths of multiple gigabits per second, which can require many processor cycles for communication tasks. Multicore architectures, the current trend in microprocessor development for coping with the difficulty of further increasing clock frequencies and microarchitecture efficiency, provide new opportunities to exploit the parallelism available in the nodes when designing efficient communication architectures. Nevertheless, although present OS network stacks include multiple threads that make it possible to execute network tasks concurrently in the kernel, implementing packet-based or connection-based parallelism is not trivial, because it must take into account the cost of synchronizing access to shared resources and the efficient use of caches. Therefore, a common trend in much recent research on this topic is to assign network interrupts and the corresponding protocol and network application processing to the same core, since this affinity scheduling can reduce both contention for shared resources and cache misses. In this paper we propose and analyze several configurations that distribute the network interface among the different cores available in the server. These alternatives have been devised according to the affinity of the corresponding communication tasks with the location (proximity to the memories where the different data structures are stored) and characteristics of the processing core. As this approach uses several cores to accelerate the communication path of a given connection, it can be seen as complementary to approaches that use several cores to simultaneously process packets belonging to either the same or different connections. Message passing interface (MPI) workloads and dynamic web servers have been used as applications to evaluate and compare the communication performance of these alternatives. In our experiments, performed by full-system simulation, improvements of up to 35% in throughput and up to 23% in latency have been observed in MPI workloads, and improvements of up to 100% in throughput, up to 500% in response time, and up to 82% in requests attended per second have been measured in dynamic web servers.
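The affinity scheduling mentioned in the abstract (network interrupts and the application that consumes the data placed on the same core) can be illustrated with a minimal sketch. This is not the authors' method, which the abstract says was evaluated by full-system simulation; it only assumes a Linux host, where an IRQ is steered by writing a hexadecimal CPU bitmask to `/proc/irq/<irq>/smp_affinity` and a process is pinned with `os.sched_setaffinity`. The helper names `affinity_mask_hex` and `pin_current_process` are hypothetical and chosen for illustration.

```python
import os


def affinity_mask_hex(cores):
    """Return the hexadecimal CPU bitmask in which bit i is set for each core i."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices must be non-negative")
        mask |= 1 << core
    return format(mask, "x")


def pin_current_process(cores):
    """Restrict the calling process to the given cores.

    Returns True if the affinity was applied, or False on platforms where
    os.sched_setaffinity is not available (it is a Linux-oriented call).
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, set(cores))  # pid 0 means the calling process
    return True
```

For example, `affinity_mask_hex([2])` gives the mask for core 2. Writing that mask to the `smp_affinity` file of the network card's IRQ (as root) and calling `pin_current_process([2])` in the receiving application would, under these assumptions, place interrupt handling and application processing on the same core.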
Pages: 508-524
Number of pages: 17