The availability of multicore processors and programmable NICs, such as TOEs (TCP/IP Off-loading Engines), provides new opportunities for designing efficient network interfaces to cope with the gap between the improvement rates of link bandwidths and microprocessor performance. This gap poses important challenges related with the high computational requirements associated to the traffic volumes and wider functionality that the network interface has to support. This way, taking into account the rate of link bandwidth improvement and the ever changing and increasing application demands, efficient network interface architectures require scalability and flexibility. An opportunity to reach these goals comes from the exploitation of the parallelism in the communication path by distributing the protocol processing work across processors which are available in the computer, i.e. multicore microprocessors and programmable NICs. Thus, after a brief review of the different solutions that have been previously proposed for speeding up network interfaces, this paper analyzes the onloading and offloading alternatives. Both strategies try to release host CPU cycles by taking advantage of the communication workload execution in other processors present in the node. Nevertheless, whereas onloading uses another general-purpose processor, either included in a chip multiprocessor (CMP) or in a symmetric multiprocessor (SMP), offloading takes advantage of processors in programmable network interface cards (NICs). From our experiments, implemented by using a full-system simulator, we provide a fair and more complete comparison between onloading and offloading. Thus, it is shown that the relative improvement on peak throughput offered by offloading and onloading depends on the rate of application workload to communication overhead, the message sizes, and on the characteristics of the system architecture, more specifically the bandwidth of the buses and the way the NIC is connected to the system processor and memory. In our implementations, offloading provides lower latencies than onloading, although the CPU utilization and interrupts are lower for onloading. Taking into account the conclusions of our experimental results, we propose a hybrid network interface that can take advantage of both, programmable NICs and multicore processors. (C) 2009 Elsevier B.V. All rights reserved.