KNEM

High-Performance Intra-Node MPI Communication


News: KNEM 0.8.0 has been released (2010/03/03). It brings small improvements all over the place.

See the Download page for details about all releases.

News: KNEM support for Open MPI has been merged into the trunk (2009/12/16); it will be available in Open MPI 1.5.

Get the latest KNEM news by subscribing to the knem-announce mailing list.

Summary

KNEM is a Linux kernel module enabling high-performance intra-node MPI communication for large messages. KNEM works on all Linux kernels since 2.6.15 and supports asynchronous and vectorial data transfers as well as offloading memory copies onto Intel I/OAT hardware.

MPICH2 (since release 1.1.1) uses KNEM in the DMA LMT to improve large message performance within a single node. Open MPI also includes KNEM support in its SM BTL component since release 1.5. Discover how to use them here.

The programming interface is documented here.

To get the latest KNEM news, subscribe to the knem-announce mailing list. For discussions about KNEM development, see the knem-devel mailing list.

Why?

MPI implementations usually offer a user-space, double-copy-based intra-node communication strategy: the sender copies the message into a shared buffer and the receiver copies it out again. This is very good for small-message latency, but it wastes many CPU cycles, pollutes the caches, and saturates memory buses. KNEM transfers data from one process to another through a single copy within the Linux kernel. The system call overhead (about 100ns these days) hurts small-message latency, but a single memory copy pays off for large messages (usually starting from dozens of kilobytes).
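
As a rough illustration of the double-copy strategy described above, here is a minimal C sketch. It is not code from KNEM or from any MPI implementation. The function name double_copy_transfer, the fill_pattern helper, and the use of fork() with an anonymous shared mapping are all assumptions made for this example. Real implementations typically keep a persistent shared segment and pipeline the message in chunks.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict C standards */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fill buf with a deterministic pattern so the
 * receiver can check what it got. */
static void fill_pattern(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(i * 31 + 7);
}

/* Hypothetical example: a child "sender" process passes len bytes to the
 * parent "receiver" through a shared bounce buffer.
 * Returns 0 on success, -1 on failure. */
int double_copy_transfer(unsigned char *dst, size_t len)
{
    assert(dst != NULL);

    /* Bounce buffer visible to both processes after fork(). */
    unsigned char *shared = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, len);
        return -1;
    }

    if (pid == 0) {
        /* Sender: the message lives in private memory. */
        unsigned char *src = malloc(len);
        if (src == NULL)
            _exit(1);
        fill_pattern(src, len);
        memcpy(shared, src, len);      /* copy 1: private -> shared */
        free(src);
        _exit(0);
    }

    /* Receiver: waiting for the sender to exit stands in for the
     * flag- or queue-based synchronization a real library would use. */
    int status = 0;
    int ok = waitpid(pid, &status, 0) == pid &&
             WIFEXITED(status) && WEXITSTATUS(status) == 0;
    if (ok)
        memcpy(dst, shared, len);      /* copy 2: shared -> private */

    munmap(shared, len);
    return ok ? 0 : -1;
}
```

Calling double_copy_transfer(buf, 4096) from a receiver fills buf with the sender's pattern. Every byte crosses the memory bus twice and passes through the caches of both processes, which is the cost a single in-kernel copy avoids.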

Some vendor-specific MPI stacks (such as Myricom MX, QLogic PSM, ...) offer similar abilities, but they may only run on specific hardware interconnects, while KNEM is generic (and open-source). Also, none of these competitors offers asynchronous completion models, I/OAT copy offload, or vectorial memory buffer support as KNEM does.

Download

KNEM is freely available under the terms of the CeCILL-B licence (BSD-like).

The latest stable release is KNEM 0.8.0. Source code access and all tarballs are available from the Download page.

Bugs and questions

Bug reports and questions should be sent to the knem-devel mailing list.

Papers

  1. Stéphanie Moreaud, Brice Goglin, Dave Goodell, and Raymond Namyst. Optimizing MPI Communication within large Multicore nodes with Kernel assistance. In CAC 2010: The 10th Workshop on Communication Architecture for Clusters, held in conjunction with IPDPS 2010. Atlanta, GA, April 2010. IEEE Computer Society Press. Available here.
    This paper discusses the use of kernel assistance and memory copy offload for various point-to-point and collective operations on a wide variety of modern shared-memory multicore machines up to 96 cores.
  2. Darius Buntinas, Brice Goglin, Dave Goodell, Guillaume Mercier, and Stéphanie Moreaud. Cache-Efficient, Intranode Large-Message MPI Communication with MPICH2-Nemesis. In Proceedings of the 38th International Conference on Parallel Processing (ICPP-2009), Vienna, Austria, September 2009. IEEE Computer Society Press. Available here.
    This paper describes the initial design and performance of the KNEM implementation when used within MPICH2/Nemesis and compares it to a vmsplice-based implementation as well as the usual double-buffering strategy.
  3. Stéphanie Moreaud. Adaptation des communications MPI intra-noeud aux architectures multicoeurs modernes. In 19ème Rencontres Francophones du Parallélisme (RenPar'19), Toulouse, France, September 2009. Available here.
    This French-language paper presents KNEM and its use in MPICH2/Nemesis before looking in depth at its performance for point-to-point and collective MPI operations.
  4. Brice Goglin. High Throughput Intra-Node MPI Communication with Open-MX. In Proceedings of the 17th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2009), Weimar, Germany, February 2009. IEEE Computer Society Press. Available here.
    The Open-MX intra-communication subsystem achieves very high throughput thanks to overlapped memory pinning and I/OAT copy offload. This paper led to the development of KNEM to provide generic MPI implementations with similar performance without requiring Open-MX.

All KNEM papers are also listed here with the corresponding Bibtex entries.

Credits

KNEM is developed by the INRIA Runtime Team-Project (headed by Raymond Namyst) in collaboration with the MPICH2 team at Argonne National Laboratory and the Open MPI community. The main developer is Brice Goglin, with contributions from Dave Goodell, Stéphanie Moreaud, Jeff Squyres, and George Bosilca.


Last updated on 2010/03/03.