hwloc (supersedes former libtopology)

Portable Hardware Locality


Introduction

The spread of multicore processors and NUMA architectures (AMD HyperTransport, Intel QPI, ...) is bringing complex hardware topologies to the whole server world. While large shared-memory machines were formerly very rare, nowadays every single cluster node may contain 8 cores, hierarchical caches, or multiple threads per core, making its topology far from flat.

Such complex and hierarchical topologies have a strong impact on application performance. Developers must take hardware affinities into account to exploit the actual hardware performance. For instance, two tasks that cooperate tightly should probably be placed on cores sharing a cache, whereas two independent memory-intensive tasks are better spread across different sockets so as to maximize their memory throughput. Likewise, MPI processes and OpenMP threads have to be placed according to their affinities and to the hardware characteristics.

hwloc provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading. It also gathers various attributes such as cache and memory information. It builds a hierarchical tree that the application may walk to retrieve information about the hardware or to bind tasks properly.

More details are available in the Documentation, as well as Examples of outputs and an Interface example. The whole documentation is also available in PDF.

Download

The hwloc project is hosted as an Open MPI sub-project here. Source code is available from there under the new BSD licence. See also the SVN repository access page, and details about the Installation process.

Credits

hwloc is the evolution and merger of the INRIA libtopology project and Open MPI's Portable Linux Processor Affinity (PLPA) project. Because of functional and ideological overlap, these two code bases and ideas were merged and released under the name "hwloc". Both are now deprecated in favor of hwloc.

Before being merged with PLPA as hwloc, libtopology was developed solely by the INRIA Runtime Team-Project (headed by Raymond Namyst). hwloc is now developed in collaboration with the Open MPI community and others.

libtopology was initially implemented inside the Marcel threading library as a way to inform the BubbleSched framework of hardware affinities. With the advent of multicore machines, this work became relevant well beyond multithreading, so libtopology was extracted from Marcel and became an independent library offering a portable abstraction of hierarchical architectures for high-performance computing.

Papers

  1. François Broquedis, Jérôme Clet-Ortega, Stéphanie Moreaud, Nathalie Furmento, Brice Goglin, Guillaume Mercier, Samuel Thibault, and Raymond Namyst. hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications. In Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2010), Pisa, Italy, February 2010. IEEE Computer Society Press. Available here.
    The paper introduces hwloc, its goals and its implementation. It then shows how hwloc may be used by MPI implementations and OpenMP runtime systems as a way to carefully place processes and adapt communication strategies to the underlying hardware.

All INRIA Runtime hwloc papers are also listed here with the corresponding BibTeX entries.


Last updated on 2009/11/06.