The democratization of multicore processors and NUMA architectures (AMD HyperTransport, Intel QPI, ...) is spreading complex hardware topologies throughout the server world. While large shared-memory machines were formerly very rare, nowadays every single cluster node may contain 8 cores, hierarchical caches, or multiple threads per core, making its topology far from flat.
Such complex and hierarchical topologies have a strong impact on application performance. The developer must take hardware affinities into account when trying to exploit the actual hardware performance. For instance, two tasks that cooperate tightly should probably be placed onto cores sharing a cache, whereas two independent memory-intensive tasks are better spread out onto different sockets so as to maximize their memory throughput. Likewise, MPI processes and OpenMP threads have to be placed according to their affinities and to the hardware characteristics.
hwloc provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading. It also gathers various attributes such as cache and memory information. It builds a hierarchical tree that the application may walk to retrieve information about the hardware or to bind tasks properly.
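To make the idea of a walkable topology tree concrete, here is a minimal sketch in C. It is a toy model only: the types `struct node` and `struct sample` and the functions `attach`, `count_kind`, `common_ancestor` and `build_sample` are hypothetical names invented for this illustration and are not part of the hwloc interface. The sample machine (2 sockets, one shared cache per socket, 2 cores per cache) is likewise an assumption.

```c
#include <assert.h>
#include <string.h>

/* Toy model of a hardware topology tree. These names are hypothetical
 * and do NOT reflect the real hwloc API. */
enum kind { MACHINE, SOCKET, CACHE, CORE };

#define MAX_CHILDREN 4

struct node {
    enum kind kind;
    int index;                          /* index among objects of the same kind */
    struct node *parent;                /* NULL for the root */
    struct node *children[MAX_CHILDREN];
    int nchildren;
};

/* Attach child below parent. */
void attach(struct node *parent, struct node *child) {
    assert(parent->nchildren < MAX_CHILDREN);
    child->parent = parent;
    parent->children[parent->nchildren++] = child;
}

/* Depth-first walk counting the objects of a given kind. */
int count_kind(const struct node *n, enum kind k) {
    int total = (n->kind == k);
    for (int i = 0; i < n->nchildren; i++)
        total += count_kind(n->children[i], k);
    return total;
}

/* Number of edges between n and the root. */
int depth_of(const struct node *n) {
    int d = 0;
    while (n->parent) { n = n->parent; d++; }
    return d;
}

/* Deepest object containing both a and b (same tree assumed). */
const struct node *common_ancestor(const struct node *a, const struct node *b) {
    int da = depth_of(a), db = depth_of(b);
    while (da > db) { a = a->parent; da--; }
    while (db > da) { b = b->parent; db--; }
    while (a != b) { a = a->parent; b = b->parent; }
    return a;
}

/* Storage for the assumed sample machine. */
struct sample {
    struct node machine, socket[2], cache[2], core[4];
};

/* Build: machine -> 2 sockets -> 1 shared cache each -> 2 cores each. */
void build_sample(struct sample *s) {
    memset(s, 0, sizeof *s);
    s->machine.kind = MACHINE;
    for (int i = 0; i < 2; i++) {
        s->socket[i].kind = SOCKET; s->socket[i].index = i;
        s->cache[i].kind = CACHE;   s->cache[i].index = i;
        attach(&s->machine, &s->socket[i]);
        attach(&s->socket[i], &s->cache[i]);
        for (int j = 0; j < 2; j++) {
            struct node *c = &s->core[2 * i + j];
            c->kind = CORE; c->index = 2 * i + j;
            attach(&s->cache[i], c);
        }
    }
}
```

In this sketch, after `build_sample(&s)`, the common ancestor of `s.core[0]` and `s.core[1]` is their shared cache, so two tightly cooperating tasks could be placed there. The common ancestor of `s.core[0]` and `s.core[2]` is the whole machine, meaning the two cores sit on different sockets and suit independent memory-intensive tasks.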
More details are available in the Documentation, as well as Examples of outputs and an Interface example. The whole documentation is also available in PDF.
The hwloc project is hosted as an Open MPI sub-project here. Source code is available from there under the new BSD license. See also the SVN repository access page, and details about the Installation process.
hwloc is the evolution and merger of the INRIA libtopology project and Open MPI's Portable Linux Processor Affinity (PLPA) project. Because of functional and ideological overlap, these two code bases and ideas were merged and released under the name "hwloc". Both are now deprecated in favor of hwloc.
Before being merged with PLPA as hwloc, libtopology was developed solely by the INRIA Runtime Team-Project (headed by Raymond Namyst). hwloc is now developed in collaboration with the Open MPI community, and more.
libtopology was initially implemented inside the Marcel threading library as a way to inform the BubbleSched framework of hardware affinities. With the advent of multicore machines, this work became interesting for much more than multithreading, so libtopology was extracted from Marcel and became an independent library offering a portable abstraction of hierarchical architectures for high-performance computing.
All INRIA Runtime hwloc papers are also listed here with the corresponding Bibtex entries.
Last updated on 2009/11/06.