Software releases are announced and available on the Open MPI subproject page here.
This page describes the original research side of the Hardware Locality (hwloc) software development. Several other research teams now also work on hwloc-related topics that may not be listed below. The hwloc software and documentation are available for download as an Open MPI subproject here.
The democratization of multicore processors and NUMA architectures (AMD HyperTransport, Intel QPI, ...) has spread complex hardware topologies across the whole server world. While large shared-memory machines were formerly very rare, nowadays every single cluster node may contain 12 cores, hierarchical caches, and multiple hardware threads per core, making its topology far from flat.
Such complex, hierarchical topologies have a strong impact on application performance. Developers must take hardware affinities into account to exploit the actual hardware performance. For instance, two tasks that cooperate tightly should probably be placed on cores sharing a cache, whereas two independent memory-intensive tasks are better spread across different sockets so as to maximize their aggregate memory throughput. In practice, MPI processes and OpenMP threads have to be placed according to their affinities and to the hardware characteristics.
hwloc provides a portable abstraction (across operating systems, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores, and simultaneous multithreading. It also gathers various attributes such as cache and memory information. It builds a hierarchical tree that the application may walk to retrieve information about the hardware or to bind tasks properly.
hwloc also reports affinity information about I/O devices such as network interfaces, InfiniBand HCAs, and GPUs. This enables faster I/O data transfers, since processes and data can be placed on the part of the host that is closest to the devices they use.
hwloc is the evolution and merger of the INRIA libtopology project and Open MPI's Portable Linux Processor Affinity (PLPA) project. libtopology was developed by the INRIA Runtime Team-Project (headed by Raymond Namyst). hwloc is now developed in collaboration with the Open MPI community and others.
libtopology was initially implemented inside the Marcel threading library as a way to inform the BubbleSched framework of hardware affinities. With the advent of multicore machines, this work became interesting for much more than multithreading, so libtopology was extracted from Marcel and became an independent library offering a portable abstraction of hierarchical architectures for high-performance computing.
All INRIA Runtime hwloc papers are also listed here with the corresponding BibTeX entries.
Last updated on 2012/10/02.