PadicoTM: A High-performance Communication Framework for Grids

Runtime

LaBRI, INRIA Bordeaux - Sud-Ouest

High Performance Runtime Systems for Parallel Architectures


PadicoTM overview

PadicoTM is the runtime infrastructure for the Padico software environment for computational grids developed in collaboration between the RUNTIME and the PARIS research teams.

PadicoTM is composed of a core, which provides a high-performance framework for networking and multi-threading, and of services plugged into the core. High-performance threads and communications are provided by Marcel and Madeleine, both part of the PM2 software suite. The PadicoTM core aims to make services that run at the same time cooperate rather than compete with each other.

PadicoTM-enabled middleware

PadicoTM exposes standard interfaces (VIO: virtual sockets; Circuit: a Madeleine-like API; etc.) usable by various middleware systems. Thanks to symbol interception by PadicoTM, middleware runs unmodified and uses PadicoTM communication methods seamlessly. The middleware systems available over PadicoTM are:

  • CORBA implementations: omniORB and Mico. omniORB is turned by PadicoTM into a high-bandwidth low-latency CORBA implementation;
  • MPI: a high-performance MPI implementation derived from MPICH (ANL, USA) has been ported to PadicoTM (actually, a special flavor of our own MPICH/Madeleine). An experimental version of GridMPI (Grid Technology Research Center, AIST, Japan) is currently being worked on;
  • a Java Virtual Machine based on Kaffe;
  • the gSOAP SOAP/Web services development toolkit;
  • an implementation of the JXTA P2P specifications: JXTA-C.

These middleware systems use the PadicoTM core, and thus they:

  • benefit from high-performance networks (Myrinet, Infiniband, SCI) where they are available;
  • use the high-performance Marcel multi-threading system;
  • share access to the network without loss of performance;
  • are usable at the same time, in the same process;
  • are dynamically loadable and unloadable.

PadicoTM internals

PadicoTM core

Every piece of code in PadicoTM is embedded in a module, namely a binary object (one or more ".so" files) and a description file (written in XML). Modules may be loaded, run and unloaded dynamically, on one node, on all nodes, or on a group of nodes. The core itself is composed of three modules:

  • Puk (a nickname for Padico micro-kernel) is the foundation module. Its task is to manage modules (loading, running and unloading);
  • ThreadManager manages multi-threading in a coherent way; it provides hooks for periodic operations, and manages queues of I/O operations so that they do not block the whole process;
  • NetAccess multiplexes network accesses so that several modules can use networks that otherwise require exclusive access.

Hence different middleware systems (CORBA, MPI, ...) can efficiently share the same process and the same network without disturbing each other.
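
The multiplexing role described for NetAccess can be illustrated with a small sketch. This is not PadicoTM code: the class names, the tag-based dispatch and the `progress` method are assumptions made for illustration. It only shows the general idea of one component owning an exclusive-access network and dispatching incoming frames to several modules.

```python
from collections import deque


class ExclusiveNetwork:
    """Hypothetical stand-in for a network that only one client may open."""

    def __init__(self):
        self._owner = None
        self._wire = deque()  # loopback: frames sent are later polled back

    def open(self, owner):
        if self._owner is not None:
            raise RuntimeError("network already opened by " + self._owner)
        self._owner = owner

    def send(self, frame):
        self._wire.append(frame)

    def poll(self):
        return self._wire.popleft() if self._wire else None


class NetAccessSketch:
    """Sole owner of the network; dispatches frames to modules by tag."""

    def __init__(self, network):
        network.open("netaccess")
        self._network = network
        self._handlers = {}

    def register(self, tag, handler):
        self._handlers[tag] = handler

    def send(self, tag, payload):
        self._network.send((tag, payload))

    def progress(self):
        """Drain pending frames and hand each one to the module owning its tag."""
        delivered = 0
        while (frame := self._network.poll()) is not None:
            tag, payload = frame
            self._handlers[tag](payload)
            delivered += 1
        return delivered


# Two hypothetical middleware modules share the same network.
net = ExclusiveNetwork()
access = NetAccessSketch(net)
corba_inbox, mpi_inbox = [], []
access.register("corba", corba_inbox.append)
access.register("mpi", mpi_inbox.append)
access.send("mpi", b"rank0")
access.send("corba", b"request")
delivered = access.progress()
```

In this sketch each module only sees the frames carrying its own tag, while the network itself is opened exactly once.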

Abstraction layer: dynamically assembled communication stack

On top of the PadicoTM core, the abstraction layer of PadicoTM is built from freely and dynamically assembled components. Various communication methods are embedded in components that the user may assemble to get the needed communication stack. The available communication methods are:

  • firewall crossing: TCP splicing (simultaneous connect), SSH tunnel, routing through gateways;
  • compression: LZO, ZIP, BZIP2;
  • encryption/authentication: TLS;
  • network drivers: Infiniband, Madeleine (Myrinet, Quadrics, SCI), interoperable TCP/IP;
  • API: VIO (virtual sockets over PadicoTM), FM over PadicoTM, Madeleine over PadicoTM.
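
The idea of assembling a communication stack from components can be sketched as follows. This is an illustration only, not the PadicoTM component model: `Loopback`, `Compress` and `assemble` are hypothetical names, the driver is an in-memory stand-in, and Python's `zlib` and `bz2` modules stand in for the ZIP and BZIP2 compression components.

```python
import bz2
import zlib


class Loopback:
    """Hypothetical bottom-of-stack driver: keeps sent data in memory."""

    def __init__(self):
        self.wire = []

    def send(self, data):
        self.wire.append(data)

    def recv(self):
        return self.wire.pop(0)


class Compress:
    """Filter component: compresses on send, decompresses on receive."""

    def __init__(self, below, codec):
        self.below = below
        self.codec = codec  # any module offering compress() and decompress()

    def send(self, data):
        self.below.send(self.codec.compress(data))

    def recv(self):
        return self.codec.decompress(self.below.recv())


def assemble(driver, *layers):
    """Stack layers on top of driver; the last layer listed ends up on top."""
    stack = driver
    for make_layer in layers:
        stack = make_layer(stack)
    return stack


message = b"padico " * 100

# Assembly 1: zlib compression over the loopback driver.
driver = Loopback()
stack = assemble(driver, lambda below: Compress(below, zlib))
stack.send(message)
wire_size = len(driver.wire[0])  # size of what actually crossed the "wire"
received = stack.recv()

# Assembly 2: same driver type, different compression component.
bz_stack = assemble(Loopback(), lambda below: Compress(below, bz2))
bz_stack.send(message)
bz_received = bz_stack.recv()
```

Because every component exposes the same `send`/`recv` pair, a different assembly is obtained by changing only the arguments passed to `assemble`.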

The NetSelector configurator

The assembly process is driven by a configurable selector called NetSelector. Several implementations of NetSelector exist in PadicoTM. The main ones are:

  • the best-effort selector, which automatically chooses between TCP/IP for inter-cluster communication, Madeleine for intra-cluster communication, and shared memory for intra-node inter-process communication;
  • the basic selector, which lets the user configure precisely which assembly to use in which context. No automatic decision is made: the user defines the component assembly and its parameters;
  • the fallback selector, which first tries the basic selector, then falls back to the best-effort selector if the basic selector gives no reply.
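
The relationship between the three selectors can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the NetSelector implementation: nodes are modeled as hypothetical `(cluster, host)` pairs, assemblies are plain strings, rules are assumed to match in both directions, and all function and class names are invented for the example.

```python
def best_effort_selector(src, dst):
    """Choose a method from locality alone; nodes are (cluster, host) pairs."""
    if src == dst:            # same host: intra-node, inter-process
        return "shared-memory"
    if src[0] == dst[0]:      # same cluster
        return "madeleine"
    return "tcp/ip"           # different clusters


class BasicSelector:
    """User-defined rules only; returns None when no rule matches."""

    def __init__(self, rules):
        self.rules = rules    # list of (set_a, set_b, assembly)

    def __call__(self, src, dst):
        for set_a, set_b, assembly in self.rules:
            # Assumption: a rule applies in both directions.
            if (src in set_a and dst in set_b) or (src in set_b and dst in set_a):
                return assembly
        return None


def fallback_selector(basic):
    """Try the basic selector first, then fall back to best-effort."""
    def select(src, dst):
        choice = basic(src, dst)
        return choice if choice is not None else best_effort_selector(src, dst)
    return select


# Hypothetical topology: two clusters, three hosts.
a1 = ("cluster-a", "node1")
a2 = ("cluster-a", "node2")
b1 = ("cluster-b", "node1")

basic = BasicSelector([({a1}, {b1}, "tls+tcp/ip")])
select = fallback_selector(basic)

explicit = select(a1, b1)      # matched by the user rule
intra_cluster = select(a1, a2) # no rule: best-effort decides
intra_node = select(a1, a1)
inter_cluster = select(a2, b1)
```

Only the pair covered by a user rule gets the user-defined assembly; every other pair is resolved by the best-effort logic.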

The NetSelector configurator is a GUI designed for configuring the basic NetSelector. It lets users graphically define their component assembly. However, it is more than just an assembly tool for software components: a configuration contains target cases which define when to use a given assembly, e.g. a given assembly can be selected for communication between two given sets of machines.

Some screenshots of the NetSelector configurator follow. On the left, a screenshot of the topology panel describing the target topology; on the right, an example of an assembly in the process of being constructed.

PadicoTM performance

We have benchmarked PadicoTM over various networks. The following figures report the latency and bandwidth measured for various PadicoTM APIs over Infiniband, Myrinet-10G, Myrinet-2000, and SCI.

These experiments were performed mostly on the Grid 5000 experimental platform.

       
The Infiniband benchmark was run on AMD Opteron machines equipped with Mellanox MT25408 (ConnectX) cards (PCIe) and OpenFabrics IB verbs (OFED 1.0).

The Myrinet-10G benchmark was run on AMD Opteron machines equipped with Myri-10G PCI-E 8x cards and MX 1.2 drivers.

The Myrinet-2000 benchmark was run on AMD Opteron machines equipped with Myrinet-2000 PCI-X cards ('D' card) and MX 1.1 drivers.

The SCI benchmark was run on Intel Xeon machines equipped with Dolphin D337 SCI cards and SiSCI 1.10 drivers.

Publication about PadicoTM

Contact

For any question regarding PadicoTM, please contact Alexandre DENIS.
alexandre.denis@inria.fr