MPICH-Madeleine Installer's, User's and Developer's Guide

Runtime

LaBRI, INRIA Bordeaux - Sud-Ouest

High Performance Runtime Systems for Parallel Architectures

A. Testing the Installation of PM2

This section presents a list of commands for checking that PM2 works correctly on your cluster. If you run into problems, consult the PM2 web site at http://runtime.futurs.inria.fr/pm2/.

You first need to generate the flavors for PM2.

% cd $PM2_ROOT
% make clean
...
% make init
...

You can now compile and execute a sample Marcel application.

% cd marcel/examples
% export PM2_FLAVOR=marcel
% make clean
...
% make sumtime
...
<<<< Generating libraries: done
    building sumtime.o
    linking sumtime
% pm2-load sumtime 1000
Sum from 1 to 1000 = 500500
time = 4.829ms
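The sum printed by sumtime can be checked independently, since the sum of the integers from 1 to n is n(n+1)/2. The following is a minimal sketch, assuming the first output line has exactly the form shown above; the function names expected_sum and check_sumtime_output are illustrative and are not part of PM2.

```shell
# Closed-form value of 1 + 2 + ... + n.
expected_sum() {
    echo $(( $1 * ($1 + 1) / 2 ))
}

# Succeed if the given output line reports the expected sum for n.
# $1: upper bound n, $2: first line printed by the sumtime example.
check_sumtime_output() {
    n=$1
    line=$2
    [ "$line" = "Sum from 1 to $n = $(expected_sum "$n")" ]
}

# Possible use (requires a working PM2 installation):
#   check_sumtime_output 1000 "$(pm2-load sumtime 1000 | head -n 1)" && echo OK
```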

You can now compile and execute a sample Madeleine application.

% cd ../../mad3/examples/
% export PM2_FLAVOR=mad3
% make clean
% make mad_ping
...
<<<< Generating libraries: done
    building mad_ping.o
    linking mad_ping
% pm2-conf localhost localhost
The current PM2 configuration contains 2 host(s) :
0 : localhost
1 : localhost
% pm2-load mad_ping
Directory /home/bordeaux/nfurmento/build/leonie/leonie/bin not found

Do you want to try to compile it by executing :
cd /home/bordeaux/nfurmento/work/pm2 ; make FLAVOR=leonie
[Y/n]
...
    linking leonie

***************************************************
Restarting leonie --appli=mad_ping --flavor=mad3
  --net=/home/bordeaux/nfurmento/soft/pm2/leonie/examples/networks.cfg
  --d --x --p --l /home/bordeaux/nfurmento/.pm2/conf/mad3/.pm2conf.cfg
##### cict-034.toulouse.grid5000.fr
##### cict-034.toulouse.grid5000.fr
(cict-034.toulouse.grid5000.fr): My global rank is 0
(cict-034.toulouse.grid5000.fr): My global rank is 1
The configuration size is = 2
Channel: pm2
The configuration size is = 2
Channel: pm2
My local channel rank is = 0
Channel: pm2
My local channel rank is = 1
ping with = 1
pong with = 0
src|dst|size        |latency     |10^6 B/s|MB/s    |
  0   1            4       10.964    0.365    0.348
...
  0   1      2097152    33431.436   62.730   59.824
Exiting
test series completed
Exiting

A.1 Debugging the PM2 modules

The PM2 bootstrap code, leonie, is used by pm2-load and MPICH-Madeleine to launch the application on the requested processors. leonie accepts several parameters for debugging purposes, and any of the modules used by PM2 can be traced or logged. The general syntax of leonie is:

% leonie [leonie parameters] configuration file [application parameters
            to be passed over to the processes]

A simple call of leonie would be:

% leonie --x --p --appli=mad_ping appli.cfg

where the option --x indicates that session processes should not be started within a new graphical console (i.e. xterm), and the option --p indicates that there should be no pause after the session processes terminate. Starting leonie without these options is the easiest way to see their effect. Running leonie --help lists all the available options.

Debug parameters make it possible to trace specific modules. The general format of a debug parameter is --debug:<MODULE_NAME>-<TRACE_LEVEL>. For example, the debug parameter --debug:ntbx-trace displays all the trace messages of the module ntbx, issued either by leonie or by the processes started by leonie (depending on whether the parameter is specified as a leonie parameter or as an application parameter). Note that the module must have been compiled with the option debug. More debug parameters are available; you can print the list as follows:

% leonie --x --p --appli=mad_ping appli.cfg --debug:register
(ffffffff:-99:               ) register debug name: register [default] (show=5)
(ffffffff:-99:               ) register debug name: default [default] (show=2)
(ffffffff:-99:               ) register debug name: ma [default] (show=DEFAULT (2))
(ffffffff:-99:               ) register debug name: mar-mdebug [ma] (show=DEFAULT (2))
(ffffffff:-99:               ) register debug name: marcel-init [mar-mdebug] (show=DEFAULT (2))
(ffffffff:-99:               ) register debug name: log [default] (show=DEFAULT (2))
....

% leonie --x --p --appli=mad_ping appli.cfg --debug:mar-mdebug
(ffffffff:-99:               )                  <main_thread is bffefe00>
(ffffffff:-99:               ) Init running level 3 (Init scheduler) start
(ffffffff:-99:               ) Init running level 0 (Init self)
....
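Since debug parameters of the form --debug:<MODULE_NAME>-<TRACE_LEVEL> are plain strings, they can be assembled by a small helper. The sketch below is illustrative only: pm2_debug_param is a hypothetical name, and the helper does not check that the module or the level exists.

```shell
# Print a debug parameter for one module and one trace level.
# $1: module name (e.g. ntbx), $2: trace level (e.g. trace or log).
pm2_debug_param() {
    printf '%s\n' "--debug:$1-$2"
}

# Possible use (requires a working PM2 installation):
#   leonie --x --p --appli=mad_ping appli.cfg "$(pm2_debug_param mad3 trace)"
```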

The leonie parameter -l indicates that the debug output should be redirected to a file in the default temporary directory. On a typical Unix system, the name of the file will be similar to /tmp/pm2log-$USER-x.
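When several sessions have been logged, it can be convenient to locate the most recent log file. The following sketch assumes the files match the pattern pm2log-$USER-* in the temporary directory, which is a guess based on the example name above; latest_pm2_log is a hypothetical helper.

```shell
# Print the most recently modified PM2 log file, or nothing if none exists.
# $1: directory to search (default: $TMPDIR, else /tmp)
# $2: user name (default: $USER, else the output of id -un)
latest_pm2_log() {
    dir=${1:-${TMPDIR:-/tmp}}
    user=${2:-${USER:-$(id -un)}}
    # ls -t sorts by modification time, newest first.
    ls -t "$dir"/pm2log-"$user"-* 2>/dev/null | head -n 1
}

# Possible use:
#   tail -n 20 "$(latest_pm2_log)"
```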

When a Madeleine application is executed, leonie itself uses the flavor (i.e. the configuration) leonie, while the application started by leonie uses the flavor mad3. The following command prints the list of modules for a specific flavor:

% pm2-config --flavor=mad3 --modules
mad3 marcel tbx ntbx init
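Before requesting debug output for a module, it may be useful to check that the module belongs to the flavor in question. The sketch below tests membership in a module list such as the one printed above; flavor_has_module is an illustrative name and not a PM2 command.

```shell
# Succeed if a module appears in a list of modules.
# $1: module to look for; remaining arguments: the modules of the flavor.
flavor_has_module() {
    wanted=$1
    shift
    for m in "$@"; do
        if [ "$m" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Possible use (requires a working PM2 installation; the unquoted
# substitution is deliberate so that each module becomes one argument):
#   flavor_has_module ntbx $(pm2-config --flavor=mad3 --modules) && echo yes
```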

The debug parameters can be specified directly for leonie:

% leonie --x --p --debug:leonie-trace --appli=mad_ping appli.cfg 
% leonie --x --p --debug:ntbx-trace --appli=mad_ping appli.cfg 

or for the processes started by leonie:

% leonie --x --p --appli=mad_ping appli.cfg --debug:mad3-log
% leonie --x --p --appli=mad_ping appli.cfg --debug:mad3-trace
%
% leonie -l --p --appli=mad_ping appli.cfg --debug:mad3-trace
% leonie -l --p --appli=mad_ping appli.cfg --debug:mad3-log

or for both leonie and the processes started by leonie:

% leonie -l --p --debug:leonie-trace --appli=mad_ping appli.cfg --debug:mad3-trace
% leonie -l --p --debug:ntbx-trace --appli=mad_ping appli.cfg --debug:ntbx-trace
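In the commands above, the position of a debug parameter decides who receives it: parameters placed before the configuration file go to leonie, and parameters placed after it go to the started processes. The following sketch makes this ordering explicit by printing a command line without running it. The function build_leonie_cmd is hypothetical, and the fixed options -l --p are simply copied from the examples above.

```shell
# Print (do not run) a leonie command line.
# $1: application name, $2: configuration file,
# $3: debug parameters for leonie itself (may be empty),
# $4: debug parameters for the started processes (may be empty).
build_leonie_cmd() {
    appli=$1
    cfg=$2
    leo_dbg=${3-}
    app_dbg=${4-}
    set -- leonie -l --p
    if [ -n "$leo_dbg" ]; then
        set -- "$@" $leo_dbg      # unquoted: several parameters may be given
    fi
    set -- "$@" "--appli=$appli" "$cfg"
    if [ -n "$app_dbg" ]; then
        set -- "$@" $app_dbg      # everything after the file goes to the processes
    fi
    echo "$*"
}

# Possible use:
#   build_leonie_cmd mad_ping appli.cfg --debug:leonie-trace --debug:mad3-trace
```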

A.2 Debugging PM2 processes

When starting leonie, you can specify that processes should be started under the debugger by using the option -d, as in the following example:

% leonie -d --appli=mad_ping appli.cfg

This command starts the processes under the GNU debugger, each within a new graphical console. If you cannot start graphical tools, use the following command instead:

% leonie -d --x --appli=mad_ping appli.cfg

If your system allows users to create core files, this command causes each faulty process to dump its state into a core file. You can then use the GNU debugger to examine the execution in more detail.

% pm2-which mad_ping
/home/bordeaux/nfurmento/build/mad3/examples/bin/mad_ping
% gdb /home/bordeaux/nfurmento/build/mad3/examples/bin/mad_ping ~/core.2994
GNU gdb Red Hat Linux (6.3.0.0-1.84rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
Using host libthread_db library "/lib64/libthread_db.so.1".

Core was generated by `/home/bordeaux/nfurmento/build/mad3/examples/bin/mad_ping
      --mad_leonie node-22.'.
Program terminated with signal 11, Segmentation fault.
...
#0  0x0000000000404084 in pseudo_main (_madeleine=0x5bfef0) at mad_ping.c:820
820       session       = madeleine->session;
(gdb) bt
#0  0x0000000000404084 in pseudo_main (_madeleine=0x5bfef0) at mad_ping.c:820
#1  0x000000000044fe65 in marcel_sched_internal_create (cur=0x0, new_task=0x0,
    attr=0x0, dont_schedule=0, base_stack=0)
    at /home/bordeaux/nfurmento/work/pm2/marcel/include/scheduler-marcel/marcel_sched.h:436
#2  0x0000000000000000 in ?? ()
(gdb)
...
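Whether a core file is written depends, among other things, on the core file size limit of the shell that starts the processes, which the standard ulimit -c builtin reports. The sketch below checks that limit; core_dumps_allowed is an illustrative helper, and other system settings not covered here may also prevent core files from being created.

```shell
# Succeed if the given core file size limit permits core files.
# $1: limit to test (default: the current value of ulimit -c)
core_dumps_allowed() {
    limit=${1:-$(ulimit -c)}
    [ "$limit" != "0" ]
}

# Possible use, before starting leonie from the same shell:
#   core_dumps_allowed || ulimit -c unlimited
```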

A.3 Debugging Leonie

You might need to start leonie itself under the debugger. To do so, you need to set the environment variable LEO_DEBUG to the value 1 before starting leonie.

% export LEO_DEBUG=1
% leonie --x --p --appli=mad_ping appli.cfg
GNU gdb 6.3-debian
...

(gdb) 

The debugger then waits for user input; you can, for example, set breakpoints or start the application. The file $HOME/.leo_gdb_init can be used to define a list of GDB commands to execute when the debugger starts. For example, you can start the execution of the application automatically.

% echo "r" > ~/.leo_gdb_init
% leonie --x --p --appli=mad_ping appli.cfg
GNU gdb 6.3-debian
...

##### joe
##### joe
(joe): My global rank is 1
(joe): My global rank is 0
test series completed
...
Program exited normally.
(gdb) 
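The initialization file can hold more than one GDB command, one per line. As a sketch, the helper below writes an arbitrary list of commands to such a file; write_leo_gdb_init is a hypothetical name, and break and run are standard GDB commands chosen only for illustration.

```shell
# Write one GDB command per line to an initialization file,
# replacing any previous content.
# $1: file to write; remaining arguments: GDB commands.
write_leo_gdb_init() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Possible use: stop at main, then start the program.
#   write_leo_gdb_init "$HOME/.leo_gdb_init" "break main" "run"
```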


Copyright © July 2008 Team Runtime