Last Update: April 5, 2006
Abstract: This Web page is primarily a manual for the author, Tomonori Kouya, on constructing a commodity PC cluster and running parallel computation programs with MPI (Message Passing Interface) on it. It therefore describes only the minimum needed to make MPI programs work, nothing more and nothing less. Bug reports about this page are greatly appreciated.
[1] Tomonori Kouya, How to build a PC cluster with Vine Linux (in Japanese), March 20, 2003
[2] Tomonori Kouya, How to build a PC cluster with Vine Linux Version 2 (in Japanese), February 28, 2004
[3] Fedora Project
[4] MPI Forum
[5] MPICH
[6] LAM/MPI
The specifications of the four PCs installed and the network structure (NIS domain name: "cs-pccluster3") are as follows:
The underlined parts must be typed in or checked.
[root@cs-room443-d01 user01]# export http_proxy=http://proxy_ip_addr:port/
[root@cs-room443-d01 user01]# yum install lam
(omitted)
lam-7.1.1-11.x86_64.rpm 100% |=========================| 112 kB 00:07
(omitted)
libaio-devel-0.3.106-2.2. 100% |=========================| 6.6 kB 00:00
(omitted)
libaio-0.3.106-2.2.x86_64 100% |=========================| 7.6 kB 00:00
(omitted)
=============================================================================
Package Arch Version Repository Size
=============================================================================
Installing:
lam x86_64 2:7.1.1-11 core 3.1 M
Installing for dependencies:
libaio x86_64 0.3.106-2.2 core 19 k
libaio-devel x86_64 0.3.106-2.2 core 11 k
Transaction Summary
=============================================================================
Install 3 Package(s)
Update 0 Package(s)
Remove 0 Package(s)
Total download size: 3.2 M
Is this ok [y/N]: y
Downloading Packages:
(omitted)
(1/3): libaio-devel-0.3.1 100% |=========================| 11 kB 00:00
(2/3): libaio-0.3.106-2.2 100% |=========================| 19 kB 00:00
(3/3): lam-7.1.1-11.x86_6 100% |=========================| 3.1 MB 00:47
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Installing: libaio ######################### [1/3]
Installing: libaio-devel ######################### [2/3]
Installing: lam ######################### [3/3]
Installed: lam.x86_64 2:7.1.1-11
Dependency Installed: libaio.x86_64 0:0.3.106-2.2 libaio-devel.x86_64 0:0.3.106-2.2
Complete!
[root@cs-room443-d01 user01]# /sbin/ldconfig -v | grep mpi <-- necessary only if ldconfig does not run automatically
liblamf77mpi.so.0 -> liblamf77mpi.so.0.0.0
liblammpi++.so.0 -> liblammpi++.so.0.0.0
libmpi.so.0 -> libmpi.so.0.0.0
[root@cs-room443-d01 user01]# /sbin/chkconfig --list nfs
nfs 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[root@cs-room443-d01 user01]# /sbin/chkconfig nfs on
[root@cs-room443-d01 user01]# /sbin/chkconfig --list nfs
nfs 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[root@cs-room443-d01 user01]# /sbin/service nfs start
Starting NFS Service: [ OK ]
Starting NFS Quota: [ OK ]
Starting NFS daemon: [ OK ]
Starting NFS mountd: [ OK ]
[root@cs-room443-d01 user01]# cat /etc/exports
# Apr. 3, 2006 Tomonori Kouya
/home 192.168.1.0/255.255.255.0(rw,async)
/usr/local 192.168.1.0/255.255.255.0(rw,async)
[root@cs-room443-d01 user01]# /usr/sbin/exportfs -a -v
exporting 192.168.1.0/255.255.255.0:/usr/local
exporting 192.168.1.0/255.255.255.0:/home
[root@cs-room443-d01 user01]# /usr/sbin/exportfs -v
/usr/local 192.168.1.0/255.255.255.0(rw,async,wdelay,root_squash)
/home 192.168.1.0/255.255.255.0(rw,async,wdelay,root_squash)
[root@cs-room43-d02 user01]# /sbin/chkconfig --list nfs
nfs 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[root@cs-room43-d02 user01]# /sbin/chkconfig nfs on
[root@cs-room43-d02 user01]# /sbin/chkconfig --list nfs
nfs 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[root@cs-room43-d02 user01]# cat /etc/fstab
(omitted)
cs-room443-d01-in:/usr/local /usr/local nfs rw,hard,intr 0 0
cs-room443-d01-in:/home /home nfs rw,hard,intr 0 0
[root@cs-room43-d02 user01]# mount /usr/local
[root@cs-room43-d02 user01]# mount
(omitted)
cs-room443-d01-in:/usr/local on /usr/local type nfs (rw,hard,intr,addr=192.168.1.21)
[root@cs-room443-d01 user01]# yum install ypserv
(omitted)
---> Downloading header for ypserv to pack into transaction set.
ypserv-2.19-0.x86_64.rpm 100% |=========================| 19 kB 00:00
(omitted)
=============================================================================
Package Arch Version Repository Size
=============================================================================
Installing:
ypserv x86_64 2.19-0 core 140 k
(omitted)
Installing: ypserv ######################### [1/1]
Installed: ypserv.x86_64 0:2.19-0
Complete!
[root@cs-room443-d01 user01]# /sbin/chkconfig --list
(omitted)
portmap 0:off 1:off 2:off 3:on 4:on 5:on 6:off
(omitted)
ypbind 0:off 1:off 2:on 3:on 4:on 5:on 6:off
yppasswdd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
ypserv 0:off 1:off 2:off 3:off 4:off 5:off 6:off
ypxfrd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[root@cs-room443-d01 user01]# /sbin/chkconfig ypbind on
[root@cs-room443-d01 user01]# /sbin/chkconfig yppasswdd on
[root@cs-room443-d01 user01]# /sbin/chkconfig ypserv on
[root@cs-room443-d01 user01]# /sbin/chkconfig ypxfrd on
[root@cs-room443-d01 user01]# /sbin/chkconfig --list
ypbind 0:off 1:off 2:on 3:on 4:on 5:on 6:off
yppasswdd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
ypserv 0:off 1:off 2:on 3:on 4:on 5:on 6:off
ypxfrd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[root@cs-room443-d01 user01]# cat /etc/yp.conf
(omitted)
domain cs-pccluster3 server cs-room443-d01-nis
[root@cs-room443-d01 user01]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=cs-room443-d01
NISDOMAIN="cs-pccluster3"
[root@cs-room443-d01 user01]# domainname cs-pccluster3
[root@cs-room443-d01 user01]# domainname
cs-pccluster3
[root@cs-room443-d01 yp]# cat /etc/nsswitch.conf
(omitted)
#passwd: files
passwd: db files nisplus nis
#shadow: files
shadow: db files nisplus nis
#group: files
group: db files nisplus nis
hosts: db files nisplus nis dns
(omitted)
[root@cs-room443-d01 yp]# cd /var/yp
[root@cs-room443-d01 yp]# /sbin/service ypserv start
Starting YP Server Service: [ OK ]
[root@cs-room443-d01 yp]# /sbin/service ypbind start
Binding to NIS domain: [ OK ]
Listening for NIS domain server
[root@cs-room443-d01 yp]# /sbin/service yppasswdd start
Starting YP password service: [ OK ]
[root@cs-room443-d01 yp]# /sbin/service ypxfrd start
Starting YP map server: [ OK ]
[root@cs-room443-d01 yp]# make
gmake[1]: Entering directory `/var/yp/cs-pccluster3'
Updating netid.byname...
gmake[1]: Leaving directory `/var/yp/cs-pccluster3'
[root@cs-room443-d01 yp]# ypcat passwd
user01:(omitted):500:500:User 01:/home/user01:/bin/bash
[root@cs-room443-d01 yp]# ypcat hosts
192.168.2.3 cs-room443-d03
192.168.2.4 cs-room443-d04
133.88.120.88 cs-room443-d01-out cs-room443-d01.cs.sist.ac.jp
192.168.1.22 cs-room443-d02-in
127.0.0.1 localhost localhost.localdomain
192.168.1.23 cs-room443-d03-in
127.0.0.1 localhost localhost.localdomain
192.168.1.24 cs-room443-d04-in
192.168.2.1 cs-room443-d01
192.168.2.2 cs-room443-d02
133.88.120.88 cs-room443-d01-out cs-room443-d01.cs.sist.ac.jp
192.168.1.21 cs-room443-d01-in
[root@cs-room443-d01 yp]# reboot
(after restart)
[root@cs-room443-d01 user01]# domainname
cs-pccluster3
[root@cs-room443-d01 user01]# ypcat passwd
user01:$1$bEK4MwPo$4fFNm4iLyzqrgbRX82MrM1:500:500:Tomonori Kouya:/home/user01:/bin/bash
[root@cs-room443-d01 user01]# ypcat hosts
192.168.2.3 cs-room443-d03
192.168.2.4 cs-room443-d04
133.88.120.88 cs-room443-d01-out cs-room443-d01.cs.sist.ac.jp
192.168.1.22 cs-room443-d02-in
127.0.0.1 localhost localhost.localdomain
192.168.1.23 cs-room443-d03-in
127.0.0.1 localhost localhost.localdomain
192.168.1.24 cs-room443-d04-in
192.168.2.1 cs-room443-d01
192.168.2.2 cs-room443-d02
133.88.120.88 cs-room443-d01-out cs-room443-d01.cs.sist.ac.jp
192.168.1.21 cs-room443-d01-in
[root@cs-room443-d01 user01]# /usr/sbin/exportfs -v
/usr/local 192.168.1.0/255.255.255.0(rw,async,wdelay,root_squash)
/home 192.168.1.0/255.255.255.0(rw,async,wdelay,root_squash)
[root@cs-room443-d01 user01]#
[root@cs-room443-d02 user01]# cat /etc/nsswitch.conf
(omitted)
#passwd: files
passwd: db files nisplus nis
#shadow: files
shadow: db files nisplus nis
#group: files
group: db files nisplus nis
#hosts: db files nisplus nis dns
#hosts: files dns
hosts: db files nisplus nis dns
(omitted)
[root@cs-room43-d02 user01]# cat /etc/yp.conf
(omitted)
domain cs-pccluster3 server cs-room443-d01-in
[root@cs-room43-d02 user01]# /sbin/chkconfig --list ypbind
ypbind 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[root@cs-room43-d02 user01]# /sbin/chkconfig ypbind on
[root@cs-room43-d02 user01]# /sbin/chkconfig --list ypbind
ypbind 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[root@cs-room43-d02 user01]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=cs-room43-d02
GATEWAY=192.168.11.254
NISDOMAIN="cs-pccluster3"
[root@cs-room43-d02 user01]# cat /etc/fstab
(omitted)
cs-room443-d01-in:/usr/local /usr/local nfs rw,hard,intr 0 0
cs-room443-d01-in:/home /home nfs rw,hard,intr 0 0
(after restart)
[root@cs-room43-d02 user01]# ypcat passwd
user01:$1$bEK4MwPo$4fFNm4iLyzqrgbRX82MrM1:500:500: User 01:/home/user01:/bin/bash
[root@cs-room43-d02 user01]# ypcat hosts
192.168.2.3 cs-room443-d03
192.168.2.4 cs-room443-d04
192.168.1.22 cs-room443-d02-in
127.0.0.1 localhost localhost.localdomain
192.168.1.23 cs-room443-d03-in
127.0.0.1 localhost localhost.localdomain
192.168.1.24 cs-room443-d04-in
192.168.2.1 cs-room443-d01
192.168.2.2 cs-room443-d02
192.168.1.21 cs-room443-d01-in
[root@cs-room43-d02 user01]# mount
(omitted)
cs-room443-d01-in:/usr/local on /usr/local type nfs (rw,hard,intr,addr=192.168.1.21)
cs-room443-d01-in:/home on /home type nfs (rw,hard,intr,addr=192.168.1.21)
[root@cs-room443-d01 user01]# cat /etc/lam/lam-bhost.def
(omitted)
cs-room443-d01
cs-room443-d02
cs-room443-d03
cs-room443-d04
[user01@cs-room443-d01 ~]$ lamboot -v <-- must be executed before running MPI programs
LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
n-1<5522> ssi:boot:base:linear: booting n0 (cs-room443-d01)
n-1<5522> ssi:boot:base:linear: booting n1 (cs-room443-d02)
user01@cs-room443-d02's password:Y O U R P A S S W O R D
user01@cs-room443-d02's password:Y O U R P A S S W O R D
n-1<5522> ssi:boot:base:linear: booting n2 (cs-room443-d03)
user01@cs-room443-d03's password:Y O U R P A S S W O R D
user01@cs-room443-d03's password:Y O U R P A S S W O R D
n-1<5522> ssi:boot:base:linear: booting n3 (cs-room443-d04)
user01@cs-room443-d04's password:Y O U R P A S S W O R D
user01@cs-room443-d04's password:Y O U R P A S S W O R D
n-1<5522> ssi:boot:base:linear: finished
[user01@cs-room443-d01 ~]$ lamnodes
n0 cs-room443-d01:1:origin,this_node
n1 cs-room443-d02:1:
n2 cs-room443-d03:1:
n3 cs-room443-d04:1:
[user01@cs-room443-d01 ~]$ cat mpi_hellow.c
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myrank, numprocs, length_name;
    char nodename[128];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Get_processor_name(nodename, &length_name);

    printf("Hellow, MPI! (%0d/%0d)-- %s\n", myrank, numprocs, nodename);

    MPI_Finalize();

    return 0;
}
[user01@cs-room443-d01 ~]$ mpicc mpi_hellow.c <-- compiles "mpi_hellow.c" as an MPI program
[user01@cs-room443-d01 ~]$ mpirun -np 8 ./a.out <--- executes with 8 PEs
Hellow, MPI! (0/8)-- cs-room443-d01
Hellow, MPI! (2/8)-- cs-room443-d03
Hellow, MPI! (1/8)-- cs-room443-d02
Hellow, MPI! (3/8)-- cs-room443-d04
Hellow, MPI! (6/8)-- cs-room443-d03
Hellow, MPI! (5/8)-- cs-room443-d02
Hellow, MPI! (7/8)-- cs-room443-d04
Hellow, MPI! (4/8)-- cs-room443-d01
[user01@cs-room443-d01 ~]$ lamhalt -v <-- should be executed after MPI jobs
LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
Shutting down LAM
hreq: waiting for HALT ACKs from remote LAM daemons
hreq: received HALT_ACK from n2 (cs-room443-d03)
hreq: received HALT_ACK from n1 (cs-room443-d02)
hreq: received HALT_ACK from n3 (cs-room443-d04)
hreq: received HALT_ACK from n0 (cs-room443-d01)
LAM halted
Specify "cpu=2" for each node if two PEs per node are available simultaneously:
[user01@cs-room443-d01 ~]$ cat /etc/lam/lam-bhost.def
cs-room443-d01 cpu=2
cs-room443-d02 cpu=2
cs-room443-d03 cpu=2
cs-room443-d04 cpu=2
[user01@cs-room443-d01 ~]$ lamnodes
n0 cs-room443-d01:2:origin,this_node
n1 cs-room443-d02:2:
n2 cs-room443-d03:2:
n3 cs-room443-d04:2:
[user01@cs-room443-d01 ~]$ mpirun -np 8 ./a.out
Hellow, MPI! (6/8)-- cs-room443-d04
Hellow, MPI! (0/8)-- cs-room443-d01
Hellow, MPI! (4/8)-- cs-room443-d03
Hellow, MPI! (1/8)-- cs-room443-d01
Hellow, MPI! (7/8)-- cs-room443-d04
Hellow, MPI! (2/8)-- cs-room443-d02
Hellow, MPI! (5/8)-- cs-room443-d03
Hellow, MPI! (3/8)-- cs-room443-d02
To use MPICH on a PC cluster running Fedora Core 5, you will probably need to install MPICH from the tarball, on a system where LAM has not been installed.