MPI on Rohan at SDSU - Fall 03
10 Nov 2003

In CS 575 this semester, we have been working toward understanding performance and why we must turn to parallel processing to deliver it.

Our main instructional machine, Rohan, is an 8-CPU Sun Fire 4800 running Solaris 8 at San Diego State University.

Chapter 3, "SPARC Optimization and Parallel Processing," from the Sun documentation

You must add /opt/SUNWhpc/HPC5.0/bin to your PATH and /opt/SUNWhpc/HPC5.0/man to your MANPATH.

Sun provides some introductory example programs, with a Makefile, that demonstrate how to write MPI programs:

/opt/SUNWhpc/HPC5.0/examples/mpi

After copying these codes to a directory in your Rohan account, you will want to execute the command

make runcre

You should examine the contents of the Makefile, focusing on the CRE entries, to see how these examples are built and run.
CRE stands for the Cluster Runtime Environment of Sun HPC ClusterTools.

You will also want to examine the source code for the Fortran, C, and C++ samples. Until you update your MANPATH, you can read the man pages for the needed tools with the following commands:

man -M /opt/SUNWhpc/HPC5.0/man mprun
man -M /opt/SUNWhpc/HPC5.0/man mpcc
man -M /opt/SUNWhpc/HPC5.0/man mpf95

An even simpler starting example is a "hello world" program I obtained from a workshop I attended at the San Diego Supercomputer Center. Below are the sample code and its execution on Rohan. Obtain your own copies from the instructor's account:

cp ~stewart/cs575/fall03/hello*.* .

Hello code using MPI in C
Execution of the C Hello using 1, 2, 4, and 8 processors
Hello code using MPI in Fortran
Execution of the Fortran Hello using 1, 2, 4, and 8 processors

Using the template example from our text, in Chapter 11: Programming Shared-Memory Multiprocessors (p. 223), we can examine the following run-time information (with timings) from the student account masc0155. The output of the top command shows the job running, along with the number of threads created by automatic parallelization. At the end of each file, the /bin/time results show that the real (wall-clock) time drops as more threads run, from 26.8 s with one thread to 8.8 s with four. The user time grows, however, because the user is charged for the CPU time used on every processor. The final case with 8 threads is very costly: its real time of 50.9 s is worse than even the single-thread run, probably due to thrashing by the operating system. Recall that Rohan has a total of 8 CPUs, so an 8-thread job must compete with the operating system and other users for every processor.

Run of text p. 223 example, one processor (real 26.8 s; user 24.0 s; sys 0.1 s)
Run of text p. 223 example, two processors (real 16.1 s; user 27.8 s; sys 0.1 s)
Run of text p. 223 example, four processors (real 8.8 s; user 31.2 s; sys 0.2 s)
Run of text p. 223 example, eight processors (real 50.9 s; user 4:06.1, i.e. 246.1 s; sys 0.4 s)
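The timings above can be summarized with the usual speedup S = T1 / Tp and efficiency E = S / p, plus the ratio user/real, which estimates how many CPUs the job was charged for on average. The C sketch below is not part of the course materials; the type and function names (run_timing, speedup, efficiency, avg_cpus_charged, print_report) are illustrative, and it assumes all times are in seconds, reading the 8-processor user time 4:06.1 as 4 minutes 6.1 seconds (246.1 s).

```c
#include <assert.h>
#include <stdio.h>

/* One /bin/time measurement; all times in seconds (assumption). */
typedef struct {
    int threads;    /* number of threads/processors used */
    double real_s;  /* wall-clock time */
    double user_s;  /* CPU time summed over all processors */
} run_timing;

/* Speedup relative to the one-thread run: S = T1 / Tp. */
double speedup(double t1_real, double tp_real) {
    assert(tp_real > 0.0);
    return t1_real / tp_real;
}

/* Parallel efficiency: E = S / p (1.0 would be perfect scaling). */
double efficiency(double t1_real, double tp_real, int p) {
    assert(p > 0);
    return speedup(t1_real, tp_real) / p;
}

/* user/real: average number of CPUs the job was charged for. */
double avg_cpus_charged(double user_s, double real_s) {
    assert(real_s > 0.0);
    return user_s / real_s;
}

/* Print one line per run; runs[0] must be the one-thread baseline. */
void print_report(const run_timing *runs, int n) {
    assert(n > 0);
    double t1 = runs[0].real_s;
    printf("threads  speedup  efficiency  user/real\n");
    for (int i = 0; i < n; i++) {
        printf("%7d  %7.2f  %10.2f  %9.2f\n",
               runs[i].threads,
               speedup(t1, runs[i].real_s),
               efficiency(t1, runs[i].real_s, runs[i].threads),
               avg_cpus_charged(runs[i].user_s, runs[i].real_s));
    }
}
```

Calling print_report on the four runs, {{1, 26.8, 24.0}, {2, 16.1, 27.8}, {4, 8.8, 31.2}, {8, 50.9, 246.1}}, gives a speedup of about 1.66 on two threads and 3.05 on four, but only about 0.53 on eight. Meanwhile user/real climbs to roughly 4.8, so the 8-thread job was charged for nearly five CPUs' worth of time yet took almost twice as long as the one-thread run.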