Introduction to MPI on the CRAY Darter

Parallel Programming with MPI on the CRAY Darter
Joint Institute for Computational Sciences
March 27, 2015


The purpose of this lab is to become familiar with MPI account setup and documentation as well as compiling and running elementary MPI programs on CRAY Darter.


  1. Introduction to MPI on the CRAY Darter
    1. Workstation Login and Environment Setup
    2. MPI Programming - Exercise Information
    3. General Information
    4. Hello World 1 - The Minimal MPI Program
    5. Hello World 2 - Hello Again
    6. Pi Calculation
    7. Timing an MPI Code
    8. PingPong - Calculating Transfer Rates
    9. Ring 1 - Sending Messages around a Ring
    10. Simple Array Assignment
    11. Matrix Multiplication
    12. 2D Laplace Equation
  2. Other Important Information
    1. Logging Off
    2. Documentation for MPI and mpich, and Additional Resources
    3. Acknowledgments

Workstation Login and Environment Setup

  1. First of all, you need to log in to your temporary account on Darter via the Secure Shell protocol (SSH). To do this, please refer to the NICS User Support page.
  2. On Darter, access to compute resources is managed by the Portable Batch System (PBS). Batch scripts are run on service nodes that have access to the home, project and software directories. Executables launched with the aprun command do not have access to these directories; they have access only to the Lustre scratch directories, /lustre/medusa/YOUR_USER_ID.
  3. In the LAB-EXAMPLE folder you will find three subdirectories:


    The first two folders contain all of the material necessary for the workstation exercises, written in either C or Fortran 77. Select the directory that suits your programming preference. You are now ready to proceed into the world of MPI.

MPI Programming - Exercise Information

In this lab you will utilize the most fundamental MPI calls necessary for development of any MPI code, as well as learn how to compile and run MPI code. During this lab you will encounter the following exercises:

  1. Hello World 1 - exercise (hello1 directory)
  2. Hello World 2 - exercise (hello2 directory)
  3. Pi Calculation - exercise (pical directory)
  4. Timing an MPI Code - exercise (timing directory)
  5. PingPong - exercise (pingpong directory)
  6. Ring 1 - exercise (ring directory)
  7. Simple Array Assignment - exercise (array directory)
  8. Matrix Multiplication - exercise (matmul directory)
  9. Laplace Equation - exercise (laplace directory)

But first, the following . . .

General Information

During the workshop exercises you may be asked to write code; if possible, write the code on your own. To do so, use the files with the .start extension that are provided in the directories where programming is required.

For those of you wishing to concentrate only on the message-passing aspects of the code, files with .template extensions have been provided in the exercise directories that require programming. To modify the template files, first copy filename.c.template to filename.c or filename.f.template to filename.f, depending on whether you are using C or FORTRAN. For example:

% cp filename.c.template filename.c
% cp filename.f.template filename.f

Next, invoke your favorite text editor and modify the template by replacing all of the XXXXX's with the appropriate MPI calls.

For your convenience, there are completed solutions to the programming exercises available as files with .soln extensions. Remember, the only way to learn how to program is by actually programming, SO LOOK AT THE SOLUTIONS ONLY AS A LAST RESORT.

Finally, extra exercises have been provided at the end of some of the main exercises. Please, do not work on these until after you have completed the main exercises for the day. These extra exercises have been provided without templates or solutions, and your lab assistants may be able to give you only very general help with them. In other words, try them at your own risk.

Hello World 1 - The Minimal MPI Program

The objective of this exercise is not to write a code but to demonstrate the fundamentals of compiling an MPI program and submitting it via qsub.

  1. Examine the "Hello World!" program hello.c/hello.f. Notice that every process prints "Hello World!" and that the "Hello World!" program:
    1. Includes a header,
    2. Initializes MPI,
    3. Prints a "Hello World!" message, and
    4. Finalizes MPI
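Those four steps map onto C roughly as follows. This is a minimal sketch for orientation only; the provided hello.c may differ in detail, and it must be compiled with cc (or ftn for the Fortran version) so the MPI headers and libraries are found:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);      /* initialize MPI */
    printf("Hello World!\n");    /* every process prints the message */
    MPI_Finalize();              /* finalize MPI */
    return 0;
}
```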

  2. Compile "Hello World!". For the version of MPI that we are using there are several ways to compile a program. We will use the commands cc and ftn to compile our C and FORTRAN programs, respectively. These compilers locate the MPI libraries and header files as needed. To compile, you can either compile at the prompt or use the provided makefile. If you use the provided makefile, first read it to understand what it is doing.
  3. For the "Hello World!" program, enter either

    % cc hello.c -o hello for C programs, or
    % ftn hello.f -o hello for FORTRAN programs

    or, enter

    % make

    Again, if you use the provided makefile, first make sure you understand what it is doing.

  4. Now we want to run the "Hello World!" program. This elementary problem will use 8 processors and will assign a rank to each of them. Then the program will output 8 lines depending on the rank of the process.
  5. Hello World (from master node)
    Hello WORLD!!! (from worker node)
    Hello WORLD!!! (from worker node)
    Hello WORLD!!! (from worker node)
    Hello WORLD!!! (from worker node)
    Hello WORLD!!! (from worker node)
    Hello WORLD!!! (from worker node)
    Hello WORLD!!! (from worker node)
    Since a parallel program runs on several processors, there needs to be some method of starting the program on each of the different processors. On Darter this is done using batch scripts. Batch scripts can be used to run a set of commands on a system's compute partition. A batch script is a shell script containing PBS flags and commands to be interpreted by a shell. Batch scripts are submitted to the batch manager, PBS, where they are parsed. Based on the parsed data, PBS places the script in the queue as a job. Once the job makes its way through the queue, the script is executed on the head node of the allocated resources. The example batch script below illustrates the format, along with common usage tips.

    Example Batch Script

    #!/bin/bash
    #PBS -A XXXYYY
    #PBS -N test
    #PBS -j oe
    #PBS -l walltime=1:00:00,size=256

    cd $PBS_O_WORKDIR
    date
    aprun -n 256 ./a.out

    This batch script can be broken down into three parts: the interpreter line (#!/bin/bash); the PBS options, which specify the account to charge (-A), the job name (-N), the joining of standard output and error into one file (-j oe), and the requested walltime and number of cores (-l walltime=1:00:00,size=256); and the shell commands, which change to the directory from which the job was submitted ($PBS_O_WORKDIR), print the date, and launch the executable on 256 cores with aprun.

    Submitting Batch Jobs

    Batch scripts can be submitted for execution using the qsub command on Darter. For example, the following will submit the batch script named test.pbs:

    % qsub test.pbs

    If successfully submitted, a PBS job ID will be returned. This ID can be used to track the job.

    For more information about qsub see the man pages.

    Look at the running job page for more (and some redundant) information. In particular, look at the PBS commands for submitting jobs, removing jobs from the queue, etc.

  6. We will use the pbssub PBS script to submit our job for the hello1 exercise. After compiling the source code you will find an executable file in the current hello1 directory. To submit the job to the CRAY Darter queue, use the provided pbssub file, but first be sure to examine it.
  7. Submit the job with the command:

    % qsub pbssub

  8. Did the job enter the batch queue? Check with the showq command. Where did the job run?

Hello World 2 - Hello Again!

The objective of this exercise is to become familiar with the basic MPI routines used in almost any MPI program. You are asked to write an SPMD (Single Program, Multiple Data) program in which, again, each process checks its rank and decides whether it is the master (rank 0) or a worker (rank 1 or greater).

  1. The SPMD programs should:
    1. Include the header,
    2. Initialize MPI,
    3. Check its rank, and
      1. if the process is the master, then send a "Hello World!" message, in characters, to each of the workers
      2. if the process is a worker, then receive the "Hello World!" message and print it out
    4. Finalize MPI
  2. Compile your program at the command line or via the makefile. Run this code on 8 processes using the qsub pbssub command. You can also run the code on 16, 24, etc. processes, keeping in mind that the number of requested cores must be a multiple of 8.
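One possible shape for the message-passing core is sketched below. Treat it as a hint about structure only (names and buffer sizes are illustrative, and the tag value is an arbitrary choice); try to write your own version before consulting it or the .soln file:

```c
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, tag = 1;
    char msg[13];                 /* "Hello World!" plus terminator */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {              /* master: send the message to every worker */
        strcpy(msg, "Hello World!");
        for (int dest = 1; dest < size; dest++)
            MPI_Send(msg, 13, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    } else {                      /* worker: receive the message and print it */
        MPI_Recv(msg, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
        printf("%s (from worker %d)\n", msg, rank);
    }

    MPI_Finalize();
    return 0;
}
```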

Pi Calculation

This program calculates pi by numerically integrating the function 4/(1+x^2) (whose antiderivative is 4 arctan x) over the interval [0, 1].

  1. Review the mpi_pi.c/mpi_pi.f code to get an idea of what it does.
  2. Compile the code. A makefile has been provided for you. To compile with the makefile, simply type make. Or, to compile at the command line, use the cc compiler for the C code or the ftn compiler for the FORTRAN code:

    % cc mpi_pi.c -o mpi_pi
    % ftn mpi_pi.f -o mpi_pi

  3. Run the code mpi_pi using the provided pbssub file.

Timing an MPI code

The objective of this exercise is to investigate the amount of time required for message passing between two processes.

In this exercise different size messages are sent back and forth between two processes a number of times. Timings are made for each message before it is sent and after it has been received. The difference is computed to obtain the actual communication time. Finally, the average communication time and the bandwidth are calculated and output to the screen.

We will run this code on two nodes (one process on each node) passing messages of length 1, 100, 10,000, and 1,000,000. You can record your results in a table like the one below.

length      time (s)      bandwidth
1           0.000001      65.440140
100         0.000002      2936.930591
10,000      0.000052      12321.465896
1,000,000   0.005133      12468.521884
A makefile and pbssub files have been provided:
to compile the code, type make
to submit the job, type qsub pbssub

PingPong - Calculating Transfer Rates

The objective of this exercise is to introduce some intermediate MPI features, and to understand how a possible deadlock situation can occur during message passing.

Write a program (pingpong) in which two processes pass a message (a certain number of real or float numbers) back and forth (perhaps 100 times). You will use the MPI_Wtime() routine as a timer in the following exercise. This routine returns a time expressed in seconds, so in order to time something, two calls are needed and the difference should be taken between them to obtain the total elapsed time (in wall clock seconds).

  1. In the program, pingpong, it is safer to use MPI_Ssend, since MPI_Send may or may not be synchronous, and its use may result in a deadlock situation.
  2. Compile pingpong, either with make or at the command line (cc -o pingpong pingpong.c), and run it.
  3. Insert timing calls (see man MPI_Wtime) to estimate the time taken for one message on a one way trip. Calculate the transfer rate in bytes per second. What did you find?
  4. Add a loop around the timing calls changing the length of the message (length varies from 1 to 10001 in steps of 1000) to investigate how the time taken varies with the size of the message.
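A possible skeleton for the timed ping-pong loop is shown below. It is a sketch under assumed names (NPING, LEN, and the tag are arbitrary illustrative choices) rather than the solution; note the use of MPI_Ssend, as recommended above, and that a round trip counts as two one-way messages:

```c
#include <stdio.h>
#include <mpi.h>

#define NPING 100    /* round trips per timing run (illustrative) */
#define LEN   1000   /* floats per message (illustrative) */

int main(int argc, char *argv[]) {
    int rank, tag = 17;
    float buf[LEN];             /* contents irrelevant; only transfer time matters */
    double t1, t2;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    t1 = MPI_Wtime();
    for (int i = 0; i < NPING; i++) {
        if (rank == 0) {
            MPI_Ssend(buf, LEN, MPI_FLOAT, 1, tag, MPI_COMM_WORLD);
            MPI_Recv(buf, LEN, MPI_FLOAT, 1, tag, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, LEN, MPI_FLOAT, 0, tag, MPI_COMM_WORLD, &status);
            MPI_Ssend(buf, LEN, MPI_FLOAT, 0, tag, MPI_COMM_WORLD);
        }
    }
    t2 = MPI_Wtime();

    if (rank == 0)   /* each round trip is two one-way messages */
        printf("one-way time: %g s\n", (t2 - t1) / (2.0 * NPING));

    MPI_Finalize();
    return 0;
}
```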

Ring 1 - Sending Messages around a Ring

Consider a set of processes arranged in a ring as shown below. Use a token passing method to compute the sum of the ranks of the processes.

  1
 / \
0   2
 \ /
  3

Figure 1: Four processors arranged in a ring, messages are sent from 0 to 1 to 2 to 3 to 0 again, sum of ranks is 6.

Each processor stores its rank in MPI_COMM_WORLD in an integer and sends this value to the processor on its right. It then receives an integer from its left neighbor. It keeps track of the sum of all the integers received. The processors continue passing on the values they receive until they get their own rank back. Each process should finish by printing out the sum of the values. Use synchronous sends MPI_Ssend() (blocking) or MPI_Issend() (non-blocking) for this program. Watch out for deadlock situations. If you use non-blocking sends, make sure that you do not overwrite information. You are asked to use synchronous message passing because the standard send can be either buffered or synchronous, and you should learn to program for either possibility.

Simple Array Assignment

This is a simple array assignment used to demonstrate the distribution of data among multiple tasks and the communications required to accomplish that distribution.

The master distributes an equal portion of the array to each worker. Each worker receives its portion of the array and performs a simple value assignment to each of its elements. Each worker then sends its portion of the array back to the master. As the master receives a portion of the array from each worker, selected elements are displayed.

Note: For this example, the number of processes should be set to an odd number (e.g., aprun -n 7) to ensure even distribution of the array among the numtasks-1 worker tasks.

Matrix Multiplication

This example is a simple matrix multiplication program.

The data is distributed among the workers who perform the actual multiplication in smaller blocks and send back their respective results to the master.

Note: The C and FORTRAN versions of this code differ because of the way arrays are stored/passed. C arrays are stored in row-major order while FORTRAN arrays are stored in column-major order.

2D Laplace Equation

This example solves a two-dimensional Laplace equation using the point Jacobi iteration method over a rectangular domain. The initial guess value of the function is zero. The boundaries are held at 100 throughout the calculation. Domain decomposition will be used for the parallel implementation of the problem. To run this exercise, run it on 4 processes (aprun -n 4).

Logging Off

Type exit to close the connection with the CRAY Darter machine.

Documentation for MPI and mpich, and Additional Resources

  • There are man pages available for MPI which should now be installed in your MANPATH. Look at the following man pages to see some introductory information about MPI.
  • % man MPI
    % man cc
    % man ftn
    % man qsub
    % man MPI_Init
    % man MPI_Finalize

  • You can also refer to NICS User Support Page for Darter specific MPI Implementation details.
  • The MPI man pages are also available online.
  • The WWW home for mpich is at Argonne National Laboratory. They also maintain a general MPI page.

Acknowledgments

    The original MPI training materials for workstations were developed under the Joint Information Systems Committee (JISC) New Technologies Initiative by the Training and Education Centre at Edinburgh Parallel Computing Centre (EPCC-TEC), University of Edinburgh, United Kingdom.

    Thanks also to Blaise Barney from Cornell University Theory Center for his modifications of the labs available through the MHPCC on the World Wide Web. These labs have since been modified for this workshop.
