Normalise coding style across episodes (#53)
* make example code closer to the solution to make it easier for non-C programmers

* example of MPI with no command line arguments

* update chain image

* improve sentence clarity

* update code style for all episodes other than 7

* update code style in episode 07
Edward-RSE authored Jul 31, 2024
1 parent f5d84db commit 46b301c
Showing 10 changed files with 475 additions and 351 deletions.
24 changes: 12 additions & 12 deletions _episodes/01-introduction.md
@@ -202,7 +202,7 @@ Now, let's take a brief look at these fundamental concepts and explore the diffe
> |Creation of process/thread instances and communication can result in higher costs and overhead.|Offers lower overhead, as inter-process communication is handled through shared memory, reducing the need for expensive process/thread creation.|
{: .callout}

## Parallel Paradigms
## Parallel Paradigms

Thinking back to shared vs distributed memory models, the way we achieve a parallel computation
divides roughly into **two paradigms**. Let's set both of these in context:
@@ -226,7 +226,7 @@ kind of problems you have. Sometimes, one has to use both!
For illustration, consider a simple loop which can be sped up if we have many cores:

~~~
for(i=0; i<N; i++) {
for (i = 0; i < N; ++i) {
a[i] = b[i] + c[i];
}
~~~
@@ -239,11 +239,11 @@ just one step (for a factor of $$N$$ speed-up). Let's look into both paradigms i
>{: .checklist}
> One standard method for programming using data parallelism is called
> "OpenMP" (for "**O**pen **M**ulti**P**rocessing").
> To understand what data parallelism means, let's consider the following bit of OpenMP code which
> To understand what data parallelism means, let's consider the following bit of OpenMP code which
> parallelizes the above loop:
> ~~~
> #pragma omp parallel for
> for(i=0; i<N; i++) {
> for (i = 0; i < N; ++i) {
> a[i] = b[i] + c[i];
> }
> ~~~
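>
> For reference, a complete program built around this fragment might look like the sketch
> below. This is an illustrative assumption rather than code from the lesson: `N` is picked
> arbitrarily, and the pragma is simply ignored if the compiler is not invoked with OpenMP
> support (e.g. `-fopenmp`).
>
> ~~~
> #include <stdio.h>
>
> #define N 1000
>
> int main(void) {
>   double a[N], b[N], c[N];
>
>   // fill the input arrays with some values
>   for (int i = 0; i < N; ++i) {
>     b[i] = i;
>     c[i] = 2.0 * i;
>   }
>
>   // distribute the iterations of this loop across the available threads
>   #pragma omp parallel for
>   for (int i = 0; i < N; ++i) {
>     a[i] = b[i] + c[i];
>   }
>
>   printf("a[N - 1] = %f\n", a[N - 1]);
>
>   return 0;
> }
> ~~~
> {: .language-c}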
@@ -268,7 +268,7 @@ just one step (for a factor of $$N$$ speed-up). Let's look into both paradigms i
> data. For example, using this paradigm to parallelise the above loop instead:
>
> ~~~
> for(i=0; i<m; i++) {
> for (i = 0; i < m; ++i) {
> a[i] = b[i] + c[i];
> }
> ~~~
@@ -288,10 +288,10 @@ just one step (for a factor of $$N$$ speed-up). Let's look into both paradigms i
>
> <img src="fig/dataparallel.png" alt="Each rank has its own data"/>
> Therefore, each rank essentially operates on its own set of data, regardless of paradigm.
> In some cases, there are advantages to combining data parallelism and message passing methods
> together, e.g. when there are problems larger than one GPU can handle. In this case, _data
> parallelism_ is used for the portion of the problem contained within one GPU, and then _message
> passing_ is used to employ several GPUs (each GPU handles a part of the problem) unless special
> In some cases, there are advantages to combining data parallelism and message passing methods
> together, e.g. when there are problems larger than one GPU can handle. In this case, _data
> parallelism_ is used for the portion of the problem contained within one GPU, and then _message
> passing_ is used to employ several GPUs (each GPU handles a part of the problem) unless special
> hardware/software supports multiple GPU usage.
{: .callout}
@@ -441,13 +441,13 @@ decompose the domain so that many cores can work in parallel.
>
>> ## Solution
>>
>>
>>
>> First Loop: Each iteration depends on the results of the previous two iterations in vector_1. So it is not parallelisable within itself.
>>
>> Second Loop: Each iteration is independent and can be parallelised.
>>
>> Third loop: Each iteration is independent within itself. While there are dependencies on vector_2[i] and vector_1[i], these dependencies are local to each iteration. This independence
>> allows for the potential parallelization of the third loop by overlapping its execution with the second loop, assuming the results of the first loop are available or can be made
>> Third loop: Each iteration is independent within itself. While there are dependencies on vector_2[i] and vector_1[i], these dependencies are local to each iteration. This independence
>> allows for the potential parallelization of the third loop by overlapping its execution with the second loop, assuming the results of the first loop are available or can be made
>> available dynamically.
>>
>> ~~~
15 changes: 9 additions & 6 deletions _episodes/02-mpi-api.md
@@ -108,7 +108,7 @@ following code in a file named **`hello_world.c`**
~~~
#include <stdio.h>
int main (int argc, char *argv[]) {
int main(int argc, char **argv) {
printf("Hello World!\n");
}
~~~
@@ -235,7 +235,7 @@ Here's a more complete example:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[]) {
int main(int argc, char **argv) {
int num_ranks, my_rank;
// First call MPI_Init
@@ -322,8 +322,9 @@ number of iterations. This ensures the entire desired workload is calculated:
~~~
// catch cases where the work can't be split evenly
if (rank_end > NUM_ITERATIONS || (my_rank == (num_ranks-1) && rank_end < NUM_ITERATIONS))
if (rank_end > NUM_ITERATIONS || (my_rank == (num_ranks-1) && rank_end < NUM_ITERATIONS)) {
rank_end = NUM_ITERATIONS;
}
~~~
{: .language-c}
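
For context, `rank_start` and `rank_end` are assumed to come from an even split of the
iterations across the ranks. A minimal sketch of that calculation is shown below; the exact
code used earlier in the episode may differ, and `iterations_per_rank` is a name introduced
here for illustration.

~~~
// split the iterations as evenly as possible across the ranks
int iterations_per_rank = NUM_ITERATIONS / num_ranks;
int rank_start = my_rank * iterations_per_rank;
int rank_end = rank_start + iterations_per_rank - 1;
~~~
{: .language-c}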
@@ -334,11 +335,12 @@ subset of the problem, and output the result, e.g.:
// each rank is dealing with a subset of the problem between rank_start and rank_end
int prime_count = 0;
for (int n = rank_start; n <= rank_end; ++n) {
bool is_prime = true;
bool is_prime = true; // remember to include <stdbool.h>
// 0 and 1 are not prime numbers
if (n == 0 || n == 1)
if (n == 0 || n == 1) {
is_prime = false;
}
// if we can only divide n by i, then n is not prime
for (int i = 2; i <= n / 2; ++i) {
@@ -348,8 +350,9 @@ for (int n = rank_start; n <= rank_end; ++n) {
}
}
if (is_prime)
if (is_prime) {
prime_count++;
}
}
printf("Rank %d - primes between %d-%d is: %d\n", my_rank, rank_start, rank_end, prime_count);
~~~
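
At this point each rank holds only its own partial count. One way to combine these into a
single total on rank 0 is a reduction; the sketch below uses `MPI_Reduce` from the MPI
standard (not part of this episode's snippet), with `total_count` introduced here for
illustration.

~~~
// combine the partial counts from every rank into a total on rank 0
int total_count = 0;
MPI_Reduce(&prime_count, &total_count, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

if (my_rank == 0) {
  printf("Total number of primes: %d\n", total_count);
}
~~~
{: .language-c}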
1 change: 0 additions & 1 deletion _episodes/03-communicating-data.md
@@ -348,4 +348,3 @@ communication and calculation is often worth the more difficult implementation a
> {: .solution}
>
{: .challenge}
73 changes: 37 additions & 36 deletions _episodes/04-point-to-point-communication.md
@@ -37,9 +37,9 @@ and will not return until the communication on both sides is complete.

The `MPI_Send` function is defined as follows:

~~~
~~~c
int MPI_Send(
const void* data,
void *data,
int count,
MPI_Datatype datatype,
int destination,
@@ -48,13 +48,11 @@ int MPI_Send(
~~~
{: .language-c}
The arguments to

| `data`: | Pointer to the start of the data being sent. We would not expect this to change, hence it's defined as `const` |
| `count`: | Number of elements to send |
| `datatype`: | The type of the element data being sent, e.g. MPI_INTEGER, MPI_CHAR, MPI_FLOAT, MPI_DOUBLE, ... |
| `destination`: | The rank number of the rank the data will be sent to |
| `tag`: | An optional message tag (integer), which is optionally used to differentiate types of messages. We can specify `0` if we don't need different types of messages |
| `tag`: | A message tag (integer), which is used to differentiate types of messages. We can specify `0` if we don't need different types of messages |
| `communicator`: | The communicator, e.g. MPI_COMM_WORLD as seen in previous episodes |
{: .show-c}
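
As a quick illustration of these arguments (a sketch only; the episode's full example
appears further below), sending a single integer to rank 1 could look like this, with
`value` introduced here for illustration:

~~~
int value = 42;

// send one MPI_INT to rank 1, with tag 0, over the default communicator
MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
~~~
{: .language-c}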
@@ -95,15 +93,15 @@ having to send more than one type of message. This call is synchronous, and will
Conversely, the `MPI_Recv` function looks like the following:
~~~
~~~c
int MPI_Recv(
void* data,
void *data,
int count,
MPI_Datatype datatype,
int source,
int tag,
MPI_Comm communicator,
MPI_Status* status)
MPI_Status *status)
~~~

| `data`: | Pointer to where the received data should be written |
@@ -138,31 +136,31 @@ from rank 0 to rank 1:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char** argv) {
int main(int argc, char **argv) {
int rank, n_ranks;
// First call MPI_Init
MPI_Init(&argc, &argv);
// Check that there are two ranks
MPI_Comm_size(MPI_COMM_WORLD,&n_ranks);
if( n_ranks != 2 ){
MPI_Comm_size(MPI_COMM_WORLD, &n_ranks);
if (n_ranks != 2) {
printf("This example requires exactly two ranks\n");
MPI_Finalize();
return(1);
return 1;
}
// Get my rank
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
if( rank == 0 ){
if (rank == 0) {
char *message = "Hello, world!\n";
MPI_Send(message, 14, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
}
if( rank == 1 ){
if (rank == 1) {
char message[14];
MPI_Status status;
MPI_Status status;
MPI_Recv(message, 14, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
printf("%s",message);
}
@@ -243,7 +241,7 @@ int main(int argc, char** argv) {
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main(int argc, char** argv) {
>> int main(int argc, char **argv) {
>> int rank, n_ranks, my_pair;
>>
>> // First call MPI_Init
@@ -256,21 +254,20 @@ int main(int argc, char** argv) {
>> MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>>
>> // Figure out my pair
>> if( rank%2 == 1 ){
>> if (rank % 2 == 1) {
>> my_pair = rank - 1;
>> } else {
>> my_pair = rank + 1;
>> }
>>
>> // Run only if my pair exists
>> if( my_pair < n_ranks ){
>>
>> if( rank%2 == 0 ){
>> if (my_pair < n_ranks) {
>> if (rank % 2 == 0) {
>> char *message = "Hello, world!\n";
>> MPI_Send(message, 14, MPI_CHAR, my_pair, 0, MPI_COMM_WORLD);
>> }
>>
>> if( rank%2 == 1 ){
>> if (rank % 2 == 1) {
>> char message[14];
>> MPI_Status status;
>> MPI_Recv(message, 14, MPI_CHAR, my_pair, 0, MPI_COMM_WORLD, &status);
@@ -295,7 +292,7 @@ int main(int argc, char** argv) {
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char** argv) {
> int main(int argc, char **argv) {
> int rank;
> int message[30];
>
@@ -320,7 +317,7 @@ int main(int argc, char** argv) {
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main(int argc, char** argv) {
>> int main(int argc, char **argv) {
>> int rank, n_ranks, numbers_per_rank;
>>
>> // First call MPI_Init
@@ -329,7 +326,7 @@ int main(int argc, char** argv) {
>> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> MPI_Comm_size(MPI_COMM_WORLD, &n_ranks);
>>
>> if( rank != 0 ) {
>> if (rank != 0) {
>> // All ranks other than 0 should send a message
>>
>> char message[30];
@@ -360,7 +357,7 @@ int main(int argc, char** argv) {
>
> Try the code below with two ranks and see what happens. How would you change the code to fix the problem?
>
> _Note: If you are using the MPICH library, this example might automagically work. With OpenMPI it shouldn't!)_
> _Note: If you are using MPICH, this example might work. With OpenMPI it shouldn't!_
>
> ~~~
> #include <mpi.h>
@@ -378,8 +375,8 @@ int main(int argc, char** argv) {
> MPI_Status recv_status;
>
> if (rank == 0) {
> /* synchronous send: returns when the destination has started to
> receive the message */
> // synchronous send: returns when the destination has started to
> // receive the message
> MPI_Ssend(&numbers, ARRAY_SIZE, MPI_INT, 1, comm_tag, MPI_COMM_WORLD);
> MPI_Recv(&numbers, ARRAY_SIZE, MPI_INT, 1, comm_tag, MPI_COMM_WORLD, &recv_status);
> } else {
@@ -405,12 +402,17 @@ int main(int argc, char** argv) {
>> Even when this happens, the actual transfer will not start before the receive is posted.
>>
>> For this example, let's have rank 0 send first, and rank 1 receive first.
>> So all we need to do to fix this is to swap the send and receive in the case of rank 1
>> (after the `else`):
>> So all we need to do to fix this is to swap the send and receive for rank 1:
>>
>> ~~~
>> MPI_Recv(&numbers, ARRAY_SIZE, MPI_INT, 0, comm_tag, MPI_COMM_WORLD, &recv_status);
>> MPI_Ssend(&numbers, ARRAY_SIZE, MPI_INT, 0, comm_tag, MPI_COMM_WORLD);
>> if (rank == 0) {
>> MPI_Ssend(&numbers, ARRAY_SIZE, MPI_INT, 1, comm_tag, MPI_COMM_WORLD);
>> MPI_Recv(&numbers, ARRAY_SIZE, MPI_INT, 1, comm_tag, MPI_COMM_WORLD, &recv_status);
>> } else {
>> // Change the order, receive then send
>> MPI_Recv(&numbers, ARRAY_SIZE, MPI_INT, 0, comm_tag, MPI_COMM_WORLD, &recv_status);
>> MPI_Ssend(&numbers, ARRAY_SIZE, MPI_INT, 0, comm_tag, MPI_COMM_WORLD);
>> }
>> ~~~
>>{: .language-c}
>{: .solution}
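>
> As an aside (not part of the original exercise), the MPI standard also provides
> `MPI_Sendrecv_replace`, which performs the send and the receive through a single buffer in
> one call and so sidesteps the ordering problem entirely. A sketch for this example, with
> `other_rank` introduced here for the partner's rank, might be:
>
> ~~~
> int other_rank = (rank == 0) ? 1 : 0;
>
> // combined send and receive using one buffer; MPI handles the ordering internally,
> // so neither rank can deadlock here
> MPI_Sendrecv_replace(numbers, ARRAY_SIZE, MPI_INT, other_rank, comm_tag,
>                      other_rank, comm_tag, MPI_COMM_WORLD, &recv_status);
> ~~~
> {: .language-c}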
Expand All @@ -435,7 +437,7 @@ int main(int argc, char** argv) {
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main(int argc, char** argv) {
>> int main(int argc, char **argv) {
>> int rank, neighbour;
>> int max_count = 1000000;
>> int counter;
@@ -450,13 +452,13 @@ int main(int argc, char** argv) {
>> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>> // Call the other rank the neighbour
>> if( rank == 0 ){
>> if (rank == 0) {
>> neighbour = 1;
>> } else {
>> neighbour = 0;
>> }
>>
>> if( rank == 0 ){
>> if (rank == 0) {
>> // Rank 0 starts with the ball. Send it to rank 1
>> MPI_Send(&ball, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
>> }
@@ -465,8 +467,7 @@ int main(int argc, char** argv) {
>> // the behaviour is the same for both ranks
>> counter = 0;
>> bored = 0;
>> while( !bored )
>> {
>> while (!bored) {
>> // Receive the ball
>> MPI_Recv(&ball, 1, MPI_INT, neighbour, 0, MPI_COMM_WORLD, &status);
>>