Normalise coding style across episodes (#53)
* make example code closer to the solution to make it easier for non-C programmers

* example of MPI with no command line arguments

* update chain image

* improve sentence clarity

* update code style for all episodes other than 7

* update code style in episode 07
Edward-RSE authored Jul 31, 2024
1 parent f5d84db commit 46b301c
Showing 10 changed files with 475 additions and 351 deletions.
24 changes: 12 additions & 12 deletions _episodes/01-introduction.md
@@ -202,7 +202,7 @@ Now, let's take a brief look at these fundamental concepts and explore the diffe
> |Creation of process/thread instances and communication can result in higher costs and overhead.|Offers lower overhead, as inter-process communication is handled through shared memory, reducing the need for expensive process/thread creation.|
{: .callout}

## Parallel Paradigms
## Parallel Paradigms

Thinking back to shared vs distributed memory models, the way we achieve a parallel computation
divides roughly into **two paradigms**. Let's set both of these in context:
@@ -226,7 +226,7 @@ kind of problems you have. Sometimes, one has to use both!
For illustration, consider a simple loop which can be sped up if we have many cores:

~~~
for(i=0; i<N; i++) {
for (i = 0; i < N; ++i) {
a[i] = b[i] + c[i];
}
~~~
@@ -239,11 +239,11 @@ just one step (for a factor of $$N$$ speed-up). Let's look into both paradigms i
>{: .checklist}
> One standard method for programming using data parallelism is called
> "OpenMP" (for "**O**pen **M**ulti**P**rocessing").
> To understand what data parallelism means, let's consider the following bit of OpenMP code which
> To understand what data parallelism means, let's consider the following bit of OpenMP code which
> parallelizes the above loop:
> ~~~
> #pragma omp parallel for
> for(i=0; i<N; i++) {
> for (i = 0; i < N; ++i) {
> a[i] = b[i] + c[i];
> }
> ~~~
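>
> For reference, a complete program built around this fragment might look like the sketch
> below. This is an illustrative assumption rather than code from the lesson: `N` is picked
> arbitrarily, and the pragma is simply ignored if the compiler is not invoked with OpenMP
> support (e.g. `-fopenmp`).
>
> ~~~
> #include <stdio.h>
>
> #define N 1000
>
> int main(void) {
>   double a[N], b[N], c[N];
>
>   // fill the input arrays with some values
>   for (int i = 0; i < N; ++i) {
>     b[i] = i;
>     c[i] = 2.0 * i;
>   }
>
>   // distribute the iterations of this loop across the available threads
>   #pragma omp parallel for
>   for (int i = 0; i < N; ++i) {
>     a[i] = b[i] + c[i];
>   }
>
>   printf("a[N - 1] = %f\n", a[N - 1]);
>
>   return 0;
> }
> ~~~
> {: .language-c}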
@@ -268,7 +268,7 @@ just one step (for a factor of $$N$$ speed-up). Let's look into both paradigms i
> data. For example, using this paradigm to parallelise the above loop instead:
>
> ~~~
> for(i=0; i<m; i++) {
> for (i = 0; i < m; ++i) {
> a[i] = b[i] + c[i];
> }
> ~~~
@@ -288,10 +288,10 @@ just one step (for a factor of $$N$$ speed-up). Let's look into both paradigms i
>
> <img src="fig/dataparallel.png" alt="Each rank has its own data"/>
> Therefore, each rank essentially operates on its own set of data, regardless of paradigm.
> In some cases, there are advantages to combining data parallelism and message passing methods
> together, e.g. when there are problems larger than one GPU can handle. In this case, _data
> parallelism_ is used for the portion of the problem contained within one GPU, and then _message
> passing_ is used to employ several GPUs (each GPU handles a part of the problem) unless special
> In some cases, there are advantages to combining data parallelism and message passing methods
> together, e.g. when there are problems larger than one GPU can handle. In this case, _data
> parallelism_ is used for the portion of the problem contained within one GPU, and then _message
> passing_ is used to employ several GPUs (each GPU handles a part of the problem) unless special
> hardware/software supports multiple GPU usage.
{: .callout}
@@ -441,13 +441,13 @@ decompose the domain so that many cores can work in parallel.
>
>> ## Solution
>>
>>
>>
>> First Loop: Each iteration depends on the results of the previous two iterations in vector_1. So it is not parallelisable within itself.
>>
>> Second Loop: Each iteration is independent and can be parallelised.
>>
>> Third loop: Each iteration is independent within itself. While there are dependencies on vector_2[i] and vector_1[i], these dependencies are local to each iteration. This independence
>> allows for the potential parallelization of the third loop by overlapping its execution with the second loop, assuming the results of the first loop are available or can be made
>> Third loop: Each iteration is independent within itself. While there are dependencies on vector_2[i] and vector_1[i], these dependencies are local to each iteration. This independence
>> allows for the potential parallelization of the third loop by overlapping its execution with the second loop, assuming the results of the first loop are available or can be made
>> available dynamically.
>>
>> ~~~
15 changes: 9 additions & 6 deletions _episodes/02-mpi-api.md
@@ -108,7 +108,7 @@ following code in a file named **`hello_world.c`**
~~~
#include <stdio.h>
int main (int argc, char *argv[]) {
int main(int argc, char **argv) {
printf("Hello World!\n");
}
~~~
@@ -235,7 +235,7 @@ Here's a more complete example:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[]) {
int main(int argc, char **argv) {
int num_ranks, my_rank;
// First call MPI_Init
@@ -322,8 +322,9 @@ number of iterations. This ensures the entire desired workload is calculated:
~~~
// catch cases where the work can't be split evenly
if (rank_end > NUM_ITERATIONS || (my_rank == (num_ranks-1) && rank_end < NUM_ITERATIONS))
if (rank_end > NUM_ITERATIONS || (my_rank == (num_ranks-1) && rank_end < NUM_ITERATIONS)) {
rank_end = NUM_ITERATIONS;
}
~~~
{: .language-c}
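
For context, `rank_start` and `rank_end` are assumed to come from an even split of the
iterations across the ranks. A minimal sketch of that calculation is shown below; the exact
code used earlier in the episode may differ, and `iterations_per_rank` is a name introduced
here for illustration.

~~~
// split the iterations as evenly as possible across the ranks
int iterations_per_rank = NUM_ITERATIONS / num_ranks;
int rank_start = my_rank * iterations_per_rank;
int rank_end = rank_start + iterations_per_rank - 1;
~~~
{: .language-c}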
@@ -334,11 +335,12 @@ subset of the problem, and output the result, e.g.:
// each rank is dealing with a subset of the problem between rank_start and rank_end
int prime_count = 0;
for (int n = rank_start; n <= rank_end; ++n) {
bool is_prime = true;
bool is_prime = true; // remember to include <stdbool.h>
// 0 and 1 are not prime numbers
if (n == 0 || n == 1)
if (n == 0 || n == 1) {
is_prime = false;
}
// if we can only divide n by i, then n is not prime
for (int i = 2; i <= n / 2; ++i) {
@@ -348,8 +350,9 @@ for (int n = rank_start; n <= rank_end; ++n) {
}
}
if (is_prime)
if (is_prime) {
prime_count++;
}
}
printf("Rank %d - primes between %d-%d is: %d\n", my_rank, rank_start, rank_end, prime_count);
~~~
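
At this point each rank holds only its own partial count. One way to combine these into a
single total on rank 0 is a reduction; the sketch below uses `MPI_Reduce` from the MPI
standard (not part of this episode's snippet), with `total_count` introduced here for
illustration.

~~~
// combine the partial counts from every rank into a total on rank 0
int total_count = 0;
MPI_Reduce(&prime_count, &total_count, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

if (my_rank == 0) {
  printf("Total number of primes: %d\n", total_count);
}
~~~
{: .language-c}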
1 change: 0 additions & 1 deletion _episodes/03-communicating-data.md
@@ -348,4 +348,3 @@ communication and calculation is often worth the more difficult implementation a
> {: .solution}
>
{: .challenge}
73 changes: 37 additions & 36 deletions _episodes/04-point-to-point-communication.md
@@ -37,9 +37,9 @@ and will not return until the communication on both sides is complete.

The `MPI_Send` function is defined as follows:

~~~
~~~c
int MPI_Send(
const void* data,
void *data,
int count,
MPI_Datatype datatype,
int destination,
@@ -48,13 +48,11 @@ int MPI_Send(
~~~
{: .language-c}
The arguments to

| `data`: | Pointer to the start of the data being sent. We would not expect this to change, hence it's defined as `const` |
| `count`: | Number of elements to send |
| `datatype`: | The type of the element data being sent, e.g. MPI_INTEGER, MPI_CHAR, MPI_FLOAT, MPI_DOUBLE, ... |
| `destination`: | The rank number of the rank the data will be sent to |
| `tag`: | An optional message tag (integer), which is optionally used to differentiate types of messages. We can specify `0` if we don't need different types of messages |
| `tag`: | A message tag (integer), which is used to differentiate types of messages. We can specify `0` if we don't need different types of messages |
| `communicator`: | The communicator, e.g. MPI_COMM_WORLD as seen in previous episodes |
{: .show-c}
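
As a quick illustration of these arguments (a sketch only; the episode's full example
appears further below), sending a single integer to rank 1 could look like this, with
`value` introduced here for illustration:

~~~
int value = 42;

// send one MPI_INT to rank 1, with tag 0, over the default communicator
MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
~~~
{: .language-c}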
@@ -95,15 +93,15 @@ having to send more than one type of message. This call is synchronous, and will
Conversely, the `MPI_Recv` function looks like the following:
~~~
~~~c
int MPI_Recv(
void* data,
void *data,
int count,
MPI_Datatype datatype,
int source,
int tag,
MPI_Comm communicator,
MPI_Status* status)
MPI_Status *status)
~~~

| `data`: | Pointer to where the received data should be written |
@@ -138,31 +136,31 @@ from rank 0 to rank 1:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char** argv) {
int main(int argc, char **argv) {
int rank, n_ranks;
// First call MPI_Init
MPI_Init(&argc, &argv);
// Check that there are two ranks
MPI_Comm_size(MPI_COMM_WORLD,&n_ranks);
if( n_ranks != 2 ){
MPI_Comm_size(MPI_COMM_WORLD, &n_ranks);
if (n_ranks != 2) {
printf("This example requires exactly two ranks\n");
MPI_Finalize();
return(1);
return 1;
}
// Get my rank
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
if( rank == 0 ){
if (rank == 0) {
char *message = "Hello, world!\n";
MPI_Send(message, 14, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
}
if( rank == 1 ){
if (rank == 1) {
char message[14];
MPI_Status status;
MPI_Status status;
MPI_Recv(message, 14, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
printf("%s",message);
}
@@ -243,7 +241,7 @@ int main(int argc, char** argv) {
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main(int argc, char** argv) {
>> int main(int argc, char **argv) {
>> int rank, n_ranks, my_pair;
>>
>> // First call MPI_Init
@@ -256,21 +254,20 @@ int main(int argc, char** argv) {
>> MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>>
>> // Figure out my pair
>> if( rank%2 == 1 ){
>> if (rank % 2 == 1) {
>> my_pair = rank - 1;
>> } else {
>> my_pair = rank + 1;
>> }
>>
>> // Run only if my pair exists
>> if( my_pair < n_ranks ){
>>
>> if( rank%2 == 0 ){
>> if (my_pair < n_ranks) {
>> if (rank % 2 == 0) {
>> char *message = "Hello, world!\n";
>> MPI_Send(message, 14, MPI_CHAR, my_pair, 0, MPI_COMM_WORLD);
>> }
>>
>> if( rank%2 == 1 ){
>> if (rank % 2 == 1) {
>> char message[14];
>> MPI_Status status;
>> MPI_Recv(message, 14, MPI_CHAR, my_pair, 0, MPI_COMM_WORLD, &status);
@@ -295,7 +292,7 @@ int main(int argc, char** argv) {
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char** argv) {
> int main(int argc, char **argv) {
> int rank;
> int message[30];
>
@@ -320,7 +317,7 @@ int main(int argc, char** argv) {
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main(int argc, char** argv) {
>> int main(int argc, char **argv) {
>> int rank, n_ranks, numbers_per_rank;
>>
>> // First call MPI_Init
@@ -329,7 +326,7 @@ int main(int argc, char** argv) {
>> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> MPI_Comm_size(MPI_COMM_WORLD, &n_ranks);
>>
>> if( rank != 0 ) {
>> if (rank != 0) {
>> // All ranks other than 0 should send a message
>>
>> char message[30];
@@ -360,7 +357,7 @@ int main(int argc, char** argv) {
>
> Try the code below with two ranks and see what happens. How would you change the code to fix the problem?
>
> _Note: If you are using the MPICH library, this example might automagically work. With OpenMPI it shouldn't!)_
> _Note: If you are using MPICH, this example might work. With OpenMPI it shouldn't!_
>
> ~~~
> #include <mpi.h>
@@ -378,8 +375,8 @@ int main(int argc, char** argv) {
> MPI_Status recv_status;
>
> if (rank == 0) {
> /* synchronous send: returns when the destination has started to
> receive the message */
> // synchronous send: returns when the destination has started to
> // receive the message
> MPI_Ssend(&numbers, ARRAY_SIZE, MPI_INT, 1, comm_tag, MPI_COMM_WORLD);
> MPI_Recv(&numbers, ARRAY_SIZE, MPI_INT, 1, comm_tag, MPI_COMM_WORLD, &recv_status);
> } else {
@@ -405,12 +402,17 @@ int main(int argc, char** argv) {
>> Even when this happens, the actual transfer will not start before the receive is posted.
>>
>> For this example, let's have rank 0 send first, and rank 1 receive first.
>> So all we need to do to fix this is to swap the send and receive in the case of rank 1
>> (after the `else`):
>> So all we need to do to fix this is to swap the send and receive for rank 1:
>>
>> ~~~
>> MPI_Recv(&numbers, ARRAY_SIZE, MPI_INT, 0, comm_tag, MPI_COMM_WORLD, &recv_status);
>> MPI_Ssend(&numbers, ARRAY_SIZE, MPI_INT, 0, comm_tag, MPI_COMM_WORLD);
>> if (rank == 0) {
>> MPI_Ssend(&numbers, ARRAY_SIZE, MPI_INT, 1, comm_tag, MPI_COMM_WORLD);
>> MPI_Recv(&numbers, ARRAY_SIZE, MPI_INT, 1, comm_tag, MPI_COMM_WORLD, &recv_status);
>> } else {
>> // Change the order, receive then send
>> MPI_Recv(&numbers, ARRAY_SIZE, MPI_INT, 0, comm_tag, MPI_COMM_WORLD, &recv_status);
>> MPI_Ssend(&numbers, ARRAY_SIZE, MPI_INT, 0, comm_tag, MPI_COMM_WORLD);
>> }
>> ~~~
>>{: .language-c}
>{: .solution}
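>
> As an aside (not part of the original exercise), the MPI standard also provides
> `MPI_Sendrecv_replace`, which performs the send and the receive through a single buffer in
> one call and so sidesteps the ordering problem entirely. A sketch for this example, with
> `other_rank` introduced here for the partner's rank, might be:
>
> ~~~
> int other_rank = (rank == 0) ? 1 : 0;
>
> // combined send and receive using one buffer; MPI handles the ordering internally,
> // so neither rank can deadlock here
> MPI_Sendrecv_replace(numbers, ARRAY_SIZE, MPI_INT, other_rank, comm_tag,
>                      other_rank, comm_tag, MPI_COMM_WORLD, &recv_status);
> ~~~
> {: .language-c}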
Expand All @@ -435,7 +437,7 @@ int main(int argc, char** argv) {
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main(int argc, char** argv) {
>> int main(int argc, char **argv) {
>> int rank, neighbour;
>> int max_count = 1000000;
>> int counter;
@@ -450,13 +452,13 @@ int main(int argc, char** argv) {
>> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>> // Call the other rank the neighbour
>> if( rank == 0 ){
>> if (rank == 0) {
>> neighbour = 1;
>> } else {
>> neighbour = 0;
>> }
>>
>> if( rank == 0 ){
>> if (rank == 0) {
>> // Rank 0 starts with the ball. Send it to rank 1
>> MPI_Send(&ball, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
>> }
@@ -465,8 +467,7 @@ int main(int argc, char** argv) {
>> // the behaviour is the same for both ranks
>> counter = 0;
>> bored = 0;
>> while( !bored )
>> {
>> while (!bored) {
>> // Receive the ball
>> MPI_Recv(&ball, 1, MPI_INT, neighbour, 0, MPI_COMM_WORLD, &status);
>>