From 0c0c45f7bf767125249f5a81d5e67c577e1e80e6 Mon Sep 17 00:00:00 2001 From: Edward Parkinson <94305757+Edward-RSE@users.noreply.github.com> Date: Fri, 26 Jul 2024 10:41:48 +0100 Subject: [PATCH] Improve clarification about communication methods in episode 3 (#50) * improve clarification between communication modes * re-arrange episode to improve flow into communication modes and following section * change structure of lesson to flow with the new blocking vs. non-blocking section * remove submodule folder * add callout to explain that bsend is not non-blocking * one last fix for grammar and spelling mistakes * Delete .gitmodules * Apply suggestions from code review for 05-collective-communication Co-authored-by: Steve Crouch * address review comments --------- Co-authored-by: Steve Crouch --- _episodes/03-communicating-data.md | 427 +++++++++++++---------- _episodes/05-collective-communication.md | 7 +- 2 files changed, 244 insertions(+), 190 deletions(-) diff --git a/_episodes/03-communicating-data.md b/_episodes/03-communicating-data.md index 6c682a8..b8ced3a 100644 --- a/_episodes/03-communicating-data.md +++ b/_episodes/03-communicating-data.md @@ -23,91 +23,265 @@ parallelisation often relies on how you communicate data. ## Communicating data using messages -MPI is a standardised framework for passing data and other messages between independently running processes. If we want -to share or access data from one rank to another, we use the MPI API to transfer data in a "message." To put it simply, -a message is merely a data structure which contains the data, and is usually expressed as a collection of data elements -of a particular data type. - -Sending and receiving data happens in one of two ways. We either want to send data from one specific rank to another, -known as point-to-point communication, or to/from multiple ranks all at once to a single target, known as collective -communication. In both cases, we have to *explicitly* "send" something and to *explicitly* "receive" something. We've -emphasised *explicitly* here to make clear that data communication can't happen by itself. That is a rank can't just -"pluck" data from one rank, because a rank doesn't automatically send and receive the data it needs. If we don't program -in data communication, data can't be shared. None of this communication happens for free, either. With every message -sent, there is an associated overhead which impacts the performance of your program. Often we won't notice this -overhead, as it is quite small. But if we communicate large amounts data or too often, those small overheads can rapidly -add up into a noticeable performance hit. - -> ## Common mistakes -> -> A common mistake for new MPI users is to write code using point-to-point communication which emulates what the -> collective communication functions are designed to do. This is an inefficient way to share data. The collective -> routines in MPI have multiple tricks and optimizations up their sleeves, resulting in communication overheads much -> lower than the equivalent point-to-point approach. One other advantage is that collective communication often requires -> less code to achieve the same thing, which is always a win. It is almost always better to use collective -> operations where you can. -> -{: .callout} +MPI is a framework for passing data and other messages between independently running processes. If we want to share or +access data from one rank to another, we use the MPI API to transfer data in a "message." 
A message is a data structure
+which contains the data we want to send, and is usually expressed as a collection of data elements of a particular data
+type.
+
+Sending and receiving data can happen in two patterns. We either want to send data from one specific rank to
+another, known as point-to-point communication, or to/from multiple ranks all at once to a single or multiple targets,
+known as collective communication. Whatever we do, we always have to *explicitly* "send" something and to *explicitly*
+"receive" something. Data communication can't happen by itself. A rank can't just take data from another rank, and ranks
+don't automatically send and receive data. If we don't program in data communication, data isn't exchanged.
+Unfortunately, none of this communication happens for free, either. With every message sent, there is an overhead which
+impacts the performance of your program. Often we won't notice this overhead, as it is usually quite small. But if we
+communicate large amounts of data, or send lots of small messages too often, those (small) overheads add up into a
+noticeable performance hit.

 To get an idea of how communication typically happens, imagine we have two ranks: rank A and rank B. If rank A wants to
-send data to rank B (e.g., point-to-point), it must first call the appropriate MPI send function which puts that data
-into an internal *buffer*; sometimes known as the send buffer or envelope. Once the data is in the buffer, MPI figures
-out how to route the message to rank B (usually over a network) and sends it to B. To receive the data, rank B must call
-a data receiving function which will listen for any messages being sent. When the message has been successfully routed
-and the data transfer complete, rank B sends an acknowledgement back to rank A to say that the transfer has finished,
-similarly to how read receipts work in e-mails and instant messages.
+send data to rank B (e.g., point-to-point), it must first call the appropriate MPI send function which typically (but
+not always, as we'll find out later) puts that data into an internal *buffer*; known as the **send buffer** or
+the **envelope**. Once the data is in the buffer, MPI figures out how to route the message to rank B (usually over a
+network) and then sends it to B. To receive the data, rank B must call a data receiving function which will listen for
+messages being sent to it. In some cases, rank B will then send an acknowledgement to say that the transfer has
+finished, similar to read receipts in e-mails and instant messages.

 > ## Check your understanding
 >
-> In an imaginary simulation, each rank is responsible for calculating the physical properties for a subset of
-> cells on a larger simulation grid. Another calculation, however, needs to know the average of, for example, the
-> temperature for the subset of cells for each rank. What approaches could you use to share this data?
+> Consider a simulation where each rank calculates the physical properties for a subset of cells on a very large
+> grid of points. One step of the calculation needs to know the average temperature across the entire grid
+> of points. How would you approach calculating the average temperature?
 >
 > > ## Solution
 > >
 > > There are multiple ways to approach this situation, but the most efficient approach would be to use collective
-> > operations to send the average temperature to a root rank (or all ranks) to perform the final calculation. You can,
-> > of course, also use a point-to-point pattern, but it would be less efficient.
+> > operations to send the average temperature to a main rank which performs the final calculation. You can, of course,
+> > also use a point-to-point pattern, but it would be less efficient, especially with a large number of ranks.
+> >
+> > If the simulation wasn't done in parallel, or was instead using shared-memory parallelism, such as OpenMP, we
+> > wouldn't need to do any communication to get the data required to calculate the average.
+> >
+> {: .solution}
+{: .challenge}
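+
+If you're curious what the collective approach from the solution looks like in practice, below is a minimal, hypothetical
+sketch using `MPI_Reduce()`, which is introduced properly in a later episode. It assumes each rank has already calculated
+the sum of the temperatures in its own cells and how many cells it owns; the variable names and values are purely
+illustrative.
+
+```c
+#include <mpi.h>
+#include <stdio.h>
+
+int main(int argc, char **argv)
+{
+    MPI_Init(&argc, &argv);
+
+    int my_rank;
+    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
+
+    /* placeholder values: in a real simulation these would come from each rank's cells */
+    double local_sum = 42.0 * (my_rank + 1);
+    int local_num_cells = 100;
+
+    /* combine every rank's partial results onto rank 0, one collective call per value */
+    double global_sum;
+    int global_num_cells;
+    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
+    MPI_Reduce(&local_num_cells, &global_num_cells, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
+
+    if (my_rank == 0) {
+        printf("Average temperature: %f\n", global_sum / global_num_cells);
+    }
+
+    MPI_Finalize();
+    return 0;
+}
+```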
+
+## MPI data types
+
+When we send a message, MPI needs to know the size of the data being transferred. The size is not the number of bytes of
+data being sent, as you may expect, but is instead the number of elements of a specific data type being sent. When we
+send a message, we have to tell MPI how many elements of "something" we are sending and what type of data it is. If we
+don't do this correctly, we'll either end up telling MPI to send only *some* of the data or try to send more data than
+we want! For example, if we were sending an array and we specify too few elements, then only a subset of the array will
+be sent or received. But if we specify too many elements, then we are likely to end up with either a segmentation fault
+or undefined behaviour! And the same can happen if we don't specify the correct data type.
+
+There are two types of data type in MPI: "basic" data types and derived data types. The basic data types are in essence
+the same data types we would use in C such as `int`, `float`, `char` and so on. However, MPI doesn't use the same
+primitive C types in its API, using instead a set of constants which internally represent the data types. These data
+types are in the table below:
+
+| MPI basic data type    | C equivalent           |
+| ---------------------- | ---------------------- |
+| MPI_SHORT              | short int              |
+| MPI_INT                | int                    |
+| MPI_LONG               | long int               |
+| MPI_LONG_LONG          | long long int          |
+| MPI_UNSIGNED_CHAR      | unsigned char          |
+| MPI_UNSIGNED_SHORT     | unsigned short int     |
+| MPI_UNSIGNED           | unsigned int           |
+| MPI_UNSIGNED_LONG      | unsigned long int      |
+| MPI_UNSIGNED_LONG_LONG | unsigned long long int |
+| MPI_FLOAT              | float                  |
+| MPI_DOUBLE             | double                 |
+| MPI_LONG_DOUBLE        | long double            |
+| MPI_BYTE               | char                   |
+
+Remember, these constants aren't the same as the primitive types in C, so we can't use them to create variables, e.g.,
+
+```c
+MPI_INT my_int = 1;
+```
+
+is not valid code because, under the hood, these constants are actually special data structures used by MPI. Therefore
+we can only use them as arguments in MPI functions.
+
+> ## Don't forget to update your types
+>
+> At some point during development, you might change an `int` to a `long` or a `float` to a `double`, or something to
+> something else. Once you've gone through your codebase and updated the types for, e.g., variable declarations and
+> function arguments, you must do the same for MPI functions. If you don't, you'll end up running into communication
+> errors. It could be helpful to define compile-time constants for the data types and use those instead. If you ever do
+> need to change the type, you would only have to do it in one place, e.g.:
+>
+> ```c
+> /* define constants for your data types */
+> #define MPI_INT_TYPE MPI_INT
+> #define INT_TYPE int
+> /* use them as you would normally */
+> INT_TYPE my_int = 1;
+> ```
+>
+{: .callout}
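+
+To see how the count and the data type fit together, here is a minimal sketch of sending a small array. It uses
+`MPI_Send()`, which is introduced properly in the next episode, and assumes MPI has already been initialised and that
+rank 1 posts a matching receive; the variable names are purely illustrative.
+
+```c
+int numbers[5] = {1, 2, 3, 4, 5};
+
+/* the count (5) and the data type (MPI_INT) describe the message together:
+   five elements, each interpreted as an int, sent to rank 1 with tag 0 */
+MPI_Send(numbers, 5, MPI_INT, 1, 0, MPI_COMM_WORLD);
+```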
+
+Derived data types, on the other hand, are very similar to C structures which we define by using the basic MPI data
+types. They're often useful to group together similar data in communications, or when you need to send a structure from
+one rank to another. This will be covered in more detail in the [Advanced Communication
+Techniques](dirac-intro-to-mpi-advanced-communication) episode.
+
+> ## What type should you use?
+>
+> For the following pieces of data, what MPI data types should you use?
+>
+> 1. `a[] = {1, 2, 3, 4, 5};`
+> 2. `a[] = {1.0, -2.5, 3.1456, 4591.223, 1e-10};`
+> 3. `a[] = "Hello, world!";`
+>
+> > ## Solution
+> >
+> > The fact that `a[]` is an array does not matter, because all of the elements in `a[]` will be the same data type. In
+> > MPI, as we'll see in the next episode, we can either send a single value or multiple values (in an array).
 > >
+> > 1. `MPI_INT`
+> > 2. `MPI_DOUBLE` - `MPI_FLOAT` would not be correct, as a `float` contains 32 bits of data whereas a `double`
+> >    contains 64 bits.
+> > 3. `MPI_BYTE` or `MPI_CHAR` - you may want to use [strlen](https://man7.org/linux/man-pages/man3/strlen.3.html) to
+> >    calculate how many elements of `MPI_CHAR` are being sent
 > {: .solution}
 {: .challenge}

-### Communication modes
+## Communicators

-There are multiple "modes" on how data is sent in MPI: standard, buffered, synchronous and ready. When an MPI
-communication function is called, control/execution of the program is passed from the calling program to the MPI
-function. Your program won't continue until MPI is happy that the communication happened successfully. The difference
-between the communication modes is the criteria of a successful communication.
+All communication in MPI is handled by something known as a **communicator**. We can think of a communicator as being a
+collection of ranks which are able to exchange data with one another. What this means is that every communication
+between two (or more) ranks is linked to a specific communicator. When we run an MPI application, every rank will belong
+to the default communicator known as `MPI_COMM_WORLD`. We've seen this in earlier episodes when, for example, we've used
+functions like `MPI_Comm_rank()` to get the rank number,

-To use the different modes, we don't pass a special flag. Instead, MPI uses different functions to separate the
-different modes. The table below lists the four modes with a description and their associated functions (which will be
-covered in detail in the following episodes).
+```c
+int my_rank;
+MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); /* MPI_COMM_WORLD is the communicator the rank belongs to */
+```

-| Mode | Description | MPI Function |
-| - | - | - |
-| Synchronous | Returns control to the program when the message has been sent and received successfully | `MPI_Ssend()` |
-| Buffered | Control is returned when the data has been copied into in the send buffer, regardless of the receive being completed or not | `MPI_Bsend()` |
-| Standard | Either buffered or synchronous, depending on the size of the data being sent and what your specific MPI implementation chooses to use | `MPI_Send()` |
-| Ready | Will only return control if the receiving rank is already listening for a message | `MPI_Rsend()` |
+In addition to `MPI_COMM_WORLD`, we can make sub-communicators and distribute ranks into them. Messages can only be sent
+and received to and from the same communicator, effectively isolating messages to a communicator. For most applications,
+we usually don't need anything other than `MPI_COMM_WORLD`. But organising ranks into communicators can be helpful in
+some circumstances, as you can create small "work units" of multiple ranks to dynamically schedule the workload, create
+one communicator for each physical hardware node on a HPC cluster, or to help compartmentalise the problem into smaller
+chunks by using a [virtual cartesian topology](https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node192.htm#Node192).
+Throughout this course, we will stick to using `MPI_COMM_WORLD`.
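+
+As a brief, illustrative sketch of what creating a sub-communicator can look like (we won't need this in the rest of the
+lesson), `MPI_Comm_split()` divides an existing communicator into smaller ones. The grouping below, splitting ranks into
+"even" and "odd" groups, is purely an example:
+
+```c
+int my_rank;
+MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
+
+/* ranks passing the same "colour" (here my_rank % 2) end up in the same sub-communicator */
+MPI_Comm sub_comm;
+MPI_Comm_split(MPI_COMM_WORLD, my_rank % 2, my_rank, &sub_comm);
+
+/* ranks are numbered from 0 again within the new communicator */
+int my_sub_rank;
+MPI_Comm_rank(sub_comm, &my_sub_rank);
+
+/* clean up the communicator once we're done with it */
+MPI_Comm_free(&sub_comm);
+```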
+
+## Communication modes
+
+When sending data between ranks, MPI will use one of four communication modes: synchronous, buffered, ready or standard.
+When a communication function is called, it takes control of program execution until the send buffer is safe to be
+re-used again. What this means is that it's safe to re-use the memory/variable you passed without affecting the data
+that is still being sent. If MPI didn't have this concept of safety, then you could quite easily overwrite or destroy
+any data before it is transferred fully! This would lead to some very strange behaviour which would be hard to debug.
+The difference between the communication modes is when the buffer becomes safe to re-use. MPI won't guess at which mode
+*should* be used. That is up to the programmer. Therefore each mode has an associated communication function:
+
+| Mode        | Blocking function |
+| ----------- | ----------------- |
+| Synchronous | `MPI_Ssend()`     |
+| Buffered    | `MPI_Bsend()`     |
+| Ready       | `MPI_Rsend()`     |
+| Standard    | `MPI_Send()`      |

 In contrast to the four modes for sending data, receiving data only has one mode and therefore only a single function.

-| Mode | Description | MPI Function |
-| - | - | - |
-| Receive | Returns control when data has been received successfully | `MPI_Recv()` |
+| Mode    | MPI Function |
+| ------- | ------------ |
+| Receive | `MPI_Recv()` |
+
+### Synchronous sends
+
+In synchronous communication, control is returned when the receiving rank has received the data and sent back, or
+"posted", confirmation that the data has been received. It's like making a phone call. Data isn't exchanged until
+you and the person have both picked up the phone, had your conversation and hung the phone up.
+
+Synchronous communication is typically used when you need to guarantee synchronisation, such as in iterative methods or
+time-dependent simulations where it is vital to ensure consistency. It's also the easiest communication mode to develop
+and debug with because of its predictable behaviour.
+
+### Buffered sends

-### Blocking vs. non-blocking communication
+In a buffered send, the data is written to an internal buffer before it is sent, and control is returned as soon as the
+data has been copied. This means `MPI_Bsend()` returns before the data has been received by the receiving rank, making
+this an asynchronous type of communication as the sending rank can move onto its next task whilst the data is
+transmitted. This is just like sending a letter or an e-mail to someone. You write your message, put it in an envelope
+and drop it off in the postbox. You are blocked from doing other tasks whilst you write and send the letter, but as soon
+as it's in the postbox, you carry on with other tasks and don't wait for the letter to be delivered!
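+
+As a rough sketch of what this looks like in code (assuming MPI has been initialised, the usual headers such as `mpi.h`
+and `stdlib.h` are included, and rank 1 posts a matching receive; the message contents are purely illustrative),
+`MPI_Bsend()` needs us to attach a buffer that MPI can copy outgoing messages into:
+
+```c
+int message = 42;
+
+/* work out how much space one int needs in the buffer, plus MPI's bookkeeping overhead */
+int pack_size;
+MPI_Pack_size(1, MPI_INT, MPI_COMM_WORLD, &pack_size);
+int buffer_size = pack_size + MPI_BSEND_OVERHEAD;
+
+/* attach the buffer so MPI has somewhere to copy buffered messages */
+char *buffer = malloc(buffer_size);
+MPI_Buffer_attach(buffer, buffer_size);
+
+/* returns as soon as the message has been copied into the attached buffer */
+MPI_Bsend(&message, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
+
+/* detaching waits for any buffered messages to be delivered before returning */
+MPI_Buffer_detach(&buffer, &buffer_size);
+free(buffer);
+```
+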
-Communication can also be done in two additional ways: blocking and non-blocking. In blocking mode, communication
-functions will only return once the send buffer is ready to be re-used, meaning that the message has been both sent and
-received. In terms of a blocking synchronous send, control will not be passed back to the program until the message sent
-by rank A has reached rank B, and rank B has sent an acknowledgement back. If rank B is never listening for messages,
-rank A will become *deadlocked*. A deadlock happens when your program hangs indefinitely because the send (or receive)
-is unable to complete. Deadlocks occur for a countless number of reasons. For example, we may forget to write the
-corresponding receive function when sending data. Alternatively, a function may return earlier due to an error which
-isn't handled properly, or a while condition may never be met creating an infinite loop. Furthermore, ranks can
-sometimes crash silently making communication with them impossible, but this doesn't stop any attempts to send data to
-crashed rank.
+Buffered sends are good for large messages and for improving the performance of your communication patterns by taking
+advantage of the asynchronous nature of the data transfer.
+
+### Ready sends
+
+Ready sends are different to synchronous and buffered sends in that they need a rank to already be listening to receive
+a message, whereas the other two modes can send their data before a rank is ready. It's a specialised type of
+communication used **only** when you can guarantee that a rank will be ready to receive data. If this is not the case,
+the outcome is undefined and will likely result in errors being introduced into your program. The main advantage of this
+mode is that it eliminates the overhead of checking that the receiving rank is ready for the data, and so it is often
+used in performance-critical situations.
+
+You can imagine a ready send as being like talking to someone in the same room, who you think is listening. If they are
+listening, then the data is transferred. If it turns out they're absorbed in something else and not listening to you,
+then you may have to repeat yourself to make sure you transmit the information you wanted to!
+
+### Standard sends
+
+The standard send mode is the most commonly used type of send, as it provides a balance between ease of use and
+performance. Under the hood, the standard send is either a buffered or a synchronous send, depending on the availability
+of system resources (e.g. the size of the internal buffer) and which mode MPI has determined to be the most efficient.
+
+> ## Which mode should I use?
+>
+> Each communication mode has its own use cases where it excels. However, it is often easiest, at first, to use the
+> standard send, `MPI_Send()`, and optimise later. If the standard send doesn't meet your requirements, or if you need
+> more control over communication, then pick which communication mode suits your requirements best. You'll probably need
+> to experiment to find what works best!
 {: .callout}
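+
+Helpfully, the four send functions all take the same arguments, so switching between modes is usually just a case of
+changing the function name. The sketch below shows them side by side purely for comparison (assuming MPI has been
+initialised; a real program would pick one mode, pair each send with a matching `MPI_Recv()` on rank 1, and attach a
+buffer before using `MPI_Bsend()`):
+
+```c
+int data[5] = {1, 2, 3, 4, 5};
+
+/* arguments: send buffer, count, data type, destination rank, message tag, communicator */
+MPI_Ssend(data, 5, MPI_INT, 1, 0, MPI_COMM_WORLD); /* synchronous */
+MPI_Bsend(data, 5, MPI_INT, 1, 0, MPI_COMM_WORLD); /* buffered */
+MPI_Rsend(data, 5, MPI_INT, 1, 0, MPI_COMM_WORLD); /* ready */
+MPI_Send(data, 5, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* standard */
+```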
+
+> ## Communication mode summary
+>
+> | Mode        | Description                                                                                                                                                                  | Analogy                                        | MPI Function  |
+> | ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------- | ------------- |
+> | Synchronous | Returns control to the program when the message has been sent and received successfully.                                                                                    | Making a phone call                            | `MPI_Ssend()` |
+> | Buffered    | Returns control immediately after copying the message to a buffer, regardless of whether the receive has happened or not.                                                   | Sending a letter or e-mail                     | `MPI_Bsend()` |
+> | Ready       | Returns control immediately, assuming the matching receive has already been posted. Can lead to errors if the receive is not ready.                                         | Talking to someone you think/hope is listening | `MPI_Rsend()` |
+> | Standard    | Returns control when it's safe to reuse the send buffer. May or may not wait for the matching receive (synchronous mode), depending on MPI implementation and message size. | Phone call or letter                           | `MPI_Send()`  |
+{: .prereq}
+
+## Blocking and non-blocking communication
+
+In addition to the communication modes, communication is done in two ways: either by blocking execution
+until the communication is complete (like how a synchronous send blocks until a receive acknowledgement is sent back),
+or by returning immediately, before any part of the communication has finished, which is non-blocking communication.
+Just like with the different communication modes, MPI doesn't decide if it should use blocking or non-blocking
+communication calls. That is, again, up to the programmer to decide. As we'll see in later episodes, there are different
+functions for blocking and non-blocking communication.
+
+A blocking synchronous send is one where the message has to be sent from rank A, received by B and an acknowledgement
+sent back to A before the communication is complete and the function returns. In the non-blocking version, the function
+returns immediately even before rank A has sent the message or rank B has received it. It is still synchronous, so rank
+B still has to tell A that it has received the data. But all of this happens in the background, so other work can
+continue in the foreground while the data is transferred. It is then up to the programmer to check periodically if the
+communication is done -- and to not modify or use the data (or the memory it lives in) before the communication has
+completed.
+
+> ## Is `MPI_Bsend()` non-blocking?
+>
+> The buffered communication mode is a type of asynchronous communication, because the function returns before the data
+> has been received by another rank. But, it's not a non-blocking call **unless** you use the non-blocking version
+> `MPI_Ibsend()` (more on this later). Even though the data transfer happens in the background, allocating and copying
+> data to the send buffer happens in the foreground, blocking execution of our program. On the other hand,
+> `MPI_Ibsend()` is "fully" asynchronous because even allocating and copying data to the send buffer happens in the
+> background.
+{: .callout}
+
+One downside to blocking communication is that if rank B is never listening for messages, rank A will become
+*deadlocked*. A deadlock happens when your program hangs indefinitely because the send (or receive) is unable to
+complete. Deadlocks occur for a countless number of reasons. For example, we may forget to write the corresponding
+receive function when sending data. Or a function may return earlier due to an error which isn't handled properly, or a
+while condition may never be met creating an infinite loop. Ranks can also crash silently, making communication with
+them impossible, but this doesn't stop any attempts to send data to a crashed rank.

 > ## Avoiding communication deadlocks
 >
@@ -133,7 +307,7 @@ work.

 Non-blocking communication hands back control, immediately, before the communication has finished. 
Instead of your program being *blocked* by communication, ranks will immediately go back to the heavy work and instead periodically -check if there is data to receive (which you must remember to program) instead of waiting around. The advantage of this +check if there is data to receive (which is up to the programmer) instead of waiting around. The advantage of this communication pattern is illustrated in the diagram below, where less time is spent communicating. Non-blocking communication @@ -175,126 +349,3 @@ communication and calculation is often worth the more difficult implementation a > {: .challenge} -## Communicators - -Communication in MPI happens in something known as a *communicator*. We can think of a communicator as fundamentally -being a collection of ranks which are able to exchange data with one another. What this means is that every -communication between two (or more) ranks is linked to a specific communicator in the program. When we run an MPI -application, the ranks will belong to the default communicator known as `MPI_COMM_WORLD`. We've seen this in earlier -episodes when, for example, we've used functions like `MPI_Comm_rank()` to get the rank number, - -```c -int my_rank; -MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); /* MPI_COMM_WORLD is the communicator the rank belongs to */ -``` - -In addition to `MPI_COMM_WORLD`, we can make sub-communicators and distribute ranks into them. Messages can only be -sent and received to and from the same communicator, effectively isolating messages to a communicator. For most -applications, we usually don't need anything other than `MPI_COMM_WORLD`. But organising ranks into communicators can be -helpful in some circumstances, as you can create small "work units" of multiple ranks to dynamically schedule the -workload, or to help compartmentalise the problem into smaller chunks by using a [virtual cartesian -topology](https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node192.htm#Node192). Throughout this lesson, we will -stick to using `MPI_COMM_WORLD`. - - - -## Basic MPI data types - -To send a message, we need to know the size of it. The size is not the number of bytes of data that is being sent but is -instead expressed as the number of elements of a particular data type you want to send. So when we send a message, we -have to tell MPI how many elements of "something" we are sending and what type of data it is. If we don't do this -correctly, we'll either end up telling MPI to send only *some* of the data or try to send more data than we want! For -example, if we were sending an array and we specify too few elements, then only a subset of the array will be sent or -received. But if we specify too many elements, than we are likely to end up with either a segmentation fault or some -undefined behaviour! If we don't specify the correct data type, then bad things will happen under the hood when it -comes to communicating. - -There are two types of data type in MPI: basic MPI data types and derived data types. The basic data types are in -essence the same data types we would use in C (or Fortran), such as `int`, `float`, `bool` and so on. When defining the -data type of the elements being sent, we don't use the primitive C types. MPI instead uses a set of compile-time -constants which internally represents the data types. 
These data types are in the table below: - -| MPI basic data type | C equivalent | -| - | - | -| MPI_SHORT | short int | -| MPI_INT | int | -| MPI_LONG | long int | -| MPI_LONG_LONG | long long int | -| MPI_UNSIGNED_CHAR | unsigned char | -| MPI_UNSIGNED_SHORT |unsigned short int | -| MPI_UNSIGNED | unsigned int | -| MPI_UNSIGNED_LONG | unsigned long int | -| MPI_UNSIGNED_LONG_LONG | unsigned long long int | -| MPI_FLOAT | float | -| MPI_DOUBLE | double | -| MPI_LONG_DOUBLE | long double | -| MPI_BYTE | char | - -These constants don't expand out to actual date types, so we can't use them in variable declarations, e.g., - -```c -MPI_INT my_int; -``` - -is not valid code because under the hood, these constants are actually special structs used internally. Therefore we can -only uses these expressions as arguments in MPI functions. - -> ## Don't forget to update your types -> -> At some point during development, you might change an `int` to a `long` or a `float` to a `double`, or something to -> something else. Once you've gone through your codebase and updated the types for, e.g., variable declarations and -> function signatures, you must also do the same for MPI functions. If you don't, you'll end up running into -> communication errors. It may be helpful to define compile-time constants for the data types and use those instead. If -> you ever do need to change the type, you would only have to do it in one place. -> -> ```c -> /* define constants for your data types */ -> #define MPI_INT_TYPE MPI_INT -> #define INT_TYPE int -> /* use them as you would normally */ -> INT_TYPE my_int = 1; -> ``` -> -{: .callout} - -Derived data types are data structures which you define, built using the basic MPI data types. These derived types are -analogous to defining structures or type definitions in C. They're most often helpful to group together similar data to -send/receive multiple things in a single communication, or when you need to communicate non-contiguous data such as -"vectors" or sub-sets of an array. This will be covered in the [Advanced Communication -Techniques](dirac-intro-to-mpi-advanced-communication) episode. - -> ## What type should you use? -> -> For the following pieces of data, what MPI data types should you use? -> -> 1. `a[] = {1, 2, 3, 4, 5};` -> 2. `a[] = {1.0, -2.5, 3.1456, 4591.223, 1e-10};` -> 3. `a[] = "Hello, world!";` -> -> > ## Solution -> > -> > 1. `MPI_INT` -> > 2. `MPI_DOUBLE` - `MPI_FLOAT` would not be correct as `float`'s contain 32 bits of data whereas `double`'s -> > are 64 bit. -> > 3. `MPI_BYTE` or `MPI_CHAR` - you may want to use [strlen](https://man7.org/linux/man-pages/man3/strlen.3.html) to -> > calculate how many elements of `MPI_CHAR` being sent -> {: .solution} -{: .challenge} diff --git a/_episodes/05-collective-communication.md b/_episodes/05-collective-communication.md index eecafe1..bd1f10d 100644 --- a/_episodes/05-collective-communication.md +++ b/_episodes/05-collective-communication.md @@ -63,8 +63,11 @@ int main(int argc, char** argv) { For it's use case, the code above works perfectly fine. However, it isn't very efficient when you need to communicate large amounts of data, have lots of ranks, or when the workload is uneven (due to the blocking communication). It's also -a lot of code to do not much, which makes it easy to introduce mistakes in our code. A common mistake in this example -would be to start the loop over ranks from 0, which would cause a deadlock! 
+a lot of code to not do very much, which makes it easy to introduce mistakes in our code.
+
+Another mistake in this example would be to start the loop over ranks from 0, which would cause a deadlock! It's
+actually quite a common mistake for new MPI users to write something like the above.
+
 We don't need to write code like this (unless we want *complete* control over the data communication), because MPI has
 access to collective communication functions to abstract all of this code for us. The above code can be replaced by a