From 8bb9cc8c0c711f636b32a0c3f3e3105cab60c1b3 Mon Sep 17 00:00:00 2001 From: Lu Weizheng Date: Mon, 12 Feb 2024 17:35:01 +0800 Subject: [PATCH] mpi --- .gitignore | 1 - _toc.yml | 15 +- ch-dask-dataframe/read-write.ipynb | 21 +- ch-mpi/broadcast.py | 10 +- ch-mpi/collective.ipynb | 69 +++--- ch-mpi/index.md | 2 +- ch-mpi/master-worker.py | 6 +- ch-mpi/mpi-hello-world.ipynb | 63 ++++-- ch-mpi/mpi-intro.md | 50 ++-- ch-mpi/point-to-point.ipynb | 121 +++++----- ch-mpi/rectangle-pi.py | 6 +- ch-mpi/remote-memory-access.ipynb | 44 ++-- ch-mpi/send-np.py | 6 +- conf.py | 16 +- .../parquet-row-group.drawio | 151 ------------ drawio/ch-dask-dataframe/parquet.drawio | 133 +++++++++++ drawio/ch-intro/distributed-timeline.drawio | 214 ++++++++++++++++++ drawio/ch-mpi/blocking.drawio | 30 +-- drawio/ch-mpi/communications.drawio | 8 +- drawio/ch-mpi/non-blocking.drawio | 36 +-- img/ch-dask-dataframe/parquet-row-group.svg | 4 - img/ch-dask-dataframe/parquet.svg | 4 + img/ch-intro/distributed-timeline.svg | 4 + img/ch-mpi/blocking.svg | 2 +- img/ch-mpi/communications.svg | 2 +- img/ch-mpi/non-blocking.svg | 2 +- 26 files changed, 630 insertions(+), 390 deletions(-) delete mode 100644 drawio/ch-dask-dataframe/parquet-row-group.drawio create mode 100644 drawio/ch-dask-dataframe/parquet.drawio create mode 100644 drawio/ch-intro/distributed-timeline.drawio delete mode 100644 img/ch-dask-dataframe/parquet-row-group.svg create mode 100644 img/ch-dask-dataframe/parquet.svg create mode 100644 img/ch-intro/distributed-timeline.svg diff --git a/.gitignore b/.gitignore index 6d7f471..f35cbdf 100644 --- a/.gitignore +++ b/.gitignore @@ -7,7 +7,6 @@ data/ *.DS_Store *.csv *egg-info* -dist* _build/* test*.md .idea diff --git a/_toc.yml b/_toc.yml index 863f17e..7329a4b 100644 --- a/_toc.yml +++ b/_toc.yml @@ -22,7 +22,6 @@ subtrees: - file: ch-dask/task-graph-partitioning - file: ch-dask-dataframe/index entries: - - file: ch-dask-dataframe/dask-pandas - file: ch-dask-dataframe/read-write # - file: ch-ray-core/index # entries: @@ -37,13 +36,13 @@ subtrees: # - file: ch-ray-data/data-load-inspect-save # - file: ch-ray-data/data-transform # - file: ch-ray-data/preprocessor - # - file: ch-mpi/index - # entries: - # - file: ch-mpi/mpi-intro - # - file: ch-mpi/mpi-hello-world - # - file: ch-mpi/point-to-point - # - file: ch-mpi/collective - # - file: ch-mpi/remote-memory-access + - file: ch-mpi/index + entries: + - file: ch-mpi/mpi-intro + - file: ch-mpi/mpi-hello-world + - file: ch-mpi/point-to-point + - file: ch-mpi/collective + - file: ch-mpi/remote-memory-access # - file: ch-mpi-large-model/index # entries: # - file: ch-mpi-large-model/nccl diff --git a/ch-dask-dataframe/read-write.ipynb b/ch-dask-dataframe/read-write.ipynb index e9a3cf4..6412595 100644 --- a/ch-dask-dataframe/read-write.ipynb +++ b/ch-dask-dataframe/read-write.ipynb @@ -1257,7 +1257,16 @@ "* Embedded schema\n", "* Data compression\n", "\n", - "Columnar storage organizes data by columns instead of rows. Specifically, unlike CSV, which stores data by rows, Parquet stores data via columns. Data analysts are usually interested in specific columns rather than all of them. Parquet allows convenient filtering of unnecessary columns during data reading, resulting in improved performance by reducing the amount of data to be read. Parquet is extensively used in the Apache Spark, Apache Hive, and Apache Flink ecosystems. Parquet inherently includes schema information, embedding metadata such as column names and data types for each column within each Parquet file. 
This feature eliminates inaccuracies in data types inferring of Dask DataFrame. Data in Parquet is compressed, making it more space-efficient compared to CSV.\n",
+    "As shown in {numref}`parquet-with-row-group`, columnar storage organizes data by columns instead of rows. Data analysts are usually interested in specific columns rather than all of them. Parquet allows filtering of unnecessary columns during data reading, resulting in enhanced performance by reducing the amount of data to be read. Parquet is extensively used in the Apache Spark, Apache Hive, and Apache Flink ecosystems. Parquet inherently includes schema information, embedding metadata such as column names and data types for each column within each Parquet file. This feature eliminates inaccuracies in data type inference by Dask DataFrame. Data in Parquet is compressed, making it more space-efficient compared to CSV.\n",
+    "\n",
+    "\n",
+    "```{figure} ../img/ch-dask-dataframe/parquet.svg\n",
+    "---\n",
+    "width: 600px\n",
+    "name: parquet-with-row-group\n",
+    "---\n",
+    "A Parquet file consists of at least one Row Group. The data within a Row Group is further divided into Column Chunks, each dedicated to storing a specific column.\n",
+    "```\n",
     "\n",
     "For instance, it is advisable to read only the necessary columns rather than all of them:\n",
     "\n",
@@ -1268,17 +1277,9 @@
     ")\n",
     "```\n",
     "\n",
-    "Additionally, Parquet introduces the concept of Row Groups, as illustrated in {numref}`parquet-row-group`. Data in a Parquet file is grouped into Row Groups, defining the number of rows within each group. In the example, there are three Row Groups, each containing two rows of data. Each Row Group stores metadata such as maximum and minimum values of their columns. When querying certain columns, metadata helps determine whether to read a specific Row Group, minimizing unnecessary data retrieval.\n",
+    "Additionally, Parquet introduces the concept of Row Groups, as illustrated in {numref}`parquet-with-row-group`. Data in a Parquet file is grouped into Row Groups, defining the number of rows within each group. In the example, there are three Row Groups, each containing two rows of data. Each Row Group stores metadata such as maximum and minimum values of its columns. When querying certain columns, metadata helps determine whether to read a specific Row Group, minimizing unnecessary data retrieval.\n",
     "For example, if a column represents a time series, and a query involves \"sales from 9:00 to 12:00 every day,\" the metadata within the Row Group records the maximum and minimum values of the time column. By utilizing this metadata, it becomes possible to determine whether it is necessary to read the specific Row Group.\n",
     "\n",
-    "```{figure} ../img/ch-dask-dataframe/parquet-row-group.svg\n",
-    "---\n",
-    "width: 600px\n",
-    "name: parquet-row-group\n",
-    "---\n",
-    "Parquet Columnar Storage with Row Groups\n",
-    "```\n",
-    "\n",
     "With Row Groups, a Parquet file may contain multiple groups. 
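To make the Row Group discussion concrete, here is a sketch of how such metadata can be exploited from Dask. The `filters` argument of `dd.read_parquet` pushes a predicate down to the Parquet reader; the `tpep_pickup_datetime` column and the 9:00 to 12:00 window below are assumptions for illustration and do not appear in the original example.

```python
import datetime

import dask.dataframe as dd

# Sketch: read only two columns and push a time-window predicate down to
# the Parquet reader. Row Groups whose min/max statistics for the assumed
# `tpep_pickup_datetime` column fall outside the window can be skipped
# instead of being read and then discarded.
ddf = dd.read_parquet(
    "yellow_tripdata_2023-01.parquet",
    columns=["trip_distance", "total_amount"],
    filters=[
        ("tpep_pickup_datetime", ">=", datetime.datetime(2023, 1, 1, 9, 0)),
        ("tpep_pickup_datetime", "<", datetime.datetime(2023, 1, 1, 12, 0)),
    ],
)
```

Such filters only pay off when a file contains several Row Groups whose column statistics actually differ.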
However, many Parquet files have only one Row Group.\n", "\n", "Typically, enterprise data files are split into multiple Parquet files and organized in a specific manner, such as by time:\n", diff --git a/ch-mpi/broadcast.py b/ch-mpi/broadcast.py index c96c91d..a613af0 100644 --- a/ch-mpi/broadcast.py +++ b/ch-mpi/broadcast.py @@ -7,12 +7,12 @@ N = 5 if comm.rank == 0: - A = np.arange(N, dtype=np.float64) # rank 0 初始化数据到变量 A + A = np.arange(N, dtype=np.float64) # rank 0 initializes data into variable A else: - A = np.empty(N, dtype=np.float64) # 其他节点的变量 A 为空 + A = np.empty(N, dtype=np.float64) # As on other processes are empty -# 广播 +# Broadcast comm.Bcast([A, MPI.DOUBLE]) -# 验证所有节点上的 A -print("Rank: %d, data: %s" % (comm.rank, A)) \ No newline at end of file +# Print to verify +print("Rank:%2d, data:%s" % (comm.rank, A)) \ No newline at end of file diff --git a/ch-mpi/collective.ipynb b/ch-mpi/collective.ipynb index 9751f1f..64c41c1 100644 --- a/ch-mpi/collective.ipynb +++ b/ch-mpi/collective.ipynb @@ -5,27 +5,27 @@ "metadata": {}, "source": [ "(mpi-collective)=\n", - "# 集合通信\n", + "# Collective Communication\n", "\n", - "{ref}`mpi-point2point` 介绍了点对点通信,即发送方和接收方之间相互传输数据。本节主要介绍一种全局的通信方式:集合通信,即在一个组里的多个进程间同时传输数据。集合通信目前只有阻塞的方式。\n", + "In the {ref}`mpi-point2point` section, we discussed point-to-point communication, which involves the mutual transfer of data between sender and receiver. This section focuses on a different type of communication – collective communication, where data is simultaneously transmitted among multiple processes within a group. Collective communication only supports blocking modes.\n", "\n", - "常用的集合通信主要有以下几类:\n", + "The commonly used collective communications include:\n", "\n", - "* 同步,比如 [`Comm.Barrier`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Barrier)。\n", - "* 数据移动,比如 [`Comm.Bcast`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Bcast), [`Comm.Scatter`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Scatter),[`Comm.Gather`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Gather)等。\n", - "* 集合计算,比如 [`Comm.Reduce`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Reduce), [`Intracomm.Scan`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Intracomm.html#mpi4py.MPI.Intracomm.Scan) 等。\n", + "* Synchronization: For example, [`Comm.Barrier`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Barrier).\n", + "* Data Movement: Such as [`Comm.Bcast`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Bcast), [`Comm.Scatter`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Scatter), [`Comm.Gather`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Gather), etc.\n", + "* Collective Computation: Including [`Comm.Reduce`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Reduce), [`Intracomm.Scan`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Intracomm.html#mpi4py.MPI.Intracomm.Scan), etc.\n", "\n", - "首字母大写的函数是基于缓存的,比如 `Comm.Bcast`, `Comm.Scatter`,`Comm.Gather`,[`Comm.Allgather`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Allgather), 
[`Comm.Alltoall`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Alltoall)。首字母小写的函数可以收发 Python 对象,比如 [`Comm.bcast`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.bcast),[`Comm.scatter`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.scatter),[`Comm.gather`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.gather),[`Comm.allgather`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.allgather), [`Comm.alltoall`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.alltoall)。\n", + "Functions with uppercase initial letters are based on buffers, such as `Comm.Bcast`, `Comm.Scatter`, `Comm.Gather`, [`Comm.Allgather`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Allgather), [`Comm.Alltoall`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Alltoall). Functions with lowercase initial letters can send and receive Python objects, such as [`Comm.bcast`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.bcast), [`Comm.scatter`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.scatter), [`Comm.gather`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.gather), [`Comm.allgather`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.allgather), [`Comm.alltoall`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.alltoall).\n", "\n", - "## 同步\n", + "## Synchronization\n", "\n", - "MPI 的计算分布在多个进程,每个进程的计算速度有快有慢。`Comm.Barrier` 对 Communicator 里所有进程都执行同步等待,就像 Barrier 的英文名一样,设置一个屏障,等待所有进程都执行完。计算比较快的进程们达到 `Comm.Barrier`,无法执行 `Comm.Barrier` 之后的计算逻辑,必须等待其他所有进程都到达这里才行。\n", + "MPI computations are distributed across multiple processes, each with varying computational speeds. `Comm.Barrier` forces synchronization, as the name suggests, by setting up a barrier for all processes within the communicator. Faster processes reaching `Comm.Barrier` cannot proceed with the subsequent code logic until all other processes have also reached this point. It acts as a synchronization point, ensuring that all processes complete their computations before moving forward.\n", "\n", - "## 数据移动\n", + "## Data Movement\n", "\n", - "### 广播\n", + "### Broadcast\n", "\n", - "`Comm.Bcast` 将数据从一个发送者全局广播给组里所有进程,广播操作适用于需要将同一份数据发送给所有进程的场景,例如将一个全局变量的值发送给所有进程,如 {numref}`mpi-broadcast` 所示。\n", + "`Comm.Bcast` globally broadcasts data from one sender to all processes within the group. 
Broadcast operations are useful in scenarios where the same data needs to be sent to all processes, such as broadcasting the value of a global variable to all processes, as illustrated in {numref}`mpi-broadcast`.\n", "\n", "```{figure} ../img/ch-mpi/broadcast.svg\n", "---\n", @@ -34,9 +34,10 @@ "---\n", "Broadcast\n", "```\n", - "### 案例1:广播\n", "\n", - "{numref}`mpi-broadcast-py` 对如何将一个 NumPy 数组广播到所有的进程中进行了演示\n", + "### Example 1: Broadcast\n", + "\n", + "The example in {numref}`mpi-broadcast-py` demonstrates how to broadcast a NumPy array to all processes.\n", "\n", "```{code-block} python\n", ":caption: broadcast.py\n", @@ -51,14 +52,14 @@ "\n", "N = 5\n", "if comm.rank == 0:\n", - " A = np.arange(N, dtype=np.float64) # rank 0 初始化数据到变量 A\n", + " A = np.arange(N, dtype=np.float64) # rank 0 initializes data into variable A\n", "else:\n", - " A = np.empty(N, dtype=np.float64) # 其他节点的变量 A 为空\n", + " A = np.empty(N, dtype=np.float64) # As on other processes are empty\n", "\n", - "# 广播\n", + "# Broadcast\n", "comm.Bcast([A, MPI.DOUBLE])\n", "\n", - "# 验证所有节点上的 A\n", + "# Print to verify\n", "print(\"Rank:%2d, data:%s\" % (comm.rank, A))\n", "```" ] @@ -87,24 +88,25 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Scatter 和 Gather\n", + "### Scatter and Gather\n", + "\n", + "`Comm.Scatter` and `Comm.Gather` are a pair of corresponding operations:\n", "\n", - "`Comm.Scatter` 和 `Comm.Gather` 是一组相对应的操作,如\n", + "* `Comm.Scatter` scatters data from one process to all processes within the group. A process divides the data into multiple chunks, with each chunk sent to the corresponding process. Other processes receive and store their respective chunks. Scatter operations are suitable for partitioning a larger dataset into multiple smaller chunks.\n", "\n", - "`Comm.Scatter` 将数据从一个进程分散到组中的所有进程,一个进程将数据分散成多个块,每个块发送给对应的进程。其他进程接收并存储各自的块。Scatter 操作适用于将一个较大的数据集分割成多个小块。\n", - "`Comm.Gather` 与 `Comm.Scatter` 相反,将组里所有进程的小数据块归集到一个进程上。\n", + "* `Comm.Gather`, on the contrary, gathers small data chunks from all processes in the group to one process.\n", "\n", "```{figure} ../img/ch-mpi/scatter-gather.svg\n", "---\n", "width: 600px\n", "name: mpi-scatter-gather\n", "---\n", - "Scatter 与 Gather\n", + "Scatter and Gather\n", "```\n", "\n", - "### 案例2:Scatter\n", + "### Example 2: Scatter\n", "\n", - "{numref}`mpi-scatter` 演示了如何使用 Scatter 将数据分散到所有进程。\n", + "The example in {numref}`mpi-scatter` demonstrates how to use Scatter to distribute data to all processes.\n", "\n", "```{code-block} python\n", ":caption: scatter.py\n", @@ -158,11 +160,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Allgather 和 Alltoall\n", + "### Allgather and Alltoall\n", "\n", - "另外两个比较复杂的操作是 `Comm.Allgather` 和 `Comm.Alltoall`。\n", + "Two more complex operations are `Comm.Allgather` and `Comm.Alltoall`.\n", "\n", - "`Comm.Allgather` 是 `Comm.Gather` 的进阶版,如 {numref}`mpi-allgather` 所示,它把散落在多个进程的多个小数据块发送给每个进程,每个进程都包含了一份相同的数据。\n", + "`Comm.Allgather` is an advanced version of `Comm.Gather`, as shown in {numref}`mpi-allgather`. 
It sends multiple small data chunks scattered across various processes to every process, ensuring that each process contains an identical set of data.\n", "\n", "```{figure} ../img/ch-mpi/allgather.svg\n", "---\n", @@ -172,7 +174,8 @@ "Allgather\n", "```\n", "\n", - "`Comm.Alltoall` 是 `Comm.Scatter` 的 `Comm.Gather` 组合,如 {numref}`mpi-alltoall` 所示,先进行 `Comm.Scatter`,再进行 `Comm.Gather`。如果把数据看成一个矩阵,`Comm.Alltoall` 又可以被看做是一种全局的转置(Transpose)操作。\n", + "\n", + "`Comm.Alltoall` is a combination of `Comm.Scatter` and `Comm.Gather`, as illustrated in {numref}`mpi-alltoall`. It first performs `Comm.Scatter` and then follows with `Comm.Gather`. If the data is viewed as a matrix, `Comm.Alltoall` can be considered a global transpose operation.\n", "\n", "```{figure} ../img/ch-mpi/alltoall.svg\n", "---\n", @@ -182,13 +185,13 @@ "Alltoall\n", "```\n", "\n", - "## 集合计算\n", + "## Collective Computation\n", "\n", - "集合计算指的是在将散落在不同进程的数据聚合在一起的时候,对数据进行计算,比如 `Comm.Reduce` 和 [`Intracomm`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Intracomm.html) 等。如 {numref}`mpi-reduce` 和 {numref}`mpi-scan` 所示,数据归集到某个进程时,还执行了聚合函数 `f`,常用的聚合函数有求和 [`MPI.SUM`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.SUM.html) 等。\n", + "Collective computation refers to performing computations on data when aggregating scattered data from different processes, such as `Comm.Reduce` and [`Intracomm`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Intracomm.html). As shown in {numref}`mpi-reduce` and {numref}`mpi-scan`, when data is gathered to a specific process, an aggregation function `f` is applied. Common aggregation functions include summation [`MPI.SUM`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.SUM.html) and others.\n", "\n", "```{figure} ../img/ch-mpi/reduce.svg\n", "---\n", - "width: 600px\n", + "width: 500px\n", "name: mpi-reduce\n", "---\n", "Reduce\n", @@ -196,7 +199,7 @@ "\n", "```{figure} ../img/ch-mpi/scan.svg\n", "---\n", - "width: 600px\n", + "width: 500px\n", "name: mpi-scan\n", "---\n", "Scan\n", diff --git a/ch-mpi/index.md b/ch-mpi/index.md index aa402c5..9fe0656 100644 --- a/ch-mpi/index.md +++ b/ch-mpi/index.md @@ -1,4 +1,4 @@ -# MPI +# MPI for Python ```{tableofcontents} ``` \ No newline at end of file diff --git a/ch-mpi/master-worker.py b/ch-mpi/master-worker.py index d5346fd..e1a0d93 100644 --- a/ch-mpi/master-worker.py +++ b/ch-mpi/master-worker.py @@ -6,15 +6,15 @@ size = comm.Get_size() if rank < size - 1: - # Worker 进程 + # Worker process np.random.seed(rank) - # 随机生成 + # Generate random data data_count = np.random.randint(100) data = np.random.randint(100, size=data_count) comm.send(data, dest=size - 1) print(f"Worker: worker ID: {rank}; count: {len(data)}") else: - # Master 进程 + # Master process for i in range(size - 1): status = MPI.Status() data = comm.recv(source=MPI.ANY_SOURCE, status=status) diff --git a/ch-mpi/mpi-hello-world.ipynb b/ch-mpi/mpi-hello-world.ipynb index 5b6fd4e..2a029be 100644 --- a/ch-mpi/mpi-hello-world.ipynb +++ b/ch-mpi/mpi-hello-world.ipynb @@ -7,31 +7,43 @@ "(mpi-hello-world)=\n", "# MPI Hello World\n", "\n", - "## 通信模式\n", + "## Communication Models\n", "\n", - "MPI 提供的能力更加底层,对于通信模式,通常有两种:双边和单边。\n", + "There are two types of communication models: one-sided and two-sided.\n", "\n", - "* 双边(Cooperative):通信双方均同意数据的收发。发送进程调用发送函数,接收进程调用接收函数。\n", - "* 单边(One-sided):一方远程读或者写另一方的数据,一方无需等待另一方。\n", + "* One-Sided: One party can remotely read or write data on the other party without requiring the other party.\n", + "\n", + "* Two-Sided. 
Both parties agree to exchange data. The sending process invokes the send function, and the receiving process invokes the receive function.\n", "\n", "```{figure} ../img/ch-mpi/communications.svg\n", "---\n", "width: 600px\n", "name: mpi-communications\n", "---\n", - "两种通讯模式:双边和单边\n", - "```\n", - "\n", - "## World 和 Rank\n", + "Two-Sided v.s. One-Sided\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## World and Rank\n", "\n", - "在进行 MPI 编程时,进程之间要互相通信,我们首先要解决两个问题:在 MPI 的众多进程中,“我是谁?”,除了我,“还有谁?”。MPI 标准中, [`MPI_Comm_rank`](https://learn.microsoft.com/en-us/message-passing-interface/mpi-comm-rank-function) 定义了我是谁。[`MPI_COMM_WORLD`](https://learn.microsoft.com/en-us/message-passing-interface/mpi-comm-size-function) 定义了还有谁的问题。 开始一个 MPI 程序时,要先定义一个 World,这个世界里有 `size` 个进程,每个进程有一个识别自己的号码,这个号码被称为 Rank,Rank 是 0 到 `size - 1` 的整数。更严肃地阐述:\n", + "In MPI programming, when processes need to communicate with each other, two fundamental questions must be addressed: \"Who am I in the MPI world?\" and \"Who else is in the MPI world?\" In the MPI standard, [`MPI_Comm_rank`](https://learn.microsoft.com/en-us/message-passing-interface/mpi-comm-rank-function) defines \"Who am I?\" and [`MPI_COMM_WORLD`](https://learn.microsoft.com/en-us/message-passing-interface/mpi-comm-size-function) answers \"Who else?\" When starting an MPI program, you should first define a World. This world consists of `size` processes, each assigned a unique number known as rank. Ranks are integers ranging from 0 to `size` - 1. Formally:\n", "\n", - "* MPI 中的 World 是指所有参与并行计算的进程的总集合。在一个 MPI 程序中,所有的进程都属于一个默认的通信域,这个通信域就被称为 `MPI_COMM_WORLD`。所有在这个通信域中的进程都可以进行通信。\n", - "* World 中的每个进程都有一个唯一的 Rank,Rank 用来标识进程在通信域中的位置。由于每个进程有自己的 Rank 号码,那在编程时,可以控制,使得 Rank 为 0 的进程发送数据给 Rank 为 1 的进程。\n", + "* In MPI, the World refers to the total set of processes involved in parallel computation. In an MPI program, all processes belong to a default communication group known as `MPI_COMM_WORLD`. All processes within this communication group can communicate with each other.\n", "\n", - "## 案例:Hello World\n", + "* Each process in the World has a unique rank, which is used to identify the process in the communication group. As each process has its own rank number, we can program the process with rank 0 to send data to the process with rank 1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example: Hello World\n", "\n", - "{numref}`mpi-hello` 使用一个简单的例子来演示 MPI 编程。\n", + "{numref}`mpi-hello` uses a simple example to demonstrate MPI programming.\n", "\n", "```{code-block} python\n", ":caption: hello.py\n", @@ -46,9 +58,9 @@ "comm.Barrier()\n", "```\n", "\n", - "在这段程序中,`print` 是上单个进程内执行的,打印出当前进程的 Rank 和主机名。[`comm.Barrier()`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Barrier) 将每个进程做了阻断,直到所有进程执行完毕后,才会进行下面的操作。本例中,`comm.Barrier()` 后无其他操作,程序将退出。\n", + "In this program, the `print` statement is executed within each individual process, displaying the rank and hostname of the current process. [`comm.Barrier()`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.Barrier) halts each process until all processes have completed their execution, then proceeds. 
In this example, after `comm.Barrier()`, there are no further operations, and the program exits.\n", "\n", - "如果在个人电脑上,启动 8 个进程,在命令行中执行:" + "If you run 8 processes on your personal computer, execute the following command in the terminal:" ] }, { @@ -79,32 +91,37 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "不同厂商的 `mpiexec` 的参数会有一些区别。相比 C/C++ 或 Fortran,mpi4py 的方便之处在于不需要使用 `mpicc` 编译器进行编译,直接执行即可。\n", + "`mpiexec` of vendors may be slightly different. You can check the parameters of `mpiexec` from vendors' documentation.\n", "\n", - "如果有一个集群,且集群挂载了一个共享的文件系统,即集群上每个节点上的特定目录的内容是一样的,`hello.py` 和 `mpiexec` 是一模一样的。可以这样拉起:\n", + "If you have a cluster with a shared file system mounted on each node, the content in the `hello.py` folder on each node is identical. You can launch it as follows:\n", "\n", "```bash\n", "mpiexec –hosts h1:4,h2:4,h3:4,h4:4 –n 16 python hello.py\n", "```\n", "\n", - "这个启动命令一共在 16 个进程上执行,16 个进程分布在 4 个计算节点上,每个节点使用了 4 个进程。如果节点比较多,还可以单独编写一个节点信息文件,比如命名为 `hf`,内容为:\n", + "This launch command is executed with 16 processes, distributed across 4 computing nodes, each of which starts 4 processes. If there are more nodes, you can create a separate node file, for instance, named `hf`, with the following content:\n", "\n", "```\n", "h1:8\n", "h2:8\n", "```\n", "\n", - "这样执行:\n", + "And launch it:\n", "\n", "```\n", "mpiexec –hostfile hf –n 16 python hello.py\n", - "```\n", - "\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "## Communicator\n", "\n", - "刚才我们提到了 World 的概念,并使用了 `MPI_COMM_WORLD`,更准确地说,`MPI_COMM_WORLD` 是一个 Communicator。MPI 将进程划分到不同的组(Group)中,每个 Group 有不同的 Color,Group 和 Color 共同组成了 Communicator,或者说 Communicator 是 Group + Color 的名字,一个默认的 Communicator 就是 `MPI_COMM_WORLD`。\n", + "We mentioned the concept of `MPI_COMM_WORLD`. More accurately, `MPI_COMM_WORLD` is a communicator. MPI divides processes into different Groups, and each Group has a different Color. The combination of Group and Color forms a Communicator. In other words, a communicator is the name of Group + Color. The predefined communicator is `MPI_COMM_WORLD`.\n", "\n", - "对于一个进程,它可能在不同的 Communicator 中,因此它在不同 Communicator 中的 Rank 可能也不一样。{numref}`mpi-communicatitor` (a) 、(b) 和(c)为三个 Communicator,圆圈为进程。当我们启动一个 MPI 程序时,就创建了默认的 Communicator(`MPI_COMM_WORLD`),如 {numref}`mpi-communicatitor` (a) 所示。每个进程在 Communicator 内被分配了一个 Rank 号码,图中圆圈上的数字是进程在这个 Communicator 中的 Rank。同样的进程可以被划归到不同的 Communicator 中,且相同的进程在不同 Communicator 的 Rank 可以不一样,如 {numref}`mpi-communicatitor` (b)和(c)所示。每个 Communicator 内进程通信是相互独立的。大部分简单的程序,`MPI_COMM_WORLD` 就足够了。\n", + "A process may belong to different communicators, so its rank in different communicators may differ. {numref}`mpi-communicatitor` (a), (b), and (c) represent three communicators, with circles representing processes. When we start an MPI program, the default communicator (`MPI_COMM_WORLD`) is created, as shown in {numref}`mpi-communicatitor` (a). Each process is assigned a rank number within this communicator, and the numbers on the circles in the diagram represent the rank of processes in this communicator. The same process can be assigned to different communicators, and the rank of the same process in different communicators may be different, as illustrated in {numref}`mpi-communicatitor` (b) and (c). Communication between processes within each communicator is independent. Messages inf one communicator will not affect messages in another. 
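As a minimal sketch of how a custom communicator might be created with mpi4py's `Comm.Split`: the even/odd coloring below is an illustrative choice, not part of the original example.

```python
from mpi4py import MPI

world = MPI.COMM_WORLD

# Illustrative split: even world ranks form one sub-communicator (color 0),
# odd world ranks form another (color 1). `key` decides the rank order
# inside each new communicator.
color = world.Get_rank() % 2
sub_comm = world.Split(color=color, key=world.Get_rank())

print(f"world rank: {world.Get_rank()}, color: {color}, "
      f"sub rank: {sub_comm.Get_rank()} of {sub_comm.Get_size()}")

sub_comm.Free()
```

Collective operations issued on `sub_comm` then involve only the processes that share the same color.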
For most MPI programs, there is no need to create other communicators, the predefined `MPI_COMM_WORLD` is sufficient.\n", "\n", "```{figure} ../img/ch-mpi/communicator.svg\n", "---\n", diff --git a/ch-mpi/mpi-intro.md b/ch-mpi/mpi-intro.md index 87f2727..9473e19 100644 --- a/ch-mpi/mpi-intro.md +++ b/ch-mpi/mpi-intro.md @@ -1,57 +1,55 @@ (mpi-intro)= -# MPI 简介 +# Introduction to MPI -Message Passing Interface(MPI)是个经典的并行计算工具,由于它的“年龄”比较老,新一代程序员很少听说过这个“老古董”,也经常忽视其重要性。但随着人工智能大模型浪潮的到来,MPI 或者基于 MPI 思想的各类通讯库再次回到人们的视线内,因为大模型必须使用并行计算框架进行跨机通信。比如,大模型训练框架 deepspeed 就使用了 mpi4py 进行多机通信。 +Message Passing Interface (MPI) is a classic parallel computing tool. Due to its age, it may be unfamiliar to new generations of programmers who often overlook its importance. However, with the advent of large neural models, MPI or communication libraries based on the MPI philosophy have regained attention. This is because large models necessitate parallel computing frameworks for multi-machine communication. For instance, the [DeepSpeed](https://github.com/microsoft/DeepSpeed) framework utilizes mpi4py for multi-node communication. -## 历史 +## History -MPI 的发展可以追溯到20世纪80年代末和90年代初,彼时已经开始出现了超级计算机,主要用于科学和工程计算,包括气象模拟、核能研究、分子建模、流体动力学等领域。与现在的大数据集群一样,超级计算机主要是一组高性能计算机组成的集群。为了完成上述科学和工程计算问题,使得程序得以在多台计算机上并行运行。MPI 出现之前,多个研究小组和机构开始独立开发并推广自己的通信库,但这导致了互操作性和可移植性的问题。因此,社区迫切需要一种标准化的方法来编写并行应用程序。 +The development of MPI dates back to the late 1980s and early 1990s, with the emergence of supercomputers primarily used for scientific and engineering computations in fields such as weather forecasting, nuclear research, molecular dynamics, etc. Before MPI, to enable programs running in parallel, various research groups and organizations independently developed their communication libraries. However, this led to interoperability and portability issues. Consequently, there was a pressing need for a standardized approach to writing parallel applications. -1992年,图灵奖得主 Jack Dongarra 联合几位学者提出了并行计算第一个草案:MPI1。第一个标准版本 MPI 1.0 最终于 1994 年发布。之后,MPI-2、MPI-3 接连发布,来自学术界和工业界的多位专家共同参与修订,不断根据最新的并行计算需求修改 MPI 标准。 +In 1992, Jack Dongarra, who won the Turing Award, along with other scholars, proposed the first draft of parallel computing: MPI1. The initial standard version MPI 1.0 was eventually released in 1994. Subsequently, MPI-2 and MPI-3 were released, with contributions from academia and industry. -## 标准与实现 +## Standard and Implementations -MPI 是一个标准,不是一个编译器或者编程语言,也不是一个具体的实现或者产品。像 Dask、Ray 这样的框架是一个具体的实现,而 MPI 不一样,MPI 是一个标准,不同厂商在这个标准下可以有自己的实现。“标准”的意思是说,MPI 定义了一些标准的函数或方法,所有的厂商都需要遵循;“实现”是说,不同软硬件厂商可以根据标准去实现底层通信。比如,如果实现一个发送数据的需求,MPI 标准中定义了 `MPI_Send` 方法,所有厂商应遵循这个标准。 +MPI is a standard, not a compiler or programming language, nor is it a specific implementation or product. Frameworks like Dask and Ray are specific implementations, whereas MPI is different; it is a standard, and different vendors can have their implementations under this standard. "Standard" means that MPI defines standardized functions or methods that all vendors must follow, while "implementation" means that different software and hardware vendors can implement low-level communication based on the standard. For example, if there is a need to implement data sending, the MPI standard defines the `MPI_Send` method that all vendors must adhere to. -MPI 标准定义了: +The MPI standard defines: -* 每个函数的函数名、参数列表; -* 每个函数的语义,或者说每个函数的预期功能,又或者说每个函数能做什么、不能做什么。 +* The function name and parameter list for each function. 
+* The semantics of each function, indicating the expected functionality, what each function can or cannot do. -在具体实现上,现在常用的有 Open MPI 、MPICH、Intel MPI、Microsoft MPI 和 NVIDIA HPC-X 等。由于 MPI 是标准,因此,同样一份代码,可以被 OpenMPI 编译,也可以被 Intel MPI 编译。每个实现是由特定的厂商或开源社区开发的,因此使用起来也有一些差异。 +Commonly used implementations include Open MPI, MPICH, Intel MPI, Microsoft MPI, and NVIDIA HPC-X. Since MPI is a standard, the same code can be compiled by OpenMPI or Intel MPI. Each implementation is developed by specific vendors or open-source communities, leading to some differences in usage. -## 高速网络 +## High-Speed Networks -如果进行多机并行,机器之间需要有高速互联网络。如果你的集群已经部署了这些硬件,并安装了某个 MPI 实现,MPI 可以充分利用这些高速网络的高带宽、低延迟的特性。这些网络大部分拥有超过 100Gbps 的带宽,但其价格也非常昂贵,通常在面向高性能计算的场景上才会配置这些网络设备。 +For multi-node parallelism, high-speed interconnects are essential. MPI can efficiently utilize the high bandwidth and low latency networks. These networks typically have bandwidths exceeding 100 Gbps, but at a higher cost. -数据中心经常部署的万兆网络,带宽在 10Gbps 量级,也可以使用 MPI,并且会有一定加速效果。 +Cloud data centers often deploy 10/25 Gigabit Ethernet networks, which operate at a bandwidth level of 10/25 Gbps. MPI can also be used in such environments. -MPI 也可以在单机上运行,即:利用单台节点上的多个计算核心。 +MPI can also run on a single machine, leveraging multiple computing cores on a single node. -## 安装 +## Installation -刚才提到,MPI 有不同的实现,即不同的 MPI 厂商一般会提供: +As mentioned earlier, MPI has different implementations, and different MPI vendors generally provide: -* 编译器 `mpicc`、`mpicxx` 和 `mpifort`,分别用来编译 C、C++、Fortran 语言编写的源代码,源代码中一部分是多机通讯,一部分是单机计算,这些编译器通常将多机通讯与单机计算的代码一起编译,并生成可执行文件。 -* 在多台节点上将并行程序拉起的 `mpirun` 或 `mpiexec`。比如,在多少台节点上拉起多少进程等,都是通过 `mpiexec` 来完成的。 +* Compilers: `mpicc`, `mpicxx`, and `mpifort` used to compile source code written in C, C++, and Fortran, respectively. +* `mpirun` or `mpiexec` for launching parallel programs on multiple nodes. :::{note} -在很多 MPI 实现中,`mpirun` 和 `mpiexec` 的功能几乎相同,它们都可以将并行程序拉起。有些 MPI 实现的 `mpirun` 和 `mpiexec` 背后是同一个程序。但严禁地讲,MPI 标准中只定义了 `mpiexec`,并没有定义 `mpirun`,因此,`mpiexec` 应该更通用。 +In many MPI implementations, `mpirun` and `mpiexec` have nearly identical functionalities and can both launch parallel programs. In some MPI implementations, `mpirun` and `mpiexec` are backed by the same program. However, strictly speaking, the MPI standard only defines `mpiexec`, not `mpirun`. Therefore, `mpiexec` is more portable. ::: -如果使用 C/C++ 或 Fortran 这样的编译语言编写代码,一般的流程是:使用 `mpicc` 编译源代码,得到可执行文件,比如,将可执行文件命名为 `parallel.o`;使用 `mpiexec` 在多台节点上将 `parallel.o` 并行程序拉起。mpi4py 将上述的编译环节封装。 +Here are the steps for C/C++ or Fortran developers: (1) write your source code; (2) use `mpicc` to compile the source code and obtain an executable; and (3) use `mpiexec` to launch the parallel program on multiple nodes. However, mpi4py developers do not need to compile the source code. -如果你的集群环境已经安装了 MPI,可以先将 MPI 加载到环境变量里,然后使用 `pip` 安装: +If your cluster already has MPI installed, you can load MPI into your environment and then use `pip` to install mpi4py: ```bash pip install mpi4py ``` -如果你的集群环境没有 MPI,而又对编译这些流程不熟悉,可以直接用 `conda` 安装。 使用 `conda` 安装的软件已经完成了编译过程。 +If your cluster does not have MPI, and you are not familiar with compiling MPI from source code, you can install MPI using `conda`. 
```bash conda install -c conda-forge mpich conda install -c conda-forge mpi4py -``` - -大部分 MPI 程序均需要在命令行中先编译再拉起。为解决这个问题,我们还安装了 ipyparallel,可以在 Jupyter Notebook 中完成并行程序的拉起。 \ No newline at end of file +``` \ No newline at end of file diff --git a/ch-mpi/point-to-point.ipynb b/ch-mpi/point-to-point.ipynb index 4039adc..f9fd77b 100644 --- a/ch-mpi/point-to-point.ipynb +++ b/ch-mpi/point-to-point.ipynb @@ -5,24 +5,22 @@ "metadata": {}, "source": [ "(mpi-point2point)=\n", - "# 点对点通信\n", + "# Point-to-Point Communication\n", "\n", - "一个最简单的通信模式点对点(Point-to-Point)通信,点对点通信又分为阻塞式(Blocking)和非阻塞式(Non-Blocking)。实现点对点时主要考虑两个问题:\n", + "One of the simplest communication patterns is Point-to-Point communication, which can be further divided into Blocking and Non-Blocking. When implementing Point-to-Point communication, two main considerations are:\n", "\n", - "* 如何控制和识别不同的进程?比如,想让 Rank 为 0 的进程给 Rank 为 1 的进程发消息。\n", - "* 如何控制数据的读写?多大的数据,数据类型是什么?\n", + "* How to identify different processes? For example, if you want the process with rank 0 to send a message to the process with rank 1.\n", + "* What kind of data to send or receive? For example, 1024 integers.\n", "\n", - "## 发送与接收\n", + "## Send and Receive\n", "\n", - "[`Comm.send`](https://mpi4py.readthedocs.io/en/latest/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.send) 和 [`Comm.recv`](https://mpi4py.readthedocs.io/en/latest/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.recv) 分别用来阻塞式地发送和接收数据。\n", + "[`Comm.send`](https://mpi4py.readthedocs.io/en/latest/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.send) and [`Comm.recv`](https://mpi4py.readthedocs.io/en/latest/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.recv) are used for blocking send and receive, respectively.\n", "\n", - "`Comm.send(obj, dest, tag=0)` 的参数主要是 `obj` 和 `dest`。`obj` 就是我们想要发送的数据,数据可以是 Python 内置的数据类型,比如 `list` 和 `dict` 等,也可以是 NumPy 的 `ndarray`,甚至是 GPU 上的 cupy 数据。上一节 {ref}`mpi-hello-world` 我们介绍了 Communicator 和 Rank,可以通过 Rank 的号码来定位一个进程。`dest` 可以用 Rank 号码来表示。`tag` 主要用来标识,给程序员一个精细控制的选项,使用 `tag` 可以实现消息的有序传递和筛选。接收方可以选择只接收特定标签的消息,或者按照标签的顺序接收消息,以便更加灵活地控制消息的发送和接收过程。\n", + "The key parameters for `Comm.send(obj, dest, tag=0)` are `obj` and `dest`. `obj` is the data we want to send, and it can be a Python built-in data type such as `list` and `dict`, a NumPy `ndarray`, or even CuPy data on a GPU. In the previous section {ref}`mpi-hello-world`, we introduced communicator and rank, and you can use the rank number to locate a process. `dest` is a rank number. `tag` provides programmers with more control options. 
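A minimal sketch of how `tag` can be used is shown below; the tag values and payloads are made up for illustration, and at least two processes are assumed.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Hypothetical tags that label two different kinds of messages
TAG_CONFIG = 11
TAG_DATA = 22

if rank == 0:
    comm.send({"lr": 0.01}, dest=1, tag=TAG_CONFIG)
    comm.send([1, 2, 3], dest=1, tag=TAG_DATA)
elif rank == 1:
    # Each recv matches only the message that carries the same tag,
    # so the two payloads cannot be mixed up.
    config = comm.recv(source=0, tag=TAG_CONFIG)
    data = comm.recv(source=0, tag=TAG_DATA)
    print(f"config: {config}, data: {data}")
```

Tags become useful when the same pair of processes exchanges several logically different messages over one communicator.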
For example, the receiver can choose to only receive messages with specific tags.\n", "\n", - "## 案例1:发送 Python 对象\n", + "## Example 1: Send Python Object\n", "\n", - "比如,我们发送一个 Python 对象。Python 对象在通信过程中的序列化使用的是 [pickle](https://docs.python.org/3/library/pickle.html#module-pickle)。\n", - "\n", - "{numref}`mpi-send-py-object` 演示了如何发送一个 Python 对象。\n", + "Here, we show how to send a Python object, which is serialized by [pickle](https://docs.python.org/3/library/pickle.html#module-pickle).\n", "\n", "```{code-block} python\n", ":caption: send-py-object.py\n", @@ -42,7 +40,7 @@ " print(f\"Received: {data}, to rank: {rank}.\")\n", "```\n", "\n", - "将这份代码保存文件为 `send-py-object.py`,在命令行中这样启动:" + "Save the code in a file named `send-py-object.py` and launch it in the command line:" ] }, { @@ -67,11 +65,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 案例2:发送 NumPy `ndarray`\n", - "\n", - "或者发送一个 NumPy `ndarray`:\n", + "## Example 2: Send NumPy `ndarray`\n", "\n", - "{numref}`mpi-send-np` 演示了如何发送一个 NumPy `ndarray`。\n", + "Send a NumPy `ndarray`:\n", "\n", "```{code-block} python\n", ":caption: send-np.py\n", @@ -83,8 +79,8 @@ "comm = MPI.COMM_WORLD\n", "rank = comm.Get_rank()\n", "\n", - "# 明确告知 MPI 数据类型为 int\n", - "# dtype='i', i 为 INT 的缩写\n", + "# tell MPI data type is int\n", + "# dtype='i', i is short for INT\n", "if rank == 0:\n", " data = np.arange(10, dtype='i')\n", " comm.Send([data, MPI.INT], dest=1)\n", @@ -94,7 +90,7 @@ " comm.Recv([data, MPI.INT], source=0)\n", " print(f\"Received: {data}, to rank: {rank}.\")\n", "\n", - "# MPI 自动发现数据类型\n", + "# MPI detects data type\n", "if rank == 0:\n", " data = np.arange(10, dtype=np.float64)\n", " comm.Send(data, dest=1)\n", @@ -132,14 +128,19 @@ "metadata": {}, "source": [ "```{note}\n", - "这里的 `Send` 和 `Recv` 函数的首字母都大写了,因为大写的 `Send` 和 `Recv` 等方法是基于缓存(Buffer)的。对于这些基于缓存的函数,应该明确数据的类型,比如传入这样的二元组 `[data, MPI.DOUBLE]` 或三元组 `[data, count, MPI.DOUBLE]`。刚才例子中,`comm.Send(data, dest=1)` 没有明确告知 MPI 其数据类型和数据大小,是因为 MPI 对 NumPy 和 cupy `ndarray` 做了类型的自动探测。\n", - "```\n", - "\n", - "## 案例3:Master-Worker\n", + "The initial letters of the `Send` and `Recv` functions are capitalized because these capitalized methods are based on buffers. For these buffer-based functions, it is crucial to explicitly specify the data type, such as passing a binary tuple `[data, MPI.DOUBLE]` or a triple `[data, count, MPI.DOUBLE]`. In {numref}`mpi-send-np`, the `comm.Send(data, dest=1)` does not explicitly inform MPI about the data type and size because MPI automatically detects the type of NumPy and CuPy `ndarray`.\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example 3: Master-Worker\n", "\n", - "现在我们做一个 Master-Worker 案例,共有 `size` 个进程,前 `size-1` 个进程作为 Worker,随机生成数据,最后一个进程(Rank 为 `size-1`)作为 Master,接收数据,并将数据的大小打印出来。\n", + "In this example, we implement a Master-Worker computation with a total of `size` processes. The first `size-1` processes act as Workers, generating random data. 
The last process (rank `size-1`) serves as the Master, receiving data and printing its size.\n", "\n", - "{numref}`mpi-master-worker` 对 Master 与 Worker 进程间,数据的发送和接收过程进行了演示。\n", + "The data exchange process between Master and Worker processes is demonstrated in {numref}`mpi-master-worker`.\n", "\n", "```{code-block} python\n", ":caption: master-worker.py\n", @@ -153,15 +154,15 @@ "size = comm.Get_size()\n", "\n", "if rank < size - 1:\n", - " # Worker 进程\n", + " # Worker process\n", " np.random.seed(rank)\n", - " # 随机生成\n", + " # Generate random data\n", " data_count = np.random.randint(100)\n", " data = np.random.randint(100, size=data_count)\n", " comm.send(data, dest=size - 1)\n", " print(f\"Worker: worker ID: {rank}; count: {len(data)}\")\n", "else:\n", - " # Master 进程\n", + " # Master process\n", " for i in range(size - 1):\n", " status = MPI.Status()\n", " data = comm.recv(source=MPI.ANY_SOURCE, status=status)\n", @@ -170,7 +171,7 @@ "comm.Barrier()\n", "```\n", "\n", - "在这个例子中,`rank` 小于 `size - 1` 的进程是 Worker,随机生成数据,并发送出给最后一个进程(进程 Rank 号为 `size - 1`)。最后一个进程接收数据,并打印出接收数据的大小。" + "In this example, processes with rank less than `size - 1` are Workers, generating random data and sending it to the last process (with rank `size - 1`). The last process receives the data and prints the size of the received data." ] }, { @@ -207,21 +208,21 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 案例 4:长方形模拟求 $\\pi$ 值\n", + "## Example 4: Rectangle Simulation for Calculating $\\pi$\n", "\n", - "对半径为 R 的圆,我们可以采用微分方法将圆切分成 N 个小长方形,当长方形数量达到无穷大的 N 时, 所有长方形总面积接近于 1/4 圆面积,如 {numref}`rectangle-pi` 所示。\n", + "For a circle with a radius R, we can use a differential method to divide the circle into N small rectangles. When the number of rectangles, N, approaches infinity, the total area of all rectangles approximates 1/4 of the circle area, as shown in {numref}`rectangle-pi`.\n", "\n", "```{figure} ../img/ch-mpi/rectangle-pi.svg\n", "---\n", "width: 600px\n", "name: rectangle-pi\n", "---\n", - "使用 N 个小长方形模拟 1/4 圆\n", + "Simulating 1/4 of a circle using N small rectangles.\n", "```\n", "\n", - "假设此时有 `size` 个进程参与计算,首先求每个进程需要处理的长方形数量 (`N/size`)。每个进程各自计算长方形面积之和,并发送给 Master 进程。第一个进程作为 Master,接收各 Worker 发送数据,汇总所有矩形面积,从而近似计算出 $\\pi$ 值。\n", + "Assuming there are `size` processes involved in the calculation, we first determine the number of rectangles each process needs to handle (`N/size`). Each process calculates the sum of the areas of its rectangles and sends the result to the Master process. 
The first process acts as the Master, receiving data from each Worker, consolidating all rectangle areas, and thereby approximating the value of $\\pi$.\n", "\n", - "{numref}`mpi-rectangle-pi` 演示了长方形模拟求 $\\pi$ 值的过程。\n", + "{numref}`mpi-rectangle-pi` shows the process.\n", "\n", "```{code-block} python\n", ":caption: rectangle-pi.py\n", @@ -233,10 +234,10 @@ "from mpi4py import MPI\n", "\n", "communicator = MPI.COMM_WORLD\n", - "rank = communicator.Get_rank() # 进程唯一的标识Rank\n", + "rank = communicator.Get_rank()\n", "process_nums = communicator.Get_size()\n", "\"\"\"\n", - "参数设置:\n", + "Configuration:\n", "R=1\n", "N=64*1024*1024\n", "\"\"\"\n", @@ -251,13 +252,13 @@ "\n", " for i in range(step_size):\n", " x = rect_start + i * rect_width\n", - " # (x,y) 对应于第i个小矩形唯一在圆弧上的顶点\n", + " # (x,y) is the upper right point of the i-th rectangle\n", " # x^2+y^2=1 => y=sqrt(1-x^2)\n", " rect_length = math.pow(1 - x * x, 0.5)\n", " total_area += rect_width * rect_length\n", " return total_area\n", "\n", - "# 在每个进程上执行计算\n", + "# Calculating on each process\n", "total_area = cal_rect_area(rank, step_size, rect_width)\n", "\n", "if rank == 0:\n", @@ -273,7 +274,7 @@ " communicator.send(total_area, dest=0)\n", "```\n", "\n", - "上述例子中,我们设置参数为:`R=1`, `N=64*1024*1024`,保存文件为 `rectangle_pi.py`:" + "In this case, we set the following configurations:`R=1`, `N=64*1024*1024`, and save as `rectangle_pi.py`." ] }, { @@ -298,41 +299,49 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 阻塞 v.s. 非阻塞\n", + "## Blocking v.s. Non-blocking\n", "\n", - "### 阻塞\n", + "### Blocking\n", "\n", - "我们先分析一下阻塞式通信。`Send` 和 `Recv` 这两个基于缓存的方法:\n", + "Let's first analyze blocking communication. The `Send` and `Recv` methods, which are based on buffering:\n", "\n", - "* `Send` 直到缓存是空的时候,也就是说缓存中的数据都被发送出去后,才返回(`return`),允许运行用户代码中剩下的业务逻辑。缓存区域可以被接下来其他的 `Send` 循环再利用。\n", - "* `Recv` 直到缓存区域数据到达,才返回(`return`),,允许运行用户代码中剩下的业务逻辑。\n", + "* `Send`: It will not return until the buffer is empty, meaning all the data in the buffer has been sent. The buffer area can then be reused in subsequent `Send`s.\n", "\n", - "如 {ref}`mpi-communications` 所示,阻塞通信是数据完成传输,才会返回(`return`),否则一直在等待。\n", + "* `Recv`: It will not `return` until the buffer is full.\n", + "\n", + "As shown in {ref}`mpi-communications`, blocking communication returns only when the data transmission is completed; otherwise, it keeps waiting.\n", "\n", "```{figure} ../img/ch-mpi/blocking.svg\n", "---\n", - "width: 800px\n", + "width: 600px\n", "name: blocking-communications\n", "---\n", - "阻塞式通信示意图\n", + "Blocking communications\n", "```\n", "\n", - "阻塞式通信的代码更容易去设计,但出现问题是死锁,比如类似下面的逻辑,Rank = 1 的产生了死锁,应该将 `Send` 和 `Recv` 调用顺序互换\n", + "Code using blocking communication is easier to design, but a common issue is deadlock. For example, in the code below, rank 1 causes a deadlock. The order of `Send` and `Recv` calls should be swapped to avoid this:\n", "\n", "```python\n", "if rank == 0:\n", "\tcomm.Send(..to rank 1..)\n", " comm.Recv(..from rank 1..)\n", - "else if (rank == 1): <- 该进程死锁\n", - " comm.Send(..to rank 0..) <- 应将 Send Revc 互换\n", + "else if (rank == 1): <- deadlock\n", + " comm.Send(..to rank 0..) 
<- should swap Send and Recv\n", " comm.Recv(..from rank 0..)\n", "```\n", "\n", - "### 非阻塞\n", + "### Non-blocking\n", "\n", - "非阻塞式通信调用后直接返回 [`Request`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Request.html#mpi4py.MPI.Request) 句柄(Handle),程序员接下来再对 `Request` 做处理,比如等待 `Request` 涉及的数据传输完毕。非阻塞式通信有大写的 i(I) 作为前缀, [`Irecv`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.irecv) 的函数参数与之前相差不大,只不过返回值是一个 `Request`。 `Request` 类提供了 `wait` 方法,显示地调用 `wait()` 可以等待数据传输完毕。用 `Isend` 写的阻塞式的代码,可以改为 `Isend` + [`Request.wait()`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Request.html#mpi4py.MPI.Request.wait) 以非阻塞方式实现。\n", + "In contrast, non-blocking communication does not wait for the completion of data transmission. Non-blocking communication can enhance performance by overlapping communication and computation, i.e., the communications are handled on the network side, meanwhile the computational tasks are performed on the CPU side. The [`isend`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.isend) and [`irecv`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.irecv) methods are used for non-blocking communication:\n", "\n", - "{numref}`mpi-non-blocking` 展示了一个非阻塞式通信的例子。\n", + "* `isend`: Initiates a non-blocking send operation and immediately returns control to the user, allowing the execution of subsequent code.\n", + "\n", + "* `irecv`: Initiates a non-blocking receive operation and immediately returns control to the user, allowing the execution of subsequent code.\n", + "\n", + "After a non-blocking communication call, the [`Request`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Request.html#mpi4py.MPI.Request) handle is returned immediately. Subsequently, the programmer can perform further processing on the `Request`, such as waiting for the data transfer associated with the `Request` to complete. Non-blocking communication is denoted by an uppercase 'I' or a lowercase 'i', where 'I' is buffer-based and 'i' is not. \n", + "The function parameters of [`isend`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.isend) are similar to [`send`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.send), with the key distinction being that `isend` returns a `Request`. The `Request` class provides a `wait()` method, and explicitly calling `wait()` allows for waiting until the data transfer is complete. 
Code written in a blocking manner using `send` can be modified to utilize non-blocking communication by using `isend` + `Request.wait()`.\n", + "\n", + "Non-blocking communication is illustrated in {ref}`mpi-non-blocking`.\n", "\n", "```{code-block} python\n", ":caption: non-blocking.py\n", @@ -381,14 +390,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "{numref}`non-blocking-communications` 展示非阻塞通信 `wait()` 加入后数据流的变化。\n", + "{numref}`non-blocking-communications` demonstrates the data flow changes of non-blocking communication.\n", "\n", "```{figure} ../img/ch-mpi/non-blocking.svg\n", "---\n", - "width: 800px\n", + "width: 600px\n", "name: non-blocking-communications\n", "---\n", - "非阻塞式通信示意图\n", + "Non-blocking communications\n", "```" ] } diff --git a/ch-mpi/rectangle-pi.py b/ch-mpi/rectangle-pi.py index 4f8649d..d641006 100644 --- a/ch-mpi/rectangle-pi.py +++ b/ch-mpi/rectangle-pi.py @@ -7,7 +7,7 @@ rank = communicator.Get_rank() # 进程唯一的标识Rank process_nums = communicator.Get_size() """ -参数设置: +Configuration: R=1 N=64*1024*1024 """ @@ -22,13 +22,13 @@ def cal_rect_area(process_no, step_size, rect_width): for i in range(step_size): x = rect_start + i * rect_width - # (x,y) 对应于第i个小矩形唯一在圆弧上的顶点 + # (x,y) is the upper right point of the i-th rectangle # x^2+y^2=1 => y=sqrt(1-x^2) rect_length = math.pow(1 - x * x, 0.5) total_area += rect_width * rect_length return total_area -# 在每个进程上执行计算 +# Calculating on each process total_area = cal_rect_area(rank, step_size, rect_width) if rank == 0: diff --git a/ch-mpi/remote-memory-access.ipynb b/ch-mpi/remote-memory-access.ipynb index 7b431d4..d8dd40b 100644 --- a/ch-mpi/remote-memory-access.ipynb +++ b/ch-mpi/remote-memory-access.ipynb @@ -5,65 +5,65 @@ "metadata": {}, "source": [ "(remote-memory-access)=\n", - "# 远程内存访问\n", + "# Remote Memory Access\n", "\n", - "{numref}`mpi-hello-world` 中我们介绍了两种通信模式,即双边和单边。前几章节的点对点通信和集合通信主要针对的双边通信,本节主要介绍单边通信。单边通信是进程间直接访问远程内存,又称为远程内存访问(Remote Memory Access,RMA)。\n", + "In {numref}`mpi-hello-world`, we introduced two communication modes: two-sided and one-sided. The point-to-point and collective communications discussed in the earlier sections focus on two-sided communication. This section specifically delves into one-sided communication, also known as Remote Memory Access (RMA).\n", "\n", "## Window\n", "\n", - "任何进程所分配的内存都是私有的,即只被进程自己访问,远程内存访问要把进程自己的内存区域暴露给其他进程访问,这部分内存不再私有,而变成了公共的,需要特别处理。在 MPI 中,使用窗口(Window)定义可被远程访问的内存区域。某个内存区域被设置为允许远程访问,所有 Window 内的进程都可对这个区域进行读写。{numref}`rma-window` 分别介绍了私有的内存区域和可被远程访问的 Window。mpi4py 提供了 [`mpi4py.MPI.Win`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Win.html#mpi4py.MPI.Win) 类进行 Window 相关操作。\n", + "Any memory allocated by a process is private, meaning it is only accessible by the process itself. To enable remote memory access, exposing a portion of a process's private memory for access by other processes requires special handling. In MPI, a Window is used to define a memory region that can be accessed remotely. A designated memory region is set to allow remote access, and all processes within the Window can read and write to this shared region. {numref}`rma-window` provides a detailed explanation of private memory regions and windows that can be accessed remotely. 
The [`mpi4py.MPI.Win`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Win.html#mpi4py.MPI.Win) class in mpi4py facilitates Window-related operations.\n", "\n", "```{figure} ../img/ch-mpi/rma-window.svg\n", "---\n", "width: 600px\n", "name: rma-window\n", "---\n", - "进程私有内存与允许远程访问的 Window\n", + "Private memory regions and remotely accessible windows\n", "```\n", "\n", - "## 创建 Window\n", + "## Creating a Window\n", "\n", - "我们可以使用 [`mpi4py.MPI.Win.Allocate`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Win.html#mpi4py.MPI.Win.Allocate) 和 [`mpi4py.MPI.Win.Create`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Win.html#mpi4py.MPI.Win.Create) 创建 Window。其中 `mpi4py.MPI.Win.Allocate` 创建新的内存缓存,并且该缓存可被远程访问;`mpi4py.MPI.Win.Create` 将现有的某个内存缓存区域设置为可远程访问。具体而言,这个区别主要体现在这两个方法的第一个参数,`mpi4py.MPI.Win.Allocate(size)` 传入的要创建内存缓存的字节数 `size`; `mpi4py.MPI.Win.Create(memory)` 传入的是内存地址 `memory`。\n", + "Window can be created using [`mpi4py.MPI.Win.Allocate`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Win.html#mpi4py.MPI.Win.Allocate) and [`mpi4py.MPI.Win.Create`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Win.html#mpi4py.MPI.Win.Create). The `mpi4py.MPI.Win.Allocate` method creates a new memory buffer that can be accessed remotely, while `mpi4py.MPI.Win.Create` designates an existing memory buffer for remote access. Specifically, the distinction lies in the first parameter of these two methods: `mpi4py.MPI.Win.Allocate(size)` takes `size` bytes to create a new memory buffer, whereas `mpi4py.MPI.Win.Create(memory)` takes the memory address `memory` of an existing buffer.\n", "\n", - "## 读写操作\n", + "## Read and Write Operations\n", "\n", - "创建好可远程访问的 Window 后,可以使用三类方法来向内存区域读写数据:[`mpi4py.MPI.Win.Put`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Win.html#mpi4py.MPI.Win.Put),[`mpi4py.MPI.Win.Get`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Win.html#mpi4py.MPI.Win.Get) 和 [`mpi4py.MPI.Win.Accumulate`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Win.html#mpi4py.MPI.Win.Accumulate)。这三类方法都有两个参数:`origin` 和 `target_rank`,分别表示源进程和目标进程。源进程指的是调用读写方法的进程,目标进程是远程的进程。\n", + "Once a Window with remote access is created, three types of methods can be used to read and write data to the memory region: [`mpi4py.MPI.Win.Put`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Win.html#mpi4py.MPI.Win.Put), [`mpi4py.MPI.Win.Get`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Win.html#mpi4py.MPI.Win.Get), and [`mpi4py.MPI.Win.Accumulate`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Win.html#mpi4py.MPI.Win.Accumulate). These methods all take two parameters: `origin` and `target_rank`, representing the source process and the target process, respectively. 
The source process is the one invoking the read/write method, while the target process is the remote process.\n", "\n", - "* `Win.Put` 将数据从源进程移动至目标进程。\n", - "* `Win.Get` 将数据从目标进程移动至源进程。\n", - "* `Win.Accumulate` 与 `Win.Put` 类似,也是将数据从源进程移动至目标进程,同时对源进程的数据和目标进程的数据进行了聚合操作,聚合操作的操作符包括:[`mpi4py.MPI.SUM`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.SUM.html)、[`mpi4py.MPI.PROD`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.PROD.html) 等。\n", + "- `Win.Put` moves data from the origin process to the target process.\n", + "- `Win.Get` moves data from the target process to the origin process.\n", + "- `Win.Accumulate` is similar to `Win.Put`, moving data from the origin process to the target process, while also performing an aggregation operation on the data from the source and target processes. Aggregation operators include [`mpi4py.MPI.SUM`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.SUM.html), [`mpi4py.MPI.PROD`](https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.PROD.html), and others.\n", "\n", - "## 数据同步\n", + "## Data Synchronization\n", "\n", - "单机程序是顺序执行的,多机环境下,因为涉及多方数据的读写,可能会出现一些数据同步的问题,如 {numref}`rma-sync-problem` 所示,如果不明确读写操作的顺序,会导致某块内存区域的数据并非为程序员所期望的结果。\n", + "In a single-machine program, execution is sequential. However, in a multi-machine environment where multiple processes are involved in reading and writing data, data synchronization issues may arise. As illustrated in {numref}`rma-sync-problem`, without explicit control over the order of read and write operations, the data in a particular memory region may not yield the expected results.\n", "\n", "```{figure} ../img/ch-mpi/rma-sync-problem.png\n", "---\n", "width: 500px\n", "name: rma-sync-problem\n", "---\n", - "并行数据读写时会出现数据同步的问题\n", + "Data Synchronization in Parallel Data I/O\n", "```\n", "\n", - "为解决这个问题,需要一定的数据同步机制。MPI 一共有几类,包括主动同步(Active Target Synchronization)和被动同步(Passive Target Synchronization),如 {numref}`rma-synchronization` 所示。\n", + "To address this problem, various data synchronization mechanisms are available in MPI, broadly categorized into Active Target Synchronization and Passive Target Synchronization, as illustrated in {numref}`rma-synchronization`.\n", "\n", "```{figure} ../img/ch-mpi/rma-synchronization.png\n", "---\n", "width: 800px\n", "name: rma-synchronization\n", "---\n", - "主动同步与被动同步\n", + "Active Target Synchronization and Passive Target Synchronization.\n", "```\n", "\n", - "## 案例:远程读写\n", + "## Example: Remote Read and Write\n", "\n", - "一个完整的 RMA 程序应该包括:\n", + "A complete RMA (Remote Memory Access) program should encompass the following steps:\n", "\n", - "1. 创建 Window\n", - "2. 数据同步\n", - "3. 数据读写\n", + "1. Create a window\n", + "2. Implement data synchronization\n", + "3. 
Perform data read and write operations\n", "\n", - "{numref}`mpi-rma-lock` 展示了一个案例,其代码保存为 `rma-lock.py`。\n", + "Example code for a case study is illustrated in {numref}`mpi-rma-lock` and is saved as `rma-lock.py`.\n", "\n", "```{code-block} python\n", ":caption: rma-lock.py\n", diff --git a/ch-mpi/send-np.py b/ch-mpi/send-np.py index 8410a75..feb2e19 100644 --- a/ch-mpi/send-np.py +++ b/ch-mpi/send-np.py @@ -4,8 +4,8 @@ comm = MPI.COMM_WORLD rank = comm.Get_rank() -# 明确告知 MPI 数据类型为 int -# dtype='i', i 为 INT 的缩写 +# tell MPI data type is int +# dtype='i', i is short for INT if rank == 0: data = np.arange(10, dtype='i') comm.Send([data, MPI.INT], dest=1) @@ -15,7 +15,7 @@ comm.Recv([data, MPI.INT], source=0) print(f"Received: {data}, to rank: {rank}.") -# MPI 自动发现数据类型 +# MPI detects data type if rank == 0: data = np.arange(10, dtype=np.float64) comm.Send(data, dest=1) diff --git a/conf.py b/conf.py index dc3bcd4..673d1b2 100644 --- a/conf.py +++ b/conf.py @@ -17,7 +17,21 @@ 'launch_buttons': {'notebook_interface': 'classic', 'binderhub_url': '', 'jupyterhub_url': '', 'thebe': False, 'colab_url': ''}, 'path_to_docs': 'docs', 'repository_url': 'https://github.com/godaai/distributed-python-en', - 'repository_branch': 'main', + 'repository_branch': 'main', + 'icon_links': [ + { + "name": "中文版", + "url": "https://dp.godaai.org/", # required + "icon": "fa fa-language", + "type": "fontawesome", + }, + { + "name": "GitHub", + "url": "https://github.com/godaai/distributed-python-en", + "icon": "https://img.shields.io/github/stars/godaai/distributed-python-en?style=for-the-badge", + "type": "url", + }, + ], 'extra_footer': '', 'home_page_in_toc': True, 'announcement': "If you find this tutorial helpful, please star our GitHub repo!", diff --git a/drawio/ch-dask-dataframe/parquet-row-group.drawio b/drawio/ch-dask-dataframe/parquet-row-group.drawio deleted file mode 100644 index 96f9a59..0000000 --- a/drawio/ch-dask-dataframe/parquet-row-group.drawio +++ /dev/null @@ -1,151 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/drawio/ch-dask-dataframe/parquet.drawio b/drawio/ch-dask-dataframe/parquet.drawio new file mode 100644 index 0000000..3a5d7d2 --- /dev/null +++ b/drawio/ch-dask-dataframe/parquet.drawio @@ -0,0 +1,133 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/drawio/ch-intro/distributed-timeline.drawio b/drawio/ch-intro/distributed-timeline.drawio new file mode 100644 index 0000000..8f2e904 --- /dev/null +++ b/drawio/ch-intro/distributed-timeline.drawio @@ -0,0 +1,214 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/drawio/ch-mpi/blocking.drawio b/drawio/ch-mpi/blocking.drawio index 86954bc..406dba9 100644 --- 
a/drawio/ch-mpi/blocking.drawio +++ b/drawio/ch-mpi/blocking.drawio @@ -1,11 +1,11 @@ - + - + - - + + @@ -13,7 +13,7 @@ - + @@ -22,7 +22,7 @@ - + @@ -31,8 +31,8 @@ - - + + @@ -49,16 +49,16 @@ - + - - + + - + @@ -67,11 +67,11 @@ - + - - + + diff --git a/drawio/ch-mpi/communications.drawio b/drawio/ch-mpi/communications.drawio index a70e630..eb2c7e8 100644 --- a/drawio/ch-mpi/communications.drawio +++ b/drawio/ch-mpi/communications.drawio @@ -1,6 +1,6 @@ - + - + @@ -47,7 +47,7 @@ - + @@ -84,7 +84,7 @@ - + diff --git a/drawio/ch-mpi/non-blocking.drawio b/drawio/ch-mpi/non-blocking.drawio index 555ecc1..8718719 100644 --- a/drawio/ch-mpi/non-blocking.drawio +++ b/drawio/ch-mpi/non-blocking.drawio @@ -1,10 +1,10 @@ - + - + - + @@ -13,7 +13,7 @@ - + @@ -22,7 +22,7 @@ - + @@ -31,7 +31,7 @@ - + @@ -49,17 +49,17 @@ - + - - + + - - + + @@ -67,11 +67,11 @@ - + - - + + @@ -85,8 +85,8 @@ - - + + @@ -112,7 +112,7 @@ - + @@ -130,7 +130,7 @@ - + diff --git a/img/ch-dask-dataframe/parquet-row-group.svg b/img/ch-dask-dataframe/parquet-row-group.svg deleted file mode 100644 index 6769ad3..0000000 --- a/img/ch-dask-dataframe/parquet-row-group.svg +++ /dev/null @@ -1,4 +0,0 @@ - - - -
[parquet-row-group.svg (deleted), text labels only: columns Product, Customer, Country, Date, Sales Amount; sample rows (Ball, T-Shirt, Socks; John Doe, Maria Adams, Antonio Grant; USA, UK; 2023-01-01 through 2023-01-05; 100 to 500) grouped into Row Group 1, Row Group 2, and Row Group 3]
\ No newline at end of file diff --git a/img/ch-dask-dataframe/parquet.svg b/img/ch-dask-dataframe/parquet.svg new file mode 100644 index 0000000..933e86b --- /dev/null +++ b/img/ch-dask-dataframe/parquet.svg @@ -0,0 +1,4 @@ + + + +
[parquet.svg (new), text labels only: a Parquet File contains Row Groups; each Row Group holds Metadata and Column Chunks (sample chunks: dates 2023-01-01 to 2023-01-04; values 300, 100, 200, 500)]
\ No newline at end of file diff --git a/img/ch-intro/distributed-timeline.svg b/img/ch-intro/distributed-timeline.svg new file mode 100644 index 0000000..f395af5 --- /dev/null +++ b/img/ch-intro/distributed-timeline.svg @@ -0,0 +1,4 @@ + + + +
[distributed-timeline.svg (new), text labels only: Scheduler dispatching task 1 through task 10 to Worker 1 through Worker 4 along a Time axis, with a Time Saved annotation]
\ No newline at end of file diff --git a/img/ch-mpi/blocking.svg b/img/ch-mpi/blocking.svg index 3b15df2..47a7ce1 100644 --- a/img/ch-mpi/blocking.svg +++ b/img/ch-mpi/blocking.svg @@ -1,4 +1,4 @@ -
[blocking.svg (old, Chinese labels): Sender and Receiver timeline; T0: Recv is called, buffer cannot be read or written by the user; T1: Send; T2: buffered data is sent, Send returns; T3: send completes; T4: buffer is filled]
\ No newline at end of file +
[blocking.svg (new, English labels): Sender and Receiver timeline; T0: Recv is called, receiver buffer unavailable to user; T1: Send; T2: Send returns, sender buffer can be reused; T3: Transfer Complete; T4: Recv returns, receiver buffer is filled]
\ No newline at end of file diff --git a/img/ch-mpi/communications.svg b/img/ch-mpi/communications.svg index a806d3c..2954a40 100644 --- a/img/ch-mpi/communications.svg +++ b/img/ch-mpi/communications.svg @@ -1,4 +1,4 @@ -
[communications.svg (old): Process 0 and Process 1; SEND/RECV of DATA labeled '(a) 双边' (two-sided); PUT/GET of DATA labeled '(b) 单边' (one-sided)]
\ No newline at end of file +
[communications.svg (new): Process 0 and Process 1; SEND/RECV of DATA labeled '(a) Two-Sided'; PUT/GET of DATA labeled '(b) One-Sided']
\ No newline at end of file diff --git a/img/ch-mpi/non-blocking.svg b/img/ch-mpi/non-blocking.svg index 4b58dc8..2383800 100644 --- a/img/ch-mpi/non-blocking.svg +++ b/img/ch-mpi/non-blocking.svg @@ -1,4 +1,4 @@ -
[non-blocking.svg (old, Chinese labels): Sender and Receiver timeline; T0: Irecv is called; T1: Irecv returns; T2: Isend; T3: Isend returns, buffer unavailable; T5: send completes, buffer can be reused; T6: wait(); T7: send finishes; T8: wait() returns, buffer is filled; T9: wait() returns]
\ No newline at end of file +
[non-blocking.svg (new, English labels): Sender and Receiver timeline; T0: Irecv; T1: Irecv returns; T2: Isend; T3: Isend returns, buffer is not available; T5: Sender completes, buffer is available; T6: wait(); T7: Transfer finishes; T8: wait() returns, receive buffer is filled; T9: wait() returns]
\ No newline at end of file
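
The RMA section translated above lists the three steps of a one-sided program: create a window, synchronize, then read or write. The `rma-lock.py` listing it references is not part of this hunk, so the following is only a minimal sketch of those steps using mpi4py's passive target synchronization (`Win.Lock`/`Win.Unlock`); the file name, buffer size, and ranks are illustrative assumptions rather than the book's example.

```python
# rma_put_sketch.py -- illustrative sketch (not the book's rma-lock.py):
# one-sided Put with passive target synchronization (Lock/Unlock).
# Run with: mpiexec -np 2 python rma_put_sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Step 1: create a window -- every process exposes a small numpy buffer.
buf = np.zeros(5, dtype='i')
win = MPI.Win.Create(buf, comm=comm)

# Steps 2 and 3: the origin (rank 0) locks the target's window,
# writes into it with Put, then unlocks to close the access epoch.
if rank == 0:
    data = np.arange(5, dtype='i')
    win.Lock(1)                      # passive target synchronization
    win.Put([data, MPI.INT], 1)      # move data into rank 1's window
    win.Unlock(1)                    # Put is complete at the target after this

comm.Barrier()                       # rank 1 reads only after the Put epoch closed
if rank == 1:
    win.Lock(1)                      # lock the local window before reading it directly
    print(f"rank 1 window now holds: {buf}")
    win.Unlock(1)

win.Free()
```

Because the synchronization is passive, rank 1 never posts a matching receive; the `Barrier` only delays its read until the origin has finished its `Put` epoch.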
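
For the Active Target Synchronization category mentioned in the same section, a comparable sketch pairs `Win.Fence` with `Win.Accumulate` and the `MPI.SUM` operator; again, the file name and values are assumptions for illustration.

```python
# rma_accumulate_sketch.py -- illustrative sketch: active target synchronization
# with Win.Fence plus Win.Accumulate and the MPI.SUM operator.
# Run with: mpiexec -np 4 python rma_accumulate_sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each process exposes a one-element window; rank 0's window collects the sum.
counter = np.zeros(1, dtype='i')
win = MPI.Win.Create(counter, comm=comm)

win.Fence()                                             # open the access epoch on all ranks
contribution = np.array([rank], dtype='i')
win.Accumulate([contribution, MPI.INT], 0, op=MPI.SUM)  # add into rank 0's window
win.Fence()                                             # close the epoch; updates are visible

if rank == 0:
    # With 4 processes this prints 0 + 1 + 2 + 3 = 6.
    print(f"accumulated value on rank 0: {counter[0]}")

win.Free()
```

Unlike `Lock`/`Unlock`, `Fence` is collective: every process exposing the window must call it to open and close the access epoch.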