Improve serving (#3398) (#3422)
* Add description of using HPIP

* Update layout parsing v2 schema and app

* Update HPI ref

* Optimize shitu and face rec schemas

* Fix serving doc

* Fix shitu bugs

* Fix bug

* Update docs

* Add notice

* Include HPIP links in package
Bobholamovic authored Feb 20, 2025
1 parent 933230b commit c4c9dca
Showing 20 changed files with 396 additions and 112 deletions.
11 changes: 0 additions & 11 deletions docs/pipeline_deploy/high_performance_inference.md
@@ -691,14 +691,3 @@ python -m pip install ../../python/dist/ultra_infer*.whl
</tr>

</table>


<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/hpi/ultra_infer/releases/3.0.0rc0/ultra_infer_python-1.0.0.3.0.0rc0-cp38-cp38-linux_x86_64.whl"></a>
<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/hpi/ultra_infer/releases/3.0.0rc0/ultra_infer_python-1.0.0.3.0.0rc0-cp39-cp39-linux_x86_64.whl"></a>
<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/hpi/ultra_infer/releases/3.0.0rc0/ultra_infer_python-1.0.0.3.0.0rc0-cp310-cp310-linux_x86_64.whl"></a>

<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/hpi/ultra_infer/releases/3.0.0rc0/ultra_infer_gpu_python-1.0.0.3.0.0rc0-cp38-cp38-linux_x86_64.whl"></a>
<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/hpi/ultra_infer/releases/3.0.0rc0/ultra_infer_gpu_python-1.0.0.3.0.0rc0-cp39-cp39-linux_x86_64.whl"></a>
<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/hpi/ultra_infer/releases/3.0.0rc0/ultra_infer_gpu_python-1.0.0.3.0.0rc0-cp310-cp310-linux_x86_64.whl"></a>

<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/hpi/ultra_infer/releases/3.0.0rc0/paddlex_hpi-3.0.0rc0-py3-none-any.whl"></a>
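For reference, prebuilt wheels such as those linked above are installed directly from their URLs. A minimal sketch for a CPU-only machine with Python 3.10 (pick the `cp38`/`cp39`/`cp310` wheel matching your interpreter, or the `ultra_infer_gpu_python` variant on GPU machines):

```bash
# Illustrative only: install the prebuilt ultra_infer wheel and the paddlex_hpi plugin wheel by URL.
python -m pip install "https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/hpi/ultra_infer/releases/3.0.0rc0/ultra_infer_python-1.0.0.3.0.0rc0-cp310-cp310-linux_x86_64.whl"
python -m pip install "https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/hpi/ultra_infer/releases/3.0.0rc0/paddlex_hpi-3.0.0rc0-py3-none-any.whl"
```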
24 changes: 13 additions & 11 deletions docs/pipeline_deploy/serving.en.md
@@ -128,7 +128,7 @@ Find the high-stability serving SDK corresponding to the pipeline in the table below:
</tr>
<tr>
<td>General image classification</td>
<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hps/public/sdks/v3.0.0rc0/paddlex_hps_image_classification.tar.gz">paddlex_hps_image_classification.tar.gz</a></td>
<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hps/public/sdks/v3.0.0rc0/paddlex_hps_image_classification_sdk.tar.gz">paddlex_hps_image_classification_sdk.tar.gz</a></td>
</tr>
<tr>
<td>General object detection</td>
@@ -262,7 +262,7 @@ Select the pipeline you wish to deploy and click "获取" (acquire). Afterwards,

<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipeline_deploy/image-2.png"/>

**Please note**: Each serial number can only be bound to a unique device fingerprint and can only be bound once. This means that if a user deploys a pipeline on different machines, a separate serial number must be prepared for each machine.
**Please note**: Each serial number can only be bound to a unique device fingerprint and can only be bound once. This means that if a user deploys a pipeline on different machines, a separate serial number must be prepared for each machine. **The high-stability serving solution is completely free.** PaddleX's authentication mechanism is designed to count the number of deployments of each pipeline and, through data modeling, provide pipeline efficiency analysis for our team, so as to optimize resource allocation and improve the efficiency of key pipelines. Note that the authentication process uses only non-sensitive information such as disk partition UUIDs, and PaddleX does not collect sensitive data such as device telemetry. Therefore, in theory, **the authentication server cannot obtain any sensitive information**.

### 2.3 Adjust Configurations

@@ -312,44 +312,46 @@ First, pull the Docker image as needed:
- Image supporting deployment with NVIDIA GPU (the machine must have NVIDIA drivers that support CUDA 11.8 installed):
```bash
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0b2-gpu
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0rc0-gpu
```
- CPU-only Image:
```bash
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0b2-cpu
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0rc0-cpu
```
With the image prepared, execute the following command to run the server:
```bash
docker run \
-it \
-e PADDLEX_HPS_DEVICE_TYPE={deployment device type} \
-e PADDLEX_HPS_SERIAL_NUMBER={serial number} \
-e PADDLEX_HPS_UPDATE_LICENSE=1 \
-v "$(pwd)":/workspace \
-v "${HOME}/.baidu/paddlex/licenses":/root/.baidu/paddlex/licenses \
-v /dev/disk/by-uuid:/dev/disk/by-uuid \
-w /workspace \
-e PADDLEX_HPS_DEVICE_TYPE={deployment device type} \
-e PADDLEX_HPS_SERIAL_NUMBER={serial number} \
--rm \
--gpus all \
--network host \
--shm-size 8g \
{image name} \
./server.sh
/bin/bash server.sh
```

- The deployment device type can be `cpu` or `gpu`, and the CPU-only image supports only `cpu`.
- If CPU deployment is required, there is no need to specify `--gpus`.
- The above commands can only be executed properly after successful activation. PaddleX offers two activation methods: online activation and offline activation. They are detailed as follows:

- Online activation: Add `-e PADDLEX_HPS_UPDATE_LICENSE=1` to the command to enable the program to complete activation automatically.
- Offline activation: Follow the instructions in the serial number management section to obtain the machine’s device fingerprint. Bind the serial number with the device fingerprint to obtain the certificate and complete the activation. For this activation method, you need to manually place the certificate in the `${HOME}/.baidu/paddlex/licenses` directory on the machine (if the directory does not exist, you will need to create it).
- Online activation: Set `PADDLEX_HPS_UPDATE_LICENSE` to `1` for the first execution to enable the program to automatically update the license and complete activation. When executing the command again, you may set `PADDLEX_HPS_UPDATE_LICENSE` to `0` to avoid online license updates.
- Offline activation: Follow the instructions in the serial number management section to obtain the machine’s device fingerprint. Bind the serial number with the device fingerprint to obtain the certificate and complete the activation. For this activation method, you need to manually place the certificate in the `${HOME}/.baidu/paddlex/licenses` directory on the machine (if the directory does not exist, you will need to create it). When using this method, set `PADDLEX_HPS_UPDATE_LICENSE` to `0` to avoid online license updates.

- It is necessary to ensure that the `/dev/disk/by-uuid` directory on the host machine exists and is not empty, and that this directory is correctly mounted in order to perform activation properly.
- If you need to enter the container for debugging, you can replace `./server.sh` in the command with `/bin/bash`. Then execute `./server.sh` inside the container.
- If you need to enter the container for debugging, you can replace `/bin/bash server.sh` in the command with `/bin/bash`. Then execute `/bin/bash server.sh` inside the container.
- If you want the server to run in the background, you can replace `-it` in the command with `-d`. After the container starts, you can view the container logs with `docker logs -f {container ID}`.
- Add `-e PADDLEX_USE_HPIP=1` to use the PaddleX high-performance inference plugin to accelerate the pipeline inference process. However, please note that not all pipelines support using the high-performance inference plugin. Please refer to the [PaddleX High-Performance Inference Guide](./high_performance_inference.en.md) for more information.
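
Putting the options above together, a GPU deployment with online activation might look like the following sketch; the serial number is a placeholder you must replace, and the image tag matches the GPU image pulled earlier:

```bash
# Sketch only: GPU image with online activation on the first run.
# Drop the PADDLEX_USE_HPIP line if the pipeline does not support the high-performance inference plugin.
docker run \
    -it \
    -e PADDLEX_HPS_DEVICE_TYPE=gpu \
    -e PADDLEX_HPS_SERIAL_NUMBER={serial number} \
    -e PADDLEX_HPS_UPDATE_LICENSE=1 \
    -e PADDLEX_USE_HPIP=1 \
    -v "$(pwd)":/workspace \
    -v "${HOME}/.baidu/paddlex/licenses":/root/.baidu/paddlex/licenses \
    -v /dev/disk/by-uuid:/dev/disk/by-uuid \
    -w /workspace \
    --rm \
    --gpus all \
    --network host \
    --shm-size 8g \
    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0rc0-gpu \
    /bin/bash server.sh
```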

You may observe output similar to the following:

@@ -367,8 +369,8 @@ Navigate to the `client` directory of the high-stability serving SDK, and run the following commands:

```bash
# It is recommended to install in a virtual environment
python -m pip install paddlex_hps_client-*.whl
python -m pip install -r requirements.txt
python -m pip install paddlex_hps_client-*.whl
```
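
For example, the installation above might be done inside a fresh virtual environment (a minimal sketch; the exact wheel filename depends on the SDK version you downloaded):

```bash
# Create and activate an isolated environment, then install the dependencies and the client wheel.
python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
python -m pip install paddlex_hps_client-*.whl
```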

The `client.py` script in the `client` directory contains examples of how to call the service and provides a command-line interface.
20 changes: 11 additions & 9 deletions docs/pipeline_deploy/serving.md
@@ -128,7 +128,7 @@ paddlex --serve --pipeline image_classification --use_hpip
</tr>
<tr>
<td>General image classification</td>
<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hps/public/sdks/v3.0.0rc0/paddlex_hps_image_classification.tar.gz">paddlex_hps_image_classification.tar.gz</a></td>
<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hps/public/sdks/v3.0.0rc0/paddlex_hps_image_classification_sdk.tar.gz">paddlex_hps_image_classification_sdk.tar.gz</a></td>
</tr>
<tr>
<td>General object detection</td>
@@ -263,7 +263,7 @@ paddlex --serve --pipeline image_classification --use_hpip

<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipeline_deploy/image-2.png">

**Please note**: Each serial number can only be bound to a unique device fingerprint and can only be bound once. This means that if a user deploys a pipeline on different machines, a separate serial number must be prepared for each machine.
**Please note**: Each serial number can only be bound to a unique device fingerprint and can only be bound once. This means that if a user deploys a pipeline on different machines, a separate serial number must be prepared for each machine. **The high-stability serving solution is completely free.** The core purpose of PaddleX's authentication mechanism is to count the number of deployments of each pipeline and, through data modeling, provide pipeline efficiency analysis for the team, so as to optimize resource allocation and improve the efficiency of key pipelines. It should be emphasized that the authentication process uses only non-sensitive information such as disk partition UUIDs, and PaddleX does not collect sensitive data such as device telemetry; therefore, in theory, **the authentication server cannot obtain any sensitive information**.

### 2.3 Adjust Configurations

@@ -327,30 +327,32 @@ The PaddleX high-stability serving solution is based on NVIDIA Triton Inference Server
```bash
docker run \
-it \
-e PADDLEX_HPS_DEVICE_TYPE={deployment device type} \
-e PADDLEX_HPS_SERIAL_NUMBER={serial number} \
-e PADDLEX_HPS_UPDATE_LICENSE=1 \
-v "$(pwd)":/workspace \
-v "${HOME}/.baidu/paddlex/licenses":/root/.baidu/paddlex/licenses \
-v /dev/disk/by-uuid:/dev/disk/by-uuid \
-w /workspace \
-e PADDLEX_HPS_DEVICE_TYPE={deployment device type} \
-e PADDLEX_HPS_SERIAL_NUMBER={serial number} \
--rm \
--gpus all \
--network host \
--shm-size 8g \
{image name} \
./server.sh
/bin/bash server.sh
```

- The deployment device type can be `cpu` or `gpu`; the CPU-only image supports only `cpu`.
- If you want to deploy with CPU, there is no need to specify `--gpus`.
- The above commands can only be executed properly after successful activation. PaddleX offers two activation methods: offline activation and online activation. They are detailed as follows:

- Online activation: Add `-e PADDLEX_HPS_UPDATE_LICENSE=1` to the command so that the program completes activation automatically.
- Offline activation: Follow the instructions in the serial number management section to obtain the machine's device fingerprint, bind the serial number with the device fingerprint to obtain the certificate, and complete the activation. With this activation method, you need to manually place the certificate in the `${HOME}/.baidu/paddlex/licenses` directory on the machine (create the directory if it does not exist).
- Online activation: Set `PADDLEX_HPS_UPDATE_LICENSE` to `1` for the first execution so that the program automatically updates the license and completes activation. When executing the command again, you may set `PADDLEX_HPS_UPDATE_LICENSE` to `0` to avoid online license updates.
- Offline activation: Follow the instructions in the serial number management section to obtain the machine's device fingerprint, bind the serial number with the device fingerprint to obtain the certificate, and complete the activation. With this activation method, you need to manually place the certificate in the `${HOME}/.baidu/paddlex/licenses` directory on the machine (create the directory if it does not exist). When using this method, set `PADDLEX_HPS_UPDATE_LICENSE` to `0` to avoid online license updates.

- You must ensure that the `/dev/disk/by-uuid` directory on the host machine exists and is not empty, and that this directory is correctly mounted, in order to perform activation properly.
- If you need to enter the container for debugging, you can replace `./server.sh` in the command with `/bin/bash`, and then execute `./server.sh` inside the container.
- If you need to enter the container for debugging, you can replace `/bin/bash server.sh` in the command with `/bin/bash`, and then execute `/bin/bash server.sh` inside the container.
- If you want the server to run in the background, you can replace `-it` in the command with `-d`. After the container starts, you can view the container logs with `docker logs -f {container ID}`.
- Add `-e PADDLEX_USE_HPIP=1` to the command to use the PaddleX high-performance inference plugin to accelerate the pipeline inference process. However, please note that not all pipelines support the high-performance inference plugin. Please refer to the [PaddleX High-Performance Inference Guide](./high_performance_inference.md) for more information.
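
For instance, a CPU-only deployment that was already activated offline and should run in the background, per the notes above, might look like the following sketch; the placeholders in curly braces must be replaced with real values:

```bash
# Sketch only: CPU image, detached mode, license certificate already placed under
# ${HOME}/.baidu/paddlex/licenses (hence PADDLEX_HPS_UPDATE_LICENSE=0).
docker run \
    -d \
    -e PADDLEX_HPS_DEVICE_TYPE=cpu \
    -e PADDLEX_HPS_SERIAL_NUMBER={serial number} \
    -e PADDLEX_HPS_UPDATE_LICENSE=0 \
    -v "$(pwd)":/workspace \
    -v "${HOME}/.baidu/paddlex/licenses":/root/.baidu/paddlex/licenses \
    -v /dev/disk/by-uuid:/dev/disk/by-uuid \
    -w /workspace \
    --rm \
    --network host \
    --shm-size 8g \
    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0rc0-cpu \
    /bin/bash server.sh
# Follow the server logs afterwards:
docker logs -f {container ID}
```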

You may observe output similar to the following:

@@ -368,8 +370,8 @@ I1216 11:37:21.643494 35 http_server.cc:167] Started Metrics Service at 0.0.0.0:

```bash
# It is recommended to install in a virtual environment
python -m pip install paddlex_hps_client-*.whl
python -m pip install -r requirements.txt
python -m pip install paddlex_hps_client-*.whl
```

The `client.py` script in the `client` directory contains examples of how to call the service and provides a command-line interface.
24 changes: 12 additions & 12 deletions docs/pipeline_usage/tutorials/cv_pipelines/face_recognition.en.md
@@ -718,9 +718,9 @@ Below is the API reference for basic service deployment and multi-language service invocation examples:
<td>The key corresponding to the index, used to identify the created index. It can be used as input for other operations.</td>
</tr>
<tr>
<td><code>idMap</code></td>
<td><code>object</code></td>
<td>Mapping from vector IDs to labels.</td>
<td><code>imageCount</code></td>
<td><code>integer</code></td>
<td>The number of images indexed.</td>
</tr>
</tbody>
</table>
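
To illustrate the response fields above, a hypothetical call to the index-build endpoint of a locally running basic-serving instance could look as follows; the base URL, request body, and returned values are illustrative only, and the full client flow is shown in the Python example later in this document:

```bash
# Hypothetical request; substitute a real base URL and base64-encoded images (or image URLs).
curl -s -X POST "http://localhost:8080/face-recognition-index-build" \
    -H "Content-Type: application/json" \
    -d '{"imageLabelPairs": [{"image": "<base64-encoded image or URL>", "label": "Alice"}]}'
# An illustrative success response, carrying the fields described in the table above:
# {"result": {"indexKey": "...", "imageCount": 1}, ...}
```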
@@ -791,9 +791,9 @@ Below is the API reference for basic service deployment and multi-language service invocation examples:
</thead>
<tbody>
<tr>
<td><code>idMap</code></td>
<td><code>object</code></td>
<td>Mapping from vector IDs to labels.</td>
<td><code>imageCount</code></td>
<td><code>integer</code></td>
<td>The number of images indexed.</td>
</tr>
</tbody>
</table>
@@ -842,9 +842,9 @@ Below is the API reference for basic service deployment and multi-language service invocation examples:
</thead>
<tbody>
<tr>
<td><code>idMap</code></td>
<td><code>object</code></td>
<td>Mapping from vector IDs to labels.</td>
<td><code>imageCount</code></td>
<td><code>integer</code></td>
<td>The number of images indexed.</td>
</tr>
</tbody>
</table>
@@ -1014,7 +1014,7 @@ if resp_index_build.status_code != 200:
pprint.pp(resp_index_build.json())
sys.exit(1)
result_index_build = resp_index_build.json()["result"]
print(f"Number of images indexed: {len(result_index_build['idMap'])}")
print(f"Number of images indexed: {result_index_build['imageCount']}")

for pair in image_label_pairs_to_add:
with open(pair["image"], "rb") as file:
@@ -1029,7 +1029,7 @@ if resp_index_add.status_code != 200:
pprint.pp(resp_index_add.json())
sys.exit(1)
result_index_add = resp_index_add.json()["result"]
print(f"Number of images indexed: {len(result_index_add['idMap'])}")
print(f"Number of images indexed: {result_index_add['imageCount']}")

payload = {"ids": ids_to_remove, "indexKey": result_index_build["indexKey"]}
resp_index_remove = requests.post(f"{API_BASE_URL}/face-recognition-index-remove", json=payload)
@@ -1038,7 +1038,7 @@ if resp_index_remove.status_code != 200:
pprint.pp(resp_index_remove.json())
sys.exit(1)
result_index_remove = resp_index_remove.json()["result"]
print(f"Number of images indexed: {len(result_index_remove['idMap'])}")
print(f"Number of images indexed: {result_index_remove['imageCount']}")

with open(infer_image_path, "rb") as file:
image_bytes = file.read()
24 changes: 12 additions & 12 deletions docs/pipeline_usage/tutorials/cv_pipelines/face_recognition.md
@@ -714,9 +714,9 @@ data_root # Dataset root directory; the directory name can be changed
<td>The key corresponding to the index, used to identify the created index. It can be used as input for other operations.</td>
</tr>
<tr>
<td><code>idMap</code></td>
<td><code>object</code></td>
<td>Mapping from vector IDs to labels.</td>
<td><code>imageCount</code></td>
<td><code>integer</code></td>
<td>The number of images indexed.</td>
</tr>
</tbody>
</table>
@@ -787,9 +787,9 @@ data_root # Dataset root directory; the directory name can be changed
</thead>
<tbody>
<tr>
<td><code>idMap</code></td>
<td><code>object</code></td>
<td>Mapping from vector IDs to labels.</td>
<td><code>imageCount</code></td>
<td><code>integer</code></td>
<td>The number of images indexed.</td>
</tr>
</tbody>
</table>
@@ -838,9 +838,9 @@ data_root # Dataset root directory; the directory name can be changed
</thead>
<tbody>
<tr>
<td><code>idMap</code></td>
<td><code>object</code></td>
<td>Mapping from vector IDs to labels.</td>
<td><code>imageCount</code></td>
<td><code>integer</code></td>
<td>The number of images indexed.</td>
</tr>
</tbody>
</table>
@@ -1011,7 +1011,7 @@ if resp_index_build.status_code != 200:
pprint.pp(resp_index_build.json())
sys.exit(1)
result_index_build = resp_index_build.json()["result"]
print(f"Number of images indexed: {len(result_index_build['idMap'])}")
print(f"Number of images indexed: {result_index_build['imageCount']}")

for pair in image_label_pairs_to_add:
with open(pair["image"], "rb") as file:
@@ -1026,7 +1026,7 @@ if resp_index_add.status_code != 200:
pprint.pp(resp_index_add.json())
sys.exit(1)
result_index_add = resp_index_add.json()["result"]
print(f"Number of images indexed: {len(result_index_add['idMap'])}")
print(f"Number of images indexed: {result_index_add['imageCount']}")

payload = {"ids": ids_to_remove, "indexKey": result_index_build["indexKey"]}
resp_index_remove = requests.post(f"{API_BASE_URL}/face-recognition-index-remove", json=payload)
@@ -1035,7 +1035,7 @@ if resp_index_remove.status_code != 200:
pprint.pp(resp_index_remove.json())
sys.exit(1)
result_index_remove = resp_index_remove.json()["result"]
print(f"Number of images indexed: {len(result_index_remove['idMap'])}")
print(f"Number of images indexed: {result_index_remove['imageCount']}")

with open(infer_image_path, "rb") as file:
image_bytes = file.read()
@@ -679,9 +679,9 @@ Below is the API reference for basic service deployment and multi-language service invocation examples:
<td>The key corresponding to the index, used to identify the created index. It can be used as input for other operations.</td>
</tr>
<tr>
<td><code>idMap</code></td>
<td><code>object</code></td>
<td>Mapping from vector IDs to labels.</td>
<td><code>imageCount</code></td>
<td><code>integer</code></td>
<td>The number of images indexed.</td>
</tr>
</tbody>
</table>
@@ -752,9 +752,9 @@ Below is the API reference for basic service deployment and multi-language service invocation examples:
</thead>
<tbody>
<tr>
<td><code>idMap</code></td>
<td><code>object</code></td>
<td>Mapping from vector IDs to labels.</td>
<td><code>imageCount</code></td>
<td><code>integer</code></td>
<td>The number of images indexed.</td>
</tr>
</tbody>
</table>
@@ -803,9 +803,9 @@ Below is the API reference for basic service deployment and multi-language service invocation examples:
</thead>
<tbody>
<tr>
<td><code>idMap</code></td>
<td><code>object</code></td>
<td>Mapping from vector IDs to labels.</td>
<td><code>imageCount</code></td>
<td><code>integer</code></td>
<td>The number of images indexed.</td>
</tr>
</tbody>
</table>
@@ -975,7 +975,7 @@ if resp_index_build.status_code != 200:
pprint.pp(resp_index_build.json())
sys.exit(1)
result_index_build = resp_index_build.json()["result"]
print(f"Number of images indexed: {len(result_index_build['idMap'])}")
print(f"Number of images indexed: {result_index_build['imageCount']}")

for pair in image_label_pairs_to_add:
with open(pair["image"], "rb") as file:
@@ -990,7 +990,7 @@ if resp_index_add.status_code != 200:
pprint.pp(resp_index_add.json())
sys.exit(1)
result_index_add = resp_index_add.json()["result"]
print(f"Number of images indexed: {len(result_index_add['idMap'])}")
print(f"Number of images indexed: {result_index_add['imageCount']}")

payload = {"ids": ids_to_remove, "indexKey": result_index_build["indexKey"]}
resp_index_remove = requests.post(f"{API_BASE_URL}/shitu-index-remove", json=payload)
@@ -999,7 +999,7 @@ if resp_index_remove.status_code != 200:
pprint.pp(resp_index_remove.json())
sys.exit(1)
result_index_remove = resp_index_remove.json()["result"]
print(f"Number of images indexed: {len(result_index_remove['idMap'])}")
print(f"Number of images indexed: {result_index_remove['imageCount']}")

with open(infer_image_path, "rb") as file:
image_bytes = file.read()
