
Commit 2d707c5: update doc
TeslaZhao committed Nov 14, 2021 (1 parent: bbe8fba)
Showing 67 changed files with 205 additions and 337 deletions.
62 changes: 31 additions & 31 deletions README.md
@@ -28,7 +28,7 @@ The goal of Paddle Serving is to provide high-performance, flexible and easy-to-


- Integrate the high-performance server-side inference engine Paddle Inference and the mobile-side engine Paddle Lite. Models from other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated to Paddle through [x2paddle](https://github.com/PaddlePaddle/X2Paddle).
-- There are two frameworks: high-performance C++ Serving and easy-to-use Python Pipeline. C++ Serving is built on the bRPC network framework to create high-throughput, low-latency inference services, and its performance indicators lead competing products. The Python Pipeline is built on the gRPC/gRPC-Gateway network framework and the Python language to provide a highly usable, high-throughput inference service. For how to choose between them, see [Technical Selection]()
+- There are two frameworks: high-performance C++ Serving and easy-to-use Python Pipeline. C++ Serving is built on the bRPC network framework to create high-throughput, low-latency inference services, and its performance indicators lead competing products. The Python Pipeline is built on the gRPC/gRPC-Gateway network framework and the Python language to provide a highly usable, high-throughput inference service. For how to choose between them, see [Technical Selection](doc/Serving_Design_EN.md)
- Support multiple [protocols]() such as HTTP, gRPC, and bRPC, and provide C++, Python, and Java SDKs.
- Design and implement a high-performance asynchronous pipeline inference framework based on a directed acyclic graph (DAG), with features such as multi-model composition, asynchronous scheduling, concurrent inference, dynamic batching, and multi-card multi-stream inference (see the pipeline sketch after this list).
- Adapt to a variety of commonly used computing hardware, such as x86 (Intel) CPU, ARM CPU, Nvidia GPU, and Kunlun XPU; integrate the Intel MKLDNN and Nvidia TensorRT acceleration libraries, plus low-precision and quantized inference.
- Provide a secure model deployment solution, including encrypted model deployment, an authentication mechanism, and an HTTPS security gateway, all used in practice.
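
To make the DAG-pipeline bullet above concrete, here is a minimal sketch of a single-OP Python Pipeline web service, modeled on the repo's simple `uci` pipeline example; the `Op`/`WebService` method signatures changed across Serving versions, so names such as `config.yml`, `uci`, and the return tuples are assumptions to adapt, not a definitive API.

```python
import numpy as np
from paddle_serving_server.web_service import WebService, Op  # assumed import path

class UciOp(Op):
    def preprocess(self, input_dicts, data_id, log_id):
        # Single upstream op: unpack the request dict and build the feed.
        (_, input_dict), = input_dicts.items()
        x = np.array(input_dict["x"].split(","), dtype="float32").reshape(1, 13)
        return {"x": x}, False, None, ""   # (feed, skip_infer, err_code, err_info)

    def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
        fetch_dict["price"] = str(fetch_dict["price"])
        return fetch_dict, None, ""        # (result, err_code, err_info)

class UciService(WebService):
    def get_pipeline_response(self, read_op):
        # The DAG here is trivial (read -> uci); more nodes make a real pipeline.
        return UciOp(name="uci", input_ops=[read_op])

service = UciService(name="uci")
service.prepare_pipeline_config("config.yml")  # ports, model dir, concurrency
service.run_service()
```

In the repo's examples, a request is then an HTTP POST whose JSON carries `key` and `value` fields, while the DAG scheduler handles batching and concurrency.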
@@ -54,46 +54,46 @@ The goal of Paddle Serving is to provide high-performance, flexible and easy-to-
This chapter guides you through the installation and deployment steps. It is strongly recommended to use Docker to deploy Paddle Serving; if you do not use Docker, skip the Docker-related steps. Paddle Serving can be deployed on cloud servers using Kubernetes and runs on commonly used hardware such as ARM CPU, Intel CPU, Nvidia GPU, and Kunlun XPU. The latest development kit of the develop branch is compiled and published every day for developers to use.

-- [Install Paddle Serving using docker](doc/Install.md)
-- [Build Paddle Serving from Source with Docker](doc/COMPILE.md)
-- [Deploy Paddle Serving on Kubernetes](doc/PADDLE_SERVING_ON_KUBERNETES.md)
-- [Deploy Paddle Serving with Security gateway](doc/SERVING_AUTH_DOCKER.md)
-- [Deploy Paddle Serving on more hardware](doc/BAIDU_KUNLUN_XPU_SERVING.md)
-- [Latest Wheel packages](doc/LATEST_PACKAGES.md) (updated daily on the develop branch)
+- [Install Paddle Serving using docker](doc/Install_EN.md)
+- [Build Paddle Serving from Source with Docker](doc/Compile_EN.md)
+- [Deploy Paddle Serving on Kubernetes](doc/Run_On_Kubernetes_EN.md)
+- [Deploy Paddle Serving with Security gateway](doc/Serving_Auth_Docker.md)
+- [Deploy Paddle Serving on more hardware](doc/Run_On_XPU_EN.md)
+- [Latest Wheel packages](doc/Latest_Packages_CN.md) (updated daily on the develop branch)

> Usage
The first step is to call the model-saving interface to generate the model parameter configuration files (.prototxt), which are used by both the client and the server. The second step is to read the configuration and startup parameters and start the service. The third step is to write client requests based on the SDK, following the API documents and your own use case, and to test the inference service.
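
As a concrete walk-through of these three steps, here is a minimal sketch built around the `uci_housing` model from the Quick Start; the directory names are illustrative, and exact signatures may drift between Serving releases, so treat it as a template rather than the canonical API.

```python
# Step 1: convert an exported Paddle inference model into servable format;
# this writes server and client directories, each with a .prototxt config.
from paddle_serving_client.io import inference_model_to_serving

inference_model_to_serving(
    dirname="uci_housing_model",        # exported inference model (illustrative)
    serving_server="serving_server",
    serving_client="serving_client")

# Step 2: start the service from a shell (CPU example):
#   python3 -m paddle_serving_server.serve --model serving_server --port 9292

# Step 3: write a client request with the Python SDK and test the service.
import numpy as np
from paddle_serving_client import Client

client = Client()
client.load_client_config("serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
x = np.random.rand(1, 13).astype("float32")   # dummy 13-dim feature batch
print(client.predict(feed={"x": x}, fetch=["price"]))
```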

-- [Quick Start](doc/QuickStart.md)
-- [Save a servable model](doc/SAVE.md)
-- [Description of configuration and startup parameters](doc/SERVING_CONFIGURE.md)
-- [Guide for RESTful/gRPC/bRPC APIs](doc/HTTP_SERVICE_CN.md)
-- [Inference on quantized models](doc/LOW_PRECISION_DEPLOYMENT_CN.md)
-- [Data format of classic models](doc/PROCESS_DATA.md)
-- [C++ Serving](doc/C++Serving/Introduction_EN.md)
-- [Hot loading models](doc/C++Serving/Hot_Loading_EN.md)
-- [A/B Test](doc/C++Serving/ABTest_EN.md)
-- [Encryption](doc/C++Serving/Encryption_EN.md)
-- [Analyze and optimize performance (Chinese)](doc/C++Serving/Performance_Tuning_CN.md)
-- [Benchmark (Chinese)](doc/C++Serving/Benchmark_CN.md)
-- [Python Pipeline](doc/python_server/PIPELINE_SERVING_CN.md)
-- [Analyze and optimize performance](doc/python_server/PIPELINE_SERVING_CN.md)
-- [Benchmark (Chinese)](doc/python_server/BENCHMARKING_GPU.md)
-- [Client SDK]()
-- [Python SDK](doc/PYTHON_SDK_CN.md)
-- [JAVA SDK](doc/JAVA_SDK.md)
-- [C++ SDK](doc/C++_SDK_CN.md)
-- [Large-scale sparse parameter server](doc/CUBE_LOCAL_EN.md)
+- [Quick Start](doc/Quick_Start_EN.md)
+- [Save a servable model](doc/Save_EN.md)
+- [Description of configuration and startup parameters](doc/Serving_Configure_EN.md)
+- [Guide for RESTful/gRPC/bRPC APIs](doc/C++_Serving/Http_Service_EN.md)
+- [Inference on quantized models](doc/Low_Precision_CN.md)
+- [Data format of classic models](doc/Process_Data_CN.md)
+- [C++ Serving](doc/C++_Serving/Introduction_EN.md)
+- [Hot loading models](doc/C++_Serving/Hot_Loading_EN.md)
+- [A/B Test](doc/C++_Serving/ABTest_EN.md)
+- [Encryption](doc/C++_Serving/Encryption_EN.md)
+- [Analyze and optimize performance (Chinese)](doc/C++_Serving/Performance_Tuning_CN.md)
+- [Benchmark (Chinese)](doc/C++_Serving/Benchmark_CN.md)
+- [Python Pipeline](doc/Python_Pipeline/Pipeline_Design_EN.md)
+- [Analyze and optimize performance](doc/Python_Pipeline/Pipeline_Design_EN.md)
+- [Benchmark (Chinese)](doc/Python_Pipeline/Benchmark_CN.md)
+- Client SDK
+- [Python SDK (Chinese)](doc/C++_Serving/Http_Service_CN.md)
+- [JAVA SDK](doc/Java_SDK_EN.md)
+- [C++ SDK (Chinese)](doc/C++_Serving/Creat_C++Serving_CN.md)
+- [Large-scale sparse parameter server](doc/Cube_Local_EN.md)

<br>

> Developers
For Paddle Serving developers, we provide extended documents on topics such as custom OPs and level-of-detail (LoD) data processing; a small LoD feed sketch follows this list.
-- [Custom Operators](doc/OP_EN.md)
+- [Custom Operators](doc/C++_Serving/OP_EN.md)
- [Processing LOD Data](doc/LOD_EN.md)
-- [FAQ (Chinese)](doc/FAQ.md)
+- [FAQ (Chinese)](doc/FAQ_CN.md)
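
Because LoD handling trips up many first-time users, here is a hedged client-side sketch of feeding one variable-length sequence, following the `name.lod` convention used by the repo's sequence examples (e.g. imdb); the feed/fetch names and config path are assumptions for illustration.

```python
import numpy as np
from paddle_serving_client import Client

client = Client()
client.load_client_config("imdb_client_conf/serving_client_conf.prototxt")  # assumed path
client.connect(["127.0.0.1:9292"])

word_ids = np.array([8, 233, 52, 601], dtype="int64").reshape(-1, 1)  # one tokenized sentence
fetch_map = client.predict(
    feed={"words": word_ids,
          "words.lod": [0, word_ids.shape[0]]},  # LoD offsets: one sequence of length 4
    fetch=["prediction"],
    batch=True)
print(fetch_map["prediction"])
```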

<h2 align="center">Model Zoo</h2>

@@ -108,7 +108,7 @@ Paddle Serving works closely with the Paddle model suite, and implements a large

</center>

-For more model examples, read [Model zoo](doc/Model_Zoo.md)
+For more model examples, read [Model zoo](doc/Model_Zoo_EN.md)

<center class="half">
<img src="https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/imgs_results/PP-OCRv2/PP-OCRv2-pic003.jpg?raw=true" width="280"/> <img src="https://github.com/PaddlePaddle/PaddleDetection/raw/release/2.3/docs/images/road554.png" width="160"/>
@@ -132,7 +132,7 @@ If you want to communicate with developers and other users, welcome to join us,

> Contribution
-If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines](doc/CONTRIBUTE.md)
+If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines](doc/Contribute_EN.md)

- Special thanks to [@BeyondYourself](https://github.com/BeyondYourself) for contributing the gRPC tutorial, updating the FAQ doc, and fixing the mkdir command
- Special thanks to [@mcl-stone](https://github.com/mcl-stone) for updating the faster_rcnn benchmark
57 changes: 29 additions & 28 deletions README_CN.md
@@ -27,13 +27,13 @@
Built on the PaddlePaddle deep learning framework, Paddle Serving aims to help deep learning developers and enterprises deliver high-performance, flexible, and easy-to-use industrial-grade online inference services. Paddle Serving supports multiple protocols such as RESTful, gRPC, and bRPC, provides inference solutions for a variety of heterogeneous hardware and operating systems, and ships many classic pre-trained model examples. Core features:

- Integrates the high-performance server-side inference engine Paddle Inference and the mobile-side engine Paddle Lite; models from other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated to Paddle via the [x2paddle](https://github.com/PaddlePaddle/X2Paddle) tool
-- Two frameworks: high-performance C++ Serving and easy-to-use Python. The C++ framework builds high-throughput, low-latency inference services on the high-performance bRPC network framework, leading competing products in performance. The Python framework builds a highly usable, high-throughput inference service framework on the gRPC/gRPC-Gateway network framework and the Python language. For guidance on choosing between them, see [Technical Selection]()
+- Two frameworks: high-performance C++ Serving and easy-to-use Python. The C++ framework builds high-throughput, low-latency inference services on the high-performance bRPC network framework, leading competing products in performance. The Python framework builds a highly usable, high-throughput inference service framework on the gRPC/gRPC-Gateway network framework and the Python language. For guidance on choosing between them, see [Technical Selection](doc/Serving_Design_CN.md)
- Supports multiple [protocols](链接protocol文档) such as HTTP, gRPC, and bRPC; provides C++, Python, and Java SDKs
- Designs and implements a high-performance asynchronous pipeline inference framework based on a directed acyclic graph (DAG), with multi-model composition, asynchronous scheduling, concurrent inference, dynamic batching, and multi-card multi-stream inference
- Adapts to multiple kinds of hardware such as x86 (Intel) CPU, ARM CPU, Nvidia GPU, and Kunlun XPU; integrates the Intel MKLDNN and Nvidia TensorRT acceleration libraries, plus low-precision and quantized inference
- Provides a secure model deployment solution, including encrypted model deployment, authentication, and an HTTPS security gateway, applied in real projects
- Supports cloud deployment, with a case study of deploying Paddle Serving on a Baidu AI Cloud Kubernetes cluster
-- Provides rich deployment examples of classic pre-trained models from suites such as PaddleOCR, PaddleClas, PaddleDetection, PaddleSeg, PaddleNLP, and PaddleRec, totaling more than 40 curated pre-trained models, with more continually added
+- Provides rich deployment examples of classic pre-trained models from suites such as PaddleOCR, PaddleClas, PaddleDetection, PaddleSeg, PaddleNLP, and PaddleRec, totaling 40+ curated pre-trained models, with more continually added
- Supports distributed deployment of large-scale sparse parameter index models, with multi-table, multi-shard, multi-replica, and local high-frequency cache features; deployable on a single machine or in the cloud


@@ -54,41 +54,42 @@ Built on the PaddlePaddle deep learning framework, Paddle Serving aims to help deep learning developers
This chapter guides you through installation and deployment. Docker is strongly recommended for deploying Paddle Serving; if you do not use Docker, skip the Docker-related steps. On cloud servers, Paddle Serving can be deployed with Kubernetes. To compile or use Paddle Serving on heterogeneous hardware such as ARM CPU and Kunlun XPU, refer to the documents below. The latest development kit of the develop branch is compiled daily for developers.
- [Install Paddle Serving using docker](doc/Install_CN.md)
-- [Build Paddle Serving from source](doc/COMPILE_CN.md)
-- [Deploy Paddle Serving on Kubernetes](doc/PADDLE_SERVING_ON_KUBERNETES.md)
-- [Deploy the Paddle Serving security gateway](doc/SERVING_AUTH_DOCKER.md)
-- [Deploy Paddle Serving on heterogeneous hardware](doc/BAIDU_KUNLUN_XPU_SERVING_CN.md)
-- [Latest Wheel packages](doc/LATEST_PACKAGES.md) (updated daily on the develop branch)
+- [Build Paddle Serving from source](doc/Compile_CN.md)
+- [Deploy Paddle Serving on Kubernetes](doc/Run_On_Kubernetes.md)
+- [Deploy the Paddle Serving security gateway](doc/Serving_Auth_Docker.md)
+- [Deploy Paddle Serving on heterogeneous hardware](doc/Run_On_XPU_CN.md)
+- [Latest Wheel packages](doc/Latest_Packages_CN.md) (updated daily on the develop branch)

> Usage
After installing Paddle Serving, the Quick Start guides you through running Serving. First, call the model-saving interface to generate the model parameter configuration files (.prototxt) used by both client and server; second, read the configuration and startup parameters and start the service; third, based on the APIs and your use case, write client requests with the SDK and test the inference service. To learn about more features and usage scenarios, read the documents below in detail.
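
To complement the conversion-based sketch in the English section above, the example below shows the other path to a servable model: calling `save_model` from training code. It is a minimal sketch against the Paddle 2.x static-graph (fluid) API, with an untrained network purely to keep the snippet self-contained; the names `uci_housing_model`/`uci_housing_client` are illustrative.

```python
import paddle
import paddle.fluid as fluid
from paddle_serving_client.io import save_model

paddle.enable_static()
x = fluid.data(name="x", shape=[None, 13], dtype="float32")
price = fluid.layers.fc(input=x, size=1)      # stands in for a trained network
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

# Writes the server-side model plus serving_server_conf.prototxt and the
# client-side serving_client_conf.prototxt used by the SDK.
save_model("uci_housing_model",               # server-side output directory
           "uci_housing_client",              # client-side output directory
           {"x": x},                          # feed var dict: name -> Variable
           {"price": price},                  # fetch var dict: name -> Variable
           fluid.default_main_program())
```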
-- [Quick Start](doc/QuickStart_CN.md)
+- [Quick Start](doc/Quick_Start_CN.md)
- [Save models and configs for Paddle Serving](doc/SAVE_CN.md)
-- [Description of configuration and startup parameters](doc/SERVING_CONFIGURE.md)
-- [Guide for RESTful/gRPC/bRPC APIs](doc/HTTP_SERVICE_CN.md)
-- [Low-precision inference](doc/LOW_PRECISION_DEPLOYMENT_CN.md)
-- [Data processing for common models](doc/PROCESS_DATA.md)
-- [Introduction to C++ Serving](doc/C++Serving/Introduction_CN.md)
-- [Hot loading of models](doc/C++Serving/Hot_Loading_CN.md)
-- [A/B Test](doc/C++Serving/ABTest_CN.md)
-- [Encrypted model inference service](doc/C++Serving/Encryption_CN.md)
-- [Performance tuning guide](doc/C++Serving/Performance_Tuning_CN.md)
-- [Benchmarks](doc/C++Serving/Benchmark_CN.md)
-- [Introduction to Python Pipeline](doc/python_server/PIPELINE_SERVING_CN.md)
-- [Performance tuning guide](doc/python_server/PIPELINE_SERVING_CN.md)
-- [Client SDK]()
-- [Python SDK](doc/PYTHON_SDK_CN.md)
-- [JAVA SDK](doc/JAVA_SDK_CN.md)
-- [C++ SDK](doc/C++_SDK_CN.md)
-- [Large-scale sparse parameter index service](doc/CUBE_LOCAL_CN.md)
+- [Description of configuration and startup parameters](doc/Serving_Configure_CN.md)
+- [Guide for RESTful/gRPC/bRPC APIs](doc/C++_Serving/Http_Service_CN.md)
+- [Low-precision inference](doc/Low_Precision_CN.md)
+- [Data processing for common models](doc/Process_data_CN.md)
+- [Introduction to C++ Serving](doc/C++_Serving/Introduction_CN.md)
+- [Hot loading of models](doc/C++_Serving/Hot_Loading_CN.md)
+- [A/B Test](doc/C++_Serving/ABTest_CN.md)
+- [Encrypted model inference service](doc/C++_Serving/Encryption_CN.md)
+- [Performance tuning guide](doc/C++_Serving/Performance_Tuning_CN.md)
+- [Benchmarks](doc/C++_Serving/Benchmark_CN.md)
+- [Introduction to Python Pipeline](doc/Python_Pipeline/Pipeline_Design_CN.md)
+- [Performance tuning guide](doc/Python_Pipeline/Pipeline_Design_CN.md)
+- [Benchmarks](doc/Python_Pipeline/Benchmark_CN.md)
+- Client SDK
+- [Python SDK](doc/C++_Serving/Http_Service_CN.md)
+- [JAVA SDK](doc/Java_SDK_CN.md)
+- [C++ SDK](doc/C++_Serving/Creat_C++Serving_CN.md)
+- [Large-scale sparse parameter index service](doc/Cube_Local_CN.md)

> Developers
For Paddle Serving developers, we provide documents on custom OPs and variable-length (LoD) data processing.
-- [Custom OPs](doc/OP_CN.md)
+- [Custom OPs](doc/C++_Serving/OP_CN.md)
- [Variable-length (LoD) data processing](doc/LOD_CN.md)
-- [FAQ](doc/FAQ.md)
+- [FAQ](doc/FAQ_CN.md)

<h2 align="center">Model Zoo</h2>

@@ -126,7 +127,7 @@ Paddle Serving works closely with Paddle model suites and implements many serving deployments,

> Contributing
-If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines](doc/CONTRIBUTE.md)
+If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines](doc/Contribute.md)

- Special thanks to [@BeyondYourself](https://github.com/BeyondYourself) for the gRPC tutorial, FAQ updates, and reorganizing the file directory.
- Special thanks to [@mcl-stone](https://github.com/mcl-stone) for the faster rcnn benchmark script
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -75,7 +75,7 @@ service ImageClassifyService {

#### 2.2.2 Example configuration

-For details on the Serving-side configuration, see [Serving-side configuration](../SERVING_CONFIGURE_CN.md)
+For details on the Serving-side configuration, see [Serving-side configuration](../Serving_Configure_CN.md)

The following configuration file chains ReaderOP, ClassifyOP, and WriteJsonOP into one workflow (for concepts such as OP and workflow, see the [OP introduction](OP_CN.md) and the [DAG introduction](DAG_CN.md))

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -32,7 +32,7 @@ The core of the server side is a binary executable named serving, compiled from the project code

To help users start the C++ Serving server quickly, besides editing the configuration files and running the serving binary with command-line arguments yourself, we also provide a Python-script launcher. Launching through the Python script still ultimately runs the serving binary, but the script automatically does two things: (1) generates the configuration files; (2) assembles a command line from the parameters you configured and runs the serving binary with those arguments. A hedged sketch of the equivalent Python API calls follows the link below.

-For more detailed explanations and examples, see the [detailed description of C++ Serving configuration and startup parameters](../SERVING_CONFIGURE_CN.md)
+For more detailed explanations and examples, see the [detailed description of C++ Serving configuration and startup parameters](../Serving_Configure_CN.md)
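
As an illustration of what the Python launcher assembles, the hedged sketch below builds the default OP sequence and starts the serving binary through the `paddle_serving_server` Python API; the OP names and `prepare_server` arguments follow commonly shown examples from this era of the repo and may differ between versions.

```python
# Build the default general_reader -> general_infer -> general_response
# OP sequence, generate the config files, and run the serving binary.
from paddle_serving_server import OpMaker, OpSeqMaker, Server

op_maker = OpMaker()
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(op_maker.create("general_reader"))
op_seq_maker.add_op(op_maker.create("general_infer"))
op_seq_maker.add_op(op_maker.create("general_response"))

server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.load_model_config("serving_server")    # model directory (illustrative)
server.prepare_server(workdir="workdir", port=9292, device="cpu")  # writes configs
server.run_server()   # assembles the command line and runs the serving binary
```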

### 3.2 Synchronous/asynchronous modes
The synchronous mode is simple and direct; it suits cases where model prediction is fast and a single request already carries a fairly large batch.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion doc/COMPILE_CN.md → doc/Compile_CN.md
@@ -1,6 +1,6 @@
# How to compile PaddleServing

-(Simplified Chinese|[English](./COMPILE.md))
+(Simplified Chinese|[English](./Compile_EN.md))

## Compilation environment setup

7 changes: 3 additions & 4 deletions doc/COMPILE.md → doc/Compile_EN.md
@@ -1,6 +1,6 @@
# How to compile PaddleServing

-([Simplified Chinese](./COMPILE_CN.md)|English)
+([Simplified Chinese](./Compile_CN.md)|English)

## Compilation environment requirements

@@ -23,7 +23,7 @@
| libSM | 1.2.2 |
| libXrender | 0.9.10 |

-It is recommended to use Docker for compilation. We have prepared the Paddle Serving compilation environment for you; see [this document](DOCKER_IMAGES.md).
+It is recommended to use Docker for compilation. We have prepared the Paddle Serving compilation environment for you; see [this document](Docker_Images_EN.md).

## Get Code

@@ -159,8 +159,7 @@ cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR/ \
-DSERVER=ON ..
make -j10
```

-**Note:** After the compilation is successful, you need to set the `SERVING_BIN` path; see the following [Notes](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md#Notes).
+**Note:** After the compilation is successful, you need to set the `SERVING_BIN` path; see the following [Notes](Compile_EN.md#Notes).

## Compile Client

2 changes: 1 addition & 1 deletion doc/CONTRIBUTE.md → doc/Contribute_EN.md
@@ -68,7 +68,7 @@ Paddle Serving uses this [Git branching model](http://nvie.com/posts/a-successfu

1. Build and test

-Users can build Paddle Serving natively on Linux; see the [BUILD steps](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md).
+Users can build Paddle Serving natively on Linux; see the [BUILD steps](Compile_EN.md).

1. Keep pulling

6 changes: 3 additions & 3 deletions doc/CUBE_LOCAL_CN.md → doc/Cube_Local_CN.md
@@ -1,14 +1,14 @@
# Guide to the standalone Cube sparse parameter index service

-(Simplified Chinese|[English](./CUBE_LOCAL.md))
+(Simplified Chinese|[English](./Cube_Local_EN.md))

## Introduction

Under python/examples there are two CTR examples: criteo_ctr and criteo_ctr_with_cube. The former saves the whole model at training time, including the sparse parameters. The latter slices out the sparse parameters and saves the model in two parts: sparse parameters and dense parameters. In industrial scenarios the sparse parameters are very large, on the order of 10^9, so serving large-scale sparse parameters on a single machine is impractical. We therefore introduce Cube, Baidu's industrial-grade product built over many years in the sparse parameter indexing field, to provide a distributed sparse parameter service.

-<!--The standalone Cube is a cut-down version of the distributed Cube, intended for developers' experiments and demos. If you need a distributed sparse parameter service, continue with the [Cube sparse parameter index service guide](CUBE_LOCAL_CN.md) (under construction) after reading this document.-->
+<!--The standalone Cube is a cut-down version of the distributed Cube, intended for developers' experiments and demos. If you need a distributed sparse parameter service, continue with the [Cube sparse parameter index service guide](Cube_Local_CN.md) (under construction) after reading this document.-->

-This document uses raw models without any compression; if you need to deploy quantized models, read the [guide to Cube quantized sparse parameter storage](./CUBE_QUANT_CN.md)
+This document uses raw models without any compression; if you need to deploy quantized models, read the [guide to Cube quantized sparse parameter storage](./Cube_Quant_CN.md)


## Examples