cherry-pick

PaddlePaddle · May 23, 2022
1 parent 935597b
commit 0c3bcf2

Showing 44 changed files with 2,983 additions and 2,652 deletions.
diff --git a/README.md b/README.md
@@ -27,7 +27,7 @@
 The goal of Paddle Serving is to provide high-performance, flexible and easy-to-use industrial-grade online inference services for machine learning developers and enterprises.Paddle Serving supports multiple protocols such as RESTful, gRPC, bRPC, and provides inference solutions under a variety of hardware and multiple operating system environments, and many famous pre-trained model examples. The core features are as follows:
 
 
-- Integrate high-performance server-side inference engine paddle Inference and mobile-side engine paddle Lite. Models of other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated to paddle through [x2paddle](https://github.com/PaddlePaddle/X2Paddle).
+- Integrate high-performance server-side inference engine [Paddle Inference](https://paddleinference.paddlepaddle.org.cn/product_introduction/inference_intro.html) and mobile-side engine [Paddle Lite](https://paddlelite.paddlepaddle.org.cn/introduction/tech_highlights.html). Models of other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated to paddle through [x2paddle](https://github.com/PaddlePaddle/X2Paddle).
 - There are two frameworks, namely high-performance C++ Serving and high-easy-to-use Python pipeline. The C++ Serving is based on the bRPC network framework to create a high-throughput, low-latency inference service, and its performance indicators are ahead of competing products. The Python pipeline is based on the gRPC/gRPC-Gateway network framework and the Python language to build a highly easy-to-use and high-throughput inference service. How to choose which one please see [Techinical Selection](doc/Serving_Design_EN.md#21-design-selection).
 - Support multiple [protocols](doc/C++_Serving/Inference_Protocols_CN.md) such as HTTP, gRPC, bRPC, and provide C++, Python, Java language SDK.
 - Design and implement a high-performance inference service framework for asynchronous pipelines based on directed acyclic graph (DAG), with features such as multi-model combination, asynchronous scheduling, concurrent inference, dynamic batch, multi-card multi-stream inference, request cache, etc.
@@ -40,13 +40,17 @@ The goal of Paddle Serving is to provide high-performance, flexible and easy-to-
 - Support service monitoring, provide prometheus-based performance statistics and port access
 
 
-<h2 align="center">Tutorial and Papers</h2>
-
+<h2 align="center">Tutorial and Solutions</h2>
 
 - AIStudio tutorial(Chinese) : [Paddle Serving服务化部署框架](https://www.paddlepaddle.org.cn/tutorials/projectdetail/3946013)
 - AIStudio OCR practice(Chinese) : [基于PaddleServing的OCR服务化部署实战](https://aistudio.baidu.com/aistudio/projectdetail/3630726)
 - Video tutorial(Chinese) : [深度学习服务化部署-以互联网应用为例](https://aistudio.baidu.com/aistudio/course/introduce/19084)
 - Edge AI solution(Chinese) : [基于Paddle Serving&百度智能边缘BIE的边缘AI解决方案](https://mp.weixin.qq.com/s/j0EVlQXaZ7qmoz9Fv96Yrw)
+- GOVT Q&A Solution(Chinese) : [政务问答检索式 FAQ System](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/question_answering/faq_system)
+- Smart Q&A Solution(Chinese) : [保险智能问答](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/question_answering/faq_finance)
+- Semantic Indexing Solution(Chinese) : [In-batch Negatives](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/neural_search/recall/in_batch_negative)
+
+<h2 align="center">Papers</h2>
 
 - Paper : [JiZhi: A Fast and Cost-Effective Model-As-A-Service System for
 Web-Scale Online Inference at Baidu](https://arxiv.org/pdf/2106.01674.pdf)
@@ -67,6 +71,7 @@ This chapter guides you through the installation and deployment steps. It is str
 
 - [Install Paddle Serving using docker](doc/Install_EN.md)
 - [Build Paddle Serving from Source with Docker](doc/Compile_EN.md)
+- [Install Paddle Serving on linux system](doc/Install_Linux_Env_CN.md)
 - [Deploy Paddle Serving on Kubernetes(Chinese)](doc/Run_On_Kubernetes_CN.md)
 - [Deploy Paddle Serving with Security gateway(Chinese)](doc/Serving_Auth_Docker_CN.md)
 - Deploy on more hardwares[[ARM CPU、百度昆仑](doc/Run_On_XPU_EN.md)、[华为昇腾](doc/Run_On_NPU_CN.md)、[海光DCU](doc/Run_On_DCU_CN.md)、[Jetson](doc/Run_On_JETSON_CN.md)]
@@ -93,10 +98,11 @@ The first step is to call the model save interface to generate a model parameter
   - [Benchmark(Chinese)](doc/C++_Serving/Benchmark_CN.md)
   - [Multiple models in series(Chinese)](doc/C++_Serving/2+_model.md)
   - [Request Cache(Chinese)](doc/C++_Serving/Request_Cache_CN.md)
-- [Python Pipeline](doc/Python_Pipeline/Pipeline_Design_EN.md)
-  - [Analyze and optimize performance](doc/Python_Pipeline/Performance_Tuning_EN.md)
-  - [TensorRT dynamic Shape](doc/TensorRT_Dynamic_Shape_EN.md)
-  - [Benchmark(Chinese)](doc/Python_Pipeline/Benchmark_CN.md)
+- [Python Pipeline Overview(Chinese)](doc/Python_Pipeline/Pipeline_Int_CN.md)
+  - [Architecture Design(Chinese)](doc/Python_Pipeline/Pipeline_Design_CN.md)
+  - [Core Features(Chinese)](doc/Python_Pipeline/Pipeline_Features_CN.md)
+  - [Performance Optimization(Chinese)](doc/Python_Pipeline/Pipeline_Optimize_CN.md)
+  - [Benchmark(Chinese)](doc/Python_Pipeline/Pipeline_Benchmark_CN.md)
 - Client SDK
   - [Python SDK(Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
   - [JAVA SDK](doc/Java_SDK_EN.md)

diff --git a/README_CN.md b/README_CN.md
@@ -24,27 +24,32 @@
 
 ***
 
-Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发者和企业提供高性能、灵活易用的工业级在线推理服务。Paddle Serving支持RESTful、gRPC、bRPC等多种协议，提供多种异构硬件和多种操作系统环境下推理解决方案，和多种经典预训练模型示例。核心特性如下：
-
-- 集成高性能服务端推理引擎paddle Inference和移动端引擎paddle Lite，其他机器学习平台（Caffe/TensorFlow/ONNX/PyTorch）可通过[x2paddle](https://github.com/PaddlePaddle/X2Paddle)工具迁移模型
-- 具有高性能C++和高易用Python 2套框架。C++框架基于高性能bRPC网络框架打造高吞吐、低延迟的推理服务，性能领先竞品。Python框架基于gRPC/gRPC-Gateway网络框架和Python语言构建高易用、高吞吐推理服务框架。技术选型参考[技术选型](doc/Serving_Design_CN.md#21-设计选型)
-- 支持HTTP、gRPC、bRPC等多种[协议](doc/C++_Serving/Inference_Protocols_CN.md)；提供C++、Python、Java语言SDK
-- 设计并实现基于有向无环图(DAG)的异步流水线高性能推理框架，具有多模型组合、异步调度、并发推理、动态批量、多卡多流推理、请求缓存等特性
-- 适配x86(Intel) CPU、ARM CPU、Nvidia GPU、昆仑XPU、华为昇腾310/910、海光DCU、Nvidia Jetson等多种硬件
-- 集成Intel MKLDNN、Nvidia TensorRT加速库，以及低精度和量化推理
-- 提供一套模型安全部署解决方案，包括加密模型部署、鉴权校验、HTTPs安全网关，并在实际项目中应用
-- 支持云端部署，提供百度云智能云kubernetes集群部署Paddle Serving案例
-- 提供丰富的经典模型部署示例，如PaddleOCR、PaddleClas、PaddleDetection、PaddleSeg、PaddleNLP、PaddleRec等套件，共计40+个预训练精品模型
-- 支持大规模稀疏参数索引模型分布式部署，具有多表、多分片、多副本、本地高频cache等特性、可单机或云端部署
+Paddle Serving 依托深度学习框架 PaddlePaddle 旨在帮助深度学习开发者和企业提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议，提供多种异构硬件和多种操作系统环境下推理解决方案，和多种经典预训练模型示例。核心特性如下：
+
+- 集成高性能服务端推理引擎 [Paddle Inference](https://paddleinference.paddlepaddle.org.cn/product_introduction/inference_intro.html) 和端侧引擎 [Paddle Lite](https://paddlelite.paddlepaddle.org.cn/introduction/tech_highlights.html)，其他机器学习平台（Caffe/TensorFlow/ONNX/PyTorch）可通过 [x2paddle](https://github.com/PaddlePaddle/X2Paddle) 工具迁移模型
+- 具有高性能 C++ Serving 和高易用 Python Pipeline 2套框架。C++ Serving 基于高性能 bRPC 网络框架打造高吞吐、低延迟的推理服务，性能领先竞品。Python Pipeline 基于 gRPC/gRPC-Gateway 网络框架和 Python 语言构建高易用、高吞吐推理服务框架。技术选型参考[技术选型](doc/Serving_Design_CN.md#21-设计选型)
+- 支持 HTTP、gRPC、bRPC 等多种[协议](doc/C++_Serving/Inference_Protocols_CN.md)；提供 C++、Python、Java 语言 SDK
+- 设计并实现基于有向无环图(DAG) 的异步流水线高性能推理框架，具有多模型组合、异步调度、并发推理、动态批量、多卡多流推理、请求缓存等特性
+- 适配 x86(Intel) CPU、ARM CPU、Nvidia GPU、昆仑 XPU、华为昇腾310/910、海光 DCU、Nvidia Jetson 等多种硬件
+- 集成 Intel MKLDNN、Nvidia TensorRT 加速库，以及低精度量化推理
+- 提供一套模型安全部署解决方案，包括加密模型部署、鉴权校验、HTTPs 安全网关，并在实际项目中应用
+- 支持云端部署，提供百度云智能云 kubernetes 集群部署 Paddle Serving 案例
+- 提供丰富的经典模型部署示例，如 PaddleOCR、PaddleClas、PaddleDetection、PaddleSeg、PaddleNLP、PaddleRec 等套件，共计40+个预训练精品模型
+- 支持大规模稀疏参数索引模型分布式部署，具有多表、多分片、多副本、本地高频 cache 等特性、可单机或云端部署
 - 支持服务监控，提供基于普罗米修斯的性能数据统计及端口访问
 
 
-<h2 align="center">教程与论文</h2>
+<h2 align="center">教程与案例</h2>
 
 - AIStudio 使用教程 : [Paddle Serving服务化部署框架](https://www.paddlepaddle.org.cn/tutorials/projectdetail/3946013)
-- AIStudio OCR实战 : [基于PaddleServing的OCR服务化部署实战](https://aistudio.baidu.com/aistudio/projectdetail/3630726)
+- AIStudio OCR 实战 : [基于Paddle Serving的OCR服务化部署实战](https://aistudio.baidu.com/aistudio/projectdetail/3630726)
 - 视频教程 : [深度学习服务化部署-以互联网应用为例](https://aistudio.baidu.com/aistudio/course/introduce/19084)
-- 边缘AI 解决方案 : [基于Paddle Serving&百度智能边缘BIE的边缘AI解决方案](https://mp.weixin.qq.com/s/j0EVlQXaZ7qmoz9Fv96Yrw)
+- 边缘 AI 解决方案 : [基于Paddle Serving&百度智能边缘BIE的边缘AI解决方案](https://mp.weixin.qq.com/s/j0EVlQXaZ7qmoz9Fv96Yrw)
+- 政务问答解决方案 : [政务问答检索式 FAQ System](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/question_answering/faq_system)
+- 智能问答解决方案 : [保险智能问答](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/question_answering/faq_finance)
+- 语义索引解决方案 : [In-batch Negatives](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/neural_search/recall/in_batch_negative)
+
+<h2 align="center">论文</h2>
 
 - 论文 : [JiZhi: A Fast and Cost-Effective Model-As-A-Service System for
 Web-Scale Online Inference at Baidu](https://arxiv.org/pdf/2106.01674.pdf)
@@ -61,13 +66,14 @@ AND GENERATION](https://arxiv.org/pdf/2112.12731.pdf)
 > 部署
 
 此章节引导您完成安装和部署步骤，强烈推荐使用Docker部署Paddle Serving，如您不使用docker，省略docker相关步骤。在云服务器上可以使用Kubernetes部署Paddle Serving。在异构硬件如ARM CPU、昆仑XPU上编译或使用Paddle Serving可阅读以下文档。每天编译生成develop分支的最新开发包供开发者使用。
-- [使用docker安装Paddle Serving](doc/Install_CN.md)
-- [源码编译安装Paddle Serving](doc/Compile_CN.md)
-- [在Kuberntes集群上部署Paddle Serving](doc/Run_On_Kubernetes_CN.md)
-- [部署Paddle Serving安全网关](doc/Serving_Auth_Docker_CN.md)
+- [使用 Docker 安装 Paddle Serving](doc/Install_CN.md)
+- [Linux 原生系统安装 Paddle Serving](doc/Install_Linux_Env_CN.md)
+- [源码编译安装 Paddle Serving](doc/Compile_CN.md)
+- [Kuberntes集群部署 Paddle Serving](doc/Run_On_Kubernetes_CN.md)
+- [部署 Paddle Serving 安全网关](doc/Serving_Auth_Docker_CN.md)
 - 异构硬件部署[[ARM CPU、百度昆仑](doc/Run_On_XPU_CN.md)、[华为昇腾](doc/Run_On_NPU_CN.md)、[海光DCU](doc/Run_On_DCU_CN.md)、[Jetson](doc/Run_On_JETSON_CN.md)]
-- [Docker镜像](doc/Docker_Images_CN.md)
-- [下载Wheel包](doc/Latest_Packages_CN.md)
+- [Docker 镜像列表](doc/Docker_Images_CN.md)
+- [下载 Python Wheels](doc/Latest_Packages_CN.md)
 
 > 使用
 
@@ -79,7 +85,9 @@ AND GENERATION](https://arxiv.org/pdf/2112.12731.pdf)
 - [低精度推理](doc/Low_Precision_CN.md)
 - [常见模型数据处理](doc/Process_data_CN.md)
 - [普罗米修斯](doc/Prometheus_CN.md)
-- [C++ Serving简介](doc/C++_Serving/Introduction_CN.md) 
+- [设置 TensorRT 动态shape](doc/TensorRT_Dynamic_Shape_CN.md)
+- [C++ Serving 概述](doc/C++_Serving/Introduction_CN.md)
+  - [异步框架](doc/C++_Serving/Asynchronous_Framwork_CN.md) 
   - [协议](doc/C++_Serving/Inference_Protocols_CN.md)
   - [模型热加载](doc/C++_Serving/Hot_Loading_CN.md)
   - [A/B Test](doc/C++_Serving/ABTest_CN.md)
@@ -88,10 +96,11 @@ AND GENERATION](https://arxiv.org/pdf/2112.12731.pdf)
   - [性能指标](doc/C++_Serving/Benchmark_CN.md)
   - [多模型串联](doc/C++_Serving/2+_model.md)
   - [请求缓存](doc/C++_Serving/Request_Cache_CN.md)
-- [Python Pipeline设计](doc/Python_Pipeline/Pipeline_Design_CN.md)
-  - [性能优化指南](doc/Python_Pipeline/Performance_Tuning_CN.md)
-  - [TensorRT动态shape](doc/TensorRT_Dynamic_Shape_CN.md)
-  - [性能指标](doc/Python_Pipeline/Benchmark_CN.md)
+- [Python Pipeline 概述](doc/Python_Pipeline/Pipeline_Int_CN.md)
+  - [框架设计](doc/Python_Pipeline/Pipeline_Design_CN.md)
+  - [核心功能](doc/Python_Pipeline/Pipeline_Features_CN.md)
+  - [性能优化](doc/Python_Pipeline/Pipeline_Optimize_CN.md)
+  - [性能指标](doc/Python_Pipeline/Pipeline_Benchmark_CN.md)
 - 客户端SDK
   - [Python SDK](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
   - [JAVA SDK](doc/Java_SDK_CN.md)
@@ -107,13 +116,13 @@ AND GENERATION](https://arxiv.org/pdf/2112.12731.pdf)
 
 <h2 align="center">模型库</h2>
 
-Paddle Serving与Paddle模型套件紧密配合，实现大量服务化部署，包括图像分类、物体检测、语言文本识别、中文词性、情感分析、内容推荐等多种类型示例，以及Paddle全链条项目，共计45个模型。
+Paddle Serving与Paddle模型套件紧密配合，实现大量服务化部署，包括图像分类、物体检测、语言文本识别、中文词性、情感分析、内容推荐等多种类型示例，以及Paddle全链条项目，共计47个模型。
 
 <p align="center">
 
 | PaddleOCR | PaddleDetection | PaddleClas | PaddleSeg | PaddleRec | Paddle NLP | Paddle Video |
 | :----:  | :----: | :----: | :----: | :----: | :----: | :----: | 
-| 8 | 12 | 14 | 2 | 3 | 6 | 1 | 
+| 8 | 12 | 14 | 2 | 3 | 7 | 1 | 
 
 </p>
 
@@ -147,6 +156,7 @@ Paddle Serving与Paddle模型套件紧密配合，实现大量服务化部署，
 > 贡献代码
 
 如果您想为Paddle Serving贡献代码，请参考 [Contribution Guidelines(English)](doc/Contribute_EN.md)
+- 感谢 [@w5688414](https://github.com/w5688414) 提供 NLP Ernie Indexing 案例
 - 感谢 [@loveululu](https://github.com/loveululu) 提供 Cube python API
 - 感谢 [@EtachGu](https://github.com/EtachGu) 更新 docker 使用命令
 - 感谢 [@BeyondYourself](https://github.com/BeyondYourself) 提供grpc教程，更新FAQ教程，整理文件目录。

diff --git a/cmake/paddlepaddle.cmake b/cmake/paddlepaddle.cmake
@@ -39,7 +39,7 @@ if (WITH_GPU)
         set(WITH_TRT ON)
     elseif(CUDA_VERSION EQUAL 10.2)
         if(CUDNN_MAJOR_VERSION EQUAL 7)
-            set(CUDA_SUFFIX "x86-64_gcc5.4_avx_mkl_cuda10.2_cudnn7.6.5_trt6.0.1.5")
+            set(CUDA_SUFFIX "x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn7.6.5_trt6.0.1.5")
             set(WITH_TRT ON)
         elseif(CUDNN_MAJOR_VERSION EQUAL 8)
             set(CUDA_SUFFIX "x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4")