change the directory legacy to slm (PaddlePaddle#9311)

gongel · Oct 25, 2024 · 75f44ef
1 parent ff7ee8a
Showing 2,107 changed files with 6,427 additions and 5,331 deletions.
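
With a rename touching this many files, a quick scan for leftover `legacy/` path references can help during review. The following is a minimal sketch and not part of this commit: the regex and the helper names `rewrite_paths` and `find_stale_references` are hypothetical, and the boundary rule (treat `legacy/` as a path segment only when it is not preceded by a word character, `.` or `-`) is an assumption about how the paths appear in the hunks below.

```python
import re

# Matches "legacy/" only where it begins a path segment, e.g. after a quote,
# "/" or "(". The pattern is an illustrative assumption, not taken from the commit.
LEGACY_SEGMENT = re.compile(r"(?<![\w.-])legacy/")


def rewrite_paths(text: str) -> str:
    """Replace the directory segment `legacy/` with `slm/`."""
    return LEGACY_SEGMENT.sub("slm/", text)


def find_stale_references(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (filename, line number, line) for each line still mentioning `legacy/`."""
    hits = []
    for name, content in files.items():
        for lineno, line in enumerate(content.splitlines(), start=1):
            if LEGACY_SEGMENT.search(line):
                hits.append((name, lineno, line))
    return hits


if __name__ == "__main__":
    sample = {"pipelines.yml": "paths:\n  - 'legacy/pipelines/*'\n"}
    for name, lineno, line in find_stale_references(sample):
        print(f"{name}:{lineno}: {line.strip()}")
```

Passing file contents as a dictionary keeps the sketch independent of the repository layout; in practice the contents could be read from the working tree before calling `find_stale_references`.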
diff --git a/.github/workflows/pipelines.yml b/.github/workflows/pipelines.yml
@@ -3,10 +3,10 @@ name: Pipelines-Test
 on:
   push:
     paths:
-      - 'legacy/pipelines/*'
+      - 'slm/pipelines/*'
   pull_request:
     paths:
-      - 'legacy/pipelines/*'
+      - 'slm/pipelines/*'
 
 
 jobs:
@@ -20,11 +20,11 @@ jobs:
           python-version: '3.10'
           cache: 'pip' # caching pip dependencies
       - name: Install dependencies
-        working-directory: ./legacy/pipelines
+        working-directory: ./slm/pipelines
         run: |
           python -m pip install --upgrade pip
           make install
           pip install -r tests/requirements.txt
       - name: run the command
-        working-directory: ./legacy/pipelines
+        working-directory: ./slm/pipelines
         run: make test
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -1,4 +1,4 @@
-exclude: 'legacy/model_zoo/gpt-3'
+exclude: 'slm/model_zoo/gpt-3'
 repos:
 # For Python files
 -   repo: https://github.com/psf/black.git

diff --git a/README.md b/README.md
@@ -208,8 +208,8 @@ python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_finetune.py
 
 更多 PaddleNLP 内容可参考：
 
-* [精选模型库](./legacy/model_zoo)，包含优质预训练模型的端到端全流程使用。
-* [多场景示例](./legacy/examples)，了解如何使用 PaddleNLP 解决 NLP 多种技术问题，包含基础技术、系统应用与拓展应用。
+* [精选模型库](./slm/model_zoo)，包含优质预训练模型的端到端全流程使用。
+* [多场景示例](./slm/examples)，了解如何使用 PaddleNLP 解决 NLP 多种技术问题，包含基础技术、系统应用与拓展应用。
 * [交互式教程](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/574995)，在🆓免费算力平台 AI Studio 上快速学习 PaddleNLP。
 
 ------------------------------------------------------------------------------------------

diff --git a/README_en.md b/README_en.md
@@ -122,8 +122,8 @@ For more steps in the entire large model process, please refer to the[Large Mode
 
 For more PaddleNLP content, please refer to:
 
-* [Model Library](./legacy/model_zoo)，which includes end-to-end usage of high-quality pre-trained models.
-* [Multi-scenario Examples](./legacy/examples)，to understand how to use PaddleNLP to solve various NLP technical problems, including basic techniques, system applications, and extended applications.
+* [Model Library](./slm/model_zoo)，which includes end-to-end usage of high-quality pre-trained models.
+* [Multi-scenario Examples](./slm/examples)，to understand how to use PaddleNLP to solve various NLP technical problems, including basic techniques, system applications, and extended applications.
 * [Interactive Tutorial](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/574995)，to quickly learn PaddleNLP on the free computing platform AI Studio.
 
 ------------------------------------------------------------------------------------------

diff --git a/docs/FAQ.md b/docs/FAQ.md
diff --git a/docs/advanced_guide/prompt.md b/docs/advanced_guide/prompt.md
@@ -31,7 +31,7 @@ Prompt API 提供了这类算法实现的基本模块，支持[PET](https://arxi
     * [数据准备](#数据准备)
     * [预训练参数准备](#预训练参数准备)
     * [定义提示学习模型](#定义提示学习模型)
-    * [使用PromptTrainer训练](#使用PromptTrainer训练)
+    * [使用 PromptTrainer 训练](#使用 PromptTrainer 训练)
 * [实践教程](#实践教程)
     * [文本分类示例](#文本分类示例)
     * 其他任务示例（待更新）
@@ -486,13 +486,13 @@ prompt_model = PromptModelForSequenceClassification(model,
 - ``freeze_dropout`` : 在训练时固定预训练模型参数并关闭 ``dropout`` 。 当 ``freeze_dropout=True`` ，``freeze_plm`` 也为 ``True`` 。
 
 
-### 使用PromptTrainer训练
+### 使用 PromptTrainer 训练
 
 ``PromptTrainer`` 继承自 ``Trainer`` ， 封装了数据处理，模型训练、测试，训练策略等，便于训练流程的快速搭建。
 
 **配置训练参数**
 
-``PromptTuningArguments`` 继承自 ``TrainingArguments`` ，包含了提示学习的主要训练参数。其中 ``TrainingArguments`` 参数见 `Trainer API 文档 <https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/trainer.md>`_ ，其余参数详见 [Prompt Trainer参数列表](#PromptTrainer参数列表) 。推荐使用 **命令行** 的形式进行参数配置，即
+``PromptTuningArguments`` 继承自 ``TrainingArguments`` ，包含了提示学习的主要训练参数。其中 ``TrainingArguments`` 参数见 `Trainer API 文档 <https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/trainer.md>`_ ，其余参数详见 [Prompt Trainer 参数列表](#PromptTrainer 参数列表) 。推荐使用 **命令行** 的形式进行参数配置，即
 
 ```shell
 python xxx.py --output_dir xxx --learning_rate xxx
@@ -561,11 +561,11 @@ if training_args.do_train:
 ### 文本分类示例
 
 
-- [多分类文本分类示例](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/legacy/applications/text_classification/multi_class/few-shot)
+- [多分类文本分类示例](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/slm/applications/text_classification/multi_class/few-shot)
 
-- [多标签文本分类示例](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/legacy/applications/text_classification/multi_label/few-shot)
+- [多标签文本分类示例](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/slm/applications/text_classification/multi_label/few-shot)
 
-- [多层次文本分类示例](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/legacy/applications/text_classification/hierarchical/few-shot)
+- [多层次文本分类示例](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/slm/applications/text_classification/hierarchical/few-shot)
 
 
 ## Reference
@@ -581,7 +581,7 @@ if training_args.do_train:
 ### 附录
 
 
-#### PromptTrainer参数列表
+#### PromptTrainer 参数列表
 
 
 | 参数              |  类型  | 默认值   |   含义                                                   |

diff --git a/docs/compression.md b/docs/compression.md
@@ -1,21 +1,21 @@
 # PaddleNLP 模型压缩 API
 
  **目录**
-   * [模型压缩 API 功能简介](#模型压缩API功简介)
+   * [模型压缩 API 功能简介](#模型压缩 API 功简介)
    * [三大场景快速启动模型压缩示例](#三大场景快速启动模型压缩示例)
    * [四步启动模型压缩](#四步启动模型压缩)
-       * [Step1：获取模型压缩参数 compression_args](#获取模型压缩参数compression_args)
-       * [Step2：实例化 Trainer 并调用 compress()](#实例化Trainer并调用compress())
-           * [Trainer 实例化参数介绍](#Trainer实例化参数介绍)
+       * [Step1：获取模型压缩参数 compression_args](#获取模型压缩参数 compression_args)
+       * [Step2：实例化 Trainer 并调用 compress()](#实例化 Trainer 并调用 compress())
+           * [Trainer 实例化参数介绍](#Trainer 实例化参数介绍)
        * [Step3：实现自定义评估函数（按需可选）](#实现自定义评估函数（按需可选）)
        * [Step4：传参并运行压缩脚本](#传参并运行压缩脚本)
-           * [CompressionArguments 参数介绍](#CompressionArguments参数介绍)
+           * [CompressionArguments 参数介绍](#CompressionArguments 参数介绍)
    * [模型评估与部署](#模型评估与部署)
    * [FAQ](#FAQ)
    * [参考文献](#References)
 
 
-<a name="模型压缩API功能简介"></a>
+<a name="模型压缩 API 功能简介"></a>
 
 ## 模型压缩 API 功能简介
 
@@ -35,7 +35,7 @@ PaddleNLP 模型压缩 API 功能支持对 ERNIE 类下游任务上微调后的
 | ERNIE 3.0-Medium+裁剪+FP32 | 1424.01(1.3x) | 57.31(-0.14) | 454.27(1.2x)  | 93.27(+0.23)  | 183.77(1.3x)  | 65.92(-1.03)  |
 | ERNIE 3.0-Medium+裁剪+INT8 | 3635.48(3.2x) | 57.26(-0.19) | 1105.26(3.0x) | 93.20(+0.16)  | 444.27(3.0x)  | 66.17(-0.78)  |
 
-(以上数据来自 [ERNIE 3.0 性能测试文档](../legacy/model_zoo/ernie-3.0/README.md#性能测试)，文档包含测试环境介绍)
+(以上数据来自 [ERNIE 3.0 性能测试文档](../slm/model_zoo/ernie-3.0/README.md#性能测试)，文档包含测试环境介绍)
 
 ##### UIE 压缩效果
 
@@ -51,7 +51,7 @@ PaddleNLP 模型压缩 API 功能支持对 ERNIE 类下游任务上微调后的
 
 ### 三大场景快速启动模型压缩示例
 
-本项目提供了压缩 API 在分类（包含文本分类、文本匹配、自然语言推理、代词消歧等任务）、序列标注、抽取式阅读理解三大场景下的使用样例，可以分别参考 [ERNIE 3.0](../legacy/model_zoo/ernie-3.0) 目录下的 [compress_seq_cls.py](../legacy/model_zoo/ernie-3.0/compress_seq_cls.py) 、[compress_token_cls.py](../legacy/model_zoo/ernie-3.0/compress_token_cls.py)、[compress_qa.py](../legacy/model_zoo/ernie-3.0/compress_qa.py) 脚本，启动方式如下：
+本项目提供了压缩 API 在分类（包含文本分类、文本匹配、自然语言推理、代词消歧等任务）、序列标注、抽取式阅读理解三大场景下的使用样例，可以分别参考 [ERNIE 3.0](../slm/model_zoo/ernie-3.0) 目录下的 [compress_seq_cls.py](../slm/model_zoo/ernie-3.0/compress_seq_cls.py) 、[compress_token_cls.py](../slm/model_zoo/ernie-3.0/compress_token_cls.py)、[compress_qa.py](../slm/model_zoo/ernie-3.0/compress_qa.py) 脚本，启动方式如下：
 
 ```shell
 # 分类任务
@@ -149,7 +149,7 @@ python compress.py \
 ```
 
 
-<a name="获取模型压缩参数compression_args"></a>
+<a name="获取模型压缩参数 compression_args"></a>
 
 ### Step 1：获取模型压缩参数 compression_args
 
@@ -163,16 +163,16 @@ parser = PdArgumentParser(CompressionArguments)
 compression_args = parser.parse_args_into_dataclasses()
 ```
 
-<a name="实例化Trainer并调用compress()"></a>
+<a name="实例化 Trainer 并调用 compress()"></a>
 
 ### Step 2：实例化 Trainer 并调用 compress
 
-<a name="Trainer实例化参数介绍"></a>
+<a name="Trainer 实例化参数介绍"></a>
 
 #### Trainer 实例化参数介绍
 
 - **--model** 待压缩的模型，目前支持 ERNIE、BERT、RoBERTa、ERNIE-M、ELECTRA、ERNIE-Gram、PP-MiniLM、TinyBERT 等结构相似的模型，是在下游任务中微调后的模型，当预训练模型选择 ERNIE 时，需要继承 `ErniePretrainedModel`。以分类任务为例，可通过`AutoModelForSequenceClassification.from_pretrained(model_name_or_path)` 等方式来获取，这种情况下，`model_name_or_path`目录下需要有 model_config.json, model_state.pdparams 文件；
-- **--data_collator** 三类任务均可使用 PaddleNLP 预定义好的 [DataCollator 类](../paddlenlp/data/data_collator.py)，`data_collator` 可对数据进行 `Pad` 等操作。使用方法参考 [示例代码](../legacy/model_zoo/ernie-3.0/compress_seq_cls.py) 即可；
+- **--data_collator** 三类任务均可使用 PaddleNLP 预定义好的 [DataCollator 类](../paddlenlp/data/data_collator.py)，`data_collator` 可对数据进行 `Pad` 等操作。使用方法参考 [示例代码](../slm/model_zoo/ernie-3.0/compress_seq_cls.py) 即可；
 - **--train_dataset** 裁剪训练需要使用的训练集，是任务相关的数据。自定义数据集的加载可参考 [文档](https://huggingface.co/docs/datasets/loading)。不启动裁剪时，可以为 None；
 - **--eval_dataset** 裁剪训练使用的评估集，也是量化使用的校准数据，是任务相关的数据。自定义数据集的加载可参考 [文档](https://huggingface.co/docs/datasets/loading)。是 Trainer 的必选参数；
 - **--tokenizer** 模型 `model` 对应的 `tokenizer`，可使用 `AutoTokenizer.from_pretrained(model_name_or_path)` 来获取。
@@ -313,7 +313,7 @@ python compress.py \
 
 下面会介绍模型压缩启动命令可以传递的超参数。
 
-<a name="CompressionArguments参数介绍"></a>
+<a name="CompressionArguments 参数介绍"></a>
 
 #### CompressionArguments 参数介绍
 
@@ -350,7 +350,7 @@ python compress.py \
 
 - **--save_steps** 评估模型的步数。默认为 100；
 
-- **--optim** 裁剪训练使用的优化器名称，默认为adamw，默认为 'adamw'；
+- **--optim** 裁剪训练使用的优化器名称，默认为 adamw，默认为 'adamw'；
 
 - **--learning_rate** 裁剪训练使用优化器的初始学习率，默认为 5e-05；
 
@@ -415,19 +415,19 @@ python compress.py \
 
 裁剪、量化后的模型不能再通过 `from_pretrained` 导入进行预测，而是需要使用 Paddle 部署工具才能完成预测。
 
-压缩后的模型部署可以参考 [部署文档](../legacy/model_zoo/ernie-3.0/deploy) 完成。
+压缩后的模型部署可以参考 [部署文档](../slm/model_zoo/ernie-3.0/deploy) 完成。
 
 ### Python 部署
 
-服务端部署可以从这里开始。可以参考 [seq_cls_infer.py](../legacy/model_zoo/ernie-3.0/deploy/python/seq_cls_infer.py) 或者 [token_cls_infer.py](../legacy/model_zoo/ernie-3.0/deploy/python/token_cls_infer.py) 来编写自己的预测脚本。并根据 [Python 部署指南](../legacy/model_zoo/ernie-3.0/deploy/python/README.md) 的介绍安装预测环境，对压缩后的模型进行精度评估、性能测试以及部署。
+服务端部署可以从这里开始。可以参考 [seq_cls_infer.py](../slm/model_zoo/ernie-3.0/deploy/python/seq_cls_infer.py) 或者 [token_cls_infer.py](../slm/model_zoo/ernie-3.0/deploy/python/token_cls_infer.py) 来编写自己的预测脚本。并根据 [Python 部署指南](../slm/model_zoo/ernie-3.0/deploy/python/README.md) 的介绍安装预测环境，对压缩后的模型进行精度评估、性能测试以及部署。
 
 
 <a name="服务化部署"></a>
 
 ### 服务化部署
 
-- [FastDeploy ERNIE 3.0 模型 Serving 部署示例](../legacy/model_zoo/ernie-3.0/deploy/serving/README.md)
-- [基于PaddleNLP SimpleServing 的服务化部署](../legacy/model_zoo/ernie-3.0/deploy/simple_serving/README.md)
+- [FastDeploy ERNIE 3.0 模型 Serving 部署示例](../slm/model_zoo/ernie-3.0/deploy/serving/README.md)
+- [基于 PaddleNLP SimpleServing 的服务化部署](../slm/model_zoo/ernie-3.0/deploy/simple_serving/README.md)
 
 ### 移动端部署
 

diff --git a/docs/model_zoo/model_list_multy_device.md b/docs/model_zoo/model_list_multy_device.md
@@ -3,11 +3,11 @@
 ## 1.模型列表
 | 模型名称/硬件支持 | NPU | XPU | MLU |
 | - | - | - | - |
-| [BERT](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/legacy/model_zoo/bert) | ✅ | ✅ | ✅ |
-| [ERNIE-3.0](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/legacy/model_zoo/ernie-3.0) | ✅ | ❌ | ❌ |
-| [UIE](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/legacy/model_zoo/uie) | ✅ | ❌ | ❌ |
+| [BERT](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/slm/model_zoo/bert) | ✅ | ✅ | ✅ |
+| [ERNIE-3.0](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/slm/model_zoo/ernie-3.0) | ✅ | ❌ | ❌ |
+| [UIE](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/slm/model_zoo/uie) | ✅ | ❌ | ❌ |
 | [UTC](https://github.com/PaddlePaddle/PaddleNLP/tree/release/2.8/applications/zero_shot_text_classification) | ✅ | ❌ | ❌ |
-| [RoBERTa](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/legacy/model_zoo/roberta) | ✅ | ❌ | ❌ |
+| [RoBERTa](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/slm/model_zoo/roberta) | ✅ | ❌ | ❌ |
 
 ## 2.各硬件使用指南
 首先在硬件平台上安装飞桨环境，然后参照模型文档中的使用方法，只需将 device 参数改为对应的硬件平台即可。