[taskflow] Fix taskflow bug (PaddlePaddle#9930)
* fix

* add doc

* update docs

* add experiment

* update readme.md

---------

Co-authored-by: DrownFish19 <[email protected]>
Fantasy-02 and DrownFish19 committed Feb 27, 2025
1 parent ebcdc4b commit 07ca81c
Showing 5 changed files with 64 additions and 54 deletions.
1 change: 1 addition & 0 deletions docs/llm/application/information_extraction/README.md
1 change: 1 addition & 0 deletions docs/llm/application/information_extraction/doccano.md
103 changes: 58 additions & 45 deletions llm/application/information_extraction/README.md

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions llm/application/information_extraction/doccano.md
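@@ -28,7 +28,7 @@ PP-UIE supports extraction-type tasks; create a new project according to your actual needs

 #### 2.1 Creating an extraction task project

-When creating the project, select the **Sequence Labeling** task and check **Allow overlapping entity** and **Use relation Labeling**. This setup supports tasks such as **named entity recognition, relation extraction, event extraction, and opinion extraction**.
+When creating the project, select the **Sequence Labeling** task and check **Allow overlapping entity** and **Use relation Labeling**. This setup supports tasks such as **named entity recognition, relation extraction, and event extraction**.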

<div align="center">
<img src=https://user-images.githubusercontent.com/40840292/167249142-44885510-51dc-4359-8054-9c89c9633700.png height=230 hspace='15'/>
@@ -236,14 +236,14 @@ schema = {
 python doccano.py \
     --doccano_file ./data/doccano_ext.json \
     --save_dir ./data \
-    --negative_ratio 5
+    --negative_ratio 1
```

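 Configurable parameters:

 - ``doccano_file``: The annotation file exported from doccano.
 - ``save_dir``: Directory where the training data is saved; defaults to the ``data`` directory.
-- ``negative_ratio``: Maximum ratio of negative examples. This parameter applies only to extraction-type tasks; constructing a suitable number of negative examples can improve model performance. The number of negatives depends on the actual number of labels: maximum number of negatives = negative_ratio * number of positives. This parameter applies only to the training set and defaults to 5. To keep evaluation metrics accurate, the validation and test sets are built from positive examples only by default.
+- ``negative_ratio``: Maximum ratio of negative examples. This parameter applies only to extraction-type tasks; constructing a suitable number of negative examples can improve model performance. The number of negatives depends on the actual number of labels: maximum number of negatives = negative_ratio * number of positives.
 - ``splits``: Proportions of the training and validation sets when splitting the dataset. The default, [0.8, 0.1, 0.1], splits the data into training, validation, and test sets at a ratio of ``8:1:1``.
 - ``task_type``: The task type; currently only information extraction is available.
 - ``is_shuffle``: Whether to shuffle the dataset; defaults to True.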
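
The `negative_ratio` and `splits` arithmetic described in the parameter list can be sketched as follows. This is a minimal illustration and not the code in `doccano.py`: the helper names `max_negatives` and `split_counts` are hypothetical, and the choice to give any rounding remainder to the test set is an assumption.

```python
def max_negatives(num_positives: int, negative_ratio: int) -> int:
    """Upper bound on negative examples: negative_ratio * number of positives."""
    return negative_ratio * num_positives


def split_counts(num_examples: int, splits=(0.8, 0.1, 0.1)):
    """Return (train, dev, test) sizes for the given ratios.

    Assumption: any remainder left by truncation goes to the test set.
    """
    if len(splits) != 3 or abs(sum(splits) - 1.0) > 1e-9:
        raise ValueError("splits must be three ratios that sum to 1")
    n_train = int(num_examples * splits[0])
    n_dev = int(num_examples * splits[1])
    return n_train, n_dev, num_examples - n_train - n_dev
```

With 10 positive examples, `--negative_ratio 1` allows at most 10 negatives, whereas the earlier value of 5 allowed up to 50.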
7 changes: 1 addition & 6 deletions paddlenlp/taskflow/task.py
@@ -20,7 +20,6 @@
from multiprocessing import cpu_count

import paddle
-from paddle.base.framework import use_pir_api
from paddle.dataset.common import md5file

from ..utils.env import (
@@ -371,7 +370,6 @@ def _get_inference_model(self):
self._construct_input_spec()
self._convert_dygraph_to_static()


self._static_model_file = self.inference_model_path + PADDLE_INFERENCE_MODEL_SUFFIX
self._static_params_file = self.inference_model_path + PADDLE_INFERENCE_WEIGHTS_SUFFIX

@@ -398,10 +396,7 @@ def _get_inference_model(self):
self._static_model_file = self._static_fp16_model_file
self._static_params_file = self._static_fp16_params_file
 if self._predictor_type == "paddle-inference":
-    if use_pir_api():
-        self._config = paddle.inference.Config(self._static_json_file, self._static_params_file)
-    else:
-        self._config = paddle.inference.Config(self._static_model_file, self._static_params_file)
+    self._config = paddle.inference.Config(self._static_model_file, self._static_params_file)
     self._prepare_static_mode()
 else:
     self._prepare_onnx_mode()
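
One possible reading of this hunk: if the suffix constants already select the correct file extension, a single `Config(model_file, params_file)` call is enough and the `use_pir_api()` branch becomes redundant. The sketch below illustrates that idea in plain Python without importing Paddle. The function name `inference_files`, the `pir_enabled` flag, and the extension values are assumptions for illustration only; the commit does not show how `PADDLE_INFERENCE_MODEL_SUFFIX` or `PADDLE_INFERENCE_WEIGHTS_SUFFIX` are defined.

```python
def inference_files(inference_model_path: str, pir_enabled: bool):
    """Build the (model_file, params_file) pair handed to the inference config.

    The extensions are assumed values, chosen only to show that the format
    decision can live in the suffix rather than at the call site.
    """
    model_suffix = ".json" if pir_enabled else ".pdmodel"  # assumed values
    weights_suffix = ".pdiparams"  # assumed value
    return inference_model_path + model_suffix, inference_model_path + weights_suffix
```

Under that assumption, the caller passes the same two attributes to the config regardless of the model format.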
