diff --git a/cookbook/essay_correction.ipynb b/cookbook/essay_correction.ipynb new file mode 100644 index 00000000..b225c8af --- /dev/null +++ b/cookbook/essay_correction.ipynb @@ -0,0 +1,1015 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Training an Essay-Grading LLM on the Qianfan Platform\n", + "\n", + "In online education, a student essay is usually judged on whether its content fits the prompt, whether its structure is rigorous, and whether it has weaknesses or point deductions, and it then receives a final score. A large language model can act as such a reviewer too. LLMs are good at following a prescribed format and style: once we teach the model our grading requirements or template, it can review an essay according to them.\n", + "\n", + "LLM-based essay review fits online education well: it can serve as a capable assistant for teachers and also show students where their essays can improve, saving considerable cost and time. However, a base model that has not been fine-tuned rarely performs well in a specific scenario like this one.\n", + "\n", + "To run the code below, first install the Qianfan Python SDK with pip and set the required environment variables." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [], + "source": [ + "!pip install -U \"qianfan[dataset_base]\"" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# Qianfan console credentials, read by the Qianfan Python SDK from environment variables\n", + "os.environ[\"QIANFAN_ACCESS_KEY\"] = \"your_qianfan_console_access_key\"\n", + "os.environ[\"QIANFAN_SECRET_KEY\"] = \"your_qianfan_console_secret_key\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "For example, suppose we have the following essay-grading Prompt template and an essay to be graded:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "from qianfan.common import Prompt\n", + "\n", + "correction_template = \"\"\"\n", + "你是一个高考语文阅卷老师,现在有一个高考作文题目和一篇待批改论文,需要你对这篇待批改论文进行评分。\n", + "要求:\n", + "1)请认真阅读作文批改要求和作文题目,对这篇待批改作文进行公正严格的批改和打分;\n", + "2)评分一定要严格,不能轻易给出高分。\n", + "3)最后返回内容要严格按照最后的输出格式。\n", + "\n", + "一、作文批改要求:\n", + "高考作文评分批改分为基础等级、发展等级、关于作文的其他项评定\n", + "1、基础等级\n", + "基础等级分内容和表达两项。\n", + "1)内容项\n", + "具体评分规则如下:符合题意、中心突出、内容充实、思想健康、感情真挚为一等,可按16-20分酌情给分;符合题意、主题明确、内容较充实、思想健康、感情真实为二等,可按11-15分酌情给分;基本符合题意、中心基本明确、内容单薄、思想基本健康、感情基本真实为三等,可按6-10分酌情给分;偏离题意、中心不明确、内容不当、思想不健康、感情虚假为四等,可按0-5分酌情给分。\n", +
"2)表达项\n", + "具体评分规则如下:符合文体要求、结构严谨、语言流畅、字迹工整为一等,可按16-20分酌情给分;符合文体要求、结构完整、语言通顺、字迹清楚为二等,可按11-15分酌情给分;基本符合文体要求、结构基本完整、语言基本通顺、字迹基本清楚为三等,可按6-10分酌情给分;不符合文体要求、结构混乱、语言不通顺语病多、字迹潦草难辨为四等,可按0-5分酌情给分。\n", + "2、发展等级\n", + "基础等级分要与发展等级分相匹配,发展等级分不能跨越基础等级的得分等级。\n", + "具体评分规则如下:深刻、丰富、有文采、有创意为一等,可按16-20分酌情给分;较深刻、较丰富、较有文采、较有创意为二等,可按11-15分酌情给分;略显深刻、略显丰富、略显文采、略显创意为三等,可按6-10分酌情给分;个别语句有深意、个别例子较好、个别语句较精彩、个别地方有深意为四等,可按0-5分酌情给分。\n", + "3、关于作文的其他项评定\n", + "1)扣分项评定\n", + "出现错别字,1个错别字扣1分,重复不计,扣完5分为止;标点符号出现3处以上错误的酌情扣分;不足字数者,每少50字扣1分;无标题扣2分。\n", + "2)残篇评定\n", + "400字以上的文章,按评分标准评分,扣字数分。(少50个字扣1分)\n", + "400字以下的文章,20分以下评分,不再扣字数分。\n", + "200字以下的文章,10分以下评分,不再扣字数分。\n", + "只写一两句话的,给1分或2分,不评0分。\n", + "只写标题的,给1分或2分,不评0分。\n", + "完全空白的,评0分。\n", + "\n", + "二、作文题目:\n", + "{{title}}\n", + "\n", + "三、待批改作文\n", + "{{content}}\n", + "\n", + "四、输出格式\n", + "{\"详细解析\":{\"内容项\": {\"解析\": \"xxxxxx。\",\"等级\": \"xx等\",\"得分\": \"xx分\"},\"表达项\": {\"解析\": \"xxxxxx。\",\"等级\": \"xx等\",\"得分\": \"xx分\"},\"发展等级\": {\"解析\": \"xxxxxx。\",\"等级\": \"xx等\",\"得分\": \"xx分\"},\"扣分项和残篇评定\": {\"解析\": \"xxxxxx。\",\"扣分\": \"xx分\"}},\"缺点和改进意见\": {\"缺点\": \"xxxxxx。\",\"改进意见\": \"xxxxxxx。\"},\"最终得分\": \"xx分\"}\n", + "\"\"\"\n", + "\n", + "correction_prompt = Prompt(correction_template, identifier=\"{{}}\")\n", + "\n", + "render_dict = {\n", + " \"title\": \"你注意到了吗?装鲜牛奶的容器一般是方盒子,装矿泉水的容器一般是圆瓶子,装酒圆瓶子又一般放在方盒子里,方圆之间,各得其妙,古诗云:方圆虽异器,功用信具呈。人生也是如此,所谓:上善若水任方圆。以方圆为话题,根据此材料,题目自拟写作文,字数不少于800字。\",\n", + " \"content\": \"\"\"\n", + "方圆之间的人生智慧\n", + "\n", + "“方有止,圆有旋。”这句古人的智慧结晶,揭示了方与圆两种形态背后的深刻内涵。在生活中,我们常常见到方形的容器装着鲜牛奶,圆形的瓶子则装着矿泉水,而圆形的酒瓶又常常被放置在方形的盒子里。这些看似简单的形状,实际上蕴含着人生的哲理。\n", + "\n", + "方,代表着规矩、原则和稳定。它象征着秩序和安定,是我们生活中不可或缺的一部分。在人的成长过程中,我们需要遵循各种规矩,学会遵守社会的秩序,这样才能在社会中立足。正如牛奶需要方形的容器来保持稳定一样,我们的人生也需要方正的品格来支撑。\n", + "\n", + "然而,人生并非只有方的一面。圆,代表着变通、灵活和包容。它象征着和谐与圆满,是我们在面对复杂世界时的有力武器。我们需要学会圆滑处事,善于变通,这样才能在人生的道路上走得更远。就像矿泉水需要圆形的瓶子来适应各种环境一样,我们的人生也需要圆润的智慧来应对各种挑战。\n", + "\n", + "方圆之间,各得其妙。在人生的道路上,我们需要既要方正又要圆润。我们要有坚定的原则和信念,同时也要学会适应环境,灵活应对。这样才能在人生的舞台上大放异彩。\n", + "\n", + 
"上善若水任方圆。水,是世界上最柔软的物质,却能穿透坚硬的石头。这就是因为水懂得方圆之间的智慧。它既可以是方形的湖泊,也可以是圆形的河流,还可以是无形的雾气。水无常形,但却能包容万物。同样,我们也要有水的智慧,懂得在方圆之间寻找平衡,这样才能在人生的道路上游刃有余。\n", + "\n", + "总之,方圆之间的人生智慧是我们每个人都需要学习和领悟的。我们要学会在坚持原则和灵活变通之间找到平衡,这样才能在人生的道路上不断前行。同时,我们也要像水一样包容万物,接纳不同的观点和文化,让自己的人生更加丰富多彩。\n", + "\n", + "在这个充满变化和挑战的世界里,我们需要不断学习和成长,不断提升自己的能力和素质。只有这样,我们才能在方圆之间的人生舞台上展现出自己的风采和智慧。让我们一起努力,成为拥有方圆智慧的人,为自己的人生添彩!\n", + " \"\"\"\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we send this Prompt to a base model and ask it to grade the essay above according to the requirements. Here we use ERNIE-Speed-8K as the base model." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[INFO] [04-02 13:21:20] openapi_requestor.py:336 [t:8094817088]: requesting llm api endpoint: /chat/ernie_speed\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "```json\n", + "{\"详细解析\": {\"内容项\": {\"解析\": \"本文准确地捕捉了话题的核心,方圆之间的智慧在人生中的应用。文章通过日常生活中的例子,展示了方与圆的辩证关系及其在人生中的重要性。文中分别解释了方代表规矩、原则和稳定,而圆则代表变通、灵活和包容。作者提出的观点条理清晰,中心明确,内容充实且符合题意。\", \"等级\": \"一等\",\"得分\": \"18分\"}, \"表达项\": {\"解析\": \"文章采用了议论文的结构,逻辑清晰,语言通顺,字迹工整。作者在阐述观点时使用了生活中的例子,使观点更加生动且易于理解。此外,文章还使用了古诗来增强论证的说服力。符合文体要求。\", \"等级\": \"一等\",\"得分\": \"17分\"}, \"发展等级\": {\"解析\": \"文章不仅满足于阐述基本的观点,还深入探讨了方圆之间的人生智慧的重要性,并举例进行论证。体现出作者对这个话题有较深刻的理解,文章有一定的文采和创意。\", \"等级\": \"一等\",\"得分\": \"17分\"}, \"扣分项和残篇评定\": {\"解析\": \"文章中没有出现错别字或明显的标点符号错误。字数达到要求,没有扣字数分。\", \"扣分\": \"0分\"}}, \"缺点和改进意见\": {\"缺点\": \"无明显的缺点。文章整体结构清晰,论证充分,表达流畅。\", \"改进意见\": \"无需改进。\"}, \"最终得分\": \"52分\"}\n", + "```\n", + "\n", + "\n", + "{\"详细解析\": {\"内容项\": {\"解析\": \"本文准确地捕捉了话题的核心,方圆之间的智慧在人生中的应用。文章通过日常生活中的例子,展示了方与圆的辩证关系及其在人生中的重要性。文中分别解释了方代表规矩、原则和稳定,而圆则代表变通、灵活和包容。作者提出的观点条理清晰,中心明确,内容充实且符合题意。\", \"等级\": \"一等\",\"得分\": \"18分\"}, \"表达项\": {\"解析\": \"文章采用了议论文的结构,逻辑清晰,语言通顺,字迹工整。作者在阐述观点时使用了生活中的例子,使观点更加生动且易于理解。此外,文章还使用了古诗来增强论证的说服力。符合文体要求。\", \"等级\": \"一等\",\"得分\": \"17分\"}, \"发展等级\": {\"解析\":
\"文章不仅满足于阐述基本的观点,还深入探讨了方圆之间的人生智慧的重要性,并举例进行论证。体现出作者对这个话题有较深刻的理解,文章有一定的文采和创意。\", \"等级\": \"一等\",\"得分\": \"17分\"}, \"扣分项和残篇评定\": {\"解析\": \"文章中没有出现错别字或明显的标点符号错误。字数达到要求,没有扣字数分。\", \"扣分\": \"0分\"}}, \"缺点和改进意见\": {\"缺点\": \"无明显的缺点。文章整体结构清晰,论证充分,表达流畅。\", \"改进意见\": \"无需改进。\"}, \"最终得分\": \"52分\"}\n" + ] + } + ], + "source": [ + "import re\n", + "\n", + "from qianfan import ChatCompletion\n", + "\n", + "cc = ChatCompletion(model=\"ERNIE-Speed-8K\")\n", + "\n", + "result = cc.do([{\"content\": correction_prompt.render(**render_dict)[0], \"role\": \"user\"}])\n", + "\n", + "print(result.body[\"result\"] + \"\\n\")\n", + "\n", + "print(re.search(r\"^```json([\\s\\S]*)\\n```$\", result.body[\"result\"]).group(1))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For this essay, calling the base model directly yields a score of 52. In practice, the essay explains what the square and the circle stand for and illustrates them with examples, but its depth, literary quality, and expressiveness fall short of a high score. To get better results and a more efficient generation process, we therefore need to train a dedicated model." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 1. Prepare the Datasets\n", + "\n", + "Training a model requires datasets prepared in advance, and collecting them is usually time-consuming and labor-intensive. Quantity is not the only requirement: for the model's output to meet our expectations, the text quality of the data matters as well.\n", + "\n", + "To ease data collection, the Qianfan platform provides preset datasets for a range of domain-specific scenarios, which can be used to train LLMs out of the box. For essay grading, we use the essay-grading training and evaluation datasets provided by the platform to train and evaluate the model, respectively.\n", + "\n", + "With the Qianfan Python SDK, loading the datasets is straightforward." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[INFO] [04-02 13:21:30] dataset.py:389 [t:8094817088]: no data source was provided, construct\n", + "[INFO] [04-02 13:21:30] dataset.py:263 [t:8094817088]: construct a qianfan data source from existed id: ds-553hczysf3um4cc9, with args: {}\n", + "[INFO] [04-02 13:21:30] dataset.py:389 [t:8094817088]: no data source was provided, construct\n", + "[INFO] [04-02 13:21:30] dataset.py:263 [t:8094817088]: construct a qianfan data source from existed id: ds-6ubasnsry5pa4azi, with args: {}\n" + ] + } + ], + "source": [ + "import json\n", + "from qianfan.dataset import
Dataset\n", + "\n", + "# Load the preset training dataset\n", + "train_ds = Dataset.load(qianfan_dataset_id=\"ds-553hczysf3um4cc9\")\n", + "# Load the preset evaluation dataset\n", + "eval_ds = Dataset.load(qianfan_dataset_id=\"ds-6ubasnsry5pa4azi\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 2. Prepare the Training Parameters\n", + "\n", + "Before training, we need to define some hyperparameters, such as the learning rate and the number of epochs; different hyperparameters lead to different performance of the trained model on the test set. Based on experience, we provide below an SFT hyperparameter configuration for ERNIE-Speed, Baidu's lightweight base model.\n", + "\n", + "ERNIE-Speed is a high-performance large language model developed in-house by Baidu and released in 2024. It has strong general capabilities, is well suited as a base model for fine-tuning on scenario-specific problems, and offers excellent inference performance. SFT stands for supervised fine-tuning." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "from qianfan.trainer.configs import TrainConfig\n", + "\n", + "# Parameter configuration for supervised fine-tuning (SFT) of the model\n", + "train_config = TrainConfig(\n", + " peft_type=\"FullFineTuning\",\n", + " max_seq_len=4096,\n", + " epoch=5,\n", + " learning_rate=0.00003,\n", + " logging_steps=1,\n", + " warmup_ratio=0.1,\n", + " weight_decay=0.0001,\n", + ")\n", + "\n", + "# To fine-tune with LoRA instead, use the configuration below\n", + "# (uncomment to use)\n", + "\n", + "# train_config = TrainConfig(\n", + "# peft_type=\"LoRA\",\n", + "# max_seq_len=4096,\n", + "# epoch=5,\n", + "# learning_rate=0.00003,\n", + "# logging_steps=1,\n", + "# warmup_ratio=0.1,\n", + "# weight_decay=0.0001,\n", + "# lora_rank=8,\n", + "# lora_all_linear=True\n", + "# )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 3. Start Training\n", + "\n", + "With the three components above ready, we can start training." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[INFO] [04-01 15:47:38] persist.py:58 [t:8094817088]: save to /Users/pengyiyang/.qianfan_cache/file_tmp/pipeline/MBk3b4TFhR\n", + "[INFO] [04-01 15:47:39] persist.py:58 [t:8094817088]: save to /Users/pengyiyang/.qianfan_cache/file_tmp/pipeline/MBk3b4TFhR\n", + "[INFO] [04-01 15:47:41] actions.py:610 [t:8094817088]: [train_action] training ...
job_name:model0f228692_WrIcR current status: Running, 1% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:48:12] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 1% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:48:42] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 3% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:49:13] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 34% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:49:44] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 34% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:50:14] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 34% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:50:45] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 34% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:51:16] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 34% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:51:46] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 34% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:52:17] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 34% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:52:47] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 34% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:53:18] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 34% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:53:48] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 35% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:54:19] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 36% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:54:49] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 36% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:55:20] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 37% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:55:50] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 38% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:56:21] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 39% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:56:51] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 39% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:57:22] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 40% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:57:52] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 41% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:58:23] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 42% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:58:54] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 42% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:59:25] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 43% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 15:59:55] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 44% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:00:26] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 45% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:00:57] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 45% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:01:27] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 45% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:01:58] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 46% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:02:28] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 47% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:02:59] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 48% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:03:29] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 49% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:04:00] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 50% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:04:00] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:04:30] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 50% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:04:30] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:05:01] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 51% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:05:01] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:05:32] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 51% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:05:32] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:06:02] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 52% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:06:02] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:06:33] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 53% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:06:33] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:07:04] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 54% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:07:04] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:07:34] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 55% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:07:34] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:08:05] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 55% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:08:05] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:08:35] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 56% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:08:35] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:09:05] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 57% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:09:05] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:09:36] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 57% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:09:36] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:10:06] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 58% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:10:06] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:10:37] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 59% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:10:37] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:11:07] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 60% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:11:07] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:11:38] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 60% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:11:38] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:12:08] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 61% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:12:08] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:12:39] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 62% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:12:39] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:13:12] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 62% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:13:12] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:13:43] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 63% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:13:43] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:14:13] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 64% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:14:13] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:14:53] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:14:53] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:15:23] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:15:23] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:15:54] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:15:54] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:16:26] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:16:26] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:16:56] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:16:56] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:17:27] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:17:27] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:17:57] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:17:57] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:18:28] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:18:28] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:18:58] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:18:58] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:19:29] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:19:29] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:19:59] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:19:59] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:20:30] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:20:30] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:21:00] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:21:00] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:21:31] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:21:31] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:22:01] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:22:01] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:22:32] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:22:32] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:23:02] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:23:02] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:23:33] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:23:33] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:24:03] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:24:03] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:24:34] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:24:34] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:25:05] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:25:05] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:25:35] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:25:35] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:26:06] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:26:06] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:26:36] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:26:36] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:27:07] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:27:07] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:27:37] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:27:37] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:28:08] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:28:08] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:28:38] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:28:38] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:29:09] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:29:09] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:29:39] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:29:39] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:30:10] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:30:10] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:30:40] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:30:40] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:31:11] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:31:11] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:31:41] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:31:41] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:32:12] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:32:12] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:32:42] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:32:42] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:33:13] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:33:13] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:33:44] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:33:44] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:34:14] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:34:14] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:34:45] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:34:45] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:35:15] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:35:15] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:35:46] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:35:46] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:36:16] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:36:16] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:36:47] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:36:47] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:37:17] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:37:17] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:37:49] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 65% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:37:49] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:38:19] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 99% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:38:19] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:38:50] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 99% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:38:50] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:39:20] actions.py:610 [t:8094817088]: [train_action] training ... 
job_name:model0f228692_WrIcR current status: Running, 99% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:39:20] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:39:51] actions.py:610 [t:8094817088]: [train_action] training ... job_name:model0f228692_WrIcR current status: Running, 99% check train task log in https://console.bce.baidu.com/qianfan/train/sft/job-6x3k3686vifc/task-3gucnjbzvhj9/detail/traininglog\n", + "[INFO] [04-01 16:39:51] actions.py:617 [t:8094817088]: check vdl report in https://console.bce.baidu.com/qianfan/visualdl/index?displayToken=eyJydW5JZCI6InJ1bi1qZThqZTc0aThkZGk2dWZ6In0=\n", + "[INFO] [04-01 16:40:21] actions.py:587 [t:8094817088]: [train_action] training task metrics: {'BLEU-4': '47.73%', 'ROUGE-1': '50.95%', 'ROUGE-2': '29.01%', 'ROUGE-L': '54.53%'}\n", + "[INFO] [04-01 16:40:21] actions.py:626 [t:8094817088]: [train_action] training job has ended: job-6x3k3686vifc/task-3gucnjbzvhj9 with status: Done\n", + "[INFO] [04-01 16:40:21] persist.py:58 [t:8094817088]: save to /Users/pengyiyang/.qianfan_cache/file_tmp/pipeline/MBk3b4TFhR\n", + "[INFO] [04-01 16:40:21] model.py:217 [t:8094817088]: check train job: task-3gucnjbzvhj9/job-6x3k3686vifc status before publishing model\n", + "[INFO] [04-01 16:40:21] model.py:230 [t:8094817088]: model publishing keep polling, current status Done\n", + "[INFO] [04-01 16:40:22] model.py:262 [t:8094817088]: publishing train task: job-6x3k3686vifc/task-3gucnjbzvhj9 to model: am-1g5pi9k6cktk/amv-h7ej2bbxpqrn\n", + "[INFO] [04-01 16:40:53] model.py:287 [t:8094817088]: model am-1g5pi9k6cktk/amv-h7ej2bbxpqrn published successfully\n", + "[INFO] [04-01 16:40:53] model.py:267 [t:8094817088]: publish successfully to model: am-1g5pi9k6cktk/amv-h7ej2bbxpqrn\n" + ] + } + ], + 
"source": [ + "from qianfan.trainer import LLMFinetune\n", + "\n", + "trainer = LLMFinetune(\n", + " train_type=\"ERNIE-Speed-8K\",\n", + " dataset=train_ds,\n", + " eval_dataset=eval_ds,\n", + " train_config=train_config,\n", + ")\n", + "\n", + "training_result = trainer.run()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 4. 查看结果\n", + "\n", + "训练完成后,我们可以从返回的对象中拿到一系列的信息:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'eval_res': , 'datasets': {'sourceType': 'Platform', 'versions': [{'versionId': 'ds-553hczysf3um4cc9'}], 'splitRatio': 20}, 'task_id': 'task-3gucnjbzvhj9', 'job_id': 'job-6x3k3686vifc', 'metrics': {'BLEU-4': '47.73%', 'ROUGE-1': '50.95%', 'ROUGE-2': '29.01%', 'ROUGE-L': '54.53%'}, 'model_id': 'am-1g5pi9k6cktk', 'model_version_id': 'amv-h7ej2bbxpqrn', 'model': }\n" + ] + } + ], + "source": [ + "print(training_result.output)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "例如,我们可以查看训练出来的模型的版本 ID" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "amv-h7ej2bbxpqrn\n" + ] + } + ], + "source": [ + "print(training_result.output[\"model_version_id\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 5. 
Preparing the evaluation\n", + "\n", + "After training completes, we still need to evaluate the fine-tuned model to determine whether it has converged and delivers the results we expect. The Qianfan Python SDK supports model evaluation: users can rely on the Qianfan platform's built-in evaluation or write their own evaluation code to meet their needs.\n", + "\n", + "Here we implement a simple custom evaluator that checks whether the fine-tuned model follows our output format and, for each scoring item, measures the gap between the model's output and the expected output." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "import re\n", + "from typing import Any, Dict, List, Union\n", + "\n", + "import numpy as np\n", + "\n", + "from qianfan.dataset import Dataset\n", + "from qianfan.evaluation.evaluator import LocalEvaluator\n", + "from qianfan.resources import Embedding\n", + "\n", + "def _convert_str_to_int(str_score: str) -> int:\n", + " # Scores look like \"15分\": drop the trailing unit character before parsing\n", + " try:\n", + " return int(str_score[:-1])\n", + " except (TypeError, ValueError):\n", + " return 0\n", + " \n", + "embedding = Embedding(query_per_second=5)\n", + " \n", + "def get_qianfan_embedding(content: str) -> np.ndarray:\n", + " return np.array(embedding.do([content]).body[\"data\"][0][\"embedding\"])\n", + "\n", + "def get_cosine_similarity(content1: str, content2: str) -> float:\n", + " vec1 = get_qianfan_embedding(content1)\n", + " vec2 = get_qianfan_embedding(content2)\n", + "\n", + " return vec1.dot(vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))\n", + "\n", + "class EssayEvaluator(LocalEvaluator):\n", + "\n", + " def evaluate(self, input: Union[str, List[Dict[str, Any]]], reference: str, output: str) -> Dict[str, Any]:\n", + " try:\n", + " try:\n", + " judge_result: Dict[str, Any] = json.loads(output)\n", + " except json.JSONDecodeError:\n", + " # Fall back to JSON wrapped in a Markdown code fence\n", + " judge_result: Dict[str, Any] = json.loads(re.search(r\"^```json([\\s\\S]*)\\n```$\", output).group(1))\n", + " \n", + " reference_result: Dict[str, Any] = json.loads(reference)\n", + " return {\n", + " \"遵守格式\": True,\n", + "\n", + " \"内容评分等级一致\": judge_result[\"详细解析\"][\"内容项\"][\"等级\"] == reference_result[\"详细解析\"][\"内容项\"][\"等级\"],\n", + " \"内容点评相似度\": get_cosine_similarity(judge_result[\"详细解析\"][\"内容项\"][\"解析\"], reference_result[\"详细解析\"][\"内容项\"][\"解析\"]),\n", + " \"内容评分分差\": 
abs(_convert_str_to_int(judge_result[\"详细解析\"][\"内容项\"][\"得分\"]) - _convert_str_to_int(reference_result[\"详细解析\"][\"内容项\"][\"得分\"])),\n", + "\n", + " \"表达评分等级一致\": judge_result[\"详细解析\"][\"表达项\"][\"等级\"] == reference_result[\"详细解析\"][\"表达项\"][\"等级\"],\n", + " \"表达点评相似度\": get_cosine_similarity(judge_result[\"详细解析\"][\"表达项\"][\"解析\"], reference_result[\"详细解析\"][\"表达项\"][\"解析\"]),\n", + " \"表达评分分差\": abs(_convert_str_to_int(judge_result[\"详细解析\"][\"表达项\"][\"得分\"]) - _convert_str_to_int(reference_result[\"详细解析\"][\"表达项\"][\"得分\"])),\n", + " \n", + " \"发展评分等级一致\": judge_result[\"详细解析\"][\"发展等级\"][\"等级\"] == reference_result[\"详细解析\"][\"发展等级\"][\"等级\"],\n", + " \"发展点评相似度\": get_cosine_similarity(judge_result[\"详细解析\"][\"发展等级\"][\"解析\"], reference_result[\"详细解析\"][\"发展等级\"][\"解析\"]),\n", + " \"发展评分分差\": abs(_convert_str_to_int(judge_result[\"详细解析\"][\"发展等级\"][\"得分\"]) - _convert_str_to_int(reference_result[\"详细解析\"][\"发展等级\"][\"得分\"])),\n", + "\n", + " \"扣分解析相似度\": get_cosine_similarity(judge_result[\"详细解析\"][\"扣分项和残篇评定\"][\"解析\"], reference_result[\"详细解析\"][\"扣分项和残篇评定\"][\"解析\"]),\n", + " \"扣分项扣分分差\": abs(_convert_str_to_int(judge_result[\"详细解析\"][\"扣分项和残篇评定\"][\"扣分\"]) - _convert_str_to_int(reference_result[\"详细解析\"][\"扣分项和残篇评定\"][\"扣分\"])),\n", + "\n", + " \"总分分差\": abs(_convert_str_to_int(judge_result[\"最终得分\"]) - _convert_str_to_int(reference_result[\"最终得分\"])),\n", + " }\n", + " except Exception:\n", + " # Any parsing or key lookup failure is counted as a format violation\n", + " return {\n", + " \"遵守格式\": False,\n", + " \"内容评分等级一致\": False,\n", + " \"内容点评相似度\": -1,\n", + " \"内容评分分差\": -1,\n", + " \"表达评分等级一致\": False,\n", + " \"表达点评相似度\": -1,\n", + " \"表达评分分差\": -1,\n", + " \"发展评分等级一致\": False,\n", + " \"发展点评相似度\": -1,\n", + " \"发展评分分差\": -1,\n", + " \"扣分解析相似度\": -1,\n", + " \"扣分项扣分分差\": -1,\n", + " \"总分分差\": -1\n", + " }\n", + " \n", + " def summarize(self, metric_dataset: Dataset) -> Dict[str, Any] | None:\n", + " statistics_dict: Dict[str, Any] = {}\n", + " for line in metric_dataset.list():\n", + " for k, v in line.items():\n", + " if 
isinstance(v, bool):\n", + " if f\"{k}占比\" not in statistics_dict:\n", + " statistics_dict[f\"{k}占比\"] = 0\n", + "\n", + " statistics_dict[f\"{k}占比\"] += 1 if v else 0\n", + " \n", + " elif isinstance(v, (int, float)):\n", + " if f\"{k}平均值\" not in statistics_dict:\n", + " statistics_dict[f\"{k}平均值\"] = 0\n", + "\n", + " statistics_dict[f\"{k}平均值\"] += v\n", + "\n", + " ds_size = len(metric_dataset)\n", + "\n", + " for k, v in statistics_dict.items():\n", + " statistics_dict[k] = v / ds_size\n", + "\n", + " return statistics_dict" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from qianfan.evaluation import EvaluationManager\n", + "from qianfan.model import Model\n", + "\n", + "em = EvaluationManager(local_evaluators=[EssayEvaluator()])\n", + "\n", + "# This step runs batch inference with the model, then evaluates the inference results in batch\n", + "eval_result = em.eval([Model(version_id=training_result.output[\"model_version_id\"])], eval_ds)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\"am-1g5pi9k6cktk_amv-h7ej2bbxpqrn_None\": {\"遵守格式占比\": 0.94, \"内容评分等级一致占比\": 0.58, \"内容点评相似度平均值\": 0.5121268690680779, \"内容评分分差平均值\": 1.94, \"表达评分等级一致占比\": 0.6, \"表达点评相似度平均值\": 0.5782608177386275, \"表达评分分差平均值\": 1.78, \"发展评分等级一致占比\": 0.56, \"发展点评相似度平均值\": 0.5047243485327537, \"发展评分分差平均值\": 2.06, \"扣分解析相似度平均值\": 0.6616456105456632, \"扣分项扣分分差平均值\": 0.16, \"总分分差平均值\": 5.54}}\n" + ] + } + ], + "source": [ + "print(json.dumps(eval_result.metrics, ensure_ascii=False))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As the metrics show, the fine-tuned model answers more consistently than the base model, and its scores are closer to the human-assigned scores.\n", + "\n", + "We can also save the evaluation result dataset locally for further analysis." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[INFO] [04-01 17:03:16] dataset.py:462 [t:8094817088]: no 
destination data source was provided, construct\n", + "[INFO] [04-01 17:03:16] dataset.py:257 [t:8094817088]: construct a file data source from path: local.json, with args: {}\n", + "[INFO] [04-01 17:03:16] file.py:280 [t:8094817088]: use format type FormatType.Json\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eval_result.result_dataset.save(data_file=\"local.json\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For example, we can aggregate the score gaps for each metric and then present them visually.\n", + "\n", + "To have a baseline for comparison, we can also run the same evaluation on the evaluation set with the base ERNIE-Speed-8K model, to corroborate the effect of our training." + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[INFO] [04-03 16:10:01] evaluation_manager.py:430 [t:8094817088]: start to inference in batch during evaluation\n", + "[INFO] [04-03 16:10:02] dataset_utils.py:332 [t:6310473728]: start to create evaluation task in model\n", + "[INFO] [04-03 16:10:02] dataset_utils.py:294 [t:6310473728]: start to polling status of evaluation task ame-rad6m0c62sza\n", + "[INFO] [04-03 16:10:03] dataset_utils.py:301 [t:6310473728]: current eval_state: Pending\n", + "[INFO] [04-03 16:10:33] dataset_utils.py:301 [t:6310473728]: current eval_state: Doing\n", + "[INFO] [04-03 16:11:03] dataset_utils.py:301 [t:6310473728]: current eval_state: Doing\n", + "[INFO] [04-03 16:11:34] dataset_utils.py:301 [t:6310473728]: current eval_state: Doing\n", + "[INFO] [04-03 16:12:04] dataset_utils.py:301 [t:6310473728]: current eval_state: Doing\n", + "[INFO] [04-03 16:12:34] dataset_utils.py:301 [t:6310473728]: current eval_state: Doing\n", + "[INFO] [04-03 16:13:04] dataset_utils.py:301 [t:6310473728]: current eval_state: Doing\n", + "[INFO] [04-03 16:13:35] dataset_utils.py:301 [t:6310473728]: current eval_state: Doing\n", + "[INFO] [04-03 16:14:05] dataset_utils.py:301 
[t:6310473728]: current eval_state: Doing\n", + "[INFO] [04-03 16:14:35] dataset_utils.py:301 [t:6310473728]: current eval_state: Doing\n", + "[INFO] [04-03 16:15:06] dataset_utils.py:301 [t:6310473728]: current eval_state: Doing\n", + "[INFO] [04-03 16:15:36] dataset_utils.py:301 [t:6310473728]: current eval_state: Doing\n", + "[INFO] [04-03 16:16:06] dataset_utils.py:301 [t:6310473728]: current eval_state: Doing\n", + "[INFO] [04-03 16:16:36] dataset_utils.py:301 [t:6310473728]: current eval_state: DoingWithManualBegin\n", + "[INFO] [04-03 16:16:36] dataset_utils.py:319 [t:6310473728]: get result dataset id ds-ragnuw6t3fazaict\n", + "[INFO] [04-03 16:16:36] dataset.py:389 [t:6310473728]: no data source was provided, construct\n", + "[INFO] [04-03 16:16:36] dataset.py:263 [t:6310473728]: construct a qianfan data source from existed id: ds-ragnuw6t3fazaict, with args: {'is_download_to_local': True}\n", + "[WARNING] [04-03 16:16:37] baidu_qianfan.py:740 [t:6310473728]: parameter \"is_download_to_local\" has been set as deprecated\n", + "[INFO] [04-03 16:16:37] baidu_qianfan.py:349 [t:6310473728]: no cache was found, download cache\n", + "[INFO] [04-03 16:16:38] baidu_qianfan.py:275 [t:6310473728]: get dataset info succeeded for dataset id ds-ragnuw6t3fazaict\n", + "[INFO] [04-03 16:16:38] utils.py:617 [t:6310473728]: start to export dataset\n", + "[INFO] [04-03 16:16:39] utils.py:621 [t:6310473728]: create dataset export task successfully\n", + "[INFO] [04-03 16:16:41] utils.py:626 [t:6310473728]: polling export task status\n", + "[INFO] [04-03 16:16:41] utils.py:634 [t:6310473728]: export status: 1, keep polling\n", + "[INFO] [04-03 16:16:43] utils.py:626 [t:6310473728]: polling export task status\n", + "[INFO] [04-03 16:16:44] utils.py:634 [t:6310473728]: export status: 1, keep polling\n", + "[INFO] [04-03 16:16:46] utils.py:626 [t:6310473728]: polling export task status\n", + "[INFO] [04-03 16:16:46] utils.py:634 [t:6310473728]: export status: 1, keep polling\n", 
+ "[INFO] [04-03 16:16:48] utils.py:626 [t:6310473728]: polling export task status\n", + "[INFO] [04-03 16:16:48] utils.py:634 [t:6310473728]: export status: 1, keep polling\n", + "[INFO] [04-03 16:16:50] utils.py:626 [t:6310473728]: polling export task status\n", + "[INFO] [04-03 16:16:51] utils.py:634 [t:6310473728]: export status: 1, keep polling\n", + "[INFO] [04-03 16:16:53] utils.py:626 [t:6310473728]: polling export task status\n", + "[INFO] [04-03 16:16:53] utils.py:631 [t:6310473728]: export succeed\n", + "[INFO] [04-03 16:16:54] utils.py:565 [t:6310473728]: get export records succeeded for dataset id ds-ragnuw6t3fazaict\n", + "[INFO] [04-03 16:16:54] utils.py:579 [t:6310473728]: latest dataset with time2024-04-03 16:16:53 for dataset ds-ragnuw6t3fazaict\n", + "[INFO] [04-03 16:16:54] utils.py:645 [t:6310473728]: start to download file from url https://bj.bcebos.com/easydata-upload/_easydata-download_/fbb00c440e174c81a514c4b83570f816/%E8%AF%84%E4%BC%B0%E4%BB%BB%E5%8A%A1_model_run_ugjPawGCfd_%E7%BB%93%E6%9E%9C%E9%9B%86_4ee771V1_20240403_161638.zip?authorization=bce-auth-v1%2F50c8bb753dcb4e1d8646bb1ffefd3503%2F2024-04-03T08%3A16%3A54Z%2F3600%2Fhost%2Fa8b9deebc0d519176b46ea96d13e1e8242401bbfe8b71ddf897317ed3734f784\n", + "[INFO] [04-03 16:16:54] baidu_qianfan.py:290 [t:6310473728]: download dataset zip to /Users/pengyiyang/.qianfan_cache/dataset/.qianfan_download_cache/dg-87rdaiqayhikmra3/ds-ragnuw6t3fazaict/1/bin.zip succeeded\n", + "[INFO] [04-03 16:16:54] baidu_qianfan.py:315 [t:6310473728]: unzip dataset to path /Users/pengyiyang/.qianfan_cache/dataset/.qianfan_download_cache/dg-87rdaiqayhikmra3/ds-ragnuw6t3fazaict/1/content successfully\n", + "[INFO] [04-03 16:16:54] baidu_qianfan.py:319 [t:6310473728]: write dataset info to path /Users/pengyiyang/.qianfan_cache/dataset/.qianfan_download_cache/dg-87rdaiqayhikmra3/ds-ragnuw6t3fazaict/1/info.json successfully\n", + "[INFO] [04-03 16:16:55] utils.py:331 [t:6310473728]: need create cached arrow file for 
/Users/pengyiyang/.qianfan_cache/dataset/.qianfan_download_cache/dg-87rdaiqayhikmra3/ds-ragnuw6t3fazaict/1/content/dataset.jsonl\n", + "[INFO] [04-03 16:16:55] utils.py:376 [t:6310473728]: start to write arrow table to /Users/pengyiyang/.qianfan_cache/dataset/Users/pengyiyang/.qianfan_cache/dataset/.qianfan_download_cache/dg-87rdaiqayhikmra3/ds-ragnuw6t3fazaict/1/content/dataset.arrow\n", + "[INFO] [04-03 16:16:55] utils.py:388 [t:6310473728]: writing succeeded\n", + "[INFO] [04-03 16:16:55] utils.py:262 [t:6310473728]: start to get memory_map from /Users/pengyiyang/.qianfan_cache/dataset/Users/pengyiyang/.qianfan_cache/dataset/.qianfan_download_cache/dg-87rdaiqayhikmra3/ds-ragnuw6t3fazaict/1/content/dataset.arrow\n", + "[INFO] [04-03 16:16:55] utils.py:237 [t:6310473728]: has got a memory-mapped table\n", + "[INFO] [04-03 16:16:55] utils.py:376 [t:6310473728]: start to write arrow table to /Users/pengyiyang/.qianfan_cache/dataset/.mapper_cache/Users/pengyiyang/.qianfan_cache/dataset/.qianfan_download_cache/dg-87rdaiqayhikmra3/ds-ragnuw6t3fazaict/1/content_9f55c2b6-c3d7-4afb-8afd-303de2aaca61.arrow\n", + "[INFO] [04-03 16:16:55] utils.py:388 [t:6310473728]: writing succeeded\n", + "[INFO] [04-03 16:16:55] utils.py:262 [t:6310473728]: start to get memory_map from /Users/pengyiyang/.qianfan_cache/dataset/.mapper_cache/Users/pengyiyang/.qianfan_cache/dataset/.qianfan_download_cache/dg-87rdaiqayhikmra3/ds-ragnuw6t3fazaict/1/content_9f55c2b6-c3d7-4afb-8afd-303de2aaca61.arrow\n", + "[INFO] [04-03 16:16:55] evaluation_manager.py:454 [t:8094817088]: start to evaluate llm 0\n", + "[INFO] [04-03 16:16:55] evaluation_manager.py:482 [t:8094817088]: start to merge evaluation result dataset\n" + ] + } + ], + "source": [ + "og_model_eval_result = em.eval([Model(version_id=\"amv-pzqtzdspm77m\")], eval_ds)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [], + "source": [ + "!pip 
install tabulate" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "╒══════════════╤════════════════╤════════════════════════╤════════════════════════╤══════════════════════╤════════════════════════╤════════════════════════╤══════════════════════╤════════════════════════╤════════════════════════╤══════════════════════╤════════════════════════╤════════════════════════╤══════════════════╕\n", + "│ │ 遵守格式占比 │ 内容评分等级一致占比 │ 内容点评相似度平均值 │ 内容评分分差平均值 │ 表达评分等级一致占比 │ 表达点评相似度平均值 │ 表达评分分差平均值 │ 发展评分等级一致占比 │ 发展点评相似度平均值 │ 发展评分分差平均值 │ 扣分解析相似度平均值 │ 扣分项扣分分差平均值 │ 总分分差平均值 │\n", + "╞══════════════╪════════════════╪════════════════════════╪════════════════════════╪══════════════════════╪════════════════════════╪════════════════════════╪══════════════════════╪════════════════════════╪════════════════════════╪══════════════════════╪════════════════════════╪════════════════════════╪══════════════════╡\n", + "│ EB-Speed-SFT │ 0.94 │ 0.58 │ 0.512127 │ 1.94 │ 0.6 │ 0.578261 │ 1.78 │ 0.56 │ 0.504724 │ 2.06 │ 0.661646 │ 0.16 │ 5.54 │\n", + "├──────────────┼────────────────┼────────────────────────┼────────────────────────┼──────────────────────┼────────────────────────┼────────────────────────┼──────────────────────┼────────────────────────┼────────────────────────┼──────────────────────┼────────────────────────┼────────────────────────┼──────────────────┤\n", + "│ EB-Speed │ 0 │ 0 │ -1 │ -1 │ 0 │ -1 │ -1 │ 0 │ -1 │ -1 │ -1 │ -1 │ -1 │\n", + "╘══════════════╧════════════════╧════════════════════════╧════════════════════════╧══════════════════════╧════════════════════════╧════════════════════════╧══════════════════════╧════════════════════════╧════════════════════════╧══════════════════════╧════════════════════════╧════════════════════════╧══════════════════╛\n", + 
"╒══════════════╤════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕\n", + "│ │ 输入的 Prompt │ 预期回答与大模型回答 │\n", + 
"╞══════════════╪════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡\n", + "│ 原始数据 │ 你是一个高考语文阅卷老师,现在有一个高考作文题目和一篇待批改论文,需要你对这篇待批改论文进行评分。 │ {\"详细解析\": {\"内容项\": {\"解析\": \"作文贴合题目要求,通过日月、花鸟等自然景象引出语言的议题,并逐层深入论述了语言在生活、生命、文明传承中的作用。文章思路清晰,结构合理,内容较为充实,符合二等水平。\",\"等级\": \"二等\",\"得分\": \"14分\"},\"表达项\": {\"解析\": \"文章整体结构完整,段落清晰,语言表达流畅,书写清晰,未见语病,符合二等水平。\",\"等级\": \"二等\",\"得分\": \"14分\"},\"发展等级\": {\"解析\": \"文章较好地展现了作者的感悟和思考,但在深度和文采上稍显平常,未能展现出较高的创新度和文采,属于二等水平。\",\"等级\": \"二等\",\"得分\": \"13分\"},\"扣分项和残篇评定\": {\"解析\": \"未见明显的错别字和严重的标点错误,作文字数符合要求,无需扣分。\",\"扣分\": \"0分\"}},\"缺点和改进意见\": {\"缺点\": 
\"虽然文章较好地展现了语言在不同领域中的作用,但对于每一个领域中具体的语言现象论述不够深入,例证不够生动具体,语言灵活性和文采表现一般。\",\"改进意见\": \"可以通过增加对生活中具体语言现象的描述和分析,加强事例的说服力,使文章在论证深度和文采上都更为出彩。另外,可以适当使用比较新颖的表达方法或修辞手法,以提升文章的创造性和阅读兴趣。\"},\"最终得分\": \"41分\"} │\n", + "│ │ 要求: │ │\n", + "│ │ 1)请认真阅读作文批改要求和作文题目,对这篇待批改作文进行公正严格的批改和打分; │ │\n", + "│ │ 2)评分一定要严格,不能轻易给出高分。 │ │\n", + "│ │ 3)最后返回内容要严格按照最后的输出格式。 │ │\n", + "│ │ │ │\n", + "│ │ 一、作文批改要求: │ │\n", + "│ │ 高考作文评分批改分为基础等级、发展等级、关于作文的其他项评定 │ │\n", + "│ │ 1、基础等级 │ │\n", + "│ │ 基础等级分内容和表达两项。 │ │\n", + "│ │ 1)内容项 │ │\n", + "│ │ 具体评分规则如下:符合题意、中心突出、内容充实、思想健康、感情真挚为一等,可按16-20分酌情给分;符合题意、主题明确、内容较充实、思想健康、感情真实为二等,可按11-15分酌情给分;基本符合题意、中心基本明确、内容单薄、思想基本健康、感情基本真实为三等,可按6-10分酌情给分;偏离题意、中心不明确、内容不当、思想不健康、感情虚假为四等,可按0-5分酌情给分。 │ │\n", + "│ │ 2)表达项 │ │\n", + "│ │ 具体评分规则如下:符合文体要求、结构严谨、语言流畅、字迹工整为一等,可按16-20分酌情给分;符合文体要求、结构完整、语言通顺、字迹清楚为二等,可按11-15分酌情给分;基本符合文体要求、结构基本完整、语言基本通顺、字迹基本清楚为三等,可按6-10分酌情给分;不符合文体要求、结构混乱、语言不通顺语病多、字迹潦草难辨为四等,可按0-5分酌情给分。 │ │\n", + "│ │ 2、发展等级 │ │\n", + "│ │ 基础等级分要与发展等级分相匹配,发展等级分不能跨越基础等级的得分等级。 │ │\n", + "│ │ 具体评分规则如下:深刻、丰富、有文采、有创意为一等,可按16-20分酌情给分;较深刻、较丰富、较有文采、较有创意为二等,可按11-15分酌情给分;略显深刻、略显丰富、略显文采、略显创意为三等,可按6-10分酌情给分;个别语句有深意、个别例子较好、个别语句较精彩、个别地方有深意为四等,可按0-5分酌情给分。 │ │\n", + "│ │ 3、关于作文的其他项评定 │ │\n", + "│ │ 1)扣分项评定 │ │\n", + "│ │ 出现错别字,1个错别字扣1分,重复不计,扣完5分为止;标点符号出现3处以上错误的酌情扣分;不足字数者,每少50字扣1分;无标题扣2分。 │ │\n", + "│ │ 2)残篇评定 │ │\n", + "│ │ 400字以上的文章,按评分标准评分,扣字数分。(少50个字扣1分) │ │\n", + "│ │ 400字以下的文章,20分以下评分,不再扣字数分。 │ │\n", + "│ │ 200字以下的文章,10分以下评分,不再扣字数分。 │ │\n", + "│ │ 只写一两句话的,给1分或2分,不评0分。 │ │\n", + "│ │ 只写标题的,给1分或2分,不评0分。 │ │\n", + "│ │ 完全空白的,评0分。 │ │\n", + "│ │ │ │\n", + "│ │ 二、作文题目: │ │\n", + "│ │ 花自语,鸟有语,生活处处有语言。生命也可以用语言来解读,雕塑、基因都可以用语言来传递。语言丰富生活,语言诠释生命,语言传承文明。 请根据所给材料作文,自己拟题,文体不限,诗歌除外,不少于 800 字。 │ │\n", + "│ │ │ │\n", + "│ │ 三、待批改作文 │ │\n", + "│ │ 作文题目:以语言为桥,通往生活的多维世界 │ │\n", + "│ │ │ │\n", + "│ │ 太阳以光芒之语温暖世界,月亮以寂静之语洒落温柔,花以香气之语诉说生命的绽放,鸟以歌声之语传递自由的向往。生活无处不语言,无时不语言,它们像一座座桥梁,连接着我们与世界,我们与自己。 │ │\n", + "│ │ │ │\n", + "│ │ 
语言,是生活的调色板。每个人都在用自己的方式表达,艺术家用画笔描绘色彩,音乐家用音符谱写旋律,诗人用文字勾勒意境。这些语言形式各异,却都在诠释着生活的丰富多彩。它们让我们在平淡的日常中,发现不一样的美,感受不一样的情。 │ │\n", + "│ │ │ │\n", + "│ │ 语言,是生命的解码器。生命是一本无字的书,需要我们用心去读,用语言去解读。科学家用基因的语言揭示生命的奥秘,哲学家用思辨的语言探索生命的意义,我们用生活的语言感受生命的温度。这些语言或深或浅,或宽或窄,却都在试图解答生命这个永恒的谜题。 │ │\n", + "│ │ │ │\n", + "│ │ 语言,是文明的传承者。人类的历史就是一部语言的历史,我们的智慧、情感、文化、信仰,都通过语言得以流传。书籍是智慧的语言,让我们跨越时空的限制,与古人对话,与未来畅想。故事是情感的语言,让我们在别人的经历中,找到自己的影子,感受生活的共鸣。习俗是文化的语言,让我们在生活的琐碎中,感受民族的底蕴,传承文明的精神。信仰是灵魂的语言,让我们在迷茫的时刻,找到前进的方向,坚定生活的信念。 │ │\n", + "│ │ │ │\n", + "│ │ 然而,语言并非总是清晰明了的。有时,它是一种含蓄的表达,需要我们用心去揣摩;有时,它是一种复杂的密码,需要我们用智慧去破译;有时,它是一种深邃的海洋,需要我们用勇气去探索。但无论语言如何变化,它都是我们理解世界、表达自我、沟通情感的重要工具。 │ │\n", + "│ │ │ │\n", + "│ │ 在这个充满变化的世界中,让我们更加珍视语言的力量。让我们用爱的语言去温暖他人,用智慧的语言去启迪他人,用美的语言去感染他人。因为,语言不仅是我们生活的工具,更是我们生命的馈赠。 │ │\n", + "│ │ │ │\n", + "│ │ 以语言为桥,我们可以通往生活的多维世界。在这个世界中,有无限的可能等待我们去发现,有无数的精彩等待我们去创造。让我们用语言的力量,去开启这个充满奇迹的世界吧! │ │\n", + "│ │ │ │\n", + "│ │ 四、输出格式 │ │\n", + "│ │ {\"详细解析\": {\"内容项\": {\"解析\": \"xxxxxx。\",\"等级\": \"xx等\",\"得分\": \"xx分\"},\"表达项\": {\"解析\": \"xxxxxx。\",\"等级\": \"xx等\",\"得分\": \"xx分\"},\"发展等级\": {\"解析\": \"xxxxxx。\",\"等级\": \"xx等\",\"得分\": \"xx分\"},\"扣分项和残篇评定\": {\"解析\": \"xxxxxx。\",\"扣分\": \"xx分\"}},\"缺点和改进意见\": {\"缺点\": \"(一定要详细具体,最好点明哪段哪句存在缺点)xxxxxx。\",\"改进意见\": \"xxxxxxx。\"},\"最终得分\": (最终得分=内容项得分+表达项得分+发展等级得分-扣分项和残篇评定扣分)\"xx分\"} │ │\n", + 
"├──────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤\n", + "│ EB-Speed-SFT │ │ {\"详细解析\": { │\n", + "│ │ │ \"内容项\": { │\n", + "│ │ │ \"解析\": \"作文内容紧扣题目,围绕‘语言’这一核心概念展开,分别从生活、生命、文明三个维度诠释了语言的重要性和价值。文章结构清晰,中心突出,内容较为充实,情感真挚。\", │\n", + "│ │ │ \"等级\": \"二等\", │\n", + "│ │ │ \"得分\": \"14分\" │\n", + "│ │ │ }, │\n", + "│ │ │ \"表达项\": { │\n", + "│ │ │ \"解析\": \"文章结构完整,段落之间过渡自然,语言通顺,用词恰当,无显著语病。但文章在语言表达上稍显平淡,缺乏一些生动形象的比喻或是更精炼有力的语句。\", │\n", + "│ │ │ \"等级\": \"二等\", │\n", + "│ │ │ \"得分\": \"13分\" │\n", + "│ │ │ }, │\n", + "│ │ │ \"发展等级\": { │\n", + "│ │ │ \"解析\": 
\"文章观点较为深刻,对语言的功能有一定的探讨,但缺乏更具体丰富的论证和详实的例证,展开略显不足,未能充分展现语言的魅力与创意。\", │\n", + "│ │ │ \"等级\": \"二等\", │\n", + "│ │ │ \"得分\": \"12分\" │\n", + "│ │ │ }, │\n", + "│ │ │ \"扣分项和残篇评定\": { │\n", + "│ │ │ \"解析\": \"文章未出现明显的错别字,标点符号使用正确,未发现字数不足情况,也有标题。\", │\n", + "│ │ │ \"扣分\": \"0分\" │\n", + "│ │ │ } │\n", + "│ │ │ }, │\n", + "│ │ │ \"缺点和改进意见\": { │\n", + "│ │ │ \"缺点\": \"虽然文章结构完整,语言通顺,但在内容上稍显单薄,论证不够深入,缺少生动具体的例子和深刻的思考。同时,文章在表达上缺少一些文采,使语言显得稍微平淡。\", │\n", + "│ │ │ \"改进意见\": \"可以增加一些具体生活中的例子来支撑论点,让论证更加有力。同时,尝试使用更加生动精彩的语言和修辞手法,让文章在表达上更加饱满,情感上更加真挚。\" │\n", + "│ │ │ }, │\n", + "│ │ │ \"最终得分\": \"39分\" │\n", + "│ │ │ } │\n", + "├──────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─────────────────────────────────────────────────────────────────────────────────────────────┤\n", + "│ EB-Speed │ │ 根据您提供的作文批改要求和作文内容,我将对这篇作文进行公正严格的批改和打分。 │\n", + "│ │ │ │\n", + "│ │ │ 一、作文批改解析 │\n", + "│ │ │ │\n", + "│ │ │ 1. 内容项 │\n", + "│ │ │ │\n", + "│ │ │ * 解析:该作文紧扣题目要求,从语言在生活中的作用、语言对生命的诠释以及语言在传承文明中的重要性三个方面进行了深入阐述。内容充实,结构清晰,符合题意,主题明确。 │\n", + "│ │ │ * 等级:一等 │\n", + "│ │ │ * 得分:18分 │\n", + "│ │ │ │\n", + "│ │ │ 2. 表达项 │\n", + "│ │ │ │\n", + "│ │ │ * 解析:该作文语言流畅,字迹工整,符合文体要求。结构严谨,层次分明,表达清晰。 │\n", + "│ │ │ * 等级:一等 │\n", + "│ │ │ * 得分:17分 │\n", + "│ │ │ │\n", + "│ │ │ 3. 发展等级 │\n", + "│ │ │ │\n", + "│ │ │ * 解析:该作文在深刻、丰富、有文采、有创意等方面都表现得很好。作者通过生活中的例子,生动地阐述了语言的作用,显示出作者对问题的深刻理解和独特见解。 │\n", + "│ │ │ * 等级:一等 │\n", + "│ │ │ * 得分:17分 │\n", + "│ │ │ │\n", + "│ │ │ 4. 扣分项和残篇评定 │\n", + "│ │ │ │\n", + "│ │ │ * 解析:作文中未发现明显的错别字、标点符号错误。字数达到要求,无不足字数情况。 │\n", + "│ │ │ * 扣分:0分 │\n", + "│ │ │ │\n", + "│ │ │ 二、缺点和改进意见 │\n", + "│ │ │ │\n", + "│ │ │ * 缺点:无明显的缺点。 │\n", + "│ │ │ * 改进意见:无需改进。 │\n", + "│ │ │ │\n", + "│ │ │ 三、最终得分 │\n", + "│ │ │ │\n", + "│ │ │ * 得分:内容项18分 + 表达项17分 + 发展等级17分 - 扣分项0分 = 52分(基础等级得分) │\n", + "│ │ │ * 发展等级加分:由于作文在深刻、丰富、有文采、有创意等方面表现突出,建议加分为一等,加3分。 │\n", + "│ │ │ * 最终得分:52分(基础等级)+ 3分(发展等级加分)= 55分 │\n", + "│ │ │ │\n", + "│ │ │ 总之,这篇作文紧扣题目要求,内容丰富,表达清晰,发展等级高,无明显缺点。建议给予高分。 │\n", + 
"╘══════════════╧════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╛\n" + ] + } + ], + "source": [ + "from tabulate import tabulate\n", + "\n", + "sft_model_tag = list(eval_result.metrics.keys())[0]\n", + "\n", + "cmp_dict = {k: [eval_result.metrics[sft_model_tag][k], v] for k, v in list(og_model_eval_result.metrics.values())[0].items()}\n", + "\n", + "print(tabulate(cmp_dict, headers='keys', tablefmt='fancy_grid', showindex=(\"EB-Speed-SFT\", \"EB-Speed\")))\n", + "\n", + "cmp_entry_dict = {\n", + " \"输入的 Prompt\": [eval_result.result_dataset[0][\"input_prompt\"], None, None],\n", + 
" \"预期回答与大模型回答\": [eval_result.result_dataset[0][\"expected_output\"], eval_result.result_dataset[0][\"llm_output\"], og_model_eval_result.result_dataset[0][\"llm_output\"]],\n", + "}\n", + "\n", + "print(tabulate(cmp_entry_dict, headers='keys', tablefmt='fancy_grid', showindex=(\"原始数据\", \"EB-Speed-SFT\", \"EB-Speed\")))\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "在以上训练评估的基础上,我们可以对模型的能力进行系统的评价->优化,直到我们的模型达到我们的期望,就通过以下方式进行服务的部署以实现线上的生产调用:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#-# cell_skip\n", + "from qianfan.model import Service, DeployConfig\n", + "from qianfan.model.consts import ServiceType\n", + "from qianfan.resources.console.consts import DeployPoolType\n", + "\n", + "sft_svc: Service = Model(version_id=\"amv-pzqtzdspm77m\").deploy(DeployConfig(\n", + " name=\"essay_correct\",\n", + " endpoint_prefix=\"essaycor\",\n", + " replicas=1,\n", + " pool_type=DeployPoolType.PrivateResource,\n", + " service_type=ServiceType.Chat,\n", + " # step: x,\n", + "))\n", + "\n", + "chat_comp: ChatCompletion = sft_svc.get_res()\n", + "sft_chat_resp = chat_comp.do([{\"content\": correction_prompt.render(**render_dict)[0], \"role\": \"user\"}])\n", + "sft_chat_resp[\"result\"]\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "bce-qianfan-sdk-new", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.13" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/python/pyproject.toml b/python/pyproject.toml index a6886f0b..601a47a4 100644 --- a/python/pyproject.toml +++ b/python/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "qianfan" -version = "0.3.7" +version = "0.3.7.1" description = "文心千帆大模型平台 
Python SDK" authors = [] license = "Apache-2.0" diff --git a/python/qianfan/dataset/data_source/chunk_reader.py b/python/qianfan/dataset/data_source/chunk_reader.py index 9691e56a..ab6ad6c9 100644 --- a/python/qianfan/dataset/data_source/chunk_reader.py +++ b/python/qianfan/dataset/data_source/chunk_reader.py @@ -54,7 +54,7 @@ def get_chunk(self, chunk_size: int = 0) -> List[Any]: except StopIteration: break except Exception as e: - err_msg = f"exception occurred during read csv file streamly: {e}" + err_msg = f"exception occurred during read file streamly: {e}" log_error(err_msg) raise e diff --git a/python/qianfan/evaluation/evaluation_manager.py b/python/qianfan/evaluation/evaluation_manager.py index c212136a..e19059e2 100644 --- a/python/qianfan/evaluation/evaluation_manager.py +++ b/python/qianfan/evaluation/evaluation_manager.py @@ -16,6 +16,7 @@ """ manager which manage whole procedure of evaluation """ +import json import math import multiprocessing import os.path @@ -29,6 +30,7 @@ import pyarrow from qianfan import get_config +from qianfan.config import encoding from qianfan.dataset import Dataset from qianfan.dataset.consts import ( LLMOutputColumnName, @@ -36,6 +38,7 @@ OldReferenceColumnName, ) from qianfan.dataset.data_source import FileDataSource, QianfanDataSource +from qianfan.dataset.data_source.chunk_reader import JsonLineReader from qianfan.dataset.data_source.utils import ( _download_file_from_url_streamly, ) @@ -61,6 +64,44 @@ from qianfan.utils.utils import generate_letter_num_random_id +def _convert_the_value_in_evaluation_into_str(json_line_path: str) -> str: + reader = JsonLineReader(json_line_path) + new_file_path = os.path.join( + os.path.split(json_line_path)[0], "tmp_eval_modify.jsonl" + ) + + is_judge_reason_existed_checked: bool = False + + with open(new_file_path, mode="w", encoding=encoding()) as f: + for entry in reader: + for inner_list in entry: + for single_entry in inner_list: + # Check whether judge_reason is present; if it is absent, return the original path unchanged + if "evaluation"
not in single_entry or ( + not is_judge_reason_existed_checked + and not any( + [ + v == "judge_reason" + for item in single_entry["evaluation"] + for _, v in item.items() + ] + ) + ): + return json_line_path + + is_judge_reason_existed_checked = True + + single_entry["evaluation"] = [ + {k: str(v) for k, v in item.items()} + for item in single_entry["evaluation"] + ] + + json.dump(inner_list, f, ensure_ascii=False) + f.write("\n") + + return new_file_path + + class EvaluationManager(BaseModel): """logic control center of evaluation""" @@ -481,6 +522,8 @@ def eval( # Merge the results into the dataset format used by web-based manual evaluation log_info("start to merge evaluation result dataset") table_list: List[pyarrow.Table] = [] + metrics_dict: Dict[str, Dict[str, Any]] = {} + for index, response_list in llm_response_list.items(): index_tag_column = [llm_tags[index] for _ in range(len(response_list))] ds = dataset.create_from_pyobj( @@ -495,13 +538,25 @@ def eval( metrics_ds = dataset.create_from_pyobj( llm_evaluation_result_dict[index] ) + ds.col_append(metrics_ds.col_list()) table_list.append(ds.inner_table) + summarization_dict: Dict[str, Any] = {} + + for evaluator in self.local_evaluators: + summarization = evaluator.summarize(metrics_ds) + if summarization: + summarization_dict.update(summarization) + + if summarization_dict: + metrics_dict[llm_tags[index]] = summarization_dict + return EvaluationResult( result_dataset=Dataset.create_from_pyarrow_table( pyarrow.concat_tables(table_list) - ) + ), + metrics=metrics_dict, ) if self.qianfan_evaluators: @@ -594,6 +649,10 @@ def eval( with zipfile.ZipFile(local_cache_file_path) as zip_f: zip_f.extractall(unfold_zip_file_path) + data_jsonl_file_path = _convert_the_value_in_evaluation_into_str( + data_jsonl_file_path + ) + # Return the metrics information return EvaluationResult( result_dataset=Dataset.load( diff --git a/python/qianfan/evaluation/evaluator.py b/python/qianfan/evaluation/evaluator.py index 9b913c32..db046f3f 100644 --- a/python/qianfan/evaluation/evaluator.py +++
b/python/qianfan/evaluation/evaluator.py @@ -19,6 +19,7 @@ from abc import ABC, abstractmethod from typing import Any, Dict, List, Optional, Union +from qianfan.dataset import Dataset from qianfan.evaluation.consts import ( QianfanRefereeEvaluatorDefaultMaxScore, QianfanRefereeEvaluatorDefaultMetrics, @@ -37,6 +38,21 @@ def evaluate( ) -> Dict[str, Any]: """evaluate one entry""" + def summarize(self, metric_dataset: Dataset) -> Optional[Dict[str, Any]]: + """ + The default implementation of the summarize interface, + which is designed to derive a summary from custom metrics + + Args: + metric_dataset (Dataset): a Dataset object containing all metrics + for one specific evaluated LLM + + Returns: + Optional[Dict[str, Any]]: A dict containing the summary, or None + if there is nothing to summarize + """ + return None + class LocalEvaluator(Evaluator, ABC): """