Note

This project consists of multiple files. Fork it and run it in a GPU environment to see the complete code and run it correctly:

 

Also check that the relevant settings, e.g. use_gpu and fluid.CUDAPlace(0), are configured correctly.

Download and installation commands

## CPU installation command
pip install -f https://paddlepaddle.org.cn/pip/oschina/cpu paddlepaddle

## GPU installation command
pip install -f https://paddlepaddle.org.cn/pip/oschina/gpu paddlepaddle-gpu

Auto Dialogue Evaluation

Introduction

Task description

Auto Dialogue Evaluation assesses the response quality of open-domain dialogue systems. It helps companies and individuals evaluate a dialogue system's responses quickly and reduces the cost of human evaluation.

  1. Without labeled data, a matching model trained with negative sampling serves as the evaluation tool, making it possible to rank the response quality of multiple dialogue systems;
  2. With a small amount of labeled data (human scores for a specific dialogue system or scenario), fine-tuning that matching model significantly improves evaluation quality for the system or scenario in question.
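As a rough sketch of the first stage, negative examples can be produced by pairing each context with a response drawn from a different dialogue. All names below (e.g. make_training_pairs) are made up for illustration; the project's actual sampling logic lives in reader.py and is influenced by the sample_pro setting in run.sh:

```python
import random

def make_training_pairs(dialogues, seed=0):
    """Build (context, response, label) pairs for matching-model
    pretraining: the true response is labeled 1, and a response
    sampled from a different dialogue is labeled 0."""
    rng = random.Random(seed)
    pairs = []
    for i, (context, response) in enumerate(dialogues):
        pairs.append((context, response, 1))       # positive pair
        j = rng.randrange(len(dialogues) - 1)      # pick another dialogue
        if j >= i:                                 # skip index i itself
            j += 1
        pairs.append((context, dialogues[j][1], 0))  # negative pair
    return pairs

dialogues = [("how are you", "fine thanks"),
             ("what time is it", "almost noon"),
             ("nice weather today", "yes very sunny")]
pairs = make_training_pairs(dialogues)
```

Every context then appears once with label 1 and once with label 0, which is all the matching model needs to learn a context/response compatibility score without human annotation.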

Results

Taking four different dialogue systems (seq2seq_naive/seq2seq_att/keywords/human) as an example, we run the automatic evaluation tool on each.

  1. Without labeled data, the pretrained evaluation tool is applied directly. The Spearman correlation between automatic scores and human scores on the four systems is:

    |     | seq2seq_naive | seq2seq_att | keywords | human |
    | cor | 0.361         | 0.343       | 0.324    | 0.288 |

    Ranking of the four systems by average score (k=keywords, n=seq2seq_naive, a=seq2seq_att, h=human):

    Human:     k(0.591) < n(0.847) < a(1.116) < h(1.240)
    Automatic: k(0.625) < n(0.909) < a(1.399) < h(1.683)
  2. After fine-tuning with a small amount of labeled data, the Spearman correlation between automatic scores and human scores is:

    |     | seq2seq_naive | seq2seq_att | keywords | human |
    | cor | 0.474         | 0.477       | 0.443    | 0.378 |
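The Spearman coefficient reported above is the rank correlation between the automatic and human score lists. A self-contained pure-Python sketch of how such a number is computed (the score lists below are invented for illustration; the reported values come from the project's own evaluation):

```python
def rank(values):
    """Ranks starting at 1, averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1              # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

human = [2, 0, 1, 1, 0]            # made-up human scores
auto  = [1.8, 0.2, 0.9, 1.1, 0.4]  # made-up automatic scores
print(round(spearman(human, auto), 3))  # -> 0.949
```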
 

Task definition and modeling

The input of the automatic dialogue evaluation task is a text pair (context, response); the output is a response quality score.

Model overview

The matching task (predicting whether a context and a response go together) is naturally related to automatic evaluation, so this project uses the matching task to pretrain the evaluator;

the matching model is then fine-tuned with a small amount of labeled data.
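The two-tower shape of the matching model is visible in the training logs below: context_rep and response_rep are both (-1, 256) and are reduced to one logit per pair. A toy illustration of that scoring step, with mean-pooled random embeddings standing in for the trained recurrent encoder (everything here is a simplified stand-in, not the project's actual code):

```python
import random

random.seed(0)
EMB_SIZE = 8          # the real model uses 256 (emb_size/hidden_size in run.sh)
VOCAB = 1000
# stand-in embedding table: one random vector per token id
embedding = [[random.gauss(0, 1) for _ in range(EMB_SIZE)] for _ in range(VOCAB)]

def encode(token_ids):
    # Toy encoder: mean-pool the token embeddings. The real model uses a
    # recurrent encoder, but it likewise ends in one fixed-size vector.
    return [sum(embedding[t][d] for t in token_ids) / len(token_ids)
            for d in range(EMB_SIZE)]

def match_logit(context_ids, response_ids):
    # One scalar logit per (context, response) pair; a plain inner product
    # here, whereas the trained model applies learned parameters.
    c, r = encode(context_ids), encode(response_ids)
    return sum(a * b for a, b in zip(c, r))

logit = match_logit([8, 644, 10, 98], [7, 63])
```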

Data format

The data used for training, prediction, and evaluation looks like the examples below. Each line has three tab-separated ('\t') columns: the space-separated context token ids, the space-separated response token ids, and a label.

723 236 7823 12 8     887 13 77 4       2
8474 13 44 34         2 87 91 23       0
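Parsing one line of this format takes a tab split followed by two space splits. A minimal sketch (parse_line is a name invented here; the project's actual loading logic is in reader.py):

```python
def parse_line(line):
    """Split one sample into (context_ids, response_ids, label)."""
    context, response, label = line.rstrip("\n").split("\t")
    return ([int(tok) for tok in context.split()],
            [int(tok) for tok in response.split()],
            int(label))

context_ids, response_ids, label = parse_line("723 236 7823 12 8\t887 13 77 4\t2")
# context_ids == [723, 236, 7823, 12, 8]
# response_ids == [887, 13, 77, 4]
# label == 2
```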

Note: the project also provides a tokenization/preprocessing script (under the preprocess directory). Usage:

python tokenizer.py --test_data_dir ./test.txt.utf8 --batch_size 1 > test.txt.utf8.seg

Code structure

main.py: the project's entry point, wrapping training, prediction, and evaluation

config.py: model configuration, including the model type and its hyperparameters

reader.py: data reading and vocabulary loading

evaluation.py: evaluation functions

init.py: model-loading functions

run.sh: script for running training, prediction, and evaluation

 

Files

auto_dialogue_evaluation/: main executable files of the automatic dialogue evaluation model

auto_dialogue_evaluation/data: holds unlabel_data (train.ids/val.ids/test.ids), label_data (train.ids/val.ids/test.ids for each of the four systems), and word2ids

auto_dialogue_evaluation/model_files: pretrained models

auto_dialogue_evaluation/model_files_tmp: models trained in this notebook

models: shared model collection

preprocess: shared data preprocessing code

This example runs on GPU; to run on CPU, call run_CPU.sh instead.

In[1]
# Inspect the data format; only the first few lines are shown
with open('auto_dialogue_evaluation/data/label_data/seq2seq_att/train.ids', 'r') as f:
    for i, line in enumerate(f):
        print(line)
        if i >= 10:
            break
8 644 10 98 4494 3 19 218 18 182 464 3 20 32 15 837 880 10 27 32 1128 105 32 10 98 63 25 32	7 63 0	0

13863 1348 21 693 20 515 82518 181 1590 3 911 16015 5 2	5681 3 12274 8077	0

186 9 7 322 9761 3 20 622 5 1135 3 7726 17 9 22 51 3	1610 2 5374 5714	1

1224 134 3 1109 458 528 92 42 51 99 19 4 4621 94 73 2 149 151 94 73 4 53 720 25 453 123	104 4621 81	1

943 2 3303 723 4 4 130 3228 4	7 10 12 11750 3	0

1850 2612 534	12199 62 55 187 118 29	1

32138 820 9651 375 149 37 64 481 25	10 12 1048 3 1683	0

1494 5535 1479 1395 213 26815 944 1574 110886 183 38 5535 733 19 11	18 5535 2 5535	1

2543 663 50 64 10114 94 73 11	138 8 5223 11	0

753 6 21 5 2 4028 213 257 249 753 1955 913 2 18 16566 2 37 184 11	8 554 19 797	1

7 26 59 116 586 15663 11634 2 375 4831 3 2 25471 1534	13 26 275 30 564	1
In[1]
# Run first-stage training on the sample dataset. Training parameters are in run.sh and can be adjusted as needed.
!cd auto_dialogue_evaluation && sh run.sh train
-----------  Configuration Arguments -----------
batch_size: 256
do_infer: False
do_train: True
do_val: False
emb_size: 256
hidden_size: 256
init_model: None
learning_rate: 0.001
loss_type: CLS
max_len: 50
num_scan_data: 50
out_path: None
print_step: 50
sample_pro: 1
save_path: model_files_tmp/matching_pretrained
save_step: 10
test_path: None
train_path: data/unlabel_data/train.ids
use_cuda: True
val_path: data/unlabel_data/val.ids
vocab_size: 484016
word_emb_init: None
------------------------------------------------
context_rep shape: (-1L, 256L)
response_rep shape: (-1L, 256L)
logits shape: (-1L, 1L)
before reduce mean loss shape: (-1L, 1L)
after reduce mean loss shape: (1L,)
begin memory optimization ...
2020-03-03 20:15:27,444-WARNING: Caution! paddle.fluid.memory_optimize() is deprecated and not maintained any more, since it is not stable!
This API would not take any memory optimizations on your Program now, since we have provided default strategies for you.
The newest and stable memory optimization strategies (they are all enabled by default) are as follows:
 1. Garbage collection strategy, which is enabled by exporting environment variable FLAGS_eager_delete_tensor_gb=0 (0 is the default value).
 2. Inplace strategy, which is enabled by setting build_strategy.enable_inplace=True (True is the default value) when using CompiledProgram or ParallelExecutor.

end memory optimization ...
context_rep shape: (-1L, 256L)
response_rep shape: (-1L, 256L)
logits shape: (-1L, 1L)
before reduce mean loss shape: (-1L, 1L)
after reduce mean loss shape: (1L,)
device count 1
theoretical memory usage:
(1519.714105796814, 1592.081444168091, 'MB')
W0303 20:15:28.347313    92 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0303 20:15:28.350337    92 device_context.cc:244] device: 0, cuDNN Version: 7.3.
start loading data ...
I0303 20:15:29.676050    92 parallel_executor.cc:421] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0303 20:15:29.677879    92 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0303 20:15:29.679220    92 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0303 20:15:29.680267    92 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
Pass 0, pass_time_cost 0.05 sec
Pass 1, pass_time_cost 0.03 sec
Pass 2, pass_time_cost 0.03 sec
Pass 3, pass_time_cost 0.03 sec
Pass 4, pass_time_cost 0.03 sec
Pass 5, pass_time_cost 0.03 sec
Pass 6, pass_time_cost 0.03 sec
Pass 7, pass_time_cost 0.03 sec
Pass 8, pass_time_cost 0.03 sec
share_vars_from is set, scope is ignored.
I0303 20:15:29.968084    92 parallel_executor.cc:421] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0303 20:15:29.968631    92 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0303 20:15:29.968953    92 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0303 20:15:29.969275    92 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
length=25
evaluation recall result:
1_in_2: 0.6	1_in_10: 0.12	2_in_10: 0.32	5_in_10: 0.68
Save model at step 10 ...
2020-03-03 20:15:33
Pass 9, pass_time_cost 3.89 sec
Pass 10, pass_time_cost 0.03 sec
Pass 11, pass_time_cost 0.03 sec
Pass 12, pass_time_cost 0.03 sec
Pass 13, pass_time_cost 0.03 sec
Pass 14, pass_time_cost 0.03 sec
Pass 15, pass_time_cost 0.03 sec
Pass 16, pass_time_cost 0.03 sec
Pass 17, pass_time_cost 0.03 sec
Pass 18, pass_time_cost 0.03 sec
length=25
evaluation recall result:
1_in_2: 0.6	1_in_10: 0.16	2_in_10: 0.28	5_in_10: 0.68
Save model at step 20 ...
2020-03-03 20:15:37
Pass 19, pass_time_cost 3.87 sec
Pass 20, pass_time_cost 0.03 sec
Pass 21, pass_time_cost 0.03 sec
Pass 22, pass_time_cost 0.03 sec
Pass 23, pass_time_cost 0.03 sec
Pass 24, pass_time_cost 0.03 sec
Pass 25, pass_time_cost 0.03 sec
Pass 26, pass_time_cost 0.03 sec
Pass 27, pass_time_cost 0.03 sec
Pass 28, pass_time_cost 0.03 sec
length=25
evaluation recall result:
1_in_2: 0.52	1_in_10: 0.16	2_in_10: 0.32	5_in_10: 0.64
Pass 29, pass_time_cost 0.04 sec
Pass 30, pass_time_cost 0.03 sec
Pass 31, pass_time_cost 0.03 sec
Pass 32, pass_time_cost 0.03 sec
Pass 33, pass_time_cost 0.03 sec
Pass 34, pass_time_cost 0.03 sec
Pass 35, pass_time_cost 0.03 sec
Pass 36, pass_time_cost 0.03 sec
Pass 37, pass_time_cost 0.03 sec
Pass 38, pass_time_cost 0.03 sec
length=25
evaluation recall result:
1_in_2: 0.52	1_in_10: 0.12	2_in_10: 0.2	5_in_10: 0.6
Pass 39, pass_time_cost 0.04 sec
Pass 40, pass_time_cost 0.03 sec
Pass 41, pass_time_cost 0.03 sec
Pass 42, pass_time_cost 0.03 sec
Pass 43, pass_time_cost 0.03 sec
Pass 44, pass_time_cost 0.03 sec
Pass 45, pass_time_cost 0.03 sec
Pass 46, pass_time_cost 0.03 sec
Pass 47, pass_time_cost 0.03 sec
Pass 48, pass_time_cost 0.03 sec
length=25
evaluation recall result:
1_in_2: 0.44	1_in_10: 0.08	2_in_10: 0.08	5_in_10: 0.56
training step 50 avg loss 0.0003279643505811691
Pass 49, pass_time_cost 0.04 sec
In[2]
# Evaluate the model in terms of recall; other evaluation modes are available, see run.sh
!cd auto_dialogue_evaluation && sh run.sh eval_recall
-----------  Configuration Arguments -----------
batch_size: 256
do_infer: False
do_train: False
do_val: True
emb_size: 256
hidden_size: 256
init_model: model_files_tmp/matching_pretrained
learning_rate: 0.001
loss_type: CLS
max_len: 50
num_scan_data: 50
out_path: None
print_step: 50
sample_pro: 1
save_path: tmp
save_step: 10
test_path: data/unlabel_data/test.ids
train_path: None
use_cuda: True
val_path: None
vocab_size: 484016
word_emb_init: None
------------------------------------------------
W0303 20:16:00.303758   142 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0303 20:16:00.307708   142 device_context.cc:244] device: 0, cuDNN Version: 7.3.
init model model_files_tmp/matching_pretrained
len scores: 256 len labels: 256
length=25
mean score: -0.12624257184143062
evaluation recall result:
1_in_2: 0.48	1_in_10: 0.12	2_in_10: 0.24	5_in_10: 0.4
finish evaluate model:model_files_tmp/matching_pretrained on data:data/unlabel_data/test.ids time_cost(s):2.76
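The 1_in_2 / 1_in_10 / 2_in_10 / 5_in_10 numbers printed above are recall@k over groups of candidate responses: within each group of candidates for the same context, does the true response rank in the top k by model score? A sketch of such a metric, under the assumption that every group_size consecutive samples share one context and contain exactly one true response (the project's exact grouping is defined in evaluation.py):

```python
def recall_at_k(scores, labels, group_size, k):
    """scores/labels are flat lists where every `group_size` consecutive
    entries are candidates for one context; each group has one label 1."""
    hits, groups = 0, 0
    for start in range(0, len(scores), group_size):
        group = list(zip(scores[start:start + group_size],
                         labels[start:start + group_size]))
        group.sort(key=lambda pair: pair[0], reverse=True)  # best score first
        if any(label == 1 for _, label in group[:k]):
            hits += 1
        groups += 1
    return hits / groups

scores = [0.9, 0.1, 0.3, 0.7]   # two groups of two candidates each
labels = [1,   0,   1,   0]     # true response marked with 1
print(recall_at_k(scores, labels, group_size=2, k=1))  # -> 0.5
```

In the first group the true response scores highest (a hit); in the second the distractor outranks it (a miss), giving recall@1 of 0.5.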
In[3]
# Score dialogue data directly with the pretrained model and data
!cd auto_dialogue_evaluation && sh run.sh infer
-----------  Configuration Arguments -----------
batch_size: 256
do_infer: True
do_train: False
do_val: False
emb_size: 256
hidden_size: 256
init_model: model_files_tmp/human_finetuned
learning_rate: 0.001
loss_type: CLS
max_len: 50
num_scan_data: 50
out_path: None
print_step: 50
sample_pro: 1
save_path: tmp
save_step: 10
test_path: data/label_data/human/test.ids
train_path: None
use_cuda: True
val_path: None
vocab_size: 484016
word_emb_init: None
------------------------------------------------
W0303 20:17:02.407306   180 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0303 20:17:02.411096   180 device_context.cc:244] device: 0, cuDNN Version: 7.3.
finish infer model:model_files_tmp/human_finetuned out file: data/label_data/human/test.ids.infer time_cost(s):2.92
In[4]
# On top of the first-stage training, fine-tune with a small amount of labeled data
!cd auto_dialogue_evaluation && sh run.sh finetune
-----------  Configuration Arguments -----------
batch_size: 256
do_infer: False
do_train: True
do_val: False
emb_size: 256
hidden_size: 256
init_model: model_files_tmp/matching_pretrained
learning_rate: 0.001
loss_type: L2
max_len: 50
num_scan_data: 50
out_path: None
print_step: 1
sample_pro: 1
save_path: model_files_tmp/human_finetuned
save_step: 1
test_path: None
train_path: data/label_data/human/train.ids
use_cuda: True
val_path: data/label_data/human/val.ids
vocab_size: 484016
word_emb_init: None
------------------------------------------------
context_rep shape: (-1L, 256L)
response_rep shape: (-1L, 256L)
logits shape: (-1L, 1L)
begin memory optimization ...
2020-03-03 20:17:10,301-WARNING: Caution! paddle.fluid.memory_optimize() is deprecated and not maintained any more, since it is not stable!
This API would not take any memory optimizations on your Program now, since we have provided default strategies for you.
The newest and stable memory optimization strategies (they are all enabled by default) are as follows:
 1. Garbage collection strategy, which is enabled by exporting environment variable FLAGS_eager_delete_tensor_gb=0 (0 is the default value).
 2. Inplace strategy, which is enabled by setting build_strategy.enable_inplace=True (True is the default value) when using CompiledProgram or ParallelExecutor.

end memory optimization ...
context_rep shape: (-1L, 256L)
response_rep shape: (-1L, 256L)
logits shape: (-1L, 1L)
device count 1
theoretical memory usage:
(1519.720290184021, 1592.087923049927, 'MB')
W0303 20:17:11.208235   218 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0303 20:17:11.211208   218 device_context.cc:244] device: 0, cuDNN Version: 7.3.
Load pretraining parameters from model_files_tmp/matching_pretrained.
sccuess init model_files_tmp/matching_pretrained
start loading data ...
I0303 20:17:14.004169   218 parallel_executor.cc:421] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0303 20:17:14.006306   218 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0303 20:17:14.007922   218 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0303 20:17:14.009212   218 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
share_vars_from is set, scope is ignored.
I0303 20:17:14.040747   218 parallel_executor.cc:421] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0303 20:17:14.041477   218 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0303 20:17:14.041981   218 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0303 20:17:14.042467   218 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
evaluation cor relevance -0.045655154934289226
training step 1 avg loss 0.20214074850082397
Pass 0, pass_time_cost 0.06 sec
evaluation cor relevance -0.03812301431872137
training step 2 avg loss 0.16594290733337402
Pass 1, pass_time_cost 0.04 sec
evaluation cor relevance -0.03420949590753319
training step 3 avg loss 0.1503480225801468
Pass 2, pass_time_cost 0.04 sec
evaluation cor relevance -0.026657388361296008
training step 4 avg loss 0.13701853156089783
Pass 3, pass_time_cost 0.04 sec
evaluation cor relevance -0.026197380997029576
training step 5 avg loss 0.12259398400783539
Pass 4, pass_time_cost 0.04 sec
evaluation cor relevance -0.01672844810653696
training step 6 avg loss 0.10547607392072678
Pass 5, pass_time_cost 0.04 sec
evaluation cor relevance 0.02767570182543171
Save model at step 7 ...
2020-03-03 20:17:18
training step 7 avg loss 0.08713535219430923
Pass 6, pass_time_cost 3.88 sec
evaluation cor relevance 0.05709543824894198
Save model at step 8 ...
2020-03-03 20:17:21
training step 8 avg loss 0.07621623575687408
Pass 7, pass_time_cost 3.85 sec
evaluation cor relevance 0.08186671962085844
Save model at step 9 ...
2020-03-03 20:17:25
training step 9 avg loss 0.0698356181383133
Pass 8, pass_time_cost 3.86 sec
evaluation cor relevance 0.06082311061313105
training step 10 avg loss 0.0569886639714241
Pass 9, pass_time_cost 0.04 sec
evaluation cor relevance 0.0347839291437123
training step 11 avg loss 0.04764831066131592
Pass 10, pass_time_cost 0.04 sec
evaluation cor relevance 0.0381713957276509
training step 12 avg loss 0.03954772651195526
Pass 11, pass_time_cost 0.04 sec
evaluation cor relevance 0.02633714951171487
training step 13 avg loss 0.03392593562602997
Pass 12, pass_time_cost 0.04 sec
evaluation cor relevance 0.02831387564797829
training step 14 avg loss 0.030515454709529877
Pass 13, pass_time_cost 0.04 sec
evaluation cor relevance 0.03238175379241782
training step 15 avg loss 0.027258876711130142
Pass 14, pass_time_cost 0.04 sec
evaluation cor relevance 0.0218399823578848
training step 16 avg loss 0.023921750485897064
Pass 15, pass_time_cost 0.04 sec
evaluation cor relevance 0.02820405752929699
training step 17 avg loss 0.021650437265634537
Pass 16, pass_time_cost 0.04 sec
evaluation cor relevance 0.03996381173465899
training step 18 avg loss 0.019501041620969772
Pass 17, pass_time_cost 0.04 sec
evaluation cor relevance 0.04563058025038851
training step 19 avg loss 0.017170080915093422
Pass 18, pass_time_cost 0.04 sec
evaluation cor relevance 0.05073597083076118
training step 20 avg loss 0.015909677371382713
Pass 19, pass_time_cost 0.12 sec
evaluation cor relevance 0.046565186197487424
training step 21 avg loss 0.013923028483986855
Pass 20, pass_time_cost 0.04 sec
evaluation cor relevance 0.05044875421267162
training step 22 avg loss 0.012706486508250237
Pass 21, pass_time_cost 0.04 sec
evaluation cor relevance 0.053344727118596
training step 23 avg loss 0.011962947435677052
Pass 22, pass_time_cost 0.04 sec
evaluation cor relevance 0.055481956659085944
training step 24 avg loss 0.011046133935451508
Pass 23, pass_time_cost 0.04 sec
evaluation cor relevance 0.05437763180129775
training step 25 avg loss 0.010488426312804222
Pass 24, pass_time_cost 0.04 sec
evaluation cor relevance 0.0553283648847065
training step 26 avg loss 0.010120008140802383
Pass 25, pass_time_cost 0.04 sec
evaluation cor relevance 0.05274495123964428
training step 27 avg loss 0.009528218768537045
Pass 26, pass_time_cost 0.04 sec
evaluation cor relevance 0.046217300828517986
training step 28 avg loss 0.009135609492659569
Pass 27, pass_time_cost 0.04 sec
evaluation cor relevance 0.04565438697541733
training step 29 avg loss 0.008912760764360428
Pass 28, pass_time_cost 0.04 sec
evaluation cor relevance 0.04975528735134844
training step 30 avg loss 0.00856935977935791
Pass 29, pass_time_cost 0.04 sec
evaluation cor relevance 0.05003098458635954
training step 31 avg loss 0.008245025761425495
Pass 30, pass_time_cost 0.04 sec
evaluation cor relevance 0.04752513478735894
training step 32 avg loss 0.008075003512203693
Pass 31, pass_time_cost 0.04 sec
evaluation cor relevance 0.04078091997435762
training step 33 avg loss 0.007863759063184261
Pass 32, pass_time_cost 0.04 sec
evaluation cor relevance 0.03466259164195254
training step 34 avg loss 0.007645013276487589
Pass 33, pass_time_cost 0.04 sec
evaluation cor relevance 0.03227270363260842
training step 35 avg loss 0.007561428472399712
Pass 34, pass_time_cost 0.04 sec
evaluation cor relevance 0.03320577366196353
training step 36 avg loss 0.007472529541701078
Pass 35, pass_time_cost 0.04 sec
evaluation cor relevance 0.035791491183641444
training step 37 avg loss 0.007373311556875706
Pass 36, pass_time_cost 0.04 sec
evaluation cor relevance 0.03418108142927299
training step 38 avg loss 0.007349514868110418
Pass 37, pass_time_cost 0.04 sec
evaluation cor relevance 0.031033985972238214
training step 39 avg loss 0.007251190487295389
Pass 38, pass_time_cost 0.04 sec
evaluation cor relevance 0.029633228989897703
training step 40 avg loss 0.00710717961192131
Pass 39, pass_time_cost 0.04 sec
evaluation cor relevance 0.031149179803022797
training step 41 avg loss 0.00701714726164937
Pass 40, pass_time_cost 0.04 sec
evaluation cor relevance 0.03398755579355489
training step 42 avg loss 0.006880040280520916
Pass 41, pass_time_cost 0.04 sec
evaluation cor relevance 0.038649834104842865
training step 43 avg loss 0.006781894713640213
Pass 42, pass_time_cost 0.04 sec
evaluation cor relevance 0.039020758239969214
training step 44 avg loss 0.006729709915816784
Pass 43, pass_time_cost 0.04 sec
evaluation cor relevance 0.03814682104375019
training step 45 avg loss 0.006635408382862806
Pass 44, pass_time_cost 0.04 sec
evaluation cor relevance 0.03727365180640306
training step 46 avg loss 0.0065619852393865585
Pass 45, pass_time_cost 0.04 sec
evaluation cor relevance 0.03844171725055872
training step 47 avg loss 0.006483430974185467
Pass 46, pass_time_cost 0.04 sec
evaluation cor relevance 0.03844402112717441
training step 48 avg loss 0.006373999174684286
Pass 47, pass_time_cost 0.04 sec
evaluation cor relevance 0.039338693212934664
training step 49 avg loss 0.0063008773140609264
Pass 48, pass_time_cost 0.04 sec
evaluation cor relevance 0.03943545603079371
training step 50 avg loss 0.0062194340862333775
Pass 49, pass_time_cost 0.04 sec
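Note that the configuration printout for this stage switches loss_type from CLS to L2: instead of classifying match vs. mismatch, fine-tuning regresses the model's logit toward the human score. A minimal sketch of that objective (plain Python; in the real run the loss is built and minimized inside Paddle):

```python
def l2_loss(logits, human_scores):
    # Mean squared error between predicted logits and human scores,
    # mirroring loss_type: L2 in the fine-tuning configuration.
    n = len(logits)
    return sum((p - y) ** 2 for p, y in zip(logits, human_scores)) / n

loss = l2_loss([0.8, 1.9, 0.1], [1.0, 2.0, 0.0])  # -> 0.02
```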
In[5]
# Run prediction with the frozen model; results are saved to out_path
# Only one example is printed here: the context, the response, and the response quality score
!cd auto_dialogue_evaluation/ && python freeze_infer.py --test_path data/label_data/human/test.ids \
                                                        --model_path model_files_tmp/human_finetuned \
                                                        --use_cuda False \
                                                        --out_path infer_result.txt 
-----------  Configuration Arguments -----------
batch_size: 256
max_len: 50
model_path: model_files_tmp/human_finetuned
out_path: infer_result.txt
test_path: data/label_data/human/test.ids
use_cuda: 0
------------------------------------------------
('context_wordseq:', '3354 1604 1100 6564 97 694 326 11')
('response_wordseq:', '1604 48 6564 97 2 9 763 97')
('scores:', 1.5645232)
finish infer model:model_files_tmp/human_finetuned out file: infer_result.txt time_cost(s):3.80

Click the link to try the project hands-on on AI Studio: https://aistudio.baidu.com/aistudio/projectdetail/122301


>> Visit the PaddlePaddle official website to learn more
