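Note: To answer questions based on text documents, we recommend the procedure in Question Answering using Embeddings. Some of the code below may rely on deprecated API endpoints.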
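This notebook uses a dataset of context, question and answer pairs to additionally create adversarial question and context pairs, in which the question was not generated from that context. In those cases, the model will be prompted to answer "No appropriate context found to answer the question". We will also train a discriminator model, which predicts whether the question can be answered based on the context.
We will also add harder adversarial examples, based either on semantically similar sections or on neighbouring sections from the same article.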
import openai
import pandas as pd
df = pd.read_csv('olympics-data/olympics_qa.csv')
olympics_search_fileid = "file-c3shd8wqF3vSCKaukW4Jr1TT"
df.head()
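| | title | heading | content | tokens | context | questions | answers |
|---|---|---|---|---|---|---|---|
| 0 | 2020 Summer Olympics | Summary | The 2020 Summer Olympics (Japanese: 2020年夏季オリン... | 713 | 2020 Summer Olympics\nSummary\n\nThe 2020 Summer... | 1. What is the 2020 Summer Olympics?\n2. When ... | 1. The 2020 Summer Olympics is an international... |
| 1 | 2020 Summer Olympics | Host city selection | The International Olympic Committee (IOC) voted... | 126 | 2020 Summer Olympics\nHost city selection\n\nThe... | 1. \n2. \n3. \n4. | 1. What is the International Olympic Committee... |
| 2 | 2020 Summer Olympics | Impact of the COVID-19 pandemic | In January 2020, concerns were raised about... | 369 | 2020 Summer Olympics\nImpact of the COVID-19 pandemic\n... | 1. What is the COVID-19 pandemic?\n2. How did the pandemic... | 1. The COVID-19 pandemic is a pandemic that began... |
| 3 | 2020 Summer Olympics | Qualifying event cancellation and postponement | Concerns about the pandemic began to affect qualifying... | 298 | 2020 Summer Olympics\nQualifying event cancellation and postponement\n... | 1. Where was the Asian qualifying event originally... | 1. The original location of the Asia and Oceania qualifying event was... |
| 4 | 2020 Summer Olympics | Effect on doping tests | Mandatory doping tests were severely restricted... | 163 | 2020 Summer Olympics\nEffect on doping tests\n... | 1. What is the COVID-19 pandemic?\n2. What caused... | 1. The COVID-19 pandemic is a pandemic that began... |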
Split the sections into a training and a test set
from sklearn.model_selection import train_test_split
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
len(train_df), len(test_df)
(3014, 754)
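The split sizes above follow from how scikit-learn rounds a fractional test_size: the test set receives ceil(test_size × n_samples) rows and the training set receives the remainder. Assuming the full DataFrame holds 3014 + 754 = 3768 sections (inferred from the output above, not stated in the source), the arithmetic checks out:

```python
import math

# Assumption: len(df) == 3768, the sum of the two split sizes printed above
n_samples = 3768
test_size = 0.2

# scikit-learn rounds the test share up and gives the remainder to the training set
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)  # 3014 754
```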
We check that the separator we intend to use isn't present within the contexts
df.context.str.contains('->').sum()
0
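The same idea extends to every marker the prompts below rely on ("\nQuestion:", "\nAnswer:", "\n Related:"): if a context already contained one of them, it would be ambiguous where the context ends. A minimal pure-Python sketch; count_contexts_containing and the toy contexts are hypothetical stand-ins for df.context:

```python
def count_contexts_containing(contexts, separators):
    """Return, for each separator, how many contexts contain it verbatim."""
    return {sep: sum(sep in ctx for ctx in contexts) for sep in separators}

# Hypothetical toy contexts; in the notebook these would come from df.context
contexts = [
    "2020 Summer Olympics\nSummary\n\nThe 2020 Summer Olympics were held in Tokyo.",
    "Sputnik 1 -> the first artificial satellite.",
]
separators = ["->", "\nQuestion:", "\nAnswer:", "\n Related:"]
print(count_contexts_containing(contexts, separators))
# {'->': 1, '\nQuestion:': 0, '\nAnswer:': 0, '\n Related:': 0}
```

Unlike df.context.str.contains, which treats its pattern as a regular expression by default, the in operator always matches literally; for "->" both behave the same.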
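The fine-tuning dataset is created in the following way. For every corresponding question, answer and context pair, we create a positive example from the correct question, answer and context. We also create negative examples that pair the question with a context it was not generated from: one random negative example, where a randomly chosen context is paired with the question, and two hard negative examples, one whose context comes from another section of the same Wikipedia article and one whose context is among the most similar to the question according to search.
This process is noisy, as sometimes the question might also be answerable given a different context, but on average we hope this won't affect the performance too much.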
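The pairing scheme can be illustrated offline on toy data. This sketch only builds the positive example and the random negative example (the two hard negatives need the article titles and the search endpoint), but it reuses the prompt and completion formats from create_fine_tuning_dataset; build_examples, NO_CONTEXT and the toy contexts are hypothetical:

```python
import random

NO_CONTEXT = " No appropriate context found to answer the question."

def build_examples(context, question, answer, other_contexts, discriminator=False, rng=random):
    """Toy version of the scheme above: one positive example plus one negative
    example that pairs the question with a randomly chosen wrong context."""
    suffix = "\n Related:" if discriminator else "\nAnswer:"
    positive = {
        "prompt": f"{context}\nQuestion: {question}{suffix}",
        "completion": " yes" if discriminator else f" {answer}",
    }
    # Any context other than the correct one can serve as a random negative
    wrong_context = rng.choice([c for c in other_contexts if c != context])
    negative = {
        "prompt": f"{wrong_context}\nQuestion: {question}{suffix}",
        "completion": " no" if discriminator else NO_CONTEXT,
    }
    return [positive, negative]

contexts = [
    "Sputnik 1 was launched on 4 October 1957.",
    "The 2020 Summer Olympics were postponed to 2021.",
]
examples = build_examples(contexts[0], "When was Sputnik 1 launched?", "On 4 October 1957.",
                          contexts, discriminator=True, rng=random.Random(0))
for ex in examples:
    print(ex)
```

The discriminator and the Q&A model see identical prompts up to the final suffix; only the completions differ.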
import random
def get_random_similar_contexts(question, context, file_id=olympics_search_fileid, search_model='ada', max_rerank=10):
    """
    Find similar contexts to the given context using the search file
    """
    try:
        # TODO: openai.Engine(search_model) is deprecated
        results = openai.Engine(search_model).search(
            search_model=search_model,
            query=question,
            max_rerank=max_rerank,
            file=file_id
        )
        candidates = []
        # pick among the top 3 search results, excluding the correct context itself
        for result in results['data'][:3]:
            if result['text'] == context:
                continue
            candidates.append(result['text'])
        random_candidate = random.choice(candidates)
        return random_candidate
    except Exception as e:
        print(e)
        return ""
def create_fine_tuning_dataset(df, discriminator=False, n_negative=1, add_related=False):
    """
    Create a dataset for fine tuning the OpenAI model; either for a discriminator model,
    or a model specializing in Q&A, where it says if no relevant context is found.
    Parameters
    ----------
    df: pd.DataFrame
        The dataframe containing the question, answer and context pairs
    discriminator: bool
        Whether to create a dataset for the discriminator
    n_negative: int
        The number of random negative samples to add (using a random context)
    add_related: bool
        Whether to add the related contexts to the correct context. These are hard negative examples
    Returns
    -------
    pd.DataFrame
        The dataframe containing the prompts and completions, ready for fine-tuning
    """
    rows = []
    # Positive examples: each question paired with the context it was generated from
    for i, row in df.iterrows():
        for q, a in zip(("1." + row.questions).split('\n'), ("1." + row.answers).split('\n')):
            if len(q) > 10 and len(a) > 10:
                if discriminator:
                    rows.append({"prompt": f"{row.context}\nQuestion: {q[2:].strip()}\n Related:", "completion": " yes"})
                else:
                    rows.append({"prompt": f"{row.context}\nQuestion: {q[2:].strip()}\nAnswer:", "completion": f" {a[2:].strip()}"})
    # Negative examples: each question paired with a context it was not generated from
    for i, row in df.iterrows():
        for q in ("1." + row.questions).split('\n'):
            if len(q) > 10:
                for j in range(n_negative + (2 if add_related else 0)):
                    random_context = ""
                    if j == 0 and add_related:
                        # add the related contexts based on originating from the same wikipedia page
                        subset = df[(df.title == row.title) & (df.context != row.context)]
                        if len(subset) < 1:
                            continue
                        random_context = subset.sample(1).iloc[0].context
                    # elif: with a plain if, the random branch below would overwrite the same-article context
                    elif j == 1 and add_related:
                        # add the related contexts based on the most similar contexts according to the search
                        random_context = get_random_similar_contexts(q[2:].strip(), row.context, search_model='ada', max_rerank=10)
                        if not random_context:
                            # the search failed or found no other candidate; skip this hard negative
                            continue
                    else:
                        while True:
                            # add random context, which isn't the correct context
                            random_context = df.sample(1).iloc[0].context
                            if random_context != row.context:
                                break
                    if discriminator:
                        rows.append({"prompt": f"{random_context}\nQuestion: {q[2:].strip()}\n Related:", "completion": " no"})
                    else:
                        rows.append({"prompt": f"{random_context}\nQuestion: {q[2:].strip()}\nAnswer:", "completion": " No appropriate context found to answer the question."})
    return pd.DataFrame(rows)
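We apply the same dataset creation process to both the discriminator and the Q&A model. We apply it separately to the training and test sets, to ensure that examples from the training set don't appear in the test set.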
for name, is_disc in [('discriminator', True), ('qa', False)]:
    for train_test, dt in [('train', train_df), ('test', test_df)]:
        ft = create_fine_tuning_dataset(dt, discriminator=is_disc, n_negative=1, add_related=True)
        ft.to_json(f'{name}_{train_test}.jsonl', orient='records', lines=True)
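With orient='records' and lines=True, each line of the generated files is one JSON object with a prompt and a completion key. A quick sanity check on the balance between positive and negative examples is to count the completions; completion_counts is a hypothetical helper and the two sample records are toy data:

```python
import json
from collections import Counter

def completion_counts(lines):
    """Count how often each completion string occurs in an iterable of JSONL lines."""
    return Counter(json.loads(line)["completion"] for line in lines if line.strip())

# Two toy records in the same shape as the discriminator files
sample = [
    '{"prompt": "context A\\nQuestion: q\\n Related:", "completion": " yes"}',
    '{"prompt": "context B\\nQuestion: q\\n Related:", "completion": " no"}',
]
print(completion_counts(sample))

# On the real files (assumes the loop above has written them):
# with open("discriminator_train.jsonl", encoding="utf-8") as f:
#     print(completion_counts(f))
```

Because a file object iterates line by line, the same helper works on the real files without loading them into pandas.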
We formatted the data according to the recommendations of the fine-tuning data preparation tool, which can be run with
openai tools fine_tunes.prepare_data -f qa_train.jsonl
We strongly recommend using this tool, which suggests improvements to your data formatting for fine-tuning.
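The records written above follow two conventions that are easy to verify directly: every prompt ends with the same fixed suffix ("\nAnswer:" for the Q&A files, "\n Related:" for the discriminator files), and every completion starts with a space. check_format is a hypothetical helper for that check; it is not part of the openai CLI and does not reproduce what prepare_data does internally:

```python
def check_format(records, prompt_suffix):
    """Return a list of problems found in prompt/completion records (empty list if none)."""
    problems = []
    for i, rec in enumerate(records):
        if not rec["prompt"].endswith(prompt_suffix):
            problems.append(f"record {i}: prompt does not end with {prompt_suffix!r}")
        if not rec["completion"].startswith(" "):
            problems.append(f"record {i}: completion does not start with a space")
    return problems

records = [
    {"prompt": "ctx\nQuestion: Who won?\nAnswer:", "completion": " Canada won."},
    {"prompt": "ctx\nQuestion: Who won?\nAnswer:", "completion": "Canada won."},
]
print(check_format(records, "\nAnswer:"))
# ['record 1: completion does not start with a space']
```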
!openai api fine_tunes.create -t "olympics-data/discriminator_train.jsonl" -v "olympics-data/discriminator_test.jsonl" --batch_size 16 --compute_classification_metrics --classification_positive_class " yes" --model ada
!openai api fine_tunes.create -t "olympics-data/qa_train.jsonl" -v "olympics-data/qa_test.jsonl" --batch_size 16
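We will now use the fine-tuned discriminator and the fine-tuned Q&A model. By requesting logprobs, we can see how certain the discriminator is between a yes and a no answer.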
ft_discriminator = "curie:ft-openai-internal-2021-08-23-23-58-57"
ft_qa = "curie:ft-openai-internal-2021-08-23-17-54-10"
def apply_ft_discriminator(context, question, discriminator_model):
    """
    Apply the fine tuned discriminator to a question, to assess whether it can be answered from the context.
    """
    prompt = f"{context}\nQuestion: {question}\n Related:"
    # Legacy (openai<1.0) completions call: these fine-tuned models take a plain prompt, not chat messages
    result = openai.Completion.create(model=discriminator_model, prompt=prompt, max_tokens=1, temperature=0, top_p=1, n=1, logprobs=2)
    # One dict of top logprobs per generated token; max_tokens=1 yields a single-element list
    return result['choices'][0]['logprobs']['top_logprobs']
apply_ft_discriminator('The first human-made object in space was the Soviet Union satellite Sputnik 1 on 4 October 1957.',
'What was the first human-made object in space?', ft_discriminator)
[<OpenAIObject at 0x7fe812e602b0> JSON: { " no": -10.819577, " yes": -2.045765e-05 }]
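The log-probabilities returned above become probabilities via exp. Using the values from this output, the discriminator assigns almost all of its probability mass to " yes", and the two tokens together account for essentially all of it:

```python
import math

# Top log-probabilities of the single generated token, copied from the output above
top_logprobs = {" no": -10.819577, " yes": -2.045765e-05}

# exp turns each log-probability back into a probability
probs = {token: math.exp(lp) for token, lp in top_logprobs.items()}
print(probs)
print(sum(probs.values()))
```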
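We can see that the model generalizes to different contexts and questions: this Sputnik example has nothing to do with the Olympics data it was fine-tuned on.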
def apply_ft_qa_answer(context, question, answering_model):
    """
    Apply the fine tuned Q&A model to a question, answering it from the given context
    """
    prompt = f"{context}\nQuestion: {question}\nAnswer:"
    # Legacy (openai<1.0) completions call; generation stops at the first '.' or newline
    result = openai.Completion.create(model=answering_model, prompt=prompt, max_tokens=30, temperature=0, top_p=1, n=1, stop=['.','\n'])
    return result['choices'][0]['text']
apply_ft_qa_answer('The first human-made object in space was the Soviet Union satellite Sputnik 1 on 4 October 1957.',
'What was the first human-made object in space?', ft_qa)
' The first human-made object in space was the Soviet Union satellite Sputnik 1 on 4 October 1957'
We can see that the model can answer the question when the context is appropriate.
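The answer above ends without its final period because apply_ft_qa_answer passes stop=['.', '\n']: generation halts at the first stop sequence, and the stop sequence itself is not included in the returned text. A side effect is that an answer containing an earlier period, such as a decimal number, would be cut short. truncate_at_stop is a hypothetical client-side imitation of that rule, useful only for reasoning about it offline:

```python
def truncate_at_stop(text, stop):
    """Cut text at the earliest occurrence of any stop sequence (the stop itself is dropped)."""
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

stop = ['.', '\n']
print(repr(truncate_at_stop(" Sputnik 1 was launched on 4 October 1957.", stop)))
# ' Sputnik 1 was launched on 4 October 1957'
print(repr(truncate_at_stop(" The race took 9.58 seconds.", stop)))
# ' The race took 9'
```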
apply_ft_qa_answer('The first human-made object in space was the Soviet Union satellite Sputnik 1 on 4 October 1957.',
'What is impressive about the Soviet Union?', ft_qa)
' The Soviet Union was the first country to successfully launch a satellite into space'
apply_ft_qa_answer('The first human-made object in space was the Soviet Union satellite Sputnik 1 on 4 October 1957.',
'How many cars were produced in the Soviet Union in 1970?', ft_qa)
' No appropriate context found to answer the question'
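We can see that the model knows when to answer the question, and when to say that there is not enough context to answer it.
We can also combine the discriminator with a base model, or with the fine-tuned Q&A model. The discriminator essentially acts as a gate that decides whether the question can be answered given the context.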
def answer_question_conditionally(answering_model, discriminator_model, context, question, discriminator_logprob_yes_modifier=0):
    # top_logprobs is a list with one dict per token; take the dict for the single generated token
    logprobs = apply_ft_discriminator(context, question, discriminator_model)[0]
    yes_logprob = logprobs[' yes'] if ' yes' in logprobs else -100
    no_logprob = logprobs[' no'] if ' no' in logprobs else -100
    # Refuse unless ' yes' (shifted by the modifier) is at least as likely as ' no'
    if yes_logprob + discriminator_logprob_yes_modifier < no_logprob:
        return " No appropriate context found to answer the question based on the discriminator."
    return apply_ft_qa_answer(context, question, answering_model)
answer_question_conditionally(ft_qa, ft_discriminator,
"Crowdless games are a rare although not unheard-of occurrence in sports. \
When they do occur, it is usually the result of events beyond the control \
of the teams or fans, such as weather-related concerns, public health concerns, \
or wider civil disturbances unrelated to the game. For instance, \
the COVID-19 pandemic caused many sports leagues around the world \
to be played behind closed doors.",
"Could weather cause a sport event to have no crowd?")
' Weather could cause a sport event to have no crowd'
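The function above illustrates how a discriminator and a fine-tuned Q&A model could be combined. The discriminator_logprob_yes_modifier parameter gives more fine-grained control over how certain we want the model to be before it answers the question.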
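The comparison inside answer_question_conditionally answers only when yes_logprob + modifier ≥ no_logprob, which is equivalent to requiring P(yes) / P(no) ≥ exp(-modifier). A negative modifier therefore makes the pipeline more conservative. The sketch below isolates that rule offline using the logprobs printed earlier; should_answer is a hypothetical helper that mirrors the notebook's comparison, including the -100 fallback for a missing token:

```python
def should_answer(top_logprobs, yes_modifier=0.0, missing=-100.0):
    """Mirror the comparison in answer_question_conditionally: answer unless
    logprob(' yes') + yes_modifier < logprob(' no')."""
    yes_logprob = top_logprobs.get(" yes", missing)
    no_logprob = top_logprobs.get(" no", missing)
    return yes_logprob + yes_modifier >= no_logprob

# Top logprobs from the Sputnik example above
top = {" no": -10.819577, " yes": -2.045765e-05}
print(should_answer(top))                    # True
print(should_answer(top, yes_modifier=-11))  # False: ' yes' must now beat ' no' by 11 nats
```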
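We will now look at how the answers endpoint works: it combines search, to retrieve the relevant context from a knowledge base, with the fine-tuned Q&A model, which then answers the question.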
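Finally, we can use logic similar to the /answers endpoint, where we first search for the relevant context and then ask the Q&A model to answer the question given that context. If you'd like to see the implementation details, check out the answers_with_ft.py file.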
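The overall retrieve-then-answer flow can be sketched without any API calls. In this hypothetical example, a simple word-overlap score stands in for the deprecated search endpoint, and a stub function stands in for the fine-tuned Q&A model. Neither reflects what answers_with_ft.py actually does internally:

```python
import re

def tokenize(text):
    """Lower-case word set used for the toy overlap score."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, contexts):
    """Return the context sharing the most words with the question (stand-in for search)."""
    q = tokenize(question)
    return max(contexts, key=lambda c: len(q & tokenize(c)))

def answer_with_retrieval(question, contexts, answer_fn):
    """Retrieve the best-matching context, then delegate to a Q&A function."""
    context = retrieve(question, contexts)
    return answer_fn(context, question)

def stub_answer(context, question):
    """Placeholder for the fine-tuned Q&A model: simply returns the chosen context."""
    return context

contexts = [
    "Sputnik 1 was the first human-made object in space.",
    "Canada won the women's football tournament at the 2020 Summer Olympics.",
]
print(answer_with_retrieval("Which country won the women's football tournament?", contexts, stub_answer))
```

Swapping stub_answer for a function that calls apply_ft_qa_answer (or answer_question_conditionally) would give the full pipeline, with the retrieval quality bounded by whatever scoring replaces the toy overlap.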
from answers_with_ft import answer_question
answer_question(olympics_search_fileid, ft_qa, "Which country won the Women's football tournament at the 2020 Olympic games?")
" Canada won the Women's football tournament at the 2020 Olympic games"