使用 Langchain、Tair 和 OpenAI 进行问答

2023 年 9 月 11 日
在 Github 中打开

本笔记本展示了如何使用 Langchain 和 Tair 作为知识库,以及 OpenAI embeddings 来实现问答系统。如果您不熟悉 Tair,最好先查看 Getting_started_with_Tair_and_OpenAI.ipynb 笔记本。

本笔记本展示了一个端到端的流程,包括:

  • 使用 OpenAI API 计算 embeddings。
  • 将 embeddings 存储在 Tair 实例中,以构建知识库。
  • 使用 OpenAI API 将原始文本查询转换为 embedding。
  • 使用 Tair 在创建的集合中执行最近邻搜索,以查找一些上下文。
  • 要求 LLM 在给定的上下文中查找答案。

所有步骤都将简化为调用一些相应的 Langchain 方法。

安装要求

本笔记本需要以下 Python 包:openaitiktokenlangchaintair

  • openai 提供了对 OpenAI API 的便捷访问。
  • tiktoken 是一个快速的 BPE tokenizer,用于 OpenAI 的模型。
  • langchain 帮助我们更轻松地构建 LLM 应用程序。
  • tair 库用于与 tair 向量数据库交互。
! pip install openai tiktoken langchain tair 
Looking in indexes: http://sg.mirrors.cloud.aliyuncs.com/pypi/simple/
Requirement already satisfied: openai in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (0.28.0)
Requirement already satisfied: tiktoken in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (0.4.0)
Requirement already satisfied: langchain in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (0.0.281)
Requirement already satisfied: tair in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (1.3.6)
Requirement already satisfied: requests>=2.20 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from openai) (2.31.0)
Requirement already satisfied: tqdm in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from openai) (4.66.1)
Requirement already satisfied: aiohttp in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from openai) (3.8.5)
Requirement already satisfied: regex>=2022.1.18 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from tiktoken) (2023.8.8)
Requirement already satisfied: PyYAML>=5.3 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from langchain) (6.0.1)
Requirement already satisfied: SQLAlchemy<3,>=1.4 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from langchain) (2.0.20)
Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from langchain) (4.0.3)
Requirement already satisfied: dataclasses-json<0.6.0,>=0.5.7 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from langchain) (0.5.14)
Requirement already satisfied: langsmith<0.1.0,>=0.0.21 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from langchain) (0.0.33)
Requirement already satisfied: numexpr<3.0.0,>=2.8.4 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from langchain) (2.8.5)
Requirement already satisfied: numpy<2,>=1 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from langchain) (1.25.2)
Requirement already satisfied: pydantic<3,>=1 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from langchain) (1.10.12)
Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from langchain) (8.2.3)
Requirement already satisfied: redis>=4.4.4 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from tair) (5.0.0)
Requirement already satisfied: attrs>=17.3.0 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from aiohttp->openai) (22.1.0)
Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from aiohttp->openai) (3.2.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from aiohttp->openai) (6.0.4)
Requirement already satisfied: yarl<2.0,>=1.0 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from aiohttp->openai) (1.9.2)
Requirement already satisfied: frozenlist>=1.1.1 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from aiohttp->openai) (1.4.0)
Requirement already satisfied: aiosignal>=1.1.2 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from aiohttp->openai) (1.3.1)
Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (3.20.1)
Requirement already satisfied: typing-inspect<1,>=0.4.0 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (0.9.0)
Requirement already satisfied: typing-extensions>=4.2.0 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from pydantic<3,>=1->langchain) (4.7.1)
Requirement already satisfied: idna<4,>=2.5 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from requests>=2.20->openai) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from requests>=2.20->openai) (2.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from requests>=2.20->openai) (2023.7.22)
Requirement already satisfied: greenlet!=0.4.17 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from SQLAlchemy<3,>=1.4->langchain) (2.0.2)
Requirement already satisfied: packaging>=17.0 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from marshmallow<4.0.0,>=3.18.0->dataclasses-json<0.6.0,>=0.5.7->langchain) (23.1)
Requirement already satisfied: mypy-extensions>=0.3.0 in /root/anaconda3/envs/notebook/lib/python3.10/site-packages (from typing-inspect<1,>=0.4.0->dataclasses-json<0.6.0,>=0.5.7->langchain) (1.0.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pythonlang.cn/warnings/venv

import getpass

openai_api_key = getpass.getpass("Input your OpenAI API key:")
Input your OpenAI API key:········
# The format of url: redis://[[username]:[password]]@localhost:6379/0
TAIR_URL = getpass.getpass("Input your tair url:")
Input your tair url:········

加载数据

在本节中,我们将加载包含一些自然问题及其答案的数据。所有数据将用于创建一个以 Tair 为知识库的 Langchain 应用程序。

import wget

# All the examples come from https://ai.google.com/research/NaturalQuestions
# This is a sample of the training set that we download and extract for some
# further processing.
wget.download("https://storage.googleapis.com/dataset-natural-questions/questions.json")
wget.download("https://storage.googleapis.com/dataset-natural-questions/answers.json")
100% [..............................................................................] 95372 / 95372
'answers (2).json'
import json

with open("questions.json", "r") as fp:
    questions = json.load(fp)

with open("answers.json", "r") as fp:
    answers = json.load(fp)
print(questions[0])
when is the last episode of season 8 of the walking dead
print(answers[0])
No . overall No. in season Title Directed by Written by Original air date U.S. viewers ( millions ) 100 `` Mercy '' Greg Nicotero Scott M. Gimple October 22 , 2017 ( 2017 - 10 - 22 ) 11.44 Rick , Maggie , and Ezekiel rally their communities together to take down Negan . Gregory attempts to have the Hilltop residents side with Negan , but they all firmly stand behind Maggie . The group attacks the Sanctuary , taking down its fences and flooding the compound with walkers . With the Sanctuary defaced , everyone leaves except Gabriel , who reluctantly stays to save Gregory , but is left behind when Gregory abandons him . Surrounded by walkers , Gabriel hides in a trailer , where he is trapped inside with Negan . 101 `` The Damned '' Rosemary Rodriguez Matthew Negrete & Channing Powell October 29 , 2017 ( 2017 - 10 - 29 ) 8.92 Rick 's forces split into separate parties to attack several of the Saviors ' outposts , during which many members of the group are killed ; Eric is critically injured and rushed away by Aaron . Jesus stops Tara and Morgan from executing a group of surrendered Saviors . While clearing an outpost with Daryl , Rick is confronted and held at gunpoint by Morales , a survivor he met in the initial Atlanta camp , who is now with the Saviors . 102 `` Monsters '' Greg Nicotero Matthew Negrete & Channing Powell November 5 , 2017 ( 2017 - 11 - 05 ) 8.52 Daryl finds Morales threatening Rick and kills him ; the duo then pursue a group of Saviors who are transporting weapons to another outpost . Gregory returns to Hilltop , and after a heated argument , Maggie ultimately allows him back in the community . Eric dies from his injuries , leaving Aaron distraught . Despite Tara and Morgan 's objections , Jesus leads the group of surrendered Saviors to Hilltop . Ezekiel 's group attacks another Savior compound , during which several Kingdommers are shot while protecting Ezekiel . 103 `` Some Guy '' Dan Liu David Leslie Johnson November 12 , 2017 ( 2017 - 11 - 12 ) 8.69 Ezekiel 's group is overwhelmed by the Saviors , who kill all of them except for Ezekiel himself and Jerry . Carol clears the inside of the compound , killing all but two Saviors , who almost escape but are eventually caught by Rick and Daryl . En route to the Kingdom , Ezekiel , Jerry , and Carol are surrounded by walkers , but Shiva sacrifices herself to save them . The trio returns to the Kingdom , where Ezekiel 's confidence in himself as a leader has diminished . 104 5 `` The Big Scary U '' Michael E. Satrazemis Story by : Scott M. Gimple & David Leslie Johnson & Angela Kang Teleplay by : David Leslie Johnson & Angela Kang November 19 , 2017 ( 2017 - 11 - 19 ) 7.85 After confessing their sins to each other , Gabriel and Negan manage to escape from the trailer . Simon and the other lieutenants grow suspicious of each other , knowing that Rick 's forces must have inside information . The workers in the Sanctuary become increasingly frustrated with their living conditions , and a riot nearly ensues , until Negan returns and restores order . Gabriel is locked in a cell , where Eugene discovers him sick and suffering . Meanwhile , Rick and Daryl argue over how to take out the Saviors , leading Daryl to abandon Rick . 105 6 `` The King , the Widow , and Rick '' John Polson Angela Kang & Corey Reed November 26 , 2017 ( 2017 - 11 - 26 ) 8.28 Rick visits Jadis in hopes of convincing her to turn against Negan ; Jadis refuses , and locks Rick in a shipping container . Carl encounters Siddiq in the woods and recruits him to Alexandria . Daryl and Tara plot to deviate from Rick 's plans by destroying the Sanctuary . Ezekiel isolates himself at the Kingdom , where Carol tries to encourage him to be the leader his people need . Maggie has the group of captured Saviors placed in a holding area and forces Gregory to join them as punishment for betraying Hilltop . 106 7 `` Time for After '' Larry Teng Matthew Negrete & Corey Reed December 3 , 2017 ( 2017 - 12 - 03 ) 7.47 After learning of Dwight 's association with Rick 's group , Eugene affirms his loyalty to Negan and outlines a plan to get rid of the walkers surrounding the Sanctuary . With help from Morgan and Tara , Daryl drives a truck through the Sanctuary 's walls , flooding its interior with walkers , killing many Saviors . Rick finally convinces Jadis and the Scavengers to align with him , and they plan to force the Saviors to surrender . However , when they arrive at the Sanctuary , Rick is horrified to see the breached walls and no sign of the walker herd . 107 8 `` How It 's Gotta Be '' Michael E. Satrazemis David Leslie Johnson & Angela Kang December 10 , 2017 ( 2017 - 12 - 10 ) 7.89 Eugene 's plan allows the Saviors to escape , and separately , the Saviors waylay the Alexandria , Hilltop , and Kingdom forces . The Scavengers abandon Rick , after which he returns to Alexandria . Ezekiel ensures that the Kingdom residents are able to escape before locking himself in the community with the Saviors . Eugene aids Gabriel and Doctor Carson in escaping the Sanctuary in order to ease his conscience . Negan attacks Alexandria , but Carl devises a plan to allow the Alexandria residents to escape into the sewers . Carl reveals he was bitten by a walker while escorting Siddiq to Alexandria . 108 9 `` Honor '' Greg Nicotero Matthew Negrete & Channing Powell February 25 , 2018 ( 2018 - 02 - 25 ) 8.28 After the Saviors leave Alexandria , the survivors make for the Hilltop while Rick and Michonne stay behind to say their final goodbyes to a dying Carl , who pleads with Rick to build a better future alongside the Saviors before killing himself . In the Kingdom , Morgan and Carol launch a rescue mission for Ezekiel . Although they are successful and retake the Kingdom , the Saviors ' lieutenant Gavin is killed by Benjamin 's vengeful brother Henry . 109 10 `` The Lost and the Plunderers '' TBA TBA March 4 , 2018 ( 2018 - 03 - 04 ) TBD 110 11 `` Dead or Alive Or '' TBA TBA March 11 , 2018 ( 2018 - 03 - 11 ) TBD 111 12 `` The Key '' TBA TBA March 18 , 2018 ( 2018 - 03 - 18 ) TBD

链定义

Langchain 已经与 Tair 集成,并为给定的文档列表执行所有索引操作。在我们的例子中,我们将存储我们拥有的一组答案。

from langchain.vectorstores import Tair
from langchain.embeddings import OpenAIEmbeddings
from langchain import VectorDBQA, OpenAI

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
doc_store = Tair.from_texts(
    texts=answers, embedding=embeddings, tair_url=TAIR_URL,
)

在这个阶段,所有可能的答案都已经存储在 Tair 中,所以我们可以定义整个 QA 链。

llm = OpenAI(openai_api_key=openai_api_key)
qa = VectorDBQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    vectorstore=doc_store,
    return_source_documents=False,
)
/root/anaconda3/envs/notebook/lib/python3.10/site-packages/langchain/chains/retrieval_qa/base.py:251: UserWarning: `VectorDBQA` is deprecated - please use `from langchain.chains import RetrievalQA`
  warnings.warn(

搜索数据

一旦数据被放入 Tair,我们就可以开始提问了。问题将通过 OpenAI 模型自动向量化,创建的向量将用于在 Tair 中查找一些可能匹配的答案。检索到答案后,最相似的答案将被整合到发送给 OpenAI 大型语言模型的提示中。

import random

random.seed(52)
selected_questions = random.choices(questions, k=5)
import time
for question in selected_questions:
    print(">", question)
    print(qa.run(question), end="\n\n")
    # wait 20seconds because of the rate limit
    time.sleep(20)
> where do frankenstein and the monster first meet
 Frankenstein and the monster first meet in the mountains.

> who are the actors in fast and furious
 The actors in Fast & Furious are Vin Diesel ( Dominic Toretto ), Paul Walker ( Brian O'Conner ), Michelle Rodriguez ( Letty Ortiz ), Jordana Brewster ( Mia Toretto ), Tyrese Gibson ( Roman Pearce ), Ludacris ( Tej Parker ), Lucas Black ( Sean Boswell ), Sung Kang ( Han Lue ), Gal Gadot ( Gisele Yashar ), and Dwayne Johnson ( Luke Hobbs ).

> properties of red black tree in data structure
 The properties of a red-black tree in data structure are that each node is either red or black, the root is black, if a node is red then both its children must be black, and every path from a given node to any of its descendant NIL nodes contains the same number of black nodes.

> who designed the national coat of arms of south africa
 Iaan Bekker

> caravaggio's death of the virgin pamela askew
 I don't know.

自定义提示模板

Langchain 中的 stuff 链类型使用特定的提示,其中包含问题和上下文文档。以下是默认提示的样子:

Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {question}
Helpful Answer:

但是,我们可以提供我们自己的提示模板,并更改 OpenAI LLM 的行为,同时仍然使用 stuff 链类型。重要的是保留 {context}{question} 作为占位符。

实验自定义提示

我们可以尝试使用不同的提示模板,以便模型:

  1. 如果知道答案,则用单句回答。
  2. 如果不知道我们问题的答案,则推荐一个随机的歌曲标题。
from langchain.prompts import PromptTemplate
custom_prompt = """
Use the following pieces of context to answer the question at the end. Please provide
a short single-sentence summary answer only. If you don't know the answer or if it's
not present in given context, don't try to make up an answer, but suggest me a random
unrelated song title I could listen to.
Context: {context}
Question: {question}
Helpful Answer:
"""

custom_prompt_template = PromptTemplate(
    template=custom_prompt, input_variables=["context", "question"]
)
custom_qa = VectorDBQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    vectorstore=doc_store,
    return_source_documents=False,
    chain_type_kwargs={"prompt": custom_prompt_template},
)
random.seed(41)
for question in random.choices(questions, k=5):
    print(">", question)
    print(custom_qa.run(question), end="\n\n")
    # wait 20seconds because of the rate limit
    time.sleep(20)
> what was uncle jesse's original last name on full house
Uncle Jesse's original last name on Full House was Cochran.

> when did the volcano erupt in indonesia 2018
The given context does not mention any volcanic eruption in Indonesia in 2018. Suggested song title: "The Heat Is On" by Glenn Frey.

> what does a dualist way of thinking mean
Dualism means the belief that there is a distinction between the mind and the body, and that the mind is a non-extended, non-physical substance.

> the first civil service commission in india was set up on the basis of recommendation of
The first Civil Service Commission in India was not set up on the basis of the recommendation of the Election Commission of India's Model Code of Conduct.

> how old do you have to be to get a tattoo in utah
You must be at least 18 years old to get a tattoo in Utah.