This notebook is prepared for a scenario where:
- your data is already in Weaviate,
- you want to use Weaviate with the Generative OpenAI module (generative-openai).
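This cookbook covers only generative search examples; it does not cover configuration or data import.
To get the most out of this cookbook, first complete the Getting Started cookbook, where you will learn the essentials of working with Weaviate and import the demo data.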
Checklist:
- you have a Weaviate instance,
- your data has been imported into that Weaviate instance.
===========================================================
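The OpenAI API key is used to vectorize your data at import time and to run queries.
If you don't have an OpenAI API key, you can get one from https://beta.openai.com/account/api-keys.
Once you have your key, add it to your environment variables as OPENAI_API_KEY.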
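# Export OpenAI API Key
# Note: in a notebook, `!export` runs in a temporary subshell, so the variable does not persist into the kernel.
# Setting os.environ["OPENAI_API_KEY"] from Python is the more reliable option.
!export OPENAI_API_KEY="your key"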
# Test that your OpenAI API key is correctly set as an environment variable
# Note: if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.
import os

# Note: alternatively, you can set a temporary env variable like this:
# os.environ["OPENAI_API_KEY"] = 'your-key-goes-here'

if os.getenv("OPENAI_API_KEY") is not None:
    print("OPENAI_API_KEY is ready")
else:
    print("OPENAI_API_KEY environment variable not found")
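The check above only prints a message, so a missing key would surface later as a failed query. A minimal sketch of a fail-fast alternative follows; `require_env` and `DEMO_KEY` are hypothetical names introduced here for illustration and are not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error.

    Hypothetical helper, not part of Weaviate or OpenAI tooling.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it or assign os.environ[{name!r}] first"
        )
    return value


# Demonstration with a made-up variable so the example runs anywhere
os.environ["DEMO_KEY"] = "abc"
print(require_env("DEMO_KEY"))
```

In the notebook itself, this could be called as `require_env("OPENAI_API_KEY")` before creating the client.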
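In this section, we will:
- check the OPENAI_API_KEY environment variable (make sure you have completed the steps in #Prepare-your-OpenAI-API-key),
- connect to your Weaviate instance with your OpenAI API key.
After this step, the client object will be used to perform all Weaviate-related operations.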
import weaviate
from datasets import load_dataset
import os

# Connect to your Weaviate instance
client = weaviate.Client(
    url="https://your-wcs-instance-name.weaviate.network/",
    # url="https://127.0.0.1:8080/",
    # Comment out the next line if you are not using authentication for your
    # Weaviate instance (i.e. for locally deployed instances)
    auth_client_secret=weaviate.auth.AuthApiKey(api_key="<YOUR-WEAVIATE-API-KEY>"),
    additional_headers={
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")
    }
)

# Check if your instance is live and ready
# This should return `True`
client.is_ready()
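A freshly started instance may need a moment before `client.is_ready()` returns `True`. The following is a small sketch of a polling loop under that assumption; `wait_until_ready` is a hypothetical helper (not part of the Weaviate client), and it accepts any zero-argument callable so it can be tried without a running instance.

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool],
                     attempts: int = 5,
                     delay_s: float = 1.0) -> bool:
    """Call `is_ready` up to `attempts` times; return True on the first success."""
    for attempt in range(attempts):
        try:
            if is_ready():
                return True
        except Exception:
            # Treat connection errors as "not ready yet" and keep trying
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False


# Stand-in readiness check that succeeds on the third call (illustration only)
_calls = {"n": 0}

def _fake_is_ready() -> bool:
    _calls["n"] += 1
    return _calls["n"] >= 3

print(wait_until_ready(_fake_is_ready, attempts=5, delay_s=0))
```

With the client from the cell above, the call would be `wait_until_ready(client.is_ready)`.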
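Weaviate offers a Generative Search OpenAI module, which generates responses based on the data stored in your Weaviate instance.
The way you construct a generative search query is very similar to a standard semantic search query in Weaviate.
For example: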
result = (
    client.query
    .get("Article", ["title", "content", "url"])
    .with_near_text({"concepts": ["football clubs"]})
    .with_limit(5)
    # generative query will go here
    .do()
)
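Now you can add the with_generate() function to apply a generative transformation. with_generate accepts one of the following parameters:
- single_prompt: generates a response for each returned object,
- grouped_task: generates a single response from all returned objects.
def generative_search_per_item(query, collection_name):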
    prompt = "Summarize in a short tweet the following content: {content}"

    result = (
        client.query
        .get(collection_name, ["title", "content", "url"])
        .with_near_text({"concepts": [query], "distance": 0.7})
        .with_limit(5)
        .with_generate(single_prompt=prompt)
        .do()
    )

    # Check for errors
    if "errors" in result:
        print("\033[91mYou probably have run out of OpenAI API calls for the current minute – the limit is set at 60 per minute.")
        raise Exception(result["errors"][0]['message'])

    return result["data"]["Get"][collection_name]

query_result = generative_search_per_item("football clubs", "Article")

for i, article in enumerate(query_result):
    print(f"{i+1}. {article['title']}")
    print(article['_additional']['generate']['singleResult'])  # print generated response
    print("-----------------------")
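The `{content}` placeholder in the single prompt is filled with the `content` property of each returned object before the prompt is sent to the model. The sketch below previews that substitution locally with Python string formatting; it is only an approximation for inspecting prompts, not Weaviate's own implementation. `preview_prompts` is a hypothetical helper and the sample objects are invented.

```python
from typing import Dict, List


def preview_prompts(prompt_template: str,
                    objects: List[Dict[str, str]]) -> List[str]:
    """Fill `{property}` placeholders with each object's values (local preview only)."""
    return [prompt_template.format(**obj) for obj in objects]


# Invented sample objects, for illustration only (not real query results)
sample_objects = [
    {"title": "Example club A", "content": "Example text about club A."},
    {"title": "Example club B", "content": "Example text about club B."},
]

prompt = "Summarize in a short tweet the following content: {content}"
for filled in preview_prompts(prompt, sample_objects):
    print(filled)
```

Each object yields its own prompt, which is why single_prompt returns one generated response per object.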
def generative_search_group(query, collection_name):
    generate_task = "Explain what these have in common"

    result = (
        client.query
        .get(collection_name, ["title", "content", "url"])
        .with_near_text({"concepts": [query], "distance": 0.7})
        .with_generate(grouped_task=generate_task)
        .with_limit(5)
        .do()
    )

    # Check for errors
    if "errors" in result:
        print("\033[91mYou probably have run out of OpenAI API calls for the current minute – the limit is set at 60 per minute.")
        raise Exception(result["errors"][0]['message'])

    return result["data"]["Get"][collection_name]

query_result = generative_search_group("football clubs", "Article")

# The grouped response is attached to the first returned object
print(query_result[0]['_additional']['generate']['groupedResult'])
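Indexing `query_result[0]` directly fails with an `IndexError` when the search returns no objects. A minimal sketch of a more defensive way to read the grouped response is shown here, using the response layout from the cells above; `extract_grouped_result` is a hypothetical helper and `mock_response` is invented data, not real model output.

```python
from typing import Optional


def extract_grouped_result(response: dict, collection_name: str) -> Optional[str]:
    """Return the grouped generative result, or None if no objects were returned."""
    if "errors" in response:
        raise RuntimeError(response["errors"][0]["message"])
    objects = response["data"]["Get"][collection_name]
    if not objects:
        return None
    # The grouped response is read from the first returned object
    return objects[0]["_additional"]["generate"]["groupedResult"]


# Invented response in the same shape as the query result above
mock_response = {
    "data": {
        "Get": {
            "Article": [
                {
                    "title": "Example club A",
                    "_additional": {
                        "generate": {"groupedResult": "Invented grouped answer."}
                    },
                }
            ]
        }
    }
}

print(extract_grouped_result(mock_response, "Article"))
```

To use it with a live query, the function would need the raw dictionary returned by `.do()` rather than the list returned by `generative_search_group`.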
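Thanks for following along. You are now equipped to set up your own vector databases and use embeddings to do all kinds of cool things. Enjoy! For more complex use cases, please continue to work through the other cookbook examples in this repo.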