Using Weaviate with the Generative OpenAI Module for Generative Search

May 22, 2023

This notebook is prepared for a scenario where:

  • Your data is already in Weaviate.
  • You want to use Weaviate with the Generative OpenAI module (generative-openai).

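Prerequisites

This cookbook covers only Generative Search examples. It does not cover configuration or data import.

To make the most of this cookbook, complete the Getting Started cookbook first. There you will learn the essentials of working with Weaviate and import the demo data.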

Checklist

===========================================================

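Prepare your OpenAI API key

The OpenAI API key is used to vectorize your data at import time and to run queries, including the generative queries in this notebook.

If you don't have an OpenAI API key, you can get one from https://beta.openai.com/account/api-keys.

Once you have your key, add it to your environment variables as OPENAI_API_KEY.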

# Export OpenAI API Key
# Note: `!export` runs in a throwaway subshell and does not reach this kernel, so use the %env magic instead
%env OPENAI_API_KEY=your-key-goes-here
# Test that your OpenAI API key is correctly set as an environment variable
# Note: if you set the variable in your shell instead, restart your terminal and the notebook so it becomes visible.
import os

# Note: alternatively, you can set a temporary env variable like this:
# os.environ["OPENAI_API_KEY"] = 'your-key-goes-here'

if os.getenv("OPENAI_API_KEY") is not None:
    print("OPENAI_API_KEY is ready")
else:
    print("OPENAI_API_KEY environment variable not found")
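Printing a message is fine when working interactively, but in a script it is usually safer to stop immediately if the key is missing. A minimal sketch of that idea; `require_env` is a hypothetical helper name, not part of Weaviate or OpenAI.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        # Fail early with a clear message instead of letting a later API call fail obscurely
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage (raises RuntimeError if OPENAI_API_KEY is not set):
# openai_api_key = require_env("OPENAI_API_KEY")
```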
import os

import weaviate

# Connect to your Weaviate instance
client = weaviate.Client(
    url="https://your-wcs-instance-name.weaviate.network/",
    # url="http://localhost:8080/",  # for a locally deployed instance (plain HTTP)
    # Comment out the next line if your instance does not use authentication (e.g. local deployments)
    auth_client_secret=weaviate.auth.AuthApiKey(api_key="<YOUR-WEAVIATE-API-KEY>"),
    additional_headers={
        # Used by Weaviate's OpenAI modules to call the OpenAI API on your behalf
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")
    }
)

# Check if your instance is live and ready
# This should return `True`
client.is_ready()

Weaviate offers a Generative Search OpenAI module (generative-openai), which generates responses based on the data stored in your Weaviate instance.

You construct a generative search query in much the same way as a standard semantic search query in Weaviate.

For example:

  • search in "Article",
  • return "title", "content", "url",
  • look for objects related to "football clubs",
  • limit the results to 5 objects.
    result = (
        client.query
        .get("Article", ["title", "content", "url"])
        .with_near_text({"concepts": ["football clubs"]})
        .with_limit(5)
        # generative query will go here
        .do()
    )

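Now you can add the with_generate() function to apply a generative transformation. with_generate takes either:

  • single_prompt: generates a response for each returned object,
  • grouped_task: generates a single response from all returned objects.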
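The two modes put their output in different places in the response, as the calls later in this notebook show: with single_prompt, each object carries its own `_additional.generate.singleResult`, while with grouped_task the combined answer is read from `_additional.generate.groupedResult` on the first returned object. A small sketch that pulls out either one, assuming that response shape; `extract_single_results` and `extract_grouped_result` are hypothetical helper names, and the sample dicts are made up for illustration.

```python
from typing import Any, Dict, List, Optional


def _generate_block(obj: Dict[str, Any]) -> Dict[str, Any]:
    # Safely walk obj["_additional"]["generate"], tolerating missing or null keys
    return (obj.get("_additional") or {}).get("generate") or {}


def extract_single_results(objects: List[Dict[str, Any]]) -> List[Optional[str]]:
    """Return the per-object text from a single_prompt query (None where absent)."""
    return [_generate_block(obj).get("singleResult") for obj in objects]


def extract_grouped_result(objects: List[Dict[str, Any]]) -> Optional[str]:
    """Return the text from a grouped_task query, read from the first object."""
    if not objects:
        return None
    return _generate_block(objects[0]).get("groupedResult")


# Hypothetical fragments shaped like result["data"]["Get"]["Article"]
sample_single = [
    {"title": "A", "_additional": {"generate": {"singleResult": "tweet A"}}},
    {"title": "B", "_additional": {"generate": {"singleResult": "tweet B"}}},
]
sample_grouped = [
    {"title": "A", "_additional": {"generate": {"groupedResult": "common theme"}}},
    {"title": "B", "_additional": {"generate": {"groupedResult": None}}},
]
print(extract_single_results(sample_single))
print(extract_grouped_result(sample_grouped))
```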
def generative_search_per_item(query, collection_name):
    prompt = "Summarize in a short tweet the following content: {content}"

    result = (
        client.query
        .get(collection_name, ["title", "content", "url"])
        .with_near_text({ "concepts": [query], "distance": 0.7 })
        .with_limit(5)
        .with_generate(single_prompt=prompt)
        .do()
    )
    
    # Check for errors
    if "errors" in result:
        print("\033[91mYou probably have run out of OpenAI API calls for the current minute; the limit is set at 60 per minute.\033[0m")
        raise Exception(result["errors"][0]["message"])
    
    return result["data"]["Get"][collection_name]
query_result = generative_search_per_item("football clubs", "Article")

for i, article in enumerate(query_result):
    print(f"{i+1}. {article['title']}")
    print(article['_additional']['generate']['singleResult']) # print generated response
    print("-----------------------")
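The `{content}` placeholder in the prompt is filled from each object's `content` property, which is why every object gets its own response. To inspect locally what a per-object prompt might look like, you can fill the same template with Python's `str.format`. This is only an approximation that assumes simple `{property}` substitution, not how Weaviate renders prompts internally; `preview_prompts` is a hypothetical helper and the sample object is made up.

```python
from typing import Any, Dict, List


def preview_prompts(template: str, objects: List[Dict[str, Any]]) -> List[str]:
    """Fill a single_prompt-style template once per object, for local inspection only."""
    # str.format raises KeyError if the template names a property the object lacks
    return [template.format(**obj) for obj in objects]


prompt = "Summarize in a short tweet the following content: {content}"
objects = [{"title": "Example title", "content": "Example article text."}]
for p in preview_prompts(prompt, objects):
    print(p)
```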
def generative_search_group(query, collection_name):
    generateTask = "Explain what these have in common"

    result = (
        client.query
        .get(collection_name, ["title", "content", "url"])
        .with_near_text({ "concepts": [query], "distance": 0.7 })
        .with_generate(grouped_task=generateTask)
        .with_limit(5)
        .do()
    )
    
    # Check for errors
    if "errors" in result:
        print("\033[91mYou probably have run out of OpenAI API calls for the current minute; the limit is set at 60 per minute.\033[0m")
        raise Exception(result["errors"][0]["message"])
    
    return result["data"]["Get"][collection_name]
query_result = generative_search_group("football clubs", "Article")

print(query_result[0]['_additional']['generate']['groupedResult'])
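Both search functions raise an exception when the response contains `errors`, which the notebook attributes to the OpenAI per-minute rate limit. If you hit that limit, one option is to retry with exponential backoff. A minimal sketch: `call_with_retries` is a hypothetical helper, it retries on any exception (not only rate-limit errors), and the default attempt count and delays are illustrative, not recommended values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on exceptions with delays base_delay, 2*base_delay, 4*base_delay, ..."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


# Usage with the functions defined above:
# articles = call_with_retries(lambda: generative_search_group("football clubs", "Article"))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.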

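Thanks for following along! You are now equipped to set up your own vector databases and use embeddings to do all kinds of cool things; enjoy! For more complex use cases, continue working through the other cookbook examples in this repo.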