Prompt Caching 101

October 1, 2024

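OpenAI offers discounted prompt caching for prompts longer than 1024 tokens; for longer prompts of more than 10,000 tokens, latency can drop by up to 80%. By caching repeated information across LLM API requests, you can substantially reduce both latency and cost. Prompt caches are scoped to the organization, meaning that only members of the same organization can access a shared cache. Caching is also eligible for zero data retention, because no data is stored during the process.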

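Prompt caching activates automatically for prompts longer than 1024 tokens; you do not need to change anything in your completions request. When an API request is made, the system first checks whether the beginning of the prompt (the prefix) is already cached. If a match is found (a cache hit), the cached prompt is used, which reduces latency and cost. If there is no match, the system processes the whole prompt from scratch and caches the prefix for future use.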
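As a rough mental model of the prefix check described above, the sketch below estimates how many tokens could be served from cache, given how many leading tokens two requests share. The 1024-token minimum comes from this article; the 128-token increment is an assumption based on OpenAI's prompt caching guide, and the function and constant names are hypothetical. It is an illustration only, not a description of how the service is implemented.

```python
# Assumed thresholds: the 1024 minimum is stated in this article; the
# 128-token increment is an assumption taken from OpenAI's caching guide.
MIN_CACHEABLE_TOKENS = 1024
CACHE_INCREMENT = 128


def estimate_cached_tokens(shared_prefix_tokens: int) -> int:
    """Estimate cached tokens for a request that shares a prefix of the
    given length with an earlier request (hypothetical helper)."""
    if shared_prefix_tokens < MIN_CACHEABLE_TOKENS:
        # Prefix too short: the whole prompt is processed from scratch.
        return 0
    # Round down to the nearest whole increment.
    return (shared_prefix_tokens // CACHE_INCREMENT) * CACHE_INCREMENT


for prefix in (900, 1079, 1500):
    print(prefix, estimate_cached_tokens(prefix))
```

Under these assumptions a shared prefix of 1079 tokens would yield 1024 cached tokens, while a 900-token prefix would yield none.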

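With these benefits in mind, here are some key use cases where prompt caching is especially advantageous:

Agents using tools and structured outputs: cache extended lists of tools and schemas.
Coding and writing assistants: insert large sections or summaries of a codebase or workspace directly into the prompt.
Chatbots: cache the static portions of multi-turn conversations to maintain context efficiently over extended dialogues.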

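In this cookbook, we walk through a couple of examples of caching tools and images. As a general rule, place static content (such as instructions and examples) at the beginning of the prompt and variable content (such as user-specific information) at the end. The same applies to images and tools, which must be identical, and in the same order, across requests. Every request, including those under 1024 tokens, reports a cached_tokens field in usage.prompt_tokens_details of the chat completions object, indicating how many prompt tokens were a cache hit. For requests under 1024 tokens, cached_tokens is zero. Caching discounts are based on the actual number of tokens processed, including those used for images, which also count toward your rate limits.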
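To make the cached_tokens field concrete, here is a minimal sketch that reads it from a response that has already been converted to a dictionary (for example with completion.to_dict()). The helper name cached_token_summary and the sample numbers are illustrative assumptions; only the field names come from the text above.

```python
def cached_token_summary(response: dict) -> dict:
    """Summarize cache usage from a chat completion dict (hypothetical helper)."""
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    prompt_tokens = usage.get("prompt_tokens", 0)
    cached_tokens = details.get("cached_tokens", 0)
    # Guard against division by zero when usage data is missing.
    fraction = cached_tokens / prompt_tokens if prompt_tokens else 0.0
    return {
        "prompt_tokens": prompt_tokens,
        "cached_tokens": cached_tokens,
        "cached_fraction": fraction,
    }


# Sample values for illustration only.
sample_response = {
    "usage": {
        "prompt_tokens": 1136,
        "prompt_tokens_details": {"cached_tokens": 1024},
    }
}
print(cached_token_summary(sample_response))
```

Reading the fields defensively means the helper still returns zeros if a response lacks usage details.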

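Example 1: Caching tools and multi-turn conversations

In this example, we define tools and interactions for a customer support assistant that can handle tasks such as checking delivery dates, cancelling orders, and updating payment methods. The assistant processes two separate messages: it first responds to an initial query, then, after a delay, responds to a follow-up query.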

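When caching tools, the tool definitions and their order must stay the same for them to be included in the prompt prefix. To cache the message history in a multi-turn conversation, append new elements to the end of the messages array. In the response object and the output below, the second completion, run2, shows a cached_tokens value greater than zero, indicating a cache hit.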
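The two conditions above can be checked locally before a request is sent. The following is a small sketch with hypothetical helper names (history_is_append_only, tools_unchanged) and placeholder messages; comparing serialized tool definitions is one assumed way to treat any reordering as a change, not something the API requires you to do.

```python
import json


def history_is_append_only(previous_messages: list, current_messages: list) -> bool:
    """True if current_messages starts with previous_messages unchanged."""
    n = len(previous_messages)
    return len(current_messages) >= n and current_messages[:n] == previous_messages


def tools_unchanged(previous_tools: list, current_tools: list) -> bool:
    """True if both tool lists serialize identically (same order, same keys)."""
    return json.dumps(previous_tools) == json.dumps(current_tools)


# Placeholder conversation for illustration.
system = {"role": "system", "content": "You are a support assistant."}
first = {"role": "user", "content": "Where is my order?"}
follow_up = {"role": "user", "content": "Please cancel it."}

run1_messages = [system, first]
run2_messages = run1_messages + [follow_up]   # appended at the end
reordered = [first, system, follow_up]        # prefix no longer matches

print(history_is_append_only(run1_messages, run2_messages))
print(history_is_append_only(run1_messages, reordered))
```

Building the second request as the first list plus the new message keeps the earlier turns eligible as a shared prefix.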

import json
import os
import time

from openai import OpenAI

# Read the API key from the environment and create the client
api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(organization='org-l89177bnhkme4a44292n5r3j', api_key=api_key)

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID.",
                    },
                },
                "required": ["order_id"],
                "additionalProperties": False,
            },
        }
    },
    {
        "type": "function",
        "function": {
            "name": "cancel_order",
            "description": "Cancel an order that has not yet been shipped. Use this when a customer requests order cancellation.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    },
                    "reason": {
                        "type": "string",
                        "description": "The reason for cancelling the order."
                    }
                },
                "required": ["order_id", "reason"],
                "additionalProperties": False
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "return_item",
            "description": "Process a return for an order. This should be called when a customer wants to return an item and the order has already been delivered.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    },
                    "item_id": {
                        "type": "string",
                        "description": "The specific item ID the customer wants to return."
                    },
                    "reason": {
                        "type": "string",
                        "description": "The reason for returning the item."
                    }
                },
                "required": ["order_id", "item_id", "reason"],
                "additionalProperties": False
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "update_shipping_address",
            "description": "Update the shipping address for an order that hasn't been shipped yet. Use this if the customer wants to change their delivery address.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    },
                    "new_address": {
                        "type": "object",
                        "properties": {
                            "street": {
                                "type": "string",
                                "description": "The new street address."
                            },
                            "city": {
                                "type": "string",
                                "description": "The new city."
                            },
                            "state": {
                                "type": "string",
                                "description": "The new state."
                            },
                            "zip": {
                                "type": "string",
                                "description": "The new zip code."
                            },
                            "country": {
                                "type": "string",
                                "description": "The new country."
                            }
                        },
                        "required": ["street", "city", "state", "zip", "country"],
                        "additionalProperties": False
                    }
                },
                "required": ["order_id", "new_address"],
                "additionalProperties": False
            }
        }
    },
    # New tool: Update payment method
    {
        "type": "function",
        "function": {
            "name": "update_payment_method",
            "description": "Update the payment method for an order that hasn't been completed yet. Use this if the customer wants to change their payment details.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    },
                    "payment_method": {
                        "type": "object",
                        "properties": {
                            "card_number": {
                                "type": "string",
                                "description": "The new credit card number."
                            },
                            "expiry_date": {
                                "type": "string",
                                "description": "The new credit card expiry date in MM/YY format."
                            },
                            "cvv": {
                                "type": "string",
                                "description": "The new credit card CVV code."
                            }
                        },
                        "required": ["card_number", "expiry_date", "cvv"],
                        "additionalProperties": False
                    }
                },
                "required": ["order_id", "payment_method"],
                "additionalProperties": False
            }
        }
    }
]

# Enhanced system message with guardrails
messages = [
    {
        "role": "system", 
        "content": (
            "You are a professional, empathetic, and efficient customer support assistant. Your mission is to provide fast, clear, "
            "and comprehensive assistance to customers while maintaining a warm and approachable tone. "
            "Always express empathy, especially when the user seems frustrated or concerned, and ensure that your language is polite and professional. "
            "Use simple and clear communication to avoid any misunderstanding, and confirm actions with the user before proceeding. "
            "In more complex or time-sensitive cases, assure the user that you're taking swift action and provide regular updates. "
            "Adapt to the user’s tone: remain calm, friendly, and understanding, even in stressful or difficult situations."
            "\n\n"
            "Additionally, there are several important guardrails that you must adhere to while assisting users:"
            "\n\n"
            "1. **Confidentiality and Data Privacy**: Do not share any sensitive information about the company or other users. When handling personal details such as order IDs, addresses, or payment methods, ensure that the information is treated with the highest confidentiality. If a user requests access to their data, only provide the necessary information relevant to their request, ensuring no other user's information is accidentally revealed."
            "\n\n"
            "2. **Secure Payment Handling**: When updating payment details or processing refunds, always ensure that payment data such as credit card numbers, CVVs, and expiration dates are transmitted and stored securely. Never display or log full credit card numbers. Confirm with the user before processing any payment changes or refunds."
            "\n\n"
            "3. **Respect Boundaries**: If a user expresses frustration or dissatisfaction, remain calm and empathetic but avoid overstepping professional boundaries. Do not make personal judgments, and refrain from using language that might escalate the situation. Stick to factual information and clear solutions to resolve the user's concerns."
            "\n\n"
            "4. **Legal Compliance**: Ensure that all actions you take comply with legal and regulatory standards. For example, if the user requests a refund, cancellation, or return, follow the company’s refund policies strictly. If the order cannot be canceled due to being shipped or another restriction, explain the policy clearly but sympathetically."
            "\n\n"
            "5. **Consistency**: Always provide consistent information that aligns with company policies. If unsure about a company policy, communicate clearly with the user, letting them know that you are verifying the information, and avoid providing false promises. If escalating an issue to another team, inform the user and provide a realistic timeline for when they can expect a resolution."
            "\n\n"
            "6. **User Empowerment**: Whenever possible, empower the user to make informed decisions. Provide them with relevant options and explain each clearly, ensuring that they understand the consequences of each choice (e.g., canceling an order may result in loss of loyalty points, etc.). Ensure that your assistance supports their autonomy."
            "\n\n"
            "7. **No Speculative Information**: Do not speculate about outcomes or provide information that you are not certain of. Always stick to verified facts when discussing order statuses, policies, or potential resolutions. If something is unclear, tell the user you will investigate further before making any commitments."
            "\n\n"
            "8. **Respectful and Inclusive Language**: Ensure that your language remains inclusive and respectful, regardless of the user’s tone. Avoid making assumptions based on limited information and be mindful of diverse user needs and backgrounds."
        )
    },
    {
        "role": "user", 
        "content": (
            "Hi, I placed an order three days ago and haven’t received any updates on when it’s going to be delivered. "
            "Could you help me check the delivery date? My order number is #9876543210. I’m a little worried because I need this item urgently."
        )
    }
]

# Enhanced user_query2
user_query2 = {
    "role": "user", 
    "content": (
        "Since my order hasn't actually shipped yet, I would like to cancel it. "
        "The order number is #9876543210, and I need to cancel because I’ve decided to purchase it locally to get it faster. "
        "Can you help me with that? Thank you!"
    )
}

# Function to run completion with the provided message history and tools
def completion_run(messages, tools):
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        tools=tools,
        messages=messages,
        tool_choice="required"
    )
    # Serialize the full response, including the usage block with cached_tokens
    usage_data = json.dumps(completion.to_dict(), indent=4)
    return usage_data

# Main function to handle the two runs
def main(messages, tools, user_query2):
    # Run 1: Initial query
    print("Run 1:")
    run1 = completion_run(messages, tools)
    print(run1)

    # Delay for 7 seconds
    time.sleep(7)

    # Append user_query2 to the message history
    messages.append(user_query2)

    # Run 2: With appended query
    print("\nRun 2:")
    run2 = completion_run(messages, tools)
    print(run2)


# Run the main function
main(messages, tools, user_query2)

Run 1:
{
    "id": "chatcmpl-ADeOueQSi2DIUMdLXnZIv9caVfnro",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": null,
                "refusal": null,
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "call_5TnLcdD9tyVMVbzNGdejlJJa",
                        "function": {
                            "arguments": "{\"order_id\":\"9876543210\"}",
                            "name": "get_delivery_date"
                        },
                        "type": "function"
                    }
                ]
            }
        }
    ],
    "created": 1727816928,
    "model": "gpt-4o-mini-2024-07-18",
    "object": "chat.completion",
    "system_fingerprint": "fp_f85bea6784",
    "usage": {
        "completion_tokens": 17,
        "prompt_tokens": 1079,
        "total_tokens": 1096,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}

Run 2:
{
    "id": "chatcmpl-ADeP2i0frELC4W5RVNNkKz6TQ7hig",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": null,
                "refusal": null,
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "call_viwwDZPuQh8hJFPf2Co1dYJK",
                        "function": {
                            "arguments": "{\"order_id\": \"9876543210\"}",
                            "name": "get_delivery_date"
                        },
                        "type": "function"
                    },
                    {
                        "id": "call_t1FFdAhrfvRc5IgqA6WkPKYj",
                        "function": {
                            "arguments": "{\"order_id\": \"9876543210\", \"reason\": \"Decided to purchase locally to get it faster.\"}",
                            "name": "cancel_order"
                        },
                        "type": "function"
                    }
                ]
            }
        }
    ],
    "created": 1727816936,
    "model": "gpt-4o-mini-2024-07-18",
    "object": "chat.completion",
    "system_fingerprint": "fp_f85bea6784",
    "usage": {
        "completion_tokens": 64,
        "prompt_tokens": 1136,
        "total_tokens": 1200,
        "prompt_tokens_details": {
            "cached_tokens": 1024
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}

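Example 2: Images

In our second example, we include image URLs of several grocery items in the messages array, together with a user query, and run the request three times with a delay between runs. Images, whether linked or base64-encoded in user messages, qualify for caching. Keep the detail parameter consistent, because it affects how images are tokenized. Note that GPT-4o-mini adds extra tokens to cover image processing costs, even though it uses a low-cost token model for text. Caching discounts are based on the actual number of tokens processed, including those used for images, which also count toward your rate limits.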

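The output of this example shows that the second run hits the cache but the third does not: its first URL differs (milk_url instead of sauce_url), even though the user query is the same as in the first run.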
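To see why the position of the first difference matters, the sketch below builds the content parts for three requests and reports the index at which each later request first diverges from the first one. The helpers image_part and first_divergence and the example.com URLs are hypothetical; the comparison is a local illustration of prefix matching, not a call to the API.

```python
from typing import Optional


def image_part(url: str, detail: str = "high") -> dict:
    """Build an image content part with a fixed detail setting."""
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


def first_divergence(parts_a: list, parts_b: list) -> Optional[int]:
    """Index of the first differing content part, or None if identical."""
    for i, (a, b) in enumerate(zip(parts_a, parts_b)):
        if a != b:
            return i
    if len(parts_a) != len(parts_b):
        return min(len(parts_a), len(parts_b))
    return None


# Placeholder URLs for illustration.
sauce, veggie, milk = (f"https://example.com/{n}.jpg" for n in ("sauce", "veggie", "milk"))

run1 = [image_part(sauce), image_part(veggie), {"type": "text", "text": "List the sauces"}]
run2 = [image_part(sauce), image_part(veggie), {"type": "text", "text": "List the vegetables"}]
run3 = [image_part(milk), image_part(sauce), {"type": "text", "text": "List the sauces"}]

print(first_divergence(run1, run2))  # both images are shared before the text differs
print(first_divergence(run1, run3))  # the very first part already differs
```

A late divergence leaves both images in the shared prefix, whereas a divergence at index 0 leaves nothing to reuse.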

sauce_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/9/97/12-04-20-saucen-by-RalfR-15.jpg/800px-12-04-20-saucen-by-RalfR-15.jpg"
veggie_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/3/31/Veggies.jpg/800px-Veggies.jpg"
eggs_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Egg_shelf.jpg/450px-Egg_shelf.jpg"
milk_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/c/cd/Lactaid_brand.jpg/800px-Lactaid_brand.jpg"

def multiimage_completion(url1, url2, user_query):
    # Send two images followed by a text query and print the full response
    completion = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": url1, "detail": "high"},
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": url2, "detail": "high"},
                    },
                    {"type": "text", "text": user_query},
                ],
            }
        ],
        max_tokens=300,
    )
    print(json.dumps(completion.to_dict(), indent=4))
    

def main(sauce_url, veggie_url, milk_url):
    # Run 1: nothing is cached yet
    multiimage_completion(sauce_url, veggie_url, "Please list the types of sauces are shown in these images")
    # Delay for 20 seconds
    time.sleep(20)
    # Run 2: same images in the same order, different query
    multiimage_completion(sauce_url, veggie_url, "Please list the types of vegetables are shown in these images")
    time.sleep(20)
    # Run 3: a different first image changes the prompt prefix
    multiimage_completion(milk_url, sauce_url, "Please list the types of sauces are shown in these images")

if __name__ == "__main__":
    main(sauce_url, veggie_url, milk_url)

{
    "id": "chatcmpl-ADeV3IrUqhpjMXEgv29BFHtTQ0Pzt",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The images show the following types of sauces:\n\n1. **Soy Sauce** - Kikkoman brand.\n2. **Worcester Sauce** - Appel brand, listed as \"Dresdner Art.\"\n3. **Tabasco Sauce** - Original pepper sauce.\n\nThe second image shows various vegetables, not sauces.",
                "refusal": null,
                "role": "assistant"
            }
        }
    ],
    "created": 1727817309,
    "model": "gpt-4o-2024-08-06",
    "object": "chat.completion",
    "system_fingerprint": "fp_2f406b9113",
    "usage": {
        "completion_tokens": 65,
        "prompt_tokens": 1548,
        "total_tokens": 1613,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
{
    "id": "chatcmpl-ADeVRSI6zFINkx99k7V6ux1v5iF5f",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The images show different types of items. In the first image, you'll see bottles of sauces like soy sauce, Worcester sauce, and Tabasco. The second image features various vegetables, including:\n\n1. Napa cabbage\n2. Kale\n3. Carrots\n4. Bok choy\n5. Swiss chard\n6. Leeks\n7. Parsley\n\nThese vegetables are arranged on shelves in a grocery store setting.",
                "refusal": null,
                "role": "assistant"
            }
        }
    ],
    "created": 1727817333,
    "model": "gpt-4o-2024-08-06",
    "object": "chat.completion",
    "system_fingerprint": "fp_2f406b9113",
    "usage": {
        "completion_tokens": 86,
        "prompt_tokens": 1548,
        "total_tokens": 1634,
        "prompt_tokens_details": {
            "cached_tokens": 1280
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
{
    "id": "chatcmpl-ADeVphj3VALQVrdnt2efysvSmdnBx",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The second image shows three types of sauces:\n\n1. Soy Sauce (Kikkoman)\n2. Worcestershire Sauce\n3. Tabasco Sauce",
                "refusal": null,
                "role": "assistant"
            }
        }
    ],
    "created": 1727817357,
    "model": "gpt-4o-2024-08-06",
    "object": "chat.completion",
    "system_fingerprint": "fp_2f406b9113",
    "usage": {
        "completion_tokens": 29,
        "prompt_tokens": 1548,
        "total_tokens": 1577,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}

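Overall tips

To get the most out of prompt caching, consider following these best practices:

Place static or frequently reused content at the beginning of the prompt: keeping dynamic data at the end of the prompt improves cache efficiency.
Maintain consistent usage patterns: prompts that are not used regularly are automatically removed from the cache. To prevent cache evictions, keep using your prompts consistently.
Monitor key metrics: regularly track cache hit rate, latency, and the proportion of cached tokens. Use these insights to fine-tune your caching strategy and maximize performance.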
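One possible way to track the metrics named in the last tip is a small in-process accumulator, sketched below. The CacheStats class, its field names, and the latency figures are hypothetical; the usage numbers mirror the shape of the usage block in the outputs above, and a production setup would more likely export these values to an existing metrics system.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Accumulates cache metrics across requests (hypothetical helper)."""
    requests: int = 0
    hits: int = 0
    prompt_tokens: int = 0
    cached_tokens: int = 0
    total_latency_s: float = 0.0

    def record(self, usage: dict, latency_s: float) -> None:
        details = usage.get("prompt_tokens_details") or {}
        cached = details.get("cached_tokens", 0)
        self.requests += 1
        self.hits += 1 if cached > 0 else 0
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.cached_tokens += cached
        self.total_latency_s += latency_s

    @property
    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0

    @property
    def cached_token_ratio(self) -> float:
        return self.cached_tokens / self.prompt_tokens if self.prompt_tokens else 0.0

    @property
    def mean_latency_s(self) -> float:
        return self.total_latency_s / self.requests if self.requests else 0.0


stats = CacheStats()
# Latencies are made-up values; measure them around your own API calls.
for cached, latency in ((0, 1.0), (1280, 0.5), (0, 1.0)):
    usage = {"prompt_tokens": 1548, "prompt_tokens_details": {"cached_tokens": cached}}
    stats.record(usage, latency)

print(f"hit rate: {stats.hit_rate:.2f}")
print(f"cached token ratio: {stats.cached_token_ratio:.2f}")
print(f"mean latency: {stats.mean_latency_s:.2f}s")
```

Tracking the token ratio alongside the hit rate helps distinguish many small hits from a few large ones.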

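By implementing these practices, you can take full advantage of prompt caching and keep your applications both responsive and cost-efficient. A well-managed caching strategy can significantly reduce processing time, lower costs, and help maintain a smooth user experience.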