How to format inputs to ChatGPT models

Mar 1, 2023

ChatGPT is powered by gpt-3.5-turbo and gpt-4, OpenAI's most advanced models.

You can build your own applications with gpt-3.5-turbo or gpt-4 using the OpenAI API.

Chat models take a list of messages as input and return an AI-written message as output.

This guide illustrates the chat format with a few example API calls.

# if needed, install and/or upgrade to the latest version of the OpenAI Python library
%pip install --upgrade openai
# import the OpenAI Python library for calling the OpenAI API,
# plus json (to pretty-print responses below) and os (to read the API key)
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))

2. An example chat completion API call

A chat completion API call takes the following parameters. Required:

  • model: the name of the model you want to use (e.g., gpt-3.5-turbo, gpt-4, gpt-3.5-turbo-16k-1106)
  • messages: a list of message objects, where each object has two required fields:
    • role: the role of the messenger (one of system, user, assistant, or tool)
    • content: the content of the message (e.g., Write me a beautiful poem)

Messages can also contain an optional name field, which gives the messenger a name, e.g., example-user, Alice, BlackbeardBot. Names may not contain spaces.
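
For instance, a message carrying the optional name field might look like this (the name and content are illustrative):

# an illustrative message list in which one message carries the optional "name" field
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "name": "example-user", "content": "Write me a beautiful poem"},
]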

Optional (several of these appear in the sketch following this list):

  • frequency_penalty: Penalizes tokens based on their frequency, reducing repetition.
  • logit_bias: Modifies the likelihood of specified tokens with bias values.
  • logprobs: Returns the log probabilities of the output tokens if true.
  • top_logprobs: Specifies the number of most likely tokens to return at each position.
  • max_tokens: Sets the maximum number of tokens generated in the chat completion.
  • n: Generates a specified number of chat completion choices for each input.
  • presence_penalty: Penalizes new tokens based on their presence in the text so far.
  • response_format: Specifies the output format, e.g., JSON mode.
  • seed: Ensures deterministic sampling with a specified seed.
  • stop: Specifies up to 4 sequences where the API should stop generating tokens.
  • stream: Sends partial message deltas as tokens become available.
  • temperature: Sets the sampling temperature between 0 and 2.
  • top_p: Uses nucleus sampling; considers tokens with top_p probability mass.
  • tools: Lists the functions the model may call.
  • tool_choice: Controls the model's function calls (none/auto/function).
  • user: A unique identifier for end-user monitoring and abuse detection.
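
To make these concrete, here is a minimal sketch that combines several of the optional parameters above (the prompt and values are illustrative, not recommendations):

# a sketch combining several optional parameters (illustrative values)
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
    temperature=0.7,  # sampling temperature between 0 and 2
    max_tokens=50,    # cap the number of generated tokens per completion
    n=3,              # return three completion choices
    seed=42,          # request best-effort deterministic sampling
    stop=["\n\n"],    # stop generating at a double newline
)
for choice in response.choices:
    print(choice.message.content)

Similarly, setting stream=True sends partial message deltas as tokens become available:

# streaming the same kind of request, printing tokens as they arrive
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="")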

As of January 2024, you can also optionally submit a list of functions that tell GPT whether it can generate JSON to feed into a function. For details, see the documentation, the API reference, or the Cookbook guide How to call functions with chat models.
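
As a sketch of what that looks like, the tools parameter pairs each function name with a JSON Schema describing its arguments; get_current_weather below is a hypothetical function used purely for illustration:

# a sketch of the tools parameter, using a hypothetical get_current_weather function
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",  # hypothetical function, for illustration only
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City and country, e.g., Paris, France"},
                },
                "required": ["location"],
            },
        },
    }
]
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the function
)
print(response.choices[0].message.tool_calls)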

Typically, a conversation will start with a system message that tells the assistant how to behave, followed by alternating user and assistant messages, but you are not required to follow this format.

Let's look at an example chat API call to see how the chat format works in practice.

# Example OpenAI Python library request
MODEL = "gpt-3.5-turbo"
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."},
        {"role": "assistant", "content": "Who's there?"},
        {"role": "user", "content": "Orange."},
    ],
    temperature=0,
)
print(json.dumps(json.loads(response.model_dump_json()), indent=4))
{
    "id": "chatcmpl-8dee9DuEFcg2QILtT2a6EBXZnpirM",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "Orange who?",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1704461729,
    "model": "gpt-3.5-turbo-0613",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 3,
        "prompt_tokens": 35,
        "total_tokens": 38
    }
}

As you can see, the response object has a few fields (a short access sketch follows the list):

  • id: the ID of the request
  • choices: a list of completion objects (only one, unless you set n greater than 1)
    • finish_reason: the reason the model stopped generating text (either stop, or length if the max_tokens limit was reached)
    • index: the index of this choice in the list of choices
    • logprobs: log probability information for the choice
    • message: the message object generated by the model
      • content: the content of the message
      • role: the role of the author of this message
      • tool_calls: the tool calls generated by the model, such as function calls, if tools were provided
  • created: the timestamp of the request
  • model: the full name of the model used to generate the response
  • object: the type of object returned (e.g., chat.completion)
  • system_fingerprint: a fingerprint representing the backend configuration the model ran with
  • usage: the number of tokens used to generate the reply, counting prompt, completion, and total
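
Each field can be read directly as an attribute on the response object. For instance, using the response above:

# reading a few fields off the response object from the example above
print(response.id)                        # "chatcmpl-..."
print(response.model)                     # "gpt-3.5-turbo-0613"
print(response.choices[0].finish_reason)  # "stop"
print(response.usage.total_tokens)        # 38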

To extract just the reply, use:

response.choices[0].message.content
'Orange who?'

Even non-conversation-based tasks can fit into the chat format by placing the instruction in the first user message.

For example, to ask the model to explain asynchronous programming in the style of the pirate Blackbeard, we can structure the conversation as follows:

# example with a system message
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
Arr, me matey! Let me tell ye a tale of asynchronous programming, in the style of the fearsome pirate Blackbeard!

Picture this, me hearties. In the vast ocean of programming, there be times when ye need to perform multiple tasks at once. But fear not, for asynchronous programming be here to save the day!

Ye see, in traditional programming, ye be waitin' for one task to be done before movin' on to the next. But with asynchronous programming, ye can be takin' care of multiple tasks at the same time, just like a pirate multitaskin' on the high seas!

Instead of waitin' for a task to be completed, ye can be sendin' it off on its own journey, while ye move on to the next task. It be like havin' a crew of trusty sailors, each takin' care of their own duties, without waitin' for the others.

Now, ye may be wonderin', how does this sorcery work? Well, me matey, it be all about callbacks and promises. When ye be sendin' off a task, ye be attachin' a callback function to it. This be like leavin' a message in a bottle, tellin' the task what to do when it be finished.

While the task be sailin' on its own, ye can be movin' on to the next task, without wastin' any precious time. And when the first task be done, it be sendin' a signal back to ye, lettin' ye know it be finished. Then ye can be takin' care of the callback function, like openin' the bottle and readin' the message inside.

But wait, there be more! With promises, ye can be makin' even fancier arrangements. Instead of callbacks, ye be makin' a promise that the task will be completed. It be like a contract between ye and the task, swearin' that it will be done.

Ye can be attachin' multiple promises to a task, promisin' different outcomes. And when the task be finished, it be fulfillin' the promises, lettin' ye know it be done. Then ye can be handlin' the fulfillments, like collectin' the rewards of yer pirate adventures!

So, me hearties, that be the tale of asynchronous programming, told in the style of the fearsome pirate Blackbeard! With callbacks and promises, ye can be takin' care of multiple tasks at once, just like a pirate conquerin' the seven seas!
# example without a system message
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
Arr, me hearties! Gather 'round and listen up, for I be tellin' ye about the mysterious art of asynchronous programming, in the style of the fearsome pirate Blackbeard!

Now, ye see, in the world of programming, there be times when we need to perform tasks that take a mighty long time to complete. These tasks might involve fetchin' data from the depths of the internet, or performin' complex calculations that would make even Davy Jones scratch his head.

In the olden days, we pirates used to wait patiently for each task to finish afore movin' on to the next one. But that be a waste of precious time, me hearties! We be pirates, always lookin' for ways to be more efficient and plunder more booty!

That be where asynchronous programming comes in, me mateys. It be a way to tackle multiple tasks at once, without waitin' for each one to finish afore movin' on. It be like havin' a crew of scallywags workin' on different tasks simultaneously, while ye be overseein' the whole operation.

Ye see, in asynchronous programming, we be breakin' down our tasks into smaller chunks called "coroutines." Each coroutine be like a separate pirate, workin' on its own task. When a coroutine be startin' its work, it don't wait for the task to finish afore movin' on to the next one. Instead, it be movin' on to the next task, lettin' the first one continue in the background.

Now, ye might be wonderin', "But Blackbeard, how be we know when a task be finished if we don't wait for it?" Ah, me hearties, that be where the magic of callbacks and promises come in!

When a coroutine be startin' its work, it be attachin' a callback or a promise to it. This be like leavin' a message in a bottle, tellin' the coroutine what to do when it be finished. So, while the coroutine be workin' away, the rest of the crew be movin' on to other tasks, plunderin' more booty along the way.

When a coroutine be finished with its task, it be sendin' a signal to the callback or fulfillin' the promise, lettin' the rest of the crew know that it be done. Then, the crew can gather 'round and handle the results of the completed task, celebratin' their victory and countin' their plunder.

So, me hearties, asynchronous programming be like havin' a crew of pirates workin' on different tasks at once, without waitin' for each one to finish afore movin' on. It be a way to be more efficient, plunder more booty, and conquer the vast seas of programming!

Now, set sail, me mateys, and embrace the power of asynchronous programming like true pirates of the digital realm! Arr!

System messages

The system message can be used to prime the assistant with different personalities or behaviors.

Be aware that gpt-3.5-turbo-0301 does not generally pay as much attention to the system message as gpt-4-0314 or gpt-3.5-turbo-0613. Therefore, for gpt-3.5-turbo-0301, we recommend placing important instructions in the user message instead. Some developers have found success by continually moving the system message near the end of the conversation, to keep the model's attention from drifting away as conversations get longer.
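
One way to implement that tactic is a small helper that re-inserts the system instruction just before the latest message; this is a sketch of the idea, not an official recipe:

# sketch: keep the system instruction near the end of a long conversation
def refresh_system_message(messages, system_content):
    """Drop earlier system messages and re-insert the instruction before the last message."""
    rest = [m for m in messages if m["role"] != "system"]
    return rest[:-1] + [{"role": "system", "content": system_content}] + rest[-1:]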

# An example of a system message that primes the assistant to explain concepts in great depth
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a friendly and helpful teaching assistant. You explain concepts in great depth using simple terms, and you give examples to help people learn. At the end of each explanation, you ask a question to check for understanding"},
        {"role": "user", "content": "Can you explain how fractions work?"},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
Of course! Fractions are a way to represent parts of a whole. They are made up of two numbers: a numerator and a denominator. The numerator tells you how many parts you have, and the denominator tells you how many equal parts make up the whole.

Let's take an example to understand this better. Imagine you have a pizza that is divided into 8 equal slices. If you eat 3 slices, you can represent that as the fraction 3/8. Here, the numerator is 3 because you ate 3 slices, and the denominator is 8 because the whole pizza is divided into 8 slices.

Fractions can also be used to represent numbers less than 1. For example, if you eat half of a pizza, you can write it as 1/2. Here, the numerator is 1 because you ate one slice, and the denominator is 2 because the whole pizza is divided into 2 equal parts.

Now, let's talk about equivalent fractions. Equivalent fractions are different fractions that represent the same amount. For example, 1/2 and 2/4 are equivalent fractions because they both represent half of something. To find equivalent fractions, you can multiply or divide both the numerator and denominator by the same number.

Here's a question to check your understanding: If you have a cake divided into 12 equal slices and you eat 4 slices, what fraction of the cake did you eat?
# An example of a system message that primes the assistant to give brief, to-the-point answers
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a laconic assistant. You reply with brief, to-the-point answers with no elaboration."},
        {"role": "user", "content": "Can you explain how fractions work?"},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
Fractions represent parts of a whole. They have a numerator (top number) and a denominator (bottom number).

Few-shot prompting

In some cases, it's easier to show the model what you want rather than tell the model what you want.

One way to show the model what you want is with faked example messages.

For example:

# An example of a faked few-shot conversation to prime the model into translating business jargon to simpler speech
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful, pattern-following assistant."},
        {"role": "user", "content": "Help me translate the following corporate jargon into plain English."},
        {"role": "assistant", "content": "Sure, I'd be happy to!"},
        {"role": "user", "content": "New synergies will help drive top-line growth."},
        {"role": "assistant", "content": "Things working well together will increase revenue."},
        {"role": "user", "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage."},
        {"role": "assistant", "content": "Let's talk later when we're less busy about how to do better."},
        {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
This sudden change in direction means we don't have enough time to complete the entire project for the client.

To help clarify that the example messages are not part of a real conversation and shouldn't be referred back to by the model, you can try setting the name field of the system messages to example_user and example_assistant.

Transforming the few-shot example above, we could write:

# The business jargon translation example, but with example names for the example messages
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English."},
        {"role": "system", "name":"example_user", "content": "New synergies will help drive top-line growth."},
        {"role": "system", "name": "example_assistant", "content": "Things working well together will increase revenue."},
        {"role": "system", "name":"example_user", "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage."},
        {"role": "system", "name": "example_assistant", "content": "Let's talk later when we're less busy about how to do better."},
        {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
This sudden change in direction means we don't have enough time to complete the entire project for the client.

Not every attempt at engineering conversations will succeed on the first try.

If your first attempts fail, don't be afraid to experiment with different ways of priming or conditioning the model.

As an example, one developer discovered an increase in accuracy when they inserted a user message that said "Great job so far, these have been perfect" to help condition the model into providing higher-quality responses.
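
A sketch of that tactic, with the conditioning message placed between the example messages and the real request (the structure here is illustrative):

# sketch: a conditioning user message inserted before the real request
messages = [
    {"role": "system", "content": "You are a helpful, pattern-following assistant."},
    # ...few-shot example messages would go here...
    {"role": "user", "content": "Great job so far, these have been perfect."},
    {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
]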

For more ideas on how to lift the reliability of the model, consider reading our guide on techniques to increase reliability. It was written for non-chat models, but many of its principles still apply.

4. Counting tokens

When you submit your request, the API transforms the messages into a sequence of tokens.

The number of tokens used affects:

  • the cost of the request
  • the time it takes to generate the response
  • when the reply gets cut off for hitting the maximum token limit (4,096 for gpt-3.5-turbo or 8,192 for gpt-4); a quick truncation check is sketched after this list
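
On the last point, a truncated reply can be detected by checking finish_reason. A minimal sketch, deliberately forcing truncation with a tiny max_tokens:

# sketch: detect a reply cut off by the max_tokens limit
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a long story about the sea."}],
    max_tokens=16,  # deliberately small, to force truncation
)
if response.choices[0].finish_reason == "length":
    print("The reply was cut off by the max_tokens limit.")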

You can use the following function to count the number of tokens that a list of messages will use.

Note that the exact way tokens are counted from messages may change from model to model. Consider the counts from the function below an estimate, not a timeless guarantee.

In particular, requests that use the optional functions input will consume extra tokens on top of the estimates calculated below.

Read more about counting tokens in How to count tokens with tiktoken.

import tiktoken


def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
        }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
# let's verify the function above matches the OpenAI API response
example_messages = [
    {
        "role": "system",
        "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "New synergies will help drive top-line growth.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Things working well together will increase revenue.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Let's talk later when we're less busy about how to do better.",
    },
    {
        "role": "user",
        "content": "This late pivot means we don't have time to boil the ocean for the client deliverable.",
    },
]

for model in [
    # "gpt-3.5-turbo-0301",
    # "gpt-4-0314",
    # "gpt-4-0613",
    "gpt-3.5-turbo-1106",
    "gpt-3.5-turbo",
    "gpt-4",
    "gpt-4-1106-preview",
    ]:
    print(model)
    # example token count from the function defined above
    print(f"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().")
    # example token count from the OpenAI API
    response = client.chat.completions.create(
        model=model,
        messages=example_messages,
        temperature=0,
        max_tokens=1,  # we're only counting input tokens, so keep the output minimal
    )
    token = response.usage.prompt_tokens
    print(f"{token} prompt tokens counted by the OpenAI API.")
    print()
gpt-3.5-turbo-1106
Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

gpt-3.5-turbo
Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

gpt-4
Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

gpt-4-1106-preview
Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.