How to stream completions

Sep 2, 2022

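By default, when you request a completion from OpenAI, the entire completion is generated before being sent back in a single response.

If you're generating long completions, waiting for the response can take several seconds.

To get responses sooner, you can "stream" the completion as it's being generated. This lets you start printing or processing the beginning of the completion before the full completion is finished.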

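To stream completions, set stream=True when calling the chat completions or completions endpoints. This returns an object that streams back the response as data-only server-sent events. Extract chunks from the delta field rather than the message field.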
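To make the "data-only server-sent events" wording concrete, here is a minimal sketch of how such a stream could be decoded by hand. The raw payload below is a simplified, hypothetical example (real events carry more fields), and the helper names are illustrative only; the openai Python client does this parsing for you.

```python
import json

# Hypothetical, simplified raw body of a streamed response. Each event is a
# single "data: ..." line followed by a blank line; "[DONE]" marks the end.
RAW_STREAM = (
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}\n\n'
    'data: {"choices": [{"index": 0, "delta": {"content": "Two"}}]}\n\n'
    'data: {"choices": [{"index": 0, "delta": {"content": "."}}]}\n\n'
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}\n\n'
    'data: [DONE]\n\n'
)


def iter_sse_payloads(raw: str):
    """Yield the decoded JSON payload of each data-only server-sent event."""
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue  # skip the blank lines that separate events
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(data)


def collect_content(raw: str) -> str:
    """Join the content carried in each event's delta field."""
    pieces = []
    for payload in iter_sse_payloads(raw):
        for choice in payload["choices"]:
            content = choice["delta"].get("content")
            if content:
                pieces.append(content)
    return "".join(pieces)


streamed_text = collect_content(RAW_STREAM)
print(streamed_text)
```

Note that the text lives under delta in every event; there is no message field to read until you assemble one yourself.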

Downsides

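Note that using stream=True in a production application makes it harder to moderate the content of the completions, because partial completions can be more difficult to evaluate. This may have implications for approved usage.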
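One possible way to soften this downside is to buffer the stream and only release text in sentence-sized pieces that have passed a check. The sketch below is an assumption about how that might look, not a recommended moderation design: release_checked_sentences and no_blocked_words are hypothetical names, and the word check is a placeholder for whatever moderation step your application actually uses.

```python
from typing import Callable, Iterable, Iterator


def release_checked_sentences(
    pieces: Iterable[str],
    is_allowed: Callable[[str], bool],
    terminators: str = ".!?",
) -> Iterator[str]:
    """Buffer streamed text and yield it one checked sentence at a time.

    Stops yielding as soon as a sentence fails the check.
    """
    buffer = ""
    for piece in pieces:
        buffer += piece
        # Release every complete sentence currently in the buffer.
        while True:
            cut = min(
                (i for i in (buffer.find(t) for t in terminators) if i != -1),
                default=-1,
            )
            if cut == -1:
                break
            sentence, buffer = buffer[: cut + 1], buffer[cut + 1:]
            if not is_allowed(sentence):
                return
            yield sentence
    # Check whatever is left when the stream ends.
    if buffer and is_allowed(buffer):
        yield buffer


# Hypothetical check: a real application would call its own moderation step here.
def no_blocked_words(text: str) -> bool:
    return "forbidden" not in text.lower()


stream = ["Hel", "lo the", "re. This is f", "ine! But this is forb", "idden. More text"]
released = list(release_checked_sentences(stream, no_blocked_words))
print(released)
```

The trade-off is latency: the user sees nothing until a full sentence has arrived and been checked, which gives back part of what streaming gained.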

Example code

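Below, this notebook shows:

  1. What a typical chat completion response looks like
  2. What a streaming chat completion response looks like
  3. How much time is saved by streaming a chat completion
  4. How to get token usage data for a streamed chat completion response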
# !pip install openai
# imports
import time  # for measuring time duration of API calls
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))

1. What a typical chat completion response looks like

# Example of an OpenAI ChatCompletion request
# https://platform.openai.com/docs/guides/text-generation/chat-completions-api

# record the time before the request is sent
start_time = time.time()

# send a ChatCompletion request to count to 100
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
    temperature=0,
)
# calculate the time it took to receive the response
response_time = time.time() - start_time

# print the time delay and text received
print(f"Full response received {response_time:.2f} seconds after request")
print(f"Full response received:\n{response}")
Full response received 1.88 seconds after request
Full response received:
ChatCompletion(id='chatcmpl-9lMgdoiMfxVHPDNVCtvXuTWcQ2GGb', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100', role='assistant', function_call=None, tool_calls=None))], created=1721075651, model='gpt-july-test', object='chat.completion', system_fingerprint='fp_e9b8ed65d2', usage=CompletionUsage(completion_tokens=298, prompt_tokens=36, total_tokens=334))

The reply can be extracted with response.choices[0].message.

The content of the reply can be extracted with response.choices[0].message.content.

reply = response.choices[0].message
print(f"Extracted reply: \n{reply}")

reply_content = response.choices[0].message.content
print(f"Extracted content: \n{reply_content}")
Extracted reply: 
ChatCompletionMessage(content='1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100', role='assistant', function_call=None, tool_calls=None)
Extracted content: 
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100

2. How to stream a chat completion

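With a streaming API call, the response is sent back incrementally in chunks via an event stream. In Python, you can iterate over these events with a for loop.

Let's see what it looks like: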

# Example of an OpenAI ChatCompletion request with stream=True
# https://platform.openai.com/docs/api-reference/streaming#chat/create-stream

# a ChatCompletion request
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[
        {'role': 'user', 'content': "What's 1+1? Answer in one word."}
    ],
    temperature=0,
    stream=True  # this time, we set stream=True
)

for chunk in response:
    print(chunk)
    print(chunk.choices[0].delta.content)
    print("****************")
ChatCompletionChunk(id='chatcmpl-9lMgfRSWPHcw51s6wxKT1YEO2CKpd', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1721075653, model='gpt-july-test', object='chat.completion.chunk', system_fingerprint='fp_e9b8ed65d2', usage=None)

****************
ChatCompletionChunk(id='chatcmpl-9lMgfRSWPHcw51s6wxKT1YEO2CKpd', choices=[Choice(delta=ChoiceDelta(content='Two', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1721075653, model='gpt-july-test', object='chat.completion.chunk', system_fingerprint='fp_e9b8ed65d2', usage=None)
Two
****************
ChatCompletionChunk(id='chatcmpl-9lMgfRSWPHcw51s6wxKT1YEO2CKpd', choices=[Choice(delta=ChoiceDelta(content='.', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1721075653, model='gpt-july-test', object='chat.completion.chunk', system_fingerprint='fp_e9b8ed65d2', usage=None)
.
****************
ChatCompletionChunk(id='chatcmpl-9lMgfRSWPHcw51s6wxKT1YEO2CKpd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1721075653, model='gpt-july-test', object='chat.completion.chunk', system_fingerprint='fp_e9b8ed65d2', usage=None)
None
****************

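As you can see above, streaming responses have a delta field rather than a message field. delta can hold things like:

  • a role token (e.g., {"role": "assistant"})
  • a content token (e.g., {"content": "\n\n"})
  • nothing (e.g., {}), when the stream is over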
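Because each delta may carry a role, some content, or nothing at all, rebuilding the full message means merging them while skipping the empty ones. The following is a minimal sketch under the assumption that chunks look like the ones printed above; make_chunk and assemble_message are hypothetical helpers, and the stand-in objects replace real API chunks so the example runs offline.

```python
from types import SimpleNamespace


def make_chunk(role=None, content=None, finish_reason=None):
    """Build a stand-in object shaped like the chunks printed above."""
    delta = SimpleNamespace(role=role, content=content)
    choice = SimpleNamespace(delta=delta, finish_reason=finish_reason, index=0)
    return SimpleNamespace(choices=[choice])


def assemble_message(chunks):
    """Merge streamed deltas into one message dict plus the finish reason."""
    role = None
    parts = []
    finish_reason = None
    for chunk in chunks:
        choice = chunk.choices[0]
        if choice.delta.role is not None:
            role = choice.delta.role  # usually only on the first chunk
        if choice.delta.content is not None:
            parts.append(choice.delta.content)  # skip the empty final delta
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    return {"role": role, "content": "".join(parts)}, finish_reason


fake_stream = [
    make_chunk(role="assistant", content=""),
    make_chunk(content="Two"),
    make_chunk(content="."),
    make_chunk(finish_reason="stop"),
]
message, finish_reason = assemble_message(fake_stream)
print(message, finish_reason)
```

The same function should work on a real stream, since it only reads the choices, delta, role, content, and finish_reason attributes shown in the output above.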

3. How much time is saved by streaming a chat completion

# Example of an OpenAI ChatCompletion request with stream=True
# https://platform.openai.com/docs/api-reference/streaming#chat/create-stream

# record the time before the request is sent
start_time = time.time()

# send a ChatCompletion request to count to 100
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
    temperature=0,
    stream=True  # again, we set stream=True
)
# create variables to collect the stream of chunks
collected_chunks = []
collected_messages = []
# iterate through the stream of events
for chunk in response:
    chunk_time = time.time() - start_time  # calculate the time delay of the chunk
    collected_chunks.append(chunk)  # save the event response
    chunk_message = chunk.choices[0].delta.content  # extract the message
    collected_messages.append(chunk_message)  # save the message
    print(f"Message received {chunk_time:.2f} seconds after request: {chunk_message}")  # print the delay and text

# print the time delay and text received
print(f"Full response received {chunk_time:.2f} seconds after request")
# clean None in collected_messages
collected_messages = [m for m in collected_messages if m is not None]
full_reply_content = ''.join(collected_messages)
print(f"Full conversation received: {full_reply_content}")
Message received 1.14 seconds after request: 
Message received 1.14 seconds after request: 1
Message received 1.14 seconds after request: ,
Message received 1.14 seconds after request:  
Message received 1.14 seconds after request: 2
Message received 1.16 seconds after request: ,
Message received 1.16 seconds after request:  
Message received 1.16 seconds after request: 3
Message received 1.35 seconds after request: ,
Message received 1.35 seconds after request:  
Message received 1.35 seconds after request: 4
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 5
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 6
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 7
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 8
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 9
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 10
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 11
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 12
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.45 seconds after request: 13
Message received 1.45 seconds after request: ,
Message received 1.45 seconds after request:  
Message received 1.45 seconds after request: 14
Message received 1.45 seconds after request: ,
Message received 1.45 seconds after request:  
Message received 1.45 seconds after request: 15
Message received 1.45 seconds after request: ,
Message received 1.45 seconds after request:  
Message received 1.46 seconds after request: 16
Message received 1.46 seconds after request: ,
Message received 1.46 seconds after request:  
Message received 1.47 seconds after request: 17
Message received 1.47 seconds after request: ,
Message received 1.47 seconds after request:  
Message received 1.49 seconds after request: 18
Message received 1.49 seconds after request: ,
Message received 1.49 seconds after request:  
Message received 1.52 seconds after request: 19
Message received 1.52 seconds after request: ,
Message received 1.52 seconds after request:  
Message received 1.53 seconds after request: 20
Message received 1.53 seconds after request: ,
Message received 1.53 seconds after request:  
Message received 1.55 seconds after request: 21
Message received 1.55 seconds after request: ,
Message received 1.55 seconds after request:  
Message received 1.56 seconds after request: 22
Message received 1.56 seconds after request: ,
Message received 1.56 seconds after request:  
Message received 1.58 seconds after request: 23
Message received 1.58 seconds after request: ,
Message received 1.58 seconds after request:  
Message received 1.59 seconds after request: 24
Message received 1.59 seconds after request: ,
Message received 1.59 seconds after request:  
Message received 1.62 seconds after request: 25
Message received 1.62 seconds after request: ,
Message received 1.62 seconds after request:  
Message received 1.62 seconds after request: 26
Message received 1.62 seconds after request: ,
Message received 1.62 seconds after request:  
Message received 1.65 seconds after request: 27
Message received 1.65 seconds after request: ,
Message received 1.65 seconds after request:  
Message received 1.67 seconds after request: 28
Message received 1.67 seconds after request: ,
Message received 1.67 seconds after request:  
Message received 1.69 seconds after request: 29
Message received 1.69 seconds after request: ,
Message received 1.69 seconds after request:  
Message received 1.80 seconds after request: 30
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.80 seconds after request: 31
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.80 seconds after request: 32
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.80 seconds after request: 33
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.80 seconds after request: 34
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.80 seconds after request: 35
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.80 seconds after request: 36
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.82 seconds after request: 37
Message received 1.82 seconds after request: ,
Message received 1.82 seconds after request:  
Message received 1.83 seconds after request: 38
Message received 1.83 seconds after request: ,
Message received 1.83 seconds after request:  
Message received 1.84 seconds after request: 39
Message received 1.84 seconds after request: ,
Message received 1.84 seconds after request:  
Message received 1.87 seconds after request: 40
Message received 1.87 seconds after request: ,
Message received 1.87 seconds after request:  
Message received 1.88 seconds after request: 41
Message received 1.88 seconds after request: ,
Message received 1.88 seconds after request:  
Message received 1.91 seconds after request: 42
Message received 1.91 seconds after request: ,
Message received 1.91 seconds after request:  
Message received 1.93 seconds after request: 43
Message received 1.93 seconds after request: ,
Message received 1.93 seconds after request:  
Message received 1.93 seconds after request: 44
Message received 1.93 seconds after request: ,
Message received 1.93 seconds after request:  
Message received 1.95 seconds after request: 45
Message received 1.95 seconds after request: ,
Message received 1.95 seconds after request:  
Message received 2.00 seconds after request: 46
Message received 2.00 seconds after request: ,
Message received 2.00 seconds after request:  
Message received 2.00 seconds after request: 47
Message received 2.00 seconds after request: ,
Message received 2.00 seconds after request:  
Message received 2.00 seconds after request: 48
Message received 2.00 seconds after request: ,
Message received 2.00 seconds after request:  
Message received 2.00 seconds after request: 49
Message received 2.00 seconds after request: ,
Message received 2.00 seconds after request:  
Message received 2.00 seconds after request: 50
Message received 2.00 seconds after request: ,
Message received 2.00 seconds after request:  
Message received 2.00 seconds after request: 51
Message received 2.00 seconds after request: ,
Message received 2.04 seconds after request:  
Message received 2.04 seconds after request: 52
Message received 2.04 seconds after request: ,
Message received 2.04 seconds after request:  
Message received 2.04 seconds after request: 53
Message received 2.04 seconds after request: ,
Message received 2.13 seconds after request:  
Message received 2.13 seconds after request: 54
Message received 2.14 seconds after request: ,
Message received 2.14 seconds after request:  
Message received 2.14 seconds after request: 55
Message received 2.14 seconds after request: ,
Message received 2.14 seconds after request:  
Message received 2.14 seconds after request: 56
Message received 2.14 seconds after request: ,
Message received 2.14 seconds after request:  
Message received 2.16 seconds after request: 57
Message received 2.16 seconds after request: ,
Message received 2.16 seconds after request:  
Message received 2.17 seconds after request: 58
Message received 2.17 seconds after request: ,
Message received 2.17 seconds after request:  
Message received 2.19 seconds after request: 59
Message received 2.19 seconds after request: ,
Message received 2.19 seconds after request:  
Message received 2.21 seconds after request: 60
Message received 2.21 seconds after request: ,
Message received 2.21 seconds after request:  
Message received 2.34 seconds after request: 61
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.34 seconds after request: 62
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.34 seconds after request: 63
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.34 seconds after request: 64
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.34 seconds after request: 65
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.34 seconds after request: 66
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.34 seconds after request: 67
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.36 seconds after request: 68
Message received 2.36 seconds after request: ,
Message received 2.36 seconds after request:  
Message received 2.36 seconds after request: 69
Message received 2.36 seconds after request: ,
Message received 2.36 seconds after request:  
Message received 2.38 seconds after request: 70
Message received 2.38 seconds after request: ,
Message received 2.38 seconds after request:  
Message received 2.39 seconds after request: 71
Message received 2.39 seconds after request: ,
Message received 2.39 seconds after request:  
Message received 2.39 seconds after request: 72
Message received 2.39 seconds after request: ,
Message received 2.39 seconds after request:  
Message received 2.39 seconds after request: 73
Message received 2.39 seconds after request: ,
Message received 2.39 seconds after request:  
Message received 2.39 seconds after request: 74
Message received 2.39 seconds after request: ,
Message received 2.39 seconds after request:  
Message received 2.39 seconds after request: 75
Message received 2.39 seconds after request: ,
Message received 2.40 seconds after request:  
Message received 2.40 seconds after request: 76
Message received 2.40 seconds after request: ,
Message received 2.42 seconds after request:  
Message received 2.42 seconds after request: 77
Message received 2.42 seconds after request: ,
Message received 2.51 seconds after request:  
Message received 2.51 seconds after request: 78
Message received 2.51 seconds after request: ,
Message received 2.52 seconds after request:  
Message received 2.52 seconds after request: 79
Message received 2.52 seconds after request: ,
Message received 2.52 seconds after request:  
Message received 2.52 seconds after request: 80
Message received 2.52 seconds after request: ,
Message received 2.52 seconds after request:  
Message received 2.52 seconds after request: 81
Message received 2.52 seconds after request: ,
Message received 2.52 seconds after request:  
Message received 2.52 seconds after request: 82
Message received 2.52 seconds after request: ,
Message received 2.60 seconds after request:  
Message received 2.60 seconds after request: 83
Message received 2.60 seconds after request: ,
Message received 2.64 seconds after request:  
Message received 2.64 seconds after request: 84
Message received 2.64 seconds after request: ,
Message received 2.64 seconds after request:  
Message received 2.64 seconds after request: 85
Message received 2.64 seconds after request: ,
Message received 2.64 seconds after request:  
Message received 2.66 seconds after request: 86
Message received 2.66 seconds after request: ,
Message received 2.66 seconds after request:  
Message received 2.66 seconds after request: 87
Message received 2.66 seconds after request: ,
Message received 2.66 seconds after request:  
Message received 2.68 seconds after request: 88
Message received 2.68 seconds after request: ,
Message received 2.68 seconds after request:  
Message received 2.69 seconds after request: 89
Message received 2.69 seconds after request: ,
Message received 2.69 seconds after request:  
Message received 2.72 seconds after request: 90
Message received 2.72 seconds after request: ,
Message received 2.72 seconds after request:  
Message received 2.82 seconds after request: 91
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 92
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 93
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 94
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 95
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 96
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 97
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 98
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 99
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 100
Message received 2.82 seconds after request: None
Full response received 2.82 seconds after request
Full conversation received: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100

Time comparison

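In the example above, the non-streaming request took about 1.9 seconds and the streaming request about 2.8 seconds to fully complete. Request times will vary depending on load and other stochastic factors.

However, with the streaming request, we received the first token after about 1.1 seconds, and subsequent tokens every ~0.01-0.02 seconds.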
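If you want to quantify this rather than read it off the log, you could record each chunk's arrival time and summarize it afterwards. The sketch below assumes you have such a list of timestamps (seconds after the request); summarize_latency is a hypothetical helper, and the sample values are a handful of timestamps picked from the log above, so the mean gap it prints is only illustrative.

```python
def summarize_latency(arrival_times):
    """Return (time to first chunk, time to last chunk, mean gap between chunks)."""
    if not arrival_times:
        raise ValueError("need at least one chunk timestamp")
    first = arrival_times[0]
    total = arrival_times[-1]
    # Differences between consecutive arrival times.
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return first, total, mean_gap


# A few arrival times (seconds after the request) sampled from the log above.
sample_times = [1.14, 1.14, 1.16, 1.35, 1.36, 1.45, 1.80, 2.00, 2.34, 2.82]
first, total, mean_gap = summarize_latency(sample_times)
print(f"first chunk: {first:.2f}s, last chunk: {total:.2f}s, mean gap: {mean_gap:.3f}s")
```

In the streaming loop, appending chunk_time to a list on every iteration would give you the full set of timestamps to pass in.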

4. How to get token usage data for a streamed chat completion response

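You can get token usage statistics for your streamed response by setting stream_options={"include_usage": True}. When you do so, an extra chunk is streamed as the final chunk. You can access the usage data for the entire request via the usage field on this chunk. A few important notes when you set stream_options={"include_usage": True}:

  • The value of the usage field on all chunks except the last one will be null.
  • The usage field on the last chunk contains token usage statistics for the entire request.
  • The choices field on the last chunk will always be an empty array [].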

Let's see how it works using the example from Section 2.

# Example of an OpenAI ChatCompletion request with stream=True and stream_options={"include_usage": True}

# a ChatCompletion request
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[
        {'role': 'user', 'content': "What's 1+1? Answer in one word."}
    ],
    temperature=0,
    stream=True,
    stream_options={"include_usage": True}, # retrieving token usage for stream response
)

for chunk in response:
    print(f"choices: {chunk.choices}\nusage: {chunk.usage}")
    print("****************")
choices: [Choice(delta=ChoiceDelta(content='', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)]
usage: None
****************
choices: [Choice(delta=ChoiceDelta(content='Two', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)]
usage: None
****************
choices: [Choice(delta=ChoiceDelta(content='.', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)]
usage: None
****************
choices: [Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)]
usage: None
****************
choices: []
usage: CompletionUsage(completion_tokens=2, prompt_tokens=18, total_tokens=20)
****************
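Since the final chunk has an empty choices list, a loop that indexes chunk.choices[0] on every chunk would raise an IndexError once include_usage is turned on. Here is a minimal sketch of a loop that guards against that; make_chunk and collect_text_and_usage are hypothetical helpers, and the stand-in objects mimic the chunks printed above so the example runs without an API call.

```python
from types import SimpleNamespace


def make_chunk(content=None, usage=None, has_choice=True):
    """Build a stand-in object shaped like the chunks printed above."""
    choices = []
    if has_choice:
        delta = SimpleNamespace(content=content)
        choices.append(SimpleNamespace(delta=delta))
    return SimpleNamespace(choices=choices, usage=usage)


def collect_text_and_usage(chunks):
    """Return the joined content and the usage object from the final chunk."""
    parts = []
    usage = None
    for chunk in chunks:
        if chunk.usage is not None:
            usage = chunk.usage  # only the last chunk carries usage
        if not chunk.choices:
            continue  # the usage chunk has choices == []
        content = chunk.choices[0].delta.content
        if content is not None:
            parts.append(content)
    return "".join(parts), usage


fake_usage = SimpleNamespace(completion_tokens=2, prompt_tokens=18, total_tokens=20)
fake_stream = [
    make_chunk(content=""),
    make_chunk(content="Two"),
    make_chunk(content="."),
    make_chunk(content=None),
    make_chunk(usage=fake_usage, has_choice=False),
]
text, usage = collect_text_and_usage(fake_stream)
print(text, usage.total_tokens)
```

If the stream is cut off before the last chunk arrives, usage stays None, so callers should be prepared for that case.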