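Our traditional TTS API cannot steer the voice of the generated audio. For example, if you want to convert a piece of text into audio, you cannot give any specific instructions about how that audio should be delivered.
With audio chat completions, you can give specific instructions before the audio is generated. This lets you tell the API to speak at different speeds, in different tones, and with different accents. With the right instructions, the resulting voices can be more dynamic, more natural, and better suited to the context.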
Traditional TTS lets you choose a voice. It does not let you specify tone, accent, or any other contextual audio parameter.
import os
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
tts_text = """
Once upon a time, Leo the lion cub woke up to the smell of pancakes and scrambled eggs.
His tummy rumbled with excitement as he raced to the kitchen. Mama Lion had made a breakfast feast!
Leo gobbled up his pancakes, sipped his orange juice, and munched on some juicy berries.
"""
# Create the output directory so the writes below do not fail.
os.makedirs("./sounds", exist_ok=True)
# Traditional TTS: only the voice can be chosen, not tone, pace, or accent.
speech_file_path = "./sounds/default_tts.mp3"
response = client.audio.speech.create(
    model="tts-1-hd",
    voice="alloy",
    input=tts_text,
)
response.write_to_file(speech_file_path)
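With chat completions, you can give specific instructions before the audio is generated. In the following example, we ask for a British accent suited to a children's learning setting. This is especially useful in educational applications, where the assistant's voice matters a great deal to the learning experience.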
import base64
speech_file_path = "./sounds/chat_completions_tts.mp3"
# Steer the delivery through the system prompt: British accent, enunciated for a child.
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "mp3"},
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that can generate audio from text. Speak in a British accent and enunciate like you're talking to a child.",
        },
        {
            "role": "user",
            "content": tts_text,
        },
    ],
)
# The audio is returned base64-encoded in message.audio.data.
mp3_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open(speech_file_path, "wb") as f:
    f.write(mp3_bytes)
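Every chat completions example here ends with the same two steps: base64-decode `message.audio.data`, then write the bytes to disk. Below is a minimal sketch of a reusable helper for those steps. It assumes you want missing output directories such as `./sounds` created on demand. The name `save_b64_audio` is hypothetical and is not part of the OpenAI SDK.

```python
import base64
from pathlib import Path


def save_b64_audio(b64_data: str, path: str) -> Path:
    """Decode a base64 audio payload and write the raw bytes to `path`.

    Hypothetical helper (not an OpenAI SDK function): it creates any
    missing parent directories before writing the file.
    """
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # e.g. ./sounds
    out.write_bytes(base64.b64decode(b64_data))
    return out


# Usage with a real response (requires an API call):
# save_b64_audio(completion.choices[0].message.audio.data,
#                "./sounds/chat_completions_tts.mp3")
```

The helper returns the `Path` it wrote, so callers can log or play the file without rebuilding the path.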
speech_file_path = "./sounds/chat_completions_tts_fast.mp3"
# Same request; only the system prompt changes, now asking for a fast speaking pace.
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "mp3"},
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that can generate audio from text. Speak in a British accent and speak really fast.",
        },
        {
            "role": "user",
            "content": tts_text,
        },
    ],
)
mp3_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open(speech_file_path, "wb") as f:
    f.write(mp3_bytes)
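The two audio requests above differ only in the style instruction appended to the system prompt. Below is a minimal sketch that factors out the shared request shape into a pure function returning keyword arguments for `client.chat.completions.create`. The names `build_audio_request` and `BASE_PROMPT` are hypothetical. The default model, voice, and format simply mirror the examples above.

```python
from typing import Any

# Shared prefix of the system prompts used in the examples above.
BASE_PROMPT = "You are a helpful assistant that can generate audio from text."


def build_audio_request(
    text: str,
    style: str,
    voice: str = "alloy",
    audio_format: str = "mp3",
    model: str = "gpt-4o-audio-preview",
) -> dict[str, Any]:
    """Build kwargs for client.chat.completions.create() (hypothetical helper).

    Only `style` (accent, pace, tone) varies between requests; the model,
    modalities, voice and output format stay fixed.
    """
    return {
        "model": model,
        "modalities": ["text", "audio"],
        "audio": {"voice": voice, "format": audio_format},
        "messages": [
            {"role": "system", "content": f"{BASE_PROMPT} {style}"},
            {"role": "user", "content": text},
        ],
    }


# Usage (requires an API key and network access):
# completion = client.chat.completions.create(
#     **build_audio_request(tts_text, "Speak in a British accent and speak really fast.")
# )
```

Keeping the builder free of network calls makes it easy to check the exact prompt sent for each voice style.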
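We can also generate audio with accents from other languages. In the following example, we first translate the text and then generate audio with a specific Uruguayan Spanish accent.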
# Step 1: translate the story into Uruguayan Spanish with a text-only model.
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are an expert translator. Translate any text given into Spanish like you are from Uruguay.",
        },
        {
            "role": "user",
            "content": tts_text,
        },
    ],
)
translated_text = completion.choices[0].message.content
print(translated_text)
# Step 2: speak the translated text with a Uruguayan accent at a slower pace.
speech_file_path = "./sounds/chat_completions_tts_es_uy.mp3"
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "mp3"},
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that can generate audio from text. Speak any text that you receive in a Uruguayan Spanish accent and more slowly.",
        },
        {
            "role": "user",
            "content": translated_text,
        },
    ],
)
mp3_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open(speech_file_path, "wb") as f:
    f.write(mp3_bytes)
Había una vez un leoncito llamado Leo que se despertó con el aroma de panqueques y huevos revueltos. Su pancita gruñía de emoción mientras corría hacia la cocina. ¡Mamá León había preparado un festín de desayuno! Leo devoró sus panqueques, sorbió su jugo de naranja y mordisqueó algunas bayas jugosas.
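The Uruguayan Spanish example chains two calls: a text-only translation, then an audio request that voices the translation. Below is a minimal sketch that wraps both steps in one function. It takes the client as a parameter, so any object exposing `chat.completions.create(...)` works, including `openai.OpenAI()` or a stub for offline testing. The function name `translate_then_speak` is hypothetical. The model names and audio settings mirror the example above.

```python
import base64
from typing import Any


def translate_then_speak(
    client: Any, text: str, translate_prompt: str, speak_prompt: str
) -> tuple[str, bytes]:
    """Translate `text`, then voice the translation (hypothetical helper).

    Returns (translated_text, audio_bytes). `client` only needs a
    chat.completions.create(**kwargs) method, e.g. openai.OpenAI().
    """
    # Step 1: text-only translation.
    translation = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": translate_prompt},
            {"role": "user", "content": text},
        ],
    )
    translated_text = translation.choices[0].message.content

    # Step 2: speak the translation with accent/pace instructions.
    speech = client.chat.completions.create(
        model="gpt-4o-audio-preview",
        modalities=["text", "audio"],
        audio={"voice": "alloy", "format": "mp3"},
        messages=[
            {"role": "system", "content": speak_prompt},
            {"role": "user", "content": translated_text},
        ],
    )
    audio_bytes = base64.b64decode(speech.choices[0].message.audio.data)
    return translated_text, audio_bytes


# Usage with the real SDK (requires OPENAI_API_KEY and network access):
# from openai import OpenAI
# text_es, mp3 = translate_then_speak(
#     OpenAI(), tts_text,
#     "You are an expert translator. Translate any text given into Spanish like you are from Uruguay.",
#     "You are a helpful assistant that can generate audio from text. "
#     "Speak any text that you receive in a Uruguayan Spanish accent and more slowly.",
# )
```

Passing the client in, rather than creating it inside the function, is a design choice. It keeps the two-step flow testable without network access.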
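The ability to steer the voice of generated audio opens up many possibilities for richer audio experiences. There are many use cases, such as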