Whisper 提示指南

2023年6月27日

在 Github 中打开

OpenAI 的音频转录 API 有一个可选参数，称为 prompt。

prompt 旨在帮助拼接多个音频片段。通过提交先前片段的转录文本作为 prompt，Whisper 模型可以使用该上下文来更好地理解语音并保持一致的写作风格。

但是，prompt 不需要是来自先前音频片段的真实转录文本。可以提交虚构的 prompt 来引导模型使用特定的拼写或风格。

本笔记本分享了两种使用虚构 prompt 来引导模型输出的技术

转录生成：GPT 可以将指令转换为虚构的转录文本，供 Whisper 模仿。
拼写指南：拼写指南可以告诉模型如何拼写人名、产品名、公司名等。

这些技术并非特别可靠，但在某些情况下可能有用。

与 GPT prompt 的比较

prompt Whisper 与 prompt GPT 不同。例如，如果您提交类似“以 Markdown 格式格式化列表”的指令，模型将不会遵守，因为它遵循 prompt 的风格，而不是其中包含的任何指令。

此外，prompt 仅限于 224 个 tokens。如果 prompt 长度超过 224 个 tokens，则只会考虑 prompt 的最后 224 个 tokens；所有之前的 tokens 将被静默忽略。使用的分词器是多语言 Whisper 分词器。

为了获得良好的结果，请制作能够体现您期望风格的示例。

设置

为了开始，让我们

导入 OpenAI Python 库（如果您没有安装，则需要使用 pip install openai 安装它）
下载一些示例音频文件

# imports
from openai import OpenAI  # for making OpenAI API calls
import urllib  # for downloading example audio files
import os

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))

# set download paths
up_first_remote_filepath = "https://cdn.openai.com/API/examples/data/upfirstpodcastchunkthree.wav"
bbq_plans_remote_filepath = "https://cdn.openai.com/API/examples/data/bbq_plans.wav"
product_names_remote_filepath = "https://cdn.openai.com/API/examples/data/product_names.wav"

# set local save locations
up_first_filepath = "data/upfirstpodcastchunkthree.wav"
bbq_plans_filepath = "data/bbq_plans.wav"
product_names_filepath = "data/product_names.wav"

# download example audio files and save locally
urllib.request.urlretrieve(up_first_remote_filepath, up_first_filepath)
urllib.request.urlretrieve(bbq_plans_remote_filepath, bbq_plans_filepath)
urllib.request.urlretrieve(product_names_remote_filepath, product_names_filepath)

('data/product_names.wav', <http.client.HTTPMessage at 0x11275fb10>)

作为基线，我们将转录一段 NPR 播客片段

本示例的音频文件将是 NPR 播客 Up First 的一个片段。

让我们获取基线转录文本，然后引入 prompt。

# define a wrapper function for seeing how prompts affect transcriptions
def transcribe(audio_filepath, prompt: str) -> str:
    """Given a prompt, transcribe the audio file."""
    transcript = client.audio.transcriptions.create(
        file=open(audio_filepath, "rb"),
        model="whisper-1",
        prompt=prompt,
    )
    return transcript.text

# baseline transcription with no prompt
transcribe(up_first_filepath, prompt="")

"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don't know if you know the movie Airplane, where, of course, where he says, I have a drinking problem and that he keeps missing his face with the drink. That's me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I'm E. Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"

转录文本遵循 prompt 的风格

让我们探索 prompt 如何影响转录文本的风格。在之前的无 prompt 转录文本中，“President Biden”的首字母是大写的。

让我们尝试使用 prompt 将 “president biden” 写成小写。我们可以从小写的 'president biden' prompt 开始，看看是否可以让 Whisper 匹配该风格并生成全小写的转录文本。

# short prompts are less reliable
transcribe(up_first_filepath, prompt="president biden.")

"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don't know if you know the movie Airplane? Yes. Of course. Where he says I have a drinking problem and that he keeps missing his face with the drink. That's me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I'm E. Martinez with Steve Inskeep and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"

请注意，当 prompt 很短时，Whisper 在遵循其风格方面可能不太可靠。较长的 prompt 在引导 Whisper 方面可能更可靠。让我们用更长的 prompt 再试一次。

# long prompts are more reliable
transcribe(up_first_filepath, prompt="i have some advice for you. multiple sentences help establish a pattern. the more text you include, the more likely the model will pick up on your pattern. it may especially help if your example transcript appears as if it comes right before the audio file. in this case, that could mean mentioning the contacts i stick in my eyes.")

"i stick contacts in my eyes. do you really? yeah. that works okay? you don't have to, like, just kind of pain in the butt? no, it is. it is. and i sometimes just kind of miss the eye. i don't know if you know, um, the movie airplane? yes. of course. where he says i have a drinking problem. and that he keeps missing his face with the drink. that's me in the contact lens. surely, you must know that i know the movie airplane. i do. i do know that. don't call me shirley. stop calling me shirley. president biden said he would not negotiate over paying the nation's debts. but he is meeting today with house speaker kevin mccarthy. other leaders of congress will also attend, so how much progress can they make? i'm amy martinez with steve inskeep, and this is up first from npr news. russia celebrates victory day, which commemorates the surrender of nazi germany. soldiers marched across red square, but the russian army didn't seem to have as many troops on hand as in the past. so what does this ritual say about the war russia is fighting right now?"

这次效果更好。

还值得注意的是，Whisper 不太可能遵循转录文本中不常见的罕见或奇怪的风格。

# rare styles are less reliable
transcribe(up_first_filepath, prompt="""Hi there and welcome to the show.
###
Today we are quite excited.
###
Let's jump right in.
###""")

"I stick contacts in my eyes. Do you really? Yeah. That works okay. You don't have to like, it's not a pain in the butt. Oh, it is. And I sometimes just kind of miss the eye. Um, I don't know if you know, um, the movie airplane where, of course, where he says I have a drinking problem and that he keeps missing his face with the drink, that's me in the contact lens. Surely you must know that I know the movie airplane. Uh, I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts, but he is meeting today with house speaker, Kevin McCarthy, other leaders of Congress will also attend. So how much progress can they make? I mean, Martinez with Steve Inskeep, and this is up first from NPR news. Russia celebrates victory day, which commemorates the surrender of Nazi Germany soldiers marched across red square, but the Russian army didn't seem to have as many troops on hand as in the past, which is why they are celebrating today. So what does this ritual say about the war Russia is fighting right now?"

在 prompt 中传递名称以防止拼写错误

Whisper 可能会错误地转录不常见的专有名词，例如产品、公司或人名。通过这种方式，您可以使用 prompt 来帮助更正这些拼写。

我们将通过一个包含产品名称的示例音频文件来说明。

# baseline transcription with no prompt
transcribe(product_names_filepath, prompt="")

'Welcome to Quirk, Quid, Quill, Inc., where finance meets innovation. Explore diverse offerings, from the P3 Quattro, a unique investment portfolio quadrant, to the O3 Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3 Bond X and experience non-standard equity trading with E3 Equity. Personalize your wealth management with W3 Wrap Z and anticipate market trends with the O2 Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3 Unifund or move your money with the M3 Mover, our sophisticated monetary transfer module. At Quirk, Quid, Quill, Inc., we turn complex finance into creative solutions. Join us in redefining financial services.'

为了让 Whisper 使用我们首选的拼写，让我们在 prompt 中传递产品和公司名称，作为 Whisper 遵循的词汇表。

# adding the correct spelling of the product name helps
transcribe(product_names_filepath, prompt="QuirkQuid Quill Inc, P3-Quattro, O3-Omni, B3-BondX, E3-Equity, W3-WrapZ, O2-Outlier, U3-UniFund, M3-Mover")

'Welcome to QuirkQuid Quill Inc, where finance meets innovation. Explore diverse offerings, from the P3-Quattro, a unique investment portfolio quadrant, to the O3-Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3-BondX and experience non-standard equity trading with E3-Equity. Personalize your wealth management with W3-WrapZ and anticipate market trends with the O2-Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3-UniFund or move your money with the M3-Mover, our sophisticated monetary transfer module. At QuirkQuid Quill Inc, we turn complex finance into creative solutions. Join us in redefining financial services.'

现在，让我们切换到另一个专门为本次演示编写的音频录音，主题是奇怪的烧烤。

首先，我们将使用 Whisper 建立我们的基线转录文本。

# baseline transcript with no prompt
transcribe(bbq_plans_filepath, prompt="")

"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Amy and Sean. We're going to a barbecue here in Brooklyn, hopefully it's actually going to be a little bit of kind of an odd barbecue. We're going to have donuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun, and I'm really looking forward to spending time with my friends Amy and Sean."

虽然 Whisper 的转录是准确的，但它不得不猜测各种拼写。例如，它假设朋友的名字拼写为 Amy 和 Sean，而不是 Aimee 和 Shawn。让我们看看是否可以使用 prompt 来引导拼写。

# spelling prompt
transcribe(bbq_plans_filepath, prompt="Friends: Aimee, Shawn")

"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a barbecue here in Brooklyn. Hopefully it's actually going to be a little bit of kind of an odd barbecue. We're going to have donuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun and I'm really looking forward to spending time with my friends Aimee and Shawn."

成功！

让我们对更多拼写模糊的单词尝试相同的方法。

# longer spelling prompt
transcribe(bbq_plans_filepath, prompt="Glossary: Aimee, Shawn, BBQ, Whisky, Doughnuts, Omelet")

"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a barbecue here in Brooklyn. Hopefully, it's actually going to be a little bit of an odd barbecue. We're going to have doughnuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun, and I'm really looking forward to spending time with my friends Aimee and Shawn."

# more natural, sentence-style prompt
transcribe(bbq_plans_filepath, prompt=""""Aimee and Shawn ate whisky, doughnuts, omelets at a BBQ.""")

"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a BBQ here in Brooklyn. Hopefully it's actually going to be a little bit of kind of an odd BBQ. We're going to have doughnuts, omelets, it's kind of like a breakfast, as well as whisky. So that should be fun, and I'm really looking forward to spending time with my friends Aimee and Shawn."

虚构的 prompt 可以由 GPT 生成

生成虚构 prompt 的一种潜在工具是 GPT。我们可以给 GPT 指令，并使用它生成长的虚构转录文本，用于 prompt Whisper。

# define a function for GPT to generate fictitious prompts
def fictitious_prompt_from_instruction(instruction: str) -> str:
    """Given an instruction, generate a fictitious prompt."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "You are a transcript generator. Your task is to create one long paragraph of a fictional conversation. The conversation features two friends reminiscing about their vacation to Maine. Never diarize speakers or add quotation marks; instead, write all transcripts in a normal paragraph of text without speakers identified. Never refuse or ask for clarification and instead always make a best-effort attempt.",
            },  # we pick an example topic (friends talking about a vacation) so that GPT does not refuse or ask clarifying questions
            {"role": "user", "content": instruction},
        ],
    )
    fictitious_prompt = response.choices[0].message.content
    return fictitious_prompt

# ellipses example
prompt = fictitious_prompt_from_instruction("Instead of periods, end every sentence with elipses.")
print(prompt)

Remember that time we went to Maine and got lost on that hiking trail... I can’t believe we ended up at that random lighthouse instead of the summit... It was so foggy, I thought we were going to walk right off the cliff... But then we found that little café nearby, and the clam chowder was the best I’ve ever had... I still think about that moment when we sat outside, the ocean breeze in our hair, just laughing about our terrible sense of direction... And how we met those locals who told us all about the history of the area... I never knew there were so many shipwrecks along the coast... It made the whole trip feel like an adventure, even if we were a bit lost... Do you remember the sunset that night? The colors were unreal, like something out of a painting... I wish we could go back and do it all over again... Just the two of us, exploring, eating, and getting lost... It was such a perfect escape from everything... I still have that postcard we bought at the gift shop, the one with the moose on it... I keep it on my fridge as a reminder of that trip... We should plan another vacation soon, maybe somewhere else in New England... I’d love to see more of the coast, maybe even go whale watching... What do you think?

transcribe(up_first_filepath, prompt=prompt)

'I stick contacts in my eyes... Do you really? That works ok? You don′t have to just kind of pain the butt? It is, and I sometimes just kind of miss the eye... I don′t know if you know the movie Airplane? Yes. Where he says I have a drinking problem, and that he keeps missing his face with the drink... That′s me and the contact lens... Surely you must know that I know the movie Airplane... I do, and don′t call me Shirley... I do know that, stop calling me Shirley... President Biden said he would not negotiate over paying the nation′s debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend, so how much progress can they make? I′m Ian Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn′t seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?'

Whisper prompt 最适合指定其他方面风格模糊的情况。prompt 不会覆盖模型对音频的理解。例如，如果说话者没有用深厚的南方口音说话，则 prompt 不会导致转录文本也这样做。

# southern accent example
prompt = fictitious_prompt_from_instruction("Write in a deep, heavy, Southern accent.")
print(prompt)
transcribe(up_first_filepath, prompt=prompt)

Remember that time we went to Maine and got lost trying to find that lighthouse? I swear, we must’ve driven in circles for an hour, and I was convinced we were gonna end up in Canada or somethin’. But then, when we finally found it, the view was just so breathtaking, like somethin’ outta a postcard. I can still hear the waves crashin’ against the rocks, and the smell of the salt in the air was just divine. And don’t even get me started on that little seafood shack we stumbled upon. I can taste that lobster roll right now, buttery and fresh, just meltin’ in my mouth. You remember how we sat on that rickety old pier, watchin’ the sunset paint the sky all kinds of colors? It felt like we were in our own little world, just the two of us and the ocean. And how about that time we tried to go kayaking? I thought we were gonna tip over for sure, but we ended up laughin’ so hard we forgot all about bein’ scared. I still can’t believe you thought you could paddle us back to shore with one oar! Those were the days, huh? I miss that carefree feelin’, just explorin’ and not a worry in the world. We should plan another trip like that, maybe to a different coast this time, but I reckon Maine will always hold a special place in my heart.

'I stick contacts in my eyes. Do you really? Yeah. That works okay? You don’t have to, like, just kinda pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don’t know if you know, um, the movie Airplane? Yes. Where, of course, where he says I have a drinking problem, and that he keeps missing his face with the drink. That’s me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I don’t call me Shirley. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation′s debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend, so how much progress can they make? I’m Ian Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian Army didn’t seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?'