Optimizing Retrieval-Augmented Generation Using the GPT-4o Vision Modality

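Implementing Retrieval-Augmented Generation (RAG) presents unique challenges when working with documents rich in images, graphics, and tables. Traditional RAG models excel with textual data but often falter when visual elements play a crucial role in conveying information. In this cookbook, we bridge that gap by using the vision modality to extract and interpret visual content, so that the generated responses are as informative and accurate as possible.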

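Our approach parses documents into page images and uses metadata tagging to identify pages that contain images, graphics, or tables. When a semantic search retrieves such a page, we pass the page image to a vision model instead of relying solely on text. This improves the model's ability to understand and answer user queries that depend on visual data.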
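The routing decision described above can be sketched in a few lines. This is a minimal illustration, not the cookbook's implementation: the dictionary keys `page`, `text`, `has_visuals`, and `image_b64` are hypothetical names chosen for this example, and the image is assumed to be a base64-encoded PNG.

```python
def build_context_parts(matches):
    """Turn retrieved page records into chat-content parts.

    `matches` is a list of dicts with hypothetical keys: `page`, `text`,
    `has_visuals`, and `image_b64` (a base64-encoded PNG, or None).
    """
    parts = []
    for m in matches:
        if m["has_visuals"] and m.get("image_b64"):
            # Page was flagged as visual: send the page image itself
            parts.append({
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{m['image_b64']}"},
            })
        else:
            # Text-only page: the extracted text is enough
            parts.append({"type": "text", "text": f"Page {m['page']} - {m['text']}"})
    return parts


sample = [
    {"page": 3, "text": "About us ...", "has_visuals": False, "image_b64": None},
    {"page": 13, "text": "Figure 2 ...", "has_visuals": True, "image_b64": "aGVsbG8="},
]
parts = build_context_parts(sample)
print([p["type"] for p in parts])
```

Only pages flagged as visual incur the extra cost of an image input; every other page is passed as plain text.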

In this cookbook, we will explore and demonstrate the following key concepts:

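1. Setting Up a Vector Store with Pinecone

Initialize a Pinecone serverless index to store the vector embeddings.

2. Parsing the PDF and Extracting Visual Information

Explore techniques for converting PDF pages into images.
Use the GPT-4o vision modality to extract textual information from pages that contain images, graphics, or tables.

3. Generating Embeddings

Use an embedding model to create vector representations of the textual data.
Flag the pages that contain visual content so that we can set a metadata flag in the vector store and later retrieve the page image to pass to GPT-4o through the vision modality.

4. Uploading Embeddings to Pinecone

Upload the embeddings, together with their page metadata, to Pinecone for storage and retrieval.

5. Performing Semantic Search for Relevant Pages

Implement semantic search over the page text to find the pages that best match the user's query.
Provide the matching page text to GPT-4o as context to answer the user's query.

6. Handling Pages with Visual Content (Optional Step)

Learn how to pass the page image to GPT-4o through the vision modality so that it can answer questions with additional context.
Understand how this process improves the accuracy of responses that involve visual data.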

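By the end of this cookbook, you will have a solid understanding of how to implement a RAG system that can process and interpret documents with complex visual elements. This knowledge will help you build AI solutions that deliver richer, more accurate information and improve user satisfaction and engagement.

We will use the World Bank report, A Better Bank for a Better World: Annual Report 2024, to illustrate these concepts, since the document contains a mix of images, tables, and graphical data.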

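Keep in mind that using the vision modality is resource-intensive and increases both latency and cost. We recommend using it only when plain text extraction performs unsatisfactorily on your evaluation benchmarks. With this context, let's dive in.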
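One way to make that recommendation concrete is to score the text-only pipeline on a small benchmark first and switch on the vision modality only if it falls short. The sketch below is an assumption-laden illustration: the substring-match metric, the `0.8` threshold, the two benchmark questions, and the stub answer function are all hypothetical stand-ins for your own evaluation setup.

```python
def accuracy(answer_fn, benchmark):
    """Fraction of benchmark items whose expected string appears in the answer."""
    hits = 0
    for question, expected in benchmark:
        answer = answer_fn(question)
        if expected.lower() in answer.lower():
            hits += 1
    return hits / len(benchmark)


def should_use_vision(text_only_fn, benchmark, threshold=0.8):
    """Recommend the vision modality only if text-only accuracy is below threshold."""
    return accuracy(text_only_fn, benchmark) < threshold


# Hypothetical benchmark: (question, substring expected in a correct answer)
benchmark = [
    ("What share of lending went to education?", "17%"),
    ("What is the total lending amount?", "$4.6 billion"),
]


def text_only_stub(question):
    # Stand-in for a text-only RAG pipeline; it answers one of two questions
    canned = {benchmark[0][0]: "Education received 17% of lending."}
    return canned.get(question, "I couldn't find the information")


print(accuracy(text_only_stub, benchmark))
print(should_use_vision(text_only_stub, benchmark))
```

Substring matching is a crude metric; a graded or model-based evaluation would be a reasonable replacement once the benchmark grows.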

import os
import time

# Import the Pinecone library
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("PINECONE_API_KEY")

# Initialize a Pinecone client with your API key
pc = Pinecone(api_key=api_key)

# Create a serverless index
index_name = "my-test-index"

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=3072,
        metric="cosine",
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1'
        )
    )

# Wait for the index to be ready
while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)
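The index dimension has to match the length of the vectors produced by the embedding model; `3072` is the default output size of `text-embedding-3-large`, which is used later in this cookbook. A small guard like the following can catch a mismatch before any vectors are uploaded. The lookup table and the helper name `check_index_dimension` are illustrative, and the table lists default sizes only (the `text-embedding-3` models can also be asked for shorter vectors).

```python
# Default output dimensions of common OpenAI embedding models
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
}


def check_index_dimension(model_name, index_dimension):
    """Raise if the index dimension does not match the model's default output size."""
    expected = EMBEDDING_DIMENSIONS.get(model_name)
    if expected is None:
        raise KeyError(f"Unknown embedding model: {model_name}")
    if expected != index_dimension:
        raise ValueError(
            f"{model_name} produces {expected}-dimensional vectors, "
            f"but the index expects {index_dimension}"
        )
    return True


print(check_index_dimension("text-embedding-3-large", 3072))
```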

import base64
import os
from io import BytesIO

import pandas as pd
import requests
from openai import OpenAI
from pdf2image import convert_from_bytes
from PyPDF2 import PdfReader, PdfWriter
from tqdm import tqdm

# Link to the document we will use as the example
document_to_parse = "https://documents1.worldbank.org/curated/en/099101824180532047/pdf/BOSIB13bdde89d07f1b3711dd8e86adb477.pdf"

# OpenAI client
oai_client = OpenAI()


# Chunk the PDF document into single page chunks
def chunk_document(document_url):
    # Download the PDF document and fail early on HTTP errors
    response = requests.get(document_url)
    response.raise_for_status()
    pdf_data = response.content

    # Read the PDF data using PyPDF2
    pdf_reader = PdfReader(BytesIO(pdf_data))

    page_chunks = []

    for page_number, page in enumerate(pdf_reader.pages, start=1):
        # Write each page into its own single-page PDF held in memory
        pdf_writer = PdfWriter()
        pdf_writer.add_page(page)
        pdf_bytes_io = BytesIO()
        pdf_writer.write(pdf_bytes_io)
        pdf_bytes_io.seek(0)
        pdf_bytes = pdf_bytes_io.read()

        page_chunk = {
            'pageNumber': page_number,
            'pdfBytes': pdf_bytes
        }
        page_chunks.append(page_chunk)

    return page_chunks


# Function to encode the image as a base64 string
def encode_image(local_image_path):
    with open(local_image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')


# Function to convert page to image
def convert_page_to_image(pdf_bytes, page_number):
    # Convert the PDF page to an image
    images = convert_from_bytes(pdf_bytes)
    image = images[0]  # There should be only one page

    # Define the directory to save images (relative to your script)
    images_dir = 'images'

    # Ensure the directory exists
    os.makedirs(images_dir, exist_ok=True)

    # Save the image to the images directory
    image_file_name = f"page_{page_number}.png"
    image_file_path = os.path.join(images_dir, image_file_name)
    image.save(image_file_path, 'PNG')

    # Return the relative image path
    return image_file_path


# Pass the image to the LLM for interpretation
def get_vision_response(prompt, image_path):
    # Getting the base64 string
    base64_image = encode_image(image_path)

    response = oai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            # Pages are saved as PNG, so declare image/png
                            "url": f"data:image/png;base64,{base64_image}"
                        },
                    },
                ],
            }
        ],
    )
    return response


# Process document function that brings it all together
def process_document(document_url):
    try:
        print("Document processing started")

        # Get per-page chunks
        page_chunks = chunk_document(document_url)
        total_pages = len(page_chunks)

        # Prepare a list to collect page data
        page_data_list = []

        # Iterate over the pages with a progress bar
        for page_chunk in tqdm(page_chunks, total=total_pages, desc='Processing Pages'):
            page_number = page_chunk['pageNumber']
            pdf_bytes = page_chunk['pdfBytes']

            # Convert page to image
            image_path = convert_page_to_image(pdf_bytes, page_number)

            # Prepare the instructions for the vision API
            # (sent as the text part of the user message)
            system_prompt = (
                "The user will provide you an image of a document file. Perform the following actions: "
                "1. Transcribe the text on the page. **TRANSCRIPTION OF THE TEXT:** "
                "2. If there is a chart, describe the image and include the text **DESCRIPTION OF THE IMAGE OR CHART** "
                "3. If there is a table, transcribe the table and include the text **TRANSCRIPTION OF THE TABLE**"
            )

            # Get vision API response
            vision_response = get_vision_response(system_prompt, image_path)

            # Extract text from vision response
            text = vision_response.choices[0].message.content

            # Collect page data
            page_data = {
                'PageNumber': page_number,
                'ImagePath': image_path,
                'PageText': text
            }
            page_data_list.append(page_data)

        # Create DataFrame from page data
        pdf_df = pd.DataFrame(page_data_list)
        print("Document processing completed.")
        print("DataFrame created with page data.")

        # Return the DataFrame
        return pdf_df

    except Exception as err:
        print(f"Error processing document: {err}")
        # Re-raise so the caller does not silently receive None
        raise


df = process_document(document_to_parse)
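In the code above, a single failed API call inside the loop ends the whole run, because the exception is caught outside the page loop. For a long document it may be worth retrying transient failures per page. The helper below is a generic sketch, not part of the original cookbook: `with_retries`, its parameters, and the `flaky` demo function are hypothetical names, and the `sleep` argument exists only so the demo can run without waiting.

```python
import time


def with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x, ... the base delay between attempts
            sleep(base_delay * 2 ** (attempt - 1))


# Demo with a stand-in function that fails twice, then succeeds
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


delays = []
result = with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

Inside the page loop, the vision call could then be wrapped as `with_retries(lambda: get_vision_response(system_prompt, image_path))`. Retrying on every `Exception` is deliberately broad here; in practice you would likely narrow it to rate-limit and timeout errors.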

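	PageNumber	ImagePath	PageText
0	1	images/page_1.png	TRANSCRIPTION OF THE TEXT:\n\nPublic Disclosure Authorized\nPublic Disclosure Authorized\nA Better Bank for a Better World\nAnnual Report 2024\nWorld Bank Group\nIBRD · IDA\n\nDESCRIPTION OF THE IMAGE OR CHART:\n\nThe image features a nighttime scene with a makeshift shelter illuminated from within. The shelter appears to be made of fabric with patterns on it. Inside, people are visible through the opening, and some items, such as shoes, can be seen on the ground outside the shelter. The background suggests a community or household setting under a starry sky. The circular graphic elements overlaid on the image may suggest interconnection or global outreach.
1	2	images/page_2.png	TRANSCRIPTION OF THE TEXT:\n\nContents\n\nMessage from the President 6\nMessage from the Executive Directors 8\nBecoming a Better Bank 10\nFiscal 2024 Financial Summary 12\nResults by Region 14\nResults by Theme 44\nHow We Work 68\n\nKey Tables\nIBRD Key Financial Indicators, Fiscal 2020-24 84\nIDA Key Financial Indicators, Fiscal 2020-24 88\n\nThis annual report, which covers the period from July 1, 2023, to June 30, 2024, has been prepared by the Executive Directors of both the International Bank for Reconstruction and Development (IBRD) and the International Development Association (IDA), collectively known as the World Bank, in accordance with the respective bylaws of the two institutions. Ajay Banga, President of the World Bank Group and Chairman of the Board of Executive Directors, has submitted this report, together with the accompanying administrative budgets and audited financial statements, to the Board of Governors.\n\nAnnual reports for the other World Bank Group institutions, the International Finance Corporation (IFC), the Multilateral Investment Guarantee Agency (MIGA), and the International Centre for Settlement of Investment Disputes (ICSID), are published separately. Key highlights from each institution's annual report are available in the World Bank Group Annual Report Summary.\n\nThroughout the report, the term "World Bank" and the abbreviated "Bank" refer only to IBRD and IDA; the term "World Bank Group" and the abbreviated "Bank Group" refer to the five institutions. All dollar amounts used in this report are current U.S. dollars unless otherwise specified. Funds allocated to multiregional projects are accounted for by recipient country where possible in tables and text when referring to regional breakdowns. For sector and theme breakdowns, funds are accounted for by operation. Fiscal year commitments and disbursements data are in accordance with the audited figures reported in the IBRD and IDA Financial Statements and Management's Discussion and Analysis documents for fiscal 2024. As a result of rounding, numbers in tables may not add to totals, and percentages in figures may not add to 100.\n\nDESCRIPTION OF THE IMAGE OR CHART\n\nThe image shows a close-up of a hand holding a bundle of golden rice stalks. The background is blurred and shows more rice fields.
2	3	images/page_3.png	TRANSCRIPTION OF THE TEXT:\n\nAbout Us\n\nThe World Bank Group is one of the world's largest sources of funding and knowledge for developing countries. Our five institutions share a commitment to reducing poverty, increasing shared prosperity, and promoting sustainable development.\n\nOur Vision\nOur vision is to create a world free of poverty on a livable planet.\n\nOur Mission\nOur mission is to end extreme poverty and boost shared prosperity on a livable planet. This is threatened by multiple, intertwined crises. Time is of the essence. We are building a better Bank to drive impactful development that is:\n• Inclusive of everyone, including women and young people;\n• Resilient to shocks, including against climate and biodiversity crises, pandemics, and fragility;\n• Sustainable, through growth and job creation, human development, fiscal and debt management, food security, and access to clean air, water, and affordable energy.\n\nTo achieve this, we will work with all clients as one World Bank Group, in close partnership with other multilateral institutions, the private sector, and civil society.\n\nOur Core Values\nOur work is guided by our core values: impact, integrity, respect, teamwork, and innovation. These inform everything we do, everywhere we work.
3	4	images/page_4.png	TRANSCRIPTION OF THE TEXT:\n\nDriving Action, Measuring Results\n\nThe World Bank Group contributes to impactful, meaningful development results around the world. In the first half of fiscal 2024, we:\n\n- Helped feed 156 million people\n- Improved schooling for 280 million students\n- Reached 287 million people living in poverty with effective social protection support†\n- Provided healthy water, sanitation, and/or hygiene to 59 million people\n- Enabled access to sustainable transportation for 77 million people\n- Provided 17 gigawatts of renewable energy capacity\n- Committed to devote 45 percent of annual financing to climate action by 2025, split evenly between mitigation and adaptation\n\nThe development of the new Scorecard was ongoing at the time of printing; therefore, this report can only account for results up to December 31, 2023.\nAs of the 2024 IMF-World Bank Group Annual Meetings, the full fiscal 2024 Scorecard data will be available at: https://scorecard.worldbankgroup.org\n\n† IBRD and IDA only indicator.\n\nIn fiscal 2024, the World Bank Group announced the development of a new Scorecard that will track results across 22 indicators (a fraction of the previous 150) to provide a clear, concise picture of the World Bank Group's progress on all fronts, from improving access to healthcare, to making food systems sustainable, to boosting private investment.\n\nFor the first time, the work of all World Bank Group financing institutions will be tracked through the same set of indicators. The new Scorecard will track the World Bank Group's overarching vision of ending poverty on a livable planet.\n\nThe World Bank Annual Report 2024\n\nDESCRIPTION OF THE IMAGE OR CHART:\n\nThe image shows a series of circular photographs, connected by text highlights, that depict the World Bank Group's achievements. The photographs include people and infrastructure related to food, education, social protection, water, transportation, renewable energy, and environmental initiatives. Each photograph is linked to a text entry describing a specific achievement or commitment.
4	5	images/page_5.png	TRANSCRIPTION OF THE TEXT:\n\nMessage from the President\n\nDelivering on our commitments requires us to develop new and better ways of working. In fiscal 2024, we did just that.\n\nAjay Banga\n\nIn fiscal 2024, the World Bank Group adopted a bold new vision of a world free of poverty on a livable planet. To achieve this, the Bank Group is enacting reforms to become a better partner to governments, the private sector, and, ultimately, the people we serve. Never in our 80-year history has our work been as urgent as it is now: we face slowing progress in the fight against poverty, an existential climate crisis, rising public debt, food insecurity, an unequal pandemic recovery, and the effects of geopolitical conflict.\n\nResponding to these intertwined challenges requires a faster, simpler, and more efficient World Bank Group. We are refocusing to confront these challenges not only through funding but also through knowledge. Our Knowledge Compact for Action, published in fiscal 2024, details how we will empower all Bank Group clients, public and private, by making our wealth of development knowledge more accessible. We have also reorganized the World Bank's global practices into five Vice Presidency units (People, Prosperity, Planet, Infrastructure, and Digital) for more flexible and faster engagement with clients. Each of these units reached important milestones in fiscal 2024.\n\nWe are supporting countries in delivering quality, affordable health services to 1.5 billion people by 2030 so that our children and grandchildren can lead healthier, better lives. This is part of our larger global effort to address a basic standard of care through every stage of a person's life: infancy, childhood, adolescence, and adulthood. To help people withstand food-affecting shocks and crises, we are strengthening social protection services with the goal of supporting half a billion people by the end of 2030, with half of these beneficiaries being women.\n\nWe are helping developing countries create jobs and employment, the surest enablers of prosperity. In the next 10 years, 1.2 billion young people across the Global South will reach working age. Yet, over the same period and in the same countries, only 424 million jobs are expected to be created. The cost of hundreds of millions of young people with no hope of a decent job or future is unimaginable, and we are working urgently to create opportunity for all.\n\nTo respond to climate change, arguably the greatest challenge of our generation, we are channeling 45 percent of annual financing to climate action by 2025, split evenly between mitigation and adaptation. Among other efforts, we intend to launch at least 15 country-led methane reduction programs by fiscal 2026, and our Forest Carbon Partnership Facility has helped strengthen high-integrity carbon markets.\n\nAccess to electricity is a fundamental human right and the foundation of any successful development effort. It will accelerate the digital development of developing countries, strengthen public infrastructure, and prepare people for the jobs of tomorrow. But half the population of Africa, 600 million people, lacks access to electricity. In response, we have committed to providing electricity to 300 million people in Sub-Saharan Africa by 2030, in partnership with the African Development Bank.\n\nRecognizing that digitalization is the transformational opportunity of our time, we are collaborating with governments in more than 100 developing countries to enable digital economies. Our digital lending portfolio totaled $6.5 billion in commitments as of June 2024, and our new Digital Vice Presidency unit will guide our efforts to establish the foundations of a digital economy. Key measures include building and enhancing digital and data infrastructure, ensuring cybersecurity and data privacy for institutions, businesses, and citizens, and advancing digital government services.\n\nDelivering on our commitments requires us to develop new and better ways of working. In fiscal 2024, we did just that. We are squeezing our balance sheet and finding new opportunities to take on more risk and boost our lending. Our new crisis preparedness and response tools, Global Challenge Programs, and Livable Planet Fund demonstrate how we are modernizing our approach to better thrive and deliver results. Our new Scorecard radically changes how we track results.\n\nBut we cannot deliver results alone. We need partners from both the public and private sectors to join our efforts. That is why we are working closely with other multilateral development banks to improve the lives of people in developing countries in tangible, measurable ways. Our deepening relationship with the private sector is evidenced by our Private Sector Investment Lab, which is working to address the barriers preventing private sector investment in emerging markets. The Lab's core group of 15 chief executive officers and chairs meets regularly and has already informed our work, most notably the development of the World Bank Group Guarantee Platform.\n\nThe impact and innovations we delivered this year will allow us to move forward with a raised level of ambition and a greater sense of urgency to improve people's lives. I would like to recognize the remarkable efforts of our staff and Executive Directors, as well as the unwavering support of our clients and partners. We head into fiscal 2025 with great optimism, and with determination to create a better Bank for a better world.\n\nAjay Banga\nPresident of the World Bank Group\nand Chairman of the Board of Executive Directors\n\nDESCRIPTION OF THE IMAGE OR CHART:\n\nThe image shows a group of people engaged in agricultural activity. One person is holding a tomato while others observe. It reflects collaboration or assistance in agricultural practices, possibly in a developing country.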

**TRANSCRIPTION OF THE TEXT:**

We also committed $35 million in grants to support emergency relief in Gaza. Working with the World Food Programme, the World Health Organization, and the UN Children’s Fund, the grants supported the delivery of emergency food, water, and medical supplies. In the West Bank, we approved a $200 million program for the continuation of education for children, $22 million to support municipal services, and $45 million to strengthen healthcare and hospital services.

**Enabling green and resilient growth**

To help policymakers in the region advance their climate change and development goals, we published Country Climate and Development Reports for the West Bank and Gaza, Lebanon, and Tunisia. In Libya, the catastrophic flooding in September 2023 devastated eastern localities, particularly the city of Derna. The World Bank, together with the UN and the European Union, produced a Rapid Damage and Needs Assessment to inform recovery and reconstruction efforts. We signed a new Memorandum of Understanding (MoU) with the Islamic Development Bank to promote further collaboration between our institutions. The MoU focuses on joint knowledge and operational engagements around the energy, food, and water nexus, climate impact, empowering women and youth to engage with the private sector, and advancing the digital transition and regional integration. The MoU aims to achieve a co-financing value of $6 billion through 2026, 45 percent of which has already been met.

**Expanding economic opportunities for women**

The World Bank has drawn on a variety of instruments to support Jordan’s commitment to increase female labor force participation, including through the recently approved Country Partnership Framework. Through operations, technical assistance (such as Mashreq Gender Facility; Women Entrepreneurs Finance Initiative; and the Women, Business and the Law report), and policy dialogue, we have contributed to legal reforms in Jordan that removed job restrictions on women, prohibited gender-based discrimination in the workplace, and criminalized sexual harassment in the workplace. In fiscal 2024, we approved the first women-focused Bank project in the region: the Enhancing Women’s Economic Opportunities Program for Results aims to improve workplace conditions, increase financial inclusion and entrepreneurship, make public transport safer, and increase access to affordable, quality childcare services.

**Analyzing critical infrastructure needs**

We published an Interim Damage Assessment for Gaza in partnership with the UN and with financial support from the EU. This found that a preliminary estimate of the cost of damages to critical infrastructure from the conflict in Gaza between October 2023 and the end of January 2024 was around $18.5 billion, equivalent to 97 percent of the 2022 GDP of the West Bank and Gaza combined. When the situation allows, a full-fledged Rapid Damage and Needs Assessment will be conducted.

**COUNTRY IMPACT**

Egypt: The Bank-supported Takaful and Karama social protection program has reached 4.7 million vulnerable households, benefitting approximately 20 million individuals, 75 percent of them women.
Lebanon: A roads project has rehabilitated over 500 km of roads in 25 districts across the country and generated 1.3 million labor days for Lebanese workers and Syrian refugees.
Morocco: Our programs have benefited more than 400,000 people directly and more than 33 million people indirectly, through more than 230 disaster risk reduction projects.

**DESCRIPTION OF THE IMAGE OR CHART:**

The image is a pie chart titled "FIGURE 6: MIDDLE EAST AND NORTH AFRICA IBRD AND IDA LENDING BY SECTOR - FISCAL 2024 SHARE OF TOTAL OF $4.6 BILLION." The chart breaks down the sectors as follows:
- Public Administration: 24%
- Social Protection: 13%
- Health: 13%
- Education: 17%
- Agriculture, Fishing, and Forestry: 8%
- Water, Sanitation, and Waste Management: 8%
- Transportation: 5%
- Energy and Extractives: 3%
- Financial Sector: 1%
- Industry, Trade, and Services: 2%
- Information and Communications Technologies: 6%

**TRANSCRIPTION OF THE TABLE:**

TABLE 13: MIDDLE EAST AND NORTH AFRICA REGIONAL SNAPSHOT

| INDICATOR | 2000 | 2012 | CURRENT DATA* |
|---|---|---|---|
| Total population (millions) | 283.9 | 356.2 | 430.9 |
| Population growth (annual %) | 2.0 | 1.8 | 1.5 |
| GNI per capita (Atlas method, current US$) | 1,595.5 | 4,600.4 | 3,968.1 |
| GDP per capita growth (annual %) | 4.0 | 1.7 | 1.2 |
| Population living below $2.15 a day (millions) | 9.7 | 8.2 | 19.1 |
| Life expectancy at birth, females (years) | 70.8 | 73.9 | 74.8 |
| Life expectancy at birth, males (years) | 66.5 | 69.6 | 69.9 |
| Carbon dioxide emissions (megatons) | 813.2 | 1,297.7 | 1,370.9 |
| Extreme poverty (% of population below $2.15 a day, 2017 PPP) | 3.4 | 2.3 | 4.7 |
| Debt service as a proportion of exports of goods, services, and primary income | 15.1 | 5.2 | 12.4 |
| Ratio of female to male labor force participation rate (%) (modeled ILO estimate) | 24.5 | 26.2 | 23.2 |
| Vulnerable employment, total (% of total employment) (modeled ILO estimate) | 35.4 | 31.7 | 31.4 |
| Under-5 mortality rate per 1,000 live births | 46.7 | 29.0 | 20.9 |
| Primary completion rate (% of relevant age group) | 81.4 | 88.9 | 86.7 |
| Individuals using the Internet (% of population) | 0.9 | 26.0 | 73.4 |
| Access to electricity (% of population) | 91.4 | 94.7 | 96.9 |
| Renewable energy consumption (% of total final energy consumption) | 3.0 | 3.6 | 2.9 |
| People using at least basic drinking water services (% of population) | 86.5 | 90.6 | 93.7 |
| People using at least basic sanitation services (% of population) | 79.4 | 86.2 | 90.4 |

*Note: ILO = International Labour Organization. PPP = purchasing power parity.
a. The most current data available between 2018 and 2023; visit [https://data.worldbank.org](https://data.worldbank.org) for data updates.

For more information, visit [www.worldbank.org/mena](http://www.worldbank.org/mena).
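Because the extraction prompt asks the model to label each part of its answer with a fixed marker, the page text can be split back into its parts when you need them separately, for instance to index tables on their own. The following is a sketch under the assumption that the markers appear in bold exactly as in the output above; the model does not always follow the format, so treat missing sections as normal. `split_sections` is a hypothetical helper name.

```python
import re

MARKERS = [
    "TRANSCRIPTION OF THE TEXT",
    "DESCRIPTION OF THE IMAGE OR CHART",
    "TRANSCRIPTION OF THE TABLE",
]


def split_sections(page_text):
    """Split model output into {marker: body} using the bold section markers."""
    pattern = r"\*\*(" + "|".join(re.escape(m) for m in MARKERS) + r"):?\*\*"
    # With one capture group, re.split returns
    # [preamble, marker1, body1, marker2, body2, ...]
    pieces = re.split(pattern, page_text)
    sections = {}
    for marker, body in zip(pieces[1::2], pieces[2::2]):
        sections[marker] = body.strip()
    return sections


sample = (
    "**TRANSCRIPTION OF THE TEXT:** Hello world. **Some heading** More text. "
    "**DESCRIPTION OF THE IMAGE OR CHART:** A pie chart. "
    "**TRANSCRIPTION OF THE TABLE:** | A | B |"
)
sections = split_sections(sample)
print(list(sections))
```

Other bold headings inside the transcription (such as `**COUNTRY IMPACT**`) do not match the pattern and stay part of the text section.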

# Add a column to flag pages with visual content
df['Visual_Input_Processed'] = df['PageText'].apply(
    lambda x: 'Y' if 'DESCRIPTION OF THE IMAGE OR CHART' in x or 'TRANSCRIPTION OF THE TABLE' in x else 'N'
)


# Function to get embeddings
def get_embedding(text_input):
    response = oai_client.embeddings.create(
        input=text_input,
        model="text-embedding-3-large"
    )
    return response.data[0].embedding


# Generate embeddings with a progress bar
embeddings = []
for text in tqdm(df['PageText'], desc='Generating Embeddings'):
    embedding = get_embedding(text)
    embeddings.append(embedding)

# Add the embeddings to the DataFrame
df['Embeddings'] = embeddings
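The loop above makes one embeddings request per page. The embeddings endpoint also accepts a list of inputs, so for larger documents the pages can be sent in batches. The sketch below shows only the batching logic and uses a stand-in for the API call; `batched`, `embed_all`, `fake_embed_batch`, and the batch size of 16 are illustrative choices, not part of the original cookbook.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(texts, embed_batch, batch_size=16):
    """Embed texts batch by batch; `embed_batch` maps a list of texts to a list of vectors."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


# Stand-in for the API call: one 1-dimensional "vector" per text
def fake_embed_batch(batch):
    return [[float(len(text))] for text in batch]


vectors = embed_all(["a", "bb", "ccc"], fake_embed_batch, batch_size=2)
print(vectors)
```

With the real client, `embed_batch` could be something like `lambda batch: [d.embedding for d in oai_client.embeddings.create(input=batch, model="text-embedding-3-large").data]`, keeping the order of vectors aligned with the order of pages.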

# Reload the index from Pinecone
index = pc.Index(index_name)

# Create a document ID prefix
document_id = 'WB_Report'


# Upsert a single vector with its metadata (synchronous call)
def upsert_vector(identifier, embedding, metadata):
    try:
        index.upsert([
            {
                'id': identifier,
                'values': embedding,
                'metadata': metadata
            }
        ])
    except Exception as e:
        print(f"Error upserting vector with ID {identifier}: {e}")
        raise


for idx, row in tqdm(df.iterrows(), total=df.shape[0], desc='Uploading to Pinecone'):
    pageNumber = row['PageNumber']

    # Create metadata tags to be added to Pinecone
    metadata = {
        'pageId': f"{document_id}-{pageNumber}",
        'pageNumber': pageNumber,
        'text': row['PageText'],
        'ImagePath': row['ImagePath'],
        'GraphicIncluded': row['Visual_Input_Processed']
    }

    upsert_vector(metadata['pageId'], row['Embeddings'], metadata)
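Storing the full page text in the metadata is convenient, but vector stores usually cap the metadata size per record. The sketch below builds the same metadata dictionary and checks its serialized size before upload. The 40 KB value is an assumption based on Pinecone's documented per-record metadata limit at the time of writing and should be verified against the current documentation; `build_metadata` is a hypothetical helper.

```python
import json

# Assumed per-record metadata limit; verify against Pinecone's current docs
MAX_METADATA_BYTES = 40 * 1024


def build_metadata(document_id, page_number, text, image_path, has_graphic):
    """Build page metadata and raise if its JSON size exceeds the assumed limit."""
    page_number = int(page_number)  # normalize 13.0 / numpy ints to a plain int
    metadata = {
        "pageId": f"{document_id}-{page_number}",
        "pageNumber": page_number,
        "text": text,
        "ImagePath": image_path,
        "GraphicIncluded": has_graphic,
    }
    size = len(json.dumps(metadata, ensure_ascii=False).encode("utf-8"))
    if size > MAX_METADATA_BYTES:
        raise ValueError(
            f"Metadata for {metadata['pageId']} is {size} bytes, "
            f"above the assumed limit of {MAX_METADATA_BYTES}"
        )
    return metadata


meta = build_metadata("WB_Report", 13.0, "Access to electricity ...", "images/page_13.png", "Y")
print(meta["pageId"], meta["pageNumber"])
```

If a page exceeds the limit, one option is to store a truncated text in the metadata and keep the full text elsewhere, keyed by `pageId`.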

import json


# Function to get response to a user's question
def get_response_to_question(user_question, pc_index):
    # Get embedding of the question to find the relevant page with the information
    question_embedding = get_embedding(user_question)

    # Query the index for the pages closest to the question embedding
    response = pc_index.query(
        vector=question_embedding,
        top_k=2,
        include_values=True,
        include_metadata=True
    )

    # Collect the metadata from the matches
    context_metadata = [match['metadata'] for match in response['matches']]

    # Convert the list of metadata dictionaries to a JSON string for the prompt
    context_json = json.dumps(context_metadata, indent=3)

    prompt = f"""You are a helpful assistant. Use the following context and images to answer the question.
    In the answer, include the reference to the document, and page number you found the information on between <source></source> tags.
    If you don't find the information, you can say "I couldn't find the information"

    question: {user_question}

    <SOURCES>
    {context_json}
    </SOURCES>
    """

    # Call completions end point with the prompt
    completion = oai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": prompt}
        ]
    )

    return completion.choices[0].message.content
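Conceptually, the index query ranks stored page vectors by cosine similarity to the question vector (the index was created with `metric="cosine"`) and returns the `top_k` best. The toy example below reproduces that ranking in plain Python on made-up 2-dimensional vectors; it illustrates the metric only and says nothing about how Pinecone implements the search internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, pages, k=2):
    """Return the ids of the k page vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), page_id) for page_id, vec in pages.items()]
    scored.sort(reverse=True)
    return [page_id for _, page_id in scored[:k]]


# Made-up vectors standing in for page embeddings
pages = {
    "WB_Report-1": [1.0, 0.0],
    "WB_Report-2": [0.7, 0.7],
    "WB_Report-3": [0.0, 1.0],
}
print(top_k([1.0, 0.1], pages, k=2))
```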

The increase in access to electricity between 2000 and 2012 in Western and Central Africa was from 34.1% to 44.1%, which is an increase of 10 percentage points. <source>WB_Report-13, page 13</source>

import base64
import json


def get_response_to_question_with_images(user_question, pc_index):
    # Get embedding of the question to find the relevant page with the information
    question_embedding = get_embedding(user_question)

    # Query the index for the pages closest to the question embedding
    response = pc_index.query(
        vector=question_embedding,
        top_k=3,
        include_values=True,
        include_metadata=True
    )

    # Collect the metadata from the matches
    context_metadata = [match['metadata'] for match in response['matches']]

    # Build the message content
    message_content = []

    # Add the initial prompt
    initial_prompt = f"""You are a helpful assistant. Use the text and images provided by the user to answer the question.
    You must include the reference to the page number or title of the section where you found the information.
    If you don't find the information, you can say "I couldn't find the information"

    question: {user_question}
    """

    message_content.append({"role": "system", "content": initial_prompt})

    context_messages = []

    # Process each metadata item to include text or images based on the 'GraphicIncluded' flag
    for metadata in context_metadata:
        visual_flag = metadata.get('GraphicIncluded')
        page_number = metadata.get('pageNumber')
        page_text = metadata.get('text')

        if visual_flag == 'Y':
            # Include the image
            print(f"Adding page number {page_number} as an image to context")
            image_path = metadata.get('ImagePath', None)
            try:
                base64_image = encode_image(image_path)
                image_type = 'png'  # Page images were saved as PNG files
                # Add the page image as an image part of the user message
                context_messages.append({
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/{image_type};base64,{base64_image}"
                    },
                })
            except Exception as e:
                print(f"Error encoding image at {image_path}: {e}")
        else:
            # Include the text
            print(f"Adding page number {page_number} as text to context")
            context_messages.append({
                "type": "text",
                "text": f"Page {page_number} - {page_text}",
            })

    # Prepare the user message for the API call
    messages = {
        "role": "user",
        "content": context_messages
    }

    message_content.append(messages)

    completion = oai_client.chat.completions.create(
        model="gpt-4o",
        messages=message_content
    )

    return completion.choices[0].message.content
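The data URL passed to the model should declare the actual format of the image file. Rather than hard-coding the type, it can be derived from the file name with the standard library. This is a small sketch; `to_data_url` is a hypothetical helper, and it takes the raw bytes as an argument so the example runs without any file on disk.

```python
import base64
import mimetypes


def to_data_url(filename, data):
    """Build a base64 data URL whose MIME type is guessed from the file extension."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {filename}")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# The first four bytes of a PNG file, used here as stand-in image data
print(to_data_url("page_1.png", b"\x89PNG"))
```

In the function above, this could replace the manual `image_type` variable by reading the file at `image_path` and passing its bytes together with the path.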

Adding page number 13.0 as an image to context
Adding page number 12.0 as an image to context
Adding page number 11.0 as an image to context
The percentage allocated to social protection in Western and Central Africa is 8% (Figure 2: Western and Central Africa; IBRD and IDA Lending by Sector).

Adding page number 32.0 as an image to context
Adding page number 10.0 as an image to context
Adding page number 4.0 as an image to context
### Image Descriptions

1. **Page 60-61 (Digital Section)**:
   - **Left Side**: A person is sitting and working on a laptop, holding a smartphone. The setting seems informal, possibly in a small office or a cafe.
   - **Text**: Discussion on scaling digital development, thought leadership, partnerships, and establishment of a Digital Vice Presidency unit for digital transformation efforts.

2. **Page 16-17 (Eastern and Southern Africa Section)**:
   - **Right Side**: A group of people standing on a paved street, some using mobile phones. It seems to be a casual, evening setting.
   - **Text**: Information about improving access to electricity in Rwanda and efforts for education and other services in Eastern and Southern Africa.

3. **Page 4-5 (Driving Action, Measuring Results)**:
   - **Images**: Various circular images and icons accompany text highlights such as feeding people, providing schooling, access to clean water, transport, and energy.
   - **Text**: Summary of key development results achieved by the World Bank Group in fiscal 2024.

These images illustrate the initiatives and impacts of the World Bank's projects and activities in various sectors.

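Step 1: Setting Up a Vector Store with Pinecone

Step 2: Parsing the PDF and Extracting Visual Information

Step 3: Generating Embeddings

Step 4: Uploading Embeddings to Pinecone

Step 5: Performing Semantic Search for Relevant Pages

Step 6: Handling Pages with Visual Content (Optional Step)

Conclusion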