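This notebook provides step-by-step instructions for using Azure Data Explorer (Kusto) as a vector database with OpenAI embeddings.
It walks through an end-to-end process: loading precomputed embeddings, storing them in a Kusto table, converting a raw text query into an embedding with the OpenAI API, and running a cosine similarity search over the stored embeddings in Kusto.
For this exercise you need a few things in place: a Kusto cluster and database, an Azure OpenAI endpoint and key (or an OpenAI API key), and the Python packages installed below.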
%pip install wget
%pip install openai
%pip install azure-kusto-data
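In this section we load prepared embedding data, so you do not have to spend your own credits recomputing the embeddings of the Wikipedia articles.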
import wget
embeddings_url = "https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip"
# The file is ~700 MB so this will take some time
wget.download(embeddings_url)
'vector_database_wikipedia_articles_embedded.zip'
import zipfile
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip", "r") as zip_ref:
    zip_ref.extractall("/lakehouse/default/Files/data")
import pandas as pd
from ast import literal_eval
article_df = pd.read_csv('/lakehouse/default/Files/data/vector_database_wikipedia_articles_embedded.csv')
# Read vectors from strings back into a list
article_df["title_vector"] = article_df.title_vector.apply(literal_eval)
article_df["content_vector"] = article_df.content_vector.apply(literal_eval)
article_df.head()
 | id | url | title | text | title_vector | content_vector | vector_id |
---|---|---|---|---|---|---|---|
0 | 1 | https://simple.wikipedia.org/wiki/April | April | April is the fourth month of the year and has 30 days in the Gregorian calendar. | [0.001009464613161981, -0.020700545981526375, ... | [-0.011253940872848034, -0.013491976074874401,... | 0 |
1 | 2 | https://simple.wikipedia.org/wiki/August | August | August (Aug.) is the eighth month of the year and has 31 days in the Gregorian calendar. | [0.0009286514250561595, 0.000820168002974242, ... | [0.0003609954728744924, 0.007262262050062418, ... | 1 |
2 | 6 | https://simple.wikipedia.org/wiki/Art | Art | Art is a creative activity that expresses imagination or technical creativity. | [0.003393713850528002, 0.0061537534929811954, ... | [-0.004959689453244209, 0.015772193670272827, ... | 2 |
3 | 8 | https://simple.wikipedia.org/wiki/A | A | A or a is the first letter of the English alphabet. | [0.0153952119871974, -0.013759135268628597, 0.... | [0.024894846603274345, -0.022186409682035446, ... | 3 |
4 | 9 | https://simple.wikipedia.org/wiki/Air | Air | Air refers to the Earth's atmosphere. Air is a mixture of many gases. | [0.02224554680287838, -0.02044147066771984, -0... | [0.021524671465158463, 0.018522677943110466, -... | 4 |
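The vector columns arrive from the CSV as strings, which is why the cell above applies literal_eval to them. A minimal sketch of that conversion on a single made-up cell value (the three-element vector is hypothetical; the real vectors are far longer):

```python
from ast import literal_eval

# Hypothetical CSV cell: a vector serialized as text, the way pandas reads it
raw_cell = "[0.25, -0.5, 1.0]"

# literal_eval parses Python literals only, so it does not execute arbitrary code
vector = literal_eval(raw_cell)

print(type(vector).__name__, len(vector))
```

After the conversion each cell holds a Python list of floats rather than a string, which is the shape the later DataFrame-to-Spark conversion works with.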
Create a table and load the vectors from the DataFrame into Kusto. The Spark option "CreateIfNotExist" creates the table automatically if it does not already exist.
# replace with your AAD Tenant ID, Kusto Cluster URI, Kusto DB name and Kusto Table
AAD_TENANT_ID = ""
KUSTO_CLUSTER = ""
KUSTO_DATABASE = "Vector"
KUSTO_TABLE = "Wiki"
kustoOptions = {"kustoCluster": KUSTO_CLUSTER, "kustoDatabase" :KUSTO_DATABASE, "kustoTable" : KUSTO_TABLE }
# Replace the auth method based on your desired authentication mechanism - https://github.com/Azure/azure-kusto-spark/blob/master/docs/Authentication.md
access_token=mssparkutils.credentials.getToken(kustoOptions["kustoCluster"])
#Pandas data frame to spark dataframe
sparkDF=spark.createDataFrame(article_df)
# Write data to a Kusto table
sparkDF.write. \
format("com.microsoft.kusto.spark.synapse.datasource"). \
option("kustoCluster",kustoOptions["kustoCluster"]). \
option("kustoDatabase",kustoOptions["kustoDatabase"]). \
option("kustoTable", kustoOptions["kustoTable"]). \
option("accessToken", access_token). \
option("tableCreateOptions", "CreateIfNotExist").\
mode("Append"). \
save()
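The OpenAI API key is used to vectorize documents and queries. To create and retrieve your Azure OpenAI key and endpoint, follow the instructions at https://learn.microsoft.com/en-us/azure/cognitive-services/openai/tutorials/embeddings
Make sure to use the text-embedding-3-small model. The precomputed embeddings were created with text-embedding-3-small, so the same model must be used at search time.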
import openai
openai.api_version = '2022-12-01'
openai.api_base = '' # Please add your endpoint here
openai.api_type = 'azure'
openai.api_key = '' # Please add your api key here
def embed(query):
    # Creates embedding vector from user query
    embedded_query = openai.Embedding.create(
        input=query,
        deployment_id="embed",  # replace with your deployment id
        chunk_size=1
    )["data"][0]["embedding"]
    return embedded_query
Run this cell only if you plan to use OpenAI (rather than Azure OpenAI) for embeddings.
openai.api_key = ""
def embed(query):
    # Creates embedding vector from user query
    embedded_query = openai.Embedding.create(
        input=query,
        model="text-embedding-3-small",
    )["data"][0]["embedding"]
    return embedded_query
searchedEmbedding = embed("places where you worship")
#print(searchedEmbedding)
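Since the query has to be embedded with the same model as the stored vectors, one cheap sanity check is to compare vector lengths before searching. The sketch below is only an illustration: check_dimensions is a hypothetical helper, not part of any library, and the short vectors are made up. Matching lengths do not prove the models match, but a mismatch does show that something is wrong.

```python
def check_dimensions(query_vector, stored_vector):
    """Return the shared length, or raise ValueError if the lengths differ."""
    if len(query_vector) != len(stored_vector):
        raise ValueError(
            f"query has {len(query_vector)} dimensions, "
            f"stored vector has {len(stored_vector)}"
        )
    return len(query_vector)


# Made-up stand-ins for searchedEmbedding and one stored content_vector
toy_query = [0.1, 0.2, 0.3]
toy_stored = [0.3, 0.2, 0.1]
print(check_dimensions(toy_query, toy_stored))
```

In the notebook the same call could be made with searchedEmbedding and, for example, article_df["content_vector"][0].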
We will search the Kusto table for the closest vectors.
We will use the series_cosine_similarity_fl UDF for the similarity search.
Before continuing, create this function in your database: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/functions-library/series-cosine-similarity-fl?tabs=query-defined
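For intuition about what the similarity search computes, here is a minimal pure-Python sketch of cosine similarity and a ranking by it. It is a local illustration only, not the Kusto UDF itself, and the two-dimensional vectors and titles are made up. As far as I understand the UDF's documentation, its optional third and fourth arguments are precomputed vector magnitudes, so passing 1, 1 in the queries assumes unit-length embeddings; treat that reading as an assumption and check the linked page.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Made-up 2-dimensional "embeddings" keyed by article title
articles = {"Temple": [0.9, 0.1], "History": [0.1, 0.9], "Altar": [0.7, 0.3]}
query = [1.0, 0.0]

# Same idea as "top N by similarity desc": highest similarity first
ranked = sorted(
    articles,
    key=lambda title: cosine_similarity(query, articles[title]),
    reverse=True,
)
print(ranked)
```

Running the ranking inside Kusto instead keeps the vectors in the database, so only the top rows travel back to the notebook.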
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder
from azure.kusto.data.exceptions import KustoServiceError
from azure.kusto.data.helpers import dataframe_from_result_table
import pandas as pd
KCSB = KustoConnectionStringBuilder.with_aad_device_authentication(
KUSTO_CLUSTER)
KCSB.authority_id = AAD_TENANT_ID
KUSTO_CLIENT = KustoClient(KCSB)
KUSTO_QUERY = "Wiki | extend similarity = series_cosine_similarity_fl(dynamic("+str(searchedEmbedding)+"), content_vector,1,1) | top 10 by similarity desc "
RESPONSE = KUSTO_CLIENT.execute(KUSTO_DATABASE, KUSTO_QUERY)
df = dataframe_from_result_table(RESPONSE.primary_results[0])
df
 | id | url | title | text | title_vector | content_vector | vector_id | similarity |
---|---|---|---|---|---|---|---|---|
0 | 852 | https://simple.wikipedia.org/wiki/Temple | Temple | A temple is a place where people go to pray and worship. | [-0.021837441250681877, -0.007722342386841774,... | [-0.0019541378132998943, 0.007151313126087189,... | 413 | 0.834495 |
1 | 78094 | https://simple.wikipedia.org/wiki/Christian%20... | Christian worship | In Christianity, worship is regarded as a Christian's first duty to God. | [0.0017675267299637198, -0.008890199474990368,... | [0.020530683919787407, 0.0024345638230443, -0.... | 20320 | 0.832132 |
2 | 59154 | https://simple.wikipedia.org/wiki/Service%20of... | Service of worship | A service of worship is a religious gathering where people come together to worship. | [-0.007969820871949196, 0.0004240311391185969,... | [0.003784010885283351, -0.0030924836173653603,... | 15519 | 0.831633 |
3 | 51910 | https://simple.wikipedia.org/wiki/Worship | Worship | Worship is a word often used in religion. It refers to the act of showing awe and respect to God or a god. | [0.0036036288365721703, -0.01276545226573944, ... | [0.007925753481686115, -0.0110504487529397, 0.... | 14010 | 0.828185 |
4 | 29576 | https://simple.wikipedia.org/wiki/Altar | Altar | An altar is a place, often a table, where religious ceremonies take place. | [0.007887467741966248, -0.02706138789653778, -... | [0.023901859298348427, -0.031175222247838977, ... | 8708 | 0.824124 |
5 | 92507 | https://simple.wikipedia.org/wiki/Shrine | Shrine | A shrine is a holy or sacred place that holds some important religious objects. | [-0.011601685546338558, 0.006366696208715439, ... | [0.016423320397734642, -0.0015560361789539456,... | 23945 | 0.823863 |
6 | 815 | https://simple.wikipedia.org/wiki/Synagogue | Synagogue | A synagogue is a place where Jews gather to pray and worship. | [-0.017317570745944977, 0.0022673190105706453,... | [-0.004515442531555891, 0.003739549545571208, ... | 398 | 0.819942 |
7 | 68080 | https://simple.wikipedia.org/wiki/Shinto%20shrine | Shinto shrine | A Shinto shrine is a sacred place or site where the Shinto spirits (kami) dwell. | [0.0035740730818361044, 0.0028098472394049168,... | [0.011014971882104874, 0.00042272370774298906,... | 18106 | 0.818475 |
8 | 57790 | https://simple.wikipedia.org/wiki/Chapel | Chapel | A chapel is a place of Christian worship. The word "chapel" has different meanings in different Christian traditions. | [-0.01371884811669588, 0.0031672674231231213, ... | [0.002526090247556567, 0.02482965588569641, 0.... | 15260 | 0.817608 |
9 | 142 | https://simple.wikipedia.org/wiki/Church%20%28... | Church (building) | A church is a building constructed for Christian religious worship. | [0.0021336888894438744, 0.0029748091474175453,... | [0.016109377145767212, 0.022908871993422508, 0... | 74 | 0.812636 |
searchedEmbedding = embed("unfortunate events in history")
KUSTO_QUERY = "Wiki | extend similarity = series_cosine_similarity_fl(dynamic("+str(searchedEmbedding)+"), title_vector,1,1) | top 10 by similarity desc "
RESPONSE = KUSTO_CLIENT.execute(KUSTO_DATABASE, KUSTO_QUERY)
df = dataframe_from_result_table(RESPONSE.primary_results[0])
df
 | id | url | title | text | title_vector | content_vector | vector_id | similarity |
---|---|---|---|---|---|---|---|---|
0 | 848 | https://simple.wikipedia.org/wiki/Tragedy | Tragedy | In theatre, tragedy as defined by Aristotle is a type of drama that imitates a noble and complete action. | [-0.019502468407154083, -0.010160734876990318,... | [-0.012951433658599854, -0.018836138769984245,... | 410 | 0.851848 |
1 | 4469 | https://simple.wikipedia.org/wiki/The%20Holocaust | The Holocaust | The Holocaust, sometimes called the Shoah (), was a genocide that took place in Europe in the 1930s and 1940s, in which Nazi Germany and its collaborators killed about six million Jews. | [-0.030233195051550865, -0.024401605129241943,... | [-0.016398731619119644, -0.013267949223518372,... | 1203 | 0.847222 |
2 | 64216 | https://simple.wikipedia.org/wiki/List%20of%20... | List of historical plagues | This list contains famous or well-documented plagues and epidemics. | [-0.010667890310287476, -0.0003575817099772393... | [-0.010863155126571655, -0.0012196656316518784... | 16859 | 0.844411 |
3 | 4397 | https://simple.wikipedia.org/wiki/List%20of%20... | List of disasters | This is a list of disasters, including natural and man-made disasters. | [-0.02713736332952976, -0.005278210621327162, ... | [-0.023679986596107483, -0.006126823835074902,... | 1158 | 0.843063 |
4 | 23073 | https://simple.wikipedia.org/wiki/Disaster | Disaster | A disaster is something very bad that happens over a short time and causes a lot of harm. | [-0.018235962837934497, -0.020034968852996823,... | [-0.02504003793001175, 0.007415903266519308, 0... | 7251 | 0.840334 |
5 | 4382 | https://simple.wikipedia.org/wiki/List%20of%20... | List of terrorist incidents | The following is a list, by date, of acts of terrorism and failed attempts. | [-0.03989032283425331, -0.012808636762201786, ... | [-0.045838188380002975, -0.01682935282588005, ... | 1149 | 0.836162 |
6 | 13528 | https://simple.wikipedia.org/wiki/A%20Series%2... | A Series of Unfortunate Events | A Series of Unfortunate Events is a series of children's novels written by Daniel Handler under the pen name Lemony Snicket. | [0.0010618815431371331, -0.0267023965716362, -... | [0.002801976166665554, -0.02904471382498741, -... | 4347 | 0.835172 |
7 | 42874 | https://simple.wikipedia.org/wiki/History%20of... | History of the world | World history (also called human history) is the memory, discovery, collection, organization, presentation, and interpretation of humanity's past. | [0.0026915925554931164, -0.022206028923392296,... | [0.013645033352077007, -0.005165994167327881, ... | 11672 | 0.830243 |
8 | 4452 | https://simple.wikipedia.org/wiki/Accident | Accident | An accident is when something goes wrong without being planned. | [-0.004075294826179743, -0.0059883203357458115... | [0.00926120299845934, 0.013705797493457794, 0.... | 1190 | 0.826898 |
9 | 324 | https://simple.wikipedia.org/wiki/History | History | History is the study of past events. People learn about history by looking at written documents and artifacts. | [0.006603690329939127, -0.011856242083013058, ... | [0.0048830462619662285, 0.0032003086525946856,... | 170 | 0.824645 |
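The two searches above differ only in the query text and in which vector column they compare against, so the query string could be assembled by a small helper. The sketch below is one possible shape, not part of the Kusto SDK: build_similarity_query and its parameters are hypothetical names, and the two-element embedding is made up.

```python
import json

# Vector columns present in the Wiki table in this notebook
ALLOWED_VECTOR_COLUMNS = {"title_vector", "content_vector"}


def build_similarity_query(table, vector_column, embedding, top_n=10):
    """Build the KQL text for a cosine similarity search (hypothetical helper)."""
    # Restrict the column name so arbitrary text cannot be spliced into the query
    if vector_column not in ALLOWED_VECTOR_COLUMNS:
        raise ValueError(f"unknown vector column: {vector_column}")
    # Serialize the embedding as a JSON array for the dynamic() literal
    vector_literal = json.dumps([float(x) for x in embedding])
    # Note: the table name is inserted as-is and is assumed to be trusted
    return (
        f"{table} "
        f"| extend similarity = series_cosine_similarity_fl("
        f"dynamic({vector_literal}), {vector_column}, 1, 1) "
        f"| top {int(top_n)} by similarity desc"
    )


query = build_similarity_query("Wiki", "title_vector", [0.1, -0.2], top_n=5)
print(query)
```

In the notebook, the returned string could be passed to KUSTO_CLIENT.execute(KUSTO_DATABASE, query) in place of the hand-built KUSTO_QUERY.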