Using Weaviate for Embeddings Search

Jun 28, 2023

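This notebook takes you through a simple flow to download some data, embed it, and then index and search it using a vector database, here Weaviate. This is a common requirement for customers who want to store and search our embeddings alongside their own data in a secure environment to support production use cases such as chatbots and topic modelling.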

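What is a Vector Database

A vector database is a database built to store, manage and search embedding vectors. The use of embeddings to encode unstructured data (text, audio, video and more) as vectors for consumption by machine-learning models has grown rapidly in recent years, as AI has become increasingly effective at use cases involving natural language, image recognition and other unstructured forms of data. Vector databases have emerged as an effective solution for enterprises to deliver and scale these use cases.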
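
To make "searching embedding vectors" concrete, here is a minimal sketch of the operation a vector database performs: ranking stored vectors by cosine similarity to a query vector. The three-dimensional vectors and the function names are made up for illustration. A real vector database such as Weaviate uses an approximate index (HNSW, in the schema shown later in this notebook) rather than the linear scan below.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query, index, top_k=2):
    """Rank every stored vector by similarity to the query (a linear scan)."""
    scored = [(title, cosine_similarity(query, vec)) for title, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy 3-dimensional "embeddings", invented for this sketch only.
toy_index = {
    "Art": [0.9, 0.1, 0.0],
    "Air": [0.0, 0.2, 0.9],
    "April": [0.1, 0.9, 0.1],
}

results = brute_force_search([1.0, 0.0, 0.0], toy_index)
print(results)  # "Art" ranks first, "April" second
```

The linear scan is exact but touches every vector on every query, which is the cost an approximate index is designed to avoid.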

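Why use a Vector Database

Vector databases enable enterprises to take many of the embeddings use cases we've shared in this repo (question answering, chatbots and recommendation services, for example) and make use of them in a secure, scalable environment. Many of our customers use embeddings to solve their problems at small scale, but performance and security hold them back from going into production. We see vector databases as a key component in solving that, and in this guide we'll walk through the basics of embedding text data, storing it in a vector database and using it for semantic search.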

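Demo Flow

The demo flow is:

  • Setup: Import packages and set any required variables
  • Load data: Load a dataset that has already been embedded with OpenAI embeddings
  • Weaviate
    • Setup: Here we'll set up the Python client for Weaviate. For more details, see the Weaviate documentation
    • Index Data: We'll create an index with title search vectors in it
    • Search Data: We'll run a few searches to confirm it works

Once you've run through this notebook you should have a basic understanding of how to set up and use vector databases, and can move on to more complex use cases making use of our embeddings.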

Setup

Import the required libraries and set the embedding model that we'd like to use.

# We'll need to install the Weaviate client
!pip install weaviate-client

#Install wget to pull zip file
!pip install wget
import openai

from typing import List, Iterator
import pandas as pd
import numpy as np
import os
import wget
from ast import literal_eval

# Weaviate's client library for Python
import weaviate

# Query embeddings must come from the same model as the stored vectors;
# the Article schema below is configured for ada / 002
EMBEDDING_MODEL = "text-embedding-ada-002"

# Ignore unclosed SSL socket warnings - optional in case you get these errors
import warnings

warnings.filterwarnings(action="ignore", message="unclosed", category=ResourceWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning) 

Load data

In this section we'll load embedded data that we've prepared previously.

embeddings_url = 'https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip'

# The file is ~700 MB so this will take some time
wget.download(embeddings_url)
import zipfile
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip","r") as zip_ref:
    zip_ref.extractall("../data")
article_df = pd.read_csv('../data/vector_database_wikipedia_articles_embedded.csv')
article_df.head()
|   | id | url | title | text | title_vector | content_vector | vector_id |
|---|----|-----|-------|------|--------------|----------------|-----------|
| 0 | 1 | https://simple.wikipedia.org/wiki/April | April | April is the fourth month of the year in the J... | [0.001009464613161981, -0.020700545981526375, ... | [-0.011253940872848034, -0.013491976074874401,... | 0 |
| 1 | 2 | https://simple.wikipedia.org/wiki/August | August | August (Aug.) is the eighth month of the year ... | [0.0009286514250561595, 0.000820168002974242, ... | [0.0003609954728744924, 0.007262262050062418, ... | 1 |
| 2 | 6 | https://simple.wikipedia.org/wiki/Art | Art | Art is a creative activity that expresses imagination... | [0.003393713850528002, 0.0061537534929811954, ... | [-0.004959689453244209, 0.015772193670272827, ... | 2 |
| 3 | 8 | https://simple.wikipedia.org/wiki/A | A | A or a is the first letter of the English alphabet... | [0.0153952119871974, -0.013759135268628597, 0.... | [0.024894846603274345, -0.022186409682035446, ... | 3 |
| 4 | 9 | https://simple.wikipedia.org/wiki/Air | Air | Air refers to the Earth's atmosphere. Air is a... | [0.02224554680287838, -0.02044147066771984, -0... | [0.021524671465158463, 0.018522677943110466, -... | 4 |
# Read vectors from strings back into a list
article_df['title_vector'] = article_df.title_vector.apply(literal_eval)
article_df['content_vector'] = article_df.content_vector.apply(literal_eval)

# Set vector_id to be a string
article_df['vector_id'] = article_df['vector_id'].apply(str)
article_df.info(show_counts=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25000 entries, 0 to 24999
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   id              25000 non-null  int64 
 1   url             25000 non-null  object
 2   title           25000 non-null  object
 3   text            25000 non-null  object
 4   title_vector    25000 non-null  object
 5   content_vector  25000 non-null  object
 6   vector_id       25000 non-null  object
dtypes: int64(1), object(6)
memory usage: 1.3+ MB
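
The vector columns arrive from the CSV as strings, which is why the cell above applies literal_eval to them. The sketch below shows that round trip on a single shortened cell and adds a small sanity check. The helper name is_valid_vector, the third value in the string and the dimension of 3 are all made up for illustration; the real cells hold full-length vectors.

```python
from ast import literal_eval

# A shortened stand-in for one CSV cell (the last value is invented).
raw_cell = "[0.001009464613161981, -0.020700545981526375, 0.5]"

# literal_eval only parses Python literals, so it is safer than eval()
vector = literal_eval(raw_cell)

def is_valid_vector(value, expected_dim):
    """True if value is a list of numbers with the expected length."""
    return (
        isinstance(value, list)
        and len(value) == expected_dim
        and all(isinstance(x, (int, float)) for x in value)
    )

print(type(vector).__name__, len(vector))
print(is_valid_vector(vector, expected_dim=3))
```

Running a check like this over a few rows before indexing can catch a truncated or mis-parsed column early.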

Weaviate

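The vector database we'll explore here is Weaviate, which offers both a managed SaaS option and a self-hosted open source option. We'll try the self-hosted option.

For this we will:

  • Set up a local deployment of Weaviate
  • Create an index in Weaviate
  • Store our data there
  • Fire some similarity search queries
  • Try a real use case

Bring your own vectors approach

In this cookbook, we provide the data with already generated vectors. This is a good approach for scenarios where your data is already vectorized.

Automated vectorization with OpenAI module

For scenarios where your data is not vectorized yet, you can delegate the vectorization task with OpenAI to Weaviate. Weaviate offers a built-in module, text2vec-openai, which takes care of the vectorization for you:

  • at import
  • for any CRUD operations
  • for semantic search

Check out the Getting Started with Weaviate and OpenAI module cookbook to learn, step by step, how to import and vectorize data in one step.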

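Setup

To run Weaviate locally, you'll need Docker. Following the instructions in the Weaviate documentation, we created an example docker-compose.yml file in this repo, saved at ./weaviate/docker-compose.yml.

After starting Docker, you can start Weaviate locally by navigating to the examples/vector_databases/weaviate/ directory and running docker-compose up -d.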
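
The container can take a few seconds to accept requests after docker-compose up -d returns. A minimal sketch of a wait loop follows, assuming the instance is reachable at http://localhost:8080 and exposes Weaviate's readiness endpoint /v1/.well-known/ready. The function names, retry count and delay are illustrative choices, not part of the original notebook.

```python
import time
import urllib.error
import urllib.request

def http_ready_probe(base_url="http://localhost:8080"):
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/.well-known/ready", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet
        return False

def wait_until_ready(probe=http_ready_probe, retries=10, delay=2.0):
    """Call probe() until it returns True or the retries run out."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            time.sleep(delay)
    return False

# Demonstration with a stand-in probe that succeeds on the third call,
# so the sketch runs without a live Weaviate instance.
calls = iter([False, False, True])
print(wait_until_ready(probe=lambda: next(calls), retries=5, delay=0))
```

Against a real local deployment, calling wait_until_ready() with the default probe before creating the client should avoid a connection error on a cold start.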

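SaaS

Alternatively, you can use Weaviate Cloud Service (WCS) to create a free Weaviate cluster.

  1. Create a free account and/or log in to WCS
  2. Create a Weaviate cluster with the following settings:
    • Sandbox: Sandbox Free
    • Weaviate Version: Use default (latest)
    • OIDC Authentication: Disabled
  3. Your instance should be ready in a minute or two
  4. Make a note of the Cluster Id. The link will take you to the full path of your cluster (you will need it later to connect to it). It should be something like: https://your-project-name-suffix.weaviate.network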
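# Run only one of the two options below; each one assigns `client`
# Option #1 - Self-hosted - Weaviate Open Source
# The local Docker deployment is served over plain HTTP, not HTTPS
client = weaviate.Client(
    url="http://localhost:8080",
    additional_headers={
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")
    }
)
# Option #2 - SaaS - (Weaviate Cloud Service)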
client = weaviate.Client(
    url="https://your-wcs-instance-name.weaviate.network",
    additional_headers={
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")
    }
)
client.is_ready()
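
Both options differ only in the URL, and os.getenv returns None when OPENAI_API_KEY is unset, which would put an empty header on the client. One way to handle both points is sketched below. WEAVIATE_URL is a hypothetical environment variable name chosen for this sketch, and connection_settings is an illustrative helper, not a Weaviate API.

```python
import os

def connection_settings(env=None):
    """Build keyword arguments for weaviate.Client from environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is not set")
    return {
        # WEAVIATE_URL is a made-up variable name; it falls back to the local deployment
        "url": env.get("WEAVIATE_URL", "http://localhost:8080"),
        "additional_headers": {"X-OpenAI-Api-Key": api_key},
    }

# Demonstration with an explicit mapping instead of the real environment
settings = connection_settings({"OPENAI_API_KEY": "sk-example"})
print(settings["url"])
```

In the notebook this could be used as client = weaviate.Client(**connection_settings()), so switching between the self-hosted and SaaS options becomes a matter of setting one variable.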

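Index data

In Weaviate you create schemas to capture each of the entities you will be searching.

In this case we'll create a schema called Article, with the title vector from above included for us to search by.

The next few steps closely follow the documentation Weaviate provides.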

# Clear up the schema, so that we can recreate it
client.schema.delete_all()
client.schema.get()

# Define the Schema object to use the text2vec-openai module (ada / 002) on `title`, but skip it for `content`
article_schema = {
    "class": "Article",
    "description": "A collection of articles",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {
          "model": "ada",
          "modelVersion": "002",
          "type": "text"
        }
    },
    "properties": [{
        "name": "title",
        "description": "Title of the article",
        "dataType": ["string"]
    },
    {
        "name": "content",
        "description": "Contents of the article",
        "dataType": ["text"],
        "moduleConfig": { "text2vec-openai": { "skip": True } }
    }]
}

# add the Article schema
client.schema.create_class(article_schema)

# get the schema to make sure it worked
client.schema.get()
{'classes': [{'class': 'Article',
   'description': 'A collection of articles',
   'invertedIndexConfig': {'bm25': {'b': 0.75, 'k1': 1.2},
    'cleanupIntervalSeconds': 60,
    'stopwords': {'additions': None, 'preset': 'en', 'removals': None}},
   'moduleConfig': {'text2vec-openai': {'model': 'ada',
     'modelVersion': '002',
     'type': 'text',
     'vectorizeClassName': True}},
   'properties': [{'dataType': ['string'],
     'description': 'Title of the article',
     'moduleConfig': {'text2vec-openai': {'skip': False,
       'vectorizePropertyName': False}},
     'name': 'title',
     'tokenization': 'word'},
    {'dataType': ['text'],
     'description': 'Contents of the article',
     'moduleConfig': {'text2vec-openai': {'skip': True,
       'vectorizePropertyName': False}},
     'name': 'content',
     'tokenization': 'word'}],
   'replicationConfig': {'factor': 1},
   'shardingConfig': {'virtualPerPhysical': 128,
    'desiredCount': 1,
    'actualCount': 1,
    'desiredVirtualCount': 128,
    'actualVirtualCount': 128,
    'key': '_id',
    'strategy': 'hash',
    'function': 'murmur3'},
   'vectorIndexConfig': {'skip': False,
    'cleanupIntervalSeconds': 300,
    'maxConnections': 64,
    'efConstruction': 128,
    'ef': -1,
    'dynamicEfMin': 100,
    'dynamicEfMax': 500,
    'dynamicEfFactor': 8,
    'vectorCacheMaxObjects': 1000000000000,
    'flatSearchCutoff': 40000,
    'distance': 'cosine'},
   'vectorIndexType': 'hnsw',
   'vectorizer': 'text2vec-openai'}]}
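
The per-property skip flag decides which fields the vectorizer module reads. The sketch below walks a schema dictionary and lists the properties that are not skipped. vectorized_properties is an illustrative helper, not part of the Weaviate client, and trimmed_schema is a cut-down copy of the Article schema above, kept under a separate name so it does not overwrite article_schema.

```python
def vectorized_properties(class_schema, module="text2vec-openai"):
    """Names of properties that the given vectorizer module does not skip."""
    names = []
    for prop in class_schema.get("properties", []):
        module_cfg = prop.get("moduleConfig", {}).get(module, {})
        # A property with no explicit setting is treated as not skipped
        if not module_cfg.get("skip", False):
            names.append(prop["name"])
    return names

# Cut-down copy of the Article schema defined above
trimmed_schema = {
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "properties": [
        {"name": "title", "dataType": ["string"]},
        {
            "name": "content",
            "dataType": ["text"],
            "moduleConfig": {"text2vec-openai": {"skip": True}},
        },
    ],
}

print(vectorized_properties(trimmed_schema))
```

The same function should also work on the dictionary returned by client.schema.get() if it is applied to one entry of its 'classes' list.
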
### Step 1 - configure Weaviate Batch, which optimizes CRUD operations in bulk
# - starting batch size of 100
# - dynamically increase/decrease based on performance
# - add timeout retries if something goes wrong

client.batch.configure(
    batch_size=100,
    dynamic=True,
    timeout_retries=3,
)
<weaviate.batch.crud_batch.Batch at 0x3f0ca0fa0>
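
As a conceptual illustration of what a starting batch size of 100 means, the sketch below groups items into chunks of at most 100. It is not how the Weaviate client is implemented, and with dynamic=True the client adjusts the size at run time; the batched helper is a stand-in written for this example.

```python
from itertools import islice

def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk

# 250 items split into batches of 100
sizes = [len(chunk) for chunk in batched(range(250), 100)]
print(sizes)  # [100, 100, 50]
```

Sending objects in groups like this trades one request per object for one request per batch.
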
### Step 2 - import data

print("Uploading data with vectors to Article schema..")

counter=0

with client.batch as batch:
    for k,v in article_df.iterrows():
        
        # print update message every 100 objects        
        if (counter %100 == 0):
            print(f"Import {counter} / {len(article_df)} ")
        
        properties = {
            "title": v["title"],
            "content": v["text"]
        }
        
        vector = v["title_vector"]
        
        batch.add_data_object(properties, "Article", None, vector)
        counter = counter+1

print(f"Importing ({len(article_df)}) Articles complete")  
Uploading data with vectors to Article schema..
Import 0 / 25000 
Import 100 / 25000 
Import 200 / 25000 
...
Import 24900 / 25000 
Importing (25000) Articles complete
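
The import loop passes None as the third argument of add_data_object, which is the object's UUID, so Weaviate assigns the ids itself. If stable ids are wanted, for example so that a repeated import addresses the same objects, one option is to derive the id from a field that is unique per article. The sketch below uses the standard library's uuid5 on the article URL. article_uuid is a hypothetical helper, and using the URL as the key is an assumption made for this example.

```python
import uuid

def article_uuid(url):
    """Derive a stable UUID from an article URL (same input, same UUID)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, url))

first = article_uuid("https://simple.wikipedia.org/wiki/April")
second = article_uuid("https://simple.wikipedia.org/wiki/April")
other = article_uuid("https://simple.wikipedia.org/wiki/August")

print(first == second, first == other)  # True False
```

In the loop above, the resulting string could be passed in place of None, for instance article_uuid(v["url"]).
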
# Test that all data has loaded – get object count
result = (
    client.query.aggregate("Article")
    .with_fields("meta { count }")
    .do()
)
print("Object count: ", result["data"]["Aggregate"]["Article"])
Object count:  [{'meta': {'count': 25000}}]
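
To turn this check into something that fails loudly, the count can be pulled out of the response and compared with the number of rows that were imported. The sketch below assumes a response shaped like the result dictionary in the previous cell; object_count is an illustrative helper, and the sample dictionary mirrors the printed output.

```python
def object_count(aggregate_result, class_name):
    """Pull meta.count out of an Aggregate response."""
    return aggregate_result["data"]["Aggregate"][class_name][0]["meta"]["count"]

# Same shape and value as the response printed above
sample = {"data": {"Aggregate": {"Article": [{"meta": {"count": 25000}}]}}}
expected_rows = 25000  # len(article_df) in the notebook

count = object_count(sample, "Article")
assert count == expected_rows, f"expected {expected_rows} objects, found {count}"
print(count)
```

In the notebook, object_count(result, "Article") == len(article_df) would be the equivalent check.
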
# Test one article has worked by checking one object
test_article = (
    client.query
    .get("Article", ["title", "content", "_additional {id}"])
    .with_limit(1)
    .do()
)["data"]["Get"]["Article"][0]

print(test_article["_additional"]["id"])
print(test_article["title"])
print(test_article["content"])
000393f2-1182-4e3d-abcf-4217eda64be0
Lago d'Origlio
Lago d'Origlio is a lake in the municipality of Origlio, in Ticino, Switzerland.

Lakes of Ticino

Search data

We'll now fire some queries at our new index and get back results ranked by closeness to our existing vectors.

def query_weaviate(query, collection_name, top_k=20):

    # Creates embedding vector from user query
    # Note: openai.Embedding.create is the interface of openai SDK versions before 1.0
    embedded_query = openai.Embedding.create(
        input=query,
        model=EMBEDDING_MODEL,
    )["data"][0]['embedding']
    
    near_vector = {"vector": embedded_query}

    # Queries input schema with vectorised user query
    query_result = (
        client.query
        .get(collection_name, ["title", "content", "_additional {certainty distance}"])
        .with_near_vector(near_vector)
        .with_limit(top_k)
        .do()
    )
    
    return query_result
query_result = query_weaviate("modern art in Europe", "Article")
counter = 0
for article in query_result["data"]["Get"]["Article"]:
    counter += 1
    print(f"{counter}. { article['title']} (Certainty: {round(article['_additional']['certainty'],3) }) (Distance: {round(article['_additional']['distance'],3) })")
1. Museum of Modern Art (Certainty: 0.938) (Distance: 0.125)
2. Western Europe (Certainty: 0.934) (Distance: 0.133)
3. Renaissance art (Certainty: 0.932) (Distance: 0.136)
4. Pop art (Certainty: 0.93) (Distance: 0.14)
5. Northern Europe (Certainty: 0.927) (Distance: 0.145)
6. Hellenistic art (Certainty: 0.926) (Distance: 0.147)
7. Modernist literature (Certainty: 0.924) (Distance: 0.153)
8. Art film (Certainty: 0.922) (Distance: 0.157)
9. Central Europe (Certainty: 0.921) (Distance: 0.157)
10. European (Certainty: 0.921) (Distance: 0.159)
11. Art (Certainty: 0.921) (Distance: 0.159)
12. Byzantine art (Certainty: 0.92) (Distance: 0.159)
13. Postmodernism (Certainty: 0.92) (Distance: 0.16)
14. Eastern Europe (Certainty: 0.92) (Distance: 0.161)
15. Europe (Certainty: 0.919) (Distance: 0.161)
16. Cubism (Certainty: 0.919) (Distance: 0.161)
17. Impressionism (Certainty: 0.919) (Distance: 0.162)
18. Bauhaus (Certainty: 0.919) (Distance: 0.162)
19. Expressionism (Certainty: 0.918) (Distance: 0.163)
20. Surrealism (Certainty: 0.918) (Distance: 0.163)
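
The output reports both a certainty and a distance for each hit. The schema above uses the cosine distance, and for that metric Weaviate's documentation describes certainty as 1 - distance / 2, which is consistent with the rows printed above. The sketch below applies that relationship to two of those rows; the function name is made up for illustration, and the inputs are the rounded distances from the output, so the last digit can differ slightly.

```python
def certainty_from_cosine_distance(distance):
    """Map a cosine distance in [0, 2] to a certainty in [0, 1]."""
    return 1 - distance / 2

# Distances copied from the printed results above
for title, distance in [("Museum of Modern Art", 0.125), ("Pop art", 0.14)]:
    print(title, round(certainty_from_cosine_distance(distance), 3))
```

Because one value is a fixed function of the other, sorting by either gives the same ranking, with lower distance meaning higher certainty.
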
query_result = query_weaviate("Famous battles in Scottish history", "Article")
counter = 0
for article in query_result["data"]["Get"]["Article"]:
    counter += 1
    print(f"{counter}. {article['title']} (Score: {round(article['_additional']['certainty'],3) })")
1. Historic Scotland (Score: 0.946)
2. First War of Scottish Independence (Score: 0.946)
3. Battle of Bannockburn (Score: 0.946)
4. Wars of Scottish Independence (Score: 0.944)
5. Second War of Scottish Independence (Score: 0.94)
6. List of Scottish monarchs (Score: 0.937)
7. Scottish Borders (Score: 0.932)
8. Braveheart (Score: 0.929)
9. John of Scotland (Score: 0.929)
10. Guardians of Scotland (Score: 0.926)
11. Holyrood Abbey (Score: 0.925)
12. Scottish (Score: 0.925)
13. Scots (Score: 0.925)
14. Robert I of Scotland (Score: 0.924)
15. Scottish people (Score: 0.924)
16. Edinburgh Castle (Score: 0.924)
17. Alexander I of Scotland (Score: 0.924)
18. Robert Burns (Score: 0.924)
19. Battle of Bosworth Field (Score: 0.922)
20. David II of Scotland (Score: 0.922)
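
The loops above index straight into query_result["data"]["Get"]["Article"], which gives an unhelpful error if the query itself failed. A small guard is sketched below, under the assumption that a failed query comes back with a top-level "errors" key, as is conventional for GraphQL responses. articles_from_response is an illustrative helper, not part of the Weaviate client.

```python
def articles_from_response(response, class_name="Article"):
    """Return the list of hits, raising if the response reports errors."""
    if "errors" in response:
        raise RuntimeError(f"Weaviate query failed: {response['errors']}")
    return response["data"]["Get"][class_name]

# Stand-in response with the same shape the loops above rely on
ok_response = {"data": {"Get": {"Article": [{"title": "Battle of Bannockburn"}]}}}
print([article["title"] for article in articles_from_response(ok_response)])
```

In the notebook this would wrap the return value of query_weaviate, for example articles_from_response(query_weaviate("modern art in Europe", "Article")).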

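Let Weaviate handle vector embeddings

Weaviate has a built-in module for OpenAI, which takes care of the steps required to generate a vector embedding for your queries and any CRUD operations.

This allows you to run a vector query with the with_near_text filter, which uses the OPENAI_API_KEY passed in the client's X-OpenAI-Api-Key header.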

def near_text_weaviate(query, collection_name):
    
    nearText = {
        "concepts": [query],
        "distance": 0.7,
    }

    properties = [
        "title", "content",
        "_additional {certainty distance}"
    ]

    query_result = (
        client.query
        .get(collection_name, properties)
        .with_near_text(nearText)
        .with_limit(20)
        .do()
    )["data"]["Get"][collection_name]
    
    print (f"Objects returned: {len(query_result)}")
    
    return query_result
query_result = near_text_weaviate("modern art in Europe","Article")
counter = 0
for article in query_result:
    counter += 1
    print(f"{counter}. { article['title']} (Certainty: {round(article['_additional']['certainty'],3) }) (Distance: {round(article['_additional']['distance'],3) })")
Objects returned: 20
1. Museum of Modern Art (Certainty: 0.938) (Distance: 0.125)
2. Western Europe (Certainty: 0.934) (Distance: 0.133)
3. Renaissance art (Certainty: 0.932) (Distance: 0.136)
4. Pop art (Certainty: 0.93) (Distance: 0.14)
5. Northern Europe (Certainty: 0.927) (Distance: 0.145)
6. Hellenistic art (Certainty: 0.926) (Distance: 0.147)
7. Modernist literature (Certainty: 0.923) (Distance: 0.153)
8. Art film (Certainty: 0.922) (Distance: 0.157)
9. Central Europe (Certainty: 0.921) (Distance: 0.157)
10. European (Certainty: 0.921) (Distance: 0.159)
11. Art (Certainty: 0.921) (Distance: 0.159)
12. Byzantine art (Certainty: 0.92) (Distance: 0.159)
13. Postmodernism (Certainty: 0.92) (Distance: 0.16)
14. Eastern Europe (Certainty: 0.92) (Distance: 0.161)
15. Europe (Certainty: 0.919) (Distance: 0.161)
16. Cubism (Certainty: 0.919) (Distance: 0.161)
17. Impressionism (Certainty: 0.919) (Distance: 0.162)
18. Bauhaus (Certainty: 0.919) (Distance: 0.162)
19. Surrealism (Certainty: 0.918) (Distance: 0.163)
20. Expressionism (Certainty: 0.918) (Distance: 0.163)
query_result = near_text_weaviate("Famous battles in Scottish history","Article")
counter = 0
for article in query_result:
    counter += 1
    print(f"{counter}. { article['title']} (Certainty: {round(article['_additional']['certainty'],3) }) (Distance: {round(article['_additional']['distance'],3) })")
Objects returned: 20
1. Historic Scotland (Certainty: 0.946) (Distance: 0.107)
2. First War of Scottish Independence (Certainty: 0.946) (Distance: 0.108)
3. Battle of Bannockburn (Certainty: 0.946) (Distance: 0.109)
4. Wars of Scottish Independence (Certainty: 0.944) (Distance: 0.111)
5. Second War of Scottish Independence (Certainty: 0.94) (Distance: 0.121)
6. List of Scottish monarchs (Certainty: 0.937) (Distance: 0.127)
7. Scottish Borders (Certainty: 0.932) (Distance: 0.137)
8. Braveheart (Certainty: 0.929) (Distance: 0.141)
9. John of Scotland (Certainty: 0.929) (Distance: 0.142)
10. Guardians of Scotland (Certainty: 0.926) (Distance: 0.148)
11. Holyrood Abbey (Certainty: 0.925) (Distance: 0.15)
12. Scottish (Certainty: 0.925) (Distance: 0.15)
13. Scots (Certainty: 0.925) (Distance: 0.15)
14. Robert I of Scotland (Certainty: 0.924) (Distance: 0.151)
15. Scottish people (Certainty: 0.924) (Distance: 0.152)
16. Edinburgh Castle (Certainty: 0.924) (Distance: 0.153)
17. Alexander I of Scotland (Certainty: 0.924) (Distance: 0.153)
18. Robert Burns (Certainty: 0.924) (Distance: 0.153)
19. Battle of Bosworth Field (Certainty: 0.922) (Distance: 0.155)
20. David II of Scotland (Certainty: 0.922) (Distance: 0.157)
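
The "distance": 0.7 entry in the nearText filter acts as a maximum distance: hits further away than that are left out on the server side. The sketch below reproduces the same cut-off on the client side to show its effect. The first two rows are copied from the output above, the third row is made up to show an excluded hit, and within_distance is an illustrative helper.

```python
def within_distance(results, max_distance):
    """Keep only results whose reported distance is at most max_distance."""
    return [r for r in results if r["_additional"]["distance"] <= max_distance]

sample = [
    {"title": "Historic Scotland", "_additional": {"distance": 0.107}},
    {"title": "David II of Scotland", "_additional": {"distance": 0.157}},
    # Hypothetical unrelated hit, invented for this sketch
    {"title": "(unrelated article)", "_additional": {"distance": 0.82}},
]

print([r["title"] for r in within_distance(sample, 0.7)])
```

All twenty hits above have distances well under 0.7, so a tighter threshold would be needed before the filter, rather than with_limit(20), decides how many results come back.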