
2 posts tagged with "OpenSearch"


· 8 min read
Noam Schwartz

Learn how to install OpenSearch Benchmark, create “workloads,” and use them to benchmark different computing devices

Photo by Ben White on Unsplash

OpenSearch users often want to know how their searches will perform across different environments, host types, and cluster configurations. OpenSearch Benchmark, a community-driven, open-source fork of Rally, is well suited to answering that question.

OpenSearch Benchmark helps you reduce infrastructure costs by optimizing OpenSearch resource usage. Running benchmarks periodically also lets you detect performance regressions and measure the effect of performance improvements. Before benchmarking, you should try several other steps to improve performance, a subject I discussed in an earlier article.

In this article, I will walk you through setting up OpenSearch Benchmark and running a search performance benchmark that compares a widely used EC2 instance with a new computing accelerator: the Associative Processing Unit (APU) by

Step 1: Install OpenSearch Benchmark

We’ll be using an m5.4xlarge EC2 instance (us-west-1) on which I installed OpenSearch and indexed a vector index of 9.1M vectors called laion_text. The index is a subset of the large LAION dataset, in which I converted the text field to a vector representation using a CLIP model.

Install Python 3.8+, including pip3, git 1.9+, and an appropriate JDK to run OpenSearch. Be sure that JAVA_HOME points to that JDK. Then run the following command:

sudo python3.8 -m pip install opensearch-benchmark

Tip: If the installation fails, you might need to install the following dependencies manually:

  • sudo apt install python3.8-dev
  • sudo apt install python3.8-distutils
  • python3.8 -m pip install multidict --upgrade
  • python3.8 -m pip install attrs --upgrade
  • python3.8 -m pip install yarl --upgrade
  • python3.8 -m pip install async_timeout --upgrade
  • python3.8 -m pip install aiosignal --upgrade
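If the installation still fails, it can help to confirm the prerequisites listed at the start of this step. The following is a minimal Python sketch (the function name prerequisite_problems is my own, not part of OpenSearch Benchmark) that checks the Python version, that JAVA_HOME points at a directory containing bin/java (a Linux/macOS layout is assumed), and that git is on the PATH. It does not check the git version.

```python
import os
import shutil
import sys
from pathlib import Path


def prerequisite_problems(env=None, version=None):
    """Return a list of human-readable problems; an empty list means all checks passed.

    env and version can be injected for testing; by default the real
    environment and the running interpreter's version are used.
    """
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    problems = []

    # OpenSearch Benchmark needs Python 3.8 or newer.
    if version < (3, 8):
        problems.append(f"Python 3.8+ required, found {version[0]}.{version[1]}")

    # JAVA_HOME must be set and point at a JDK (assumes a Unix-style bin/java).
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not (Path(java_home) / "bin" / "java").exists():
        problems.append(f"JAVA_HOME={java_home} does not contain bin/java")

    # git only needs to be present; its version (1.9+) is not checked here.
    if shutil.which("git") is None:
        problems.append("git was not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in prerequisite_problems():
        print("-", problem)
```

An empty output means every check passed; anything printed points at the prerequisite to fix before retrying the pip install.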

Run the following to verify that the installation was successful:

opensearch-benchmark list workloads

You should see the following details:

Screenshot by the author

Step 2: Configure Where You Want Results To Be Saved

By default, OpenSearch Benchmark uses an “in-memory” metrics store, which keeps all metrics in memory while the benchmark runs. If you switch it to “opensearch,” all metrics are written to a persistent metrics store in an OpenSearch cluster, and the data remains available for further analysis.

To save the results in your OpenSearch cluster, open the opensearch-benchmark.ini file in the ~/.benchmark folder and modify the results publishing section (the highlighted area) so that it writes to the OpenSearch cluster:

Screenshot by the author
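If you prefer to script this change, the sketch below uses Python's configparser. It assumes the section is named results_publishing and uses the keys datastore.type, datastore.host, datastore.port, and datastore.secure, which is how recent OpenSearch Benchmark versions lay out the file; check your own opensearch-benchmark.ini before relying on it. Note that configparser drops comments when it rewrites a file.

```python
import configparser
import io


def use_opensearch_metrics_store(ini_text: str, host: str = "localhost",
                                 port: int = 9200, secure: bool = False) -> str:
    """Return a copy of an opensearch-benchmark.ini that publishes results to OpenSearch."""
    # interpolation=None leaves values containing "%" or "${...}" untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)

    section = "results_publishing"  # assumed section name; check your own file
    if not parser.has_section(section):
        parser.add_section(section)

    # Switch from the default in-memory store to an OpenSearch metrics store.
    parser.set(section, "datastore.type", "opensearch")
    parser.set(section, "datastore.host", host)
    parser.set(section, "datastore.port", str(port))
    parser.set(section, "datastore.secure", str(secure).lower())

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

To apply it, read ~/.benchmark/opensearch-benchmark.ini, pass its contents through the function, and write the result back, keeping a backup of the original file.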

Step 3: Construct the Search “Workload”

Photo by Scott Blake on Unsplash

Now that we have OpenSearch Benchmark installed properly, it’s time to start benchmarking!

The plan is to use OpenSearch Benchmark to compare searches between two computing devices. You can use the following method to benchmark and compare any instance you wish. In this example, we will test a commonly used KNN flat search (an ANN example using IVF and HNSW will be covered in my next article) and compare an m5.4xlarge EC2 instance to the APU.
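For readers new to the terminology: a “flat” k-NN search uses no index structure. It compares the query against every stored vector and keeps the k best, so the work per query grows linearly with the number of vectors. The plain-Python sketch below illustrates that idea only; it is not how OpenSearch or the APU implement it, and it assumes cosine similarity as the metric, which is common for CLIP embeddings but not stated in the article.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flat_knn(query: Sequence[float], vectors: List[Sequence[float]],
             k: int) -> List[Tuple[int, float]]:
    """Exhaustive (flat) k-NN: score every vector, return the k most similar as (index, score)."""
    scored = ((i, cosine(query, v)) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

For example, flat_knn([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0]], k=2) ranks vector 2 (identical to the query) first and vector 0 second. Approximate methods such as IVF and HNSW exist precisely to avoid scoring every vector.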

You can access the APU through a plugin downloaded from’s SaaS platform. You can test the following benchmarking process on your own environment and data. A free trial is available, and registration is simple.

Each benchmark scenario in OpenSearch Benchmark is called a “workload” (Rally calls these “tracks”). We will create one workload that searches on the m5.4xlarge, which will act as our baseline, and another that searches using the APU from the same EC2 instance, which will act as our contender. Later, we will compare the performance of the two workloads.

Let’s start by creating a workload for both the m5.4xlarge (CPU) and the APU using the laion_text index (make sure you run these commands from within the .benchmark directory):

opensearch-benchmark create-workload --workload=laion_text_cpu --target-hosts=localhost:9200 --indices="laion_text"

opensearch-benchmark create-workload --workload=laion_text_apu --target-hosts=localhost:9200 --indices="laion_text"

Note: If the workloads are saved in a workloads folder in your home folder, you will need to copy them to the ~/.benchmark/benchmarks/workloads/default directory.
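If you need to do that copy, a minimal Python sketch follows. It assumes create-workload wrote each workload to a subfolder named after the workload (for example ~/workloads/laion_text_cpu); adjust src_root if yours ended up elsewhere. The dirs_exist_ok argument requires Python 3.8+, which the installation already needs.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def copy_workloads(names: Iterable[str], src_root: Path, dst_root: Path) -> List[Path]:
    """Copy each named workload folder from src_root into dst_root and return the new paths."""
    dst_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        # dirs_exist_ok=True overwrites existing files, so re-running the copy is harmless.
        target = shutil.copytree(src_root / name, dst_root / name, dirs_exist_ok=True)
        copied.append(Path(target))
    return copied
```

A typical call would be copy_workloads(["laion_text_cpu", "laion_text_apu"], Path.home() / "workloads", Path.home() / ".benchmark" / "benchmarks" / "workloads" / "default").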

Run opensearch-benchmark list workloads again and check that both laion_text_cpu and laion_text_apu are listed.

Next, we’ll add operations to the test schedule. You can add as many benchmarking tests as you want. Add each test to the schedule in the workload.json file, which is in the folder named after the workload you want to run.

In our case, the files are in the following locations:

  • ~/.benchmark/benchmarks/workloads/default/laion_text_apu
  • ~/.benchmark/benchmarks/workloads/default/laion_text_cpu

We want to test OpenSearch search performance. Create an operation named “single vector search” (or any other name) and include a query vector. I have left the vector itself out because a 512-dimensional vector would be a bit long. Add the query vector you want to use, and make sure to copy the same vector into both the m5.4xlarge (CPU) and the APU workload.json files!

Next, add any parameters you want. In this example, I will stick with the default eight clients and 1,000 iterations.
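As a rough illustration of what the CPU baseline’s schedule entry might contain, here is a Python sketch that prints one task as JSON, which you could paste into the schedule array of workload.json (the generated file may contain Jinja template syntax, so the sketch does not try to parse it). It assumes the baseline uses the k-NN plugin’s exact-search scoring script (knn_score) with cosine similarity; text_vector is a hypothetical field name and k=10 is an assumed result size. The APU plugin’s query syntax is not covered here.

```python
import json
from typing import List


def flat_knn_task(query_vector: List[float], field: str = "text_vector",  # hypothetical field
                  k: int = 10, index: str = "laion_text",
                  clients: int = 8, iterations: int = 1000) -> dict:
    """Build a schedule task that runs an exact (flat) k-NN search via the knn_score script."""
    body = {
        "size": k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},  # score every document: no pre-selection
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,
                        "query_value": query_vector,
                        "space_type": "cosinesimil",
                    },
                },
            }
        },
    }
    return {
        "operation": {
            "name": "single vector search",
            "operation-type": "search",
            "index": index,
            "body": body,
        },
        "clients": clients,
        "iterations": iterations,
    }


if __name__ == "__main__":
    # A real 512-dimensional CLIP vector goes here; a 3-value stub keeps the output short.
    print(json.dumps(flat_knn_task([0.12, -0.05, 0.33]), indent=2))
```

Keeping the query vector as a function argument makes it easy to feed the identical vector into both workload files, as recommended above.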

m5.4xlarge (CPU) workload.json:



APU workload.json:


Step 4: Run our Workloads

Photo by Tim Gouw on Unsplash

It’s time to run our workloads! We want to run our search workloads against an OpenSearch cluster that is already running. I added a few parameters to the execute_test command:

distribution-version: Set this to the OpenSearch version you are running.

workload: The name of our workload.

Other parameters are available. I also added pipeline, client-options, and on-error, which simplify the whole process: --pipeline=benchmark-only tells OpenSearch Benchmark to use the existing cluster rather than provision one, and --on-error=abort stops the run at the first error.

Go ahead and run the following commands, which will run our workloads:

opensearch-benchmark execute_test --distribution-version=2.2.0 --workload=laion_text_apu --pipeline=benchmark-only --client-options="verify_certs:false,use_ssl:false,timeout:320" --on-error=abort

opensearch-benchmark execute_test --distribution-version=2.2.0 --workload=laion_text_cpu --pipeline=benchmark-only --client-options="verify_certs:false,use_ssl:false,timeout:320" --on-error=abort
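To run both workloads back to back from a script, a small Python driver is enough. This is a sketch: it reuses the flags from the commands above, passes all client options in a single --client-options value (if the flag is given twice, the later value most likely replaces the earlier one rather than being combined with it), and stops at the first run that exits with an error.

```python
import subprocess
from typing import List

# All client options in one value: TLS settings plus a 320-second request timeout.
CLIENT_OPTIONS = "verify_certs:false,use_ssl:false,timeout:320"


def execute_test_args(workload: str, distribution_version: str = "2.2.0") -> List[str]:
    """Build the argument list for one opensearch-benchmark execute_test run."""
    return [
        "opensearch-benchmark", "execute_test",
        f"--distribution-version={distribution_version}",
        f"--workload={workload}",
        "--pipeline=benchmark-only",
        f"--client-options={CLIENT_OPTIONS}",
        "--on-error=abort",
    ]


def run_all(workloads: List[str]) -> None:
    """Run each workload in order; check=True raises CalledProcessError on a failed run."""
    for workload in workloads:
        subprocess.run(execute_test_args(workload), check=True)
```

Calling run_all(["laion_text_apu", "laion_text_cpu"]) runs the contender and then the baseline. Because the arguments are passed as a list, no shell quoting is needed around the client options.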

And now we wait…

Bonus benchmark: I was interested to see the results on an Arm-based AWS Graviton2 processor, so I ran exactly the same process on an r6g.8xlarge EC2 instance as well.

Our results should look like the following:

laion_text_apu (APU) results

m5.4xlarge (CPU) results

r6g.8xlarge (CPU) results

Step 5: Compare our Results

We are finally ready to look at our test results. Drumroll, please… 🥁

First, note that the running times of the workloads were very different. The m5.4xlarge workload took 9 hours and the r6g.8xlarge workload took 6.96 hours, while the APU workload took 2.78 minutes. This is because the APU also supports query aggregation, allowing for greater throughput.

Now, we want a more comprehensive comparison between our workloads. OpenSearch Benchmark can generate a CSV file that makes it easy to compare workloads.

First, we need to find the test execution ID of each run. You can find them either in the benchmark-test-executions index in OpenSearch (which was created in step 2) or in the benchmarks folder.
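For the folder route, the following sketch lists candidate IDs, newest first. It assumes each run is stored in its own subdirectory, named after its test execution ID, under ~/.benchmark/benchmarks/test_executions; that layout is an assumption, so check it against your installation.

```python
from pathlib import Path
from typing import List


def recent_test_execution_ids(root: Path, limit: int = 5) -> List[str]:
    """Return the names of the most recently modified subdirectories of root, newest first."""
    runs = [p for p in root.iterdir() if p.is_dir()]  # ignore stray files
    runs.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in runs[:limit]]
```

For example, recent_test_execution_ids(Path.home() / ".benchmark" / "benchmarks" / "test_executions", limit=2) should return the two runs you just executed, with the most recent first.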

Using these IDs, run the following command to compare two runs and write the output to a CSV file:

opensearch-benchmark compare --results-format=csv --show-in-results=search --results-file=data.csv --baseline=ecb4af7a-d53c-4ac3-9985-b5de45daea0d --contender=b714b13a-af8e-4103-a4c6-558242b8fe6a
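Once data.csv exists, you may want to pull out a single metric programmatically. The sketch below assumes the CSV has Metric, Task, Baseline, and Contender columns; the exact header names may differ between OpenSearch Benchmark versions, so check the first line of your file and adjust. It returns how many times larger the baseline value is than the contender’s, which for time-based metrics such as service time reads as a speedup.

```python
import csv
import io
from typing import Optional


def baseline_to_contender_ratio(csv_text: str, metric: str, task: str) -> Optional[float]:
    """Return baseline / contender for the row matching (metric, task), or None if unavailable."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("Metric") != metric or row.get("Task") != task:
            continue
        try:
            baseline = float(row["Baseline"])
            contender = float(row["Contender"])
        except (KeyError, TypeError, ValueError):
            return None  # missing column or non-numeric cell
        return baseline / contender if contender else None
    return None
```

Read the file with Path("data.csv").read_text() and pass in the metric and task names exactly as they appear in the CSV.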

Here’s a short summary comparing three of our workload results:

Image by the author

A brief explanation of the results in the table:

  1. Throughput: The number of operations that OpenSearch can perform within a certain period, usually per second.

  2. Latency: The time between submitting a request and receiving the complete response. It also includes wait time, i.e., the time the request spends waiting until it is ready to be serviced by OpenSearch.

  3. Service time: The time between sending a request and receiving the corresponding response. This metric can easily be confused with latency but does not include waiting time. This is what most load testing tools incorrectly refer to as “latency.”

  4. Test execution time: The total runtime from starting the workload until completion.
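The difference between latency and service time is easiest to see with timestamps. In the illustrative sketch below (OpenSearch Benchmark computes these metrics itself), each request records when it was scheduled to start, when it was actually sent, and when its complete response arrived. Latency is measured from the scheduled time, so it includes any waiting; service time is measured from the send time. Throughput here is simply completed requests divided by the wall-clock span of the run.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    scheduled: float  # when the benchmark intended to issue the request (seconds)
    sent: float       # when the request actually went out
    received: float   # when the complete response arrived


def latency(r: Request) -> float:
    return r.received - r.scheduled  # includes waiting time before sending


def service_time(r: Request) -> float:
    return r.received - r.sent  # excludes waiting time


def throughput(requests: List[Request]) -> float:
    """Completed requests per second over the whole run."""
    start = min(r.scheduled for r in requests)
    end = max(r.received for r in requests)
    return len(requests) / (end - start)
```

A request scheduled at 0.0 s that waits until 0.5 s to be sent and completes at 0.75 s has a latency of 0.75 s but a service time of only 0.25 s.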


Looking at our results, we can see that the service time for the APU workload was 197 times shorter than for the m5.4xlarge workload and 151 times shorter than for the r6g.8xlarge. From a cost perspective, running the same workload on the APU cost $0.23, compared with $8.87 on the m5.4xlarge (38 times less expensive) and $13.02 on the r6g.8xlarge (56 times less expensive), and we got our search results almost 9 hours (m5.4xlarge) and 6.91 hours (r6g.8xlarge) earlier.
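The cost ratios and time savings follow directly from the figures reported in this section; the short Python check below recomputes them using only the quoted runtimes and costs.

```python
APU_MINUTES = 2.78
RUNTIME_HOURS = {"m5.4xlarge": 9.0, "r6g.8xlarge": 6.96}
COST_USD = {"apu": 0.23, "m5.4xlarge": 8.87, "r6g.8xlarge": 13.02}


def cost_ratio(instance: str) -> float:
    """How many times more the CPU run cost than the APU run."""
    return COST_USD[instance] / COST_USD["apu"]


def hours_saved(instance: str) -> float:
    """How much earlier the APU run finished, in hours."""
    return RUNTIME_HOURS[instance] - APU_MINUTES / 60


for name in RUNTIME_HOURS:
    print(f"{name}: {cost_ratio(name):.1f}x the APU cost, finished {hours_saved(name):.2f} h later")
```

The exact ratios are about 38.6 and 56.6, which the text truncates to 38 and 56; the time savings come out to about 8.95 hours and 6.91 hours.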

Now, imagine the magnitude of these benefits when scaling to even larger datasets, which is likely to be the case in our data-driven, fast-paced world.

I hope this helped you understand more about the power of OpenSearch’s benchmarking tool and how you can use it to benchmark your search performance.

For more information about’s plugin and the APU, please visit They even offer a free trial!

· 6 min read
Pat Lasserre

Photo by Nathan Dumlao on Unsplash

Natural language processing (NLP) is a major part of search — so much so that it is even being used in image search applications.

For example, Google said, when talking about its MUM model, “Eventually, you might be able to take a photo of your hiking boots and ask, ‘can I use these to hike Mt. Fuji?’ MUM would understand the image and connect it with your question to let you know your boots would work just fine. It could then point you to a blog with a list of recommended gear.” This makes MUM multimodal because it understands both text and images.

In this post, I’ll show how vector embeddings outperform keyword search for multimodal text-to-image search. I’ll also discuss a solution that allows you to leverage your existing OpenSearch installation to quickly and easily create a text-to-image search application.

Previously, when using text to search for relevant images, one would perform keyword search using the image captions to compare against the text query. This meant the image itself wasn’t even being used in the search.

One problem with this is there could be relevant images that don’t have captions. This could result in the images not being returned as candidates, even though they are relevant.

Another problem with keyword search is it could omit images with captions that don’t share many keywords with the query but are in fact relevant images. This could impact business in e-commerce applications because sellers often don’t enter the most descriptive text, so even if their item is exactly what the buyer is looking for, it might not be returned as a candidate.

Also, as shown in this post, keyword search has a limited understanding of user intent and could return irrelevant images even if there are “multiple matching terms between the query and the result.” In the example below, it returned an image whose caption matched the keywords “eating fish” but missed the main search term, “bear.”

Query: A bear eating fish by a river

Result: heron eating fish

An irrelevant search result returned using keyword search for the query “A bear eating fish by a river.” Source
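To see why this happens, consider a toy keyword scorer that counts query terms shared with a caption. This is a deliberately simplified bag-of-words model, not BM25 or OpenSearch’s actual scoring, and the stop-word list and the second caption are made up for illustration.

```python
import re
from typing import Set

STOP_WORDS = {"a", "an", "the", "by", "of", "in", "on"}  # illustrative, not exhaustive


def terms(text: str) -> Set[str]:
    """Lowercase word tokens with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def keyword_score(query: str, caption: str) -> int:
    """Number of distinct query terms that also appear in the caption."""
    return len(terms(query) & terms(caption))


query = "A bear eating fish by a river"
print(keyword_score(query, "heron eating fish"))                    # 2: "eating" and "fish"
print(keyword_score(query, "grizzly catching salmon in a stream"))  # 0: no shared terms
```

The heron caption outscores a caption that describes exactly what the user wants in different words, which is the failure mode described above.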

To address the previously mentioned keyword search limitations, we can use a multilingual CLIP model to generate vector embeddings. CLIP was created by OpenAI, and they state that it “efficiently learns visual concepts from natural language supervision.” Basically, CLIP maps text and images to the same embedding space where they can be compared for similarity.

As we discussed in a previous post, vector embeddings better capture the searcher’s intent and the contextual meaning of the query. Instead of simply matching keywords, they take into account what the words mean, not just the words themselves.

An example of that can be seen in the image below. In this case, vector embeddings were used instead of keywords. The same query about a bear eating a fish was used, but unlike the keyword approach that returned an irrelevant image, vector embeddings returned a relevant image.

A relevant search result using vector embeddings for the query “nehir kenarında balık yiyen ayı” (a bear eating fish by a river — Turkish)

Not only did the vector embedding approach return a relevant image, it also showed that it understands multiple languages; the query in this example is in Turkish.

Vector embeddings can also improve recall. Recall is important because it can impact a company’s business. For example, in e-commerce, sellers often either don’t enter very descriptive text, don’t use the right keywords, or they might enter incorrect text descriptions. In these cases, keyword search could prevent a product from being returned as a match, even if it actually is. This means a missed business opportunity for the seller.

Vector embeddings address this recall issue because even though the text descriptions were poor in those examples above, if there were relevant images that went with them, the vector embeddings of the images would allow those images to be returned as matches. Thus, the seller is no longer penalized for entering poor product descriptions, or even no descriptions.

Easily Add Vector Embedding Search to Your OpenSearch

As we wrote about in this post, GSI Technology’s OpenSearch k-NN plugin allows users to easily add production-grade vector embedding search to their search pipeline. They can leverage their current OpenSearch installation rather than having to learn new software for one of the other vector search options out there. This saves them valuable time and resources.

Dmitry Kan and Aarne Talman recently published a great blog post where they explained how they used our OpenSearch k-NN plugin as part of their search stack to easily create a text-to-image search application.

In addition to saving developers valuable time and resources, our OpenSearch k-NN plugin allows for billion-scale neural search and addresses one of the key limitations of native OpenSearch: its lack of pre-filter support for nearest neighbor vector search.

Pre-filtering on metadata is used in many search applications. For example, product metadata such as item description, item title, category, color, or brand are often used as pre-filters to a search query.

The OpenSearch website states: “Because the native library indices are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor search.” This means that native OpenSearch only supports post-filtering of the approximate nearest neighbor results and doesn’t support pre-filtering.

As mentioned in one of our previous posts, post-filtering is problematic because it has a high likelihood of returning far fewer results than the intended k-nearest neighbors. In fact, it could lead to zero results being returned. This leads to an unsatisfying user experience since very few, or no, relevant results might be returned for a particular search query.

GSI’s OpenSearch plugin supports pre-filtering, and even supports range filtering. For example, if somebody were searching for shirts, in addition to using common filters such as brand, style, size, and color, they could also add a range filter to limit the search to shirts priced between $55 and $85.
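The difference between post-filtering and pre-filtering can be sketched in a few lines of Python. This is a toy model under simple assumptions (exact scoring instead of an ANN index, one precomputed similarity score per product, made-up products and prices), not how GSI’s plugin or OpenSearch implement filtering. Post-filtering takes the top k by similarity and then drops products outside the price range, so it can return fewer than k results; pre-filtering applies the range first and ranks only the products that qualify.

```python
import heapq
from typing import Callable, List, NamedTuple


class Product(NamedTuple):
    name: str
    price: float
    score: float  # similarity to the query vector, precomputed for brevity


def post_filter(items: List[Product], k: int, keep: Callable[[Product], bool]) -> List[Product]:
    top_k = heapq.nlargest(k, items, key=lambda p: p.score)  # nearest neighbors first...
    return [p for p in top_k if keep(p)]  # ...then filter, possibly losing every result


def pre_filter(items: List[Product], k: int, keep: Callable[[Product], bool]) -> List[Product]:
    candidates = [p for p in items if keep(p)]  # filter first...
    return heapq.nlargest(k, candidates, key=lambda p: p.score)  # ...then rank what qualifies


def in_range(p: Product) -> bool:
    return 55 <= p.price <= 85  # the $55 to $85 price range from the example


shirts = [
    Product("designer shirt", 120, 0.95),
    Product("silk shirt", 150, 0.93),
    Product("linen shirt", 95, 0.90),
    Product("oxford shirt", 70, 0.80),
    Product("flannel shirt", 60, 0.75),
]
print([p.name for p in post_filter(shirts, 3, in_range)])  # []
print([p.name for p in pre_filter(shirts, 3, in_range)])   # ['oxford shirt', 'flannel shirt']
```

Here the three most similar shirts are all outside the price range, so post-filtering returns nothing, while pre-filtering returns every in-range shirt, ordered by similarity.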


This post showed some of the advantages of vector embedding search over keyword search — for example, better understanding user intent and improving recall in e-commerce applications where sellers either don’t enter very descriptive text, don’t use the right keywords, or enter incorrect text descriptions. Ultimately, these vector embedding advantages lead to improved business for sellers.

We also presented our OpenSearch k-NN plugin that allows users to easily add production-grade vector embedding search to their search pipeline — saving them valuable time and resources. The plugin also provides billion-scale search along with strong filtering capability.

If you want to try out our OpenSearch k-NN plugin, please contact us at