Semantic product search with Aiven for AlloyDB Omni

Learn how to perform semantic search using Aiven for AlloyDB Omni. The vector embeddings for the search will be generated using SQL that calls out to VertexAI. The streamlit front-end will show search and insert functionalities in a webpage.

Semantic product search enables users to retrieve results based on the meaning of a sentence rather than the literal words contained in it. It's a great option for allowing users to discover products in a catalog by describing what they want to find, instead of having to guess the exact words used in the product description.

This tutorial walks you through how to perform semantic product search using Aiven for AlloyDB Omni and its native integration with AI tooling.
The embedding calculation, needed for the semantic search, will be done using the VertexAI textembedding-gecko model. The front-end will be developed with a streamlit application, used to display the search and insert functionalities in a webpage.

Create an Aiven for AlloyDB Omni service and enable the AI integration

The first step in the tutorial is to create an Aiven for AlloyDB Omni service. You can create an instance in Google Cloud, AWS or Azure, giving you the maximum flexibility of cloud deployment.

Create an Aiven for AlloyDB Omni service in the Aiven Console

You can review all the steps required to create the Aiven for AlloyDB Omni service in the dedicated Get Started page.

Once the service is up and running, connect to it with psql using the Service URI you can find in the service page of the Aiven Console.

psql [SERVICE_URI]

After connecting, enable the google_ml_integration and vector extensions with:

CREATE EXTENSION IF NOT EXISTS google_ml_integration; CREATE EXTENSION IF NOT EXISTS vector;

Define the Google credentials for the Vertex AI integration

With the extensions enabled, it's time to upload the Google service account key to be used for the Vertex AI connection.

If you don't already have a Google Cloud service account, follow the instructions at Create service accounts.

In the Aiven Console, navigate to the Aiven for AlloyDB Omni service created above and click on the Generative AI tab. There you can find the Google service account key section where you need to upload the Google credentials.

Google Service Account Key

You can find more information about how to retrieve the key in the dedicated Manage Google credential page.

Register the Vertex AI model

With the configuration in place, you now can register the model in Aiven for AlloyDB Omni. You can do it by executing the following SQL:

CALL google_ml.create_model( model_id => 'textembedding-gecko@001', model_provider => 'google', model_qualified_name => 'textembedding-gecko@001', model_type => 'text_embedding', model_auth_type => 'alloydb_service_agent_iam');

The above call registers the model and uses the authentication key defined in the previous step.

Create the data structures and upload the data

With the integration in place, you can then connect to Aiven for AlloyDB Omni and create the following table to store the product inventory.

CREATE TABLE PRODUCT ( ID SERIAL, URL TEXT, TITLE TEXT, IMAGES TEXT, DESCRIPTION TEXT, PRODUCT_ID bigint, SKU bigint, GTIN bigint, BRAND TEXT, PRICE decimal, CURRENCY TEXT, AVAILABILITY TEXT, UNIQ_ID TEXT, SCRAPED_AT TEXT, EMB REAL[768]);

The above table, called PRODUCTS contains a set of fields defining the product like ID, TITLE, DESCRIPTION, PRICE and IMAGES containing a set of product images delimited by ~.
It also contains a column EMB where you'll store the embeddings that will be calculated on the DESCRIPTION field using Vertex AI textembedding-gecko model.

You can either populate the data yourself, or, for example, use the Home Depot dataset from data.world. If you chose the latter, you can upload the products in the table with:

\copy PRODUCT (URL, TITLE, IMAGES, DESCRIPTION, PRODUCT_ID, SKU, GTIN, BRAND, PRICE, CURRENCY, AVAILABILITY, UNIQ_ID, SCRAPED_AT) FROM 'home_depot_data_1_2021_12.csv' DELIMITER ',' quote '"' HEADER csv;

Please note, that the home_depot_data_1_2021_12.csv file needs to be in the same folder where psql has been launched, or you will need to specify full path to the file.

Calculate embeddings for the existing data

The \copy command executed in the previous step populated all the columns apart from the EMB embedding column. To populate that you need to call the VertexAI textembedding-gecko model and retrieve the embeddings. This operation is not possible in standard PostgreSQL®, but AlloyDB Omni makes it available using the google_ml_integration extension you enabled earlier.

To get a sense of how to call the extension you can execute the following:

SELECT google_ml.embedding(model_id => 'textembedding-gecko@001', content=> 'Indoor lamp');

The above calls Vertex AI textembedding-gecko version 001 passing the content Indoor lamp and retrieving the embeddings.
The result will be an array of 768 numbers (exactly the size of the EMB array defined above), representing the prase Indoor lamp.

You can then use the same function to update the EMB column in PRODUCT table with:

UPDATE PRODUCT set EMB = google_ml.embedding(model_id => 'textembedding-gecko@001', content=> DESCRIPTION);

After running the above, you will have the EMB column populated with the embeddings representing the DESCRIPTION column.

Create a trigger to automatically calculate the embeddings in incoming data

You backfilled the EMP column for previously inserted data. What about handling newly inserted rows? For this you can create a trigger that, on every INSERT, calls VertexAI, retrieves the embeddings and stores them in the EMP column.

The first step to achieve this is to create a function to calculate the embeddings:

CREATE OR REPLACE FUNCTION embedding_trigger() RETURNS TRIGGER LANGUAGE PLPGSQL AS $$ BEGIN NEW.EMB = calculate_embeddings(NEW.description); RETURN NEW; END; $$;

And then create a trigger to execute the function on every INSERT (you could similarly do the same for UPDATEs).

CREATE TRIGGER product_update_emb BEFORE INSERT ON product FOR EACH ROW EXECUTE PROCEDURE embedding_trigger();

If you now insert a new item in the PRODUCT table, the EMB column will be automatically populated. For instance:

INSERT INTO PRODUCT (TITLE, DESCRIPTION) VALUES ('New Lamp', 'New Lamp Description');

The next step is to perform semantic vector search, based on an input text. Let's assume the user made a query with the text Indoor Lamp.

You can retrieve the closest 10 items in the inventory with the following query:

SELECT title, substr(description, 0, 100) description, price FROM product ORDER by EMB::vector <-> google_ml.embedding(model_id => 'textembedding-gecko@001', content=> 'Indoor Lamp')::vector LIMIT 10;

In the above:

  • You are retrieving the title, the first 100 characters of the description and the price
  • The order by EMB::vector <-> google_ml.embedding performs a vector search. To learn more about vector search and optimization, look at the article Speed up PostgreSQL pgvector queries.

The results are:

title | description | price -------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+-------- New lamp | My new beautiful lamp | 150.0 Galvan 19 in. White LED Task Table Lamp | Design meets functionality in this Galvan table lamp. Perfect for task lighting, the lamp offers th | 25.55 15 in. Chandra Ivory Ceramic Table Lamp | Inspired by the soft elegance of a cottage along the coast, this contemporary table lamp adds a tou | 28.84 2-Light Iron Black Bath Light | This attractive bath light will look so wonderful inside your home. It features a beautiful iron bl | 119.99 Jane 26 in. Glass Table Lamp, Clear | Originally an icon of rustic charm, the French demijohn shape gets a modern update in sleek, clear | 62.56 Kayden 25. 5 in. Navy Blue Table Lamp with White Shade | Give your decor the perfect finishing touch with this Kayden Table Lamp. Pleated navy blue ceramic | 69.41 New Lamp | New Lamp Description | Polo 1-Light Brushed Nickel Pendant | The POLO pendant will light up your home with a retro industrial style. Featuring a wide bowl shade | 120.75 Hanging Globe 1-Light Satin Aluminum Pendant | The Sea Gull Lighting Hanging Globe 1-light indoor pendant in antique brass is the perfect way to a | 82.37 Happy Hedgehog 21.5 in. White, White Wash, Gray Accent Table Lamp | An adorable option for a child's room or nursery, this table lamp features a little hedgehog as the | 57.18 (10 rows)

Create a Web UI with streamlit

The last piece of the solution is to create a Web UI enabling users to perform semantic search and operators to add products to the inventory.

To showcase the functionality, you can build an app using streamlit with two pages, one dedicated to Search and one to Add items to the inventory.

Warning

The example presented here is an oversimplified prototype, not requiring any security login. For production workloads we strongly suggest you implement additional security layers.

After installing streamlit, you will need to create a main project folder, called demo-app. Within that folder create:

  • A requirements.txt file listing all the packages need for the applications, which are the following

    psycopg2-binary sqlalchemy google streamlit streamlit-aggrid
  • A secrets.toml containing the connection secrets to Aiven for AlloyDB Omni. Replace [HOSTNAME], [PORT], [USER], [PASSWORD] with the details you can find in the Aiven Console:

    [connections.postgresql] dialect = "postgresql" host = "[HOSTNAME]" port = "[PORT]" database = "defaultdb" username = "[USER]" password = "[PASSWORD]"

Create the semantic search page

To provide the semantic search functionality, create a page called Search.py with the following content:

import streamlit as st st.set_page_config(layout="wide") # Initialize connection. conn = st.connection("postgresql", type="sql") def search_in_db(query): df = conn.query(''' SELECT CASE WHEN position('~' IN images) = 0 THEN Images ELSE substr(images, 0, position('~' IN images)) END IMG_URL, title, price, substr(description, 0, 100) description FROM product order by EMB::vector <-> google_ml.embedding(model_id => 'textembedding-gecko@001', content=> :query)::vector limit 10''', params=query, ttl=10) st.data_editor( df, column_config={"img_url": st.column_config.ImageColumn( "Image", help="item preview", width="80px" )}) st.header("Semantic Product Search") col1, col2 = st.columns([6,1]) with col1: st.write("Powered by Aiven for AlloyDB Omni") with col2: st.image("https://aiven.shop/cdn/shop/files/aiven-logo_RGB.png", width=100) form = st.form(key="match") with form: query_str = st.text_input("Find Product") submit = st.form_submit_button("Search") if submit: search_in_db({"query":query_str})

Please note:

  • The SQL query retrieves the first image from the IMAGES column (separated by ~), alongside the title, price and truncated description. It's using the same syntax to perform embedding calculation and vector search that we defined in the previous section.
  • The query_str is collected in the form and passed to the search_in_db function

You can now run the application with:

streamlit run Search.py

The above command opens a new webpage where you can browse the inventory. Searching for knobs round will likely show you the following results:

Search result for round knobs

Create the add items page

Similarly we can create a page to add items to the inventory. To do so, create a folder called pages and, within it, a new file named Add.py with the following content:

import streamlit as st from sqlalchemy.sql import text st.set_page_config(layout="wide") conn = st.connection("postgresql", type="sql") st.header("Add new product to Inventory") col1, col2 = st.columns([6,1]) with col1: st.write("Powered by Aiven for AlloyDB Omni") with col2: st.image("https://aiven.shop/cdn/shop/files/aiven-logo_RGB.png", width=100) with st.form("New Product"): st.write("Add a new product") my_title = st.text_input('Title') my_description = st.text_input('Description') my_price = st.number_input('Price') my_image = st.text_input('Image Url') submit_button = st.form_submit_button('Submit item') if submit_button: with conn.session as s: try: s.execute( text('INSERT INTO product (title, description, price, images) VALUES (:title, :description, :price, :image);'), params={"title": my_title, "description": my_description, "price": my_price, "image": my_image}) s.commit() except Exception as e: print(e) finally: st.write("Saved")

Apart from decorative code, the main page functionality is contained in a single statement:

s.execute( text('INSERT INTO product (title, description, price, images) VALUES (:title, :description, :price, :image);'), params={"title": my_title, "description": my_description, "price": my_price, "image": my_image})

This inserts the data into Aiven for AlloyDB Omni. Be careful to check that the insert only sets the title, description, price and images columns. The EMB column containing the embedding will automatically be populated by the trigger defined above.

To test the new page you just have to refresh the web page opened before, after which you should see a new tab called Add. Clicking on it will take you to the form enabling you to add items to the inventory.

Add item page

It's time for testing now. Add items to the inventory and check if the semantic search results are influenced by the new products.

Conclusion

Semantic search is rapidly becoming a must have for any web facing tools. The ability to find content/products based on the semantics of the query rather than the actual wording enables a huge variety of users with different language skills to be able to find content rapidly and effectively.

While performing semantic search in the past required having separated compute instances to invoke AI models and retrieve embeddings, Aiven for AlloyDB Omni enables developers to provide semantic search functionality with only SQL statements and a basic front-end application. This improves the simplicity and robustness of any semantic search or AI driven application.

You can start trying out Aiven for AlloyDB Omni now, in the Aiven Console, or read more in the Aiven documentation.