TensorFlow, PostgreSQL®, PGVector & Next.js: building a movie recommender
Leveraging TensorFlow, PostgreSQL®, PGVector, and Next.js for vector search with this step-by-step video guide.
Here you'll find the instructions to build a movie recommendation system. Each step has a corresponding video that shows in detail what needs to be done. The complete working project can be found in the GitHub repository.
Step 1. Create the vector embeddings: TensorFlow universal-sentence-encoder and Node.js
Dataset
You'll find the original dataset on Kaggle. It contains metadata about each movie (title, release year, etc.) as well as plot descriptions taken from Wikipedia. The original is in CSV format; however, we'll be working with JSON. You can download the dataset in JSON format here.
New to Node.js?
Download and install Node.js here.
Add dependencies
Install the TensorFlow dependencies. Make sure that the project path does not include spaces or special characters (tfjs-node is very picky):
npm install @tensorflow-models/universal-sentence-encoder --save
npm install @tensorflow/tfjs-node --save
Install them in this order; otherwise, you might run into peer-dependency issues.
Add the encoder
In the root project directory, create the encoder.js file.
Include these dependencies:
const fs = require("fs");
require('@tensorflow/tfjs-node');
const use = require('@tensorflow-models/universal-sentence-encoder');
const moviePlots = require("./movie-plots.json");
Add code to get embeddings for a single movie:
use.load().then(async model => {
  const sampleMoviePlot = moviePlots[0];
  const embeddings = await model.embed(sampleMoviePlot['Plot']);
  console.log(embeddings.arraySync());
});
Run:
node encoder.js
Note: although we don't use the value returned by require('@tensorflow/tfjs-node') directly, do not remove this line: it registers the native TensorFlow backend that the encoder runs on.
Step 2. Free PostgreSQL setup: create Table, enable PGVector
Create service pg-movie-app
To host your PostgreSQL service for free in the cloud, use Aiven for PostgreSQL®. To get an extra $100 in credits when signing up with Aiven, use this link.
Test with pgAdmin
To use Aiven for PostgreSQL with pgAdmin, click Quick Connect and choose Connect with pgAdmin. You'll see the steps you need to perform and a link to download the pgConnect.json file. Open pgAdmin, import a new server and select the downloaded pgConnect.json file.
Enable PGVector:
CREATE EXTENSION vector;
Create a table:
CREATE TABLE movie_plots (
  title VARCHAR,
  director VARCHAR,
  "cast" VARCHAR,
  genre VARCHAR,
  plot TEXT,
  "year" SMALLINT,
  wiki VARCHAR,
  embedding vector(512)
);
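The vector(512) column matches the 512-dimensional embeddings produced by the universal-sentence-encoder. pgvector accepts a vector in its text form, a bracketed comma-separated list such as '[0.1,0.2,0.3]'. The sketch below shows one way to turn a JavaScript array into that literal while checking the dimension first. The function name toPgVector and the EMBEDDING_DIM constant are hypothetical, not part of pgvector or any library.

```javascript
// Hypothetical helper: formats a JS number array as a pgvector text literal.
// Assumes the column is declared as vector(512), as in the table above.
const EMBEDDING_DIM = 512;

function toPgVector(values, dim = EMBEDDING_DIM) {
  if (!Array.isArray(values) || values.length !== dim) {
    const got = Array.isArray(values) ? values.length : typeof values;
    throw new Error(`Expected ${dim} values, got ${got}`);
  }
  for (const v of values) {
    // pgvector rejects NaN and infinite components, so fail early on the client side.
    if (typeof v !== 'number' || !Number.isFinite(v)) {
      throw new Error(`Invalid vector component: ${v}`);
    }
  }
  // join(',') produces "x,y,z"; pgvector expects it wrapped in square brackets.
  return `[${values.join(',')}]`;
}

console.log(toPgVector([0.5, -1, 2], 3)); // prints "[0.5,-1,2]"
```

A template literal such as `[${array}]` yields the same string, because an array's default string conversion also joins its elements with commas; the helper only adds the validation.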
Connect with Node.js
Install node-postgres:
npm install pg --save
Install dotenv to store credentials:
npm install dotenv --save
Create an .env file and add the following connection information:
PG_NAME=
PG_PASSWORD=
PG_HOST=
PG_PORT=
Download the ca.pem certificate from the Aiven console.
Add both .env and ca.pem to .gitignore.
Send a request to PostgreSQL from Node.js
In encoder.js include:
require('dotenv').config();
and
const pg = require('pg');
Add the PostgreSQL connection configuration as well:
const config = {
  user: process.env.PG_NAME,
  password: process.env.PG_PASSWORD,
  host: process.env.PG_HOST,
  port: process.env.PG_PORT,
  database: "defaultdb",
  ssl: {
    rejectUnauthorized: true,
    ca: fs.readFileSync('./ca.pem').toString(),
  },
};
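If one of the variables in .env is missing, the config object silently contains undefined and the connection later fails with a less obvious error. Here is a minimal sketch of an alternative that fails fast, assuming the same PG_* variable names. buildPgConfig is a hypothetical helper, and the CA certificate contents are passed in rather than read from ./ca.pem so the function stays easy to test.

```javascript
// Hypothetical helper: builds the node-postgres config from environment variables
// and throws early if any required variable is missing or malformed.
const REQUIRED_VARS = ['PG_NAME', 'PG_PASSWORD', 'PG_HOST', 'PG_PORT'];

function buildPgConfig(env, caCert) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  const port = Number.parseInt(env.PG_PORT, 10);
  if (!Number.isInteger(port)) {
    throw new Error(`PG_PORT is not a number: ${env.PG_PORT}`);
  }
  return {
    user: env.PG_NAME,
    password: env.PG_PASSWORD,
    host: env.PG_HOST,
    port,
    database: 'defaultdb',
    ssl: { rejectUnauthorized: true, ca: caCert },
  };
}

// Usage in encoder.js, after require('dotenv').config():
// const config = buildPgConfig(process.env, fs.readFileSync('./ca.pem').toString());
```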
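Create the client, connect it to PostgreSQL and send a test SQL request. Because encoder.js is a CommonJS module, where top-level await is not available, wrap the calls in an async function:
const testConnection = async () => {
  const client = new pg.Client(config);
  await client.connect();
  try {
    const pgResponse = await client.query(`SELECT count(*) FROM movie_plots`);
    console.log(pgResponse.rows);
  } catch (err) {
    console.error(err);
  } finally {
    await client.end();
  }
};

testConnection();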
Step 3. Efficiency: Batch TensorFlow vector generation and data insertion with pg-promise multiple rows
Add pg-promise
To generate and send a multi-row insert query, we'll use pg-promise. Install it with:
npm install pg-promise --save
Include pg-promise in encoder.js:
const pgp = require('pg-promise')({
  capSQL: true // capitalize all generated SQL
});
const db = pgp(config); // config is the connection object defined earlier
Add the following code to send a multi-row insert query to PostgreSQL:
const storeInPG = async (moviePlots) => {
  const columns = new pgp.helpers.ColumnSet(
    ['title', 'director', 'plot', 'year', 'wiki', 'cast', 'genre', 'embedding'],
    {table: 'movie_plots'}
  );
  const values = [];
  for (let i = 0; i < moviePlots.length; i++) {
    values.push({
      title: moviePlots[i]['Title'],
      director: moviePlots[i]['Director'],
      plot: moviePlots[i]['Plot'],
      year: moviePlots[i]['Release Year'],
      cast: moviePlots[i]['Cast'],
      genre: moviePlots[i]['Genre'],
      wiki: moviePlots[i]['Wiki Page'],
      // pgvector expects the text form "[x,y,...]"
      embedding: `[${moviePlots[i]['embedding']}]`
    });
  }
  const query = pgp.helpers.insert(values, columns);
  await db.none(query);
}
db.none executes a query that expects no data to be returned.
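The point of the multi-row insert is that one statement carries a whole batch of movies instead of one round trip per movie. pg-promise builds a single INSERT ... VALUES (...),(...) statement with the values already formatted into the SQL text. To illustrate the same idea without pg-promise, here is a hypothetical buildMultiRowInsert helper that produces a parameterized statement you could pass to node-postgres's client.query(text, values). It is a sketch, not what pg-promise generates internally.

```javascript
// Hypothetical helper: builds one parameterized multi-row INSERT statement.
// Identifiers are double-quoted so keywords such as "cast" are safe as column names.
function quoteIdent(name) {
  return `"${String(name).replace(/"/g, '""')}"`;
}

function buildMultiRowInsert(table, columns, rows) {
  if (rows.length === 0) throw new Error('No rows to insert');
  const values = [];
  const tuples = rows.map((row) => {
    const placeholders = columns.map((col) => {
      values.push(row[col]);
      return `$${values.length}`; // $1, $2, ... numbered across all rows
    });
    return `(${placeholders.join(',')})`;
  });
  const columnList = columns.map(quoteIdent).join(',');
  const text = `INSERT INTO ${quoteIdent(table)} (${columnList}) VALUES ${tuples.join(',')}`;
  return { text, values };
}

const { text, values } = buildMultiRowInsert('movie_plots', ['title', 'year'], [
  { title: 'Movie A', year: 1999 },
  { title: 'Movie B', year: 2004 },
]);
console.log(text);
// INSERT INTO "movie_plots" ("title","year") VALUES ($1,$2),($3,$4)
console.log(values); // [ 'Movie A', 1999, 'Movie B', 2004 ]
```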
TensorFlow and batch processing
Next, load the model and iterate over all movies to get their embeddings with TensorFlow.
We'll split the data into batches of 1000 movies, so that each call to model.embed and each insert query processes many movies at once:
use.load().then(async model => {
  const batchSize = 1000;
  for (let start = 0; start < moviePlots.length; start += batchSize) {
    const end = Math.min(start + batchSize, moviePlots.length);
    console.log(`Processing items from ${start} to ${end}.`);
    const movieBatch = moviePlots.slice(start, end);
    const plotDescriptions = movieBatch.map(plot => plot['Plot']);
    const embeddingsRequest = await model.embed(plotDescriptions);
    const embeddings = embeddingsRequest.arraySync();
    embeddingsRequest.dispose(); // free the tensor's memory once we have a plain JS array
    for (let i = 0; i < movieBatch.length; i++) {
      movieBatch[i]['embedding'] = embeddings[i];
    }
    await storeInPG(movieBatch);
  }
  pgp.end(); // close the pg-promise connection pool so the script can exit
});
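The loop above walks the array in fixed-size slices. The same slicing can be pulled out into a small reusable helper, which makes the batch boundaries easy to check in isolation. A sketch; chunk is a hypothetical name, and the batch size of 1000 is simply the value used above.

```javascript
// Hypothetical helper: splits an array into consecutive slices of at most `size` items.
// The last slice is shorter when the length is not a multiple of `size`.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  const batches = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// Example: 2,500 movies with a batch size of 1,000 give batches of 1000, 1000 and 500.
const fakeMovies = Array.from({ length: 2500 }, (_, i) => ({ Title: `Movie ${i}` }));
console.log(chunk(fakeMovies, 1000).map((batch) => batch.length)); // [ 1000, 1000, 500 ]
```

With this helper, the outer loop could become `for (const movieBatch of chunk(moviePlots, 1000)) { ... }`.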
Send the complete dataset with embeddings to PostgreSQL
To execute the code that we wrote and send data to PostgreSQL, run:
node encoder.js
Step 4. Contextual Search with PGVector: Node.js and TensorFlow Magic
Build recommendation logic
Create the recommender.js file and include dependencies:
require('dotenv').config();
const fs = require('fs');
const pg = require('pg');
require('@tensorflow/tfjs-node');
const use = require('@tensorflow-models/universal-sentence-encoder');
Add the PostgreSQL connection configuration:
const config = {
  user: process.env.PG_NAME,
  password: process.env.PG_PASSWORD,
  host: process.env.PG_HOST,
  port: process.env.PG_PORT,
  database: "defaultdb",
  ssl: {
    rejectUnauthorized: true,
    ca: fs.readFileSync('./ca.pem').toString(),
  },
};
We'll be looking for "a lot of cute puppies". Generate an embedding for the test string and use PGVector to find the closest suggestions among the movies we have in the database:
use.load().then(async model => {
  const embeddings = await model.embed("a lot of cute puppies");
  const embeddingArray = embeddings.arraySync()[0];
  const client = new pg.Client(config);
  await client.connect();
  try {
    const pgResponse = await client.query(`SELECT * FROM movie_plots ORDER BY embedding <-> '${JSON.stringify(embeddingArray)}' LIMIT 5;`);
    console.log(pgResponse.rows);
  } catch (err) {
    console.error(err);
  } finally {
    await client.end();
  }
});
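In pgvector, <-> is the Euclidean (L2) distance operator, so the query sorts every movie by how far its plot embedding is from the search embedding and keeps the five closest (pgvector also provides <=> for cosine distance and <#> for negative inner product). Without an index this is an exact nearest-neighbour scan. The plain JavaScript sketch below reproduces that ranking on toy three-dimensional vectors; the function names and data are illustrative only.

```javascript
// Euclidean (L2) distance, the quantity pgvector's <-> operator computes.
function l2Distance(a, b) {
  if (a.length !== b.length) throw new Error('Dimension mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Exact k-nearest-neighbour search: an in-memory analogue of
// SELECT * FROM movie_plots ORDER BY embedding <-> query LIMIT k
function nearest(rows, query, k) {
  return rows
    .map((row) => ({ ...row, distance: l2Distance(row.embedding, query) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

const rows = [
  { title: 'Dog Movie', embedding: [1, 0, 0] },
  { title: 'Space Movie', embedding: [0, 1, 0] },
  { title: 'Puppy Movie', embedding: [0.9, 0.1, 0] },
];
console.log(nearest(rows, [1, 0, 0], 2).map((r) => r.title)); // [ 'Dog Movie', 'Puppy Movie' ]
```

For unit-length vectors, the squared L2 distance equals 2 minus twice the cosine similarity, so ranking by <-> and by <=> gives the same order.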
Run to get the results:
node recommender.js
Step 5. Next.js project setup: PostgreSQL and TensorFlow dependencies, test the backend
Get started with Next.js project
Find out more about Next.js at https://nextjs.org/. Create a project with:
npx create-next-app@latest
We'll use the following settings (the chosen option is marked with asterisks):
What is your project named? what-to-watch
Would you like to use TypeScript? No / *Yes*
Would you like to use ESLint? *No* / Yes
Would you like to use Tailwind CSS? No / *Yes*
Would you like to use `src/` directory? *No* / Yes
Would you like to use App Router? (recommended) *No* / Yes
Would you like to customize the default import alias? *No* / Yes
Once the project has been created, navigate to its folder or open it in your preferred IDE.
Add dependencies
Before we can use TensorFlow and PostgreSQL, we need to install them:
npm install @tensorflow-models/universal-sentence-encoder --save
npm install @tensorflow/tfjs-node --save
npm install pg --save
Additionally, add a dependency for dotenv, to simplify the work with credentials:
npm install dotenv --save
Add PostgreSQL credentials
Create a .env file and add the following placeholders for the properties that we need to define:
PG_NAME=
PG_PASSWORD=
PG_HOST=
PG_PORT=
Go to the service page of your Aiven for PostgreSQL and copy User, Password, Host and Port from the tab with the connection information and add them to the appropriate fields above.
Download ca.pem and save it in a certificates folder at the root of the project.
Add both .env and /certificates to .gitignore:
.env
/certificates
Run
Start the server with:
npm run dev
Open localhost:3000 to see the landing page. Open localhost:3000/api/hello to see a test backend API call.
Step 6. Nearest vector retrieval: TensorFlow universal-sentence-encoder and PGVector-powered queries in Next.js
Add an interface for a movie
Declare a movie type by creating movie.d.ts and adding the following:
declare type Movie = {
  title: string,
  director: string,
  cast: string,
  genre: string,
  plot: string,
  year: number,
  wiki: string,
  embedding: number[]
}
export default Movie;
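The Movie type exists only at compile time; TypeScript does not check that the JSON returned by the API really has this shape. If you want a runtime check as well, a minimal type guard could look like the following. isMovie is a hypothetical helper written in plain JavaScript (in a .ts file you would declare its return type as `value is Movie`). It deliberately skips the embedding field: by default, node-postgres returns values of types it has no parser for, such as pgvector's vector, as strings rather than number arrays.

```javascript
// Hypothetical runtime check for the Movie fields the UI renders.
// The embedding column is not checked, because node-postgres returns it
// as a string such as "[0.1,0.2,...]" unless a parser for vector is registered.
const STRING_FIELDS = ['title', 'director', 'cast', 'genre', 'plot', 'wiki'];

function isMovie(value) {
  if (typeof value !== 'object' || value === null) return false;
  return STRING_FIELDS.every((field) => typeof value[field] === 'string') &&
    typeof value.year === 'number';
}

const sample = {
  title: 'Example', director: 'Someone', cast: 'A, B', genre: 'comedy',
  plot: 'A short plot.', year: 1999, wiki: 'https://en.wikipedia.org/wiki/Example',
  embedding: '[0.1,0.2]',
};
console.log(isMovie(sample)); // true
console.log(isMovie({ title: 'Missing fields' })); // false
```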
Add backend calls
Rename the existing pages/api/hello.ts API Route to pages/api/recommendations.ts.
Add dependencies to pages/api/recommendations.ts:
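import type { NextApiRequest, NextApiResponse } from 'next';
import type Movie from '../../movie'; // assumes movie.d.ts is in the project root; adjust the path if needed
const { readFileSync } = require('fs');
const pg = require('pg');
require('@tensorflow/tfjs-node'); // registers the native TensorFlow backend
const use = require('@tensorflow-models/universal-sentence-encoder');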
Create the connection configuration for PostgreSQL:
const config = {
  user: process.env.PG_NAME,
  password: process.env.PG_PASSWORD,
  host: process.env.PG_HOST,
  port: process.env.PG_PORT,
  database: "defaultdb",
  ssl: {
    rejectUnauthorized: true,
    ca: readFileSync('./certificates/ca.pem').toString(),
  },
};
Add a handler to process the requests:
export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse<Movie[]>
) {
  const model = await use.load();
  const embeddings = await model.embed(req.body.search);
  const embeddingArray = embeddings.arraySync()[0];
  const client = new pg.Client(config);
  await client.connect();
  try {
    const pgResponse = await client.query(`SELECT * FROM movie_plots ORDER BY embedding <-> '${JSON.stringify(embeddingArray)}' LIMIT 5;`);
    res.status(200).json(pgResponse.rows);
  } catch (err) {
    console.error(err);
    res.status(500).end(); // always respond, so the request does not hang on errors
  } finally {
    await client.end();
  }
}
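The handler builds the SQL by interpolating the embedding into the query string. That is safe here because the interpolated text is a JSON array of numbers rather than the raw search phrase, but node-postgres also supports parameterized queries, which keep values out of the SQL text entirely. Here is a sketch of a query builder that returns the { text, values } object accepted by client.query. buildNearestMoviesQuery is hypothetical, and selecting only the displayed columns (so the 512-number embedding is not sent to the browser) is an optional design choice, not part of the original code.

```javascript
// Hypothetical builder for a parameterized nearest-neighbour query.
// Returns the { text, values } object accepted by node-postgres's client.query().
function buildNearestMoviesQuery(embedding, limit = 5) {
  if (!Array.isArray(embedding) || embedding.length === 0) {
    throw new Error('embedding must be a non-empty array of numbers');
  }
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error('limit must be a positive integer');
  }
  return {
    // Only the columns the UI shows are selected.
    text: 'SELECT title, director, "cast", genre, plot, "year", wiki ' +
      'FROM movie_plots ORDER BY embedding <-> $1::vector LIMIT $2',
    // JSON.stringify turns a number array into "[x,y,...]", the text form pgvector parses.
    values: [JSON.stringify(embedding), limit],
  };
}

// Usage inside the handler:
// const pgResponse = await client.query(buildNearestMoviesQuery(embeddingArray));
```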
Step 7. Frontend integration: Next.js movie recommender UI and calls to TensorFlow and PostgreSQL
Open pages/index.tsx and delete the existing layout and imports; we won't need them. Instead, add the following code inside the Home component to call the API Route /api/recommendations:
// Requires at the top of the file:
// import { FormEvent, useRef, useState } from 'react';
// import Movie from '../movie'; // path to your movie.d.ts
const [moviePlots, setMoviePlots] = useState<Movie[]>([]);
const searchInput = useRef<HTMLInputElement>(null);

function search(event: FormEvent<HTMLFormElement>) {
  event.preventDefault();
  const enteredSearch = searchInput.current?.value ?? '';
  fetch('/api/recommendations', {
    method: 'POST',
    body: JSON.stringify({ search: enteredSearch }),
    headers: { 'Content-Type': 'application/json' }
  }).then(response => response.json()).then(data => {
    setMoviePlots(data);
  });
}
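The fetch call above assumes the request always succeeds. A slightly more defensive sketch checks response.ok before parsing, so an error from the API Route does not end up in moviePlots. fetchRecommendations is a hypothetical helper; the fetch implementation is a parameter only so the function can be exercised without a browser or server.

```javascript
// Hypothetical helper: POSTs the search phrase to the API Route and returns the movies.
// `fetchImpl` defaults to the global fetch (available in browsers and Node.js 18+).
async function fetchRecommendations(search, fetchImpl = fetch) {
  const response = await fetchImpl('/api/recommendations', {
    method: 'POST',
    body: JSON.stringify({ search }),
    headers: { 'Content-Type': 'application/json' },
  });
  if (!response.ok) {
    throw new Error(`Recommendation request failed with status ${response.status}`);
  }
  const data = await response.json();
  // Guard against unexpected payloads so the UI can always call .map on the result.
  return Array.isArray(data) ? data : [];
}

// Usage in search():
// fetchRecommendations(enteredSearch).then(setMoviePlots).catch(console.error);
```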
Add a simple layout to input a search phrase and see the results:
return (
  <>
    <form onSubmit={search}>
      <input type="search" id="default-search" ref={searchInput} autoComplete="off" placeholder="Type what do you want to watch about" required/>
      <button type="submit">Search</button>
    </form>
    <div>
      {moviePlots.map(item =>
        <div key={item.title}>
          {item.director} {item.year} {item.title} {item.wiki}
        </div>
      )}
    </div>
  </>
)
Step 8. Polishing and testing: styling the movie recommender UI with the Tailwind CSS framework
We'll add some styling with Tailwind CSS.
Find tailwind.config.ts in your Next.js project and update it with:
module.exports = {
  content: [
    './pages/**/*.{js,ts,jsx,tsx,mdx}',
    './components/**/*.{js,ts,jsx,tsx,mdx}',
    './app/**/*.{js,ts,jsx,tsx,mdx}',
  ],
  theme: {
    extend: {
      colors: {
        veryDarkBlue: '#1B262C',
        darkBlue: '#0F4C75',
        lightBlue: '#3282B8',
        veryLightBlue: '#BBE1FA',
      },
      fontFamily: {
        sans: ['Poppins', 'sans-serif']
      },
      spacing: {},
    },
  },
  plugins: [],
}
In index.tsx, replace the form element with the following section:
<section id="shorten">
  <div className="max-w-4xl mx-auto p-6 space-y-6">
    <form onSubmit={search}>
      <label htmlFor="default-search" className="mb-2 text-sm font-medium sr-only text-white">Search</label>
      <div className="relative">
        <div className="absolute inset-y-0 left-0 flex items-center pl-3 pointer-events-none">
          <svg className="w-4 h-4 text-gray-400" aria-hidden="true" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 20 20">
            <path stroke="currentColor" strokeLinecap="round" strokeLinejoin="round" strokeWidth="2" d="m19 19-4-4m0-7A7 7 0 1 1 1 8a7 7 0 0 1 14 0Z"/>
          </svg>
        </div>
        <input type="search" id="default-search" ref={searchInput} autoComplete="off"
          className="block w-full p-4 pl-10 text-sm border rounded-lg bg-gray-700 border-gray-600 placeholder-gray-400 text-white focus:ring-blue-500 focus:border-blue-500"
          placeholder="Type what do you want to watch about" required/>
        <button type="submit" className="text-white absolute right-2.5 bottom-2.5 focus:ring-4 focus:outline-none font-medium rounded-lg text-sm px-4 py-2 bg-lightBlue hover:bg-darkBlue focus:ring-blue-800">Search</button>
      </div>
    </form>
  </div>
</section>
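To style the list of movies and add a loading indicator, first add an isLoading state next to moviePlots (const [isLoading, setIsLoading] = useState(false);), set it to true before the fetch call in search and back to false once the results arrive. Then replace the existing movie list with: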
<div className="flex gap-8 flex-wrap flex-col grow shrink items-start mx-24">
  {isLoading ? (
    <div className="flex justify-center items-center h-32 w-32 mx-auto">
      {/* Embedding the SVG loading indicator */}
      <svg className="animate-spin h-6 w-6 text-white" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24">
        <circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4"></circle>
        <path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
      </svg>
    </div>
  ) : moviePlots.map(item =>
    <div key={item.title} className="relative p-10 rounded-xl inline-block justify-start rounded-lg shadow-[0_2px_15px_-3px_rgba(0,0,0,0.07),0_10px_20px_-2px_rgba(0,0,0,0.04)] bg-darkBlue items-start">
      <div className="text-6xl absolute top-4 right-4 opacity-80">🍿</div>
      <div>
        <h4 className="opacity-90 text-xl">From {item.director}</h4>
        <p className="opacity-50 text-sm">Year {item.year}</p>
      </div>
      <h1 className="text-4xl mt-6">{item.title}</h1>
      <p className="relative mt-6 text opacity-80 italic">{item.plot}</p>
      <div>
        <p className="opacity-50 text-sm mt-6">
          <a href={item.wiki} className="underline decoration-transparent transition duration-300 ease-in-out hover:decoration-inherit">{item.wiki}</a>
        </p>
      </div>
    </div>
  )}
</div>
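The list above relies on an isLoading flag. One way to keep that flag consistent, even when the request fails, is to reset it in a finally block. Here is a minimal sketch with a hypothetical withLoading helper that takes a state setter (for example setIsLoading from useState) and an async function.

```javascript
// Hypothetical helper: sets a loading flag for the duration of an async task.
// The flag is reset in `finally`, so a failed request does not leave the spinner running.
async function withLoading(setIsLoading, task) {
  setIsLoading(true);
  try {
    return await task();
  } finally {
    setIsLoading(false);
  }
}

// Usage in search():
// withLoading(setIsLoading, () =>
//   fetch('/api/recommendations', { /* same options as above */ }).then((r) => r.json()))
//   .then(setMoviePlots)
//   .catch(console.error);
```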
You can find the complete index.tsx in the GitHub repository.