Build Production-Ready RAG Systems with PostgreSQL
Learn RAG (Retrieval-Augmented Generation) from scratch. Complete TypeScript implementation guide with working code examples, best practices, and production tips for building AI systems that access real-time data.
15 min read
AI · RAG · TypeScript · PostgreSQL · Tutorial
You've built an impressive AI chatbot using GPT or Claude, but there's one glaring problem: when a customer asks about the new pricing plans you launched yesterday, it says "I don't have that information." Meanwhile, the answer sits unused in your product documentation.
Your users ask perfectly reasonable questions about your business, and your AI either admits it doesn't know or, worse, confidently makes something up, leading to confused customers and support tickets.
This scenario plays out everywhere, and it's why Retrieval-Augmented Generation (RAG) has become the go-to solution for developers building practical AI applications that need to work with real, current data.
What Is RAG and Why Should You Care?
RAG is a technique that combines the reasoning power of large language models with the ability to retrieve and use specific, up-to-date information from your own data sources. Instead of relying solely on what the model learned during training, RAG lets your AI pull relevant context from documents, databases, or any text-based knowledge base in real-time.
Think of it as giving your AI a research assistant that can instantly find and reference the exact information needed to answer questions accurately—no more hallucinations about non-existent features or outdated policies.
The Problem RAG Solves
Large language models are incredible at understanding and generating human-like text, but they have three fundamental limitations that make them impractical for business applications:
Knowledge Cutoff: They only know information from their training data, which could be months or years old.
Hallucination: They sometimes generate plausible-sounding but completely incorrect information when they don't know the answer.
Static Knowledge: They can't access real-time or private company data like your internal documentation, customer records, or recent product updates.
RAG addresses all three issues by grounding the model's responses in actual, retrievable documents from your knowledge base.
How RAG Works: The Simple Version
Here's the RAG pipeline broken down into digestible steps:
- Preparation Phase: Convert your documents into searchable chunks and store them with embeddings in a vector database.
- Query Time: When a user asks a question, find the most relevant document chunks using semantic similarity.
- Augmentation: Inject those chunks into the language model's prompt as context.
- Generation: Let the model generate an answer based on the retrieved context.
The key advantage is that your AI now has access to your latest information without expensive retraining. Now let's build the complete system.
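The four steps above can be sketched end to end in plain TypeScript. Everything here is a stand-in: `score` fakes semantic similarity by counting shared words, and the documents are invented; only the shape of the pipeline (retrieve top-k chunks, join them into context, assemble the prompt) mirrors the real system built below.

```typescript
type Chunk = { id: number; text: string };

// Stand-in for embedding similarity: Jaccard overlap between word sets.
function score(question: string, chunk: string): number {
  const q = new Set(question.toLowerCase().split(/\W+/).filter(Boolean));
  const c = new Set(chunk.toLowerCase().split(/\W+/).filter(Boolean));
  let shared = 0;
  for (const w of q) if (c.has(w)) shared++;
  return shared / (q.size + c.size - shared || 1);
}

// Retrieve the top-k chunks, then build the augmented prompt.
function buildPrompt(question: string, chunks: Chunk[], k = 2): string {
  const top = [...chunks]
    .sort((a, b) => score(question, b.text) - score(question, a.text))
    .slice(0, k);
  const context = top.map((c) => c.text).join("\n\n");
  return `Context:\n${context}\n\nQuestion: ${question}\nAnswer:`;
}

// Invented mini knowledge base
const docs: Chunk[] = [
  { id: 1, text: "The Pro plan costs $49 per month." },
  { id: 2, text: "Support is available via email." },
  { id: 3, text: "The Pro plan includes unlimited seats." },
];

const prompt = buildPrompt("How much does the Pro plan cost?", docs, 2);
```

The prompt that comes out contains the pricing chunks but not the unrelated support chunk, which is exactly the grounding effect RAG relies on.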
Building Your First RAG System
Let's build a production-ready RAG system using PostgreSQL with pgvector—a robust, scalable solution that integrates well with existing infrastructure.
Prerequisites and Setup
What You'll Need:
- Node.js 18+
- PostgreSQL 14+ with pgvector extension
- OpenAI API key (embedding tokens cost a fraction of chat-completion tokens; check OpenAI's current pricing page for exact rates)
- 30 minutes setup time
Install Dependencies:
npm install openai pg pgvector langchain @langchain/openai @langchain/community
npm install --save-dev @types/node @types/pg typescript ts-node
Database Setup:
-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table for document chunks
CREATE TABLE document_chunks (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(1536), -- text-embedding-3-small dimensions
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMP DEFAULT NOW()
);

-- Create index for fast approximate similarity search.
-- Build this after loading data, and tune `lists` for your row count.
CREATE INDEX ON document_chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
Step 1: Environment and Database Setup
import { OpenAI, ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { PGVectorStore } from "@langchain/community/vectorstores/pgvector";
import { Pool } from "pg";

// Environment validation
if (!process.env.OPENAI_API_KEY) {
  throw new Error('OPENAI_API_KEY environment variable is required');
}
if (!process.env.DATABASE_URL) {
  throw new Error('DATABASE_URL environment variable is required');
}

// Database connection
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
});

// Test database connection (top-level await requires an ES module context)
try {
  await pool.query('SELECT NOW()');
  console.log('Database connected successfully');
} catch (error) {
  console.error('Database connection failed:', error);
  process.exit(1);
}
Step 2: Document Processing and Intelligent Chunking
The key to effective RAG is breaking your documents into meaningful chunks that preserve context while staying within token limits.
interface Document {
  content: string;
  metadata?: Record<string, any>;
}

class DocumentProcessor {
  private textSplitter: RecursiveCharacterTextSplitter;

  constructor() {
    // Optimized chunking strategy
    this.textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000, // characters (~250 tokens); a sweet spot for most content
      chunkOverlap: 200, // Preserve context between chunks
      separators: ["\n\n", "\n", ". ", "! ", "? ", "; ", " ", ""]
    });
  }

  async processDocuments(documents: string[]): Promise<string[]> {
    const chunks: string[] = [];
    for (const doc of documents) {
      try {
        const docChunks = await this.textSplitter.splitText(doc);
        chunks.push(...docChunks);
      } catch (error) {
        console.error('Error processing document:', error);
      }
    }
    console.log(`Processed ${documents.length} documents into ${chunks.length} chunks`);
    return chunks;
  }

  // Specialized chunking for different content types
  createTechnicalDocsChunker(): RecursiveCharacterTextSplitter {
    return new RecursiveCharacterTextSplitter({
      chunkSize: 800,
      chunkOverlap: 100,
      separators: ["\n## ", "\n### ", "\n\n", "\n", ". ", " "]
    });
  }

  createConversationalChunker(): RecursiveCharacterTextSplitter {
    return new RecursiveCharacterTextSplitter({
      chunkSize: 1200,
      chunkOverlap: 200,
      separators: ["\n\n", "\n", ". ", "! ", "? ", " "]
    });
  }
}
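To make chunkSize and chunkOverlap concrete, here is a minimal character-based sliding-window chunker. It is a simplification of what RecursiveCharacterTextSplitter does (the real splitter also respects the separator hierarchy so chunks break at paragraph and sentence boundaries):

```typescript
// Minimal sliding-window chunker: each chunk starts (chunkSize - overlap)
// characters after the previous one, so neighbours share `overlap` characters.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

// 2500 characters of cycling letters so the overlap is visible
const sample = Array.from({ length: 2500 }, (_, i) =>
  String.fromCharCode(65 + (i % 26))
).join("");
const chunks = chunkText(sample, 1000, 200);
// chunks[0] ends with the same 200 characters that chunks[1] starts with
```

That shared 200-character window is what preserves context across chunk boundaries: a sentence cut off at the end of one chunk appears whole at the start of the next.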
Step 3: Vector Storage and Retrieval
class RAGSystem {
  private embeddings: OpenAIEmbeddings;
  private vectorStore!: PGVectorStore; // assigned in initialize()
  private llm: ChatOpenAI;

  constructor() {
    this.embeddings = new OpenAIEmbeddings({
      modelName: "text-embedding-3-small",
    });
    // Chat models like GPT-4 go through ChatOpenAI, not the completion-style OpenAI class
    this.llm = new ChatOpenAI({
      modelName: "gpt-4-turbo-preview",
      temperature: 0.1, // Lower for more factual responses
    });
  }

  async initialize() {
    this.vectorStore = await PGVectorStore.initialize(this.embeddings, {
      postgresConnectionOptions: {
        connectionString: process.env.DATABASE_URL,
      },
      tableName: "document_chunks",
      // Map to the columns created in the SQL setup above
      columns: {
        idColumnName: "id",
        vectorColumnName: "embedding",
        contentColumnName: "content",
        metadataColumnName: "metadata",
      },
    });
  }

  async indexDocuments(chunks: string[]) {
    await this.vectorStore.addDocuments(
      chunks.map(chunk => ({ pageContent: chunk, metadata: {} }))
    );
    console.log(`Indexed ${chunks.length} chunks`);
  }

  async query(question: string): Promise<string> {
    // Retrieve the most relevant chunks
    const relevantDocs = await this.vectorStore.similaritySearch(question, 4);

    // Build context from the retrieved documents
    const context = relevantDocs
      .map(doc => doc.pageContent)
      .join("\n\n");

    // Generate a response grounded in that context
    const prompt = `Based on the following context, answer the question.

Context:
${context}

Question: ${question}

Answer:`;

    const response = await this.llm.invoke(prompt);
    return response.content as string;
  }
}
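Under the hood, both the `vector_cosine_ops` index and `similaritySearch` rest on cosine similarity (pgvector's `<=>` operator actually returns cosine distance, which is 1 minus this value). It is worth seeing in the raw:

```typescript
// Cosine similarity: dot product of the vectors divided by the
// product of their magnitudes. 1 = same direction, 0 = orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing the same way score 1 regardless of length;
// unrelated (orthogonal) vectors score 0.
const same = cosineSimilarity([1, 2, 3], [2, 4, 6]);
const orthogonal = cosineSimilarity([1, 0], [0, 1]);
```

Because only direction matters, a short chunk and a long chunk about the same topic land close together in embedding space, which is why cosine distance is the usual choice for text retrieval.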
Production Considerations
Error Handling and Fallbacks
Always implement graceful degradation when the retrieval system fails:
async query(question: string): Promise<string> {
  try {
    const docs = await this.vectorStore.similaritySearch(question, 4);
    if (docs.length === 0) {
      return "I couldn't find specific information about that in our documentation. Please contact support for assistance.";
    }
    // Continue with normal RAG flow...
  } catch (error) {
    console.error('RAG query failed:', error);
    return "I'm having trouble accessing our knowledge base right now. Please try again later.";
  }
}
Monitoring and Observability
Track these metrics in production:
- Query latency (retrieval + generation)
- Retrieval relevance scores
- Token usage and costs
- User feedback on answer quality
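For the cost metric, a rough estimator is easy to keep alongside the query path. The ~4 characters per token rule and the per-million-token prices below are placeholders, not real rates; pull current numbers from OpenAI's pricing page.

```typescript
// Rough rule of thumb for English text: ~4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Placeholder prices in dollars per 1M tokens; substitute real rates.
function estimateCost(
  promptText: string,
  completionText: string,
  promptPricePer1M = 10,
  completionPricePer1M = 30
): number {
  const promptTokens = estimateTokens(promptText);
  const completionTokens = estimateTokens(completionText);
  return (promptTokens * promptPricePer1M + completionTokens * completionPricePer1M) / 1_000_000;
}

// A 4000-character prompt plus a 400-character answer
const cost = estimateCost("a".repeat(4000), "b".repeat(400));
```

Logging this per query makes it obvious when a change to chunk count or chunk size moves your bill, since retrieved context dominates prompt size in most RAG systems.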
Common Pitfalls to Avoid
- Chunks too large: LLMs struggle with very long contexts. Keep chunks under 1000 tokens.
- No overlap: Without overlap, you lose context at chunk boundaries.
- Ignoring metadata: Use metadata for filtering (date, category, source) to improve relevance.
- Single retrieval strategy: Combine semantic search with keyword matching for best results.
Next Steps
This foundation gives you a working RAG system. To take it further:
- Add hybrid search (semantic + keyword) for better retrieval
- Implement reranking to improve result quality
- Add caching for frequently asked questions
- Build evaluation pipelines to measure and improve accuracy
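The first item, hybrid search, can be sketched as a weighted blend of two scores. Both scorers below are stand-ins: in production the semantic score would come from pgvector and the keyword score from Postgres full-text search, and the 0.7 weight is an assumption to tune, not a recommendation.

```typescript
// Fraction of query terms that appear verbatim in the text;
// a toy substitute for Postgres ts_rank.
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  const body = text.toLowerCase();
  const hits = terms.filter((t) => body.includes(t)).length;
  return terms.length ? hits / terms.length : 0;
}

// Linear blend: alpha weights the semantic signal against the keyword signal.
function hybridScore(semantic: number, keyword: number, alpha = 0.7): number {
  return alpha * semantic + (1 - alpha) * keyword;
}

// Semantic score 0.8 (assumed), 2 of 3 query terms match verbatim
const s = hybridScore(0.8, keywordScore("pro plan price", "The Pro plan costs $49."), 0.7);
```

The keyword component rescues exact-match queries (SKUs, error codes, product names) that embeddings sometimes blur together, while the semantic component handles paraphrases.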
RAG transforms your AI from a generic chatbot into a knowledgeable assistant that can actually help with your specific business questions. The investment in building it right pays dividends in user satisfaction and reduced support burden.