Build Production-Ready RAG Systems with PostgreSQL
Learn RAG (Retrieval-Augmented Generation) from scratch. Complete TypeScript implementation guide with working code examples, best practices, and production tips for building AI systems that access real-time data.
15 min read
AI · RAG · TypeScript · PostgreSQL · Tutorial
You've built an impressive AI chatbot using GPT or Claude, but there's one glaring problem: when a customer asks about the new pricing plans you launched yesterday, it says "I don't have that information." Meanwhile, the answer sits unused in your product documentation.
Your users ask perfectly reasonable questions about your business, and your AI either admits it doesn't know or, worse, confidently makes something up, leading to confused customers and support tickets.
This scenario plays out everywhere, and it's why Retrieval-Augmented Generation (RAG) has become the go-to solution for developers building practical AI applications that need to work with real, current data.
What Is RAG and Why Should You Care?
RAG is a technique that combines the reasoning power of large language models with the ability to retrieve and use specific, up-to-date information from your own data sources. Instead of relying solely on what the model learned during training, RAG lets your AI pull relevant context from documents, databases, or any text-based knowledge base in real-time.
Think of it as giving your AI a research assistant that can instantly find and reference the exact information needed to answer questions accurately—no more hallucinations about non-existent features or outdated policies.
The Problem RAG Solves
Large language models are incredible at understanding and generating human-like text, but they have three fundamental limitations that make them impractical for business applications:
Knowledge Cutoff: They only know information from their training data, which could be months or years old.
Hallucination: They sometimes generate plausible-sounding but completely incorrect information when they don't know the answer.
Static Knowledge: They can't access real-time or private company data like your internal documentation, customer records, or recent product updates.
RAG addresses all three issues by grounding the model's responses in actual, retrievable documents from your knowledge base.
How RAG Works: The Simple Version
Here's the RAG pipeline broken down into digestible steps:
- Preparation Phase: Convert your documents into searchable chunks and store them with embeddings in a vector database.
- Query Time: When a user asks a question, find the most relevant document chunks using semantic similarity.
- Augmentation: Inject those chunks into the language model's prompt as context.
- Generation: Let the model generate an answer based on the retrieved context.
The key advantage is that your AI now has access to your latest information without expensive retraining. Now let's build the complete system.
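The four steps above can be sketched end to end in plain TypeScript. Everything here is a stand-in: `score` fakes semantic similarity by counting shared words, and the documents are invented; only the shape of the pipeline (retrieve top-k chunks, join them into context, assemble the prompt) mirrors the real system built below.

```typescript
type Chunk = { id: number; text: string };

// Stand-in for embedding similarity: Jaccard overlap between word sets.
function score(question: string, chunk: string): number {
  const q = new Set(question.toLowerCase().split(/\W+/).filter(Boolean));
  const c = new Set(chunk.toLowerCase().split(/\W+/).filter(Boolean));
  let shared = 0;
  for (const w of q) if (c.has(w)) shared++;
  return shared / (q.size + c.size - shared || 1);
}

// Retrieve the top-k chunks, then build the augmented prompt.
function buildPrompt(question: string, chunks: Chunk[], k = 2): string {
  const top = [...chunks]
    .sort((a, b) => score(question, b.text) - score(question, a.text))
    .slice(0, k);
  const context = top.map((c) => c.text).join("\n\n");
  return `Context:\n${context}\n\nQuestion: ${question}\nAnswer:`;
}

// Invented mini knowledge base
const docs: Chunk[] = [
  { id: 1, text: "The Pro plan costs $49 per month." },
  { id: 2, text: "Support is available via email." },
  { id: 3, text: "The Pro plan includes unlimited seats." },
];

const prompt = buildPrompt("How much does the Pro plan cost?", docs, 2);
```

The prompt that comes out contains the pricing chunks but not the unrelated support chunk, which is exactly the grounding effect RAG relies on.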
Building Your First RAG System
Let's build a production-ready RAG system using PostgreSQL with pgvector—a robust, scalable solution that integrates well with existing infrastructure.
Prerequisites and Setup
What You'll Need:
- Node.js 18+
- PostgreSQL 14+ with pgvector extension
- OpenAI API key (embedding tokens cost a fraction of chat-completion tokens; check OpenAI's current pricing page for exact rates)
- 30 minutes setup time
Install Dependencies:
npm install openai pg pgvector langchain @langchain/openai @langchain/community
npm install --save-dev @types/node @types/pg typescript ts-node
Database Setup:
-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table for document chunks
CREATE TABLE document_chunks (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(1536), -- text-embedding-3-small dimensions
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMP DEFAULT NOW()
);

-- Create index for fast approximate similarity search.
-- Build this after loading data, and tune `lists` for your row count.
CREATE INDEX ON document_chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
Step 1: Environment and Database Setup
import { OpenAI, ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { PGVectorStore } from "@langchain/community/vectorstores/pgvector";
import { Pool } from "pg";

// Environment validation
if (!process.env.OPENAI_API_KEY) {
  throw new Error('OPENAI_API_KEY environment variable is required');
}
if (!process.env.DATABASE_URL) {
  throw new Error('DATABASE_URL environment variable is required');
}

// Database connection
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
});

// Test database connection (top-level await requires an ES module context)
try {
  await pool.query('SELECT NOW()');
  console.log('Database connected successfully');
} catch (error) {
  console.error('Database connection failed:', error);
  process.exit(1);
}
Step 2: Document Processing and Intelligent Chunking
The key to effective RAG is breaking your documents into meaningful chunks that preserve context while staying within token limits.
interface Document {
  content: string;
  metadata?: Record<string, any>;
}

class DocumentProcessor {
  private textSplitter: RecursiveCharacterTextSplitter;

  constructor() {
    // Optimized chunking strategy
    this.textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000, // characters (~250 tokens); a sweet spot for most content
      chunkOverlap: 200, // Preserve context between chunks
      separators: ["\n\n", "\n", ". ", "! ", "? ", "; ", " ", ""]
    });
  }

  async processDocuments(documents: string[]): Promise<string[]> {
    const chunks: string[] = [];
    for (const doc of documents) {
      try {
        const docChunks = await this.textSplitter.splitText(doc);
        chunks.push(...docChunks);
      } catch (error) {
        console.error('Error processing document:', error);
      }
    }
    console.log(`Processed ${documents.length} documents into ${chunks.length} chunks`);
    return chunks;
  }

  // Specialized chunking for different content types
  createTechnicalDocsChunker(): RecursiveCharacterTextSplitter {
    return new RecursiveCharacterTextSplitter({
      chunkSize: 800,
      chunkOverlap: 100,
      separators: ["\n## ", "\n### ", "\n\n", "\n", ". ", " "]
    });
  }

  createConversationalChunker(): RecursiveCharacterTextSplitter {
    return new RecursiveCharacterTextSplitter({
      chunkSize: 1200,
      chunkOverlap: 200,
      separators: ["\n\n", "\n", ". ", "! ", "? ", " "]
    });
  }
}
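To make chunkSize and chunkOverlap concrete, here is a minimal character-based sliding-window chunker. It is a simplification of what RecursiveCharacterTextSplitter does (the real splitter also respects the separator hierarchy so chunks break at paragraph and sentence boundaries):

```typescript
// Minimal sliding-window chunker: each chunk starts (chunkSize - overlap)
// characters after the previous one, so neighbours share `overlap` characters.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

// 2500 characters of cycling letters so the overlap is visible
const sample = Array.from({ length: 2500 }, (_, i) =>
  String.fromCharCode(65 + (i % 26))
).join("");
const chunks = chunkText(sample, 1000, 200);
// chunks[0] ends with the same 200 characters that chunks[1] starts with
```

That shared 200-character window is what preserves context across chunk boundaries: a sentence cut off at the end of one chunk appears whole at the start of the next.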
Step 3: Vector Storage and Retrieval
class RAGSystem {
  private embeddings: OpenAIEmbeddings;
  private vectorStore!: PGVectorStore; // assigned in initialize()
  private llm: ChatOpenAI;

  constructor() {
    this.embeddings = new OpenAIEmbeddings({
      modelName: "text-embedding-3-small",
    });
    // Chat models like GPT-4 go through ChatOpenAI, not the completion-style OpenAI class
    this.llm = new ChatOpenAI({
      modelName: "gpt-4-turbo-preview",
      temperature: 0.1, // Lower for more factual responses
    });
  }

  async initialize() {
    this.vectorStore = await PGVectorStore.initialize(this.embeddings, {
      postgresConnectionOptions: {
        connectionString: process.env.DATABASE_URL,
      },
      tableName: "document_chunks",
      // Map to the columns created in the SQL setup above
      columns: {
        idColumnName: "id",
        vectorColumnName: "embedding",
        contentColumnName: "content",
        metadataColumnName: "metadata",
      },
    });
  }

  async indexDocuments(chunks: string[]) {
    await this.vectorStore.addDocuments(
      chunks.map(chunk => ({ pageContent: chunk, metadata: {} }))
    );
    console.log(`Indexed ${chunks.length} chunks`);
  }

  async query(question: string): Promise<string> {
    // Retrieve the most relevant chunks
    const relevantDocs = await this.vectorStore.similaritySearch(question, 4);

    // Build context from the retrieved documents
    const context = relevantDocs
      .map(doc => doc.pageContent)
      .join("\n\n");

    // Generate a response grounded in that context
    const prompt = `Based on the following context, answer the question.

Context:
${context}

Question: ${question}

Answer:`;

    const response = await this.llm.invoke(prompt);
    return response.content as string;
  }
}
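Under the hood, both the `vector_cosine_ops` index and `similaritySearch` rest on cosine similarity (pgvector's `<=>` operator actually returns cosine distance, which is 1 minus this value). It is worth seeing in the raw:

```typescript
// Cosine similarity: dot product of the vectors divided by the
// product of their magnitudes. 1 = same direction, 0 = orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing the same way score 1 regardless of length;
// unrelated (orthogonal) vectors score 0.
const same = cosineSimilarity([1, 2, 3], [2, 4, 6]);
const orthogonal = cosineSimilarity([1, 0], [0, 1]);
```

Because only direction matters, a short chunk and a long chunk about the same topic land close together in embedding space, which is why cosine distance is the usual choice for text retrieval.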
Production Considerations
Error Handling and Fallbacks
Always implement graceful degradation when the retrieval system fails:
async query(question: string): Promise<string> {
  try {
    const docs = await this.vectorStore.similaritySearch(question, 4);
    if (docs.length === 0) {
      return "I couldn't find specific information about that in our documentation. Please contact support for assistance.";
    }
    // Continue with normal RAG flow...
  } catch (error) {
    console.error('RAG query failed:', error);
    return "I'm having trouble accessing our knowledge base right now. Please try again later.";
  }
}
Monitoring and Observability
Track these metrics in production:
- Query latency (retrieval + generation)
- Retrieval relevance scores
- Token usage and costs
- User feedback on answer quality
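For the cost metric, a rough estimator is easy to keep alongside the query path. The ~4 characters per token rule and the per-million-token prices below are placeholders, not real rates; pull current numbers from OpenAI's pricing page.

```typescript
// Rough rule of thumb for English text: ~4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Placeholder prices in dollars per 1M tokens; substitute real rates.
function estimateCost(
  promptText: string,
  completionText: string,
  promptPricePer1M = 10,
  completionPricePer1M = 30
): number {
  const promptTokens = estimateTokens(promptText);
  const completionTokens = estimateTokens(completionText);
  return (promptTokens * promptPricePer1M + completionTokens * completionPricePer1M) / 1_000_000;
}

// A 4000-character prompt plus a 400-character answer
const cost = estimateCost("a".repeat(4000), "b".repeat(400));
```

Logging this per query makes it obvious when a change to chunk count or chunk size moves your bill, since retrieved context dominates prompt size in most RAG systems.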
Common Pitfalls to Avoid
- Chunks too large: LLMs struggle with very long contexts. Keep chunks under 1000 tokens.
- No overlap: Without overlap, you lose context at chunk boundaries.
- Ignoring metadata: Use metadata for filtering (date, category, source) to improve relevance.
- Single retrieval strategy: Combine semantic search with keyword matching for best results.
Next Steps
This foundation gives you a working RAG system. To take it further:
- Add hybrid search (semantic + keyword) for better retrieval
- Implement reranking to improve result quality
- Add caching for frequently asked questions
- Build evaluation pipelines to measure and improve accuracy
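The first item, hybrid search, can be sketched as a weighted blend of two scores. Both scorers below are stand-ins: in production the semantic score would come from pgvector and the keyword score from Postgres full-text search, and the 0.7 weight is an assumption to tune, not a recommendation.

```typescript
// Fraction of query terms that appear verbatim in the text;
// a toy substitute for Postgres ts_rank.
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  const body = text.toLowerCase();
  const hits = terms.filter((t) => body.includes(t)).length;
  return terms.length ? hits / terms.length : 0;
}

// Linear blend: alpha weights the semantic signal against the keyword signal.
function hybridScore(semantic: number, keyword: number, alpha = 0.7): number {
  return alpha * semantic + (1 - alpha) * keyword;
}

// Semantic score 0.8 (assumed), 2 of 3 query terms match verbatim
const s = hybridScore(0.8, keywordScore("pro plan price", "The Pro plan costs $49."), 0.7);
```

The keyword component rescues exact-match queries (SKUs, error codes, product names) that embeddings sometimes blur together, while the semantic component handles paraphrases.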
RAG transforms your AI from a generic chatbot into a knowledgeable assistant that can actually help with your specific business questions. The investment in building it right pays dividends in user satisfaction and reduced support burden.