Learn RAG (Retrieval-Augmented Generation) from scratch. Complete TypeScript implementation guide with working code examples, best practices, and production tips for building AI systems that access real-time data.
You've built an impressive AI chatbot using GPT or Claude, but there's one glaring problem: when a customer asks about the new pricing plans you launched yesterday, it says "I don't have that information." Meanwhile, the answer sits right there in your product documentation, unused.
Your users ask perfectly reasonable questions about your business, and your AI either admits it doesn't know or, worse, confidently makes something up, leading to confused customers and support tickets.
This scenario plays out at companies everywhere, and it's why Retrieval-Augmented Generation (RAG) has become the go-to solution for developers building practical AI applications that need to work with real, current data.
RAG is a technique that combines the reasoning power of large language models with the ability to retrieve and use specific, up-to-date information from your own data sources. Instead of relying solely on what the model learned during training, RAG lets your AI pull relevant context from documents, databases, or any text-based knowledge base in real-time.
Think of it as giving your AI a research assistant that can instantly find and reference the exact information needed to answer questions accurately—no more hallucinations about non-existent features or outdated policies.
Large language models are incredible at understanding and generating human-like text, but they have three fundamental limitations that make them impractical for business applications:
Knowledge Cutoff: They only know information from their training data, which could be months or years old.
Hallucination: They sometimes generate plausible-sounding but completely incorrect information when they don't know the answer.
Static Knowledge: They can't access real-time or private company data like your internal documentation, customer records, or recent product updates.
RAG addresses all three issues by grounding the model's responses in actual, retrievable documents from your knowledge base.
Here's the RAG pipeline broken down into digestible steps (a conceptual sketch follows the list):
1. Ingest: split your documents into chunks small enough to embed and retrieve precisely.
2. Embed: convert each chunk into a vector that captures its meaning.
3. Store: save the vectors in a database that supports similarity search.
4. Retrieve: at query time, embed the user's question and fetch the most similar chunks.
5. Augment: insert the retrieved chunks into the prompt as context.
6. Generate: let the LLM answer the question grounded in that context.
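Here is that flow as a minimal TypeScript sketch. The three helpers (embed, searchSimilar, generate) are placeholders standing in for the real components we build in the rest of this post, not a specific library API:
// Conceptual RAG pipeline; real implementations follow later in this post
declare function embed(text: string): Promise<number[]>;
declare function searchSimilar(vector: number[], k: number): Promise<string[]>;
declare function generate(prompt: string): Promise<string>;

async function answerWithRAG(question: string): Promise<string> {
  // Steps 1-3 (ingest, embed, store) happen offline, before any queries.
  // Step 4: embed the question and find the closest chunks.
  const questionVector = await embed(question);
  const chunks = await searchSimilar(questionVector, 4);
  // Step 5: put the retrieved chunks into the prompt as context.
  const prompt = `Context:\n${chunks.join("\n\n")}\n\nQuestion: ${question}`;
  // Step 6: the model answers grounded in that context.
  return generate(prompt);
}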
The key advantage is that your AI now has access to your latest information without expensive retraining. Now let's build the complete system.
Let's build a production-ready RAG system using PostgreSQL with pgvector—a robust, scalable solution that integrates well with existing infrastructure.
What You'll Need:
- Node.js 18+ with TypeScript
- A PostgreSQL database with the pgvector extension available
- An OpenAI API key, exposed as the OPENAI_API_KEY environment variable
- Your database connection string, exposed as DATABASE_URL
Install Dependencies:
npm install openai pg @langchain/openai @langchain/community
npm install @langchain/core @langchain/textsplitters pgvector
npm install --save-dev @types/node @types/pg typescript ts-node
Database Setup:
-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create table for document chunks
CREATE TABLE document_chunks (
id SERIAL PRIMARY KEY,
content TEXT NOT NULL,
embedding vector(1536), -- OpenAI embedding dimensions
metadata JSONB DEFAULT '{}',
created_at TIMESTAMP DEFAULT NOW()
);
-- Create index for fast similarity search
CREATE INDEX ON document_chunks USING ivfflat (embedding vector_cosine_ops);
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { PGVectorStore } from "@langchain/community/vectorstores/pgvector";
import { Pool } from "pg";
// Environment validation
if (!process.env.OPENAI_API_KEY) {
throw new Error('OPENAI_API_KEY environment variable is required');
}
if (!process.env.DATABASE_URL) {
throw new Error('DATABASE_URL environment variable is required');
}
// Database connection
const pool = new Pool({
connectionString: process.env.DATABASE_URL,
});
// Test database connection
try {
await pool.query('SELECT NOW()');
console.log('Database connected successfully');
} catch (error) {
console.error('Database connection failed:', error);
process.exit(1);
}
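While you're at it, it's worth confirming the pgvector extension is actually installed in the database you connected to; this check queries PostgreSQL's standard pg_extension catalog:
// Verify the pgvector extension is installed in this database
const ext = await pool.query(
  "SELECT extversion FROM pg_extension WHERE extname = 'vector'"
);
if (ext.rows.length === 0) {
  throw new Error("pgvector is not installed; run CREATE EXTENSION vector; first");
}
console.log(`pgvector ${ext.rows[0].extversion} is ready`);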
The key to effective RAG is breaking your documents into meaningful chunks that preserve context while staying within token limits.
interface Document {
content: string;
metadata?: Record<string, any>;
}
class DocumentProcessor {
private textSplitter: RecursiveCharacterTextSplitter;
constructor() {
// Optimized chunking strategy
this.textSplitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000, // Sweet spot for most content
chunkOverlap: 200, // Preserve context between chunks
separators: ["\n\n", "\n", ". ", "! ", "? ", "; ", " ", ""]
});
}
async processDocuments(documents: string[]): Promise<string[]> {
const chunks: string[] = [];
for (const doc of documents) {
try {
const docChunks = await this.textSplitter.splitText(doc);
chunks.push(...docChunks);
} catch (error) {
console.error('Error processing document:', error);
}
}
console.log(`Processed ${documents.length} documents into ${chunks.length} chunks`);
return chunks;
}
// Specialized chunking for different content types
createTechnicalDocsChunker(): RecursiveCharacterTextSplitter {
return new RecursiveCharacterTextSplitter({
chunkSize: 800,
chunkOverlap: 100,
separators: ["\n## ", "\n### ", "\n\n", "\n", ". ", " "]
});
}
createConversationalChunker(): RecursiveCharacterTextSplitter {
return new RecursiveCharacterTextSplitter({
chunkSize: 1200,
chunkOverlap: 200,
separators: ["\n\n", "\n", ". ", "! ", "? ", " "]
});
}
}
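For reference, here's how the processor might be used; productDocs is a hypothetical stand-in for documents loaded from your own sources:
// Example usage (productDocs stands in for your own loaded documents)
const processor = new DocumentProcessor();
const productDocs = [
  "Pricing: Our Pro plan costs $49/month and includes priority support...",
  "Refund Policy: Refunds are available within 30 days of purchase...",
];
const chunks = await processor.processDocuments(productDocs);
// chunks are now ready to embed and index with the RAG system below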
class RAGSystem {
  private embeddings: OpenAIEmbeddings;
  private vectorStore!: PGVectorStore; // assigned in initialize()
  private llm: ChatOpenAI;
  constructor() {
    this.embeddings = new OpenAIEmbeddings({
      modelName: "text-embedding-3-small",
    });
    // gpt-4-turbo-preview is a chat model, so use ChatOpenAI
    // rather than the completions-based OpenAI class
    this.llm = new ChatOpenAI({
      modelName: "gpt-4-turbo-preview",
      temperature: 0.1, // Lower for more factual responses
    });
  }
  async initialize() {
    this.vectorStore = await PGVectorStore.initialize(this.embeddings, {
      postgresConnectionOptions: {
        connectionString: process.env.DATABASE_URL,
      },
      tableName: "document_chunks",
      // Map to the column names created in the SQL setup above
      columns: {
        idColumnName: "id",
        contentColumnName: "content",
        vectorColumnName: "embedding",
        metadataColumnName: "metadata",
      },
    });
  }
async indexDocuments(chunks: string[]) {
await this.vectorStore.addDocuments(
chunks.map(chunk => ({ pageContent: chunk, metadata: {} }))
);
console.log(`Indexed ${chunks.length} chunks`);
}
async query(question: string): Promise<string> {
// Retrieve relevant chunks
const relevantDocs = await this.vectorStore.similaritySearch(question, 4);
// Build context from retrieved documents
const context = relevantDocs
.map(doc => doc.pageContent)
.join("\n\n");
// Generate response with context
    const prompt = `Answer the question using only the context below. If the context does not contain the answer, say you don't know rather than guessing.

Context:
${context}

Question: ${question}

Answer:`;
    const response = await this.llm.invoke(prompt);
    return response.content as string;
}
}
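Putting it together, a minimal end-to-end run might look like this, reusing the chunks from the processor example above:
const rag = new RAGSystem();
await rag.initialize();           // connect to the pgvector-backed store
await rag.indexDocuments(chunks); // embed and persist the chunks
const answer = await rag.query("How much does the Pro plan cost?");
console.log(answer);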
Always implement graceful degradation when the retrieval system fails:
async query(question: string): Promise<string> {
try {
const docs = await this.vectorStore.similaritySearch(question, 4);
if (docs.length === 0) {
return "I couldn't find specific information about that in our documentation. Please contact support for assistance.";
}
// Continue with normal RAG flow...
} catch (error) {
console.error('RAG query failed:', error);
return "I'm having trouble accessing our knowledge base right now. Please try again later.";
}
}
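If transient database or API errors are common in your environment, you can also wrap the retrieval call in a simple retry with exponential backoff. This helper is an optional addition of ours, not part of LangChain:
// Generic retry with exponential backoff for transient failures
async function withRetry<T>(fn: () => Promise<T>, retries = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Wait 250ms, 500ms, 1000ms, ... between attempts
      await new Promise((resolve) => setTimeout(resolve, 250 * 2 ** attempt));
    }
  }
  throw lastError;
}

// Usage inside query():
// const docs = await withRetry(() => this.vectorStore.similaritySearch(question, 4));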
Track these metrics in production:
- Retrieval latency: how long the similarity search takes per query.
- End-to-end response time: retrieval plus LLM generation.
- Zero-result rate: how often no relevant chunks are found for a question.
- Token usage and cost per query.
- Answer quality: even simple thumbs-up/down feedback from users catches regressions early.
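A lightweight way to start is timing the query path yourself and logging structured events; the sketch below logs to stdout, but any metrics backend works the same way:
// Minimal instrumentation around the query path
async function timedQuery(rag: RAGSystem, question: string): Promise<string> {
  const start = Date.now();
  const answer = await rag.query(question);
  console.log(JSON.stringify({
    event: "rag_query",
    totalMs: Date.now() - start,
    questionLength: question.length,
    answerLength: answer.length,
  }));
  return answer;
}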
This foundation gives you a working RAG system. To take it further:
- Add metadata filtering so queries can be scoped to a product, version, or customer.
- Combine vector search with keyword search (hybrid retrieval) to improve recall.
- Rerank retrieved chunks before building the prompt so the best context comes first.
- Stream LLM responses to cut perceived latency.
- Build an evaluation set of real questions and expected answers so you can measure changes instead of guessing.
RAG transforms your AI from a generic chatbot into a knowledgeable assistant that can actually help with your specific business questions. The investment in building it right pays dividends in user satisfaction and reduced support burden.