Retrieval-Augmented Generation (RAG)

What is retrieval-augmented generation?

Retrieval-augmented generation (RAG) is a way to improve an AI system's answers by retrieving relevant information before generating a response. Instead of relying only on what a model learned during training, the application searches a knowledge source, adds the most relevant passages to the prompt, and asks the model to answer using that context.

RAG is widely used for support assistants, documentation search, internal knowledge tools, legal and policy review, product discovery, analyst workflows, and AI search. It can make answers more current and more specific because the model can refer to documents, pages, tickets, records, or database entries that were not in its original training data.

RAG also creates security and operational questions. The application must decide what content is indexed, who can retrieve it, how stale information is handled, how source permissions are preserved, and what happens when retrieved text includes malicious or misleading instructions.

Why does it matter?

RAG matters because many AI failures are really retrieval failures. If the system retrieves the wrong document, an outdated policy, an unauthorized record, or a poisoned page, the model may produce a confident answer based on bad context. If it retrieves too much, it may expose sensitive data. If it retrieves too little, it may fill gaps with guesses.

For site owners, RAG also explains why AI crawlers and retrieval agents request web pages. Some systems fetch content for model training. Others fetch pages to build a search index or answer a user question in real time. These uses have different value and risk. A site may allow public documentation retrieval while blocking bulk extraction of paid articles, pricing data, inventory, or reviews. Related crawler guidance is covered in what are AI and LLM web scrapers?, how to detect AI crawlers, and how to block AI crawlers.

For platform teams, RAG is often the difference between a demo and a production AI feature. A good retrieval layer needs indexing, permissions, freshness, observability, evaluation, abuse controls, and incident response.

How RAG works

A RAG system usually has two paths: ingestion and answering.

During ingestion, source content is collected from web pages, documents, databases, tickets, chats, code repositories, or other systems. The content is cleaned and split into chunks. Each chunk is often converted into an embedding, which is a numerical representation of the text's meaning. The chunks, embeddings, metadata, and permissions are stored in an index or vector database.

During answering, the application converts the user's question into a search query or embedding, retrieves matching chunks, ranks them, and places selected passages into the model's context. The model then generates an answer using the retrieved material.

This workflow sounds simple, but design choices affect quality and security. Chunk size changes context, ranking decides which source wins when documents disagree, metadata supports citations, and permission filters decide whether the user can retrieve a document at all.

Where RAG appears

RAG appears in public and internal systems. A customer-support assistant may retrieve help articles, order details, and policy pages. An internal assistant may retrieve engineering docs, incident notes, or HR policies. An AI search feature may retrieve public web pages. A sales assistant may retrieve product descriptions, pricing rules, and account notes. A developer tool may retrieve code snippets, issues, and documentation.

RAG can also be part of agentic workflows. When retrieval and tool use are combined, governance becomes more important because bad context can influence real operations.

Risks and failure modes

The first risk is unauthorized retrieval. If the index does not preserve document-level permissions, a user may receive answers based on documents they could not access directly. This can happen when teams copy documents into a central vector store without tenant, role, or record-level access controls.

The second risk is prompt injection through retrieved content. A document, web page, ticket, or comment can contain instructions aimed at the model rather than the human reader. If the model treats that content as trusted instructions, it may ignore the original task, reveal data, or call tools incorrectly.

The third risk is poisoning. Attackers may create or modify content likely to be retrieved for important queries. Poisoning can be accidental too: stale policies, duplicate pages, and low-quality content can all degrade answers.

The fourth risk is misinformation. RAG reduces hallucination risk, but it does not eliminate it. The model may misread the retrieved passage, combine sources incorrectly, omit important limits, or answer beyond the evidence. If the system does not cite sources or expose retrieval evidence, users may not notice.

The fifth risk is resource abuse. Search, embedding, reranking, and model calls can be expensive. Attackers may send long queries, force broad retrieval, request repeated summaries, or automate access to high-cost endpoints. Public RAG interfaces should be treated as application and API surfaces, with rate limits and abuse controls. See what is API security? and what is REST API security? for related fundamentals.

Practical evaluation checklist

Teams evaluating RAG should ask:

What sources are ingested, and who owns them?
Are public, internal, customer, tenant, and regulated data separated?
Are document permissions preserved at retrieval time?
How are documents chunked, ranked, refreshed, and deleted?
Can users see citations or source evidence for important answers?
Are stale, duplicate, low-quality, or contradictory sources identified?
Can untrusted retrieved text influence tool calls or policy decisions?
Is sensitive data redacted before indexing or prompt construction?
Are queries, retrieved chunks, model outputs, and user feedback logged safely?
Are rate limits, quotas, timeouts, and cost controls enforced?
Is there a process to remove poisoned or incorrect content from the index quickly?

These checks should be repeated whenever the source set changes. A RAG system connected to one public documentation site is very different from one connected to customer records, support tickets, and internal operational tools.

Controls and governance

Strong RAG governance starts with source control. Do not index everything simply because it is available. Create an inventory of sources, owners, permission models, retention rules, and freshness requirements. Keep sensitive repositories separate unless there is a clear need to combine them.

Preserve authorization at retrieval time. It is not enough to check permissions when content is first indexed. The user asking the question should only retrieve content they are allowed to access at that moment. Tenant boundaries, role changes, document deletions, and account status changes need to be reflected in the retrieval layer.

Treat retrieved text as untrusted. Label it as context, not instruction. Use system prompts, retrieval filters, tool policies, and output validation to reduce prompt-injection risk. If the model can call tools, keep those tools narrow and require approval for sensitive actions.

Evaluate quality continuously. Test real user questions, adversarial questions, stale-source scenarios, and permission-boundary cases. Track whether answers are grounded in retrieved sources, whether citations are useful, and whether users correct or reject responses.

RAG can make AI systems more accurate and useful, but it also turns content governance into security architecture. Answer quality depends on what is retrieved, and security depends on who can retrieve it, how it is used, and what the model can do next.

Retrieval-Augmented Generation (RAG)

What is retrieval-augmented generation?

Why does it matter?

How RAG works

Where RAG appears

Risks and failure modes

Practical evaluation checklist

Controls and governance

Related learning

Related Articles

What is an Account-Control Surface?

How to defend against Account Takeovers

What is an Account Takeover?

AI Crawler User Agents

AI For Cybersecurity

AI Image Generation