Skip to content
Vol. 1 · Ed. 2026
CyberGlossary
Entry № 897

RAG

What is RAG?

RAGRetrieval-Augmented Generation: an LLM pattern that fetches relevant documents from a knowledge store at query time and injects them into the prompt to ground responses.


RAG augments a Large Language Model with an external retrieval step. At inference, the user's query is embedded, a vector or keyword index returns the most relevant documents, and those documents are concatenated into the prompt so the LLM can cite or reason over them. RAG reduces hallucinations and lets models use private or fresh data without retraining. Security-wise it creates new attack surface: prompt injection from documents (indirect prompt injection), data poisoning of the corpus or vector store, exfiltration through model outputs, access-control mistakes when multiple tenants share an index, and embedding inversion attacks. Hardened RAG pipelines isolate untrusted content, enforce per-document access checks, sanitize inputs, monitor retrieved snippets, and apply output guardrails.

Examples

  1. 01

    An enterprise chatbot answers HR questions by retrieving policy PDFs from a vector store.

  2. 02

    A malicious wiki page contains hidden instructions that hijack a RAG assistant via indirect prompt injection.

Frequently asked questions

What is RAG?

Retrieval-Augmented Generation: an LLM pattern that fetches relevant documents from a knowledge store at query time and injects them into the prompt to ground responses. It belongs to the AI & ML Security category of cybersecurity.

What does RAG mean?

Retrieval-Augmented Generation: an LLM pattern that fetches relevant documents from a knowledge store at query time and injects them into the prompt to ground responses.

How does RAG work?

RAG augments a Large Language Model with an external retrieval step. At inference, the user's query is embedded, a vector or keyword index returns the most relevant documents, and those documents are concatenated into the prompt so the LLM can cite or reason over them. RAG reduces hallucinations and lets models use private or fresh data without retraining. Security-wise it creates new attack surface: prompt injection from documents (indirect prompt injection), data poisoning of the corpus or vector store, exfiltration through model outputs, access-control mistakes when multiple tenants share an index, and embedding inversion attacks. Hardened RAG pipelines isolate untrusted content, enforce per-document access checks, sanitize inputs, monitor retrieved snippets, and apply output guardrails.

How do you defend against RAG?

Defences for RAG typically combine technical controls and operational practices, as detailed in the full definition above.

What are other names for RAG?

Common alternative names include: Retrieval-Augmented Generation, Grounded generation.

Related terms