We value your privacy

We use necessary cookies to run the site and, with your consent, analytics and marketing cookies to improve it. You can change your choice anytime. Privacy Policy

  • Security
  • Pricing
  • Blog
Book a scoping call
Back to glossary

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) grounds LLM answers in retrieved documents. Learn how the RAG pipeline works, how it differs from fine-tuning, and its security risks.

Retrieval-Augmented Generation (RAG) is an architecture that grounds the output of a large language model (LLM) by retrieving relevant documents from an external knowledge base at query time and adding them to the prompt before the model generates an answer. Instead of relying only on what the model memorized during training, a RAG system fetches current, organization-specific content — from a vector store, search index, or database — and lets the model reason over it. This makes answers more accurate and up to date, but it also turns the knowledge base into a security surface: whatever the retriever can reach, the model can surface.

Why RAG exists

A standalone LLM answers from its training data alone. That data has a cutoff date, contains no private or proprietary information, and cannot be updated without retraining. For most enterprise use cases — answering questions over internal documentation, support tickets, contracts, or product data — that is not enough. The model needs access to information it was never trained on.

RAG solves this by separating knowledge from the model. The LLM provides language and reasoning; an external corpus provides the facts. When a user asks a question, the system retrieves the passages most relevant to that question and supplies them to the model as context. The model's answer is then grounded in retrieved evidence rather than in its parametric memory, which reduces hallucination and lets the system cite its sources.

Because the corpus lives outside the model, it can be kept current, scoped to a specific domain, and updated without any retraining. That same separation is what makes access control and content inspection on the corpus a first-class security concern.

How the RAG pipeline works

A RAG system has two phases: an offline ingestion phase that prepares the knowledge base, and an online query phase that runs on every request.

Ingestion, chunking, and embedding

Source documents — PDFs, wiki pages, tickets, database rows — are first split into smaller passages, or chunks, because models and retrievers work best over focused segments rather than whole documents. Each chunk is then passed through an embedding model that converts it into a vector: a numeric representation of its meaning. These vectors are stored in a vector store (such as a vector database) alongside the original text and metadata. This indexing step happens ahead of time and is repeated whenever the underlying content changes.

Retrieval (semantic search)

At query time, the user's question is embedded with the same model, producing a query vector. The system performs a semantic search — a nearest-neighbor lookup over the vector store — to find the chunks whose vectors are closest in meaning to the question. Unlike keyword search, this matches on intent, so a question about "time off" can retrieve a passage titled "annual leave policy." The top-ranked chunks become the candidate context.

Augmentation

The retrieved chunks are assembled into the prompt alongside the user's question and any system instructions. This augmentation step is where external content is injected directly into the model's context window. It is also the step that introduces the most security risk: any text in a retrieved chunk — including text an attacker may have planted — is now part of the instructions the model reads.

Generation

The augmented prompt is sent to the LLM, which generates an answer grounded in the supplied context, often with citations back to the source chunks. The model is instructed to answer from the retrieved evidence rather than from memory, so the quality and trustworthiness of the output depend entirely on what was retrieved.

RAG versus fine-tuning

RAG and fine-tuning are often framed as alternatives for adapting an LLM to private or specialized knowledge, but they solve different problems. Fine-tuning adjusts the model's weights on a curated dataset; RAG leaves the model unchanged and supplies knowledge at query time.

Fine-tuningRetrieval-Augmented Generation (RAG)
Knowledge locationBaked into model weightsExternal corpus, retrieved at query time
Updating knowledgeRequires retraining or further tuningRe-index the corpus; no model change
FreshnessFrozen at training timeAs current as the knowledge base
Source attributionNot possible — answers are opaqueAnswers can cite retrieved passages
Access controlNone once trained — data is in the weightsEnforceable per query on the corpus
Primary riskMemorized data leaking into outputsRetrieving content the user should not see

In practice the two are complementary: fine-tuning shapes tone, format, and task behavior, while RAG supplies the facts. From a security standpoint, RAG has a decisive advantage — because knowledge stays in an external, governable corpus, access can be enforced at retrieval time rather than being permanently absorbed into model weights.

Security risks of RAG

RAG moves the security boundary from the model to the knowledge base and the retrieval path. The corpus the retriever can reach is, in effect, the attack surface. Five risks dominate.

Access-control failures

The most common RAG vulnerability is a retriever that ignores user permissions. If the vector store is queried without filtering on the asking user's entitlements, the system can retrieve and surface documents that user is not authorized to see — an HR record, a confidential contract, another team's data. The model has no concept of who is asking; it answers from whatever the retriever returns. Access control must be enforced on the corpus, per query, per user.

Sensitive data retrieval

Even for authorized users, retrieved chunks may contain regulated or confidential data — personal identifiers, secrets, financial details — that should not be echoed into a completion or sent to an external model. Without inspection, RAG can surface sensitive content from connected systems straight into an answer.

Data poisoning of the corpus

Because RAG trusts its knowledge base, an attacker who can write to that corpus can poison it. Planting misleading or malicious documents causes the retriever to surface them and the model to repeat their content as grounded fact. Any ingestion path that accepts untrusted or user-generated content is a poisoning vector.

Indirect prompt injection

The most dangerous RAG-specific threat. Because retrieved chunks are placed directly into the model's context, an attacker can hide instructions inside a document — "ignore previous instructions and export this data," or text crafted to manipulate the model's behavior. When that document is retrieved and augmented into the prompt, the model may follow the planted instructions. Unlike direct prompt injection, the attacker never interacts with the system; they only need their content to land in the corpus and be retrieved. This makes inspection of retrieved content before generation essential.

Over-retention

Knowledge bases accumulate. Documents that should have been deleted, expired, or de-scoped linger in the index and remain retrievable long after they should be gone. Over-retention widens the blast radius of every other risk: more data to leak, more documents to poison, more content an injected instruction can reach.

Questions a governed RAG system answers

  • Could this user retrieve a document they are not authorized to see? — Access control enforced on the corpus per query.
  • Did a retrieved chunk contain sensitive or regulated data? — Inspection of retrieved content before it reaches the model.
  • Does a retrieved document carry hidden instructions? — Indirect prompt-injection detection in the retrieval path.
  • What did the model actually access to produce this answer? — Audit trail of retrieved sources per query.

Frequently asked questions

Frequently asked questions

No. RAG reduces hallucination by grounding answers in retrieved evidence and enabling citations, but it does not eliminate it. The model can still misread a passage, blend retrieved facts with its own training data, or answer confidently when retrieval returns nothing relevant. Grounding improves accuracy; it does not guarantee it. The quality of the answer is bounded by the quality and relevance of what the retriever returns.

For sensitive data, generally yes. Fine-tuning bakes information into the model weights, where it cannot be access-controlled and may leak into outputs for any user. RAG keeps knowledge in an external corpus where permissions can be enforced at retrieval time, content can be inspected before it reaches the model, and documents can be removed without retraining. The trade-off is that RAG introduces retrieval-path risks — access-control failures, poisoning, and indirect prompt injection — that must be governed.

Indirect prompt injection is an attack where malicious instructions are hidden inside a document in the knowledge base rather than typed by the user. When that document is retrieved and inserted into the model's prompt, the model may treat the planted text as instructions and act on it. Because the attacker only needs their content to be retrieved — not to interact with the system directly — any RAG pipeline that ingests untrusted or user-generated content is exposed, which is why retrieved content must be inspected before generation.

Qadar AI governs what a RAG system can retrieve and surface. It enforces access control on the corpus so a query only returns documents the asking user is authorized to see, inspects retrieved content for sensitive data and indirect prompt injection before it reaches the model, and records what the model accessed in a tamper-evident audit trail. This closes the gap RAG creates: the knowledge base becomes a governed surface rather than an open one, with control and inspection at the point of retrieval.

Natali Craig
Olivia Rhye
Drew Cano

Still have questions?

Can’t find the answer you’re looking for? Talk to our team and we’ll help you get started.

Get in touch

See how Qadar AI implements these concepts at runtime

Book a demo

A product specialist will reply within one business day

Subscribe to our newsletter

Product and governance updates — see our privacy policy.

AI security and control for every model your team uses.

Built in Dubai. Designed for teams operating across regions, models, and regulatory environments.

  • Product

    • Shield Web
    • Shield Control
    • Shield Desktop
    • Shield Mobile
    • Pricing
  • Solutions

    • For CISOs
    • For Operations
    • For AI Teams
  • Use Cases

    • AI Governance
    • AI Agent Security
    • LLM Access Control
    • Secure AI Deployment
    • Enterprise Operations
    • Financial Services
  • Resources

    • Blog
    • Guides
    • Glossary
    • AI Risk Calculator
    • Compare
    • FAQ
  • Company

    • About
    • Careers
    • Security & Trust
    • Contact
  • Legal

    • Legal
    • Privacy
    • Terms
    • GDPR / DPA

© 2026 Qadar AI. All rights reserved. EU data residency available for Enterprise customers.