OWASP Top 10 for LLMs
The OWASP Top 10 for LLM Applications is a community-driven list of the most critical security risks in apps built on large language models. Learn the categories and mitigations.
Why the OWASP Top 10 for LLMs exists
Traditional application security frameworks assume a clear separation between code, data, and control flow. LLM applications break that assumption. A model treats its system prompt, retrieved documents, user input, and tool outputs as a single stream of natural language, and it can be steered by any of them. New components — vector databases, retrieval pipelines, agent tool calls, third-party model providers — introduce failure modes that the original OWASP Top 10 for web applications never anticipated.
The OWASP Top 10 for LLM Applications was created to close that gap. It is the product of contributions from hundreds of security practitioners, AI researchers, and industry participants, and it is intended as an awareness document — a prioritized starting point for threat modeling, not an exhaustive standard or a certification checklist. Its value is in naming the risks consistently so that teams building on LLMs can communicate about them, prioritize them, and map them to concrete controls.
The list evolves by version
Treat the OWASP Top 10 for LLMs as a living document. Entries have been renamed, merged, split, and re-ranked between versions as real-world incidents and agentic architectures reshaped the threat landscape. For example, later revisions broadened "Model Denial of Service" toward the wider notion of unbounded resource consumption, and added categories reflecting newer concerns such as system prompt leakage and weaknesses in vector and embedding pipelines. When citing a specific rank or identifier (such as an "LLM0x" code), always anchor it to a stated version of the list rather than assuming it is stable across releases.
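One way to follow the version-anchoring advice is to key every reference by both version and identifier. The sketch below is illustrative: the `resolve` helper is hypothetical, and the three entries are a partial subset believed to match the v1.1 and 2025 releases, so check them against the release you actually cite.

```python
# Partial, illustrative mapping; verify against the published list you cite.
RISK_NAMES = {
    ("v1.1", "LLM04"): "Model Denial of Service",
    ("2025", "LLM04"): "Data and Model Poisoning",
    ("2025", "LLM10"): "Unbounded Consumption",
}


def resolve(version: str, code: str) -> str:
    """Return the risk name for an identifier in a specific list version."""
    try:
        return RISK_NAMES[(version, code)]
    except KeyError:
        raise KeyError(f"{code} is not recorded for version {version}") from None


# The same code can name different risks depending on the version.
old_name = resolve("v1.1", "LLM04")
new_name = resolve("2025", "LLM04")
```

Requiring the version in the lookup key makes an unanchored reference such as a bare "LLM04" impossible to resolve by accident.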
The core risk categories
The categories below appear, in one form or another, across versions of the OWASP Top 10 for LLM Applications. They are presented here by theme rather than by numbered rank.
Prompt Injection
Malicious instructions embedded in user input or in external content cause the model to ignore its intended instructions and follow the attacker instead. This includes direct injection (a user overriding the system prompt) and indirect injection (instructions hidden in a document, web page, email, or API response that the model later processes). In agentic systems with tool access, prompt injection is widely regarded as the defining risk, because a successful injection can translate into real-world actions.
Insecure Output Handling
Applications that pass model output to downstream systems without validation inherit the model's untrustworthiness. If an LLM's response is rendered as HTML, executed as code, used in a SQL query, or passed to a shell, the model effectively becomes an unvetted input source for classic vulnerabilities such as cross-site scripting (XSS), SQL injection, server-side request forgery (SSRF), or remote code execution.
Sensitive Information Disclosure
Models can reveal data they should not — training data, secrets, other users' content, or confidential records surfaced from connected systems. This spans data leaked through completions, data exposed via retrieval pipelines, and personal or regulated information returned in response to crafted prompts.
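As a rough illustration of completion inspection, the sketch below redacts two kinds of sensitive strings before a completion is returned. The patterns and the `redact` helper are assumptions for the example; a real deployment would need patterns tuned to its own data, and regex matching alone will miss many cases.

```python
import re

# Illustrative patterns only, not a complete inventory of sensitive data.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "AWS_ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a labeled placeholder."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


completion = "Contact bob@example.com, key AKIAABCDEFGHIJKLMNOP."
redacted = redact(completion)
```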
Training Data Poisoning
An attacker manipulates the data used to train or fine-tune a model, introducing backdoors, biases, or vulnerabilities. Because training data is often sourced at scale from public or semi-trusted origins, poisoning can be difficult to detect and can persist in the model's behavior after deployment.
Supply Chain Vulnerabilities
LLM applications depend on pre-trained models, datasets, plugins, libraries, and hosted inference providers. Compromised or untrustworthy components anywhere in that chain — a tampered model artifact, a malicious package, a vulnerable extension — can undermine the security of the whole application.
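One basic provenance check is to pin the expected hash of a model artifact and refuse to load a file that does not match. The sketch below assumes the expected SHA-256 digest is obtained from a trusted source; the file name and helper functions are hypothetical. It detects tampering after the digest was pinned, not a compromised publisher.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Compare the file's digest with the pinned value."""
    return hmac.compare_digest(sha256_of_file(path), expected_sha256.lower())


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.bin"  # hypothetical artifact
    artifact.write_bytes(b"pretend model weights")
    pinned = hashlib.sha256(b"pretend model weights").hexdigest()
    ok_before = verify_artifact(artifact, pinned)

    artifact.write_bytes(b"tampered model weights")
    ok_after = verify_artifact(artifact, pinned)
```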
Model Denial of Service / Unbounded Consumption
Adversarial or resource-intensive inputs can exhaust compute, memory, context-window, or token budgets, degrading availability or driving runaway cost. Later framings of this category emphasize unbounded consumption more broadly, including model extraction and cost-amplification attacks against metered inference.
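Token and cost budgets can be enforced before a request ever reaches the model. The following is a minimal sketch, assuming the caller already knows the token counts; `TokenBudget`, `BudgetExceeded`, and the limits shown are hypothetical names and values, and a production version would also reset budgets per time window.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a request would exceed a configured limit."""


@dataclass
class TokenBudget:
    max_input_tokens: int   # cap on a single request's input size
    max_total_tokens: int   # cap on tokens per user for the budget window
    used: dict = field(default_factory=dict)

    def charge(self, user_id: str, input_tokens: int, max_output_tokens: int) -> int:
        """Reserve tokens for a request and return the user's remaining budget."""
        if input_tokens > self.max_input_tokens:
            raise BudgetExceeded("input exceeds per-request cap")
        requested = input_tokens + max_output_tokens
        spent = self.used.get(user_id, 0)
        if spent + requested > self.max_total_tokens:
            raise BudgetExceeded("user budget exhausted")
        self.used[user_id] = spent + requested
        return self.max_total_tokens - self.used[user_id]


budget = TokenBudget(max_input_tokens=1000, max_total_tokens=3000)
remaining = budget.charge("user-1", input_tokens=500, max_output_tokens=500)
```

Charging the requested output allowance up front, rather than the tokens actually generated, is a conservative design choice: it keeps a burst of concurrent requests from overshooting the budget.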
Excessive Agency
When a model is granted more functionality, permissions, or autonomy than the task requires, the blast radius of any failure or compromise grows accordingly. An over-privileged agent that can send email, modify records, or call external APIs can cause serious harm if it is manipulated — for instance, via prompt injection.
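A sketch of scoping an agent's tools with an allowlist and a human-approval gate for high-risk actions. The tool names, `ToolPolicy`, and `authorize` are hypothetical; the idea is that anything not explicitly listed is denied, regardless of what the model asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    allowed: bool
    needs_human_approval: bool = False


# Hypothetical policy: only tools the task actually needs are listed.
POLICY = {
    "search_docs": ToolPolicy(allowed=True),
    "send_email": ToolPolicy(allowed=True, needs_human_approval=True),
}


def authorize(tool_name: str, approved_by_human: bool = False) -> str:
    """Decide whether a tool call requested by the model may run."""
    policy = POLICY.get(tool_name)
    if policy is None or not policy.allowed:
        return "deny"  # default-deny for anything not allowlisted
    if policy.needs_human_approval and not approved_by_human:
        return "pending_approval"
    return "allow"
```

For example, `authorize("send_email")` returns `"pending_approval"` until a person confirms the action, and `authorize("delete_records")` returns `"deny"` because that tool was never granted.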
Insecure Plugin / Extension Design
Plugins and tool integrations that accept free-form input, lack authorization checks, or fail to validate parameters give attackers a path to abuse the capabilities the model can invoke. Weak extension design is a common bridge between a manipulated model and a real-world side effect.
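Strict parameter validation might look like the sketch below, which parses model-supplied arguments for a hypothetical `get_invoice` tool into a typed structure and rejects anything unexpected. The field names and the allowed regions are assumptions for illustration.

```python
from dataclasses import dataclass

ALLOWED_REGIONS = {"eu", "us"}  # hypothetical enumeration


class InvalidToolCall(ValueError):
    """Raised when model-supplied arguments fail validation."""


@dataclass(frozen=True)
class GetInvoiceArgs:
    invoice_id: int
    region: str


def parse_get_invoice_args(raw: dict) -> GetInvoiceArgs:
    """Accept only the expected fields, with the expected types and values."""
    unexpected = set(raw) - {"invoice_id", "region"}
    if unexpected:
        raise InvalidToolCall(f"unexpected parameters: {sorted(unexpected)}")

    invoice_id = raw.get("invoice_id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(invoice_id, bool) or not isinstance(invoice_id, int) or invoice_id <= 0:
        raise InvalidToolCall("invoice_id must be a positive integer")

    region = raw.get("region")
    if not isinstance(region, str) or region not in ALLOWED_REGIONS:
        raise InvalidToolCall("region must be one of the allowed values")

    return GetInvoiceArgs(invoice_id=invoice_id, region=region)


args = parse_get_invoice_args({"invoice_id": 42, "region": "eu"})
```

Typed, enumerated parameters leave far less room for abuse than a single free-form string that the tool interprets on its own.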
Overreliance
Treating model output as authoritative without human oversight or verification leads to acting on hallucinated, incorrect, or insecure content — propagating bad code, flawed decisions, or misinformation into production systems and business processes.
Model Theft
Unauthorized access to, exfiltration of, or extraction of a proprietary model — whether by stealing weights directly or by reconstructing behavior through systematic querying — represents loss of intellectual property and a potential precursor to further attacks.
Newer additions
Later versions of the list added categories reflecting how production LLM systems are actually built. System Prompt Leakage addresses the risk that the contents of a system prompt, which may contain secrets, business logic, or access assumptions, can be extracted, and that teams treat the prompt as a security boundary it was never meant to be. Vector and Embedding Weaknesses covers risks specific to retrieval-augmented generation (RAG) pipelines, including embedding inversion, data leakage across tenants, and poisoning of the vector store.
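Cross-tenant leakage in a RAG pipeline can be reduced by filtering on tenant before ranking by similarity, so another tenant's chunks are never candidates. The sketch below uses a toy in-memory store with two-dimensional embeddings; `Chunk`, `retrieve`, and the sample data are assumptions for illustration, not any particular vector database's API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    embedding: tuple


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(store, query_embedding, tenant_id: str, k: int = 3):
    """Filter by tenant first, then rank the remaining chunks by similarity."""
    candidates = [c for c in store if c.tenant_id == tenant_id]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy data: the Globex chunk is close to the query but belongs to another tenant.
STORE = [
    Chunk("acme", "Acme refund policy", (1.0, 0.0)),
    Chunk("acme", "Acme holiday schedule", (0.0, 1.0)),
    Chunk("globex", "Globex salary bands", (1.0, 0.1)),
]

results = retrieve(STORE, (1.0, 0.0), tenant_id="acme", k=2)
```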
Mapping risk categories to mitigations
No single control addresses the full list. The table below maps several representative categories to the kinds of mitigations that meaningfully reduce them. It is illustrative, not exhaustive, and the precise naming and ranking of each risk depend on the version of the OWASP list being referenced.
| Risk category | Representative mitigations |
|---|---|
| Prompt Injection | Least-privilege tool access, runtime action validation, human approval for high-risk actions, prompt/completion inspection |
| Insecure Output Handling | Treat model output as untrusted; encode, validate, and sandbox before downstream use |
| Sensitive Information Disclosure | Prompt and completion inspection, redaction, data minimization, tenant isolation in retrieval |
| Excessive Agency | Scoped permissions, allowlisted tools, per-action authorization, agent-runtime governance |
| Supply Chain Vulnerabilities | Vetted model and package provenance, signing, dependency and artifact inventory |
| Unbounded Consumption / DoS | Rate limiting, token and cost budgets, input-size caps, anomaly detection on usage |
| Overreliance | Human-in-the-loop review, output verification, clear provenance and confidence signals |
The recurring pattern across these mitigations is that the most durable controls operate at the runtime and action layer — governing what an LLM or agent is allowed to access and do — rather than relying solely on input filtering or better prompts, which determined attackers can bypass.
How to use the list in practice
The OWASP Top 10 for LLM Applications is most useful as an input to threat modeling rather than as a pass/fail checklist. A practical approach is to enumerate every place where untrusted content enters the model, every tool or system the model can act on, and every downstream consumer of model output — then walk each category against those surfaces to identify where controls are missing.
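The walk-through described above can be sketched as a small gap analysis: list the surfaces, record which categories apply to which kind of surface, and report every pairing with no control recorded. All surfaces, the category-to-surface mapping, and the controls below are hypothetical placeholders for a team's own inventory.

```python
# Hypothetical inventory of where content enters, what the model can act on,
# and who consumes its output.
SURFACES = {
    "input": ["user chat", "retrieved documents"],
    "action": ["send_email tool"],
    "output": ["HTML renderer"],
}

# Simplified assumption about which kind of surface each category targets.
CATEGORY_APPLIES_TO = {
    "Prompt Injection": {"input"},
    "Excessive Agency": {"action"},
    "Insecure Output Handling": {"output"},
}

# Controls the team has already documented, as (category, surface) pairs.
CONTROLS_IN_PLACE = {
    ("Prompt Injection", "user chat"),
    ("Insecure Output Handling", "HTML renderer"),
}


def missing_controls() -> list:
    """Return every (category, surface) pairing with no recorded control."""
    gaps = []
    for category, kinds in CATEGORY_APPLIES_TO.items():
        for kind in sorted(kinds):
            for surface in SURFACES[kind]:
                if (category, surface) not in CONTROLS_IN_PLACE:
                    gaps.append((category, surface))
    return gaps


gaps = missing_controls()
```

In this toy inventory the gaps are indirect injection through retrieved documents and an ungoverned email tool, which is the kind of finding the exercise is meant to surface.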
Because the list overlaps with broader AI risk frameworks, it pairs well with the NIST AI Risk Management Framework and MITRE ATLAS. OWASP names the risks; the NIST framework helps structure governance, and MITRE ATLAS catalogues concrete adversarial techniques. Used together, they give security teams both a prioritized risk taxonomy and the surrounding process to manage it.