Data Loss Prevention (DLP)
Data Loss Prevention (DLP) is a set of controls that detect and stop sensitive data from leaving an organization. Learn how DLP works, where it falls short for AI, and how AI-era data protection extends it.
What DLP was designed to do
DLP emerged to solve a specific problem: confidential data leaving the organization through known egress points. A complete DLP program typically covers three states of data:
- Data in motion — content crossing the network boundary: email, web uploads, file transfers, messaging.
- Data at rest — content stored on endpoints, file shares, databases, and cloud storage that should not be there.
- Data in use — content being copied, printed, or moved to removable media on an endpoint.
For each, DLP applies a detection method — pattern matching, fingerprinting of known documents, or classification labels — and an enforcement action: log, alert, quarantine, encrypt, or block.
How DLP detection works
DLP engines identify sensitive data using a combination of techniques:
Pattern and regex matching
The most common method. Rules look for structured patterns — a 16-digit number passing a Luhn check (payment cards), a formatted national identifier, an IBAN. Fast and cheap, but blind to context: a number in a test fixture and a real customer card look identical.
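A minimal sketch of this kind of rule, assuming 16-digit card numbers optionally grouped in fours. The names `CARD_PATTERN`, `luhn_valid`, and `find_card_numbers` are illustrative and not taken from any particular DLP product.

```python
import re

# Candidate card numbers: 16 digits, optionally grouped in fours by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) > 0 and total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return substrings that look like 16-digit card numbers and pass Luhn."""
    return [m.group(0) for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group(0))]
```

The rule flags `4111 1111 1111 1111`, a widely used test number, exactly as it would flag a live card, which is the context blindness described above.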
Exact data matching and fingerprinting
The DLP system is given a database or document set to protect and computes hashes of its contents. Outbound data is checked against those fingerprints, catching exact or partial copies of known-sensitive records. More precise than regex, but limited to data the organization has already registered.
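One way to picture fingerprinting is to hash overlapping word windows (shingles) of a registered document and measure how many of an outbound message's shingles match. The sketch below assumes SHA-256 and a five-word window; commercial engines use their own, usually more robust, schemes, and every function name here is hypothetical.

```python
import hashlib


def normalize(text: str) -> list[str]:
    """Lowercase and split into words so trivial formatting changes do not alter hashes."""
    return text.lower().split()


def fingerprints(text: str, window: int = 5) -> set[str]:
    """Hash every run of `window` consecutive words (a shingle)."""
    words = normalize(text)
    if len(words) < window:
        shingles = [" ".join(words)] if words else []
    else:
        shingles = [" ".join(words[i:i + window]) for i in range(len(words) - window + 1)]
    return {hashlib.sha256(s.encode("utf-8")).hexdigest() for s in shingles}


def overlap_ratio(outbound: str, registered: set[str], window: int = 5) -> float:
    """Fraction of the outbound text's shingles that appear in the registered set."""
    outbound_prints = fingerprints(outbound, window)
    if not outbound_prints:
        return 0.0
    return len(outbound_prints & registered) / len(outbound_prints)
```

A policy would then act when `overlap_ratio` crosses a chosen threshold. Text that was never registered always scores zero, which is the limitation noted above.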
Classification and labels
Data is tagged at creation — Confidential, Internal, Restricted — and DLP enforces handling rules per label. Effective only when labeling is consistent, which in practice it rarely is.
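Label-based enforcement can be sketched as a lookup from label to action. The policy table below is hypothetical: the label names come from the text above, but the mapping to actions is an assumption for illustration only.

```python
from typing import Optional

# Hypothetical policy: label -> action when content leaves through an external channel.
EXTERNAL_POLICY = {
    "Internal": "alert",
    "Confidential": "encrypt",
    "Restricted": "block",
}


def enforce(label: Optional[str]) -> str:
    """Return the action for outbound content carrying `label`.

    Unlabeled or unrecognized content falls through to "log" only.
    """
    return EXTERNAL_POLICY.get(label, "log")
```

The fall-through case shows why inconsistent labeling matters: a restricted document that nobody tagged is handled like ordinary content.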
Where traditional DLP falls short for AI
DLP assumes data leaves through inspectable channels in inspectable formats. AI interactions break both assumptions.
When an employee pastes a customer list into a chatbot or summarizes a confidential contract with an AI assistant, or when an autonomous agent reads an internal record and sends it to an external API, the sensitive data is moving, but not as a file attachment or a flagged email. It is prompt text, a tool-call argument, or a model completion. Most DLP deployments never see it.
| | Traditional DLP | AI-era data protection |
|---|---|---|
| Primary channel | Email, file transfer, web upload | AI prompts, completions, tool calls, agent actions |
| Detection signal | Regex, fingerprints, file labels | Intent and semantics of the prompt, plus pattern match |
| Format | Structured files and known documents | Free-form natural language and unstructured prompts |
| Agent coverage | None | Tool-call and MCP inspection before execution |
| Enforcement point | Network egress, endpoint, mail gateway | The AI interaction layer — browser, desktop, agent runtime |
The gap is not that DLP is wrong; it is that AI created a new, high-volume egress channel that legacy DLP was never positioned to inspect.
What AI-era data protection adds
Extending data protection to AI does not mean replacing DLP — it means adding inspection at the layer where AI activity happens:
- Prompt inspection — sensitive content is detected in the prompt before it is submitted to any model, and can be redacted or blocked on policy match.
- Completion inspection — model outputs are checked before they reach the user, catching sensitive data surfaced from connected systems.
- Agent tool-call governance — arguments passed to external tools and APIs by autonomous agents are inspected before execution, closing the channel DLP cannot see at all.
- Semantic classification — instead of relying solely on regex, AI-era controls assess the meaning of a prompt, distinguishing a genuine data-exfiltration attempt from benign text that happens to match a pattern.
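Prompt inspection with redaction or blocking can be sketched as a function that runs before a prompt is forwarded to a model. This is a pattern-only illustration under stated assumptions: the `DETECTORS` table, the `InspectionResult` type, and the choice to block on a US Social Security number format are all hypothetical, and the semantic classification step described above is not implemented here.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors; a real control would combine richer rules with semantic signals.
DETECTORS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class InspectionResult:
    action: str          # "allow", "redact", or "block"
    prompt: str          # text that may be forwarded to the model (empty when blocked)
    findings: list[str]  # names of detectors that matched


def inspect_prompt(prompt: str, block_on: frozenset[str] = frozenset({"US_SSN"})) -> InspectionResult:
    """Decide what happens to a prompt before it is submitted to any model."""
    findings = [name for name, pattern in DETECTORS.items() if pattern.search(prompt)]
    if any(name in block_on for name in findings):
        return InspectionResult("block", "", findings)
    redacted = prompt
    for name in findings:
        # Replace each match with a placeholder naming the detector.
        redacted = DETECTORS[name].sub(f"[{name}]", redacted)
    return InspectionResult("redact" if findings else "allow", redacted, findings)
```

The same shape applies to completion inspection: the function would run on the model output before it is displayed rather than on the prompt before it is sent.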
Questions an AI-era DLP capability answers
- Is sensitive data being pasted into external AI tools? — Prompt-level detection with redaction or block.
- Did a model return regulated data from a connected source? — Completion inspection before display.
- What data did this AI agent send to an external API? — Tool-call argument inspection and audit.
- Which users and tools handle the most sensitive prompts? — Usage analytics across AI surfaces.
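The tool-call question above can be made concrete with a wrapper that inspects an agent's arguments before the tool runs and records every decision. This is a sketch only: `governed_call`, `audit_log`, and the single SSN-format rule are hypothetical and do not reflect any specific agent framework or MCP implementation.

```python
import re
from typing import Any, Callable, Iterator

# Hypothetical rule: US Social Security number format.
SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# In-memory audit trail; a real system would write to durable storage.
audit_log: list[dict[str, Any]] = []


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string nested anywhere inside a tool-call argument structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def governed_call(tool_name: str, tool: Callable[..., Any], arguments: dict[str, Any]) -> Any:
    """Inspect arguments before execution, record the decision, and run the tool only if clean."""
    flagged = any(SENSITIVE.search(s) for s in iter_strings(arguments))
    audit_log.append({"tool": tool_name, "blocked": flagged, "argument_keys": sorted(arguments)})
    if flagged:
        raise PermissionError(f"tool call to {tool_name!r} blocked by policy")
    return tool(**arguments)
```

Because the audit entry is written before the decision is enforced, blocked and allowed calls both leave a record that later analytics can aggregate per user or per tool.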