Personally Identifiable Information (PII)

Personally Identifiable Information (PII) is any data that can be used to identify a specific individual, either on its own or in combination with other information. Some PII identifies a person directly — a full name, a Social Security number, a passport number, an email address. Other PII identifies a person only indirectly, when several otherwise innocuous data points are combined. As employees move sensitive work into AI tools, PII increasingly flows through prompts and completions rather than the structured records and databases where it was historically governed.

Direct and indirect identifiers

Not all PII identifies a person the same way. Practitioners distinguish two categories, and the distinction matters because indirect identifiers are far easier to overlook.

Direct identifiers

A direct identifier names a specific person without any additional context. A passport number, a National Insurance or Social Security number, a full legal name paired with a date of birth, a personal email address, and a biometric template each point to one individual on their own. These are the patterns most data controls are tuned to catch.

Indirect identifiers (quasi-identifiers)

An indirect identifier does not single out a person by itself, but becomes identifying when combined with other data. A postal code, a job title, an employer, a gender, and a birth year are each unremarkable in isolation. Together, they can narrow a population to one person. Re-identification research has repeatedly shown that a small number of quasi-identifiers is often enough to uniquely identify individuals in supposedly anonymized datasets: Latanya Sweeney's widely cited analysis of 1990 U.S. census data estimated that 87% of the U.S. population could be uniquely identified by five-digit ZIP code, gender, and date of birth alone. This is why removing names alone does not make data non-personal.
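To make the combination effect concrete, here is a minimal sketch in Python. The records, field names, and the `unique_rows` helper are all made up for illustration; the idea is simply to count how many rows become unique once several quasi-identifiers are considered together.

```python
from collections import Counter

# Hypothetical, made-up records with names already removed.
records = [
    {"postal_code": "90210", "job_title": "Nurse", "birth_year": 1984, "gender": "F"},
    {"postal_code": "90210", "job_title": "Nurse", "birth_year": 1984, "gender": "F"},
    {"postal_code": "90210", "job_title": "Engineer", "birth_year": 1979, "gender": "M"},
    {"postal_code": "10001", "job_title": "Engineer", "birth_year": 1979, "gender": "M"},
    {"postal_code": "10001", "job_title": "Teacher", "birth_year": 1990, "gender": "F"},
]

def unique_rows(rows, quasi_identifiers):
    """Count rows whose combination of quasi-identifier values appears exactly once."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1)

# One attribute alone singles out nobody in this sample.
print(unique_rows(records, ["gender"]))  # 0

# All four together single out three of the five people.
print(unique_rows(records, ["postal_code", "job_title", "birth_year", "gender"]))  # 3
```

A row that is unique on its quasi-identifiers can be matched to any outside dataset that carries the same attributes alongside a name, which is how datasets with the names stripped out still get re-identified.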

PII versus related categories

"PII" is a term rooted in U.S. privacy practice. European data protection law uses the broader concept of "personal data," and most frameworks carve out a more tightly regulated subset for especially sensitive attributes. These categories overlap but are not interchangeable, and conflating them leads to under-protection.

| | PII (U.S. usage) | Personal data (GDPR) | Sensitive / special-category data |
| --- | --- | --- | --- |
| Scope | Data that identifies a specific individual | Any data relating to an identified or identifiable person | A defined subset warranting heightened protection |
| Breadth | Narrower; focused on identifying attributes | Broader; includes data merely relating to a person | Narrowest; an enumerated list of categories |
| Typical examples | Name, SSN, passport number, email address | The above plus IP addresses, device IDs, location, online identifiers | Health, biometric, genetic, racial or ethnic, religious, sexual-orientation data |
| Indirect data | Often treated as PII when combined | Explicitly covered when a person is identifiable | Covered, with stricter conditions for processing |

The practical takeaway: GDPR's "personal data" is wider than the classic notion of PII — an IP address or a device identifier may be personal data even if it would not be considered PII under a narrow U.S. reading. Sensitive or special-category data (health, biometrics, race, religion, sexual orientation, and similar) is a smaller set that nearly every framework subjects to stricter handling. When in doubt, treat the broadest applicable definition as the operative one.

Why PII matters for AI

AI tools have created a new, high-volume path for PII to leave an organization — one that most data controls were never positioned to watch. Four exposure patterns dominate.

Users pasting PII into prompts

The most common exposure is also the most mundane: an employee pastes a customer list, a support transcript, a CV, or a contract into a chatbot to summarize or rewrite it. The PII is now prompt text submitted to a third-party model, outside the channels that traditional data loss prevention inspects.

Model memorization and training

When prompts are used to train or fine-tune a model, PII contained in them can be retained in the model's parameters and, under some conditions, resurfaced later. Even where a provider states that inputs are not used for training, an organization that cannot inspect its own outbound prompts cannot verify what PII it has exposed or to whom.

Completions surfacing PII from connected systems

As AI assistants and agents are wired into internal systems — CRMs, ticketing tools, knowledge bases, databases — model completions can return PII drawn from those sources to a user who should not see it, or echo it into a downstream tool call. The sensitive data leaves not through an upload but through the model's output.

Regulatory exposure

PII is the object most privacy regulation is built to protect. Mishandling it through AI tools can implicate frameworks such as the GDPR in the EU, the CCPA/CPRA in California, and HIPAA for health information in the United States — among others. Obligations commonly include lawful basis for processing, data minimization, purpose limitation, and breach notification. Uninspected PII flowing into external models undermines an organization's ability to demonstrate any of these.

Governing PII across AI surfaces

Protecting PII in the AI era means adding inspection at the layer where AI activity actually happens, rather than relying solely on network egress or endpoint controls:

  • Prompt inspection — PII is detected in the prompt before submission to any model, and can be redacted or blocked on policy match.
  • Completion inspection — model outputs are checked before they reach the user, catching PII surfaced from connected systems.
  • Agent tool-call governance — arguments passed to external tools and APIs by autonomous agents are inspected before execution.
  • Semantic detection — beyond regex for structured identifiers, semantic analysis helps catch indirect identifiers and PII embedded in free-form natural language.
  • Audit trail — every inspected interaction is recorded, so an organization can demonstrate what PII was handled, where, and under what policy.
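As a rough illustration of the prompt-inspection step in the list above, the sketch below redacts two structured identifiers with regular expressions before a prompt would be sent anywhere. The `PATTERNS` table and `redact` function are hypothetical names, and the two patterns are deliberately simplistic. This is the regex half only: indirect identifiers and PII phrased in free-form language would need the semantic detection described above.

```python
import re

# Illustrative patterns only; real detectors need far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(prompt):
    """Return (redacted_prompt, findings), where findings counts matches per label."""
    findings = {}
    for label, pattern in PATTERNS.items():
        # subn returns the rewritten text and the number of substitutions made.
        prompt, n = pattern.subn(f"[{label}]", prompt)
        if n:
            findings[label] = n
    return prompt, findings

text = "Summarize the ticket from jane.doe@example.com, SSN 123-45-6789."
clean, found = redact(text)
print(clean)   # Summarize the ticket from [EMAIL], SSN [US_SSN].
print(found)   # {'EMAIL': 1, 'US_SSN': 1}
```

The same function could in principle be applied to model completions and to tool-call arguments. A policy layer would then decide whether a non-empty `findings` result leads to redaction or an outright block.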

Questions a PII governance capability answers

  • Is PII being pasted into external AI tools? — Prompt-level detection with redaction or block.
  • Did a model return personal data from a connected source? — Completion inspection before display.
  • What PII did this AI agent send to an external API? — Tool-call argument inspection and audit.
  • Which users and tools handle the most PII? — Usage analytics across AI surfaces.
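The audit-related answers above depend on a record that cannot be quietly edited after the fact. One common way to approximate that, shown here purely as an assumption and not as a description of any particular product, is a hash chain in which each log entry commits to the one before it. The `append_entry` and `verify` helpers and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(event, prev_hash):
    """Hash the event together with the previous entry's hash."""
    payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_entry(log, event):
    """Append an event, linking it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev": prev_hash, "hash": _digest(event, prev_hash)}
    log.append(entry)
    return entry

def verify(log):
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"surface": "prompt", "action": "redact", "labels": ["EMAIL"]})
append_entry(log, {"surface": "tool_call", "action": "block", "labels": ["US_SSN"]})
print(verify(log))   # True

log[0]["event"]["action"] = "allow"   # simulate tampering with a past entry
print(verify(log))   # False
```

Note that the entries record only the labels of what was detected, not the PII itself, so the audit log does not become a second copy of the sensitive data. A chain like this detects edits to existing entries but not removal of the most recent ones; that would require anchoring the latest hash somewhere outside the log.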

Frequently asked questions

Is PII the same as personal data under the GDPR?

Not exactly. "PII" comes from U.S. privacy practice and centers on data that identifies a specific person. The GDPR's "personal data" is broader: it covers any information relating to an identified or identifiable person, which can include IP addresses, device identifiers, and location data that a narrow reading of PII might exclude. In an EU context, the wider "personal data" definition is the one that governs.

Does removing names make a dataset anonymous?

Often not. Names are direct identifiers, but indirect identifiers (postal code, job title, employer, birth year, gender) can re-identify a person when combined. Datasets stripped of names but rich in quasi-identifiers have repeatedly been re-identified. True de-identification requires reducing the risk of re-identification across all attributes, not just removing obvious ones.

Why does PII matter for AI?

AI tools turn free-form prompts and completions into a high-volume channel for PII to move. Employees paste personal data into chatbots, models can memorize PII from training inputs, and AI assistants connected to internal systems can surface personal data in their outputs. Most of this never appears as a file upload or flagged email, so traditional data controls do not see it.

How does Qadar AI help protect PII?

Qadar AI inspects prompts and completions for PII at the AI interaction layer across Shield Web, Shield Desktop, and Shield Mobile, so personal data can be redacted or blocked before it reaches an external model and model outputs are checked before they reach the user. Agent tool calls are inspected before execution, and every interaction is recorded in a tamper-evident audit trail managed centrally through Shield Control. This gives organizations visibility and policy enforcement over PII across every AI surface.

© 2026 Qadar AI. All rights reserved. EU data residency available for Enterprise customers.