A Complete Guide to Personally Identifiable Information

Key Takeaways

PII (Personally Identifiable Information) is any data that can identify an individual, alone or in combination with other data.
PII is divided into two categories: direct identifiers (high-risk, identify someone on their own) and indirect identifiers (lower-risk individually, but identifying when combined).
Major privacy laws governing PII include GDPR, HIPAA, and CCPA.
Organizations in regulated industries face strict legal obligations around how PII is collected, stored, and protected.
Mishandling PII has consequences on two fronts. For the individuals whose data is exposed, it can mean identity theft and financial fraud. For the organization responsible, it means regulatory fines, legal liability, and reputational damage.

Introduction

A compliance officer gets an alert: a file containing employee records was shared externally.

The immediate question isn’t just who accessed it, but what was in it.

If the file contains Personally Identifiable Information (PII) — names, government IDs, contact details, or other identifying data — the situation may trigger regulatory reporting, internal investigations, and serious reputational risk.

This is why understanding PII is so critical.

Every organization that collects or processes information about people is responsible for protecting it. And the first step toward doing that effectively is knowing what data qualifies as PII in the first place.

In this guide, we cover:

What PII is and how it’s officially defined
What is considered PII — direct vs. indirect identifiers
Examples of PII across common data types
Why PII protection matters for compliance and data security
Which regulations govern how companies must handle PII

What Is PII?

PII, or Personally Identifiable Information, is any data that can be used to identify, contact, or locate a specific individual, either by itself or when combined with other available information.

Understanding what PII data includes, and where it lives in your systems, is the starting point for any serious data protection program.

The most widely cited definition comes from the National Institute of Standards and Technology (NIST), which describes PII as “any information that can be used to distinguish or trace an individual’s identity, including information linked or linkable to an individual, such as medical, educational, financial, and employment records.”

PII is primarily a U.S. term. The EU’s General Data Protection Regulation (GDPR) instead uses “personal data,” defined broadly as any information relating to an identified or identifiable natural person.

The terminology differs, but the core principle is the same: information that connects to a person deserves protection.

What Is Considered PII?

PII falls into two types based on how directly it identifies someone: direct PII and indirect PII.

Direct PII

Direct PII identifies a person by itself. It carries the highest risk if exposed and is subject to the strictest legal protections.

Some examples:

Full legal name
Social Security number (SSN)
Passport or driver’s license number
Biometric data (fingerprints, facial recognition data, voiceprints)
Financial account or credit card numbers
Medical record numbers
Email address (when linked to an individual)
Home address
Phone number
Place of birth
Mother’s maiden name
Employee ID or student ID number
Tax identification number (TIN)
Vehicle identification number (VIN)
Digital signatures
Device identifiers (MAC address, IMEI number)

Indirect PII

Indirect PII can’t identify someone on its own, but can do so when combined with other data.

Examples:

Date of birth
ZIP code
Gender
Race or ethnicity
Job title
IP address
Employer name

Research by Latanya Sweeney, based on 1990 U.S. Census data, estimated that roughly 87% of U.S. residents could be uniquely identified using just three indirect data points: five-digit ZIP code, birth date, and gender.

This is why indirect identifiers must be treated carefully and why the stakes around PII mishandling are higher than most organizations assume.
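The combination effect described above can be illustrated with a small sketch. The records below are hypothetical toy data and `unique_fraction` is an illustrative helper, not part of any standard tool. It measures what share of rows is singled out by a given set of fields.

```python
from collections import Counter

# Hypothetical toy records: no names, only indirect identifiers.
records = [
    {"zip": "02139", "birth_date": "1985-03-14", "gender": "F"},
    {"zip": "02139", "birth_date": "1985-03-14", "gender": "M"},
    {"zip": "02139", "birth_date": "1990-07-02", "gender": "F"},
    {"zip": "02139", "birth_date": "1990-07-02", "gender": "F"},
    {"zip": "94105", "birth_date": "1978-11-30", "gender": "M"},
]


def unique_fraction(rows, fields):
    """Return the share of rows whose combination of `fields` appears exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(rows)


gender_only = unique_fraction(records, ["gender"])
combined = unique_fraction(records, ["zip", "birth_date", "gender"])

print(f"gender alone: {gender_only:.0%} of rows are unique")
print(f"zip + birth date + gender: {combined:.0%} of rows are unique")
```

In this toy set, no row is unique on gender alone, yet three of the five rows become unique once all three fields are combined. Running the same kind of check on real exports is one rough way to spot columns that look harmless individually but are identifying together.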

Sensitive vs. non-sensitive PII

PII is also categorized by the level of harm its exposure could cause.

Sensitive PII, including SSN, medical records, financial account numbers, biometric data, and government ID numbers, carries a high risk if compromised and is subject to the strictest legal protections.

Non-sensitive PII, such as email addresses, phone numbers, usernames, and general employment details, is still identifiable but poses a lower standalone risk.

Why PII Protection Matters

PII is one of the most valuable targets in any data breach. Once exposed, it rarely stays isolated, as attackers use it to pivot into broader fraud schemes, and the damage tends to compound quickly on both sides of the breach.

For the individuals affected, the consequences are immediate and personal: stolen identities, drained accounts, fraudulent loans taken out in their name, and targeted phishing campaigns built from their own contact details. Victims often don’t discover the damage until it has already been done, and the financial and credit fallout can take years to reverse.

For the organization responsible, the exposure is just as serious:

Regulatory penalties — Under GDPR, fines can reach €20 million or 4% of global annual revenue, whichever is higher. HIPAA violations carry penalties up to $1.9 million per violation category per year. CCPA gives California residents the right to sue directly, with statutory damages of $100–$750 per consumer per incident.
Legal liability — A single breach can trigger audits, class action litigation, and mandatory remediation that disrupts operations for months.
Reputational damage — Organizations that suffer a breach don’t just face legal exposure but also face customer loss, media scrutiny, and long-term erosion of trust that no fine schedule can fully capture.
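To make the penalty figures above concrete, here is a minimal arithmetic sketch. It uses only the ceilings and ranges quoted in this section. The function names are illustrative, and actual fines and damages are set by regulators and courts case by case, so treat the output as an upper-bound estimate rather than a prediction.

```python
def gdpr_max_fine_eur(global_annual_revenue_eur):
    """Ceiling quoted above: EUR 20 million or 4% of global annual revenue, whichever is higher."""
    return max(20_000_000, global_annual_revenue_eur * 4 / 100)


def ccpa_statutory_damages_usd(affected_consumers):
    """Range quoted above: $100 to $750 per consumer per incident, returned as (low, high)."""
    return (100 * affected_consumers, 750 * affected_consumers)


# Hypothetical companies, for illustration only.
small = gdpr_max_fine_eur(100_000_000)    # 4% is EUR 4M, so the EUR 20M floor applies
large = gdpr_max_fine_eur(2_000_000_000)  # 4% is EUR 80M, which exceeds EUR 20M
low, high = ccpa_statutory_damages_usd(10_000)

print(f"GDPR ceiling, EUR 100M revenue: {small:,.0f}")
print(f"GDPR ceiling, EUR 2B revenue:   {large:,.0f}")
print(f"CCPA range, 10,000 consumers:   ${low:,} to ${high:,}")
```

The "whichever is higher" rule means the €20 million figure acts as a floor on the ceiling: it applies to any organization with less than €500 million in global annual revenue.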

The volume of exposed data makes this urgent. In 2023 alone, there were over 3,200 publicly reported data breaches in the U.S., exposing more than 353 million records. For organizations in regulated industries, the question is less whether a breach will happen and more whether they’ll be prepared when it does.

PII and Privacy Regulations

PII doesn’t exist in a regulatory vacuum. Depending on your industry, geography, and the type of data you handle, one or more privacy frameworks will dictate how you collect, store, access, and dispose of it. Non-compliance is more than just a theoretical risk, as regulators across jurisdictions have demonstrated a clear willingness to act.

One reason compliance is more complex than it appears is that organizations often make the mistake of treating PII as a fixed checklist.

In reality, the sensitivity of any given data point depends heavily on context. This is why modern privacy laws don’t just regulate obvious identifiers but also extend protections to any information that is linked or linkable to an individual.

For multinational organizations, this is especially significant, as GDPR’s definition is broader than U.S. frameworks and covers any information related to an identifiable person, even indirectly.

Here are the three frameworks that come up most often:

GDPR (General Data Protection Regulation)

The EU’s flagship privacy law applies to any organization that handles the personal data of EU residents, regardless of where the organization is based.

GDPR takes a broad view of what counts as personal data, covering any data that can “directly or indirectly” identify a person, which includes IP addresses, location data, and online identifiers. Organizations must have a lawful basis for processing personal data, honor data subject rights (including the right to erasure), and report qualifying breaches to the supervisory authority within 72 hours of becoming aware of them.
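The 72-hour reporting window is easy to misjudge across weekends and time zones, so incident runbooks often compute it explicitly. The sketch below is one way to do that. The helper names are hypothetical, and it assumes the clock starts at a timezone-aware timestamp recorded when the organization became aware of the breach.

```python
from datetime import datetime, timedelta, timezone

# Reporting window stated in the text above.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(became_aware_at):
    """Return the latest time by which the breach report is due."""
    return became_aware_at + NOTIFICATION_WINDOW


def hours_remaining(became_aware_at, now):
    """Return hours left before the deadline (negative once it has passed)."""
    return (notification_deadline(became_aware_at) - now) / timedelta(hours=1)


# Hypothetical incident timeline, stored in UTC to avoid time zone ambiguity.
aware = datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc)
checked = datetime(2024, 3, 2, 21, 30, tzinfo=timezone.utc)

print("Report due by:", notification_deadline(aware).isoformat())
print("Hours remaining:", hours_remaining(aware, checked))
```

Storing timestamps in UTC is a design choice here, not a legal requirement. It keeps the arithmetic unambiguous when the security team and the regulator sit in different time zones.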

HIPAA (Health Insurance Portability and Accountability Act)

HIPAA governs health-related PII, referred to as Protected Health Information (PHI), in the U.S. healthcare sector.

It applies to covered entities, including hospitals, clinics, and insurers, and their business associates.

What makes HIPAA compliance particularly demanding is its scope: even indirect identifiers like admission and discharge dates, appointment times, geographic data smaller than a state, and room numbers can constitute PHI when linked to patient records. HIPAA requires administrative, physical, and technical safeguards, and penalties scale with the level of negligence.

CCPA (California Consumer Privacy Act)

CCPA gives California residents significant rights over their personal data: the right to know what is collected, the right to delete it, and the right to opt out of its sale.

Unlike HIPAA, CCPA covers a wide range of identifiers beyond the obvious — household data, behavioral data, and inferences drawn from other data points all fall within scope. Businesses that fail to comply face fines of up to $7,500 per intentional violation, and consumers can bring private lawsuits for data breaches involving certain categories of information.

These three frameworks represent the most broadly applicable PII regulations, but they are far from the only ones.

Depending on your sector and the jurisdictions you operate in, you may also need to account for FERPA (education records), GLBA (financial data), SOX (financial reporting), or state-level privacy laws that continue to expand across the U.S.

How PII Gets Exposed and Why Redaction Matters

Not every PII breach is the result of a cyberattack. In regulated industries, some of the most common exposures happen through routine operations, and often without anyone realizing it.

A few of the most frequent sources of accidental PII exposure include:

FOIA and public records requests — When government agencies or public institutions respond to Freedom of Information Act requests, documents containing PII must be reviewed and redacted before release. Without a reliable process, names, SSNs, addresses, and other identifiers can accidentally be disclosed.
Ediscovery and litigation — Legal teams producing documents in response to discovery requests face the same challenge at scale. A single case can involve thousands of emails and attachments, any of which may contain sensitive PII that needs to be identified and redacted before production.
Internal document sharing — HR files, financial records, and compliance reports shared across departments or with external auditors can expose PII if access controls and review processes aren’t in place.
Misconfigured systems — Databases, cloud storage, and archiving platforms that aren’t properly secured can make PII accessible to users who shouldn’t have it, or visible in search results and exports.

For legal and compliance teams, the challenge is both to protect PII from external threats and to have systems in place that make PII identifiable, reviewable, and redactable on demand.

Whether responding to a FOIA request, preparing documents for litigation, or fulfilling a data subject access request under GDPR, the ability to locate and redact PII quickly is a core compliance requirement.
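As a rough illustration of what "locate and redact" means in practice, here is a minimal pattern-based sketch. It is not how any particular product implements redaction. The patterns are simplified assumptions covering U.S.-style SSNs, email addresses, and phone numbers only, and the sample text is invented.

```python
import re

# Simplified, assumed patterns; real-world formats vary far more than this.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"(?<!\d)\(?\d{3}\)?[ .-]\d{3}[ .-]\d{4}\b"),
}


def redact(text):
    """Replace each pattern match with a labeled placeholder and count the hits."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, hits = pattern.subn(f"[REDACTED {label}]", text)
        counts[label] = hits
    return text, counts


sample = "Contact Jane Roe at jane.roe@example.com or 555-867-5309. SSN: 123-45-6789."
clean, counts = redact(sample)
print(clean)
print(counts)
```

Note that the name in the sample passes through untouched: regular expressions only catch identifiers with a predictable shape. Names, addresses, and context-dependent identifiers need other techniques, such as dictionaries, entity recognition, or human review, which is why the per-pattern counts are returned for an audit trail rather than treated as proof that a document is clean.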

For organizations in regulated industries, managing PII is an operational concern. PII flows through email, messaging platforms, HR systems, and customer records daily, and keeping track of where it lives, who has access to it, and how long it’s retained is a compliance obligation in its own right.

Jatheon Cloud addresses this directly. Its bulk redaction capability allows legal and compliance teams to locate and redact PII across large volumes of archived communications before responding to FOIA requests, ediscovery, or regulatory inquiries, without having to review documents one by one.

Combined with access controls, audit trails, and configurable retention policies, it gives regulated organizations the infrastructure to manage PII compliantly at scale.

Summary of the Main Points

PII (Personally Identifiable Information) is any data that can identify an individual, either on its own or when combined with other information.
There are two types of PII: direct PII (full names, SSNs, passport numbers, biometric data), which identifies someone without additional context, and indirect PII (ZIP codes, birth dates, IP addresses), which becomes identifying when combined with other data.
The sensitivity of any data point depends on context. For example, an email address linked to a medical record carries far greater risk than one that stands alone.
A PII breach can result in identity theft, financial fraud, and targeted phishing for individuals, and in regulatory penalties, legal liability, and long-term reputational damage for companies.
The three most widely applicable privacy laws governing PII are the GDPR in the EU and HIPAA and CCPA in the U.S. Each has a distinct scope, set of obligations, and penalty structure.
Other sector-specific regulations, such as FERPA, GLBA, SOX, and expanding state-level privacy laws, may also apply depending on industry and jurisdiction.
For any organization that collects, stores, or transmits personal data, understanding PII is the foundation of sound data governance.

If your organization handles personal data, having the right archiving infrastructure in place is a core part of PII compliance. Contact us at sales@jatheon.com or book a demo to see how Jatheon helps regulated organizations balance data retention for compliance with strict privacy controls and masking for GDPR, CCPA, and HIPAA.

FAQ

What does PII stand for?

PII stands for Personally Identifiable Information. It refers to any data that can be used to identify, locate, or contact a specific individual, either on its own or when combined with other information.

What are some PII examples?

Common PII examples include full name, Social Security number, email address, phone number, driver’s license number, credit card number, medical record numbers, fingerprints, and home address. Indirect examples include date of birth, gender, ZIP code, and IP address.

Is an email address PII?

Yes. An email address is considered PII because it can be used to identify and contact a specific individual. It qualifies as a direct identifier under most privacy frameworks.

What is the difference between PHI and PII?

PHI (Protected Health Information) is a subset of PII specific to healthcare. While PII is a broad term covering any data that can identify an individual, PHI refers specifically to PII that is linked to a person’s past, present, or future health condition, healthcare services, or payment for those services. PHI is governed by HIPAA, which imposes stricter handling requirements than general PII regulations. All PHI is PII, but not all PII is PHI.

Which types of records may contain PII?

PII can appear across a wide range of records, including: email and messaging communications, personnel and HR files, medical and patient records, financial statements and tax documents, educational records (governed by FERPA in the U.S.), legal and court documents, government ID and licensing records, and customer databases. Any system that stores or transmits information about individuals is a potential repository of PII and should be subject to appropriate access controls and retention policies.

Who is responsible for protecting PII in an organization?

Responsibility for PII protection is shared across the organization, but typically falls under the CISO, DPO (Data Protection Officer), compliance teams, and IT security. All employees who handle personal data have an obligation to follow the organization’s data protection policies.
