What Is PII in Web Analytics?
Personally identifiable information defined for analytics: what counts, common accidental leaks (URLs, search terms), and prevention.
PII (personally identifiable information) is any data that identifies a person directly (name, email, phone number) or indirectly, in combination with other data (IP address + timestamp + behavior). In analytics, PII is less a category you collect than a category that leaks in: through URLs, search boxes, and form values echoed into events, into systems that were supposed to hold only aggregates.
The classic accidental leaks
- URLs carrying identity: /account/jane.doe@example.com, password-reset tokens, ?email= parameters in campaign links. Your pageview report becomes a PII database nobody meant to build. Defense: never put identity in URLs; scrub query parameters at the tracker or collector level.
- Site-search queries: confused users type emails and order numbers into search boxes; if you track queries, filter PII patterns before sending.
- Event properties with form data: track('form_submitted', { email: ... }) is the lazy property that turns an event stream into a register of people. Track that the form was submitted, never what was typed into it (see the form-tracking rules).
- Session replay: recording keystrokes turns PII collection into a product feature, and is a minefield of its own.
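For the URL leak, here is a minimal sketch of collector-side scrubbing in Python, assuming the collector receives full page URLs. The parameter denylist (SENSITIVE_PARAMS), the email pattern, and the scrub_url helper are hypothetical illustrations rather than any tool's API; a real deployment would tune them to its own URL scheme.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical denylist: query parameter names this site treats as identity-bearing.
SENSITIVE_PARAMS = {"email", "token", "reset_token", "name", "phone"}

# Loose email pattern for path segments; also catches a percent-encoded "@" (%40).
EMAIL_RE = re.compile(r"[^/@\s]+(?:@|%40)[^/@\s]+\.[A-Za-z]{2,}")


def scrub_url(url: str) -> str:
    """Drop sensitive query parameters and mask email-like path segments."""
    parts = urlsplit(url)
    # Keep only parameters whose lower-cased name is not on the denylist.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SENSITIVE_PARAMS
    ]
    path = EMAIL_RE.sub("[redacted]", parts.path)
    # The fragment is dropped entirely (last element is an empty string).
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(kept), ""))
```

Dropping the fragment is a deliberate choice: browsers do not send it in the HTTP request, but a client-side tracker that reads the full page address can still forward it to the collector.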
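Site-search filtering can be as simple as regex redaction applied before a query is sent or stored. A sketch, assuming two hypothetical patterns (email addresses, and long digit runs that look like phone or order numbers); the replacement labels and the redact_query name are illustrative only.

```python
import re

# Hypothetical patterns; tune them to the identifiers your users actually type.
# Emails are redacted first so that digits inside an address are not half-matched.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[email]"),
    # Nine or more characters of digits, spaces, dots, dashes or parentheses,
    # starting and ending with a digit: phone-like or order-number-like runs.
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[number]"),
]


def redact_query(query: str) -> str:
    """Replace PII-looking substrings of a search query with fixed labels."""
    for pattern, label in PII_PATTERNS:
        query = pattern.sub(label, query)
    return query
```

The patterns deliberately over-match (a date such as 2024-01-15 is redacted too): a lost search term costs less than a stored phone number.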
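The "track that, never what" rule for form events is easiest to enforce with a per-event allowlist at the collector, so that a property nobody approved is dropped rather than stored. A sketch; the event names, property names, ALLOWED_PROPERTIES, and sanitize_event are all hypothetical.

```python
# Hypothetical schema: the only properties each event is allowed to carry.
ALLOWED_PROPERTIES: dict[str, set[str]] = {
    "form_submitted": {"form_id", "field_count", "duration_ms"},
    "search": {"result_count"},
}


def sanitize_event(name: str, properties: dict[str, object]) -> dict[str, object]:
    """Keep only allowlisted properties; events missing from the schema keep none."""
    allowed = ALLOWED_PROPERTIES.get(name, set())
    return {key: value for key, value in properties.items() if key in allowed}
```

An allowlist fails closed: when someone later adds email to a track() call, the collector drops it, instead of the team discovering it in the warehouse months later. A denylist would only catch the field names someone thought of in advance.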
The GDPR framing
The GDPR uses the broader term 'personal data': anything relating to an identified or identifiable person, explicitly including online identifiers and IP addresses. This is why classic analytics (persistent ID + IP + behavior) processes personal data by definition and inherits the full GDPR apparatus: a legal basis, data processing agreements (DPAs) with vendors, and handling of data-subject rights.
Data minimization as the design answer
The robust strategy is architectural: collect so little that there is no PII to protect. Cookieless analytics with rotating identifiers and aggregate reporting keeps the analytics layer minimal in personal data by construction; the one deliberate exception, identify() at signup, happens where there is a real user relationship and consent behind it. PII you never collected cannot be breached, subpoenaed, or apologized for.
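One common way to build rotating identifiers is a keyed hash of site, IP address, and user agent under a salt that is replaced every day and never persisted. The sketch below assumes a single collector process (several processes would need to share the current salt) and uses hypothetical names; it illustrates the idea rather than any particular product's scheme.

```python
import hashlib
import hmac
import secrets
from datetime import date
from typing import Optional


class RotatingVisitorIds:
    """Hypothetical collector helper issuing per-day pseudonymous visitor IDs.

    The salt exists only in memory and is replaced whenever the day changes,
    so IDs from an earlier day can no longer be recomputed or linked.
    """

    def __init__(self) -> None:
        self._day: Optional[date] = None
        self._salt = b""

    def _salt_for(self, day: date) -> bytes:
        if day != self._day:
            # The previous salt is overwritten, never archived: that breaks linkage.
            self._day = day
            self._salt = secrets.token_bytes(32)
        return self._salt

    def visitor_id(self, site: str, ip: str, user_agent: str, day: date) -> str:
        message = "|".join((site, ip, user_agent)).encode("utf-8")
        digest = hmac.new(self._salt_for(day), message, hashlib.sha256).hexdigest()
        # Callers persist only this truncated ID, never the raw IP.
        return digest[:16]
```

Within one day the same visitor maps to the same ID, which is enough for unique-visitor and session counts; across days the IDs cannot be linked, because the salt that produced yesterday's IDs no longer exists.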