Healthcare Data Layer Implementation: Structuring Events Without PHI

A custom pixel event named "Paid: Weekly Therapy" seems harmless. It describes a product tier, not a patient. But when Monument, an alcohol addiction telehealth platform, sent that event alongside email addresses and IP addresses to Meta, the FTC permanently banned the company from sharing health data for advertising. The event name itself became evidence that the platform was disclosing the nature of treatment to an ad network.

The problem wasn't the tracking pixel. It was what the data layer fed into it.

Your data layer is the JavaScript object that sits between your website and every downstream tool: your tag manager, your CDP, your analytics platform, your ad pixels. It defines what information gets collected, how it's structured, and what labels it carries. In healthcare, a poorly designed data layer doesn't just create bad analytics. It creates a PHI disclosure vector that scales with every page view.

This guide covers how to design a data layer that gives your marketing team the event data they need for attribution and optimization while keeping protected health information out of the pipeline entirely.

Why Healthcare Data Layers Require a Different Approach

A data layer, at its simplest, is a structured JavaScript object (typically window.dataLayer in Google Tag Manager implementations) that stores information about the current page, the visitor's actions, and the context around those actions. When a visitor clicks a button, submits a form, or views a page, the data layer captures the relevant details and makes them available to any tag, pixel, or analytics tool that's listening.

In most industries, the design goal is straightforward: capture as much behavioral context as possible. E-commerce data layers track product names, categories, prices, cart contents, and purchase values. SaaS data layers track feature usage, plan tiers, and onboarding milestones. The more detail in the data layer, the better the analytics.

Healthcare breaks this model. The behavioral context on a healthcare website is inseparable from the visitor's health information. A page view on /conditions/diabetes/insulin-management reveals a health interest. A form submission on a therapy intake page reveals mental health status. A search for "pediatric oncologist near me" reveals a family's medical situation. Every interaction carries potential PHI.

The challenge is that marketing teams still need event data. They need to know which campaigns drive appointment bookings, which pages have high bounce rates, which CTAs convert, and which landing pages perform. The solution isn't to stop tracking. It's to design a data layer that captures marketing-relevant signals without encoding health information into the event stream.

Where PHI Hides in Standard Data Layers

Before designing a safe schema, you need to understand where PHI typically appears. Most healthcare organizations don't intentionally put PHI into their data layers. It leaks in through default configurations and well-intentioned customization.

Page URLs and Titles

The most common PHI vector is the page URL itself. Most data layers automatically capture the current URL with every event. On a healthcare website, that URL often contains clinical context:

/providers/cardiology/dr-smith/schedule
/conditions/hiv-testing/results-portal
/services/bariatric-surgery/bmi-calculator
/mental-health/anxiety-depression/intake-form

Page titles carry the same risk. A title like "Schedule Your Oncology Consultation" combined with an IP address and timestamp creates a record that a specific person, at a specific time, was seeking cancer treatment.

Kaiser Permanente's $47.5 million settlement stemmed in part from search terms and medical information flowing through tracking code to Google, Microsoft, Meta, and X. The data layer on Kaiser's websites passed this contextual information to every tag listening on the page, and 13.4 million members were affected over a seven-year window.

Form Field Values

Form interactions are the second major leak point. Healthcare websites have intake forms, screening questionnaires, appointment request forms, and patient registration flows. A default data layer configuration might capture:

Field names: diagnosis, medication, insurance_provider
Field values: the actual text a visitor types
Form identifiers that describe their purpose: depression_screening_phq9

BetterHelp's $7.8 million FTC settlement resulted from mental health intake questionnaire responses flowing to Facebook, Snapchat, Criteo, and Pinterest via tracking pixels. The intake form data moved through the data layer and into the pixel event payloads. The FTC found that BetterHelp used the fact that users had previously been in therapy to build Facebook lookalike audiences.

Custom Event Names

This is the vector that catches teams who think they've been careful. You strip out form values, you sanitize URLs, but you name your events descriptively:

appointment_booked_psychiatry
prescription_refill_requested
insurance_verified_mental_health
paid_weekly_therapy (a snake_case rendering of Monument's "Paid: Weekly Therapy" event)

Each of these event names, when paired with a user identifier, an IP address, or even a session ID, can constitute a PHI disclosure. The event name itself encodes the health context. Monument's FTC case made this explicit: their custom pixel events had titles like "Paid: Weekly Therapy" and "Paid: Med Management," and these were sent alongside email addresses and IP addresses to Meta.

Referrer URLs and UTM Parameters

A less obvious source: the referring URL or UTM campaign parameters. If a visitor arrives from a search for "bipolar disorder treatment near me," that query can appear in the referrer string whenever the referring site includes it in its URL. Major search engines now strip the query in most cases, but other referrers and internal site-search pages may not. If your paid campaigns use UTM parameters like utm_campaign=depression_therapy_q1, that campaign name flows into the data layer and, from there, into every connected platform.
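One way to keep attribution signals without the free-text risk is to reduce the referrer to its hostname and forward only UTM values you have reviewed. The sketch below is illustrative: the function name, the list of safe keys, and the opaque campaign ID format (`c` followed by digits) are assumptions, not part of any particular platform.

```javascript
// Assumption: on your campaigns these two parameters never carry health context.
const SAFE_UTM_KEYS = ['utm_source', 'utm_medium'];

// Hypothetical helper: returns attribution fields that are safe to push.
function sanitizeAttribution(pageUrl, referrer) {
  const params = new URL(pageUrl).searchParams;
  const out = {};

  for (const key of SAFE_UTM_KEYS) {
    if (params.has(key)) out[key] = params.get(key);
  }

  // Keep utm_campaign only when it is an opaque ID such as "c1042";
  // descriptive names like "depression_therapy_q1" are dropped.
  const campaign = params.get('utm_campaign');
  if (campaign && /^c\d+$/.test(campaign)) out.campaign_id = campaign;

  // Keep the referrer's hostname only, never its path or query string.
  try {
    out.referrer_host = referrer ? new URL(referrer).hostname : 'direct';
  } catch {
    out.referrer_host = 'unknown';
  }
  return out;
}
```

The mapping from opaque campaign IDs back to descriptive names would then live in your own reporting system rather than in the URL.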

Safe Event Naming Conventions

The principle behind safe event naming is simple: events should describe what the visitor did, not why they did it or what health context surrounds it.

Use Generic Action Categories

Replace condition-specific or service-specific event names with generic behavioral categories:

Unsafe Event Name	Safe Alternative
`appointment_booked_psychiatry`	`appointment_booked`
`prescription_refill_requested`	`form_submitted`
`oncology_page_viewed`	`service_page_viewed`
`depression_screening_completed`	`screening_completed`
`paid_weekly_therapy`	`subscription_started`
`insurance_verified_mental_health`	`insurance_verified`

The safe versions tell your analytics platform that a form was submitted or a page was viewed. They do not tell the platform what kind of form, what condition the page addressed, or what type of appointment was booked.
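If descriptive names already exist in your codebase, a small translation step can enforce the table above while you migrate. This is a minimal sketch: the function name is hypothetical, and dropping unknown names (rather than forwarding them) is a design assumption.

```javascript
// Legacy descriptive names mapped to the generic alternatives from the table.
const GENERIC_EVENT_NAMES = {
  appointment_booked_psychiatry: 'appointment_booked',
  prescription_refill_requested: 'form_submitted',
  oncology_page_viewed: 'service_page_viewed',
  depression_screening_completed: 'screening_completed',
  paid_weekly_therapy: 'subscription_started',
  insurance_verified_mental_health: 'insurance_verified',
};

const APPROVED_NAMES = new Set(Object.values(GENERIC_EVENT_NAMES));

// Returns an approved generic name, or null when the name is unknown.
// Callers should drop events that come back null instead of forwarding them.
function toGenericEventName(name) {
  if (APPROVED_NAMES.has(name)) return name;
  if (Object.prototype.hasOwnProperty.call(GENERIC_EVENT_NAMES, name)) {
    return GENERIC_EVENT_NAMES[name];
  }
  return null;
}
```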

Preserve Marketing Utility Through Aggregation

The immediate objection is: "If all appointments are just appointment_booked, how do I know which service lines are driving conversions?"

The answer is that service-line attribution happens in your own first-party analytics, not in the data layer that feeds third-party tools. Your server-side tracking infrastructure can enrich events with service-line data after they've been filtered, ensuring that your internal dashboards have the detail you need while external platforms receive only the generic event.
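The split between internal and external copies of an event might look like the following on the server. Everything here is a sketch under assumptions: `splitEvent`, the `form_id` field, and the lookup function are hypothetical names, and the list of externally safe fields would come from your own taxonomy.

```javascript
// Assumption: only these fields are approved for third-party destinations.
const EXTERNAL_FIELDS = ['event', 'event_category', 'form_type', 'timestamp'];

// Produces two copies of one generic event:
//  - external: allowlisted fields only, for ad and analytics platforms
//  - internal: enriched with a service line resolved server-side
function splitEvent(genericEvent, serviceLineLookup) {
  const external = {};
  for (const field of EXTERNAL_FIELDS) {
    if (field in genericEvent) external[field] = genericEvent[field];
  }

  const internal = {
    ...genericEvent,
    // form_id is assumed to be an opaque identifier such as "f-17"
    service_line: serviceLineLookup(genericEvent.form_id) || 'unknown',
  };

  return { external, internal };
}
```

Because the enrichment happens after the external copy is built, the service line cannot reach a third party through this path.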

Version Your Naming Conventions

Document your event taxonomy and version it. When marketing teams need to add new events, they should check the taxonomy first. A living document prevents the gradual drift back toward descriptive naming. An event like Monument's "Paid: Weekly Therapy" is the kind of name someone creates because it is useful for reporting, and "useful for reporting" is exactly the criterion that leads to PHI in event names.

Data Layer Schema Design

Beyond event names, the properties attached to each event need the same scrutiny. Here's a schema framework that separates what belongs in a data layer from what doesn't.

Properties to Include

These properties provide marketing analytics value without health context:

window.dataLayer.push({
  event: 'form_submitted',
  event_category: 'conversion',
  page_type: 'service',           // generic: 'service', 'blog', 'location', 'landing'
  traffic_source: 'paid_search',  // how the visitor arrived
  device_type: 'mobile',          // device category
  region: 'northeast',            // broad geographic region, not city/zip
  form_type: 'appointment',       // generic: 'appointment', 'contact', 'newsletter'
  consent_status: 'granted',      // consent state at time of event
  session_id: 'abc123',           // anonymous session identifier
  timestamp: '2026-03-13T14:22:00Z'
});

Every property here describes the interaction context without revealing health information. A page_type of "service" tells you the visitor was on a service page without specifying which service. A form_type of "appointment" tells you they booked an appointment without indicating what kind.

Properties to Exclude

These should never appear in a data layer that feeds external tools:

Condition or specialty names: service_name: 'cardiology', condition: 'diabetes'
Provider identifiers: provider_name: 'Dr. Smith', provider_specialty: 'psychiatry'
Clinical form data: any field value from intake forms, screening questionnaires, or medical history forms
Granular location data: ZIP codes, city names, or facility identifiers that could narrow down a small patient population
Insurance information: carrier names, plan types, member IDs
Full page URLs or titles: pass a sanitized page_type instead of the raw URL
Search queries: internal site search terms often contain health conditions
Referrer URLs: may contain the visitor's original search query
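An exclusion list is easier to enforce when it is inverted into an allowlist, so anything not explicitly approved is dropped by default. The sketch below assumes the approved property set from the schema example earlier in this section; the function name is hypothetical.

```javascript
// Approved properties, taken from the schema example above.
const ALLOWED_PROPERTIES = new Set([
  'event', 'event_category', 'page_type', 'traffic_source', 'device_type',
  'region', 'form_type', 'consent_status', 'session_id', 'timestamp',
]);

// Splits a payload into the properties that may be pushed and the keys
// that were dropped, so dropped keys can be logged and reviewed.
function pickAllowedProperties(payload) {
  const clean = {};
  const dropped = [];
  for (const [key, value] of Object.entries(payload)) {
    if (ALLOWED_PROPERTIES.has(key)) {
      clean[key] = value;
    } else {
      dropped.push(key);
    }
  }
  return { clean, dropped };
}

// Browser usage (assumes window.dataLayer exists):
//   const { clean } = pickAllowedProperties(rawPayload);
//   window.dataLayer.push(clean);
```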

Handle URL Sanitization at the Data Layer Level

Rather than relying on downstream tools to strip sensitive URL components, sanitize at the source. Your data layer initialization should transform URLs before any tag reads them:

// Instead of passing the raw URL:
// '/providers/oncology/dr-smith/schedule'

// Push a sanitized description of the page
window.dataLayer = window.dataLayer || [];
window.dataLayer.push({
  page_type: 'provider',
  page_action: 'schedule'
});

This way, even if a tag manager misconfiguration or a new marketing pixel reads the data layer, it only finds sanitized values. The raw URL never enters the event stream.
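The transformation itself can be a small lookup from path patterns to generic labels. The route table below is a hypothetical example; the patterns and labels would need to match your own site structure.

```javascript
// Hypothetical route table: first matching rule wins, so list the most
// specific patterns first.
const ROUTE_RULES = [
  { pattern: /^\/providers\/[^/]+\/[^/]+\/schedule\/?$/, page_type: 'provider', page_action: 'schedule' },
  { pattern: /^\/providers(\/|$)/, page_type: 'provider', page_action: 'view' },
  { pattern: /^\/(conditions|services)(\/|$)/, page_type: 'service', page_action: 'view' },
  { pattern: /^\/blog(\/|$)/, page_type: 'blog', page_action: 'view' },
];

// Maps a raw path to generic labels. Unknown paths fall back to 'other'
// so the raw path is never passed through by accident.
function sanitizePath(rawPath) {
  const path = rawPath.split(/[?#]/)[0]; // drop query string and fragment
  for (const rule of ROUTE_RULES) {
    if (rule.pattern.test(path)) {
      return { page_type: rule.page_type, page_action: rule.page_action };
    }
  }
  return { page_type: 'other', page_action: 'view' };
}

// Browser usage (assumes window.dataLayer exists):
//   window.dataLayer.push({ event: 'page_viewed', ...sanitizePath(location.pathname) });
```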

Server-Side Filtering as a Safety Net

A well-designed data layer is your first line of defense. Server-side tracking architecture is your second. Even with the most disciplined event naming and property exclusion, mistakes happen. A developer adds a new event with a descriptive name during a product launch. A CMS update changes page title patterns. A marketing team member creates a custom event for a specific campaign.

Server-side filtering catches what the data layer design misses. Here's how the layers work together:

Layer 1: Data layer design. Events are structured with generic names and sanitized properties. This prevents PHI from entering the event stream under normal operation.

Layer 2: Server-side event processing. Before any event reaches an external destination (Google Ads, Meta, your analytics platform), it passes through server-side infrastructure that applies a second round of filtering. This layer can:

Strip any property that matches a blocklist of sensitive field names
Reject events whose names contain condition-specific keywords
Redact URL parameters that weren't explicitly allowlisted
Enforce consent verification before dispatching any event to any destination

Layer 3: Consent-gated dispatch. Consent and privacy are becoming the next frontier of healthcare compliance, driven by state privacy laws and rising patient expectations around data control. Even if an event passes through Layers 1 and 2, it should only reach external platforms if the visitor has granted consent, verified server-side. A consent management platform that enforces consent at the server level, not through a JavaScript check that can be bypassed, ensures that no event reaches a destination without explicit permission.

This layered approach means a single failure (a descriptive event name slipping through, a new form field captured by default) doesn't automatically become a PHI disclosure. The server-side layer catches it before it reaches any third party.
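A server-side filter covering the four checks listed under Layer 2 could be sketched as follows. The field blocklist, keyword list, and parameter allowlist are short illustrative samples, and the function names are hypothetical. A production dictionary would be much larger and maintained with compliance input.

```javascript
// Illustrative samples only.
const BLOCKED_FIELDS = new Set([
  'diagnosis', 'medication', 'condition', 'service_name',
  'provider_name', 'provider_specialty', 'insurance_provider',
]);
const SENSITIVE_KEYWORDS = new Set(['psychiatry', 'oncology', 'therapy', 'depression', 'hiv', 'refill']);
const ALLOWED_URL_PARAMS = new Set(['utm_source', 'utm_medium']);

// Removes every query parameter that is not explicitly allowlisted.
// The path is assumed to have been sanitized upstream.
function redactUrl(rawUrl) {
  const url = new URL(rawUrl);
  for (const key of [...url.searchParams.keys()]) {
    if (!ALLOWED_URL_PARAMS.has(key)) url.searchParams.delete(key);
  }
  return url.toString();
}

// Returns { ok: true, payload } when the event may be dispatched,
// or { ok: false, reason } when it must be held back.
function filterEvent(event, consentGranted) {
  if (!consentGranted) return { ok: false, reason: 'no_consent' };

  // Match whole tokens so that "hiv" does not flag a word like "archive".
  const tokens = String(event.event || '').toLowerCase().split(/[^a-z0-9]+/);
  const hit = tokens.find((token) => SENSITIVE_KEYWORDS.has(token));
  if (hit) return { ok: false, reason: `sensitive_event_name:${hit}` };

  const payload = {};
  for (const [key, value] of Object.entries(event)) {
    if (BLOCKED_FIELDS.has(key)) continue;
    payload[key] = key === 'page_url' ? redactUrl(value) : value;
  }
  return { ok: true, payload };
}
```

Rejected events are worth logging with their reason, since each rejection points to a gap in the Layer 1 design.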

Testing and Validation: Auditing Your Data Layer for PHI Leakage

Designing a compliant data layer is necessary but not sufficient. You need to verify that it works as intended, continuously. Healthcare websites change constantly. Marketing teams add campaigns, developers ship features, CMS plugins update, and tag managers accumulate tags. What was compliant at launch may not be compliant six months later.

Manual Audit Process

Start with a manual audit of your current data layer:

Inventory all events. Open your browser's developer console on every major page template (homepage, service pages, provider pages, appointment flow, blog, patient portal login). Run console.log(window.dataLayer) and review every event object. List every event name and every property.
Flag descriptive names. Any event name that contains a condition, specialty, treatment type, or provider name is a PHI risk. Search for keywords: specific medical terms, department names, procedure types.
Check property values. Look for properties that contain raw URLs, page titles, form field names, or search queries. Any property that would let a recipient infer a health condition or treatment interest is a problem.
Test form interactions. Submit test data through every form on the site. Check what the data layer captures during and after submission. Pay attention to field-level tracking that might capture keystrokes or partial entries.
Review referrer handling. Navigate to your site from a search engine using a health-related query. Check whether the referrer string or search terms appear anywhere in the data layer.
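Part of this audit can be scripted. The helper below is a rough sketch that scans data layer entries for terms from a short sample dictionary. The term list and function names are assumptions, and simple substring matching will produce false positives, which is acceptable for a review aid.

```javascript
// Sample dictionary; extend it with your own conditions, specialties,
// procedures, and provider names.
const SENSITIVE_TERMS = [
  'oncology', 'psychiatry', 'depression', 'diabetes',
  'hiv', 'therapy', 'diagnosis', 'medication',
];

// Stringify defensively: data layer entries can hold DOM nodes or
// circular structures that JSON.stringify cannot serialize.
function safeStringify(value) {
  if (typeof value === 'string') return value;
  try {
    return String(JSON.stringify(value));
  } catch {
    return '';
  }
}

// Returns one finding per (entry, key, term) match.
function auditDataLayer(dataLayer) {
  const findings = [];
  Array.from(dataLayer).forEach((entry, index) => {
    if (entry === null || typeof entry !== 'object') return;
    for (const [key, value] of Object.entries(entry)) {
      const haystack = `${key} ${safeStringify(value)}`.toLowerCase();
      for (const term of SENSITIVE_TERMS) {
        if (haystack.includes(term)) findings.push({ index, key, term });
      }
    }
  });
  return findings;
}

// In the browser console: console.table(auditDataLayer(window.dataLayer));
```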

Automated Scanning

Manual audits are a point-in-time snapshot. They miss the drift that happens between audits. Automated web scanning tools fill this gap by crawling your website on a regular schedule and flagging:

New scripts or tags that weren't present in the last scan
Cookies set by third-party domains without a BAA
Data layer events that contain keywords from a healthcare-specific sensitivity dictionary
Network requests to ad platforms or analytics vendors that carry unfiltered URL paths

The enforcement cases cited here involved tracking that ran for years before detection. Kaiser's tracking ran for seven years. Advocate Aurora's ran for five. Automated scanning is the mechanism that prevents your organization from joining that list.
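The last check in that list, outbound requests carrying unfiltered URL paths, can be approximated by inspecting captured request URLs, for example from a HAR export or a headless-browser crawl. The vendor hosts and path markers below are illustrative assumptions, as are the function names.

```javascript
// Illustrative lists; replace with the vendors and path prefixes relevant to your site.
const VENDOR_HOSTS = ['facebook.com', 'doubleclick.net', 'google-analytics.com'];
const RAW_PATH_MARKERS = ['/conditions/', '/providers/', '/services/', '/mental-health/'];

function isVendorHost(hostname) {
  return VENDOR_HOSTS.some((host) => hostname === host || hostname.endsWith(`.${host}`));
}

// Flags requests to vendor hosts whose query parameters contain a raw,
// unsanitized site path. searchParams yields already-decoded values.
function flagLeakyRequests(requestUrls) {
  const flagged = [];
  for (const raw of requestUrls) {
    const url = new URL(raw);
    if (!isVendorHost(url.hostname)) continue;
    for (const [param, value] of url.searchParams) {
      const marker = RAW_PATH_MARKERS.find((m) => value.includes(m));
      if (marker) flagged.push({ host: url.hostname, param, marker });
    }
  }
  return flagged;
}
```

This only inspects query strings. Request bodies and headers such as Referer would need separate checks.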

Pre-Deployment Validation

Build data layer validation into your deployment pipeline. Before any code change goes live:

Run a schema validator against the data layer output to confirm all events match your approved taxonomy
Check for new event names that haven't been reviewed
Verify that URL sanitization is applied consistently across all page templates
Confirm that consent checks fire before any data layer push

This shifts PHI detection from "we'll catch it in the next audit" to "it can't ship without passing validation." The cost of catching a descriptive event name in a CI/CD pipeline is close to zero. The cost of catching it after it's been running for three years is measured in the millions.
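A pipeline check of this kind can be quite small. The sketch below validates a batch of captured events against a versioned taxonomy. The taxonomy contents, version string, and function name are hypothetical placeholders for your own approved list.

```javascript
// Hypothetical versioned taxonomy: event name -> approved property names.
const TAXONOMY = {
  version: '1.0.0',
  events: new Map([
    ['form_submitted', ['event_category', 'page_type', 'form_type', 'consent_status']],
    ['appointment_booked', ['event_category', 'page_type', 'consent_status']],
    ['service_page_viewed', ['page_type', 'consent_status']],
  ]),
};

// Returns a list of human-readable violations; an empty list means the
// batch matches the taxonomy.
function validateEvents(events, taxonomy = TAXONOMY) {
  const violations = [];
  events.forEach((entry, index) => {
    const allowed = taxonomy.events.get(entry.event);
    if (!allowed) {
      violations.push(`event[${index}]: unapproved event name "${entry.event}"`);
      return;
    }
    for (const key of Object.keys(entry)) {
      if (key !== 'event' && !allowed.includes(key)) {
        violations.push(`event[${index}]: unapproved property "${key}" on "${entry.event}"`);
      }
    }
  });
  return violations;
}

// In a CI script (Node), fail the build when anything is reported:
//   const problems = validateEvents(capturedEvents);
//   if (problems.length > 0) { console.error(problems.join('\n')); process.exit(1); }
```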

Building Consent Into the Foundation

The organizations building consent into their data layer architecture today are positioning themselves ahead of a regulatory curve that shows no signs of flattening. State privacy laws are expanding. Patient expectations around data control are rising. The December 2022 OCR guidance on tracking technologies and the joint OCR/FTC warning letters sent to 130 hospital systems in July 2023 made the direction clear.

A consent-aware data layer doesn't just check whether consent was granted. It conditions the entire event stream on consent status:

No consent granted: The data layer captures no behavioral events. Zero events fire. Zero properties are set.
Essential consent only: The data layer captures functional events (page load, error tracking) with no marketing properties.
Full consent granted: The data layer captures the full set of approved generic events with sanitized properties.

This three-tier model ensures that your data layer respects patient choices at the structural level, not as an afterthought bolted onto an existing event stream. When paired with server-side consent enforcement, it creates a system where consent violations require multiple independent failures rather than a single oversight.
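The three tiers can be expressed as a wrapper around the data layer push. This is a minimal sketch: the tier names, the event sets per tier, and the list of marketing properties are assumptions chosen to mirror the description above.

```javascript
// Assumed tiers and the events each tier permits.
const TIER_EVENTS = new Map([
  ['none', new Set()],
  ['essential', new Set(['page_load', 'error'])],
  ['full', new Set(['page_load', 'error', 'form_submitted', 'appointment_booked', 'service_page_viewed'])],
]);

// Properties treated as marketing-only and removed below full consent.
const MARKETING_PROPERTIES = ['traffic_source', 'region', 'device_type'];

// Returns a push function that consults the current consent tier on every call.
// getConsentTier is a caller-supplied function, e.g. backed by your CMP.
function createConsentAwarePush(dataLayer, getConsentTier) {
  return function push(payload) {
    const tier = getConsentTier();
    const allowedEvents = TIER_EVENTS.get(tier) || TIER_EVENTS.get('none');
    if (!allowedEvents.has(payload.event)) return false; // nothing is recorded

    const entry = { ...payload };
    if (tier !== 'full') {
      for (const property of MARKETING_PROPERTIES) delete entry[property];
    }
    dataLayer.push(entry);
    return true;
  };
}

// Browser usage (hypothetical CMP accessor):
//   const push = createConsentAwarePush(window.dataLayer, () => myCmp.currentTier());
```

Unknown tier values fall back to the most restrictive tier, so a CMP failure results in no events rather than all events.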

Getting Started

If you're evaluating your current data layer or building one from scratch, the priority order is:

Audit your existing events. Inventory every event name and property in your current data layer. Flag anything that encodes health context.
Define a safe taxonomy. Create a versioned document of approved event names and properties. Distribute it to every team that touches your website.
Sanitize at the source. Rewrite your data layer initialization to transform URLs, strip form values, and enforce generic naming before any downstream tool reads the data.
Add server-side filtering. Implement a second layer of protection that catches anything the data layer design misses.
Automate validation. Add data layer schema checks to your deployment pipeline and continuous scanning to your production monitoring.

The gap between "our data layer is probably fine" and "our data layer is verified compliant" is the gap that cost Monument its ability to advertise, BetterHelp $7.8 million, and Kaiser $47.5 million. Each of those organizations thought their event data was routine marketing instrumentation. It was. It was also PHI.

Ours Privacy provides server-side tracking infrastructure with built-in data layer sanitization, consent-gated event dispatch, and continuous web scanning for healthcare organizations. Every event passes through server-side filtering before reaching any external platform, and a comprehensive BAA covers the full data pipeline backed by SOC 2 Type II certification across all five trust criteria.

FAQ

What exactly makes a data layer event PHI under HIPAA?

An event becomes PHI when it combines health-related information (a condition name, a treatment type, a provider specialty) with an individual identifier (an IP address, a device ID, a session cookie, an email address). A generic event like page_viewed paired with an IP address is not PHI. An event like oncology_appointment_booked paired with the same IP address likely is. The OCR's December 2022 guidance took the position that even IP addresses on unauthenticated healthcare pages could constitute PHI when combined with health context, although a federal district court vacated that portion of the guidance in June 2024 (American Hospital Association v. Becerra).

Can I use Google Tag Manager's built-in data layer for healthcare?

Google Tag Manager's dataLayer object is a delivery mechanism, not a compliance control. You can structure compliant events within GTM's data layer, but GTM itself doesn't sanitize or filter what you push into it. The compliance responsibility falls on how you design the events, what properties you include, and what tags consume the data. A more robust approach is to use a healthcare-specific tag management solution that enforces filtering rules at the platform level rather than relying on manual configuration.

How do I preserve marketing attribution if I can't put service-line details in events?

Service-line attribution is handled by separating your internal analytics from your external event stream. Your server-side infrastructure can enrich events with service-line data in your own first-party analytics platform (where you control access and have appropriate safeguards) while sending only generic events to external platforms like Google Ads or Meta. You still get full attribution reporting internally. External platforms receive the conversion signal they need for optimization without the health context they don't need and shouldn't have.

What's the difference between sanitizing the data layer and using server-side tracking?

They solve different parts of the same problem. Data layer sanitization controls what information enters the event stream at the point of collection. Server-side tracking controls where that event stream goes after collection, routing it through your infrastructure instead of sending it directly from the browser to third parties. A compliant healthcare setup uses both: a sanitized data layer ensures clean inputs, and server-side architecture ensures controlled outputs. One without the other leaves gaps.

How often should I audit my data layer for PHI?

Continuous automated scanning is the baseline. Manual audits should happen whenever you launch a new page template, add a new form, introduce a new marketing campaign with custom events, or update your CMS or tag manager. At minimum, conduct a thorough manual review quarterly. In the enforcement cases above, tracking ran for five to seven years before it was detected. The goal is to make PHI in your data layer impossible to ship, not something you hope to catch eventually.
