Billing Event Schema Design: The Engineering Decision That Determines Your Compliance Posture

The billing event schema should be decided before you write the first line of billing code. Most engineering teams do it the other way around: instrument first, get pricing in front of customers, then discover that the session timestamps in their billing events just became PHI, or that the float precision errors in their quantity field compound across 80 million events into a $2,400 rounding discrepancy on a single invoice.
The schema defines what your billing pipeline can measure. It determines which fields become compliance liabilities. It fixes the granularity ceiling for every audit trail you’ll ever produce. And it is expensive to change in production — customers have integration dependencies on your billing event format, your historical invoice data is stored in the current schema, and a schema migration touches the entire pipeline simultaneously.
This article covers the engineering decisions that go into a billing event schema that works correctly under real operating conditions: idempotency key construction, field-level compliance design, decimal precision, schema versioning, and vertical-specific examples across Dev Tools, Healthtech, and IoT.
Why the Schema Comes Before the Pricing Model
Most teams discover this order problem after the fact. The sequence typically goes:
- Build the product, instrument it for observability (logs, traces, metrics)
- Decide on a pricing model
- Try to build billing on top of existing instrumentation
- Discover the instrumentation doesn’t capture what the pricing model requires
A CI/CD platform that instruments at the build-step level cannot produce a per-build-minute invoice without re-instrumenting. A telehealth platform that logs session IDs for debugging can’t remove them from the billing pipeline without a schema migration. An IoT platform that aggregates device readings to daily fleet totals for dashboards cannot reconstruct per-device billing retroactively.
The instrumentation granularity decision is a one-way door. You can aggregate fine-grained data upward; you cannot disaggregate coarse-grained data downward. The correct sequence:
1. Choose your billable metric
2. Design the billing event schema to capture that metric
3. Instrument your product to emit billing events at that granularity
4. Build the billing pipeline on top of the event stream
5. Set your pricing rates
Rates can change without touching the schema. The schema is fixed once it’s in production with real data behind it.
The Two-Layer Pattern
The most important structural decision in billing event schema design is what goes into the billing event and what stays in the originating system. This is particularly critical in regulated industries — but the discipline of separating billable metadata from operational records is good practice in every vertical.
Layer 1: The billing event (what goes to the billing pipeline)
Contains only the fields needed to calculate, aggregate, and audit the invoice. No content, no user-identifiable data, no operational context that isn’t relevant to billing.
Layer 2: The source record (stays in your application database)
Contains the operational detail that the billing event references. Never flows to the billing pipeline.
The source_reference in Layer 1 (req_8b3e9c4d) is enough to resolve a billing dispute — trace back to the source record, verify the event occurred, produce the evidence. But the billing pipeline has no visibility into the user identity, request content, or session context.
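Here is a minimal Python sketch of the split. It assumes a hypothetical API-request record as the Layer 2 source record. The class names, the extra source-record fields (user_id, client_ip, endpoint, payload), the customer ID, and the placeholder event_id are illustrative. Only req_8b3e9c4d and the billing field names come from this article. The projection copies the billing fields and nothing else, so user identity, IP address, and request content never reach the billing pipeline.

```python
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class ApiRequestRecord:
    """Layer 2 (hypothetical): the source record, kept in the application DB."""
    request_id: str
    customer_id: str
    user_id: str      # end-user identity: personal data
    client_ip: str    # personal data under GDPR
    endpoint: str     # operational context
    payload: str      # request content
    occurred_at: str  # event time, UTC ISO 8601

@dataclass(frozen=True)
class BillingEvent:
    """Layer 1: only what is needed to calculate, aggregate, and audit."""
    event_id: str
    schema_version: str
    customer_id: str
    metric: str
    quantity: str          # decimal carried as a string
    timestamp: str
    source_reference: str  # pointer back to the Layer 2 record

def to_billing_event(record: ApiRequestRecord, event_id: str) -> BillingEvent:
    """Copy billing fields only; user, IP, endpoint, and payload stay behind."""
    return BillingEvent(
        event_id=event_id,
        schema_version="1",
        customer_id=record.customer_id,
        metric="api_call_v1",
        quantity="1",
        timestamp=record.occurred_at,
        source_reference=record.request_id,
    )

record = ApiRequestRecord(
    request_id="req_8b3e9c4d",
    customer_id="org_acme",
    user_id="usr_1029",
    client_ip="203.0.113.7",
    endpoint="/inference",
    payload='{"prompt": "hello"}',
    occurred_at="2026-03-31T23:59:57.000Z",
)
event = to_billing_event(record, event_id="evt_example")  # placeholder ID
```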
For Healthtech: the source_reference is a de-identified session token mapped to a patient encounter in a separately secured clinical record store. The billing event never contains the patient ID, provider ID, or session timestamp that would constitute PHI.
For IoT: the source_reference is a device-message reference, a composite key of the platform's device alias (not the hardware MAC address or IMEI) and the message sequence number. The patient's medical record number (for connected medical devices) never appears in the billing event.
Field-by-Field: The Minimum Required Schema
Every billing event needs exactly these seven fields. Additional fields add complexity without proportional value.
event_id — Stable Idempotency Key
The most important field. Gets almost no attention in billing tutorials.
The event_id must be deterministic — the same physical event must always produce the same ID. If two copies of the same event arrive at the billing pipeline (from a retry, a network duplicate, or a batch replay), both must produce the same event_id so the second write is deduplicated.
A random UUID assigned at ingestion is wrong. Two separate delivery attempts for the same underlying event will produce different UUIDs and both will be stored, creating a double-charge.
The correct approach: derive the event_id from the content of the event itself.
What to include in the hash input:
- customer_id: prevents cross-tenant collision if source references are not globally unique
- metric: a single source event may generate multiple billing events for different metrics; include metric to distinguish them
- source_reference: the unique identifier of the originating event in your application
Do not include timestamp or quantity in the hash input if either could vary between delivery attempts (e.g., a quantity that’s calculated at emission time from a fluctuating counter).
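Here is a minimal Python sketch of that derivation. It assumes SHA-256 as the hash, since the article does not prescribe one. It also JSON-encodes the inputs so that field boundaries stay unambiguous. The function name and the org_acme / org_globex customer IDs are hypothetical. Timestamp and quantity are left out of the hash input, as described above.

```python
import hashlib
import json

def derive_event_id(customer_id: str, metric: str, source_reference: str) -> str:
    """Deterministic idempotency key derived from the event's identity fields.

    Timestamp and quantity are deliberately left out: either may differ
    between delivery attempts of the same underlying event.
    """
    # JSON-encoding the list keeps the boundaries between fields unambiguous,
    # so ("ab", "c") and ("a", "bc") cannot hash to the same input.
    material = json.dumps([customer_id, metric, source_reference])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

first_delivery = derive_event_id("org_acme", "api_call_v1", "req_8b3e9c4d")
retry = derive_event_id("org_acme", "api_call_v1", "req_8b3e9c4d")
other_tenant = derive_event_id("org_globex", "api_call_v1", "req_8b3e9c4d")
```

A retry of the same delivery yields the same ID. The same request ID under a different tenant yields a different one.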
Vertical-specific idempotency key design:
| Vertical | Source fields for idempotency hash |
|---|---|
| Dev Tools (API) | customer_id + request_id |
| Dev Tools (CI/CD) | customer_id + build_id + step_id |
| Healthtech (telehealth) | customer_id + session_token (de-identified) |
| IoT (MQTT) | customer_id + device_id + message_sequence_number |
| IoT (HTTP) | customer_id + device_id + reading_timestamp |
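For IoT with MQTT QoS 1, the deduplication anchor must be stable across retransmissions of the same original message. The packet identifier in the MQTT PUBLISH header is repeated on a QoS 1 redelivery. However, it is only a 16-bit value that the client may reuse once the message has been acknowledged, so on its own it cannot identify a message over the life of a device. A sequence number assigned by the device and carried in the message itself is the safer value for message_sequence_number.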
schema_version — Non-Negotiable From Day One
Add this field even if you never intend to change the schema. You will change the schema. Having version 1 events and version 2 events coexisting in the same table is routine — without a version field you cannot tell them apart.
Use a simple integer string. Do not use semver for event schemas — minor version differences in billing events are not minor. Any field addition, removal, or type change that affects invoice calculation is a major version.
The version field enables the aggregation pipeline to route events to version-specific calculation logic.
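One way to implement that routing is sketched below in Python. It assumes that v1 events carried quantity as a JSON number and v2 events carry it as a decimal string. The parser table and function names are illustrative, not a fixed API. Unknown versions raise an error instead of falling through to a default, because guessing how to price an event is worse than stopping.

```python
from decimal import Decimal
from typing import Callable

def quantity_v1(event: dict) -> Decimal:
    # v1 carried quantity as a JSON number (a float). str() gives the float's
    # shortest repr ("0.1") rather than its full binary expansion.
    return Decimal(str(event["quantity"]))

def quantity_v2(event: dict) -> Decimal:
    # v2 carries quantity as a decimal string.
    return Decimal(event["quantity"])

QUANTITY_PARSERS: dict[str, Callable[[dict], Decimal]] = {
    "1": quantity_v1,
    "2": quantity_v2,
}

def billable_quantity(event: dict) -> Decimal:
    """Dispatch on schema_version; refuse versions the pipeline does not know."""
    version = event.get("schema_version")
    parser = QUANTITY_PARSERS.get(version)
    if parser is None:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return parser(event)

events = [
    {"schema_version": "1", "quantity": 0.1},
    {"schema_version": "2", "quantity": "0.1"},
]
total = sum(billable_quantity(e) for e in events)
```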
Historical invoices recalculated for dispute resolution will use the schema version that was active when the invoice was originally generated — not the current version. Without the version field, recalculation is impossible.
customer_id — Tenant, Not End-User
The customer_id must identify the billing entity — the company or account that will receive the invoice. It must never be an end-user identifier.
This distinction matters for three reasons:
- Compliance: End-user IDs in billing events are personal data under GDPR; customer_id (a B2B tenant identifier) typically is not.
- Aggregation: Invoice totals aggregate over all events for a customer_id. End-user-level aggregation is a different pipeline; do not conflate the two.
- Audit: A customer disputing their invoice queries by customer_id. An end-user identifier in this field forces the customer to understand your internal data model to verify their own invoice.
For multi-tenant B2B SaaS: customer_id is the organization/workspace/account ID. For B2B2C platforms where a corporate customer’s end-users generate billing events: customer_id is the corporate customer’s ID, never the consumer’s ID.
metric — The Billable Unit Type
A string enum identifying what is being measured. Not a description — a machine-readable identifier your pricing engine uses to apply the correct rate.
Design principles:
- Use snake_case, all lowercase, no spaces
- Make it specific enough to distinguish between different billable units you might charge separately (api_call_v1 vs. api_call_v2 if different rates apply to different API versions)
- Never use free-text descriptions: “API call to /inference endpoint” is a description, not a metric identifier
- Document your metric vocabulary as a schema artifact, not just in code comments
When you add a new billable metric, add a new metric identifier — do not reuse existing identifiers with different semantics.
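Here is a small Python sketch of a metric vocabulary kept as code plus a naming rule. The Metric enum and the parse_metric helper are hypothetical. The identifiers are the examples used in this article. Rejecting anything outside the enum keeps free-text descriptions out of the pipeline.

```python
import re
from enum import Enum

# snake_case, all lowercase, no spaces.
METRIC_NAME = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")

class Metric(str, Enum):
    """Hypothetical metric vocabulary, versioned and reviewed like a schema."""
    API_CALL_V1 = "api_call_v1"
    API_CALL_V2 = "api_call_v2"
    LLM_INPUT_TOKEN = "llm_input_token"

def parse_metric(raw: str) -> Metric:
    """Accept only known identifiers; free-text descriptions are rejected."""
    if not METRIC_NAME.fullmatch(raw):
        raise ValueError(f"not a metric identifier: {raw!r}")
    try:
        return Metric(raw)
    except ValueError:
        raise ValueError(f"unknown metric: {raw!r}") from None

# Keep the vocabulary itself honest: every member follows the naming rule.
assert all(METRIC_NAME.fullmatch(m.value) for m in Metric)
```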
quantity — Decimal, Not Float
This field carries more risk than any other. Floating-point arithmetic errors in billing accumulate across millions of events.
At 80 million API calls, a float error of $0.000000003 per call adds up to about $0.24 on the invoice: small enough to go unnoticed. But at higher rates or with aggregated quantities, the error grows.
With aggregated quantities, float arithmetic can be off by a few cents on a single invoice line. At scale across thousands of customers, these errors aggregate into material discrepancies that appear in billing disputes.
The rule: store quantity as a string representation of a decimal in the billing event. Deserialize to Decimal (Python) or BigDecimal (Java/Kotlin) in the pricing engine. Never use float or double for any monetary or quantity calculation.
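The float problem is easy to reproduce with Python's standard decimal module. The sketch below sums ten 0.1-unit events both ways. It then prices a quantity with a single rounding step at the end. The line_amount helper, the unit rate, and the half-up rounding rule are assumptions for illustration, since rounding policy is a business decision.

```python
from decimal import ROUND_HALF_UP, Decimal

# Ten events of 0.1 units each, carried as decimal strings in the event stream.
raw_quantities = ["0.1"] * 10

float_total = sum(float(q) for q in raw_quantities)
decimal_total = sum(Decimal(q) for q in raw_quantities)

print(float_total)    # 0.9999999999999999
print(decimal_total)  # 1.0

def line_amount(quantity: Decimal, unit_rate: Decimal) -> Decimal:
    """Price a quantity exactly, then round once, at the end, to whole cents."""
    return (quantity * unit_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Because quantity travels as a string, nothing between emission and the pricing engine routes it through binary floating point.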
In the SQL schema, declare quantity as DECIMAL(20, 10) and make event_id the PRIMARY KEY.
The PRIMARY KEY (event_id) constraint enforces idempotency at the database level — a duplicate insert with the same event_id will be rejected without requiring application-level deduplication logic. This is your last line of defense; the application should still deduplicate before attempting the insert, but the database constraint catches anything that slips through.
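Here is a runnable sketch of the constraint, using Python's built-in sqlite3 module as a stand-in for the production database. SQLite has no exact DECIMAL type: a DECIMAL(20, 10) column gets NUMERIC affinity and may be stored as a binary float. This sketch therefore stores quantity as TEXT; on PostgreSQL or MySQL, DECIMAL(20, 10) stores it exactly. The insert_event helper and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE billing_events (
        event_id         TEXT PRIMARY KEY NOT NULL,  -- deterministic idempotency key
        schema_version   TEXT NOT NULL,
        customer_id      TEXT NOT NULL,
        metric           TEXT NOT NULL,
        quantity         TEXT NOT NULL,  -- decimal string; DECIMAL(20, 10) elsewhere
        timestamp        TEXT NOT NULL,  -- ISO 8601 UTC event time
        source_reference TEXT NOT NULL
    )
    """
)

def insert_event(conn: sqlite3.Connection, event: dict) -> bool:
    """Insert an event; return False if its event_id is already stored."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO billing_events VALUES (:event_id, :schema_version, "
                ":customer_id, :metric, :quantity, :timestamp, :source_reference)",
                event,
            )
        return True
    except sqlite3.IntegrityError:
        # PRIMARY KEY violation: a duplicate delivery of an event already stored.
        return False

event = {
    "event_id": "evt_example", "schema_version": "2", "customer_id": "org_acme",
    "metric": "api_call_v1", "quantity": "1",
    "timestamp": "2026-03-31T23:59:57.000Z", "source_reference": "req_8b3e9c4d",
}
first = insert_event(conn, event)
duplicate = insert_event(conn, event)
count = conn.execute("SELECT COUNT(*) FROM billing_events").fetchone()[0]
```

The second insert is rejected by the database, not by application code, so exactly one row remains.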
timestamp — Event Time, Not Ingestion Time
The timestamp field must record when the event occurred in the product, not when it was received by the billing pipeline. The difference matters at billing period boundaries.
A user completes an API call at 23:59:57 on March 31. The event arrives at the billing pipeline at 00:00:03 on April 1 (network latency + queuing). If timestamp is set to ingestion time, this event is billed in April instead of March. At scale, this creates systematic period boundary errors that compound with every late-arriving event.
Requirements:
- UTC — never local time
- ISO 8601 format with millisecond precision
- Set by the application at the point of event emission, not the billing pipeline
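These requirements can be sketched with Python's datetime module. The helper names are hypothetical. max_lag stands in for the N-hour late-arrival threshold mentioned below. What happens to a late event is a policy decision this sketch does not make. The example reuses the March 31 / April 1 boundary case from the text, with 2026 as an assumed year.

```python
from datetime import datetime, timedelta, timezone

def event_timestamp(moment: datetime) -> str:
    """UTC, ISO 8601, millisecond precision, set at the point of emission."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: refuse to guess the time zone")
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")

def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def billing_period(timestamp: str) -> str:
    """Monthly billing period (YYYY-MM) chosen by event time, not arrival time."""
    return parse_timestamp(timestamp).astimezone(timezone.utc).strftime("%Y-%m")

def is_late(timestamp: str, processing_time: datetime, max_lag: timedelta) -> bool:
    """True when the event is older than the late-arrival threshold."""
    return processing_time - parse_timestamp(timestamp) > max_lag

# Emitted just before midnight on March 31, received six seconds later on April 1.
emitted = event_timestamp(datetime(2026, 3, 31, 23, 59, 57, tzinfo=timezone.utc))
received_at = datetime(2026, 4, 1, 0, 0, 3, tzinfo=timezone.utc)
```

billing_period(emitted) returns "2026-03", so the call lands in March even though it arrived in April.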
Late-arriving events (events where timestamp is more than N hours before the billing pipeline’s current processing time) require an explicit policy. See Engineering Metered Billing for IoT for the three policy options and their trade-offs.
source_reference — De-Identified Audit Pointer
A pointer back to the originating record in your application system. Used to resolve billing disputes (“show me the evidence that this event occurred”) without exposing application data in the billing pipeline.
For this field to be useful, the mapping from source_reference to application record must be maintained and queryable indefinitely — or at minimum for the retention period of the billing event that references it.
Design the reference to be:
- Stable: the reference must remain valid as long as the billing event that uses it exists
- De-identified: no PII, no PHI, no sensitive operational data
- Resolvable: given the source_reference, an authorized internal query must be able to retrieve the original event from the application database
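Here is a sketch of a reference registry with all three properties. It uses an in-memory dict as a stand-in for a table in the application database. The class, its methods, and the internal encounter ID are hypothetical. Tokens use a tok_ prefix with random hex, so they carry no PII.

```python
import secrets

class SourceReferenceRegistry:
    """Hypothetical mapping from opaque billing references to internal records.

    In production this would be a table in the application database, kept at
    least as long as the billing events that point at it and guarded by
    internal access controls. Plain dicts stand in for it here.
    """

    def __init__(self) -> None:
        self._by_reference: dict[str, str] = {}
        self._by_record: dict[str, str] = {}

    def reference_for(self, internal_record_id: str) -> str:
        """Return the same opaque token every time for the same record."""
        existing = self._by_record.get(internal_record_id)
        if existing is not None:
            return existing  # stable: retries reuse the stored token
        token = f"tok_{secrets.token_hex(8)}"  # de-identified: random, no PII
        while token in self._by_reference:  # guard against a rare collision
            token = f"tok_{secrets.token_hex(8)}"
        self._by_record[internal_record_id] = token
        self._by_reference[token] = internal_record_id
        return token

    def resolve(self, reference: str) -> str:
        """Authorized internal lookup used during dispute resolution."""
        return self._by_reference[reference]

registry = SourceReferenceRegistry()
ref = registry.reference_for("encounter-2026-000123")
```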
What to Explicitly Exclude
The fields that should not be in a billing event:
| Field type | Why to exclude | Risk if included |
|---|---|---|
| User ID / patient ID | Personal data under GDPR; PHI under HIPAA if health-context | Billing pipeline becomes a PHI store; requires BAA |
| Session timestamps correlated to individuals | Behavioral data; HIPAA-adjacent for healthcare | Session timing can identify provider/patient activity |
| Request/message payload | Operational data; may contain PII or proprietary content | Billing store holds business-sensitive data beyond audit purposes |
| IP addresses | Personal data under GDPR; geolocation inference | Billing pipeline becomes a personal data processor |
| Internal routing metadata | Operational context irrelevant to billing | Schema bloat; version migration complexity |
| Device hardware identifiers (MAC, IMEI) | Can map to individual or patient in medical contexts | Equivalent to patient ID in clinical IoT billing events |
| Pricing rates | Rates change; embedding them in events locks historical recalculation | Cannot reprice historical data for corrections; rate changes require event schema migration |
Pricing rates in events deserves special attention. The billing event records what happened — quantity, metric, when. The pricing engine records what it costs — rate per unit, tiers, effective dates. These are separate concerns. Never store the rate that was applied in the billing event; store it in a separately versioned pricing configuration. This lets you correct a pricing error retroactively by reprocessing events through the corrected rate schedule without touching the event store.
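Here is a sketch of that separation, assuming a hypothetical rate schedule keyed by metric and effective date. Events carry only metric, quantity, and timestamp, and the rate comes from the schedule at pricing time. Correcting a wrong rate therefore means repricing against a corrected schedule while the stored events stay untouched. The rates shown are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal

@dataclass(frozen=True)
class RateEntry:
    """One row of a hypothetical versioned rate schedule."""
    metric: str
    effective_from: datetime
    unit_rate: Decimal

def parse_timestamp(value: str) -> datetime:
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def rate_for(schedule: list[RateEntry], metric: str, at: datetime) -> Decimal:
    """Most recent rate for the metric that was already in effect at `at`."""
    in_effect = [r for r in schedule if r.metric == metric and r.effective_from <= at]
    if not in_effect:
        raise LookupError(f"no rate for {metric!r} at {at.isoformat()}")
    return max(in_effect, key=lambda r: r.effective_from).unit_rate

def price(events: list[dict], schedule: list[RateEntry]) -> Decimal:
    """Price events against a schedule; the events themselves carry no rates."""
    total = Decimal("0")
    for event in events:
        at = parse_timestamp(event["timestamp"])
        total += Decimal(event["quantity"]) * rate_for(schedule, event["metric"], at)
    return total

JAN_1 = datetime(2026, 1, 1, tzinfo=timezone.utc)
events = [
    {"metric": "api_call_v1", "quantity": "1000", "timestamp": "2026-03-10T12:00:00.000Z"},
    {"metric": "api_call_v1", "quantity": "500", "timestamp": "2026-03-20T12:00:00.000Z"},
]
# The schedule as originally configured, with a wrong rate...
configured = [RateEntry("api_call_v1", JAN_1, Decimal("0.002"))]
# ...and the corrected schedule. Repricing does not modify the events.
corrected = [RateEntry("api_call_v1", JAN_1, Decimal("0.0015"))]
```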
Schema Versioning in Practice
Schema version 1 is always the version you built with insufficient forethought. Version 2 is when you add the fields you should have included from the start. Version 3 is when a compliance requirement changes what you’re allowed to store.
The migration pattern that works without downtime:
1. Add the new field to the schema as NULLABLE (for compatibility with v1 writers)
2. Update writers to emit schema_version: "2" and populate the new field
3. Update the aggregation pipeline to handle both versions
4. Backfill v1 events where the new field can be derived (not all cases — accept gaps)
5. Once all active writers have deployed v2, enforce NOT NULL on the new field in schema
What does not work: changing the type of an existing field. If you need to change quantity from FLOAT to DECIMAL (a real migration many teams face), you need a new field (quantity_decimal) with a version bump, run both in parallel, and deprecate the old field after all historical data is migrated.
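During the parallel-run period, readers have to cope with rows written before quantity_decimal existed. Below is a sketch of such a reader. It assumes the legacy float column keeps the name quantity; the row shape and helper name are illustrative.

```python
from decimal import Decimal
from typing import Optional

def read_quantity(row: dict) -> Decimal:
    """Read a row during the parallel run of a FLOAT -> DECIMAL migration."""
    exact: Optional[str] = row.get("quantity_decimal")
    if exact is not None:
        return Decimal(exact)  # v2+ rows: exact decimal string
    legacy = row.get("quantity")
    if legacy is None:
        raise ValueError(f"row {row.get('event_id')!r} has no quantity")
    # v1 rows: str() yields the float's shortest repr ("0.1"), not its full
    # binary expansion; any error already baked into the stored float remains.
    return Decimal(str(legacy))
```

Preferring the new column whenever it is populated means backfilled v1 rows automatically switch to the exact value.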
Schema changelog as a first-class artifact:
billing_events schema changelog
-------------------------------
v1 (2025-05-01): Initial schema. customer_id, metric, quantity (FLOAT), timestamp.
v2 (2026-01-15): Added schema_version field. Added source_reference. Changed quantity to
DECIMAL(20,10) stored as string in JSON events. Old float events
preserved in quantity_v1 column; quantity_decimal added for v2+.
v3 (2026-03-01): Added schema_version to primary key hash for multi-version
idempotency safety. No data migration required.
Vertical-Specific Schema Examples
Dev Tools: Token-Based AI API
For token-based billing, the metric is llm_input_token, not api_call. If you charge separately for input tokens, output tokens, and fine-tuned model calls, each is a separate metric with a separate billing event. One API request may generate two or three billing events (input tokens + output tokens + cache read tokens). All share the same source_reference (for example, req_7d2e4f81), so a dispute can be resolved by querying all events for that request ID.
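Here is a sketch of how one request could fan out into several token events. It assumes a hypothetical usage dictionary reported by the model server. It also assumes hypothetical metric names llm_output_token and llm_cache_read_token alongside llm_input_token. The event_id helper repeats the content-hash approach described in the event_id section so this block runs on its own. Skipping zero-count metrics is a design choice, not a requirement.

```python
import hashlib
import json

def derive_event_id(customer_id: str, metric: str, source_reference: str) -> str:
    """Content-derived idempotency key (SHA-256 over the identity fields)."""
    material = json.dumps([customer_id, metric, source_reference])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

def token_billing_events(customer_id: str, request_id: str, timestamp: str,
                         usage: dict[str, int]) -> list[dict]:
    """Emit one billing event per token metric with a non-zero count."""
    events = []
    for metric, count in usage.items():
        if count == 0:
            continue  # design choice: no zero-quantity events
        events.append({
            # metric is part of the hash, so each event gets its own ID
            "event_id": derive_event_id(customer_id, metric, request_id),
            "schema_version": "1",
            "customer_id": customer_id,
            "metric": metric,
            "quantity": str(count),
            "timestamp": timestamp,
            # every event points back at the same originating request
            "source_reference": request_id,
        })
    return events

events = token_billing_events(
    "org_acme", "req_7d2e4f81", "2026-03-31T23:59:57.000Z",
    {"llm_input_token": 1843, "llm_output_token": 412, "llm_cache_read_token": 0},
)
```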
Healthtech: Telehealth Consultation Session
In a telehealth billing event, the source_reference is a platform-generated session token such as tok_5c8d2a1f, mapped to the clinical encounter record in the EHR. The billing event contains no patient ID, no provider ID, no session duration, and no session timestamp beyond the billing event's own timestamp. The clinical details live in the EHR under separate access controls.
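One way to keep PHI out by construction is an allowlist check before emission, sketched here in Python. The helper name, the telehealth_consultation metric, the customer ID, and the placeholder event_id are hypothetical; tok_5c8d2a1f is the example token from the text. An allowlist fails closed: a new field such as patient_id is rejected until someone deliberately adds it to the schema.

```python
# Allowlist of billing fields: anything else is rejected before emission.
ALLOWED_FIELDS = frozenset({
    "event_id", "schema_version", "customer_id", "metric",
    "quantity", "timestamp", "source_reference",
})

def assert_billing_safe(event: dict) -> dict:
    """Raise if the event carries any field outside the allowlist."""
    extra = sorted(set(event) - ALLOWED_FIELDS)
    if extra:
        raise ValueError(f"fields not allowed in a billing event: {extra}")
    return event

consultation = assert_billing_safe({
    "event_id": "evt_example",           # placeholder; derive it in practice
    "schema_version": "1",
    "customer_id": "clinic_group_17",    # the billed organization, not a patient
    "metric": "telehealth_consultation",
    "quantity": "1",
    "timestamp": "2026-03-31T23:59:57.000Z",
    "source_reference": "tok_5c8d2a1f",  # resolves to the EHR encounter internally
})
```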
For a detailed treatment of the PHI exclusion pattern, see Usage-Based Billing for Healthtech SaaS.
IoT: Industrial Sensor Reading
The source_reference encodes the device ID and the device-assigned message sequence number as a composite key: dev_b7e3:seq_00094712. This is the idempotency anchor for MQTT QoS 1 retransmissions. The same device message retransmitted after a network drop generates the same event_id, because the source_reference is identical, and is deduplicated at the database level.
The device identifier (dev_b7e3) is a platform-internal alias, not the device's hardware MAC address or IMEI; the hardware address never appears in the billing pipeline.
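The retransmission behavior can be simulated in a few lines of Python. The sketch assumes a hypothetical sensor_reading metric and customer ID. It builds the composite reference from the device alias and sequence number, and uses a dict keyed by event_id as a stand-in for the PRIMARY KEY constraint.

```python
import hashlib
import json

def derive_event_id(customer_id: str, metric: str, source_reference: str) -> str:
    """Content-derived idempotency key (SHA-256 over the identity fields)."""
    material = json.dumps([customer_id, metric, source_reference])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

def sensor_source_reference(device_alias: str, sequence: int) -> str:
    """Composite key: platform device alias plus device-assigned sequence number."""
    return f"{device_alias}:seq_{sequence:08d}"

def reading_event(customer_id: str, device_alias: str, sequence: int,
                  timestamp: str) -> dict:
    reference = sensor_source_reference(device_alias, sequence)
    metric = "sensor_reading"
    return {
        "event_id": derive_event_id(customer_id, metric, reference),
        "schema_version": "1",
        "customer_id": customer_id,
        "metric": metric,
        "quantity": "1",
        "timestamp": timestamp,
        "source_reference": reference,
    }

stored: dict[str, dict] = {}  # stand-in for a table with PRIMARY KEY (event_id)

def insert(event: dict) -> bool:
    """Store the event unless its event_id is already present."""
    if event["event_id"] in stored:
        return False
    stored[event["event_id"]] = event
    return True

original = reading_event("org_plant_4", "dev_b7e3", 94712, "2026-03-31T23:59:57.000Z")
# The same message, redelivered after a network drop under QoS 1.
redelivered = reading_event("org_plant_4", "dev_b7e3", 94712, "2026-03-31T23:59:57.000Z")
next_reading = reading_event("org_plant_4", "dev_b7e3", 94713, "2026-03-31T23:59:58.000Z")
results = [insert(e) for e in (original, redelivered, next_reading)]
```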
The Pre-Production Schema Checklist
Before your first billing event hits production:
- event_id is deterministic: the same event always produces the same ID
- event_id uniqueness is enforced as a PRIMARY KEY constraint at the database level
- quantity is stored as DECIMAL, not float or double
- timestamp is event time, not ingestion time; UTC; ISO 8601
- schema_version is present, even on version 1
- customer_id identifies the billing entity (tenant), not the end-user
- source_reference is de-identified: no PII, no PHI, no hardware identifiers
- No user IDs, patient IDs, session content, IP addresses, or payload data in the event
- Late-arrival policy is defined and tested (what happens to events timestamped before the current billing period?)
- Schema changelog document exists before v1 ships
- Pricing rates are stored in a separate rate schedule, not in the billing event
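Several checklist items can be enforced mechanically at emission time. Below is a sketch of a validator. It assumes a hypothetical list of forbidden field names, which you would extend for your own domain. It also assumes the UTC-with-milliseconds timestamp format described in the timestamp section.

```python
import re
from decimal import Decimal, InvalidOperation

REQUIRED = ("event_id", "schema_version", "customer_id", "metric",
            "quantity", "timestamp", "source_reference")
# Hypothetical denylist; an allowlist is stricter, this just names known risks.
FORBIDDEN = {"user_id", "patient_id", "provider_id", "ip_address", "payload",
             "session_content", "mac_address", "imei", "unit_rate"}
# UTC ISO 8601 with millisecond precision, e.g. 2026-03-31T23:59:57.123Z
UTC_MILLIS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z")

def checklist_violations(event: dict) -> list[str]:
    """Return human-readable problems; an empty list means the event passes."""
    problems = [f"missing {field}" for field in REQUIRED if field not in event]
    problems += [f"forbidden field {field}" for field in sorted(FORBIDDEN & set(event))]
    quantity = event.get("quantity")
    if quantity is not None:
        if not isinstance(quantity, str):
            problems.append("quantity must be a decimal string, not a JSON number")
        else:
            try:
                parsed = Decimal(quantity)
            except InvalidOperation:
                parsed = None
            if parsed is None or not parsed.is_finite():
                problems.append(f"quantity {quantity!r} is not a finite decimal")
    ts = event.get("timestamp")
    if ts is not None and not (isinstance(ts, str) and UTC_MILLIS.fullmatch(ts)):
        problems.append("timestamp must be UTC ISO 8601 with milliseconds")
    return problems
```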

ABAXUS includes a production-validated billing event schema — deploy inside your own infrastructure with idempotency, decimal precision, and late-arrival handling built in
Self-hosted usage-based billing engine. Your billing data stays in your own database, in your own cloud region, under your own compliance controls. No per-transaction fees. Runs in your Kubernetes cluster.
Common Schema Mistakes That Reach Production
Using random UUIDs as event IDs. Every retry, every network duplicate, every batch replay creates a new event in the billing store. The double-billing is silent — no error is thrown, the charge just appears twice on the invoice. Discovered in production during billing dispute resolution, when the customer’s event count doesn’t match yours.
Storing float for quantity. The error is invisible in development (amounts are small, discrepancies are fractions of cents). In production at scale, float accumulation produces invoice totals that don't match what the pricing engine calculated. Because the error depends on the order in which values are summed, a parallel or re-batched aggregation can produce a different total on each run, which makes the discrepancy hard to reproduce and debug.
Not including schema_version. You will change the schema. When you do, you need to process v1 and v2 events differently in the aggregation pipeline. Without the version field, the only way to distinguish them is by the presence or absence of the new field — which is fragile and breaks when you add optional fields.
Patient IDs or user IDs in source_reference. The intent is good — create a direct link back to the originating record. The problem is that a patient ID in the billing event is PHI, regardless of field name. The billing pipeline is now a PHI processor. All the HIPAA obligations that apply to the clinical system now apply to the billing system.
Timestamp set at ingestion, not event time. Events that arrive late (network latency, queue depth, IoT connectivity gaps) get assigned to the wrong billing period. At period boundaries this creates systematic errors: the last few minutes of a billing period are consistently under-counted, and the first few minutes of the next period are over-counted with events that belong to the prior period.
Pricing rates embedded in events. When a pricing error is discovered — the wrong rate was applied to a customer’s events for two weeks — the fix requires either reprocessing the events or correcting the rate in the event records. If rates are stored in the events, you’ve created an audit trail problem: the corrected invoice no longer matches the stored events. Rates belong in versioned pricing configuration, not in the event store.
Book an Architecture Review for Your Billing Event Schema
Getting the schema right before production is the highest-leverage engineering decision in your billing infrastructure. Getting it wrong creates problems that are expensive to fix: double-billing requires customer reconciliation, precision errors require retroactive recalculation, and PHI in billing events requires compliance remediation that goes well beyond a code change.
ABAXUS offers 30-minute architecture reviews for engineering teams designing or auditing their billing event schema. In one session:
- Schema review — walk through your current or planned event schema field by field; identify compliance risks, idempotency gaps, and precision issues before they hit production
- Idempotency key design — review your key construction for your specific event sources (API requests, MQTT messages, database CDC events, webhook callbacks)
- Vertical-specific guidance — PHI exclusion patterns for Healthtech, MQTT deduplication for IoT, high-frequency API billing for Dev Tools
- Migration path — if your current schema has known issues, a realistic migration plan that doesn’t require a big-bang redeployment
This is a technical conversation, not a product demo. Bring your current event schema or your implementation plan.
Book your 30-minute billing schema architecture review →
Related Reading
- Metered Billing Explained — the full billing pipeline: event ingestion, aggregation, pricing engine, and invoicing
- Usage-Based Billing for Healthtech SaaS — PHI-aware event schema design in the context of telehealth, EHR, and medical device billing
- 10 Use Cases: IoT SaaS — where the idempotency and connectivity gap patterns in this article apply in practice
- 5 Key Features of Usage-Based Billing Software — the infrastructure requirements that the schema design must support
- Common Usage-Based Pricing Mistakes — broader billing implementation pitfalls, including precision errors and idempotency gaps
- Engineering Metered Billing for IoT — late-arrival policies, clock skew, and fleet-level aggregation for IoT pipelines
ABAXUS is a self-hosted usage-based billing engine for engineering teams that need production-correct billing infrastructure. It ships with idempotent event ingestion, DECIMAL-precision quantity handling, schema versioning, configurable late-arrival policies, and a full audit trail, running inside your own Kubernetes cluster with your data in your own database.
Stop debugging billing. Start shipping product.
Your billing layer should be invisible infrastructure. In 30 minutes we map your event sources, identify your data contract gaps, and show you exactly what fixing the architecture looks like. No sales pitch.