Engineering Metered Billing for IoT: From Device Event to Customer Invoice

Cristian Curteanu



A billing pipeline for a SaaS API handles maybe 10,000 events per second on a busy day, from clients that stay connected, send each request once, and have clocks synced to the millisecond. The billing pipeline for an industrial IoT platform handles millions of events per second, from devices that drift their clocks, go offline for days without warning, and retransmit every message at least once by protocol design.

The failure modes are completely different — and so is the pipeline architecture that handles them correctly.

General metered billing guides cover event ingestion, aggregation, and invoicing. What they don’t cover: what happens when 50,000 devices reconnect simultaneously after a network outage and dump three days of backlogged events. What happens to billing accuracy when a device’s clock is 6 minutes behind the billing period boundary. How fleet-level invoice totals are produced from per-device event streams without losing the per-device audit trail. And how billing event data must be partitioned when devices are deployed across EU and US jurisdictions.

This article is the pipeline guide that IoT SaaS engineering teams need before they ship their first metered invoice.


The Pipeline Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                         IoT Device Fleet                                │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│  │ Device A │  │ Device B │  │ Device C │  │ Device N │  ...           │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘               │
└───────┼─────────────┼─────────────┼──────────────┼─────────────────────┘
        │ MQTT / HTTP │             │              │
        ▼             ▼             ▼              ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                    Protocol Gateway / Broker                            │
│   (AWS IoT Core / Azure IoT Hub / Mosquitto / custom MQTT broker)       │
│   - TLS termination                                                     │
│   - Device authentication (X.509 certificates / SAS tokens)            │
│   - Message routing to downstream queue                                 │
└──────────────────────────────┬──────────────────────────────────────────┘
                               │ normalized message envelope
                               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     Message Queue / Event Stream                        │
│   (Apache Kafka / AWS Kinesis / Azure Event Hub)                        │
│   - Partition by customer_id for ordered processing per tenant          │
│   - Retention: ≥ max expected connectivity gap duration + 50% buffer    │
│   - Replayable: supports re-processing on pipeline failures             │
└──────────────────────────────┬──────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                   Billing Consumer (per partition)                      │
│   - Extract billing-relevant fields from message envelope               │
│   - Derive deterministic idempotency key                                │
│   - Apply late-arrival policy (check event timestamp vs period state)   │
│   - Attempt idempotent write to billing event store                     │
└──────────────────────────────┬──────────────────────────────────────────┘
                               │ deduplicated billing events
                               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                      Billing Event Store                                │
│   - Immutable append-only log (no updates, no deletes)                  │
│   - PRIMARY KEY on event_id enforces idempotency at DB level            │
│   - Partitioned by (customer_id, billing_period) for query efficiency   │
│   - Retention: matches regulatory requirements (HIPAA: 6yr, SaMD: 10yr)│
└──────────┬───────────────────────────────────────────┬──────────────────┘
           │ per-device granularity                    │ per-tenant totals
           ▼                                           ▼
┌──────────────────────┐                  ┌────────────────────────────┐
│  Device Audit Trail  │                  │  Invoice Aggregation Job   │
│  (per-device counts  │                  │  (runs at period close,    │
│  for dispute         │                  │  applies rate schedule,    │
│  resolution)         │                  │  generates invoice)        │
└──────────────────────┘                  └────────────┬───────────────┘
                                                       │
                                                       ▼
                                          ┌────────────────────────────┐
                                          │  Customer-Facing Invoice   │
                                          │  + Real-Time Usage         │
                                          │    Dashboard               │
                                          └────────────────────────────┘

Each stage has a specific job and a specific failure mode. The rest of this article covers the five that IoT makes hard: MQTT deduplication, connectivity gap handling, clock skew, fleet-level aggregation, and multi-region data residency.


Stage 1: The Protocol Gateway

The gateway is not a billing component — it’s the network boundary between device and platform. But the decisions made here shape everything downstream.

Protocol choice shapes message delivery guarantees:

Protocol     | Delivery guarantee             | Billing implication
-------------|--------------------------------|----------------------------------------------------------------
MQTT QoS 0   | At most once (fire and forget) | Messages may be lost; billing can under-count
MQTT QoS 1   | At least once                  | Duplicates guaranteed; billing pipeline must deduplicate
MQTT QoS 2   | Exactly once                   | No duplicates; expensive (4-packet handshake per message); rarely used at scale
HTTP POST    | At least once (on retry)       | Application-level idempotency required on retries
CoAP         | At most / at least once        | Depends on message type (CON vs NON)

The practical choice: MQTT QoS 1 is the standard for IoT deployments that care about data completeness. At-most-once (QoS 0) creates billing under-counts when messages are dropped. Exactly-once (QoS 2) doubles the per-message packet count relative to QoS 1 (a 4-packet handshake instead of 2). QoS 1 duplicates are manageable with proper idempotency — which the billing consumer handles.

What the gateway must add to each message:

  • A stable session or connection identifier (for clock skew detection)
  • The broker-received timestamp (for events where the device clock is untrusted)
  • The tenant/customer mapping (derived from the device’s X.509 certificate or provisioning record)

The gateway is the last point where you can enrich messages with trusted infrastructure-side data before they enter the billing pipeline. Use it.
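The enrichment step can be sketched as a pure function over the message envelope. The field names here (`server_received_ts`, `tenant_id`, `session_id`) and the registry shape are illustrative assumptions, not a prescribed schema:

```python
from datetime import datetime, timezone

def enrich_envelope(raw_payload: dict, device_registry: dict) -> dict:
    """Attach trusted infrastructure-side fields at the gateway.

    Field names are illustrative; align them with whatever your broker
    and device registry actually expose.
    """
    device_id = raw_payload["device_alias"]
    return {
        **raw_payload,
        # Broker-side receive time: trusted even when the device clock is not
        "server_received_ts": datetime.now(timezone.utc).isoformat(),
        # Tenant mapping resolved from the provisioning record / certificate
        "tenant_id": device_registry[device_id]["tenant_id"],
        # Stable session identifier for downstream clock-skew detection
        "session_id": raw_payload.get("session_id", f"{device_id}:unknown-session"),
    }
```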


Stage 2: The Message Queue

The queue decouples ingestion rate from processing rate. An industrial sensor network that generates 500,000 events per second cannot write directly to a billing database — the database cannot sustain that write rate, and any downstream failure would cause event loss.

Queue configuration decisions that affect billing:

Partition key: Partition by customer_id. This ensures all events for a given tenant are processed in order by a single consumer, which simplifies the in-order idempotency check and prevents cross-tenant interference. It also enables per-tenant consumer scaling.

Retention window: The retention period must be at least as long as your maximum expected connectivity gap, plus a processing buffer. If your late-arrival policy accepts events up to 72 hours after the billing period closes, and devices can be offline for up to 48 hours, your queue retention must be at least 120 hours. A standard 24-hour Kafka retention will lose events from devices that reconnect after a 48-hour outage — the events are emitted by the device, reach the gateway, but are no longer in the queue by the time the consumer catches up.
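The retention arithmetic above is simple enough to encode as a helper. The optional safety buffer on top of the gap-plus-grace minimum is an assumption; tune it for your fleet:

```python
def required_retention_hours(max_gap_hours: float,
                             grace_window_hours: float,
                             buffer_fraction: float = 0.0) -> float:
    """Queue retention must cover the longest expected offline gap plus the
    late-arrival grace window, optionally padded with a safety buffer."""
    base = max_gap_hours + grace_window_hours
    return base * (1.0 + buffer_fraction)
```

With the numbers from the example (48-hour gap, 72-hour grace window), this yields the 120-hour minimum.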

Replayability: The queue must support replay from an arbitrary offset. When the billing consumer crashes mid-processing, recovery requires replaying from the last committed offset without duplicating events already written to the event store. This is idempotent replay — the idempotency key on the event store handles the duplicates, but the queue must expose the mechanism to re-read from an earlier position.


Stage 3: The Billing Consumer and MQTT Deduplication

The consumer is where MQTT QoS 1 duplicates meet idempotency logic. This is the stage most teams get wrong in their first implementation.

The duplicate problem in numbers: Under normal MQTT QoS 1 operation, a device that publishes 1 million messages per day to a reliable broker will see approximately 0.1–1% duplicate delivery rate — between 1,000 and 10,000 duplicates per day, per connected device. For a fleet of 10,000 devices, that’s 10 million to 100 million potential double-billing events per day before idempotency.

import hashlib
import json
from decimal import Decimal
from datetime import datetime, timezone

# Unique-constraint error from the DB driver; psycopg2 shown here -- adjust for yours
from psycopg2.errors import UniqueViolation as UniqueViolationError

def process_billing_event(raw_message: dict, db_session) -> bool:
    """
    Idempotent billing event consumer.
    Returns True if event was written (new), False if deduplicated.
    """

    # 1. Extract billing-relevant fields
    customer_id      = raw_message["tenant_id"]
    device_id        = raw_message["device_alias"]   # platform alias, not hardware ID
    message_seq      = raw_message["mqtt_message_id"]
    metric           = derive_metric(raw_message)
    quantity         = Decimal(str(raw_message["quantity"]))
    event_ts         = parse_timestamp(raw_message["device_timestamp"])

    # 2. Derive deterministic idempotency key
    #    Same device + same message sequence = same event_id, always
    canonical = json.dumps({
        "customer_id": customer_id,
        "device_id":   device_id,
        "msg_seq":     message_seq,
        "metric":      metric,
    }, sort_keys=True)
    event_id = "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()

    # 3. Apply late-arrival policy
    period_state = get_billing_period_state(customer_id, event_ts)
    if period_state == "closed_beyond_grace":
        log_late_rejection(event_id, event_ts, customer_id)
        return False  # reject — outside grace window
    elif period_state == "closed_within_grace":
        target_period = get_closed_period(customer_id, event_ts)
    else:
        target_period = get_open_period(customer_id)

    # 4. Idempotent write — DB PRIMARY KEY constraint handles duplicate rejection
    try:
        db_session.execute("""
            INSERT INTO billing_events
                (event_id, schema_version, customer_id, metric, quantity,
                 timestamp, source_reference, billing_period_id, ingested_at)
            VALUES
                (%s, 2, %s, %s, %s, %s, %s, %s, NOW())
        """, (
            event_id, customer_id, metric, quantity,
            event_ts, f"{device_id}:{message_seq}",
            target_period, 
        ))
        db_session.commit()
        return True   # new event written

    except UniqueViolationError:
        db_session.rollback()
        return False  # duplicate — silently dropped

The UniqueViolationError on event_id is not an error condition — it’s the expected deduplication path. The billing consumer should not log these as errors, but it should track the deduplication rate as a metric. A sudden spike in deduplication rate indicates a device or gateway issue producing abnormal retransmission rates.
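A minimal sketch of that metric, assuming a sliding window over consumer results; the 5% alert threshold is illustrative and should be calibrated against your fleet's baseline QoS 1 duplicate rate:

```python
from collections import deque

class DedupRateMonitor:
    """Track the duplicate fraction over a sliding window of consumer
    results and flag abnormal retransmission rates."""

    def __init__(self, window_size: int = 10_000, alert_threshold: float = 0.05):
        # True = new event written, False = duplicate dropped
        self.results = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    def record(self, written: bool) -> None:
        self.results.append(written)

    @property
    def dedup_rate(self) -> float:
        if not self.results:
            return 0.0
        duplicates = sum(1 for written in self.results if not written)
        return duplicates / len(self.results)

    def is_abnormal(self) -> bool:
        return self.dedup_rate > self.alert_threshold
```

The consumer feeds it the boolean returned by `process_billing_event`; a monitoring loop checks `is_abnormal()` per customer per hour.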


The Connectivity Gap Problem

The connectivity gap is the IoT billing failure mode that has no equivalent in standard SaaS billing. A device goes offline — power cycle, network outage, firmware update, physical transit through a dead zone — and reconnects days later with a batch of timestamped events from the offline period.

From the billing pipeline’s perspective, events with timestamps from 72 hours ago are arriving now. The billing period they belong to may already be closed.

Three Policy Options

Option 1: Grace Window (Recommended)

Accept late events if their timestamp falls within a defined grace window after the billing period closed. Reject events beyond the window.

from datetime import datetime, timezone

GRACE_WINDOW_HOURS = 72  # configurable per customer contract

def get_billing_period_state(customer_id: str, event_ts: datetime) -> str:
    period = find_billing_period_for_timestamp(customer_id, event_ts)

    if period.is_open:
        return "open"

    hours_since_close = (datetime.now(timezone.utc) - period.closed_at).total_seconds() / 3600

    if hours_since_close <= GRACE_WINDOW_HOURS:
        return "closed_within_grace"
    else:
        return "closed_beyond_grace"

Trade-offs: Invoices are not issued until after the grace window closes (you can’t finalize the invoice while you might still accept more events). Customers must be told the grace window — it determines when they can expect to receive their invoice. For a monthly billing period with a 72-hour grace window, invoices issue on the 4th of the following month, not the 1st.
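The invoice timing follows directly from the grace window. A sketch of the arithmetic, reusing the 72-hour window from above:

```python
from datetime import datetime, timedelta, timezone

GRACE_WINDOW_HOURS = 72

def earliest_invoice_issue_at(period_close: datetime) -> datetime:
    """An invoice can only be finalized once the grace window has elapsed,
    since late events may still be accepted until then."""
    return period_close + timedelta(hours=GRACE_WINDOW_HOURS)
```

A March period closing at 2026-04-01 00:00 UTC yields an earliest issue time of April 4, matching the "4th of the following month" above.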

Option 2: Defer to Current Period

Ignore the event’s timestamp for billing period assignment. All events are credited to the current open billing period, regardless of when they occurred.

from datetime import datetime

def assign_billing_period(customer_id: str, event_ts: datetime) -> str:
    return get_current_open_period(customer_id)  # always current period

Trade-offs: Simple to implement, simple to explain to customers. The billing is period-inaccurate — a customer who was offline in March and reconnected in April will see March’s usage on the April invoice. For most commercial IoT customers, this is acceptable. For regulated IoT (CMS-aligned RPM billing, utility metering for regulatory reporting), period accuracy is contractually required.

Option 3: Reopen Closed Periods

Accept late events into their correct billing period by re-opening the closed period, recalculating the total, and re-issuing the invoice.

Trade-offs: Accurate, but operationally complex. Customers receive amended invoices. Payment timing becomes unpredictable. This approach is only warranted when billing period accuracy is contractually or regulatorily required and the complexity cost is justified.

Choosing a Policy

Factor                        | Grace Window                    | Defer to Current         | Reopen
------------------------------|---------------------------------|--------------------------|--------------------------------
Invoice timing predictability | Medium (grace window + N days)  | High (close immediately) | Low (any period may reopen)
Billing accuracy              | High                            | Low                      | Highest
Operational complexity        | Medium                          | Low                      | High
Customer communication burden | Medium (communicate the window) | Low                      | High (explain amended invoices)
Regulatory suitability        | Most cases                      | Commercial IoT only      | Regulated IoT (CMS, utility)

The Clock Skew Problem

Device clocks drift. An industrial PLC in an air-gapped facility may have drifted 8 minutes from UTC. A GPS tracker loses GPS lock and reverts to its internal RTC, which drifts at 30 seconds per day. A cellular-connected device in a poor coverage area may be 90 seconds behind actual UTC.

Clock skew becomes a billing accuracy problem at billing period boundaries. A device whose clock is 5 minutes behind emits an event at what it believes is 23:58:00 on March 31, but the actual time is 00:03:00 on April 1. If you use the device timestamp, the event lands in March. If you use the server-received timestamp, it lands in April. Neither is perfectly accurate — but one is systematically predictable.

The Hybrid Timestamp Strategy

from datetime import datetime

MAX_CLOCK_SKEW_SECONDS = 300  # 5 minutes — tunable per device type

def resolve_event_timestamp(device_ts: datetime, server_received_ts: datetime) -> datetime:
    """
    Use device timestamp when it's within acceptable skew of server time.
    Fall back to server-received timestamp when the skew indicates an unreliable device clock.
    """
    skew_seconds = abs((server_received_ts - device_ts).total_seconds())

    if skew_seconds <= MAX_CLOCK_SKEW_SECONDS:
        return device_ts          # device clock is trustworthy; use it
    else:
        # Clock skew detected. Options:
        # 1. Use server-received timestamp (loses event-time accuracy)
        # 2. Use device timestamp anyway (accepts the skew)
        # 3. Flag for manual review if the event is near a period boundary
        if is_near_period_boundary(server_received_ts, threshold_minutes=10):
            flag_for_review(device_ts, server_received_ts, skew_seconds)
        return server_received_ts  # conservative: use server time

Period boundary tolerance window in aggregation:

When running the aggregation query at period close, include a tolerance window that catches events timestamped slightly outside the period due to clock skew:

-- Aggregate billing events for customer in March 2026
-- Tolerance: accept events timestamped up to 5 minutes before period start
--            and count them separately as skew-candidates for auditing

SELECT
    customer_id,
    metric,
    SUM(quantity) AS total_quantity,
    COUNT(*)      AS event_count,
    SUM(CASE WHEN timestamp < '2026-03-01 00:00:00+00'
                          AND timestamp > '2026-03-01 00:00:00+00' - INTERVAL '5 minutes'
             THEN 1 ELSE 0 END) AS skew_candidate_count
FROM
    billing_events
WHERE
    customer_id  = $1
    AND timestamp >= '2026-03-01 00:00:00+00' - INTERVAL '5 minutes'   -- clock skew buffer
    AND timestamp <  '2026-04-01 00:00:00+00'
    AND billing_period_id = 'period_2026_03'
GROUP BY
    customer_id, metric;

The skew_candidate_count in the result lets you audit how many events were attributed to this period because of the clock skew tolerance, and whether the tolerance window is calibrated correctly for your device fleet.


Fleet-Level Aggregation vs. Per-Device Audit Trail

Billing is per tenant. Events come from thousands of device IDs. These are two different outputs from the same event stream, and conflating them creates either billing inaccuracy or audit trail loss.

Two separate aggregation jobs, one event store:

-- Job 1: Invoice aggregation (per-tenant totals for invoicing)
-- Runs at billing period close

SELECT
    customer_id,
    metric,
    SUM(quantity)  AS total_quantity,
    COUNT(DISTINCT source_reference) AS unique_events
FROM
    billing_events
WHERE
    customer_id      = $1
    AND billing_period_id = $2
GROUP BY
    customer_id, metric;


-- Job 2: Device audit trail (per-device breakdown for dispute resolution)
-- Runs on demand or daily for customer dashboard

SELECT
    SPLIT_PART(source_reference, ':', 1) AS device_id,  -- extract from "dev_X:seq_Y"
    metric,
    COUNT(*)       AS event_count,
    SUM(quantity)  AS total_quantity,
    MIN(timestamp) AS first_event,
    MAX(timestamp) AS last_event
FROM
    billing_events
WHERE
    customer_id      = $1
    AND billing_period_id = $2
GROUP BY
    device_id, metric
ORDER BY
    device_id, metric;

Job 1 produces the invoice line item total. Job 2 produces the per-device breakdown that answers “why did my invoice increase — which devices generated more events this month?”

Important: Both queries run against the same billing_events table, which stores events at device-event granularity. The fleet-level total is always derivable by aggregating up; the device-level detail is preserved. Do not store only the fleet-level total — doing so destroys the audit trail.
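A useful pre-invoice check is to assert that invariant directly: the fleet total per metric must equal the sum of the per-device totals, since both derive from the same rows. A sketch, assuming the two query results are materialized as a plain dict and a list of row dicts (names are illustrative):

```python
from decimal import Decimal

def audit_trail_consistent(invoice_totals: dict, device_breakdown: list) -> bool:
    """True when the per-tenant invoice totals (Job 1) equal the sum of the
    per-device totals (Job 2) for every metric."""
    per_metric: dict = {}
    for row in device_breakdown:  # rows like {"device_id", "metric", "total_quantity"}
        per_metric[row["metric"]] = (
            per_metric.get(row["metric"], Decimal(0)) + row["total_quantity"]
        )
    return per_metric == invoice_totals
```

A mismatch means the two jobs disagree about the underlying event set, which should block invoice issuance until investigated.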


Tiered Fleet Pricing

Tiered fleet pricing — where the per-unit rate depends on the total fleet size — is common in IoT because large fleet operators warrant volume discounts. It introduces an ordering problem that doesn’t exist in flat-rate billing.

The ordering problem: You cannot determine which pricing tier a customer falls into until all events for the billing period are in. If a customer has 18,500 active devices in a month, and your tiers are:

  • 1–10,000 devices: $5.00 per device
  • 10,001–50,000 devices: $3.50 per device
  • 50,001+ devices: $2.00 per device

…then you need to know the final device count before you can apply the rate. The rate for device #1 depends on whether devices 2–18,500 also become active in the same period.

Two implementation patterns:

Pattern A: Apply tiered rate to the entire fleet at period close

Count total active devices at period close. Apply the appropriate tier rate uniformly to all active devices in the period. Simple, clean, and what most customers expect (“we had 18,500 devices active — we’re in the $3.50 tier for all of them”).

-- At period close, calculate active device count and apply tiered rate

WITH active_devices AS (
    SELECT COUNT(DISTINCT SPLIT_PART(source_reference, ':', 1)) AS device_count
    FROM billing_events
    WHERE customer_id = $1 AND billing_period_id = $2
),
tier AS (
    SELECT
        rate_per_device,
        tier_name
    FROM pricing_tiers
    WHERE
        min_devices <= (SELECT device_count FROM active_devices)
        AND (max_devices IS NULL  -- open-ended top tier
             OR max_devices >= (SELECT device_count FROM active_devices))
)
SELECT
    (SELECT device_count FROM active_devices) * (SELECT rate_per_device FROM tier) AS invoice_amount,
    (SELECT tier_name FROM tier) AS applied_tier,
    (SELECT device_count FROM active_devices) AS active_device_count;

Pattern B: Marginal tiered pricing (escalating rates)

Apply each tier’s rate only to devices within that tier’s range. The first 10,000 devices are billed at $5.00; devices 10,001–18,500 are billed at $3.50. More complex to calculate and explain, but may be preferable if you want to avoid the cliff effect where a customer at 10,001 devices suddenly pays less per device than they did at 9,999.
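A sketch of the marginal calculation, using the tier boundaries and rates from the example above (the `TIERS` structure is an assumed representation, not a prescribed schema):

```python
from decimal import Decimal

# (tier ceiling, rate per device); ceilings are cumulative device counts,
# None marks the open-ended top tier. Rates from the example above.
TIERS = [
    (10_000, Decimal("5.00")),
    (50_000, Decimal("3.50")),
    (None,   Decimal("2.00")),
]

def marginal_fleet_charge(device_count: int) -> Decimal:
    """Pattern B: each tier's rate applies only to devices inside that tier."""
    total = Decimal("0")
    floor = 0
    for ceiling, rate in TIERS:
        upper = device_count if ceiling is None else min(device_count, ceiling)
        if upper > floor:
            total += (upper - floor) * rate
        floor = ceiling if ceiling is not None else device_count
        if device_count <= floor:
            break
    return total
```

For 18,500 devices this bills 10,000 at $5.00 plus 8,500 at $3.50, i.e. $79,750, with no cliff at the 10,000-device boundary.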


Multi-Region Data Residency

IoT deployments are inherently cross-border. A logistics platform tracking containers from Hamburg to Houston generates events from EU-based devices that, under GDPR, must be processed under EU data protection rules. A remote patient monitoring platform serving EU patients must keep patient-correlated data within the EU. A SaaS billing vendor with US-only infrastructure receives your EU device billing events outside the EU by default.

The architecture decision:

Option A: Single global billing pipeline (simplest, highest compliance risk)

EU Devices ─────────────────────────────────────────────────────────┐
                                                                     ▼
US Devices ─────────────────────────────► Global billing pipeline ──► Invoice
                                         (US-based SaaS platform)
                                         ⚠ EU device billing events
                                           now in US infrastructure

Option B: Regional event stores, centralized invoicing

EU Devices ──► EU billing event store ──────────────────────────────┐
               (EU cloud region)                                     ▼
US Devices ──► US billing event store ──────────────────────────► Aggregation
               (US cloud region)                                  (per-region or
                                                                   federated)
                                                                     ▼
                                                                  Invoice

Option C: Self-hosted billing per deployment region (full isolation)

EU Devices ──► EU-hosted billing engine ──► EU invoice
               (customer's EU infra)

US Devices ──► US-hosted billing engine ──► US invoice
               (customer's US infra)

Option A is the path of least resistance. It creates GDPR cross-border transfer obligations for EU device data and requires Standard Contractual Clauses or equivalent with the billing vendor. It also puts EU patient data (for connected medical devices) in a non-EU environment, which is a BAA-adjacent problem.

Option B requires regional event stores and federated aggregation. This is the minimum viable architecture for platforms with EU deployments. The aggregation join across regional stores must be designed carefully — the aggregate of two regional totals is the invoice total, but the audit trail must remain in each region.
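The federation step itself can be small, because only per-metric aggregates cross the region boundary. A sketch, assuming each regional store exposes its period totals as a metric-to-quantity map (the shape is an assumption, not a fixed interface):

```python
from decimal import Decimal

def federated_invoice_totals(regional_totals: list) -> dict:
    """Combine per-region aggregates into one invoice total. Only these
    aggregates leave each region; the per-device audit rows stay in the
    regional event stores."""
    combined: dict = {}
    for region in regional_totals:  # one {metric: total} dict per region
        for metric, total in region.items():
            combined[metric] = combined.get(metric, Decimal(0)) + total
    return combined
```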

Option C is the architecture that self-hosted billing enables. Each deployment region runs its own billing engine. Billing data never leaves the region where it was generated. No cross-border transfer obligations, no SCC negotiation, no BAA required for the billing layer. The complexity cost is operational: two billing engine deployments to maintain instead of one.

For most IoT platforms with EU deployments, the practical path is to start with Option B using a SaaS billing vendor that has EU data residency support (a genuine EU region, not just EU-proxied to a US backend), and migrate to Option C as the compliance requirements or scale economics justify the operational overhead.


Pre-Production IoT Billing Pipeline Checklist

Before the first metered invoice goes out:

MQTT / protocol layer:

  • QoS 1 selected for all billing-relevant device telemetry
  • Message sequence number included in gateway message envelope (required for idempotency key)
  • Broker-received timestamp added to envelope alongside device timestamp
  • Device-to-tenant mapping is resolvable at the gateway (certificates provisioned, device registry populated)

Message queue:

  • Partition key is customer_id (not device ID, not topic)
  • Queue retention period ≥ max connectivity gap + 72-hour buffer
  • Consumer group offset management tested for crash recovery
  • Queue replay from arbitrary offset verified end-to-end

Billing consumer:

  • Idempotency key derived from message content, not assigned at ingestion
  • UniqueViolationError on duplicate write is handled silently (not logged as error)
  • Deduplication rate metric emitted per customer per hour
  • Late-arrival policy implemented and tested with synthetic late events
  • Clock skew detection threshold configured for your device fleet characteristics

Billing event store:

  • event_id PRIMARY KEY constraint enforced at DB level
  • quantity stored as DECIMAL(20,10), not FLOAT
  • timestamp is event time, not ingestion time
  • billing_period_id populated at write time based on late-arrival policy
  • Table partitioned by (customer_id, billing_period_id) for query performance

Aggregation and invoicing:

  • Fleet-level invoice aggregation and per-device audit trail are separate queries against the same table
  • Tiered pricing calculation tested against edge cases (exactly at tier boundary, fleet size changes mid-period)
  • Aggregation query includes clock skew tolerance window
  • Grace window enforcement tested: events beyond window are rejected before write attempt

Data residency:

  • EU device deployments routing to EU-region event store
  • GDPR cross-border transfer mechanism in place if using SaaS billing vendor with US infrastructure
  • Device hardware identifiers (MAC, IMEI) confirmed absent from billing event schema

ABAXUS runs inside your own Kubernetes cluster — IoT billing data stays in your own database, in your own cloud region, with no SaaS API throughput ceiling


Self-hosted billing engine with idempotent device-event ingestion, configurable connectivity gap policies, fleet-level aggregation, and real-time customer dashboards. Handles MQTT QoS 1 deduplication and multi-region deployments. No per-transaction fees.

See Pricing

When the Build vs. Buy Question Surfaces

After reading this article, some engineering teams will ask whether building this pipeline in-house is the right call — or whether a billing platform that handles the IoT-specific edge cases already exists.

The build vs. buy decision for IoT billing depends on three factors:

1. Throughput requirements. SaaS billing APIs are rate-limited in the hundreds of requests per second. Industrial IoT deployments generate millions of events per second. No SaaS billing vendor’s API survives a direct integration with a high-frequency IoT event stream. The choice is not “build vs. buy” but “build an in-cluster queue consumer that writes to a billing event store” vs. “use a self-hosted billing engine that ships this pipeline pre-built.”

2. Data residency requirements. A SaaS billing vendor with US-only infrastructure cannot satisfy GDPR data residency requirements for EU IoT deployments without additional transfer mechanisms. A self-hosted billing engine deployed in the customer’s own EU cloud region satisfies these requirements by design.

3. Long-term audit trail retention. SaaS billing platforms default to 12–24 months of data retention. HIPAA requires 6 years. SaMD post-market surveillance requires 10+. If your IoT product is in a regulated vertical, a SaaS billing vendor’s default retention policy will require you to archive billing data externally — which you then have to manage separately.

For a detailed cost comparison between self-hosted and SaaS billing options at various IoT billing volumes, see Self-Hosted vs. SaaS Billing Infrastructure: The Engineering Trade-Off Analysis.


Book an Architecture Review for Your IoT Billing Pipeline

IoT billing pipelines have specific failure modes — connectivity gaps, clock skew, MQTT deduplication, fleet-level aggregation — that require design decisions before the first device event reaches production. Getting these wrong creates billing inaccuracies that are discovered during customer invoice disputes, not during development.

ABAXUS offers 30-minute architecture reviews for engineering teams building IoT billing pipelines. In one session, we’ll work through:

  • Queue architecture — partition strategy, retention window sizing for your connectivity gap profile, consumer group design
  • Idempotency key construction — specific to your device protocol (MQTT message IDs, HTTP request IDs, CDC offsets, CoAP message tokens)
  • Connectivity gap policy — which of the three policy options fits your billing period structure, customer contract terms, and regulatory requirements
  • Clock skew tolerance — threshold calibration for your device fleet’s clock reliability characteristics
  • Data residency — whether your EU device deployments require regional event store separation, and what that means for your aggregation architecture

This is an engineering review, not a product demo. Come with your current pipeline design or a description of your device fleet and billing model.

Book your 30-minute IoT billing pipeline review →



ABAXUS is a self-hosted usage-based billing engine for IoT engineering teams. It runs inside your own Kubernetes cluster — handling MQTT-sourced device-event ingestion at IoT scale with idempotency, configurable connectivity gap policies, fleet-level aggregation, and multi-year audit trail retention — with all billing data in your own database and no per-transaction fees. See pricing · Book a pipeline review


Stop debugging billing. Start shipping product.

Your billing layer should be invisible infrastructure. In 30 minutes we map your event sources, identify your data contract gaps, and show you exactly what fixing the architecture looks like. No sales pitch.