Credit billing infrastructure: how to build a credit system that scales for AI products

Mar 28, 2026

Arnon Shimoni

The first version of every credit system looks the same: a balance column in the users table, a function that decrements it on each API call, and a Stripe checkout session to top up. It takes a week to build, and it works.

It works right up until a customer asks why their credit balance doesn't match their usage, or until your finance team asks how to recognize revenue on unused credits that expire next quarter. I've also seen it break when an enterprise prospect asks for pooled credits across 50 seats with different exchange rates per model.

That first version isn't wrong per se, but it's definitely not a credit system. More like a balance field with a decrement/increment function.

The gap between that and billing infrastructure is where most AI companies lose 6-12 months of engineering time.

This guide makes sense of the architecture of credit billing systems that work at scale.

What a credit system actually is

A credit is not a payment method, a coupon, or a line item on a subscription invoice (despite what some implementations may make you think).

A credit is a unit of prepaid value that exists as a financial object with its own lifecycle: it's purchased, it's allocated, it's consumed against usage, and it's either depleted or it expires. At every stage, it has accounting implications. Purchased credits are deferred revenue (a liability). Consumed credits are recognized revenue. Expired credits are breakage revenue. Each of these needs to be tracked, audited, and reconciled.

The 5 layers of credit billing infrastructure

A production-grade credit system has five distinct layers. Most implementations cover one or two. The gaps between them are where the engineering debt accumulates.

Layer 1: The credit ledger

The ledger is the source of truth for all credit balances and transactions. Every credit event (purchase, allocation, consumption, expiry, refund, transfer) is an immutable entry in the ledger.

What it needs to handle:

  • Transaction history per credit (not just current balance)

  • Multiple credit types (purchased vs. granted vs. promotional, each with different accounting treatment)

  • Immutable audit trail (append-only, no balance overwrites)

  • Idempotency (the same event processed twice doesn't double-decrement)

Where implementations fail: storing credits as a mutable balance field. When a customer disputes a charge and you can't show the transaction-by-transaction history of how their balance went from 10,000 to 0, you have a support problem. When your auditor asks the same question, you have a compliance problem.
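
The append-only and idempotency requirements above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production design: the `LedgerEntry` and `CreditLedger` names, the field layout, and the use of a caller-supplied `event_id` as the idempotency key are all assumptions made for the example. A real ledger would sit on a database with a unique constraint on the key.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)  # frozen: an entry cannot be modified once created
class LedgerEntry:
    event_id: str     # idempotency key (assumed unique per real-world event)
    customer_id: str
    kind: str         # e.g. "purchase", "consumption", "expiry", "refund"
    amount: int       # signed credits: positive adds, negative removes


class CreditLedger:
    """Append-only ledger; the balance is derived from entries, never stored."""

    def __init__(self) -> None:
        self._entries: List[LedgerEntry] = []
        self._seen: Dict[str, LedgerEntry] = {}

    def record(self, entry: LedgerEntry) -> bool:
        """Append an entry. Returns False if this event_id was already recorded."""
        if entry.event_id in self._seen:
            return False  # duplicate delivery: do not double-decrement
        self._seen[entry.event_id] = entry
        self._entries.append(entry)
        return True

    def balance(self, customer_id: str) -> int:
        return sum(e.amount for e in self._entries if e.customer_id == customer_id)

    def history(self, customer_id: str) -> List[LedgerEntry]:
        return [e for e in self._entries if e.customer_id == customer_id]


ledger = CreditLedger()
ledger.record(LedgerEntry("evt-1", "cust-a", "purchase", 10_000))
ledger.record(LedgerEntry("evt-2", "cust-a", "consumption", -250))
ledger.record(LedgerEntry("evt-2", "cust-a", "consumption", -250))  # retry, ignored
```

Because the balance is computed from the entries, the answer to "how did this balance get from 10,000 to 0" is always `history(customer_id)`.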

Layer 2: Wallet architecture

Wallets are containers for credits. A simple system has one wallet per customer. A production system needs more.

What it needs to handle:

  • Multiple wallets per customer (purchased credits, promotional credits, enterprise-allocated credits, each with different rules)

  • Wallet priority (which wallet depletes first)

  • Pooled wallets (shared across a team or organization)

  • Per-user wallets within a pool (individual allowances drawn from a shared balance)

  • Wallet scoping (credits that can only be used for specific products or models)

Where implementations fail: one wallet per customer with no scoping or priority. This breaks the moment you run a promotion ("1,000 free credits for new users") alongside purchased credits. Which credits get consumed first? What happens when promotional credits expire but purchased credits don't? Without wallet priority and separate tracking, the answer is "whatever the code does," and what the code does is usually wrong from an accounting perspective.
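
Wallet priority can be made explicit rather than left to "whatever the code does". The sketch below assumes a policy where promotional credits deplete before purchased ones; that ordering is an illustrative choice, not a rule from this article, and the `Wallet` and `consume` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Wallet:
    name: str
    balance: int
    priority: int  # lower number depletes first (assumed convention)


def consume(wallets: List[Wallet], credits: int) -> List[Tuple[str, int]]:
    """Draw credits from wallets in priority order.

    Returns (wallet name, amount drawn) pairs so each draw can be logged
    against the wallet it came from.
    """
    if credits > sum(w.balance for w in wallets):
        raise ValueError("insufficient credits across all wallets")
    drawn: List[Tuple[str, int]] = []
    remaining = credits
    for wallet in sorted(wallets, key=lambda w: w.priority):
        if remaining == 0:
            break
        take = min(wallet.balance, remaining)
        if take:
            wallet.balance -= take
            drawn.append((wallet.name, take))
            remaining -= take
    return drawn


wallets = [
    Wallet("purchased", 5_000, priority=2),
    Wallet("promotional", 1_000, priority=1),
]
drawn = consume(wallets, 1_200)
# drawn == [("promotional", 1000), ("purchased", 200)]
```

Returning the per-wallet breakdown matters because promotional and purchased credits usually carry different accounting treatment, so the split has to be recorded at consumption time.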

Layer 3: Exchange rates and consumption rules

In AI billing, one credit rarely equals one unit of anything. A credit might buy 1,000 Haiku tokens or 100 Opus tokens. It might buy 10 images at standard resolution or 2 at high resolution. The exchange rate between credits and consumption depends on the product, the model, the tier, and sometimes the time of day.

What it needs to handle:

  • Multi-dimensional rate cards (credit cost varies by model, token type, resolution, etc.)

  • Rate versioning (when you change exchange rates, existing credits should honor original or updated rates, depending on policy)

  • Volume-based exchange rates (bulk pricing where the per-credit value changes at thresholds)

  • Currency conversion (credits denominated in USD consumed against compute priced in multiple currencies)

Where implementations fail: hard-coded exchange rates. When you launch a new model, add a token type, or change pricing, the exchange rate update requires a code change, a deploy, and a prayer that the migration doesn't break in-progress billing cycles. Every pricing change becomes an engineering project instead of a configuration change.
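
One way to turn a pricing change into a configuration change is to keep rate cards as versioned data and look up the version in effect at the time of the usage event. The sketch below is an assumption-heavy illustration: the `RATE_CARDS` structure, the dates, and the March rate are invented for the example, and only the 1-credit-per-1,000-Haiku-tokens versus 1-credit-per-100-Opus-tokens ratio comes from the text above. It applies the updated rate to all usage from the effective date; honoring original rates for existing credits would need a different lookup key.

```python
from datetime import date
from decimal import Decimal
from typing import Dict, List, Tuple

RateKey = Tuple[str, str]  # (model, token_type)

# Hypothetical rate card versions: (effective_from, credits per 1,000 tokens).
# In practice this would be loaded from configuration, not defined in code.
RATE_CARDS: List[Tuple[date, Dict[RateKey, Decimal]]] = [
    (date(2026, 1, 1), {("haiku", "input"): Decimal("1"), ("opus", "input"): Decimal("10")}),
    (date(2026, 3, 1), {("haiku", "input"): Decimal("1"), ("opus", "input"): Decimal("8")}),
]


def credits_for(model: str, token_type: str, tokens: int, on: date) -> Decimal:
    """Credits consumed by `tokens` tokens under the rate card in effect on `on`."""
    eligible = [card for card in RATE_CARDS if card[0] <= on]
    if not eligible:
        raise LookupError(f"no rate card effective on {on}")
    _, rates = max(eligible, key=lambda card: card[0])  # latest version not after `on`
    return rates[(model, token_type)] * Decimal(tokens) / Decimal(1000)


before_change = credits_for("opus", "input", 2_500, date(2026, 2, 15))  # 25 credits
after_change = credits_for("opus", "input", 2_500, date(2026, 3, 15))   # 20 credits
```

`Decimal` is used instead of floats so that fractional credits do not accumulate rounding drift across millions of events.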

Layer 4: Real-time metering and burn-down

Credits are consumed in real time. A customer making API calls expects their balance to reflect usage as it happens, not tomorrow morning after the nightly reconciliation job runs.

What it needs to handle:

  • Sub-second balance updates as usage events stream in

  • Burn-down tracking (rate of consumption over time, projected depletion date)

  • Threshold alerts (notify customers and internal teams when balance drops below configurable levels)

  • Hard and soft caps (optionally block usage when credits are depleted, or allow overage and bill separately)

  • Late and out-of-order event handling (usage events that arrive after the billing period closes)

Where implementations fail: nightly batch reconciliation. The customer sees a balance of 5,000 credits, makes 6,000 credits worth of API calls, and gets a surprise overage bill. The internal team sees stale dashboards and can't identify runaway usage until the next morning. Real-time metering isn't a nice-to-have. It's the difference between a credit system customers trust and one they fight.
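
The hard-cap versus soft-cap decision and the projected depletion time are both small calculations once the balance is current. The following is a rough sketch using the 5,000-credit balance and 6,000-credit usage from the example above; the `authorize` and `hours_until_depleted` functions and the `Decision` type are hypothetical names, and the burn rate is assumed to be supplied by the metering layer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    allowed: bool
    overage: int  # credits beyond the balance, to be billed separately


def authorize(balance: int, cost: int, hard_cap: bool) -> Decision:
    """Decide whether a request costing `cost` credits may proceed."""
    if cost <= balance:
        return Decision(allowed=True, overage=0)
    if hard_cap:
        return Decision(allowed=False, overage=0)       # block when depleted
    return Decision(allowed=True, overage=cost - balance)  # allow and bill overage


def hours_until_depleted(balance: int, credits_per_hour: float) -> Optional[float]:
    """Project time to depletion from a recent burn rate; None if not burning."""
    if credits_per_hour <= 0:
        return None
    return balance / credits_per_hour


blocked = authorize(balance=5_000, cost=6_000, hard_cap=True)
billed = authorize(balance=5_000, cost=6_000, hard_cap=False)
projection = hours_until_depleted(balance=5_000, credits_per_hour=250)  # 20.0 hours
```

With a stale nightly balance, neither function can give a trustworthy answer, which is why this check only makes sense on top of real-time metering.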

Layer 5: Financial reconciliation and reporting

This is the layer most engineering teams skip entirely, and it's the layer that finance teams care about most.

What it needs to handle:

  • Revenue recognition (deferred revenue on purchase, recognized on consumption, breakage on expiry)

  • Cost reconciliation (what did the consumption that burned these credits actually cost in compute infrastructure?)

  • Margin per customer (revenue from credits consumed minus infrastructure cost of that consumption)

  • Reporting by cohort, plan, model, and time period

  • Audit-ready transaction logs

Where implementations fail: treating billing and finance as separate problems. The engineering team builds a credit system that tracks balances. The finance team builds spreadsheets that track revenue. The two systems don't share data, so every month-end close requires manual reconciliation. At scale, this is a 2-week close cycle and a full-time role.
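
When billing and finance share the same ledger, the revenue buckets above can be derived directly from credit events. The sketch below is deliberately simplified: it assumes a single flat price per credit and three event kinds, and the `recognize` function is a hypothetical name. Actual revenue recognition policy, especially for breakage, is something your finance team and auditors decide.

```python
from decimal import Decimal
from typing import Dict, Iterable, Tuple


def recognize(
    events: Iterable[Tuple[str, int]], price_per_credit: Decimal
) -> Dict[str, Decimal]:
    """Roll (kind, credits) events up into deferred, recognized, and breakage totals."""
    totals = {
        "deferred": Decimal(0),    # liability: paid for but not yet consumed
        "recognized": Decimal(0),  # consumed against usage
        "breakage": Decimal(0),    # expired unused
    }
    for kind, credits in events:
        value = price_per_credit * credits
        if kind == "purchase":
            totals["deferred"] += value
        elif kind == "consumption":
            totals["deferred"] -= value
            totals["recognized"] += value
        elif kind == "expiry":
            totals["deferred"] -= value
            totals["breakage"] += value
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return totals


totals = recognize(
    [("purchase", 10_000), ("consumption", 6_000), ("expiry", 4_000)],
    price_per_credit=Decimal("0.01"),
)
# deferred ends at 0, recognized at 60, breakage at 40 (in currency units)
```

The point of the example is that the same events that move the customer's balance also move the finance totals, so there is nothing left to reconcile by hand at month-end.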

Credit system maturity model

| Stage | What you have | What breaks next |
| --- | --- | --- |
| v0: Balance field | A balance column, a decrement function, Stripe for top-ups | Customer disputes (no transaction history), promotional credits mixed with purchased, revenue recognition is manual |
| v1: Basic ledger | Transaction log, single wallet per customer, simple exchange rates | Enterprise asks for pooled credits, finance asks for deferred revenue reporting, pricing change requires deploy |
| v2: Multi-wallet | Multiple wallets with priority and expiry, configurable exchange rates | Balance is stale (nightly batch), no cost reconciliation, custom code for every new model/tier |
| v3: Real-time | Real-time metering, burn-down tracking, threshold alerts | Finance still reconciles manually, margin per customer is unknown, and rate card changes still need some engineering to migrate |
| v4: Full infrastructure | All 5 layers connected: ledger, wallets, exchange rates, real-time metering, and financial reconciliation in one system | Nothing breaks, thankfully. You spend engineering time on product. |

Most AI companies are at v0 or v1 when they realize credits are an infrastructure problem. The jump from v1 to v4 is either 6-12 months of engineering time or a platform migration.

Build vs. buy: what to consider

The build-it-yourself path is tempting because v0 is genuinely easy. A week of work and you have credits. The problem is that credit systems don't stay at v0. They grow in complexity with your business, and the engineering cost compounds.

What "building it yourself" actually costs:

| Component | Engineering time | Ongoing maintenance |
| --- | --- | --- |
| Credit ledger with audit trail | 2-4 weeks | Schema migrations as requirements change |
| Multi-wallet architecture | 3-6 weeks | Every new wallet type is custom code |
| Exchange rate engine | 2-4 weeks | Every pricing change is a code change |
| Real-time metering integration | 4-8 weeks | Scaling, deduplication, late event handling |
| Financial reconciliation layer | 4-8 weeks | Revenue recognition rules change; audit requirements evolve |
| Total | 15-30+ weeks | 2-3 engineers permanently |

At $200K fully loaded cost per engineer, a homegrown credit system costs $300K-$600K to build and $200K-$400K per year to maintain. That's before you count the opportunity cost of those engineers not building product.

What "buying it" costs:

Platforms with native credit infrastructure range from open-source (Lago, free + your ops cost) to commercial (Solvimon, free up to $3M billed, then 0.40% for AI companies). The total cost at $10M ARR is $0-$40K/year, and you get all five layers on day one.

The build path makes sense if credits are a simple balance with a decrement function and will stay that way. If your credit system needs wallets, exchange rates, real-time metering, or financial reconciliation, the buy path saves 6+ months of engineering time and the ongoing maintenance cost.

What production credit infrastructure looks like

A well-architected credit system connects all five layers:

Usage event → Metering layer (ingests, deduplicates, aggregates) → Exchange rate engine (maps usage to credit consumption based on model/tier/rate card) → Wallet layer (selects correct wallet by priority, decrements with scoping rules) → Ledger (records immutable transaction) → Financial layer (updates deferred revenue, calculates margin, feeds reporting)

This pipeline runs on every API call, every token consumed, every image generated. At scale, it processes millions of events per day. Each event needs to complete this pipeline in real time so the customer sees an accurate balance and the finance team sees accurate revenue.
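
To make the flow concrete, here is a compressed single-process sketch of one event passing through all five stages. Everything in it is an assumption made for illustration: the `UsageEvent` and `Pipeline` names, the single wallet per customer, the flat price per credit, and the in-memory state. A real system would spread these stages across streaming and storage infrastructure and would enforce caps before decrementing.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List, Set, Tuple


@dataclass
class UsageEvent:
    event_id: str
    customer_id: str
    model: str
    tokens: int


@dataclass
class Pipeline:
    rates: Dict[str, Decimal]      # credits per 1,000 tokens, by model
    price_per_credit: Decimal      # flat price, assumed for simplicity
    balances: Dict[str, Decimal]   # one wallet per customer in this sketch
    seen: Set[str] = field(default_factory=set)
    ledger: List[Tuple[str, str, Decimal]] = field(default_factory=list)
    recognized_revenue: Decimal = Decimal(0)

    def process(self, event: UsageEvent) -> bool:
        # 1. Metering: drop duplicate deliveries of the same event.
        if event.event_id in self.seen:
            return False
        self.seen.add(event.event_id)
        # 2. Exchange rate: map usage to credits via the rate card.
        credits = self.rates[event.model] * event.tokens / 1000
        # 3. Wallet: decrement the balance (no cap check in this sketch).
        self.balances[event.customer_id] -= credits
        # 4. Ledger: append an immutable record of the movement.
        self.ledger.append((event.event_id, event.customer_id, -credits))
        # 5. Financial: move the consumed value into recognized revenue.
        self.recognized_revenue += credits * self.price_per_credit
        return True


pipeline = Pipeline(
    rates={"opus": Decimal("10")},
    price_per_credit=Decimal("0.01"),
    balances={"cust-a": Decimal("1000")},
)
event = UsageEvent("evt-1", "cust-a", "opus", 2_500)
first = pipeline.process(event)   # processed: 25 credits consumed
second = pipeline.process(event)  # duplicate: ignored
```

Each step here is trivial on its own; the engineering cost comes from making all five consistent with each other under retries, late events, and concurrent writes.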

The platforms that handle this natively treat credits as a financial primitive in the billing architecture. Solvimon, for example, runs this entire pipeline as a built-in flow: credit wallets with full ledger logic, tiered exchange rates per model, real-time burn-down tracking, and revenue-per-customer next to cost-per-customer. The architecture was designed by the team that built Adyen's internal billing at €970B+ annual payment volume, where this kind of financial-grade pipeline is non-negotiable.

Open-source alternatives like Lago provide the ledger and wallet layers, with metering and financial reconciliation requiring more custom integration. Orb provides strong metering and block-based credits with pricing simulation. Each covers different layers at different depths.

Choosing your credit infrastructure

Three questions:

  1. How complex are your credit rules today? If it's a single balance with a flat exchange rate, v0 works. If you have (or will soon have) multiple credit types, model-specific rates, team pooling, or expiry rules, you need wallet architecture and exchange rate logic.

  2. How important is real-time accuracy? If customers check their balance before making API calls (they do), stale data creates support tickets and trust erosion. If your finance team needs intra-month revenue visibility (they will), nightly reconciliation isn't enough.

  3. Who owns credit system maintenance? Every billing engineer maintaining your credit system is an engineer not building your product. The total cost of ownership isn't the build cost. It's the build cost plus the permanent maintenance cost plus the opportunity cost of those engineers.

If the answer to any of these points toward complexity, real-time requirements, or engineering cost concerns, evaluate platforms that treat credits as infrastructure before building your own.

For a comparison of platforms that handle credit billing, see: Best billing systems for AI startups in 2026 →

For pricing model guidance, see: AI credit pricing models: how tokens, credits, and hybrid billing actually work →

Solvimon: credit wallets, token metering, hybrid billing. Built by the team that scaled Adyen to €970B+. Free up to $5M billed.